Home Teaching Research Publications Resources

An Exemplar Theory

[Personal story] I came across the Exemplar Model, or Exemplar Theory, thanks to Steve Winters when I was about to complete my Ph.D. degree in 2007. In particular, I was introduced to XMOD, a model proposed by Keith Johnson. As I remember, the model ran only on an old Apple computer (the Bondi Blue iMac G3, which was popular in the late 1990s). Unfortunately, while moving to Canada via Germany I lost almost all my papers, including the instructions on how to operate XMOD and what to expect as output. I have been too busy with other things ever since, and I only vaguely remember how it worked. Still, I think I can build a similar model from scratch. Given the current interest in exemplar models in the field, I think it is worth trying to build a model similar to XMOD, even if only for educational purposes.

Exemplar Theory was first introduced in psychology as a model of perception and categorization, and was later applied to speech perception (Lacerda 1995, Johnson 1997, Pierrehumbert 2001).

Description of Johnson's XMOD (Quoted from Boomershine 2006; pp. 59-60)

"Exemplar Theory, or exemplar-based models of speech perception and processing, is a framework that allows for detailed representation of input to be stored in the lexicon. This model is used by several linguists, including Pisoni (1990, 1992, 1997), Pisoni et al. (1985), Johnson (1990, 1997), Goldinger (1990, 1996, 1997), and Goldinger et al. (1991). Detailed information from the speech signal is processed by the listener and become part of the stored representation in the lexicon. Therefore, listeners encode this very specific, detailed information rather than discard it. ...

[W]hen a listener is presented with input from a specific talker, that talker's category is activated, along with the category of exemplars that matches the input phonetically. An example of such an activation, based on Johnson's XMOD, is given below, in Figure 1.

In this figure, the speech input is the word 'sosa'. Exemplars in the lexicon are activated according to their phonetic similarity to the input. Exemplars retain auditory and phonetic details of the talker. These activated exemplars in turn activate categories such as talker (Maria vs. Jose) and lexical categories, which in this case are words. The weight of the line corresponds to the amount of activation for each item. Therefore, in the figure below, the second exemplar has the highest activation based on the input. The talker with the highest activation level is Jose, as the talker-specific details in the input best match those stored for Jose. Then, the category that has the highest activation level is the word 'sosa.' It should be noted that this is a rough simulation of how XMOD works.

In exemplar-based approaches to speech perception and processing, such as XMOD (Johnson 1997), the item to be stored (i.e., the input) is compared to all existing exemplars in the lexicon. If it is very similar to an item already stored, then it will be stored as an instance of that exemplar. If it is dissimilar enough, then it will be stored as its own exemplar. The auditory properties of the input are compared with the auditory properties of the exemplars in the lexicon. The similarity between the input and the exemplars' auditory properties determine the activation level of each exemplar. If a given exemplar receives a very high activation level, then the input will be stored as part of the exemplar." (Boomershine 2006: 59-60)
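The activation scheme described in the quote can be illustrated with a small sketch. This is not XMOD itself: the exemplar vectors are made-up numbers rather than auditory data, and the similarity function (an exponential decay of Euclidean distance, with a hypothetical sensitivity parameter c) is an assumption, since the quoted passage does not say how similarity is computed. The names LEXICON, activation and categorize are likewise invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon: each exemplar is (auditory vector, talker, word).
# The numbers are invented for illustration; they are not real acoustic data.
LEXICON = [
    ((1.0, 2.0), "Maria", "sosa"),
    ((3.0, 2.1), "Jose", "sosa"),
    ((3.2, 5.0), "Jose", "sopa"),
    ((1.1, 5.2), "Maria", "sopa"),
]

def activation(x, exemplar, c=1.0):
    """Assumed similarity function: exp(-c * Euclidean distance)."""
    return math.exp(-c * math.dist(x, exemplar))

def categorize(x, lexicon, c=1.0):
    """Sum exemplar activations per talker and per word; return the winners."""
    talkers, words = defaultdict(float), defaultdict(float)
    for vec, talker, word in lexicon:
        a = activation(x, vec, c)
        talkers[talker] += a   # each exemplar feeds its talker category
        words[word] += a       # ... and its lexical category
    return max(talkers, key=talkers.get), max(words, key=words.get)

talker, word = categorize((2.9, 2.0), LEXICON)
print(talker, word)  # Jose sosa
```

Here the input lies closest to the second exemplar, so that exemplar dominates both the talker and the word totals, much as in the 'sosa' illustration in the quote.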

What I've done...

No systematic modeling yet... but I built a toy example that illustrates how a speaker can be identified.

A toy example that identifies the speaker of an input speech file: Three speakers' data are stored in the lexicon; let's call them Tom, Dick and Harry. MFCCs (Mel-frequency cepstral coefficients) with energy information are used as the feature matrices of the speech sounds. MFCCs are the same features that are used in XMOD, some details aside. An EM (Expectation-Maximization) algorithm was applied to the data to estimate model statistics. A test speech sound is fed into the system and compared with the data stored in the lexicon. Negative log-likelihood is used to compare the input speech sound with each speech sound in the lexicon; the stored speaker with the lowest negative log-likelihood is the best match. If we followed the XMOD diagram above, two things would need to happen. First, lexical access needs to be activated, so that the input speech sound is matched to a set of candidate lexical items; second, the speaker needs to be identified. The output would then be something like 'the input speech sound is the word 'sosa' produced by Harry.' However, I haven't integrated the lexical access component yet. The toy example below only demonstrates which speaker produced the input.

A screenshot of a toy example of speaker identification. Features of the input speech material, indexed 1, 2, and 3, are matched to the features stored in the lexicon. In this toy example, speakers Tom, Dick and Harry are stored in the lexicon. The first example shows that input speech 3 is matched to Harry's voice.
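The comparison step of the toy example can be sketched roughly as follows. This is a simplified stand-in, not the code behind the screenshot: instead of real MFCC frames it uses synthetic three-dimensional features, and instead of an EM-trained model it fits a single diagonal Gaussian per speaker in closed form (EM would be needed for a mixture of several Gaussians). The helper names fit_gaussian, neg_log_likelihood, identify and fake_frames are all hypothetical.

```python
import math
import random

def fit_gaussian(frames):
    """Per-dimension mean and variance of a list of feature frames."""
    n, dims = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dims)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n + 1e-6
           for d in range(dims)]
    return mean, var

def neg_log_likelihood(frames, model):
    """Average negative log-likelihood per frame under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def identify(frames, lexicon):
    """Return the speaker whose model gives the lowest NLL, plus all scores."""
    scores = {name: neg_log_likelihood(frames, model)
              for name, model in lexicon.items()}
    return min(scores, key=scores.get), scores

def fake_frames(rng, centre, n=200):
    """Stand-in for MFCC frames: Gaussian noise around a made-up centre."""
    return [[rng.gauss(c, 1.0) for c in centre] for _ in range(n)]

rng = random.Random(0)
# Assumed, well-separated "voices"; real speakers overlap far more.
centres = {"Tom": [0.0, 0.0, 0.0],
           "Dick": [4.0, -3.0, 1.0],
           "Harry": [-4.0, 3.0, 5.0]}
lexicon = {name: fit_gaussian(fake_frames(rng, c)) for name, c in centres.items()}

test_input = fake_frames(rng, centres["Harry"], n=50)
speaker, scores = identify(test_input, lexicon)
print(speaker)  # Harry
```

Averaging the negative log-likelihood over frames keeps scores comparable across inputs of different lengths; with real MFCC data a Gaussian mixture trained by EM would normally replace the single Gaussian used here.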

What I need to do...

Workshop

Researchers

References

Boomershine, Amanda (2006). Perceiving and processing dialectal variation in Spanish: An exemplar theory approach. In Timothy L. Face and Carol E. Klee, editors, Selected Proceedings of the 8th Hispanic Linguistics Symposium, pp. 58-72. [link to pdf]

Goldinger, Stephen D. (1996). Words and voices: Episodic traces in spoken word identification and recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 22: 1166-1183.

Goldinger, Stephen D. (1997). Perception and production in an episodic lexicon. In Keith Johnson and John W. Mullennix, editors, Talker Variability in Speech Processing, pp. 33-66. Academic Press, San Diego.

Johnson, Keith. (1997). The auditory/perceptual basis for speech segmentation. Ohio State University Working Papers in Linguistics 50: 101-113. [pdf]

Johnson, Keith (2004). Class notes on XMOD and Exemplar Theory. Linguistics 825, Seminar in advanced phonetics: exemplar modeling. Winter Quarter 2004, The Ohio State University.

Kirchner, Robert. PEBLS [link]

Levinson, Stephen E. (2005). Mathematical Models for Speech Technology. Wiley.

Pierrehumbert, Janet (2001). Exemplar dynamics: Word frequency, lenition and contrast. In Joan L. Bybee and Paul Hopper, editors, Frequency and the Emergence of Linguistic Structure, pp. 137-157. John Benjamins, Amsterdam.

Pierrehumbert, Janet (2002). Word-specific phonetics. In Laboratory Phonology 7, pp. 101-139. Mouton de Gruyter, Berlin/New York.

Pierrehumbert, Janet (2003). Probabilistic phonology: Discrimination and robustness. In Rens Bod, Jennifer Hay, and Stephanie Jannedy, editors, Probabilistic Linguistics, pp. 97-138. MIT Press, Cambridge, MA.

Pierrehumbert, Janet (2003). Exemplar Theory. A talk given at LSA on Probability Theory in Linguistics 2.


Last maintained January 26, 2009 (First created January 26 2009)