Home Teaching Research Publications Resources

Research Interests

My research investigates human speech as a complex, dynamic system that communicates multiple layers of linguistic and extra-linguistic information through a shared acoustic medium. My current work focuses on speech produced in real communicative contexts, in a search for acoustic and perceptual evidence for prosody in spoken language, and on the relationship between prosodic structure and 'higher' levels of linguistic organization. This research is based on the analysis of large corpora of read and spontaneous speech and, building on the phonological and phonetic analysis of prosody, employs methods from signal processing and computational linguistics to find evidence for prosody in naturally occurring speech. The goals of my current research are to advance linguistic understanding of prosody and, in an applied vein, to improve the performance of automatic speech recognition and speech synthesis systems through prosody modeling.

Speech prosody is manifest in the acoustic signal through the modification of pitch, loudness, duration, and source characteristics (voice quality), which combine to encode the prosodic structure of an utterance. This structure defines the location of prominent words and syllables, and the grouping of words into phonological phrases. Prosodic utterances, in turn, relate the phonological form of an utterance to its morphological, syntactic, semantic and pragmatic context. The listener's task in comprehending speech includes decoding prosodic structure to aid in identifying the morphological, syntactic, semantic and pragmatic context that comprises the meaning of an utterance. My doctoral dissertation, conducted under the direction of Professor Jennifer Cole, models this process of prosody decoding using a machine learning paradigm combined with research methods from acoustic phonetics, signal processing, and parsing. I use machine learning algorithms trained on speech corpus data to detect the prosodic structures of an utterance based on a set of features that encode the phonetic, phonological, syntactic and semantic properties of the local context. When the algorithm is tested on unseen data, it reliably predicts the prosodic structure of an utterance, including the location and type of prominence and boundary features, with accuracy between 87% and 92%. This computational approach to prosody modeling leads to new discoveries and advances in prosodic theory; e.g., my research uncovers new and robust acoustic evidence for some controversial elements of prosodic structure, including prosodic phrase juncture and downstepped pitch-accent in American English, in features related to F0, duration, and intensity, and in spectral measures that relate to voice quality. I have also demonstrated the usefulness of voice quality in improving the performance of automatic speech recognition.
Furthermore, I have shown in a series of machine learning experiments that these acoustic-prosodic features alone are not sufficient to predict the prosodic structure of an utterance, but in combination with features from 'higher' levels of linguistic organization, they allow very accurate prediction of prosody.

My research advances understanding of the interface between components of linguistic grammar by demonstrating the dependencies between phonetics, phonology, syntax, and semantics in the encoding of prosody. In addition, my work building a stochastic model of prosody prediction has a direct application in the development of speech technologies that incorporate linguistic models of prosody, including text-to-speech and automatic speech recognition systems. As I have already shown, applied prosody modeling has great potential to feed back to theory development through evidence gained from large databases of naturally produced speech. I plan to extend my work in the future to include other speech styles and languages. In particular, I will work on methods of automatic prosody labeling, supervised and unsupervised, to provide prosodic annotation for new corpora.

My research interests are not limited to prosody alone. I have worked on, and have a strong interest in, such areas as the phonetic and cognitive foundations of phonology, second language phonology acquisition, the effect of probability and entropy in shaping phonological systems, modeling of disfluencies in spontaneous speech, and computational approaches to phonological and morphological learning.

 
2. Effects of accent on length in a length sensitive language such as Finnish
3. L2 speakers' production and perception of suprasegmentals

Vowel Production (1520 data points taken from Peterson & Barney 1952)


Confusion Matrix (1520 tokens classified by computer)

Confusion Matrix (106400 tokens classified by 70 listeners; Peterson & Barney 1952)
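
For readers unfamiliar with the format, a confusion matrix of the kind captioned above simply counts how often each intended category was assigned each response category. The following is a minimal sketch in Python under stated assumptions: the function names, the vowel labels, and the six toy tokens are hypothetical and purely illustrative; they are not the Peterson & Barney data, nor the classifier behind the figures.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each true category received each predicted category.

    Returns the sorted category list and a square matrix whose rows are
    true categories and whose columns are predicted categories.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    categories = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in categories] for t in categories]
    return categories, matrix


def accuracy(matrix):
    """Proportion of tokens on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical toy tokens (NOT real data): intended vs. classified vowel.
true_vowels = ["iy", "iy", "ih", "ih", "eh", "eh"]
predicted_vowels = ["iy", "ih", "ih", "ih", "eh", "ih"]

categories, matrix = confusion_matrix(true_vowels, predicted_vowels)
for label, row in zip(categories, matrix):
    print(label, row)
print("accuracy:", round(accuracy(matrix), 3))
```

The same tabulation applies whether the responses come from a classifier or from listeners; only the source of the predicted labels changes.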