Louis Goldstein : Research

Research Program: Articulatory Phonology

At the broadest level, my work aims to be a contribution to a very old question: the relation of mind and body. When humans communicate through speech (or through sign language), messages are transferred from the mind of the sender to the mind of the receiver through actions of the body (primarily the vocal organs for speech and the hands for sign). Uncovering the fundamental nature of those vocal actions for speech and how they come to carry messages are the underlying theoretical enterprises of my work, which straddles the linguistic subfields of phonetics and phonology.

My approach to this task has been to view the vocal organs during speech as engaging in a kind of dance and to decompose the dance into primitive actions (like dance steps). These actions, or gestures as they have been dubbed, function to constrict the flexible vocal tube somewhere along its length. The key hypothesis linking this dance analysis to matters of mind is that these constriction actions (gestures) are also units of information. Word-forms in a language can be viewed as ensembles of gestural units, with distinct word-forms selecting distinct ensembles. For example, "bad" differs from "dad" because one begins with a lip gesture and the other with a tongue tip gesture. Ensembles may be distinct from one another in their composition or their temporal organization.

Viewed slightly differently, constriction gestures are the combinatorial units of phonology. A small number of discrete constriction actions can combine to form the (potentially limitless) set of word-forms of the language. Because the human constricting organs are the same from language to language, the set of potential gestures is the same from language to language. One major way in which languages do in fact differ is in the principles they rely on for combining gestures into larger constituents. Discovering these principles, their underlying rationale, and the limits of language differentiation has been a major focus of my work.

Mathematically, gestures are defined in terms of controls for abstract dynamical systems that produce constrictions. A given dynamical system can give rise to different motions when the gesture is embedded in different contexts, which can begin to explain the ubiquitous contextual variability of speech units. Development of the appropriate dynamical models and testing that they yield appropriate contextual variation has been another major thrust of my work.
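The idea that a single gestural control law yields different movements in different contexts can be illustrated with a toy simulation. This is a minimal sketch, assuming a critically damped mass-spring (point-attractor) system of the kind commonly used to model constriction gestures; the function name, parameter values, and one-dimensional state are illustrative, not the actual model.

```python
import math

def gesture_trajectory(x0, target, k=100.0, dt=0.001, steps=800):
    """One constriction gesture modeled as a critically damped
    mass-spring system:  x'' = -k*(x - target) - b*x',
    with b = 2*sqrt(k) so the movement approaches the target
    without overshoot. x0 is the context-dependent starting state."""
    b = 2.0 * math.sqrt(k)      # critical damping coefficient
    x, v = x0, 0.0              # initial constriction degree and velocity
    traj = [x]
    for _ in range(steps):
        a = -k * (x - target) - b * v   # acceleration from the control law
        v += a * dt
        x += v * dt
        traj.append(x)
    return traj

# The same gesture (same target, same dynamics) launched from two
# different contexts produces two different movements toward one goal.
t1 = gesture_trajectory(x0=0.0, target=1.0)
t2 = gesture_trajectory(x0=0.5, target=1.0)
```

The point of the sketch is that contextual variability falls out of the dynamics: the trajectories differ throughout, yet both are generated by the identical abstract control regime and both converge on the same constriction target.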

Dynamical systems are relevant not only to the definition of individual speech gestures, but also to understanding how they cohere in larger units. I have been involved in work that attempts to model this coherence explicitly in terms of networks of coupled oscillatory units associated with the gestures. This work seeks to provide principled mathematical explanations for how gestures are coordinated in time, why languages have the kinds of gesture combinations that they do, and where relatively free combination of gestures can be found (and where it is more constrained).

Informational units must be shared between sender and receiver; otherwise, communicating messages would not be possible. So if the information in speech indeed resides in the gestures, listening to speech must involve recovering the gestures that the talker employed when producing it. While it is by no means straightforward how this takes place, research that I (and others) have pursued shows a systematic relation between gestures produced and gestures perceived.

The cognitive (mental) structure of the speech system extends well beyond its informational function in distinguishing messages. Speakers have a rich system of phonological knowledge regarding the speech system of their language. For example, speakers can embed words appropriately in larger phrasal structures, and they can render the appropriate changes to a form (alternations) as a function of morpho-phonological context. Another major direction of my research activity has been to show that a phonological grammar can be simplified when it employs representations based on the experimentally observable gestures of the speaker, rather than the sounds that those actions produce or our transcriptions of those sounds.

The articulatory phonology research program has matured in the two decades that my colleagues and I have spent developing it. This framework is being explored and evaluated by linguists, psycholinguists, speech laboratory scientists, and computer scientists in universities in the US and internationally.

Research Activities

Carrying out this research program has required undertaking three distinct types of activities: developing the theoretical and mathematical constructs, building an explicit computational model that produces articulatory movements and sound to use as a test bed, and collecting articulatory data against which model predictions can be tested. This has resulted over the last two decades in federal grant support and numerous publications spanning several specific areas of linguistics.

(1) phonetic and phonological alternations

Early research in articulatory phonology showed that many post-lexical alternations dependent on speaking rate and style could be more adequately (and simply) described as quantitative changes to gestures and their timing, rather than as feature- or segment-changing rules. In faster, more fluent speech, gestures may exhibit reduction in magnitude and increase in temporal overlap. Extreme overlap can result in a gesture being perceptually hidden from the listener, even though the speaker is still producing it.

(2) syllable structure

In the theory we have developed, gestures are coordinated into larger structures by means of coupling the "clocks" associated with the individual gestures to one another. In doing so, the speech system can take advantage of intrinsically available modes of coordination, in-phase and anti-phase. The use of these modes allows us to reconstruct the traditional internal representation of syllables (onset, nucleus, rime), in a way that simultaneously accounts for the phonological (combinatorial) properties of syllable constituents and their physical (timing) properties.
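The in-phase and anti-phase coordination modes mentioned above can be sketched with the standard relative-phase dynamics for two coupled oscillators. This is an illustrative toy, assuming an HKB-style potential for the relative phase of two gestural "clocks"; the function name and coupling parameters are my own choices, not the published model's.

```python
import math

def settle_relative_phase(psi0, a=1.0, b=1.0, dt=0.01, steps=5000):
    """Relative phase psi of two coupled oscillatory 'clocks',
    following HKB-style dynamics:
        d(psi)/dt = -a*sin(psi) - 2*b*sin(2*psi)
    With these parameters the stable fixed points are psi = 0
    (in-phase mode) and psi = pi (anti-phase mode)."""
    psi = psi0
    for _ in range(steps):
        psi += (-a * math.sin(psi) - 2.0 * b * math.sin(2.0 * psi)) * dt
    return psi % (2.0 * math.pi)

# A small initial phase offset is attracted to in-phase coordination...
near_zero = settle_relative_phase(0.4)
# ...while an offset closer to pi is attracted to anti-phase coordination.
near_pi = settle_relative_phase(2.8)
```

Whatever the initial offset, the system settles into one of the two intrinsically available modes rather than an arbitrary lag, which is the property the syllable-structure account exploits: stable timing relations come for free from the coupling dynamics.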

(3) emergence of gestural structures

Recent work has investigated how the system of gestures and their coupling emerge in the developing child, and how they might have emerged in the evolution of human language. One example is how gestures come to be discrete. We have hypothesized that the distinct constricting organs (lips, tongue, velum, glottis) are differentiated at birth and that these provide the initial basis for discreteness. Analysis of children's early word productions supports this hypothesis by showing that between-organ contrasts are mastered earlier than within-organ contrasts. Another example is the early emergence of CV syllables, which can be explained by the ease of in-phase coordination of actions.

(4) phonological encoding in speech production

Traditional theories of speech production have assumed that the basic units of assembly are symbolic phonological segments. One source of evidence for that view is speech errors. These errors appear to involve shifts in abstract segment-sized units. However, our studies of articulatory data collected during speech error-producing tasks have shown that many (or most) errors are in fact not replacements of one symbolic phonological unit with a different one. Rather, most errors involve the production of "extra" (intrusive) articulatory gestures concurrently with the intended ones. Thus, gestures are a much better candidate for units of the speech production process than segments.

Future Research Goals

I plan further development of articulatory phonology along the following lines. For each of these, grant applications are funded, pending, or in preparation.

Stress and intonation. We will add foot- and phrase-level oscillators to the dynamical model to account for stress and will investigate the coordination of pitch gestures (for intonation) with constriction gestures.

Syllable structure. We are developing novel techniques to probe the ontogeny of syllable structure in infants, as well as investigating (adult) languages with complex consonant sequences to test the hypothesis that gestural timing is a diagnostic for syllabification.

Speech recognition. We will develop techniques for recovering gestures from sound, so we can incorporate our model of gestural variability into computer speech recognition systems.

Speech errors. We are developing methods to test whether our findings generalize to more natural speaking contexts, and to further test our dynamical model of errors.