Introduction:
This project addresses the problem of natural child-machine communication, specifically targeting preschool children (aged 2.5-4) and early elementary school children (aged 4-6). While the state of the art in sensory interfaces to information (e.g., machine speech recognition and synthesis) is still not perfect for the adult population, the task of enabling and measuring the effect of sensory technologies for children poses even greater challenges. Children are a crucial segment of society that will benefit from advances in multimedia information and communication technologies that educate and entertain. Three major topics addressed by this project are:

Experimental Design and Data Collection:
The major components of the proposed research are highlighted in the figure above.
The project is data driven and relies on spoken interaction data collected from children interacting with a computer agent.
Toward that goal, we have initiated a Wizard of Oz experiment, where a series of age-appropriate cognitive challenges
are implemented in an interactive game with a spoken dialog agent.
[Further details]
The experimental set-up has been constructed and equipped to accommodate this research,
with a test room observable from a laboratory through a one-way mirror. Each session is captured in high-quality audio
and video recordings, using two cameras: one focused on the child's face from the front and the other capturing the child
and the computer from the side.

Data Transcription and Annotation:
The speech data from each session will be organized, transcribed and annotated by a native English speaker
using a modified version of the CHILDES format. A second researcher will independently verify the transcriptions. The rich transcription
includes orthography, any phonetic modifications of lexical items, details on non-lexical items (disfluencies, speech errors,
etc.), and speech act information. The speech act coding scheme is adapted from the TRAINS and DAMSL coding systems.
[Speech Act Coding System]
A second stage of transcription will involve adding gestural information from the video data, adapting existing tools
and annotation techniques for synchronous analysis of our multimodal speech and video data. To enable an integrated analysis,
a multi-track annotation board is constructed using the ANVIL toolkit [M. Kipp, Eurospeech, 1367-1370 (2001)]. Speech
transcriptions and acoustic analysis, non-lexical and discourse characteristics, and the child's gestures (facial expressions, body movements,
hand/head movements) are all annotated in a synchronized multi-layer system. [Annotation Details]
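
The idea of a synchronized multi-layer annotation can be illustrated with a minimal Python sketch. It is not ANVIL's file format or API, and it is not the project's actual coding scheme: the `Annotation` class, the `overlapping` helper, the track names, and the sample labels are all hypothetical, invented only to show how time-aligned tracks allow speech and gesture to be examined together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One time-aligned label on a named track (all names are illustrative)."""
    track: str    # e.g. "speech", "speech_act", "gesture"
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str


def overlapping(annotations, start, end, track=None):
    """Return annotations that intersect the interval [start, end).

    If `track` is given, only annotations on that track are returned.
    """
    return [
        a for a in annotations
        if a.start < end and a.end > start
        and (track is None or a.track == track)
    ]


# Invented sample session, not real project data.
session = [
    Annotation("speech", 1.0, 2.5, "um I want the red one"),
    Annotation("speech_act", 1.0, 2.5, "request"),
    Annotation("gesture", 1.8, 3.0, "points at screen"),
    Annotation("gesture", 4.0, 4.6, "head nod"),
]

# Find the gestures that co-occur with the first utterance.
utterance = session[0]
co_gestures = overlapping(session, utterance.start, utterance.end, track="gesture")
print([g.label for g in co_gestures])
```

Because every layer shares one timeline, a query over a time interval returns the speech, discourse, and gesture labels that co-occur, which is the kind of integrated analysis the multi-track board is meant to support.
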
Project Members:
| Faculty | Graduate Students |
| S. Narayanan, Associate Professor; Director, Speech Analysis & Interpretation Laboratory (SAIL), Department of Electrical Engineering-Systems | S. Yildirim, PhD Candidate, Electrical Engineering-Systems, USC |
| E. Andersen, Professor, Linguistics, Psychology, Neuroscience | S. Montanari, PhD Student, Linguistics, USC |
Publications: