SAIL Publications
All publications
2012
| and , On Signal Representations within the Bayes Decision Framework (2012), in: Pattern Recognition, 45:5(1853-1865) |
[DOI] [URL] |
| and , Novel Variations of Group Sparse Regularization Techniques with Applications to Noise Robust Automatic Speech Recognition (2012), in: IEEE Transactions on Audio, Speech and Language Processing, 20:4(1337-1346) |
[DOI] [URL] |
| and , Effects of Emotion on the Lower Lip Movements at Phrase Boundaries, in: Proceedings of Speech Prosody, Shanghai, CN, 2012 |
| and , Nearly Optimal Estimation of Mutual Information based on a Complexity Regularized Tree-Structured Partition (2012), in: IEEE Transactions on Information Theory, 58:3(1940-1952) |
[DOI] |
| , , and , Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , and , A hierarchical framework for modeling multimodality and emotional evolution in affective dialogs, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , , and , An Acoustic Analysis of Shared Enjoyment in ECA Interactions of Children with Autism, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , and , Classification of Emotional Content of Sighs in Dyadic Human Interactions, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , and , Analyzing Quality of Crowd-Sourced Speech Transcriptions of Noisy Audio for Acoustic Model Adaptation, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , , and , Creating Ensemble of Diverse Maximum Entropy Models, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , and , Object Classification in Sidescan Sonar Images with Sparse Representation Techniques, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , , , , and , Automatic Recognition of Emotion Evoked by General Sound Events, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| and , Improvements in Predicting Children's Overall Reading Ability by Modeling Variability in Evaluators' Subjective Judgments, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , , , and , Analyzing the Memory of BLSTM Neural Networks for Enhanced Emotion Classification in Dyadic Spoken Interactions, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012 |
|
| , and , Supervised acoustic topic model with a consequent classifier for unstructured audio classification, in: Proceedings of the 10th Workshop on Content-Based Multimedia Indexing (CBMI 2012), Annecy, France, 2012 |
| , , and , Improved Imaging of Lingual Articulation Using Real-Time Multislice MRI (2012), in: Journal of Magnetic Resonance Imaging, 35:4(943-948) |
[DOI] [URL] |
| , and , Automatic Speaker Age and Gender Recognition Using Acoustic and Prosodic Level Information Fusion (2012), in: Computer Speech and Language |
|
| , and , Enriching machine-mediated speech-to-speech translation using contextual information (2012), in: Computer Speech and Language |
[DOI] [URL] |
| , and , Enabling Effective Design of Multimodal Interfaces for Speech-to-Speech Translation System: An Empirical Study of Longitudinal User Behaviors over Time and User Strategies for Coping with Errors (2012), in: Computer Speech and Language |
[DOI] [URL] |
| , , and , High-quality bilingual subtitle document alignments with application to spontaneous speech translation (2012), in: Computer Speech and Language |
[DOI] [URL] |
| , , , , , and , Paralinguistics in Speech and Language--State-of-the-Art and the Challenge (2012), in: Computer Speech and Language |
| , and , Unsupervised Data Processing for Classifier-based Speech Translator (2012), in: Computer Speech and Language |
[DOI] [URL] |
2011
| and , Automatic Speech Recognition using articulatory features from subject-independent acoustic-to-articulatory inversion (2011), in: J. Acoust. Soc. Am. Express Letters, 130:4(EL251-EL257) |
[DOI] [URL] |
| , , and , A perplexity based Cover song Matching System for short length queries, in: Proceedings of International Society for Music Information Retrieval Conference (ISMIR), Miami, FL, 2011 |
|
| , , , and , "That's aggravating, very aggravating": Is it possible to classify behaviors in couple interactions using automatically derived lexical features?, in: Proceedings of Affective Computing and Intelligent Interaction (ACII), Lecture Notes in Computer Science, Memphis, TN, 2011 |
|
| , , and , Multiple Instance Learning for Classification of Human Behavior Observations, in: Proceedings of Affective Computing and Intelligent Interaction (ACII), Lecture Notes in Computer Science, Memphis, TN, 2011 |
|
| , , , , and , Affective State Recognition in Married Couples' Interactions Using PCA-based Vocal Entrainment Measures with Multiple Instance Learning, in: Proceedings of Affective Computing and Intelligent Interaction (ACII), Lecture Notes in Computer Science, Memphis, TN, 2011 |
|
| , , and , Emotion Twenty Questions: Toward a Crowd-Sourced Theory of Emotions, in: Proceedings of Affective Computing and Intelligent Interaction (ACII), Lecture Notes in Computer Science, Memphis, TN, 2011 |
|
| , and , Auditory-like filterbank: An optimal speech processor for efficient human speech communication (2011), in: Springer Proceedings of Indian Academy of Sciences (Sadhana), 36:5(699-712) |
[URL] |
| , Toward a Computational Approach for Natural Language Description of Emotions, in: Proceedings of Affective Computing and Intelligent Interaction (ACII), Memphis, TN, pages 216-223, Springer, 2011 |
|
| , , , and , EMO20Q Questioner Agent, in: Proceedings of Affective Computing and Intelligent Interaction (ACII), Memphis, TN, pages 313-314, Springer, 2011 |
|
| , and , Enhanced Sparse Imputation Techniques for a Robust Speech Recognition Front-End (2011), in: IEEE Transactions on Audio, Speech and Language Processing, 19:8(2418-2429) |
[DOI] |
| , , , and , Emotion recognition using a hierarchical binary decision tree approach (2011), in: Speech Communication, 53:9-10(1162-1171) |
[DOI] [URL] |
| , and , Detailed Study of Articulatory Kinematics of Critical Articulators and Non‐critical Articulators of Emotional Speech, in: Proceedings of the Meeting of the Acoustical Society of America, San Diego, California, 2011 |
|
| , and , Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order (2011), in: Magnetic Resonance in Medicine, 65:5(1365-1371) |
[DOI] |
| and , A subject-independent acoustic-to-articulatory inversion, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| , , and , Tracking Changes in Continuous Emotion States using Body Language and Prosodic Cues, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| , and , Iterative Feature Normalization for Emotional Speech Detection, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| and , A Hierarchical Static-Dynamic Framework For Emotion Classification, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| , , and , Overlapped speech detection using long-term spectro-temporal similarity in stereo recording, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| and , Robust Talking Face Video Verification Using Joint Factor Analysis And Sparse Representation On GMM Mean Shifted Supervectors, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| and , Emotion Classification From Speech Using Evaluator Reliability-Weighted Combination Of Ranked Lists, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| , , , and , Directional Descriptors Using Zernike Moment Phases For Object Orientation Estimation In Underwater Sonar Images, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011 |
|
| , and , Accurate Transcription Of Broadcast News Speech Using Multiple Noisy Transcribers And Unsupervised Reliability Metrics, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| , , and , Bilingual Audio-Subtitle Extraction Using Automatic Segmentation Of Movie Audio, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| , , , , and , Estimation of ordinal approach-avoidance labels in dyadic interactions: ordinal logistic regression approach, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011 |
|
| , , and , A Novel 16-Channel Receive Coil Array for Accelerated Upper Airway MRI at 3 Tesla (2011), in: Magnetic Resonance in Medicine, 65:6(1711-1717) |
|
| , and , Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures (2011), in: Journal of the Acoustical Society of America, 129:6(4014-4022) |
[DOI] [URL] |
| , , and , Automatic Analysis of Geminate Consonant Articulation using Real-time Magnetic Resonance Imaging, in: Proc. 9th Intl. Seminar on Speech Production (ISSP'11), 2011 |
|
| , and , Morphological Variation in the Adult Vocal Tract: A Study Using rtMRI, in: Proc. 9th Intl. Seminar on Speech Production (ISSP'11), 2011 |
|
| , , , and , Automatic identification of stable modes and fluctuations in a repetitive task using real‐time MRI, in: Proc. International Seminar on Speech Production (ISSP'11), Montreal, Canada, 2011 |
|
| and , Information theoretic analysis of direct and estimated articulatory features for phonetic discrimination, in: Proc. International Seminar on Speech Production (ISSP'11), Montreal, Canada, 2011 |
|
| , , , and , Planning and execution in soprano singing and speaking behavior: An acoustic/articulatory study using real‐time MRI, in: Proc. International Seminar on Speech Production (ISSP'11), Montreal, Canada, 2011 |
|
| , , and , An MRI study of articulatory settings of L1 and L2 speakers of American English, in: Proc. International Seminar on Speech Production (ISSP'11), Montreal, Canada, 2011 |
|
| , , and , Temporal Coupling Between Speech and Manual Motor Actions, in: Proc. 9th Intl. Seminar on Speech Production (ISSP'11), 2011 |
|
| , , , and , Design of an Emotionally Targeted Interactive Agent for Children with Autism, in: Proceedings of IEEE International Conference on Multimedia & Expo (ICME), Barcelona, Spain, pages 1-6, 2011 |
[DOI] |
| , and , Detecting emotional state of a child in a conversational computer game (2011), in: Computer Speech and Language, 25:1(29-44) |
|
| and , Joint source-filter optimization for robust glottal source estimation in the presence of shimmer and jitter (2011), in: Speech Communication, 53:1(98-109) |
|
| , , , and , SailAlign: Robust long speech-text alignment, in: Proc. of Workshop on New Tools and Methods for Very-Large Scale Phonetics Research, 2011 |
[URL] |
| , and , Behavioral Signal Processing for Understanding (Distressed) Dyadic Interactions: Some Recent Developments, in: Third International Workshop on Social Signal Processing (SSPW’11), ACM Multimedia’11, Scottsdale, AZ, pages 7-12, 2011 |
|
| , , and , Speaker Verification using Lasso based Sparse Total Variability Supervector and Probabilistic Linear Discriminant Analysis, in: NIST 2011 Speaker Recognition Workshop, Atlanta, 2011 |
|
| , , and , EmotiWord: Affective Lexicon Creation with Application to Interaction and Multimedia Data, in: MUSCLE International Workshop on Computational Intelligence for Multimedia Understanding, Pisa, Italy, 2011 |
| , and , Reliability-Weighted Acoustic Model Adaptation Using Crowd Sourced Transcriptions, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , and , "You made me do it": Classification of Blame in Married Couples' Interaction by Fusing Automatically Derived Speech and Language Information, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , and , The USC CARE Corpus: Child-Psychologist Interactions of Children with Autism Spectrum Disorders, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , and , Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , and , Enhancements to the Training Process of Classifier-based Speech Translator via Topic Modeling, in: Proceedings of Interspeech, Florence, Italy, 2011 |
| and , Analysis of inter-articulator correlation in acoustic-to-articulatory inversion using generalized smoothness criterion, in: Proceedings of Interspeech, Florence, Italy, 2011 |
| , , and , Automatic Identification of Salient Acoustic Instances in Couples' Behavioral Interactions using Diverse Density Support Vector Machines, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , and , Validating rt-MRI based articulatory representations via articulatory recognition, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , and , Determining What Questions To Ask, with the Help of Spectral Graph Theory, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , and , An Exploratory Study of the Relations between Perceived Emotion Strength and Articulatory Kinematics, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , and , Visualization of vocal tract shape using interleaved real-time MRI of multiple scan planes, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , and , Morphological Variation in the Adult Vocal Tract: A Modeling Study of its Potential Acoustic Impact, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , and , An Analysis of PCA-based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , and , Speaker Verification using Sparse Representations on Total Variability I-Vectors, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , and , Kernel models for affective lexicon creation, in: Proceedings of Interspeech, Florence, Italy, 2011 |
| , , and , A study of the effectiveness of articulatory strokes for phonemic recognition, in: Proceedings of Interspeech, Florence, Italy, 2011 |
| , , , , and , Analyzing the Nature of ECA Interactions in Children with Autism, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , , , , , and , A Multimodal Real-Time MRI Articulatory Corpus for Speech Research, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , and , Direct Estimation of Articulatory Kinematics from Real-time Magnetic Resonance Image Sequences, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , and , Automatic Data-Driven Learning of Articulatory Primitives from Real-time MRI Data using Convolutive NMF with Sparseness Constraints, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , and , Acoustic and Visual Cues of Turn-Taking Dynamics in Dyadic Interactions, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , , , and , Modeling High-Level Descriptions of Real-Life Physical Activities Using Latent Topic Modeling of Multimodal Sensor Signals, in: 33rd Annual International IEEE EMBS Conference (EMBC), Boston, USA, 2011 |
|
| , , and , Automatically Assessing the ABCs: Verification of Children's Spoken Letter-Names and Letter-Sounds (2011), in: ACM Transactions on Speech and Language Processing, 7:4(15:1-15:17) |
|
| , , and , A generative student model for scoring word reading skills (2011), in: IEEE Transactions on Audio, Speech, and Language Processing, 19:2(348-360) |
[DOI] [URL] |
| , and , Robust voice activity detection using long-term signal variability (2011), in: IEEE Transactions on Audio, Speech and Language Processing, 19:3(600-613) |
[DOI] [URL] |
| , and , A Framework for Automatic Human Emotion Classification Using Emotional Profiles (2011), in: IEEE Transactions on Audio, Speech and Language Processing, 19:5(1057-1070) |
[DOI] [URL] |
| , and , Automatic Prediction of Children's Reading Ability for High-level Literacy Assessment (2011), in: IEEE Transactions on Audio, Speech and Language Processing, 19:4(1015-1028) |
[DOI] [URL] |
| , , , , , , and , Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection (2011), in: IEEE Transactions on Signal Processing, 59:4(1843-1857) |
[DOI] |
| , , , , , , and , Recognition of Physical Activities in Overweight Hispanic Youth Using KNOWME Networks (2011), in: Journal of Physical Activity & Health |
| , and , Automatic Analysis of Singleton and Geminate Consonant Articulation using Real-time Magnetic Resonance Imaging, in: Proceedings of Interspeech, Florence, Italy, 2011 |
|
| , , , , and , Context-Sensitive Learning for Enhanced Audiovisual Emotion Classification (2011), in: IEEE Transactions on Affective Computing |
| , , , , , , and , Toward automating a human behavioral coding system for married couples' interactions using speech acoustic features (2011), in: Speech Communication |
[DOI] [URL] |
2010
| , and , Improved Real-time MRI of Oral-Velar Coordination Using a Golden-ratio Spiral View Order, in: Proceedings of InterSpeech, Makuhari, Japan, 2010 |
|
| , , and , Rapid Semi-automatic Segmentation of Real-time Magnetic Resonance Images for Parametric Vocal Tract Analysis, in: Proceedings of InterSpeech, 2010 |
|
| , and , Data-Driven Analysis of Realtime Vocal Tract MRI using Correlated Image Regions, in: Proceedings of InterSpeech, 2010 |
|
| , , and , Investigating Articulatory Setting - Pauses, Ready Position, and Rest - Using Real-Time MRI, in: Proceedings of InterSpeech, 2010 |
|
| , , and , Statistical multi-stream modeling of real-time MRI articulatory speech data, in: Proceedings of InterSpeech, 2010 |
|
| and , Vocal tract contour analysis of emotional speech by the functional data curve representation, in: Proceedings of InterSpeech, 2010 |
|
| , , , and , Automatic Speech Recognition System Channel Modeling, in: Proceedings of InterSpeech, 2010 |
|
| , , and , Robust voice activity detection in stereo recording with crosstalk, in: Proceedings of InterSpeech, 2010 |
|
| , and , Hierarchical Classification for Speech-to-Speech Translation, in: Proceedings of InterSpeech, 2010 |
|
| , , , and , Context-Sensitive Multimodal Emotion Recognition from Speech and Facial Expression using Bidirectional LSTM Modeling, in: Proceedings of InterSpeech, 2010 |
|
| , and , Acoustic Feature Analysis in Speech Emotion Primitives Estimation, in: Proceedings of InterSpeech, 2010 |
|
| , and , A Study of Interplay between Articulatory Movement and Prosodic Characteristics in Emotional Speech Production, in: Proceedings of InterSpeech, 2010 |
|
| , , and , A Study of Intra-Speaker and Inter-Speaker Affective Variability using Electroglottograph and Inverse Filtered Glottal Waveforms, in: Proceedings of InterSpeech, 2010 |
|
| , , and , A Cluster-Profile Representation of Emotion Using Agglomerative Hierarchical Clustering, in: Proceedings of InterSpeech, 2010 |
|
| and , Data-dependent evaluator modeling and its application to emotional valence classification from speech, in: Proceedings of InterSpeech, 2010 |
|
| , , , , and , A new multichannel multimodal dyadic interaction database, in: Proceedings of InterSpeech, 2010 |
|
| , , , , , , and , Automatic Classification of Married Couples' Behavior using Audio Features, in: Proceedings of InterSpeech, 2010 |
|
| , , , , , , and , Quantification of Prosodic Entrainment in Affective Spontaneous Spoken Interactions of Married Couples, in: Proceedings of InterSpeech, 2010 |
|
| and , An Improved Cluster Model Selection Method for Agglomerative Hierarchical Speaker Clustering using Incremental Gaussian Mixture Models, in: Proceedings of InterSpeech, 2010 |
|
| , , , and , A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification, in: Proceedings of InterSpeech, 2010 |
|
| , , , , , and , The INTERSPEECH 2010 Paralinguistic Challenge, in: Proceedings of InterSpeech, 2010 |
|
| , , , , , , , , , , , , , , , , and , Ada and Grace: Toward Realistic and Engaging Virtual Museum Guides, in: Proceedings of the 10th International Conference on Intelligent Virtual Agents (IVA), 2010 |
| , and , Combining Five Acoustic Level methods for Automatic Speaker Age and Gender Recognition, in: Proceedings of InterSpeech, Makuhari, Japan, 2010 |
|
| , and , Locally-Weighted Regression for Estimating the Forward Kinematics of a Geometric Vocal Tract Model, in: Interspeech, Makuhari, Japan, 2010 |
|
| , and , An N-gram model for unstructured audio signals toward information retrieval, in: Proceedings of Multimedia Signal Processing (MMSP), 2010 |
|
| , and , Para-Linguistic Mechanisms of Production in Human ’Beatboxing’: a Real-time MRI Study, in: Proceedings of InterSinging 2010, 2010 |
|
| , , and , A joint acoustic-articulatory study of nasal spectral reduction in read versus spontaneous speaking styles, in: Proceedings of the Speech Prosody Conference, 2010 |
|
| , , , and , The USC CreativeIT Database: A Multimodal Database of Theatrical Improvisation, in: Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality (MMC), 2010 |
|
| , , and , Multimodal speaker segmentation and identification in presence of overlapped speech segments (2010), in: Journal of Multimedia, Special Issue on Data Semantics and Multimedia Information Management, 5:4(322-331) |
[DOI] |
| , , and , Visual emotion recognition using compact facial representations and viseme information, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010 |
|
| , and , Decision level combination of multiple modalities for recognition and analysis of emotional expression, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010 |
|
| and , Predicting interruptions in dyadic spoken interactions, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010 |
|
| , , and , Using naive text queries for robust audio information retrieval, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010 |
|
| , and , An exploratory study of manifolds of emotional speech, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010 |
|
| , and , Language model adaptation using WWW documents obtained by utterance-based queries, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2010 |
|
| and , Bark frequency transform using an arbitrary order allpass filter (2010), in: IEEE Signal Processing Letters, 17:6(543-546) |
|
| and , On Data-Driven Histogram-Based Estimation for Mutual Information, in: Proceedings of The IEEE International Symposium on Information Theory (ISIT), Austin, TX, 2010 |
|
| and , A Near-Optimal (Minimax) Tree-Structured Partition for Mutual Information Estimation, in: Proceedings of The IEEE International Symposium on Information Theory (ISIT), Austin, TX, 2010 |
|
| and , Non-Product Data-Dependent Partitions for Mutual Information Estimation: Strong Consistency and Applications (2010), in: IEEE Transactions on Signal Processing, 58:7(3497-3511) |
|
| , , and , Speech Emotion Estimation in 3D Space, in: Proceedings of 2010 IEEE International Conference on Multimedia & Expo (ICME 2010), Singapore, 2010 |
| , and , Robust Representations for Out-of-Domain Emotions Using Emotion Profiles, in: IEEE Workshop on Spoken Language Technology (SLT), Berkeley, CA, 2010 |
|
| , and , Supervised acoustic topic model for unstructured audio information retrieval, in: Proceedings of Asia Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference, 2010 |
|
| , , , , , and , Optimal Arousal Identification and Classification for Affective Computing Using Physiological Signals: Virtual Reality Stroop Task (2010), in: IEEE Transactions on Affective Computing, 1:2(109-118) |
[DOI] [URL] |
| , , , , , , , , , , , , , , , , and , Virtual Museum Guides, in: IEEE Workshop on Spoken Language Technology (SLT), Berkeley, CA, 2010 |
| and , Robust ECG Biometrics by Fusing Temporal and Cepstral Information, in: Proceedings of 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 2010 |
|
| , and , Towards modeling user behavior in interactions mediated through an automated bidirectional speech translation system (2010), in: Computer Speech and Language, 24:2(232-256) |
|
| and , Information Divergence Estimation based on Data-Dependent Partitions (2010), in: Journal of Statistical Planning and Inference, 140:11(3180-3198) |
|
| , and , Robust Multimodal Person Recognition Using Low-Complexity Audio-Visual Feature Fusion Approaches (2010), in: International Journal of Semantic Computing, 4:2(155-179) |
|
| , , , , , , , and , Multimodal physical activity recognition by fusing temporal and cepstral information (2010), in: IEEE Transactions on Neural Systems and Rehabilitation Engineering, 18:4(369-380) |
|
| and , A generalized smoothness criterion for acoustic-to-articulatory inversion (2010), in: J. Acoust. Soc. Am., 128:4(2162-2172) |
|
| , , and , Acoustic Stopwords for unstructured audio information retrieval, in: Proceedings of European Signal Processing Conference (EUSIPCO), Aalborg, Denmark, 2010 |
|
| , and , Unstructured Environmental Audio: Representation, Classification and Modeling, chapter 1, pages 1-21, Information Science Reference (IGI Global), Machine Audition: Principles, Algorithms and Systems, 2010 |
|
| , , , , , , and , KNOWME: An Energy-Efficient, Multimodal Body Area Network for Physical Activity Monitoring (2010), in: ACM Transactions on Embedded Computing Systems |
|
| , , and , Gestural control in the English past-tense suffix: an articulatory study using real time MRI, in: LabPhon, 2010 |
|
| and , Real-time MRI investigation of resonance tuning in soprano singing (2010), in: J. Acoust. Soc. Am. Express Letters, 128:5(EL335-EL341) |
[DOI] [URL] |
| , , and , Temporal analysis of articulatory speech errors using direct image analysis of real time magnetic resonance imaging (2010), in: Journal of the Acoustical Society of America, 128:4(2289-2289) |
2009
| and , Pitch contour stylization using an optimal piecewise polynomial approximation (2009), in: IEEE Signal Processing Letters, 16:9(810-813) |
|
| , , , , , and , Interpreting ambiguous emotional expressions, in: Proceedings of the International Conference on Affective Computing and Intelligent Interaction (ACII), 2009 |
|
| , , and , Automatically rating pronunciation through articulatory phonology, in: Proceedings of InterSpeech, 2009 |
|
| , , , and , Connecting rhythm and prominence in automatic ESL pronunciation scoring, in: Proceedings of InterSpeech, 2009 |
| , , , , and , An articulatory analysis of phonological transfer using real-time MRI, in: Proceedings of InterSpeech, 2009 |
|
| , , and , Predicting children’s reading ability using evaluator-informed features, in: Proceedings of InterSpeech, 2009 |
|
| , , , and , Estimation of articulatory gesture patterns from speech acoustics, in: Proceedings of InterSpeech, 2009 |
|
| and , Improved speaker diarization of meeting speech with recurrent selection of representative speech segments and participant interaction pattern modeling, in: Proceedings of InterSpeech, 2009 |
|
| and , Signature cluster model selection for incremental Gaussian mixture cluster modeling in agglomerative hierarchical speaker clustering, in: Proceedings of InterSpeech, 2009 |
|
| , and , A detailed study of word-position effects on emotion expression in speech, in: Proceedings of InterSpeech, 2009 |
|
| , , and , Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions, in: Proceedings of InterSpeech, 2009 |
|
| , , , and , Emotion recognition using a hierarchical binary decision tree approach, in: Proceedings of InterSpeech, 2009 |
|
| , and , Evaluating evaluators: A case study in understanding the benefits and pitfalls of multi-evaluator modeling, in: Proceedings of InterSpeech, 2009 |
|
| , , and , Context-driven automatic bilingual movie subtitle alignment, in: Proceedings of InterSpeech, 2009 |
|
| , , , , , and , Energy-efficient multihypothesis activity-detection for health-monitoring applications, in: Proceedings of the Annual International IEEE Engineering in Medicine and Biology Society (EMBS) Conference, 2009 |
|
| and , Continuous speech recognition using attention shift decoding with soft decision, in: Proceedings of InterSpeech, 2009 |
|
| , , , , , , , , , and , Assessment of emerging reading skills in young native speakers and language learners (2009), in: Speech Communication, 51:10(968–984) |
|
| , and , Combining lexical, syntactic and prosodic cues for improved online dialog act tagging (2009), in: Computer Speech and Language, 23:4(407-422) |
|
| , , , and , Analysis of pausing behavior in spontaneous speech using real-time magnetic resonance imaging of articulation (2009), in: Journal of the Acoustical Society of America Express Letters, 126:5(EL160-EL165) |
|
| , and , Acoustic topic model for audio information retrieval, in: Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2009 |
|
| , and , Saliency-driven unstructured acoustic scene classification using latent perceptual indexing, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), 2009 |
|
| , , and , Comparison of child-human and child-computer interactions based on manual annotations, in: Proceedings of the Workshop on Child, Computer and Interaction, 2009 |
|
| and , Recognizing child’s emotional state in problem-solving child-machine interactions, in: Proceedings of the Workshop on Child, Computer and Interaction, 2009 |
|
| , , and , A review of ASR technologies for children’s speech, in: Proceedings of the Workshop on Child, Computer and Interaction, 2009 |
|
| , and , Analysis of emotionally salient aspects of fundamental frequency for emotion detection (2009), in: IEEE Transactions on Audio, Speech, and Language Processing, 17:4(582-596) |
|
| and , Discriminative wavelet packet filter bank selection for pattern recognition (2009), in: IEEE Transactions on Signal Processing, 57:5(1796-1810) |
|
| , , , and , An articulatory study of lexicalized and epenthetic schwa using real time magnetic resonance imaging, in: Proceedings of the Meeting of the Acoustical Society of America, 2009 |
|
| , , , and , Real-time MRI tracking of articulation during grammatical and ungrammatical pauses in speech, in: Proceedings of the Meeting of the Acoustical Society of America, 2009 |
|
| and , Region segmentation in the frequency domain applied to upper airway real-time magnetic resonance images (2009), in: IEEE Transactions on Medical Imaging, 28:3(323-338) |
[URL] |
| and , Histogram-based estimation for the divergence revisited, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 468-472, 2009 |
|
| and , A divide-and-conquer approach to latent perceptual indexing of audio for large web 2.0 applications, in: Proceedings of the International Conference on Multimedia & Expo (ICME), pages 466-469, 2009 |
|
| , , , , , , and , Optimal allocation of time-resources for multihypothesis activity-level detection, in: Proceedings of the International Conference on Distributed Computing in Sensor Systems (DCOSS), pages 273-286, 2009 |
|
| , , , , , , and , Differentiating physical activity modalities in youth using heartbeat waveform shape and differences between adjacent waveforms, in: Proceedings of the International Conference on Diet and Activity Methods (ICDAM), 2009 |
| , and , Accelerated 3D upper airway MRI using compressed sensing (2009), in: Magnetic Resonance in Medicine, 61:6(1434-1440) |
|
| , , , , , , and , Sensing for obesity: KNOWME implementation and lessons for an architect, in: Proceedings of the Workshop on Biomedicine in Computing: Systems, Architectures, and Circuits (BiC), 2009 |
|
| and , Closure duration analysis of incomplete stop consonants due to stop-stop interaction (2009), in: Journal of the Acoustical Society of America, 126:1(EL1-EL7) |
|
| and , Prominence detection using auditory attention cues and task-dependent high level information (2009), in: IEEE Transactions on Audio, Speech, and Language Processing, 17:5(1009-1024) |
|
| , , and , Timing effects of syllable structure and stress on nasals: A real-time MRI examination (2009), in: Journal of Phonetics, 37:1(97-110) |
|
| , , and , An iterative relative entropy minimization based data selection approach for n-gram model adaptation (2009), in: IEEE Transactions on Audio, Speech, and Language Processing, 17:1(13-23) |
|
| and , Automatic detection of disfluency boundaries in spontaneous speech of children using audio-visual information (2009), in: IEEE Transactions on Audio, Speech, and Language Processing, 17:1(2-12) |
|
| and , Unsupervised adaptation of categorical prosody models for prosody labeling and speech recognition (2009), in: IEEE Transactions on Audio, Speech, and Language Processing, 17:1(138-149) |
|
| , and , Effect of bandwidth extension to telephone speech recognition in cochlear implant users (2009), in: Journal of the Acoustical Society of America, 125:2(EL77-EL83) |
|
| , and , Lattice-based lexical cues for word fragment detection in conversational speech, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2009 |
|
| , and , A low-complexity dynamic face-voice feature fusion approach to multimodal person recognition, in: Proceedings of the IEEE International Symposium on Multimedia (ISM), 2009 |
|
| , , and , Audio scene understanding using topic models, in: Proceedings of the Neural Information Processing Systems (NIPS) Workshop, 2009 |
|
| , and , Environmental sound recognition with time–frequency audio features (2009), in: IEEE Transactions on Audio, Speech, and Language Processing, 17:6(1142-1158) |
[URL] |
| , and , Human perception of audio-visual synthetic character emotion expression in the presence of ambiguous and conflicting information (2009), in: IEEE Transactions on Multimedia, 11:5(843-855) |
|
| , , , and , Automatic pronunciation verification of English letter-names for early literacy assessment of preliterate children, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009 |
|
| , and , An analysis of articulatory-acoustic data based on articulatory strokes, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4493-4496, 2009 |
|
| , and , A robust harmony structure modeling scheme for classical music opus identification, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1961-1964, 2009 |
|
| , , and , Robust word boundary detection in spontaneous speech using acoustic and lexical cues, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4785-4788, 2009 |
|
| , and , Accelerated 3D MRI of vocal tract shaping using compressed sensing and parallel imaging, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 389-392, 2009 |
|
| , , , , , , and , Optimal time-resource allocation for activity-detection via multimodal sensing, in: Proceedings of the International Conference on Body Area Networks (BodyNets), 2009 |
|
| , and , A semi-supervised learning approach to online audio background detection, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1629-1632, 2009 |
|
| and , Human-centric interfaces for ambient intelligence, in: Speech Synthesis Systems in Ambient Intelligence Environments, Elsevier, 2009 |
| , , , and , Articulatory comparison of Tamil liquids and stops using real-time Magnetic Resonance Imaging (2009), in: Journal of the Acoustical Society of America, 125:4(2568-2568) |
[URL] |
2008
| and , The expression and perception of emotions: Comparing assessments of self versus others, in: Proceedings of InterSpeech, pages 257-260, 2008 |
|
| and , Scripted dialogs versus improvisation: Lessons learned about emotional elicitation techniques from the IEMOCAP database, in: Proceedings of InterSpeech, pages 1670-1673, 2008 |
|
| , and , Factored translation models for enriching spoken language translation with prosody, in: Proceedings of InterSpeech, pages 2723-2726, 2008 |
|
| and , Combining task-dependent information with auditory attention cues for prominence detection in speech, in: Proceedings of InterSpeech, pages 1064-1067, 2008 |
|
| , and , An analysis of multimodal cues of interruption in dyadic spoken interactions, in: Proceedings of InterSpeech, pages 1678-1681, 2008 |
|
| and , Tree grammars as models of prosodic structure, in: Proceedings of InterSpeech, pages 2286-2289, 2008 |
|
| and , Better nonnative intonation scores through prosodic theory, in: Proceedings of InterSpeech, pages 1813-1816, 2008 |
|
| , , , and , Pronunciation verification of English letter-sounds in preliterate children, in: Proceedings of InterSpeech, pages 2783-2786, 2008 |
|
| , , and , Estimation of children's reading ability by fusion of automatic pronunciation verification and fluency detection, in: Proceedings of InterSpeech, pages 2779-2782, 2008 |
|
| , and , Towards unsupervised training of the classifier-based speech translator, in: Proceedings of InterSpeech, pages 2739-2742, 2008 |
|
| and , Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling, in: Proceedings of InterSpeech, pages 20-23, 2008 |
|
| , and , An interval type-2 fuzzy logic system to translate between emotion-related vocabularies, in: Proceedings of InterSpeech, pages 2747-2750, 2008 |
|
| , , , , and , An analysis of vocal tract shaping in English sibilant fricatives using real-time magnetic resonance imaging, in: Proceedings of InterSpeech, pages 2823-2826, 2008 |
|
| , and , Relation between geometry and kinematics of articulatory trajectory associated with emotional speech production, in: Proceedings of InterSpeech, pages 2290-2293, 2008 |
|
| , and , A generative model for scoring children's reading comprehension, in: Proceedings of the Workshop on Child, Computer and Interaction, 2008 |
|
| , and , An empirical analysis of user uncertainty in problem-solving child-machine interactions, in: Proceedings of the Workshop on Child, Computer and Interaction, 2008 |
|
| , and , Linguistic analysis of spontaneous children speech, in: Proceedings of the Workshop on Child, Computer and Interaction, 2008 |
|
| , and , The SAIL speaker diarization system for analysis of spontaneous meetings, in: Proceedings of the International Workshop on Multimedia Signal Processing (MMSP), pages 966-971, 2008 |
|
| and , Dynamic chroma feature vectors with applications to cover song identification, in: Proceedings of the International Workshop on Multimedia Signal Processing (MMSP), pages 984-987, 2008 |
|
| , and , Strategies to improve the robustness of agglomerative hierarchical clustering under data source variation for speaker diarization (2008), in: IEEE Transactions on Audio, Speech, and Language Processing, 16:8(1590-1601) |
|
| , , , , , , , , and , Multimodal sensing for pediatric obesity applications, in: Proceedings of the International Workshop on Urban, Community, and Social Applications of Networked Sensing Systems (UrbanSense), pages 21-25, 2008 |
|
| , , , and , Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging (2008), in: IEEE Signal Processing Magazine, 25:3(123-132) |
|
| , , and , Effect of spectral normalization on different talker speech recognition by cochlear implant users (2008), in: Journal of the Acoustical Society of America, 123:5(2836-2847) |
|
| , and , Exploiting acoustic and syntactic features for automatic prosody labeling in a maximum entropy framework (2008), in: IEEE Transactions on Audio, Speech, and Language Processing, 16:4(797-811) |
|
| and , Recording audio-visual emotional databases from actors: A closer look, in: Proceedings of the International Conference on Language Resources and Evaluation (LREC), pages 17-22, 2008 |
|
| , , and , Detecting prominence in conversational speech: Pitch accent, givenness and focus, in: Proceedings of the Speech Prosody Conference, pages 453-456, 2008 |
|
| and , Data-driven unsupervised adaptation of acoustic-prosodic models, in: Proceedings of the Speech Prosody Conference, pages 161-164, 2008 |
|
| and , On the robustness of overall F0-only modifications to the perception of emotions in speech (2008), in: Journal of the Acoustical Society of America, 123:6(4547-4558) |
|
| , and , The Vera am Mittag German audio-visual emotional speech database, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 865-868, 2008 |
|
| , and , Music fingerprint extraction for classical music cover song identification, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 1261-1264, 2008 |
|
| , , and , Joint-processing of audio-visual signals in human perception of conflicting synthetic character emotions, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 961-964, 2008 |
|
| and , Classification of sound clips by two schemes: using onomatopoeia and semantic labels, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 1341-1344, 2008 |
|
| , and , Enriching spoken language translation with dialog acts, in: Proceedings of the Association for Computational Linguistics (ACL) Conference, pages 225-228, 2008 |
|
| , and , Knowledge as a constraint on uncertainty for unsupervised classification: A study in part-of-speech tagging, in: Proceedings of the International Conference on Machine Learning (ICML), 2008 |
|
| and , Using articulatory representations to detect segmental errors in nonnative pronunciation (2008), in: IEEE Transactions on Audio, Speech, and Language Processing, 16:1(8-22) |
|
| and , Automatic prosodic event detection using acoustic, lexical, and syntactic evidence (2008), in: IEEE Transactions on Audio, Speech, and Language Processing, 16:1(216-228) |
|
| , and , On energy-based acoustic source localization for sensor networks (2008), in: IEEE Transactions on Signal Processing, 56:1(365-377) |
|
| , , and , Challenging uncertainty in query by humming systems: A fingerprinting approach (2008), in: IEEE Transactions on Audio, Speech, and Language Processing, 16:2(359-371) |
|
| , , , , , , , and , IEMOCAP: Interactive emotional dyadic motion capture database (2008), in: Journal of Language Resources and Evaluation, 42:4(335-359) |
|
| , and , Audio-visual emotion recognition using Gaussian mixture models for face and voice, in: Proceedings of the IEEE International Symposium on Multimedia (ISM), pages 250-257, 2008 |
|
| , and , Selection of emotionally salient audio-visual features for modeling human evaluations of synthetic character emotion displays, in: Proceedings of the IEEE International Symposium on Multimedia (ISM), pages 190-195, 2008 |
|
| , , and , Multimodal speaker segmentation in presence of overlapped speech segments, in: Proceedings of the IEEE International Symposium on Multimedia (ISM), pages 679-684, 2008 |
|
| , and , Mitigation of data sparsity in classifier-based translation, in: Proceedings of the International Conference on Computational Linguistics (COLING), pages 1-4, 2008 |
|
| and , Investigating automatic assessment of reading comprehension in young children, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5057-5060, 2008 |
|
| , and , Environmental sound recognition using MP-based features, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1-4, 2008 |
[URL] |
| , and , Modeling the intonation of discourse segments for improved online dialog act tagging, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5033-5036, 2008 |
|
| and , A novel algorithm for unsupervised prosodic language model adaptation, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4181-4184, 2008 |
|
| and , Fine-grained pitch accent and boundary tone labeling with parametric F0 features, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4545-4548, 2008 |
|
| , and , Automatic classification of question turns in spontaneous speech using lexical and prosodic evidence, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5005-5008, 2008 |
|
| , , and , Human perception of synthetic character emotions in the presence of conflicting and congruent vocal and facial expressions, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 2201-2204, 2008 |
|
| and , A novel inter-cluster distance measure combining GLR and ICR for improved agglomerative hierarchical speaker clustering, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4373-4376, 2008 |
|
| and , Audio retrieval by latent perceptual indexing, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 49-52, 2008 |
|
| and , A top-down auditory attention model for learning task dependent influences on prominence detection in speech, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 3981-3984, 2008 |
|
| , and , Recognition for synthesis: Automatic parameter selection for resynthesis of emotional speech from neutral speech, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4629-4632, 2008 |
|
| , , and , Fundamental frequency analysis for speech emotion processing, in: The Role of Prosody in Affective Speech, pages 309-337, Peter Lang Publishing Group, 2008 |
2007
| and , Early auditory processing inspired features for robust automatic speech recognition, in: Proceedings of the European Signal Processing Conference (EUSIPCO), 2007 |
|
| , , , , , , and , Hassan: A Virtual Human for Tactical Questioning, in: Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, ACL, Antwerp, Belgium, pages 71-74, 2007 |
|
| , , , , , , , , , , , , and , A system for technology based assessment of language and literacy in young children: The role of multiple information sources, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 26-30, 2007 |
|
| , , and , Multimodal meeting monitoring: Improvements on speaker tracking and segmentation through a modified mixture particle filter, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 60-65, 2007 |
|
| , , and , Statistical modeling and retrieval of polyphonic music, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 405-409, 2007 |
|
| , and , Analyzing the multimodal behaviors of users of a speech-to-speech translation device by using concept matching scores, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 259-263, 2007 |
|
| , , and , Real-time emotion detection system using speech: Multi-modal fusion of different timescale features, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 48-51, 2007 |
|
| and , Joint analysis of the emotional fingerprint in the face and speech: A single subject study, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 43-47, 2007 |
|
| and , Experiments in automatic genre classification of full-length music tracks using audio activity rate, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP), pages 98-102, 2007 |
|
| and , Interrelation between speech and facial gestures in emotional utterances: A single subject study (2007), in: IEEE Transactions on Audio, Speech, and Language Processing, 15:8(2331-2347) |
|
| and , Robust speech rate estimation for spontaneous speech (2007), in: IEEE Transactions on Audio, Speech, and Language Processing, 15:8(2190-2201) |
|
| , , and , Primitives-based evaluation and estimations of emotions in speech (2007), in: Speech Communication, 49:10-11(787-800) |
|
| , , , and , Rigid head motion in expressive speech animation: Analysis and synthesis (2007), in: IEEE Transactions on Audio, Speech, and Language Processing, 15:3(1075-1086) |
|
| and , Universal consistency of data-driven partitions for divergence estimation, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 2021-2025, 2007 |
|
| and , Automatic acoustic synthesis of human-like laughter (2007), in: Journal of the Acoustical Society of America, 121:1(527-535) |
|
| and , Robust speaker identification based on selective use of feature vectors (2007), in: Pattern Recognition Letters, 28:1(85-89) |
|
| and , An acoustic measure for word prominence in spontaneous speech (2007), in: IEEE Transactions on Audio, Speech, and Language Processing, 15:2(690-701) |
|
| , and , Robust speaker clustering strategies to data source variation for improved speaker diarization, in: Proceedings of the IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, pages 262-267, 2007 |
|
| , and , Analysis of emotional speech prosody in terms of part of speech tags, in: Proceedings of InterSpeech, pages 626-629, 2007 |
|
| , and , Using neutral speech models for emotional speech analysis, in: Proceedings of InterSpeech, pages 2225–2228, 2007 |
|
| , and , Pitch period estimation using multipulse model and wavelet transform, in: Proceedings of InterSpeech, pages 2761-2764, 2007 |
|
| and , A robust stopping criterion for agglomerative hierarchical clustering in a speaker diarization system, in: Proceedings of InterSpeech, pages 1853-1856, 2007 |
|
| and , A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech, in: Proceedings of InterSpeech, pages 1941-1944, 2007 |
|
| and , Prosody-enriched lattices for improved syllable recognition, in: Proceedings of InterSpeech, pages 1813-1816, 2007 |
|
| , and , Exploiting prosodic features for dialog act tagging in a discriminative modeling framework, in: Proceedings of InterSpeech, pages 150-153, 2007 |
|
| , , , , , , , and , A Bayesian network classifier for word-level reading assessment, in: Proceedings of InterSpeech, pages 2185–2188, 2007 |
|
| , and , A text-free approach to assessing nonnative intonation, in: Proceedings of InterSpeech, pages 2169-2172, 2007 |
|
| , , , and , Automatic detection and classification of disfluent reading miscues in young children's speech for the purpose of assessment, in: Proceedings of InterSpeech, pages 206-209, 2007 |
|
| , , and , Investigating implicit cues for user state estimation in human-robot interaction using physiological measurements, in: Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pages 1125-1130, 2007 |
|
| and , Minimum probability of error signal representation, in: Proceedings of the IEEE Machine Learning for Signal Processing (MLSP) Workshop, pages 348-353, 2007 |
|
| , and , Support vector regression for automatic recognition of spontaneous emotions in speech, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1085-1088, 2007 |
|
| and , Optimal wavelet packets decomposition based on a rate-distortion optimality criterion, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 817-820, 2007 |
|
| , and , Real-time monitoring of participants' interaction in a meeting using audio-visual sensors, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 685-688, 2007 |
|
| , , and , Information theoretic analysis of direct articulatory measurements for phonetic discrimination, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 457-460, 2007 |
|
| and , Analysis of audio clustering using word descriptions, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 769-772, 2007 |
|
| and , Improved speech recognition using acoustic and lexical correlates of pitch accent in a N-best rescoring framework, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 873-876, 2007 |
|
| , and , Data driven approach for language model adaptation using stepwise relative entropy minimization, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 177-180, 2007 |
|
| , and , A statistical approach for modeling prosody features using POS tags for emotional speech synthesis, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1237-1240, 2007 |
|
| and , Discriminating two types of noise sources using cortical representation and dimension reduction technique, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 213-216, 2007 |
|
| , and , Exploiting acoustic and syntactic features for prosody labeling in a maximum entropy framework, in: Proceedings of the Human Language Technologies (HLT) Conference, pages 797–811, 2007 |
|
| , , and , Learning expressive human-like head motion sequences from speech, in: Data-Driven 3D Facial Animations, pages 113-131, Springer-Verlag Press, 2007 |
|
2006
| , , , , and , Expressive facial animation synthesis by learning speech coarticulation and expression spaces (2006), in: IEEE Transactions on Visualization and Computer Graphics, 12:6(1523-1534) |
[URL] |
| , and , A split lexicon approach for improved recognition of spoken names (2006), in: Speech Communication, 48:9(1126-1136) |
|
| , and , Acoustic analysis and automatic recognition of spontaneous children's speech, in: Proceedings of InterSpeech, 2006 |
|
| and , Combining acoustic, lexical, and syntactic evidence for automatic unsupervised prosody labeling, in: Proceedings of InterSpeech, pages 297–300, 2006 |
|
| , , , and , A study of emotional speech articulation using a fast magnetic resonance imaging technique, in: Proceedings of InterSpeech, 2006 |
|
| , , , , , and , Pronunciation verification of children's speech for automatic literacy assessment, in: Proceedings of InterSpeech, 2006 |
|
| , , , , , and , Automatic detection of voice onset time contrasts for use in pronunciation assessment, in: Proceedings of InterSpeech, 2006 |
|
| , and , "Yeah right": Sarcasm recognition for spoken dialogue systems, in: Proceedings of InterSpeech, pages 1838-1841, 2006 |
|
| , , , , , and , Radiobot-CFF: A spoken dialogue system for military training, in: Proceedings of InterSpeech, 2006 |
|
| , and , Cross-lingual dialog model for speech to speech translation, in: Proceedings of InterSpeech, 2006 |
|
| , , and , Combining categorical and primitives-based emotion recognition, in: Proceedings of the European Signal Processing Conference (EUSIPCO), 2006 |
|
| and , Analysis of disfluent repetitions in spontaneous speech recognition, in: Proceedings of the European Signal Processing Conference (EUSIPCO), 2006 |
|
| , , and , Synchronized and noise-robust audio recordings during realtime magnetic resonance imaging scans (2006), in: Journal of the Acoustical Society of America, 120:4(1791-1794) |
|
| , and , Using model trees for evaluating dialog error conditions based on acoustic speech information, in: Proceedings of the International Workshop on Human-Centered Multimedia (HCM), 2006 |
|
| , and , User modeling in a speech translation driven mediated interaction setting, in: Proceedings of the International Workshop on Human-Centered Multimedia (HCM), pages 75-80, 2006 |
|
| , , , and , A dictionary based approach for robust and syllable-independent audio input transcription for query by humming systems, in: Proceedings of the Audio and Music Computing for Multimedia (AMCMM) Workshop, pages 37-44, 2006 |
|
| and , An attribute-based approach to audio description applied to segmenting vocal sections in popular music songs, in: Proceedings of the International Workshop on Multimedia Signal Processing (MMSP), pages 103-107, 2006 |
|
| , and , Content analysis for acoustic environment classification in mobile robots, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Fall Symposium, 2006 |
|
| and , Vector-based representation and clustering of audio using onomatopoeia words, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Fall Symposium, 2006 |
|
| and , Average divergence distance as a statistical discrimination measure for hidden Markov models (2006), in: IEEE Transactions on Audio, Speech, and Language Processing, 14:3(890-906) |
|
| , , , , , , , , , , , , , , , , and , Speech recognition engineering issues in speech to speech translation system design for low resource languages and domains, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006 |
|
| , , , , and , Text-independent voice conversion based on unit selection, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 81-84, 2006 |
|
| , and , Modeling emotion expression and perception behavior in auditive emotion evaluation, in: Proceedings of the International Conference on Speech Prosody, pages 9-12, 2006 |
|
| , , and , Analyzing children's speech: An acoustic study of consonants and consonant-vowel transition, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 393-396, 2006 |
|
| , and , Smooth GMM based multi-talker spectral conversion for spectrally degraded speech, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 141-144, 2006 |
|
| , , and , Not all errors are created equal: Pedagogical contextualization of language learner speech errors, in: Proceedings of the Computer Assisted Language Instruction Consortium (CALICO), 2006 |
| , and , Speaker and listener variations in emotion assessment, in: Proceedings of the German Annual Meeting of Acoustics (DAGA), pages 335-336, 2006 |
|
| , and , Selecting relevant text subsets from web-data for building topic specific language models, in: Proceedings of the Human Language Technologies (HLT) Conference, pages 145-148, 2006 |
|
| , and , Efficient rotation invariant retrieval of shapes using dynamic time warping with applications in medical databases, in: Proceedings of the IEEE International Symposium on Computer-Based Medical Systems (CBMS), pages 673-678, 2006 |
|
| , and , Text data acquisition for domain-specific language models, in: Proceedings of the Conference on Empirical Methods on Natural Language Processing (EMNLP), pages 382-389, 2006 |
|
| , , and , An English-Persian automatic speech translator: Recent developments in domain portability and user modeling, in: Proceedings of the International Conference on Intelligent Systems and Computing (ISYC), 2006 |
|
| , , and , Robust recognition and assessment of non-native speech variability, in: Proceedings of the International Conference on Intelligent Systems and Computing (ISYC), 2006 |
|
| and , Upper bound Kullback-Leibler divergence for hidden Markov models with application as discrimination measure for speech recognition, in: Proceedings of the IEEE International Symposium on Information Theory (ISIT), pages 2299-2303, 2006 |
|
| , , and , Where am I? Scene recognition for mobile robots using audio features, in: Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), pages 885-888, 2006 |
|
| , and , Acoustic-syntactic maximum entropy model for automatic prosody labeling, in: Proceedings of the IEEE/ACL 2006 Workshop on Spoken Language Technology, pages 74-77, 2006 |
|
| and , Interplay between linguistic and affective goals in facial expression during emotional utterances, in: Proceedings of the International Seminar on Speech Production (ISSP), pages 549-556, 2006 |
|
| , , , , and , Semi-automatic processing of real-time MR image sequences for speech production studies, in: Proceedings of the International Seminar on Speech Production (ISSP), pages 427-434, 2006 |
|
| , and , An exploratory study of emotional speech production using functional data analysis techniques, in: Proceedings of the International Seminar on Speech Production (ISSP), pages 11-17, 2006 |
|
| , and , Efficient scalable encoding for distributed speech recognition (2006), in: Speech Communication, 48:8(888-902) |
|
| , and , Pathological voice assessment, in: Proceedings of the IEEE Engineering in Medicine and Biology Society (EMBS) Annual International Conference, 2006 |
|
| and , Detection of non-native named entities using prosodic features for improved speech recognition and translation, in: Proceedings of the International Speech Communication Association (ISCA) Multiling Workshop, 2006 |
|
2005
| and , Unsupervised speaker indexing using generic models (2005), in: IEEE Transactions on Speech and Audio Processing, 13:5(1004-1013) |
|
| , , , , , , , , , , , , , and , Virtual humans for non-team interaction training, in: Proceedings of the SIGdial Workshop, 2005 |
|
| and , Distributed range difference based target localization in sensor network, in: Proceedings of the Asilomar Conference on Signals, Systems and Computers, pages 205-209, 2005 |
|
| and , Piecewise linear stylization of pitch via wavelet analysis, in: Proceedings of InterSpeech, pages 3277–3280, 2005 |
|
| , and , Building topic specific language models from webdata using competitive models, in: Proceedings of InterSpeech, pages 1293-1296, 2005 |
|
| , , , and , Detecting politeness and frustration state of a child in a conversational computer game, in: Proceedings of InterSpeech, pages 2209-2212, 2005 |
|
| , , and , An articulatory study of emotional speech production, in: Proceedings of InterSpeech, pages 497-500, 2005 |
|
| , , and , Modeling and automating detection of errors in Arabic language learner speech, in: Proceedings of InterSpeech, pages 177-180, 2005 |
|
| , , , , , and , Investigating the role of phoneme-level modifications in emotional speech resynthesis, in: Proceedings of InterSpeech, pages 801-804, 2005 |
|
| , , and , Pronunciation variations of Spanish-accented English spoken by young children, in: Proceedings of InterSpeech, pages 749-752, 2005 |
|
| , , , , , , , , and , TBALL data collection: The making of a young children's speech corpus, in: Proceedings of InterSpeech, pages 1581-1584, 2005 |
|
| , , and , Natural head motion synthesis driven by acoustic prosodic features, in: Proceedings of the Computer Animation and Social Agents, 2005 |
| and , Hidden-articulator Markov models for pronunciation evaluation, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 174-179, 2005 |
|
| , , , and , Creating data resources for designing user-centric frontends for query-by-humming systems (2005), in: ACM Multimedia Systems Journal, Special Issue on Music Information Retrieval, 10:6(475-483) |
|
| , and , Adaptive categorical understanding for spoken dialog systems (2005), in: IEEE Transactions on Speech and Audio Processing, 13:3(321-329) |
|
| and , Toward detecting emotions in spoken dialogs (2005), in: IEEE Transactions on Speech and Audio Processing, 13:2(293-303) |
|
| , and , Multichannel audio synthesis by subband-based spectral conversion and parameter adaptation (2005), in: IEEE Transactions on Speech and Audio Processing, 13:2(263-274) |
|
| and , Speech rate estimation via temporal correlation and selected sub-band correlation, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 413-416, 2005 |
|
| and , An unsupervised quantitative measure for word prominence in spontaneous speech, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 377-380, 2005 |
|
| and , An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic-prosodic language model, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 269-272, 2005 |
|
| and , Automatic syllable stress detection using prosodic features for pronunciation evaluation of language learners, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 937-940, 2005 |
|
| , , , , , , and , Smart room: Participant and speaker localization and identification, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1117-1120, 2005 |
|
| , , , , , , and , Transonics: A practical speech-to-speech translator for English-Farsi medical dialogues, in: Proceedings of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL), pages 89-92, 2005 |
|
| , , and , Natural head motion synthesis driven by acoustic prosodic features (2005), in: Journal of Computer Animation and Virtual Worlds, 16:3-4(283-290) |
|
| , and , Automatic diacritization of Arabic transcripts for automatic speech recognition, in: Proceedings of the International Conference on Natural Language Processing (ICON), 2005 |
|
| , and , Towards parameter-free classification of sound effects in movies, in: Proceedings of the SPIE Optics and Photonics Symposium, 2005 |
|
2004
| and , Measuring convergence in language model estimation using relative entropy, in: Proceedings of InterSpeech, pages 1057-1060, 2004 |
|
| , , , , , , and , Emotion recognition based on phoneme classes, in: Proceedings of InterSpeech, pages 889-892, 2004 |
|
| and , Speaker model quantization for unsupervised speaker indexing, in: Proceedings of InterSpeech, pages 1517-1520, 2004 |
|
| , and , Context dependent statistical augmentation of Persian transcripts, in: Proceedings of InterSpeech, pages 853-856, 2004 |
|
| and , A statistical discrimination measure for hidden Markov models based on divergence, in: Proceedings of InterSpeech, pages 657-660, 2004 |
|
| , and , Robust speech recognition over packet networks: An overview, in: Proceedings of InterSpeech, pages 621-624, 2004 |
|
| , , , , and , Constructing emotional speech synthesizers with limited speech database, in: Proceedings of InterSpeech, pages 1185-1188, 2004 |
|
| , , , , , , and , An acoustic study of emotions expressed in speech, in: Proceedings of InterSpeech, pages 2193-2196, 2004 |
|
| , , and , Reference marking in children's computer-directed speech: An integrated analysis of discourse and gesture, in: Proceedings of InterSpeech, pages 1841-1844, 2004 |
|
| , and , A distributed speech recognition system in multi-user environments, in: Proceedings of InterSpeech, pages 2121-2124, 2004 |
|
| , , , , , , , and , Analysis of emotion recognition using facial expressions, speech and multimodal information, in: Proceedings of the International Conference on Multimodal Interfaces, pages 205-211, 2004 |
|
| , and , A statistical approach to retrieval under user-dependent uncertainty in query-by-humming systems, in: Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR), pages 113-118, 2004 |
|
| , , and , Audio-based head motion synthesis for avatar-based telepresence systems, in: Proceedings of the ACM SIGMM Effective Telepresence Workshop (ETP), pages 24-30, ACM Press, 2004 |
|
| , , , , , , , , , , , , , and , The transonics spoken dialogue translator: An aid for English-Persian doctor-patient interviews, in: Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Fall Symposium, pages 97-103, 2004 |
|
| , and , Adaptive speaker identification with audiovisual cues for movie content analysis (2004), in: Pattern Recognition Letters, 25:7(777-791) |
|
| , , , and , Creation of a doctor-patient dialogue corpus using standardized patients, in: Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2004 |
|
| , and , Speaker identification using supra-segmental pitch pattern dynamics, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 89-92, 2004 |
|
| and , A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 525-528, 2004 |
|
| , and , Enhanced standard compliant distributed speech recognition (AURORA encoder) using rate allocation, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 485-488, 2004 |
|
| , , , , , , and , Tactical language training system: An interim report, in: Proceedings of the Conference on Intelligent Tutoring Systems (ITS), pages 336-345, 2004 |
|
| , , , , , and , Analyzing the interplay between spoken language and gestural cues in conversational child-machine interactions in pre/early literate age group, in: Proceedings of the InSTIL/ICALL Symposium, 2004 |
|
| , , , and , Tactical language detection and modeling of learner speech errors: The case of Arabic tactical language training for American English speakers, in: Proceedings of the InSTIL/ICALL Symposium, 2004 |
|
| , , , , and , Tactical language training system: Supporting the rapid acquisition of foreign language and cultural skills, in: Proceedings of the InSTIL/ICALL Symposium, 2004 |
|
| , , and , Automatic dynamic expression synthesis for speech animation, in: Proceedings of the IEEE Computer Animation and Social Agents (CASA), pages 267-274, IEEE Press, 2004 |
[URL] |
| , , , , and , Using cognitive task analysis to facilitate collaboration in development of simulator to accelerate surgical training, in: Proceedings of the Annual Medicine Meets Virtual Reality (MMVR) Conference, pages 114-120, 2004 |
|
| , and , Content-based movie analysis and indexing based on audiovisual cues (2004), in: IEEE Transactions on Circuits and Systems for Video Technology, 14:8(1073-1085) |
|
| , and , A transcription scheme for languages employing the Arabic script motivated by speech processing applications, in: Proceedings of the International Conference on Computational Linguistics, 2004 |
|
| , , , and , An approach to real-time magnetic resonance imaging for speech production (2004), in: Journal of the Acoustical Society of America, 115:4(1771-1776) |
|
| and , Text to speech synthesis: New paradigms and advances, Prentice Hall, 2004 |
| , and , Synthesizing expressive speech: Overview, challenges and open questions, in: Text to speech synthesis: New paradigms and advances, Prentice Hall, 2004 |
2003
| , and , Virtual microphones for multichannel audio resynthesis (2003), in: EURASIP Journal on Applied Signal Processing, 10:1(968-979) |
|
| and , Emotion recognition using a data-driven fuzzy inference system, in: Proceedings of InterSpeech, pages 157-160, 2003 |
|
| and , Language-adaptive Persian speech recognition, in: Proceedings of InterSpeech, 2003 |
|
| , and , Towards optimal encoding for classification with applications to distributed speech recognition, in: Proceedings of InterSpeech, 2003 |
|
| and , A method for on-line speaker indexing using generic reference models, in: Proceedings of InterSpeech, pages 2653-2656, 2003 |
|
| and , An empirical text transformation method for spontaneous speech synthesizers, in: Proceedings of InterSpeech, pages 1221-1224, 2003 |
|
| and , Robust recognition of children’s speech (2003), in: IEEE Transactions on Speech and Audio Processing, 11:6(603-616) |
|
| , , , and , Creating data resources for designing user-centric front-ends for query by humming systems, in: Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR), pages 475-483, 2003 |
|
| , and , A statistical multidimensional humming transcription using phone level hidden Markov models for query by humming systems, in: Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), pages 61-64, 2003 |
|
| , and , Acoustic correlates of user response to error in human-computer dialogues, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 215-220, 2003 |
|
| and , A study of generic models for unsupervised on-line speaker indexing, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 423-428, 2003 |
|
| , and , Improvements in English ASR for the Malach project using syllable-centric models, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 129-134, 2003 |
|
| , and , ASCII based transcription systems with the Arabic script: The case of Persian, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003 |
|
| , , , , , , , , , , , , and , Transonics: A speech to speech system for English-Persian interactions, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003 |
|
| , , and , Acoustic analysis of preschool children's speech, in: Proceedings of the International Congress of Phonetic Sciences (ICPhS), 2003 |
|
| and , Split-lexicon based hierarchical recognition of speech using syllable and word level acoustic units, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 772-775, 2003 |
|
| and , An information-theoretic analysis of developmental changes in speech, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages I-480-I-483, 2003 |
|
| , and , Audiovisual-based adaptive speaker identification, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 565-568, 2003 |
|
| , and , Multidimensional humming transcription using a statistical approach for query by humming systems, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 385-388, 2003 |
|
| , and , Movie content analysis, indexing and skimming via multimodal information, in: Video Mining, pages 1-33, Kluwer Academic, 2003 |
|
2002
| and , Spoken language synthesis: Experiments in synthesis of spontaneous monologues, in: Proceedings of the IEEE Speech Synthesis Workshop, pages 203-206, 2002 |
|
| , , , and , Limited domain synthesis of expressive military speech for animated characters, in: Proceedings of the IEEE Speech Synthesis Workshop, 2002 |
|
| , and , A syllable based approach for improved recognition of spoken names, in: Proceedings of the International Speech Communication Association (ISCA) Pronunciation Modeling and Lexicon Adaptation Workshop, pages 1-4, 2002 |
|
| , and , Expressive speech synthesis using a concatenative synthesizer, in: Proceedings of InterSpeech, pages 1265-1268, 2002 |
|
| and , Refined speech segmentation for concatenative synthesis, in: Proceedings of InterSpeech, 2002 |
|
| and , Speaker change detection using a new weighted distance measure, in: Proceedings of InterSpeech, pages 16-20, 2002 |
|
| , and , Combining acoustic and language information for emotion recognition, in: Proceedings of InterSpeech, 2002 |
|
| , , , and , Analysis of user behavior under error conditions in spoken dialogs, in: Proceedings of InterSpeech, pages 2069-2072, 2002 |
|
| , and , Efficient multichannel audio resynthesis by subband-based spectral conversion, in: Proceedings of the European Signal Processing Conference (EUSIPCO), pages 413-416, 2002 |
|
| , and , Gaussian mixture model based methods for virtual microphone signal synthesis, in: Proceedings of the Audio Engineering Society (AES) Convention, 2002 |
|
| , and , Feature analysis for automatic detection of pathological speech, in: Proceedings of the IEEE Engineering in Medicine and Biology Society (EMBS) Meeting, pages 182-183, 2002 |
|
| and , Distribution detection and tracking in sensor networks, in: Proceedings of the Asilomar Conference on Signals, Systems and Computers, pages 1174-1178, 2002 |
|
| and , A confidence-score based unsupervised MAP adaptation for speech recognition, in: Proceedings of the Asilomar Conference on Signals, Systems and Computers, pages 222-226, 2002 |
|
| and , A system for automatic recognition of pathological speech, in: Proceedings of the Asilomar Conference on Signals, Systems and Computers, 2002 |
|
| , and , Maximum likelihood constrained adaptation for multichannel audio synthesis, in: Proceedings of the Asilomar Conference on Signals, Systems and Computers, pages 227-232, 2002 |
|
| , and , Identification of speakers in movie dialogs using audiovisual cues, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 2093-2096, 2002 |
|
| , and , A statistical approach to humming recognition, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages IV-4175, 2002 |
|
| , and , Comparison of dictionary-based approaches to automatic repeating melody extraction, in: Proceedings of Electronic Imaging (EI) Conference, pages 306-317, 2002 |
|
| and , Creating conversational interfaces for children (2002), in: IEEE Transactions on Speech and Audio Processing, 10:2(65-78) |
|
| , Towards modeling user behavior in human-machine interactions: Effect of errors and emotions, in: Proceedings of the ISLE Workshop on Multimodal Dialog Tagging, 2002 |
|
| , and , Collaborative classification applications in sensor networks, in: Proceedings of the IEEE Sensor Array and Multichannel Signal Processing (SAM) Workshop, pages 370-374, 2002 |
|
| , and , Classifying emotions in human-machine spoken dialogs, in: Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), pages 737-740, 2002 |
|
| , and , An HMM-based approach to humming transcription, in: Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), pages 337-340, 2002 |
|
| , and , Multiresolution spectral conversion for multichannel audio resynthesis, in: Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), pages 273-276, 2002 |
|
2001
| , and , Efficient scalable speech compression for scalable speech recognition, in: Proceedings of InterSpeech, pages 1845-1848, 2001 |
|
| , , , and , Politeness and frustration language in child-machine interactions, in: Proceedings of InterSpeech, pages 2675-2678, 2001 |
|
| , , , , , , , , , , , , , , , , , and , DARPA communicator dialog travel planning systems: The June 2000 data collection, in: Proceedings of InterSpeech, pages 1371-1374, 2001 |
|
| , and , Automatic main melody extraction from MIDI files with a modified Lempel-Ziv algorithm, in: Proceedings of the International Symposium on Intelligent Multimedia, Video and Speech Processing, pages 9-12, 2001 |
|
| , , , and , On the implementation of ASR algorithms for hand-held wireless mobile devices, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 17-20, 2001 |
|
| , , , and , Amount of information presented in a complex list: Effects on user performance, in: Proceedings of the Human Language Technology (HLT) Conference, pages 1-6, 2001 |
|
| , , , and , Just (all) the facts, ma'am, in: Proceedings of the ACM Conference on Computer-Human Interaction (CHI), pages 133-134, 2001 |
|
| , and , Recognition of negative emotions from the speech signal, in: Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 240-243, 2001 |
|
| , and , Use of model transformations for distributed speech recognition, in: Proceedings of the International Speech Communication Association (ISCA) Workshop on Adaptation Methods for Speech Recognition, pages 113-116, 2001 |
|
| , and , Music indexing with extracted main melody by using modified Lempel-Ziv algorithm, in: Proceedings of the International Symposium on The Convergence of Information Technologies and Communications (ITCom), pages 124-135, 2001 |
|
| , , and , Automatic movie index generation based on multimodal information, in: Proceedings of the International Symposium on The Convergence of Information Technologies and Communications (ITCom), pages 42-53, 2001 |
|
| , and , A dictionary approach to repetitive pattern finding in music, in: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 281-284, 2001 |
|
2000
| , , , , , and , Effects of dialog initiative and multi-modal presentation strategies on large directory information access, in: Proceedings of InterSpeech, pages 636-639, 2000 |
|
| , , , , , , , , , , and , The AT&T-DARPA communicator mixed-initiative spoken dialog system, in: Proceedings of InterSpeech, pages 122-125, 2000 |
|
| , , , , , , and , A spoken dialog system for conference/workshop services, in: Proceedings of InterSpeech, pages 736-739, 2000 |
| and , Web-based monitoring, logging and reporting tools for multiservice, multimodal systems, in: Proceedings of InterSpeech, pages 1041-1044, 2000 |
|
| and , Noise source models for fricative consonants (2000), in: IEEE Transactions on Speech and Audio Processing, 8:2(328-344) |
|
| , , and , Phrasal signatures in articulation, chapter 5, pages 70-87, Cambridge University Press, 2000 |
|
| , , , and , Automatic speech recognition for mobile communication devices, in: Proceedings of the IEEE Nordic Signal Processing Symposium (NORSIG), 2000 |
|
| , , , and , Acoustic modeling of American English /r/ (2000), in: Journal of the Acoustical Society of America, 108:1(343-356) |
|
| , , , , , , and , Unifying conversational multimedia interfaces for accessing network services across communication devices, in: Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), pages 1-4, 2000 |
|
1999
| , and , Multimodal systems for children: Building a prototype, in: Proceedings of InterSpeech, pages 1727-1730, 1999 |
|
| , and , Categorical understanding using statistical N-gram models, in: Proceedings of InterSpeech, pages 2027-2030, 1999 |
|
| , and , Geometry, kinematics, and acoustics of Tamil liquid consonants (1999), in: Journal of the Acoustical Society of America, 106:4(1993-2007) |
|
| , and , Acoustics of children’s speech: Developmental changes of temporal and spectral parameters (1999), in: Journal of the Acoustical Society of America, 105:3(1455-1468) |
|
| , , and , Speech production and perception models and their applications to synthesis, recognition, and coding, in: Speech Processing, Recognition, and Artificial Neural Networks, pages 138-161, Springer-Verlag, 1999 |
|
| , , and , Extending computer telephony and IP telephony standards for voice-enabled services in a multi-modal user interface environment, in: Proceedings of Interactive Dialogue in Multi-Modal Systems (IDS), pages 9-12, 1999 |
| , , , and , Spoken dialog systems: From theory to practice, in: Proceedings of the IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, 1999 |
|
| and , Acoustic modeling of Tamil retroflex liquids, in: Proceedings of the International Congress of Phonetic Sciences (ICPhS), pages 2097-2100, 1999 |
|
1998
| , and , Language model adaptation for spoken language systems, in: Proceedings of InterSpeech, pages 2327-2330, 1998 |
|
| , , , , , , , , , , , and , VPQ: A spoken language interface to large scale directory information, in: Proceedings of InterSpeech, pages 2863-2867, 1998 |
|
| , , , and , Probing the relationship between qualitative and quantitative performance measures for telecommunication services, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 3769-3772, 1998 |
| and , Spoken dialog systems for children, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 197-200, 1998 |
|
| , and , Learning optimal dialogue strategies: A case study of a spoken dialogue agent for email, in: Proceedings of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL), pages 1345-1351, 1998 |
|
1997
| , and , Analysis of children's speech: Duration, pitch and formants, in: Proceedings of InterSpeech, pages 473-476, 1997 |
| , and , Automatic speech recognition for children, in: Proceedings of InterSpeech, pages 2371-2374, 1997 |
| and , Novel filler acoustic models for connected digit recognition, in: Proceedings of InterSpeech, pages 283-286, 1997 |
| , and , Unsupervised HMM adaptation based on speech-silence discrimination, in: Proceedings of InterSpeech, pages 2055-2088, 1997 |
| , , and , Evaluating spoken dialog systems for telecommunication services, in: Proceedings of InterSpeech, pages 2203-2206, 1997 |
| , and , Database management and analysis for spoken dialog systems: Methodology and tools, in: Proceedings of InterSpeech, pages 2199-2202, 1997 |
| , and , New results in vowel production: MRI, EPG, and acoustic data, in: Proceedings of InterSpeech, pages 1007-1010, 1997 |
| , , and , Acoustic modelling of American English /r/, in: Proceedings of InterSpeech, pages 393-396, 1997 |
| , , , and , The relationship between qualitative and quantitative service performance measures: Results from universal voiceline trial, in: Proceedings of the Service Infrastructure Performance Symposium, pages 162-169, 1997 |
| , , and , Phrasal boundaries and articulatory timing, in: Proceedings of the Meeting on Laboratory Phonology, 1997 |
| , and , Toward articulatory-acoustic models for liquid consonants based on MRI and EPG data. Part I: The laterals (1997), in: Journal of the Acoustical Society of America, 101:2(1064-1077) |
[URL] |
| , and , Toward articulatory-acoustic models for liquid consonants based on MRI and EPG data. Part II: The rhotics (1997), in: Journal of the Acoustical Society of America, 101:2(1078-1089) |
[URL] |
1996
| , , , and , Liquids in Tamil, in: Proceedings of InterSpeech, pages 797-800, 1996 |
|
| , and , From MRI and acoustic data to articulatory synthesis: A case study of the lateral approximants in American English, in: Proceedings of InterSpeech, pages 793-796, 1996 |
|
| and , Parametric hybrid source models for voiced and voiceless fricative consonants, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 377-380, 1996 |
|
| , and , Prosodic boundary effects in Tamil: An articulatory study, in: Proceedings of the Annual Meeting of the Linguistic Society of America, 1996 |
| and , Imaging applications in speech production research, in: Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE) Medical Imaging, pages 120-131, 1996 |
1995
| , and , An articulatory study of fricative consonants using magnetic resonance imaging (1995), in: Journal of the Acoustical Society of America, 98:3(1325-1347) |
|
| , , and , Speech production and perception models and their applications to synthesis, recognition, and coding, in: Proceedings of the International Symposium on Signals, Systems, and Electronics (ISSSE), pages 367-372, 1995 |
|
| , and , An articulatory study of liquid approximants in American English, in: Proceedings of the International Congress of Phonetic Sciences (ICPhS), pages 576-579, 1995 |
| and , A nonlinear dynamical systems analysis of fricative consonants (1995), in: Journal of the Acoustical Society of America, 97:4(2511-2524) |
1994
| , and , An MRI study of fricative consonants, in: Proceedings of InterSpeech, pages 627-630, 1994 |
| , , and , Fast and efficient motion compensation techniques using subband analysis, in: Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 265-269, 1994 |
1993
| and , Strange attractors and chaotic dynamics in the production of voiced and voiceless fricatives, in: Proceedings of InterSpeech, pages 77-80, 1993 |
| and , Loading effects on Indian musical drums: An acoustic analysis, in: Proceedings of the Materials Research Society, 1993 |
1991
| and , Nonlinear filtering and smoothing for noisy alternating renewal process signals, in: Proceedings of the IEEE American Control Conference, pages 225-228, 1991 |