Assessment and intervention based on direct observation of human social, communicative, and affective behavior are central to many fields, including mental and behavioral healthcare and management. In this research project, our focus application is distressed couple and family interactions. The analyses that inform diagnosis and treatment here rely on audio-visual observations not just to transcribe who is saying and doing what, but to infer meta-behavior and complex interaction patterns. However, in situ human observation alone often cannot capture all the requisite details, so the audio-video recordings of interactions must be behaviorally coded by hand. Our vision is to transform current observational behavior analysis by providing a computational framework for analyzing and modeling emotionally rich human interactions with signal processing and machine learning. To exemplify the proposed quantitative analysis of complex behavior, we focus on the study of reactivity during dyadic marital and triadic family conflict.


Our work has resulted in a range of findings. In our first study, we analyzed a longitudinal corpus of married couples spontaneously discussing a problem in their relationship. Each spouse was manually coded with relevant session-level perceptual observations (e.g., level of blame toward the other spouse, global positive affect), and our goal was to classify the spouses' behavior using features derived from the audio signal. Based on automatic segmentation, we extracted prosodic and spectral features to capture global acoustic properties for each spouse. We then trained gender-specific classifiers to predict the behavior of each spouse for six codes. The classifiers achieved accuracies on the order of 70-80% [for analytical results see M. Black et al. 2010]. In parallel work, we quantified one aspect of interaction synchrony, prosodic entrainment (specifically of pitch and energy), by analyzing the same corpus (1) described above. Statistical tests show that some of these measures capture useful information: they take higher values in interactions between couples with a highly positive attitude than in those with a highly negative attitude. Further, by applying statistical symbol sequence matching to quantized entrainment measures in a maximum likelihood framework, we obtained 76% accuracy in predicting positive affect vs. negative affect.
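As a rough illustration of what a quantized prosodic entrainment measure might look like, the sketch below correlates one spouse's per-turn mean pitch with the other spouse's following turn and maps the result to a discrete symbol. This is a minimal sketch under stated assumptions, not the measure used in the project: the choice of Pearson correlation, the function names (`pearson`, `turn_level_entrainment`, `quantize`), the quantization thresholds, and the pitch values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant track carries no co-variation
    return cov / sqrt(vx * vy)


def turn_level_entrainment(spouse_a_turns, spouse_b_turns):
    """Correlate each turn of spouse A with the turn of spouse B that follows it.

    Assumption: both lists hold one prosodic value per turn (e.g. mean pitch),
    already aligned so that index i of B is the reply to index i of A.
    """
    n = min(len(spouse_a_turns), len(spouse_b_turns))
    return pearson(spouse_a_turns[:n], spouse_b_turns[:n])


def quantize(value, edges=(-0.3, 0.3)):
    """Map a correlation to a symbol: 0 = low, 1 = medium, 2 = high.

    The thresholds are illustrative assumptions, not values from the study.
    """
    return sum(value > e for e in edges)


# Hypothetical per-turn mean pitch values in Hz (made-up numbers, not corpus data).
spouse_a_pitch = [180.0, 190.0, 200.0, 210.0]
spouse_b_pitch = [200.0, 212.0, 219.0, 231.0]
score = turn_level_entrainment(spouse_a_pitch, spouse_b_pitch)
symbol = quantize(score)
print(f"entrainment = {score:.3f}, symbol = {symbol}")
```

A sequence of such symbols, computed over successive windows of a session, is the kind of input a symbol sequence matching step could then score against class-specific models.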

Similar analysis is under way using lexical features. Behavioral classification using oracle transcripts performs extremely well (on the order of 80-90%), and early results show that performance remains high when automatically derived lexical descriptors are used instead.

Work using the multimodal data set has also produced exciting findings. Our collected data have so far been annotated for approach-avoidance (AA) behavior in human dyadic interactions. We have made algorithmic contributions toward automated quantification of approach-avoidance behavior using visual (motion capture) and audio-based features. We proposed a novel ordinal regression algorithm that transforms the ordinal regression into multiple binary classification problems and combines them with a cumulative logit logistic regression model with proportional odds (CLLRMP). A time-series extension applies a Hidden Markov Model on top of the short-term classification. Overall, we estimate the approach-avoidance label with ~75% agreement with the expert annotation.
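The general idea of turning ordinal regression into several binary problems can be sketched as follows: for an ordinal label with K levels, train K-1 binary classifiers, the k-th answering "is the label greater than level k?", and combine their outputs into a single level. This is a simplified illustration only, not the CLLRMP algorithm itself: it omits the proportional-odds coupling and the Hidden Markov Model stage, uses a single scalar feature, and all names and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ts, lr=0.5, epochs=5000):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, ts):
            err = sigmoid(w * x + b) - t
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def fit_ordinal(xs, ys, num_levels):
    """Train one binary model per threshold k, each answering 'is y > k?'."""
    return [
        fit_logistic(xs, [1.0 if y > k else 0.0 for y in ys])
        for k in range(num_levels - 1)
    ]


def predict_ordinal(models, x):
    """Predicted level = number of thresholds the sample is judged to exceed."""
    return sum(sigmoid(w * x + b) > 0.5 for w, b in models)


# Toy data (made up): one scalar feature and a 3-level ordinal label,
# e.g. 0 = avoidance, 1 = neutral, 2 = approach.
toy_x = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2]
toy_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = fit_ordinal(toy_x, toy_y, num_levels=3)
predictions = [predict_ordinal(models, x) for x in (0.05, 1.05, 2.05)]
print(predictions)
```

Counting exceeded thresholds keeps the prediction on the ordinal scale even when the binary models are trained independently; a proportional-odds model would instead share one slope across thresholds and learn only the cut points.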