|
|
I am in blood
Stepp'd in so far that, should I wade no more,
Returning were as tedious as go o'er.
--Shakespeare, Macbeth, Act III, Scene IV |
||
|
My research concerns the weakly supervised learning of morphological structure, with a focus on the rapid development of effective models for resource-poor languages and domains. The core of this work deals with the requirements for a priori knowledge of a morphological system -- both in extent and specificity -- to constrain the otherwise unsupervised learning of statistical models and produce an accurate analyzer. Of particular interest is which types of phenomena are (and are not!) amenable to this approach, and how much model structure and learning technique may compensate for such difficulties.

Aside from the challenges of morphological analysis itself, I am pursuing its further application to other natural language processing tasks, especially machine translation and large-corpus lexical semantics. While, in such practical settings, morphology is viewed primarily as a balm to soothe the blight of data sparsity, I am particularly interested in how it serves as another channel to communicate a language's structural information, alternative to and in concert with (syntactic) order.

The morphological palette of human language is well known to express itself almost entirely within the finite-state machine, and the common use of the transducer in computational morphology leads naturally to the generalization of morphological analysis as text transformation. Pushing in this direction, I am interested in the extension of the above models to such problems as the study of language change and the normalization of dialects to a standard form.

Finally, in theory and far beyond the scope of my present endeavors to graduate, I would love to investigate and produce evidence for my convictions that: (1) the structure of human language is much simpler than it seems, but (2) much of the surface signal is cut and/or inconsistently simplified to compress communication, and (3) we humans are still able to understand because our linguistic centers, like the rest of our brain, are essentially incredible pattern recognizers, operating over space and time, and, recursively, over patterns themselves.
From these assumptions comes the practical approach that (4) we can build effective language processing systems by simplifying the core model under knowledge-based constraints and modeling separately the noisy transformations that give rise to surface clutter. While this philosophy is perhaps less overt in my present research, and a weaker claim when applied to the relatively simple problem of morphology, I fully believe it feasible in the more perilous realms of syntax (see Harris' operator grammar for an extensive theoretical development) and even semantics (e.g., the principle of abduction). Maybe some day I'll prove it! O good peoples of the word, machine and natural, let us then prefer minimalism to the indulgences of the baroque. |
||
|
|
||