CSE Colloquium: Squirrel or Skunk? NLP Models Face the Long Tail of Language
Abstract: Natural language is chock-full of rare events, which can challenge NLP models based on supervised learning. Dr. Schneider will present advances in modeling and evaluation of “long tail” phenomena in grammar and meaning: retrieving rare senses of words; tagging words with complex syntactic categories (TACL 2021); and calibrating model confidence scores for sparse tagsets.
Bio: Dr. Nathan Schneider is an annotation schemer and computational modeler for natural language. As Assistant Professor of Linguistics and Computer Science at Georgetown University, he looks for synergies between practical language technologies and the scientific study of language. He specializes in broad-coverage semantic analysis: designing linguistic meaning representations, annotating them in corpora, and automating them with statistical natural language processing techniques. A central focus of this research is the nexus between grammar and lexicon as manifested in multiword expressions and adpositions/case markers. He has inhabited UC Berkeley (BA in Computer Science and Linguistics), Carnegie Mellon University (PhD in Language Technologies), and the University of Edinburgh (postdoc). Now a Hoya and leader of NERT, he continues to play with data and algorithms for linguistic meaning.
Event Contact: Timmy Zhu