AI Seminar: Cold-Start Universal Information Extraction

Monday, January 27, 2020 10:30 am - 10:30 am EST (GMT -05:00)

Lifu Huang, Department of Computer Science
University of Illinois at Urbana–Champaign

Who? What? When? Where? Why? These are fundamental questions asked when gathering knowledge about, and trying to understand, a concept, topic, or event. The answers to these questions underpin the key information conveyed in the overwhelming majority, if not all, of language-based communication. Unfortunately, typical machine learning models and Information Extraction (IE) techniques rely heavily on human-annotated data, which is usually very expensive and is available for only a very limited set of types and languages, leaving them unable to handle information across diverse domains, languages, or other settings.

In this talk, I will introduce a new information extraction paradigm, Cold-Start Universal Information Extraction, which aims to create the next generation of information access, where machines can automatically discover accurate, concise, and trustworthy information embedded in data of any form without requiring any human effort. My work along this line addresses three questions: (1) How can machines automatically discover the key information in texts without any pre-defined types or any human-annotated data? (2) How can machines benefit from available resources, e.g., large-scale ontologies or existing human annotations? (3) How can information extraction approaches be extended to low-resource languages without any extra human effort?

My research answers these questions with three key innovations: a Liberal Information Extraction framework, which discovers structured information bottom-up and automatically induces a type schema; a Zero-shot IE approach, which reframes IE as a grounding problem instead of a classification problem; and a multilingual common semantic space framework, which retains the clustering structure of each language and makes IE feasible for thousands of languages. I will conclude my talk by outlining the remaining challenges and discussing several future research directions.

Bio: Lifu Huang is a PhD candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He has a wide range of research interests in natural language processing and understanding. Specifically, his current research focuses on developing efficient information extraction approaches that automatically extract structured knowledge from any form of data at little to no cost. He received his M.S. from Peking University in 2014 with the highest university honor and a National Scholarship. He has served as a Program Committee member for many top NLP and AI venues, including ACL, EMNLP, NAACL, and AAAI. He also received a fellowship from the Allen Institute for Artificial Intelligence (AI2) in 2019.