Heng Ji, Rensselaer Polytechnic Institute
The goal of Information Extraction (IE) is to extract structured facts from a wide spectrum of heterogeneous unstructured data types, including text, speech, images and videos. Traditional IE techniques are limited to a certain source X (X = a particular language, a domain, a limited number of pre-defined fact types, a single data modality...). When we move from X to a new source Y, we must start from scratch, annotating a substantial amount of training data and developing Y-specific extraction capabilities.
We propose a new Universal IE paradigm that combines the merits of traditional IE (high quality and fine granularity) and Open IE (high scalability). This framework aims to discover schemas and extract facts from any input corpus, without any annotated training data or predefined schema. It can also be extended to multiple data modalities (images, videos) and 282 languages by constructing a common semantic space and applying transfer learning across sources.
Heng Ji is the Edward P. Hamilton Development Chair Professor in the Computer Science Department at Rensselaer Polytechnic Institute. She received her Ph.D. in Computer Science from New York University. Her research interests focus on Natural Language Processing, especially Information Extraction and Knowledge Base Population. She was selected as a "Young Scientist" and as a member of the Global Future Council on the Future of Computing by the World Economic Forum in 2016 and 2017.
She received the "AI's 10 to Watch" Award from IEEE Intelligent Systems in 2013, an NSF CAREER award in 2009, Google Research Awards in 2009 and 2014, IBM Watson Faculty Awards in 2012 and 2014, and Bosch Research Awards in 2015 and 2016. She has coordinated the NIST TAC Knowledge Base Population task since 2010. She is now serving as the Program Committee Co-Chair of NAACL 2018.