DSG Seminar Series • Cross-Domain Text-to-SQL Semantic Parsing

Monday, September 20, 2021 10:30 am - 10:30 am EDT (GMT -04:00)

Speaker: Yanshuai Cao, Borealis AI

(Virtual talk over Zoom. Note: the talk will be recorded.)

Abstract: 

Large-scale pre-training has enabled many NLP applications via transfer learning. However, many studies have shown that current deep learning models often rely on superficial cues and dataset biases to achieve seemingly high performance on a given dataset without proper understanding. This talk will discuss the challenges of cross-domain text-to-SQL semantic parsing and how it can be a test-bed for learning to reason in the real world. I will review recent advances in this field, including some of our work tackling the scarce data aspect of this problem. In particular, I will discuss how models encode prior knowledge about this problem's structures; how to train deep transformers on small datasets; and how to perform data augmentation when minor changes could alter the semantics. I will also showcase Turing, the natural language database interface demo built from our cross-domain text-to-SQL semantic parser.

Bio: Yanshuai Cao is a Senior Research Lead at Borealis AI, conducting R&D and building products for RBC. His research spans natural language processing, generative models, and adversarial machine learning. Yanshuai received his Ph.D. from the University of Toronto under the supervision of David J. Fleet and Aaron Hertzmann.

Talk video