Please note: This PhD seminar will take place online.
William Loh, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Pascal Poupart
Dealing with tabular data is challenging due to partial information, noise, and heterogeneous structure. Existing techniques often struggle to simultaneously address key aspects of tabular data such as free-form text, a variable number of columns, and unseen data with no metadata beyond column names. We propose a novel architecture, basis transformers, specifically designed to tackle these challenges while respecting inherent invariances in tabular data, including hierarchical structure and the representation of numeric values. We evaluate our design on a multi-task tabular regression benchmark, achieving an 18.7% lower scaled RMSE than the next best model across 34 tasks from the OpenML-CTR23 benchmark. In related-task evaluations, when trained on 18 related but distinct tables, we outperform XGBoost by 8.7% by leveraging information learned from the related tasks.