Monday, December 12, 2022 10:00 am

10:00 am
EST (GMT -05:00)
MC 5417 and Zoom (please email amgrad@uwaterloo.ca for the meeting link)
Candidate
Yanming Kang, Applied Mathematics, University of Waterloo
Title
Multilevel Transformer
Abstract
Transformer-based models have shown strong performance on natural language tasks. However, the quadratic complexity of the self-attention operation limits the maximum input length that can be handled. Our proposed model reduces the computational complexity to $O(n\log{n})$ by grouping tokens according to their distance from the target and summarizing them with strided convolutions. In this presentation I will review prior work on efficient Transformers, describe our method, and present some preliminary results.
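The complexity argument in the abstract can be illustrated with a small sketch. This is not the candidate's implementation: it substitutes stride-2 mean pooling for the strided convolution, uses NumPy instead of a deep-learning framework, and the names (`build_levels`, `multilevel_attend`, `window`) are invented for illustration. The point it shows is the count: each query sees a constant number of tokens per level and O(log n) levels, so attention touches O(log n) keys per query rather than O(n).

```python
import numpy as np

def build_levels(x):
    # Level 0 is the full sequence; each coarser level halves the
    # length with stride-2 mean pooling (a stand-in for the strided
    # convolution described in the talk).
    levels = [x]
    while levels[-1].shape[0] > 1:
        cur = levels[-1]
        n = cur.shape[0] // 2 * 2          # drop a trailing odd token
        levels.append((cur[:n:2] + cur[1:n:2]) / 2.0)
    return levels

def multilevel_attend(x, q_idx, window=2):
    # Gather keys for one query: nearby tokens at full resolution,
    # distant tokens only through coarser summaries. With a fixed
    # window per level this is O(log n) keys per query, O(n log n)
    # over the whole sequence, instead of O(n) / O(n^2).
    levels = build_levels(x)
    keys = []
    for lvl, toks in enumerate(levels):
        center = q_idx >> lvl              # query's position at this level
        lo = max(0, center - window)
        hi = min(toks.shape[0], center + window + 1)
        keys.append(toks[lo:hi])
    k = np.concatenate(keys, axis=0)
    q = x[q_idx]
    scores = k @ q / np.sqrt(x.shape[1])   # scaled dot-product scores
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ k, k.shape[0]               # output vector, #keys used

rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 16))            # 1024 tokens, dim 16
out, n_keys = multilevel_attend(x, q_idx=500)
```

Here `n_keys` comes out far below the 1024 keys full attention would need (roughly `(2*window + 1) * log2(n)`), which is the source of the $O(n\log{n})$ total cost. A real model would learn the pooling weights and handle masking and batching, which this sketch omits.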