MC 6460 and Zoom (please email amgrad@uwaterloo.ca for the Zoom meeting link)
Yanming Kang | Applied Mathematics, University of Waterloo
Multi-level Transformer
Transformer models have become the most popular choice for NLP tasks. In general, transformer models with longer input sequences can achieve higher accuracy. However, due to the quadratic space complexity of dot-product attention, hardware constraints limit the maximum input length for transformers. Previous works have addressed this problem by applying fixed sparsity patterns to the attention matrix or by using methods such as k-means clustering and locality-sensitive hashing. We present Multi-level Transformer, which uses a hierarchy of resolutions when computing the dot-product attention. In Multi-level Transformer, information is summarized by convolution to varying degrees, depending on the distance between the input and output tokens. The multi-level attention has O(N log N) complexity in time and space. We found that, compared to the standard transformer, Multi-level Transformer requires much less memory and is faster for longer inputs. Our preliminary results in language modeling on WikiText-103 show that Multi-level Transformer achieves perplexity comparable to the standard transformer (about 6% higher).
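
The abstract does not include an implementation, but the core idea it describes (attending to nearby tokens at full resolution and to coarser summaries of distant tokens) can be sketched briefly in NumPy. Everything in the sketch below is an illustrative assumption rather than the speaker's code: average pooling stands in for the learned convolution, and the window size w, number of levels, and level-skipping rule are placeholders chosen only to show how a per-query key set of size O(log N) arises.

    import numpy as np

    def build_pyramid(x, num_levels):
        # Level 0 is the full-resolution sequence; each coarser level halves
        # the length by averaging adjacent pairs (a stand-in for the learned
        # convolution described in the abstract).
        levels = [x]
        for _ in range(num_levels - 1):
            cur = levels[-1]
            if cur.shape[0] % 2:                      # pad to an even length
                cur = np.vstack([cur, cur[-1:]])
            levels.append(0.5 * (cur[0::2] + cur[1::2]))
        return levels

    def multilevel_attention(q, k, v, w=4, num_levels=4):
        # Each query attends to the ~w nearest keys at full resolution and to
        # pooled summaries of more distant positions at coarser levels, so it
        # sees O(w * num_levels) keys; with num_levels ~ log N this gives the
        # O(N log N) overall cost mentioned in the abstract.
        n, d = q.shape
        k_pyr = build_pyramid(k, num_levels)
        v_pyr = build_pyramid(v, num_levels)
        out = np.zeros_like(q)
        for i in range(n):
            keys, vals = [], []
            for lvl in range(num_levels):
                stride = 2 ** lvl
                center = i // stride                  # query position at this level
                lo = max(0, center - w)
                hi = min(k_pyr[lvl].shape[0], center + w + 1)
                for j in range(lo, hi):
                    # skip coarse blocks already covered at a finer level
                    if lvl > 0 and abs(j - center) < w // 2:
                        continue
                    keys.append(k_pyr[lvl][j])
                    vals.append(v_pyr[lvl][j])
            keys, vals = np.stack(keys), np.stack(vals)
            scores = keys @ q[i] / np.sqrt(d)
            weights = np.exp(scores - scores.max())   # softmax over the selected keys
            weights /= weights.sum()
            out[i] = weights @ vals
        return out

    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 16))                 # 64 tokens, 16-dim embeddings
    y = multilevel_attention(x, x, x)                 # self-attention over the pyramid
    print(y.shape)                                    # (64, 16)
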