Their new neural network model, which they have dubbed AfriBERTa, is based on BERT — Bidirectional Encoder Representations from Transformers — a deep learning technique for natural language processing developed in 2018 by Google.
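To make the idea concrete, the short sketch below (not the team's released code) shows how a BERT-style pretrained model can be queried for masked-word prediction with the Hugging Face transformers library; the multilingual checkpoint named here is only a stand-in assumption, not the AfriBERTa release itself.

```python
# Minimal sketch: querying a BERT-style masked language model.
# The checkpoint below is a generic multilingual BERT used for illustration;
# substitute whichever AfriBERTa checkpoint the authors actually publish.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="bert-base-multilingual-cased",  # placeholder BERT-style checkpoint
)

# BERT-style models are pretrained to predict masked-out words from context,
# which is what makes them reusable across downstream language tasks.
for prediction in fill_mask("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```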
“Pretrained language models have transformed the way computers process and analyze textual data for tasks ranging from machine translation to question answering,” said Kelechi Ogueji, a master’s student in computer science at Waterloo. “Sadly, African languages have received little attention from the research community.”
“One of the challenges is that neural networks are bewilderingly text- and computer-intensive to build. And unlike English, which has enormous quantities of available text, most of the 7,000 or so languages spoken worldwide can be characterized as low-resource, in that there is a lack of data available to feed data-hungry neural networks.”
Ogueji works with a team supervised by Cheriton Chair Jimmy Lin. Learn more about the research project in the feature article on the computer science website.