Engineering researchers at the University of Waterloo have uncovered inherent gender and age biases in a popular image dataset used to train artificial intelligence (AI) systems around the world.
The discovery will help researchers find ways to rebalance the data so it better reflects demographic diversity, ultimately paving the way for more accurate AI models.
They found some striking imbalances. People over 60 years old represented less than two per cent of the dataset. Females accounted for 32 per cent. The largest subgroup was males aged 15 to 29.
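To make that kind of audit concrete, here is a minimal sketch of how subgroup shares might be tallied from per-image demographic annotations. The `annotations` records, the age bins and the labels are hypothetical stand-ins, not the researchers' actual tooling or data.

```python
from collections import Counter

# Hypothetical per-image demographic annotations (gender, age in years).
# In a real audit these would come from human labels or an annotation model.
annotations = [
    {"gender": "male", "age": 23},
    {"gender": "female", "age": 31},
    {"gender": "male", "age": 19},
    {"gender": "male", "age": 67},
    # ... one record per person depicted in the dataset
]

def age_bin(age: int) -> str:
    """Bucket an age into a coarse range such as '15-29' or '60+'."""
    if age >= 60:
        return "60+"
    lower = (age // 15) * 15
    return f"{lower}-{lower + 14}"

# Tally (gender, age-bin) subgroups and report each one's share of the data.
counts = Counter((a["gender"], age_bin(a["age"])) for a in annotations)
total = sum(counts.values())
for (gender, bin_), n in counts.most_common():
    print(f"{gender:6s} {bin_:>5s}: {n / total:6.1%}")
```

An audit of this shape is what surfaces figures like the under-two-per-cent share for people over 60; rebalancing then amounts to resampling or reweighting examples in inverse proportion to subgroup frequency, so each group's share moves toward a target distribution.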
“Some people feel AI is algorithmic, objective and free of human bias,” said Wong. “But AI machine learning needs to learn from data, and data is created by humans and has all the biases humans have.
“Without a conscious inclusion of diversity in the data collection process, undesirable biases like these can accumulate and propagate to AI systems using that data, which is concerning.”
Dulhanty, whose master’s research focuses on ethical and responsible AI, notes that the findings are preliminary, as the AI tools used for auditing may themselves contain some demographic bias.
“There is work ahead in developing fair demographic annotation tools that will ensure dataset audits show a more complete picture and bring better transparency and accountability to the table.”
Dulhanty recently presented his research paper at the annual Conference on Computer Vision and Pattern Recognition (CVPR) 2019 in Long Beach, California.
In July 2019, Wong was invited to join the steering committee of ABOUT ML, a multi-year initiative organized by the Partnership on AI to drive best practices for transparency in machine learning.