Facebook’s new open-source dataset could help make AI less biased

April 9, 2021

The dataset comprises 45,186 videos of just over 3,000 participants having a non-scripted chat, and has an even distribution of different genders, age groups and skin types.


Facebook has created and labeled a new open-source video dataset, which the social media giant hopes will do a better job at removing bias when testing the performance of an AI system.

Dubbed “Casual Conversations,” the dataset comprises 45,186 videos of just over 3,000 participants having a non-scripted chat, and has an even distribution of different genders, age groups and skin tones.

Facebook asked paid actors to submit the videos and to provide age and gender labels themselves, to remove as much external error as possible from the way the dataset is annotated. Facebook’s own team then identified different skin tones, based on the well-established Fitzpatrick scale, which comprises six skin types.


The annotators also labeled the level of lighting in each video, to help measure how AI models treat people with different skin tones under low-light ambient conditions.

“Casual Conversations” is now available for researchers to test computer vision and audio AI systems. It is not meant for developing algorithms, but rather for evaluating the performance of a trained system on different categories of people.

Testing is an integral part of the design of an AI system, and typically researchers measure their model against a labeled dataset after the algorithm has been trained to check how accurate the prediction is.

One issue with this approach is that when the dataset isn’t made of diverse enough data, the model’s accuracy will only be validated for a specific subgroup – which could mean that the algorithm will not work as well when faced with different types of data.

Those potential shortcomings are particularly striking in the case of an algorithm making predictions about people. Recent studies, for example, have shown that two of the common datasets used for facial analysis models, IJB-A and Adience, were overwhelmingly composed of lighter-skinned subjects (respectively 79.6% and 86.2%).

This is partly why recent years have been rife with examples of algorithms making biased decisions against certain groups of people. For instance, an MIT study that looked at the gender classification products offered by IBM, Microsoft and Face++ found that all classifiers performed better on male faces than female faces, and that better results were also obtained with lighter-skinned individuals.

While some of the classifiers made almost no mistakes when identifying lighter male faces, the researchers found, the error rate for darker female faces climbed to almost 35%.

It is critical, therefore, to verify not only that an algorithm is accurate, but also that it works equally well across different categories of people. “Casual Conversations”, in this context, could help researchers evaluate their AI systems across a diverse set of ages, genders, skin tones and lighting conditions, to identify groups for which their models could perform better.

“Our new Casual Conversations dataset should be used as a supplementary tool for measuring the fairness of computer vision and audio models, in addition to accuracy tests, for communities represented in the dataset,” said Facebook’s AI team.

In addition to evenly distributing the dataset across the four categories (age, gender, skin tone and lighting), the team also ensured that intersections within the categories were uniform. This means that, even if an AI system performs equally well across all age groups, it is possible to spot if the model underperforms for older women with darker skin in a low-light setting, for example.
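To illustrate why uniform intersections matter, here is a minimal sketch of a per-subgroup accuracy check. It is not Facebook’s evaluation code: the `accuracy_by_group` helper, the field names and the eight records are all made up for illustration. In this toy data, accuracy looks identical across age groups and across skin tones, yet differs sharply once the two attributes are combined.

```python
from collections import defaultdict


def accuracy_by_group(records, keys):
    """Return {group_tuple: accuracy}, grouping records by the given keys.

    Hypothetical helper for illustration; not part of any Facebook tooling.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in keys)
        total[group] += 1
        correct[group] += int(r["prediction"] == r["label"])
    return {g: correct[g] / total[g] for g in total}


# Made-up evaluation records (NOT drawn from Casual Conversations).
records = [
    {"age": "18-30", "skin": "lighter", "label": 1, "prediction": 1},
    {"age": "18-30", "skin": "lighter", "label": 0, "prediction": 0},
    {"age": "18-30", "skin": "darker", "label": 1, "prediction": 1},
    {"age": "18-30", "skin": "darker", "label": 0, "prediction": 1},
    {"age": "46+", "skin": "lighter", "label": 1, "prediction": 0},
    {"age": "46+", "skin": "lighter", "label": 0, "prediction": 0},
    {"age": "46+", "skin": "darker", "label": 1, "prediction": 1},
    {"age": "46+", "skin": "darker", "label": 0, "prediction": 0},
]

by_age = accuracy_by_group(records, ["age"])            # one attribute
by_skin = accuracy_by_group(records, ["skin"])          # one attribute
by_both = accuracy_by_group(records, ["age", "skin"])   # intersection

for group, acc in sorted(by_both.items()):
    print(group, acc)
```

Grouping by a single attribute would suggest the model treats everyone alike; only the intersectional breakdown exposes the weaker subgroups, which is possible only when each intersection has enough examples to measure.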

Facebook used the new dataset to test the performance of the five algorithms that won the company’s Deepfake Detection Challenge last year, which were developed to detect doctored media circulating online.

All of the winning algorithms struggled to identify fake videos of people specifically with darker skin tones, found the researchers, and the model that came up with the most balanced predictions across all subgroups was actually the third-place winner.
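The article does not say how “most balanced” was measured. One simple way to think about it, shown here purely as an assumption, is the gap between a model’s best-served and worst-served subgroup. The model names, the `accuracy_gap` helper and every number below are invented for illustration and are not results from the challenge.

```python
from statistics import mean


def accuracy_gap(per_group_accuracy):
    """Gap between the best- and worst-served subgroup (smaller is more balanced).

    Hypothetical balance measure; the article does not specify one.
    """
    values = list(per_group_accuracy.values())
    return max(values) - min(values)


# Made-up per-subgroup accuracies for two hypothetical models.
models = {
    "model_a": {"lighter": 0.98, "darker": 0.82},
    "model_b": {"lighter": 0.90, "darker": 0.88},
}

# Ranking by average accuracy and ranking by balance can disagree.
best_on_average = max(models, key=lambda name: mean(models[name].values()))
most_balanced = min(models, key=lambda name: accuracy_gap(models[name]))

print("best on average:", best_on_average)
print("most balanced:", most_balanced)
```

In this toy case the model with the higher average accuracy is not the one with the smallest gap between subgroups, which mirrors how a leaderboard winner need not be the most even-handed model.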

Although the dataset is already available for the open-source community to use, Facebook acknowledged that “Casual Conversations” comes with limitations. Only the choices of “male”, “female” and “other” were put forward to create gender labels, for example, which fails to represent people who identify as nonbinary.

“Over the next year or so, we’ll explore pathways to expand this data set to be even more inclusive, with representations that include a wider range of gender identities, ages, geographical locations, activities, and other characteristics,” said the company.

Facebook itself has experience of less-than-perfect algorithms, such as when its ad delivery algorithm resulted in women being shown fewer campaigns that were intended to be gender-neutral, for example STEM career ads.

The company said that Casual Conversations will now be available for all of its internal teams, and is “encouraging” staff to use the dataset for evaluation, while the AI team works on expanding the tool to represent more diverse groups of people.

By ZDNet
