MIT Technology Review: This Is Where the #Data to Build #AI Comes From
New findings show how the sources of data are concentrating power in the hands of the most powerful tech companies.
“AI is all about data. Reams and reams of data are needed to train algorithms to do what we want, and what goes into the AI models determines what comes out. But here’s the problem: AI developers and researchers don’t really know much about the sources of the data they are using. AI’s data collection practices are immature compared with the sophistication of AI model development. Massive data sets often lack clear information about what is in them and where it came from.
The Data Provenance Initiative, a group of over 50 researchers from both academia and industry, wanted to fix that. They wanted to know, very simply: Where does the data to build AI come from? They audited nearly 4,000 public data sets spanning over 600 languages, 67 countries, and three decades. The data came from 800 unique sources and nearly 700 organizations.
Their findings, shared exclusively with MIT Technology Review, show a worrying trend: AI’s data practices risk concentrating power overwhelmingly in the hands of a few dominant technology companies. […]
The Western focus of these data sets becomes particularly clear with multimodal models. When an AI model is prompted for the sights and sounds of a wedding, for example, it might only be able to represent Western weddings, because that’s all that it has been trained on, Hooker says.
This reinforces biases and could lead to AI models that push a certain US-centric worldview, erasing other languages and cultures.”
By Melissa Heikkilä, Stephanie Arnett
https://lnkd.in/eGKWB3WT
#algorithms #AI #ArtificialIntelligence #LLMs #regulations #intellectualproperty #art #artists #creators #justice #equality #bias #health #socialmedia #media #productivity #labor #bigtech #startups #technology #datascience #privacy #security #journalism #democracy #humanity
Absolutely spot on! Benoit B.