

People Are Working in 'Digital Sweatshops' to Train AI, But New EU Regulations Could Push Transparency

While AI brings new tools in the fight against modern slavery, it is built on exploitation that reinforces global inequality. The European Union's Corporate Sustainability Due Diligence Directive (CSDDD) could soon push companies to be more transparent about the people and systems behind their tech.
By Sarah Walkley

Kauna Malgwi, a member of the African Content Moderators Union, shared her experience tagging content for Facebook at the Digital Africa Rising conference in Kenya last fall, detailing the profound challenges of isolation, exploitation, and mental health struggles that come with the job. (Image: PCC/Flickr)

Artificial intelligence (AI) promises huge benefits. It is already helping to improve the diagnosis and treatment of cancers and other medical conditions. It is making education more accessible and is fundamental to self-driving cars. For humanitarian groups, it brings new tools to the fight against modern slavery and the protection of human rights.

But for all the benefits, there is a darker side to AI. Advanced computing can help us model weather patterns and better respond to climate risks, but the models are energy-hungry and come with climate impacts of their own. Similarly, while AI could help combat slavery, it is built on exploitation that is reinforcing global inequality.

This digital exploitation is largely overlooked in the debate about the social implications of advanced technologies. People often focus on the dangers of using AI — including job losses from automation and the potential for bias in decision making — rather than the negative impacts associated with its creation.

With the introduction of the European Union's Corporate Sustainability Due Diligence Directive (CSDDD), which requires greater vigilance across the supply chain, companies may soon be pushed to look upstream as well as downstream in their efforts to deploy AI responsibly.

Trapped in the loop: Millions are working in digital sweatshops to train AI for big tech companies 

None of us is born knowing the difference between a cat and a dog. We learn to tell them apart when those around us point out cats and dogs. However sophisticated an AI algorithm, it also needs teaching. People need to collect text and images, label the content — indicating this is a dog and that is a cat, and so on — and feed it into the AI model so the technology can identify similar shapes and patterns in unlabeled images. We then check that the model identifies cats and dogs correctly, improving its accuracy.

Refining AI models in this way is an iterative process, with people, known as humans-in-the-loop, at its core, and it is a costly one. AI models need access to a library of at least 150 to 250 sample images to begin to reliably identify a single species or object. Some studies suggest it takes 1,000 or more images per object. The cost soon mounts up, particularly when training a general-purpose model such as ChatGPT.
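To make that loop concrete, here is a deliberately simplified sketch in Python. The animal "features," the tiny dataset and the nearest-centroid classifier are all invented for illustration; real systems rely on far larger labeled datasets and neural networks, but the basic cycle of human labeling, training, prediction and human review is the same.

```python
from collections import defaultdict

def train(labeled_examples):
    """Compute a centroid (average feature vector) for each label."""
    sums, counts = defaultdict(lambda: [0.0, 0.0]), defaultdict(int)
    for (f1, f2), label in labeled_examples:
        sums[label][0] += f1
        sums[label][1] += f2
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label])
            for label, s in sums.items()}

def predict(model, features):
    """Assign the label whose centroid is closest to the features."""
    return min(model, key=lambda label: (model[label][0] - features[0]) ** 2
                                        + (model[label][1] - features[1]) ** 2)

# 1. Human labelers annotate raw examples (invented features: ear pointiness, snout length).
labeled = [((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"),
           ((0.3, 0.9), "dog"), ((0.2, 0.8), "dog")]
model = train(labeled)

# 2. The model guesses labels for new, unlabeled examples.
unlabeled = [(0.85, 0.25), (0.25, 0.85)]
guesses = [(x, predict(model, x)) for x in unlabeled]

# 3. Humans review the guesses, correct mistakes, and the reviewed examples
#    are fed back in for retraining: the iterative loop described above.
reviewed = [(x, guess) for x, guess in guesses]  # here, reviewers confirm both guesses
model = train(labeled + reviewed)
print(guesses)
```

Each pass through steps 1 to 3 requires human time, which is why the number of sample images needed per object translates directly into labeling cost.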

Major technology companies including Meta, Google and ChatGPT developer OpenAI, as well as large-scale AI users such as the U.S. military, use business process outsourcing companies and data-labeling platforms to tag content. These companies employ people as data labelers to train AI models by annotating text and images. They are typically based in markets with lower wages than Europe and the U.S. — including India, Kenya, the Philippines and Venezuela — and often weaker standards of employee protection.

Millions of people now work in digital sweatshops or are signed up to data-labeling platforms for piece work, meaning they receive a small sum for each piece of text or image they tag. Dubbed "ghost work" by anthropologist Mary Gray and computational social scientist Siddharth Suri, this invisible industry is projected to reach $13.7 billion by 2030.

These workers are well educated: many hold university degrees, and almost half have studied science, technology, engineering and math subjects to an advanced level. They may aspire to careers in data science and other skilled fields but, against a backdrop of economic decline (Venezuela) or high youth unemployment (Kenya), opportunities are limited.

Poor wages and conditions are commonplace for the data labelers behind the AI boom

Although the AI market is projected to top $1.3 trillion by the end of the decade, most data labelers are paid just $1 to $2 per hour for an 8-hour day. Individuals doing piece work may only receive a few cents for each task and face intense competition to secure projects, keeping wages down.

Workers are constantly monitored for the pace and accuracy of their work. There is pressure to work quickly and without breaks, but no bonus for completing tasks ahead of the deadline. Some workers report having to complete tasks in three months that were scheduled to take twice as long. The outsourcing company charges clients by the project and is paid the same amount whether the task is completed in three months or six, but data labelers are only paid for the hours they work. Others report not being paid at all, because their contract is terminated just before the end of the project over a minor error, such as failing to record their time under the right charge code, which is treated as a "policy breach."

Scanning through vast amounts of text and images is draining whatever the content. But many workers have to sift through disturbing and graphic material. To keep users safe, AI models must be able to recognize and exclude content of a sexual, racist, harmful or inflammatory nature. Data labelers are tasked with reviewing text and images describing terrorist acts, child exploitation, extreme violence and more, to train the models on what is unacceptable.

While outsourcing companies provide counseling for workers, the degree of support varies by organization. Many data labelers working for unethical outsourcing companies say they have no access to support at all. Where support does exist, it may be delivered by an untrained colleague or scheduled at times workers cannot attend without missing demanding deadlines.

Workers in Kenya formed their own union, the African Content Moderators Union, to lobby for better conditions. But the impact was short-lived, as Meta and other major technology companies shifted their projects away from outsourcing providers using unionized workers. The outsourcing companies then dismissed the unionized workers in a bid to win the work back.

Pressure for reform

Pressure is growing for action to address the exploitation of workers in the development of AI, starting with minimum standards for working conditions. While regulators in both the United States and European Union have addressed workers’ rights in relation to AI, they have focused on how AI is used — for example, to support recruitment and staff development. As awareness of the plight of data labelers grows, regulatory attention may begin to shift.

Frameworks such as those of the International Labor Organization (ILO) outline minimum acceptable standards for workers. While they don’t explicitly mention AI or digital technologies, the ILO is spearheading debate on practices within the automated economy.

Current trade laws, including the new EU Forced Labor Regulation, ban the import of goods made with forced labor. But a ban is much harder to enforce when products enter the country as a data transfer or download, rather than in a shipping container.

Due diligence obligations

Nearly three-quarters of businesses are experimenting with generative artificial intelligence, according to research from McKinsey. Eager to protect users, they have put in place responsible use policies and clear guidance for staff on developing AI-powered products and applications. They may also have calculated their AI carbon footprint and developed a plan to switch to data centers run on renewable energy to reduce environmental impact. But there is often less consideration of the upstream impacts.

So, how should companies begin to address this?

Assess risks. It’s crucial to understand more about the technology you use and how it is created. For large companies operating in the EU, the Corporate Sustainability Due Diligence Directive introduces a requirement to identify, prevent, reduce, and end negative human rights and environmental impacts across the value chain. Companies would do well to apply the same level of scrutiny to digital inputs as to physical raw materials.

Disclose impacts. The United Nations has also called for companies to increase disclosure of labor conditions in the AI supply chain to foster cross-industry collaboration, prevent exploitation and support the achievement of Sustainable Development Goal 8, which calls for decent work and economic growth for all.

Benchmark providers. Stanford University developed the Foundation Model Transparency Index to encourage greater transparency on AI’s risks and impacts. It tracks disclosures from 10 of the largest technology providers on issues from usage policy to workers’ rights. It is a useful benchmark to assess risks and compare models as part of corporate materiality assessments, and to select an alternative provider if necessary.

Support self-organization. Data labelers face a precarious existence, with little or no job security. But attempts to self-organize have so far been thwarted. Companies should apply the same rules to technology outsourcing partners as they do to other suppliers, tracking and favoring those that respect union representation, freedom of association and collective bargaining.

As the use of AI grows, companies need greater insight into the impacts and risks of the technology they use — including how that technology is developed. Due diligence is fundamental to gaining that visibility and planning the appropriate strategic response.


Sarah Walkley is the senior sustainability writer at Context Europe. She has over 25 years of professional writing and editing experience across multiple formats including infographics, blogs and whitepapers. Sarah has written for a broad range of audiences from business and government to academics and consumers.

She is a former member of the Executive Leadership team at Autovista Group, where she headed up development of the group’s sustainability strategy. She also held executive positions in customer research and product development. Sarah is an expert on reporting frameworks and requirements, including CSRD. You can read her recent overview of the 2023 sustainability landscape, covering CSRD and more, here.
