Silicon Valley’s biggest artificial intelligence developers have a language problem. Generative AI tools, like ChatGPT, thrive in English and Spanish. But early research shows these same tools are chronically underperforming in “low-resource” languages that are less represented on the internet. Now, one of the biggest suppliers of training data seems to be tackling that problem head-on.
Scale AI, one of Silicon Valley’s most prominent training data companies, is currently hiring for nearly 60 contract writer roles across dozens of languages. Each job listing claims the work is for a project to train “generative artificial intelligence models to become better writers.” The languages include Hausa, Punjabi, Thai, Lithuanian, Persian, Xhosa, Catalan, and Zulu, among many others. Six job postings, under the category “experts,” are looking to hire writers specifically for regional South Asian languages, including Kannada, Gujarati, Urdu, and Telugu.
There are significant pay disparities between the languages, with Western languages commanding as much as 15 times more than those from the Global South. For example, the job posting for German writers pays $21.55 per hour, compared to a posting for an expert in Telugu that offers just $1.43 per hour.
Many of the lower-paid languages are considered “low-resource” — meaning languages that are less commonly available on the internet, which leaves AI models with scarce, and often poor, data. Some of the most-spoken languages in the world, like Urdu and Bengali, still qualify as low-resource because of their meager presence online. Scale AI’s use of human workers to improve “low-resource” language performance is a notable shift, according to Julian Posada, an assistant professor at Yale University, and a member of the law school’s Information Society Project.
“You’ve already scrubbed the entire internet. Now, you have to get data somewhere else,” Posada told Rest of World. “This could speak to the need for not any data randomly that you can get from 4chan, but actually data that is being built by someone with expertise.”
There are a few common explanations for why generative AI systems are so bad at low-resource languages, according to Dylan Hadfield-Mennell, an assistant professor of artificial intelligence and decision-making at the Massachusetts Institute of Technology (MIT).
“One [theory] is that there’s not enough unsupervised data to build good models of, say, the linguistic patterns in Bengali,” Hadfield-Mennell told Rest of World, noting how little a language like this is represented on the internet. There are 270 million native speakers of Bengali — nearly 3% of the world’s population — but it’s used for only 0.013% of all web domains.
One task outlined in Scale AI’s hiring descriptions may be trying to address this problem: writing a short story. Asking data workers to produce creative writing about a given topic in a language like Bengali is a way to build a new body of digitized texts — one that isn’t tethered to existing internet domains.
Using these original stories, which would be mostly free of hate speech and outright owned by developers, could have the added benefit of reducing the need for content moderation down the line, according to Posada. It could also help avoid potentially costly lawsuits, like the one being considered against OpenAI by The New York Times.
While generating new data is one solution, it’s clear other strategies are also at play. Another task in the job postings asks writers to “rank a series of responses that were produced by an AI model.”
To Hadfield-Mennell, that’s a clear-cut example of RLHF, or “reinforcement learning from human feedback.” RLHF is a technique that focuses on refining a model’s outputs, as opposed to solely changing its inputs. This tackles another common theory as to why models are struggling with low-resource languages. “The other possibility is that you’re fundamentally missing the feedback of how to write well in those [low-resource] languages,” he said.
Despite the complex theory behind RLHF, the task itself is relatively simple for contractors. “You’re going to have a model generate a bunch of responses in Bengali and ask [workers] to rank which one is better. Then they’re going to train their system to maximize those predicted rankings,” Hadfield-Mennell said. In other words, Scale AI’s client is possibly using the text produced by its models to try to improve them.
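The ranking step described above can be illustrated with a small sketch. It is not Scale AI's or any client's actual pipeline, which the listings do not describe. It assumes one hypothetical worker ranking of three made-up response labels, and it swaps the neural reward model used in real systems for a single learned score per response, fitted with a Bradley-Terry style pairwise loss, which is a common way to turn rankings into a training signal.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_ids):
    """Turn one ranking (best first) into (preferred, rejected) pairs."""
    return list(combinations(ranked_ids, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: small when the preferred response scores higher."""
    prob = 1.0 / (1.0 + math.exp(-(score_preferred - score_rejected)))
    return -math.log(prob)


def fit_scores(pairs, ids, steps=200, lr=0.5):
    """Fit one scalar score per response by gradient descent on the pairwise loss."""
    scores = {i: 0.0 for i in ids}
    for _ in range(steps):
        grads = {i: 0.0 for i in ids}
        for win, lose in pairs:
            prob = 1.0 / (1.0 + math.exp(-(scores[win] - scores[lose])))
            grads[win] -= 1.0 - prob   # push the preferred score up
            grads[lose] += 1.0 - prob  # push the rejected score down
        for i in ids:
            scores[i] -= lr * grads[i]
    return scores


# Hypothetical ranking from one worker, best response first.
ranking = ["response_b", "response_a", "response_c"]
pairs = ranking_to_pairs(ranking)
scores = fit_scores(pairs, ranking)
```

In a full RLHF setup, scores like these would come from a model that reads the response text, and the language model would then be tuned to produce responses that score highly.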
The work still requires real language expertise. A Scale AI contract listing posted in May asked for writers in Hindi and Japanese, and required applicants to have either a master’s degree or a PhD. The only exception for years of graduate schooling was previous experience as a professional poet, journalist, or book publisher in that language. The newer hiring spree has less rigid requirements, but still requests at least enrollment in a humanities undergraduate degree.
A recent Washington Post report found that Remotasks, Scale AI’s labor contracting subsidiary, has regularly withheld or delayed payments to workers in the Philippines, raising doubts about the broader working conditions at the company. A report published in July by the gig labor research group Fairwork gave Remotasks a 1 out of 10 score, saying the platform failed to meet minimum standards for fair pay and fair contracts.
Reached for comment, Scale AI declined to address the language job listings, citing customer confidentiality, but defended the company’s broader pay rates. “We partner with the Global Living Wage Coalition, and our economists conduct quarterly pay analyses that take a number of factors into consideration, including local costs of rent, healthcare, and transportation, in order to ensure fair and competitive compensation,” a spokesperson told Rest of World.
The result is lower rates for workers in regions with a lower living wage, even if they’re providing samples of a less accessible language. A writer of the Marathi language is offered at most $1.67 per hour, while a writer of Finnish is guaranteed nearly 14 times more than that. In an even more bizarre case, Portuguese writers from Portugal were offered up to $8.20 an hour, while Portuguese writers from Brazil could make only $3.97 per hour. Besides wages and country of origin, the description of the two jobs is identical.
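The gaps between listings come down to simple division. The sketch below uses only the hourly rates quoted in this article; the dictionary and function names are illustrative, not anything taken from the job postings.

```python
# Hourly rates in USD as quoted in the job postings cited in this article.
hourly_usd = {
    "German": 21.55,
    "Telugu": 1.43,
    "Portuguese (Portugal)": 8.20,
    "Portuguese (Brazil)": 3.97,
}


def pay_ratio(higher, lower):
    """How many times the first listing's rate is of the second's."""
    return hourly_usd[higher] / hourly_usd[lower]


german_vs_telugu = pay_ratio("German", "Telugu")
portugal_vs_brazil = pay_ratio("Portuguese (Portugal)", "Portuguese (Brazil)")

print(f"German vs. Telugu: {german_vs_telugu:.1f}x")
print(f"Portugal vs. Brazil: {portugal_vs_brazil:.1f}x")
```

By this arithmetic, the German listing pays roughly 15 times the Telugu rate, and the Portugal listing pays roughly twice the Brazil rate for an otherwise identical job description.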
“In a perfect world, it would be completely the opposite. You would have low-resource languages being paid more,” Milagros Miceli, a research fellow at the Distributed Artificial Intelligence Research Institute (DAIR) studying labor conditions in data work, told Rest of World. Despite being “rarer” in AI development, low-resource language experts are being offered as little as one-fifteenth the pay of some of their European-language counterparts.
“There is a correlation between languages that are only spoken in places that are historically disadvantaged, and the wages you can pay to people in those places,” Miceli said.
Chatbots and generative AI tools are not the only technologies that struggle to bridge the “low-resource” language training data divide. Machine translation products, like Google Translate, still struggle in less-dominant languages — whether that be the Afghan languages Pashto and Dari, or the Ethiopian language Amharic. Even Meta’s AI moderation tools regularly fall short when trying to identify hate speech in low-resource languages.
Hadfield-Mennell said the advertised jobs are a sign that one of Silicon Valley’s biggest developers is aware of gaps in low-resource languages and, at the very least, is throwing money at the problem.
“It’s either a strategy to improve performance in a variety of languages or a strategy to market themselves as having improved it,” he said. “It’s probably a bit of both.”