The technical name for the use of such ratings to improve AI models is reinforcement learning from human feedback, or RLHF. OpenAI, Google, Anthropic, and other companies all use the technique. After a chatbot has processed massive amounts of text, human feedback helps fine-tune it. ChatGPT is impressive because using it feels like chatting with a human, but that pastiche does not naturally arise through ingesting data from something like the entire internet, an amalgam of recipes and patents and blogs and novels. Although AI programs are set up to be effective at pattern detection, they “don’t have any sense of contextual understanding, no ability to parse whether AI-generated text looks more or less like what a human would have written,” Sarah Myers West, the managing director of the AI Now Institute, an independent research organization, told me. Only an actual person can make that call.
—
AI experts I spoke with outside these companies took a different stance. Targeted human feedback has been “the single most impactful change that made [current] AI models as good as they are,” allowing the leap from GPT-2’s half-baked emails to GPT-4’s convincing essays, Luccioni said. She and others argue that tech companies intentionally downplay the importance of human feedback. Such obfuscation “socks away some of the most unseemly elements of these technologies,” such as hateful content and misinformation that humans have to identify, Myers West told me, not to mention the conditions the people work under. Even setting aside those elements, describing the extent of human intervention would risk dispelling the magical and marketable illusion of intelligent machines: a “Wizard of Oz effect,” Luccioni said.
—
AI raters might be understood as an extension of that cloud, treated not as laborers with human needs so much as productive units, carbon transistors on a series of fleshly microchips—objects, not people. Yet even microchips take up space; they require not just electricity but also ventilation to keep from overheating. The Appen raters’ termination and reinstatement is part of “a more generalized pattern within the tech industry of engaging in very swift retaliation against workers” when they organize for better pay or against ethical concerns about the products they work on, Myers West, of the AI Now Institute, told me.