Joseph Stiglitz Questions Whether AI’s Data Scrape Could Ultimately Undermine Quality Information


The proliferation of artificial intelligence models, while promising, carries a significant risk to the integrity of global information systems, according to Nobel laureate Joseph Stiglitz. His concern centers on the fundamental mechanics of how large language models (LLMs) are trained, particularly their tendency to indiscriminately scrape vast quantities of online content. This process, he suggests, could lead to a deterioration of the very data these systems rely upon, creating a feedback loop where low-quality information is amplified and mistaken for truth.

Stiglitz describes this phenomenon as an “information externality,” a concept he distills into the familiar adage: garbage in, garbage out. The economist points out that AI systems, while adept at processing data, often lack the capacity to discern credible knowledge from mere noise. When these models are fed a diet of online chatter, unverified claims, and even extremist forum discussions, their outputs inevitably reflect that underlying distortion. The polished appearance of AI-generated content, he cautions, may lead users to mistake it for reliable, well-vetted information rather than a sophisticated repackaging of flawed inputs. This dynamic threatens not only labor markets but also the foundational accuracy of everything from financial predictions to political discourse.

A significant part of the problem, in Stiglitz’s view, stems from AI’s relationship with traditional content creators. He argues that AI essentially “steals information from legacy media” by scraping published articles and research without adequately compensating the originators. This practice, he contends, undermines the economic incentives for producing high-quality journalism and in-depth research. If the institutions responsible for generating credible information are defunded, the overall quality of available data diminishes. Consequently, the training data for AI models increasingly skews towards the cheapest and most abundant forms of online content, such as comment threads, partisan memes, and hastily produced material, further compromising the integrity of AI outputs.

The economist highlights a particular vulnerability in areas where public discourse is heavily influenced by passionate, often misinformed, voices. He uses the example of vaccine information, noting that anti-vaccine sentiment tends to be far more prevalent and actively disseminated across online platforms than scientifically rigorous studies. While scientists publish a limited number of dense, peer-reviewed papers, conspiracy theorists flood forums and social media with countless posts. AI models, trained on frequency and engagement, might inadvertently amplify these louder, albeit less accurate, voices simply because they are more abundant in the training data. This mechanism could inadvertently push a vocal minority’s perspective over the carefully established consensus of experts.

Stiglitz draws a parallel to the Grossman-Stiglitz paradox, which posits that if market prices fully reflect all available information, no one has an incentive to expend resources gathering that information. He and a graduate student, Max Ventura, extended this concept to AI, suggesting a similar attenuation of incentives for producing high-quality information. When AI companies are not compelled to pay for the data they scrape, the creators of original content lose the returns necessary to fund their work. This leads to a scenario where prediction markets and trading algorithms, relying on these compromised AI outputs, become further detached from any underlying investment in truth, potentially creating a less stable and less informed economic environment.

Despite these significant concerns, Stiglitz does not advocate for abandoning AI. He himself uses it as a research tool and encourages his students to do the same, emphasizing that it should aid, not replace, critical thinking and analysis. He views AI outputs as prompts for further investigation rather than definitive answers. However, he stresses the necessity of governmental intervention. Without regulatory measures, he warns, there is a substantial risk that the information ecosystem will worsen across numerous critical domains, leaving society with a more polished but ultimately more corrupted understanding of reality.

Josh Weiner