On May 28, 2026, Amnesty International published a report with a title that does not hedge: “Unlawful by Design: Exposing the Human Rights Costs of Generative AI.”

The conclusion is equally direct. Amnesty finds that the major generative AI companies — OpenAI, Google, Meta, and DeepSeek — built their systems through mass invasions of privacy by design. Not as a side effect, not as an unfortunate consequence of technically legal practices, but as the foundational method of construction. The data pipelines feeding these systems were assembled by scraping billions of public web posts — images, text, social media content, personal disclosures — without the explicit consent of the people who created them, in ways that violate international human rights privacy standards.

The report calls on governments to prohibit these data collection systems. Not to regulate them. Not to require disclosure about them. To prohibit them.

This is the most direct challenge to the legal and ethical foundation of the generative AI industry that any major international human rights organisation has published. It arrives at a moment when the legal framework for AI training data is genuinely contested, with Canada’s PIPEDA investigation, Italian and French regulatory actions, and a growing body of class-action litigation all circling the same core question: was scraping the internet legal?

What Amnesty Documents

The report covers three dimensions of harm: privacy violations in data collection, discriminatory outputs from biased training data, and threats to freedom of expression and thought.

On data collection, Amnesty documents that generative AI training datasets are “largely pulled from the web and therefore polluted with real-world biases” — built from the full range of public internet content including personal posts, images shared in social contexts, discussions in health forums, and content created by minors. The companies that built these systems did not obtain explicit consent. They relied on various legal theories — implied consent from public posting, transformative use arguments from copyright doctrine, and in some jurisdictions simple silence in the law — to proceed.

Amnesty’s position is that none of those theories survive scrutiny under the international human rights framework for privacy, which requires that data processing be lawful, necessary, and proportionate. Scraping the entire public internet to train commercial AI systems fails the necessity and proportionality tests regardless of the technical accessibility of the data.

On discriminatory outputs, Amnesty finds that training on web-sourced data means training on human bias at scale. Gender stereotypes, racial hierarchies, and harmful cultural assumptions embedded in internet content are not filtered out — they are learned and reproduced, at the scale of systems serving hundreds of millions of users. The report characterises this as a structural harm that flows directly from the collection method, not a correctable edge case.

On freedom of expression and thought, Amnesty argues that systems “capable of influencing users’ thoughts and shaping their personal beliefs” that were built on non-consensual data extraction represent a compounded violation — the privacy harm in collection enables a cognitive influence harm in deployment.

The Environmental Accounting

The report also includes an environmental dimension, counting the footprint of AI data centres among the harms it asks to be disclosed alongside the privacy violations.

Google’s greenhouse gas emissions have increased 48% since 2019, a rise attributed directly to data centre operations and supply chain demands from AI workloads and documented in Google’s own 2024 Sustainability Report. Microsoft’s emissions increased 29% between 2020 and 2024 for the same reasons. These are not small numbers. They represent the physical infrastructure required to store and process the data pipelines that Amnesty characterises as unlawful.

The environmental accounting is not central to the privacy argument, but it matters for the political economy of the debate. The generative AI industry has presented its products primarily through the lens of capability and productivity benefit. Amnesty’s framing requires that the full cost structure — privacy violations in construction, discriminatory outputs in deployment, and environmental costs in operation — be visible in the same frame as the claimed benefits.

The Global Regulatory Response

The report sets these privacy violations against a global regulatory response that is gathering momentum but has not yet produced enforceable solutions at scale.

Brazil enacted child privacy legislation that implicates AI training practices. Vietnam implemented a comprehensive AI law in March 2026. Canada’s PIPEDA investigation found that OpenAI violated privacy law in training GPT-3.5 and GPT-4 — a finding published three weeks before Amnesty’s report. The EU AI Act’s provisions on training data transparency are entering their implementation phase.

None of this constitutes a prohibition on AI training data scraping. The regulatory responses to date have been declaratory, conditional, or limited in geographic scope. Amnesty is explicitly calling for something more: a prohibition on the systems themselves where they are built on unlawful collection.

What Makes This Argument Significant

The generative AI industry’s legal defence of its training data practices rests on a chain of arguments that has been remarkably resilient: public data is public, implied consent exists for information voluntarily shared online, training is transformative and does not reproduce the original data, and in any case the benefits justify the collection model.

Amnesty attacks the chain at its foundation. The human rights framework for privacy does not rest on whether data is “public.” It rests on whether processing is lawful, necessary, and proportionate: criteria applied to what is done with data, not just to whether it was technically accessible. Under that framework, AI training data collection, given its size, scope, and commercial purpose, fails the proportionality test regardless of the technical openness of the source material.

This is also the framework that underpins the GDPR, which has been the most consequential privacy regulation globally. AI companies have argued that each of several GDPR lawful bases (consent, legitimate interest, contract, legal obligation) applies to their training practices. European data protection authorities have been sceptical but have not issued final binding determinations that resolve the question at scale.

Amnesty’s report is not a legal ruling. It cannot compel anything. But it reframes the debate in terms that carry political and reputational weight a regulatory opinion lacks: this is a human rights issue, not merely a compliance question. The companies involved (OpenAI, Google, Meta, DeepSeek) are named explicitly. The violations are characterised as structural rather than incidental.

The call for prohibition is the sharpest edge of the report and the one that will generate the most resistance. Prohibiting the data collection systems used to train large-scale generative AI would mean prohibiting large-scale generative AI as currently built. That is precisely what Amnesty says is required.


Amnesty International’s full report “Unlawful by Design” is available at amnesty.org (published May 28, 2026). The JURIST summary and Amnesty press release of the same date provide accessible summaries of the findings.