Canada Found OpenAI Broke Privacy Law Training GPT-4. The Fix Doesn't Apply to the Models That Broke It.

Canada’s Office of the Privacy Commissioner published its findings in PIPEDA Investigation #2026-002 on May 6, 2026, and the conclusions are direct: OpenAI collected personal data to train GPT-3.5 and GPT-4 in ways that violated Canadian federal privacy law. The collection was “overbroad and therefore inappropriate.” The data included medical records, individuals’ opinions on sensitive topics, and information relating to children. The people whose data was scraped would not reasonably have expected it to be used this way, which negates the implied consent OpenAI relied on.

The investigation’s resolution is more complicated than its findings. OpenAI agreed to implement privacy-protective tools in future model versions. The models found to have violated privacy law (GPT-3.5 and GPT-4, both still widely deployed) are not subject to any corrective technical requirement. The finding was marked “well-founded and conditionally resolved,” which is as close as PIPEDA investigations typically get to a verdict. The only consequence is a voluntary commitment, with no binding enforcement mechanism attached to the specific models at issue.

What the Investigation Actually Found

The investigation covered OpenAI’s practices during the pre-training phase of GPT-3.5 and GPT-4 — the stage at which the models absorb massive text corpora to develop their language capabilities. OpenAI’s pre-training data consisted of scraped web content and licensed datasets containing “trillions of words.” Fine-tuning used human trainer conversations and user interactions.

The commissioners reached three core conclusions.

First, the data collection was broader than any disclosed purpose would justify. OpenAI claimed implied consent on the ground that the data was publicly accessible, a standard argument from AI companies that regulators globally are increasingly rejecting. The commissioners found that the mere fact that data is technically accessible does not mean the individuals who created it consented to its use in AI model training. The gap between what people understood they were sharing and what OpenAI used it for was too wide for implied consent to bridge.

Second, the nature of the data made the implied consent argument especially weak. Web-scraped training corpora routinely contain medical forum posts, therapy discussion boards, support group archives, and children’s content. The commissioners noted that collecting “significant amounts of personal information of varying levels of sensitivity” — including medical information, opinions on controversial topics, and children’s data — requires a higher standard of consent than OpenAI obtained.

Third, individuals had no meaningful mechanism to know their data was being collected for this purpose, let alone to object to it. OpenAI had not established adequate transparency about the scope of its training data sources.

The Structural Problem With the Resolution

Privacy investigations that end in voluntary commitments face an inherent limitation: the resolution is forward-looking while the harm is backward-looking.

GPT-3.5 and GPT-4 were trained on data collected in violation of PIPEDA. Those models are deployed in products used by millions of Canadians. The training cannot be undone. Retraining from scratch on privacy-compliant data would cost hundreds of millions of dollars and produce a commercially different product. OpenAI committed to implementing privacy-protective tools — specifically, systems that detect and mask identifying information in training data — in future model versions.

This means the privacy violation that the investigation confirmed is baked into existing models that will continue to be used. The people whose medical forum posts or children’s records were scraped have no remedy. The models incorporating that data remain in the market. The legal finding is that a violation occurred, but the practical consequence is a commitment to do better next time.
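To make the forward-looking commitment concrete, here is a minimal sketch of what "detect and mask" can mean at its simplest. It is an illustration only, not OpenAI's tooling: the two patterns, the labels, and the `mask_identifiers` helper are assumptions invented for this example, and production systems rely on much broader detection than regular expressions.

```python
import re

# Hypothetical patterns for illustration only; not drawn from the investigation.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def mask_identifiers(text: str) -> str:
    """Replace each matched identifier with a bracketed type label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Contact me at jane.doe@example.com or 613-555-0199 about my diagnosis."
masked = mask_identifiers(sample)
print(masked)
```

In this sketch the email address and phone number are replaced, but the reference to a diagnosis passes through untouched. Pattern-based masking of this kind targets direct identifiers; it does not by itself address the sensitivity of the surrounding content, which was central to the commissioners' findings.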

The commissioners were constrained by PIPEDA’s architecture. Unlike GDPR, which allows fines of up to €20 million or 4% of global annual turnover, whichever is higher, PIPEDA investigations cannot result in monetary penalties. The Commissioner can find violations and recommend corrective measures, but cannot compel them. Enforcement requires the Federal Court.

Canada Is Not Alone in This Finding

The PIPEDA finding arrives within a broader international pattern of privacy regulators challenging AI training data collection. Italy’s data protection authority temporarily blocked ChatGPT in 2023 over similar concerns. The French CNIL has investigated multiple AI companies on training data grounds. The UK’s ICO has issued guidance asserting that web scraping for AI training triggers data protection obligations.

What distinguishes the Canadian finding is its specificity: rather than general guidance, it is a formal investigation conclusion that a specific company violated the law in training specific models. That specificity makes it more useful as a precedent and more meaningful as a reference point for other regulators.

The finding was published about three weeks before Amnesty International released its “Unlawful by Design” report on May 28, 2026, which goes further, characterising major AI companies’ data collection practices as structural violations of international human rights law. The Canadian investigation is narrower and more legalistic, but points in the same direction.

What Should Actually Change

The gap between the findings and the resolution illustrates what comprehensive AI privacy enforcement would require but currently lacks.

Meaningful enforcement would require some form of training data accountability — an obligation to document what data was used, from what sources, and under what legal basis. It would require a mechanism for individuals to understand whether their data contributed to a model’s training, and a process for addressing that if they object. And it would require remedies that can reach existing models, not just prospective commitments for models not yet built.

None of that exists in Canadian law as written. PIPEDA was not designed with foundation model training in mind. The commissioners made their findings within the statute they have. Closing the gap between what the statute allows and what the technology requires is a legislative problem that no investigation can solve.

The immediate practical consequence of the Canadian finding is limited — OpenAI faces no fine and no requirement to modify its deployed products. But as an articulation of what privacy law requires when AI companies harvest the internet, it is one of the clearest formal statements any regulatory body has produced.

PIPEDA Investigation #2026-002 is available at priv.gc.ca. The finding was published May 6, 2026.

