
Amazon Found High Volume Of Child Sex Abuse Material In AI Training Data

In 2025, NCMEC saw at least a fifteen-fold increase in these AI-related reports, with "the vast majority" coming from Amazon.

Amazon said the training data was obtained from external sources.
Photo Source: Bloomberg

Amazon.com Inc. reported hundreds of thousands of pieces of content last year that it believed included child sexual abuse, which it found in data gathered to improve its artificial intelligence models. Though Amazon removed the content before training its models, child safety officials said the company has not provided information about its source, potentially hindering law enforcement from finding perpetrators and protecting victims. 

Throughout last year, Amazon detected the material in its AI training data and reported it to the National Center for Missing and Exploited Children, or NCMEC. The organization, which was established by Congress to field tips about child sexual abuse and share them with law enforcement, recently started tracking the number of reports specifically tied to AI products and their development. In 2025, NCMEC saw at least a fifteen-fold increase in these AI-related reports, with “the vast majority” coming from Amazon. The findings haven't been previously reported.

An Amazon spokesperson said the training data was obtained from external sources, and the company doesn't have the details about its origin that could aid investigators. It's common for companies to use data scraped from publicly available sources, such as the open web, to train their AI models. Other large tech companies have also scanned their training data and reported potentially exploitative material to NCMEC. However, the clearinghouse pointed to “glaring differences” between Amazon and its peers. The other companies collectively made just “a handful of reports,” and provided more detail on the origin of the material, a top NCMEC official said.

In an emailed statement, the Amazon spokesperson said that the company is committed to preventing child sexual abuse material across all of its businesses. “We take a deliberately cautious approach to scanning foundation model training data, including data from the public web, to identify and remove known [child sexual abuse material] and protect our customers,” the spokesperson said. 

The spike in Amazon's reports coincides with a fast-moving AI race that has left companies large and small scrambling to acquire and ingest huge volumes of data to improve their models. But that race has also complicated the work of child safety officials — who are struggling to keep up with the changing technology — and challenged regulators tasked with safeguarding AI from abuse. AI safety experts warn that quickly amassing large datasets without proper safeguards comes with grave risks.

Amazon accounted for most of the more than 1 million AI-related reports of child sexual abuse material submitted to NCMEC in 2025, the organization said. It marks a jump from the 67,000 AI-related reports that came from across the tech and media industry a year prior, and just 4,700 in 2023. This category of AI-related reports can include AI-generated photos and videos, or sexually explicit conversations with AI chatbots. It can also include photos of real victims of sexual abuse that were collected, even unintentionally, in an effort to improve AI models. 

Training AI on illegal and exploitative content raises newfound concerns. It could risk shaping a model's underlying behaviors, potentially improving its ability to digitally alter and sexualize photos of real children or create entirely new images of sexualized children that never existed. It also raises the threat of continuing the circulation of the images that models were trained on — re-victimizing children who have suffered abuse. 

The Amazon spokesperson said that, as of January, the company is “not aware of any instances” of its models generating child sexual abuse material. None of its reports submitted to NCMEC were of AI-generated material, the spokesperson added. Instead, the content was flagged by an automatic detection tool that compared it against a database of known child abuse material involving real victims, a process called “hashing.” Approximately 99.97% of the reports resulted from scanning “non-proprietary training data,” the spokesperson said.
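For readers unfamiliar with the term, “hashing” in this context means computing a compact fingerprint of each file and checking it against a list of fingerprints derived from previously identified abuse imagery, such as the hash lists that NCMEC and Thorn distribute to companies. The Python sketch below is a minimal, hypothetical illustration of that lookup, assuming exact SHA-256 fingerprints and a placeholder hash set; it is not Amazon's tooling, which the company has not described, and production systems such as Microsoft's PhotoDNA rely on perceptual hashes that also catch re-encoded or lightly altered copies.

import hashlib
from pathlib import Path

# Placeholder set standing in for a vendor-supplied list of known-material
# fingerprints (the value below is simply the SHA-256 of the string "test").
KNOWN_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def file_fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def scan_training_data(root: Path) -> list[Path]:
    """Flag every file under a training-data directory whose fingerprint matches the known list."""
    return [p for p in root.rglob("*") if p.is_file() and file_fingerprint(p) in KNOWN_HASHES]

if __name__ == "__main__":
    for match in scan_training_data(Path("training_data")):
        # Matched files would be removed from the corpus and reported, e.g. to NCMEC's CyberTipline.
        print(f"Flagged for removal and reporting: {match}")

In a perceptual-hash system, the “over-inclusive threshold” Amazon describes would correspond to accepting matches at a looser similarity cutoff, trading more false positives for fewer missed files.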

Amazon believes it over-reported these cases to NCMEC to avoid accidentally missing something. “We intentionally use an over-inclusive threshold for scanning, which yields a high percentage of false positives,” the spokesperson added.

Amazon has more than 900 data center facilities worldwide.
Photo Credit: Bloomberg

The AI-related reports received last year are just a fraction of the total number submitted to NCMEC. The larger category of reports also includes suspected child sexual abuse material sent in private messages or uploaded to social media feeds and the cloud. In 2024, for example, NCMEC received more than 20 million reports from across industry, with most coming from Meta Platforms Inc. subsidiaries Facebook, Instagram and WhatsApp. Not all reports are ultimately confirmed as containing child sexual abuse material, referred to with the acronym CSAM.

Still, the volume of suspected CSAM that Amazon detected across its AI pipeline in 2025 stunned child safety experts interviewed by Bloomberg News. The hundreds of thousands of reports made to NCMEC marked a drastic surge for the company. In 2024, Amazon and all of its subsidiaries made a total of 64,195 reports.

“This is really an outlier,” said Fallon McNulty, the executive director of NCMEC's CyberTipline, the entity to which US-based social media platforms, cloud providers and other companies are legally required to report suspected CSAM. “Having such a high volume come in throughout the year begs a lot of questions about where the data is coming from, and what safeguards have been put in place.” 

McNulty, speaking in an interview, said she has little visibility into what's driving the surge of sexually exploitative material in Amazon's initial training data sets. Amazon has provided “very little to almost no information” in its reports about where the illicit material originally came from, who had shared it, or whether it remains actively available on the internet, she said.

While Amazon is not required to share this level of detail, the lack of information makes it impossible for NCMEC to track down the material's origin and work to get it removed, McNulty said. It also hampers the law enforcement agencies tasked with searching for sex offenders and children in active danger. “There's nothing then that can be done with those reports,” she said. “Our team has been really clear with [Amazon] that those reports are inactionable.”

When asked why the company didn't disclose information about the possible origin of the material, or other key details, the Amazon spokesperson replied, “because of how this data is sourced, we don't have the data that comprises an actionable report.” The spokesperson did not explain how the third-party data was sourced or why the company did not have sufficient information to create actionable reports. “While our proactive safeguards cannot provide the same detail in NCMEC reports as consumer-facing tools, we stand by our commitment to responsible AI and will continue our work to prevent CSAM,” the spokesperson said.

NCMEC, a nonprofit, receives funding both from the US government and private industry. Amazon is among its funders and holds a corporate seat on its board. 

“There should be more transparency on how companies are gathering and analyzing the data to train their models — and how they're training them,” said David Thiel, the former chief technologist at the Stanford Internet Observatory, who has researched the prevalence of child sexual abuse material in AI training data. 

Such data can be licensed, purchased or scraped from the internet, or could be so-called synthetic data, which is text or images created by other AI tools. As AI companies seek to release new models quickly, “the rapid gathering of data is a much higher priority than doing safety analyses,” Thiel said. He warned that there are “always some errors” when it comes to sifting out CSAM from training data, and believes the industry needs to be more open about where its data is coming from. 

Amazon's Bedrock offering, which gives customers access to various AI models so they can build their own AI products, includes automated detection for known CSAM and rejects and reports positive matches. The company's consumer-facing generative AI products also allow users to report content that escapes its controls. 

The Seattle-based tech giant scans for CSAM across its other businesses, too, including its consumer photo storage service. Amazon's cloud computing division, Amazon Web Services, also removes CSAM when it's discovered on the web services it hosts. McNulty said AWS submitted far fewer reports than came from Amazon's AI efforts. Amazon declined to break out specific reporting data across its various business units, but noted it would share broad data in March.

Only recently have technology companies really begun to scrutinize their AI models and training data for CSAM, said David Rust-Smith, a data scientist at Thorn, a nonprofit organization that provides tools to companies, including Amazon, to detect the exploitative material. 

“There's definitely been a big shift in the last year of people coming to us asking for help cleaning data sets,” said Rust-Smith. He noted that “some of the biggest players” have sought to apply Thorn's detection tools to their training data, but declined to speak about any individual company. Amazon did not use Thorn's technology to scan its training data, the spokesperson confirmed.

Rust-Smith said AI-focused companies are approaching Thorn with a newfound urgency. “People are learning what we already knew, which is, if you hoover up a ton of the internet, you're going to get [child sexual abuse material],” he said. 

Amazon was not the only company to spot and report potential CSAM from its AI workflows last year. Alphabet Inc.'s Google and OpenAI told Bloomberg News that they scan AI training data for exploitative material — a process that has surfaced potential CSAM, which the companies then reported to NCMEC. Meta and Anthropic PBC said they, too, search training data for CSAM. Meta did not comment on whether it had identified the material, but said it would report to NCMEC if it did. Anthropic said it has not reported such material out of its training data. Meta and Google said that they've taken efforts to ensure that reports related to their AI workflows are distinguishable from those generated by other parts of their business.

McNulty said that, with the exception of Amazon, the AI-related reports NCMEC received last year came in “really, really small volumes,” and included key details that allowed the clearinghouse to pass on actionable information to law enforcement.

“Simply flagging that you came across something but not providing any type of actionable detail doesn't help the larger child safety space,” McNulty said.
