83M cases of child exploitation reported from 2020 to 2022
The surge in Child Sexual Abuse Material (CSAM) presents a serious challenge, with over 83 million cases reported globally between 2020 and 2022. This alarming trend is a stark reminder of online child exploitation, and it also raises critical questions about protecting user privacy. Can CSAM be fought effectively without undermining that privacy?
The European Commission has proposed tightening the law with a controversial CSAM-scanning plan¹, but speakers from the European Data Protection Supervisor's office are pushing back. After all, if adopted, the plan could become a tipping point toward large-scale surveillance of the public.
Web-scraping experts from Oxylabs say there’s an AI-powered way to identify CSAM content. Their experiment² and our further extrapolation of its data suggest there may be hundreds, or even thousands, of unreported websites hosting CSAM. If such estimates are possible, countries could invest more in detecting CSAM in this privacy-preserving way.
Explore Surfshark Research Hub’s findings on estimated unreported CSAM websites and overall CSAM trends based on US companies’ reports to the National Center for Missing & Exploited Children (NCMEC). The data reveals which countries appear most affected by CSAM and might remain so unless better detection technologies are adopted.
New AI-powered web scraping tool helps identify CSAM content
In the last few years, artificial intelligence has become significantly better at recognizing patterns in vast amounts of visual data. Web-scraping experts from Oxylabs decided to apply the technology in their pro bono initiative, Project 4β³, in cooperation with the Communications Regulatory Authority of Lithuania (RRT⁴).
Oxylabs undertook the project to find a more effective, proactive approach to tackling CSAM online than the manual, reactive one authorities have typically relied on in recent decades. “Voluntary reports through the hotline is the most common way worldwide to collect complaints,” explains Vaidotas Ramonas, Director of the Digital Services Department at RRT⁴.
Using AI-driven image recognition combined with human supervision, Oxylabs' tool can identify illegal content effectively, even when an image has been modified to evade traditional detection systems. How? When the tool finds images that match specific criteria, it first compares their fingerprints (metadata) against the fingerprints of images in the police database. It then analyzes the images with a machine-learning model trained to detect pornographic material. This two-step approach is why images with modified metadata cannot slip through undetected.
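The sketch below illustrates this two-stage idea in Python. It is not Oxylabs' actual implementation, which is not public; the fingerprinting scheme, database format, function names, and threshold are all assumptions for illustration only.

```python
# Minimal sketch of a two-stage detection pipeline: fingerprint matching
# against a database of known illegal images, then an ML classifier fallback.
# All names and values here are illustrative assumptions.
import hashlib
from pathlib import Path

# Stage 1: fingerprints of known illegal images, e.g., from a police database.
# A production system would likely use perceptual hashes, which survive
# re-encoding and small edits, rather than exact SHA-256 digests.
KNOWN_FINGERPRINTS: set[str] = set()


def fingerprint(image_bytes: bytes) -> str:
    """Compute a fingerprint of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def classifier_score(image_bytes: bytes) -> float:
    """Stage 2 placeholder: a trained image classifier would return the
    probability that the image contains pornographic material."""
    return 0.0  # stand-in value; plug in a real model here


def is_suspicious(image_path: Path, threshold: float = 0.9) -> bool:
    data = image_path.read_bytes()
    # Known image: its fingerprint matches the database, so flag it directly.
    if fingerprint(data) in KNOWN_FINGERPRINTS:
        return True
    # Unknown or altered image: rely on the classifier, which also catches
    # pictures whose metadata was modified to evade fingerprint matching.
    return classifier_score(data) >= threshold
```

In practice, anything flagged this way would still go to a human reviewer before any report is filed, in line with the human supervision Oxylabs describes.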
Over two months, the project scanned nearly 300,000 Lithuanian websites. As a result, the tool flagged 19 local websites as violating national or EU law, 8 police reports were filed, and 2 pre-trial investigations were opened.
In conclusion, the sandbox test was successful, and the approach could readily be applied in other countries. The RRT staff plans to share its experience with the new AI tool with partner institutions across various countries. The experiment produced concrete, clear results, suggesting the tool could advance global efforts to tackle CSAM.
The state of CSAM in Europe
Now let’s explore how Oxylabs’ project data led to our next research insight: an estimated 1,720 European websites hosting harmful CSAM content may be online and unreported.
US law requires electronic service providers (ESPs), such as Google and Meta, to report Child Sexual Abuse Material (CSAM) cases to the National Center for Missing & Exploited Children (NCMEC). Reporting is done through the NCMEC’s CyberTipline. Since many American ESPs have users worldwide, the CyberTipline serves as a central hub for collecting global CSAM cases.
We decided to extrapolate the numbers for all EU countries. To estimate the number of websites containing CSAM in EU countries, we first took the number of Lithuanian websites with CSAM found by Oxylabs. We divided it by the number of CSAM reports Lithuania submitted to the NCMEC. Then, we multiplied this ratio by the number of CSAM reports to the NCMEC from other EU countries. This calculation is based on the assumption that the situation in Lithuania is similar to that in the rest of the EU (for more information on our methodology, check the detailed description at the end of the article).
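For readers who prefer the calculation spelled out, here is a minimal sketch of that extrapolation in Python. The 19 websites come from the Oxylabs experiment above; the report counts in the example are placeholders, not the actual NCMEC figures used in the study.

```python
# Sketch of the extrapolation, assuming the EU-wide websites-per-report ratio
# matches Lithuania's. Example inputs below are placeholders, not study data.
def estimate_csam_websites(country_reports: int,
                           lithuania_reports: int,
                           lithuania_websites: int = 19) -> float:
    """Scale Lithuania's websites-per-report ratio to another country's
    NCMEC report count."""
    ratio = lithuania_websites / lithuania_reports
    return ratio * country_reports


# Example with made-up report counts:
# estimate_csam_websites(country_reports=120_000, lithuania_reports=9_000)
# returns roughly 253 websites
```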
Based on these calculations, Poland may account for around 16% of EU cases (269 unreported local harmful websites), followed by France with 260 websites, Germany with 158, Hungary with 152, and Italy with 110.
CSAM in the rest of the world
From 2020 to 2022, the NCMEC received over 83 million CSAM reports, with EU countries accounting for 3.1 million of them.
Asia, the world’s most populous region, appears to lead in CSAM. Strikingly, two-thirds of the reports attribute the crime location to Asian countries. India is the most frequent location, accounting for almost 16% of all CSAM reports (over 13 million). The Philippines follows with 7.1 million reports, Pakistan with 5.4 million, and Indonesia and Bangladesh with 4.7 million each.
Population size alone does not explain the volume of reports: China accounts for roughly 30% of Asia’s population, yet only 0.02% of global CSAM cases are attributed to it. The prevalence of CSAM in the region may instead reflect the fact that the sexual exploitation of children remains common in South Asian countries. According to UNICEF, it is the “combined product of many factors, such as poverty, social norms condoning them, lack of decent work opportunities for adults and adolescents, migration and emergencies.”⁸
Overall, the number of CSAM reports keeps rising. The NCMEC received slightly over 20 million reports in 2020, nearly 30 million in 2021, and more than 30 million in 2022.
It should be noted that roughly half of the reports are not actionable, either because they lack sufficient information or because the same content has been reported too many times. A non-actionable report means the tech company or other reporter cannot provide enough information for law enforcement to act on, e.g., the uploader’s details, imagery, or possible location are missing. The NCMEC usually classifies such cases as “informational.” Some reports also fall into that category because the imagery has gone viral and has already been reported many times.⁹ In such cases, it can be difficult to trace the origin of the material and identify the offenders.
The biggest reporters for the NCMEC
Let’s look at the biggest NCMEC reporters each year. The NCMEC receives CSAM reports from over 200 electronic service providers. Meta is by far the largest reporter, with over 74 million reports filed, accounting for about 90% of all reports the NCMEC received. Interestingly, report counts vary greatly across Meta’s platforms: Facebook has filed 5 times more reports than Instagram and 18 times more than WhatsApp.
The graph below shows the YoY trends of platforms where most CSAM cases were detected. It’s important to note that while U.S.-based ESPs are legally required to report instances of “apparent child pornography” to the CyberTipline when they become aware of them, “there are no legal requirements for proactive efforts to detect this content or what information an ESP must include in a CyberTipline report,” as the NCMEC states.
To investigate the reporting across all platforms, hover over the lines and dots on the infographic or click any platform title in the right sidebar.
Why are Meta’s numbers so big? There could be multiple reasons. First, people simply upload enormous amounts of content to Meta’s platforms. Second, Meta has developed and deployed open-source CSAM-detection tools, which likely improves its detection rate⁵. The precise reason is hard to pin down because such claims are difficult to measure. By contrast, Apple still has not implemented comparable CSAM-detection systems⁶ and filed only 659 CSAM reports over the last three years.
Even though Meta could be seen as the most responsible NCMEC reporter, New Mexico’s Attorney General Raúl Torrez believes the situation may be more complicated. A recently filed lawsuit alleges that Meta’s CSAM-catching algorithms are lagging behind and that its platforms lack adequate age verification mechanisms.⁷ How the lawsuit will develop remains unclear.
Conclusion
Identifying CSAM content with complete accuracy remains an unsolved problem. User privacy is also at stake, creating tension between governments and tech companies like Meta. Both sides are searching for solutions, as current efforts fall short.
Oxylabs’ AI-powered tool is an example of what technology can do in the fight against abusive online content. More solutions like it could follow, whether focused on CSAM that is already online or on detecting it in real time, a capability the latest AI models already have.
Our estimates suggest there may currently be over 1,700 European websites with illegal CSAM content, so effective technological solutions to the problem are urgently needed.
Methodology and sources
This study used 2020-2022 open-source information from the National Center for Missing and Exploited Children (NCMEC) and findings reported by the Communications Regulatory Authority of Lithuania (RRT).
The study is split into two parts: unreported EU websites containing CSAM (section 1) and CSAM trends based on US companies’ reports to the NCMEC (sections 2-3).
To estimate the number of websites containing CSAM in EU countries, we took the number of Lithuanian websites with CSAM found by Oxylabs and divided it by the number of CSAM reports Lithuania submitted to the NCMEC. Then, we multiplied this ratio by the number of CSAM reports to the NCMEC from other EU countries. This calculation is based on the assumption that the situation in Lithuania is similar to that in the rest of the EU. For sections “CSAM in the rest of the world” and “The biggest reporters for the NCMEC,” data was aggregated per reported country and electronic service provider (ESP).
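In formula form (our notation): estimated CSAM websites for a country = (19 websites found in Lithuania ÷ CSAM reports from Lithuania) × CSAM reports from that country.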
Note on EU countries' similarity: we presumed that EU countries are fairly similar in the usage of American ESPs, CSAM laws, and the number of reports with the wrong geographic indicators. That’s why we thought it was fair to infer unknown values from trends observed in the NCMEC data.
Note on NCMEC data usage and limitations: As per NCMEC, “Most CyberTipline reports include geographic indicators related to the upload location of the CSAM. It is important to note that country-specific numbers may be impacted by the use of proxies and anonymizers. In addition, each country applies its own national laws when assessing the reported content.”
For complete research material and calculations behind this study, visit here.