US was hit the hardest by the Duolingo data leak
We all love to learn something new. But it's disheartening to learn that your favorite educational platform has experienced a data leak. Unfortunately, this happened to 2,676,690 Duolingo users, whose email addresses were exposed and are now being sold online. The leak resulted from scraping Duolingo's data through an application programming interface (API), which revealed a mix of public and private information¹. Let’s see what lessons can be learned here.
Key insights
- The US is the most affected country, with 967k unique email addresses exposed (that’s roughly the population of Rhode Island!). This constitutes approximately a third of the compromised accounts. The outcome is unsurprising, given that the US has had the highest number of leaked accounts since 2004².
- South Sudan comes in second with 175k leaked accounts, about five times fewer than the US. Spain is third with 123k exposed accounts, followed by France with 105k and the United Kingdom with 98k.
- In total, 16.3M data points of Duolingo users were exposed. On average, each email address was leaked together with five other data points, which included language (5.3M), profile picture (2.7M), username (2.7M), name (2.2M), country (0.7M), and bio (6k). Some user accounts had all of their details leaked.
- The biggest concern is the exposure of email addresses, which could be used for phishing attacks. Phishing has been the most common cybercrime for three years in a row, with a total of 300,497 phishing victims in 2022³. People affected might receive personalized phishing emails, such as offers of affordable courses related to the language they have been studying on Duolingo. Attackers could use leaked names and countries of origin to produce highly customized emails, possibly even in the recipients' native languages.
Methodology and sources
The data was collected by our independent partners from a publicly available database and aggregated by email address. It was then anonymized and passed on to Surfshark’s researchers for statistical analysis.
For the complete research material behind this study, visit here.