Although the term ‘metadata’ was coined in the ’60s, the general public became familiar with it only after the Snowden leaks. In his 2014 TED Talk, Snowden explained that metadata could reveal ‘who you are talking to when you are talking to them, where you travel.’
Metadata is precious, yet it doesn’t receive the public attention it deserves. Maybe that is because government officials try their best to stop us from caring. Or perhaps it is because legal regulations concerning surveillance are often vague and don’t provide sufficient details when citizens become subject to surveillance.
In this article, I will discuss what metadata is and what it can say about you.
Metadata? Meta… what?
Metadata is a part of everyday life. Every file you send or receive has metadata. It reveals what might be contained in the data: the point of it is to make connections and provide context, show relationships and help understand them.
Metadata answers questions:
Let’s take the package of a chocolate bar: the information you see on it, like the name of the brand, the barcode, etc., is metadata. When you listen to a song, the title, the artist’s name, the keywords, the tags and the listening frequency are metadata as well. It helps music-streaming platforms provide you with recommendations. Likewise, when you watch YouTube and leave videos on autoplay, the metadata of your previous choices helps determine what it plays next.
In social media, metadata is used to group posts, track users’ interests and the relationships between users and data, and build context around that data. This is one of its most helpful features, because it helps determine why the data was created and used.
Imagine you’re sending a picture to your friend. Let’s say it’s a selfie. The data is the contents of the selfie, while the metadata can include the location, the time, even the exposure time the camera used.
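To make the difference between data and metadata a little more tangible, here is a minimal Python sketch. It is only an illustration: the helper names `describe_file` and `make_sample_file` are made up for this example, and the ‘selfie’ is a throwaway file with placeholder bytes rather than a real photo. It shows that a file’s name, size and modification time can be read without ever opening its contents.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return metadata about a file without reading its contents."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


def make_sample_file():
    """Create a throwaway file standing in for the selfie (placeholder bytes)."""
    handle = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
    handle.write(b"pretend this is image data")
    handle.close()
    return handle.name


if __name__ == "__main__":
    sample = make_sample_file()
    print(describe_file(sample))
    os.remove(sample)
```

A real photo usually carries far more than this (camera model, exposure settings, sometimes GPS coordinates), but the principle is the same: the description travels with the file.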
What can metadata reveal about us?
In short, almost everything, including the most intimate personal details about us: our political views, health information, finances, family relationships, etc.
A few examples:
MIT Media Lab graduate students Deepak Jagdish and Daniel Smilkov developed a tool called ‘Immersion’, which was designed to make sense of email metadata. By analyzing only the From, To, Cc and Timestamp fields of their emails, the researchers were able to make surprising discoveries about their social interactions, relationships, social life, even their sleep cycle.
For example, they could see how many people they had introduced to each other during a given period, when they went through difficult moments in their personal lives, at what time of day they were most productive, etc.
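The following Python sketch shows the general idea of this kind of analysis. It is not the Immersion code: the messages, the addresses and the `summarize` function are all invented for illustration. Using nothing but the From, To, Cc and Date headers, it counts who the mailbox owner is in touch with most and at which hours messages are exchanged.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical messages: headers only, the bodies are empty on purpose.
RAW_MESSAGES = [
    "From: me@example.com\nTo: alice@example.com\nCc: bob@example.com\n"
    "Date: Mon, 07 Jan 2019 23:40:00 +0000\n\n",
    "From: alice@example.com\nTo: me@example.com\n"
    "Date: Tue, 08 Jan 2019 09:05:00 +0000\n\n",
    "From: me@example.com\nTo: alice@example.com\n"
    "Date: Tue, 08 Jan 2019 23:55:00 +0000\n\n",
    "From: me@example.com\nTo: carol@example.com\n"
    "Date: Wed, 09 Jan 2019 23:10:00 +0000\n\n",
]


def summarize(raw_messages, owner):
    """Count contacts and active hours using headers only."""
    contacts = Counter()
    hours = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        headers = (
            msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
        )
        for _name, address in getaddresses(headers):
            if address and address != owner:
                contacts[address] += 1
        hours[parsedate_to_datetime(msg["Date"]).hour] += 1
    return contacts, hours


if __name__ == "__main__":
    contacts, hours = summarize(RAW_MESSAGES, "me@example.com")
    print("Closest contact:", contacts.most_common(1))
    print("Busiest hour:", hours.most_common(1))
```

Even on four made-up messages the picture is telling: one contact clearly dominates, and most activity happens shortly before midnight. Scale that up to years of email and patterns like friendships and sleep cycles start to emerge.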
The long reach of telephone metadata surveillance was demonstrated by Stanford University students, who found that the NSA’s mass collection of phone records can provide much more information about people’s private lives than the US government claimed. Just by acquiring the phone numbers of the caller and recipient, the serial numbers of the phones involved, the time and duration of calls and possibly the location of each person when the call occurred, the researchers were able to tie the data to a particular identity.
Researchers Beatrice Perez, Mirco Musolesi and Gianluca Stringhini from University College London and the Alan Turing Institute used Twitter as a case study to quantify the uniqueness of the association between metadata and user identity. They wrote: ‘we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account using different machine learning algorithms of increasing complexity. We demonstrate that through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy.’ According to the authors, the same techniques can be applied to most social media platforms.
Your metadata is so important that social media bots will use it to look ‘more human’ and thus be harder to detect. A new report by Data & Society, ‘The Manipulation of Social Media’, claims: ‘Manipulators are getting craftier at evading moderation efforts built upon these metadata categories by using platform features in unexpected ways’. In other words, your activity on social platforms (likes, retweets, favorites, comments, reactions, etc.) is a form of metadata, which manipulators can use to create bots that mimic your behavior and appear ‘real.’ Such bots are difficult to detect (either by humans or by platform filters) and can be used for fake news, spam and other malicious activities.
Researchers from Greece and the USA explored the importance of public location (meta)data. They concluded: ‘The exposure of location data constitutes a significant privacy risk to users as it can lead to de-anonymization, the inference of sensitive information, and even physical threats.’
A totalitarian government doesn’t always track the contents of communications between suspected opponents; often it tracks their relationships: who is friends with whom. This is not new either. During the Soviet era, dissidents who fled to the USA often looked for ways to communicate with their families back in the USSR without writing their names and addresses (that is, the metadata) on the envelopes they sent.
As surveillance technology improves and our phones act as GPS trackers and potential voice and video recorders that we carry with us all the time, snoopers gather increasing amounts of metadata every day. As security expert Bruce Schneier put it in his book ‘Data and Goliath’: ‘Your cell phone tracks where you live and where you work. It tracks where you like to spend your weekends and evenings. It tracks how often you go to church (and which church), how much time you spend in a bar, and whether you speed when you drive. It tracks – since it knows about all the other phones in your area – whom you spend your days with, whom you meet for lunch, and whom you sleep with.’
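Schneier’s point about ‘where you live and where you work’ can be illustrated with a toy Python sketch. Everything in it is an assumption made for the example: the pings are invented, the tower names are placeholders, and the rule of thumb (most frequent tower at night is home, most frequent tower during weekday office hours is work) is a deliberately crude heuristic, not a description of how any real system works.

```python
from collections import Counter
from datetime import datetime

# Hypothetical location pings: (timestamp, cell tower the phone connected to).
PINGS = [
    ("2019-01-07T02:00", "tower-A"),
    ("2019-01-07T10:30", "tower-B"),
    ("2019-01-07T15:00", "tower-B"),
    ("2019-01-07T23:30", "tower-A"),
    ("2019-01-08T03:15", "tower-A"),
    ("2019-01-08T11:00", "tower-B"),
    ("2019-01-08T20:00", "tower-C"),
]


def guess_home_and_work(pings):
    """Guess home and work towers from timestamps alone (crude heuristic)."""
    night = Counter()
    workday = Counter()
    for stamp, tower in pings:
        moment = datetime.fromisoformat(stamp)
        if moment.hour >= 22 or moment.hour < 6:
            night[tower] += 1  # late night: probably at home
        elif moment.weekday() < 5 and 9 <= moment.hour < 17:
            workday[tower] += 1  # weekday office hours: probably at work
    home = night.most_common(1)[0][0] if night else None
    work = workday.most_common(1)[0][0] if workday else None
    return home, work


if __name__ == "__main__":
    print(guess_home_and_work(PINGS))
```

No call was listened to and no message was read, yet seven timestamps are enough for a first guess at someone’s home and workplace.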
Hence, our personal profiles are getting more detailed and in-depth. So how come metadata still doesn’t get the attention it deserves?
Secure websites don’t secure metadata
You may have heard that HTTPS indicates secure websites. A somewhat less known fact is that although HTTPS encrypts your content, it reveals metadata. To explain this, I need to give some history because it can be complicated to understand for the less tech-savvy.
In 1989 physicist Tim Berners-Lee at CERN invented the World Wide Web (or WWW) and started the development of Hypertext Transfer Protocol (or HTTP), which is the foundation of data communication for the web. However, the problem with HTTP is that its contents are not encrypted, hence – not secure. It can be relatively easily exploited during man-in-the-middle attacks. Therefore, your sensitive information (emails, passwords, credit card details, etc.) can be stolen.
The ‘s’ in HTTPS stands for ‘secure.’ This protocol was designed to enhance privacy on the internet when sending sensitive personal data. Man-in-the-middle attacks are still possible, but they are a lot more challenging to perform. HTTPS is now widely used across the internet – such as by Google, Facebook or Twitter, and of course – our website. To check if the website you’re visiting is secure, look at the URL field.
To switch from HTTP to HTTPS, the owner of the website needs to obtain a certificate for Transport Layer Security (or TLS), the more secure successor of the Secure Sockets Layer (or SSL) protocol. This certificate proves the website is legit and is what it claims to be.
When you go to your bank’s website, you need to be sure that you’re accessing what you’re trying to access. For example, that it’s not a fake site created by scammers to ‘phish’ for your credentials.
To make it clearer, imagine you’re sending a postcard: anybody can see what you’ve written on it. That’s an analogy for HTTP. If you put the postcard in an envelope, others can see the envelope, but can only guess what’s inside. That’s HTTPS.
It means that the contents of your communications are secure, but the metadata isn’t. Your ISP or third-party snoopers can see what websites you access.
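The envelope analogy can be sketched in a few lines of Python. This is a simplified model, not a network tool: the function name `observer_view` and the example URLs are made up, and the sketch assumes the host name is exposed (for instance through SNI during the TLS handshake) while everything after it is encrypted under HTTPS. In reality an observer also sees things like IP addresses, timing and traffic volume.

```python
from urllib.parse import urlsplit


def observer_view(url):
    """Split a URL into what an on-path observer can and cannot read.

    Simplified assumption: the host name is always visible, and the
    path and query are hidden only when the scheme is HTTPS.
    """
    parts = urlsplit(url)
    visible = {"host": parts.hostname}
    hidden = {}
    detail = {"path": parts.path, "query": parts.query}
    if parts.scheme == "https":
        hidden = detail  # inside the envelope
    else:
        visible.update(detail)  # postcard: everything is readable
    return visible, hidden


if __name__ == "__main__":
    for url in ("http://bank.example/accounts?id=42",
                "https://bank.example/accounts?id=42"):
        print(url, "->", observer_view(url))
```

In both cases the observer learns that you visited `bank.example`. HTTPS only hides which page you opened and what you sent.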
However, this might be changing soon. Mozilla recently announced that Firefox Nightly now supports encrypting the TLS Server Name Indication (or SNI) extension. This means that attackers (and your ISP) won’t be able to monitor what websites you access, i.e. your browsing history. Currently, this only works for sites hosted by Cloudflare, but it’s just a question of time before other providers follow this example.
America – the home of widespread metadata surveillance
The National Security Agency, NSA, might be the most intrusive and creative metadata snooper we know.
In 2016 Snowden leaked the NSA’s newsletter highlighting metadata as the most useful tool.
Various political, social and technological shifts have allowed the NSA to raise the levels of metadata collection. Although the 2015 USA Freedom Act limited the NSA to collecting phone records and contacts of people who may have ties to terrorism, in May the Agency revealed a massive increase in the amount of call metadata collected (referred to in the report as ‘call detail records’): from 151 million call records in 2016 to over 534 million in 2017. Despite this increase, there were only 40 terrorism suspects (fewer than in 2016).
At the end of June, the NSA released a public statement announcing that it had begun purging the records because officials discovered technical irregularities.
There’s also an ongoing debate about whether collecting metadata can be considered an intrusion into a person’s home and hence a violation of the Fourth Amendment, which broadly establishes Americans’ right to privacy as fundamental to individual liberty by prohibiting unreasonable searches and seizures.
Still, too often we hear US government officials and politicians defending such actions: ‘Relax, it’s just plain old metadata. You don’t have to worry about it. Carry on what you’re doing’. Of course, they don’t touch on the fact that metadata can actually reveal even the most personal and intimate information about you and everyone you communicate with.
Unfortunately, such efforts of government officials to fool citizens seem to be working. Laura Finley and Luigi Esposito from Barry University examined the NSA’s coupling with major telecommunications companies for mass surveillance of Americans’ communications and concluded that it is a form of state-facilitated state-corporate crime. Another of their conclusions was that Americans are generally apathetic toward the NSA’s actions.
Europe: better have a legit reason for logging
Conventional wisdom has it that Europe is better at protecting its citizens’ privacy online than America because the EU has the strictest data protection regulations.
To put it simply, in the EU, companies can log only as much data as is strictly necessary for their business. If they need to record something that is not necessary to provide their services, they would most probably have to get the user’s consent.
In 2006 the EU passed the Data Retention Directive which required telecommunications providers to collect ‘all kinds of telephone and internet metadata’ and store it for at least 6 months. But in 2014 a ruling by the European Court of Justice declared the Directive invalid.
Since metadata can be traced back to an individual, it falls under the definition of personal data in the EU’s General Data Protection Regulation (GDPR), which took effect in May 2018. Article 4 states: ‘an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person’.
This means that if the data you collect is not the metadata of the trajectories of stars in space, it is personal.
However, activists often express their concerns about the rise of surveillance in Europe. The United Kingdom, France, and Germany have adopted laws allowing the bulk interception of communications.
Still, personal data is much more private in the EU than in the USA or other countries. For example:
- Under the Australian Data Retention Law, which took effect in 2017, the metadata of Australians’ mobile and online communications is collected and stored for at least two years. National agencies, such as the Australian Security Intelligence Organisation (ASIO) and the Australian Federal Police (AFP), can gain self-authorized access to the data.
- Metadata has been the subject of debate in Canada too, since the Canadian Security Intelligence Service (CSIS) was criticized for holding on to the telecommunications metadata of innocent people and keeping its actions in the dark. Recently it was announced that 70% of the data has already been destroyed.
Is there something I could do?
The bad news is that there’s no panacea, no magic button you can press to stop your metadata from spreading in all directions, unless, like Henry David Thoreau, you start a new life in the woods. Still, for most things you do online there are ways to minimize the metadata you leave behind.
Here’s some advice from our technologists:
- When it comes to HTTPS websites, TLS makes communication between server and client more secure, but the website name still leaks to the ISP or any third party in the middle because of how SNI works. To make sure nobody snoops on you, use a VPN;
- Do not over-share. Actually, this is a general rule for every situation. Every time you put something on the internet, it stays there forever;
- Install Linux with various privacy features;
- Install Qubes, the Snowden-approved operating system;
- Turn off GPS when not needed.
This article was updated: January 22, 2019
What do you use to protect your metadata?