What does "metadata" actually mean?
04 September 2017
One of the buzzwords around online surveillance and leaked NSA data collection programs has been the word “metadata.” We’ve been assured that the government doesn’t collect the content of communications, just the “metadata.” The implication is that collecting only metadata is acceptable and respects our privacy (though at times, far more than just metadata has been collected). Unfortunately, “metadata” is a broad term that allows for a large amount of data collection. Not only that, but collecting metadata is not subject to regulations as stringent as those requiring warrants for wiretapping.
The Merriam-Webster dictionary defines “metadata” as “data about other data.” Everything stored digitally has some sort of metadata associated with it, such as where it lives on a computer, who created it, who owns it, or where it came from. The JPEG image format, which is widely used by cameras and smartphones, can include the date and time a photo was taken, its location (as GPS coordinates, if available), and the make and model of the camera that took it, and it supports adding arbitrary notes to a file. Often, metadata is almost completely invisible unless you’re specifically looking for it. Other times, it’s a core part of making things work. The “To” and “From” fields in an email fall under the category of metadata because they are data about the content: where it’s intended to go and where it came from.
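To make the email example concrete, here is a minimal sketch using Python’s standard `email` module. The message, the addresses, and the `split_metadata_and_content` helper are all made up for illustration; the point is only that the headers can be read without ever looking at the body.

```python
from email import message_from_string

# A made-up message for illustration; the addresses and text are hypothetical.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 04 Sep 2017 09:30:00 -0400
Subject: Lunch?

Want to get lunch at noon?
"""


def split_metadata_and_content(raw):
    """Return (headers, body) for a raw email message."""
    msg = message_from_string(raw)
    # The headers describe the message: who sent it, to whom, and when.
    metadata = {name: value for name, value in msg.items()}
    # The payload is the content itself.
    content = msg.get_payload()
    return metadata, content


metadata, content = split_metadata_and_content(RAW_MESSAGE)

for name, value in metadata.items():
    print(f"{name}: {value}")
```

Everything the loop prints is metadata. Who wrote to whom, and when, is visible even if the body is never read.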
When it comes to staying invisible or keeping things private, metadata can be downright dangerous. It has helped law enforcement trace files to specific individuals who were breaking the law. In the case of Dennis Rader, that data came from a deleted Microsoft Office file, which contained information that allowed police to determine who Rader was. Image sharing sites, which can seem fairly innocuous, have had problems with revealing the location and camera data embedded in photos. Imgur, a popular site for sharing images of all kinds, attempts to strip that data when pictures are uploaded as a measure to improve user privacy. The data can be used to figure out a photographer’s secrets to taking great photos, or it can reveal where a person lives, including their home address, to strangers online. Exif data, which is the image metadata stored in JPEGs, has been specifically listed as one type of metadata collected by the NSA’s XKeyscore program.
In addition to the information explicitly collected, metadata collected over time can reveal a lot about the context of someone’s activities. With enough data points, it becomes possible to model a person’s behavior. Given the cell tower a person made a call from and the people they talked to, it might be possible to figure out what that person was doing, especially with a lot of other data points to compare against. Over a number of years, call records can build a picture of who you talk to, who you’re close to, and when and where you talk to people. Theoretically, U.S. law protects U.S. citizens and limits how much information can be gathered without a warrant. However, a person’s network can include someone of interest to the NSA, and it’s difficult to determine whether someone is a U.S. citizen based on their metadata. Without specific information about someone, it’s assumed that they’re a non-U.S. person and can be monitored freely. That’s not including, of course, the amount of data collected domestically by accident.
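As a rough illustration of how little it takes, the sketch below summarizes a handful of call records in Python. The records, phone numbers, tower names, and the `summarize` function are all invented for this example and do not reflect how any real program works; each record holds only a time, the number called, and the tower used, with no call content at all.

```python
from collections import Counter
from datetime import datetime

# Hypothetical call records: (timestamp, number called, cell tower used).
CALL_RECORDS = [
    ("2017-08-01T08:05", "555-0101", "tower_home"),
    ("2017-08-01T12:30", "555-0199", "tower_office"),
    ("2017-08-02T08:10", "555-0101", "tower_home"),
    ("2017-08-02T23:45", "555-0142", "tower_home"),
    ("2017-08-03T12:40", "555-0199", "tower_office"),
    ("2017-08-04T08:02", "555-0101", "tower_home"),
]


def summarize(records):
    """Count calls per contact and find the usual tower for each hour of day."""
    contacts = Counter(number for _, number, _ in records)

    towers_by_hour = {}
    for timestamp, _, tower in records:
        hour = datetime.fromisoformat(timestamp).hour
        towers_by_hour.setdefault(hour, Counter())[tower] += 1

    # For each hour, keep the tower seen most often.
    usual_tower = {
        hour: counts.most_common(1)[0][0]
        for hour, counts in towers_by_hour.items()
    }
    return contacts, usual_tower


contacts, usual_tower = summarize(CALL_RECORDS)
print("Most-called number:", contacts.most_common(1)[0])
print("Usual tower by hour:", usual_tower)
```

Even with six records, the summary suggests a closest contact, a likely home location, and a likely workplace. Years of real records would make those guesses far more confident.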
With smartphones, we create more metadata than ever before, with information tagged on images, emails, phone calls, and web browsing habits (because, yes, the address of a website is metadata). We create so much metadata that a lack of it could be seen as suspicious. Even with nothing to hide, our privacy is at stake. We don’t know for sure how much data is collected, how it’s used, or how it’s secured. We might be sharing things we’re not even aware of, and we may not know who is listening: hacking and data leaks aside, the government appears to have simply ignored data sharing rules in the past.