Computation Eliminates Obscurity January 10
There is an interesting new paper (pdf) from some researchers at Google making its way around the tech news sites. It outlines some of the ways that clever computer programs could use the data we reveal on social networks, blogs and other online communities to undermine our expectations of pseudo-anonymity in surprising ways. Of course, if someone can automatically connect an individual to their online identities, that individual loses their obscurity. Your employer will be able to discover that you are gay or learn about the time you flashed Bourbon Street during a college Mardi Gras trip. This paper doesn’t say anything very surprising if you’ve already been convinced by my prior arguments about the impossibility of maintaining obscurity (usually called anonymity[1]) in the information age.
To summarize briefly, the Google researchers pointed out that by comparing your friends on different social networking sites, or by data mining the comments you or your associates leave on blogs, it will frequently be possible to associate your pseudonymous identity with your official identity (name). While it says nothing particularly surprising, the paper is interesting for vividly demonstrating how easy it is for people to have their pseudo-anonymity stripped. It is also interesting for the responses it suggests to these dangers.
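To make the friend-comparison idea concrete, here is a minimal sketch. It is not the method from the paper; it simply scores how much a pseudonymous account's friend list overlaps with the friend lists of named accounts, using Jaccard similarity. The function names, the sample data, and the assumption that the same friend can be recognized on both sites are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two friend lists: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(pseudonym_friends, named_accounts):
    """Return (name, score) for the named account whose friends overlap most.

    named_accounts maps a real name to that person's friend list on another
    site. Assumption: friends are identified the same way on both sites.
    """
    scored = [(jaccard(pseudonym_friends, friends), name)
              for name, friends in named_accounts.items()]
    score, name = max(scored)
    return name, score


# Hypothetical data: a blog handle's friends versus two named profiles.
handle_friends = ["alice", "bob", "carol", "dave"]
profiles = {
    "Jane Doe": ["alice", "bob", "carol", "erin"],
    "John Roe": ["bob", "frank"],
}
name, score = best_match(handle_friends, profiles)
```

With this toy data the handle shares three of five distinct friends with the first profile and only one of five with the second, so the first profile is the stronger candidate. A real attack would have to cope with friends who use different handles on each site, which this sketch ignores.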
To combat the risk of a friend’s trackback accidentally connecting your official and pseudonymous identities, the researchers suggest automated link analysis to warn users when data mining might allow third parties to learn more about them or their friends than they intended. Presumably the idea is that some kind of automated warning would tell you, before you added a trackback to your friend’s blog, that it might connect his blogging handle and real name. Similarly, they suggest providing users with a tool to warn them when information they reveal on MySpace might allow someone to associate their MySpace and Twitter accounts.
These suggested countermeasures are interesting not because they are workable but because they are so horribly flawed. Warnings about unintended information exposure are only as good as the current generation of data mining techniques, but once published, information can’t be put back in the bottle[2]. When the Netflix dataset was published it was thought to be impractically difficult, if not impossible, to connect rental histories with individuals, but researchers developed a new technique that allowed them to do just that. Moreover, this incident also demonstrates that even trivial pieces of information, like what movies you like or your favorite TV show, can help connect your pseudonymous and official identities. Each time you wanted to answer a question for an internet quiz or a compatibility test for online dating, you would have to study the report warning of the information that could be inferred from this data. Your friends would have to be just as cautious on your behalf, as unmasking them would likely unmask you.
Indeed, even if you never set foot online, it would be enough for someone to analyze the people who claim to be your friends and their answers to questions like “Do you have any gay friends?” to discover you are gay[3]. Even if your friends are willing and conscientious enough to avoid ever mentioning their favorite movies on their LiveJournal, because of the drunk post you made five years ago revealing that you and doglover69 were friends, you still aren’t safe. Complete strangers can unmask you by revealing trivial information about your friends. Separate posts on different sites revealing the favorite movies of your four friends could be compared with your blog post about the day your supportive friends each brought over their favorite movie and watched them all with you after learning you were gay. And these are only the inferences that are simple enough for people to easily imagine. By integrating all sorts of statistical information from social networks, comments by people who don’t even know your friends could unmask you. The situation becomes completely hopeless when you consider other tools like stylometry that, with a proper search tool, might allow your employer to search for blogs with a linguistic style similar to yours.
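The first inference in this paragraph, the one based on what self-declared friends say in surveys, comes down to comparing two proportions. Here is a minimal sketch under invented assumptions: each respondent is a record with a set of people they claim as friends and a yes/no answer to some question. The field names and the data are hypothetical, and a real analysis would need far more respondents and a proper significance test.

```python
def answer_rates(respondents, target):
    """Compare the 'yes' rate among people who claim `target` as a friend
    with the rate among everyone else.

    Each respondent is a dict with (hypothetical) keys:
      'claims_friend_of': set of names they list as friends
      'answered_yes':     their answer to the survey question
    Returns (rate_among_claimers, rate_among_others), or None if either
    group is empty.
    """
    claimers = [r for r in respondents if target in r["claims_friend_of"]]
    others = [r for r in respondents if target not in r["claims_friend_of"]]
    if not claimers or not others:
        return None

    def rate(group):
        return sum(r["answered_yes"] for r in group) / len(group)

    return rate(claimers), rate(others)


# Invented survey data: four people claim "pat" as a friend, four do not.
survey = [
    {"claims_friend_of": {"pat"}, "answered_yes": True},
    {"claims_friend_of": {"pat", "sam"}, "answered_yes": True},
    {"claims_friend_of": {"pat"}, "answered_yes": True},
    {"claims_friend_of": {"pat"}, "answered_yes": False},
    {"claims_friend_of": {"sam"}, "answered_yes": True},
    {"claims_friend_of": {"sam"}, "answered_yes": False},
    {"claims_friend_of": set(), "answered_yes": False},
    {"claims_friend_of": {"lee"}, "answered_yes": False},
]
rates = answer_rates(survey, "pat")
```

If the rate among the people who claim the target as a friend is much higher than the rate among everyone else, that gap is the signal. Note that the target never has to appear in the data as a respondent.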
Even though the authors of the paper must realize how weak these techniques are, they still can’t accept (or believe their audience can’t accept) that information technology fundamentally changes the nature of social interaction in a large society. I suppose this shouldn’t be surprising, as we have seen the same kind of response when other technologies have fundamentally altered the social ‘economics.’ Before the invention of the printing press, each copy of a book required substantial effort to produce; likewise, finding out about other parts of someone’s life used to require great cost or effort, e.g., hiring a private investigator. The printing press changed the equation so that one person’s labour in setting up the press could cheaply distribute that information to large populations, and similarly data mining reduces the marginal cost of discovering public but obscure information (what you did at that party) to nearly zero. Only one person needs to come up with the clever algorithm to ferret out yet more information from our online activities, and then everyone can mine that information.
It’s hopeless to imagine that we will stop revealing any personal information about ourselves or our friends online. We are evolutionarily hardwired to signal our preferences, opinions, subcultural affinities (pot smoker, party girl, player, slacker, bear/twink[4]) and sexual daring, as well as to gossip about the behavior and sexual couplings of our friends. The idea that teens or adults will avoid advertising their sexual attractiveness, social status, or scandalous behavior online makes the idea that people only have sex inside marriage sound plausible. I mean, a major reason that people flash the bar during spring break, go streaking across campus, cross-dress at a party or engage in other scandalous public behavior on vacation is to advertise themselves as fun, sexually daring, brave or whatever else, so it’s absurd to think we won’t distribute this advertisement in the social context in which we wish to project that image. The very point of sharing that information is to build social connections and portray who we are (or want to be). Inevitably, enough information will be revealed to unmask the social networking accounts and blogs of all but the most reclusive or paranoid individuals, and what gets revealed will include drug use, sexual kinks, and how trashed we got at that party. It’s time to accept the fact that the era of obscurity is coming to an end and to start working on how to deal with it. At least pot will probably finally be decriminalized.
[1] True anonymity is still possible, perhaps even easier. Political dissidents who are willing to go to great lengths to hide their real identity and impose a total barrier between their secret and non-secret activities can retain anonymity. Nothing stops people from keeping secrets. What will become impossible is to reveal things in public forums of one kind (at a party in New Orleans) and count on the obscurity of this information to prevent your coworkers from discovering it.
[2] What, are you going to make it illegal for people to archive public pages on the internet?
[3] If you examine the friends of your friends and discover that, in that population, claiming to be your friend greatly raises the proportion claiming to have a gay friend, it’s a good bet you are gay, or at least your friends think you are.
[4] Referring to particular gay sexual stereotypes, analogous to, say, being a sporty girl or a manly man, but more sexual.
The younger generation is already all over this. If you ever see documentaries or read stories that have first-person accounts of younger people’s internet behavior, you’ll find that they are much more open and don’t really have the expectation of online-offline dissociation that older people do.
Though interestingly, if you look at our parents’ generation, they seem to completely lack the understanding that their internet personae reflect upon them as people. People who would otherwise be very studious and careful will do things like type in all lowercase, email lewd pictures, or leave really stupid comments. It’s bizarre.