Computation Eliminates Obscurity

There is an interesting new paper (pdf) from some researchers at Google making its way around the tech news sites. It outlines some of the ways that clever computer programs could use the data we reveal on social networks, blogs and other online communities to undermine our expectations of pseudo-anonymity in surprising ways. Now of course, if you can automatically connect an individual to their online identities, people lose their obscurity. Your employer will be able to discover that you are gay or learn about the time you flashed Bourbon Street during a college Mardi Gras trip. This paper doesn’t say anything very surprising if you’ve already been convinced by my prior arguments about the impossibility of maintaining obscurity (usually called anonymity[1]) in the information age.

To summarize briefly, the Google researchers pointed out that by comparing your friends on different social networking sites, or by data mining comments you or your associates leave on blogs, it will frequently be possible to associate your pseudonymous identity with your official identity (name). While it says nothing particularly surprising, the paper is interesting for vividly demonstrating how easy it is for people to have their pseudo-anonymity stripped. It is also interesting for the responses it suggests to these dangers.
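To make the friend-comparison idea concrete, here is a minimal sketch in Python. It is not the method from the paper; it just assumes (hypothetically) that friend lists from two sites have already been mapped to common identifiers, and it scores overlap with a plain Jaccard index. The names `jaccard`, `best_match` and the sample data are all made up for illustration.

```python
def jaccard(a, b):
    """Share of friends the two accounts have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(pseudonym_friends, named_accounts):
    """Return (real_name, score) for the named account whose friend list
    overlaps most with the pseudonymous account's friend list."""
    scored = [
        (name, jaccard(pseudonym_friends, friends))
        for name, friends in named_accounts.items()
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical data: friend lists already mapped to shared identifiers.
named_accounts = {
    "Alice Smith": {"bob", "carol", "dave", "erin"},
    "Frank Jones": {"bob", "grace", "heidi"},
}
pseudonym_friends = {"bob", "carol", "dave", "ivan"}

match = best_match(pseudonym_friends, named_accounts)
print(match)
```

Even this crude score singles out one candidate when a pseudonymous account shares most of its friends with a named one; a real attack would presumably weight rare friends more heavily and cope with friends who use different handles on each site.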

To combat the risk of a friend’s trackback accidentally connecting your official and pseudonymous identities, the researchers suggest automated link analysis to warn users when data mining might allow third parties to learn more about them or their friends than they intended. Presumably the idea is that some kind of automated warning would tell you, before you added a trackback to your friend’s blog, that it might connect his blogging handle and real name. Similarly, they suggest providing users with a tool to warn them when information they reveal on MySpace might allow someone to associate their MySpace and Twitter accounts.

These suggested countermeasures are interesting not because they are workable but because they are so horribly flawed. Warnings about unintended information exposure are only as good as the current generation of data mining techniques, but once published, information can’t be put back in the bottle[2]. When the Netflix dataset was published it was thought to be impractically difficult, if not impossible, to connect rental histories with individuals, but researchers developed a new technique that allowed them to do just that. Moreover, this incident also demonstrates that even trivial pieces of information, like what movies you like or your favorite TV show, can help connect your pseudonymous and official identities. Each time you wanted to answer a question for an internet quiz or a compatibility test for online dating, you would have to study the report warning of the information that could be inferred from this data, and your friends would have to be just as cautious on your behalf, as unmasking them would likely unmask you.

Indeed, even if you never set foot online it would be enough for someone to analyze the people who claim to be your friends and their answers to questions like “Do you have any gay friends?” to discover you are gay[3]. Even if your friends are willing and conscientious enough to avoid ever mentioning their favorite movies on their LiveJournal because of the drunk post you made five years ago revealing you and doglover69 were friends, you still aren’t safe. Complete strangers can unmask you by revealing trivial information about your friends. Separate posts on different sites revealing the favorite movies of your four friends could be compared with your blog post about the day your supportive friends each brought over their favorite movie and watched them all with you after learning you were gay. And these are only the inferences that are simple enough for people to easily imagine. By integrating all sorts of statistical information from social networks, comments by people who don’t even know your friends could unmask you. The situation becomes completely hopeless when you consider other tools like stylometry that, with a proper search tool, might allow your employer to search for blogs with a linguistic style similar to yours.
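For a feel of how little machinery stylometry needs, here is a rough sketch. It assumes a deliberately crude style fingerprint (relative frequencies of a few common function words) and compares fingerprints by cosine similarity; real stylometric tools use far richer features. The word list, function names and sample texts are my own illustrative choices, not anything from the paper.

```python
import math
import re
from collections import Counter

# Assumed feature set: a handful of common function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "but", "i", "you", "so", "just", "really"]


def style_profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_by_style(known_text, candidates):
    """Sort candidate blogs from most to least similar in style to known_text."""
    target = style_profile(known_text)
    scores = {name: cosine(target, style_profile(text))
              for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical texts: a work email and two anonymous blogs.
known = "I really just think that it is so obvious"
candidates = {
    "blog_a": "I really just think that it is so obvious",
    "blog_b": "The report of the committee is in the archive of the library",
}
ranking = rank_by_style(known, candidates)
print(ranking[0][0])
```

With a few hundred words per sample instead of one sentence, and a search engine feeding in candidate blogs, this is the kind of lookup an employer could run.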

Even though the authors of the paper must realize how weak these techniques are, they still can’t accept (or believe their audience can’t) that information technology fundamentally changes the nature of social interaction in a large society. I suppose this shouldn’t be surprising, as we have seen the same kind of response when other technologies have fundamentally altered the social ‘economics.’ Just as each copy of a book required substantial effort to produce before the invention of the printing press, so too did finding out about other parts of someone’s life require great cost or effort, e.g., hiring a private investigator. The printing press changed the equation so that one person’s labour in setting up the press could cheaply distribute that information to large populations. Similarly, data mining reduces the marginal cost of discovering public but obscure information (what you did at that party) to nearly zero. Only one person needs to come up with the clever algorithm to ferret out yet more information from our online activities and everyone can now mine that information.

It’s hopeless to imagine that we will stop revealing any personal information about ourselves or our friends online. We are evolutionarily hardwired to signal our preferences, opinions, subcultural affinities (pot smoker, party girl, player, slacker, bear/twink[4]) and sexual daring, as well as to gossip about the behavior and sexual couplings of our friends. The idea that teens or adults will avoid advertising their sexual attractiveness, social status, or scandalous behavior online makes the idea that people only have sex inside marriage sound plausible. I mean, a major reason that people flash the bar during spring break, go streaking across campus, cross-dress at a party or engage in other scandalous public behavior on vacation is to advertise themselves as fun, sexually daring, brave or whatever else, so it’s absurd to think we won’t distribute this advertisement in the social context in which we wish to project that image. The very point of sharing that information is to build social connections and portray who we are (or want to be), so inevitably enough information will be revealed to unmask the social networking accounts and blogs of all but the most reclusive or paranoid individuals, and what gets revealed will include drug use, sexual kinks, and how trashed we got at that party. It’s time to accept the fact that the era of obscurity is coming to an end and to start working on how to deal with it. At least pot will probably finally be decriminalized.


  1. True anonymity is still possible, perhaps even easier. Political dissidents who are willing to go to great lengths to hide their real identity and impose a total barrier between their secret and non-secret activities can retain anonymity. Nothing stops people from keeping secrets. What will become impossible is to reveal things in public forums of one kind (at a party in New Orleans) and count on the obscurity of this information to prevent your coworkers from discovering it. 

  2. What, are you going to make it illegal for people to archive public pages on the internet? 

  3. If you examine the friends of your friends and discover that, in that population, claiming to be your friend greatly raises the proportion claiming to have a gay friend, it’s a good bet you are gay, or at least your friends think you are. 

  4. Referring to particular gay sexual stereotypes, analogous to say being a sporty girl or a manly man but more sexual. 

Privacy For The 21st Century

So today on Slashdot I ran across a link to law professor Daniel Solove’s article grappling with the “nothing to hide” argument against privacy protections. He certainly has some thought-provoking things to say and his new book will likely be interesting, but I think he makes some fundamental errors in his approach to the subject. Nevertheless, reading it did inspire me to better formulate some of my thoughts on the subject.

The problem with Solove’s arguments is that he tries to simultaneously argue for the value of privacy while seemingly rejecting the notion that there is any principled commonality to the values that we place under the rubric of privacy. While both of these notions are plausible on their own, they are in significant tension with each other. If indeed privacy is a word like ‘game’, famously analyzed by Wittgenstein to be a hodgepodge of different concepts related only by a chain of analogies, then it’s at best pointless and confusing to defend it as a package and at worst a way to smuggle in values you can’t defend using the cover of an unprincipled linguistic grouping. Unless the values we term privacy have some important principled commonality, they should stand or fall on their own merits rather than riding the coattails of the vague positive connotations we have with the word privacy.

To see that privacy isn’t really a monolithic notion, notice that the idea that other people shouldn’t be able to easily find out your social security number doesn’t have much to do with the idea that the government shouldn’t be able to monitor your phone calls and reading habits. These two notions don’t really have very much in common. One of them is concerned with other people’s knowledge of your intimate affairs and private conversations while the other involves only a purely arbitrary identifying number. The reason we don’t want people to find out our social security number isn’t because it’s an intimate detail of our life but because it’s unfortunately used as an authentication method for certain financial transactions and we fear becoming the victims of credit fraud. Certainly it’s important that people not be able to buy a car in my name, but arguments that defend my right to be free of government surveillance aren’t going to have much to say about who finds out my social security number, and vice versa.

However, I do think there is a certain core concept that is shared by many, though far from all, things we conceptualize as a right to privacy. That is the notion that we should enjoy a certain autonomy or freedom of choice, both from the government and society, in how we conduct certain parts of our lives. Certainly this is no definition of even one kind of privacy, but I think it’s the uncritical assumption that privacy itself, taken literally, is what matters that sidetracks so many people into silly issues like what Facebook publishes by default on their friend feed[1]. The reason I tend to be largely critical of privacy crusaders is that they tend to take the idea too literally and fight a lost cause trying to limit what other people are able to learn about you (endangering free speech… and privacy[2] along the way) rather than looking for the underlying value privacy provides for the culture and seeing how best to achieve that end in the information age.

Ultimately what privacy provides is the freedom from judgment (be it legal, religious or social) about certain aspects of our lives. It does this both by making it practically difficult to enforce certain kinds of invasive laws (thus discouraging their enactment) and by keeping your porn collection or wild spring break party a secret from your parents/priest/boss. Both of these mechanisms are endangered by the information age. The traditional protections of Fourth Amendment law border on uselessness in the face of fancy data mining programs that suggest likely offenders, the amount of information out there on the internet (your friends and neighbors gossip… and may take infrared pictures of your house even if the police can’t), and the huge amount of information we store on computers (police can subpoena your ISP’s business records or get access to your entire computer if they have probable cause to see even one document). Similarly, search programs and the inevitable advent of facial recognition, along with people’s tendency to post pictures to the internet, will erase the anonymity you might have once had on spring break.

However, I think we can find replacements for these tools that provide the same benefits in the information age. Just as some other cultures have done, we need to develop traditions of ignoring (or at least not scolding people for) certain aspects of their lives. This is the reason that unequal loss of privacy/anonymity is so much more dangerous than an equitable loss. Everyone has things that might embarrass them or present a less than professional image, and if we all know that these can easily be found we are much more likely to let other people have their personal space as well. The legal aspect will be more difficult but it is also achievable. We will need to shift the focus of our protections away from the guarding of information and towards rules against intrusiveness. Perhaps in addition to rules requiring search warrants we could have rules barring unprompted investigation, i.e., rules that prevent tearing someone’s life up for a crime without a particularized identification of a victim who does/would have wanted an investigation. That’s just a shot in the dark but I suspect something better will be found.


  1. Certainly it can be annoying to find out your Christmas surprise was ruined because Facebook changed the defaults, and the wrong defaults can make Facebook an unpleasant place to visit, but sub-optimal site design is a concern for Facebook shareholders, hardly an issue of grave concern. If people are bothered enough it’s not like they can’t just quit using Facebook. 

  2. Ironically, if you want to stop people from doing the kind of information retrieval and processing that scares the privacy advocates, you would have to violate people’s privacy to do it. After all, if my internet usage is unmonitored and what I do with my computer is my own business, you can’t prevent me from gathering data, analyzing it and even discreetly sharing it with my friends. 

Finally Someone Gets Privacy Right

I was originally inspired to think about the whole privacy issue when I heard that David Brin argued that it was the uneven loss of privacy that was the threat, not the loss of privacy itself. I didn’t bother to actually read what he had said until today but unsurprisingly he has some pretty interesting views on the subject.

What was surprising, however, was to see someone else who had a reasonable take on the whole ‘privacy’ issue, especially linked from Slashdot.

While the author seems reluctant to make the leap, the article flirts with the two critical points in the ‘privacy’ debate: first, that true obscurity/freedom from recording is a lost cause, and second, that the real danger is from unequal erosion of our obscurity. So long as we only see footage of ‘crooks’, or the surveillance cameras are only placed in minority/poor neighborhoods, it’s easy to use the substantial difference between what society officially designates as acceptable and how people actually behave against the most powerless parts of our society.

A Tiny Taste Of The End Of Obscurity

I just ran across this interesting article on Slashdot describing a project to create 3D models of famous landmarks (Statue of Liberty, Notre Dame Cathedral) by algorithmically combining photos posted on Flickr. Apart from the technical coolness of the project, what struck me about the article was their long-term goal of creating full 3D reconstructions of cities by combining the information from billions of online photographs. This project is a perfect illustration of how absurd opposition to projects like Google’s Street View truly is.

While it may have been intellectually obvious before, this sort of project really drives home the fact that increasing computational power and algorithmic advances in computer vision negate the need for any coordinated database. So long as there are enough pictures out there somewhere, the right algorithm can sew them back together and extract whatever information you want out of them. Right now the best we might be able to hope for is a fancy version of Google’s Street View, but the inevitable increase in the amount of online content (webcams, automated picture taking, etc.) and the inexorable progress of the computer industry mean that eventually we will be able to figure out who you are sleeping with[1], where you buy your groceries and even reveal certain health problems.

There is no way around it. Computational advances will eliminate obscurity. The only real question is whether we implement ultimately ineffective laws about ‘privacy’ that will give large organizations with massive computing power an informational advantage until computational power catches up. Anyway, I’m repeating myself so I’m going to stop now.


  1. Look for people who frequently appear in the same vicinity at night and in the morning. 
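The heuristic in the footnote is simple enough to sketch in a few lines of Python. Everything here is hypothetical: it assumes some face recognition step has already turned photos into `(person, place, day, part)` sightings, with a morning sighting labelled by the day of the night before, and the function name and threshold are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def overnight_pairs(sightings, min_nights=2):
    """Count, for each pair of people, the nights on which both were seen at
    the same place at night and again the next morning.

    sightings: iterable of (person, place, day, part), part being "night" or
    "morning"; a morning sighting carries the day of the preceding night.
    """
    present = defaultdict(lambda: defaultdict(set))  # (place, day) -> part -> people
    for person, place, day, part in sightings:
        present[(place, day)][part].add(person)

    counts = defaultdict(int)
    for parts in present.values():
        stayed = parts["night"] & parts["morning"]  # there both times
        for pair in combinations(sorted(stayed), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_nights}


# Hypothetical sightings extracted from tagged photos and webcams.
sightings = [
    ("ann", "elm_st", 1, "night"), ("ben", "elm_st", 1, "night"),
    ("ann", "elm_st", 1, "morning"), ("ben", "elm_st", 1, "morning"),
    ("ann", "elm_st", 2, "night"), ("ben", "elm_st", 2, "night"),
    ("ann", "elm_st", 2, "morning"), ("ben", "elm_st", 2, "morning"),
    ("cat", "elm_st", 2, "night"),
]
pairs = overnight_pairs(sightings)
print(pairs)
```

The hard part is the face recognition and geolocation that produce the sightings in the first place; once those exist, the inference itself is a few set intersections.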

dcphonelist: Legalizing Prostitution One Step At A Time

In an entertaining turn of events four Brandeis alums have pitched in and created a searchable interface to Madam Palfrey’s phone records. If you want to try a number for yourself head on over to dcphonelist.com, and once you are bored of that the story in The Hill about the project is worth a read. Apparently one lobbyist has already been outed through the site but given the difficulty. In case you aren’t familiar with the DC madam case so far, I give a brief summary after the break.

Now some people seem to think that reporting on or distributing this information is immoral as the sex lives of politicians should remain private, and others find this an unpalatable invasion of privacy. Presumably this is the reason that ABC refused to identify any of Palfrey’s non-politician clients. But this is mind-bogglingly hypocritical. I mean, Jesus Christ, the men on this list are faced with potentially losing their jobs or being divorced. Ms. Palfrey is facing prison time. It’s insane to think that prostitution is bad enough to throw Palfrey in jail for it but not bad enough to cause some guys to be embarrassed. Unless the guys calling are on the record as supporting the legalization of prostitution I have no sympathy for their plight.

Every day the government takes away people’s freedom for no other reason than prudish moral disapproval[1]. It is the people who don’t really believe prostitution (or drug use) is that bad (such as the johns) but stay silent out of ambition or fear of censure who are really guilty here, not Madam Palfrey. None of us would defend the person who let an innocent man go to jail rather than reveal he was having an affair, and tacitly supporting the criminalization of prostitution is even worse. You don’t even need to admit you have been to a prostitute to argue for its legalization. Just like homosexuals working for gay-bashing senators, these clients deserve to be punished for their hypocrisy if anyone does, and more importantly we ought to discourage this sort of hypocritical behavior.

If we really knew the names of everyone who used drugs or visited prostitutes they would become legal within the week. I’m hopeful the loss of obscurity (aka privacy) that everyone complains about will bring us to a point where this sort of hypocritical moralizing is no longer possible.


  1. This isn’t to deny there are harms from prostitution or drug use but only that on net there are more harms in banning them than in regulating them. Thus the choice to ban rather than regulate is a choice to hurt people just so you can feel morally righteous. 

Obscurity or Freedom

The notions of privacy and obscurity are often confused. Simply put privacy is the right to do things in secret. Obscurity on the other hand is the ability to do things without them becoming widely associated with you. If someone sneaks into your hotel room and watches you have an affair your privacy has been violated. On the other hand if they just ask people who sat next to you on the bus whether you talked mushy with Suzie on your cell phone it’s only obscurity that you have lost.

This distinction is important to make, as it is privacy that is essential to both personal liberty and a democratic society, not obscurity. It is the fact that the government doesn’t get to see the books we read in our homes that lets us learn what the unpopular side thinks during times like the Red Scare, not the ability to keep a lid on the mundane thrillers we read on the train. Moreover, the success of early American democracy suggests that a free society can live perfectly well without obscurity, as the notion hardly exists in small towns, but that privacy is a bedrock value. This doesn’t mean that the continued erosion of obscurity by technology poses no serious problems (indeed, we may need to take steps to ensure this doesn’t allow the government to peer into our private sphere), only that it is foolish to assume that the only solution is to hold off our loss of obscurity.

This is lucky as in this age of increasing computational power and worldwide information networks maintaining obscurity is incompatible with individual freedom. At the moment we might be able to buttress our failing obscurity by opposing anti-crime cameras in our cities and government attempts to build DNA databases but eventually increasing computational power will bring obscurity directly into conflict with personal freedom.

Even ignoring things like security systems and ATM cameras, I probably walk past at least one public webcam every day, probably more, and who knows when I might appear in the corner of someone’s tourist picture. Someday face recognition will become sufficiently advanced that you will be able to search the internet for pictures that show a particular individual. Should we ban people from posting their vacation photos to Flickr? Do we pass a law against webcams, or require a webcam license? What then happens when wearable computers really become feasible? Should the law stop me from having an assistant in my eyeglasses who reminds me of names I’ve forgotten? Will we abridge free speech by banning me from posting who I run into every day? I could continue but I think the point is clear.

Legally, any attempt to preserve obscurity will run smack dab into the First Amendment. Even if we ban webcams and tyrannically clamp down on what can be done with security cameras, just a small percentage of the population choosing to document who or what they see in public locales would be enough to eliminate obscurity. We can’t ban the practices that will strip our obscurity because they differ only in degree from behavior we think deserves protection (relating your day, posting pictures, citizen journalism a la Rodney King). Free speech will not abide quotas on the number of photos we can post or the number of people we mention having seen that day, leaving us with a choice between freedom and obscurity.

Obviously I think we should opt for freedom. We can partially alleviate the loss of obscurity by stronger privacy protections (giving phone call logs and work email more protection). Other harms are primarily the consequence of differential obscurity rather than the loss of obscurity itself. Ironically, I fear that efforts to prevent the loss of obscurity will leave governments and large corporations, which have the resources to analyze the information themselves, with a leg up on the average citizen who cannot. If we accept that retaining obscurity is a lost cause and put our efforts to productive use ensuring that we retain a substantive private sphere and that the same rules apply to the powerful and powerless, it is likely that the ill effects will be minimal. We had better hope so since we don’t have any choice about it.

UPDATE (7/8/07): Replaced anonymity with obscurity throughout the post. I think this is a much more accurate term.

Crazy Scary

Alright, I don’t go in for all that tinfoil hat type stuff but this is really fucking scary. Watch lists of books that get you visits from government agents, and not even guides to being a suicide bomber or other instructions to cause harm, but Mao’s Little Red Book.

Unless this report turns out to be a politically motivated hoax things are a lot more scary than I thought.

Would a National ID Card Increase our Privacy?

So originally I was going to write a post about the possibilities of electronic and internet voting. After my voting card failed to work yesterday (someone told me each card contains many people’s votes; can anyone confirm this?) I was inspired to write about how electronic and even internet voting can be done right. Despite many people’s preconceptions, some clever mathematical tricks like homomorphic encryption can make the whole system even safer than paper voting. However, any successful internet voting scheme would require a national ID that was also a smart card. So before I go on to talk about voting I wanted to thoroughly address the issue of national IDs.
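To give a flavor of the homomorphic encryption trick, here is a toy sketch. The choice of scheme is my assumption: it uses Paillier encryption, a standard additively homomorphic scheme, with tiny hard-coded primes that are wildly insecure and only there so the arithmetic is easy to follow. The function names and the five sample ballots are hypothetical. The point it illustrates is that encrypted ballots can be combined into an encrypted total without anyone decrypting an individual vote.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # fresh randomness for every ballot
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
ballots = [1, 0, 1, 1, 0]  # hypothetical yes/no votes
encrypted_ballots = [encrypt(n, b) for b in ballots]

encrypted_total = encrypted_ballots[0]
for c in encrypted_ballots[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

total = decrypt(n, private_key, encrypted_total)
print(total)
```

A real voting system would need much more than this, for example proofs that each encrypted ballot really contains a 0 or a 1 and a way to split the private key among several parties, but the tallying step works as shown.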

While many people seem to regard the specter of a national ID card as a serious threat to privacy and perhaps civil liberties, fewer people are aware that the REAL ID Act effectively turns state driver’s licenses into a national ID. This approach has a great many problems. However, just pushing back this legislation wouldn’t do much for privacy or identity theft. Credit cards, driver’s licenses, social security numbers and a hundred other forms of identification are easily used and stolen. Surprisingly, I think the best solution is to create a national ID card designed to respect privacy and prevent identity theft.

Instead of trying to fight a losing battle to prevent universal machine readable identification privacy advocates should get behind a well designed national ID. As I argue below such an ID would actually increase our personal privacy in addition to providing several other benefits. By supporting such a system privacy advocates can make sure it is built to protect privacy not to enable surveillance by law enforcement and corporations. (more…)