Patents and Invention Types

Many people working in the software industry seem to be coming to the conclusion that patents do more harm than good to their industry and therefore advocate abolishing software patents. The reasons they feel this way are pretty apparent. There are so many obvious1 patents, like the Eolas patent or Amazon’s one-click patent, that any major piece of software is probably more likely to infringe on one than not. Something is pretty clearly wrong when people are spending more time worrying about accidentally infringing on someone’s patent than struggling with the problems the patented inventions solve. However, not all software patents are so unreasonable.

Consider Google’s (well, Stanford’s) PageRank patent. Recognizing that one could create a useful measure of a page’s importance by summing up the importance of the pages linking to it, and realizing that this could be efficiently computed using tricks from linear algebra, was anything but trivial. Indeed, this kind of non-trivial application of mathematics to an ill-defined problem (return the best search results) is a prototypical example of the sort of discovery that benefits from patent protection. Society can derive great benefits from these kinds of discoveries, and money will lure people with the expertise to solve these problems away from pure mathematics or the sciences, but without the ability to patent the discovery the financial incentives wouldn’t exist2. While most software patents are more like the 1-click patent than the PageRank patent, there is no shortage of real world problems that are crying out for a similarly brilliant solution.
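To make the "sum up the importance of the pages linking to you" idea concrete, here is a rough sketch of the usual iterative way of computing such a score. It is only an illustration, not the patented method: the function name, the toy link graph, the damping factor of 0.85 and the fixed iteration count are all my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate page importance by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
```

In this toy graph c ends up with the highest score, since two pages point at it, and b ends up lowest, since nothing links to it.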

In the face of examples like Google’s PageRank patent it’s tempting to say that software patents are just dandy and the real problem is obvious patents. Unfortunately, it’s not that simple. Consider a hypothetical patent on the use of an LRU (least recently used) cache for texture data in an MMORPG client3. It’s certainly obvious in the sense that any decently skilled software developer would consider that solution if he was asked to solve the problem, but until you actually try this design it’s not clear that an LRU cache would work. Maybe players don’t backtrack much, so an LRU cache would simply waste resources on rooms the player won’t see again for a while. At least in this example it seems clear that simply noticing that a fairly obvious approach solved the problem shouldn’t warrant a patent. The cost of actually testing out the ‘discovery’ is quite low and usually there are only a few obvious approaches to try.
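To show just how cheap the experiment is, here is a minimal sketch of an LRU texture cache that also counts hits and misses. The class name, the `load` callback and the room trace are all made up for illustration; a real client would load actual texture data and replay real player movement logs.

```python
from collections import OrderedDict


class LRUTextureCache:
    """Keep the most recently used textures, evicting the least recently used."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, texture_id, load):
        """Return the texture, calling load(texture_id) only on a miss."""
        if texture_id in self._items:
            # Mark as most recently used.
            self._items.move_to_end(texture_id)
            self.hits += 1
            return self._items[texture_id]
        self.misses += 1
        texture = load(texture_id)
        self._items[texture_id] = texture
        if len(self._items) > self.capacity:
            # Drop the least recently used entry.
            self._items.popitem(last=False)
        return texture


# Hypothetical trace of rooms a player walks through.
cache = LRUTextureCache(capacity=2)
for room in ["a", "b", "a", "c", "b"]:
    cache.get(room, load=lambda texture_id: "texture for " + texture_id)
hit_rate = cache.hits / (cache.hits + cache.misses)
```

Replaying a recorded trace like this and looking at the hit rate is the whole ‘discovery’: if players rarely backtrack the hit rate stays low and the cache isn’t earning its memory.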

However, such a rule would be totally unworkable in another industry where there might be a vast array of potential approaches that experts in the field would agree seemed promising but the cost of investigating them is quite high. For instance, in the pharmaceutical industry everyone might realize that the compounds in a certain large class are promising candidates to treat depression but actually evaluating each of these compounds for efficacy and safety is very costly. Incentivizing drug development requires that we let the pharmaceutical company patent their discovery that compound 5043A1 actually works to treat depression.

Ultimately I think the real problem stems from the fact that we are lumping two very different kinds of invention into the patent system. There is the first type of invention, like the google PageRank system, that represents a flash of inspiration to try something that no one else thought of and then there is the second type of invention that consists of the discovery that some potential solution really works. Ideally the patent system would protect the first kind of discovery pretty broadly but only protect the second sort of discovery in industries where it requires considerable resources to ascertain which of many potential solutions succeeds.


  1. Used as the normal language term not the legal term of art. 

  2. Sure, inventing this kind of algorithm might land you a decent programming job or a nice faculty appointment in CS but that’s no reason to spend time working on these problems rather than pursuing an academic career in physics or math. 

  3. You keep the data about how recently seen objects look around in case you see them again. 

Computation Eliminates Obscurity

There is an interesting new paper (pdf) from some researchers at Google making its way around the tech news sites that outlines some of the ways that clever computer programs could use the data we reveal on social networks, blogs and other online communities to undermine our expectations of pseudo-anonymity in surprising ways. Now of course if you can automatically connect an individual to their online identities people lose their obscurity. Your employer will be able to discover that you are gay or learn about the time you flashed Bourbon Street during a college Mardi Gras trip. This paper doesn’t say anything very surprising if you’ve already been convinced by my prior arguments about the impossibility of maintaining obscurity (usually called anonymity1) in the information age.

To summarize briefly the google researchers pointed out that by comparing your friends on different social networking sites or data mining comments you or your associates leave on blogs it will frequently be possible to associate your pseudonymous identity to your official identity (name). While it says nothing particularly surprising the paper is interesting for vividly demonstrating how easy it is for people to have their pseudo-anonymity stripped. It is also interesting for the responses it suggests to these dangers.
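As a toy illustration of the friend-comparison idea (my own sketch, not the technique from the paper), one could score how much a pseudonymous account’s friend list overlaps with the friend lists of named accounts and pick the best match. The function names and data are hypothetical, and the sketch assumes the friends can already be identified across sites, which is itself a nontrivial step.

```python
def jaccard(first, second):
    """Overlap between two collections: |intersection| / |union|."""
    first, second = set(first), set(second)
    if not first and not second:
        return 0.0
    return len(first & second) / len(first | second)


def best_match(pseudonym_friends, named_accounts):
    """Return (name, score) for the named account with the most similar friends.

    `named_accounts` maps a real name to that account's friend list.
    """
    scored = (
        (name, jaccard(pseudonym_friends, friends))
        for name, friends in named_accounts.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical data: friend lists on a site where people use real names.
named = {
    "Alice Smith": ["bob", "carol", "dave"],
    "Eve Jones": ["frank", "carol"],
}
# Friend list of a pseudonymous account on some other site.
match = best_match(["bob", "carol", "dave", "zed"], named)
```

Here the pseudonymous account shares three of four friends with "Alice Smith", so she comes out as the likely owner. Anything serious would need many more signals, but the basic comparison really is this simple.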

To combat the risk of a friend’s trackback accidentally connecting your official and pseudonymous identities the researchers suggest automated link analysis to warn users when data mining might allow third parties to learn more about them or their friends than they intended. Presumably the idea is that some kind of automated warning would tell you before you added a trackback to your friend’s blog that might connect his blogging handle and real name. Similarly they suggest providing users with a tool to warn them when information they reveal on myspace might allow someone to associate their myspace and twitter accounts.

These suggested countermeasures are interesting not because they are workable but because they are so horribly flawed. Warnings about unintended information exposure are only as good as the current generation of data mining techniques but once published information can’t be put back in the bottle2. When the Netflix dataset was published it was thought to be impractically difficult if not impossible to connect rental histories with individuals but researchers developed a new technique that allowed them to do just that. Moreover, this incident also demonstrates that even trivial pieces of information like what movies you like or your favorite TV show can help connect your pseudonymous and official identities. Each time you wanted to answer a question for an internet quiz or a compatibility test for online dating you would have to study the report warning of the information that could be inferred from this data and your friends would have to be just as cautious on your behalf as unmasking them would likely unmask you.

Indeed, even if you never set foot online it would be enough for someone to analyze the people who claim to be your friend and their answers to questions like “Do you have any gay friends?” to discover you are gay3. Even if your friends are willing and conscientious enough to avoid ever mentioning their favorite movies on their livejournal because of the drunk post you made five years ago revealing you and doglover69 were friends, you still aren’t safe. Complete strangers can unmask you by revealing trivial information about your friends. Separate posts on different sites revealing the favorite movies of your four friends could be compared with your blog post about the day your supportive friends each brought over their favorite movie and watched them all with you after learning you were gay. And these are only the inferences that are simple enough for people to easily imagine. By integrating all sorts of statistical information from social networks, comments by people who don’t even know your friends could unmask you. The situation becomes completely hopeless when you consider other tools like stylometry that, with a proper search tool, might allow your employer to search for blogs with similar linguistic style to yours.

Even though the authors of the paper must realize how weak these techniques are they still can’t accept (or believe their audience can’t) that information technology fundamentally changes the nature of social interaction in a large society. I suppose this shouldn’t be surprising as we have seen the same kind of response when other technologies have fundamentally altered the social ‘economics.’ Just as before the invention of the printing press each copy of a book required substantial effort to produce so too did finding out about other parts of someone’s life require great cost or effort, e.g., hiring a private investigator. The printing press changed the equation so that one person’s labour in setting up the press could cheaply distribute that information to large populations and similarly data mining reduces the marginal cost of discovering public but obscure information (what you did at that party) to nearly zero. Only one person needs to come up with the clever algorithm to ferret out yet more information from our online activities and everyone can now mine that information.

It’s hopeless to imagine that we will stop revealing any personal information about ourselves or our friends online. We are evolutionarily hard wired to signal our preferences, opinions, subcultural affinities (pot smoker, party girl, player, slacker, bear/twink4) and sexual daring as well as to gossip about the behavior and sexual couplings of our friends. The idea that teens or adults will avoid advertising their sexual attractiveness, social status, or scandalous behavior online makes the idea that people only have sex inside marriage sound plausible. I mean a major reason that we flash the bar during spring break, go streaking across campus, cross dress at a party or engage in other scandalous public behavior on vacation is to advertise ourselves as fun, sexually daring, brave or whatever else, so it’s absurd to think we won’t distribute this advertisement in the social context in which we wish to project that image. The very point of sharing that information is to build social connections and portray who we are (or want to be), so inevitably enough information will be revealed to unmask all but the most reclusive or paranoid individuals’ social networking accounts and blogs, and what gets revealed will include drug use, sexual kinks, and how trashed we got at that party. It’s time to accept the fact that the era of obscurity is coming to an end and to start working on how to deal with it. At least pot will probably finally be decriminalized.


  1. True anonymity is still possible, perhaps even easier. Political dissidents who are willing to go to great lengths to hide their real identity and impose a total barrier between their secret and non-secret activities can retain anonymity. Nothing stops people from keeping secrets. What will become impossible is to reveal things in public forums of one kind (at a party in New Orleans) and count on the obscurity of this information to prevent your coworkers from discovering it. 

  2. What, are you going to make it illegal for people to archive public pages on the internet? 

  3. If you examine the friends of your friends and discover that in that population claiming to be your friend greatly raises the proportion claiming to have a gay friend it’s a good bet you are gay, or at least your friends think you are. 

  4. Referring to particular gay sexual stereotypes, analogous to say being a sporty girl or a manly man but more sexual. 

The Singularity and the Nature of Intelligence

The capability of computers and our ability to program them seems to be increasing exponentially. Even if we hit a brick wall in terms of increased miniaturization and frequency our CS knowledge seems sure to continue building on itself. It stands to reason that within the next century we will have the ability to build computers, or at least augment our own brains, to create entities smarter than ourselves (whether or not you think they will have experiences). But if our creations are smarter than us then, barring any limit imposed by fundamental physics, one would think they could improve on our design and design another generation that was even smarter. These machines (or augmented humans) would soon reach transcendent levels of intelligence and change our society beyond recognition.

At least this is (more or less) the notion of the Singularity as popularized by Vernor Vinge and Ray Kurzweil. For more details I recommend reading Vinge himself or checking out one of Kurzweil’s many interviews and talks (audio) as well as his webpage. These are certainly two very smart individuals who have the rare ability to look beyond the specifics and take a fairly clear headed look at how technology will transform society. But smart doesn’t mean infallible and predicting the future is a notoriously difficult business.

While I used to find the arguments for the singularity convincing I’m now much more skeptical. In particular it seems the argument for the singularity rests on a misconception of intelligence. I mean it seems obvious to us that if someone was significantly smarter than us they would be significantly better at designing intelligent computers or human augmentation. But that’s because we both assume that intelligence is some kind of fully general ability to solve problems and conflate intelligence with technical skill and achievement. After all we rarely see people’s raw IQ scores so we tend to simply call people intelligent if they are especially capable in technical fields or other academic endeavors. However, while intelligence is certainly helpful much of what makes for a good scientist or engineer is their store of accumulated experience, both personal and distilled into formal education.

While it does seem that people’s ability at a wide range of reasoning tasks is substantially correlated this doesn’t mean talking about intelligence makes sense for anyone but biologically natural humans. It seems quite plausible that there is no such thing as general reasoning ability. Rather there are only heuristics applicable to certain types of problems, e.g., ability to do mental rotations, solve crosswords, recognize objects etc. Yet if so there is no reason to believe that there is any good heuristic for designing good heuristics; in fact it seems downright unlikely. Thus just because we were able to find a collection of heuristics that give rise to something better at math and chess than us doesn’t mean we should expect it to have a substantially easier time discovering better heuristics for the next generation. Sure, we will probably be able to create beings who can remember more numbers, do CAD drawings in their heads and so forth but the singularity requires an exponential (or at least super-linear) increase in capability over time so mere elimination of minor inefficiencies we have at AI design isn’t sufficient.

Even in mathematics people primarily reason inductively. We don’t blindly search for a formal proof; rather, we try the same techniques we’ve seen work in ‘similar’ problems in the past and attempt minor modifications. In other words what makes someone a good mathematician is largely their mental collection of heuristics they use to approach problems. While continued miniaturization of computer chips might enable AI to reduce the time it takes to do mathematics, pure increases in computational speed may already be near the physically practical limit (though going 3D and using light should eventually give a few more orders of magnitude) and certainly this effect wouldn’t be sufficient to create the singularity. Thus it seems the singularity requires an exponentially improving sequence of better and better heuristics to guess the true theory based on limited data. In other words a more effective form of scientific induction.

That is, people currently use some heuristic to guess at a rule underlying a set of observations. We make some finite number of observations about disease occurring among families living near certain wells and hypothesize that disease can be spread through the water. We observe some examples of current generated by metal exposed to various frequencies of light and hypothesize that light must come in quantized units. The singularity seems to require that not only is there a heuristic that lets us make equally effective guesses at the true theory based on less information but that there is an exponentially increasing sequence of such heuristics. Moreover, it would be necessary that each heuristic can discover the next in roughly the same amount of time despite the substantially greater performance each subsequent heuristic requires. Frankly, I find this somewhat implausible.

Privacy For The 21st Century

So today on slashdot I ran across a link to law professor Daniel Solove’s article grappling with the “nothing to hide” argument against privacy protections. He certainly has some thought provoking things to say and his new book will likely be interesting but I think he makes some fundamental errors in his approach to the subject. Nevertheless, reading it did inspire me to better formulate some of my thoughts on the subject.

The problem with Solove’s arguments is that he tries to simultaneously argue for the value of privacy while seemingly rejecting the notion that there is any principled commonality to the values that we place under the rubric of privacy. While both of these notions are plausible on their own they are in significant tension with each other. If indeed privacy is a word like ‘game’, famously analyzed by Wittgenstein to be a hodgepodge of different concepts related only by a chain of analogies, then it’s at best pointless and confusing to defend it as a package and at worst a way to smuggle in values you can’t defend using the cover of an unprincipled linguistic grouping. Unless the values we term privacy have some important principled commonality then they should stand or fall on their own merits rather than riding the coat tails of the vague positive connotations we have with the word privacy.

To see that privacy isn’t really a monolithic notion, notice that the idea that other people shouldn’t be able to easily find out your social security number really doesn’t have much to do with the idea that the government shouldn’t be able to monitor your phone calls and reading habits. These two notions don’t really have very much in common. One of them is concerned with other people’s knowledge of your intimate affairs and private conversations while the other involves only a purely arbitrary identifying number. The reason we don’t want people to find out our social security number isn’t because it’s an intimate detail of our life but because it’s unfortunately used as an authentication method for certain financial transactions and we fear becoming the victims of credit fraud. Certainly it’s important that people not be able to buy a car in my name but arguments that defend my right to be free of government surveillance aren’t going to have much to say about who finds out my social security number and vice versa.

However, I do think there is a certain core concept that is shared by many, though far from all, things we conceptualize as a right to privacy. That is the notion that we should enjoy a certain autonomy or freedom of choice, both from the government and society, in how we conduct certain parts of our lives. Certainly this is no definition of even one kind of privacy but I think it’s the uncritical acceptance that it’s literally privacy that’s important that sidetracks so many people into silly issues like what facebook publishes by default on their friend feed1. The reason I tend to be largely critical of privacy crusaders is that they tend to take the idea too literally and fight a lost cause trying to limit what other people are able to learn about you (endangering free speech….and privacy2 along the way) rather than looking for the underlying value privacy provides for the culture and seeing how best to achieve that end in the information age.

Ultimately what privacy provides is the freedom from judgment (be it legal, religious or social) about certain aspects of our lives. It does this both by making it practically difficult to enforce certain kinds of invasive laws (thus discouraging their enactment) as well as keeping your porn collection or wild spring break party a secret from your parents/priest/boss. Both of these mechanisms are endangered by the information age. The traditional protections of 4th amendment law border on uselessness in the face of fancy data mining programs to suggest likely offenders, the amount of information out there on the internet (your friends and neighbors gossip…and may take infrared pictures of your house even if the police can’t), and the huge amount of information we store on computers (police can subpoena your ISP’s business records or get access to your entire computer if they have probable cause to see even one document). Similarly search programs and the inevitable advent of facial recognition along with people’s tendency to post pictures to the internet will erase the anonymity you might have once had on spring break.

However, I think we can find replacements for these tools that provide the same benefits in the information age. Just as some other cultures have done, we need to develop traditions of ignoring (or at least not scolding people for) certain aspects of their lives. This is the reason that unequal loss of privacy/anonymity is so much more dangerous than an equitable loss. Everyone has things that might embarrass them or present a less than professional image and if we all know that these can easily be found we are much more likely to let other people have their personal space as well. The legal aspect will be more difficult but it is also achievable. We will need to shift the focus of our protections away from the guarding of information and towards rules against intrusiveness. Perhaps in addition to rules requiring search warrants we could have rules barring unprompted investigation, i.e., rules that prevent tearing someone’s life up for a crime without a particularized identification of a victim who does/would have wanted an investigation. That’s just a shot in the dark but I suspect something better will be found.


  1. Certainly it can be annoying to find out your Christmas surprise was ruined because facebook changed the defaults and the wrong defaults can make facebook an unpleasant place to visit but sub-optimal site design is a concern for facebook shareholders hardly an issue of grave concern. If people are bothered enough it’s not like you can’t just quit using facebook. 

  2. Ironically if you want to stop people from doing the kind of information retrieval and processing that scares the privacy advocates you would have to violate people’s privacy to do it. After all if my internet usage is unmonitored and what I do with my computer is my own business you can’t prevent me from gathering data, analyzing it and even discreetly sharing it with my friends. 

A Sane Version Of Trusted Computing

Should you control your own computer?

That’s the question that opponents of trusted computing want us to ask. But that’s just as misleading as the suggestion that trusted computing will eliminate piracy, thereby bringing about a digital paradise. A better, more accurate question to ask is:

Should you be able to offer proof that this result is the output of running that program?

Stated this way the issue of trusted computing becomes much clearer. Obviously, other things being equal, it would be desirable to be able to prove the information you are submitting really did result from the execution of a particular program. For instance this would allow you to purchase processor cycles without the fear of false results or to trust calculations performed by other clients in a distributed virtual world. Moreover, like other technologies it would surely offer benefits that we can’t yet imagine. Below the break I explain why DRM opponents and open source advocates should get behind this useful technology rather than leaving it to be falsely identified with DRM and standardized in the worst possible way.


Keeping Track Of Posts: Using Smart Folders With Places

Update:

This has been broken in the most recent nightly builds. I think I’ll wait till things settle down before I try and figure out how to do it again.

So keeping track of the various comments I leave on blogs around the web has always been a challenge. For a while I tried using the service coComment but it’s hard to believe how astonishingly bad supposedly professional web developers can be at creating simple javascript tools. I could either install their bloated extension that would run their code on every webpage I visited (which I think made a call back to coComment on any page that looked like it might be a blog) or run a crappy bookmarklet that didn’t work very well. I’m sure the service works for some people but it’s yet another example of a bad system created because the authors thought everyone would want to use it in exactly one way (or should use it that way). I’m much happier with co.mments but sometimes I just want to remember a page/post without placing it in the list of discussions I’m following, or keep track of several types of pages (discussions and neat products), or I just want these links to be conveniently accessible from my toolbar.

With Firefox 3’s places support I’ve finally found a great solution. Say I want a folder that lists the 15 most recent pages I bookmarked with the tag ‘comments’. First I need to have bookmarked at least one page with that tag, at which point I find the id of the folder “comments” under the tag folder. I did this by using the sqlite manager plugin for firefox and opening the places.sqlite database in my profile folder. (I think it may also be possible to do this by opening bookmarks.postplaces.html). You’ll know that you’ve found the right entry if it has the same name as the tag you’re interested in and has a parent of 4 (it’s in the tags folder). Once you’ve found the id of the tag folder you are interested in, create a new bookmark and enter the following for its location:

place:folder=ID&queryType=1&group=3&sort=4&applyOptionsToContainers=1&maxResults=15

Where you should replace ‘ID’ by the id of the folder for the tag in question. This will then create a smart folder that will display the 15 bookmarks that were most recently visited with the tag in question. You can investigate other options and work out other nifty queries to try using this post from the mozillazine forums. In particular if you change the sort from 4 to 12 it will instead list the 15 most recently bookmarked pages with the given tag.
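If you end up making several of these smart folders, a few lines of Python can assemble the location string for you. This is just a convenience sketch: the function and constant names are mine, the folder id 42 is made up, and the only parameter meanings it relies on are the ones described above (sort 4 for most recently visited, sort 12 for most recently bookmarked).

```python
from urllib.parse import urlencode

# Sort values as described in the post; the constant names are my own.
SORT_MOST_RECENTLY_VISITED = 4
SORT_MOST_RECENTLY_BOOKMARKED = 12


def build_place_query(folder_id, max_results=15, sort=SORT_MOST_RECENTLY_VISITED):
    """Build the place: location for a smart folder over one tag folder."""
    params = [
        ("folder", folder_id),
        ("queryType", 1),
        ("group", 3),
        ("sort", sort),
        ("applyOptionsToContainers", 1),
        ("maxResults", max_results),
    ]
    return "place:" + urlencode(params)


# 42 is a hypothetical tag folder id; use the one you found in places.sqlite.
visited_query = build_place_query(42)
bookmarked_query = build_place_query(42, max_results=10, sort=SORT_MOST_RECENTLY_BOOKMARKED)
```

Paste the resulting string into the location field of a new bookmark exactly as you would the hand-written version.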

Note that I found that I had to restart Minefield in order for it to recognize any changes in the querystring. The list itself will update as you add new bookmarks with the appropriate tag but if you decide that it should display only the 10 most recent rather than 15 you may have to restart your browser for it to recognize the change.

Finally Someone Gets Privacy Right

I was originally inspired to think about the whole privacy issue when I heard that David Brin argued that it was the uneven loss of privacy that was the threat, not the loss of privacy itself. I didn’t bother to actually read what he had said until today but unsurprisingly he has some pretty interesting views on the subject.

What was surprising, however, was to see someone else who had a reasonable take on the whole ‘privacy’ issue, especially linked from slashdot.

While the author seems reluctant to make the leap, the article flirts with the two critical points in the ‘privacy’ debate. First of all that true obscurity/freedom from recording is a lost cause and secondly that the real danger is from unequal erosion of our obscurity. So long as we only see footage of ‘crooks’ or the surveillance cameras are only placed in minority/poor neighborhoods it’s easy to use the substantial difference between what society officially designates as acceptable and how people actually behave against the most powerless parts of our society.


A Tiny Taste Of The End Of Obscurity

I just ran across this interesting article on slashdot describing a project to create 3D models of famous landmarks (Statue of Liberty, Notre Dame Cathedral) by algorithmically combining photos posted on Flickr. Apart from the technical coolness of the project what struck me about the article was their long term goal of creating full 3D reconstructions of cities by combining the information from billions of online photographs. This project is a perfect illustration of how absurd opposition to projects like Google’s street view truly is.

While it may have been intellectually obvious before, this sort of project really drives home the fact that increasing computational power and algorithmic advances in computer vision negate the need for any coordinated database. So long as there are enough pictures out there somewhere the right algorithm can sew them back together and extract whatever information you want out of them. Right now the best we might be able to hope for is a fancy version of google’s street view but the inevitable increase in the amount of online content (webcams, automated picture taking etc.) and the inexorable progress of the computer industry means that eventually we will be able to figure out who you are sleeping with1, where you buy your groceries and even reveal certain health problems.

There is no way around it. Computational advances will eliminate obscurity. The only real question is whether we implement ultimately ineffective laws about ‘privacy’ that will give large organizations with massive computing power an informational advantage until computational power catches up. Anyway I’m repeating myself so I’m going to stop now.


  1. Look for people who frequently appear in the same vicinity at night and in the morning. 

Say No To ‘Do Not Track’ List

I’m frequently frustrated by the silly imprecise concerns people have about ‘privacy’, particularly in relation to technology. Not only do most people who express concern about this issue lack any clear theory about why a loss of privacy would be a bad thing, they don’t even bother to distinguish the concept of privacy from other related concepts like obscurity. Ironically, while many people who claim to be worried about privacy would, if pressed, cite some Orwellian concern about the government or corporations using information about us to control what we can say, few people seem to be upset at the government’s continuing attempts to do just that. While I know that in a country where a sizeable fraction of the populace is convinced we need laws to protect us from certain combinations of sounds rational consideration is unlikely to ever make a difference, it’s still a fun game to play so I’ll take a look at the recently proposed ‘Do Not Track’ list.

Now I agree with Mr. Harper over at The Technology Liberation Front (TLF) when he observes that a ‘Do Not Track’ list isn’t really analogous to the ‘Do Not Call’ list. Targeted advertising is a practice which increases the efficiency of the voluntary exchange of your time/eyeballs for web content while telemarketing is a practice with a significant externality (your time/annoyance) that isn’t paid for by the advertiser. In short targeted advertising makes us better off while telemarketing makes us worse off.

I’m slightly sympathetic to the concern that some people have about advertising companies like doubleclick/google tracking their visits to third party websites. However, despite the ridiculous claim that mozilla developers would never do anything about this problem because they get advertising revenue (from google for the default search box), there are a fuckton of privacy and anonymity extensions for people to use. If people don’t even care enough to go install a browser extension, or to convince the firefox people to include such a feature by default (like the popup blocker), then the intrinsic cost of the legislation outweighs these minor benefits. Worse, ignorant of the benefits they get from the practice, many people are likely to sign up as the result of scary sound bites about “being watched.”

While I think some of the more extreme worries presented over at the TLF are unrealistic, I do think the concept of a ‘Do Not Track’ list raises free speech concerns. While I think it’s surely within the scope of congressional power to require corporations to have privacy policies and abide by them, or to impose liability for data breaches, requiring someone to delete or avoid recording freely given information is more troublesome. Surely the government could not pass a law preventing any unauthorized individual from retaining financial data on members of congress; that would bar any journalistic inquiry into fraud charges against congressmen. You might think a rule that applied only to normal business records would avoid these problems. However, I think the recent revelations about who is editing wikipedia, e.g., congressional staffers editing their patron’s page, are a clear example of how user tracking data can be necessary to speak on matters of public concern.

More generally, the first amendment, if it is to have any force, must be read to protect the creation of notes and the collection of data. Hell, the creation of a note is itself an act of speech. The mere fact that you might now have lots and lots of data as a result of the information revolution can’t change that. If I as a blogger have first amendment protection for announcing that so-and-so visited my blog, or that someone with cookie blah-blah-blah did so, it’s hard to see how I couldn’t also have first amendment protection for conveying that information in bulk to someone like google.

Now I admit that the supreme court’s analysis of this issue might be significantly different. I do tend to be a bit of a free speech absolutist. However, regardless of legality, I think it’s damn important for us to retain the right to record what happens to us or to objects we control, and to analyze or pass on that information. If that means people have to go to a bit of trouble to remain anonymous, that’s a small price to pay for information freedom. Ultimately technology will erase the obscurity we’ve enjoyed for the last 100 years or so since we moved into cities. The only question is whether we get democratic access to the analyzed data and the right to use the same tools to monitor the government that it uses against us, or whether we adopt a hierarchical model where the government knows everything and the rest of us are barred from accumulating information in databases.

Dumb Remark Of The Day: Google’s Search Broken

Some idiot from CNET being interviewed on the radio just claimed that google’s search algorithm is broken because it returns tens or hundreds of thousands of results for many queries. According to them, someone will eventually come along and offer an algorithm so elegant that it only gives eight results and “I will love every one of them.” I haven’t heard much today, so this qualifies for the dumb remark of the day.

Go into a library and ask a librarian for a book on the Spanish civil war and she’ll probably recommend one or two. Go back and tell her those weren’t what you were looking for and she’ll recommend a few more. Unless the librarian’s patience wears out you can keep doing this for a long time and the better the librarian the longer she will be able to keep suggesting more books. It’s not a feature of the librarian that we have to walk all the way back over to the reference desk to ask for more recommendations. It would be strictly better if we could hit the next page button and get the next set of results.

Of course a real librarian would incorporate feedback from our reactions to the previous recommendations (yeah, this is closer but not quite what I was looking for), and this is an obvious direction for search engines to explore, but it’s not clear it would be good for the majority of searches (in most cases the overhead of giving feedback might outweigh the cost of simply scanning the output or changing your search). However, it’s just idiotic to think that merely offering the user the chance to look at more pages means google’s algorithm is broken.