Tuesday, January 31, 2006

Hacks versus Technique

Some of the key insights behind early web-based tools were extremely clever ways of solving a problem without solving it directly. For example, image search can be done very well by looking at image tags, filenames, and the text surrounding the HTML that points to the image. Audio and video content can be searched via the closed-caption text that comes with it. And how do you tell whether two songs are similar? The mother of all methods: collaborative filtering. It works great with almost anything, and in most cases you are interested in finding something relevant rather than something similar. With all due respect, let us call these methods Hacks. They are great hacks, mind you.
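To make the "don't solve it directly" idea concrete, here is a minimal sketch of item-to-item collaborative filtering. It is an illustrative assumption, not how any particular service does it: the listening histories are made up, and song similarity is taken to be the cosine similarity of the songs' listener sets. Note that no audio is ever examined; two songs count as "similar" only because the same people played them.

```python
from math import sqrt

# Hypothetical listening histories: user -> set of songs they played.
# Similarity below comes only from who listened to what, never from the audio.
plays = {
    "alice": {"song_a", "song_b", "song_c"},
    "bob":   {"song_a", "song_b"},
    "carol": {"song_b", "song_c", "song_d"},
    "dave":  {"song_d"},
}

def listeners(song):
    """Return the set of users who played the given song."""
    return {user for user, songs in plays.items() if song in songs}

def similarity(song_x, song_y):
    """Cosine similarity between two songs' listener sets (binary vectors)."""
    a, b = listeners(song_x), listeners(song_y)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def most_similar(song):
    """Rank all other songs by how much their audiences overlap with `song`."""
    all_songs = set().union(*plays.values())
    others = all_songs - {song}
    return sorted(others, key=lambda s: similarity(song, s), reverse=True)

print(most_similar("song_a"))  # ['song_b', 'song_c', 'song_d']
```

This is also why the Hack favors relevance over similarity: `song_b` ranks first for `song_a` because it shares an audience, whether or not it sounds anything alike.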

One could do image processing, or audio signal processing, or figure out representations that help compute relevance. Let's call these approaches, which attack the problem directly, Technique. But Technique was easily beaten by Hacks, because 1) its methods don't work as well in general, and 2) they require too many computational resources and/or a rich representation of the problem domain.

But it seems like Technique is making a comeback. Some examples: Nexidia.com is a profitable company that uses speech recognition to make audio databases searchable. Its main source of revenue is call-center analysis. There is a lot of processing involved, but it runs on clusters built from standard boxes. Pandora.com searches through music by directly analyzing the music itself. The insight common to both of these tools is to do the heavy processing offline, creating representations of the domain that are easy to search.
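The "process offline, search cheaply online" pattern can be sketched as follows. This is a toy under stated assumptions, not Nexidia's actual system: the transcripts are invented and stand in for the output of an expensive batch step such as speech recognition, and the searchable representation is a plain inverted index from words to recordings.

```python
from collections import defaultdict

# Hypothetical output of an expensive offline step (e.g. speech recognition
# run in batch over recorded calls). These transcripts are made up.
transcripts = {
    "call_001": "i want to cancel my account",
    "call_002": "question about my bill",
    "call_003": "please cancel the extra charge on my bill",
}

def build_index(docs):
    """Offline: map each word to the set of recordings that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Online: return recordings containing every query word, via set lookups."""
    words = query.split()
    if not words:
        return set()
    # dict.get does not trigger defaultdict's factory, so unknown words yield set().
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(transcripts)
print(sorted(search(index, "cancel bill")))  # ['call_003']
```

The costly work happens once, ahead of time; each query afterwards touches only a few small sets, which is what makes the Technique approach practical on commodity hardware.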

At this point, Hacks are still winning, but it's nice to see Technique back in the running. Progress is a healthy competition between Hacks and Technique.