For $5 Million You Can Buy Enough Storage to Compete with Google

Kevin Burton calculates that Blekko, one of the barbarian horde storming Google's search fortress, would need to spend $5 million just to buy enough weapons, er, storage.

Kevin estimates storing a deep crawl of the internet would take about 5 petabytes. At a projected $1 million per petabyte that's a paltry $5 million. Less than expected. Imagine in days of old an ambitious noble itching to raise an army to conquer a land and become its new prince. For a fine land, and the search market is one of the richest, that would be a smart investment for a VC to make.

In these situations I always ask: What would Machiavelli do?

Machiavelli taught that some lands are hard to conquer and easy to keep, and some are easy to conquer and hard to keep. A land like France was easy to conquer because it was filled with nobles. You can turn nobles on each other because they always hate each other for some reason or another. But it's hard to keep a land of nobles because they all think they are as good as you are and will continually plot your downfall. The Ottoman empire was hard to conquer because it was led by a single ruler. Everyone owes their wealth and prosperity to that ruler, so subjects, assuming the prince has not turned the people against him, will fight to the death for the existing structure because their future depends on it. To conquer takes an all-out war. But once victorious, the Ottoman empire would be easy to rule because there are no loyalties to drive resistance. It was always a marriage of convenience.

Google is the Ottoman empire. Allegiance is given to Google because people are getting paid. Defeating Google will take total war, assuming the prince has not turned the people against him, but once defeated ruling will be easy.

How might Google keep strengthening the ties that bind to make it harder for a prospective prince? One way might be to prevent subjects from cavorting with potentially corrupting influences outside the land. What if Google were to give greater rewards to websites that changed their robots.txt to reject all other search engines? That would deny all routes into the principality and strengthen ties considerably. A new prince would find it very difficult to break in.

Machiavelli might like that.
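
For the curious, here is a rough sketch of what such a robots.txt could look like and how a crawler would read it. The file contents, the example.com URL, and the crawler name ExampleBot are all made up for illustration; only Python's standard urllib.robotparser module is used to interpret the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: admit Googlebot, turn every other crawler away.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/page.html") -> bool:
    """Return True if ROBOTS_TXT lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot stands in for any search engine that is not Google.
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, "may crawl:", allowed(agent))
```

An empty Disallow line means "nothing is off limits", so the first group lets Googlebot in while the catch-all group shuts everyone else out. A crawler is free to ignore robots.txt, of course, so this only walls off the polite invaders.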

Nice post, glad to hear people are waking up to this :). Actually IMHO it is (far) less than 5 PB - a world class index would be 20B pages times 10KB per page = 200TB. This is for page storage; there would be more for storing the index, i.e. posting lists. It would depend on the size of individual postings and the lengths of posting lists, but a few PB would cover it.

The bottom line is the storage required is very cheap. BTW, $1M/PB = $1/GB seems too high, nowadays cheap SATA 500GB disks can be had for $100.
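
The storage arithmetic in this comment and in the post can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 KB = 1,000 bytes) and uses the figures quoted above; the function names are made up for illustration.

```python
# Assumption: decimal units, so 1 KB = 1e3 bytes and 1 PB = 1e15 bytes.
KB, GB, TB, PB = 1e3, 1e9, 1e12, 1e15


def page_store_bytes(pages: float, bytes_per_page: float) -> float:
    """Raw bytes needed to keep one copy of every crawled page."""
    return pages * bytes_per_page


def cost_usd(total_bytes: float, usd_per_gb: float) -> float:
    """Purchase cost of total_bytes of storage at a flat price per GB."""
    return total_bytes / GB * usd_per_gb


if __name__ == "__main__":
    store = page_store_bytes(20e9, 10 * KB)   # 20B pages at 10 KB each
    disk_price = 100 / 500                    # $100 for a 500 GB SATA disk

    print(f"page store: {store / TB:.0f} TB")
    print(f"5 PB at $1/GB: ${cost_usd(5 * PB, 1.00):,.0f}")
    print(f"5 PB at disk price: ${cost_usd(5 * PB, disk_price):,.0f}")
    print(f"page store at disk price: ${cost_usd(store, disk_price):,.0f}")
```

The disk price covers bare drives only; servers, replication, power, and networking are left out, which is presumably part of why a per-petabyte estimate comes in higher.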

There is (much) more to this besides storage. The crawling resources required are not high at all because crawling basically does not scale. For instance, with a good crawler one can crawl 1M pages/day on 1Mbps of bandwidth, i.e. 1B pages/day with 1Gbps. So with 20Gbps one can crawl the entire Internet daily. 20Gbps of crawling bandwidth goes for $100K/mo in the Valley, and you can saturate it with, say, a thousand cheap crawlers ($1-$1.5K each). I would think that Google spends way more in their cafeteria than that :)
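
Those crawl numbers hold up under a quick sanity check. The sketch below assumes the 10KB-per-page figure from earlier in this comment, a fully saturated link, and no protocol overhead; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def pages_per_day(bits_per_second: float, bytes_per_page: float = 10_000) -> float:
    """Pages fetched per day if the link is saturated with page downloads."""
    bytes_per_day = bits_per_second / 8 * SECONDS_PER_DAY
    return bytes_per_day / bytes_per_page


def fleet_cost_usd(crawlers: int, usd_per_crawler: float) -> float:
    """One-time hardware cost of the crawler machines."""
    return crawlers * usd_per_crawler


if __name__ == "__main__":
    print(f"1 Mbps:  {pages_per_day(1e6):,.0f} pages/day")
    print(f"1 Gbps:  {pages_per_day(1e9):,.0f} pages/day")
    print(f"20 Gbps: {pages_per_day(20e9):,.0f} pages/day")
    print(f"days to fetch 20B pages at 20 Gbps: {20e9 / pages_per_day(20e9):.2f}")
    low, high = fleet_cost_usd(1000, 1_000), fleet_cost_usd(1000, 1_500)
    print(f"crawler hardware: ${low:,.0f} to ${high:,.0f}")
```

At 10KB per page a 1Mbps link moves a little over a million pages a day, so 20Gbps covers 20B pages in just under a day, matching the claim above.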

So one can ask why Google does not crawl much more frequently - simple, webmasters would not allow them to be crawled every 5 min or so, and there is also lots of stuff they simply cannot get to == crawling does not scale.

There is more to this, and the emperor is wearing a very skimpy outfit :) , stay tuned ...

November 29, 1990 | Unregistered Commenter Borislav Agapiev

Who'd have known you were such a military strategist Todd? :) Nice post, reminds me I need to read The Prince!

Cheers - Callum

November 29, 1990 | Unregistered Commenter chmac

> Actually IMHO it is (far) less than 5 PB

I figure Kevin knows about a billion times more than me :-) But your numbers are interesting. At one time those would have been daunting requirements, but not so much now.

> webmasters would not allow them to be crawled every 5 min or so

In a saner world you could just post each change with a sequence number, ping that a change happened, and everyone would pick up all the changes since their last sequence number at their leisure. But I guess that method can't be trusted.
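
To make the idea concrete, here is a toy in-memory sketch of such a change feed. The ChangeFeed class and its method names are hypothetical, not any existing protocol: the site numbers each change, and a crawler asks for everything after the last number it saw.

```python
class ChangeFeed:
    """Append-only log of changed URLs; sequence numbers start at 1."""

    def __init__(self) -> None:
        self._log: list[str] = []

    def publish(self, url: str) -> int:
        """Record that url changed and return its sequence number."""
        self._log.append(url)
        return len(self._log)

    def since(self, last_seen: int) -> tuple[int, list[str]]:
        """Return (latest sequence number, URLs changed after last_seen)."""
        return len(self._log), self._log[last_seen:]


if __name__ == "__main__":
    feed = ChangeFeed()
    feed.publish("/index.html")
    feed.publish("/news.html")

    cursor = 0                                # a crawler that has seen nothing yet
    cursor, changed = feed.since(cursor)
    print(cursor, changed)

    feed.publish("/index.html")               # the front page changes again
    cursor, changed = feed.since(cursor)
    print(cursor, changed)
```

Each crawler only has to remember one integer, and the site never needs to know who is reading. The trust problem is untouched: nothing here stops a site from publishing changes that never happened.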

> say, thousand cheap crawlers

Hm, isn't that about how many servers the wikia search engine has?

November 29, 1990 | Unregistered Commenter Todd Hoff

> Who'd have known you were such a military strategist

Unfortunately my plans for world domination haven't quite worked out. Maybe tomorrow :-)

> I need to read The Prince!

Don't forget his Discourses on Livy. A lot more to think about in that one.

November 29, 1990 | Unregistered Commenter Todd Hoff

Instead of responding in the comments I just posted an item on the Spinn3r blog:

Some interesting thoughts here.

Of course building out a $5M cluster could be a competitive advantage :)

November 29, 1990 | Unregistered Commenter Kevin Burton

If anyone seriously thinks that the biggest investment required for making a Google competitor is the storage, they're seriously deluded.

November 29, 1990 | Unregistered Commenter Sam B

Hmmm... I would argue that Google's key differentiator does not lie in their technical superiority anymore, as can be seen from competitors such as Yahoo and Microsoft, who can produce search results that are as good if not better. It is their brand, their better integrated surfaces, and their vast distribution channels, both in terms of search and ads. Any newcomer has to innovate not only at the technology level but also at the business strategy level.

And now they are trying to do the same thing with Android on the mobile platform. The only difference is that this time it will be a lot harder, as everyone knows the game.

Some very clever people at Google working on their evolution strategy indeed and always keeping one step ahead of everyone else.

November 29, 1990 | Unregistered Commenter Felix

Maybe somebody should open up a donations page? :))))
I would surely donate a couple of $ to see someone slapping Google like in the good old 18th century ;)

November 29, 1990 | Unregistered Commenter Dumitru Brinzan

