Wednesday, July 7, 2010

Strategy: Recompute Instead of Remember Big Data

Professor Lance Fortnow, in his blog post Drowning in Data, says complexity has taught him this lesson: when storage is expensive, it is cheaper to recompute what you've already computed. And that's the world we now live in: storage is pretty cheap, but data acquisition and computation are even cheaper.

Jouni, one of the commenters, thinks the opposite is true: storage is cheap, but computation is expensive. When you are dealing with massive data, the size of the data set is very often determined by the amount of computing power available for a certain price. Because the data grows along with the hardware, a linear-time algorithm finishes in O(1) seconds, while a quadratic-time algorithm requires O(n) seconds. And as computing power increases exponentially over time, the quadratic algorithm gets exponentially slower.
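To make that claim explicit (my own sketch; n, C, c1, and c2 are notation I'm introducing, not Jouni's): if the data set is sized to the available compute budget C, so that n grows in proportion to C, then

    n \propto C \;\Longrightarrow\; T_{\text{linear}} \approx \frac{c_1 n}{C} = O(1) \text{ seconds}, \qquad T_{\text{quadratic}} \approx \frac{c_2 n^2}{C} = O(n) \text{ seconds}.

In other words, the linear pass stays roughly constant in wall-clock time as hardware and data grow together, while the quadratic pass keeps getting slower.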

For me it's not a matter of which position is true; both can be. What's interesting is to think of storage and computation as, in some cases, fungible: your architecture can decide which tradeoffs to make based on the cost of resources and the nature of your data. I'm not sure, but this seems like a new degree of freedom in the design space.
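To make the tradeoff concrete, here is a minimal sketch in Python (the data set, function names, and cache policy are all hypothetical, not from the post): the same derived value can either be remembered after the first computation or recomputed from the raw data on every request.

    import functools
    import hashlib

    # Stand-in for a large raw data set that we keep around anyway.
    RAW_DATA = [b"record-%d" % i for i in range(100_000)]

    def derive(prefix: bytes) -> str:
        # The "expensive" derived value: hash every record matching a prefix.
        digest = hashlib.sha256()
        for record in RAW_DATA:
            if record.startswith(prefix):
                digest.update(record)
        return digest.hexdigest()

    # "Remember": compute once per distinct prefix, pay storage for the cache.
    derive_remembered = functools.lru_cache(maxsize=None)(derive)

    # "Recompute": pay CPU on every call, store nothing beyond the raw data.
    derive_recomputed = derive

    print(derive_remembered(b"record-9"))  # first call computes, later calls hit the cache
    print(derive_recomputed(b"record-9"))  # every call rescans the raw data

Which variant wins depends on the hit rate, the size of the derived results, and the relative price of storage and compute where you run; the point is only that the choice is a dial the architecture can turn.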

Reader Comments (4)

I have never come across a case where recalculating is more expensive overall than remembering. Most of the problems come from the fact that stored results can go out of date and become inconsistent, e.g. a cache that has to be flushed. So recalculation is cheaper to maintain on the people side, not on the machine side, and storage is the other way around.

July 7, 2010 | Unregistered CommenterBogdan Gusiev

Storage is cheap but managed storage is extremely expensive. Also, the limiting factor is often I/O bandwidth, which has not been keeping pace with the increases in areal density. It takes much longer to back up a 2TB drive than it used to with previous generations.

July 7, 2010 | Unregistered CommenterFazal Majid

The question here is phrased in terms of "data-center size" or "cloud-size" efficient application architecture. We have some big data acquisition, processing, and distribution job, and we want to resolve our logistical difficulties with minimal expense and as quickly as possible. I think we will start to see a lot of new SQL/NoSQL hybrid architectures drawing inspiration from classical algorithmic research on space vs. time trade-offs. Some of the most specialized Software as a Service business models are simply large-scale implementations of particular algorithms (encryption, encoding, compression, pattern matching, maybe various forms of optimization in the future). Luckily there's tons of research out there waiting to be developed commercially.

http://scholar.google.com/scholar?q=space+vs.+time+tradeoffs&hl=en&btnG=Search&as_sdt=40000001&as_sdtp=on

July 7, 2010 | Unregistered CommenterAlexander Fairley

O(1) is constant. O(n) is linear. Jouni is talking out of his behind.

July 8, 2010 | Unregistered Commenterthiago
