Blogs

Todd Hoff's picture

Welcome to High Scalability

We started High Scalability to help you build successful scalable websites. This site tries to bring together all the lore, art, science, practice, and experience of building scalable websites into one place so you can learn how to build your system with confidence. Hopefully this site will move you further and faster along the learning curve of success. Please Start Here.

Todd Hoff's picture

It Must be Crap on Relational Dabases Week

It's hard to be a relational database lately. After years of faithful service everywhere you look the world is turning against you:

  • Recently at the NoSQL conference 150 revolutionaries met with their new anti-RDBMS arms suppliers. And you know what happens when revolutionaries are motivated, educated, funded, and well armed.
  • The revolution has gone mainstream when Computerworld writes No to SQL? Anti-database movement gains steam. It's not just whispers anymore, it's everywhere.
  • And perennial revolutionary Michael Stonebraker runs from blog to blog shouting the The End of a DBMS Era (Might be Upon Us). Relational vendors are selling legacy software, are 50x slower than other alternatives, and that can not stand.
  • The Greek Chorus on Hacker News sings of anger and lies.

    Certainly some say stick with the past. It's your fault, you aren't doing it right, give us another chance and all will be as it ever was. Some smirk saying this is nothing but a return to a more ancient time when IBM was King.

    But it's in the air. It's in the code. A revolution is coming. To what? That is what is not yet clear.

    See also:

  • NoSQL? by Curt Monash.
  • CouchDB says BigTable clones are too complex.
  • Yahoo! Developer Network Blog. Very nice summary of the different talks.

  • Podcast about Facebook's Cassandra Project and the New Wave of Distributed Databases

    In this podcast, we interview Jonathan Ellis about how Facebook's open sourced Cassandra Project took lessons learned from Amazon's Dynamo and Google's BigTable to tackle the difficult problem of building a highly scalable, always available, distributed data store.

    inquiry's picture

    GemFire 6.0: New innovations in data management

    GemStone has unveiled GemFire 6.0 which is the culmination of several years of development and the continuous solving of the hardest data management problems in the world. With this release GemFire touts some of the latest innovative features in data management.

    In this release:

    JeffD's picture

    kngine 'Knowledge Engine' milestone 2

    What is Kngine?

    Kngine is Knowledge Web search engine designed to provide meaningful search results, such as: semantic information about the keywords/concepts, answer the user’s questions, discover the relations between the keywords/concepts, and link the different kind of data together, such as: Movies, Subtitles, Photos, Price at sale store, User reviews, and Influenced story

    bnewport's picture

    Dealing with multi-partition transactions in a distributed KV solution

    I've been getting asked about this a lot lately so I figured I'd just blog about it. Products like WebSphere eXtreme Scale work by taking a dataset, partitioning it using a key and then assigning those partitions to a number of JVMs. Each partition usually has a primary and a replica. These 'shards' are assigned to JVMs. A transactional application typically interacts with the data on a single partition at a time. This means the transaction is executed in a single JVM. A server box will be able to do M of those transactions per second and it scales because N boxes does MN (M multiplied by N) transactions per second. Increase N, you get more transactions per second. Availability is very good because a transaction only depends on 1 of the N servers that are currently online. Any of the other (N-1) servers can go down or fail with no impact on the transaction. So, single partition transactions can scale indefinitely from a throughput point of view, offer very consistent response times and they are very available because they only point a small part of the grid at once.

    Hans-Jürgen Schönig's picture

    Scaling PostgreSQL using CUDA

    Combining GPU power with PostgreSQL

    PostgreSQL is one of the world's leading Open Source databases and it provides enormous flexibility as well as extensibility. One of the key features of PostgreSQL is that users can define their own procedures and functions in basically any known programming language. With the means of functions it is possible to write basically any server side codes easily.

    Now, all this extensibility is basically not new. What does it all have to do with scaling and then? Well, imagine a world where the data in your database and enormous computing power are tightly integrated. Imagine a world where data inside your database has direct access to hundreds of FPUs. Welcome to the world of CUDA, NVIDIA's way of making the power of graphics cards available to normal, high-performance applications.

    When it comes to complex computations databases might very well turn out to be a bottleneck. Depending on your application it might easily happen that adding more CPU power does not improve the overall performance of your system – the reason for that is simply that bringing data from your database to those units which actually do the computations is ways too slow (maybe because of remote calls and so on). Especially when data is flowing over a network, copying a lot of data might be limited by network latency or simply bandwidth. What if this bottleneck could be avoided?

    Wolfram|Alpha Architecture

    Making the world's knowledge computable

    Today's Wolfram|Alpha is the first step in an ambitious, long-term project to make all systematic knowledge immediately computable by anyone. You enter your question or calculation, and Wolfram|Alpha uses its built-in algorithms and growing collection of data to compute the answer.

    Answer Engine vs Search Engine

    When Wolfram|Alpha launches later today, it will be one of the most computationally intensive websites on the internet. The Wolfram|Alpha computational knowledge engine is an "answer engine" that is able to produce answers to various questions such as

    • What is the GDP of France?
    • Weather is Springfield when David Ortiz was born
    • 33 g of gold
    • LDL vs. serum potassium 150 smoker male age 40
    • life expectancy male age 40 finland
    • highschool teacher median wage

    Wolfram|Alpha excels at different areas like mathematics, statistics, physics, engineering, astronomy, chemistry, life sciences, geology, business and finance as demonstrated by Steven Wolfram in his Introduction screencast.

    The Stats

    • Abour 10,000 CPU cores at launch
    • 10+ trillion of pieces of data
    • 50,000+ types of algorithms
    • Able to handle about 175 million queries per day
    • 5+ million lines of symbolic Mathematica code

    The Computers Powering Computable Knowledge

    There is no way to know exactly how much traffic to expect, especially during the initial period immediately following the launch, but the Wolfram|Alpha team is working hard to put reasonable capacity in place.

    As Stephen writes in the Wolfram|Alpha blog Alpha will run in 5 distributed colocation facilities. What computing power have they gathered in these facilities for launch day? Two supercomputers, just about 10,000 processor cores, hundreds of terabytes of disks, a heck of a lot of bandwidth, and what seems like enough air conditioning for the Sahara to host a ski resort.

    One of their launch partners, R Systems, created the world’s 44th largest supercomputer (per the June 2008 TOP500 list - it is listed as 66th per the latest Top500 list). They call it the R Smarr. It will be running Wolfram|Alpha on launch day! R Smarr has a Sum Rmax of 39580 GFlops using Dell DCS CS23-SH, QC HT 2.8 GHz computers, 4608 cores, 65536 GB of RAM and Infiniband interconnect.

    Dell is another of the launch partners with a data center full of quad-board, dual-processor, quad-core Harpertown servers. What does it all add up to? The ability to handle 175 million queries (yielding maybe a billion) per day—over 5 billion queries (encompassing around 30 billion calculations) per month.

    Serving 250M quotes/day at CNBC.com with aiCache

    As traffic to cnbc.com continued to grow, we found ourselves in an all-too-familiar situation where one feels that a BIG change in how things are done was in order, the status-quo was a road to nowhere. The spending on HW, amount of space and power required to host additional servers, less-than-stellar response times, having to resort to frequent "micro"-caching and similar tricks to try to improve code performance - all of these were surfacing in plain sight, hard to ignore.

    While code base could clearly be improved, the limited Dev resources and having to innovate to stay competitive always limits ability to go about refactoring. So how can one go about addressing performance and other needs without a full blown effort across the entire team ? For us, the answer was aiCache - a Web caching and application acceleration product (aicache.com).

    Syndicate content