Paper: The End of an Architectural Era (It’s Time for a Complete Rewrite)

Todd Hoff's picture

Update 2: H-Store: A Next Generation OLTP DBMS is the project implementing the ideas in this paper: The goal of the H-Store project is to investigate how these architectural and application shifts affect the performance of OLTP databases, and to study what performance benefits would be possible with a complete redesign of OLTP systems in light of these trends. Our early results show that a simple prototype built from scratch using modern assumptions can outperform current commercial DBMS offerings by around a factor of 80 on OLTP workloads.
Update: interesting related thread on Lambda the Ultimate.

A really fascinating paper bolstering many of the anti-RDBMS threads that have popped up on the intertube lately. The spirit of the paper is found in the following excerpt:

In summary, the current RDBMSs were architected for the business data processing market in a time of different user interfaces and different hardware characteristics. Hence, they all include the following System R architectural features:
* Disk oriented storage and indexing structures
* Multithreading to hide latency
* Locking-based concurrency control mechanisms
* Log-based recovery


Of course, there have been some extensions over the years, including support for compression, shared-disk architectures, bitmap indexes, support for user-defined data types and operators, etc. However, no system has had a complete redesign since its inception. This paper argues that the time has come for a complete rewrite.

Of particular interest is the discussion of H-Store, which seems like a nice database for the data center.
H-Store runs on a grid of computers. All objects are partitioned over the nodes of the grid. Like C-Store [SAB+05], the user can specify the level of K-safety that he wishes to have.

At each site in the grid, rows of tables are placed contiguously in main memory, with conventional B-tree indexing. B-tree block size is tuned to the width of an L2 cache line on the machine being used. Although conventional B-trees can be beaten by cache conscious variations [RR99, RR00], we feel that this is an optimization to be performed only if indexing code ends up being a significant performance bottleneck.

Every H-Store site is single threaded, and performs incoming SQL commands to completion, without interruption. Each site is decomposed into a number of logical sites, one for each available core. Each logical site is considered an independent physical site, with its own indexes and tuple storage. Main memory on the physical site is partitioned among the logical sites. In this way, every logical site has a dedicated CPU and is single threaded.
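
To make the excerpt a bit more concrete, here is a rough Python sketch of the execution model it describes: rows are partitioned across logical sites, and each site runs one transaction at a time to completion on its own thread, so its private storage needs no locks. This is only an illustration, not H-Store code. The names `LogicalSite`, `Grid`, `deposit`, and `withdraw` are made up for the example, hash partitioning via `crc32` is an assumption, a plain dict stands in for the B-tree, and Python threads do not actually give each site a dedicated CPU.

```python
import queue
import threading
from zlib import crc32


class LogicalSite:
    """Hypothetical logical site: private storage plus one worker thread."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.rows = {}              # private tuple storage: key -> value
        self.inbox = queue.Queue()  # transactions waiting to run
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        # Only this thread touches self.rows, so no locking is needed.
        # Each transaction runs to completion before the next one starts.
        while True:
            item = self.inbox.get()
            if item is None:        # shutdown signal
                break
            procedure, args, reply = item
            try:
                reply.put(("ok", procedure(self.rows, *args)))
            except Exception as exc:
                reply.put(("error", exc))

    def submit(self, procedure, *args):
        """Queue a transaction; the result arrives on the returned queue."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((procedure, args, reply))
        return reply

    def stop(self):
        self.inbox.put(None)
        self.worker.join()


class Grid:
    """Hypothetical router that partitions keys over the logical sites."""

    def __init__(self, num_sites):
        self.sites = [LogicalSite(i) for i in range(num_sites)]

    def site_for(self, key):
        # Assumed partitioning scheme: a stable hash of the key.
        return self.sites[crc32(str(key).encode()) % len(self.sites)]

    def execute(self, key, procedure, *args):
        status, value = self.site_for(key).submit(procedure, key, *args).get()
        if status == "error":
            raise value
        return value

    def shutdown(self):
        for site in self.sites:
            site.stop()


# Transactions are predefined procedures that run against one site's rows.
def deposit(rows, account, amount):
    rows[account] = rows.get(account, 0) + amount
    return rows[account]


def withdraw(rows, account, amount):
    balance = rows.get(account, 0)
    if balance < amount:
        raise ValueError("insufficient funds")
    rows[account] = balance - amount
    return rows[account]
```

With `grid = Grid(4)`, a call such as `grid.execute("alice", deposit, 100)` is routed to the one site that owns the key `"alice"` and runs there without interleaving with any other transaction on that site. The sketch only covers single-partition transactions; K-safety replication and transactions that span sites are left out.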

The paper goes through how databases should be written with modern CPU, memory, and network resources. It's a fun and interesting read. Well worth your time.

Comments

Re: Paper: The End of an Architectural Era (It’s Time for a Comp

Great paper indeed. Thanks for the link!

Re: Paper: The End of an Architectural Era (It’s Time for a Comp

Fascinating paper, and not particularly daunting for the inexperienced user. I'm not a db admin or db expert by any means, but I was able to follow most of the points and understand the general argument being made.

Callum

Re: Paper: The End of an Architectural Era (It’s Time for a Comp

This is the paper behind a new product called Vertica. The author of the paper, Michael Stonebraker, is a database heavyweight responsible for Ingres and lots of other database technology. Vertica is optimized for data warehousing. Another Stonebraker project, StreamBase, is for things like stock prices that are constantly flooding systems that need super-low latency.

Oh no, not Michael Stonebraker again!

For those who don't know him: Michael Stonebraker already proclaimed the end of relational databases in the 90s, to be supplanted, of course, by products from one of his companies. Go figure!

Re: Paper: The End of an Architectural Era (It’s Time for a Comp

Stonebraker wrote this:

http://www.databasecolumn.com/2007/09/one-size-fits-all.html

Not sure if Vertica is the final answer, but I agree generally with his comments.

Re: Paper: The End of an Architectural Era (It’s Time for a Comp

I reviewed this article and wrote a few blog entries on this matter: Putting the Data-Base where it belongs and Persistence As A Service. The latter describes how we can use in-memory data grids to store the real-time information of our application and keep the database as the back-end storage.

I also wrote another blog entry, The True Meaning of linear scalability, where I covered a few of the core principles for scaling transactional systems from an architectural point of view.

You can also find an online presentation entitled Scalable As Google Simple As Spring, where I describe this new architecture and how it can be used to address the scalability challenge.

HTH
Nati S.

Re: Paper: The End of an Architectural Era (It’s Time for a Comp

Hmm, this group of people published another paper at SIGMOD 2008 titled "OLTP Through the Looking Glass, and What We Found There".

I'm pretty critical about their agenda, though the arguments certainly fly both ways.

However, I found that in their SIGMOD08 paper they pretty much strip away all the goodness that an RDBMS gives you, and then declare the result a decisive improvement in performance. Well, hands up who didn't know that if you remove locking, buffer management, logging, etc. from a DBMS you get huge performance gains.

Uhhhhh...

These papers are interesting, but they don't point the way to greater "scalability."

Their obsession is with getting the maximum throughput possible with a single CPU. Other processors could be used as replication servers to increase availability, but not to increase the throughput at which transactions can be run.

What they're talking about is a limited sort of product: kind of like the next generation of SQLite, Microsoft Access, or SQL Server Compact Edition. Something that gets awesome performance on a single-processor machine, but isn't going to scale to the heights possible with a conventional RDBMS if you were going to throw either a big SMP or a shared-nothing cluster at the problem.

Some of their ideas might point to something more scalable, but they see "single threaded execution" as a fundamental optimization.

There is something appealing about the "all transactions run in stored procedures" model that they use, but I've found that the ability to do ad-hoc and long-running queries is important in both business systems and web publishing systems. They suggest that their kind of system could be integrated into a "data warehouse", but it's not a problem they've solved. Also, there's a big difference between a "data warehouse" system (which usually involves a lot of cleaning and integration of various sources) and the ability to run OLAP queries to see the state of a system in real time.

Re: Paper: The End of an Architectural Era

Well, it seems that you, the writer, have rewritten this article yourself, as the date indicates.
