BerkeleyDB & other distributed high performance key/value databases

I currently use BerkeleyDB (http://www.oracle.com/database/berkeley-db/) as an embedded database, a decision originally prompted by learning that Google used BerkeleyDB for their universal sign-on feature.

Lustre looks impressive, but their white paper cites around 800 file creations per second as a good number. By contrast, BerkeleyDB on my Mac mini does 200,000 row creations per second, and with replication it can be used much like a distributed file system.
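
For context, the measurement is nothing fancier than a tight insertion loop against a local btree. A rough sketch of what I mean (file name, key format, and record count are illustrative, and I'm opening a plain non-transactional database, which is the cheapest case):

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <db.h>   /* BerkeleyDB C API */

    int main(void) {
        DB *dbp;
        DBT key, data;
        char kbuf[32], vbuf[64];
        int i;

        /* Create and open a plain btree database file (no env, no txns). */
        if (db_create(&dbp, NULL, 0) != 0)
            return 1;
        if (dbp->open(dbp, NULL, "bench.db", NULL, DB_BTREE, DB_CREATE, 0664) != 0)
            return 1;

        clock_t start = clock();
        for (i = 0; i < 200000; i++) {
            snprintf(kbuf, sizeof(kbuf), "key-%d", i);
            snprintf(vbuf, sizeof(vbuf), "value-%d", i);
            memset(&key, 0, sizeof(key));
            memset(&data, 0, sizeof(data));
            key.data = kbuf;  key.size = (u_int32_t)strlen(kbuf) + 1;
            data.data = vbuf; data.size = (u_int32_t)strlen(vbuf) + 1;
            dbp->put(dbp, NULL, &key, &data, 0);   /* one "row creation" */
        }
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("%d puts in %.2f s (%.0f/s)\n", i, secs, i / secs);

        dbp->close(dbp, 0);
        return 0;
    }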

I'm having I/O scalability issues with BerkeleyDB on a single machine, and I'm about to implement its distributed replication feature (and go multi-machine), which in effect makes it work like a distributed file system but with local access speeds. That's why I was looking at Lustre.
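
The setup I'm planning uses the replication manager API that ships with recent BerkeleyDB releases. Something along these lines, though the host names and ports are placeholders and the exact repmgr calls have changed between BDB versions, so treat this as a sketch rather than a working config:

    #include <db.h>

    /* Sketch: one replicated environment per machine; BDB keeps a full
     * local copy of the data and streams updates to the other sites. */
    int open_replicated_env(DB_ENV **envp, const char *home, int priority) {
        DB_ENV *env;
        int ret;

        if ((ret = db_env_create(&env, 0)) != 0)
            return ret;

        /* This site's listener, plus one known peer (placeholder addresses). */
        env->repmgr_set_local_site(env, "node1.example.com", 5001, 0);
        env->repmgr_add_remote_site(env, "node2.example.com", 5001, NULL, 0);
        env->rep_set_priority(env, priority);   /* weight in master elections */

        /* Replication requires the full transactional environment. */
        ret = env->open(env, home,
                        DB_CREATE | DB_RECOVER | DB_THREAD |
                        DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL |
                        DB_INIT_TXN | DB_INIT_REP, 0);
        if (ret != 0)
            return ret;

        /* Start the repmgr threads and let the sites elect a master. */
        ret = env->repmgr_start(env, 3, DB_REP_ELECTION);
        if (ret != 0)
            return ret;

        *envp = env;
        return 0;
    }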

The key architectural difference between BerkeleyDB and Lustre is that BerkeleyDB keeps a complete copy of all the data on each computer, which makes it unsuitable for truly massive databases. However, if your total possible data is under 1 TB (i.e., fits on one disk), it seems to me that a replicated local key/value database is the fastest solution.

I haven't found much discussion of people using this kind of technology for highly scalable web sites.

Over the years, I've had extremely good performance results with dbm files, and have found that nothing beats local data, accessed through C APIs, with btree or hash table implementations. I have never tried replicated/redundant versions of this approach, though, and I'm curious whether others have, and what your experience has been.
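
For anyone who hasn't used the dbm family: the entire C API is essentially open/store/fetch against a local hash file, which is why it's so hard to beat. A minimal ndbm-style sketch (file name and keys are arbitrary; gdbm's ndbm compatibility layer and BerkeleyDB's look nearly identical):

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <ndbm.h>   /* classic dbm API */

    int main(void) {
        DBM *db;
        datum key, val, out;

        /* Open (or create) a local hash-table database file. */
        db = dbm_open("example", O_RDWR | O_CREAT, 0664);
        if (db == NULL)
            return 1;

        key.dptr = "user:42";           key.dsize = strlen("user:42");
        val.dptr = "alice@example.com"; val.dsize = strlen("alice@example.com");

        /* Store a record, then read it straight back from the local file. */
        dbm_store(db, key, val, DBM_REPLACE);
        out = dbm_fetch(db, key);
        if (out.dptr != NULL)
            printf("%.*s\n", (int)out.dsize, (char *)out.dptr);

        dbm_close(db);
        return 0;
    }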