The key (no pun intended) to understanding how to organize your dataset’s data is to think of each shard not as an individual database, but as one large singular database. Just as in a normal single server database setup where you have a unique key for each row within a table, each row key within each individual shard must be unique to the whole dataset partitioned across all shards. There are a few different ways we can accomplish uniqueness of row keys across a shard cluster. Each has its pro’s and con’s and the one chosen should be specific to the problems you’re trying to solve.
This may be a bit higher level then the general discussion here, but I think this is an important issue in how it relates to reliability and uptime. What kind of SLAs should we be expecting from SaaS services and platforms (e.g. AWS, Google App Engine, Google Premium Apps, salesforce.com, etc.)? Up to today, most SaaS services either have no SLAs or offer very weak penalties. What will it take to get these services up to the point where they can offer the SLAs that users (and more importantly, businesses) require? I presume most of the members here want to see more movement into the cloud and to SaaS services, and I'm thinking that until we see more substantial SLA guarantees, most businesses will continue to shy away as long as they can. Would love to hear what others think. Or am I totally off base?
Successful software design is all about trade-offs. In the typical (if there is such a thing) distributed system, recognizing the importance of trade-offs within the design of your architecture is integral to the success of your system. Despite this reality, I see time and time again, developers choosing a particular solution based on an ill-placed belief in their solution as a “silver bullet”, or a solution that conquers all, despite the inevitable occurrence of changing requirements. Regardless of the reasons behind this phenomenon, I’d like to outline a few of the methods I use to ensure that I’m making good scalable decisions without losing sight of the trade-offs that accompany them. I’d also like to compile (pun intended) the issues at hand, by formulating a simple theorem that we can use to describe this oft occurring situation.
Update:Presentation: Second Life’s Architecture. Ian Wilkes, VP of Systems Engineering, describes the architecture used by the popular game named Second Life. Ian presents how the architecture was at its debut and how it evolved over years as users and features have been added. Second Life is a 3-D virtual world created by its Residents. Virtual Worlds are expected to be more and more popular on the internet so their architecture might be of interest. Especially important is the appearance of open virtual worlds or metaverses. What happens when video games meet Web 2.0? What happens is the metaverse.
- Second Life runs MySQL
- Interview with Ian Wilkes
- TechTrends: Inside Linden Lab
- Town Hall with Cory Linden
- InformationWeek articles (1, 2) and blog
- Second Life Wiki: Server Architecture
- Wikipedia: Second Life Server
- Second Life Blog
- Second Life: A Guide to Your Virtual World
- ~1M active users
- ~95M user hours per quarter
- ~70K peak concurrent users (40% annual growth)
- ~12Gbit/sec aggregate bandwidth (in 2007)
Staff (in 2006)
- 70 FTE + 20 part time
- Open Source client
- Render the Virtual World
- Handles user interaction
- Handles locations of objects
- Gets velocities and does simple physics to keep track of what is moving where
- No collision detection
- Runs Havok 4 physics engine
- Runs at 45 frames/sec. If it can't keep up, it will attempt time dialation without reducing frame rate.
- Handles storing object state, land parcel state, and terrain height-map state
- Keeps track of where everything is and does collision detection
- Sends locations of stuff to viewer
- Transmits image data in a prioritized queue
- Sends updates to viewers only when needed (only when collision occurs or other changes in direction, velocity etc.)
- Runs Linden Scripting Language (LSL) scripts
- Scripting has been recently upgraded to the much faster Mono scripting engine
- Handles chat and instant messages
- One big clustered filesystem ~100TB
- Stores asset data such as textures.
- Eventlet is a networking library written in Python. It achieves high scalability by using non-blocking io while at the same time retaining high programmer usability by using coroutines to make the non-blocking io operations appear blocking at the source code level.
- Mulib is a REST web service framework built on top of eventlet
- 2000+ Servers in 2007
- ~6000 Servers in early 2008
- Plans to upgrade to ~10000 (?)
- 4 sims per machine, for both class 4 and class 5
- Used all-AMD for years, but are moving from the Opteron 270 to the Intel Xeon 5148
- The upgrade to "class 5" servers doubled the RAM per machine from 2GB to 4GB and moved to a faster SATA disk
- Class 1 - 4 are on 100Mb with 1Gb uplinks to the core. Class 5 is on pure 1Gb
Today, most banks have migrated their internal software development from C/C++ to the Java language because of well-known advantages in development productivity (Java Platform), robustness & reliability (Garbage Collector) and platform independence (Java Bytecode). They may even have gotten better throughput performance through the use of standard architectures and application servers (Java Enterprise Edition). Among the few banking applications that have not been able to benefit yet from the Java revolution, you find the latency-critical applications connected to the trading floor. Why? Because of the unpredictable pauses introduced by the garbage collector which result in significant jitter (variance of execution time). In this post Frederic Pariente Engineering Manager at Sun Microsystems posted a summary of a case study on how the use of Sun Real Time JVM and GigaSpaces was used in the context of of a customer proof-of-concept this summer to ensure guaranteed latency per message under 10 msec, with no code modification to the matching engine.
hi, What are the practices followed, tools used to measure session memory requirement per user? Thanks, Unmesh
Every day brings news of either more failures of the financial systems or out-right fraud, with the $50 billion Bernard Madoff Ponzi scheme being the latest, breaking all records. This post provide a technical overview of a solution that was implemented for one of the largest banks in China. The solution illustrate how one can use Excel as a front end client and at the same time leverage cloud computing model and mapreduce as well as other patterns to scale-out risk calculations. I'm hoping that this type of approach will reduce the chances for seeing this type of fraud from happening in the future.
Ringo is an experimental, distributed, replicating key-value store based on consistent hashing and immutable data. Unlike many general-purpose databases, Ringo is designed for a specific use case: For archiving small (less than 4KB) or medium-size data items (<100MB) in real-time so that the data can survive K - 1 disk breaks, where K is the desired number of replicas, without any downtime, in a manner that scales to terabytes of data. In addition to storing, Ringo should be able to retrieve individual or small sets of data items with low latencies (<10ms) and provide a convenient on-disk format for bulk data access. Ringo is compatible with the map-reduce framework Disco and it was started at Nokia Research Center Palo Alto.
This article is a primer, intended to shine some much needed light on the logical, process oriented implementations of database scalability strategies in the form of a broad introduction. More specifically, the intent is to elaborate on the majority of these implementations by example.
The SHOP.COM Cache System is now available at http://code.google.com/p/sccache/ The SHOP.COM Cache System is an object cache system that... * is an in-process cache and external, shared Cache * is horizontally scalable * stores cached objects to disk * supports associative keys * is non-transactional * can have any size key and any size data * does auto-GC based on TTL * is container and platform neutral It was built in-house at SHOP.COM (by me) and has powered our website for years. We are open-sourcing it in the hope that it will be useful to others and to get some help in its maintenance. This is our first open source attempt and we'd appreciate any help and comments.