Serving JavaScript Fast

Cal Henderson writes at "With our so-called "Web 2.0' applications and their rich content and interaction, we expect our applications to increasingly make use of CSS and JavaScript. To make sure these applications are nice and snappy to use, we need to optimize the size and nature of content required to render the page, making sure we’re delivering the optimum experience. In practice, this means a combination of making our content as small and fast to download as possible, while avoiding unnecessarily refetching unmodified resources." A lot of good comments too.

Click to read more ...


Shared filesystem on EC2

Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using glusterfs to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filing it up again with our files from backup. This takes ages, and will take even longer the more files we get. What worries me is that it seems to make each node a point of failure for the entire system. One node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems like you have to take down the whole thing, reconfigure to include the new node, and restart. This kind of defeats the horizontal scaling strategy. We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we backup files to s3 from this machine). The files are mostly thumbnails, but also some larger images and media files. Does anyone have a good solution for sharing files between EC2 nodes? I like the ThruDB [] concept of using the local filesystem as a cache for S3, but I'm not sure if ThruDB is mature enough yet. Or maybe some kind of distributed filesystem built on top of git would work? Any ideas? Thanks! ~rvr

Click to read more ...


Database War Stories #3: Flickr

[Tim O'Reilly] Continuing my series of queries about how "Web 2.0" companies used databases, I asked Cal Henderson of Flickr to tell me "how the folksonomy model intersects with the traditional database. How do you manage a tag cloud?"

Click to read more ...


Database Design 101

I am working on the design for my database and can't seem to come up with a firm schema. I am torn between normalizing the data and dealing with the overhead of joins and denormalizing it for easy sharding. The data is essentially music information per user: UserID, Artist, Album, Song. This lends itself nicely to be normalized and have separate User, Artist, Album and Song databases with a table full of INTs to tie them together. This will be in a mostly read based environment and with about 80% being searches of data by artist album or song. By the time I begin the query for artist, album or song I will already have a list of UserID's to limit the search by. The problem is that the tables can get unmanageably large pretty quickly and my plan was to shard off users once it got too big. Given this simple data relationship what are the pros and cons of normalizing the data vs denormalizing it? Should I go with 4 separate, normalized tables or one 4 column table? Perhaps it might be best to write the data in both formats at first and see what query speed is like once the tables fill up... Another potential issue would be the fact that inserts will be coming in batches of about 500 - 2000+ per user at a time which will be pretty intensive to pull off for the normalized table as there will need to be quite a few selects for each insert due to the fact that the artist, album or song may already be in the database or it may not requiring an insert. What do you all think?

Click to read more ...


Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems. Other excellent resources for learning more about consistent hashing are at:

  • Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
  • Consistent Hashing by Tom White. A good explanation and some actual Java code as an implementation.
  • Programmer’s Toolbox Part 3: Consistent Hashing by Tom Kleinpeter. Another good explanation with an emphasis on useful applications: load distribution on failure, load tuning by capacity, method for bringing servers on line, redundant caching to protect the database in case of failure.
  • Distributed Hash Tables: an infrastructure that can be used to build more complex services, such as distributed file systems, peer-to-peer file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. Notable distributed networks that use DHTs include BitTorrent (with extensions), eDonkey network, YaCy, and the Coral Content Distribution Network.
  • Chord - a peer-to-peer lookup algorithm. It allows a distributed set of participants to agree on a single node as a rendezvous point for a given key, without any central coordination.
  • Dynamo, Amazon's database uses consistent hashing.
  • Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution

    Click to read more ...

  • Monday

    Microsoft's New Database Cloud Ready to Rumble with Amazon

    Update: Zdnet says Ozzie signals Microsoft’s surrender to the cloud. CD ROMs are to the internet as the internet is to the cloud and Microsoft aims to scratch and claw its way into this paradigm shift as well. The gloves are off. The tag line for Microsoft's new SQL Server Data Service is Your Data, Any Place, Any Time. Thems fighten' words. Microsoft is itch'n for a fight! Who will be Amazon's second? The service description: SQL Server Data Services (SSDS) are highly scalable, on-demand data storage and query processing utility services. Built on robust SQL Server database and Windows Server technologies, these services provide high availability, security and support standards-based web interfaces for easy programming and quick provisioning. Sounds like a fast uppercut aimed squarely at SimpleDB's jaw. As a developer what do you need to know?

  • Highly available and highly scalable.
  • Targeted at applications that can tolerate high internet latencies. Primarily read-mostly data sets "storing data that is naturally partitionable into disjoint data sets requiring little or no cross-correlations."
  • Pricing isn't yet available.
  • No announced SLA, though you will be notified of downtime.
  • Access is via REST or SOAP.
  • Data is redundantly stored and backed up.
  • Geo-redundant data copies are maintained. Data is stored in large storage clusters in various Microsoft data centers located across North America. Plan to expand to more areas later.
  • The data model is flexible. No schemas required. "Simply add new attributes to your data set when needed, and the system will automatically store, index, and query your data accordingly."
  • String, numeric, datetime, and boolean data type are supported. All attributes are indexed.
  • The data model is: Customer { SSDS account (1..N) { Authority (1..N) { Container (0..N) { Entity (0..N) - Authorities give billing entities a way to organize their usage for accounting, security and co-location purposes. All containers under a single authority are provisioned within the same data center. As such authorities are the unit of geo-scale and geo-location. For example: Seattle or San Francisco. - Containers create contexts and scope for entity storage and query. For example, within its authorities, operations could choose to assign each member their own container, intended to contain a set of personal data for that member. Containers are the unit of consistency in the Microsoft SSDS service. For example: Autos for Sale, Services Offered. - Entities are the fundamental unit of storage in the system. Entities are a bag of scalars with no enforced type. For example, an individual member‟s jobs, educational institutions, contacts, recommendations, etc. could all be modeled as entities.
  • Data manipulation operations include: - Creation and deletion of containers. There are no updatable container properties. - Creation, replacement, and deletion of entities. - Retrieval of a single container in a serialized format. - Retrieval of a single entity in a serialized format.
  • A text based query language is supported that follow the LINQ pattern for C#. Only complete entities are returned. For example: To query addressed to a container to retrieve all entities in that container having a “City” property equal to “Seattle” and a “State” property equal to “WA” would be written as follows: from e in entities where e[“City”] == “Seattle” && e[“State”] == “WA” select e
  • Resource queries are supported. This URI returns all the entities in that container:
  • Customers will be able to associate entities with blobs which could be accessed as URL addressable resources.
  • SSDS is not just a hosted version of SQL Server. It's SQL Server deconstructed, running on blade servers with SATA drives.
  • A local on-premise version of SSDS will also be available to "allow users to better synchronize between the enterprise and the cloud, especially when handling reporting, analytics and business-intelligence tasks."
  • Not sure if you can do a full text search on a string property.
  • Not sure how large attributes can be. Should I store my 25MB documents as a string field so it can be searched? Or is that too expensive? Or is it even supported? Where Microsoft is behind Amazon is that they don't have EC2 and S3. Microsoft is having you traverse the internet for each database access. Now let''s say you have chunks of large storage stored off in a storage cloud. That's a lot of overhead. A major benefit of AWS is the free bandwidth and higher performance within the AWS cloud. If your application is completely hosted within the cloud the only slow part is from the server to the user's browser. All-in-all SSDS seems very comparable to and competitive with SimpleDB. Hard to say without solid pricing information. But it will be interesting to see how Microsoft's cloud strategy evolves and how we can all benefit from the competition.

    Related Articles

  • Home Page for SSDS
  • The Current Pros and Cons List for SimpleDB
  • Microsoft SQL Server Data Services
  • SQL Server Data Services FAQ
  • Microsoft launches its alternative to Amazon’s SimpleDB

    Click to read more ...

  • Sunday

    Do you have any questions for the Elastra CEO?

    It looks like in the near future I'll have a chance to interview the Elastra CEO. Elastra provides standard databases--MySQL, EnterpriseDB and PostgreSQL-- on top of EC2 and S3. They are selling aggressive pricing, expandable and contactable database resource usage in response to demand, and a simple management and operations interface to well known databases deployed in a cloud. Elastra could be an important option for developers looking for a more traditional cloudy database. I was wondering if you guys had any suggestions for questions you would like answered? What would you like to know about their service? What are you looking for in a cloudy database? What would stop you from adopting it or what would make you decide to adopt it? Any ideas you have would help a lot and will probably be better than anything I have.

    Click to read more ...


    Product: GlusterFS

    Adapted from their website: GlusterFS is a clustered file-system capable of scaling to several peta-bytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware such as x86-64 server with SATA-II RAID and Infiniband HBA). Cluster file systems are still not mature for enterprise market. They are too complex to deploy and maintain though they are extremely scalable and cheap. Can be entirely built out of commodity OS and hardware. GlusterFS hopes to solves this problem. GlusterFS achieved 35 GBps read throughput. The GlusterFS Aggregated I/O Benchmark was performed on 64 bricks clustered storage system over 10 Gbps Infiniband interconnect. A cluster of 220 clients pounded the storage system with multiple dd (disk-dump) instances, each reading / writing a 1 GB file with 1MB block size. GlusterFS was configured with unify translator and round-robin scheduler. The advantages of GlusterFS are: * Designed for O(1) scalability and feature rich. * Aggregates on top of existing filesystems. User can recover the files and folders even without GlusterFS. * GlusterFS has no single point of failure. Completely distributed. No centralized meta-data server like Lustre. * Extensible scheduling interface with modules loaded based on user's storage I/O access pattern. * Modular and extensible through powerful translator mechanism. * Supports Infiniband RDMA and TCP/IP. * Entirely implemented in user-space. Easy to port, debug and maintain. * Scales on demand.

    Related Articles

  • Technical Presentation on GlusterFS
  • Open Fest 5th Annual Conference
  • Zresearch
  • GlusterFS FAQ

    Click to read more ...

  • Saturday

    New Website Design Considerations

    I am in the design phase of getting a website up and running that will have scalability as a main concern. I am looking for opinions on architecture and the like for this endeavor. The site has a few unique characteristics that make scalability difficult. Users will all have a pretty large amount of data that other users will be able to search. The site will be entirely based around search. The catch is that other users will be searching always with a stipulation of 'n' miles from me. I imagine that fact will kill the possibility of query caching for most searches. I have extensive experience with PHP and MYSQL, some experience with ASP.NET/C#, some experience with perl but can learn anything fast. The site will start out on a single server but I want to be 100% certain that I architect the code and databases such that scaling will be simple. What language should I code the site in? What DB would you use: Postgres, MYSQL, MSSQL, BerkelyDB? Should we shard the database by location? by user? not at all? What does everyone think for possible architectures on this?

    Click to read more ...


    Problem: Mobbing the Least Used Resource Error

    A thoughtful reader recently suggested creating a series of posts based on real-life problems people have experienced and the solutions they've created to slay the little beasties. It's a great idea. Often we learn best from great trials and tribulations. I'll start off the new "Problem Report" feature with a diabolical little problem I dubbed the "Mobbing the Least Used Resource Error." Please post your own. And if you know someone with an interesting problem report, please tag them too. It could be a lot of fun. Of course, feel free to scrub your posts of all embarrassing details, but be sure to keep the heroic parts in :-)

    The Problem

    There's an unexpected and frequently fatal type of error that can happen when new resources are added to a horizontally scaled architecture. Because the new resource has the least of something, load or connections or whatever, a load balancer configured with a least metric will instantaneously direct all new traffic to that new resource. And bam! Your system dies. All the traffic that was meant to be spread across your entire cluster is now directed like a laser beam to one small part of it. I love this problem because it's such a Heisenberg. Everyone is screaming for more storage space so you bring up a new filer. All new data streams flow to the new filer and it crumbles and crawls because it can't handle the load for the entire system. It's in the very act of turning up more storage you bring your system down. How "cruel world the universe hates me" is that? Let's say you add database slaves to handle load. Your load balancer redirects traffic to the new slaves, but the slaves are trying to sync, yet they can't sink because they are getting hammered by the new traffic. Down goes Frazier. This is the dark side of partitioning. You partition data to get high performance via parallelization. For example, you hash on the user name to a cluster dedicated to handle those users. Unless your system is very flexible you can't scale anymore by adding resources because you can't repartition the data. All users are handled by their cluster. If you want a different organization you would have to redistribute data across all the clusters. Most systems can't handle that and you end not being able to scale out as easily as you hoped.

    The Solution

    The solution depends of course on the resource in question. Butting knowing a potential problem is present gives you the heads up you need to avoid destruction.
  • For filers migrate storage from existing filers to the new filers so storage is evened out. Then new storage will be allocated evenly across all the filers.
  • For services have a life cycle state machine indicating when a service is up and ready for work. Simply being alive doesn't mean it's ready.
  • Consistent Hashing to assign resources to a pool of servers in a scalable fashion.
  • For servers use random or round-robin balancing when the load balancer can receive incorrect feedback from pool servers. The Thundering Herd Problem is supposedly the same problem described here, but it doesn't seem the same to me.

    Click to read more ...