Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using glusterfs to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filing it up again with our files from backup. This takes ages, and will take even longer the more files we get. What worries me is that it seems to make each node a point of failure for the entire system. One node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems like you have to take down the whole thing, reconfigure to include the new node, and restart. This kind of defeats the horizontal scaling strategy. We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we backup files to s3 from this machine). The files are mostly thumbnails, but also some larger images and media files. Does anyone have a good solution for sharing files between EC2 nodes? I like the ThruDB [http://trac.thrudb.org/] concept of using the local filesystem as a cache for S3, but I'm not sure if ThruDB is mature enough yet. Or maybe some kind of distributed filesystem built on top of git would work? Any ideas? Thanks! ~rvr
[Tim O'Reilly] Continuing my series of queries about how "Web 2.0" companies used databases, I asked Cal Henderson of Flickr to tell me "how the folksonomy model intersects with the traditional database. How do you manage a tag cloud?"
I am working on the design for my database and can't seem to come up with a firm schema. I am torn between normalizing the data and dealing with the overhead of joins and denormalizing it for easy sharding. The data is essentially music information per user: UserID, Artist, Album, Song. This lends itself nicely to be normalized and have separate User, Artist, Album and Song databases with a table full of INTs to tie them together. This will be in a mostly read based environment and with about 80% being searches of data by artist album or song. By the time I begin the query for artist, album or song I will already have a list of UserID's to limit the search by. The problem is that the tables can get unmanageably large pretty quickly and my plan was to shard off users once it got too big. Given this simple data relationship what are the pros and cons of normalizing the data vs denormalizing it? Should I go with 4 separate, normalized tables or one 4 column table? Perhaps it might be best to write the data in both formats at first and see what query speed is like once the tables fill up... Another potential issue would be the fact that inserts will be coming in batches of about 500 - 2000+ per user at a time which will be pretty intensive to pull off for the normalized table as there will need to be quite a few selects for each insert due to the fact that the artist, album or song may already be in the database or it may not requiring an insert. What do you all think?
Paper: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
Consistent hashing is one of those ideas that really puts the science in computer science and reminds us why all those really smart people spend years slaving over algorithms. Consistent hashing is "a scheme that provides hash table functionality in a way that the addition or removal of one slot does not significantly change the mapping of keys to slots" and was originally a way of distributing requests among a changing population of web servers. My first reaction to the idea was "wow, that's really smart" and I sadly realized I would never come up with something so elegant. I then immediately saw applications for it everywhere. And consistent hashing is used everywhere: distributed hash tables, overlay networks, P2P, IM, caching, and CDNs. Here's the abstract from the original paper and after the abstract are some links to a few very good articles with accessible explanations of consistent hashing and its applications in the real world. Abstract: We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems. Other excellent resources for learning more about consistent hashing are at:
Update: Zdnet says Ozzie signals Microsoft’s surrender to the cloud. CD ROMs are to the internet as the internet is to the cloud and Microsoft aims to scratch and claw its way into this paradigm shift as well. The gloves are off. The tag line for Microsoft's new SQL Server Data Service is Your Data, Any Place, Any Time. Thems fighten' words. Microsoft is itch'n for a fight! Who will be Amazon's second? The service description: SQL Server Data Services (SSDS) are highly scalable, on-demand data storage and query processing utility services. Built on robust SQL Server database and Windows Server technologies, these services provide high availability, security and support standards-based web interfaces for easy programming and quick provisioning. Sounds like a fast uppercut aimed squarely at SimpleDB's jaw. As a developer what do you need to know?
It looks like in the near future I'll have a chance to interview the Elastra CEO. Elastra provides standard databases--MySQL, EnterpriseDB and PostgreSQL-- on top of EC2 and S3. They are selling aggressive pricing, expandable and contactable database resource usage in response to demand, and a simple management and operations interface to well known databases deployed in a cloud. Elastra could be an important option for developers looking for a more traditional cloudy database. I was wondering if you guys had any suggestions for questions you would like answered? What would you like to know about their service? What are you looking for in a cloudy database? What would stop you from adopting it or what would make you decide to adopt it? Any ideas you have would help a lot and will probably be better than anything I have.
Adapted from their website: GlusterFS is a clustered file-system capable of scaling to several peta-bytes. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware such as x86-64 server with SATA-II RAID and Infiniband HBA). Cluster file systems are still not mature for enterprise market. They are too complex to deploy and maintain though they are extremely scalable and cheap. Can be entirely built out of commodity OS and hardware. GlusterFS hopes to solves this problem. GlusterFS achieved 35 GBps read throughput. The GlusterFS Aggregated I/O Benchmark was performed on 64 bricks clustered storage system over 10 Gbps Infiniband interconnect. A cluster of 220 clients pounded the storage system with multiple dd (disk-dump) instances, each reading / writing a 1 GB file with 1MB block size. GlusterFS was configured with unify translator and round-robin scheduler. The advantages of GlusterFS are: * Designed for O(1) scalability and feature rich. * Aggregates on top of existing filesystems. User can recover the files and folders even without GlusterFS. * GlusterFS has no single point of failure. Completely distributed. No centralized meta-data server like Lustre. * Extensible scheduling interface with modules loaded based on user's storage I/O access pattern. * Modular and extensible through powerful translator mechanism. * Supports Infiniband RDMA and TCP/IP. * Entirely implemented in user-space. Easy to port, debug and maintain. * Scales on demand.
I am in the design phase of getting a website up and running that will have scalability as a main concern. I am looking for opinions on architecture and the like for this endeavor. The site has a few unique characteristics that make scalability difficult. Users will all have a pretty large amount of data that other users will be able to search. The site will be entirely based around search. The catch is that other users will be searching always with a stipulation of 'n' miles from me. I imagine that fact will kill the possibility of query caching for most searches. I have extensive experience with PHP and MYSQL, some experience with ASP.NET/C#, some experience with perl but can learn anything fast. The site will start out on a single server but I want to be 100% certain that I architect the code and databases such that scaling will be simple. What language should I code the site in? What DB would you use: Postgres, MYSQL, MSSQL, BerkelyDB? Should we shard the database by location? by user? not at all? What does everyone think for possible architectures on this?
A thoughtful reader recently suggested creating a series of posts based on real-life problems people have experienced and the solutions they've created to slay the little beasties. It's a great idea. Often we learn best from great trials and tribulations. I'll start off the new "Problem Report" feature with a diabolical little problem I dubbed the "Mobbing the Least Used Resource Error." Please post your own. And if you know someone with an interesting problem report, please tag them too. It could be a lot of fun. Of course, feel free to scrub your posts of all embarrassing details, but be sure to keep the heroic parts in :-)