Paper: Brewer's Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Services
Abstract: When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.
If you are a startup you may find useful Guy Kawasaki's post Financial Models for Underachievers: Two Years of the Real Numbers of a Startup. Part of any business plan are the projected guestimates. They are guestimates because everyone keeps these numbers hidden like a Swiss bank account. But not Redfin. They've bravely shared their initial cost projections, their actual numbers from real life, and the lessons they've learned from the discrepancy between the two... You can find their model estimates and actuals for Rent, Per Employee, Per Month (model: $250, actual: $336); Initial Per-Employee Equipment Cost; Monthly Benefits, Per-Employee; Annual Payroll Tax; Quarterly Bonus Payout, as a % of the Total Possible; Annual Payroll Increase for Existing Employees; All-Company Meeting Cost, Per-Meeting, Per-Employee; Annual Accounting Costs, and a few more. There is also a great lessons section: Focus on headcount; Plan slow, run fast; Run top-down sanity-checks; Forget economies of scale; Admit that revenues are a mystery; Build from building blocks; Take out "hope"; Flag your assumptions; Hit $100 million in revenues within five years; Keep market-share under 20%. I find $100 million in revenues a surprisingly high number. That's a lot of money. And the underestimate for meeting costs is pretty funny. It's always those damn meetings!
Fotolog, a social blogging site centered around photos, grew from about 300 thousand users in 2004 to over 11 million users in 2007. Though they initially experienced the inevitable pains of rapid growth, they overcame their problems and now manage over 300 million photos and 800,000 new photos are added each day. Generating all that fabulous content are 20 million unique monthly visitors and a volunteer army of 30,000 new users each day. They did so well a very impressed suitor bought them out for a cool $90 million. That's scale meets success by anyone standards. How did they do it? Site: http://www.fotolog.com/
I have a new memcached user to add to your list: we here at Fotolog, the world's largest photo blogging community, now use it and we love it. I just rolled our first code to use it into production today and it has been a lifesaver. I can't wait to start using it in places where we had been relying on Berkeley databases to offload some database work. We are not some wimpy million page a day site, either. Fotolog is a billion+ pages/month site (35 to 40 million views/day is pretty typical for us). We had recently overcome some significant DB-related performance issues which allowed our site traffic to explode, and it started to bog down again under the heavy traffic load (getting back up towards 10 seconds for a page to load sometimes during the peak periods). The servers were churning away each recreating a list every time when it could easily be shared in the same form for at least 5 or 10 minutes. So we introduced memcache, creating a distributed 30-server cluster with 4 gigs available in total and made a very minor code mod to use memcache, and our peak period load times dropped back down to the 2 second or so range. It has allowed for continued growth and incredible efficiency. I can't say when I've ever been so pleased with something that worked so simply."
SmugMug's CEO & Chief Geek Don MacAskill smugly (hard to resist) gushes over finally finding, after a long and arduous quest, their "best bang-for-the-buck storage array." It's the Dell MD300. His in-depth explanation of why he prefers the MD3000 should help anyone with their own painful storage deliberations. His key points are: The price is right; DAS via SAS, 15 spindles at 15K rpm each, 512MB of mirrored battery-backed write cache; You can disable read caching; You can disable read-ahead prefetching; The stripe sizes are configurable up to 512KB; The controller ignores host-based flush commands by default; They support an ‘Enhanced JBOD’ mode. His reasoning for the desirability each option is astute and he even gives you the configuration options for carrying out the configuration. This is not your average CEO. Don also speculates that a three tier system using flash (system RAM + flash storage + RAID disks) is a possible future direction. Unfortunately, flash may not be the dream solution it has been thought to be. StorageMojo talks about this in Flash vs disk at DISKCON 2007.
My company is developing a centralized web platform to service our clients. We currently use about 3Mb/s on our uplink at our ISP serving web pages for about 100 clients. We'd like to offer them statistics that mean something to their businesses and have been contemplating writing our own statistics code to handle the task. All statistics would be gathered at the page view level and we're implementing a HttpModule in ASP.Net 2.0 to handle the gather of the data. That said, I'm curious to hear comments on writing this data (~500 bytes of log data/page request). We need to write this data somewhere and then build a process to aggregate the data into a warehouse application used in our reporting system. Google Analytics is out of the question because we do not want our hosting infrastructure dependant upon a remote server. Web Trends et al. are too expensive for our clients. I'm thinking of a couple of options. 1) Writing log data directly to a SQL Server 2000 db and having a Windows Service come in periodically to summarize and aggregate the data to the reporting server. I'm not sure this will scale with higher load and that the aggregation process will timeout because of the number of inserts being sent to the table. 2) Write the log data to a structure in memory on the web server and periodically flush the data to the db. The fear here is that the web server goes down and we lose all the data in memory. Other fears are that the IIS processes and worker threads might mangle one another when contending for the memory system resource. 3) Don't use memory and write to a file instead. Save the file handler as an application variable and use it for all accesses to the file. Not sure about threading issues here as well and am reluctant to use anything which might corrupt a log file under load. 4) Add comment data to the IIS logs. This theoretically should remove the threading issues but leaves me to think that the data would not be terribly useful once its in the IIS logs. The major driver here is that we do not want to use any of the web sites and canned reports built into 90% of all statistics platforms. Our users shouldn't have to "leave" the customer care portal we're creating just to see stats for their sites. IFrames are not an option. I'm looking for a solution that's not entirely complex, nor is it overly expensive and it will give me the access to the data we need to record on page views. It has to scale with volume. Thoughts are appreciated. Derek
There's a new clustered file system on the spindle: Kosmos File System (KFS). Thanks to Rich Skrenta for turning me on to KFS and I think his blog post says it all. KFS is an open source project written in C++ by search startup Kosmix. The team members have a good pedigree so there's a better than average chance this software will be worth considering. After you stop trying to turn KFS into "Kentucky Fried File System" in your mind, take a look at KFS' intriguing feature set:
Sequoia is a transparent middleware solution offering clustering, load balancing and failover services for any database. Sequoia is the continuation of the C-JDBC project. The database is distributed and replicated among several nodes and Sequoia balances the queries among these nodes. Sequoia handles node and network failures with transparent failover. It also provides support for hot recovery, online maintenance operations and online upgrades.
Features in a nutshell
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
If you have a lot of static content to store and you aren't looking forward to setting up and maintaining your own giganto SAN, maybe you can push off a lot of the hard lifting to a CDN? Jesse Robbins at O'Reilly Radar posts that you have a lot more options now because the number of Content Distribution Networks have doubled since last year. In fact, Dan Rayburn says there are now 28 CDN providers in the market. Hopefully you can find reasonable pricing at one of them. Other than easing your burden, why might a CDN work for you? Because it makes your site faster and customers like that. How can a CDN so dramatically improve your site's performance? Steve Saunders, author of High Performance Web Sites: Essential Knowledge for Front-End Engineers, has using a CDN has one of his "Thirteen Simple Rules for Speeding Up Your Web Site." About CDNs Steve says:
Remember that 80-90% of the end-user response time is spent downloading all the components in the page: images, stylesheets, scripts, Flash, etc. This is the Performance Golden Rule, as explained in The Importance of Front-End Performance. Rather than starting with the difficult task of redesigning your application architecture, it's better to first disperse your static content. This not only achieves a bigger reduction in response times, but it's easier thanks to content delivery networks. ... At Yahoo!, properties that moved static content off their application web servers to a CDN improved end-user response times by 20% or more. Switching to a CDN is a relatively easy code change that will dramatically improve the speed of your web site.It's at least worth looking into if looking for a performance boost or are concerned about storing so many buckets of bits.