Weblinks Top Level

Categories

Links in this category and its subcategories

Todd Hoff's picture

Webcast: Advanced Database High Availability and Scalability Solutions

If MySQL, PostgreSQL or EnterpriseDB High-Availability and Scalability issues are on your plate, you'll find this webcast very informative. Highly recommended!

Webcast starts on Thursday, July 12, 2007 at 10:00AM PDT (1:00PM EDT, 18:00GMT). Duration: 50 minutes, plus Q&A

Advanced Database High-Availability and Scalability Solutions

ImageProgram Agenda

Disk Based Replication
• Overview, major features
• Benefits, use cases
• Limitations and challenges

Master/Slave Asynchronous Replication
• Overview, major features
• Benefits, use cases
• Limitations and challenges

Synchronous Multi-Master Cluster: Continuent uni/cluster
• Cluster overview, major features
• Cluster benefits, use cases
• Limitations and challenges

Product Positioning: HA Continuum
• Comparisons
• Key differentiators
• How to pick the right solution

Continuent Professional Services
• HA Quick Assessment Service
• HA JumpStart Implementation Services

Q&A

Presented by:
• Robert Hodges, CTO - Continuent
• Robert Noyes, Director of Sales, Americas - Continuent

Webcast starts on Thursday, July 12, 2007 at 10:00AM PDT (1:00PM EDT, 18:00GMT). Duration: 50 minutes, plus Q&A.

Click Here to Register!

Continuent, the High Availability and Scalability Experts!

If you are concerned about any of the following…
- Application Availability
- Read Scalability
- Write Scalability
- ZERO data loss requirement
- Disaster Recovery
- Geographically Distributed Operations
… you'll want to talk to us!

Yahoo! Distribution of Hadoop

Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop they test and deploy across their large Hadoop clusters. As a service to the Hadoop community, Yahoo is releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project.
This source distribution includes code patches that they have added to improve the stability and performance of their clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop.

Read more and get the Hadoop distribution from Yahoo

Todd Hoff's picture

10 More Rules for Even Faster Websites

Update:How-To Minimize Load Time for Fast User Experiences. Shows how to analyze the bottlenecks preventing websites and blogs from loading quickly and how to resolve them.

80-90% of the end-user response time is spent on the frontend, so it makes sense to concentrate efforts there before heroically rewriting the backend. Take a shower before buying a Porsche, if you know what I mean. Steve Souders, author of High Performance Websites and Yslow, has ten more best practices to speed up your website:

  • Split the initial payload
  • Load scripts without blocking
  • Don’t scatter scripts
  • Split dominant content domains
  • Make static content cookie-free
  • Reduce cookie weight
  • Minify CSS
  • Optimize images
  • Use iframes sparingly
  • To www or not to www

    Sadly, according to String Theory, there are only 26.7 rules left, so get them while they're still in our dimension.

    Here are slides on the first few rules. Love the speeding dog slide. That's exactly what my dog looks like traveling down the road, head hanging out the window, joyfully battling the wind.

    Also see 20 New Rules for Faster Web Pages.

  • 100% on Amazon Web Services: Soocial.com - a lesson of porting your service to Amazon

    Simone Brunozzi, technology evangelist for Amazon Web Services in Europe, describes how Soocial.com was fully ported to Amazon web services.

    ----------------
    This period of the year I decided to dedicate some time to better understand how our customers use AWS, therefore I spent some online time with Stefan Fountain and the nice guys at Soocial.com, a "one address book solution to contact management", and I would like to share with you some details of their IT infrastructure, which now runs 100% on Amazon Web Services!

    In the last few months, they've been working hard to cope with tens of thousands of users and to get ready to easily scale to millions. To make this possible, they decided to move ALL their architecture to Amazon Web Services. Despite the fact that they were quite happy with their previous hosting provider, Amazon proved to be the way to go.
    -----------------

    Read the rest of the article here.

    Todd Hoff's picture

    11 Secrets of a Cloud Scale Consultant That They Dont' Want You to Know

    OK, there is no "they" and "they" wouldn't care if you knew anyway. After all, this isn't a blog about really important stuff like investing, acne cures, or cheap natural cleansing products.

    But the secrets are real. Super cloud scaling consultant Kent Langley has put together a comprehensive checklist to consider when developing for the cloud:

  • ORM for Data Partitioning and Query Splitting - Split queries between updates and deletes from the start
  • Monitoring process, resources, and uptime - Process Monitoring, Resource Monitoring, UpTime Monitoring
  • Performance Testing and Capacity Planning - Can't make good decisions without doing some degree of Performance Testing and Capacity planning.
  • Static vs. Dynamic Content splitting / CDN - Reverse Proxy, Splitting Static and Dynamic content
  • Bundling and Compressing JS and CSS - Bundle them, compress, version, and then properly cache those bundles
  • Logging - Log appropriately and monitor those logs
  • Pragmatic Caching - Most current web applications will have between 3-5 layers of caching
  • Functional Decomposition - Decompose your entire application into functional silos
  • Deployment - It should be efficient, it should have a roll back capability, and it should be almost entirely automated to development
  • Asynchronous Practices - Most cases work can be queued and done by a separate process
  • Make sure your application processes are as lean as possible - More efficient code means less servers

    Please follow the link to Kent's post for a full explanation. To some this may seem obvious, but that doesn't mean it gets done. Good helpful stuff.

    Related Articles


  • Joyent - Cloud Computing Built on Accelerators by Kent Langley

  • Todd Hoff's picture

    13 Screencasts on How to Scale Rails

    Gregg Pollack has made 13 screen casts on how to scale rails:

  • Episode #1 - Page Responsiveness
  • Episode #2 - Page Caching
  • Episode #3 - Cache Expiration
  • Episode #4 - New Relic RPM
  • Episode #5 - Advanced Page Caching
  • Episode #6 - Action Caching
  • Episode #7 - Fragment Caching
  • Episode #8 - Memcached
  • Episode #9 - Taylor Weibley & Databases
  • Episode #10 - Client-side Caching
  • Episode #11 - Advanced HTTP Caching
  • Episode #12 - Jesse Newland & Deployment
  • Episode #13 - Jim Gochee & Advanced RPM

    For a good InfoQ interview with Greg take a look at Gregg Pollack and the How-To of Scaling Rails.

  • 17 Distributed Systems and Web Scalability Resources

    Here's a short list of some great resources that I've found very inspirational and thought provoking. I've broken these resources up into two lists: Blogs and Presentations.

    Todd Hoff's picture

    20 New Rules for Faster Web Pages

    Update: Nice explanation in The importance of bandwidth versus latency of how long latencies cause cascading delays in resource loading. Doloto tries to optimize how resources are loaded.

    Twenty new rules have been added to the original 14 rules for sizzling web performance. Part of scalability is worrying about performance too. The front-end is where 80-90% of end-user response time is spent and following these best practices improved the performance of Yahoo! properties by 25-50%. The rules are divided into server, content, cookie, JavaScript, CSS, images, and mobile categories. The new rules are:

    Todd Hoff's picture

    37signals Architecture

    Update 6: Things We’ve Learned at 37Signals. Themes: less is more; don't worry be happy.
    Update 5: Nuts & Bolts: HAproxy . Nice explanation (post, screencast) by Mark Imbriaco of why HAProxy (load balancing proxy server) is their favorite (fast, efficient, graceful configuration, queues requests when Mongrels are busy) for spreading dynamic content between Apache web servers and Mongrel application servers.
    Update 4: O'Rielly's Tim O'Brien interviews David Hansson, Rails creator and 37signals partner. Says BaseCamp scales horizontally on the application and web tier. Scales up for the database, using one "big ass" 128GB machine. Says: As technology moves on, hardware gets cheaper and cheaper. In my mind, you don't want to shard unless you positively have to, sort of a last resort approach.
    Update 3: The need for speed: Making Basecamp faster. Pages now load twice as fast, cut CPU usage by a third and database time by about half. Results achieved by: Analysis, Caching, MySQL optimizations, Hardware upgrades.
    Update 2: customer support is handled in real-time using Campfire.
    Update: highly useful information on creating a customer billing system.

    In the giving spirit of Christmas the folks at 37signals have shared a bit about how their system works. 37signals is most famous for loosing Ruby on Rails into the world and they've use RoR to make their very popular Basecamp, Highrise, Backpack, and Campfire products. RoR takes a lot of heat for being a performance dog, but 37signals seems to handle a lot of traffic with relatively normal sounding resources. This is just an initial data dump, they promise to add more details later. As they add more I'll update it here.

    Todd Hoff's picture

    A Bunch of Great Strategies for Using Memcached and MySQL Better Together

    The primero recommendation for speeding up a website is almost always to add cache and more cache. And after that add a little more cache just in case. Memcached is almost always given as the recommended cache to use. What we don't often hear is how to effectively use a cache in our own products. MySQL hosted two excellent webinars (referenced below) on the subject of how to deploy and use memcached. The star of the show, other than MySQL of course, is Farhan Mashraqi of Fotolog. You may recall we did an earlier article on Fotolog in Secrets to Fotolog's Scaling Success, which was one of my personal favorites.

    Fotolog, as they themselves point out, is probably the largest site nobody has ever heard of, pulling in more page views than even Flickr. Fotolog has 51 instances of memcached on 21 servers with 175G in use and 254G available. As a large successful photo-blogging site they have very demanding performance and scaling requirements. To meet those requirements they've developed a sophisticated approach to using memcached that others can learn from and emulate. We'll cover some of the highlightable strategies from the webinar down below the fold.

    Todd Hoff's picture

    A High Performance Memory Database for Web Application Caches

    Abstract—This paper presents the architecture and
    characteristics of a memory database intended to be used as a
    cache engine for web applications. Primary goals of this database
    are speed and efficiency while running on SMP systems with
    several CPU cores (four and more). A secondary goal is the
    support for simple metadata structures associated with cached
    data that can aid in efficient use of the cache. Due to these goals,
    some data structures and algorithms normally associated with
    this field of computing needed to be adapted to the new
    environment.

    Todd Hoff's picture

    A Scalable, Commodity Data Center Network Architecture

    Looks interesting...

    Abstract:
    Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance.
    In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.

    a8cjdbc - Database Clustering via JDBC

    Practically any software project nowadays could not survive without a database (DBMS) backend storing all the business data that is vital to you and/or your customers. When projects grow larger, the amount of data usually grows larger exponentially. So you start moving the DBMS to a separate server to gain more speed and capacity. Which is all good and healthy but you do not gain any extra safety for this business data. You might be backing up your database once a day so in case the database server crashes you don't lose EVERYTHING, but how much can you really afford to lose?

    Alternate strategy for database sharding

    An alternate strategy for database sharding which avoids queries across different shards and merging results. A central repository of data is maintained for some tables along with other shards. Can be used in calculating top users, recent users, most read etc.

    Alternative Memcache Usage: A Highly Scalable, Highly Available, In-Memory Shard Index

    While working with Memcache the other night, it dawned on me that it’s usage as a distributed caching mechanism was really just one of many ways to use it. That there are in fact many alternative usages that one could find for Memcache if they could just realize what Memcache really is at its core – a simple distributed hash-table – is an important point worthy of further discussion.

    To be clear, when I say “simple”, by no means am I implying that Memcache’s implementation is simple, just that the ideas behind it are such. Think about that for a minute. What else could we use a simple distributed hash-table for, besides caching? How about using it as an alternative to the traditional shard lookup method we used in our Master Index Lookup scalability strategy, discussed previously here.

    Alternatives to Google App Engine

    One particularly interesting EC2 third party provider is GigaSpaces with their XAP platform that provides in memory transactions backed up to a database. The in memory transactions appear to scale linearly across machines thus providing a distributed in-memory datastore that gets backed up to persistent storage.

    Todd Hoff's picture

    Amazon Announces Static IP Addresses and Multiple Datacenter Operation

    Amazon is fixing two of their major problems: no static IP addresses and single datacenter operation. By adding these two new features developers can finally build a no apology system on Amazon. Before you always had to throw in an apology or two. No, we don't have low failover times because of the silly DNS games and unexceptionable DNS update and propagation times and no, we don't operate in more than one datacenter. No more. Now Amazon is adding Elastic IP Addresses and Availability Zones.

    Elastic IP addresses are far better than normal IP addresses because they are both in tight with Jessica Alba and they are:

    Todd Hoff's picture

    Amazon Architecture

    This is a wonderfully informative Amazon update based on Joachim Rohde's discovery of an interview with Amazon's CTO. You'll learn about how Amazon organizes their teams around services, the CAP theorem of building scalable systems, how they deploy software, and a lot more. Many new additions from the ACM Queue article have also been included.

    Amazon grew from a tiny online bookstore to one of the largest stores on earth. They did it while pioneering new and interesting ways to rate, review, and recommend products. Greg Linden shared is version of Amazon's birth pangs in a series of blog articles

    Todd Hoff's picture

    Amazon Improves Diagonal Scaling Support with High-CPU Instances

    Now you can buy more cores on EC2 without adding more machines:

  • The High-CPU Medium Instance is billed at $0.20 (20 cents) per hour. It features 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units Each), and 350 GB of instance storage, all on a 32-bit platform.
  • The High-CPU Extra Large Instance is billed at $0.80 (80 cents) per hour. It features 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), and 1,690 GB of instance storage, all on a 64-bit platform.

    Diagonal Scaling is making a site faster by removing machines. More on this intriguing idea in Diagonal Scaling - Don't Forget to Scale Out AND Up.

  • Todd Hoff's picture

    Amazon's EC2: Pay as You Grow Could Cut Your Costs in Half

    Update 2: Summize Computes Computing Resources for a Startup. Lots of nice graphs showing Amazon is hard to beat for small machines and become less cost efficient for well used larger machines. Long term storage costs may eat your saving away. And out of cloud bandwidth costs are high.
    Update: via ProductionScale, a nice Digital Web article on how to setup S3 to store media files and how Blue Origin was able to handle 3.5 million requests and 758 GBs in bandwidth in a single day for very little $$$. Also a Right Scale article on Network performance within Amazon EC2 and to Amazon S3. 75MB/s between EC2 instances, 10.2MB/s between EC2 and S3 for download, 6.9MB/s upload.

    Now that Amazon's S3 (storage service) is out of beta and EC2 (elastic compute cloud) has added new instance types (the class of machine you can rent) with more CPU and more RAM, I thought it would be interesting to take a look out how their pricing stacks up.

    The quick conclusion:the more you scale the more you save. A six node configuration in Amazon is about half the cost of a similar setup using a service provider. But cost may not be everything...

    An Open Source Web Solution - Lighttpd Web Server and Chip Multithreading Technology

    With more users interacting, working, purchasing, and communicating over the network than ever before, Web 2.0 infrastructure is taking center stage in many organizations. Demand is rising, and companies are looking for ways to tackle the performance and scalability needs placed on Web infrastructure without raising IT operational expenses. Today companies are turning to efficient, high-performance, open source solutions as a way to decrease acquisition, licensing, and other ongoing costs and stay within budget constraints.

    Todd Hoff's picture

    An Unorthodox Approach to Database Design : The Coming of the Shard

    Update 3: Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes by Dare Obasanjo. Excellent discussion of why and when you would choose a sharding architecture, how to shard, and problems with sharding.
    Update 2: Mr. Moore gets to punt on sharding by Alan Rimm-Kaufman of 37signals. Insightful article on design tradeoffs and the evils of premature optimization. With more memory, more CPU, and new tech like SSD, problems can be avoided before more exotic architectures like sharding are needed. Add features not infrastructure. Jeremy Zawodny says he's wrong wrong wrong. we're running multi-core CPUs at slower clock speeds. Moore won't save you.
    Update: Dan Pritchett shares some excellent Sharding Lessons: Size Your Shards, Use Math on Shard Counts, Carefully Consider the Spread, Plan for Exceeding Your Shards

    Once upon a time we scaled databases by buying ever bigger, faster, and more expensive machines. While this arrangement is great for big iron profit margins, it doesn't work so well for the bank accounts of our heroic system builders who need to scale well past what they can afford to spend on giant database servers. In a extraordinary two article series, Dathan Pattishall, explains his motivation for a revolutionary new database architecture--sharding--that he began thinking about even before he worked at Friendster, and fully implemented at Flickr. Flickr now handles more than 1 billion transactions per day, responding in less then a few seconds and can scale linearly at a low cost.

    What is sharding and how has it come to be the answer to large website scaling problems?

    Todd Hoff's picture

    Announce: First Meeting of Boston Scalability User Group

    The first meeting will take place on Wednesday March 26 at 6 p.m. in the IBM Innovation Center (Waltham, MA). The first speaker will be Patrick Peralta of Oracle! Patrick will be presenting: Orchestrating Messaging, Data Grid and Database for Scalable Performance. Important Note: There will be pizza at this meeting!

    The site is at: http://www.bostonsug.org/

    another approach to replication

    File replication based on erasure codes can reduce total replicas size 2 times and more.

    Todd Hoff's picture

    Anti-RDBMS: A list of distributed key-value stores

    Update 2: They are now called NoSQL databases. So keep up! Eric Lai wrote a good article in Computerworld No to SQL? Anti-database movement gains steam about the phenomena. There was even a NoSQL conference. It was unfortunately full by the time I wanted to sign up, but there are presentations by all the major players. Nice Hacker News thread too.
    Update: Some Notes on Distributed Key Stores by Leonard Lin. What's the best way to handle a fast growing system with 100M items that requires low latency and lots of inserts? Leanord takes a trip through several competing systems. The winner was: Tokyo Cabinet.

    Richard Jones has put together a very nice list of various key-value stores around the internets. The list includes: Project Voldemort, Ringo, Scalaris, Kai, Dynomite, MemcacheDB, ThruDB, CouchDB, Cassandra, HBase, and Hypertable. Richard also includes some commentary and their basic components (language, fault tolerance, persistence, client protocol, data model, docs, community).

    There's an excellent discussion in the comments of Paxos vs Vector Clock techniques for synchronizing writes in the face of network failures.

    Todd Hoff's picture

    Are Cloud Based Memory Architectures the Next Big Thing?

    We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon.

    Let's take a short trip down web architecture lane:

  • It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database
  • It's 1995: Scale-up the database.
  • It's 1998: LAMP
  • It's 1999: Stateless + Load Balanced + Database + SAN
  • It's 2001: In-memory data-grid.
  • It's 2003: Add a caching layer.
  • It's 2004: Add scale-out and partitioning.
  • It's 2005: Add asynchronous job scheduling and maybe a distributed file system.
  • It's 2007: Move it all into the cloud.
  • It's 2008: Cloud + web scalable database.
  • It's 20??: Cloud + Memory Based Architectures

    You may disagree with the timing of various innovations and you would be correct. I couldn't find a history of the evolution of website architectures so I just made stuff up. If you have any better information please let me know.

    Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question, the cloud component is covered a little later.

    Behold the power of keeping data in memory:


    Google query results are now served in under an astonishingly fast 200ms, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.

    This text was adapted from notes on Google Fellow Jeff Dean keynote speech at WSDM 2009.

    Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep the graph of their network social network in memory. Facebook has northwards of 800 memcached servers creating a reservoir of 28 terabytes of memory enabling a 99% cache hit rate. Even little guys can handle 100s of millions of events per day by using memory instead of disk.

  • HFadeel's picture

    Art of Distributed

    Art of Distributed

    Part 1: Rethinking about distributed computing models

    I ‘m getting a lot of questions lately about the distributed computing, especially distributed computing model, and MapReduce, such as: What is MapReduce? Can MapReduce fit in all situations? How we can compares it with other technologies such as Grid Computing? And what is the best solution to our situation? So I decide to write about the distributed computing article in two parts. First one about the distributed computing model and what is the difference between them. In the second part I will discuss the reliability, and distributed storage systems.

    Download the article in PDF format.
    Download the article in MS Word format.

    I wait for your comments, and questions, and I will answer it in part two.

    Todd Hoff's picture

    At Some Point the Cost of Servers Outweighs the Cost of Programmers

    This is the intriguing quote by Bill Venners in an interview with Twitter's Alex Payne on Twitter's heretical switch from a pure Ruby stack to a Ruby on Rails stack on the front-end and JVM/Scala on the back-end:


    So performance was also one of the problems with JRuby, which I [Bill Venners] think helps explain better why they'd [Twitter] prefer Scala over Ruby or JRuby for some things.

    I have often heard Rubyists say that although Ruby is slower than Java, for many things it is plenty fast enough, and they are right. The logic goes further, saying that servers are cheap, and programmers expensive, so it makes sense to tradeoff some runtime performance for programmer productivity. And I think that's very often true too, but not always. If you have enough traffic, at some point the cost of servers outweighs the cost of programmers. I'm not sure whether Twitter is past that point, but they get a lot of traffic. And frankly this isn't an intrinsic tradeoff. Other dynamic languages are faster than Ruby, and Scala is too. And people can be quite productive in these other languages too, including Scala.

    I feel Alex's Max Payne. You might wonder why the geekosphere cares so passionately which technology stack Twitter uses? Well, it's Twitter and it's Ruby on Rails. That's like the Lindsay Lohan and Samantha Ronson of tech buzz. It creates it's own self-sustaining posting reaction. Boom!

    It took some giant cajones to switch from a well defended platform like Ruby on Rails to an obscure language like Scala. Few people would have been brave enough to pull the trigger on that decision.

    Twitter didn't take this large leap out of ignorance or incompetence. Twitter's Steve Jenson said they spent several weeks going over our options, running extensive load tests, and presented our findings to the team at each stage. We did our due diligence.

    They did the work and came to conclusions valid for their situation. They have to follow their own bliss. They aren't telling you to use Scala. They aren't telling you not to use Ruby. Have at it. But they have chosen the path less traveled and seem happy with the direction they are heading. If you aren't happy with their decision then that's a you problem, not a them problem.

    Audiogalaxy.com Architecture

    Update 3: Always Refer to Your V1 As a Prototype. You really do have to plan to throw one away.
    Update 2: Lessons Learned Scaling the Audiogalaxy Search Engine. Things he should have done and fun things he couldn’t justify doing.
    Update: Design details of Audiogalaxy.com’s high performance MySQL search engine. At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows.

    Search was one of most interesting problems at Audiogalaxy. It was one of the core functions of the site, and somewhere between 50 to 70 million searches were performed every day. At peak times, the search engine needed to handle 1500-2000 searches every second against a MySQL database with about 200 million rows.

    Todd Hoff's picture

    Biggest Under Reported Story: Google's BigTable Costs 10 Times Less than Amazon's SimpleDB

    Why isn't Google's aggressive new database pricing strategy getting more pub? That's what Bill Katz, instigator of the GAE Meetup and prize winning science fiction author is wondering:

    It's surprising that the blogosphere hasn't picked up the biggest difference in pricing: 
    Google's datastore is less than a tenth of the price of Amazon's SimpleDB while offering a better API.
    

    If money matters to you then the burn rate under GAE could be convincingly lower. Let's compare the numbers:

    Todd Hoff's picture

    Blog: Adding Simplicity by Dan Pritchett

    Dan has genuine insight into building software and large scale scalable systems in particular. You'll always learn something interesting reading his blog.

    A Quick Hit of What's Inside

    Inverting the Reliability Stack, In Support of Non-Stop Software, Chaotic Perspectives, Latency Exists, Cope!, A Real eBay Architect Analyzes Part 3, Avoiding Two Phase Commit, Redux

    Todd Hoff's picture

    Blog: Esoteric Curio by Theo Schlossnagle

    Theo Schlossnagle is the author of Scalable Internet Architecture and the funder of OmniTI , a global leader in Internet technology services that power the World Wide Web and email.

    As you might imagine Theo frequently posts on interesting topics for the scalable website builder.

    A Quick Hit of What's Inside

    Partitioning vs. Federation vs. Sharding,
    PostgreSQL warm standby on ZFS crack
    , Scalability vs. Performance: it isn't a battle

    Todd Hoff's picture

    Blog: MySQL Performance Blog - Everything about MySQL Performance.

    Follow this blog and you'll learn a lot about MySQL and how to make it sing.

    A Quick Hit of What's Inside

    Working with large data sets in MySQL, PHP Large result sets and summary tables, Implementing efficient counters with MySQL.

    Todd Hoff's picture

    Blog: Occam’s Razor by Avinash Kaushik

    Author of Web Analytics An Hour of Day. Has a fresh and practical take on unlocking the power of web research and web analytics to create truly data driven organizations for gaining a strategic competitive advantage.

    A Quick Hit of What's Inside

    Find You Web Analytics Soul Mate (How To Run An Effective Tool Pilot), AK’s Web Analytics Tool Evaluation “Tips From A Tough Life”, Web Analytics Data Sampling 411, Six Data Visualizations That Rock!, Why “looking beyond the click” to optimize the experience is so necessary.

    Todd Hoff's picture

    Blog: Scalable Web Architectures by Royans Tharakan

    Royans' scalability blog and his main blog are excellent sources of scalability information. Take a look.

    A Quick Hit of What's Inside

    Sharding: Different from Partitioning and Federation ?, Adventures of scaling eins.de, Session, state and scalability

    Todd Hoff's picture

    Blog: The Next Generation Data Center by Eric Norlin

    A blog covering what the data center of the future will look like.

    A Quick Hit of What's Inside

    The Next Generation Data Center Blog, Funding Instrumentation

    Todd Hoff's picture

    Book: Building Scalable Web Sites

    Building, scaling, and optimizing the next generation of web applications. Learn the tricks of the trade so you can build and architect applications that scale quickly--without all the high-priced headaches and service-level agreements associated with enterprise app servers and proprietary programming and database products. Culled from the experience of the Flickr.com lead developer, Building Scalable Web Sites offers techniques for creating fast sites that your visitors will find a pleasure to use.

    Creating popular sites requires much more than fast hardware with lots of memory and hard drive space. It requires thinking about how to grow over time, how to make the same resources accessible to audiences with different expectations, and how to have a team of developers work on a site without creating new problems for visitors and for each other.

    Presenting information to visitors from all over the world
    * Integrating email with your web applications
    * Planning hardware purchases and hosting options to have as much as you need without breaking your wallet
    * Partitioning and distributing databases to support large datasets and simultaneous transactions
    * Monitoring your applications to find and clear bottlenecks
    * Providing services APIs and using services from other providers to increase your site's reach and capabilities

    Whether you're starting a small web site with hopes of growing big or you already have a large system that needs maintenance, you'll find Building

    Todd Hoff's picture

    Book: High Performance MySQL

    As users come to depend on MySQL, they find that they have to deal with issues of reliability, scalability, and performance--issues that are not well documented but are critical to a smoothly functioning site. This book is an insider's guide to these little understood topics. Author Jeremy Zawodny has managed large numbers of MySQL servers for mission-critical work at Yahoo!, maintained years of contacts with the MySQL AB team, and presents regularly at conferences. Jeremy and Derek have spent months experimenting, interviewing major users of MySQL, talking to MySQL AB, benchmarking, and writing some of their own tools in order to produce the information in this book. In High Performance MySQL you will learn about MySQL indexing and optimization in depth so you can make better use of these key features. You will learn practical replication, backup, and load-balancing strategies with information that goes beyond available tools to discuss their effects in real-life environments. And you'll learn the supporting techniques you need to carry out these tasks, including advanced configuration, benchmarking, and investigating logs.

    Topics include:
    * A review of configuration and setup options
    * Storage engines and table types
    * Benchmarking
    * Indexes
    * Query Optimization
    * Application Design
    * Server Performance
    * Replication
    * Load-balancing
    * Backup and Recovery
    * Security

    Todd Hoff's picture

    Book: Scalable Internet Architectures

    As a developer, you are aware of the increasing concern amongst developers and site architects that websites be able to handle the vast number of visitors that flood the Internet on a daily basis. Scalable Internet Architecture addresses these concerns by teaching you both good and bad design methodologies for building new sites and how to scale existing websites to robust, high-availability websites. Primarily example-based, the book discusses major topics in web architectural design, presenting existing solutions and how they work. Technology budget tight? This book will work for you, too, as it introduces new and innovative concepts to solving traditionally expensive problems without a large technology budget. Using open source and proprietary examples, you will be engaged in best practice design methodologies for building new sites, as well as appropriately scaling both growing and shrinking sites. Website development help has arrived in the form of Scalable Internet Architecture.

    Breakthrough Web-Tier Solutions with Record-Breaking Performance

    With the explosive growth of the Internet, increasing complexity of user requirements, and wide choice of hardware, operating systems, and middleware, IT executives are facing new challenges in their application infrastructures. Rapid expansion of the application tier has resulted in significant cost and complexity, and many organizations are simply running out of datacenter space, power, and cooling.

    Todd Hoff's picture

    Build an Infinitely Scalable Infrastructure for $100 Using Amazon Services

    Can you really create an infinitely scalable infrastructure for less than $100 using Amazon's storage, grid, and queuing services platform? It appears so, at least for the right application. Amazon beams a spot light on the future battle of the roll-your-own versus the connect-the-dots approach to building next generation websites using core external services. Their argument is strong. Using Amazon's platform you can quickly build an infrastructure that would otherwise take an eternity to make, a pile of money to create, and an unbounded mass of people to implement and maintain. Yet Amazon doesn't provide SLAs, so you can you really trust them with your crown jewels? Facebook recently leap frogged Amazon's vision with an even more comprehensive set of services. The battle for the future is on.

    Building a data cycle at LinkedIn with Hadoop and Project Voldemort

    Update: Building Voldemort read-only stores with Hadoop.

    A write up on what LinkedIn is doing to integrate large offline Hadoop data processing jobs with a fast, distributed online key-value storage system, Project Voldemort.

    Building a Scalable Architecture for Web Apps

    By Bhavin Turakhia CEO, Directi. Covers:
    * Why scalability is important. Viral marketing can result in instant success. With RSS/Ajax/SOA number of requests grow exponentially with user base. Goal is to build a web 2.0 app that can server millions of users with zero downtime.
    * Introduction to the variables. Scalability, performance, responsiveness, availability, downtime impact, cost, maintenance effort.
    * Introduction to the factors. Platform selection, hardware, application design, database architecture, deployment architecture, storage architecture, abuse prevention, monitoring mechanisms, etc.
    * Building our own scalable architecture in incremental steps: vertical scaling, vertical partitioning, horizontal scaling, horizontal partitioning, etc. First buy bigger. Then deploy each service on a separate node. Then increase the number of nudes and load balance. Deal with session management. Remove single points of failure. Use a shared nothing cluster. Choice of master-slave multi-master setup. Replication can be synchronous or asynchronous.
    * Platform selection considerations. Use a global redirector for serving multiple datacenters. Add object, session API, and page cache. Add reverse proxy. Think about CDNs, Grid computing.
    * Tips. Use a Horizontal DB architecture from the beginning. Loosely couple all modules. Use a REST interface for easier caching. Perform application sizing ongoingly to ensure optimal hardware utilization.

    Building and Scaling a Startup on Rails: 12 Things We Learned the Hard Way

    Garry Tan, cofounder of Posterous, lists 12 lessons for scaling that apply to more than just Rails.

  • Use cloud storage for static files.
  • Use HTTP Cache Control to tell the browser what it can cache.
  • Use Sphinx for text search.
  • Use InnoDB for more crash resistant and faster writes.
  • Don't use textbook Rails ActiveRecord objects. Use New Relic to find exactly what is slow in your system.
  • Use memcache later so you find your database bottlenecks now.
  • Use mongrel proctitle to find your slow queries. You are only as fast as your slowest queries.
  • Use asynchronous job queuing to do work in parallel.
  • Use monitoring so you'll know when your site went down and why.
  • Learn by reading the source code, fixing problems, and submitting them back to the community.
  • Use new plugins. Old plugins can't be trusted.
  • Use new information. Old information can't be trusted.

  • Todd Hoff's picture

    Can cloud computing smite down evil zombie botnet armies?

    In the more cool stuff I've never heard of before department is something called Self Cleansing Intrusion Tolerance (SCIT). Botnets are created when vulnerable computers live long enough to become infected with the will to do the evil bidding of their evil masters. Security is almost always about removing vulnerabilities (a process which to outside observers often looks like a dog chasing its tail). SCIT takes a different approach, it works on the availability angle. Something I never thought of before, but which makes a great deal of sense once I thought about it.

    With SCIT you stop and restart VM instances every minute (or whatever depending in your desired window vulnerability)....

    Todd Hoff's picture

    Can How Bees Solve their Load Balancing Problems Help Build More Scalable Websites?

    Bees have a similar problem to website servers: how to do a lot of work with limited resources in an ever changing environment. Usually lessons from biology are hard to apply to computer problems. Nature throws hardware at problems. Billions and billions of cells cooperate at different levels of organizations to find food, fight lions, and make sure your DNA is passed on.

    Nature's software is "simple," but her hardware rocks. We do the opposite. For us hardware is in short supply so we use limited hardware and leverage "smart" software to work around our inability to throw hardware at problems. But we might be able to borrow some load balancing techniques from bees. What do bees do that we can learn from?

    Todd Hoff's picture

    Can you profit from the coming Content Delivery Network wars?

    Playing like the big boys may be getting cheaper. The big boys, like YouTube, farm the serving of their most popular videos to a third party CDN. A lot of people were surprised YouTube didn't serve all their content themselves, but it makes sense. It allows them to keep up with demand without a large hit for infrastructure build out, much like leasing computers instead of buying them.

    Challanges for Developing Enterprise Application on the Cloud

    This post I provided a summary of recent discussions outlining the main challenges that developers face today when deploying their existing JEE application to the cloud such as complexity, database integration, security, standard JEE support etc. In this post i also provided summary of how we managed to handle those challenges with our new Cloud Computing Framework by pointing to an existing production reference of a leading Telco provider.

    Challenges from large scale computing at Google

    From Greg Linden on a talk Google Fellow Jeff Dean gave last week at University of Washington Computer Science titled "Research Challenges Inspired by Large-Scale Computing at Google" : Coming away from the talk, the biggest points for me were the considerable interest in reducing costs (especially reducing power costs), the suggestion that the Google cluster may eventually contain 10M machines at 1k locations, and the call to action for researchers on distributed systems and databases to think orders of magnitude bigger than they often are, not about running on hundreds of machines in one location, but hundreds of thousands of machines across many locations.

    bnewport's picture

    Classifying XTP systems and how cloud changes which type startups will use

    I try to group XTP in to two main groups, type 1 and 2 and then subdivide type 2 in to 2a and 2b. I describe how I do this grouping and then amplify it a little in the context of cloud services.

    CLOUD & GRID EVENT BY THE ONLINE GAMING HIGH SCALABILITY SIG

    The first meeting of this Online Gaming High Scalability SIG will be on the 9th of July 2009 in central London, starting at 10 AM and finishing around 5PM.

    The main topic of this meeting will be potentials for using cloud and grid technologies in online gaming systems. In addition to experience reports from the community, we have invited some of the leading cloud experts in the UK to discuss the benefits such as resource elasticity and challenges such as storage and security that companies from other industries have experienced. We will have a track for IT managers focused on business opportunities and issues and a track for architects and developers more focused on implementation issues.

    The event is free but up-front registration is required for capacity planning, so please let us know in advance, if you are planning to attend by completing the registration form on this page

    To propose a talk or for programme enquiries, contact meetings [at] gamingscalability [dot] org.

    Note: The event is planned to finish around 5 PM so that people can make their way to Victoria on time for CloudCamp London. CloudCamp is a meeting of the cloud computing community with short talks, is also free but you will have to register for it separately

    PROGRAMME: http://skillsmatter.com/event/cloud-grid/online-gaming-high-scalability-...

    Cloud computing, grid computing, utility computing - list of top providers

    You want to have a scalable website. You want a website which can handle traffic spikes (think if you are getting on Digg, Slahsdot, Reddit, Techcrunch or other very popular websites frontpage).

    Regular hosting companies (especially shared hosting) can offer only so much. The servers usually get crushed under the load in short time.

    But there is hope. A new breed of hosting companies emerged recently. A new breed which can offer you the scalability you need at a fraction of the cost.

    Welcome to the world of “cloud computing!” (or “grid computing” or “utility computing”, which are terms for the same thing).

    Here's a website which compiled a list of cloud computing hosting companies (with short descriptions, prices and customer lists for each of them).


    Read the entire article about Cloud computing, grid computing, utility computing list at MyTestBox.com - web software reviews, news, tips & tricks.

    Todd Hoff's picture

    Cloud Programming Directly Feeds Cost Allocation Back into Software Design

    Update 4: Caching becomes even more important in CPU based billing environments. Avoiding the CPU means saving money.
    Update 3: An interesting simple example of this idea showed up on the Google AppEngine list. With one paging algorithm and one use of AJAX the yearly cost of the site was $1000. By changing those algorithms the site went under quota and became free again. This will make life a lot more interesting for developers.
    Update 2: Business Model Influencing Software Architecture by Brandon Watson. The profitability of your project could disappear overnight on account of code behaving badly.
    Update: Amazon adds Elastic Block Store at $0.10 per 1 million I/O requests. Now I need some cost minimization storage algorithms!

    In the GAE Meetup yesterday a very interesting design rule came up: Design By Explicit Cost Model. A clumsy name I know, but it is explained like this:


    If you are going to be charged for an operation GAE wants you to explicitly ask for it. This is why some automatic navigation between objects isn't provided because that will force an explicit query to be written. Writing an explicit query is a sort of EULA for being charged. Click OK in the form of a query and you've indicated that you are prepared to pay for a database operation.

    Usually in programming the costs we talk about are time, space, latency, bandwidth, storage, person hours, etc. Listening to the Google folks talk about how one of their explicit design goals was to require programmers to be mindful of operations that will cost money made me realize in cloud programming cost will be another aspect of design we'll have to factor in.

    Instead of asking for the Big O complexity of an algorithm we'll also have to ask for the Big $ (or Big Euro) notation so we can judge an algorithm by its cost against a particular cloud profile. Maybe something like $(CPU=1.3,DISK=3,IN-BANDWIDTH=2,OUT=BANDWIDTH=3, DB=10). You could look at the Big $ notation for algorithm and shake your head saying that approach will never work for GAE, but it could work for Amazon. Can we find a cheaper Big $? ...

    CloudCamp London 2: private clouds and standardisation

    CloudCamp returned to London yesterday, organised with the help of Skills Matter at the Crypt on the Clarkenwell green. The main topics of this cloud/grid computing community meeting were service-level agreements, connecting private and public clouds and standardisation issues.

    clusteradmin.blogspot.com - blog about building and administering clusters

    A blog about cluster administration. Written by a System Administrator working at HPC (High Performance Computing) data-center, mostly dealing with PC clusters (100s of servers), SMP machines and distributed installations. The blog concentrates on software/configuration/installation management systems, load balancers, monitoring and other cluster-related solutions.

    Coming soon: better JRockit+Coherence integration

    At the Oracle Coherence Special Interest Group meeting today in London, Tomas Nilsson, the product manager for JRockit RT and JRockit Mission Control spoke about the future plans for JRockit and especially plans for improved Coherence JRockit integration.

    Content Delivery Networks (CDN) – a comprehensive list of providers

    We build web applications…and there are plenty of them around. Now, if we hit the jackpot and our application becomes very popular, traffic goes up, and our servers are brought down by the hordes of people coming to our website. What do we do in that situation?

    Of course, I am not talking here about the kind of traffic Digg, Yahoo Buzz or other social media sites can bring to a website, which is temporary overnight traffic, or a website which uses cloud computing like Amazon EC2 service, MediaTemple Grid Service or Mosso Hosting Cloud service.

    I am talking about traffic that consistently increases over time as the service achieves success. Google.com, Yahoo.com, Myspace.com, Facebook.com, Plentyoffish.com, Linkedin.com, Youtube.com and others are examples of services which have constant high traffic.

    Knowing that users want speed from their applications, these services will always use a Content Delivery Network (CDN) to deliver that speed.

    What is a Content Delivery Network?

    A Content Delivery Network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen. This will help scaling a web application by taking a part of the load from the service servers.

    Read the entire article about Content Delivery Networks (CDN) list of providers at MyTestBox.com - web software reviews, news, tips & tricks.

    Todd Hoff's picture

    Coyote Point Load Balancing Systems

    Appliances that:
    * Ensures Non-Stop application availability
    * Improves network and server maintainability
    * Delivers Enterprise-grade gigabit content switching
    * Offers true Application Acceleration
    * Provides maximum throughput at minimal cost

    Todd Hoff's picture

    CTL - Distributed Control Dispatching Framework

    CTL is a flexible distributed control dispatching framework that enables you to break management processes into reusable control modules and execute them in distributed fashion over the network.

    From their website:
    CTL is a flexible distributed control dispatching framework that enables you to break management processes into reusable control modules and execute them in distributed fashion over the network.

    What does CTL do?
    CTL helps you leverage your current scripts and tools to easily automate any kind of distributed systems management or application provisioning task. Its good for simplifiying large-scale scripting efforts or as another tool in your toolbox that helps you speed through your daily mix of ad-hoc administration tasks.

    What are CTL's features?
    CTL has many features, but the general highlights are:

    * Execute sophisticated procedures in distributed environments - Aren't you tired of writing and then endlessly modifying scripts that loop over nodes and invoke remote actions? CTL dispatches actions to remote controllers with network transparency (over SSH), parallelism, and error handling already built in.
    * Comes with pre-built utilities - CTL comes with pre-built utilities so you don't have to script actions like file distribution or process and port checking.
    * Define your own automation using the tools/languages you already know - New controller modules are defined in XML and your scripting can be done in multiple scripting languages (Perl, Python, etc.), *nix shell, Windows batch, and/or Ant.
    * Cross platform administration - CTL is Java-based, works on *nix and Windows.

    Data grid comparison: Oracle Coherence vs Gigaspaces XAP

    A short summary of differences between Oracle Coherence and GigaSpaces XAP.

    HFadeel's picture

    Database Optimize patterns

    Database Optimize patterns

    Most of websites and enterprise application rely on the database backing them to store the application and customer data. So at some point the database could be the main performance and scalability bottleneck for your system performance, so I ‘m here today to cure this!

    key points:

    • Database supporters and resisters:
      • Database supporters: MySQL, SQL Server, and PostgreSQL
      • Database resisters: HBase, MongoDB, Redis, and others
    • Database Optimizing pattern:
      • What to store into the Database?
      • Field data types
      • The primary key and the indexes
      • Data retrieve, SP’s, and Ad-hoc queries
      • Caching

    Todd Hoff's picture

    Database People Hating on MapReduce

    Update: Typical Programmer tackles the technical issues in Relational Database Experts Jump The MapReduce Shark. The culture clash is still what fascinates me.

    David DeWitt writes in the Database Column that MapReduce is a major step backwards:

  • A giant step backward in the programming paradigm for large-scale data intensive applications
  • A sub-optimal implementation, in that it uses brute force instead of indexing
  • Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago
  • Missing most of the features that are routinely included in current DBMS
  • Incompatible with all of the tools DBMS users have come to depend on

    Listening to databasers and map reducers talk is like eavesdropping on your average family holiday mashup. Every holiday people who have virtually nothing in common are thrown together because they incidentally share a little DNA or are married to the shared DNA. In desperation everyone gravitates to some shared enemy they can all confidently bash. But after that moment is relieved and awkward silence once again looms, nothing is left but more drinking and tackling sensitive topics you just know will end badly.

  • Todd Hoff's picture

    Database Sharding at Netlog, with MySQL and PHP

    Jurriaan Persyn is a Lead Web Developer at Netlog, a social portal site that gets 50 million unique visitors and 5+ billion page views per month. In this paper Jurriaan goes into a lot of excellent nuts and bolts details about how they used sharding to scale their system. If you are pondering sharding as a solution to your scaling problems you'll want to read this paper. As the paper is quite well organized there's no reason to write a summary, but I especially liked this part from the conclusion:


    If you can do with simpler solutions (better hardware, more hardware, server tweaking and tuning, vertical partitioning, sql query optimization, ...) that require less development cost, why invest lots of effort in sharding? On the other hand, when your visitor statistics really start blowing through the roof, it is a good direction to go. After all, it worked for us.

    Database Sharding for startups

    The most important aspect of a scalable web architecture is data partitioning. Most components in a modern data center are completely stateless, meaning they just do batches of work that is handed to them, but don't store any data long-term. This is true of most web application servers, caches like memcached, and all of the network infrastructure that connects them. Data storage is becoming a specialized function, delegated most often to relational databases. This makes sense, because stateless servers are easiest to scale - you just keep adding more. Since they don't store anything, failures are easy to handle too - just take it out of rotation.

    Stateful servers require more careful attention. If you are storing all of your data in a relational database, and the load on that database exceeds its capacity, there is no automatic solution that allows you to simply add more hardware and scale up. (One day, there will be, but that's for another post). In the meantime, most websites are building their own scalable clusters using sharding.

    Read more on LessonLearned blog.

    Database War Stories #3: Flickr

    [Tim O'Reilly] Continuing my series of queries about how "Web 2.0" companies used databases, I asked Cal Henderson of Flickr to tell me "how the folksonomy model intersects with the traditional database. How do you manage a tag cloud?"

    Deploying MySQL Database in Solaris Cluster Environments

    MySQL™ database, an open source database, delivers high performance and reliability while keeping costs low by eliminating licensing fees. The Solaris™ Cluster product is an integrated hardware and software environment that can be used to create highly-available data services. This article explains how to deploy the MySQL database in a Solaris Cluster environment. The article addresses the following topics:

    * "Advantages of Deploying MySQL Database with Solaris Cluster" on page 1 discusses the benefits provided by a Solaris Cluster deployment of the MySQL database.
    * "Overview of Solaris Cluster" on page 2 provides a high-level description of the hardware and software components of the Solaris Cluster.
    * "Installation and Configuration" on page 8 explains the procedure for deploying the MySQL database on a Solaris Cluster.

    This article assumes that readers have a basic understanding of Solaris Cluster and MySQL database installation and administration.

    Designing a Scalable Twitter

    There were many talks recently about twitter scalability and their specific choice of language such as Scala to address their existing Ruby based scalability. In this post i tried to provide a more methodical approach for handling twitter scalability challenges that is centered around the right choice of architecture patterns rather then the language itself.
    The architecture pattern are given in a generic fashion that is not specific to twitter itself and can serve anyone who is looking to build a scalable real time web application in the near future.

    Todd Hoff's picture

    Digg Architecture

    Update 4:: Introducing Digg’s IDDB Infrastructure by Joe Stump. IDDB is a way to partition both indexes (e.g. integer sequences and unique character indexes) and actual tables across multiple storage servers (MySQL and MemcacheDB are currently supported with more to follow).
    Update 3:: Scaling Digg and Other Web Applications.
    Update 2:: How Digg Works and How Digg Really Works (wear ear plugs). Brought to you straight from Digg's blog. A very succinct explanation of the major elements of the Digg architecture while tracing a request through the system. I've updated this profile with the new information.
    Update: Digg now receives 230 million plus page views per month and 26 million unique visitors - traffic that necessitated major internal upgrades.

    Traffic generated by Digg's over 22 million famously info-hungry users and 230 million page views can crash an unsuspecting website head-on into its CPU, memory, and bandwidth limits. How does Digg handle billions of requests a month?

    Todd Hoff's picture

    Drop ACID and Think About Data

    The abstract for the talk given by Bob Ippolito, co-founder and CTO of Mochi Media, Inc:

    Building large systems on top of a traditional single-master RDBMS data storage layer is no longer good enough. This talk explores the landscape of new technologies available today to augment your data layer to improve performance and reliability. Is your application a good fit for caches, bloom filters, bitmap indexes, column stores, distributed key/value stores, or document databases? Learn how they work (in theory and practice) and decide for yourself.

    Bob does an excellent job highlighting different products and the key concepts to understand when pondering the wide variety of new database offerings. It's unlikely you'll be able to say oh, this is the database for me after watching the presentation, but you will be much better informed on your options. And I imagine slightly confused as to what to do :-)

    An interesting observation in the talk is that the more robust products are internal to large companies like Amazon and Google or are commercial. A lot of the open source products aren't yet considered ready for prime-time and Bob encourages developers to join a project and make patches rather than start yet another half finished key-value store clone. From my monitoring of the interwebs this does seem to be happening and existing products are starting to mature.

    From all the choices discussed the column database Vertica seems closest to Bob's heart and it's the product they use. It supports clustering, column storage, compression, bitmapped indexes, bloom filters, grids, and lots of other useful features. And most importantly: it works, which is always a plus :-)

    Here's a summary of some of the points talked about in the presentation:

    wmworia's picture

    Dyrad

    The Dryad Project is investigating programming models for writing parallel and distributed programs to scale from a small cluster to a large data-center.

    wmworia's picture

    DyradLINQ

    The goal of DryadLINQ is to make distributed computing on large compute cluster simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).

    Easier Production Releases

    I’ve been a part of some late night release procedures and they’re never fun. You’ve got QA, Dev, IT and a handful of managers sitting in their jammies in a group IM (or worse, a conference call) from 2:00 AM until way too early in the morning. Everyone’s grumpy and sleepy, causing the release to be more difficult and take longer. Sometimes the dreaded “rollback!” is yelled. All this because you’re running a high profile website that needs to be accessible 24/7, and 2:00 AM - 5:00 AM downtime is better than daytime downtime.

    If you're a site that doesn't have 10s of thousands to drop on a real http load balancer, use this strategy to release software during business hours with no downtime using apache's mod_proxy_balancer....

    Todd Hoff's picture

    eBay Architecture

    Update 2: EBay's Randy Shoup spills the secrets of how to service hundreds of millions of users and over two billion page views a day in Scalability Best Practices: Lessons from eBay on InfoQ. The practices: Partition by Function, Split Horizontally, Avoid Distributed Transactions, Decouple Functions Asynchronously, Move Processing To Asynchronous Flows, Virtualize At All Levels, Cache Appropriately.
    Update: eBay Serves 5 Billion API Calls Each Month. Aren't we seeing more and more traffic driven by mashups composed on top of open APIs? APIs are no longer a bolt on, they are your application. Architecturally that argues for implementing your own application around the same APIs developers and users employ.

    Who hasn't wondered how eBay does their business? As one of the largest most loaded websites in the world, it can't be easy. And the subtitle of the presentation hints at how creating such a monster system requires true engineering: Striking a balance between site stability, feature velocity, performance, and cost.

    You may not be able to emulate how eBay scales their system, but the issues and possible solutions are worth learning from.

    Economies of Non-Scale

    Scalability forces us to think differently. What worked on a small scale doesn't always work on a large scale -- and costs are no different. If 90% of our application is free of contention, and only 10% is spent on a shared resources, we will need to grow our compute resources by a factor of 100 to scale by a factor of 10! Another important thing to note is that 10x, in this case, is the limit of our ability to scale, even if more resources are added.

    1. The cost of non-linearly scalable applications grows exponentially with the demand for more scale.
    2. Non-linearly scalable applications have an absolute limit of scalability. According to Amdhal's Law, with 10% contention, the maximum scaling limit is 10. With 40% contention, our maximum scaling limit is 2.5 - no matter how many hardware resources we will throw at the problem.

    This post discuss in further details how to measure the true cost of non linearly scalable systems and suggest a model for reducing that cost significantly.

    eHarmony.com describes how they use Amazon EC2 and MapReduce

    This slide show presents eHarmony.com experience (one of the biggest dating sites out there) in using Amazon EC2 and MapReduce to scale their service.

    Go to the Slideshare presentation

    Todd Hoff's picture

    Ehcache - A Java Distributed Cache

    Ehcache is a pure Java cache with the following features: fast, simple, small foot print, minimal dependencies, provides memory and disk stores for scalability into gigabytes, scalable to hundreds of caches
    is a pluggable cache for Hibernate, tuned for high concurrent load on large multi-cpu servers, provides LRU, LFU and FIFO cache eviction policies, and is production tested. Ehcache is used by LinkedIn to cache member profiles. The user guide says it's possible to get at 2.5 times system speedup for persistent Object Relational Caching, a 1000 times system speedup for Web Page Caching, and a 1.6 times system speedup Web Page Fragment Caching.
    From the website:

    Eight Best Practices for Building Scalable Systems

    Wille Faler has created an excellent list of best practices for building scalable and high performance systems. Here's a short summary of his points:

  • Offload the database - Avoid hitting the database, and avoid opening transactions or connections unless you absolutely need to use them.
  • What a difference a cache makes - For read heavy applications caching is the easiest way offload the database.
  • Cache as coarse-grained objects as possible - Coarse-grained objects save CPU and time by requiring fewer reads to assemble objects.
  • Don’t store transient state permanently - Is it really necessary to store your transient data in the database?
  • Location, Location - put things close to where they are supposed to be delivered.
  • Constrain concurrent access to limited resource - it's quicker to let a single thread do work and finish rather than flooding finite resources with 200 client threads.
  • Staged, asynchronous processing - separate a process using asynchronicity into separate steps mediated by queues and executed by a limited number of workers in each step.
  • Minimize network chatter - Avoid remote communication if you can as it's slower and less reliable than local computation.

  • Todd Hoff's picture

    Eucalyptus - Build Your Own Private EC2 Cloud

    Update: InfoQ links to a few excellent Eucalyptus updates: Velocity Conference Video by Rich Wolski and a Visualization.com interview Rich Wolski on Eucalyptus: Open Source Cloud Computing.

    Eucalyptus is generating some excitement on the Cloud Computing group as a potential vendor neutral EC2 compatible cloud platform. Two reasons why Eucalyptus is potentially important: private clouds and cloud portability:

    Event: MySQL Conference & Expo 2009

    The 5th annual MySQL Conference & Expo, co-presented by Sun Microsystems, MySQL and O'Reilly Media.
    Happening April 20-23, 2009 in Santa Clara, CA, at the Santa Clara Convention Center and Hyatt Regency Santa Clara, brings over 2,000 open source and database enthusiasts together to harness the power of MySQL and celebrate the huge MySQL ecosystem. All around the world, people just like you are innovating with MySQL—and MySQL is fueling the innovation engine by releasing new mission critical solutions to help you work smarter. This deeply technical conference brings all of that creativity, energy, and knowledge together in one place for four very full days.

    Early registration ends February 16, 2009.

    The largest gathering of MySQL developers, users, and DBAs worldwide, the event reflects MySQL's wide-ranging appeal and capabilities. The open atmosphere of the MySQL Conference & Expo helps IT professionals and community members launch and develop the best database applications, tools, and software. As companies of all sizes look for ways to remain competitive and manage costs, open source software and tools provide valuable and efficient solutions for the enterprise. The 2009 edition of the MySQL Conference & Expo will present strategies for businesses to not just survive, but thrive in a challenging economy.

    Through expert instruction, hands-on tutorials, and readily available MySQL developers, users at all levels gain the knowledge they need to rapidly build solid applications with MySQL that scale with the enterprise. New to the 2009 program will be MySQL Camp, a space where any and all participants can create an "unconference" within the larger event.

    HFadeel's picture

    Facebook Chat Architecture

    For those interested in building scalable systems, today I will speak about the Facebook Char architecture. Starting keynote:

    ''When your feature’s userbase will go from 0 to 70 million practically overnight, scalability has to be baked in from the start.''

    Eugene Lutuchy, lead engineer on Facebook Chat

    Facebook's Aditya giving presentation on Facebook Architecture

    Facebook's engg. director aditya talks about facebook architecture. How they use mysql, php and memcache. How they have modified the above to suit their requirements.

    Facebook, Hadoop, and Hive

    Facebook has the second largest installation of Hadoop (a software platform that lets one easily write and run applications that process vast amounts of data), Yahoo being the first.