advertise
Tuesday
Nov132007

Friendster Lost Lead Because of a Failure to Scale

Hey, this scaling stuff might just be important. Jim Scheinman, former Bebo and Friendster exec, puts the blame squarely on Friendster's inability to scale as why they lost the social networking race: VB: Can you tell me a bit about what you learned in your time at Friendster? JS: For me, it basically came down to failed execution on the technology side — we had millions of Friendster members begging us to get the site working faster so they could log in and spend hours social networking with their friends. I remember coming in to the office for months reading thousands of customer service emails telling us that if we didn’t get our site working better soon, they’d be ‘forced to join’ a new social networking site that had just launched called MySpace…the rest is history. To be fair to Friendster’s technology team at the time, they were on the forefront of many new scaling and database issues that web sites simply hadn’t had to deal with prior to Friendster. As is often the case, the early pioneer made critical mistakes that enabled later entrants to the market, MySpace, Facebook & Bebo to learn and excel. As a postscript to the story, it’s interesting to note that Kent Lindstrom (CEO of Friendster) and the rest of the team have done an outstanding job righting that ship. Hopefully with all the quality information out now on the intertubes visionaries can concentrate on making good stuff instead of always fighting the plumbing. When you think about, is there any industry or group that gives so much value away for free as the software community? I don't think so. We are an amazingly giving group and the world has benefited greatly from that impulse. A thought for Thanksgiving.

Click to read more ...

Monday
Nov122007

a8cjdbc - Database Clustering via JDBC

Practically any software project nowadays could not survive without a database (DBMS) backend storing all the business data that is vital to you and/or your customers. When projects grow larger, the amount of data usually grows larger exponentially. So you start moving the DBMS to a separate server to gain more speed and capacity. Which is all good and healthy but you do not gain any extra safety for this business data. You might be backing up your database once a day so in case the database server crashes you don't lose EVERYTHING, but how much can you really afford to lose? Well clearly this depends on what kind of data you are storing. In our case the users of our solutions use our software products to do their everyday (all day) work. They have "everything" they need for their business stored in the database we are providing. So is 24 hours of data loss acceptable? No, not really. One hour? Maybe. But what we really want is a second database running with the EXACT same data. We mostly use PostgreSQL which does not have built in database replication. There is some solution based on triggers to replicate the data from one database to another one. We have learned that setting all this up on an existing database with plenty of tables is rather complicated and changing the database structure afterwards can not be done with simple create/alter statements anymore. And since we ARE running solutions that constantly change and improve, we need to be able to deploy updates including database structure changes quickly and easily. So what we really wanted was a transparent JDBC layer that does the replication for us. We tested a great solution called "Sequoia", but it is also a rather heavy-weight product with a lot of features that did not really help in the performance department and that we didn't need anyway. What we needed was:

  • a JDBC driver so the application does not know anything about the replication
  • of course: transactional safety for write operations
  • load-balanced reads (we are running 2 database servers, so why waste the ability to do parallel reads from 2 servers and almost multiply the performance by 2?)
  • for backups: the ability to detach one server, do the backup on that machine and then reattach the server
  • automatic and transparent failover / failsafe
  • Fast In-VM-Replication - no serialisation
  • Easy integration

    Click to read more ...

Monday
Nov122007

Scaling Using Cache Farms and Read Pooling 

Michael Nygard talks about Two Ways To Boost Your Flagging Web Site. The idea behind cache farms is to move memory devoted to the various caching layers into one large farm of caches, as with memcached. The idea behind read pools is to allocate your database read requests to a pool of dedicated read servers, thus offloading the write server. Using a combination of the strategies you aren't forced to scale up the database tier to scale your website.

Click to read more ...

Monday
Nov122007

Slashdot Architecture - How the Old Man of the Internet Learned to Scale

Slashdot effect: overwhelming unprepared sites with an avalanche of reader's clicks after being mentioned on Slashdot. Sure, we now have the "Digg effect" and other hot new stars, but Slashdot was the original. And like many stars from generations past, Slashdot plays the elder statesman's role with with class, dignity, and restraint. Yet with millions and millions of users Slashdot is still box office gold and more than keeps up with the young'ins. And with age comes the wisdom of learning how to handle all those users. Just how does Slashdot scale and what can you learn by going old school? Site: http://slashdot.org

Information Sources

  • Slashdot's Setup, Part 1- Hardware
  • Slashdot's Setup, Part 2- Software
  • History of Slashdot Part 3- Going Corporate
  • The History of Slashdot Part 4 - Yesterday, Today, Tomorrow

    The Platform

  • MySQL
  • Linux (CentOS/RHEL)
  • Pound
  • Apache
  • Perl
  • Memcached
  • LVS

    The Stats

  • Started building the system in 1999.
  • 5.5 million user visits per month.
  • 7,000 comments are added every day.
  • Over 9 million pages views daily.
  • Over 21 million comments.
  • Average monthly bandwidth usage is around 40-50 mbit/sec.
  • For the same story Kottke.org found Slashdot delivered 4 times more users than Digg. So Slashdot ain't dead yet.
  • From The History of Slashdot Part 4: On [September 11th] the mainstream news websites buckled under the loads, and although we had to turn off logging, we managed to stay up, sharing news in a time where it was often difficult to get. That was the day where the team of engineers that make this site happen pulled together and did the impossible, forcing our limited little hardware cluster to handle traffic that was probably triple or quadruple a normal day.

    The Hardware Architecture

  • Data center design is similar to all the other SourceForge, Inc. sites and has proven to scale well.
  • Two Active-Active gigabit uplinks.
  • A pair of Cisco 7301s serve as gateway/border routers. Perform some basic filtering. Filtering is tiered to spread the load.
  • Foundry BigIron 8000s act as core switches/routers.
  • Foundry FastIron 9604s are used as switches for some racks.
  • A pair of Rackable System (1Us; P4 Xeon 2.66Gz, 2G RAM, 2x80GB IDE, running CentOS and LVS) serve as load balancing firewalls, distributing traffic to web servers. BIG-IP F5's are being deployed in their new datacenter.
  • All servers are at least RAID 1.
  • 16 web servers: - Running Red Hat 9. - Rackable 1U servers with 2 Xeon 2.66Ghz processors, 2GB of RAM, and 2x80GB IDE hard drives. - Two serve static content: javascript, images and the front page for non logged-in users. - Four serve the front page to logged in users - 10 handle comment pages. - Host roles are changed in response to load. - All NFS mounts are in read-only mode.
  • NFS server is a Rackable 2U with 2 Xeon 2.4Ghz processors, 2GB of RAM, and 4x36GB 15K RPM SCSI drives.
  • 7 database servers: - All run CentOS 4. - 2 in a Master-master configuration: -- Dual Opteron 270's with 16GB RAM, 4x36GB 15K RPM SCSI -- One master is the write only database. -- One master is the read only database. -- They can failover at any time and switch roles. - 2 reader databases: -- Dual Opteron 270's with 8GB RAM, 4x36GB 15K RPM SCSI Drive -- Each syncs from one of the master databases. -- Can add more to scale, but plenty fast enough for now. - 3 miscellaneous databases -- Quad P3 Xeon 700Mhz with 4GB RAM, 8x36GB 10K RPM SCSI Drives -- Accesslog writer and accesslog reader. Separate databases are used because moderation and stats require a lot of CPU time for computation. -- Search database.

    The Software Architecture

  • Logged in and non-logged in users are treated differently. - Non-logged in user see the same page. This page is a static page that is updated every couple of minutes. - Logged in users have custom options which can't be cached so generating pages for these users take more resources.
  • 6 pound servers (1 for SSL) are used as reverse proxies: - If a request can't be handled it is forwarded on to a web server. - Pound servers are run on the same machines as the web servers. - They are distributed for load balancing and redundancy. - SSL is handled by the pound server so the web server doesn't need to support SSL.
  • 16 apache web servers (version 1.3): - Software is mounted from /usr/local on the read-only NFS server. - The images are kept simple. All that is compiled in is: -- mod_perl -- lingerd to free up RAM during delivery. -- mod_auth_useragent to block bots. - 1 For SSL. - 2 for static (.shtml) requests. - 4 for the dynamic homepage. - 6 for dynamic comment-delivery pages (comments, article, pollBooth.pl). - 3 for all other dynamic scripts (ajax, tags, bookmarks, firehose).
  • Reasons for segregating apache servers to different roles: - Isolate the servers in case there are performance problems or a DDoS attack on a specific page. The rest of the system will function even when one part is failing. - For efficiency reasons like httpd-level caching and MaxClients tuning. The web server can be tuned differently for each role. MaxClients is set to 5-15 for dynamic web servers and 25 for static servers. The bottleneck is CPU, not RAM so if requests aren't process quickly then something's wrong and queuing more requests won't help the CPU process them any faster.
  • Using read-only mounted has contributed to the robustness of the system. Tasks that write to /usr/local, for example, to update index.html every second, run on the NFS server.
  • Use their own SQL API built on top of DBD::mysql and DBI.pm.
  • A huge performance boost was provided by caching users, stories, and comment text using memcached.
  • Most data access is through get and set methods written custom for each data type and through methods that perform one specific update or select.
  • The Multiple-master replication architecture allows keeping the site fully live even during blocking queries like ALTER TABLE.
  • Multi-pass log processing is to detect abuse and picking which users get mod points.
  • The moderation system was created in response to spam. It was just a few friends at first and then a lot of friends. This didn't scale. So the 'mod points' system was introduced so that any user who contributed to the system could moderate the system.
  • Active users are banned to protect from excessive usage from bots.

    Lessons Learned

  • The most creatively satisfying period was when money was tight, the group was small, and everyone was helping everyone else with anything that needed to be done.
  • Don't waste your time optimizing code because you are too cheap to buy more machines. Buy the hardware and spend your time working on features.
  • Sell out to a large corporation and you lose control. There's continual pressure to go to the dark side of creating new products, blending in advertiser supplied content, and serving giant ads.
  • Say no to the forces that want you to become just like everyone else. Though many competitors have come and gone, Slashdot is still around because they: continue to maintain editorial independence, moderate advertising quantity with a clear distinction between advertising and content, and of course, that we continue to select the right stories to appeal to our existing audience... not to spend our time courting other audiences that would only dilute the discussions that bring so many of you here day after day.
  • Segregate servers into different policy domains so you can optimize their configuration.
  • Optimizing usually means caching, caching, caching.
  • Tables not fully, but mostly normalized. This improves performance in most cases.
  • Over the last seven years the process of developing database backed websites has changed: The database used to be the bottleneck: centralized, hard to expand, slow. Now even a cheap DB server can run a pretty big site if you code defensively, and thanks to Moore's Law, memcached, and improvements in open-source database software, that part of the scaling issue isn't really a problem until you're practically the size of eBay. It's an exciting time to be coding web applications.

    Click to read more ...

  • Sunday
    Nov112007

    Linkedin architecture

    Hi, An interesting post on Linkedin architecture: http://furiouspurpose.blogspot.com/2007/11/qcon-linkedin-architecture.html

    Click to read more ...

    Friday
    Nov092007

    Paper: Container-based Operating System Virtualization: A Scalable, High-performance Alternative to Hypervisors

    One stumbling block of the the great march towards virtualization is the relatively poor performance of resource hungry applications like databases. We are told to develop and test using VMs, but deploy without them. Which kind of sucks IMHO. Maybe better virtualization technology can remove this split. This paper talks about a different approach to virtualization called "container-based" virtualization that can reportedly double the performance of traditional hypervisor systems like Xen. It does this by trading isolation for efficiency. Rather than maintaining complete isolation between VMs the container approach shares resources between VMs and thus gives higher performance while still guaranteeing strong fault, resource, and security isolation. It's yet another battle in computing's endless war of creating and destroying abstraction layers. I learned a lot from from this paper because of how it compared and contrasted traditional hypervisor and container based virtualization strategies. Good job.

    Click to read more ...

    Thursday
    Nov082007

    ID generator

    Hi, I would like feed back on a ID generator I just made. What positive and negative effects do you see with this. It's programmed in Java, but could just as easily be programmed in any other typical language. It's thread safe and does not use any synchronization. When testing it on my laptop, I was able to generate 10 million IDs within about 15 seconds, so it should be more than fast enough. Take a look at the attachment.. (had to rename it from IdGen.java to IdGen.txt to attach it) IdGen.java

    Click to read more ...

    Thursday
    Nov082007

    scaling drupal - an open-source infrastructure for high-traffic drupal sites

    the authors of drupal have paid considerable attention to performance and scalability. consequently even a default install running on modest hardware can easily handle the demands a small website. if you are lucky, eventually the time comes when you need to service more users than your system can handle. at some point, you'll start looking at your hardware and network deployment.

    read more.

    Click to read more ...

    Wednesday
    Nov072007

    What CDN would you recommend?

    Hi all, a I run a site that after a complete redesign have gotten a lot more traffic. The site provides free flash games, so the biggest traffic share goes to serving flash files (from about 100K and up to several megabytes in size each.) I currently host the entire site on a hosting provider that have no traffic limits. But since they are very cheap (yet have served me very well all the time with at least 99,9% uptime), I don't trust them in allowing me to continue consuming more and more bandwidth. I just guess I'm going to reach some internal limit they have on day, so I'm looking into moving all the flash content over to a content delivery network of some sort. Some recent traffic stats: August: 12 GB September: 22 GB October: 55 GB November: Currently 2,3 GB pr day on average, but it's rising.. I've been looking into Amazon S3, but have not decided on anything yet. So therefor I'm asking if there are any other provides I should consider, that operates within the same price range as Amazon does (or lower)? Best regards, Christian Felde

    Click to read more ...

    Tuesday
    Nov062007

    Product: ChironFS

    If you are trying to create highly available file systems, especially across data centers, then ChironFS is one potential solution. It's relatively new, so there aren't lots of experience reports, but it looks worth considering. What is ChironFS and how does it work? Adapted from the ChironFS website: The Chiron Filesystem is a Fuse based filesystem that frees you from single points of failure. It's main purpose is to guarantee filesystem availability using replication. But it isn't a RAID implementation. RAID replicates DEVICES not FILESYSTEMS. Why not just use RAID over some network block device? Because it is a block device and if one server mounts that device in RW mode, no other server will be able to mount it in RW mode. Any real network may have many servers and offer a variety of services. Keeping everything running can become a real nightmare!

    Click to read more ...