By Bhavin Turakhia CEO, Directi. Covers: * Why scalability is important. Viral marketing can result in instant success. With RSS/Ajax/SOA number of requests grow exponentially with user base. Goal is to build a web 2.0 app that can server millions of users with zero downtime. * Introduction to the variables. Scalability, performance, responsiveness, availability, downtime impact, cost, maintenance effort. * Introduction to the factors. Platform selection, hardware, application design, database architecture, deployment architecture, storage architecture, abuse prevention, monitoring mechanisms, etc. * Building our own scalable architecture in incremental steps: vertical scaling, vertical partitioning, horizontal scaling, horizontal partitioning, etc. First buy bigger. Then deploy each service on a separate node. Then increase the number of nudes and load balance. Deal with session management. Remove single points of failure. Use a shared nothing cluster. Choice of master-slave multi-master setup. Replication can be synchronous or asynchronous. * Platform selection considerations. Use a global redirector for serving multiple datacenters. Add object, session API, and page cache. Add reverse proxy. Think about CDNs, Grid computing. * Tips. Use a Horizontal DB architecture from the beginning. Loosely couple all modules. Use a REST interface for easier caching. Perform application sizing ongoingly to ensure optimal hardware utilization.
By George Palmer of 3dogsbark.com. Covers: * How you start out: shared hosting, web server DB on same machine. Move two 2 machines. Minimal code changes. * Scaling the database. Add read slaves on their own machines. Then master-master setup. Still minimal code changes. * Scaling the web server. Load balance against multiple application servers. Application servers scale but the database doesn't. * User clusters. Partition and allocate users to their own dedicated cluster. Requires substantial code changes. * Caching. A large percentage of hits are read only. Use reverse proxy, memcached, and language specific cache. * Elastic architectures. Based on Amazon EC2. Start and stop instances on demand. For global applications keep a cache on each continent, assign users to clusters by location, maintain app servers on each continent, use transaction replication software if you must replicate your site globally.
By John Engales CTO, Rackspace. Good presentation of the stages a typical successful website goes through: * Stage 1 - The Beginning: Simple architecture, low complexity. no redundancy. Firewall, load balancer, a pair of web servers, database server, and internal storage. * Stage 2 - More of the same, just bigger. * Stage 3 - The Pain Begins: publicity hits. Use reverse proxy, cache static content, load balancers, more databases, re-coding. * Stage 4 - The Pain Intensifies: caching with memcached, writes overload and replication takes too long, start database partitioning, shared storage makes sense for content, significant re-architecting for DB. * Stage 5 - This Really Hurts!: rethink entire application, partition on geography user ID, etc, create user clusters, using hashing scheme for locating which user belongs to which cluster. * Stage 6 - Getting a little less painful: scalable application and database architecture, acceptable performance, starting to add ne features again, optimizing some code, still growing but manageable. * Stage 7 - Entering the unknown: where are the remaining bottlenecks (power, space, bandwidth, CDN, firewall, load balancer, storage, people, process, database), all eggs in one basked (single datacenter, single instance of data).
This article on scaling cookie baking recipes showed up in one my key word alerts. Lots of weird things show up in alerts, but I really like cookies and the parallels were just so delicious. Scaling in the cookie baking world is: the process of multiplying your recipe by many times to produce much more dough for many more cookies. It’s the difference between making enough dough in one batch to make two dozen cookies, or 2000 cookies. Hey, pretty close to the website notion. Yet as any good cook knows any scaled up recipe must be tweaked a little as things change at scale. Let's see what else we're supposed to do (quoted from the article):
CloudCamp is an interesting unconference where early adapters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged you to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing. End users, IT professionals and vendors are all encouraged to participate. CloudCamp Silicon Valley 08 is scheduled for Tuesday, September 30, 2008 from 06:00 PM - 10:00 PM in Sun Microsystems' EBC Briefing Center 15 Network Circle Menlo Park, CA 94025 CloudCamp follows an interactive, unscripted unconference format. You can propose your own session or you can attend a session proposed by someone else. Either way, you are encouraged to engage in the discussion and “Vote with your feet”, which means … “find another session if you don’t find the session helpful”. Pick and choose from the conversations; rant and rave, or sit back and watch. At CloudCamp, we tend to discuss the following topics: * Infrastructure as a service (Joyent, Amazon Ec2, Nirvanix, etc) * Platform as a service (BungeeLabs, AppEngine, etc) * Software as a service (salesforce.com) * Application / Data / Storage (development in the cloud)
You want to have a scalable website. You want a website which can handle traffic spikes (think if you are getting on Digg, Slahsdot, Reddit, Techcrunch or other very popular websites frontpage). Regular hosting companies (especially shared hosting) can offer only so much. The servers usually get crushed under the load in short time. But there is hope. A new breed of hosting companies emerged recently. A new breed which can offer you the scalability you need at a fraction of the cost. Welcome to the world of “cloud computing!” (or “grid computing” or “utility computing”, which are terms for the same thing). Here's a website which compiled a list of cloud computing hosting companies (with short descriptions, prices and customer lists for each of them). Read the entire article about Cloud computing, grid computing, utility computing list at MyTestBox.com - web software reviews, news, tips & tricks.
How do we scale datacenters? Should we build a few mammoth million machine datacenters or many smaller micro datacenters? Intuitively we usually go with a bigger is better economies of scale type argument, but it may not be so. What works for Walmart may not work for White Box World. Mega datacenters may actually exhibit diseconomies of scale. It may be better to run applications over many distributed micro datacenters instead of one large one. This paper by Ken Church, Albert Greenberg, and James Hamilton, all from Microsoft, takes a look at the different issues and concludes:
Putting it all together, the micro model offers a design point with attractive performance, reliability, scale and cost. Given how much the industry is currently investing in the mega model, the industry would do well to consider the micro alternative.
Hi, I am very glad that this site exists, as I have learned more about clustering on this site than for quite some time reading stuff elsewhere. Oftentimes, one can find lots of material about clustering, but the practical real-life information is missing. Not so wih this site. I am currently planning the development of an application which has a lot of enterprise features and requirements. On the other side (if the tiny chance of success might strike us), this application would not be an in-house application of a financial institution, or something like that, but some kind of communit/web 2.0 web site. Thus it is an enterprise application with (hopefully, but surely unlikely) the user numbers of a social networking site. Each user initiated transaction involves huge resssources business logic wise (including insane amounts of encryption oprations). Of course, I do not intend to induldge into premature scaling, but to invest every minute I have into the implementation of business logic features. Nevertheless, I do not want to make some extremely bad choices which would force a complete reimplementation straight after the first tiny success - i.e. I want to start with the right technology and architecture, but wait with the implementation of the scalability and high availyability features. Because of the enterprise aspects of this software, my first thought was to use Java SE 6 and Java EE 5 technologies only in order to get all the JEE features and to be vendor independent at the same time. For implementation and testing purposes I thought of Glassfish v2UR2, Postgresql 8.3 and Solaris 10. As all of the major JEE-Appserver vendors advertise the clustering capabilities, I thought that this could not be a bad move. Hopefully, Glassfish would provide HA and scalability, if not there would always be Geronimo, JBoss, Weblogic, or Websphere. Now it seems that there are vast differences between different products: - JEE-Application servers are scaling only to some degree(?). It seems that JEE is almost exclusively used for enterprise applications like SAP ERP or applications at financial institutions? Therefore, there is no need for extreme scalability. - Terracotta seems to be very nice, as one do not have to learn the insanely huge JEE-technology stack, but can just write a mostly Java-SE-only threaded application(?). But Terracotta does not seem to scale very well either (bottleneck with write-operations caused by the master-worker architecture?) and we would be dependend on the future of the Terracotta Corporation. JEE on the other side is vendor neutral. - Oracle Coherence. This product seems to be the best distributed caching product and the holy grail of scalability(?). But it is oracle-expensive. Absolutely nothing for a tiny start-up with no financing. JEE is vendor neutral and thus possibly much cheaper. Do you think that it is possible that one could produce a JEE-Architecture which could provide massive scalability (many hundreds of AppServer) using only the Glassfish clustering features? Or am I on a completely wrong track? Do we have to plan for Oracle Coherence usage? Are there other possibilities? Thanks a lot for any opinions or hints! regards, mike
Func is used to manage a large network using bash or Python scripts. It targets easy and simple remote scripting and one-off tasks over SSH by creating a secure (SSL certifications) XMLRPC API for communication. Any kind of application can be written on top of it. Other configuration management tools specialize in mass configuration. They say here's what the machine should look like and keep it that way. Func allows you to program your cluster. If you've ever tried to securely remote script a gang of machines using SSH keys you know what a total nightmare that can be. Some example commands:
Using the command line: func "*.example.org" call yumcmd update Using the Pthon API: import func.overlord.client as fc client = fc.Client("*.example.org;*.example.com") client.yumcmd.update() client.service.start("acme-server") print client.hardware.info()Func may certainly overlap in functionality with other tools like Puppet and cfengine, but as programmers we always need more than one way to do it and definitely see how I could have used Func on a few projects.
Hello everyone, I'm designing a website/widget that my business partner and I expect to serve millions of hits daily. As such we must shard our database (and we're designing with shards in mind right from the beginning). However, the one thing I haven't been able to figure out from Googling is the best hardware to go with for shards. I'm using exclusively InnoDB tables. We'll (eventually) be running 3 groups of database servers: a) Session servers for php sessions. These will have a very high write volume. b) ID servers. These will match a couple primary indices (such as user ID) to a given shard. These will have an intense read load, plus a moderate amount of writes. c) Shard servers. These will hold the bulk of the data. These will have a high read load and a lowish write load. Group A is done as a database instead of using memcached so users aren't logged out if a memcached server goes down. As the write load is high, a pair of high performance master-master servers seems obvious. What's the ideal hardware setup for machines with this role? Maxed RAM and fast disks seem reasonable. Should I bother with RAID > 0 if I have a live backup on the other master? I hear 4 cores is optimal for InnoDB -- recommendations? Group B. Again, it looks like maxed RAM is recommended here. What about disks? Should I go for 10K or will regular SATA2 drives be okay? RAID 0, 5, 10? Cores? Should I think about slaves to a master-master setup? Group C. It seems to me these machines can be of any capacity because the data they hold is easily spread between shards. What is the query-per-second per dollar sweet spot when it comes to cores and number of disks? Should I beef these machines up, or stick with low end hardware? Should I still max the RAM? I have some other thoughts on system setup, too. As the data stored in the PHP sessions won't change frequently (it'll likely remain static for a user's entire visit -- all variable data can be stored in Group C shard servers), I'm thinking of using a memcached setup in front of the database and only pushing writes through to the database when necessary. Your thoughts? We're also starting this on a minimal budget (of course), so where in the above is it best spent? Keep in mind that I can recycle machines used in Group A & B in Group C as times goes on. Anyway, I'd love to hear from the expertise of the forum. I've been reading for a long time, and I'll be writing as our project evolves :) --Mark