One of the cool things about Mr. Scoble is he doesn't pretend to know everything, which can be a deadly boring affliction in this field. In this case Robert is asking for help with an upcoming interview. Maybe we can help? Here's Robert's plight: I’m really freaked out. I have one of the biggest interviews of my life coming up and I’m way underqualified to host it. It’s on Thursday and it’s about Scalability and Performance of Web Services. Look at who will be on. Matt Mullenweg, founder of Automattic, the company behind WordPress (and behind this blog). Paul Buchheit, one of the founders of FriendFeed and the creator of Gmail (he’s also the guy who gave Google the “don’t be evil” admonition). Nat Brown, CTO of iLike, which got six million users on Facebook in about 10 days. What would you ask?
Although scaling human genome sequencing is not exactly about building bigger, faster and more reliable websites, it is most interesting in terms of scalability. The paper describes a new technology from the startup Complete Genomics for sequencing the full human genome at a fraction of the cost of earlier approaches. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?
This is a useful post by Frank Mashraqi, Director of Business Operations & Technical Strategy for a top 50 website that delivers billions of page views per month.
Since scalability is considered a non-functional requirement, it is often overlooked in the hope of decreasing time to market. Deferring scalability work can shorten the time to market, but only at the price of significant technical debt.
Balancing performance and scalability vs. fast iteration and cost efficiency can be a significant challenge for startups. The good news is that achieving this balance is not impossible.
Read the rest of the article here and view a presentation here.
I introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving the highest possible scalability depends on a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes patterns to:
- Introduce Scalability
- Optimize Algorithm
- Add Hardware
- Add Parallelism
- Add Intra-Process Parallelism
- Add Inter-Process Parallelism
- Add Hybrid Parallelism
- Optimize Decentralization
- Control Shared Resources
- Automate Scalability
Compares MapReduce to other parallel processing approaches and suggests a new paradigm for clouds and grids.
Kent Langley was kind enough to create a profile template for Joyent, Kent's new employer. Joyent is an infrastructure and development company that has put together a multi-site, multi-million dollar hosting setup for their own applications and for the use of others. Joyent competes with the likes of Amazon and GoGrid in the multi-player cloud computing game and hosts Bumper Sticker: A 1 Billion Page Per Month Facebook RoR App. The template was originally created with web services in mind, not cloud providers, but I think it still works in an odd sort of way. Remember, anyone can fill out a profile template for their system and share their wonderfulness with the world.
Getting to Know You
How is your system architected?
How is your team set up?
What infrastructure do you use?
How do you handle customer support?
We have a customer support team that is dedicated to helping our customers. Our services pretty much assume that you have some ability to build and deploy systems. However, if you don't, we have standard and extended plans, as well as partners, that can all be combined in various ways to help our clients. Our support follows the sun around the world.
How is your data center set up?
SUMMARY
The Joyent Accelerator is an extremely flexible tool for building and deploying all manner of infrastructure. If you have questions, please just contact us at firstname.lastname@example.org. Email or at an address is the best way to reach us usually.
Software design patterns are an emerging tool for guiding and documenting system design. Patterns usually describe software abstractions used by advanced designers and programmers in their software. Patterns can provide guidance for designing highly scalable distributed systems. Let's see how! Patterns are in essence solutions to problems. Most of them are expressed in a format called Alexandrian form which draws on constructs used by Christopher Alexander. There are variants but most look like this:
- The pattern name
- The problem the pattern is trying to solve
- Design rationale: This tells where the pattern came from, why it works, and why experts use it
- Pipes and Filters
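As a small illustration of what one such pattern looks like in code, here is a minimal sketch of Pipes and Filters using Python generators. The function names (`read_lines`, `strip_blank`, `to_upper`, `pipeline`) are made up for this example and do not come from any pattern catalog; the point is only that each filter does one job and is connected to the next by a stream of data.

```python
from typing import Iterable, Iterator, List


def read_lines(text: str) -> Iterator[str]:
    """Source: emit the input one line at a time."""
    for line in text.splitlines():
        yield line


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Filter: trim whitespace and drop lines that end up empty."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Filter: transform each line independently of the others."""
    for line in lines:
        yield line.upper()


def pipeline(text: str) -> List[str]:
    """Pipe: connect the stages; each one pulls from the previous one."""
    return list(to_upper(strip_blank(read_lines(text))))
```

Because every stage only depends on the stream it receives, stages can be reordered, replaced, or (in a distributed setting) moved to separate processes without touching the others.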
Brian Zimmer, architect at travel startup Yapta, highlights some worst practices jeopardizing the growth and scalability of a system:
- The Golden Hammer. Forcing a particular technology to work in ways it was not intended is sometimes counter-productive. Using a database to store key-value pairs is one example. Another example is using threads to program for concurrency.
- Resource Abuse. Manage the availability of shared resources because when they fail, by definition, their failure is experienced pervasively rather than in isolation. For example, connection management to the database through a thread pool.
- Big Ball of Mud. Failure to manage dependencies inhibits agility and scalability.
- Everything or Something. In both code and application dependency management, the worst practice is not understanding the relationships and formulating a model to facilitate their management. Failure to enforce diligent control is a contributing scalability inhibitor.
- Forgetting to check the time. To properly scale a system it is imperative to manage the time allotted for requests to be handled.
- Hero Pattern. One popular solution to the operation issue is a Hero who can and often will manage the bulk of the operational needs. For a large system of many components this approach does not scale, yet it is one of the most frequently deployed solutions.
- Not automating. A system too dependent on human intervention, frequently the result of having a Hero, is dangerously exposed to issues of reproducibility and hit-by-a-bus syndrome.
- Monitoring. Monitoring, like testing, is often one of the first items sacrificed when time is tight.
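Two of these items, Resource Abuse and Forgetting to check the time, lend themselves to a short sketch. The code below is an illustration only, not anything from Brian Zimmer's article: it assumes a fixed-size worker pool standing in for a shared resource, and the names `call_with_budget`, `slow_backend`, and `fast_backend` are hypothetical. Each request gets an explicit time budget and falls back to a default when the budget runs out.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# A small fixed-size pool caps how many requests use the shared resource at once.
POOL = ThreadPoolExecutor(max_workers=4)


def call_with_budget(func, budget_seconds, fallback):
    """Run func on the pool; return fallback if it exceeds its time budget."""
    future = POOL.submit(func)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        # The worker thread is not interrupted; we only stop waiting for it.
        # cancel() only helps if the call has not started running yet.
        future.cancel()
        return fallback


def slow_backend():
    time.sleep(0.5)  # hypothetical slow dependency
    return "fresh"


def fast_backend():
    return "fresh"
```

A caller would write something like `call_with_budget(slow_backend, 0.05, "stale")` and get the fallback rather than waiting indefinitely. Note that a timed-out call still occupies a pool worker until it finishes, so the underlying dependency should have its own timeout as well.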
Has a Java-only Hadoop been getting you down? Now you can be Happy. Happy is a framework for writing map-reduce programs for Hadoop using Jython. It files off the sharp edges on Hadoop and makes writing map-reduce programs a breeze. There's really no history yet on Happy, but I'm delighted at the idea of being able to map-reduce in other languages. The more ways the better. From the website:
Happy is a framework that allows Hadoop jobs to be written and run in Python 2.2 using Jython. It is an easy way to write map-reduce programs for Hadoop, and includes some new useful features as well. The current release supports Hadoop 0.17.2. Map-reduce jobs in Happy are defined by sub-classing happy.HappyJob and implementing a map(records, task) and reduce(key, values, task) function. Then you create an instance of the class, set the job parameters (such as inputs and outputs) and call run(). When you call run(), Happy serializes your job instance and copies it and all accompanying libraries out to the Hadoop cluster. Then for each task in the Hadoop job, your job instance is de-serialized and map or reduce is called. The task results are written out using a collector, but aggregate statistics and other roll-up information can be stored in the happy.results dictionary, which is returned from the run() call. Jython modules and Java jar files that are being called by your code can be specified using the environment variable HAPPY_PATH. These are added to the Python path at startup, and are also automatically included when jobs are sent to Hadoop. The path is stored in happy.path and can be edited at runtime.
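To make the shape of such a job concrete, here is a toy word count that runs entirely in memory. It does not use Happy, Jython, or Hadoop: `LocalTask`, `run_locally`, and the `collect` method name are hypothetical stand-ins, and the records are assumed to be (key, value) pairs. Only the `map(records, task)` and `reduce(key, values, task)` signatures are taken from the description above; serialization and distribution to a cluster are skipped.

```python
from collections import defaultdict


class LocalTask:
    """Hypothetical stand-in for the task object; gathers emitted pairs."""

    def __init__(self):
        self.output = []

    def collect(self, key, value):
        self.output.append((key, value))


class WordCount:
    """A job with the two method shapes described in the text."""

    def map(self, records, task):
        # Assumption: each record is a (key, line) pair.
        for _, line in records:
            for word in line.split():
                task.collect(word, 1)

    def reduce(self, key, values, task):
        task.collect(key, sum(values))


def run_locally(job, records):
    """Toy in-memory runner: map, group by key, then reduce."""
    map_task = LocalTask()
    job.map(records, map_task)

    grouped = defaultdict(list)
    for key, value in map_task.output:
        grouped[key].append(value)

    reduce_task = LocalTask()
    for key in sorted(grouped):
        job.reduce(key, grouped[key], reduce_task)
    return dict(reduce_task.output)
```

For example, `run_locally(WordCount(), [(0, "a b a"), (1, "b c")])` groups the emitted pairs by word and sums them. In a real Happy job the grouping step is done by Hadoop between the map and reduce phases.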
Kevin Clark, director of IT operations for Lucasfilm, discusses how their data center works:
- Linux-based platform, SUSE (looking to change), and a lot of proprietary open source applications for content creation.
- 4,500-processor render farm in the datacenter. Workstations are used off hours.
- Developed their own proprietary scheduler to schedule their 5,500 available processors.
- Render nodes, the blade racks (from Verari), run dual-core dual Opteron chips with 32GB of memory on board, but are expanding those to quad-core. They are an AMD shop.
- 400TB of storage online for production.
- Every night they write out 10-20TB of new data on a render. A project will use up to a hundred-plus terabytes of storage.
- Incremental backups are a challenge because the data changes up to 50 percent over a week.
- NetApps used for storage. They like the global namespace in the virtual file system.
- Foundry Networks architecture shop. One of the larger 10-GbE-backbone facilities on the West coast. 350-plus 10 GbE ports that are used for distribution throughout the facility and the backend.
- Grid computing used for over 4 years.
- A 10-Gig dark fiber connection connects San Rafael and their home office. Enables them to co-render and co-storage between the two facilities. No difference in performance in terms of where they went to look for their data and their shots.
- Artists get server class machines: HP 9400 workstations with dual-core dual Opteron processors and 16GB of memory.
- Challenge now is to better segment storage to not continue to sink costs into high-cost disks.
- VMware used to host a lot of development environments. Allows the quick turn up of testing as the tests can be allocated across VMs.
- Provides PoE (power-over-ethernet) out from the switch to all of their Web farms.
- Next push is on the facilities side: how to be more efficient at airflow management and power utilization.