Ringo is an experimental, distributed, replicating key-value store based on consistent hashing and immutable data. Unlike many general-purpose databases, Ringo is designed for a specific use case: For archiving small (less than 4KB) or medium-size data items (<100MB) in real-time so that the data can survive K - 1 disk breaks, where K is the desired number of replicas, without any downtime, in a manner that scales to terabytes of data. In addition to storing, Ringo should be able to retrieve individual or small sets of data items with low latencies (<10ms) and provide a convenient on-disk format for bulk data access. Ringo is compatible with the map-reduce framework Disco and it was started at Nokia Research Center Palo Alto.
This article is a primer, intended to shine some much needed light on the logical, process oriented implementations of database scalability strategies in the form of a broad introduction. More specifically, the intent is to elaborate on the majority of these implementations by example.
The SHOP.COM Cache System is now available at http://code.google.com/p/sccache/ The SHOP.COM Cache System is an object cache system that... * is an in-process cache and external, shared Cache * is horizontally scalable * stores cached objects to disk * supports associative keys * is non-transactional * can have any size key and any size data * does auto-GC based on TTL * is container and platform neutral It was built in-house at SHOP.COM (by me) and has powered our website for years. We are open-sourcing it in the hope that it will be useful to others and to get some help in its maintenance. This is our first open source attempt and we'd appreciate any help and comments.
I thought with the job situation these days that people might be interested in some open jobs at Facebook. Here's what's available:
Facebook is hiring! We are looking for a Systems Engineer/Architect and Site Reliability Engineer. I have attached the job descriptions below. If you are interested, please contact Michelle Bostock mbostock-at-facebook.com. Thanks and Happy Holidays! Systems Architect Palo Alto, CA Description Facebook is seeking a seasoned Systems Architect to join the Operations team. The position is full-time and is based in our main office in downtown Palo Alto and will report to the Manager of Systems Operations. Responsibilities * Analyze application flow and infrastructure design to improve performance and scalability of the site * Collaborate on design of services infrastructure from servers to networking * Monitor, analyze, and make recommendations as appropriate to improve site stability and availability * Evaluate hardware and software technologies to improve site efficiency and performance * Troubleshoot and solve issues with hardware, applications, and network components * Lead team efforts from design to implementation, prioritize tasks and resources while interacting with Engineering and Operations * Document current and future configuration processes and policies * Participate in 24x7 on-call support Requirements * B.S. in Computer Science or equivalent experience * 4+ years of experience in Operations with large web farms * Extensive knowledge of web architecture and technologies, including Linux, Apache, MySQL, PHP, TCP/IP, security, HTTP, LDAP and MTAs * Strong background/interest in application and infrastructure design * Scripting and programming skills * Excellent verbal and written communication skills
Site Reliability Engineer Palo Alto, CA Description Facebook is seeking talented operations engineers to join the Site Reliability Engineering team. The ideal candidate will have strong communication skills, a passion for tinkering with Linux, and an almost insane fondness for fast-paced, seat-of-your-pants troubleshooting and crisis management. The position is full-time and is based in our main office in downtown Palo Alto. This position reports to the Manager of Site Reliability Engineering. Responsibilities * Monitor the stability and performance of the website * Remotely troubleshoot and diagnose hardware problems * Debug issues with Linux software, applications and network * Resolve technical challenges encountered in LAMP technologies * Develop and maintain monitoring tools and automation systems * Predict and respond to utilization variances across multiple datacenters * Identify and triage all outage related events * Facilitate communication, coordinate escalation, and work with subject matter experts to implement critical fixes * Automate and streamline processes * Track issues and run reports Requirements * 2-3 years+ Linux support/sys admin experience in an Internet operations environment * BA/BS in Computer Science or a related field, or equivalent experience * Working knowledge of Linux, Cisco, TCP/IP, Apache and mySQL * Experience working with network management systems and monitoring tools, such as Nagios, Ganglia and Cacti * Competency in Shell, PHP, Perl or Python. C is a plus * Solid understanding of web services architecture and commonly employed technologies * A sense of urgency in responding to and resolving critical issues that relate to the performance of the site and/or core infrastructure * Excellent verbal and written communication skills * Participation in a shifted coverage schedule, including working nights and on-call rotations
How to scale MySQL on a 32 core system with 256 threads? Diagonal scalability in a box. An impressive benchmark that achieved more than 79,000 SQL queries per second on a single 4 RU server! Is this real? If so what is the role of good old horizontal scalability? The goals of the benchmark:
- Reach a high throughput of SQL queries on a 256-way Sun SPARC Enterprise T5440
- Do it 21st century style i.e. with MySQL and ZFS , not 20th century style i.e with OraSybInf... and VxFS
- Do it with minimal tuning i.e as close as possible as out-of-the-box
Our latest strategy is taken from a great post by Paul Saab of Facebook, detailing how with changes Facebook has made to memcached they have:
...been able to scale memcached to handle 200,000 UDP requests per second with an average latency of 173 microseconds. The total throughput achieved is 300,000 UDP requests/s, but the latency at that request rate is too high to be useful in our system. This is an amazing increase from 50,000 UDP requests/s using the stock version of Linux and memcached.
To scale Facebook has hundreds of thousands of TCP connections open to their memcached processes. First, this is still amazing. It's not so long ago you could have never done this. Optimizing connection use was always a priority because the OS simply couldn't handle large numbers of connections or large numbers of threads or large numbers of CPUs. To get to this point is a big accomplishment. Still, at that scale there are problems that are often solved.
Some of the problem Facebook faced and fixed:
When you read Paul's article keep in mind all the incredible number of man hours that went into profiling the system, not just their application, but the entire software hardware stack. Then add in the research, planning, and trying different solutions to see if anything changed for the better. It's a lot of work. Notice using a nifty new parallel language or moving to a cloud wouldn't have made a bit difference. It's complete mastery of their system that made the difference.
A summary of potential strategies:
You can find their changes on github, the hub that says "git."
This is an interesting and still relevant research paper by Jim Gray, Prashant Shenoy at Microsoft Research that examines the rules of thumb for the design of data storage systems. It looks at storage, processing, and networking costs, ratios, and trends with a particular focus on performance and price/performance. Jim Gray has an updated presentation on this interesting topic: Long Term Storage Trends and You. Robin Harris has a great post that reflects on the Rules of Thumb whitepaper on his StorageMojo blog: Architecting the Internet Data Center - Parts I-IV.
An excellent article by Bryan Cantrill and Jeff Bonwick on how to write multi-threaded code. With more processors and no magic bullet solution for how to use them, knowing how to write multiprocessor code that doesn't screw up your system is still a valuable skill. Some topics:
Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind. Warning #2: this post is wild.
Kevin KellyKevin Kelly is Senior Maverick at Wired magazine. He helped launch Wired in 1993, and served as its Executive Editor until January 1999. He co-founded the ongoing Hackers' Conference, and was involved with the launch of the WELL, a pioneering online service started in 1985. He authored the best-selling New Rules for the New Economy and the classic book on decentralized emergent systems, Out of Control
One MachineThere is only one time in the history of each planet when its inhabitants first wire up its innumerable parts to make one large Machine. Later that Machine may run faster, but there is only one time when it is born. You and I are alive at this moment. Is this global web of computers, servers and trunk lines a mere mechanical circuit, a very large tool, or does it reach a threshold where something, well, different happens? Kevin Kelly's hypothesis is this: The rapidly increasing sum of all computational devices in the world connected online, including wirelessly, forms a superorganism of computation with its own emergent behaviors. I define the One Machine as the emerging superorganism of computers. It is a megasupercomputer composed of billions of sub computers. The sub computers can compute individually on their own, and from most perspectives these units are distinct complete pieces of gear. But there is an emerging smartness in their collective that is smarter than any individual computer. We could say learning (or smartness) occurs at the level of the superorganism.
The Next 6500 Days of the WebKevin Kelly recently gave a short talk on the upcoming Web 10.0 at the Web 2.0 Summit in San Francisco. It is like an update to his previous TED talk on Predicting the next 5000 days of the web. He makes us realize that the Web is only around 6500 days old and argues that the next 6500 days will be something entirely different.
Dimensions of the One MachineKevin Kelly's post on his blog The Technium back from 2007 shows us the dimensions of the One Machine: The next stage in human technological evolution is a single thinking/web/computer that is planetary in dimensions. This planetary computer will be the largest, most complex and most dependable machine we have ever built. It will also be the platform that most business and culture will run on. Today it contains approximately 1.2 billion personal computers, 2.7 billion cell phones, 1.3 billion land phones, 27 million data servers, and 80 million wireless PDAs. The processor chips of all these parts are feeding the computation of the internet/web/telecommunications system. A very rough estimate of the computing power of this Machine then is that it contains a billion times a billion, or one quintillion (10 ^ 18) transistors. There are about 100 billion neurons in the human brain. Today the Machine has as 5 orders more transistors than you have neurons in your head. And the Machine, unlike your brain, is doubling in power every couple of years at the minimum. If the Machine has 100 quadrillion transistors, how fast is it running? If we include spam, there are 196 billion emails sent every day. That's 2.2 million per second, or 2 megahertz. Every year 1trillion text messages are sent. That works out to 31,000 per second, or 31 kilohertz. Each day 14 billion instant messages are sent, at 162 kilohertz. The number of searches runs at 14 kilohertz. Links are clicked at the rate of 520,000 per second, or .5 megahertz. There are 20 billion visible, searchable web pages and another 900 billion dark, unsearchable, or deep web pages. The average number of links found on each searchable web page is 62. Assuming the same count for dynamic pages that means there's 55 trillion links in the full web. We could think of each link as a synapse -- a potential connection waiting to me made. There is roughly between 100 billion and 100 trillion synapses in the human brain, which puts the Machine in the same neighborhood as our brains. We could start by saying the Machine currently has 1 HB (Human Brain) equivalent. That measure might hold up for a decade or so, but after it gets to 100 HB, or 10,000 HB, it begins to feel like using inches to measure galactic space. Check out Kevin Kelly's blog for the conclusions and more (wild?) ideas. How do You see the future of the Web?
- One Machine and its dimensions on Kevin Kelly's blog The Technium
- Wired: We are the Web
- Kevin Kelly on the Future of the Web
- Kevin Kelly @ TED: Predicting the next 5000 days of the web
- John H. Holland: Emergence: From Chaos To Order
At 37 Signals Joshua Sierles describes how 37 Signals uses Sprinkle to configure their servers within EC2. Sprinkle defines a domain specific meta-language for describing and processing the installation of software. You can find an interesting discussion of Sprinkle's creation story by the creator himself, Marcus Crafter, in Sprinkle Some Powder!. Marcus divides provisioning tools into two categories: