HighScalability has Moved to Squarespace!

You may have noticed something is a little different when visiting HighScalability today: We've Moved! HighScalability has switched hosting services to Squarespace. House warming gifts are completely unnecessary. Thanks for the thought though.

It's been a long long long process. Importing a largish Drupal site to Wordpress and then into Squarespace is a bit like dental work without the happy juice, but the results are worth it. While the site is missing a few features I think it looks nicer, feels faster, and I'm betting it will be more scalable and more reliable. All good things.
I'll explain more about the move later in this post, but there's some administrivia that needs to be handled to make the move complete:

  • If you have posted on HighScalability before then you still have a user account, but since I don't know your passwords I had to make up new passwords for you. So please contact me and I'll give you your password so you can login and change it. Then you can post again. Sorry for the hassle, but for posts to be assigned to authors on import the user accounts had to exist, so I had to create them. Another issue is that login names in Squarespace are less flexible than under Drupal. The only allowable special character is the '-'. So if your login name contained a space, '_', or '.' I changed those characters to a '-'.
  • If you have a user account and have never posted on HighScalability before you'll have to register in order to recreate your user account. Sorry, but with so many users I couldn't recreate all the user accounts by hand.
  • If you could switch your RSS subscription over to the new feed that would help a lot. The old RSS feed will still work.
  • A lot of links were broken during the move due to the imperfection of the export/import process. Some of the formatting looks a little strange now too. It's going to take me a while to fix all these problems. If there's anything you see that needs fixing please shoot me an email.
  • There's no tag cloud anymore, but there's an All Posts page that lists every post by category, by week, and by month.

This isn't pleasant but there was no way I could make the process transparent. I appreciate your help and understanding.

Why was the move made?

I've played with and considered virtually every CMS available. I went with Squarespace based on weighing a few of my own personal goals and pain points:

Eating my own dog food. I've been a big advocate of cloud based memory grids. Since Squarespace uses a memory grid architecture I felt it would be a good experience to make use of their service (if I could make it work).

End-to-end management. I don't want to have to worry about my site. Ever. I want it to be managed end-to-end by the hosting service. In industry when they say they offer a managed service they usually mean the hardware/network/software stack is managed; you are still responsible for site uptime. The problem is a Drupal + LAMP + VPS stack isn't a hands-off affair. Things go wrong and you have to be always on call. That's fine if you have a few people working a site; they can take turns handling the load. But if you are alone or on vacation, it doesn't work. In the back of your mind you are always worrying that something might be going wrong. By leaving the management of the entire stack to the hosting service this worry largely goes away, assuming the host is good at their job.

Performance, scalability, reliability. I want the system to feel fast, to handle a lot of users, and to be reliable. For my purposes I don't really expect to have more users than I do now so I'm not looking for infinite headroom. But for the traffic I do have there should be no problems.

Price. A managed VPS with any sort of capability is expensive for a site that doesn't generate a lot of revenue yet gets too many users for shared hosting. A price point between shared hosting and a managed VPS would be very attractive. Some of the end-to-end managed services are enterprise plays and are way too expensive for the little guy.

Support. You are always at the mercy of your host, even with a cloud or colo. Good support you can count on makes all the difference when you are trying to get a site up and running and when disaster hits. Some service providers promise to get back to you within 8 hours. This is the Internet, 8 hours might as well be forever. No thanks. 

So far my experience with Squarespace has been very positive across all my criteria. 

They manage the site completely so my end-to-end management requirement is satisfied. A site is managed through a truly innovative browser based GUI that makes template customization and other operations quite straightforward. It will also tell you cool things like how many RSS readers you have and which posts are getting the most traffic.

I am impressed with how robust the system feels and how fast it is even doing large operations. I never feel like I'm going to break it or corrupt it and I'm almost never waiting on it to finish operations. Things just work. There's a lot of quality thought and work that's been put into the system and it shows.

Will it scale? Obviously I haven't tested that out yet, but it seems to handle larger sites so I'm fairly confident.

The price is quite reasonable, but I feel it's enough that they can make money without having to cut corners. It's a good value.

Support is excellent. Questions are answered within a short period of time and they are generally helpful. And I've asked some really stupid questions. When I couldn't set the date using a calendar widget they hardly even laughed. What they did do is make a screencast showing me what I needed to do and I was back in business. 

Of course nothing is perfect. Those imperfections show up as a few missing features and in some of the work needed to make the transition to the new system complete.

It's clear they've put a lot of work into their back-end and front-end. What's missing is the wide array of modules you'll find for products like Drupal, Joomla, and Wordpress. Squarespace offers a set of widgets, which are good, but the set is small and isn't as configurable as for other products. Part of the problem is that Squarespace doesn't offer an API for their system, so third parties can't make widgets. So simple widgets like avatars, tag clouds, today's popular posts, the most popular posts of all time, recent forum posts, read counts, and logged-in users are not available. 

Other problems arise in the process of moving an existing site into Squarespace.

Drupal is not one of Squarespace's supported import platforms. Drats! So I had to write scripts to export Drupal to Wordpress in such a way that as much of the metadata as possible was available in Squarespace. This was not easy to do. Squarespace does not have its own defined import format, which would have made life a lot easier.
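
The export scripts themselves aren't published, but the general shape of such a script can be sketched. This is a hypothetical illustration, not the actual migration code: an in-memory SQLite database stands in for Drupal's MySQL, the table and column names only loosely mirror Drupal's schema, and the output is a stripped-down WXR (WordPress eXtended RSS) fragment that preserves the title, author, and date metadata discussed above.

```python
import sqlite3
from xml.sax.saxutils import escape

# In-memory SQLite stands in for Drupal's MySQL database; the table
# and column names below are illustrative, not Drupal's exact schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE node (nid INTEGER, title TEXT, created INTEGER, uid INTEGER);
    CREATE TABLE node_revisions (nid INTEGER, body TEXT);
    CREATE TABLE users (uid INTEGER, name TEXT);
    INSERT INTO node VALUES (1, 'Hello Scalability', 1253145600, 7);
    INSERT INTO node_revisions VALUES (1, 'First post body.');
    INSERT INTO users VALUES (7, 'todd');
""")

def export_wxr(db):
    """Emit a minimal WXR-style document from Drupal-style tables,
    carrying over title, author, and creation-date metadata."""
    items = []
    rows = db.execute("""
        SELECT n.title, r.body, u.name, n.created
        FROM node n
        JOIN node_revisions r ON r.nid = n.nid
        JOIN users u ON u.uid = n.uid
    """)
    for title, body, author, created in rows:
        items.append(
            "<item><title>%s</title>"
            "<dc:creator>%s</dc:creator>"
            "<content:encoded><![CDATA[%s]]></content:encoded>"
            "<wp:post_date>%d</wp:post_date></item>"
            % (escape(title), escape(author), body, created)
        )
    return "<rss><channel>%s</channel></rss>" % "".join(items)

wxr = export_wxr(db)
```

A real migration would also have to carry over tags, comments, and URL aliases, and emit proper RFC-822 dates and XML namespace declarations, which is where most of the pain lives.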

Another problem is there's no way to do bulk operations to import users or map URLs. Each user has to be created by hand. If the users don't already exist when posts are imported then the posts won't be assigned to the correct author. 

One of the most important things to do when moving a site is preserve your old URLs. Every service sucks at this. Squarespace does have a way to map URLs, but again there's no way to bulk import the mappings. You have to do them one by one through their GUI. It's an enormous pain. But it was doable, so that's something at least.

These issues weren't serious enough for me not to go with Squarespace, but a site looking to build a real community may have to look a little closer.

So that's the story. There's a lot of work yet to fix broken links and formatting, but I hope that won't take too long.

Please let me know what you think.


Todd Hoff


Private Data Cloud: 'Do It Yourself' with Eucalyptus 

Private Clouds provide many of the benefits of the Public Cloud, namely elastic scalability, faster time-to-market, and reduced OpEx, all within the Enterprise's own perimeter and in compliance with its governance. Leading commercial Private Cloud products include VMware, Univa UD, and Unisys. Open source solutions include products like Globus Nimbus, Enomaly Elastic Computing Platform, RESERVOIR, and Eucalyptus.

Read more at:


Moving Beyond End-to-End Path Information to Optimize CDN Performance

You go through the expense of installing CDN nodes all over the globe to make sure users always have a node close by, and then you notice something curious and furious: clients still experience poor latencies. What's up with that? What do you do to find the problem? If you are Google you build a tool (WhyHigh) to figure out what's up. This paper is about the tool and the unexpected problem of high latencies on CDNs. The main problems they found: inefficient routing to nearby nodes and packet queuing. But more useful is the architecture of WhyHigh and how it goes about identifying bottlenecks. And even more useful is the general belief in creating sophisticated tools to understand and improve your service. That's what professionals do. From the abstract:
Replicating content across a geographically distributed set of servers and redirecting clients to the closest server in terms of latency has emerged as a common paradigm for improving client performance. In this paper, we analyze latencies measured from servers in Google’s content distribution network (CDN) to clients all across the Internet to study the effectiveness of latency-based server selection. Our main result is that redirecting every client to the server with least latency does not suffice to optimize client latencies. First, even though most clients are served by a geographically nearby CDN node, a sizeable fraction of clients experience latencies several tens of milliseconds higher than other clients in the same region. Second, we find that queueing delays often override the benefits of a client interacting with a nearby server.
To help the administrators of Google’s CDN cope with these problems, we have built a system called WhyHigh. First, WhyHigh measures client latencies across all nodes in the CDN and correlates measurements to identify the prefixes affected by inflated latencies. Second, since clients in several thousand prefixes have poor latencies, WhyHigh prioritizes problems based on the impact that solving them would have, e.g., by identifying either an AS path common to several inflated prefixes or a CDN node where path inflation is widespread. Finally, WhyHigh diagnoses the causes for inflated latencies using active measurements such as traceroutes and pings, in combination with datasets such as BGP paths and flow records. Typical causes discovered include lack of peering, routing misconfigurations, and side-effects of traffic engineering. We have used WhyHigh to diagnose several instances of inflated latencies, and our efforts over the course of a year have significantly helped improve the performance offered to clients by Google’s CDN.
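
The paper's core signal can be illustrated with a small sketch. This is not WhyHigh's actual code; the prefixes, RTT values, region grouping, and 50 ms threshold are all invented for illustration. The idea: compare each prefix's best-case RTT against the best RTT seen from any prefix in the same region. If even the minimum RTT is inflated, geography isn't the cause; routing or queueing is.

```python
from collections import defaultdict

# Hypothetical RTT samples (in ms) measured from one CDN node, keyed
# by client prefix. The prefixes, regions, and the 50 ms threshold
# are invented for illustration; they are not values from the paper.
samples = {
    "203.0.113.0/24":  {"region": "us-west", "rtts": [12, 14, 13]},
    "198.51.100.0/24": {"region": "us-west", "rtts": [95, 90, 99]},
    "192.0.2.0/24":    {"region": "us-west", "rtts": [15, 11, 16]},
}

def inflated_prefixes(samples, threshold_ms=50):
    """Flag prefixes whose best-case RTT sits far above the best RTT
    seen from any prefix in the same region; since even the minimum
    is inflated, distance isn't the problem."""
    best_by_region = defaultdict(lambda: float("inf"))
    for info in samples.values():
        prefix_best = min(info["rtts"])
        if prefix_best < best_by_region[info["region"]]:
            best_by_region[info["region"]] = prefix_best
    return sorted(
        prefix for prefix, info in samples.items()
        if min(info["rtts"]) - best_by_region[info["region"]] > threshold_ms
    )

print(inflated_prefixes(samples))  # flags only the routing-inflated prefix
```

WhyHigh then takes flagged prefixes further: prioritizing by impact and running traceroutes and pings against BGP and flow data to find the cause, which this toy omits.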

Related Articles

  • Product: Akamai

    How Ravelry Scales to 10 Million Requests Using Rails

    Tim Bray has a wonderful interview with Casey Forbes, creator of Ravelry, a Ruby on Rails site supporting a 400,000+ strong community of dedicated knitters and crocheters.

    Casey and his small team have done great things with Ravelry. It is a very focused site that provides a lot of value for users. And users absolutely adore the site. That's obvious from their enthusiastic comments and rocket fast adoption of Ravelry.

    Ten years ago a site like Ravelry would have been a multi-million dollar operation. Today Casey is the sole engineer for Ravelry, and running it takes only a few people. He was able to code it in 4 months working nights and weekends. Take a look below at all the technologies used to make Ravelry and you'll see how it is constructed almost completely from free, off-the-shelf software that Casey has stitched together into a complete system. There's an amazing amount of leverage in today's ecosystem when you combine all the quality tools, languages, storage, bandwidth, and hosting options.

    Now Casey and several employees make a living from Ravelry. Isn't that the dream of any small business? How might you go about doing the same thing?



    The Stats

  • 10 million requests a day hit Rails (AJAX + RSS + API)
  • 3.6 million pageviews per day
  • 430,000 registered users. 70,000 active each day. 900 new sign ups per day.
  • 2.3 million knitting/crochet projects, 50,000 new forum posts each day, 19 million forum posts, 13 million private messages, 8 million photos (the majority are hosted by Flickr).
  • Started on a small VPS and demand exploded from the start.
  • Monetization: advertisers + merchandise store + pattern sales


    The Platform

  • Ruby on Rails (1.8.6, Ruby GC patches)
  • Percona build of MySQL
  • Gentoo Linux
  • Servers: Silicon Mechanics (owned, not leased)
  • Hosting: Colocation with Hosted Solutions
  • Bandwidth: Cogent (very cheap)
  • Capistrano for deployment.
  • Nginx is much faster and less memory hungry than Apache.
  • Xen for virtualization
  • HAproxy for load balancing.
  • Munin for monitoring.
  • Tokyo Cabinet/Tyrant for large object caching
  • Nagios for alerts
  • HopToad for exception notifications.
  • NewRelic for tuning
  • Syslog-ng for log aggregation
  • S3 for storage
  • Cloudfront as a CDN
  • Sphinx for the search engine
  • Memcached for small object caching


    The Architecture

  • 7 servers (Gentoo Linux). Virtualization (Xen) creates 13 virtual servers.
  •  Front end uses Nginx and HAproxy. The request flow: nginx -> haproxy -> (load balanced) -> apache + mod_passenger. Nginx is first so it can provide functions like serving static files and redirects before passing a request to HAproxy for load balancing. Apache is probably used because it is more configurable than Nginx.
  •  One small backup server.
  • One small utility server for non-critical processes and staging.
  • Two 32 GB RAM servers for the master database, slave database, and Sphinx search engine.
  •  3 application servers running 6 Apache Passenger and Ruby instances, each capped at a pool size of 20. 6 quad core processors and 40 GB of RAM total. There's RAM to spare.
  • 5 terabytes of storage on Amazon S3. Cloudfront is used as a CDN.
  • Tokyo Cabinet/Tyrant is used instead of memcached in some places for caching larger objects. Specifically markdown text that has been converted to HTML.
  • HAproxy and Capistrano are used for rolling deploys of new versions of the site without affecting performance/traffic.
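
The request flow in the bullets above can be made concrete with a config sketch. This is an illustrative nginx configuration, not Ravelry's actual one; the ports, paths, and upstream names are assumptions:

```nginx
# Sketch of the nginx front end in the flow
# nginx -> haproxy -> apache + mod_passenger.
http {
    upstream haproxy_backend {
        server 127.0.0.1:8080;  # HAProxy, which load balances the
                                # Apache + mod_passenger app servers
    }
    server {
        listen 80;

        # nginx answers static files itself ...
        location /images/ {
            root /var/www/static;
        }

        # ... and hands everything else to HAProxy for load balancing
        location / {
            proxy_pass http://haproxy_backend;
            proxy_set_header Host $host;
        }
    }
}
```

Putting nginx first keeps cheap work (static files, redirects) off the Ruby processes entirely, while HAProxy handles health checks and balancing across the Apache instances.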

    Lessons Learned

  • Let your users create the site for you. Iterate and evolve. Start with something that works, get people in it, and build it together. Have a slow beta. Invite new people on slowly. Talk to the users about what they want every single day. Let your users help build your site. The result will be more reassuring, comforting, intuitive, and effective.
  • Let your users fund you. Ravelry was funded in part from users who donated $71K. That's a gift, not stock. Don't give up equity in your company. It took 6 months of working full time and bandwidth/server costs before they started making a profit and this money helped bridge that gap. The key is having a product users feel passionate about and being the kind of people users feel good about supporting. That requires love and authenticity.
  • Become the farmer's market of your niche. Find an underserved niche. Be anti-mass market. You don't always have to create something for the millions. The millions will likely yawn. Create something and do a good job for a smaller passionate group and that passion will transfer over to you.
  • Success is not about scale, it’s about sustainable execution. This lovely quote is from Jeff Putz.
  • The database is always the problem. Nearly all of the scaling/tuning/performance related work is database related. For example, MySQL schema changes on large tables are painful if you don’t want any downtime. One of the arguments for schemaless databases.
  • Keep it fun. Casey switched to Ruby on Rails because he was looking to make programming fun again. That reenchantment helped make the site possible.
  • Invent new things that delight your users. Go for magic. Users like that. This is one of Costco's principles too. This link, for example, describes some very innovative approaches to forum management.
  • Ruby rocks. It's a fun language and allowed them to develop quickly and release the site twice a day during beta.
  • Capture more profit using low margin services. Ravelry has their own merchandise store, wholesale accounts, printers, and fulfillment company. This allows them to keep all their costs lower so their profits aren't going to third-party services like CafePress.
  • Going from one server to many servers is the hardest transition to make. Everything changes and becomes more difficult. Have this transition in mind when you are planning your architecture.
  • You can do a lot with a little in today's ecosystem. It doesn't take many people or much money anymore to build a complex site like Ravelry. Take a look at all the different programs Ravelry uses to build their site and how few people are needed to run it.

    Some people complain that there aren't a lot of nitty gritty details about how Ravelry works. I think it should be illuminating that a site of this size doesn't need a lavish description of arcane scaling strategies. It can now be built from off-the-shelf parts smartly put together. And that's pretty cool.

    Related Articles

  • Ravelry gets funding from its own community.
  • Apache/Passenger vs Nginx/Mongrel by Matt Darby
  • The Ravelry Blog (note the number of comments on posts).
  • Podcast - Episode 4: Y Ravelry (featuring Jess & Casey)
  • Beta testing and beyond
  • Hacker News Thread - I included the reasoning from a user named Brett for why the HTTP request path is "Nginx out front passing requests to HAProxy and THEN to Apache + mod_rails."

    PaxosLease: Diskless Paxos for Leases

    PaxosLease is a distributed algorithm for lease negotiation. It is based on Paxos, but does not require disk writes or clock synchrony. PaxosLease is used for master lease negotiation in the open-source Keyspace replicated key-value store.
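
The "no clock synchrony" property rests on a trick that can be sketched in a few lines. This is a toy illustration of the local-timer idea only; the Paxos prepare/accept rounds are elided, and the names and lease duration are invented:

```python
import time

# Toy sketch of the local-timer idea behind diskless leases: the
# proposer measures the lease interval on its own monotonic clock,
# starting *before* the request goes out, so nodes never need
# synchronized clocks. The Paxos message rounds are elided here.
LEASE_SECONDS = 2.0

class LeaseHolder:
    def __init__(self):
        self.acquired_at = None

    def acquire(self):
        # Start timing first: any message delay then only shortens
        # the interval we believe we own, which is the safe direction.
        self.acquired_at = time.monotonic()
        # ... Paxos rounds with a majority of acceptors happen here ...
        return True

    def holds_lease(self):
        if self.acquired_at is None:
            return False
        return time.monotonic() - self.acquired_at < LEASE_SECONDS

node = LeaseHolder()
node.acquire()
assert node.holds_lease()  # still inside the locally timed interval
```

Because a holder can only under-estimate its own lease, expiry needs no agreement on wall-clock time; and since leases expire on their own, no state needs to survive a restart: a restarted node simply waits out the maximum lease time before acting.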


    Space Based Programming in .NET

    Space-based architectures are an alternative to the traditional n-tier model for enterprise applications. Instead of a vertical tier partitioning, space based applications are partitioned horizontally into self-sufficient units. This leads to almost linear scalability of stateful, high-performance applications.

    This is a recording of a talk I did last month where I introduce space based programming and demonstrate how that works in practice on the .NET platform using Oracle Coherence and GigaSpaces.
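
The horizontal partitioning described above can be sketched in a few lines. This is a language-neutral toy in Python rather than Coherence or GigaSpaces on .NET, and all names are illustrative: each processing unit bundles its own state and logic, and a hash router spreads keys across units with no shared tier.

```python
# Sketch of space-based (horizontal) partitioning: requests are
# routed by a hash of the data key to self-sufficient units, so
# capacity grows near-linearly by adding units. Names are invented.
class ProcessingUnit:
    """A self-sufficient unit: its slice of state plus the logic."""
    def __init__(self):
        self.space = {}          # this unit's slice of the data space

    def handle(self, key, value):
        self.space[key] = value  # state never leaves the unit
        return len(self.space)

class SpaceRouter:
    def __init__(self, n_units):
        self.units = [ProcessingUnit() for _ in range(n_units)]

    def route(self, key, value):
        unit = self.units[hash(key) % len(self.units)]
        return unit.handle(key, value)

router = SpaceRouter(4)
for order_id in range(100):
    router.route("order-%d" % order_id, {"qty": 1})

# Each unit holds only its own partition, not the whole data set.
assert sum(len(u.space) for u in router.units) == 100
```

The contrast with n-tier is that there is no central database or session tier every request must cross; each unit is the whole stack for its partition, which is where the near-linear scalability claim comes from.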


    Infinispan narrows the gap between open source and commercial data caches 

    Recently I attended a lecture presented by Manik Surtani, JBoss Cache & Infinispan project lead. The goal of the talk was to provide a technical overview of both products and outline Infinispan's road-map. Infinispan is the successor to the open-source JBoss Cache. JBoss Cache was originally targeted at simple web page caching and Infinispan builds on this to take it into the Cloud paradigm.

    Why did I attend? Well, over the past few years I have worked on projects that have used commercial distributed caching (aka data grid) technologies such as GemFire, GigaSpaces XAP, and Oracle Coherence. These projects required more functionality than is currently provided by open-source solutions such as memcached or EHCache. Looking at the road-map for Infinispan, I was struck by its ambition – will it provide the functionality that I need?

    Read more at:


    Hot Links for 2009-9-17 

  • Save 25% on Hadoop Conference Tickets
    Apache Hadoop is a hot technology getting traction all over the enterprise and in the Web 2.0 world. Now, there's going to be a conference dedicated to learning more about Hadoop. It'll be Friday, October 2 at the Roosevelt Hotel in New York City.

    Hadoop World, as it's being called, will be the first Hadoop event on the east coast. Morning sessions feature talks by Amazon, Cloudera, Facebook, IBM, and Yahoo! Then it breaks out into three tracks: applications, development / administration, and extensions / ecosystems. In addition to the conference itself, there will also be 3 days of training prior to the event for those looking to go deeper. In addition to general session speakers, presenters include Hadoop project creator Doug Cutting, as well as experts on large-scale data from Intel, Rackspace, SoftLayer, eHarmony, Supermicro, Impetus, Booz Allen Hamilton, Vertica, and other companies.

    Readers get a 25% discount if you register by Sept. 21:

  • Essential storage tradeoff: Simple Reads vs. Simple Writes by Stephan Schmidt. Data in denormalized chunks is easy to read and complex to write.
  • Kickfire's approach to parallelism by DANIEL ABADI. Kickfire uses column-oriented storage and execution to address I/O bottlenecks and FPGA-based data-flow architecture to address processing and memory bottlenecks.
  • "Just in Time" Decompression in Analytic Databases by Michael Stonebraker. A DBMS that is optimized for compression through and through--especially with a query executor that features just-in-time decompression--will not just reduce IO and storage overhead, but also offer better query performance with lower CPU resource utilization.
  • Reverse Proxy Performance – Varnish vs. Squid (Part 2) by Bryan Migliorisi. My results show that in raw cache hit performance, Varnish puts Squid to shame.
  • Building Scalable Databases: Denormalization, the NoSQL Movement and Digg by Dare Obasanjo. As a Web developer it's always a good idea to know what the current practices are in the industry even if they seem a bit too crazy to adopt…yet.
  • How To Make Life Suck Less (While Making Scalable Systems) by Bradford Stephens. Scalable doesn’t imply cheap or easy. Just cheaper and easier.
  • Some perspective to this DIY storage server mentioned at Storagemojo by Joerg Moellenkamp. It's about making decisions. Application and hardware have to be seen as one, and your application has to be capable of overcoming the limitations and problems of such ultra-cheap storage.

    The VeriScale Architecture - Elasticity and efficiency for private clouds

    The modern datacenter is evolving into the network centric datacenter model, which is applied to both public and private cloud computing. In this model, networking, platform, storage, and software infrastructure are provided as services that scale up or down on demand. The network centric model allows the datacenter to be viewed as a collection of automatically deployed and managed application services that utilize underlying virtualized services. Providing sufficient elasticity and scalability for the rapidly growing needs of the datacenter requires these collections of automatically-managed services to scale efficiently and with essentially no limits, letting services adapt easily to changing requirements and workloads. Sun’s VeriScale architecture provides the architectural platform that can deliver these capabilities. Sun Microsystems has been developing open and modular infrastructure architectures for more than a decade. The features of these architectures, such as elasticity, are seen in current private and public cloud computing architectures, while the non-functional requirements, such as high availability and security, have always been a high priority for Sun. The VeriScale architecture leverages experience and knowledge from many Sun customer engagements and provides an excellent foundation for cloud computing. The VeriScale architecture can be implemented as an overlay, creating a virtual infrastructure on a public cloud or it can be used to implement a private cloud.

    Read more at:


    Paper: A practical scalable distributed B-tree

    We've seen a lot of NoSQL action lately built around distributed hash tables. Btrees are getting jealous. Btrees, once the king of the database world, want their throne back. Paul Buchheit surfaced a paper: A practical scalable distributed B-tree by Marcos K. Aguilera and Wojciech Golab, that might help spark a revolution.

    From the Abstract:

    We propose a new algorithm for a practical, fault tolerant, and scalable B-tree distributed over a set of servers. Our algorithm supports practical features not present in prior work: transactions that allow atomic execution of multiple operations over multiple B-trees, online migration of B-tree nodes between servers, and dynamic addition and removal of servers. Moreover, our algorithm is conceptually simple: we use transactions to manipulate B-tree nodes so that clients need not use complicated concurrency and locking protocols used in prior work. To execute these transactions quickly, we rely on three techniques: (1) We use optimistic concurrency control, so that B-tree nodes are not locked during transaction execution, only during commit. This well-known technique works well because B-trees have little contention on update. (2) We replicate inner nodes at clients. These replicas are lazy, and hence lightweight, and they are very helpful to reduce client-server communication while traversing the B-tree. (3) We replicate version numbers of inner nodes across servers, so that clients can validate their transactions efficiently, without creating bottlenecks at the root node and other upper levels in the tree.

    Distributed hash tables are scalable because records are easily distributed across a cluster, which gives the golden ability to perform many writes in parallel. The problem is keyed access is very limited.

    A lot of the time you want to iterate through records or search records in a sorted order. Sorted could mean timestamp order or last name order, for example.

    Access to data in sorted order is what btrees are for. But we simply haven't seen distributed btree systems develop. Instead, you would have to use some sort of map-reduce mechanism to efficiently scan all the records or you would have to maintain the information in some other way.

    This paper points the way to do some really cool things at a system level:

  • It's distributed so it can scale dynamically in size and handle writes in parallel.
  • It supports adding and dropping servers dynamically, which is an essential requirement for architectures based on elastic cloud infrastructures.
  • Data can be migrated to other nodes, which is essential for maintenance.
  • Multiple records can be involved in transactions which is essential for the complex data manipulations that happen in real systems. This is accomplished via a version number mechanism that looks something like MVCC.
  • Optimistic concurrency, that is, the ability to change data without explicit locking, makes the job for programmers a lot easier.
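
The optimistic concurrency mechanism the paper relies on can be sketched as a toy, single-process version-validation scheme. This is illustrative only; the real algorithm validates version numbers of distributed B-tree nodes replicated across servers, while here a plain dict stands in for the node store and all names are invented.

```python
# Toy sketch of optimistic concurrency control with version numbers:
# read freely without locks, then at commit time validate that
# nothing you read has changed; if it has, retry the transaction.
class VersionedStore:
    def __init__(self):
        self.data = {}       # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def commit(self, read_set, writes):
        """Validate every read version, then apply writes atomically."""
        for key, version in read_set.items():
            if self.read(key)[0] != version:
                return False         # conflict: caller must retry
        for key, value in writes.items():
            self.data[key] = (self.read(key)[0] + 1, value)
        return True

store = VersionedStore()
v, _ = store.read("root")
assert store.commit({"root": v}, {"root": "split"})      # no conflict
assert not store.commit({"root": v}, {"root": "stale"})  # version moved on
```

Because B-tree inner nodes change rarely, validation almost always succeeds, which is why the paper can cache inner nodes lazily at clients and still avoid a locking protocol.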

    These are the kind of features needed for systems in the field. Hopefully we'll start seeing more systems offering richer access structures while still maintaining scalability.