Drupal's Scalability Makeover - You give up some control and you get back scalability
Drupal 7 is having a scalability makeover. Karoly Negyesi, Drupal Core Developer and Public Development Team Lead, explains the process in this video: Drupal 7 APIs, scalability mindset. Karoly states the general theme of the changes as: You give up some control and you get back scalability. An interesting comment on the politics of scalability?
Makeover may not be quite the right word, though. A makeover implies a cosmetic change: looking better by changing the surface. Drupal's changes will go deeper than that, right to Drupal's core. It's a genuine change that will hopefully allow one of the Internet's most venerable Content Management Systems (CMSs) to compete with a constant stream of younger and sexier models.
Drupal is based on an older LAMP stack approach where PHP modules are scooped up and merged together each time a request is made. Drupal's most intriguing idea is how it is built, expanded, and changed by weaving a single system together out of individual components called modules. Built-in modules include comments, RSS, contact forms, forums, and Clean URLs. Add-on modules include things like CSE, which adds Google's Custom Search Engine, and modules that add AdSense, CAPTCHA, and Sitemaps. Drupal establishes AOP-style extension points, called hooks, that allow modules to work remarkably well together, creating a site that feels like one single site even though it has been constructed from dozens of modules hunted and gathered from all over the digital world.
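To make the extension-point idea concrete, here is a minimal sketch in Python rather than PHP. Real Drupal hooks are PHP functions discovered by a naming convention (module name plus hook name); this sketch swaps in explicit registration instead, and `HookRegistry`, `implement`, `invoke_all`, and the two modules are hypothetical names chosen for illustration. Only the shape matters: core defines a named extension point, each module contributes to it, and core merges the results into one site.

```python
from collections import defaultdict
from typing import Any, Callable


class HookRegistry:
    """Hypothetical stand-in for a hook system: core names the extension
    points, and any number of modules may implement each one."""

    def __init__(self) -> None:
        # hook name -> list of (module name, implementation)
        self._impls: dict[str, list[tuple[str, Callable[..., Any]]]] = defaultdict(list)

    def implement(self, hook: str, module: str, fn: Callable[..., Any]) -> None:
        """Register one module's implementation of a named hook."""
        self._impls[hook].append((module, fn))

    def invoke_all(self, hook: str, *args: Any) -> list[Any]:
        """Call every implementation of `hook` in registration order."""
        return [fn(*args) for _module, fn in self._impls[hook]]


registry = HookRegistry()

# Two hypothetical modules each contribute an entry to the site's menu.
registry.implement("menu", "forum", lambda: {"path": "forum", "title": "Forums"})
registry.implement("menu", "contact", lambda: {"path": "contact", "title": "Contact"})

# Core builds one menu from whatever the enabled modules supplied.
menu = registry.invoke_all("menu")
print([item["title"] for item in menu])  # ['Forums', 'Contact']
```

Neither module knows the other exists; adding a third menu entry means registering one more implementation, with no change to core.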
The problem is that the PHP code can directly access the database and directly render to the UI; there is little required layering. Part of Drupal's amazing configurability and extensibility has been how easy it is for everything to work together by changing the database. But when there's no layering it's almost impossible to optimize the system. If you have 20 different modules, they can make 20 separate calls to the database when what we really want is one call. And because of the direct SQL access, when the number of writes increases there's no systematic way to distribute the writes across multiple servers. So as Drupal sites grow in the number of modules and the number of users, both performance and scalability tank.
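Here is a small sketch of the "20 calls versus one call" problem. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL. The functions `load_user_direct` and `load_users` are hypothetical: the first is what a module does when it writes its own SQL, and the second is what a shared layer can do once every module asks it for data instead. The `run` wrapper only counts round trips.

```python
import sqlite3

# In-memory SQLite stands in for the site's MySQL database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 21)]
)

query_count = 0


def run(sql: str, params: tuple = ()) -> list[tuple]:
    """Every database round trip goes through here so we can count them."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def load_user_direct(uid: int) -> tuple:
    """What each module does when it writes its own SQL: one query per call."""
    return run("SELECT uid, name FROM users WHERE uid = ?", (uid,))[0]


def load_users(uids) -> dict[int, str]:
    """What a shared API layer can do: collect the ids, issue one query."""
    uids = list(dict.fromkeys(uids))  # de-duplicate while keeping order
    if not uids:
        return {}
    placeholders = ",".join("?" * len(uids))
    rows = run(f"SELECT uid, name FROM users WHERE uid IN ({placeholders})", tuple(uids))
    return {uid: name for uid, name in rows}


# Twenty hypothetical modules each need one user.
query_count = 0
direct = [load_user_direct(uid) for uid in range(1, 21)]
direct_queries = query_count

query_count = 0
batched = load_users(range(1, 21))
batched_queries = query_count

print(direct_queries, batched_queries)  # 20 1
```

The batched version only works because the callers go through a common function. When each module talks to the database directly, there is nowhere to merge the requests.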
The younger models architect their systems differently. Sites like Google, Amazon, and Facebook are written in terms of an API and a framework: a service-based approach. With a service-based approach the web tier is programmed in terms of services that are themselves scalable, so the entire system is scalable. When the API is skipped, there are no leverage points where scaling can be applied. The system becomes a big ball of mud.
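The leverage-point argument can be sketched as follows, again in Python and with hypothetical names (`CommentStore`, `SingleServerStore`, `ShardedStore`, `post_comment`). The web-tier function is written only against the `CommentStore` interface, so replacing a single backend with a sharded one that spreads writes across several servers requires no change to it. Routing by `node_id % len(shards)` is the simplest possible scheme, chosen for brevity rather than as a recommendation.

```python
from typing import Protocol


class CommentStore(Protocol):
    """The API the web tier is written against (hypothetical)."""

    def save(self, node_id: int, text: str) -> None: ...

    def for_node(self, node_id: int) -> list[str]: ...


class SingleServerStore:
    """One backend; a dict stands in for a database server."""

    def __init__(self) -> None:
        self._rows: dict[int, list[str]] = {}

    def save(self, node_id: int, text: str) -> None:
        self._rows.setdefault(node_id, []).append(text)

    def for_node(self, node_id: int) -> list[str]:
        return list(self._rows.get(node_id, []))


class ShardedStore:
    """Same API, but writes are spread over several backends by node id."""

    def __init__(self, shards: list[CommentStore]) -> None:
        self._shards = shards

    def _shard(self, node_id: int) -> CommentStore:
        # Modulo routing: every read and write for a given node hits one shard.
        return self._shards[node_id % len(self._shards)]

    def save(self, node_id: int, text: str) -> None:
        self._shard(node_id).save(node_id, text)

    def for_node(self, node_id: int) -> list[str]:
        return self._shard(node_id).for_node(node_id)


def post_comment(store: CommentStore, node_id: int, text: str) -> list[str]:
    """Web-tier code: it only knows the API, never which server holds the data."""
    store.save(node_id, text)
    return store.for_node(node_id)


shards = [SingleServerStore() for _ in range(3)]
store = ShardedStore(shards)
for nid in range(6):
    post_comment(store, nid, f"hello {nid}")

print(post_comment(store, 4, "second"))  # ['hello 4', 'second']
print(shards[1].for_node(4))  # ['hello 4', 'second']
```

Had `post_comment` issued SQL directly, the sharding decision would have to be repeated in every module that touches comments.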
More layering and more APIs is exactly the direction Drupal is taking. Exactly how is Drupal changing?
- Forget SQL, use APIs. Delegate control over what's happening to the API; this is what allows your site to scale. APIs in Drupal have historically been thought of as an inconvenience to be bypassed: you could just write a database query and dump something to the screen. Not with Drupal 7. The UI is secondary. Drupal 7 can be run without the UI because everything is now done through APIs, whereas previously some operations could only be done through the UI.
- New Database Layer. For data modifications there's a new query builder. Because queries are assembled through the builder rather than written as raw SQL strings, the database layer has room to apply tricks that make it possible to write more to the database.
- Queue. The Queue API allows jobs to be queued and executed on a grid. For example, when cron runs the aggregator over several hundred feeds, it fails: the run never finishes. In the new version the RSS feeds are put in the queue and processed when there's time. This type of asynchronous processing is at the heart of many of today's largest systems.
- Tests. Extensive unit tests are being developed to catch bugs. Previously, testing was done largely through the UI. Developers of new modules are encouraged to write tests.
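The queue pattern behind the aggregator fix can be sketched like this, assuming an in-memory queue and hypothetical names (`FeedQueue`, `cron_run`, `refresh_feed`); Drupal 7's own Queue API is PHP and differs in detail. Every feed becomes a job, and each cron run processes jobs only until its time budget is spent, so a long list of feeds is spread across several runs instead of one run that never finishes.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    feed_url: str


@dataclass
class FeedQueue:
    """Hypothetical in-memory queue. A real deployment would use a shared,
    persistent queue so workers on several machines can claim jobs, and would
    lease each item, deleting it only after the work succeeds."""

    items: deque = field(default_factory=deque)

    def create_item(self, job: Job) -> None:
        self.items.append(job)

    def claim_item(self) -> Optional[Job]:
        return self.items.popleft() if self.items else None

    def __len__(self) -> int:
        return len(self.items)


def refresh_feed(job: Job) -> None:
    """Placeholder for fetching and parsing one feed (hypothetical)."""


def cron_run(queue: FeedQueue, time_budget: float) -> int:
    """Work through queued feeds until the queue is empty or the time budget
    is spent; anything left over waits for the next cron run."""
    deadline = time.monotonic() + time_budget
    processed = 0
    while time.monotonic() < deadline:
        job = queue.claim_item()
        if job is None:
            break
        refresh_feed(job)
        processed += 1
    return processed


queue = FeedQueue()
for i in range(300):
    queue.create_item(Job(f"https://example.com/feed/{i}.xml"))

done = cron_run(queue, time_budget=5.0)
print(done, len(queue))  # 300 0
```

Because `refresh_feed` is a no-op here, all 300 jobs finish well within the budget. With real network fetches, a run that hits its deadline simply stops and leaves the remaining jobs in the queue.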
Will this work? Will this be enough? It's a promising start using best practices that have worked for other sites: queues, APIs, and abstraction layers. The move to unit testing is also smart. Given that Drupal sites are built from community contributions, the new emphasis on unit tests should really help product quality going forward.
What Drupal has going against it is an incredible installed base of software that will be hard to upgrade to new ways of doing things. As a Drupal user, few things are more frustrating than the module upgrade dance. And since the coding practices for modules have changed so much, it will be quite a challenge to get all those modules moved to the new way of doing things. Without these modules Drupal isn't as attractive an option.
I'm really hoping that Drupal works it out. The idea behind Drupal is compelling and unique. Making a single functional system from components is a dream we still have not fully realized, but of everything out there Drupal comes the closest. Nature works on these principles too: composition, customization, growth through accretion. Parts keep being added on to existing systems rather than being thrown away and redesigned from scratch. In your brain you'll still find the brain of the lizard, the mammal, and the primate. In your gut you'll find billions and billions of bacteria without which we could not process food. In Drupal we see a similar process happening in building software.
Compare this approach against all the widgets now available on the Internets. In comparison, widgets are like impermanent tattoos. It's easy to embed widgets on your site precisely because they have nothing to do with your site. Their data is kept elsewhere. They don't integrate with your user and log-in system, your template system, your search system, or your backup system, and they can't be composed together or work together. Drupal's modules can do all those things. Modules share the same templating system, they can work together, they can be configured in the UI, they can be searched, and their data can be backed up.
The great thing about Drupal is how easy it is to make a functional website. It's just been hard to make a great, well performing, and scalable website. Hopefully that will change.