Tuesday, October 9, 2007

High Load on Production Webservers After Source Code Sync

Hi everybody :)

We have a bunch of webservers (about 14 at this time) running Apache. As our application framework we're using PHP, with the APC cache installed to improve performance. For load balancing we're using an F5 BIG-IP system with dynamic ratio (SNMP-driven).

To sync new/updated source code we're using Subversion to "automatically" update these servers with our latest software releases. After pushing the new source to these production servers, the load on the machines goes through the roof.

While updating, the servers are still in "production", serving web pages to users; otherwise the update process would take ages. Because of the issue above, we mostly update in the morning hours, while fewer users are online.

My guess is that the load spikes because APC needs to recompile a bunch of new files each time. Before and during compilation, performance is simply "bad".
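If that guess is right, one knob worth looking at (assuming APC 3.x, and that files only ever change at deploy time) is apc.stat. With stat checks off, APC stops re-checking every source file on every request and keeps serving the cached opcodes until the cache is explicitly cleared, so the recompile becomes one controlled event per deploy instead of a stampede under live traffic. An illustrative (not drop-in) php.ini fragment:

```ini
; php.ini -- illustrative APC settings, not a tested drop-in config
apc.enabled  = 1
apc.stat     = 0     ; don't stat() source files per request; cached opcodes
                     ; are served until the cache is explicitly cleared
apc.shm_size = 128M  ; size the cache so the whole code tree fits, to avoid
                     ; evictions and recompiles under load
```

The trade-off: with apc.stat = 0 you must clear the cache yourself after each svn update, e.g. with a graceful Apache restart or a call to apc_clear_cache(), done host by host.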

My goal is to find a better solution. We want to be able to "sync" code no matter how many users are online (in case of emergency) without taking the whole site down.

How are you handling these issues? What do you think about the process above? Can you spot the "problem"? Do you have similar issues?

Feedback is highly welcome :)

Greetings,

Stephan Tijink
Head of Web Development

| fotocommunity GmbH & Co. KG
| Rheinwerkallee 2
| 53227 Bonn

Reader Comments (5)

Having a host provide service while its critical bits are being upgraded isn't typical. It's like having brain surgery while tap dancing. If the database schema is being changed, for example, bad things can happen.

Usually you over-provision so you can bring down machines, upgrade them, and then restart them without affecting your total load. This is what Flickr (http://highscalability.com/flickr-architecture) does within their shards, for example. If you are in a data center with VPSs, maybe you can temporarily rent some during the upgrade. Or maybe you can partition your machines into VMs so one can run while another is being upgraded.

November 29, 1990 | Unregistered Commenter Todd Hoff

Hi Todd,

You're right. That's surely the better way to work.

But that can only happen at a slower pace, and that can be a problem as well. While a couple of servers are already serving the "shiny new content", other boxes may still be serving the old content. In my opinion that's a problem too. That's why we've taken the approach described above so far.

And what about automation? Sure, I can "automatically" (and remotely) stop the httpd daemon, update the source and restart the service again. But first I need to "disable" the server in the load balancer, so as not to produce errors for users.
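For what it's worth, that ordering could be scripted roughly like this. Everything below is a placeholder (the echo lines stand in for the real F5 call, ssh, apachectl and svn update), so treat it as a dry run of the sequence, not working tooling:

```shell
#!/bin/sh
# Dry-run sketch of a rolling deploy: drain -> stop -> update -> start -> enable.
# Every action is an echo placeholder; swap in the real F5 commands
# (e.g. snmpset or the iControl API), ssh, apachectl and svn for your setup.

deploy_host() {
    host="$1"
    echo "lb: disable $host"                  # pull the box out of the pool
    echo "wait for in-flight requests on $host to drain"
    echo "ssh $host apachectl stop"
    echo "ssh $host svn update /var/www"
    echo "ssh $host apachectl start"
    echo "lb: enable $host"                   # put it back in the pool
}

# Roll through the farm one host at a time so capacity loss stays at 1/N.
for h in web01 web02 web03; do
    deploy_host "$h"
done
```

Because each host is fully drained before the update, APC does its recompiling with no traffic on the box, which sidesteps the load spike described in the original post.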

And I'm not sure how to solve this issue.

Greetings,

Stephan Tijink
Head of Web Development

| fotocommunity GmbH & Co. KG
| Rheinwerkallee 2
| 53227 Bonn

November 29, 1990 | Unregistered Commenter Stephan Tijink

> While a couple of servers are already serving the "shiny new content", other boxes
> may still be serving the old content. In my opinion that's a problem too.

Yep, it sucks. There are lots of answers; they all tend to be complicated and none are truly satisfactory. To some extent every upgrade is unique, because every change has different implications. You'll have to look at what is being upgraded (software, assets, content, a change in paths, a change in the database, etc.) and handle each problem individually for each upgrade. This takes a complicated upgrade infrastructure for handling rollouts.

The easiest solution is to just say you are undergoing maintenance and do all your upgrades in your maintenance window. Most users understand this, and it's solid and straightforward. People usually think their system is too important to go offline, but how often is that really true?

Most times you can get away with rolling through all your hosts, shutting everything down, letting your load balancer distribute new load to other servers, updating the software, restarting, and then rolling back on failure.

Maybe you can put up a read-only version of your site while it's upgrading?

Or you can go through the effort of making sure an upgrade is compatible with the previous version. This usually requires version numbers and lots of bug-prone code to make work.

One of the interesting characteristics of grid systems (http://highscalability.com/should-you-build-your-next-website-using-3teras-grid-os) is that you can provision a set of new servers in parallel and then cut over all at once. This doesn't solve all problems, of course. Sessions may need to be reestablished.

A stateless architecture helps because you don't have to worry about session state. A service approach like Amazon's (http://highscalability.com/amazon-architecture) is a real win too, because you can hide database changes in your service layer. When your code hits the database directly, any database change is difficult to handle. Adding a service layer means you can more easily handle multiple code versions.

Sorry, but I got nothing great.

November 29, 1990 | Unregistered Commenter Todd Hoff

Do you have to deploy the code? I'm not familiar with PHP (I'm a Java kind of guy), but we generally deploy the compiled binaries to every server, one at a time, and afterwards restart the servers, one at a time. The new code is loaded on server start, so we know we are using the correct version on each server. This gives us a window of 5-10 minutes where a user may be directed across versions, but in practice this is rarely a problem. We never compile on the target servers; they don't even have the ability to compile at all. This has been a conscious decision on our part: no source code and no compilers on the production servers. Again, I'm not sure if it's possible to precompile PHP in this way, or to run it without the source, so this might not be an option for you, in which case you might have to go in one of the directions Todd points out.
But, as Todd mentions, if your roll-out window is quite small, chances are that only a few people, if any, will notice.
Database changes are a completely different beast, though, and will require careful thought on how to change the schema without affecting the users.
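One way to get close to the "ship binaries, then restart" model in PHP is to never update code in place: sync each release into its own directory, then cut over with an atomic rename of a symlink, so the web server never sees a half-synced tree. A minimal sketch, with made-up paths (it uses a temp dir so it's safe to run anywhere, and assumes GNU coreutils for mv -T):

```shell
#!/bin/sh
# Sketch: release-directory deploy with an atomic symlink cutover.
# In real life ROOT would be something like /var/www and the release
# contents would come from svn export or rsync, not echo.
set -e
ROOT=$(mktemp -d)

mkdir -p "$ROOT/releases/r42"
echo "release r42" > "$ROOT/releases/r42/version.txt"

# Build the new symlink beside the live one, then rename over it.
# rename() is atomic, so any request sees either the old tree or the
# new one, never a mixture of files from both.
ln -s "$ROOT/releases/r42" "$ROOT/current.tmp"
mv -T "$ROOT/current.tmp" "$ROOT/current"

cat "$ROOT/current/version.txt"
```

After flipping the symlink you would still restart or gracefully reload Apache (and clear the APC cache) host by host so the opcode cache picks up the new tree, but the on-disk cutover itself is instantaneous.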

Todd also touches on something that I consider important in this respect: cookies and server sessions. Everything becomes a lot more complicated once you add cookie-based sessions to the mix, especially if you don't want users to lose their sessions during an upgrade. Don't introduce cookie-based sessions or state unless you really, really need it. It might require some serious thought about your application architecture, but you will reap the benefits as the site grows.

November 29, 1990 | Unregistered Commenter Kyrre Kristiansen

This is a provisioning problem that is also related to how your application is architected, I think.

It's important to realise that because BlixBlock is based on modules, it's possible to upgrade a massive web farm that's already in production. The XyKernel is a microkernel and allows partial upgrades.
ubuntusoftware dot net has more information about how we achieved this, but not in detail.

This is because only the affected module needs upgrading, so the farm does not get hit too hard.

But you have to be very careful:
1. Copy the module's data in the DB to a new section. This is easy because module data is versioned.
2. Disable writes on the existing module while the module is being updated.
3. Install the new module on the farm (and hence the DB) using the provisioning server. This will upgrade the farm one machine at a time OR all at once (if you want); normally all at once is best. It will run the change scripts on the DB against the copy of the data only, which is what you want, because nothing is hitting this data at this point in time.
4. Swap out the modules. The provisioning server does this for you after the DB upgrade has happened.
5. All sections using the module automatically start pointing to the new module.
6. Re-enable write access on the module.

If you are upgrading the core, then you HAVE to take the site down.
If you are doing a complicated module deployment with a few dependencies, then you may also have to take the site down. However, by versioning the dependencies (sub-services, for example) you can get around this too; you will have to first install the new service(s), and then the module. Hence you have to examine each case on its own merits.

The web site ubuntusoftware .net has very little info on this, but the XyKernel explains the high-level architecture which allows this to happen.

November 29, 1990 | Unregistered Commenter Anonymous
