Monday, Feb 28, 2011

A Practical Guide to Varnish - Why Varnish Matters

This is a guest post by Jeff Su from Factual.

What is Varnish?

Varnish is an open source, high performance HTTP accelerator that sits in front of a web stack and caches pages.  This caching layer is very configurable and can be used for both static and dynamic content.

One great thing about Varnish is that it can improve the performance of your website without requiring any code changes.  If you haven’t heard of Varnish (or have heard of it, but haven’t used it), please read on.  Adding Varnish to your stack can be completely noninvasive, but if you tweak your stack to play along with some of Varnish’s more advanced features, you’ll be able to increase performance by orders of magnitude.

Some of the high profile companies using Varnish include: Twitter, Facebook, Heroku and LinkedIn.

Our Use Case

One of Factual’s first high profile projects was Newsweek’s “America’s Best High Schools: The List”. After realizing that we had only a few weeks to increase our throughput tenfold, we looked into a few options. We decided to go with Varnish because it was noninvasive, extremely fast and battle-tested by other companies. The result was a system that performed 15 times faster and a successful launch that hit the front page of msn.com.  Varnish now plays a major role in our stack and we’re looking to implement more performance tweaks designed with Varnish in mind.

A Simple Use Case

The easiest and safest way to add Varnish to your stack is to serve and cache static content.  Aside from using a CDN, Varnish is probably the next best thing that you can use for free.  However, dynamic content is where you can squeeze real performance out of your stack if you know where and how to use it.  This guide will only scratch the surface on how Varnish can drastically improve performance.  Advanced features such as edge side includes and header manipulation allow you to leverage Varnish for even higher throughput.  Hopefully, we’ll get to more of these advanced features in future blog posts, but for now, we’ll just give you an introduction.

“Hello World”

Installation

Please follow the installation guide on Varnish’s documentation page. http://www.varnish-cache.org/docs

Assuming you’ve installed it correctly, you should be able to run both your webserver and Varnish on different ports. The rest of this guide will assume that you have your webserver running on port 8080 and Varnish running on port 80.

Varnish Configuration Language: VCL

Varnish uses its own domain specific language for configuration. Unlike a lot of other projects, Varnish’s configuration language is not declarative. It’s very expressive and yet easy to follow. On Ubuntu, Varnish’s config file is located at /etc/varnish/default.vcl. A lot of the examples we’ll dive into are based on Varnish’s own documentation.

This is a simple Varnish config file that will cache all requests whose URI begins with “/stylesheets”. There are a few things to note here that we’ll explain later:

  • the removal of the Accept-Encoding header
  • the removal of Set-Cookie
  • return(lookup) and return(pass) in vcl_recv
# Defining your webserver.
backend default {
  .host = "127.0.0.1";
  .port = "8080";
}
 
# Incoming request
# can return pass or lookup (or pipe, but not used often)
sub vcl_recv {
 
  # set default backend
  set req.backend = default;
 
  # remove Accept-Encoding so the webserver returns non-encoded content
  unset req.http.Accept-Encoding;
 
 
  # lookup stylesheets in the cache
  if (req.url ~ "^/stylesheets") {
    return(lookup);
  }
 
  return(pass);
}
 
# called after recv and before fetch
# allows for special hashing before cache is accessed
sub vcl_hash {
 
}
 
 
# called after the response has been fetched from the webserver
# returns pass or deliver
sub vcl_fetch {
  if (req.url ~ "^/stylesheets") {
    # removing cookie
    unset beresp.http.Set-Cookie;
 
    # Cache for 1 day
    set beresp.ttl = 1d;
    return(deliver);
  }
}
 
# called after fetch or lookup yields a hit
sub vcl_deliver {
 
}
 
# called when an error response is generated
sub vcl_error {
 
}

Now let’s look at a few things in detail:

Removing Accept-Encoding Header

We do this because Varnish doesn’t handle encodings (gzip, deflate, etc…). Instead, Varnish defers to the webservers to do this. For now, we’re going to strip this header and just have the webservers give us non-encoded content. The proper way to handle encodings is to have the encoding normalized, but we’ll discuss this later.

Removal of Set-Cookie

We do this because we don’t want the webserver giving us session-specific content, and we don’t want a cached object to hand one user’s cookie to everyone else. This is just a safeguard and is probably a little unnecessary, but it’s a good thing to keep in mind when caching. We’ll discuss session-specific content later.

Returning “pass” vs “lookup”

Returning “pass” tells Varnish to skip the cache lookup and fetch the response from the webserver. Returning “lookup” tells Varnish to look up the object in its cache in lieu of fetching it from the webserver. If the object is cached, the webserver is never hit. If it isn’t in the cache, Varnish fetches the content from the webserver and then calls vcl_fetch, where you decide whether and for how long to cache it.
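
One way to see which path a request took is to tag responses in vcl_deliver. This is a minimal sketch, assuming a Varnish 2.x release in which obj.hits is readable in vcl_deliver. The X-Cache header name is a convention picked for illustration, not something Varnish sets by itself:

```vcl
# Fills in the empty vcl_deliver from the config above.
sub vcl_deliver {
  if (obj.hits > 0) {
    # the object was served from the cache
    set resp.http.X-Cache = "HIT";
  } else {
    # the object came from the webserver (cache miss or pass)
    set resp.http.X-Cache = "MISS";
  }
}
```

With this in place, requesting the same stylesheet twice should show MISS on the first response and HIT on the second, while anything that is passed keeps reporting MISS.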

Manipulating the Hashing Function

User/Session Specific Content

Let’s say that we want to cache every user’s “/profile” page. This can be done by including the cookie in the hash function like this:

sub vcl_hash {
  if (req.url ~ "^/profile$") {
    set req.hash += req.http.cookie;
  }
}
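
The hash change only matters if “/profile” requests actually go through the cache, and in the “Hello World” config everything outside “/stylesheets” is passed. Here is a rough sketch of the missing pieces, assuming a 5 minute TTL (a made-up value) and that your webserver’s “/profile” response is safe to reuse for the same cookie:

```vcl
# Merge these conditions into your existing vcl_recv and vcl_fetch,
# ahead of the final return(pass) in vcl_recv.
sub vcl_recv {
  if (req.url ~ "^/profile$") {
    # go through the cache; vcl_hash adds the cookie to the key
    return(lookup);
  }
}

sub vcl_fetch {
  if (req.url ~ "^/profile$") {
    # hypothetical TTL; keep it short for per-user pages
    set beresp.ttl = 5m;
    return(deliver);
  }
}
```

Keep in mind that this stores one object per distinct cookie value, so the cache can grow quickly on a site with many logged-in users.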

Canonicalized Url Caching

In Ruby on Rails, it is common practice to append a trailing timestamp to static content URLs to ensure that the web browser doesn’t keep using a stale cached copy (e.g. /stylesheets/main.css?123232113). Let’s say we don’t want to include this when we cache our stylesheets. Here is an example that will remove the trailing timestamp.

sub vcl_hash {
  if (req.url ~ "^/stylesheets") {
    set req.url = regsub(req.url, "\?\d+", "");
  }
}

Browser Specific CSS

Varnish can also cache browser specific content.  One trick we use is to have a small portion of our css be browser specific to handle various differences between browsers.  We do this by having a dynamic call that will serve up css based on the User-Agent header.  The problem with this technique is that we’ll have different css being served by the same url.  Varnish can still cache this if we add the User-Agent header to the hash like so:

sub vcl_hash {
  if (req.url ~ "^/stylesheets/browser_specific.css") {
    set req.hash += req.http.User-Agent;
  }
}

ACLs

Varnish lets you create ACLs to restrict certain requests to specific client addresses:

# create ACL
acl admin {
  "localhost";
  "192.168.2.20";
}
 
sub vcl_recv {
  # protect admin urls from unauthorized ip's
  if (req.url ~ "^/admin") {
    if (client.ip ~ admin) {
      return(pass);
    } else {
      error 405 "Not allowed in admin area.";
    }
  }
}

Purging

There are times when we need to purge certain cached objects without restarting the server. Varnish allows two ways to purge: by lookup and by url. These examples are based on the Varnish documentation page on purging: http://www.varnish-cache.org/trac/wiki/VCLExamplePurging

Purge by lookup

Purging by lookup uses the vcl_hit and vcl_miss functions and the “PURGE” HTTP method:

acl purgeable {
  "localhost";
  "192.168.2.20";
}
 
sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purgeable) {
      error 405 "Not allowed to purge.";
    }
    # send the request to the cache so that vcl_hit or vcl_miss runs
    return(lookup);
  }
}
 
sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged.";
  }
}
 
sub vcl_miss {
  if (req.request == "PURGE") {
    # nothing is cached for this request, so there is no object to expire
    error 404 "Not in cache.";
  }
}

Purge by URL

Purging by url is probably a safer bet if you are using cookies or any other tricks in your hash function:

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purgeable) {
      error 405 "Not allowed.";
    }
    purge("req.url == " req.url " && req.http.host == " req.http.host);
    error 200 "Purged.";
  }
}

Handling Encodings

It’s good to canonicalize the Accept-Encoding header on incoming requests because otherwise you could either get redundant cached objects, or you could end up returning incorrectly encoded objects. For more details, please refer to the Varnish FAQ on Compression. Below is a snippet from that page.

if (req.http.Accept-Encoding) {
  if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
    # No point in compressing these
    remove req.http.Accept-Encoding;
  } elsif (req.http.Accept-Encoding ~ "gzip") {
    set req.http.Accept-Encoding = "gzip";
  } elsif (req.http.Accept-Encoding ~ "deflate" && req.http.user-agent !~ "Internet Explorer") {
    set req.http.Accept-Encoding = "deflate";
  } else {
    # unknown algorithm
    remove req.http.Accept-Encoding;
  }
}

Advanced Backends

Multiple Backends

Let’s pretend that we have a special assets server that serves up just our stylesheets. Here is an example of having multiple backends for this purpose:

backend default {
  .host = "127.0.0.1";
  .port = "8080";
}
 
backend stylesheets {
  .host = "10.0.0.10";
  .port = "80";
}
 
sub vcl_recv {
  if (req.url ~ "^/stylesheets") {
    # set stylesheets backend
    set req.backend = stylesheets;
    return(lookup);
  }
 
  # set default backend
  set req.backend = default;
  return(pass);
}

Round Robin and Random Multiple Server Backend

backend server1 {
  .host = "10.0.0.10";
}
 
backend server2 {
  .host = "10.0.0.11";
}
 
director multi_servers1 round-robin {
  {
    .backend = server1;
  }
  {
    .backend = server2;
  }
}
 
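# the random director expects a weight for each backend;
# equal weights split the traffic evenly
director multi_servers2 random {
  {
    .backend = server1;
    .weight = 1;
  }
  {
    .backend = server2;
    .weight = 1;
  }
}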
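
Defining a director doesn’t route any traffic by itself; a request still has to be assigned to it. A minimal sketch, assuming you want all requests to go through the round-robin director defined above:

```vcl
sub vcl_recv {
  # a director is used in place of a single backend
  set req.backend = multi_servers1;
}
```

The same assignment works with multi_servers2, and it can be combined with the url matching from the multiple backends example to send only some requests to a director.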

Varnish stays on our stack happily ever after…

When we first started using Varnish, it was out of desperation and all new to us.  Over the past year, we’ve been figuring out ways to leverage its performance in more creative ways.  At this point, we couldn’t imagine putting together a stack that didn’t include this great project.

We hope this post has been helpful for anyone interested in getting Varnish set up for the first time.

Reader Comments (6)

I think there's a boo-boo in the example code for the purge by lookup section:

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purgeable) {
      set obj.ttl = 0s;
      error 405 "Not allowed to purge.";
    }
  }
}

I don't believe the set obj.ttl = 0s; line should be there
I think that would purge the object even if the client was not authorised to do so.

February 28, 2011 | Unregistered Commenter Andrew

Also, obj is not available in vcl_recv. It should be taken into vcl_hit, as stated in http://www.varnish-cache.org/trac/wiki/VCLExamplePurging

March 1, 2011 | Unregistered Commenter nav

Is there any reverse proxy for windows stack? I know only squid as a powerful proxy

March 1, 2011 | Registered Commenter Antony Blazer

Great Varnish write-up. Just wanted to share a link for those interested in installing/using Varnish caching with discussion forum software - in this case vBulletin forum software.

http://www.vbulletin.com/forum/entry.php/2440-vB4Mance-Part-5-Expert-Level-Boosting-vBulletin-Performance-with-Advanced-Caching

March 1, 2011 | Unregistered Commenter Mike Anders

We tested Varnish and found it had three major flaws. First, it was a nightmare to configure. You basically had to learn a programming language to do what should be a simple configuration file. Second, it spins up way too many processes and threads; there is no pipelined architecture, which we felt made it inherently unscalable. The final, albeit minor, flaw is that if you try to distribute it away from the origin, the whole process breaks down.

March 2, 2011 | Unregistered Commenter Jim Allen

Hi,

Check http://logicsforyou.com/2013/07/23/profiling-and-benchmarking-drupal-with-xhprof-memcached-varnish/ for details learing.

August 8, 2013 | Unregistered Commenter Mohit
