
Serving 250M quotes/day at CNBC.com with aiCache

As traffic to cnbc.com continued to grow, we found ourselves in an all-too-familiar situation: it felt like a BIG change in how things were done was in order, and the status quo was a road to nowhere. The spending on HW, the space and power required to host additional servers, less-than-stellar response times, having to resort to frequent "micro"-caching and similar tricks to improve code performance - all of these were surfacing in plain sight and were hard to ignore.

While the code base could clearly be improved, limited Dev resources and the need to keep innovating to stay competitive always limit our ability to refactor. So how can one address performance and other needs without a full-blown effort across the entire team? For us, the answer was aiCache - a Web caching and application acceleration product (aicache.com).

The idea behind caching is simple: handle requests before they ever hit your regular Apache<->JK<->Java<->Database response-generation train (we're mostly a Java shop). Of course, it could just as well be Apache-PHP-Database or some other backend system, with byte-code and/or DB-result-set caching. In our case we have many more caching subsystems, aimed at speeding up access to stock and company-related information. Developing for such micro-caching, and maintaining systems with micro-caching sprinkled throughout, is not an easy task. Nor is troubleshooting them. But we digress...
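To make the basic idea concrete, here is a minimal Python sketch of a read-through cache sitting in front of a slow origin. It is not aiCache's implementation: `FrontCache`, `slow_origin` and the single fixed TTL are hypothetical names and policy chosen for illustration. Responses live in RAM keyed by URL, and the origin is called only on a miss or after the TTL has expired.

```python
import time
from typing import Callable, Dict, Tuple


class FrontCache:
    """Minimal in-RAM read-through cache placed in front of an origin.

    Illustrative sketch: `origin` stands in for the Apache<->JK<->Java<->Database
    chain, and the single fixed TTL is a hypothetical policy.
    """

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.origin = origin
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.store: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1  # fresh copy in RAM: the origin never sees this request
            return entry[1]
        self.misses += 1  # missing or expired: regenerate once and keep the result
        body = self.origin(url)
        self.store[url] = (now + self.ttl, body)
        return body


if __name__ == "__main__":
    calls = []

    def slow_origin(url: str) -> str:
        calls.append(url)  # count how often the expensive backend is hit
        return f"<html>{url}</html>"

    cache = FrontCache(slow_origin, ttl_seconds=30)
    for _ in range(1000):
        cache.get("/quotes/IBM")
    print(len(calls), cache.hits)  # 1 999
```

A real accelerator adds eviction, per-URL TTLs, request collapsing and much more; this only shows why a high hit ratio keeps most requests away from the backend.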

aiCache takes this basic idea of caching and front-ending the user traffic to your Web environment to a whole new level. I don't believe any of aiCache's features are revolutionary in nature; rather, it is the sheer number of features it offers that seems to address our every imaginable need.

We've also found that aiCache offers enormous performance headroom (see the stress-test numbers below), combined with incredible configuration flexibility and support for real-time reporting and alerting.

In the interest of space, here are some quick facts about our experience with the product, in no particular order:

· Runs on any Linux distro; our standard happens to be Red Hat 5, 64-bit, on HP DL360 G5 servers

· Responses are cached in RAM, not on disk. No disk IO, ever (well, outside of access and error logging, and even that is configurable). Cached responses add essentially no latency - stress tests show a TTFB of 0 ms. Resource utilization is extremely low - aiCache servers serving in excess of 2,000 req/sec are reported to be 99% idle! Not being the trusting type, I verified the vendor's claim and stress tested these servers to about 25,000 req/sec each, with load averages of about 2 (!).

· We cache both GET and POST results, with query and parameter busting (selectively removing those semi-random parameters that complicate caching)

· For user comments, we use response-driven expiration to refresh comment threads when a new comment is posted.

· Had a chance to use the site-fallback feature (where aiCache serves cached responses and shields origin servers from any traffic) to expedite service recovery

· Used origin-server tagging a few times to get us out of code-deployment-gone-bad situations.

· We average about an 80% cache-hit ratio across about 10 different sub-domains, with some as high as 97%. Having offloaded so much traffic from the origin server infrastructure, we have already downsized a number of production Web farms, and we see much lower resource utilization across Web, DB and other backend systems

· Keynote reports significant improvement in response times - about 30%.

· Everyone just loves the real-time traffic reporting; it is now a standard window on many a desktop. You get to see req/sec, response time, number of good/bad origin servers, client and origin server connections, input and output BW and so on - all reported per cached sub-domain. Any of these can be alerted on.

· We have wired up Nagios to read and chart some of aiCache's extensive statistics via SNMP; pretty much everything imaginable is available as an OID.

· Their CLI is something I like a lot too: you can see the inventory of responses, write out any response, expire responses, and report responses sorted by request, size, fill time, refreshes and so on, all in real time, with no log crunching required. Some commands are cluster-aware, so you execute them on one node and they are applied across the cluster.
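Parameter busting, from the list above, is easiest to picture as cache-key normalization. The sketch below assumes a hypothetical set of "semi-random" parameter names (`BUSTED_PARAMS`) and folds form-encoded POST bodies into the key so POST results can be cached by the same rule; it illustrates the idea, not how aiCache is configured.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of "semi-random" parameters (cache busters, tracking tags)
# that make otherwise identical requests look unique. Adjust per site.
BUSTED_PARAMS = {"_", "cb", "rnd", "utm_source", "utm_medium", "utm_campaign"}


def cache_key(method: str, url: str, body: str = "") -> str:
    """Build a normalized cache key: method + path + surviving parameters.

    For POST, form-encoded body parameters are folded in as if they were
    query parameters, so POST results can be cached under the same rules.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if method.upper() == "POST":
        params += parse_qsl(body, keep_blank_values=True)
    # Drop the busted parameters and sort the rest so that parameter order
    # does not produce distinct cache entries.
    kept = sorted((k, v) for k, v in params if k not in BUSTED_PARAMS)
    return f"{method.upper()} {parts.path}?{urlencode(kept)}"


print(cache_key("GET", "/quote?symbol=IBM&_=1259000000"))  # GET /quote?symbol=IBM
```

Sorting the surviving parameters assumes the application does not care about parameter order; if it does, keep the original order instead.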

Again, the list above is a small sample of the product features we use; there are many more that we use or are exploring. Their admin guide weighs in at 140 pages (!) - and it is all hard-core technical stuff that I happen to enjoy.

Some details about our network setup: we use F5 load balancers and have configured the virtual IPs to have both the aiCache servers and the origin servers enabled at the same time. Using F5's VIP priority feature, we direct all of the traffic to the aiCache servers as long as at least one is available, but we can fail all of the traffic over to the origin servers, either automatically or on demand.
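The VIP-priority behavior can be modeled as "use the highest-priority group that still has enough usable members". This is a simplified Python model, not F5 configuration: `Member`, `active_members` and the `enabled` flag (standing in for on-demand failover) are hypothetical, and F5's own priority-group setting is configured on the pool and has its own semantics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str
    priority: int          # higher value = preferred group
    healthy: bool = True   # result of health monitoring
    enabled: bool = True   # operator switch for on-demand failover


def active_members(pool: List[Member], min_available: int = 1) -> List[Member]:
    """Pick the highest-priority group with at least `min_available` usable members.

    Simplified model: only one group is active at a time; a lower group
    takes over only when the preferred group cannot meet the threshold.
    """
    for prio in sorted({m.priority for m in pool}, reverse=True):
        group = [m for m in pool if m.priority == prio and m.healthy and m.enabled]
        if len(group) >= min_available:
            return group
    return []


pool = [
    Member("aicache-1", priority=10),
    Member("aicache-2", priority=10),
    Member("origin-1", priority=5),
    Member("origin-2", priority=5),
]
print([m.name for m in active_members(pool)])  # ['aicache-1', 'aicache-2']
pool[0].healthy = False                         # one cache fails a health check
print([m.name for m in active_members(pool)])  # ['aicache-2']
pool[1].enabled = False                         # operator forces traffic to origin
print([m.name for m in active_members(pool)])  # ['origin-1', 'origin-2']
```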

We also use a well-known CDN to serve auxiliary content - JavaScript, CSS and imagery.

I stumbled upon the product via a Wikipedia link, requested a trial download and was up and running in no time. It probably helped that I have experience with other caching products, going back to circa 2000 with Novell ICS. But it all mostly boils down to knowing which URLs can be cached and for how long.
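Since so much comes down to which URLs are cacheable and for how long, it helps to keep those decisions in one ordered rule table. The patterns and TTLs below are made-up examples, not CNBC's or aiCache's rules; the sketch only shows a first-match-wins lookup with a conservative default of not caching.

```python
import re

# Hypothetical rules (first match wins): URL pattern -> TTL in seconds.
# A TTL of 0 means "never cache; always pass through to the origin".
TTL_RULES = [
    (re.compile(r"^/quotes/"), 5),          # fast-moving market data: cache briefly
    (re.compile(r"^/id/\d+"), 300),         # article pages: cache for minutes
    (re.compile(r"^/(login|account)"), 0),  # personalized pages: never cache
]
DEFAULT_TTL = 0  # unknown URLs are not cached until someone decides otherwise


def ttl_for(path: str) -> int:
    """Return the TTL of the first rule whose pattern matches `path`."""
    for pattern, ttl in TTL_RULES:
        if pattern.search(path):
            return ttl
    return DEFAULT_TTL
```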

And lastly - when you want to stress test aiCache, make sure to hit it directly, by the server's IP; otherwise you will most likely melt down one or more other network infrastructure components!

A bit about myself: I'm an EE major and have been working with Internet infrastructures since 1992 - from an ISP in Russia (UUCP over an MNP-5 2400b modem seemed blazing fast back then!) to designing and running the infrastructures of some of the busier sites for CNBC and NBC - cnbc.com, NBC's Olympics website and others.

Rashid Karimov, Platform, CNBC.com

Reader Comments (12)

How does it compare with other reverse proxies, such as Varnish?

November 29, 1990 | Unregistered CommenterAnonymous

We've looked at a number of them, including Varnish, Squid and Apache in reverse-proxy mode. aiCache won on feature set. It appears they wrote it to be a single-minded, hard-core accelerator, accommodating every imaginable feature one would ever want in a product like this.

A nice little detail: they call their product "right-threaded". One might dismiss that as a bunch of marketing bull, and one would be wrong.

The problem has to do with today's servers: most of them are multi-core. A single-threaded product would max out one CPU and stop there. To scale, you'd need to run another instance (on a different port, IP or both) - but then the instances cannot share the response cache.

Having a dedicated thread per client (request) won't scale, ever. Just my personal, somewhat educated opinion.

aiCache runs as a single 4-threaded process. This way you can utilize up to 4 cores, so you don't end up throwing money away, yet you don't have a thread per request. As I wrote, 25,000 req/sec with a load average of 2 (!) is quite remarkable.
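The argument above (a small, fixed number of threads sharing one response cache, rather than one process per core or one thread per request) can be sketched with Python's `ThreadPoolExecutor`. This shows the structure only: `SharedCache` and `serve` are hypothetical names, it says nothing about how aiCache is actually built, and CPython's GIL means these threads will not spread CPU-bound work across 4 cores.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

WORKERS = 4  # mirrors the "single 4-threaded process" described above


class SharedCache:
    """One response cache shared by every worker thread in the process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._store: Dict[str, str] = {}
        self.origin_calls = 0

    def get(self, url: str) -> str:
        with self._lock:
            cached = self._store.get(url)
            if cached is not None:
                return cached
            # Simplification: the "origin fetch" runs under the lock, so
            # concurrent misses are serialized and each URL is fetched once.
            self.origin_calls += 1
            body = f"rendered {url}"
            self._store[url] = body
            return body


def serve(urls: List[str], cache: SharedCache) -> List[str]:
    # A fixed pool of WORKERS threads handles any number of requests,
    # instead of creating one thread per request.
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        return list(pool.map(cache.get, urls))


if __name__ == "__main__":
    cache = SharedCache()
    requests = [f"/quotes/{i % 10}" for i in range(10_000)]
    responses = serve(requests, cache)
    print(len(responses), cache.origin_calls)  # 10000 10
```

Because every worker reads the same cache, 10,000 requests for 10 distinct URLs reach the "origin" only 10 times. Holding the lock during the origin fetch is a deliberate simplification; a real implementation would release it and collapse concurrent misses per URL instead.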

But I am sure other products might work for others, YMMV.

November 29, 1990 | Unregistered Commenterr11

What concerns do you have about picking a solution that is not as "standard"? It's surprising that your feature-set requirements are that large. Having worked extensively on caching, proxies and load balancers at 2 of the top 5 sites by Alexa ranking, I have never had a need to use anything besides Apache in reverse-proxy mode, NetScalers with caching, Akamai, and/or custom CDN deployments.

November 29, 1990 | Unregistered CommenterInterested

I have been known to be a bit of a trailblazer throughout my career. On a serious note, we eased our way into the product domain by domain, with careful testing. The setup is such that with a single CLI command I can bypass aiCache for any given domain, or for all of them. Never had to.

At this point we're at about 10 cached/accelerated domains, with more to be added, and haven't seen any issues. The site is an SOA-like, API-heavy architecture, and being able to see exactly what's going on while shaving tens to hundreds of milliseconds off API calls amounts to a very significant improvement in response times, user satisfaction, comprehension and troubleshooting abilities.

I don't want to recite their User Guide here, but so many of their features are just a godsend, simply not available in any other product.

Without disclosing too much, let me just say it is very affordable, compared to appliance-type offerings and is very much "unappliance", as you run it on your own HW under standard-issue OS. You can probably see that I am not a big fan of appliances :)

It doesn't necessarily compete with CDNs; rather, it is complementary. Nor does it completely displace web servers/containers - you'll still have your Apaches, IISes and Tomcats. But it does allow you to dramatically reduce the hosting footprint - I expect we shall slash ours by 50% or more, and we have already started.

After putting the product through some stress testing, I have a sneaking suspicion that just a pair of these could run pretty much any website out there, as long as it is the typical 95/5 site - 95% reads and 5% writes - so that you can cache the h*ll out of it. With the usual caveat of having enough BW and overall network capacity (which is where most larger sites would augment their capacity with a CDN).
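The offload argument here, and the 80-97% hit ratios quoted in the article, come down to simple arithmetic: the origin only sees the misses, so origin load = total load × (1 − hit ratio). The sketch treats the title's 250M quotes/day as 250M requests/day spread evenly over 24 hours, which is an assumption; real traffic peaks well above the daily average.

```python
def origin_load(requests_per_sec: float, hit_ratio: float) -> float:
    """Requests per second that still reach the origin after caching."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return requests_per_sec * (1.0 - hit_ratio)


avg_rps = 250_000_000 / 86_400  # assumed 250M requests/day -> ~2,894 req/sec average
print(round(origin_load(avg_rps, 0.80)))  # 579
print(round(origin_load(avg_rps, 0.97)))  # 87
```

Going from an 80% to a 97% hit ratio cuts origin traffic by more than a factor of six, which is why the highest-hit-ratio sub-domains are where Web farms can shrink the most.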

November 29, 1990 | Unregistered Commenterr11

It would be interesting to know which Java application server and what kind of front-end technology (Wicket?) you have in place, to better understand whether this technology would fit well in any Java shop's deployment.

November 29, 1990 | Unregistered CommenterMatteo

The product is completely agnostic to the back-end - ASP, PHP, Java or static pages. We run Java code, servlets and JSPs, on Tomcat - these can be classified as Servlet Containers, a subset of the functionality of a full-blown Application Server. The apps are largely MVC-like, mostly using the Spring framework.

Many people, including myself, have pretty strong opinions about EJBs - the original reasoning behind App Servers, but that discussion would take us off-topic.

I'd only say that a site that generates content using EJBs would most likely see an especially dramatic improvement after deploying aiCache.

November 29, 1990 | Unregistered Commenterr11

Badass. No frills, common-sense, empirically-verified, engineering. Sounds like you guys are running a good shop there, congrats.

November 29, 1990 | Unregistered CommenterIvan

Is aiCache HTTP in/HTTP out? What about SSL?

November 29, 1990 | Unregistered CommenterBernd Eckenfels

We normally terminate HTTPS on HW appliances - these then forward decrypted traffic to the aiCache and/or web servers. This way you move the burden of encryption from servers to appliances - these normally have HW (ASIC) support and can accomplish much higher session rates.

November 29, 1990 | Unregistered Commenterr11

Thanks for the straightforward engineering.

I would like to know if you have had experience with the mobile user-agent-based grouping or the pre-fetch for ad-serving. We are looking at this software for a customer and these two features were of interest, but I would like to see if they have been tried in a high-volume situation like the one you describe here. I did not see any significant client references on their site.

November 29, 1990 | Unregistered CommenterAnonymous

What I don't understand is why anyone would ever buy into a closed-source product such as aiCache. There is Varnish, which is open source AND comes with paid commercial support from Redpill-Linpro for anyone who needs it. Also, don't forget that Varnish runs on more platforms besides just Linux: FreeBSD, Solaris, AIX, etc. The one thing it does lack at the moment is HTTPS support, but you can always put nginx on port 443 and let it forward requests back to Varnish.

Concerning Java/Wicket compatibility: it actually works quite well. Wicket is a stateful framework that is REALLY good at being stateless. As long as your pages are stateless, Varnish can cache them.

February 19, 2010 | Unregistered Commentervictori

It seems clear from Rashid's comments that they evaluated Varnish and it did not have the feature set.
In my environment I would only choose an open source product if it was as good as the commercial version.
According to this, it was not.

I think there is a lot of difference between an application supported by the people who wrote it and a product supported by the people who did not. We have used varnish and Squid and in both instances found them to be lacking in many key areas. Based upon this write up I plan to give aiCache a try and see how it ranks.
Hopefully it is a little better thought out.

March 18, 2010 | Unregistered CommenterAurthur
