Monday, September 25, 2017

Adcash - 1 Trillion HTTP Requests Per Month

This is a guest post by Arnaud Granal, CTO at Adcash.

Adcash is a worldwide advertising platform. It belongs to a category called DSP (demand-side platform): a platform where anyone can buy traffic from many different ad networks.

The advertising ecosystem is very fragmented behind the two leaders (Google and Facebook) and DSPs help to solve this fragmentation problem.

If you want to run a campaign across 50 ad networks, you can imagine the hassle of doing it on each one (different targeting options, minimum spends, quality issues, etc.). What we do is consolidate the ad inventory of the internet in one place and expose it through a self-service unified interface.

We are a technology provider; whether you want to buy native advertisements, popups, or banners, the choice is yours. The platform is free to use; we take a percentage on success.

A platform like Adcash has to run on a very lean budget: you do not earn big money, you get micro-cents per transaction. It is not unusual to earn less than 0.0001 USD per impression.

Oh, and by the way, we have 100 ms to make a decision.

In Numbers

  • Over 1 trillion (1 000 000 000 000+) HTTP requests per month
  • 1 PB Hadoop cluster
  • 500,000 rq/second
  • 1500 servers

Our tech stack has evolved a lot over time; we try to keep our architecture modern. One internal joke is that our stack is not just cutting-edge but bleeding-edge. Eventually, what was bleeding-edge yesterday is now simply modern, or has disappeared from our stack (like Gearman).

When we started using Redis or even nginx, there was a clear lack of consensus around them. The same goes for databases like Druid or TokuDB (an alternative storage engine for MySQL), where we were early adopters.

If you haven’t heard of Druid before, it is comparable to OpenTSDB, just much more optimized for OLAP queries. It is reasonably popular in the adtech ecosystem (like kdb+ among HFT folks).

We Follow Two Main Precepts

  • Keep it simple, stupid
  • The Unix philosophy

This translates into different choices, but mainly:

  • Release early, release often
  • If it ain't broke, don't fix it
  • Each service has to do one thing, and do it well

What Adcash is Using Today

  • Languages: PHP7, Hack, Lua (core adserving), Scala, Java (core prediction), Go
  • Database (operations): MariaDB, Druid, Redis
  • Database (machine learning): Hadoop, Hbase, Spark MLLIB
  • Data pipe: Apache Kafka
  • Log collection: Elasticsearch, Kibana, Logstash
  • Charting: Grafana
  • Server: Openresty (nginx)
  • Hosting: Google Cloud, Bare metal provider
  • Javascript UI Library: Angular, React (new projects)
  • Productivity suite: Google Apps
  • Project management: Jira and Confluence
  • Transactional email: Amazon SES
  • Promotional email: Mailchimp
  • CRM: Zoho
  • Code collaboration and version control: Gitlab
  • Continuous integration and task scheduling: Jenkins
  • Communication: Slack, Skype
  • Server management: Ansible
  • Operating system: Ubuntu and Debian, CoreOS for Kubernetes deployments

We are very heavy users of Jenkins; we can fix pretty much anything from its UI.

As costs are an important part of our model, anyone in the company (including non-developers) can follow the real-time traffic and spending of the platform on TV screens.

It took almost 5 years to arrive at our current stack, and we are very happy with it.

This is How We Came to our Solution (including dark sides)

In 2007, at the very beginning, Adcash was running on a single Pentium IV with the following stack:
Apache 1.3, mod_php, MySQL 5

The concept was simple: give a link and a banner to the webmaster, record the number of clicks and the number of people who registered on the offer.

Pay the webmaster a fixed commission for each registration.

The Architecture was the Following

Backoffice/MySQL/Web01
                   |
               Visitor

Pretty simple, right? Everything was centralized on one machine, and the backoffice was just a directory (/admin) where you could run a few commands to validate or reject websites.

We didn’t plan for any scale; the important thing was to move fast and implement features to meet what our customers wanted. Building an MVP in its purest sense.

Adcash owned the machine (we got it for free! just paying for electricity) and rented space in a datacenter in Clichy (France).

This datacenter had a very unreliable power source, and we often had to go to Clichy to fix broken PSUs, but the onsite engineer was awesome.

This machine became a limitation, so we found a couple of Pentium IV machines abandoned by their owners, took the first one, and promoted it to MySQL + NFS server.

Single Point of Failure

What will prevent you from sleeping, having vacations, having a girlfriend, having a family, or living more than 5 meters away from a reliable internet connection?

Using NFS was a big mistake (we later replaced it with lsyncd/unison). Using a central MySQL server was a big mistake too; that being said, it allowed us to move fast.

Technical debt is a loan; you need to pay it back someday. Some debt is useful for growing, as it is powerful leverage, but you need to handle it carefully.

Making wrong choices is expensive (pick the default latin1 charset in MySQL, and years later you will hate yourself), and sometimes you actually benefit from them (do not over-engineer).

Note that Galera didn’t exist at this time, and MySQL replication was tough to get right. The state of the art was an active/passive setup using DRBD (hard-drive replication) with Heartbeat. Every single DDL statement triggered (!) an intense sweat (no matter how "online" the ALTER TABLE is supposed to be).

At this Stage We had the Following Setup

 

                           MySQL/NFS
                               |
Web01, Web02, Web03, Web04, Web05, Web06, Web07, Web08, Web09
     \______\______\______\____|____/______/______/______/
                               |
                            Visitor

 

In this architecture, it was very difficult to add new servers: we had to order them from Dell long in advance and send an engineer to the datacenter. He would take his car and go fix the server. If a hardware part was missing, well, too bad.

This was also an expensive operation since we had to pay for all hardware upfront.

We had no money for servers; money was used to pay for traffic. Whatever remained went into the servers; no salaries.

Pingdom was too expensive, so we ended up living with a web page that looked pretty much like this:

<?php
$clients = @file_get_contents('http://srv231.adcash.com:8080/check-load');
if ($clients >= 90)
{
?>
<script>
// Build a one-second 8-bit mono PCM WAV in memory and play it as an alarm.
// Format: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html
var WAVE_FORMAT_PCM = 1;
var sample_rate = 8000;

// One second of a 440 Hz square wave (8-bit unsigned samples).
var data = [];
for (var i = 0; i < sample_rate; i++)
  data.push(Math.floor(i * 2 * 440 / sample_rate) % 2 ? 0xff : 0x00);
var pad = (data.length % 2) ? [0x00] : [];  // data chunk must be word-aligned

// Encode an integer as a little-endian byte string of the given width.
function mk_le_bs(value, num_bytes) {
  var s = '';
  for (var i = 0; i < num_bytes; i++)
    s += String.fromCharCode((value >>> (8 * i)) & 0xff);
  return s;
}

function base64_encoder(bytes) {
  var bin = '';
  for (var i = 0; i < bytes.length; i++)
    bin += String.fromCharCode(bytes[i]);
  return btoa(bin);  // browser built-in base64
}

var s = 'RIFF' + mk_le_bs(4 + (8 + 16) + (8 + 1 * 1 * data.length + pad.length), 4) +
  'WAVE' + 'fmt ' + mk_le_bs(16, 4) +
  mk_le_bs(WAVE_FORMAT_PCM, 2) +
  mk_le_bs(1, 2) +                    // channels: mono
  mk_le_bs(sample_rate, 4) +
  mk_le_bs(sample_rate * 1 * 1, 4) +  // byte rate
  mk_le_bs(1 * 1, 2) +                // block align
  mk_le_bs(8 * 1, 2) +                // bits per sample
  'data' + mk_le_bs(1 * 1 * data.length, 4);

var bs = [];
for (var idx = 0; idx < s.length; idx++)
  bs.push(s.charCodeAt(idx));
bs = bs.concat(data);
if (pad.length > 0)
  bs.push(0x00);

var audio = new Audio('data:audio/wav;base64,' + base64_encoder(bs));
audio.play();
</script>
<?php } ?>

 

You had to keep this page open all the time, anywhere: on your phone, on your computer, on your tablet.
Every time a server crashed, the JavaScript on the page triggered a very annoying and loud sound.

Nothing better than a perfect A at 440Hz to give you nightmares.

We needed more scale. AWS was prohibitively expensive, so we decided to try a very popular bare metal hosting provider, because we needed to scale up our infrastructure to handle big waves of incoming traffic.

Despite the good reputation of this hosting company (probably one of the biggest providers in Europe), we discovered that it provided servers with extremely poor networking quality.

The Infrastructure Was the Following


        Datacenter A                                        Datacenter B

                          mysql replication
        MySQL/NFS <----------------------------------------> MySQL/NFS
            |                                                    |
Web01, Web02, Web03, Web04, Web05,                    Web01, Web02, Web03
Web06, Web07, Web08, Web09                                       |
            \____________________________________________________/
                                    |
                                 Visitor
          (routed to datacenter A or datacenter B based on round-robin DNS)




As you can guess, at this stage we had everything running on “Web” servers (generic application servers). This meant Apache, PHP, MySQL slaves, Memcache, some basic machine learning, and anything else that was needed.

So We Expanded Again


Nice, until 23 October 2012. Remember I mentioned we were using DRBD (Distributed Replicated Block Device) for databases? All data disappeared. Several dozen terabytes.

The hosting provider made a mistake (damn serial cable!), triggered a double failover, and accidentally corrupted the database disks.

We were compensated a lot of money, which was not that bad, considering that we managed to restore everything from backups and that there was barely any impact for customers.

We had only a few physical machines (for cost reasons), but we needed to start splitting the different services. We went with OpenVZ, and later LXC. It worked, but it was terrible. Docker didn’t exist yet.

At least we no longer had to deal with failing hardware, and we could finally run Hadoop and PHP on the same machine with decent resource isolation.

Still, we did not have enough ports on the physical switches and had a pending order of 40 nodes, so we had to decommission and reuse one of the switches. After that, it became clear that we didn’t have room to grow, and it was time to move on.

We Rewrote the Full Architecture Following a New Logical Schema



We finally moved to Google Cloud, totally abstracting away any hardware concerns, and we now have perfect service isolation.

It works, and a very important point for us is that all systems are auto-scaled. Whether there are 100,000 requests per second or 500,000, we barely see the difference (except on the bill).

All RPC/IPC is happening via our unified messaging system, based on Redis.

RPC communication is clear-text; we stay away from binary serialization protocols whenever possible, as we favor debuggability.

Fortunately, the code base at Adcash has very few interdependencies. One service does one thing.

Similar to what happened at Gitlab, one developer accidentally dropped the main production application. Was it avoidable? Probably. Was it an issue? Not really.

The important thing is to set up the right environment so you can recover when a bad event happens. Having good deployment tooling is key in this situation. We use Combo (the name comes from Street Fighter), an in-house equivalent to Terraform.

Lessons Learned

  • If it can fail, it will fail; if it’s not a datacenter flooding, an electrical failure, a hurricane, or a farmer cutting your fiber optic line, something bad will happen.

  • The result really matters. If you have spent 20 hours engineering a solution and then discover that a simple 5-minute trick works better, take it.

  • Use active-active replication; active-passive requires more attention and will likely not work when needed.

  • Keep your services independent.

  • Do not test whether backup works; test whether restore works.

Because we are so used to our own solutions, outside perspectives are valuable: if you are keen on sharing your experience and (open-source) solutions, feel free to have a chat with our team.

Overall, we are very happy with the current platform and have many ideas to improve it. If you want to join us on the adventure, check our open positions. We usually have a few machine learning challenges as well.

Reader Comments (5)

Despite the good reputation of this hosting company (probably one of the biggest provider in Europe), we have discovered that the company provided servers with extremely poor networking quality.

Is this OVH ?

September 26, 2017 | Unregistered Commenterddorian43

I think it is OVH indeed. If it is, to me it never had a good reputation except the one of being cheap.

On the article: `When we started using Redis or even nginx, there was clear lack of consensus`: still no nginx/redis on the 2012 stack, and nginx was already praised at this time, the sentence looks weird to me

BTW handling 500.000 requests/sec on 1500 servers for 100ms requests, does it mean that there is ~40 concurrent requests / host ?

I'm interested about the GCE/Redis part:

How much message/sec are going through the Redis cluster ? Does GCE allow Redis auto-scaling easily ?

September 26, 2017 | Unregistered CommenterKoren

Let's say it like this: OVH offers excellent servers for their price.


I remember writing some code for nginx in 2011:
http://nginx.org/en/CHANGES so probably we adopted nginx much earlier yep :)


To answer your question about GCE/Redis:

We don't use Redis cluster; what we are doing with Redis is the following:

A typical flow is in a task/worker mode (a bit like Sidekiq).

In this setup, all Redis instances are independent shards.

Example:
- nginx receives a new ad request to process (via http)
- nginx pushes the request into a redis (randomly selected).
- a worker pulls this request and pushes the request into hhvm (via fastcgi)

On the ad bidder, we have a Java program that connects to the Redis queues and asks in a loop:
"hey, do you have a new task of type BID?"

This Java software speaks both the Redis protocol and FastCGI, so it is able to bridge the two.

There is a big advantage to this technique: in case there is too much traffic, we LTRIM the queue and requests don't pile up.

In reality, we don't do this just for nginx->FastCGI but everywhere there is a need to remotely call methods between services.


The second type of flow is when we use Redis as a bridge to native C libraries.

When we need to call native C code from different places (ex: from HHVM, and also from Java), we code a custom command in Redis.
We thought about using SWIG instead, that would have been cool but from what I know SWIG doesn't support HHVM yet.


If you are looking to use Redis for storage:

There are many popular K/V stores that do the job, and if you are looking for an auto-scalable solution on GCP, then Google Datastore (Bigtable) may be a better fit.

About the number of servers, these are the raw figures:

https://ibin.co/3blaAxLR1oe3.png

so your estimate should be relatively accurate.

September 26, 2017 | Unregistered CommenterArnaud GRANAL

“Using NFS was a big mistake (we later replaced it with lsyncd/unison).”

I'm curious if you'd be willing to go into more detail about this statement? What problems and impact lead to this conclusion?

October 2, 2017 | Unregistered CommenterStaringDownTheBarrelOfNFS

NFS stalls were pretty difficult to debug.
We were serving production files from NFS so any lag coming from the filer could have very unpleasant consequences.

https://unix.stackexchange.com/questions/267138/preventing-broken-nfs-connection-from-freezing-the-client-system is an example.

You can't really afford to leave intermittent issues on production, even for the sake of troubleshooting them.

At some point, you just put in crontab:
* * * * * flock -n /var/run/sync.lock rsync -avz --progress /folder user@dest:/
and your clients are happy, your team is happy, and you can go back to sleep.

A slightly more advanced version of this strategy is to resynchronise files when they change ( https://linux.die.net/man/7/inotify ).

I'm personally a big fan of GlusterFS, but of course, depending on your requirements, the network conditions and your own comfort zone, NFS, Ceph or Hadoop may do the job perfectly.

October 2, 2017 | Registered CommenterArnaud GRANAL
