Monday, November 26, 2012

BigData using Erlang, C and Lisp to Fight the Tsunami of Mobile Data

This is a guest post by Jon Vlachogiannis. Jon is the founder and CTO of BugSense.

BugSense is an error-reporting and quality-metrics service that tracks thousands of apps every day. When mobile apps crash, BugSense helps developers pinpoint and fix the problem. The startup delivers first-class service to its customers, which include VMware, Samsung, Skype and thousands of independent app developers. Tracking more than 200M devices requires fast, fault-tolerant and cheap infrastructure.

Over the last six months, we’ve been using our big data infrastructure to provide users with metrics about their apps’ performance and stability, and to let them know how errors affect their user base and revenue.

We knew that our solution had to be scalable from day one, because more than 4% of the smartphones out there would start DDoSing us with data.

We wanted to be able to:

  • Abstract the application logic and feed browsers with JSON
  • Run complex algorithms on the fly
  • Experiment with data without needing a dedicated Hadoop cluster
  • Pre-process data and then store it (cutting down storage)
  • Handle more than 1,000 concurrent requests on every node
  • Make “joins” across more than 125M rows per app
  • Do all this without spending a fortune on servers

The solution uses:

  • Less than 20 large instances running on Azure
  • An in-memory database
  • A full-blown custom Lisp, written in C, to implement queries, which is many times faster than keeping a VM (with a garbage collector) online all the time
  • Erlang for communication between nodes
  • Modified TCP_TIMEWAIT_LEN for an astonishing drop of 40K connections, saving on CPU, memory and TCP buffers

 

Long Live In-Memory Databases

We knew that the only way we could handle all of this traffic was by using an In-Memory database.

To answer ad-hoc questions fast on a huge dataset (e.g., “How many unique users with Samsung devices have had this specific error for more than a week?”), you have to deal not only with memory limitations but also with data serialization and deserialization before and after processing. That’s the reason we started the LDB project.

LDB Project

Would you believe that you can feed data from various sources (even thousands of different sources, like mobile devices) into a system, describe what information you want to extract in a few lines of code, and then have all that information at your fingertips? In real time. While the system keeps running?

LDB is more of an application server than just a database. And even though it is in-memory, data is actually stored on the hard drive and replicated across other nodes.

With LDB we don’t run queries. We run algorithms, because we have a full-blown custom Lisp, written in C, that has access to the same address space as the database. That means you can search through data extremely fast, increase counters, do get/put operations, and so on.

 
The advantage of having a Lisp is that you can easily build an SQL-like query language on top of it (much as Hive does) and query your data in real time.
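As a purely hypothetical sketch of what such a query could look like (the actual LQL syntax is not shown in the post, so every name below is invented):

```lisp
;; Hypothetical sketch -- invented names, not actual LQL syntax.
;; "How many unique users with Samsung devices hit this error in the last week?"
(select (count-distinct user-id)
        (where (and (= manufacturer "Samsung")
                    (= error-id 4211)              ; hypothetical error id
                    (within timestamp (days-ago 7)))))
```

Because the language runs in the same address space as the data, an expression like this can walk the counters directly instead of shipping rows over a wire.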

LDB works like this:

Every app has its own LDB. That means its own memory space. This way, we can easily move bigger apps (in terms of traffic) to different machines.

 

When a request comes in from a mobile device, the main LDB node accepts the connection (using a pool of Erlang processes) and forwards the data to a specific DB. This request-handling mechanism is implemented in fewer than 20 lines of Erlang code, which is another reason we chose Erlang for communication between nodes.

When the request is “streamed” to LDB, a file called “process.lql” is responsible for analyzing and tokenizing the data and updating any counters. All of this is done on the fly, for every request.



We are able to do this because starting our Lisp VM and doing all of this processing on every request is still many times faster than keeping a VM (with a garbage collector) online all the time.

With LDB we can create time series and aggregated data with no more than 3 lines of code. For example, a 7-day time series of unique users takes just a few lines.
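A hypothetical sketch of such a definition (again, the real LQL syntax is not public, so these names are invented):

```lisp
;; Hypothetical sketch -- invented names, not actual LQL.
;; A 7-day time series of unique users, one bucket per day.
(timeseries unique-users
            (unique user-id)
            (interval day)
            (span 7))
```

Each incoming request would update the current day’s bucket in place, so the series is always current without a separate aggregation pass.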


Alternatives 

During our tests, we saw that SQL databases weren’t a good fit: our data was unstructured and we needed a lot of complex “joins” (and many indexes). With NoSQL databases, on the other hand, we couldn’t run our algorithms on the data (while the system was running), and having mappers/reducers made the whole thing complicated and slow. We needed a highly concurrent system with no big locks or DB locks, one that could track millions of unique events in just a few KBs and be very easy to extend.

A very good alternative was a stream-processing system (like Storm). Our main issues there were the number of moving parts and the performance of single nodes. With LDB, we have the advantage of being able to process data extremely fast (it resides in the same memory space), store it as aggregated counters or symbols (thus fitting gigabytes of data into KBs), and then use a DSL to do whatever correlations we want on the fly. No serialization/deserialization, no network calls and no garbage collectors. It is like having assembly code mapped onto your data.

 
On top of that, LDB gives us receivers that can scale to handle incoming data, a stream component where everything is defined in a couple of lines of code, a storage engine and a replication engine.

Optimizing the Kernel: UDP Behavior with TCP

What is unique about analytics, in contrast to other services that handle massive numbers of requests per second, is that the dialog between the mobile device and the server is extremely small (3 TCP handshake packets, 1 payload packet and 3 TCP termination packets).

 
However, TCP was not designed with small dialogs like that in mind, and it implements a state called TIME_WAIT (which lasts about 1 minute in 2.6 Linux kernels): after the last FIN packet is sent, the TCP state for that specific connection tuple remains open for some time in order to receive any stray packets that may have been delayed before the connection closed. In our case this is largely useless (we want something resembling UDP behavior but with TCP guarantees), since the payload is only 1 packet (and up to 4 or 5 for view requests), so we decided to modify the kernel source and reduce the constant down to 20 seconds. The result was an astonishing drop of 40K connections, saving CPU, memory and TCP buffers.

The patch we applied was in linux-kernel-source/include/net/tcp.h, changing

#define TCP_TIMEWAIT_LEN (60*HZ)

to

#define TCP_TIMEWAIT_LEN (20*HZ)


Using this architecture, we can provide real-time analytics and insights about mobile apps to all of our paying customers with fewer than 20 large instances running on Azure, including fallback and backup servers.


Reader Comments (10)

Since you don't have a garbage collector how do you reclaim memory in your LISP implementation? Thanks.

November 26, 2012 | Unregistered Commenter wwh

Nice post! I especially liked the TCP trick. Could you share how many events and how much data you're ingressing on a daily basis? Also, what is the inmemory data set size that you're serving the queries from? I'm especially interested in the join sizes you're performing.

November 26, 2012 | Unregistered Commenter Dan

Oh dear. What's wrong with tcp_tw_reuse or tcp_tw_recycle?

November 27, 2012 | Unregistered Commenter Jubal

Hi Jon!

I enjoyed the reading a lot! I am an experienced Erlang programmer and I would love to read more about practical uses of Lisp. If you have a link to any good intro article about Lisp, that would be great!

Thank you very much for the great article!

November 27, 2012 | Unregistered Commenter Max

The article is so full of unusual statements and design decisions that I suspect BugSense is controlled by aliens :)

What is wrong with: echo "20" > /proc/sys/net/ipv4/tcp_fin_timeout? Did not work?

Starting a Lisp VM on every request is faster than having a VM online all the time? OK, I got it: the Lisp VM does not do GC well. But there is the JVM for that, and Clojure (which is a Lisp dialect, btw).

A combination of Lisp + Erlang? Somebody knows Lisp and Erlang very well :)

November 27, 2012 | Unregistered Commenter Vladimir Rodionov

Hmmm -- rather than hacking TCP Time Wait on the server, couldn't you just have the client close the TCP connection before the server does? This would put the burden of doing Time Wait on the client.

I guess your protocol is pretty widely deployed by now, so it's a little late to change it?

November 27, 2012 | Unregistered Commenter K

I am willing to look the other way on this, but I still fail to see how, after coming up with such a beautifully efficient architecture, you elected to go with Windows Azure, which is the worst such offering (IaaS/PaaS).
It keeps bugging me: this seems to introduce all sorts of performance and financial bottlenecks, unless you somehow need it to support Windows Phone devices (highly unlikely). So I conclude that you got offered a free deal or something. Please help me understand.

December 2, 2012 | Unregistered Commenter Dennis M S

For people suggesting tcp_tw_reuse:

http://serverfault.com/questions/303652/time-wait-connections-not-being-cleaned-up-after-timeout-period-expires


tcp_fin_timeout sets the time a socket will spend in the FIN-WAIT-2 state. This parameter does not affect the time spent in the TIME_WAIT state. Try netstat -o and observe the timer values of sockets in the TIME_WAIT state (with your choice of tcp_fin_timeout values).

December 4, 2012 | Unregistered Commenter Sumit R

@Sumit R: your point being…? Did you read the whole serverfault thread, specifically the answer to the question about tcp_tw_reuse? Did you actually ever use tcp_tw_reuse=1?

December 5, 2012 | Unregistered Commenter Jubal

The memory handling for the Lisp interpreter is done by us. We use generation containers and reclaim the allocated memory upon each “short” script completion for “young” entries. “Stubborn” entries, due to the system architecture/design, do not need to be collected. (Generally, each of the 3 generations we have has its own collection strategy.) So there is no need for a full-fledged garbage collector.

We don't use the JVM or any other managed VM solution (Clojure, F#, etc.), because we only want a subset of the features each VM (and each language that lives on that VM) offers, and there's plenty of room to optimize for our own “special” needs. In addition, Scheme (a Lisp) offers everything you need to build expressive programs that draw on abstraction and the forms/patterns of the functional paradigm in general.

December 18, 2012 | Unregistered Commenter dgk
