Scaling the World Cup - How Gambify runs a massive mobile betting app with a team of 2

This is a guest post by Elizabeth Osterloh and Tobias Wilke of cloudControl.

Startups face very different issues from big companies when they build software. Larger companies develop projects over much longer time frames and often have entire IT departments to support them in creating customized architecture. It’s an entirely different story when a startup has a good idea, it gets popular, and they need to scale fast.

This was the situation for Gambify, an app for organizing betting games released just in time for the soccer World Cup. The company was founded and is run in Germany by only two people. When they managed to get a few major endorsements (including Adidas and German national team star Thomas Müller), they had to prepare for a sudden deluge of users, as well as very specific peak times.

The Gambify App: Basic Architecture

The core of Gambify is a Symfony2-based PHP backend that serves data over a REST API to the frontend. The frontend is an Ember.js application for desktop browsers, wrapped with PhoneGap for the mobile app.
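
The backend's only job is to expose JSON endpoints that both the desktop Ember.js app and the PhoneGap-wrapped mobile app consume. As a rough idea of what such an endpoint looks like in Symfony2, here is a minimal sketch; the bundle, entity, and field names are hypothetical and not taken from Gambify's code:

```php
<?php
// Hypothetical Symfony2 controller: returns a match result as JSON for the
// Ember.js / PhoneGap clients. Bundle, entity and field names are illustrative.
namespace Gambify\ApiBundle\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\JsonResponse;

class ResultController extends Controller
{
    // e.g. GET /api/matches/{id}/result
    public function resultAction($id)
    {
        $match = $this->getDoctrine()
            ->getRepository('GambifyApiBundle:Match')
            ->find($id);

        if (!$match) {
            return new JsonResponse(array('error' => 'match not found'), 404);
        }

        return new JsonResponse(array(
            'id'        => $match->getId(),
            'homeGoals' => $match->getHomeGoals(),
            'awayGoals' => $match->getAwayGoals(),
        ));
    }
}
```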

The Gambify codebase is organized into bundles, which allows for different scenarios, e.g. a tournament scenario for the World Cup and, later, league scenarios for the German national league or other European leagues. For storage, Gambify uses a regular MySQL database, except for the result tables: there, the bets are aggregated in Redis.
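
The post does not spell out the Redis data model, but a sorted set is a natural fit for this kind of aggregation, so here is a minimal sketch using Predis; the key names and ranking size are assumptions for illustration only:

```php
<?php
// Hypothetical sketch: aggregate per-user points in a Redis sorted set that
// backs the result/ranking views. Key names are illustrative.
require 'vendor/autoload.php';

$redis = new Predis\Client(); // connection details come from the add-on credentials

// After a bet has been scored, add the points to the user's running total.
function addPoints(Predis\Client $redis, $tournamentId, $userId, $points)
{
    // ZINCRBY updates the total and keeps the set ordered by score.
    $redis->zincrby("ranking:$tournamentId", $points, $userId);
}

// Read the top 10 users for the ranking screen, highest score first.
function topTen(Predis\Client $redis, $tournamentId)
{
    return $redis->zrevrange("ranking:$tournamentId", 0, 9, array('withscores' => true));
}
```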

Main Challenges

Implementing a production-ready infrastructure with no dedicated team. Initially, the team started with a smaller, less advanced version of the app hosted on a dedicated server. They faced difficulties configuring and maintaining it, and decided to focus on developing the app itself instead. They needed a solution that was easier to maintain and scale.

Planning demand-based resource use, especially during peak times after matches. Users tend to log on to check the match results and their ranking immediately after games – at this point, the load can increase up to 10,000 requests a minute.

Maximizing app speed, especially when updating match results and rankings. Users expect minimal lag while using the app, and fast responses when accessing current results and rankings.

Integrating with Cloud Infrastructure

Gambify went looking for a Platform as a Service provider after configuring and maintaining a dedicated server proved too difficult for such a small team. They decided on the cloudControl PaaS.

Buildpacks: Gambify was written in PHP and originally ran on an Apache server for testing. The cloudControl platform uses the buildpack system, an open standard for preparing images for deployment that is becoming a de facto industry standard for interoperability between cloud platforms. The cloudControl PHP buildpack provides the same open source components as their original setup, so Gambify was able to plug their existing application into the cloudControl platform without making any major changes.

Containers: The cloudControl platform is based on containers built on LXC technology. Each container combines a stack image, a deployment image, and configuration. The stack image provides the underlying operating system and common libraries, the deployment image contains the ready-to-run application, and the configuration includes things like access credentials for databases and other third-party services.
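
As a concrete example of the configuration part: cloudControl makes add-on credentials available inside the container through a JSON file whose path is stored in the CRED_FILE environment variable. The add-on key and field names in the sketch below are assumptions that depend on which add-ons are actually installed:

```php
<?php
// Hedged sketch: read add-on credentials via the CRED_FILE convention and open
// a database connection. The MYSQLS key and field names are illustrative.
$credFile = getenv('CRED_FILE');
$creds    = $credFile ? json_decode(file_get_contents($credFile), true) : array();
$mysql    = isset($creds['MYSQLS']) ? $creds['MYSQLS'] : array();

$pdo = new PDO(
    sprintf(
        'mysql:host=%s;port=%s;dbname=%s',
        $mysql['MYSQLS_HOSTNAME'],
        $mysql['MYSQLS_PORT'],
        $mysql['MYSQLS_DATABASE']
    ),
    $mysql['MYSQLS_USERNAME'],
    $mysql['MYSQLS_PASSWORD']
);
```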

Containers can be scaled horizontally (across several instances) or vertically (by increasing the memory and processing power). Gambify was able to test their application on a single container and then scale up according to demand using cloudControl’s granular scaling feature.

Routing & request distribution: The cloudControl routing tier uses a cluster of reverse-proxy load balancers to accept user requests and forward them to the applications running on the platform. A smart DNS provides fast and reliable resolution of domain names in a round-robin fashion. The nodes are distributed evenly across three availability zones but can route requests to containers in any other availability zone. To keep latency low, the routing tier tries to route requests to containers in the same availability zone unless none are available there. All of this is handled automatically, so Gambify was able to outsource this aspect to cloudControl as a service provider.

Optimizing for Demand-Based Resource Use

Gambify monitors their performance with New Relic. This helps them identify patterns of user peaks before, during, and after games. They also watch Google Analytics in real time to see user numbers climbing before the request load actually increases.

The main part of Gambify’s optimization was done in advance, guided by load tests with Loader.io. This allowed Gambify to identify bottlenecks before the customer base grew too large for the app to handle the workload.

Gambify App: Original State

- 10 containers @ 512 MB

- Database: MySQLd “micro” Add-on

(Loader.io: 2000 clients in 60 seconds)

From an optimization perspective, Gambify mainly tuned database access: querying via indexed fields, moving work that is not time sensitive into asynchronous processing, and skipping some abstraction layers (the ORM) for certain requests. These optimizations improved the load times of individual requests.
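
To illustrate the last point, a hot read path can bypass Doctrine's entity hydration and run a plain SQL query over indexed columns through the DBAL connection. The table, column, and index names below are hypothetical:

```php
<?php
// Hypothetical sketch: skip the ORM for a hot read path and fetch only the
// indexed columns the ranking response needs. Table/column names are illustrative.
class RankingRepository
{
    private $conn;

    public function __construct(\Doctrine\DBAL\Connection $conn)
    {
        $this->conn = $conn;
    }

    public function findTopRankings($tournamentId, $limit = 50)
    {
        // Assumes a composite index on (tournament_id, points) so MySQL can
        // answer this without a full table scan or filesort.
        return $this->conn->fetchAll(
            'SELECT user_id, points
               FROM ranking
              WHERE tournament_id = ?
           ORDER BY points DESC
              LIMIT ' . (int) $limit,
            array($tournamentId)
        );
    }
}
```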

In order to handle the peak times, they use cloudControl’s granular scaling feature to scale up immediately after games, when people log on to check the results and their scores and the load can climb to 10,000 requests a minute.

During the day, Gambify runs on six containers (128 MB each) with one worker. Around matches, they scale up to 18 containers (1024 MB each) with eight workers to absorb the post-match rush. MySQL databases are challenging to scale, so they decided on one large RDS setup; for the future, they are considering migrating to a more easily scalable database.

Gambify App: Current State

- 18 containers @ 512 MB

- Database: MySQLd “medium” Add-on

(Loader.io: 2000 clients in 60 seconds)

Maximizing App Speed

Much of Gambify’s app speed optimization is accomplished through asynchronous job processing, in order to keep the main requests fast. For this purpose, Gambify uses several third-party add-on services that are integrated with the cloudControl platform. The job queue itself is processed via Redis.
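
The post does not name a specific queue library, so here is a minimal sketch of the general pattern with Predis: the web request only pushes a job onto a Redis list and returns, while a separate worker process blocks on the list and does the slow work. Key names and the job format are assumptions:

```php
<?php
// Hypothetical Redis-backed job queue: enqueue in the web request, process in
// a separate worker. Key names and job format are illustrative.
require 'vendor/autoload.php';

$redis = new Predis\Client();

// Web side: enqueue and return to the client immediately.
function enqueue(Predis\Client $redis, $type, array $payload)
{
    $redis->rpush('jobs', json_encode(array('type' => $type, 'payload' => $payload)));
}

// Worker side: block until a job arrives, then dispatch it.
function work(Predis\Client $redis)
{
    while (true) {
        $item = $redis->blpop(array('jobs'), 5); // wait up to 5 s for the next job
        if ($item === null) {
            continue;
        }
        $job = json_decode($item[1], true);
        handle($job); // search sync, image resize, bet scoring, ...
    }
}

function handle(array $job)
{
    // Placeholder dispatcher; the real handlers are the tasks described below.
    error_log('processing job of type ' . $job['type']);
}
```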

User Search: One of the asynchronously processed parts is the integration with external web services, for example for the user search function: users are synchronized with a search index (Searchly) that allows people to find their friends.
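
Searchly is hosted Elasticsearch, so the synchronization job presumably re-indexes a user document whenever a profile is created or changed. A sketch using the official elasticsearch-php client (1.x-style constructor); the index name, document fields, and the SEARCHLY_URL variable are assumptions:

```php
<?php
// Hypothetical sketch: keep the user search index in sync from a background job.
// Index/type/field names and the credentials variable are illustrative.
require 'vendor/autoload.php';

$client = new Elasticsearch\Client(array(
    'hosts' => array(getenv('SEARCHLY_URL')), // e.g. https://user:pass@host
));

// Called from the job queue after a profile is created or updated.
function syncUser(Elasticsearch\Client $client, array $user)
{
    $client->index(array(
        'index' => 'users',
        'type'  => 'user',
        'id'    => $user['id'],
        'body'  => array(
            'name'  => $user['name'],
            'email' => $user['email'],
        ),
    ));
}
```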

User Content: Amazon S3 is used for storing user content, e.g. profile pictures. Picture uploads are processed and resized asynchronously so that mobile clients don’t have to load a much larger original picture when only a thumbnail needs to be displayed.
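
A hedged sketch of what such a background job could look like with the AWS SDK for PHP (v2) and GD; the bucket name, key layout, thumbnail size, and credential variables are assumptions:

```php
<?php
// Hypothetical sketch: resize an uploaded profile picture in a background job
// and store the thumbnail on S3 so mobile clients never fetch the full original.
use Aws\S3\S3Client;

require 'vendor/autoload.php';

function storeThumbnail($localPath, $userId)
{
    // Downscale with GD to a 200px-wide JPEG thumbnail.
    list($width, $height) = getimagesize($localPath);
    $targetHeight = (int) (200 * $height / $width);

    $src   = imagecreatefromjpeg($localPath);
    $thumb = imagecreatetruecolor(200, $targetHeight);
    imagecopyresampled($thumb, $src, 0, 0, 0, 0, 200, $targetHeight, $width, $height);

    $thumbPath = sys_get_temp_dir() . "/thumb_$userId.jpg";
    imagejpeg($thumb, $thumbPath, 85);

    // Upload the thumbnail; bucket name and key layout are illustrative.
    $s3 = S3Client::factory(array(
        'key'    => getenv('AWS_ACCESS_KEY_ID'),
        'secret' => getenv('AWS_SECRET_ACCESS_KEY'),
        'region' => 'eu-west-1',
    ));

    $s3->putObject(array(
        'Bucket'      => 'gambify-user-content',
        'Key'         => "profiles/$userId/thumb.jpg",
        'SourceFile'  => $thumbPath,
        'ContentType' => 'image/jpeg',
    ));
}
```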

Bet Packaging: Gambify is able to process bets and post results extremely fast because all bets are grouped into packages of 1000 bets each, which are then processed from the job queue in Redis. Because the biggest workload is known to occur right after each match, several workers are started to process the results as fast as possible. These jobs load all bets, calculate the points for each individual bet, and then recalculate the respective scores in the ranking table.
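
A hedged sketch of this packaging idea: the 1000-bet chunking follows the description above, while the scoring rule, key names, and data layout are purely illustrative:

```php
<?php
// Hypothetical sketch of bet packaging: chunk bets into packages of 1000,
// queue them in Redis, and let several workers score them after the match.
require 'vendor/autoload.php';

$redis = new Predis\Client();

// After the final whistle, split all bet IDs for the match into packages of 1000.
function enqueueBetPackages(Predis\Client $redis, $matchId, array $betIds)
{
    foreach (array_chunk($betIds, 1000) as $package) {
        $redis->rpush('bet-packages', json_encode(array(
            'matchId' => $matchId,
            'betIds'  => $package,
        )));
    }
}

// Each worker pops packages until the queue is empty.
function processPackages(Predis\Client $redis, array $result)
{
    while ($item = $redis->lpop('bet-packages')) {
        $package = json_decode($item, true);
        foreach (loadBets($package['betIds']) as $bet) {
            $points = score($bet, $result);
            // Feed the points into the aggregated ranking (see the sorted-set sketch above).
            $redis->zincrby('ranking:worldcup', $points, $bet['userId']);
        }
    }
}

// Illustrative scoring rule: 3 points for the exact score, 1 for the right tendency.
function score(array $bet, array $result)
{
    if ($bet['home'] == $result['home'] && $bet['away'] == $result['away']) {
        return 3;
    }
    $betDiff    = $bet['home'] - $bet['away'];
    $resultDiff = $result['home'] - $result['away'];
    if (($betDiff > 0 && $resultDiff > 0) ||
        ($betDiff < 0 && $resultDiff < 0) ||
        ($betDiff == 0 && $resultDiff == 0)) {
        return 1;
    }
    return 0;
}

function loadBets(array $betIds)
{
    // Placeholder: in the real app this would fetch the bets from MySQL.
    return array();
}
```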

Solution Summary

Implementing a production-ready infrastructure with no dedicated team. The Gambify team decided to focus on developing the app itself and outsource infrastructure to cloudControl. Through cloudControl, resource allocation, routing, and request distribution were automated.

Planning demand-based resource use, especially during peak times after matches. By monitoring performance with New Relic, Gambify was able to identify peak times before, during, and after matches. They used cloudControl’s granular scaling feature to scale up during peak times directly after matches, and back down during the rest of the day.

Maximizing app speed, especially when updating match results and rankings. Gambify used several integrated third-party services to keep response times in their app as low as possible, especially by using Redis for asynchronous job processing.

At the End of the Match

This is what a real request peak looks like for Gambify after a Germany match – specifically, the game against Algeria on June 30th. Interesting fact: the peak for this game was slightly lower because, according to Gambify, when Germany wins, people are out celebrating instead of checking the results.

Goal!