Building nginx and Tarantool based services

Dmitriy Kalugin-Balashov

17 Feb 2016

Are you familiar with this architecture? A bunch of daemons dancing between a web server, a cache, and storage.

What are the drawbacks of such an architecture? Working with it, we run into a number of questions: which language(s) should we use? Which I/O framework should we choose? How do we keep the cache and the storage in sync? That is a lot of infrastructure issues, and why should we solve infrastructure issues when what we actually need is to solve a task? Sure, we can say that we like technologies X and Y and treat these drawbacks as ideological. But we can’t ignore the fact that the data sits some distance away from the code (see the picture above), which adds latency and can reduce RPS.

The main idea of this article is to describe an alternative built on nginx as the web server and load balancer, and Tarantool as the app server, cache, and storage.

Improving cache and storage

Tarantool has a number of interesting features. It isn’t just an efficient in-memory DB but also a fully functional app server: applications can be written in Lua (LuaJIT), C, or C++, so logic of any complexity can be implemented and your imagination is the only limit. If the amount of data exceeds the available memory, part of it can be stored on disk using Sophia. Sophia is optional, so if you need something else, you can keep the hot part of the data in memory and the cold part in another storage system. What are the benefits?

No “third parties”: the hot data sits on the same level as the code
Hot data is kept in memory
Lua applications are simple and easy to update
Safe and production-ready: Tarantool supports transactions, replication, and sharding

Improving web-server

The ultimate data consumer is your user. Usually the user receives data from the app server via nginx acting as a balancer/proxy. Creating a daemon that talks to both Tarantool and HTTP wouldn’t work, as it brings us back to the first picture where we started. So let’s look at the situation from a different angle and ask another question: “How do we get rid of the third party between the data and the user?” Our answer to this question was the Tarantool nginx upstream module.

About nginx upstream

An nginx upstream is a persistent connection, over a pipe or socket, between nginx and a backend; below we refer to this as “proxying”. Nginx offers a variety of features for defining upstream rules; the following ones are key for proxying HTTP to Tarantool:

Load balancing across many Tarantool instances via nginx upstream
The ability to define backup servers

Together, these make it possible to:

Distribute the load across Tarantool instances; for example, together with sharding you can build a cluster with an even load distribution between nodes
Create a fault-tolerant system with the help of Tarantool replication
Combine items 1 and 2 to get a fault-tolerant cluster

An example nginx configuration that partially illustrates the capabilities of Tarantool and nginx:

# Proxying settings for Tarantool
upstream tnt {
    server 127.0.0.1:10001;        # first server, on localhost
    server node.com:10001;         # second, somewhere else
    server unix:/tmp/tnt;          # third, via a unix socket
    server node.backup.com backup; # backup server
}

# HTTP server
server {
    listen 8081 default;
    location = /tnt/pass {
        # Tell nginx to use the Tarantool upstream module
        # and specify the upstream name
        tnt_pass tnt;
    }
}

More information on nginx upstream configuration can be found here: http://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream
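To make the load-balancing and backup rules above more concrete, here is a toy Python model of how requests could be spread over the `tnt` upstream: plain round-robin over the primary servers, skipping any that are down, with the backup server used only when every primary is unavailable. It is a simplification, not nginx’s actual algorithm: real nginx uses weighted round-robin and marks servers as unavailable based on the `max_fails` and `fail_timeout` parameters, while here server health is just a hand-set flag. The `Server` and `Upstream` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    address: str
    backup: bool = False
    up: bool = True  # health is a hand-set flag in this toy model


@dataclass
class Upstream:
    """Toy model: round-robin over primaries; backup only if all primaries are down."""

    servers: List[Server]
    cursor: int = 0  # round-robin position among primary servers

    def pick(self) -> Optional[Server]:
        primaries = [s for s in self.servers if not s.backup]
        # Try each primary at most once, starting at the round-robin cursor.
        for _ in range(len(primaries)):
            candidate = primaries[self.cursor % len(primaries)]
            self.cursor += 1
            if candidate.up:
                return candidate
        # Every primary is down: fall back to the first live backup server.
        for s in self.servers:
            if s.backup and s.up:
                return s
        return None  # nothing is available


if __name__ == "__main__":
    # Mirrors the servers in the `upstream tnt` block above.
    tnt = Upstream([
        Server("127.0.0.1:10001"),
        Server("node.com:10001"),
        Server("unix:/tmp/tnt"),
        Server("node.backup.com", backup=True),
    ])
    print([tnt.pick().address for _ in range(4)])
    # ['127.0.0.1:10001', 'node.com:10001', 'unix:/tmp/tnt', '127.0.0.1:10001']
    for s in tnt.servers[:3]:
        s.up = False
    print(tnt.pick().address)  # node.backup.com
```

The point of the model is the ordering rule: the backup never receives traffic while at least one primary is alive, which is what makes it a standby rather than a fourth balanced node.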

About nginx Tarantool upstream module

The main features

The module is activated in nginx.conf by the tnt_pass directive
Transforms HTTP+JSON into the Tarantool protocol
Non-blocking I/O in both directions
Supports all nginx and nginx upstream features
Allows you to invoke Tarantool stored procedures via a JSON-based protocol
Data is delivered via HTTP(S) POST, which is convenient for modern web apps (and not only for them)

Input data

[ { "method": STR, "params": [arg0 … argN], "id": UINT }, …N ]

“method”

The name of a stored procedure. It must match the procedure name in Tarantool. For example, to invoke the Lua function do_something(a, b), we need: "method": "do_something"

“params”

The arguments of the stored procedure. For example, to pass arguments to the Lua function do_something(a, b), we need: "params": [ { "field1": [ {"a": "b"} ] }, 2 ]

“id”

A numeric identifier set by the user
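A minimal Python sketch of building request bodies in the input format above, either a single call or several calls batched into one JSON array. The helper names `make_call` and `encode_batch` are hypothetical, and the checks (non-empty method name, non-negative integer id) only mirror the STR/UINT types in the schema; they are assumptions, not validation rules taken from the module.

```python
import json
from typing import Any, Dict, List, Sequence


def make_call(method: str, params: Sequence[Any], call_id: int) -> Dict[str, Any]:
    """Build one call object: {"method": STR, "params": [arg0 ... argN], "id": UINT}."""
    if not isinstance(method, str) or not method:
        raise ValueError("method must be the name of a stored procedure")
    # bool is a subclass of int, so exclude it explicitly; UINT means no negatives.
    if isinstance(call_id, bool) or not isinstance(call_id, int) or call_id < 0:
        raise ValueError("id must be an unsigned integer")
    return {"method": method, "params": list(params), "id": call_id}


def encode_batch(calls: List[Dict[str, Any]]) -> str:
    """Serialize several calls as one compact JSON array for a single POST body."""
    return json.dumps(calls, separators=(",", ":"))


if __name__ == "__main__":
    # do_something(a, b) with a = {"field1": [{"a": "b"}]} and b = 2
    body = encode_batch([
        make_call("do_something", [{"field1": [{"a": "b"}]}, 2], 1),
        make_call("echo", ["hi"], 2),
    ])
    print(body)
    # [{"method":"do_something","params":[{"field1":[{"a":"b"}]},2],"id":1},{"method":"echo","params":["hi"],"id":2}]
```

Giving each call in a batch a distinct id is what lets the client match replies to calls afterwards.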

Output data

[ { "result": JSON_RESULT_OBJECT, "id": UINT, "error": { "message": STR, "code": INT } }, …N ]

“result”

The data returned by the stored procedure. For example, if the Lua function do_something(a, b) executes return {1, 2}, then "result": [[1, 2]]

“id”

The numeric identifier set by the user in the request

“error”

If an error occurs, information about its cause is returned here
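On the client side, replies in the output format can be mapped back to their calls by "id". The sketch below assumes that a single call may be answered with one JSON object (as in the wget example later in this article) and a batch with an array; `decode_replies` and `TarantoolCallError` are hypothetical names, and turning the first "error" into an exception is just one possible design choice.

```python
import json
from typing import Any, Dict, Optional


class TarantoolCallError(Exception):
    """Raised when a reply carries an "error" object."""

    def __init__(self, call_id: Optional[int], message: str, code: int) -> None:
        super().__init__(f"call {call_id}: {message} (code {code})")
        self.call_id = call_id
        self.message = message
        self.code = code


def decode_replies(body: str) -> Dict[int, Any]:
    """Map each reply's "id" to its "result"; raise on the first "error"."""
    data = json.loads(body)
    # A single call may be answered with one object instead of an array.
    replies = data if isinstance(data, list) else [data]
    results: Dict[int, Any] = {}
    for reply in replies:
        error = reply.get("error")
        if error:  # absent, null, or empty means "no error"
            raise TarantoolCallError(reply.get("id"), error.get("message", ""), error.get("code", 0))
        results[reply["id"]] = reply.get("result")
    return results


if __name__ == "__main__":
    print(decode_replies('[{"id":1,"result":[[1,2]]},{"id":2,"result":[]}]'))
    # {1: [[1, 2]], 2: []}
    try:
        # A made-up error reply, for illustration only.
        decode_replies('{"id":3,"error":{"message":"example failure","code":1}}')
    except TarantoolCallError as exc:
        print(exc.call_id, exc.code)  # 3 1
```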

Let’s try it

Starting up nginx

$ git clone https://github.com/tarantool/nginx_upstream_module.git
$ cd nginx_upstream_module
$ git submodule update --init --recursive
$ git clone https://github.com/nginx/nginx.git
$ cd nginx && git checkout release-1.9.7 && cd -
$ make build-all-debug

“build-all-debug” builds a debug version; we use it here to keep nginx configuration to a minimum. For those who want to configure nginx from scratch, there is “build-all”.

$ cat test-root/conf/nginx.conf
http {
    # Adds one Tarantool server as a backend
    upstream echo {
        server 127.0.0.1:10001;
    }
    server {
        listen 8081 default;  # listens on *:8081
        server_name tnt_test;
        location = /echo {    # requests to *:8081/echo go to the 'echo' upstream
            tnt_pass echo;
        }
    }
}

$ ./nginx/obj/nginx # starting up nginx

Starting up Tarantool

Tarantool can be installed from packages or built from source.

-- hello-world.lua file
-- This is our stored procedure. It's fairly simple and doesn't use Tarantool as a DB.
-- All it does is return its first argument.
function echo(a)
    return {{a}}
end

box.cfg {
    listen = 10001; -- the port Tarantool listens on
}

box.schema.user.grant('guest', 'read,write,execute', 'universe') -- give access

If you installed Tarantool from packages, you can start it like this:

$ tarantool hello-world.lua # the argument is the name of the Lua script

Invoking the stored procedure

The echo stored procedure can be invoked from any HTTP client: all you need is an HTTP POST to 127.0.0.1:8081/echo with the following JSON in the body (see Input data), where "method" is the Tarantool procedure name and "params" holds the method's single argument:

{
    "method": "echo",
    "params": [
        {"Hello world": "!"}
    ],
    "id": 1
}

I’ll invoke this procedure with wget:

$ wget 127.0.0.1:8081/echo --post-data '{"method": "echo","params":[{"Hello world": "!"}],"id":1}'
$ cat echo
{"id":1,"result":[[{"Hello world":"!"}]]}
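The same call can be made from Python using only the standard library. This is a sketch, not an official client: `build_request` and `call` are hypothetical helpers, the URL assumes the `listen 8081` and `location = /echo` configuration above, and, like the wget command, it simply sends the JSON as the POST body.

```python
import json
import urllib.request
from typing import Any


def build_request(url: str, payload: Any) -> urllib.request.Request:
    """Build (without sending) a POST whose body is the JSON-encoded payload."""
    body = json.dumps(payload).encode("utf-8")
    # Passing data already makes this a POST; method="POST" makes it explicit.
    return urllib.request.Request(url, data=body, method="POST")


def call(url: str, payload: Any, timeout: float = 5.0) -> Any:
    """Send the request and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With nginx and Tarantool running as described, `call("http://127.0.0.1:8081/echo", {"method": "echo", "params": [{"Hello world": "!"}], "id": 1})` should return the same reply that wget saved to the `echo` file, already decoded into Python lists and dicts.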

Other examples:

Let’s sum it up

The pros of using Tarantool nginx upstream module

No “third parties”; as a rule, the code and the data are on the same level
Relatively simple configuration
Load balancing on Tarantool nodes
High performance, low latency
JSON-based protocol instead of a binary one: no need to hunt for a Tarantool driver, since JSON is supported everywhere
Tarantool sharding/replication plus nginx = a cluster solution, but that’s a topic for another article
The solution is used in production

The cons

JSON overhead instead of the more compact and faster MsgPack
It’s not a packaged solution: you need to configure it and think about how to deploy it

Plans

OpenResty and nginScript support
WebSocket and HTTP 2.0 support

The benchmark results (which are actually pretty cool) will be published in a separate article. Tarantool and the upstream module are open source and welcome new users. If you wish to try them out, use them, or share your ideas, go to GitHub or the Google group.