Scaling PostgreSQL using CUDA
Combining GPU power with PostgreSQL
PostgreSQL is one of the world's leading open source databases, and it offers enormous flexibility and extensibility. One of its key features is that users can define their own procedures and functions in a wide range of programming languages, including C. With functions, almost any server-side logic can be implemented easily.
None of this extensibility is new, so what does it have to do with scaling? Well, imagine a world where the data in your database and enormous computing power are tightly integrated. Imagine a world where data inside your database has direct access to hundreds of FPUs. Welcome to the world of CUDA, NVIDIA's way of making the power of graphics cards available to general-purpose, high-performance applications.
When it comes to complex computations, the database can easily become a bottleneck. Depending on your application, adding more CPU power may not improve the overall performance of your system, simply because moving data from the database to the units that actually do the computation is far too slow (for example, because of remote calls). Especially when data flows over a network, copying large amounts of it is limited by network latency or bandwidth. What if this bottleneck could be avoided?
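To put rough numbers on that bottleneck, here is a minimal sketch in plain C of a back-of-the-envelope transfer estimate. The function name transfer_seconds and the cost model (a fixed latency per round trip plus bytes divided by bandwidth) are assumptions made for illustration only; they are not taken from any measurement.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: estimated seconds needed to
 * move `bytes` of data over a link. The model is deliberately simple:
 * every round trip pays the latency once, and the payload moves at
 * the given bandwidth (bytes per second). */
double transfer_seconds(double bytes, double round_trips,
                        double latency_s, double bandwidth_bytes_per_s)
{
    return round_trips * latency_s + bytes / bandwidth_bytes_per_s;
}
```

Under this model, moving 800 MB (100 million double values) over a link that delivers about 125 MB per second takes roughly 6.4 seconds before any computation starts, no matter how many CPUs sit on the receiving end.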
CUDA is C / C++
Basically, a CUDA program is simply a C program with some small extensions. The CUDA compiler, nvcc, separates the GPU code from the host code and hands the host part to a regular C/C++ compiler, so the result can be linked with existing code. This also means that CUDA code can be used inside a PostgreSQL stored procedure without much effort. The advantages of this mechanism are obvious:
GPUs can perform highly parallel matrix and floating-point operations many times faster than a CPU
the GPU sits inside the database server, so no data has to be transported over slow network lines
basically any CUDA-capable NVIDIA graphics card can be used
you get enormous computing power at comparatively low cost
you can even build functional indexes on top of CUDA stored procedures (provided the function is declared IMMUTABLE)
fewer machines are needed because a single machine is much faster
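To make the idea of "C with small extensions" a bit more concrete, here is a plain C reference loop of the kind that maps well onto a GPU. This is only a sketch: the function name saxpy_reference is chosen for illustration, and the comments describe how the same body would typically look as a CUDA kernel rather than showing code from this article.

```c
#include <stddef.h>

/* Plain C reference: y[i] = a * x[i] + y[i] for every element.
 * Each iteration is independent of all others, which is what makes
 * this loop a good candidate for a GPU. */
void saxpy_reference(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
    {
        /* In a CUDA kernel (a function marked __global__) the loop
         * disappears: each thread computes its own i from blockIdx,
         * blockDim and threadIdx and handles exactly one element. */
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body stays ordinary C; what CUDA adds is the kernel qualifier, the built-in thread indices and the launch syntax.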
How to make it work?
So how does this all work in practice? The goal of this simple example is to generate a set of random numbers on the CPU, copy it to the GPU, and make the code callable from PostgreSQL.
Here is the function to generate random numbers and to copy them to the GPU:
/* implement random generator and copy to CUDA */
nn_precision*
generate_random_numbers(int number_of_values)
{
    nn_precision *cuda_float_p;

    /* allocate host memory and CUDA memory */
    nn_precision *host_p = (nn_precision *) pg_palloc(sizeof(nn_precision) * number_of_values);
    CUDATOOLS_SAFE_CALL( cudaMalloc( (void**) &cuda_float_p,
        sizeof(nn_precision) * number_of_values));

    /* create random numbers */
    for (int i = 0; i < number_of_values; i++)
    {
        host_p[i] = (nn_precision) drand48();
    }

    /* copy data to CUDA and return pointer to CUDA structure */
    CUDATOOLS_SAFE_CALL( cudaMemcpy(cuda_float_p, host_p,
        sizeof(nn_precision) * number_of_values, cudaMemcpyHostToDevice) );

    return cuda_float_p;
}
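The function above multiplies sizeof(nn_precision) by a signed int that ultimately comes from the caller. A small guard in front of the allocations can reject negative counts and overflowing sizes. The following is a minimal sketch in plain C; checked_buffer_bytes is a hypothetical helper that is not part of the original code or of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the original code: compute
 * count * elem_size for a buffer. Returns 0 and stores the result in
 * *out_bytes on success; returns -1 if count is not positive, if
 * elem_size is zero, or if the product would overflow size_t. */
int checked_buffer_bytes(int count, size_t elem_size, size_t *out_bytes)
{
    if (count <= 0 || elem_size == 0)
        return -1;
    if ((size_t) count > SIZE_MAX / elem_size)
        return -1;

    *out_bytes = (size_t) count * elem_size;
    return 0;
}
```

A function like generate_random_numbers could call such a helper once and then reuse the resulting byte count for the host allocation, cudaMalloc and cudaMemcpy.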
Now we can call this function from a PostgreSQL stored procedure:
/* import postgres internal stuff */
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "utils/memutils.h"
#include "utils/elog.h"
#include "cuda_tools.h"

PG_MODULE_MAGIC;

/* prototypes to silence compiler */
extern Datum test_random(PG_FUNCTION_ARGS);

/* define function to allocate N random values (0 - 1.0) and put them onto the CUDA device */
PG_FUNCTION_INFO_V1(test_random);

Datum
test_random(PG_FUNCTION_ARGS)
{
    int number = PG_GETARG_INT32(0);

    /* added safety check: the count comes straight from SQL and may be zero or negative */
    if (number <= 0)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("number of values must be positive")));

    /* fill the device with random numbers, then release the device memory again */
    nn_precision *p = generate_random_numbers(number);
    cuda_free_array(p);

    PG_RETURN_VOID();
}
This code can now be compiled just like any other PostgreSQL C extension. Once the function has been declared in the database with CREATE FUNCTION (using LANGUAGE C), test_random can be called like this:
SELECT test_random(1000);
Of course, this is just a brief introduction to show how things can be done in practice. A more realistic application needs more thought and can be integrated into the database even more tightly.