Product

Batoo JPA - The new JPA Implementation that runs over 15 times faster...

This post is by Hasan Ceylan, an Open Source software enthusiast from Istanbul.

I loved the JPA 1.0 back in early 2000s. I started using it together with EJB 3.0 even before the stable releases. I loved it so much that I contributed bits and parts for JBoss 3.x implementations.

Those were the days our company was considerably still small in size. Creating new features and applications were more priority than the performance, because there were a lot of ideas that we have and we needed to develop and market those as fast as we can. Now, we no longer needed to write tedious and error prone xml descriptions for the data model and deployment descriptors. Nor we needed to use the curse called “XDoclet”.

On the other side, our company grew steadily, our web site has become the top portal in the country for live events and ticketing. We now had the performance problems! Although the company grew considerably, due to the economics in the industry, we did not make a lot of money. The challenge we had was our company was a ticketing company. Every e-commerce business has high and low seasons. But for ticketing there is low seasons and high hours. While you sell avarage x tickets an hour, when a blockbuster event goes on sale suddenly demand becomes 1000s of xs for an hour. Welcome to hell!

We worked day and night to tweak and enhance the application to use whatever available to keep it up on a big day. To be frank there was always a bigger event that was capable of bringing the site down no matter how hard we tried.

The dream was over, I came to realize that developing applications on top of frameworks is a bit “be careful!” along with “fun”.

I Kept Learning

I loved programming, I loved Java, I loved opensource. I developed almost every possible type applications on every possible platform I could. For the rest I went in and discovered stuff. I learned a lot from masters thanks to open source. In contrast to most, I read articles and codes written by great programmers like Linus Torvalds, Gavin King, Ed Merks and so many others.

With the experiences I gathered, I quit the ticketing company I loved and became a Software Consultant. This opened a new era in front of me that there were a lot of industries and a lot of different platforms and industries.

In each project I became the performance police of the application.

I am now the performance freak!

I Took The Red Pill!

One day I said to myself, could JPA be faster? If yes, how fast can it be. I spent about two weeks to create an entitymanager that persisted and loaded entities. Then I ran it and compared the results to ones off of Hibernate. The results were not really promising I was only about %50 faster than Hibernate in persisting and finding the entities. I spent another week to tweak the loops, cached metamodel chunks, changed access to classes from interfaces to abstract classes, modified the lists to arrays and so many other things. Suddenly I had a prototype that were 50+ times faster than Hibernate!

Development of Batoo JPA

I was astonished by how drastically performance went up by just paying attention to performance centric coding. By then I was using Visual VM to measure the times spent in the JPA layer. I got down and wrote a self profiling tool that measured the CPU resources spent at the JPA Layer and started implementing every aspect of the JPA 2.0 Specification. After each iteration I re-run the benchmark and when the performance dropped considerably I went back to changes and inspected the new code line by line - the profiling tool I created reported performance hit of every line of the JPA Stack.

It took about 6 months to implement the specification as a whole, on top of it, I introduced a Maven Plugin to create bytecode instrumentation at build time and a complementary Eclipse Plugin to allow use of instrumentation in Eclipse IDE.

After a carriage of 6 months Batoo JPA was born in August 2012. it measured over 15 times faster than Hibernate.

Benchmark

As stated earlier, a benchmark was introduced to measure every micro development iteration of Batoo JPA. This benchmark was not created to put forward the areas Batoo JPA was fast so that other would believe in Batoo JPA, but was created to put together a most common domain model and persistence operations that existed in almost every JPA application - so that I knew how fast Batoo JPA was.

Performance Metrics

The scenario is:

A Person object
- With phonenumbers - PhoneNumber object
- With addresses - Address object
  - That point to country - Country Object

Common life-cycle tasks has been introduced:

Persist 100K person objects with two phone numbers and two addresses in lots of 10 per session
Locate and load 250K person objects with lots of 10 per session
Remove 5K person objects with lots of 5 per session
Update 100K person objects with lots of 100
Query person objects 25K times using Object Oriented Criteria Querying API.
Query person objects 25K times using JPQL - Java Persistence Query Language, an SQL-like query scripting language.

For the sake of simplicity, the benchmark was run on top of in-memory embedded Derby with the profiler slicing the times spent at the

Unit Test Layer
JPA Layer
Derby Layer

The times spent at the Unit Test Layer is omitted from the Results due to irrelevancy.

Results

The times given in the below tables are in milliseconds spent in the JPA layer while running the benchmark scenario. The same tests are run for Batoo and Hibernate JPA in different runs to isolate boot, memory, cache, garbage collection etc. effects.

The tables below show

the total time spent at Derby Layer as DB Operation Total
the type of the test as Test
the times for each test at Derby Layer as DB Operation
the times for each test at JPA Layer as Core Operation
the total time spent at JPA Layer as Core Operation Total
the total time spent at both JPA and Derby Layers as Operation Total

Below are the ratios of CPU resources spent by Hibernate and Batoo JPA. It is assumed that an an application generates average 1 save, 5 locate, 2 remove and 3 update and 5 + 5 total of ten queries in ratios. Now although these numbers are extremely dependent on the application nature, some sort of assumption is needed to measure the overall speed comparison.

Given the scenario above, Batoo JPA measures over 15 times faster than Hibernate - the leading JPA implementation.

As you may have noticed Batoo JPA not only performs insanely fast at the JPA Layer it also employs a number of optimizations to relieve the pressure on the database. This is why Batoo JPA measures half the time at DB Layer in comparison to the one off of Hibernate.

Interpretation of Results

We do appreciate that JPA is not the single part of an application. But we do believe that the current JPA implementation consume quite a bit of your server budget. While a typical application cluster spends CPU resources for persistence layer about %20 to %40, Batoo JPA will well be able to bring your cluster down to half of its size allowing you save a lot on licensing administration and hardware, as well as room to scale up even for non-cluster friendly applications - in my experience I saw applications running on 96 core Solaris systems simply because they are not scalable.

Conclusion

We have managed to create a JPA Product that allows you to enjoy the great features of JPA Technology but also do not require you to compromise on performance!

On top of that Batoo JPA is developed using the Apache Coding Standards and has valuable documentation within the code. The project codebase is released with LGPL license and there is absolutely no closed source part and we envision that it would be that way forever.

As stated earlier, it also has a complementary Maven and Eclipse plugin to provide instrumentation for build and development phases.

Batoo JPA deviates from the specification almost zero, making it easy for existing JPA applications be migrated to Batoo JPA, while requiring no additional learning phase to start using it.

Last but not the least, Batoo JPA not only saves you when you run your application, but also during the time you deploy your application. Batoo JPA employs parallel deployer managers to handle deployment in parallel. Considering a developer deploys the application during his / her development phase well 10x times a day if not 100, with a moderately large domain model this may take quite a bit of developers time when summed up. Although we haven’t made a concrete benchmark on deployment speed, we know that Batoo JPA deploys about 3 4 times faster than Hibernate.