Imagine having a tool that can automatically detect JPA and Hibernate performance issues.
Wouldn’t that be just awesome?
Well,
Hypersistence Optimizer
is that tool!
And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, or Play Framework.
So, enjoy spending your time on the things you love rather than fixing performance issues in your production system on a Saturday night!
Introduction
JDBC
has long been offering support for
DML
statement batching. By default, all statements are sent one after the other, each one in a separate network round-trip. Batching allows us to send multiple statements in one-shot, saving unnecessary socket stream flushing.
Hibernate hides the database statements behind a transactional
write-behind abstraction layer
. An intermediate layer allows us to hide the JDBC batching semantics from the persistence layer logic. This way, we can change the JDBC batching strategy without altering the data access code.
Configuring Hibernate to support JDBC batching is not as easy as it should be, so I’m going to explain everything you need to do in order to make it work.
Testing time
We’ll start with the following entity model:
The
Post
has a one-to-many association with the
Comment
entity:
@OneToMany(
cascade = CascadeType.ALL,
mappedBy = "post",
orphanRemoval = true)
private List<Comment> comments = new ArrayList<>();
Or test scenario issues both
INSERT
and
UPDATE
statements, so we can validate if
JDBC
batching is being used:
LOGGER.info("Test batch insert");
long startNanos = System.nanoTime();
doInTransaction(session -> {
int batchSize = batchSize();
for(int i = 0; i < itemsCount(); i++) {
if(i % batchSize == 0 && i > 0) {
session.flush();
session.clear();
Post post = new Post(
String.format("Post no. %d", i)
int j = 0;
post.addComment(new Comment(
String.format(
"Post comment %d:%d", i, j++
post.addComment(new Comment(
String.format(
"Post comment %d:%d", i, j++
session.persist(post);
LOGGER.info("{}.testInsert took {} millis",
getClass().getSimpleName(),
TimeUnit.NANOSECONDS.toMillis(
System.nanoTime() - startNanos
LOGGER.info("Test batch update");
startNanos = System.nanoTime();
doInTransaction(session -> {
List<Post> posts = session.createQuery(
"select distinct p " +
"from Post p " +
"join fetch p.comments c")
.list();
for(Post post : posts) {
post.title = "Blog " + post.title;
for(Comment comment : post.comments) {
comment.review = "Blog " + comment.review;
LOGGER.info("{}.testUpdate took {} millis",
getClass().getSimpleName(),
TimeUnit.NANOSECONDS.toMillis(
System.nanoTime() - startNanos
This test will persist a configurable number of
Post
entities, each one containing two
Comments
. For the sake of brevity, we are going to persist 3
Posts
and the
Dialect
default batch size:
protected int itemsCount() {
return 3;
protected int batchSize() {
return Integer.valueOf(Dialect.DEFAULT_BATCH_SIZE);
Default batch support
Hibernate doesn’t implicitly employ
JDBC
batching and each
INSERT
and
UPDATE
statement is executed separately:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
Configuring
hibernate.jdbc.batch_size
To enable
JDBC
batching, we have to configure the
hibernate.jdbc.batch_size
property:
A non-zero value enables use of JDBC2 batch updates by Hibernate (e.g. recommended values between 5 and 30)
We’ll set this property and rerun our test:
properties.put("hibernate.jdbc.batch_size",
String.valueOf(batchSize()));
This time, the
Comment
INSERT
statements are batched, while the
UPDATE
statements are left untouched:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]}
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
A
JDBC
batch can target one table only, so every new
DML
statement targeting a different table ends up the current batch and initiates a new one. Mixing different table statements is therefore undesirable when using
SQL
batch processing.
Ordering statements
Hibernate can sort
INSERT
and
UPDATE
statements using the following configuration options:
properties.put("hibernate.order_inserts", "true");
properties.put("hibernate.order_updates", "true");
While the
Post
and
Comment
INSERT
statements are batched accordingly, the
UPDATE
statements are still executed separately:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Adding version data batch support
There’s the
hibernate.jdbc.batch_versioned_data
configuration property we need to set, in order to enable
UPDATE
batching:
Set this property to true if your JDBC driver returns correct row counts from executeBatch(). It is usually safe to turn this option on. Hibernate will then use batched DML for automatically versioned data. Defaults to false.
We will rerun our test with this property set too:
properties.put("hibernate.jdbc.batch_versioned_data", "true");
Now both the
INSERT
and the
UPDATE
statements are properly batched:
Query:{[insert into Post (title, version, id) values (?, ?, ?)][Post no. 0,0,1]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 1,0,2]} {[insert into Post (title, version, id) values (?, ?, ?)][Post no. 2,0,3]}
Query:{[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:0,0,51]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][1,Post comment 0:1,0,52]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:0,0,53]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][2,Post comment 1:1,0,54]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:0,0,55]} {[insert into Comment (post_id, review, version, id) values (?, ?, ?, ?)][3,Post comment 2:1,0,56]}
Query:{[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:0,1,51,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][1,Blog Post comment 0:1,1,52,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:0,1,53,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][2,Blog Post comment 1:1,1,54,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:0,1,55,0]} {[update Comment set post_id=?, review=?, version=? where id=? and version=?][3,Blog Post comment 2:1,1,56,0]}
Query:{[update Post set title=?, version=? where id=? and version=?][Blog Post no. 0,1,1,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 1,1,2,0]} {[update Post set title=?, version=? where id=? and version=?][Blog Post no. 2,1,3,0]}
Benchmark
Now that we managed to configure Hibernate for
JDBC
batching, we can benchmark the performance gain of statement grouping.
the test case uses a
PostgreSQL
database installed on the same machine with the currently running
JVM
a batch size of
50
was chosen and each test iteration increases the statement count by an order of magnitude
all durations are expressed in milliseconds
The results are:
Statement count
No batch Insert duration
No batch Update duration
Batch Insert duration
Batch Update duration
30000
300000
51785
57869
16052
20954