添加链接
link管理
链接快照平台
  • 输入网页链接,自动生成快照
  • 标签化管理网页链接
相关文章推荐
风流的人字拖  ·  More Radio Breakfast ...·  1 月前    · 
冷静的咖啡  ·  指针 delete ...·  1 月前    · 
酒量大的哑铃  ·  System.Runtime.Interop ...·  1 月前    · 
打酱油的课本  ·  Ludwig van Beethoven ...·  6 月前    · 
眼睛小的生姜  ·  尼泊尔 ...·  6 月前    · 

So I was facing a problem where I wanted to insert millions of records into the database which needed to be imported from the file.

I did some research around this and I would like to share with you what I found which helped me improve the insert records throughput by nearly 100 times.

Initially when I was just trying to do bulk insert using spring JPA’s saveAll method, I was getting a performance of about 185 seconds per 10,000 records . After doing the following changes below, the performance to insert 10,000 records was just in 4.3 seconds .

Yes, 4.3 Seconds for 10k records.

So to achieve this, I had to change the way I was inserting data.

  • Change the number of records while inserting.
  • When I was inserting initially, I was pushing all the 10k records from the list directly by calling the saveAll method. I changed this to the batch size of 30. You could also increase the batch size to even 60, But it doesn’t half the time taken to insert records. (See the table below)

    For this you need to set the hibernate property batch_size=30 .

    spring.jpa.properties.hibernate.jdbc.batch_size=30
        Enter fullscreen mode
        Exit fullscreen mode
    
    for (int i = 0; i < totalObjects; i = i + batchSize) {
        if( i+ batchSize > totalObjects){
            List<Book> books1 = books.subList(i, totalObjects - 1);
            repository.saveAll(books1);
            break;
        List<Book> books1 = books.subList(i, i + batchSize);
        repository.saveAll(books1);
    

    This reduced the time by a little, dropped from 185 secs to 153 Secs. Thats approximately 18% improvement.

    3. Change the ID generation strategy.

    This made a major impact.

    Initially, I was using the @GeneratedValue annotation with strategy i.e GenerationType.IDENTITY on my entity class.

    Hibernate has disabled batch update with this strategy, Because it has to make a select call to get the id from the database to insert each row. You can read more about it here

    I changed the strategy to SEQUENCE and provided a sequence generator.

    public class Book {
        @GeneratedValue(strategy = SEQUENCE, generator = "seqGen")
        @SequenceGenerator(name = "seqGen", sequenceName = "seq", initialValue = 1)
        private Long id;
        Enter fullscreen mode
        Exit fullscreen mode
    

    This drastically changed the insert performance as Hibernate was able to leverage bulk insert.

    From the previous performance improvement of 153 secs, the time to insert 10k records reduced to only 9 secs. Thats an increase in performance by nearly 95%.

    Note: MySQL doesn’t support creating sequences.

    To get around this, I created a table with the name of the sequence having a single field called next_val. Then I added a single row with an initial value.

    For the above sequence I created the following :

    CREATE TABLE `seq` (
      `next_val` bigint(20) DEFAULT NULL
    INSERT INTO `seq` (`next_val`) VALUES(1);
        Enter fullscreen mode
        Exit fullscreen mode
    

    Hibernate then used this table as a sequence generator.

    Next, I pushed it further to use higher batch sizes and I noticed that doubling the batch size does not double down on time. The time to insert only gradually reduces. You can see this bellow.

    The most optimal batch size for my case was a 1000 which took around 4.39 secs for 10K records. After that, I saw the performance degrading as you can see in the graph.

    Here are the stats I got.

    As always you can find the code on my GitHub repo.

    Built on Forem — the open source software that powers DEV and other inclusive communities.

    Made with love and Ruby on Rails. DEV Community © 2016 - 2024.