Portable JPA Batch / Bulk Insert

Posted 2020-06-23 05:26

Question:

I just jumped on a feature written by someone else that seems slightly inefficient, but my knowledge of JPA isn't good enough for me to find a portable solution that isn't Hibernate-specific.

In a nutshell, the Dao method, which is called within a loop to insert each of the new entities, does an "entityManager.merge(object);".

Isn't there a way defined in the JPA spec to pass a list of entities to the Dao method and do a bulk/batch insert, instead of calling merge for every single object?

Plus, since the Dao method is annotated with "@Transactional", I'm wondering whether every single merge call happens within its own transaction... which would not help performance.

Any idea?

Answer 1:

No, there is no batch insert operation in vanilla JPA.
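What plain JPA does offer is EntityManager.flush() and clear(), so a common portable pattern is to stay in one transaction, persist each new entity, and flush and clear every N entities so the persistence context does not grow without bound. The sketch below keeps the loop free of JPA types so it runs standalone; FlushClearBatch, persistInBatches and the batch size of 10 are hypothetical names and values chosen for illustration. In real code you would pass em::persist and () -> { em.flush(); em.clear(); } from inside a transactional method. Whether the provider then sends those INSERTs as JDBC batches is provider configuration (for example Hibernate's hibernate.jdbc.batch_size), not something the JPA spec guarantees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FlushClearBatch {

    /**
     * Persists items one at a time and calls flushAndClear after every
     * batchSize items. Returns how many times flushAndClear ran. A trailing
     * partial batch is left for the transaction commit to flush.
     */
    public static <T> int persistInBatches(List<T> items, int batchSize,
                                           Consumer<T> persist, Runnable flushAndClear) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        for (int i = 0; i < items.size(); i++) {
            persist.accept(items.get(i));
            if ((i + 1) % batchSize == 0) {
                flushAndClear.run(); // write pending INSERTs, then detach the entities
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 25; i++) items.add("entity-" + i);
        List<String> persisted = new ArrayList<>();
        // Stand-ins for em::persist and () -> { em.flush(); em.clear(); }
        int flushes = persistInBatches(items, 10, persisted::add,
                () -> System.out.println("flush + clear after " + persisted.size()));
        System.out.println(flushes + " flushes, " + persisted.size() + " persisted");
    }
}
```

For entities that are known to be new, persist is the natural call here; merge copies the object's state into a separate managed instance and leaves the argument detached.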

Yes, if the loop that calls the Dao is not itself running in a transaction, each insert is done within its own transaction. The @Transactional annotation with no attributes means a propagation level of REQUIRED (join the current transaction if there is one, otherwise create a new one). So, assuming you have:

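import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.transaction.annotation.Transactional;

public class Dao {
  @PersistenceContext private EntityManager entityManager;

  @Transactional
  public void insert(SomeEntity entity) {
    entityManager.merge(entity); // the per-entity call from the question
  }
}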

you can make the loop itself transactional:

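import java.util.List;
import org.springframework.transaction.annotation.Transactional;

public class Batch {
  private Dao dao;

  // One outer transaction; each dao.insert(...) call joins it (REQUIRED)
  @Transactional
  public void insert(List<SomeEntity> entities) {
    for (SomeEntity entity : entities) {
      dao.insert(entity);
    }
  }

  public void setDao(Dao dao) {
    this.dao = dao;
  }
}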

That way, the entire group of inserts is wrapped in a single transaction. If you're talking about a very large number of inserts, you may want to split them into groups of 1,000, 10,000 or whatever size works, because a sufficiently large uncommitted transaction may starve the database of resources and could fail due to its size alone.
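Splitting the work into groups can be sketched as a partition step plus a per-chunk callback. This is a minimal sketch, assuming the callback is a call to a @Transactional method on a different Spring bean (such as Batch.insert), so that each chunk commits separately. Two caveats: the driver loop itself must not be transactional, or REQUIRED would make every chunk join one outer transaction; and the chunk method must live on another bean, because Spring's proxy-based transactions do not apply when a bean calls its own methods. ChunkedInsert, partition and insertInChunks are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedInsert {

    /** Splits a list into consecutive sublists of at most chunkSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy so each chunk is independent of the source list
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    /**
     * Hands each chunk to insertChunk. In Spring code, insertChunk would call a
     * @Transactional method on another bean, so each chunk commits on its own.
     */
    public static <T> void insertInChunks(List<T> items, int chunkSize,
                                          Consumer<List<T>> insertChunk) {
        for (List<T> chunk : partition(items, chunkSize)) {
            insertChunk.accept(chunk);
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) ids.add(i);
        insertInChunks(ids, 1000,
                chunk -> System.out.println("committing " + chunk.size() + " rows"));
    }
}
```

With 2,500 items and a chunk size of 1,000 this yields three chunks of 1,000, 1,000 and 500, so a failure part-way through only rolls back the chunk in progress.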

Note: @Transactional is a Spring annotation, and it only takes effect when Batch and Dao are Spring-managed beans. See the Transaction Management chapter of the Spring Reference.
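To make the REQUIRED behaviour concrete, here is a toy model, not Spring's implementation: ToyTxManager and countTransactions are hypothetical names, and the model only counts how many transactions would be started. A plain loop calling a REQUIRED method starts one transaction per call, while wrapping the loop in an outer REQUIRED method starts exactly one.

```java
public class RequiredPropagationDemo {

    // Hypothetical stand-in for a transaction manager: tracks whether a
    // transaction is active and counts how many were started.
    static class ToyTxManager {
        private boolean active = false;
        int started = 0;

        /** Mimics REQUIRED: join an active transaction, otherwise begin a new one. */
        void runRequired(Runnable work) {
            if (active) {
                work.run(); // join the caller's transaction
                return;
            }
            active = true;
            started++;
            try {
                work.run();
            } finally {
                active = false; // "commit" (or roll back) and end the transaction
            }
        }
    }

    public static int countTransactions(int inserts, boolean outerTransaction) {
        ToyTxManager tx = new ToyTxManager();
        Runnable loop = () -> {
            for (int i = 0; i < inserts; i++) {
                tx.runRequired(() -> { /* dao.insert(entity) would run here */ });
            }
        };
        if (outerTransaction) {
            tx.runRequired(loop); // like the @Transactional Batch.insert(List)
        } else {
            loop.run(); // a plain loop calling the @Transactional Dao.insert
        }
        return tx.started;
    }

    public static void main(String[] args) {
        System.out.println("without outer transaction: " + countTransactions(5, false)); // 5
        System.out.println("with outer transaction:    " + countTransactions(5, true));  // 1
    }
}
```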



Answer 2:

What you could do, if you were in a crafty mood, is:

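import java.util.List;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import static javax.persistence.CascadeType.MERGE;
import static javax.persistence.CascadeType.PERSIST;

@Entity
public class SomeEntityBatch {

    @Id
    @GeneratedValue
    private int batchID;

    // Cascading PERSIST means persisting the batch also persists every entity in the list
    @OneToMany(cascade = {PERSIST, MERGE})
    private List<SomeEntity> entities;

    // JPA requires a public or protected no-arg constructor
    protected SomeEntityBatch() {
    }

    public SomeEntityBatch(List<SomeEntity> entities) {
        this.entities = entities;
    }

}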

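// Hypothetical helper; must be called inside an active transaction
void persistAll(EntityManager em, List<SomeEntity> entitiesToPersist) {
    em.persist(new SomeEntityBatch(entitiesToPersist));
    // remove the SomeEntityBatch object later
}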

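Because of the cascade, a single persist call on the batch will cause all the entities to be inserted. That is one call from your code, not one SQL statement: the provider will still typically issue one INSERT per entity, and because this @OneToMany is unidirectional with no join column, by default it also writes one row per entity to a join table.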

I doubt there is any practical advantage to doing this over simply persisting individual objects in a loop. It would be interesting to look at the SQL that the JPA implementation emits, and to benchmark it.



Tags: jpa insert bulk