I just picked up a feature written by someone else that seems slightly inefficient, but I don't know JPA well enough to find a portable solution that isn't Hibernate-specific.
In a nutshell, the Dao method, which is called within a loop to insert each of the new entities, does an "entityManager.merge(object);".
Isn't there a way defined in the JPA spec to pass a list of entities to the Dao method and do a bulk / batch insert instead of calling merge for every single object?
Plus, since the Dao method is annotated with "@Transactional", I'm wondering if every single merge call is happening within its own transaction... which would not help performance.
Any ideas?
No, there is no batch insert operation in vanilla JPA.
Yes, each insert will be done within its own transaction, provided the caller has not already started one. The @Transactional annotation (with no qualifiers) means a propagation level of REQUIRED (create a transaction if one doesn't exist already). Assuming you have:
public class Dao {
    @Transactional
    public void insert(SomeEntity entity) {
        ...
    }
}
you do this:
public class Batch {
    private Dao dao;

    @Transactional
    public void insert(List<SomeEntity> entities) {
        for (SomeEntity entity : entities) {
            dao.insert(entity);
        }
    }

    public void setDao(Dao dao) {
        this.dao = dao;
    }
}
That way the entire group of inserts gets wrapped in a single transaction. If you're talking about a very large number of inserts, you may want to split it into groups of 1000, 10000 or whatever works, as a sufficiently large uncommitted transaction may starve the database of resources and possibly fail due to size alone.
Note: @Transactional is a Spring annotation. See Transaction Management in the Spring Reference.
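The splitting into groups suggested above can be sketched in plain Java. This is only an illustration: the class name ChunkedInsert, both method names, and the chunk size are assumptions, not part of JPA or Spring. The Consumer stands in for a call such as batch.insert(chunk), so that each group would get its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedInsert {

    // Splits 'items' into consecutive sublists of at most 'chunkSize' elements.
    public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            // Copy the sublist view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    // Hands each chunk to 'insertChunk' in order. In the setup above this
    // callback would be something like batch::insert (an assumption).
    public static <T> void insertInChunks(List<T> items, int chunkSize,
                                          Consumer<List<T>> insertChunk) {
        for (List<T> chunk : partition(items, chunkSize)) {
            insertChunk.accept(chunk);
        }
    }
}
```

For this to yield one transaction per group, the code calling insertInChunks should not itself be running inside a transaction, and batch.insert should be invoked through the Spring-managed bean rather than from another method of the same class, since Spring's proxy-based @Transactional handling does not intercept self-invocation.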
What you could do, if you were in a crafty mood, is:
@Entity
public class SomeEntityBatch {
    @Id
    @GeneratedValue
    private int batchID;

    // PERSIST and MERGE are assumed to be static imports from CascadeType
    @OneToMany(cascade = {PERSIST, MERGE})
    private List<SomeEntity> entities;

    // JPA requires a no-argument constructor on every entity class
    protected SomeEntityBatch() {
    }

    public SomeEntityBatch(List<SomeEntity> entities) {
        this.entities = entities;
    }
}

List<SomeEntity> entitiesToPersist;
em.persist(new SomeEntityBatch(entitiesToPersist));
// remove the SomeEntityBatch object later
Because of the cascade, that will cause all of the entities to be inserted with a single persist call.
I doubt there is any practical advantage to doing this over simply persisting individual objects in a loop. It would be interesting to look at the SQL that the JPA implementation emits, and to benchmark.
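For comparison, here is a rough sketch of the plain loop, with a periodic flush and clear so the persistence context does not grow without bound. PersistenceOps is a hypothetical stand-in for the three EntityManager methods of the same names (persist, flush, clear), used so that the sketch compiles without a JPA provider. The class name, method name, and flushEvery parameter are likewise assumptions. Whether the resulting INSERT statements are grouped into JDBC batches depends on provider-specific configuration.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushClearLoop {

    // Hypothetical stand-in mirroring EntityManager.persist/flush/clear.
    public interface PersistenceOps {
        void persist(Object entity);
        void flush();
        void clear();
    }

    // Persists every entity, flushing and clearing after each full group of
    // 'flushEvery' entities, and once more at the end for any remainder.
    public static void persistAll(PersistenceOps em, List<?> entities, int flushEvery) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        int pending = 0;
        for (Object entity : entities) {
            em.persist(entity);
            pending++;
            if (pending == flushEvery) {
                em.flush(); // push the pending inserts to the database
                em.clear(); // detach the entities persisted so far
                pending = 0;
            }
        }
        if (pending > 0) {
            em.flush();
            em.clear();
        }
    }
}
```

With a real EntityManager the same loop would run inside one transaction, for example inside the @Transactional method of a Spring bean.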