@BatchSize a smart or stupid use?

2019-01-07 00:01发布

问题:

First I'll explain how I understood and use @BatchSize : @BatchSize is made in order to load relations of objects in batch, making less SQL request to the database. This is specially usefull on LAZY @OneToMany relations.

However it's even useful on LAZY @OneToOne relation and @ManyToOne : if you load a list of entities from the database and ask to load a lazyed @*ToOne entity, it will load the entities by batch even if i just use a test that load the relation of the 1st entity of the list.

Note if some want to tests : This only show if the entities are not already loaded : for instance if you have a list of user with manager and list all users, when you will access to the manager, no request will be triggered since it's already loaded.

The only drawback that i see on that method is if you load a list of item from the database but only use a part of it. This is a post-filtering operation.

So let's get to the main point.

Let's assume that i make everything good to never do post-filtering-like operations even if it's makes me do native SQL queries or use DTO objects for multiselect criteria query and so on.

  1. Am I right to consider that I can just @BatchSize every lazyed relations after having carefully think about using eager loading / join and finally choose a lazy relation ?
  2. Do i have any interest to search for an adequate value for the @BatchSize or can i think "the bigger the better" ? This would mean "is there any a limit of number in "IN" SQL operator that can make my request enough slower to not be worth anymore ? I use Postgres but if you have answers for others SGBD i'm interested too.
  3. Optional question : it seems that using @BatchSize on a class isn't producing a lot of results. I still have to annotate every lazy relationships, did i miss something about it or is it useless ?

EDIT : The point of my 3 is that i'm getting a different behaviour.

Let say i'm loading a list of entities of class "A" which has a LAZY OneToMany relationship to B. Now i want to print all creationDate of B. So i'm doing a classic 2 for loop.

I annotated B with BatchSize now :

  • @OneToMany is not annotated with BatchSize : each set of B are loaded on each iteration independently without batching. So my annotation on B class seems to be totally ignored. Even if i set a value to "two" and i have 6 entries in one set, i have one query for that set.
    • @OneToMany is annotated : i have the specific query of batches that are loaded. If i fix the batch size to two and i have a total of 10 B accros i just get 5 requests : whatever the number of A i have. If i set it to 100 : i have 1 query for B objects.

PS : i'm not considering any related query to B that might fire to load B fields with fetch select/subselect.

EDIT 2 : i just found this post Why would I not use @BatchSize on every lazy loaded relationship? althought i googled and search on SO beforeposting my question, guess i didn't use the right words...

However i'm adding something different that might lead to a different answer : when i'm wondering about using BatchSize on every relations, it's after choosing if i want a eager loading, with join / select fetch or if i want lazy loading.

回答1:

  1. Yes, @BatchSize is meant to be used with lazy associations.
  2. Hibernate will execute multiple statements in most sitations anyway, even if the count of uninitialized proxies/collections is less than the specified batch size. See this answer for more details. Also, more lighter queries compared to less bigger ones may positively contribute to the overall throughput of the system.
  3. @BatchSize on class level means that the specified batch size for the entity will be applied for all @*ToOne lazy associations with that entity. See the example with the Person entity in the documentation.

The linked question/answers you provided are more concerned about the need for optimization and lazy loading in general. They apply here as well of course, but they are not related to batch loading only, which is just one of the possible approaches.

Another important thing relates to eager loading which is mentioned in the linked answers and which suggests that if a property is always used then you may get better performance by using eager loading. This is in general not true for collections and in many situations for to-one associations either.

For example, suppose you have the following entity for which bs and cs are always used when A is used.

public class A {
  @OneToMany
  private Collection<B> bs;

  @OneToMany
  private Collection<C> cs;
}

Eagerly loading bs and cs obviously suffers from N+1 selects problem if you don't join them in a single query. But if you join them in a single query, for example like:

select a from A
  left join fetch a.bs
  left join fetch a.cs

then you create full Cartesian product between bs and cs and returning count(a.bs) x count(a.cs) rows in the result set for each a which are read one by one and assembled into the entities of A and their collections of bs and cs.

Batch fetching would be very optimal in this situation, because you would first read As, then bs and then cs, resulting in more queries but with much less total amount of data that is transferred from the database. Also, the separate queries are much simpler than a big one with joins and are easier for database to execute and optimize.