The "N+1 selects problem" is generally stated as a problem in Object-Relational mapping (ORM) discussions, and I understand that it has something do to with having to make a lot of database queries for something that seems simple in the object world.
Does anybody have a more detailed explanation of the problem?
That gets you a result set where child rows in table2 cause duplication by returning the table1 results for each child row in table2. O/R mappers should differentiate table1 instances based on a unique key field, then use all the table2 columns to populate child instances.
The N+1 is where the first query populates the primary object and the second query populates all the child objects for each of the unique primary objects returned.
Consider:
and tables with a similar structure. A single query for the address "22 Valley St" may return:
The O/RM should fill an instance of Home with ID=1, Address="22 Valley St" and then populate the Inhabitants array with People instances for Dave, John, and Mike with just one query.
A N+1 query for the same address used above would result in:
with a separate query like
and resulting in a separate data set like
and the final result being the same as above with the single query.
The advantages to single select is that you get all the data up front which may be what you ultimately desire. The advantages to N+1 is query complexity is reduced and you can use lazy loading where the child result sets are only loaded upon first request.
It is much faster to issue 1 query which returns 100 results than to issue 100 queries which each return 1 result.
Take Matt Solnit example, imagine that you define an association between Car and Wheels as LAZY and you need some Wheels fields. This means that after the first select, hibernate is going to do "Select * from Wheels where car_id = :id" FOR EACH Car.
This makes the first select and more 1 select by each N car, that's why it's called n+1 problem.
To avoid this, make the association fetch as eager, so that hibernate loads data with a join.
But attention, if many times you don't access associated Wheels, it's better to keep it LAZY or change fetch type with Criteria.
Here's a good description of the problem - http://www.realsolve.co.uk/site/tech/hib-tip-pitfall.php?name=why-lazy
Now that you understand the problem it can typically be avoided by doing a join fetch in your query. This basically forces the fetch of the lazy loaded object so the data is retrieved in one query instead of n+1 queries. Hope this helps.
Supplier with a one-to-many relationship with Product. One Supplier has (supplies) many Products.
Factors:
Lazy mode for Supplier set to “true” (default)
Fetch mode used for querying on Product is Select
Fetch mode (default): Supplier information is accessed
Caching does not play a role for the first time the
Supplier is accessed
Fetch mode is Select Fetch (default)
Result:
This is N+1 select problem!
The N+1 query issue happens when you forget to fetch an association and then you need to access it:
Which generates the following SQL statements:
First, Hibernate executes the JPQL query, and a list of
PostComment
entities is fetched.Then, for each
PostComment
, the associatedpost
property is used to generate a log message containing thePost
title.Because the
post
association is not initialized, Hibernate must fetch thePost
entity with a secondary query, and for NPostComment
entities, N more queries are going to be executed (hence the N+1 query problem).First, you need proper SQL logging and monitoring so that you can spot this issue.
Second, this kind of issue is better to be caught by integration tests. You can use an automatic JUnit assert to validate the expected count of generated SQL statements. The db-unit project already provides this functionality, and it's open source.
When you identified the N+1 query issue, you need to use a JOIN FETCH so that child associations are fetched in one query, instead of N. If you need to fetch multiple child associations, it's better to fetch one collection in the initial query and the second one with a secondary SQL query.