how h2 chooses right/wrong index in Join

2019-08-21 03:04发布

问题:

I had an issues with a named query in Java, but the issue was that the problem was in H2.

I thought ANALYZE was my Solution and would solve my problem. It did locally on my dev machine. On client side it did made it worse.

Scenario: I have an H2 Database with data version 105. After importing some more data it becomes version 106.

The Table looks like

The Query (get the rows with given guid, local and highest version):

SELECT tdo.TECDOC_GUID as guid, tdo.TECDOC_LOCALE as locale , tdo.TECDOC_VERSION as version, tdo.DATA as data
FROM TECDOC_OBJECTS tdo
LEFT OUTER JOIN TECDOC_OBJECTS tdo1
ON (
    tdo.TECDOC_GUID = tdo1.TECDOC_GUID AND 
    tdo.TECDOC_LOCALE = tdo1.TECDOC_LOCALE AND 
    tdo.TECDOC_VERSION < tdo1.TECDOC_VERSION)
WHERE tdo1.id IS NULL 
AND tdo.TECDOC_GUID in ('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
AND tdo.TECDOC_LOCALE = 'de';

Before I ran ANALYZE command the execution plan (scanCount really low):

SELECT
    TDO.TECDOC_GUID AS GUID,
    TDO.TECDOC_LOCALE AS LOCALE,
    TDO.TECDOC_VERSION AS VERSION,
    TDO.DATA AS DATA
FROM PUBLIC.TECDOC_OBJECTS TDO
    /* PUBLIC.IDX_TECDOC_GUID: TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0') */
    /* WHERE (TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
        AND (TDO.TECDOC_LOCALE = 'de')
    */
    /* scanCount: 19 */
LEFT OUTER JOIN PUBLIC.TECDOC_OBJECTS TDO1
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_GUID = TDO.TECDOC_GUID
        AND TECDOC_LOCALE = TDO.TECDOC_LOCALE
        AND TECDOC_VERSION > TDO.TECDOC_VERSION
     */
    ON (TDO.TECDOC_VERSION < TDO1.TECDOC_VERSION)
    AND ((TDO.TECDOC_GUID = TDO1.TECDOC_GUID)
    AND (TDO.TECDOC_LOCALE = TDO1.TECDOC_LOCALE))
    /* scanCount: 4 */
WHERE (TDO.TECDOC_LOCALE = 'de')
    AND ((TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
    AND (TDO1.ID IS NULL))
/*
total: 37
TECDOC_OBJECTS.IDX_GUID_LOCALE_VERSION read: 6 (16%)
TECDOC_OBJECTS.IDX_TECDOC_GUID read: 8 (21%)
TECDOC_OBJECTS.TECDOC_OBJECTS_DATA read: 23 (62%)
*/

SELECT
    TDO.TECDOC_GUID AS GUID,
    TDO.TECDOC_LOCALE AS LOCALE,
    TDO.TECDOC_VERSION AS VERSION,
    TDO.DATA AS DATA
FROM PUBLIC.TECDOC_OBJECTS TDO
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_LOCALE = 'de'
        AND TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
     */
    /* WHERE (TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
        AND (TDO.TECDOC_LOCALE = 'de')
    */
    /* scanCount: 287385 */
LEFT OUTER JOIN PUBLIC.TECDOC_OBJECTS TDO1
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_GUID = TDO.TECDOC_GUID
        AND TECDOC_LOCALE = TDO.TECDOC_LOCALE
        AND TECDOC_VERSION > TDO.TECDOC_VERSION
     */
    ON (TDO.TECDOC_VERSION < TDO1.TECDOC_VERSION)
    AND ((TDO.TECDOC_GUID = TDO1.TECDOC_GUID)
    AND (TDO.TECDOC_LOCALE = TDO1.TECDOC_LOCALE))
    /* scanCount: 4 */
WHERE (TDO.TECDOC_LOCALE = 'de')
    AND ((TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
    AND (TDO1.ID IS NULL))
/*
total: 11891
TECDOC_OBJECTS.IDX_GUID_LOCALE_VERSION read: 11884 (99%)
TECDOC_OBJECTS.TECDOC_OBJECTS_DATA read: 7 (0%)
*/

After I ran ANALYZE command the execution plan (scanCount really high):

SELECT
    TDO.TECDOC_GUID AS GUID,
    TDO.TECDOC_LOCALE AS LOCALE,
    TDO.TECDOC_VERSION AS VERSION,
    TDO.DATA AS DATA
FROM PUBLIC.TECDOC_OBJECTS TDO
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_LOCALE = 'de'
        AND TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
     */
    /* WHERE (TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
        AND (TDO.TECDOC_LOCALE = 'de')
    */
    /* scanCount: 287385 */
LEFT OUTER JOIN PUBLIC.TECDOC_OBJECTS TDO1
    /* PUBLIC.IDX_GUID_LOCALE_VERSION: TECDOC_GUID = TDO.TECDOC_GUID
        AND TECDOC_LOCALE = TDO.TECDOC_LOCALE
        AND TECDOC_VERSION > TDO.TECDOC_VERSION
     */
    ON (TDO.TECDOC_VERSION < TDO1.TECDOC_VERSION)
    AND ((TDO.TECDOC_GUID = TDO1.TECDOC_GUID)
    AND (TDO.TECDOC_LOCALE = TDO1.TECDOC_LOCALE))
    /* scanCount: 4 */
WHERE (TDO.TECDOC_LOCALE = 'de')
    AND ((TDO.TECDOC_GUID IN('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'))
    AND (TDO1.ID IS NULL))
/*
total: 11891
TECDOC_OBJECTS.IDX_GUID_LOCALE_VERSION read: 11884 (99%)
TECDOC_OBJECTS.TECDOC_OBJECTS_DATA read: 7 (0%)
*/

But on my developer laptop, after ANALYZE the query is still fast. Somehow H2 uses the wrong index (as it can only use one index per join, according to documentation).

Does anyone has any suggestions?

回答1:

Your query is not complex. I think the key aspect of it is in the where condition.

WHERE tdo1.id IS NULL 
  AND tdo.TECDOC_GUID in ('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6',
    'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
  AND tdo.TECDOC_LOCALE = 'de';

For some reason H2 is using the index the wrong way. I would try to rephrase the condition, and see how H2's SQL optimizer works it out.

For example, you can try option #1:

SELECT
    ... -- columns, FROM, and OUTER JOIN here
  WHERE tdo.TECDOC_GUID = 'GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6'
    AND tdo.TECDOC_LOCALE = 'de'
     OR tdo.TECDOC_GUID = 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'
    AND tdo.TECDOC_LOCALE = 'de'
    AND tdo1.id IS NULL 

Or you can decouple the query in two to make sure it uses the index, as in option #2:

SELECT
    ... -- columns, FROM, and OUTER JOIN here
  WHERE tdo.TECDOC_GUID = 'GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6'
    AND tdo.TECDOC_LOCALE = 'de'
    AND tdo1.id IS NULL 
UNION ALL
SELECT
    ... -- columns, FROM, and OUTER JOIN here
  WHERE tdo.TECDOC_GUID = 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0'
    AND tdo.TECDOC_LOCALE = 'de'
    AND tdo1.id IS NULL 

This way, you are using equality only when searching. This one is much simpler to understand to the SQL optimizer. Note the use of union all that is cheaper than union.



回答2:

what somehow solved the issue is that I used

USE INDEX

to specify which index it should use.

Here is the Query which uses a certain index by force (or index hint http://www.h2database.com/html/performance.html#database_performance_tuning).

SELECT tdo.TECDOC_GUID as guid, tdo.TECDOC_LOCALE as locale , tdo.TECDOC_VERSION as version, tdo.DATA as data
FROM TECDOC_OBJECTS tdo USE INDEX (IDX_TECDOC_GUID)
LEFT OUTER JOIN TECDOC_OBJECTS tdo1
ON (
    tdo.TECDOC_GUID = tdo1.TECDOC_GUID AND 
    tdo.TECDOC_LOCALE = tdo1.TECDOC_LOCALE AND 
    tdo.TECDOC_VERSION < tdo1.TECDOC_VERSION)
WHERE tdo1.id IS NULL 
AND tdo.TECDOC_GUID in ('GUID-F2F77CE5-D8F5-4286-9A30-8FD500F735F6', 'GUID-41FD28DC-63C0-44D0-B8AE-0FCF7C78CEB0')
AND tdo.TECDOC_LOCALE = 'de';

This will solve this issue. If you use it with Java and Hibernate, be aware that the parser of H2 does not understand USE INDEX in versions before 1.4.194. I had the issue, that with version 1.4.194 some other issues came up. And I deleted some combined Indexes in my table.

Cheers