Improve performance of query with range indexes in

2019-07-18 07:08发布

Reading the docs http://exist-db.org/exist/apps/doc/indexing.xml I'm finding difficult to understand how and if I can improve the performances of a 'read' query (with 2 parameters: a string and an integer). Do eXist-db have a default structural index? Can I improve a 2 params query with a 'range index'?

More details about my XML db (note there are 2 different dbs simply merged on the same root):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<db>
    <docs>
        <doc>
            <header>
                <year>2001</year>
                <number>1</number>
                <type>O</type>
            </header>
            <metas>
                <meta>
                    <number>26001</number>
                    <details>
                        <detail>
                            <description>legge</description>
                            <number>19</number>
                            <date>14/01/1994</date>

                        </detail>
                        <detail>
                            <description>decreto legge</description>
                            <number>453</number>
                            <date>15/11/1993</date>
                        </detail>
                    </details>
                </meta>
            </metas>
        </doc>
        <doc>
            <header>
                <year>2001</year>
                <number>2</number>
                <type>O</type>
            </header>
            <metas>
                <meta>
                    <number>26002</number>
                    <details>
                        <detail>
                            <description>decreto legislativo</description>
                            <number>29</number>
                            <date>03/02/1993</date>
                        </detail>
                    </details>
                </meta>
                <meta>
                    <number>26016</number>
                    <details>
                        <detail>
                            <description>decreto legislativo</description>
                            <number>29</number>
                            <date>03/02/1993</date>

                        </detail>
                    </details>
                </meta>
            </metas>
        </doc>
    </docs>



    <full_text_docs>
        <doc>
            <header>
                <year>2001</year>
                <number>1</number>
                <type>O</type>
                <president>ferrari</president>
            </header>
            <text>lorem ipsum ...

            </text>
        </doc>
        <doc>
            <header>
                <year>2001</year>
                <number>2</number>
                <type>O</type>
                <president>ferrari</president>
            </header>
            <text>lorem ipsum......
            </text>
        </doc>
    </full_text_docs>
</db>

This is my xquery

xquery version "3.0";

let $doc := doc("/db//index_test/test_general.xml")//db/docs/doc
let $fulltxt := doc("/db//index_test/test_general.xml")//db/full_text_docs/doc

return <root> {
  for $a in $doc[metas/meta/details/detail[date="03/02/1993" and number = "29"]]/header
     return $fulltxt[header/year/text()=$a/year/text() and
            header/number/text()=$a/number/text() and
            header/type/text()=$a/type/text()
            ]

} </root>

Basically I simply find for the detail/number and detail/date that matches the input in the first db and take the results for querying the second db. The results are all the <full_text_header> documents that matches.

I would to know if I can create indexes for the fields number and date to improve performance. Note this is the ONLY query I need to optimize (the only I do on this db) obviously number and date changes :).

SOLUTION: For a clear explanation read the joewiz answer. My problem was the correct recognition of the .xconf file. It have to be placed in /db/yourcollectiondir. If you're using eXide when you create the file you should select Xml type with template "eXist-db collection configuration". When you try to save the file you will see a prompt "Apply configuration?" then click 'ok'. Just then run this xquery xmldb:reindex('/db/yourcollectiondir'). Now if all it's right when you run an xquery involving an index you will see the usage in "Monitoring and profiling".

1条回答
萌系小妹纸
2楼-- · 2019-07-18 07:59

As that documentation page states, eXist does create a structural index for all XML stored in the database. This is not an index of values, though, so without further indexes, queries based on value (rather than structure) would involve a lookup of values in the DOM. As your data grows larger, looking up values in the DOM gets slower and slower. This is where value-based indexes, such a range index, saves the day. (For a fuller explanation, see the "Indexing" section of Wolfgang Meier's "Tuning the Database" article, which is essential for getting the most performance out of eXist.)

So, yes, you can create indexes for the <number> and <date> fields. I'd recommend the "new range" index, as described on that documentation page. Your collection.xconf file setting up these indexes would look like this:

<collection xmlns="http://exist-db.org/collection-config/1.0" 
            xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <index>
        <range>
            <create qname="number" type="xs:integer"/>
            <create qname="date" type="xs:string"/>
        </range>
    </index>
</collection>

You have to store this within the /db/system/config/ collection, in a subcollection corresponding to the location of your data in the database. So if your data is located in /db/apps/myapp/data, you would place this collection.xconf file in /db/system/config/db/apps/myapp/data.

Note that the configuration here would only affect the for clause's queries of date and number values, and not the predicates in the return clause, which depend on the values of <year> and <type> elements. So, to ensure your query maximized the use of indexes, you should declare indexes on these; it seems that xs:integer would be the appropriate type for each.

Lastly, I would suggest eliminating the /text() steps, which are completely extraneous. For more on the use/abuse of text(), see Evan Lenz's article, "text() is a code smell".

Update (2016-07-17): With the updated code sample above, I have a couple of additional suggestions. First, since the code is in /db/index_test, we will store our files as follows:

Contents of <code>/db/index_test</code> collection, as seen in eXide

Assuming you're using eXide, when you store the collection.xconf file in a collection, eXide will prompt you to have a copy of the file placed in the correct location in /db/system/config. If you're not using eXide, you need to store the collection.xconf file there yourself.

Using the unmodified query, I can confirm that despite the presence of the collection.xconf file, monex shows no indexes are being applied:

monex showing no indexes applied

Let's make a few modifications to the file to ensure indexes are properly applied:

xquery version "3.0";

<root> {
    for $a in doc("/db/index_test/test_general.xml")//detail[date = "03/02/1993" and number = 29]/ancestor::doc/header
    return 
        doc("/db/index_test/test_general.xml")/db/full_text_docs/doc
            [
                header/year = $a/year and
                header/number = $a/number and
                header/type = $a/type
            ]

} </root>

With these modifications, monex shows that indexes are applied to the comparisons in the for clause:

monex showing images applied

The insights here are derived from the "Tuning the Database" article. To get full indexing for all comparisons, you will need to define additional indexes and may need to make similar modifications to your query.

One final note: the version of monex you see in these pictures is using a feature I added this weekend, called "Tare", which tries to filter out other operations from the query profiling results in order to help the user see just the effects of their own query. This feature is still just a pull request, so running the current release version, you won't see identical results.

查看更多
登录 后发表回答