Reading the docs http://exist-db.org/exist/apps/doc/indexing.xml I'm finding difficult to understand how and if I can improve the performances of a 'read' query (with 2 parameters: a string and an integer). Do eXist-db have a default structural index? Can I improve a 2 params query with a 'range index'?
More details about my XML db (note there are 2 different dbs simply merged on the same root):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<db>
<docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26001</number>
<details>
<detail>
<description>legge</description>
<number>19</number>
<date>14/01/1994</date>
</detail>
<detail>
<description>decreto legge</description>
<number>453</number>
<date>15/11/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26002</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
<meta>
<number>26016</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
</docs>
<full_text_docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum ...
</text>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum......
</text>
</doc>
</full_text_docs>
</db>
This is my xquery
xquery version "3.0";
let $doc := doc("/db//index_test/test_general.xml")//db/docs/doc
let $fulltxt := doc("/db//index_test/test_general.xml")//db/full_text_docs/doc
return <root> {
for $a in $doc[metas/meta/details/detail[date="03/02/1993" and number = "29"]]/header
return $fulltxt[header/year/text()=$a/year/text() and
header/number/text()=$a/number/text() and
header/type/text()=$a/type/text()
]
} </root>
Basically I simply find for the detail/number
and detail/date
that matches the input in the first db and take the results for querying the second db. The results are all the <full_text_header>
documents that matches.
I would to know if I can create indexes for the fields number
and date
to improve performance. Note this is the ONLY query I need to optimize (the only I do on this db) obviously number and date changes :).
SOLUTION:
For a clear explanation read the joewiz answer. My problem was the correct recognition of the .xconf file. It have to be placed in /db/yourcollectiondir. If you're using eXide when you create the file you should select Xml type with template "eXist-db collection configuration". When you try to save the file you will see a prompt "Apply configuration?" then click 'ok'. Just then run this xquery xmldb:reindex('/db/yourcollectiondir')
.
Now if all it's right when you run an xquery involving an index you will see the usage in "Monitoring and profiling".
As that documentation page states, eXist does create a structural index for all XML stored in the database. This is not an index of values, though, so without further indexes, queries based on value (rather than structure) would involve a lookup of values in the DOM. As your data grows larger, looking up values in the DOM gets slower and slower. This is where value-based indexes, such a range index, saves the day. (For a fuller explanation, see the "Indexing" section of Wolfgang Meier's "Tuning the Database" article, which is essential for getting the most performance out of eXist.)
So, yes, you can create indexes for the
<number>
and<date>
fields. I'd recommend the "new range" index, as described on that documentation page. Yourcollection.xconf
file setting up these indexes would look like this:You have to store this within the
/db/system/config/
collection, in a subcollection corresponding to the location of your data in the database. So if your data is located in/db/apps/myapp/data
, you would place thiscollection.xconf
file in/db/system/config/db/apps/myapp/data
.Note that the configuration here would only affect the
for
clause's queries ofdate
andnumber
values, and not the predicates in thereturn
clause, which depend on the values of<year>
and<type>
elements. So, to ensure your query maximized the use of indexes, you should declare indexes on these; it seems thatxs:integer
would be the appropriate type for each.Lastly, I would suggest eliminating the
/text()
steps, which are completely extraneous. For more on the use/abuse oftext()
, see Evan Lenz's article, "text() is a code smell".Update (2016-07-17): With the updated code sample above, I have a couple of additional suggestions. First, since the code is in
/db/index_test
, we will store our files as follows:Assuming you're using eXide, when you store the
collection.xconf
file in a collection, eXide will prompt you to have a copy of the file placed in the correct location in/db/system/config
. If you're not using eXide, you need to store thecollection.xconf
file there yourself.Using the unmodified query, I can confirm that despite the presence of the
collection.xconf
file, monex shows no indexes are being applied:Let's make a few modifications to the file to ensure indexes are properly applied:
With these modifications, monex shows that indexes are applied to the comparisons in the
for
clause:The insights here are derived from the "Tuning the Database" article. To get full indexing for all comparisons, you will need to define additional indexes and may need to make similar modifications to your query.
One final note: the version of monex you see in these pictures is using a feature I added this weekend, called "Tare", which tries to filter out other operations from the query profiling results in order to help the user see just the effects of their own query. This feature is still just a pull request, so running the current release version, you won't see identical results.