I've been porting an SQL database over to exist over the last few weeks, and while I so far managed to get over all previous hindrances, I've now run into something for which neither the official documentation, searching online, nor common sense was able to help.
In short, I have a pretty big collection (about 90k entries, spread over 20 files), with most individual entries looking something like this (this is a massive simplification so I can get the point across):
<document>
<document_id>Numerical Value</document_id>
<page_id>Some other numerical value</page_id>
</document>
I then pass a value through php, lets call it $val
The strange part is that when I run the standard query
$p in collection("/db/folder_location")//documentset/document[xs:integer(document_id) eq $val]
No matter what value I pass, it returns all the results in a matter of seconds. If I slightly modify it, however, making it:
$p in collection("/db/folder_location")//documentset/document[xs:integer(page_id) eq $val]
It either takes over 30 seconds to return the values or simply stays locked in a running query and never returns anything. Of all the 30 queries I already converted, this is the only time I ran into this problem and could not find a workaround.
To address the query performance problem, I would suggest some changes to your query and/or the addition of a range index on the document_id
and page_id
elements.
Your query casts all document_id
and page_id
elements as xs:integer
. This is an inefficient operation given a large dataset. Consider (a) removing this type casting, (b) reversing it (cast $val
as xs:string
), or (c) adding a range index on these two elements, with type="xs:integer"
. This latter option will let you remove the casting in your predicate (allowing you to change it to document[document_id eq $val]
and document[page_id eq $val]
), and the index should greatly speed the lookup.
To add a range index for your query, create a collection configuration document like this:
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index xmlns:xs="http://www.w3.org/2001/XMLSchema">
<range>
<create qname="document_id" type="xs:integer"/>
<create qname="page_id" type="xs:integer"/>
</range>
</index>
</collection>
If your data is in /db/folder_location
, then store this document as collection.xconf
in /db/system/config/db/folder_location
. Then reindex your collection with xmldb:reindex("/db/system/config/db/folder_location")
. As the documentation on range indexes states, with this index definition in place:
indexes will be used automatically for general or value comparisons as well as string functions like fn:contains
, fn:starts-with
, fn:ends-with
.
For more on range indexes in eXist, see https://exist-db.org/exist/apps/doc/newrangeindex.xml. For query optimization techniques, see https://exist-db.org/exist/apps/doc/tuning.xml. For indexes in general in eXist, see https://exist-db.org/exist/apps/doc/indexing.xml.