I've got a BaseX XML database with only one small XML file. These file basically consists of two structures. One is PlatformCategory
with 46 instances, the other one PlatformGenericType
with 213 instances.
PlatformGenericType
has references to PlatformCategory
in the href
attribute.
<PlatformGeneralType id="/plib/platformgeneraltypes/pgt1">
<name>No statement</name>
<enum>NO_STATEMENT</enum>
<isOfPlatformCategory href="/plib/platformcategories/pc1"/>
<readOnly>true</readOnly>
</PlatformGeneralType>
<PlatformCategory id="/plib/platformcategories/pc1">
<name>No statement</name>
<enum>NO_STATEMENT</enum>
<environment>AIR</environment>
<readOnly>true</readOnly>
</PlatformCategory>
When I perform following query, it takes about six seconds to get the result:
//PlatformGeneralType[isOfPlatformCategory/@href=//PlatformCategory[environment="AIR"]/@id]
What can I do to optimize this query?
Note that I ran "optimize all".
Update: The issue on the previous query seems to be solved. But when I extends the query with following, the query takes 44,28 seconds:
/PLib/PlatformSpecificTypes/PlatformSpecificType
[isOfPlatformGeneralType/@href=/PLib/PlatformGeneralTypes/PlatformGeneralType
[isOfPlatformCategory/@href=/PLib/PlatformCategories/PlatformCategory
[environment='AIR']/@id]/@id]
The there are 8939 instances of PlatformSpecificType
and its structure:
<PlatformSpecificTypes>
<PlatformSpecificType id="/plib/platformspecifictypes/DataShip.3">
<name>Meko 360H2</name>
<lethalityLevel>LOW</lethalityLevel>
<isOfPlatformGeneralType href="/plib/platformgeneraltypes/pgt62"/>
<ownedByCountry href="/plib/countries/10"/>
</PlatformSpecificType>
</PlatformSpecificTypes>
Its query info:
Query: /PLib/PlatformSpecificTypes/PlatformSpecificType[isOfPlatformGeneralType/@href=/PLib/PlatformGeneralTypes/PlatformGeneralType[isOfPlatformCategory/@href=/PLib/PlatformCategories/PlatformCategory[environment='AIR']/@id]/@id] Result: - Hit(s): 3642 Items - Updated: 0 Items - Printed: 2048 KB - Read Locking: local [command_plib] - Write Locking: none Timing: - Parsing: 1.25 ms - Compiling: 0.71 ms - Evaluating: 44248.94 ms - Printing: 37.11 ms - Total Time: 44288.02 ms Query plan:
Database properties:
Database Properties
Name: command_plib
Size: 20247 KB
Nodes: 781606
Documents: 1
Binaries: 0
Timestamp: 2015-06-12-10-12-14
Resource Properties
Input Path: /home/sceran/Documents/PLIB/command_plib.xml
Input Size: 21354 KB
Timestamp: 2015-06-11-15-34-07
Encoding: UTF-8
CHOP: true
Indexes
Up-to-date: true
TEXTINDEX: true
ATTRINDEX: true
FTINDEX: false
LANGUAGE: English
STEMMING: true
CASESENS: true
DIACRITICS: false
STOPWORDS:
UPDINDEX: false
AUTOOPTIMIZE: false
MAXCATS: 100
MAXLEN: 96
Query info:
Compiling:
- rewriting descendant-or-self step(s)
- rewriting descendant-or-self step(s)
- converting descendant::*:PlatformGeneralType[(*:isOfPlatformCategory/@*:href = root()/descendant::*:PlatformCategory[(*:environment = "AIR")]/@*:id)] to child steps
Query:
//PlatformGeneralType[isOfPlatformCategory/@href=//PlatformCategory[environment="AIR"]/@id]
Optimized Query:
db:open-pre("command_plib",0)/*:PLib/*:PlatformGeneralTypes/*:PlatformGeneralType[(*:isOfPlatformCategory/@*:href = root()/descendant::*:PlatformCategory[(*:environment = "AIR")]/@*:id)]
Result:
- Hit(s): 55 Items
- Updated: 0 Items
- Printed: 12776 Bytes
- Read Locking: local [command_plib]
- Write Locking: none
Timing:
- Parsing: 0.55 ms
- Compiling: 0.3 ms
- Evaluating: 5786.29 ms
- Printing: 1.0 ms
- Total Time: 5788.15 ms
Query plan:
<QueryPlan compiled="true">
<IterPath>
<DBNode name="command_plib" pre="0"/>
<IterStep axis="child" test="*:PLib"/>
<IterStep axis="child" test="*:PlatformGeneralTypes"/>
<IterStep axis="child" test="*:PlatformGeneralType">
<CmpG op="=">
<CachedPath>
<IterStep axis="child" test="*:isOfPlatformCategory"/>
<IterStep axis="attribute" test="*:href"/>
</CachedPath>
<IterPath>
<Root/>
<IterStep axis="descendant" test="*:PlatformCategory">
<CmpG op="=">
<CachedPath>
<IterStep axis="child" test="*:environment"/>
</CachedPath>
<Str value="AIR" type="xs:string"/>
</CmpG>
</IterStep>
<IterStep axis="attribute" test="*:id"/>
</IterPath>
</CmpG>
</IterStep>
</IterPath>
</QueryPlan>
Update two: I suspect the structure of PlatformSpecificTypes prevents indexing. I am wondering if I change it as below, will it improves the query performance?
<PlatformSpecificTypes>
<PlatformSpecificType id="/plib/platformspecifictypes/DataShip.3">
<name>Meko 360H2</name>
<lethalityLevel>LOW</lethalityLevel>
**<isOfPlatformGeneralType>/plib/platformgeneraltypes/pgt62 </isOfPlatformGeneralType>**
<ownedByCountry href="/plib/countries/10"/>
</PlatformSpecificType>
</PlatformSpecificTypes>
Update three: I have uploaded the XML file in a gist, so that you can inspect it.
Now, when I execute following query, I takes about 28 seconds to get the result.
/root/PlSpTys/PlSpTy[isOfPlGeTy/@href=/root/PlGeTys/PlGeTy[isOfPlCt/@href=/root/PlCts/PlCt[environment='AIR']/@id]/@id]
Here is the query info:
Query:
/root/PlSpTys/PlSpTy[isOfPlGeTy/@href=/root/PlGeTys/PlGeTy[isOfPlCt/@href=/root/PlCts/PlCt[environment='AIR']/@id]/@id]
Result:
- Hit(s): 3642 Items
- Updated: 0 Items
- Printed: 257 KB
- Read Locking: local [Output6]
- Write Locking: none
Timing:
- Parsing: 0.66 ms
- Compiling: 0.34 ms
- Evaluating: 28398.32 ms
- Printing: 4.63 ms
- Total Time: 28403.97 ms
Query plan:
<QueryPlan compiled="true">
<IterPath>
<DBNode name="Output6" pre="0"/>
<IterStep axis="child" test="*:root"/>
<IterStep axis="child" test="*:PlSpTys"/>
<IterStep axis="child" test="*:PlSpTy">
<CmpG op="=">
<CachedPath>
<IterStep axis="child" test="*:isOfPlGeTy"/>
<IterStep axis="attribute" test="*:href"/>
</CachedPath>
<IterPath>
<Root/>
<IterStep axis="child" test="*:root"/>
<IterStep axis="child" test="*:PlGeTys"/>
<IterStep axis="child" test="*:PlGeTy">
<CmpG op="=">
<CachedPath>
<IterStep axis="child" test="*:isOfPlCt"/>
<IterStep axis="attribute" test="*:href"/>
</CachedPath>
<IterPath>
<Root/>
<IterStep axis="child" test="*:root"/>
<IterStep axis="child" test="*:PlCts"/>
<IterStep axis="child" test="*:PlCt">
<CmpG op="=">
<CachedPath>
<IterStep axis="child" test="*:environment"/>
</CachedPath>
<Str value="AIR" type="xs:string"/>
</CmpG>
</IterStep>
<IterStep axis="attribute" test="*:id"/>
</IterPath>
</CmpG>
</IterStep>
<IterStep axis="attribute" test="*:id"/>
</IterPath>
</CmpG>
</IterStep>
</IterPath>
</QueryPlan>
Could you help me on optimizing the query duration?