I have an extremely large XML file from the field of geoinformatics. I got it from a German mirror of the OpenStreetMap project: the Geofabrik download site, which delivers a weekly snapshot of OpenStreetMap for a certain area. I took germany.osm.bz2 from here: http://ftp5.gwdg.de/pub/misc/openstreetmap/download.geofabrik.de/
To run some tests with XSLT, I want to query for a certain kind of entity. Take restaurants as an example: I want to find all the restaurants in the area.
This can be run directly on the bz2-compressed file that was downloaded, for example with the following command:
bzcat germany.osm.bz2 | xsltproc restaurants.xslt - > restaurants.csv
However, I split the file with xml_split, which is a great Perl tool from CPAN.
The problem: with the following XSLT stylesheet I get incomplete results. Only a small subset of the information is extracted when I run it on one of the XML files. See the stylesheet and, below it, a small data chunk from the file I process. If you want to check it, just get the small dataset; note that it is one of the split files.
You can get it here: https://rapidshare.com/#!download|643p12|2523227518|germany-001.xml|100000
Note the important lines: xmlns:xml_split="http://xmltwig.com/xml_split"
and this one:
<xsl:for-each select="xml_split:root/node/tag[@k='amenity' and @v='restaurant']">
Note: you can run a little test and see how long the processing takes:
time xsltproc restaurants.xslt germany-001.xml > restaurants-001.csv
real 0m0.308s
user 0m0.283s
sys 0m0.022s
Here is the XSLT stylesheet that does the extraction (called atest3.xslt):
<xsl:stylesheet version = '1.0'
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xml_split="http://xmltwig.com/xml_split"
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:for-each select="xml_split:root/node/tag[@k='amenity' and @v='restaurant']">
<xsl:value-of select="../@id"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="../@lat"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="../@lon"/>
<xsl:text>	</xsl:text>
<xsl:for-each select="../tag[@k='name']">
<xsl:value-of select="@v"/>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:value-of select="./tag[@k = 'cuisine']/@v"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="./tag[@k = 'wheelchair']/@v"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="./tag[@k = 'website']/@v"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="./tag[@k = 'addr:country']/@v"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="./tag[@k = 'addr:city']/@v"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="./tag[@k = 'addr:street']/@v"/>
<xsl:text>	</xsl:text>
<xsl:value-of select="./tag[@k = 'addr:housenumber']/@v"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
And below is a data chunk from the XML file that was processed:
<node id="52768810" lat="48.2044749" lon="11.3249434" version="7" changeset="9490517" user="wheelmap_visitor" uid="290680" timestamp="2011-10-07T20:24:46Z">
<tag k="addr:city" v="Olching" />
<tag k="addr:country" v="DE" />
<tag k="addr:housenumber" v="72" />
<tag k="addr:postcode" v="82140" />
<tag k="addr:street" v="Hauptstraße" />
<tag k="amenity" v="restaurant" />
<tag k="cuisine" v="mexican" />
<tag k="email" v="info@cantina-olching.de" />
<tag k="name" v="La Cantina" />
<tag k="opening_hours" v="Mo-Su 17:00-01:00" />
<tag k="phone" v="+49 (8142) 444393" />
<tag k="website" v="http://www.cantina-olching.com/" />
<tag k="wheelchair" v="no" />
</node>
Here are the results. Note that some parts are missing, unfortunately:
51923772 49.0812534 8.5637183 Zur Talschänke
52040576 49.4635433 12.4287292 Emil-Kemmer-Haus
52141326 49.4144243 12.4143153 Gasthaus Plecher
52623232 48.9293634 8.2722549 Korfu
52664989 49.0435133 8.3919370 Restaurant Zentrum
52754898 49.3243828 12.3618662 Gasthaus Irlbacher
52762875 49.0099641 8.2528132 Langasthof Stober
52765672 50.0082768 9.2139632 Wirtshaus im Frohnrad
52768810 48.2044749 11.3249434 La Cantina
52768816 48.2051698 11.3257964 Indian Palace
52768826 48.2073264 11.3276147 Dorfstub'n
52768830 48.2075968 11.3281055 Le Candele
52774284 49.0319471 8.2888353 Zum Anker
These results are the problem: I have tried a lot, but at the moment I am clueless why I get so little output. It does not match the tags I select in the XSLT stylesheet. Any idea or hint will be greatly appreciated.
BTW: in the end I want to process the approx. 5000 files that result from the split, and subsequently collect all the results in a MySQL database.
You can get the original file here: http://ftp5.gwdg.de/pub/misc/openstreetmap/download.geofabrik.de ( germany.osm.bz2 01-Apr-2012 14:51 1.7G )
and one of the split files here: https://rapidshare.com/#!download|643p12|2523227518|germany-001.xml|100000
I have to refactor the code, so the question is: how can I get the results into MySQL in an efficient way?
*Update:* Thanks to the first answer in this thread I started to refactor the code, but I still do not get better results, so I have to try again. Lots of changes were suggested. I did a quick walkthrough of the XSLT stylesheet, and my first refactoring attempt gave some funny results. I will go through all of the stylesheet code again, look more closely for the errors, and finally try to refactor the whole XSLT file. Any pointers, suggestions, or code snippets are very welcome. Greetings, zero
It looks like your ./tag[@k = '???']/@v XPath should be ../tag[@k='???']/@v, because your context node is the original matching tag element, not the node element. You should consider changing your context node to the node element itself, with the tag test moved into a predicate, to make this code clearer and avoid errors like this.
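A sketch of that change might look as follows, assuming (as in the question) that the split file's root element is xml_split:root with node elements as direct children. Only a few columns are shown, and the choice of columns is illustrative rather than the asker's final layout.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xml_split="http://xmltwig.com/xml_split"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <!-- Select the node elements themselves; the restaurant test is a predicate on their tag children -->
    <xsl:for-each select="xml_split:root/node[tag[@k='amenity' and @v='restaurant']]">
      <!-- The context node is now node, so its attributes are addressed directly -->
      <xsl:value-of select="@id"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="@lat"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="@lon"/>
      <xsl:text>&#9;</xsl:text>
      <!-- Child tag elements are reached without any ../ step -->
      <xsl:value-of select="tag[@k='name']/@v"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="tag[@k='cuisine']/@v"/>
      <!-- End of record -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because every path is now relative to node, a missing tag simply yields an empty column instead of silently pointing at the wrong element.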
With the node element as the context, you can then use XPaths like select="@id" and tag[@k='addr:country']/@v. But you should consider refactoring this code to make better use of xsl:template instead of xsl:for-each.
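One way such a template-based refactoring could look is sketched below. It assumes the same xml_split:root/node structure as the question; the named template "field" and its "key" parameter are hypothetical helpers introduced here for illustration, not part of the original stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xml_split="http://xmltwig.com/xml_split"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Entry point: hand only the restaurant nodes to the node template -->
  <xsl:template match="/">
    <xsl:apply-templates select="xml_split:root/node[tag[@k='amenity' and @v='restaurant']]"/>
  </xsl:template>

  <!-- One tab-separated output line per node -->
  <xsl:template match="node">
    <xsl:value-of select="@id"/>
    <xsl:text>&#9;</xsl:text>
    <xsl:value-of select="@lat"/>
    <xsl:text>&#9;</xsl:text>
    <xsl:value-of select="@lon"/>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'name'"/></xsl:call-template>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'cuisine'"/></xsl:call-template>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'wheelchair'"/></xsl:call-template>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'website'"/></xsl:call-template>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:country'"/></xsl:call-template>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:city'"/></xsl:call-template>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:street'"/></xsl:call-template>
    <xsl:call-template name="field"><xsl:with-param name="key" select="'addr:housenumber'"/></xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Hypothetical helper: writes a tab, then the v attribute of the tag with the given key (empty if absent) -->
  <xsl:template name="field">
    <xsl:param name="key"/>
    <xsl:text>&#9;</xsl:text>
    <xsl:value-of select="tag[@k=$key]/@v"/>
  </xsl:template>
</xsl:stylesheet>
```

It can be run the same way as in the question, for example: xsltproc atest3.xslt germany-001.xml > restaurants-001.csv. Since xsl:call-template keeps the current node as context, the helper always reads tags of the node being processed.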