Question:
I am building an application that works primarily with GPX files as its input data. Since GPX files are supposed to be defined by the GPX schema (as defined here: http://www.topografix.com/gpx.asp), the first thing I thought reasonable to do in my application was to validate the input file against the schema (bearing in mind the different versions, of course). So far so good. For the initial files I was testing, everything was perfect. However, I sometimes use .gpx files that turn out to be invalid against the relevant GPX schema. When I tried importing them with other similar tools, strangely, no error was returned and the file was parsed perfectly. So the most logical conclusion was that there was a problem with my code. However, after a thorough investigation, I found no problems with my code. This was even confirmed by the validation technique suggested by topografix.com ( http://www.topografix.com/gpx_validation.asp ), which also concluded that the file is invalid.
So it turns out that there are some GPS devices, GPS recording systems, etc. that produce .gpx files without conforming to the official GPX schema. This conclusion leads me to ask the question: WHY? I seriously do not understand the idea behind it. Furthermore, with most of the invalid files I have found, the problem is not something that could be regarded as an additional feature; it is something like not following the right order of some elements' children, which I consider to be totally stupid.
This leads me to ask you two questions. Firstly, I would be happy if someone could explain to me why a lot of the GPX files that I found on the web do not conform to the official GPX schema. Secondly, I would like to ask how you deal with this problem when you have to parse GPX files. After all, the track points will be in the file anyway, so do I simply have to ignore XML schema validation and proceed with direct parsing? But then again, if there is a misspelled attribute name, my system would crash. Any information on how you deal with parsing GPX files will be very much appreciated.
Thanks for your time and help.
Regards,
Petar
EDIT: I have posted a new thread: GPX parsing patterns and "standards" where I am asking how people are actually parsing GPX files in practice. If you have an idea please post your answer there.
Answer 1:
The schema authors chose to use <xs:sequence> instead of <xs:all>. XSD sequences are order-specific.
A misspelled attribute name would be invalid input. You have to validate and fail gracefully. Computers do not do well with ambiguity.
Hope that helps...
UPDATE:
Sorry, allow me to elaborate then. The problem was created by the authors of the schema AND the GPX output authors (of the various software packages and devices).
Basically, if a person can look at a piece of data and understand what it means, the onus is on the software implementer to create flexible validation so that the program is usable.
For example, suppose you have an input field where you're supposed to enter a dollar amount, and the user enters " $.05". The software should be smart enough to recognize that as 5 cents, and smart enough to recognize that the leading space is meaningless.
The same applies for files from a device. Yes, the problem was created by them. Yes, it's ridiculous to have to treat output from a device as user input when there's a perfectly good strict definition for the format. But that's the problem that you're currently faced with. And at the end of the day, no one cares what technical challenges you had to overcome to make it work. All they care about is "does it work" and "how useful is this for me".
So, if you see that the fields are out of order, but otherwise all of the required data is present, rearrange the fields so they pass validation. Make your import flexible. If data is missing, fill in the gaps and show a warning message. But make it work.
Besides trying to massage the data before validating it against the XSD, the other thing you can do, if you find that the validation errors are constantly caused simply by the ordering of fields (the difference between xs:sequence and xs:all is a common misunderstanding in XSD), is to change your XSD. Switch it from sequence to all. You could try the official XSD first; if the file passes, you don't have to validate against the looser version.
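To make "rearrange the fields" concrete, here is a minimal sketch in Python using only the standard library. It assumes GPX 1.1 input (namespace http://www.topografix.com/GPX/1/1) and only handles point elements; the child order listed is the one the GPX 1.1 schema defines for waypoints, which trkpt and rtept share. The name reorder_points is made up for this illustration, and this is not how any particular tool does it.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"  # assumption: GPX 1.1 input

# Child order of a GPX 1.1 waypoint (wptType); trkpt and rtept use the same type.
WPT_ORDER = ["ele", "time", "magvar", "geoidheight", "name", "cmt", "desc",
             "src", "link", "sym", "type", "fix", "sat", "hdop", "vdop",
             "pdop", "ageofdgpsdata", "dgpsid", "extensions"]
RANK = {f"{{{GPX_NS}}}{name}": i for i, name in enumerate(WPT_ORDER)}


def reorder_points(root):
    """Sort the children of every wpt/rtept/trkpt into schema order, in place."""
    for tag in ("wpt", "rtept", "trkpt"):
        for point in root.iter(f"{{{GPX_NS}}}{tag}"):
            # sorted() is stable, so repeated elements (e.g. several <link>)
            # keep their relative order; unknown elements go to the end.
            point[:] = sorted(point, key=lambda c: RANK.get(c.tag, len(WPT_ORDER)))
    return root


doc = ET.fromstring(
    f'<gpx xmlns="{GPX_NS}" version="1.1" creator="demo">'
    '<wpt lat="1.0" lon="2.0"><name>A</name><ele>5</ele></wpt></gpx>'
)
reorder_points(doc)  # <ele> now comes before <name>
```

After this pass the document can be serialized again and handed to the normal schema validator. Anything the sketch does not recognize is left in the file, so genuinely invalid input still fails validation.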
I hope that helps...
Answer 2:
An example of why: geocaching.com produces .gpx files with special schema extensions that they have defined.
GSAK also adds "value" to gpx files. The gpx world is NOT standardized the way you think, I'm afraid.
None of this stuff is part of the schema you are using. In other words, your idea of strict schema checking has problems.
It is them, not you. But you are forced to accommodate "them".
Answer 3:
So, it turns out that there are some GPS devices/GPS recording
systems/etc. which produce .gpx files without conforming to the
official GPX schema. This conclusions leads me to ask the question:
WHY ? I seriously do not understand the idea behind it. Furthermore,
with most of the invalid files I have found, the problem is not
something that may be regarded as an additional feature but is
something like not following the right order with some attributes'
children which I consider to be totally stupid.
The only thing I can think of is that you pre-process the GPX input prior to validation (the only requirement at this point would be that the GPX data was well-formed).
I would use XSLT starting with an identity transform to pass everything through unchanged. You can then override the identity transform by stripping out everything with a specific namespace(s). You could also enforce the order of child elements (attributes don't have children so I think that was a typo) and correct misspelled element/attribute names.
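XSLT is what I would reach for, but the same pre-processing idea can be sketched in Python with the standard library, which may be easier to drop into an existing importer. This is only a rough equivalent of the "strip everything in a foreign namespace" step, not an XSLT identity transform. It assumes GPX 1.1 input, and strip_foreign is a name invented for this example.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"  # assumption: GPX 1.1 input


def strip_foreign(elem, keep_ns=GPX_NS):
    """Remove, in place, every descendant element that is outside keep_ns."""
    prefix = f"{{{keep_ns}}}"
    for child in list(elem):  # iterate over a copy, since elem is mutated
        if isinstance(child.tag, str) and not child.tag.startswith(prefix):
            elem.remove(child)  # drops the foreign element and its subtree
        else:
            strip_foreign(child, keep_ns)
    return elem


doc = ET.fromstring(
    f'<gpx xmlns="{GPX_NS}" xmlns:x="urn:example:vendor" version="1.1" creator="demo">'
    '<wpt lat="1" lon="2"><name>A</name><x:cache>vendor data</x:cache></wpt></gpx>'
)
strip_foreign(doc)  # the <x:cache> element is gone, <name> stays
```

Keep in mind that this is lossy: vendor data that legitimately lives inside an <extensions> element is removed as well, so it is probably best applied only to the copy of the document that you validate.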
Answer 4:
The problem with schemas, as you have rightly pointed out, is that not all implementations are standardized and some are outright proprietary. In these cases, the best way is to import the file into a program and have it SAVE AS GPX (even if the original was GPX).
I use a free program called GPS TrackMaker (GPSTM) that opens and saves various formats, including GPX. It also downloads data directly from various handheld GPS devices (Garmin, Magellan, etc.). Download GPSTM: http://www.gpstm.com/dwlpage.php
In the linked thread, GPX parsing patterns and "standards", I uploaded a barebones PHP DOM parser for GPX that works fairly well (100% compatible with GPX generated by GPSTM).
Thanks to Odilon Ferreira Junior (GPSTM author) for offering such an excellent free tool.
Answer 5:
As Homer6 already noted, one problem is the sequence of tags in the XML file.
Before I continue, note that GPX 1.0 and GPX 1.1 are very different. Most applications produce/consume GPX 1.0. In 1.0, for example, there is a regex for the email:
[\p{L}_]+(\.[\p{L}_]+)*@[\p{L}_]+(\.[\p{L}_]+)
If any application has a text field asking the user to enter his email (which will be stored later in the GPX file) it should be pretty strict about it. If the user enters "name AT gmail.com" the resulting GPX file is invalid.
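To see how strict that pattern is, here is a small Python check. Python's built-in re module does not support \p{L}, so the sketch substitutes [^\W\d] (word characters that are not digits, i.e. letters plus underscore) as an approximation; it is slightly broader than \p{L} for some Unicode characters. The function name is_schema_email is made up for this example, and the pattern is copied as quoted above.

```python
import re

# [^\W\d] stands in for XSD's [\p{L}_]: word characters minus digits.
LETTERS = r"[^\W\d]+"
EMAIL_RE = re.compile(rf"{LETTERS}(\.{LETTERS})*@{LETTERS}(\.{LETTERS})")


def is_schema_email(value):
    """True if the whole string matches (XSD patterns are implicitly anchored)."""
    return EMAIL_RE.fullmatch(value) is not None


print(is_schema_email("name@gmail.com"))       # accepted
print(is_schema_email("name AT gmail.com"))    # rejected: no "@"
print(is_schema_email("john.doe1@gmail.com"))  # rejected: digits are not letters
```

Note that even a perfectly ordinary address containing a digit is rejected, which is exactly the kind of rule an application's input form is unlikely to enforce.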
The schema for the XML file is very rigid. Most application developers do not want to apply the same rigid validation when their users enter data that will be stored in the GPX file. That is why most files aren't valid GPX files, and also why most parsing applications ignore those strict rules.