Among the data my application sends to a third-party SOA server are complex XMLs. The server owner does provide the XML schemas (.xsd) and, since the server rejects invalid XMLs with a meaningless message, I need to validate them locally before sending.
I could use a stand-alone XML schema validator, but those are slow, mainly because of the time required to parse the schema files. So I wrote my own schema validator (in Java, if that matters) in the form of an HTTP server which caches the already parsed schemas.
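For illustration, the caching described above might look roughly like the following. This is only a minimal sketch, not the actual validator: the class name SchemaCache and the choice to key entries by absolute file path are assumptions. It relies on the fact that a javax.xml.validation.Schema is immutable and thread-safe, so one parsed instance can be shared across requests, while a SchemaFactory is not thread-safe and is therefore created per cache miss.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.XMLConstants;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: parses each .xsd once and reuses the compiled Schema.
class SchemaCache {
    private final Map<String, Schema> cache = new ConcurrentHashMap<>();

    // Returns the cached Schema for this file, parsing it on first use.
    // Throws SAXException if the file cannot be compiled into a schema.
    Schema get(File xsd) throws SAXException {
        String key = xsd.getAbsolutePath();
        Schema schema = cache.get(key);
        if (schema == null) {
            // SchemaFactory is not thread-safe, so use a fresh one per miss.
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(xsd);
            // If another thread parsed the same file first, keep its instance.
            Schema previous = cache.putIfAbsent(key, schema);
            if (previous != null) {
                schema = previous;
            }
        }
        return schema;
    }

    // Number of distinct schema files parsed so far.
    int size() {
        return cache.size();
    }
}
```

Each request would then call schema.newValidator() on the cached object, since Validator instances are not thread-safe and are cheap to create compared with parsing the schema.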
The problem is: many things can go wrong in the course of the validation process. Other than unexpected exceptions and successful validation:
- the server may not find the schema file specified
- the file specified may not be a valid schema file
- the XML is invalid against the schema file
Since it's an HTTP server, I'd like to provide the client with meaningful status codes. Should the server answer with a 400 error (Bad Request) for all the above cases? Or do they have nothing to do with HTTP, so that it should answer 200 with a message in the body? Any other suggestions?
Update: the main application is written in Ruby, which doesn't have a good XML schema validation library, so a separate validation server is not over-engineering.
Status code 422 ("Unprocessable Entity") sounds close enough:
"The 422 (Unprocessable Entity) status code means the server understands the content type of the request entity (hence a 415(Unsupported Media Type) status code is inappropriate), and the syntax of the request entity is correct (thus a 400 (Bad Request) status code is inappropriate) but was unable to process the contained instructions. For example, this error condition may occur if an XML request body contains well-formed (i.e., syntactically correct), but semantically erroneous, XML instructions."
Amazon could be used as a model for how to map HTTP status codes to real application-level conditions: http://docs.amazonwebservices.com/AWSImportExport/latest/API/index.html?Errors.html (see the "Amazon S3 Status Codes" heading)
I'd go with
400 Bad Request
and a more specific message in the body (possibly with a secondary error code in a header, like
X-Parse-Error: 10451
for easier processing).
That sounds like a neat idea, but the HTTP status codes don't really provide an "operation failed" case. I would return HTTP 200 with an
X-Validation-Result: true/false
header, using the body for any text or "reason" as necessary. Save the HTTP 4xx codes for HTTP-level errors, not application-level errors.
It's kind of a shame and a double standard, though. Many applications use HTTP authentication, and they're able to return HTTP 401 Unauthorized or 403 Forbidden from the application level. It would be convenient and sensible to have some sort of blanket HTTP 4xx Request Rejected that you could use.
Say you're posting XML files to a resource, e.g. like so:
POST /validator
Content-Type: application/xml
If the request entity fails to parse as the media type it was submitted as (ie as application/xml), 400 Bad Request is the right status.
If it parses syntactically as the media type it was submitted as, but it doesn't validate against some desired schema, or otherwise has semantics which make it unprocessable by the resource it's submitted to, then 422 Unprocessable Entity is the best status. You should probably accompany it with some more specific error information in the error response. Also note that it's technically defined in an extension to HTTP, WebDAV, although it is quite widely used in HTTP APIs and is more appropriate than any of the other HTTP error statuses when there's a semantic error with a submitted entity.
If it's being submitted as a media type which implies a particular schema on top of xml (eg as application/xhtml+xml) then you can use 400 Bad Request if it fails to validate against that schema. But if its media type is plain XML then I'd argue that the schema isn't part of the media type, although it's a bit of a grey area; if the xml file specifies its schema you could maybe interpret validation as being part of the syntactic requirements for application/xml.
If you're submitting the XML files via a multipart/form-data or application/x-www-form-urlencoded form submission, then you'd have to use 422 Unprocessable Entity for all problems with the XML file; 400 would only be appropriate if there's a syntactic problem with the basic form upload.
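To make the 400 versus 422 distinction concrete, here is a minimal sketch for the plain application/xml case, assuming the body arrives as a string and the schema has already been parsed. The names ValidationStatus, statusFor and schemaFrom are hypothetical, and returning 500 for I/O or parser configuration problems is an assumption rather than something prescribed above. Step one checks well-formedness (failure maps to 400); step two validates the parsed document against the schema (failure maps to 422).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper mapping validation outcomes to HTTP status codes.
class ValidationStatus {
    // Rethrows parse errors instead of printing them to stderr.
    private static final ErrorHandler RETHROW = new ErrorHandler() {
        public void warning(SAXParseException e) { /* ignore warnings */ }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // Compiles an XSD given as a string (convenience for the example).
    static Schema schemaFrom(String xsd) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // 400: not well-formed XML; 422: well-formed but schema-invalid; 200: valid.
    static int statusFor(String xml, Schema schema) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for schema validation
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(RETHROW);
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return 400; // entity does not parse as application/xml
        } catch (IOException | ParserConfigurationException e) {
            return 500; // assumption: server-side problem, not the client's fault
        }
        try {
            // Validators are not thread-safe, so create one per call.
            schema.newValidator().validate(new DOMSource(doc));
            return 200;
        } catch (SAXException e) {
            return 422; // syntactically fine, but fails the schema
        } catch (IOException e) {
            return 500;
        }
    }
}
```

In a real server the exception message (e.getMessage()) would be a natural candidate for the more specific error information in the response body.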
It's perfectly valid to map error situations in the validation process to meaningful HTTP status codes.
I suppose you send the XML file to your validation server as the body of a POST request, using the URI to determine the specific schema to validate against.
So here are some suggestions for error mappings: