I'm writing a client for a third-party API, and they provide data in a weird format. At first glance it might look like JSON, but it's not, and I'm a bit confused about how I should handle it.
It's a key-value based format (much like JSON).
- Keys are separated by '=' from their values.
- Keys and values are wrapped within double-quotes.
- Dictionaries start with '{' and end with '}'.
- Arrays start with '(' and end with ')'
- Lines end with ';' (except for array contents) and an end-of-line character (\r, I think).
- Sometimes there seems to be Unicode (e.g. \U2623 for the biohazard sign) in strings.
What format could this possibly be? Should I use a premade gem to parse it, or should I build my own parser?
{ "anArray" = (
"100",
"200",
"300"
);
"aDictionary" = {
"aString" = "Something";
};
}
EDIT This format seems to be Apple's property list, but it's neither XML nor binary... This makes sense, as the API is from a WebObjects web service. I will try to use the CFPropertyList gem to parse it; if there is a better solution, please let me know.
EDIT 2 This is a NeXTSTEP property list.
Here's a very quick-and-dirty hack that transforms the syntax into valid Ruby and then evals it. Note that this could be dangerous, since eval executes whatever Ruby code the input contains. More importantly, this will convert all parentheses inside keys and values into square brackets.
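A minimal sketch of what such a hack might look like. The method name `quick_and_dirty_parse` is hypothetical, and this is not necessarily the exact transformation originally used. It rewrites an `=` that follows a closing quote into `=>`, swaps `;()` for `,[]`, and hands the result to `eval`, so it should only ever see input you fully trust.

```ruby
# Hypothetical helper illustrating the "rewrite, then eval" idea.
# WARNING: eval runs arbitrary Ruby, and double-quoted strings allow
# "#{...}" interpolation, so never feed this untrusted input.
def quick_and_dirty_parse(text)
  ruby_source = text
    .gsub(/"\s*=\s*/, '" => ') # "key" = value  becomes  "key" => value
    .tr(";()", ",[]")          # ; -> ,   ( -> [   ) -> ]  (also inside strings!)
  eval(ruby_source)
end

sample = <<~PLIST
  { "anArray" = (
  "100",
  "200",
  "300"
  );
  "aDictionary" = {
  "aString" = "Something";
  };
  }
PLIST

parsed = quick_and_dirty_parse(sample)
puts parsed["anArray"].inspect
puts parsed["aDictionary"]["aString"]
```

Because `tr` is applied to the whole text, any semicolon or parenthesis inside a key or value is rewritten as well, which is the limitation mentioned above.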
Here's a robust answer using a custom StringScanner-based parser. It allows whitespace to be optional, allows trailing commas after the last item in a list, and allows omitting the semicolon after the last dictionary key/value pair. It allows the outermost item to be a dictionary, array, or string. And it allows really any sort of legal string content, including parens, curly braces, and escaped text like `\n`.
Seen in action:
The code:
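As an illustration of the approach, here is a minimal sketch of a StringScanner-based recursive-descent parser. The class name `NextStepPlistSketch`, its method names, and the set of escapes handled are all assumptions made for this example. It only handles quoted strings, as in the sample data, and treats `\U` followed by four hex digits as a Unicode code point.

```ruby
require "strscan"

# Hypothetical sketch of a parser for quoted NeXTSTEP-style property lists.
class NextStepPlistSketch
  # Assumed escape table; unknown escapes fall back to the raw character.
  ESCAPES = { "n" => "\n", "r" => "\r", "t" => "\t" }.freeze

  def self.parse(text)
    new(text).parse
  end

  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  def parse
    value = parse_value
    skip_space
    raise ArgumentError, "trailing content at offset #{@scanner.pos}" unless @scanner.eos?
    value
  end

  private

  def skip_space
    @scanner.skip(/\s+/)
  end

  # The outermost item may be a dictionary, an array, or a string.
  def parse_value
    skip_space
    if @scanner.scan(/\{/) then parse_dictionary
    elsif @scanner.scan(/\(/) then parse_array
    elsif @scanner.scan(/"/) then parse_string
    else raise ArgumentError, "unexpected input at offset #{@scanner.pos}"
    end
  end

  def parse_dictionary
    result = {}
    loop do
      skip_space
      return result if @scanner.scan(/\}/)
      raise ArgumentError, "expected a quoted key at offset #{@scanner.pos}" unless @scanner.scan(/"/)
      key = parse_string
      skip_space
      raise ArgumentError, "expected '=' at offset #{@scanner.pos}" unless @scanner.scan(/=/)
      result[key] = parse_value
      skip_space
      next if @scanner.scan(/;/)           # separator between pairs
      return result if @scanner.scan(/\}/) # ';' may be omitted after the last pair
      raise ArgumentError, "expected ';' or '}' at offset #{@scanner.pos}"
    end
  end

  def parse_array
    result = []
    loop do
      skip_space
      return result if @scanner.scan(/\)/) # also accepts a trailing comma
      result << parse_value
      skip_space
      next if @scanner.scan(/,/)
      return result if @scanner.scan(/\)/)
      raise ArgumentError, "expected ',' or ')' at offset #{@scanner.pos}"
    end
  end

  # Called just after the opening quote; consumes through the closing quote.
  def parse_string
    buffer = +""
    until @scanner.scan(/"/)
      if (chunk = @scanner.scan(/[^"\\]+/))
        buffer << chunk
      elsif @scanner.scan(/\\U(\h{4})/)
        buffer << [@scanner[1].hex].pack("U") # \U2623 -> the code point U+2623
      elsif @scanner.scan(/\\(.)/m)
        buffer << ESCAPES.fetch(@scanner[1], @scanner[1])
      else
        raise ArgumentError, "unterminated string"
      end
    end
    buffer
  end
end

p NextStepPlistSketch.parse('{"anArray"=("100","200",);"aString"="a (b) {c}"}')
```

Since strings are decoded character by character rather than passed through `eval`, this sketch never executes anything contained in the input.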
As an alternative to rolling your own parser from scratch in the future, you might also want to look into the Treetop Ruby library.
Edit: I've replaced the implementation of `getstr` above with one that should prevent running arbitrary Ruby code inside the `eval`. For more details, see "Eval a string without interpolation". Seen in action: