I've been using the YAML format with reasonable success in the last 6 months or so.
However, with the pure Perl implementation of the YAML parser it is fairly
fiddly to hand-write a readable file, and the parser has (in my opinion)
annoying quirks such as requiring a newline at the end of the file. It's also
gigantically slow compared to the rest of my program.
I'm pondering the next evolution of my project, and I'm considering
using JSON instead (a mostly strict subset of YAML, as it turns
out). But which format has the most community traction and effort in Perl?
Which appears today to be the better long-term format for simple
data description in Perl, YAML or JSON, and why?
YAML vs JSON is very much not settled in Perl, and I will admit I tend to sit in the middle of that debate. I would advise that either is going to get you about as much community traction. I'd make the decision based on the various pros and cons of the formats. I break down the various data serializing options like so (I'm going to community wiki this so people can add to it):
YAML Pros
- Human friendly, people write basic YAML without even knowing it
- WYSIWYG strings
- Expressive (it has the TMTOWTDI nature)
- Expandable type/metadata system
- Perl compatible data types
- Portable
- Familiar (a lot of the inline and string syntax looks like Perl code)
- Good implementations if you have a compiler (YAML::XS)
- Good ability to dump Perl data
- Compact use of screen space (possible, you can format to fit in one line)
YAML Cons
- Large spec
- Unreliable/incomplete pure Perl implementations
- Whitespace as syntax can be contentious.
JSON Pros
- Human readable/writable
- Small spec
- Good implementations
- Portable
- Perlish syntax
- YAML 1.2 is a superset of JSON
- Compact use of screen space
- Perl friendly data types
- Lots of things handle JSON
JSON Cons
- Strings are not WYSIWYG
- No expandability
- Some Perl structures have to be expressed ad-hoc (objects & globs)
- Lack of expressibility
XML Pros
- Widespread use
- Syntax familiar to web developers
- Large corpus of good XML modules
- Schemas
- Technologies to search and transform the data
- Portable
XML Cons
- Tedious for humans to read and write
- Data structures foreign to Perl
- Lack of expressibility
- Large spec
- Verbose
Perl/Data::Dumper Pros
- No dependencies
- Surprisingly compact (with the right flags)
- Perl friendly
- Can dump pretty much anything (via DDS)
- Expressive
- Compact use of screen space
- WYSIWYG strings
- Familiar
Perl/Data::Dumper Cons
- Non-portable (to other languages)
- Insecure (without heroic measures)
- Inscrutable to non-Perl programmers
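The list above does not say which flags make Data::Dumper output compact. A minimal sketch, assuming it means the Terse, Indent and Sortkeys settings, could look like this (the sample data is made up for illustration):

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data, not taken from the discussion above.
my $data = { name => 'bob', nums => [ 1, 2, 3 ] };

# Terse drops the "$VAR1 =" prefix, Indent = 0 keeps everything on one
# line, and Sortkeys makes the key order stable between runs.
local $Data::Dumper::Terse    = 1;
local $Data::Dumper::Indent   = 0;
local $Data::Dumper::Sortkeys = 1;

my $compact = Dumper($data);
print "$compact\n";
```

With Indent left at its default the same structure spreads over several lines, which is easier to read but takes more screen space.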
Storable Pros
- Compact? (don't have numbers to back it up)
- Fast? (don't have numbers to back it up)
Storable Cons
- Human hostile
- Incompatible across Storable versions
- Non-portable (to other languages)
As with most things, it depends. I think if you want speed and interoperability (with other languages), use JSON, in particular JSON::XS.
If you want something that's only ever going to be used by Perl modules, stick with YAML. It's much more common to find Perl modules on CPAN that support data description with YAML, or which depend on YAML, than JSON.
Note that I am not an authority and this opinion is based largely on hunch and conjecture. In particular, I have not profiled JSON::XS vs. YAML::XS. If I am offensively ignorant, I can only hope I will make someone irate enough to bring useful information to the discussion by correcting me.
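For anyone who wants numbers rather than a hunch, something like the following rough sketch could serve as a starting point. It assumes that decode speed on a small made-up structure is a fair proxy, which may not hold for your data, and it only times the modules that happen to be installed (JSON::PP ships with recent Perls and acts as a pure-Perl baseline):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data; a real test should use your own files.
my $data = {
    name    => 'bob',
    parents => { mother => 'susan', father => 'bill' },
    nums    => [ 1 .. 100 ],
};

# Build one decode routine per module that is actually installed.
my %decoders;
if ( eval { require JSON::PP; 1 } ) {
    my $text = JSON::PP::encode_json($data);
    $decoders{'JSON::PP'} = sub { JSON::PP::decode_json($text) };
}
if ( eval { require JSON::XS; 1 } ) {
    my $text = JSON::XS::encode_json($data);
    $decoders{'JSON::XS'} = sub { JSON::XS::decode_json($text) };
}
if ( eval { require YAML::XS; 1 } ) {
    my $text = YAML::XS::Dump($data);
    $decoders{'YAML::XS'} = sub { YAML::XS::Load($text) };
}
if ( eval { require YAML; 1 } ) {
    my $text = YAML::Dump($data);
    $decoders{'YAML'} = sub { YAML::Load($text) };
}

# Run each decoder for about one CPU second and print a comparison table.
cmpthese( -1, \%decoders ) if %decoders;
```

The table only describes this one data shape on one machine, so treat it as a hint rather than a verdict.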
It's all about human-readability. If that is your main concern, choose YAML:

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

over JSON:

{
  "american": [
    "Boston Red Sox",
    "Detroit Tigers",
    "New York Yankees"
  ],
  "national": [
    "New York Mets",
    "Chicago Cubs",
    "Atlanta Braves"
  ]
}
The pure-Perl YAML implementation (the YAML module, as opposed to YAML::Syck) seems to have some serious problems. I recently ran into issues where it could not process YAML documents with very long lines (32k characters or so).
YAML is able to store and load blessed variables and does so by
default, at least in the version used here (newer YAML.pm releases
control this with $YAML::LoadBlessed). The snippet below was copied
from a *sepia-repl* buffer:
I need user feedback! Please send questions or comments to seano@cpan.org.
Sepia version 0.98.
Type ",h" for help, or ",q" to quit.
main @> use YAML
main @> $foo = bless {}, 'asdf'
bless( {}, 'asdf' )
main @> $foo_dump = YAML::Dump $foo
'--- !!perl/hash:asdf {}
'
main @> YAML::Load $foo_dump
bless( {}, 'asdf' )
This is quite scary security-wise because untrusted data can be used
to call any DESTROY method that has been defined in your application,
or in any of the modules it uses.
The following short program demonstrates the problem:
use YAML;
use Data::Dumper;

package My::Namespace;

# Called automatically when the loaded object is destroyed.
sub DESTROY {
    print Data::Dumper::Dumper \@_;
}

package main;

my $var = YAML::Load '--- !!perl/hash:My::Namespace
bar: 2
foo: 1
';
JSON does not allow this by default. It is possible to serialize
Perl "objects", but in order to do that, you have to define a TO_JSON method for them.
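As a rough illustration of the TO_JSON route, here is a minimal sketch using the core JSON::PP module, which also needs its convert_blessed option switched on before it will call the method. The My::Point class is hypothetical and exists only for this example:

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical class used only for this illustration.
package My::Point;

sub new {
    my ( $class, %args ) = @_;
    return bless { x => $args{x}, y => $args{y} }, $class;
}

# Called by the encoder when convert_blessed is enabled; it must
# return a plain (unblessed) data structure.
sub TO_JSON {
    my ($self) = @_;
    return { x => $self->{x}, y => $self->{y} };
}

package main;

# canonical sorts the keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->convert_blessed;
my $text = $json->encode( My::Point->new( x => 1, y => 2 ) );
print "$text\n";

# Decoding yields a plain hash reference, not a My::Point object,
# so no class code runs on load.
my $back = $json->decode($text);
print ref($back), "\n";
```

The class name is not written into the JSON at all, which is why loading the text back cannot trigger methods such as DESTROY.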
If you are considering JavaScript Object Notation, why not use "Perl Object Notation"?
{"name": "bob", "parents": {"mother": "susan", "father": "bill"}, "nums": [1, 2, 3]}
{name => "bob", parents => {mother => "susan", father => "bill"}, nums => [1, 2, 3]}
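Reading such a file back means evaluating Perl code, so the loading step deserves some care. One possible sketch, assuming the data holds only plain hashes, arrays and scalars, uses the core Safe module to evaluate the text in a restricted compartment. This narrows the risk but should not be taken as a guarantee:

```perl
use strict;
use warnings;
use Safe;

# The same record as above, held as a string of Perl source.
my $text = '{name => "bob", parents => {mother => "susan", father => "bill"}, nums => [1, 2, 3]}';

# A Safe compartment with its default opcode mask, which rejects
# operations such as system() or opening files.
my $compartment = Safe->new;
my $data        = $compartment->reval($text);
die "Could not load data: $@" if $@;

print $data->{parents}{mother}, "\n";
print scalar @{ $data->{nums} }, "\n";
```

A plain string eval or do would also load the data, but it would run whatever code the file contains.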
You might also want to consider using Storable. You will likely get a very good speed boost with it. The trade-offs are:
- the Storable format is binary and not human readable like JSON or YAML
- Storable is not a pure Perl module (if that matters)
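For reference, a Storable round trip can be as small as the following sketch. The sample data is made up, and nfreeze is chosen here on the assumption that the bytes might move between machines:

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical sample data.
my $data = { name => 'bob', nums => [ 1, 2, 3 ] };

# nfreeze serializes to a byte string in network byte order, which is
# the portable choice when the data may move between machines.
my $frozen = nfreeze($data);
my $copy   = thaw($frozen);

printf "%d bytes\n", length $frozen;
print join( ',', @{ $copy->{nums} } ), "\n";
```

Storable also offers store and retrieve for writing straight to a file instead of a string.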
I use YAML for tracking the status of processes because I can read the YAML while the process is still running. You (technically) need fully formed documents to read XML or JSON. YAML is nice for tracking status because you can write lots of mini docs to a file. Otherwise, I usually go with XML or JSON. Nice summary of pros & cons above, btw.
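The "mini docs" idea can be sketched roughly as follows, assuming the YAML module from CPAN is installed. The status records are hypothetical, and a string stands in for the log file to keep the example self-contained:

```perl
use strict;
use warnings;
use YAML ();

# Hypothetical status records; each Dump call emits one "---" document,
# so appending them builds a multi-document stream.
my $log = '';
for my $step ( 1 .. 3 ) {
    $log .= YAML::Dump( { step => $step, state => 'done' } );
}

# In list context Load returns one data structure per document, so a
# reader can parse whatever complete documents exist so far.
my @docs = YAML::Load($log);
print scalar(@docs), " documents, last step: $docs[-1]{step}\n";
```

In a real process each Dump result would be appended to a file, and the reader would load that file's current contents.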