I am trying to read an XML form using Perl, but I cannot use any XML modules such as XML::Simple or XML::Parser.
It is a simple XML form which has some basic information and an MS Doc attachment.
I want to read this XML, download the attached Doc file, and then print the XML information on the screen.
But I don't know how to do this without an XML module. I heard that an XML file can be parsed using Data::Dumper, but I am not familiar with that module, so I don't see how to do it.
Could you please tell me whether there is any way to do this without an XML module?
Sample XML:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
I'd like to reiterate that this is a BAD IDEA, because whilst XML looks like plain text, it isn't plain text. If you treat it as such, you are creating brittle, unmaintainable and unsupportable code, which may well break one day because someone changes the XML format in a valid way.
I would strongly suggest that your first port of call is to go back to your project and point out that parsing XML without an XML parser is rather like trying to use a hammer to put screws into a piece of wood: it sort of works, but the results are rather shoddy, and frankly it's completely unnecessary because screwdrivers exist, do the job properly and easily, and are widely available.
E.g.
Can you tell me how I can print the author, title and price for each book id in the above XML file with an XML module?
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Parse the whole document, then visit every <book> element.
my $twig = XML::Twig->new->parsefile('your_file.xml');
foreach my $book ( $twig->get_xpath('//book') ) {
    print join( "\n",
        $book->att('id'),
        $book->field('author'),
        $book->field('title'),
        $book->field('price'),
    ), "\n----\n";
}
However:
Given your very specific sample, you may be able to get away with treating it as 'plain text'. Before you do this, you should point out to your project lead that this is a risky approach (you're putting in screws with a hammer) and that it creates an ongoing risk of support problems, which is trivially resolved by just installing a bit of freely available, open source code.
I am only suggesting this AT ALL because I've had to deal with ludicrously unreasonable similar project demands.
Like this:
#!/usr/bin/env perl
use strict;
use warnings;

# Reads the file line by line; assumes each tag sits on its own line.
while (<>) {
    if (m/<book/) {
        my ($id) = (m/id="(\w+)"/);
        print $id, "\n";
    }
    if (m/<author/) {
        my ($author) = (m/>(.*)</);
        print $author, "\n";
    }
}
Now, the reason this doesn't work in general is that your sample above can be perfectly validly formatted as:
<?xml version="1.0"?>
<catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications
with XML.</description></book><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description></book></catalog>
Or:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</catalog>
Or:
<?xml version="1.0"?>
<catalog
><book
id="bk101"
><author
>Gambardella, Matthew</author><title
>XML Developer's Guide</title><genre
>Computer</genre><price
>44.95</price><publish_date
>2000-10-01</publish_date><description
>An in-depth look at creating applications
with XML.</description></book><book
id="bk102"
><author
>Ralls, Kim</author><title
>Midnight Rain</title><genre
>Fantasy</genre><price
>5.95</price><publish_date
>2000-12-16</publish_date><description
>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description></book></catalog>
Or:
<?xml version="1.0"?>
<catalog>
<book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applications
with XML.</description></book>
<book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description></book>
</catalog>
This is why you have so many comments that say 'use a parser': of the snippets above, the simplistic example I gave you will only work on one and break messily on the others.
But the XML::Twig solution handles them all correctly. XML::Twig is freely available on CPAN. (There are other libraries that do the job just as well.) It's also packaged in the default repositories of a lot of operating systems.
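To make the failure concrete, here is a small sketch that applies the same author pattern as the line-based loop to two equivalent inputs. The helper name extract_author and the two sample strings are only for illustration; the pattern itself is the one from the loop above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper wrapping the author pattern from the line-based loop.
sub extract_author {
    my ($line) = @_;
    my ($author) = ( $line =~ m/>(.*)</ );
    return $author;
}

# One element per line: the pattern happens to work.
my $pretty = '<author>Gambardella, Matthew</author>';

# The same data with two elements on one line, which is still valid XML.
my $compact = q{<author>Gambardella, Matthew</author><title>XML Developer's Guide</title>};

print extract_author($pretty),  "\n";
print extract_author($compact), "\n";
```

The first line printed is "Gambardella, Matthew". The second is "Gambardella, Matthew</author><title>XML Developer's Guide", because the greedy .* runs to the last < on the line. Nothing about the XML changed except its whitespace.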
Well, an XML parser is just code. And CPAN modules are all open source, so I suppose that you could copy the code from an XML parsing module from CPAN into your program.
But really, that's an incredibly stupid idea. Why wouldn't you just use the module? You would be far better off spending your time getting the ban on using modules lifted. A lot of modern Perl programming consists of installing the right modules from CPAN and plumbing them together. If you're not using CPAN modules then you're cutting yourself off from a large proportion of Perl's power.
If you really can't get that restriction lifted then (seriously) get better employers.
If you cannot use any modules, then you could study the source code of modules like XML::LibXML to understand how they deal with XML, and then implement it your own way, though that is not recommended.
See: Perl for XML Processing
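For what it's worth, "implementing it your way" with core Perl only tends to start out something like the sketch below. It is not a parser and is not taken from any CPAN module: extract_books is a made-up helper, and it assumes no comments, CDATA sections, entities, nested book elements or single-quoted attributes. It works on the whole document as one string rather than line by line, so it at least survives the odd line-break layouts shown earlier.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: pull id, author, title and price out of every
# <book> element found in a string holding the whole document.
sub extract_books {
    my ($xml) = @_;
    my @books;

    # /s lets . match newlines; \s* tolerates line breaks inside tags.
    while ( $xml =~ m{<book\s+id\s*=\s*"([^"]*)"\s*>(.*?)</book\s*>}sg ) {
        my ( $id, $body ) = ( $1, $2 );
        my %book = ( id => $id );
        for my $field (qw(author title price)) {
            # Left undefined when the field is missing from this book.
            ( $book{$field} ) = $body =~ m{<$field\s*>(.*?)</$field\s*>}s;
        }
        push @books, \%book;
    }
    return @books;
}

# Cut-down sample in the awkward "tag split across lines" layout.
my $sample = <<'XML';
<?xml version="1.0"?>
<catalog
><book
id="bk101"
><author
>Gambardella, Matthew</author><title
>XML Developer's Guide</title><price
>44.95</price></book></catalog>
XML

for my $book ( extract_books($sample) ) {
    print join( "\n", map { $book->{$_} // '' } qw(id author title price) ),
        "\n----\n";
}
```

Every assumption listed above is a way for valid XML to break this, which is exactly the maintenance risk the other answers describe. To read a real file you would slurp it into a string first (for example with local $/) and pass that to the helper.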