How to read and change the <!DOCTYPE> tag and <?

Posted 2019-02-21 07:37

Question:

I'm new to XML::Twig. How can I read and modify the <!DOCTYPE article SYSTEM "loose.dtd"> declaration and the <?xml version="1.0" encoding="UTF-8"?> declaration? I don't know how to read or change either of them with XML::Twig.

My input:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE art SYSTEM "loose.dtd">
<art>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
</art>

I need the output to be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DTD>
<Contents type="&lt;!DOCTYPE article SYSTEM &quot;loose.dtd&gt;"/>
</DTD>
<art>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
<fr>
<p>Text</p>
<p>Text</p>
</fr>
</art>

How can I alter the <?xml ?> declaration and the <!DOCTYPE> declaration? Can anyone help with this?

Answer 1:

You can try the following (the code is commented). The key idea is to create a new twig, move over the elements you want to keep, and create the parts that change:

#!/usr/bin/env perl

use warnings;
use strict;
use XML::Twig;

## Parse the input XML file (first command-line argument) into a twig.
my $twig = XML::Twig->new;
$twig->parsefile(shift);

## Create a new twig that will be the output.
my $new_twig = XML::Twig->new( pretty_print => 'indented' );

## Create a root tag.
$new_twig->set_root( XML::Twig::Elt->new( 'root' ) );

## Build the XML declaration as a processing instruction; the 'k' => 'v'
## element is only a placeholder that set_pi turns into the PI.
my $e = XML::Twig::Elt->new( 'k' => 'v' );
$e->set_pi( 'xml', 'version="1.0" encoding="UTF-8" standalone="yes"' );
$e->move( before => $new_twig->root );

## Move the whole tree from the old twig under the new root.
my $r = $twig->root;
$r->paste( first_child => $new_twig->root );

## Store the old twig's doctype string in a <Contents type="..."/> element inside <DTD>.
my $contents_elt = XML::Twig::Elt->new( Contents  => { type => $twig->doctype } );
my $dtd_elt = XML::Twig::Elt->new( DTD => '#EMPTY' );
$contents_elt->move( last_child => $dtd_elt );
$dtd_elt->move( first_child => $new_twig->root );

## Print the whole twig created.
$new_twig->print;

Run it like:

perl script.pl xmlfile

That yields:

  <?xml version="1.0" encoding="UTF-8" standalone="yes"?><root>
  <DTD>
    <Contents type="&lt;!DOCTYPE art SYSTEM &quot;loose.dtd&quot;>&#x0a;"/>
  </DTD>
  <art>
    <fr>
      <p>Text</p>
      <p>Text</p>
    </fr>
    <fr>
      <p>Text</p>
      <p>Text</p>
    </fr>
  </art>
</root>


Answer 2:

I found this question while trying to do something similar (see "Assembling XML in Perl").

You probably don't want to use set_pi for the XML declaration. Use the dedicated setters instead:

$twig->set_xml_version("1.0");
$twig->set_encoding('utf-8');
$twig->set_standalone('yes');
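To show those setters in context, here is a minimal sketch that parses a small document, reads the current declaration values, and then changes them. The inline sample document is a hypothetical stand-in for the question's input, and the exact formatting of the printed declaration may vary between XML::Twig versions:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical inline input, modelled on the question's sample.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<art><fr><p>Text</p></fr></art>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

# Read the values carried by the original XML declaration.
my $version  = $twig->xml_version;
my $encoding = $twig->encoding;
print "version=$version encoding=$encoding\n";

# Change the declaration through the dedicated setters.
$twig->set_xml_version('1.0');
$twig->set_encoding('UTF-8');
$twig->set_standalone('yes');

# Serialize the document, declaration included.
my $out = $twig->sprint;
print $out;
```

This keeps the declaration as document metadata instead of faking it with a processing instruction, so no placeholder element is needed.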

The XML::Twig documentation has this warning about DTD handling, though:

DTD handling: The DTD handling methods are quite bugged. No one uses them and it seems very difficult to get them to work in all cases, including with several slightly incompatible versions of XML::Parser and of libexpat.

Basically you can read the DTD, output it back properly, and update entities, but not much more.

So use XML::Twig with standalone documents, or with documents referring to an external DTD, but don't expect it to properly parse and even output back the DTD.

With that in mind, the solution you've got above from Birei will probably be the best way of handling it.
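For the "read the DTD" part, a rough sketch of reading the doctype pieces from a parsed twig could look like the following. It assumes the doctype accessors (doctype, doctype_name, system_id, set_doctype) behave as described in the XML::Twig documentation, and that the external loose.dtd file does not need to exist because the DTD is not loaded by default. Given the warning quoted above, treat the set_doctype call as something to verify against your own XML::Twig version. The inline sample is hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical inline input, modelled on the question's sample.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE art SYSTEM "loose.dtd">
<art><fr><p>Text</p></fr></art>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

# Read the doctype as a whole and in pieces.
my $doctype = $twig->doctype;        # full declaration as a string
my $name    = $twig->doctype_name;   # root name given in the DOCTYPE
my $system  = $twig->system_id;      # SYSTEM identifier
print "doctype: $doctype";
print "name=$name system=$system\n";

# Change the doctype name (assumption: later arguments may be omitted).
$twig->set_doctype('article');
my $new_name = $twig->doctype_name;
print "new name=$new_name\n";
```

Once you have the doctype as a plain string, you can place it in an attribute as in Birei's answer, and XML::Twig escapes it on output.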