Perl和XPath的:在数据库表中缺少的项(Perl and XPath: missing ent

2019-10-20 21:18发布

我想从一个XML文件中提取数据,并将它们导入到MariaDB的/ MySQL数据库。 XML文件:

<?xml version="1.0" encoding="UTF-8"?>
<database>
  <row1s>
    <row1 name="fox" category="mammal">
       <row2s>
         <row2 type="1" size="10"/>
         <row2 type="2" size="8"/>
       </row2s>
       </row1>
    <row1 name="horse" category="mammal">
       <row2s>
             <row2 type="3" size="100"/>
       </row2s>
    </row1>
    <row1 name="bee" category="insect"> 
       <row2s/>
    </row1>
    <row1 name="wasp" category="insect">
       <row2s/>
    </row1>
  </row1s>
</database>

和Perl的代码是:

use strict;
use warnings;
use DBI;

use XML::XPath;
use XML::XPath::XMLParser;

my $xp = XML::XPath->new( filename => "animals4.xml" );
# my $xp = XML::XPath->new( ioref => \*DATA );

my $dbh = DBI->connect( "DBI:mysql:test", "user", "pw", { RaiseError => 1, PrintError => 0 } )
    or die "Fehler beim Verbidungsaufbau zum MariaDB-Server:" . " $DBI::err -< $DBI::errstr \n";

for my $row1 ( $xp->findnodes('//row1s/row1') ) {
    printf "Level --- row1 \"name\" gives: %s\n", $row1->getAttribute("name");

    for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
        printf "Level row2 \"type\" gives: %s\n", $row2->getAttribute("type");
        printf "Level row2 \"size\" gives: %s\n", $row2->getAttribute("size");

        $dbh->do(
            "INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)",
            undef,
            $row1->getAttribute("name"),
            $row1->getAttribute("category"),
            $row2->getAttribute("type"),
            $row2->getAttribute("size")
        ) or die "Error during execution: " . "$DBI::err -> $DBI::errstr (animal $DBI::state)\n";
    }
}

端子输出为:

Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp

这是我所期待的。 但表具有以下项目:

name  category  type    size
fox   mammal    1         10
fox   mammal    2          8
horse mammal    3        100

蜜蜂和黄蜂都无缘。 谁能帮助我如何解决这个问题? 我不知道为什么这个happend因为终端的输出是确定的。

感谢帮助。

下面是表的代码:

CREATE TABLE test01.animal4 (
name VARCHAR(50) DEFAULT NULL
, category VARCHAR(50) DEFAULT NULL
, type     INTEGER DEFAULT NULL
, size     INTEGER DEFAULT NULL
);

这是一个后续问题的层次结构问题 。

Answer 1:

你已经有一个解释和修正,但我建议以下变化

  • 你应该prepareINSERT INTO SQL语句,然后execute循环内它。 do有一个更大的开销

  • //descendant-or-self::node()的结构是昂贵的,你应该保留它,你根本不知道该元素将是文档,这是非常罕见的病例中。 在这种情况下, row1元件是在/database/row1s/row1row2元件是在row2s/row2相对于该

  • 它是干净多了,如果你想带引号的字符串中使用引号字符使用不同的分隔符。 比如"My name is \"$name\""是好得多qq{My name is "$name"}

这是你的程序,可以帮助的一个版本。

use strict;
use warnings;

use XML::XPath;
use DBI;

my $xp = XML::XPath->new( filename => 'animals4.xml' );

my $dbh = DBI->connect(
   'DBI:mysql:test', 'user', 'pw',
   { RaiseError => 1, PrintError => 0}
) or die "Fehler beim Verbidungsaufbau zum MariaDB-Server: $DBI::err -< $DBI::errstr\n";

my $insert_animal = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');

for my $row1 ( $xp->findnodes('/database/row1s/row1') ) {

   my $name     = $row1->getAttribute('name');
   my $category = $row1->getAttribute('category');

   printf qq{Level --- row1 "name" gives: $name\n};

   my @row2 = $xp->findnodes('row2s/row2', $row1);

   if ( @row2 ) {
      for my $row2 ( @row2 ) {

         my $type = $row2->getAttribute('type');
         my $size = $row2->getAttribute('size');

         print qq{Level row2 "type" gives: $type\n};
         print qq{Level row2 "size" gives: $size\n};

         $insert_animal->execute($name, $category, $type, $size);
      }
   }
   else {
      $insert_animal->execute($name, $category, undef, undef);
   }
}

产量

Level --- row1 "name" gives: fox
Level row2 "type" gives: 1
Level row2 "size" gives: 10
Level row2 "type" gives: 2
Level row2 "size" gives: 8
Level --- row1 "name" gives: horse
Level row2 "type" gives: 3
Level row2 "size" gives: 100
Level --- row1 "name" gives: bee
Level --- row1 "name" gives: wasp


Answer 2:

从你的代码,数据库写,当你第二次查询,下$ ROW1节点,返回结果只发生:

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    for my $row2 ( $row1->findnodes('.//row2s/row2') ) {
        $dbh->do("INSERT INTO animal4 (name, category,type,size) VALUES(?,?,?,?)"
        [...]  
        ) or die        ;   
    }
}

如果没有$ 2行节点,没有数据库的写入。

如果你想拥有数据库的写无论发生$ ROW2节点的存在或不存在的,你需要移动分贝写出来的,对于循环,即:

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # get name and category here
    my $name = $row1->getAttribute('name');
    my $cat = $row1->getAttribute('category');
    my $row2set = $row1->find('row2s/row2'); ## creates a Nodeset object
    if ($row2set->size > 0) {
        ## we found nodes!!
        foreach my $row2 ($row2set->get_nodelist) {
           # get size and type here
           my $type = $row2->getAttribute('type');
           my $size = $row2->getAttribute('size');
           # write to db

        }
    } else {
        ## no row2 nodes found.
        ## write to db - just write the row1 values; type and size will be undefined.

    }
}

NODESET文档: http://search.cpan.org/~msergeant/XML-XPath-1.13/XPath/NodeSet.pm

关于设置变量和范围快速注

范围是指其中一个实体(变量,子例程,对象,等)是在Perl代码可见和可访问的; 设置实体的范围有助于它们封装并防止随处可用的程序的每一个部分数据或功能。

范围使用代码的结构,例如子程序,环,套,对象上设置的-由花括号分隔的任何代码块( {} )。 这是在Perl(和许多其他语言)标准的做法进入一个块时,增加缩进和离开块时减少它; 这样一来,就可以很容易确定范围的阅读代码的时候。

使用my为限于在这里变量被设定的代码块集的变量(或函数,对象等)的范围; 例如

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # $row1 is available inside this code block

    my $row2set = $row1->find('row2s/row2');
    # $row2set is now available inside this code block

    if ($row2set->size > 0) {
        my $size = $row2set->size;
        # $size is now available inside this code block

        foreach my $row2 ($row2set->get_nodelist) {
            # $row2 is available inside this code block
            # we can also access $row1, $row2set, $size
        }

        # we can access $row1, $row2set, $size
        # $row2 is out of scope, i.e. we cannot access it

        say "The value of row2 is $row2";
        # Perl will complain 'Global symbol "$row2" requires explicit package name'
    }
    # we can access $row1 and $row2set
    # $size and $row2 are out of scope
}
# $row1, $row2set, $size, and $row2 are out of scope

让我们回到你的代码,让我们假设你决定设置变量$name$category$type$size捕捉你的数据,并将其写入到数据库中。 你必须确保你正确设置变量的范围,否则将不恰当的存储数据。 例如:

# declare all our variables
my ($name, $cat, $type, $size);
for my $row1 ( $xp->findnodes('//row1s/row1') ){
    # we can set $name and $cat from the data in row1:
    $name = $row1->getAttribute('name');
    $cat = $row1->getAttribute('category');
    my $row2set = $row1->find('row2s/row2');
    if ($row2set->size > 0) {
        foreach my $row2 ($row2set->get_nodelist) {
            # row2 gives us the type and size info
            $type = $row2->getAttribute('type');
            $size = $row2->getAttribute('size');
            # "say" prints a string and adds a "\n" to the end,
            # so it's very handy for debugging
            say "row2s found: name: $name; category: $cat; type: $type; size: $size";
        }
    } else {
        say "row2s empty: name: $name; category: $cat; type: $type; size: $size";
    }
}

这给了我们以下的输出:

row2s found: name: fox; category: mammal; type: 1; size: 10
row2s found: name: fox; category: mammal; type: 2; size: 8
row2s found: name: horse; category: mammal; type: 3; size: 100
row2s empty: name: bee; category: insect; type: 3; size: 100
row2s empty: name: wasp; category: insect; type: 3; size: 100

这是因为的范围$type$size被设置为整个代码块,并且值在ROW1循环的每次迭代和内ROW2环路之间保留下来。 蜜蜂和黄蜂不具备规模和类型的值,所以从以前的动物的值来代替。

有很多不同的方法来解决这个问题,但最有效的可能是这样的:

my $db_insert = $dbh->prepare('INSERT INTO animal4 (name, category, type, size) VALUES (?, ?, ?, ?)');

for my $row1 ( $xp->findnodes('//row1s/row1') ){
    my $row2set = $row1->find('row2s/row2');
    if ($row2set->size > 0) {
        foreach my $row2 ($row2set->get_nodelist) {
            # for debugging
            say "row2s found: name: " . $row1->getAttribute('name') .
            "; category: " . $row1->getAttribute('category') .
            "; type: " . $row2->getAttribute('type') .
            "; size: " . $row2->getAttribute('size');

            $db_insert->execute( $row1->getAttribute('name'),
            $row1->getAttribute('category'),
            $row2->getAttribute('type'),
            $row2->getAttribute('size') );
        }
    } else {
        # for debugging
        say "row2s empty: name: " . $row1->getAttribute('name') .
        "; category: " . $row1->getAttribute('category') .
        "; type: NOT SET" .
        "; size: NOT SET";
        $db_insert->execute( $row1->getAttribute('name'),
        $row1->getAttribute('category'),
        undef,
        undef );
    }
}


文章来源: Perl and XPath: missing entries in database table