Get the values of attributes with namespace, using Nokogiri

Posted 2019-05-26 15:53

Question:

I'm using Nokogiri to parse a document.xml file extracted from a .docx file, and I need to get the values of attributes with namespaced names, like "w:val".

This is a sample of the source XML:

<w:document>
  <w:body>
    <w:p w:rsidR="004D5F21" w:rsidRPr="00820E0B" w:rsidRDefault="00301D39" pcut:cut="true">
      <w:pPr>
        <w:jc w:val="center"/>
      </w:pPr>
  </w:body>
</w:document>

This is a sample of the code:

require 'nokogiri'

doc = Nokogiri::XML(File.open(path))
doc.search('//w:jc').each do |n|
  puts n['//w:val']
end

Nothing is printed to the console except empty lines. How can I get the values of the attributes?

Answer 1:

require 'nokogiri'

doc = Nokogiri::XML(File.open(path))
doc.xpath('//jc').each do |n|
  puts n.attr('val')
end

This should work. Don't forget to look at the docs: http://nokogiri.org/tutorials/searching_a_xml_html_document.html#fn:1



Answer 2:

The document is missing its namespace declaration, and Nokogiri isn't happy with it. If you check the errors method for your doc, you'll see something like:

puts doc.errors
Namespace prefix w on document is not defined
Namespace prefix w on body is not defined
Namespace prefix w for rsidR on p is not defined
Namespace prefix w for rsidRPr on p is not defined
Namespace prefix w for rsidRDefault on p is not defined
Namespace prefix pcut for cut on p is not defined
Namespace prefix w on p is not defined
Namespace prefix w on pPr is not defined
Namespace prefix w for val on jc is not defined
Namespace prefix w on jc is not defined
Opening and ending tag mismatch: p line 3 and body
Opening and ending tag mismatch: body line 2 and document
Premature end of data in tag document line 1

By using Nokogiri's CSS accessors, rather than XPath, you can step around namespace issues:

puts doc.at('jc')['val']

will output:

center

If you need to iterate over multiple jc nodes, use search or one of its aliases or act-alike methods, similar to what you did before.



Answer 3:

Try this:

require 'nokogiri'

doc = Nokogiri::XML(File.open(path))
doc.search('jc').each do |n|
  puts n['val']
end

Also, yes, read this: http://nokogiri.org/tutorials/searching_a_xml_html_document.html#fn:1