I'm parsing a document.xml file, extracted from a .docx file, using Nokogiri, and I need to get the values of attributes with names like "w:val".
This is a sample of the source XML:
<w:document>
<w:body>
<w:p w:rsidR="004D5F21" w:rsidRPr="00820E0B" w:rsidRDefault="00301D39" pcut:cut="true">
<w:pPr>
<w:jc w:val="center"/>
</w:pPr>
</w:p>
</w:body>
</w:document>
This is a sample of the code:
require 'nokogiri'
doc = Nokogiri::XML(File.open(path))
doc.search('//w:jc').each do |n|
puts n['//w:val']
end
There is nothing in the console, only empty lines. How can I get the values of the attributes?
Drop the XPath // from the attribute name: n['w:val'] should work. Don't forget to look at the docs: http://nokogiri.org/tutorials/searching_a_xml_html_document.html#fn:1
The sample document is missing its namespace declarations, and Nokogiri isn't happy with it. If you check the errors method for your doc, you'll see errors reporting that namespace prefixes such as w are not defined.
By using Nokogiri's CSS accessors, rather than XPath, you can step around namespace issues.
If you need to iterate over multiple jc nodes, use search or one of its aliases or act-alike methods, similar to what you did before.