可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

This question already has an answer here:

Extraction of data from a simple XML file 9 answers

I'm trying to extract a value from an xml document that has been read into my script as a variable. The original variable, $data, is:

<item> 
  <title>15:54:57 - George:</title>
  <description>Diane DeConn? You saw Diane DeConn!</description> 
</item> 
<item> 
  <title>15:55:17 - Jerry:</title> 
  <description>Something huh?</description>
</item>

and I wish to extract the first title value, so

15:54:57 - George:

I've been using the sed command:

title=$(sed -n -e 's/.*<title>\(.*\)<\/title>.*/\1/p' <<< $data)

but this only outputs the second title value:

15:55:17 - Jerry:

Does anyone know what I have done wrong? Thanks!

回答1:

As Charles Duffey has stated, XML parsers are best parsed with a proper XML parsing tools. For one time job the following should work.

grep -oPm1 "(?<=<title>)[^<]+"

Test:

$ echo "$data"
<item> 
  <title>15:54:57 - George:</title>
  <description>Diane DeConn? You saw Diane DeConn!</description> 
</item> 
<item> 
  <title>15:55:17 - Jerry:</title> 
  <description>Something huh?</description>
$ title=$(grep -oPm1 "(?<=<title>)[^<]+" <<< "$data")
$ echo "$title"
15:54:57 - George:

回答2:

XMLStarlet or another XPath engine is the correct tool for this job.

For instance, with data.xml containing the following:

<root>
  <item> 
    <title>15:54:57 - George:</title>
    <description>Diane DeConn? You saw Diane DeConn!</description> 
  </item> 
  <item> 
    <title>15:55:17 - Jerry:</title> 
    <description>Something huh?</description>
  </item>
</root>

...you can extract only the first title with the following:

xmlstarlet sel -t -m '//title[1]' -v . -n <data.xml

Trying to use sed for this job is troublesome. For instance, the regex-based approaches won't work if the title has attributes; won't handle CDATA sections; won't correctly recognize namespace mappings; can't determine whether a portion of the XML documented is commented out; won't unescape attribute references (such as changing Brewster & Jobs to Brewster & Jobs), and so forth.

回答3:

I agree with Charles Duffy that a proper XML parser is the right way to go.

But as to what's wrong with your sed command (or did you do it on purpose?).

$data was not quoted, so $data is subject to shell's word splitting, filename expansion among other things. One of the consequences being that the spacing in the XML snippet is not preserved.

So given your specific XML structure, this modified sed command should work

title=$(sed -ne '/title/{s/.*<title>\(.*\)<\/title>.*/\1/p;q;}' <<< "$data")

Basically for the line that contains title, extract the text between the tags, then quit (so you don't extract the 2nd <title>)

Extract XML Value in bash script [duplicate]

问题:

回答1:

Test:

回答2:

回答3:

收藏的人(0)

Extract XML Value in bash script [duplicate]

问题:

回答1:

Test:

回答2:

回答3:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮