Extract value from xml in bash on a mac

Posted 2019-09-20 00:57

Question:

I need to extract the name value (Product Finder) from this xml:

File: config.xml

<?xml version="1.0" encoding="utf-8"?>
<widget id="com.abc.app" version="1.3.1" xmlns="http://www.w3.org/ns/widgets" xmlns:android="http://schemas.android.com/apk/res/android" xmlns:cdv="http://cordova.apache.org/ns/1.0" ios-CFBundleVersion="1.3.1.5" android-versionCode="5">
    <name>Product Finder</name>
    <description>
        Description
    </description>
</widget>

I've tried:

mles$ cat config.xml | grep '<name>'
    <name>Product Finder</name>

Some other answers suggest using grep -oPm1 "(?<=<xmltag>)[^<]+", but that yields an error:

mles$ cat config.xml | grep -oPm1 "(?<=<name>)[^<]+"
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]

How can I get the name value? I need a solution without dependencies, so grep would be preferred.

Answer 1:

grep only finds the line. To extract the name you need a second tool such as sed, which ships with macOS, so it is not an additional dependency:

grep '<name>' config.xml | sed "s@.*<name>\(.*\)</name>.*@\1@"

Here sed captures everything between <name> and </name> and replaces the whole line with the captured text.
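A minimal sketch of the same idea with sed alone, assuming (as in the question) that the <name> element sits on a single line. The -n option suppresses sed's default output and the trailing p flag prints only lines where the substitution matched, so the grep step becomes unnecessary. The function name get_name is hypothetical.

```shell
# Hypothetical helper: print the text between <name> and </name> in file $1.
# -n suppresses default output; the p flag prints only lines where s/// matched.
# Assumes the whole element is on one line; this is a regex, not an XML parser.
get_name() {
    sed -n 's@.*<name>\(.*\)</name>.*@\1@p' "$1"
}
```

Usage: get_name config.xml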



Answer 2:

You should use an XML parser such as xmllint.
Your XML is invalid and you should fix it; if you can't, use the following regex:

perl -n -e '/<name>(.*)<\/name>/ && print "$1\n"' config.xml
# Product Finder

Options:

-n                assume "while (<>) { ... }" loop around program
-e program        one line of program (several -e's allowed, omit programfile)


Answer 3:

Your XML isn't well-formed. The W3Schools XML validator reports:

error on line 8 column 1. Extra content at the end of the document

The header line <?xml version="1.0" encoding="utf-8"?> is the XML declaration, which identifies the document as XML; all XML documents should begin with one.
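To check well-formedness locally instead of using a web validator, a sketch with xmllint (preinstalled on macOS): --noout parses the file without echoing it, parse errors such as "Extra content at the end of the document" go to stderr, and the exit status is non-zero on failure. The function name is_well_formed is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) only if file $1 is well-formed XML.
# --noout suppresses printing the parsed document; errors are discarded here,
# remove 2>/dev/null to see xmllint's error messages.
is_well_formed() {
    xmllint --noout "$1" 2>/dev/null
}
```

Usage: is_well_formed config.xml && echo "well-formed"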

Also, xmllint (part of libxml2) comes preinstalled on macOS, so you can simply run:

xmllint --xpath "/widget/name/text()" xml
Product Finder
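Note that the question's config.xml declares a default namespace (xmlns="http://www.w3.org/ns/widgets"), and in XPath 1.0 an unprefixed path like /widget/name does not match namespaced elements, so on that file the command above finds nothing. A sketch that ignores the namespace by matching on local-name(); string() returns the text content of the first matching element. The function name get_name_xpath is hypothetical.

```shell
# Hypothetical helper: namespace-agnostic lookup of the first <name> element.
# local-name() compares only the element name, ignoring its namespace URI;
# string(...) converts the first matching node to its text content.
get_name_xpath() {
    xmllint --xpath "string(//*[local-name()='name'])" "$1"
}
```

Usage: get_name_xpath config.xml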

Correctly formatted, your XML would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<widget id="123" version="1.3.1">
   <name>Product Finder</name>
   <description>Description</description>
</widget>


Answer 4:

The following loop uses only bash built-ins and does the job, but note that it is not an XML parser:

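# read -d '<' reads up to the next '<'; IFS='>' splits that chunk into the
# tag (text before '>') and the value (text after it, up to the next '<').
# The || [[ -n $tag ]] test handles a final chunk not terminated by '<'.
while IFS='>' read -r -d '<' tag value || [[ -n $tag ]]; do
    if [[ $tag == name ]]; then
        echo "$value"
        break
    fi
done < config.xml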
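A sketch generalizing the loop above into a function that takes the tag name and the file (xml_value is a hypothetical name). It keeps the same limitations: the tag must match exactly, so a target element carrying attributes, CDATA sections, comments and entities are not handled. It returns a non-zero status when the tag is not found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text of the first element whose tag is
# exactly $1 in file $2, using only bash built-ins (not an XML parser).
xml_value() {
    local wanted=$1 file=$2 tag value
    # Split input on '<' (read -d) and each chunk on '>' (IFS).
    while IFS='>' read -r -d '<' tag value || [[ -n $tag ]]; do
        if [[ $tag == "$wanted" ]]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$file"
    return 1   # tag not found
}
```

Usage: xml_value name config.xml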