How to find information inside an XML tag using grep

I am working on a Linux shell script to find information in an XML file using grep. I am on a Mac, which I hope doesn't matter too much.

To find the information I need, I run:

grep -oP "<title>(.*)</title>" temp.xml

In return I get a list of matches, and each match includes the <title> tags.

How can I get a list with only the text inside the title tags, without the tags themselves, using grep?

Tags: xml regex shell grep

5 Answers

萌系小妹纸

Post #2 · 2020-02-25 09:09

It's not the best solution (I would look for an XML library or tool that can be called from bash), but you can do:

grep -oP "<title>(.*)</title>" temp.xml | cut -d ">" -f 2 | cut -d "<" -f 1

三岁会撩人

Post #3 · 2020-02-25 09:17

grep -oP "<foo>(.*)</foo>" "XML.xml" | sed -n 's/.*<foo>\([^<]*\)<\/foo>.*/\1/p' >> "foo.txt"
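As a side note, the grep stage above is arguably redundant: sed -n combined with the p flag already prints only the lines where the substitution succeeded. A minimal sketch, assuming a hypothetical sample file (the file and its contents are made up for illustration) and at most one <foo> element per line:

```shell
# Hypothetical sample input; the temp file and its contents are illustrative only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<root>
  <foo>first value</foo>
  <bar>ignored</bar>
  <foo>second value</foo>
</root>
EOF

# -n suppresses sed's default output; the trailing p prints only the lines
# where the substitution matched, so no separate grep stage is needed.
foo_values=$(sed -n 's/.*<foo>\([^<]*\)<\/foo>.*/\1/p' "$sample")
printf '%s\n' "$foo_values"

rm -f "$sample"
```

This uses only basic regular expressions, so it should also work with the sed that ships on a Mac. It still treats the file as plain text, so an element that spans several lines, or two <foo> elements on one line, would not be handled.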

女痞

Post #4 · 2020-02-25 09:28

You could install xgrep and query the file with XPath, as suggested in Tom's answer:

man xgrep

啃猪蹄的小仙女

Post #5 · 2020-02-25 09:31

I can't see why you'd want to use grep for this when it can be solved with a trivial XPath expression:

//title/text()

There are many command line tools for XPath and they're usually bundled with the OS.

Answers to this question on Stack Overflow list a number of such tools.

The problem with grep here is that it's a generic tool for text processing and it's not aware of any XML structure. For a very simple scenario, you can get it working. If the document is complex or if you're using this in a script that will survive months or years and not just a one-off job, you may end up feeling sorry for the results.

XPath makes it easy to tell the difference between similarly named tags that appear in different contexts in a document.

<article>
    <author>
        <name>Jon Doe</name>
        <title>Chief Editor</title>
    </author>
    <title>On the Benefits of grep</title>
    <publicationDate>2018-02-12</publicationDate>
    <text>blah blah blah</text>
</article>

Extracting the title of the article represented by this document would fail with the grep-based answers posted here, because they match every <title> element and would also return "Chief Editor". You could technically write a regular expression that gets what you need, but it's a lot easier with XPath:

/article/title/text()
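One way to try this from a shell is xmllint, which is part of libxml2. This is only a sketch: it assumes xmllint is installed and recent enough to support the --xpath option, and the temporary file simply holds a copy of the sample document above.

```shell
# Sample document copied from the example above; the temp file is illustrative.
doc=$(mktemp)
cat > "$doc" <<'EOF'
<article>
    <author>
        <name>Jon Doe</name>
        <title>Chief Editor</title>
    </author>
    <title>On the Benefits of grep</title>
    <publicationDate>2018-02-12</publicationDate>
    <text>blah blah blah</text>
</article>
EOF

# Absolute path: selects only the <title> that is a direct child of <article>.
article_title=$(xmllint --xpath '/article/title/text()' "$doc")
printf '%s\n' "$article_title"

# A different path picks the other <title>; string() returns its text content.
author_title=$(xmllint --xpath 'string(/article/author/title)' "$doc")
printf '%s\n' "$author_title"

rm -f "$doc"
```

The first command should print only the article title and the second only the author's job title, which is the distinction a pattern such as <title>(.*)</title> cannot make on its own.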

If you know you're dealing with a trivial document whose format doesn't change, or if it's a one-time job where you can quickly validate the results, you can go for grep as explained by others.

闹够了就滚

Post #6 · 2020-02-25 09:31

Since you already use grep -P, why don't you use its features?

grep -oP '(?<=<title>).*?(?=</title>)' temp.xml

In the general case, XPath is the correct solution, but for toy scenarios, yes Virginia, it can be done.
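To illustrate why the lazy quantifier matters, here is a small sketch on a made-up one-line input with two title elements. It assumes a grep built with PCRE support (-P is a GNU grep option; as far as I know the BSD grep shipped with macOS does not accept it, so a GNU grep may need to be installed there).

```shell
# Hypothetical one-line input with two <title> elements on the same line.
doc=$(mktemp)
printf '%s\n' '<title>A</title><title>B</title>' > "$doc"

# Lazy .*? stops at the first closing tag, so each title is matched separately.
lazy=$(grep -oP '(?<=<title>).*?(?=</title>)' "$doc")
printf '%s\n' "$lazy"

# Greedy .* runs to the last closing tag on the line and swallows the markup
# in between.
greedy=$(grep -oP '(?<=<title>).*(?=</title>)' "$doc")
printf '%s\n' "$greedy"

rm -f "$doc"
```

The lookbehind and lookahead assert that the tags are present without making them part of the match, which is why -o prints only the text between them.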
