How to exclude multiple line patterns using diff?

Posted 2019-09-05 22:15

Question:

I want to diff two XML files while ignoring lines that match two or three patterns.

For example, say I want to ignore AVAILABILITY and PRICE when comparing files in the XML format below.

Here is what I have so far:

diff -I '^<PRICE>*' 1.xml 2.xml

<CATALOG>

    <PLANT>  
    <COMMON>Bloodroot</COMMON>  
    <BOTANICAL>Sanguinaria canadensis</BOTANICAL>  
    <ZONE>4</ZONE>  
    <LIGHT>Mostly Shady</LIGHT>  
    <PRICE>$2.44</PRICE>  
    <AVAILABILITY>031599</AVAILABILITY>  
    </PLANT>  
</CATALOG>

The diff command above skips the price, but how do I add availability to the regex?

Answer 1:

Run the files through grep -v to remove the lines you don't want compared, then diff the results. The pattern allows optional leading whitespace because the tags in the sample are indented:

diff <(grep -vE "^[[:space:]]*<(PRICE|AVAILABILITY)>" 1.xml) <(grep -vE "^[[:space:]]*<(PRICE|AVAILABILITY)>" 2.xml)

The <(...) syntax is called process substitution. If your shell does not support it, use temporary files instead:

$ grep -vE "^[[:space:]]*<(PRICE|AVAILABILITY)>" 1.xml > 1.new.xml
$ grep -vE "^[[:space:]]*<(PRICE|AVAILABILITY)>" 2.xml > 2.new.xml
$ diff 1.new.xml 2.new.xml

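Note that GNU diff does accept -I more than once, but it only suppresses a hunk when every changed line in that hunk matches one of the patterns, so filtering with grep first is the more predictable option.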
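If you need the temporary-file variant often, it can be wrapped in a small function. This is only a sketch: the name diff_ignoring is made up for illustration, it assumes mktemp is available, and in plain sh its variables are global.

```shell
# Hypothetical helper: diff two files after dropping lines that match an ERE.
# Usage: diff_ignoring PATTERN FILE1 FILE2
diff_ignoring() {
    pattern=$1
    left=$(mktemp) || return 2
    right=$(mktemp) || { rm -f "$left"; return 2; }

    # grep exits 1 when it selects no lines; that is not an error here.
    grep -vE -e "$pattern" -- "$2" > "$left" || true
    grep -vE -e "$pattern" -- "$3" > "$right" || true

    # Keep diff's exit status (0 = same, 1 = different) and clean up.
    status=0
    diff "$left" "$right" || status=$?
    rm -f "$left" "$right"
    return "$status"
}

# Example call:
#   diff_ignoring '^[[:space:]]*<(PRICE|AVAILABILITY)>' 1.xml 2.xml
```

Because the filtered copies no longer contain the ignored lines, the line numbers in the diff output refer to the filtered files, not the originals.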



Answer 2:

Have you tried passing -I once per pattern?

This works for me:

diff -I 'PRICE' -I 'AVAILABILITY' 1.xml 2.xml
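One caveat with this approach, assuming GNU diff: -I hides a hunk only when every changed line in it matches one of the patterns. The sketch below builds two throwaway files (the contents and replacement values are made up for the demo) and compares the exit status with both patterns against PRICE alone.

```shell
# Sketch, assuming GNU diff. File contents are invented for the demo.
dir=$(mktemp -d)
cat > "$dir/1.xml" <<'EOF'
<PLANT>
    <ZONE>4</ZONE>
    <PRICE>$2.44</PRICE>
    <AVAILABILITY>031599</AVAILABILITY>
</PLANT>
EOF
# Second file differs only in the PRICE and AVAILABILITY lines.
sed -e 's/2\.44/9.37/' -e 's/031599/030699/' "$dir/1.xml" > "$dir/2.xml"

# Every changed line matches one of the patterns, so the hunk is suppressed.
status_both=0
diff -I 'PRICE' -I 'AVAILABILITY' "$dir/1.xml" "$dir/2.xml" || status_both=$?

# Only PRICE is listed. The AVAILABILITY line is in the same hunk,
# so the whole hunk (PRICE line included) is still reported.
status_one=0
diff -I 'PRICE' "$dir/1.xml" "$dir/2.xml" > "$dir/out.txt" || status_one=$?

echo "both patterns: exit $status_both; PRICE only: exit $status_one"
rm -rf "$dir"
```

If an ignored line changes right next to a line you do care about, both end up in the same hunk and both are shown, which is where pre-filtering with grep behaves differently.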


Tags: regex unix diff