Replacing text in a file from a list in another file

Posted 2019-08-08 01:42

Question:

I asked this question before, but judging by the answers I don't think I explained it properly.

I have a file named backup.xml that is 28,000 lines long and contains the string *** 766 times. I also have a file named list.txt that has 766 lines, each with a different keyword.

I need to replace each of the 766 occurrences of *** in backup.xml with the corresponding line from list.txt, in order.

Here's an example of what's contained in list.txt:

Anaheim
Anchorage
Ann Arbor
Antioch
Apple Valley
Appleton

Here's an example of one of the lines with *** in it from backup.xml:

<title>*** Hosting Services - Company Review</title>

So, going by the sample above, the first line containing *** should be changed to this:

<title>Anaheim Hosting Services - Company Review</title>

Any help would be greatly appreciated. Thanks in advance!

Answer 1:

In this case you can probably get away with treating the XML as pure text. So read the XML file, and replace each occurrence of the marker with a line read from the keyword file:

#!/usr/bin/perl

use strict;
use warnings;

use autodie qw( open);

my $xml_file  = 'backup.xml';
my $list_file = 'list.txt';
my $out_file  = 'out.xml';  

my $pattern='***';

# I assumed all files are utf8 encoded
open( my $xml,  '<:utf8', $xml_file  );
open( my $list, '<:utf8', $list_file );
open( my $out,  '>:utf8', $out_file  );

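while ( <$xml> ) {
    # Replace every marker on the line with the next keyword from the list.
    s{\Q$pattern\E}{my $kw = <$list>; chomp $kw; $kw}eg;
    print {$out} $_;
}

# Flush the output before it replaces the original file.
close $out or die "Cannot close $out_file: $!";
rename $out_file, $xml_file or die "Cannot rename $out_file to $xml_file: $!";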
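
The substitution above consumes one keyword per marker, so it may be worth confirming beforehand that the two counts agree. A minimal sketch in POSIX shell and awk; the count_markers helper, the temporary directory, and the sample data are assumptions made for illustration, and with real data you would point the two commands at your own backup.xml and list.txt:

```shell
#!/bin/sh
# Illustrative sample data in a scratch directory (not the real files).
workdir=$(mktemp -d)
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
              '<description>Hosting in ***, near ***</description>' > "$workdir/backup.xml"
printf '%s\n' 'Anaheim' 'Anchorage' 'Ann Arbor' > "$workdir/list.txt"

# Hypothetical helper: count every *** in a file, including repeats on one line.
# gsub() returns the number of substitutions; "&" puts the matched text back.
count_markers() {
    awk '{ n += gsub(/\*\*\*/, "&") } END { print n + 0 }' "$1"
}

markers=$(count_markers "$workdir/backup.xml")
keywords=$(awk 'END { print NR }' "$workdir/list.txt")

if [ "$markers" -eq "$keywords" ]; then
    echo "OK: $markers markers, $keywords keywords"
else
    echo "Mismatch: $markers markers, $keywords keywords" >&2
fi
```

With the sample data this reports three markers and three keywords.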


Answer 2:

How about this:

awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
sed -f list.sed backup.xml

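The first line uses awk to generate a list of search/replace commands from list.txt, one per keyword. The second line then runs that script with sed, which writes the result to standard output. Note that the 0,/regex/ address in the first generated command is a GNU sed extension.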
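
To make the generated script concrete, here is a small sketch that builds list.sed from the first three sample keywords and prints it. The temporary directory is illustrative only. One caveat to keep in mind: a keyword containing / or & would need escaping before it could be placed inside s/.../.../.

```shell
#!/bin/sh
# Scratch directory with a three-line keyword list (sample data from the question).
workdir=$(mktemp -d)
printf '%s\n' 'Anaheim' 'Anchorage' 'Ann Arbor' > "$workdir/list.txt"

# Same awk command as above: emit one sed command per keyword.
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' "$workdir/list.txt" > "$workdir/list.sed"

cat "$workdir/list.sed"
# 0,/\*\*\*/{s/\*\*\*/Anaheim/}
# 1,/\*\*\*/{s/\*\*\*/Anchorage/}
# 2,/\*\*\*/{s/\*\*\*/Ann Arbor/}
```

Each generated command applies its substitution from its starting line up to the next line that still contains ***, which is how successive keywords end up on successive markers.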



Answer 3:

Using awk. The script reads backup.xml and, whenever a line contains ***, reads the next word from list.txt and substitutes it for the marker. The BEGIN block saves the name of list.txt and removes it from the argument list so that awk does not process it as input. The order of the arguments matters: backup.xml must come first. I also assume that there is at most one *** per line.

awk '
        BEGIN { listfile = ARGV[2]; --ARGC }
        /\*\*\*/ {
                getline word <listfile
                sub( /\*\*\*/, word )
        }
        1     ## same as { print }
' backup.xml list.txt
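
If a line may contain more than one ***, the single sub() call above only replaces the first. One possible variation, offered as a sketch rather than as this answer's method, loops until no marker is left on the line. It uses index() and substr() instead of sub() so that a keyword containing & is inserted literally. The LIST variable name, the temporary directory, and the sample lines are assumptions for illustration.

```shell
#!/bin/sh
# Scratch directory with small sample files (illustrative data only).
workdir=$(mktemp -d)
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
              '<p>Compare *** with ***</p>' \
              '<p>no marker here</p>' > "$workdir/backup.xml"
printf '%s\n' 'Anaheim' 'Anchorage' 'Ann Arbor' > "$workdir/list.txt"

result=$(awk -v LIST="$workdir/list.txt" '
    {
        # Replace markers left to right, one keyword per marker.
        while ((pos = index($0, "***")) > 0) {
            # Stop if the keyword list runs out; remaining markers stay as they are.
            if ((getline word < LIST) <= 0) break
            $0 = substr($0, 1, pos - 1) word substr($0, pos + 3)
        }
        print
    }
' "$workdir/backup.xml")

printf '%s\n' "$result"
```

Passing the list file with -v also removes the dependence on argument order that the BEGIN block trick has.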


Answer 4:

If the two files correspond line by line, you can use the paste command to join lines from both files and then post-process the result.

paste list.txt backup.xml | 
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}'

The paste command produces lines like the following (\t stands for a tab character):

Anaheim \t <title>*** Hosting Services - Company Review</title>

The awk one-liner then replaces *** with the first field and prints the line without the first field and the tab separator that follows it.

Another variation is:

paste list.txt backup.xml | 
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print $0}' | 
cut -f 2-
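
A small end-to-end sketch of the first pipeline may help show when this approach fits. paste pairs line N of list.txt with line N of backup.xml, so it only works when every line of backup.xml carries exactly one marker; with the question's 28,000 lines and 766 keywords the pairing would not line up. The two-line files below are made-up sample data in a temporary directory.

```shell
#!/bin/sh
# Scratch directory where every XML line has exactly one marker (illustrative data).
workdir=$(mktemp -d)
printf '%s\n' 'Anaheim' 'Anchorage' > "$workdir/list.txt"
printf '%s\n' '<title>*** Hosting Services - Company Review</title>' \
              '<title>*** Hosting Services - Pricing</title>' > "$workdir/backup.xml"

# Join keyword and XML line with a tab, substitute, then drop the keyword column.
result=$(paste "$workdir/list.txt" "$workdir/backup.xml" |
    awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}')

printf '%s\n' "$result"
# <title>Anaheim Hosting Services - Company Review</title>
# <title>Anchorage Hosting Services - Pricing</title>
```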