I asked this question before, but judging by the answers I don't think I explained it properly.
I have a file named backup.xml that is 28,000 lines long and contains the phrase *** 766 times. I also have a file named list.txt that has 766 lines, each with a different keyword.
What I need to do is insert each line from list.txt into backup.xml, in order, to replace the 766 places where *** appears.
Here's an example of what's contained in list.txt:
Anaheim
Anchorage
Ann Arbor
Antioch
Apple Valley
Appleton
Here's an example of one of the lines from backup.xml with *** in it:
<title>*** Hosting Services - Company Review</title>
So, for example, the first line that contains *** should be changed to this, according to the sample above:
<title>Anaheim Hosting Services - Company Review</title>
Any help would be greatly appreciated. Thanks in advance!
In this case you can probably get away with treating the XML as pure text.
So read the XML file, and replace each occurrence of the marker with a line read from the keyword file:
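#!/usr/bin/perl
use strict;
use warnings;
use autodie qw( open);

my $xml_file  = 'backup.xml';
my $list_file = 'list.txt';
my $out_file  = 'out.xml';
my $pattern   = '***';

# I assumed all files are utf8 encoded
open( my $xml,  '<:utf8', $xml_file );
open( my $list, '<:utf8', $list_file );
open( my $out,  '>:utf8', $out_file );

while( <$xml> )
  { # \Q...\E quotes the marker so the asterisks match literally;
    # /e evaluates the replacement, reading one keyword per match
    s{\Q$pattern\E}{my $kw= <$list>; chomp $kw; $kw}eg;
    print {$out} $_;
  }

# make sure everything is written before replacing the original file
close $out or die "cannot close $out_file: $!";
rename $out_file, $xml_file or die "cannot rename $out_file to $xml_file: $!";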
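Since the script above overwrites backup.xml at the end, it may be worth confirming first that the number of markers really matches the number of keywords. The following is only a sketch of such a check; the function names count_markers, count_lines and counts_match are made up for illustration, and it assumes the marker is always the literal three asterisks.

```shell
# count_markers FILE: print how many times the literal *** occurs in FILE.
# gsub() returns the number of replacements; replacing with "&" leaves the text as is.
count_markers() {
    awk '{ n += gsub(/\*\*\*/, "&") } END { print n + 0 }' "$1"
}

# count_lines FILE: print the number of lines in FILE.
count_lines() {
    awk 'END { print NR }' "$1"
}

# counts_match XML LIST: succeed only if XML has one marker per line of LIST.
counts_match() {
    [ "$(count_markers "$1")" -eq "$(count_lines "$2")" ]
}
```

It could be used as: counts_match backup.xml list.txt || echo "marker and keyword counts differ"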
How about this:
awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' list.txt > list.sed
sed -f list.sed backup.xml
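The first line uses awk to build a list of search/replace commands from list.txt, which the second line then executes via sed. Note that the 0,/regex/ address form is a GNU sed extension, that sed writes the result to standard output rather than changing backup.xml, and that keywords containing / or & would need escaping first.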
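To make it easier to see what the generated sed script contains, here is a small sketch that runs the same awk command on a two-line list. The wrapper name make_sed_script is hypothetical; the awk program inside it is the one from the answer.

```shell
# make_sed_script LIST: print one sed command per keyword in LIST
# (hypothetical wrapper around the awk command from the answer).
make_sed_script() {
    awk '{print NR-1 ",/\\*\\*\\*/{s/\\*\\*\\*/" $0 "/}"}' "$1"
}

# Small demonstration with two keywords.
demo_list=$(mktemp)
printf '%s\n' Anaheim Anchorage > "$demo_list"
make_sed_script "$demo_list"
# prints:
#   0,/\*\*\*/{s/\*\*\*/Anaheim/}
#   1,/\*\*\*/{s/\*\*\*/Anchorage/}
rm -f "$demo_list"
```

Each generated command is active from its starting line until the next line that still contains ***, and it replaces one marker there. As far as I can tell, that is why the first keyword ends up in the first marker, the second in the second, and so on, as long as there is at most one marker per line.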
Using awk. The script reads backup.xml and, whenever a line contains ***, pulls the next word from list.txt. The BEGIN block removes list.txt from the argument list so that awk does not process it as input, which is why the order of the arguments is very important. I also assume that there is only one *** string per line.
awk '
  BEGIN { listfile = ARGV[2]; --ARGC }
  /\*\*\*/ {
    getline word <listfile
    sub( /\*\*\*/, word )
  }
  1 ## same as { print }
' backup.xml list.txt
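If the assumption of one *** per line does not hold, a variation along the following lines might help. This is only a sketch, not the answer's method: it loads list.txt into an array first and splices keywords in with index() and substr(), so that characters such as & in a keyword are inserted literally. The function name fill_markers and the variable names are made up for illustration, and the list file is passed first here.

```shell
# fill_markers LIST XML: print XML with each literal *** replaced, in order,
# by the next line of LIST. Several markers on one line are handled.
fill_markers() {
    awk '
        # First file (the list): remember every keyword in order.
        FNR == NR { words[++n] = $0; next }
        # Second file (the XML): rebuild each line piece by piece.
        {
            out = ""; rest = $0
            while (used < n && (p = index(rest, "***")) > 0) {
                out  = out substr(rest, 1, p - 1) words[++used]
                rest = substr(rest, p + 3)
            }
            print out rest
        }
    ' "$1" "$2"
}
```

It would be called as fill_markers list.txt backup.xml > out.xml. Once the keywords run out, any remaining markers are left untouched.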
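If the two files correspond line by line, you can use the paste command to join lines from both files and then postprocess. paste pairs line N of list.txt with line N of backup.xml, so this only fits when every line of backup.xml has its own keyword.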
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print substr($0, length($1)+2)}'
The paste command will produce the following:
Anaheim \t <title>*** Hosting Services - Company Review</title>
The awk one-liner then replaces *** with the first field and strips the first field and the field separator (\t) that follows it.
Another variation is:
paste list.txt backup.xml |
awk 'BEGIN {FS="\t"} {sub(/\*\*\*/, $1); print $0}' |
cut -f 2-