Substitute all characters between two strings by c

Posted 2019-01-12 13:13

Question:

In a Bash script, I am trying to replace, in place in a file, the characters between two given strings with 'X'. I have a bunch of string pairs, and the characters between each pair should be replaced with 'X'.
In the code below, the first string of each pair is declared in the cpi_list array. The second string of the pair is always either %26, &, or the end of the line.

This is what I am doing.

# list of "first" or "start" string
declare -a cpi_list=('%26Name%3d' '%26Pwd%3d')  

# This is the "end" string
myAnd=\%26
newfile="inputlog.txt"

for item in "${cpi_list[@]}";
do
    sed -i -e :a -e "s/\($item[X]*\)[^X]\(.*"$myAnd"\)/\1X\2/;ta" $newfile;
done

The input

CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT
CPI.%26Name%3dVoorhees&machete

I want to turn it into

CPI.%26Name%3dXXXXX%26Pwd%3dXXXXXX%26Name%3dXXXX
CPI.%26Name%3dXXXXXXXX&machete

PS: The last item also needs to change from %26Name%3dCOTT to %26Name%3dXXXX even though there is no closing %26, because the end point is either %26 or the end of the line.

But somehow it is not working.

Answer 1:

This will work in any awk called from any shell in any UNIX installation:

$ cat tst.awk
BEGIN {
    begs = "%26Name%3d|%26Pwd%3d"
    ends = "%26|&"
}
{
    head = ""
    tail = $0
    while( match(tail, begs) ) {
        tgtStart = RSTART + RLENGTH
        tgt = substr(tail,tgtStart)
        if ( match(tgt, ends) ) {
            tgt = substr(tgt,1,RSTART-1)
        }

        gsub(/./,"X",tgt)
        head = head substr(tail,1,tgtStart-1) tgt
        tail = substr(tail,tgtStart+length(tgt))
    }
    $0 = head tail

    print
}

$ cat file
CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT
CPI.%26Name%3dVoorhees&machete

$ awk -f tst.awk file
CPI.%26Name%3dXXXXX%26Pwd%3dXXXXXX%26Name%3dXXXX
CPI.%26Name%3dXXXXXXXX&machete

Just as with a sed substitution, any regexp metacharacters in the beg and end strings would need to be escaped; otherwise we'd have to use a loop with index() instead of match(), so that we do string matching instead of regexp matching.
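The index() variant mentioned above could look roughly like the following. This is only a sketch under assumptions: the wrapper name mask_literal and the space-separated lists of start and end strings are illustrative and not part of the answer's script, and it assumes none of the strings contains a space.

```shell
#!/bin/sh
# mask_literal is a hypothetical helper, not part of the original answer.
# It masks the text after each start string up to the nearest end string,
# using index() so that no string is ever interpreted as a regexp.
mask_literal() {
    awk '
    BEGIN {
        nb = split("%26Name%3d %26Pwd%3d", begs, " ")   # literal start strings
        ne = split("%26 &", ends, " ")                  # literal end strings
    }
    {
        head = ""
        tail = $0
        while (1) {
            # find the earliest start string in the remaining text
            pos = 0; len = 0
            for (i = 1; i <= nb; i++) {
                p = index(tail, begs[i])
                if (p && (!pos || p < pos)) { pos = p; len = length(begs[i]) }
            }
            if (!pos) break

            tgtStart = pos + len
            tgt = substr(tail, tgtStart)

            # cut the target at the earliest end string, if there is one
            cut = 0
            for (i = 1; i <= ne; i++) {
                p = index(tgt, ends[i])
                if (p && (!cut || p < cut)) cut = p
            }
            if (cut) tgt = substr(tgt, 1, cut - 1)

            n = length(tgt)
            gsub(/./, "X", tgt)
            head = head substr(tail, 1, tgtStart - 1) tgt
            tail = substr(tail, tgtStart + n)
        }
        print head tail
    }' "$@"
}

# Usage: reads the named files, or standard input when none are given
printf '%s\n' 'CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT' \
              'CPI.%26Name%3dVoorhees&machete' | mask_literal
```

For the two sample lines this should print the same masked output as tst.awk. Since index() compares plain strings, a start string containing characters such as . or * would need no escaping.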



Answer 2:

You can make the substitution skip %26 like this:

a='CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT'
echo "$a" |sed -E ':a;s/(%3dX*)([^%X]|%[013-9a-f][0-9a-f]|%2[0-5789a-f])/\1X/g;ta;'

Note that each encoded character %xx is replaced by a single X.
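To see the effect of that note, here is a small sketch that wraps the same sed expression in a function and feeds it a value containing an encoded space. The function name mask_sed and the sample input are made up for illustration; the expression is split across -e options only for portability and assumes a sed that supports -E.

```shell
#!/bin/sh
# mask_sed is a hypothetical wrapper around the sed expression from this answer.
mask_sed() {
    sed -E -e ':a' \
        -e 's/(%3dX*)([^%X]|%[013-9a-f][0-9a-f]|%2[0-5789a-f])/\1X/g' \
        -e 'ta'
}

# "A%20B" is five characters in the log but only three decoded characters,
# so it is masked with three X, not five.
printf '%s\n' 'CPI.%26Name%3dA%20B%26Pwd%3dC' | mask_sed
# prints: CPI.%26Name%3dXXX%26Pwd%3dX
```

Two things follow from the expression as written: it masks the value after any %3d, not only after Name or Pwd, and a literal & does not stop the masking, so the second sample line from the question would have &machete masked as well.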



Answer 3:

It is not pretty, but you can use perl:

$ s1="CPI.%26Name%3dJASON%26Pwd%3dBOTTLE%26Name%3dCOTT"
$ echo "$s1" | perl -lne 'if (/(?:^.*%26Name%3d)(.*)(?:%26Pwd%3d)(?:.*%26Name%3d)(.*)((?:%26Pwd%3d)|(?:$))/) { 
        $i1=$-[1];
        $l1=$+[1]-$-[1];
        $i2=$-[2];
        $l2=$+[2]-$-[2];
        substr($_, $i1, $l1, "X"x$l1);
        substr($_, $i2, $l2, "X"x$l2);
        print;
        }'
CPI.%26Name%3dXXXXX%26Pwd%3dBOTTLE%26Name%3dXXXX

That handles two pairs, as in the example, and masks only the two Name values (the Pwd value is left as is). Handling N pairs in a line would take a slight modification.