replace multiple lines identifying end character

2019-03-01 05:20发布

I have the below code

CREATE TABLE Table1(
        column1 double NOT NULL,
        column2 varchar(60) NULL,
        column3 varchar(60) NULL,
        column4 double NOT NULL,
 CONSTRAINT Index1 PRIMARY KEY CLUSTERED
(
        column2 ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON PRIMARY
) ON PRIMARY

GO
GO

and I want to replace

 CONSTRAINT Index1 PRIMARY KEY CLUSTERED
(
        column2 ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON PRIMARY
) ON PRIMARY

GO

with

)

You can't assume GO is the last character of the file. After Go there can be another table script. How can I do that with single sed or awk.

标签: shell awk sed
3条回答
我命由我不由天
2楼-- · 2019-03-01 05:25

This may look scary but it is not difficult to grasp with a bit of explanation:

SED_DELIM=$(echo -en "\001")
START=' CONSTRAINT Index1 PRIMARY KEY CLUSTERED'
END='GO' 
sed -n $'\x5c'"${SED_DELIM}${START}${SED_DELIM},"$'\x5c'"${SED_DELIM}${END}${SED_DELIM}{s${SED_DELIM}GO${SED_DELIM})${SED_DELIM};t a;d;:a;};p" test2.txt

The sed has the following form you may be more familiar with:
sed /regex1/,/regex2/{commands}

First it uses the SOH non-printable as the delimiter \001
Sets the START and END tags for sed multiline match
Then performs the sed command:
-n do not print by default
$'\x5c' is a Bash string literal that corresponds to backslash \
The backslashes are necessary to escape the non-printable delimiter on the multiline range match.
{s${SED_DELIM}GO${SED_DELIM})${SED_DELIM};t a;d;:a;};p:
s${SED_DELIM}GO${SED_DELIM})${SED_DELIM} replace the line that matches GO with )
t a; if there is a successful substitution in the prior statement then branch to the :a label
d if there is no subsitution then delete the line
p print whatever the result is after the commands branch to the

I didn't see their answers prior to posting this - this answer is the same as FredPhil/hek2mgl - except in this manner you have a mechanism to be more dynamic on the LHS since you can change the delimiter to a character that is much less likely to appear in the dataset.

查看更多
冷血范
3楼-- · 2019-03-01 05:38

With GNU awk for multi-char RS and assuming you want to get rid of the comma before the "CONSTRAINT":

$ cat tst.awk
BEGIN{ RS="^$"; ORS="" }
{
    gsub(/\<GO\>/,"\034")
    gsub(/,\s*CONSTRAINT[^\034]+\034/,")")
    gsub(/\034/,"GO")
    print
}
$ gawk -f tst.awk file
CREATE TABLE Table1(
        column1 double NOT NULL,
        column2 varchar(60) NULL,
        column3 varchar(60) NULL,
        column4 double NOT NULL)
GO

The above works by replacing every stand-alone "GO" with a control char that's unlikely to appear in your input (in this case I used the same value as the default SUBSEP) so we can use that char in a negated character list in the middle gsub() to create a regexp that ends with the first "GO" after "CONSTRAINT". This is one way to do "non-greedy" matching in awk.

If there is no char that you KNOW cannot appear in your input, you can create one like this:

$ cat tst.awk
BEGIN{ RS="^$"; ORS="" }
{
    gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/\<GO\>/,"b")
    gsub(/,\s*CONSTRAINT[^b]+b/,")")
    gsub(/b/,"GO"); gsub(/aB/,"b"); gsub(/aA/,"a")
    print
}
$ 
$ gawk -f tst.awk file
CREATE TABLE Table1(
        column1 double NOT NULL,
        column2 varchar(60) NULL,
        column3 varchar(60) NULL,
        column4 double NOT NULL)
GO

The above initially converts all "a"s to "aA" and "b"s to "aB" so that

  1. there are no longer any "b"s in the record, and
  2. since all original "a"s now have an "A" after them, the only occurrences of "aB" represent where "bs" were originally located

and that means that we can now convert all "GO"s to "b"s just like we converted them to "\034" in the first script above. Then we do the main gsub() and then unroll our initial gsub()s.

This idea of gsub()ing to create chars that cannot previously exist, using those chars, then unrolling the initial gsub()s is an extremely useful idiom to learn and remember, e.g. see https://stackoverflow.com/a/13062682/1745001 for another application.

To see it working one step at a time:

$ cat file                                                                                                   
foo bar Hello World World able bodies

$ awk '{gsub(/a/,"aA")}1' file                                                                               
foo baAr Hello World World aAble bodies

$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB")}1' file                                                               
foo aBaAr Hello World World aAaBle aBodies

$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b")}1' file                                            
foo aBaAr Hello b b aAaBle aBodies

$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The")}1' file                         
foo aBaAr We Are The b aAaBle aBodies

$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The"); gsub(/b/,"World")}1' file 
foo aBaAr We Are The World aAaBle aBodies

$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The"); gsub(/b/,"World"); gsub(/aB/,"b")}1' file
foo baAr We Are The World aAble bodies

$ awk '{gsub(/a/,"aA"); gsub(/b/,"aB"); gsub(/World/,"b"); gsub(/Hello[^b]+b/,"We Are The"); gsub(/b/,"World"); gsub(/aB/,"b"); ; gsub(/aA/,"a")}1' file
foo bar We Are The World able bodies
查看更多
贪生不怕死
4楼-- · 2019-03-01 05:42

Update:

You can use the following sed command to replace even the last , before the CONSTRAINT block:

sed -r '/,/{N;/CONSTRAINT/{:a;N;/GO/!ba;s/([^,]+).*/\1\n)/};/CONSTRAINT/!n}' input.sql

Let me explain it as a multiline script:

# Search for a comma
/,/ {
  # If a command was found slurp in the next line
  # and append it to the current line in pattern buffer
  N
  # If the pattern buffer does not contain the word CONSTRAINT
  # print the pattern buffer and go on with the next line of input
  # meaning start searching for a comma
  /CONSTRAINT/! n

  # If the pattern CONSTRAINT was found we loop until we find the 
  # word GO
  /CONSTRAINT/ {
    # Define a start label for the loop 
    :a
    # Append the next line of input to the pattern buffer
    N
    # If GO is still not found in the pattern buffern
    # step to the start label of the loop
    /GO/! ba

    # The loop was exited meaning the pattern GO was found.
    # We keep the first line of the pattern buffer - without
    # the comma at the end and replace everything else by a )
    s/([^,]+).*/\1\n)/
  }
}

You can save the above multiline script in a file and execute it using

sed -rf script.sed input.sql

You can use the following sed command:

sed '/CONSTRAINT/{:a;N;/GO/!ba;s/.*/)/}' input.sql

The pattern searches for a line containing /CONSTRAINT/. If the pattern is found a block of commands is started wrapped between { }. In the block we first define a label a through :a. The we get the next line of input through N and append it to the pattern buffer. Unless we find the pattern /GO/! we'll continue at label a using the branch command b. If the pattern /GO/ is found we simply replace the buffer by a ).


An alternative can be using using a range like FredPhil suggested:

sed '/CONSTRAINT/,/GO/{s/GO/)/;te;d;:e}'
查看更多
登录 后发表回答