How to perform a sed transform within a matching p

Posted 2019-07-13 02:48

Question:

It's easy to do a sed transform within a line matching a certain pattern, but what if we only want to transform something in a certain part of the line?

Simple example

Suppose we want to make characters uppercase in all lines starting with #. We could do that with a command of the following form (shown here for the letters a to f only).

sed '/^#/ y/abcdef/ABCDEF/'

Suppose we only want to turn the first word in these lines uppercase. How would we go about that using a sed translation?

More advanced application

I want to interchange slashes with backslashes in the graph part of the output of git --no-pager log --all --graph --decorate --oneline --color=always | tac.

Before

| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |/ /
| |/| / /
| | |/ / /
| | |\ \ \
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |\ \
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge 

After

| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |\ \
| |\| \ \
| | |\ \ \
| | |/ / /
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |/ /
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge 

Notice that any slashes in the commit messages are kept the same, but the slashes in the graphical part are transformed.

Answer 1:

Keep it simple and just use awk, e.g. with GNU awk for the third argument to match():

$ cat tst.awk        
{
    match($0,/([| *\/\\]+)(.*)/,a)
    gsub(/\//,RS,a[1])
    gsub(/\\/,"/",a[1])
    gsub(RS,"\\",a[1])
    print a[1] a[2]
}

$ awk -f tst.awk file
| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |\ \
| |\| \ \
| | |\ \ \
| | |/ / /
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |/ /
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge 

With any awk and comments added in case it's not obvious what the script does:

$ cat tst.awk        
{
    match($0,/[| *\/\\]+/)              # find the segment of text you want
    tgt = substr($0,RSTART,RLENGTH)     # save that segment in a variable tgt
    gsub(/\//,RS,tgt)                   # change all /s to newlines in tgt
    gsub(/\\/,"/",tgt)                  # change all \s to /s in tgt
    gsub(RS,"\\",tgt)                   # change all newlines to \s in tgt
    print tgt substr($0,RSTART+RLENGTH) # print tgt plus rest of the line
}

We use a newline as the temporary value during the character swap because awk reads the input one line at a time, so the current line is guaranteed not to contain a newline already.
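To see why the temporary value matters, here is a minimal sketch comparing a naive two-step swap with the three-step swap through a newline. The function names naive_swap and placeholder_swap are made up for this illustration, and the swap is applied to the whole line rather than just the graph segment to keep it short.

```shell
# Naive two-step swap: the second gsub also rewrites the backslashes
# that the first gsub has just produced, so everything ends up as "/".
naive_swap() {
    awk '{ gsub(/\//, "\\"); gsub(/\\/, "/"); print }'
}

# Three-step swap: park the slashes in a newline placeholder first,
# then convert backslashes, then turn the placeholders into backslashes.
placeholder_swap() {
    awk '{ gsub(/\//, "\n"); gsub(/\\/, "/"); gsub(/\n/, "\\"); print }'
}

printf '%s\n' '| |/ \ \' | naive_swap        # prints: | |/ / /
printf '%s\n' '| |/ \ \' | placeholder_swap  # prints: | |\ / /
```

Only the second function actually exchanges the two characters; the first one collapses both into forward slashes.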

To turn the first word of each line that starts with # to uppercase, by the way, this might be all you need:

awk '/^#/{$1=toupper($1)}1' file

or:

awk '/^#/{$2=toupper($2)}1' file

depending on your input data, definition of a word, and white space requirements.
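As a rough illustration of that dependence, the sketch below runs both one-liners on two made-up input lines. The wrapper names upper_field1 and upper_field2 are hypothetical. Note that assigning to a field makes awk rebuild the line with single spaces between fields, which is where the white space requirement comes in.

```shell
# The two one-liners from above, wrapped in functions for easy reuse.
upper_field1() { awk '/^#/{$1=toupper($1)}1'; }
upper_field2() { awk '/^#/{$2=toupper($2)}1'; }

# '#' attached to the first word: the word is part of field 1.
printf '%s\n' '#foo bar' | upper_field1       # prints: #FOO bar

# '#' followed by a space: '#' is field 1, the first word is field 2.
# The run of three spaces is collapsed because the line is rebuilt.
printf '%s\n' '# foo   bar' | upper_field2    # prints: # FOO bar

# Lines that do not start with '#' are printed untouched.
printf '%s\n' 'foo   bar' | upper_field2      # prints: foo   bar
```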

If the text you want to match can contain control characters, as it sounds like from your comments, then just allow that in the regexp, e.g.:

    match($0,/([[:space:][:cntrl:]|*\/\\]+)(.*)/,a)


Answer 2:

Here's a simple sed solution that should be portable (i.e. it works in sed variants other than GNU). It swaps slashes that do not follow a lowercase letter (which works in your sample data at least).

sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g' file

The breakdown of this goes a little like this:

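  • s:\([^a-z]\)/:\1\\:g - replace each forward slash that follows a character other than a lowercase letter with a backslash
  • t - if that substitution changed anything, skip to the end of the script (avoiding the next substitution, which would otherwise undo it)
  • s:\([^a-z]\)\\:\1/:g - otherwise, replace each backslash that follows such a character with a forward slash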

The reason to split this into two -e expressions is that some variants of sed require the branch label to be at the end of a line in the script. The end of a -e expression is deemed equivalent to the end of a line.
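A quick way to check the behaviour is to feed the command a few hand-made lines. The wrapper name swap_slashes and the input lines are invented for this sketch. The last line shows a limitation that follows from the t command: when the graph part of one line holds both kinds of slash, only the first substitution runs.

```shell
# The command from above, wrapped in a function (hypothetical name).
swap_slashes() {
    sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g'
}

printf '%s\n' '| | |/ /'   | swap_slashes   # prints: | | |\ \
printf '%s\n' '| | |\ \ \' | swap_slashes   # prints: | | |/ / /

# Slashes that follow a lowercase letter are left alone.
printf '%s\n' '| * d24e21d (origin/x) a/b' | swap_slashes

# Mixed line: "/" becomes "\", then t skips the second substitution,
# so the original "\" is not turned into "/".
printf '%s\n' '| |/ \' | swap_slashes       # prints: | |\ \
```

A slash that follows an uppercase letter or a digit in a commit message (for example DR/x or 1/2) would also be swapped, so this approach relies on the shape of the data.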



Answer 3:

This might work for you (GNU sed):

sed '/^#/s/\w\+/\U&/' file

or:

sed '/^#/!b;s/\w\w*/&\n/;h;y/abcdef/ABCDEF/;G;s/\n.*\n//' file
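The second command is terse, so here is the same script spread over several lines with comments, as a sketch of how it appears to work. It assumes GNU sed (for \w and for \n in the replacement), and the function name upper_first_word is made up. As in the original, y only covers the letters a to f.

```shell
upper_first_word() {
    sed '
        # Lines that do not start with "#": jump to the end, print as-is.
        /^#/!b
        # Mark the end of the first word with a newline.
        s/\w\w*/&\n/
        # Save a copy of the marked line in the hold space.
        h
        # Translate the whole pattern space (first word and the rest).
        y/abcdef/ABCDEF/
        # Append the saved copy: TRANSLATED-HEAD \n TRANSLATED-TAIL \n head \n tail
        G
        # Drop everything from the first to the last newline, leaving
        # the translated first word followed by the original tail.
        s/\n.*\n//
    '
}

printf '%s\n' '# deaf bee' | upper_first_word   # prints: # DEAF bee
printf '%s\n' 'deaf bee'   | upper_first_word   # prints: deaf bee
```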


Answer 4:

If your version of sed supports it, you can use \U to transform text to uppercase:

sed -r 's/(^# *)([^ ]*)/\1\U\2/'

This captures the first part of any line starting with # (including optional spaces), then anything up to the next space character. The second capture group is transformed to uppercase.

If it doesn't support it, then you can always use perl:

perl -pe 's/(^#\s*)([\S]*)/$1\U$2/'

I've used \s and \S in this version, which are equivalent to [[:space:]] (space characters) and [^[:space:]] (non-space characters) respectively. You might want to use a slightly different pattern depending on the specifics of the files you're working with.
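To illustrate how the choice of pattern changes the result, this sketch compares the space-only pattern with one built from the POSIX classes, on a line where a tab follows the first word. It assumes GNU sed for \U; the function names and the sample line are invented for the example.

```shell
# Pattern from the sed command above: only a literal space ends the word.
upper_space_only() {
    sed -E 's/(^# *)([^ ]*)/\1\U\2/'
}

# Same idea with POSIX classes, matching what \s and \S do in the perl version.
upper_posix_class() {
    sed -E 's/(^#[[:space:]]*)([^[:space:]]*)/\1\U\2/'
}

# Input is "# foo<TAB>bar".
printf '# foo\tbar\n' | upper_space_only    # prints: # FOO<TAB>BAR
printf '# foo\tbar\n' | upper_posix_class   # prints: # FOO<TAB>bar
```

With the space-only pattern the tab counts as part of the word, so the text after it is uppercased too.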