How to perform a sed transform within a matching p

It's easy to do a sed transform within a line matching a certain pattern, but what if we only want to transform something in a certain part of the line?

Simple example

Suppose we want to make characters uppercase in all lines starting with #. We could do that with a command of the following form (only a through f are mapped here, to keep it short).

sed '/^#/ y/abcdef/ABCDEF/'

Suppose we only want to turn the first word in these lines uppercase. How would we go about that using a sed translation?

More advanced application

I want to interchange slashes with backslashes in the graph part of the output of git --no-pager log --all --graph --decorate --oneline --color=always | tac.

Before

| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |/ /
| |/| / /
| | |/ / /
| | |\ \ \
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |\ \
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge

After

| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |\ \
| |\| \ \
| | |\ \ \
| | |/ / /
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |/ /
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge

Notice that any slashes in the commit messages are kept the same, but the slashes in the graphical part are transformed.

Tags: regex bash awk replace sed

4 answers

Anthone

Reply #2 · 2019-07-13 02:38

Keep it simple, just use awk. For example, with GNU awk (needed for the 3rd arg to match()):

$ cat tst.awk        
{
    match($0,/([| *\/\\]+)(.*)/,a)
    gsub(/\//,RS,a[1])
    gsub(/\\/,"/",a[1])
    gsub(RS,"\\",a[1])
    print a[1] a[2]
}

$ awk -f tst.awk file
| * | | 279e9ad (tag: v0.0.4.334, origin/DR) asdfasdf
| | |\ \
| |\| \ \
| | |\ \ \
| | |/ / /
| | * | |   1fc7ab7 (tag: v0.0.4.337) Merge branch 'DR' into NextMajor
| | | * | d24e21d (tag: v0.0.4.341, origin/DR-01) DR-010728 Updated unit tests
| | |/ /
| | * |   8c01099 (tag: v0.0.4.338, tag: 0.0.4_MILESTONE_RELEASE) Merge

With any awk and comments added in case it's not obvious what the script does:

$ cat tst.awk        
{
    match($0,/[| *\/\\]+/)              # find the segment of text you want
    tgt = substr($0,RSTART,RLENGTH)     # save that segment in a variable tgt
    gsub(/\//,RS,tgt)                   # change all /s to newlines in tgt
    gsub(/\\/,"/",tgt)                  # change all \s to /s in tgt
    gsub(RS,"\\",tgt)                   # change all newlines to \s in tgt
    print tgt substr($0,RSTART+RLENGTH) # print tgt plus rest of the line
}

We use newlines as the tmp value during the character swap since there's guaranteed to not already be a newline present in the line.
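To see the placeholder trick on its own, here is a minimal sketch that applies the same three gsub() calls to the whole line instead of just the graph prefix. The function name swap_slashes is made up for illustration; it is not part of the script above.

```shell
# swap_slashes: hypothetical helper that swaps every / and \ on each line.
swap_slashes() {
    awk '{
        gsub(/\//, RS)      # step 1: park every forward slash as a newline (RS)
        gsub(/\\/, "/")     # step 2: turn every backslash into a forward slash
        gsub(RS, "\\")      # step 3: turn the parked newlines into backslashes
        print
    }'
}

printf '%s\n' 'a/b\c' | swap_slashes
```

This should print a\b/c. Without the intermediate newline, step 2 would produce forward slashes that can no longer be told apart from the original ones.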

To turn the first word of each line that starts with # to uppercase, btw, might just be:

awk '/^#/{$1=toupper($1)}1' file

or:

awk '/^#/{$2=toupper($2)}1' file

depending on your input data, definition of a word, and white space requirements.
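As a quick illustration of why the field number depends on the input, here is a small sketch using two made-up sample lines, one with a space after the # and one without. The sample function is only there to feed the same input to both commands.

```shell
# Two made-up input lines: "# foo bar" and "#foo bar".
sample() { printf '%s\n' '# foo bar' '#foo bar'; }

sample | awk '/^#/{$1=toupper($1)}1'
# prints:
#   # foo bar
#   #FOO bar

sample | awk '/^#/{$2=toupper($2)}1'
# prints:
#   # FOO bar
#   #foo BAR
```

Note that assigning to a field makes awk rebuild the line with single spaces between fields, so runs of white space in the modified lines are collapsed. If that matters, the match()/substr() approach above is the safer route.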

If the text you want to match can contain control characters, as it sounds like from your comments, then just allow that in the regexp, e.g.:

    match($0,/([[:space:][:cntrl:]|*\/\\]+)(.*)/,a)


ゆ、 Hurt°

Reply #3 · 2019-07-13 02:41

If your version of sed supports it, you can use \U to transform text to uppercase:

sed -r 's/(^# *)([^ ]*)/\1\U\2/'

This captures the first part of any line starting with # (including optional spaces), then anything up to the next space character. The second capture group is transformed to uppercase.

If it doesn't support it, then you can always use perl:

perl -pe 's/(^#\s*)([\S]*)/$1\U$2/'

I've used \s and \S in this version, which are equivalent to [[:space:]] (space characters) and [^[:space:]] (non-space characters) respectively. You might want to use a slightly different pattern depending on the specifics of the files you're working with.


唯我独甜

Reply #4 · 2019-07-13 02:52

Here's a simple sed solution that should be portable (i.e. works in sed variants other than GNU). This swaps slashes that follow any character other than a lowercase letter (which works in your sample data at least).

sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g' file

The breakdown of this goes a little like this:

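s:\([^a-z]\)/:\1\\:g - replace each forward slash that follows a character other than a-z with a backslash
t - if we just did a substitution, skip to the end (avoiding the next substitution, which would otherwise swap the new backslashes straight back)
s:\([^a-z]\)\\:\1/:g - replace each backslash that follows a character other than a-z with a forward slash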

The reason to split this into two -e expressions is that some variants of sed require the branch name to be at the end of a line in the script. The end of a -e expression is deemed equivalent to the end of a line.
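Here is a small sketch that runs the command on three lines modelled on the sample data, to show where it does and does not touch slashes. The wrapper name swap_graph_slashes is made up for illustration.

```shell
# swap_graph_slashes: hypothetical wrapper around the sed command above.
swap_graph_slashes() {
    sed -e 's:\([^a-z]\)/:\1\\:g;t' -e 's:\([^a-z]\)\\:\1/:g'
}

printf '%s\n' '| |/| / /' '| | |\ \ \' '| * | 279e9ad (origin/DR) asdfasdf' |
    swap_graph_slashes
```

This should print:

| |\| \ \
| | |/ / /
| * | 279e9ad (origin/DR) asdfasdf

Two limits follow from how the script is built. A slash in a commit message that follows something other than a lowercase letter is swapped too, so Fix I/O would come out as Fix I\O. And because t skips the second substitution whenever the first one fired, a line containing both kinds of slash only has its forward slashes converted.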


萌系小妹纸

Reply #5 · 2019-07-13 02:53

This might work for you (GNU sed):

sed '/^#/s/\w\+/\U&/' file

or:

sed '/^#/!b;s/\w\w*/&\n/;h;y/abcdef/ABCDEF/;G;s/\n.*\n//' file
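Since the second command is fairly dense, here is the same script spread over several lines with comments, as a sketch of how I read it. It assumes GNU sed (for \w and for \n in the replacement), and the wrapper name upper_first_word_y is made up for illustration.

```shell
# upper_first_word_y: hypothetical wrapper; same commands as the one-liner
# above, one per line. Assumes GNU sed.
upper_first_word_y() {
    sed '
        # leave lines that do not start with a hash untouched
        /^#/!b
        # mark the end of the first word with a newline
        s/\w\w*/&\n/
        # keep a copy of the marked line in the hold space
        h
        # translate a-f to uppercase across the whole pattern space
        y/abcdef/ABCDEF/
        # append the untranslated copy after another newline
        G
        # delete from the first newline through the last one, which leaves
        # the translated head followed by the original tail
        s/\n.*\n//
    '
}

printf '%s\n' '# fade in' 'fade out' | upper_first_word_y
```

With GNU sed this should print # FADE in followed by fade out. This is the general shape of a translation limited to part of a line: mark the boundary, translate everything, then stitch the translated part back onto the untouched remainder. As in the question, the y lists only cover a through f; extend both lists to the full alphabet for real use.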
