Replace first two whitespace occurrences with a co

Posted 2019-05-01 19:57

Question:

I have a whitespace delimited file with a variable number of entries on each line. I want to replace the first two whitespaces with commas to create a comma delimited file with three columns.

Here's my input:

a b  1 2 3 3 2 1
c d  44 55 66 2355
line http://google.com 100 200 300
ef jh  77 88 99
z y 2 3 33

And here's my desired output:

a,b,1 2 3 3 2 1
c,d,44 55 66 2355
line,http://google.com,100 200 300
ef,jh,77 88 99
z,y,2 3 33

I'm trying to use Perl-style regular expressions in a sed command, but I can't quite get it to work. First I tried capturing a word, followed by whitespace, then another word, but that only works for lines 1, 2, and 5:

$ cat test | sed -r 's/(\w)\s+(\w)\s+/\1,\2,/'
a,b,1 2 3 3 2 1
c,d,44 55 66 2355
line http://google.com 100 200 300
ef jh  77 88 99
z,y,2 3 33

I also tried capturing whitespace, a word, and then more whitespace, but that gives me the same result:

$ cat test | sed -r 's/\s+(\w)\s+/,\1,/'
a,b,1 2 3 3 2 1
c,d,44 55 66 2355
line http://google.com 100 200 300
ef jh  77 88 99
z,y,2 3 33

I also tried doing this with the .? wildcard, but that does something odd to line 4:

$ cat test | sed -r 's/\s+(.?)\s+/,\1,/'
a,b,1 2 3 3 2 1
c,d,44 55 66 2355
line http://google.com 100 200 300
ef jh,,77 88 99
z,y,2 3 33

Any help is much appreciated!

Answer 1:

How about this:

sed -e 's/\s\+/,/' | sed -e 's/\s\+/,/'

It's probably possible with a single sed command, but this is certainly an easy way :)

My output:

a,b,1 2 3 3 2 1
c,d,44 55 66 2355
line,http://google.com,100 200 300
ef,jh,77 88 99
z,y,2 3 33


Answer 2:

Try this:

sed -r 's/\s+(\S+)\s+/,\1,/'

This just replaces \w (a single "word" character) with \S+ (one or more non-space characters) in your second attempt.
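
To see why that one change matters, here is a small sketch run against the URL line from the question. The variable names are only illustrative, and the POSIX classes [[:alnum:]_] and [^[:space:]] are used as stand-ins for \w and \S on the assumption that they behave the same for this input and work in seds that lack the GNU shorthands:

```shell
line='line http://google.com 100 200 300'

# A single word character between two whitespace runs
# ([[:alnum:]_] stands in for \w): nothing on this line fits.
single=$(printf '%s\n' "$line" |
  sed -E 's/[[:space:]]+([[:alnum:]_])[[:space:]]+/,\1,/')

# One or more non-space characters ([^[:space:]]+ stands in for \S+):
# the whole second field is captured.
multi=$(printf '%s\n' "$line" |
  sed -E 's/[[:space:]]+([^[:space:]]+)[[:space:]]+/,\1,/')

printf '%s\n' "$single"   # line http://google.com 100 200 300
printf '%s\n' "$multi"    # line,http://google.com,100 200 300
```

The single-character version only succeeds when the second field is exactly one character long, which is why it worked on lines 1, 2, and 5 of the sample input.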



Answer 3:

You can provide multiple commands to a single instance of sed by just providing multiple -e arguments.

To do the first two, just use:

sed -e 's/\s\+/,/' -e 's/\s\+/,/'

This basically runs both commands on the line in sequence, the first doing the first block of whitespace, the second doing the next.

The following transcript shows this in action:

pax$ echo 'a b  1 2 3 3 2 1
c d  44 55 66 2355
line http://google.com 100 200 300
ef jh  77 88 99
z y 2 3 33
' | sed -e 's/\s\+/,/' -e 's/\s\+/,/'

a,b,1 2 3 3 2 1
c,d,44 55 66 2355
line,http://google.com,100 200 300
ef,jh,77 88 99
z,y,2 3 33


Answer 4:

sed's s/// command supports a way to say which occurrence of a pattern to replace: append a number n to the end of the command to replace only the nth occurrence. So, to replace the first and second occurrences of whitespace, just use it this way:

$ sed 's/  */,/1;s/  */,/2' input
a,b ,1 2 3 3 2 1
c,d ,44 55 66 2355
line,http://google.com 100,200 300
ef,jh ,77 88 99
z,y 2,3 33

EDIT: reading the other proposed solutions, I noticed that the 1 and 2 after s/  */,/ are not only unnecessary but plainly wrong. By default, s/// replaces only the first occurrence of the pattern, and the second command runs on the line the first one has already modified. So two identical s/// commands in sequence replace the first and the second occurrence. What you need is just

$ sed 's/  */,/;s/  */,/' input 

(Note that you can put two sed commands in one expression if you separate them by a semicolon. Some sed implementations do not accept the semicolon after the s/// command; use a newline to separate the commands, in this case.)
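
As a rough illustration of both points, the sketch below runs the flagged and unflagged forms on the last sample line, writing the unflagged form with a newline between the two commands. The variable names are made up for the example:

```shell
line='z y 2 3 33'

# With numeric flags, the second command counts matches in the line that
# the first command has already changed, so it lands one separator too far.
with_flags=$(printf '%s\n' "$line" | sed 's/  */,/1;s/  */,/2')

# Without flags, each command replaces the first remaining run of spaces.
# A newline separates the commands, for seds that reject ";" after s///.
without_flags=$(printf '%s\n' "$line" | sed 's/  */,/
s/  */,/')

printf '%s\n' "$with_flags"      # z,y 2,3 33
printf '%s\n' "$without_flags"   # z,y,2 3 33
```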



Answer 5:

A Perl solution is:

perl -pe '$_=join ",", split /\s+/, $_, 3' some.file
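
The limit of 3 passed to split is what keeps everything after the second field together as one piece. The same idea can be sketched in plain shell, since read assigns whatever is left of the line to its last variable. This is a different technique from the Perl one-liner, the function name first_two_to_commas is hypothetical, and it assumes the default IFS and newline-terminated input:

```shell
# Hypothetical helper: join the first two whitespace-separated fields with
# commas and keep the remainder of each line as a single third field.
first_two_to_commas() {
  while read -r first second rest; do
    printf '%s,%s,%s\n' "$first" "$second" "$rest"
  done
}

out=$(printf 'a b  1 2 3 3 2 1\nline http://google.com 100 200 300\n' |
  first_two_to_commas)
printf '%s\n' "$out"
# a,b,1 2 3 3 2 1
# line,http://google.com,100 200 300
```

Unlike the Perl version, read also strips leading and trailing whitespace from each line, so the two are not exact equivalents on input with stray spacing.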


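Answer 6:

Not sure about sed/perl, but here's an (ugly) awk solution. It prints fields 1 and 2, each followed by a comma, then the remaining fields separated by single spaces:

awk '{
  # first two fields, each followed by a comma
  printf("%s,%s,", $1, $2)
  # remaining fields, with a space between them but not after the last
  for (i=3; i<=NF; i++)
    printf("%s%s", $i, (i<NF ? " " : ""))
  printf("\n")
}' myfile.txt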
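
A shorter variant, offered only as a sketch and not as this answer's method, relies on awk's sub(), which replaces just the first match in the current line. Calling it twice mirrors the two-step sed approach and leaves the rest of the line as it was. It assumes the separators are spaces or tabs, and the variable name result is illustrative:

```shell
result=$(printf 'ef jh  77 88 99\nz y 2 3 33\n' |
  awk '{ sub(/[ \t]+/, ","); sub(/[ \t]+/, ","); print }')

printf '%s\n' "$result"
# ef,jh,77 88 99
# z,y,2 3 33
```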