Using awk (or sed) to remove newlines based on fir

Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull out all the relevant information based on regexps, but each "piece" of information I pulled is on a separate line. I'd like each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:

92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755

Ideally, I would want this output to look like:

92831,499,000 ,0644321
79217,999,000 ,5417178 ,PK91622
79217,999,000 ,5417178 ,PK90755

This may be harder to do, so I would settle for that last "record" appearing only once, with the additional "PK..." as the 4th "field" of that line.
In the end, the simplest approach I could think of is: if a line starts with a comma (^,), the newline before it should be removed. I'm not too familiar with awk, though, so if you could give me a start on this it would be much appreciated! Thanks!
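
The rule described above (drop the newline in front of any line that begins with a comma) can be sketched directly in awk: print each line without its trailing newline, and emit the newline only when the next line does not start with a comma. This is a minimal sketch, not one of the answers below; the function name join_lines is hypothetical, and it joins the pieces without adding spaces.

```shell
# join_lines is a hypothetical helper name, used only for illustration.
# It reads stdin (or the files given as arguments) and removes the newline
# in front of every line that starts with a comma.
join_lines() {
    awk '
        # Before a line that does NOT start with a comma, end the previous
        # output line (skipped for the very first input line).
        NR > 1 && !/^,/ { printf "\n" }
        # Print the line itself without a trailing newline.
        { printf "%s", $0 }
        # Terminate the last output line, unless the input was empty.
        END { if (NR) printf "\n" }
    ' "$@"
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755' | join_lines
```

On the sample above this prints one line per record, with both PK values kept on the second line.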

Tags: bash shell sed awk

5 answers

放荡不羁爱自由

Reply #2 · 2019-03-26 21:31

This might work for you:

$ sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755

Explanation:

This comes in two parts:

Append the next line, and if the appended line begins with a comma, delete the embedded newline (\n) and start again. If not, print up to the newline, then delete up to the newline. Repeat.

Replace the 5th comma with a newline. Then insert the first four fields between the embedded newline and the sixth field.
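
To see what each part produces, the pipeline can be split into its two halves. This sketch assumes GNU sed (a "\n" in the replacement, and N printing the pattern space when it reaches end of input, are GNU behaviour); the names stage1, stage2 and sample are hypothetical.

```shell
# Assumes GNU sed. stage1, stage2 and sample are hypothetical names.

# Part 1: while the appended line starts with a comma, join it onto the
# previous one; otherwise print the first line and restart with the second.
stage1() {
    sed ':a;N;s/\n,/,/;ta;P;D' "$@"
}

# Part 2: turn the 5th comma into a newline, then copy the first four
# fields (with their trailing comma) right after that newline.
stage2() {
    sed 's/,/\n/5;s/\(.*,\).*\n/&\1/' "$@"
}

# The sample data from the question.
sample() {
    printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755'
}

echo "After part 1:"
sample | stage1
echo "After part 2:"
sample | stage1 | stage2
```

Part 1 leaves both PK values on one line; part 2 splits them into the two lines shown in the answer. Because only the 5th comma is replaced, a record with three or more PK values would only have its first one split off.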

来，给爷笑一个

Reply #3 · 2019-03-26 21:38

sedsed -d -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename
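
As far as I know, sedsed is a sed debugger, and its -d option prints a trace of every command. Assuming that is the only reason it appears here, the script itself is ordinary GNU sed and can be run with plain sed -n. A sketch, with a hypothetical function name:

```shell
# Assumes GNU sed (branch labels terminated by ";" are a GNU extension).
# join_records is a hypothetical name. With -n nothing is printed
# automatically; only the explicit p commands in the script produce output.
join_records() {
    sed -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' "$@"
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755' | join_records
```

On the question's sample this joins each record onto one line without spaces, with both PK values on the same line.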

走好不送

Reply #4 · 2019-03-26 21:50

Without special-casing field 3, it's easy:

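awk '
    # A line that does not start with a comma begins a new record:
    # print the previous record (if any), then buffer this line.
    !/^,/   { if (NR > 1) print x ; x = $0 }
    # A continuation line is appended to the buffer, separated by OFS (a space).
    /^,/    { x = x OFS $0 }
    # Print the last record unless the input was empty.
    END     { if (NR) print x }
'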

With it, it's more complex but still not too hard:

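awk '
    # n counts the lines of the current record. On a new record, print the
    # previous one only if it had fewer than 3 lines (otherwise it was already printed).
    !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
    # Line 2 is appended to the buffer; each line from 3 on is printed
    # as its own output line, after the buffered first two.
    /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
    # Flush the last record if it was not printed yet.
    END     { if (n && n < 3) print x }
'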
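
Neither program shows its output, so here is a sketch that wraps the two programs (unchanged) in hypothetical shell functions, join_all and split_pk, and runs them on the question's sample:

```shell
# join_all, split_pk and sample are hypothetical names; the awk programs
# are the ones from this answer. OFS defaults to a single space.
join_all() {
    awk '
        !/^,/   { if (NR > 1) print x ; x = $0 }
        /^,/    { x = x OFS $0 }
        END     { if (NR) print x }
    ' "$@"
}

split_pk() {
    awk '
        !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
        /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
        END     { if (n && n < 3) print x }
    ' "$@"
}

sample() {
    printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755'
}

echo "Without special-casing:"
sample | join_all
echo "With special-casing:"
sample | split_pk
```

The second program reproduces the three lines the question asks for, including the space before each comma, which comes from OFS. Running awk with -v OFS= would drop those spaces.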

Fickle 薄情

Reply #5 · 2019-03-26 21:52

Well, I guess I should have taken a closer look at records in awk when I was trying to figure this out last night... 10 minutes after looking at them I got it working. For anyone interested, here's how I did it: in my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:

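awk 'BEGIN { RS = ""; FS = "\n" }   # blank-line-separated records, one field per line
{
    if (NF < 3) {
        $1 = $1; print            # no "PK..." lines: rebuild $0 with OFS and print it
    } else {
        for (i = 3; i <= NF; i++)
            print $1, $2, $i      # repeat the first two fields for each extra line
    }
}'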

and it works like a charm, producing exactly the output I wanted!
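
The approach relies on awk's paragraph mode. As a sketch, assuming the blank lines are added by a separate step instead of the asker's sed script (add_blank_lines and show_records below are hypothetical stand-ins), this shows how RS = "" and FS = "\n" split the question's sample into records and fields:

```shell
# add_blank_lines is a hypothetical stand-in for the modified sed script:
# it prints an empty line before every line that does not start with a
# comma, i.e. before every new record except the first.
add_blank_lines() {
    awk 'NR > 1 && !/^,/ { print "" } { print }' "$@"
}

# Paragraph mode: RS = "" treats each blank-line-separated block as one
# record, and FS = "\n" makes each line of that block a separate field.
show_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }
         { print "record " NR ": " NF " fields, $1 = " $1 }' "$@"
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178' ',PK91622' ',PK90755' |
    add_blank_lines | show_records
```

The first sample record has only two fields, so it is a case the NF check in the program has to account for.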

仙女界的扛把子

Reply #6 · 2019-03-26 21:55

$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755

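In other words: read the input in one piece instead of line by line (-0 sets the record separator to the NUL character, so an ordinary text file is read as a single record), then replace every newline that is followed by a comma with just the comma.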

Shortest code here!
