Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull all the relevant information based on regexps, but each "piece" of information I pulled is on a separate line. I'd like each "record" to be on its own line so it can be easily imported into a DB.
Here's a sample of my data right now:
92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755
Ideally, I would want this output to look like:
92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
79217,999,000
,5417178
,PK90755
This may be harder to do, so I would settle for that last "record" appearing only once, with the additional "PK..." as the 4th "field" of that line.
In the end, the simplest way I could think of doing this is: if a line starts with a comma ( ^, ), the newline before it should be removed. I'm not too familiar with awk, though, so if you could give me a start on this it would really be appreciated! Thanks!
$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755
Translation: read the whole file in at once instead of line by line, then replace each newline that is followed by a comma with just the comma.
Shortest code here!
Well, guess I should have taken a closer look at using Records in awk when I was trying to figure this out last night... 10 minutes after looking at them I got it working. For anyone interested here's how I did this:
In my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:
awk 'BEGIN {RS = ""; FS = "\n"}
{
    if (NF >= 3)
        for (i = 3; i <= NF; i++)
            print $1,$2,$i
}'
and it works like a charm outputting exactly the way I wanted!
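For anyone who wants to try the paragraph-mode idea end to end, here is a sketch rather than the exact command above. As written, that command prints nothing for a record with fewer than three lines, and it separates the pieces with a space (awk's default OFS). The variant below assumes comma-tight output is wanted and that short records should be kept; the OFS = "" setting, the NF < 3 rule, and the inline records variable are all additions for illustration.

```shell
# Blank-line-separated records, as the modified sed script is described
# as producing (contents taken from the sample in the question).
records='92831,499,000
,0644321

79217,999,000
,5417178
,PK91622
,PK90755'

paragraphs=$(printf '%s\n' "$records" | awk '
BEGIN { RS = ""; FS = "\n"; OFS = "" }   # paragraph mode, one field per line
NF < 3 { $1 = $1; print; next }          # short record: rejoin its lines with OFS
{ for (i = 3; i <= NF; i++) print $1, $2, $i }   # one output line per extra field
')
printf '%s\n' "$paragraphs"
```

Assigning $1 = $1 forces awk to rebuild the record with OFS, which is what joins the two lines of a short record.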
sedsed -d -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename
Without special-casing field 3, easy.
awk '
!/^,/ { if (NR > 1) print x ; x = $0 }
/^,/ { x = x OFS $0 }
END { if (NR) print x }
'
With special-casing, it's more complex but still not too hard.
awk '
!/^,/ { if (n && n < 3) print x ; x = $0 ; n = 1 }
/^,/ { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
END { if (n && n < 3) print x }
'
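One thing to watch when running either script: OFS defaults to a single space, so as written each appended line ends up with a space in front of its comma. A sketch of the special-casing version on the question's sample follows, assuming comma-tight output is wanted; passing -v OFS= (an empty OFS) and the inline sample variable are additions, and the awk program itself is unchanged.

```shell
# Sample data copied from the question.
sample='92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755'

# n counts the lines seen in the current record; the first extra line is
# appended to x, and every later one is printed as its own copy of x.
split_records=$(printf '%s\n' "$sample" | awk -v OFS= '
!/^,/ { if (n && n < 3) print x ; x = $0 ; n = 1 }
/^,/ { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
END { if (n && n < 3) print x }
')
printf '%s\n' "$split_records"
```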
This might work for you:
# sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755
Explanation:
This comes in two parts:
Append the next line; if the appended line begins with a comma, delete the embedded newline (\n) and start again. If not, print up to the newline and then delete up to the newline. Repeat.
Replace the 5th comma with a newline, then insert the first four fields between the embedded newline and the sixth field.
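To see what each part contributes, the two sed commands can be run separately and the intermediate result inspected. This is only a sketch: it assumes GNU sed (a \n in the replacement text is not portable), and the sample, stage1, and stage2 variable names are made up for illustration.

```shell
# Sample data copied from the question (stands in for test.dat).
sample='92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755'

# Part 1: fold every line that starts with a comma into the line before it.
stage1=$(printf '%s\n' "$sample" | sed ':a;N;s/\n,/,/;ta;P;D')

# Part 2: break the line at the 5th comma, then copy the first four
# fields in front of the field that landed after the newline.
stage2=$(printf '%s\n' "$stage1" | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/')

printf 'after part 1:\n%s\n' "$stage1"
printf 'after part 2:\n%s\n' "$stage2"
```

Because part 2 splits only at the 5th comma, it handles one extra field per record, which is all the sample needs.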