Joining Columns on Command Line with Paste or PR N

Posted 2019-07-07 03:48

Question:

So I have two files that I want to take columns out of and join them in a single file.

f1:

02/10/2013,16:00:00.091,123.82,OTCX,GLO,,123.82
02/10/2013,16:00:03.072,123.766,FXN,NAM,,123.766
02/10/2013,16:00:03.491,123.769,FXN,,,123.769
02/10/2013,16:00:03.565,123.79,COMM,ASI,HKG,123.79
02/10/2013,16:00:03.721,123.769,FXN,NAM,NYC,123.769
02/10/2013,16:00:04.194,123.81,AKM,EUR,MOW,123.81
02/10/2013,16:00:06.130,123.764,FXN,NAM,NYC,123.764
02/10/2013,16:00:06.330,123.764,FXN,,,123.764
02/10/2013,16:00:08.989,123.766,FXN,,,123.766
02/10/2013,16:00:09.034,123.791,FXN,,,123.791

f2:

02/10/2013,16:00:00.091,123.82,123.83,OTCX,GLO,
02/10/2013,16:00:03.072,123.766,123.888,FXN,NAM,
02/10/2013,16:00:03.491,123.769,123.888,FXN,,
02/10/2013,16:00:03.565,123.79,123.87,COMM,ASI,HKG
02/10/2013,16:00:03.721,123.769,123.891,FXN,NAM,NYC
02/10/2013,16:00:04.194,123.81,123.85,AKM,EUR,MOW
02/10/2013,16:00:06.130,123.764,123.891,FXN,NAM,NYC
02/10/2013,16:00:06.330,123.764,123.888,FXN,,
02/10/2013,16:00:08.989,123.766,123.886,FXN,,
02/10/2013,16:00:09.034,123.791,123.861,FXN,,

I saw the reference to a previous SO question here: How to paste columns from separate files using bash?

but for some reason neither the paste nor the pr command works for this data set. Instead, paste -d <(cut -d "," -f 3,7 f1) <(cat f2) just prepends a comma to every line of f2:

,02/10/2013,16:00:00.091,123.82,123.83,OTCX,GLO,
,02/10/2013,16:00:03.072,123.766,123.888,FXN,NAM,
,02/10/2013,16:00:03.491,123.769,123.888,FXN,,
,02/10/2013,16:00:03.565,123.79,123.87,COMM,ASI,HKG
,02/10/2013,16:00:03.721,123.769,123.891,FXN,NAM,NYC
,02/10/2013,16:00:04.194,123.81,123.85,AKM,EUR,MOW
,02/10/2013,16:00:06.130,123.764,123.891,FXN,NAM,NYC
,02/10/2013,16:00:06.330,123.764,123.888,FXN,,
,02/10/2013,16:00:08.989,123.766,123.886,FXN,,
,02/10/2013,16:00:09.034,123.791,123.861,FXN,, 

pr -mts, yields the same behavior as paste.

Any advice on why these files are behaving differently?

Thanks!

Answer 1:

Note that your command gives the -d option no value, so paste takes the next argument as its delimiter list.

To put columns 3 and 7 at the beginning of the "f2" lines, separated with a comma:

paste -d, <(cut -d, -f 3,7 f1) f2

Accounting for CRLF line endings:

paste -d, <(sed 's/\r$//' f1 | cut -d, -f 3,7) <(sed 's/\r$//' f2)
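Why the stray CR matters can be seen with a small reproduction. This is only a sketch: the scratch directory and the single sample row per file are made up for illustration, and it uses a pipe with paste reading "-" (stdin) plus tr instead of process substitution and GNU sed, so it should also run in a plain POSIX sh.

```shell
# Hypothetical scratch files holding one CRLF-terminated row each.
tmpdir=$(mktemp -d)
printf '02/10/2013,16:00:00.091,123.82,OTCX,GLO,,123.82\r\n' > "$tmpdir/f1"
printf '02/10/2013,16:00:00.091,123.82,123.83,OTCX,GLO,\r\n' > "$tmpdir/f2"

# Without stripping: field 7 of f1 still ends in CR, so a CR lands in the
# middle of the joined line. A terminal moves the cursor back to column 0
# there, which makes the f2 part appear to overwrite the f1 columns.
raw=$(cut -d, -f 3,7 "$tmpdir/f1" | paste -d, - "$tmpdir/f2")
printf '%s\n' "$raw" | od -c    # the byte dump makes the \r visible

# With stripping: remove CR from both inputs before joining.
tr -d '\r' < "$tmpdir/f2" > "$tmpdir/f2.clean"
clean=$(tr -d '\r' < "$tmpdir/f1" | cut -d, -f 3,7 | paste -d, - "$tmpdir/f2.clean")
printf '%s\n' "$clean"
# prints: 123.82,123.82,02/10/2013,16:00:00.091,123.82,123.83,OTCX,GLO,

rm -rf "$tmpdir"
```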


Answer 2:

Combining the two answers (jaypal's comments plus glenn's answer), the cause turned out to be the CR at the end of each line.

For completeness, one way to diagnose this problem is to run:

file filename

If it returns something like "ASCII text, with CRLF line terminators", then you know that you have this problem. CR = '\r' and LF = '\n', so every line was actually terminating with '\r\n'.
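If the file utility is not installed, a rough equivalent is to count the lines that end in a carriage return. The helper name count_crlf_lines below is made up for this sketch; it relies only on printf and grep.

```shell
# count_crlf_lines FILE
# Hypothetical helper: prints how many lines of FILE end with CR (i.e. CRLF).
count_crlf_lines() {
    cr=$(printf '\r')              # a literal carriage return character
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps the helper from reporting a failure in that case.
    grep -c "${cr}\$" "$1" || true
}
```

Usage: count_crlf_lines f1 prints 0 for a file with Unix line endings and the number of affected lines otherwise. Dumping the first line with head -n 1 f1 | od -c is another way to see whether a \r precedes the \n.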

Running tr -d "\r" < f1 > tmp && mv tmp f1 will delete all of the extra '\r' characters from the file and then allow paste to work as expected. Repeat for f2 if it has the same line endings.
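The cleanup can be wrapped in a small function so that it is applied to several files at once. This is a sketch under assumptions: strip_cr is a made-up name, the temporary file is written next to the original as FILE.tmp, and tr removes every CR in the file, not only those at line ends.

```shell
# strip_cr FILE...
# Hypothetical helper: rewrites each FILE in place with all CR bytes removed.
strip_cr() {
    for f in "$@"; do
        # Write to a sibling temp file first; replace the original only
        # if tr succeeded, so a failure leaves the input untouched.
        tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f" || return 1
    done
}
```

Usage: strip_cr f1 f2, after which the paste -d, command from the first answer can be run on the files directly.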