Merging two files by a single column in unix

Posted 2020-02-10 13:27

I would like to merge two files by one column in unix.

I have file_a:

subjectid name age  
12 Jane 16  
24 Kristen 90  
15 Clarke 78  
23 Joann 31  

I have another file_b:

subjectid prob_disease  
12 0.009  
24 0.738  
15 0.392  
23 1.2E-5  

I would like to merge these files on the command line, joining file_a and file_b by subjectid. Each file is about 2 million lines long; I tried this in R, but it froze because of the amount of data. Could someone please help me do this in Linux? Desired output:

subjectid prob_disease name age  
12 0.009 Jane 16  
24 0.738 Kristen 90   
15 0.392 Clarke 78  
23 1.2E-5 Joann 31

Please help and thank you!

Tags: linux unix merge
2 answers
甜甜的少女心
Reply #2 · 2020-02-10 13:57

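Check out join(1). For this sample you don't even need any flags, but note that join expects both inputs to be sorted on the join field; it works here only because both files list the subject IDs in the same order: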

$ join file_b file_a
subjectid prob_disease name age
12 0.009 Jane 16
24 0.738 Kristen 90
15 0.392 Clarke 78
23 1.2E-5 Joann 31
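For the full 2-million-line files, which are unlikely to share the same order, here is a minimal sketch of the sort-then-join approach. It assumes each file has exactly one header line and whitespace-separated columns. The helper name merge_by_id and the scratch file paths are hypothetical, and the demo rebuilds the sample data from the question. The header lines are joined separately so that sort does not move them into the data:

```shell
#!/usr/bin/env bash
# Sketch: merge two whitespace-separated files on column 1 (subjectid),
# assuming each file has exactly one header line. Needs bash for <( ).
set -euo pipefail
export LC_ALL=C  # sort and join must agree on collation; C is byte-wise

# merge_by_id is a hypothetical helper: prints FILE_B's columns, then FILE_A's.
merge_by_id() {
  local b=$1 a=$2
  # Combined header: join the two header lines on their first field.
  join <(head -n 1 "$b") <(head -n 1 "$a")
  # Data lines: drop the header, sort on field 1, then join.
  join <(tail -n +2 "$b" | sort -k1,1) <(tail -n +2 "$a" | sort -k1,1)
}

# Demo on the sample data from the question, in a scratch directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'subjectid name age' '12 Jane 16' '24 Kristen 90' \
  '15 Clarke 78' '23 Joann 31' > "$dir/file_a"
printf '%s\n' 'subjectid prob_disease' '12 0.009' '24 0.738' \
  '15 0.392' '23 1.2E-5' > "$dir/file_b"

merge_by_id "$dir/file_b" "$dir/file_a" > "$dir/merged"
cat "$dir/merged"
```

Setting LC_ALL=C makes sort and join use the same byte-wise ordering, which also tends to make sorting large files faster. The data rows come out in sorted subjectid order, not in the original order.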
够拽才男人
Reply #3 · 2020-02-10 14:03

You're looking for the join command:

$ cat test.1
12 Jane 16
24 Kristen 90
15 Clarke 78
23 Joann 31 
$ cat test.2
12 0.009
24 0.738
15 0.392
23 1.2E-5 
$ join -j 1 -o 2.1,2.2,1.2,1.3 <(sort test.1) <(sort test.2)
12 0.009 Jane 16
15 0.392 Clarke 78
23 1.2E-5 Joann 31
24 0.738 Kristen 90
$ 
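If the original row order matters, one option is a hash join in awk. This is a different technique from the join(1) used in the answers: read file_a into an associative array keyed on subjectid, then stream file_b and append the matching columns. Here is a minimal sketch. It assumes subjectid is unique in file_a and that file_a fits in memory. The demo recreates the question's sample data in a scratch directory:

```shell
#!/usr/bin/env bash
# Sketch (alternative technique, not join(1)): awk hash join that keeps
# file_b's original line order. Assumes subjectid is unique in file_a and
# that file_a fits in memory.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 'subjectid name age' '12 Jane 16' '24 Kristen 90' \
  '15 Clarke 78' '23 Joann 31' > "$dir/file_a"
printf '%s\n' 'subjectid prob_disease' '12 0.009' '24 0.738' \
  '15 0.392' '23 1.2E-5' > "$dir/file_b"

# First file (NR == FNR): blank field 1 and store the rest of the line
# (" name age") keyed by subjectid.
# Second file: for each line whose subjectid was seen, append the stored columns;
# ids missing from file_a are skipped (an inner join, like join's default).
awk 'NR == FNR { k = $1; $1 = ""; rest[k] = $0; next }
     ($1 in rest) { print $0 rest[$1] }' "$dir/file_a" "$dir/file_b" > "$dir/merged"
cat "$dir/merged"
```

Unlike join, this needs no sorting and reproduces the desired output in file_b's order. The cost is that all of file_a is held in memory.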