Merging two CSV files using Python

Published 2019-09-01 07:55

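OK, I have read multiple threads here on Stack Overflow. I thought this would be fairly easy to do, but I find that I do not yet have a very good grasp of Python. I tried the example in "How to merge 2 CSV files with common column value, but both files have different number of lines", and that was helpful, but I still do not have the result I was hoping to achieve.

Basically I have 2 CSV files with a common first column. I would like to merge the 2, i.e.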

filea.csv

title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921

fileb.csv

title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956

output.csv (not what I am getting, but what I want)

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956

output.csv (the output that I actually got)

title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951

The code I tried:

'''
testing merging of 2 csv files
'''
import csv
import array
import os

with open('Z:\\Desktop\\test\\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}

with open('Z:\\Desktop\\test\\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}

print str(dict1)
print str(dict2)

keys = set(dict1.keys() + dict2.keys())
with open('Z:\\Desktop\\test\\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])

Any help is greatly appreciated.

Answer 1:

When I'm working with CSV files, I often use the pandas library. It makes things like this very easy. For example:

import pandas as pd

a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)

Some explanation follows. First, we read in the CSV files:

>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN

We see there is an extra column in the data (note that the first line of fileb.csv, title,mar,apr,may,jun, has an extra comma at the end). We can get rid of that easily enough:

>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000
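
Note that dropna(axis=1) drops every column that contains any missing value, so it could also remove a real data column that merely has a gap. A more targeted sketch, assuming the stray column always gets a pandas-generated name starting with "Unnamed" (the in-memory fileb_text is a stand-in for fileb.csv, used only to keep the example self-contained):

```python
import io

import pandas as pd

# Stand-in for fileb.csv, including the trailing comma in the header.
fileb_text = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""

b = pd.read_csv(io.StringIO(fileb_text))

# Keep only the columns whose header was not auto-generated by pandas.
b = b.loc[:, ~b.columns.str.startswith("Unnamed")]

print(list(b.columns))
```

This leaves columns with genuine gaps in place and removes only the one created by the trailing comma.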

Now we can merge a and b on the title column:

>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000
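
By default merge performs an inner join, so a title that appears in only one file is dropped. If the two files can have different rows, as in the question linked by the asker, passing how='outer' keeps those rows and fills the gaps with NaN. A small sketch with made-up data (the "extra" and "only_b" rows are hypothetical and not part of the files above):

```python
import pandas as pd

# Two small frames that share only some titles.
a = pd.DataFrame({"title": ["darn", "ok", "extra"], "jan": [0.421, 1.036, 0.5]})
b = pd.DataFrame({"title": ["darn", "ok", "only_b"], "mar": [0.631, 1.001, 0.7]})

inner = a.merge(b, on="title")               # only titles present in both
outer = a.merge(b, on="title", how="outer")  # every title from either frame

print(inner)
print(outer)
```

For the files in the question the titles match exactly, so both variants give the same rows.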

Finally, we write this out:

>>> merged.to_csv("output.csv", index=False)

producing:

title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
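
The last value comes out as 956.0 rather than 956 because pandas parsed the jun column as floats. If the values should be written back exactly as they were read, one option is to read everything as strings with dtype=str. A sketch, using in-memory stand-ins for the two files (filea_text and fileb_text are illustrative names, not part of the original answer):

```python
import io

import pandas as pd

filea_text = """title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
"""
fileb_text = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""

# dtype=str disables numeric parsing, so "956" stays "956".
a = pd.read_csv(io.StringIO(filea_text), dtype=str)
b = pd.read_csv(io.StringIO(fileb_text), dtype=str).dropna(axis=1)

merged = a.merge(b, on="title")

# With no path argument, to_csv returns the CSV text instead of writing a file.
output_text = merged.to_csv(index=False)
print(output_text)
```

The trade-off is that the columns are no longer numeric, so any arithmetic on them would need an explicit conversion first.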


Answer 2:

You need to store all of the extra columns from each file in your dictionaries, not just one of them:

dict1 = {row[0]: row[1:] for row in r}
...
dict2 = {row[0]: row[1:] for row in r}

Then, since the values in the dictionaries are lists, you just need to concatenate the lists together:

w.writerows([[key] + dict1.get(key, []) + dict2.get(key, []) for key in keys])
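
Putting these pieces together, here is a minimal sketch of the whole script written for Python 3 (the question's code is Python 2). The names read_rows and merge_csv_text, and the use of in-memory strings instead of file paths, are assumptions made to keep the example self-contained. It also strips the empty trailing field caused by the extra comma in fileb.csv's header, which the snippets above do not handle.

```python
import csv
import io

filea_text = """title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
"""
fileb_text = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""


def read_rows(text):
    """Map each row's first column to the list of its remaining columns."""
    rows = {}
    for row in csv.reader(io.StringIO(text)):
        # Strip empty trailing fields left behind by a trailing comma.
        while row and row[-1] == "":
            row.pop()
        if row:  # skip blank lines
            rows[row[0]] = row[1:]
    return rows


def merge_csv_text(text_a, text_b):
    """Join two CSV texts on their first column and return the merged CSV text."""
    dict1 = read_rows(text_a)
    dict2 = read_rows(text_b)
    # Dicts keep insertion order (Python 3.7+), so rows stay in file order:
    # keys from the first file, then keys found only in the second.
    keys = list(dict1) + [k for k in dict2 if k not in dict1]
    out = io.StringIO()
    w = csv.writer(out, lineterminator="\n")
    w.writerows([key] + dict1.get(key, []) + dict2.get(key, []) for key in keys)
    return out.getvalue()


result = merge_csv_text(filea_text, fileb_text)
print(result)
```

The header lines are treated as ordinary rows keyed by "title", which works here because both files use the same name for the first column. If a key is missing from one file, its row simply comes out shorter, so the columns would no longer line up. When writing to a real file in Python 3, open it with open(path, 'w', newline='') rather than 'wb'.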


Source: Merging two CSV files using Python