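OK, I've read several threads here on Stack Overflow. I thought this would be fairly easy, but I'm finding that I don't have a good grasp of Python yet. I tried the example in "How to combine 2 csv files with common column value, but both files have different number of lines". It was helpful, but I still don't get the result I'm hoping for.
Basically, I have 2 CSV files with a common first column. I would like to merge the two, i.e.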
filea.csv
title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
fileb.csv
title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
output.csv (not what I'm getting, but what I want)
title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956
output.csv (the output I actually got)
title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951
The code I tried:
'''
testing merging of 2 csv files
'''
import csv
import array
import os
with open('Z:\\Desktop\\test\\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}
with open('Z:\\Desktop\\test\\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}
print str(dict1)
print str(dict2)
keys = set(dict1.keys() + dict2.keys())
with open('Z:\\Desktop\\test\\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])
Any help is greatly appreciated.
When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:
import pandas as pd
a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)
Some explanation follows. First, we read in the CSV files:
>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN
We see there's an extra column of data (note that the first line of fileb.csv, "title,mar,apr,may,jun,", has an extra comma at the end). We can get rid of it easily enough:
>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000
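The extra column comes from the header line, and this can be checked without pandas. A minimal sketch using only the standard csv module, with the two lines copied from fileb.csv (the variable names are just for illustration):

```python
import csv

# The header of fileb.csv ends with a comma; the data rows do not.
header_line = "title,mar,apr,may,jun,"
data_line = "darn,0.631,1.321,0.951,1.751"

header = next(csv.reader([header_line]))
first_row = next(csv.reader([data_line]))

# The trailing comma produces one extra, empty field in the header.
print(header)
print(len(header), len(first_row))
```

The header parses to six fields (the last one an empty string) while each data row has five, which is why pandas shows an unnamed sixth column filled with NaN.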
Now we can merge a and b on the title column:
>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000
Finally, we write this out:
>>> merged.to_csv("output.csv", index=False)
producing:
title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
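Two details may matter for files that are less tidy than the ones above. By default, dropna(axis=1) drops any column containing at least one missing value, and merge keeps only titles present in both files. The sketch below is a variation under assumptions, not part of the original walk-through: the inline data is hypothetical (fileb gains one genuinely empty cell and an extra row "four"), and it uses how="all" and how="outer" to keep that data.

```python
import io
import pandas as pd

# Inline stand-ins for filea.csv and fileb.csv (hypothetical data:
# fileb has the trailing comma, one empty "apr" cell, and an extra row).
FILE_A = """title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
"""
FILE_B = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,,2.456,0.3216
three,0.285,1.283,0.924,956
four,0.1,0.2,0.3,0.4
"""

a = pd.read_csv(io.StringIO(FILE_A))
b = pd.read_csv(io.StringIO(FILE_B))

# how="all" drops only columns in which every value is missing, so the
# phantom "Unnamed: 5" column goes away but "apr" survives.
b = b.dropna(axis=1, how="all")

# how="outer" keeps titles that appear in only one of the two files.
merged = a.merge(b, on="title", how="outer")
print(merged.to_csv(index=False))
```

With the original files from the question, both variations give the same result as the plain dropna(axis=1) and merge calls, since no data cell is empty and both files list the same titles.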
You need to store all of the extra fields from each row in your dictionaries, not just one of them:
dict1 = {row[0]: row[1:] for row in r}
...
dict2 = {row[0]: row[1:] for row in r}
Then, since the values in the dictionaries are lists, you just need to concatenate the lists together:
w.writerows([[key] + dict1.get(key, []) + dict2.get(key, []) for key in keys])
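Put together, the dictionary approach might look like the following sketch for Python 3. The helper names load_rows and merge_tables are made up for illustration, the file contents are inlined so the example runs on its own, and two choices are assumptions beyond the answer above: empty trailing fields are stripped (to absorb the trailing comma in fileb.csv's header), and keys are kept in first-seen order instead of going through a set.

```python
import csv
import io

FILE_A = """title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
"""
FILE_B = """title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
"""


def load_rows(lines):
    """Map each row's first field to a list of its remaining fields."""
    table = {}
    for row in csv.reader(lines):
        # Assumption: an empty trailing field is an artifact of a
        # trailing comma, not real data, so strip it.
        while row and row[-1] == "":
            row.pop()
        if row:  # skip blank lines
            table[row[0]] = row[1:]
    return table


def merge_tables(first, second):
    """Join two tables on their keys, keeping first-seen key order."""
    keys = list(first) + [key for key in second if key not in first]
    return [[key] + first.get(key, []) + second.get(key, []) for key in keys]


# The header rows are merged like any other row, keyed on "title".
merged = merge_tables(load_rows(io.StringIO(FILE_A)),
                      load_rows(io.StringIO(FILE_B)))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

When reading and writing real files in Python 3, open them in text mode with newline='' (for example open('output.csv', 'w', newline='')) rather than the 'wb' mode used in the Python 2 code. Note that a title missing from one file yields a shorter row here, just as with the one-liner above.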