OK I have read several threads here on Stack Overflow. I thought this would be fairly easy for me to do but I find that I still do not have a very good grasp of Python. I tried the example located at How to combine 2 csv files with common column value, but both files have different number of lines and that was helpful but I still do not have the results that I was hoping to achieve.
Essentially I have 2 csv files with a common first column. I would like to merge the 2. i.e.
filea.csv
title,stage,jan,feb darn,3.001,0.421,0.532 ok,2.829,1.036,0.751 three,1.115,1.146,2.921
fileb.csv
title,mar,apr,may,jun, darn,0.631,1.321,0.951,1.751 ok,1.001,0.247,2.456,0.3216 three,0.285,1.283,0.924,956
output.csv (not the one I am getting but what I want)
title,stage,jan,feb,mar,apr,may,jun darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751 ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216 three,1.115,1.146,2.921,0.285,1.283,0.924,956
output.csv (the output that I actually got)
title,feb,may ok,0.751,2.456 three,2.921,0.924 darn,0.532,0.951
The code I was trying:
'''
testing merging of 2 csv files
'''
import csv
import array
import os
with open('Z:\\Desktop\\test\\filea.csv') as f:
r = csv.reader(f, delimiter=',')
dict1 = {row[0]: row[3] for row in r}
with open('Z:\\Desktop\\test\\fileb.csv') as f:
r = csv.reader(f, delimiter=',')
#dict2 = {row[0]: row[3] for row in r}
dict2 = {row[0:3] for row in r}
print str(dict1)
print str(dict2)
keys = set(dict1.keys() + dict2.keys())
with open('Z:\\Desktop\\test\\output.csv', 'wb') as f:
w = csv.writer(f, delimiter=',')
w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])
Any help is greatly appreciated.
When I'm working with
csv
files, I often use the pandas library. It makes things like this very easy. For example:Some explanation follows. First, we read in the csv files:
and we see there's an extra column of data (note that the first line of
fileb.csv
--title,mar,apr,may,jun,
-- has an extra comma at the end). We can get rid of that easily enough:Now we can merge
a
andb
on the title column:and finally write this out:
producing:
You need to store all of the extra rows in the files in your dictionary, not just one of them:
Then, since the values in the dictionaries are lists, you need to just concatenate the lists together: