Python: `paste' multiple (unknown) csvs together

Posted 2019-09-15 10:13

Question:

What I am essentially looking for is the `paste' command in bash, but in Python2. Suppose I have a csv file:

a1,b1,c1,d1
a2,b2,c2,d2
a3,b3,c3,d3

And another such:

e1,f1
e2,f2
e3,f3

I want to pull them together into this:

a1,b1,c1,d1,e1,f1
a2,b2,c2,d2,e2,f2
a3,b3,c3,d3,e3,f3

This is the simplest case, where the number of files is known and there are only two. What if I wanted to do this with an arbitrary number of files, without knowing in advance how many there are?

I am thinking along the lines of using zip with a list of csv.reader iterables. There will be some unpacking involved, but it seems this much Python-fu is above my IQ level ATM. Can someone suggest how to implement this idea, or something completely different?

I suspect this should be doable with a short snippet. Thanks.

Answer 1:

Assuming the number of files is unknown, and that all the files are properly formatted as CSV and have the same number of lines:

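files = ['csv1', 'csv2', 'csv3']
# A list (not a lazy map object) so the files can be looped over repeatedly
fs = [open(name) for name in files]

done = False

while not done:
    chunks = []
    for f in fs:
        try:
            # Read the next line from this file and strip surrounding whitespace
            l = next(f).strip()
            chunks.append(l)
        except StopIteration:
            # Any file running out ends the merge
            done = True
            break
    if not done:
        # Parenthesized print works in both Python 2 and 3
        print(','.join(chunks))

for f in fs:
    f.close()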

There seems to be no easy way to use context managers with a variable-length list of files, at least in Python 2 (see a comment in the accepted answer here), so the files have to be closed manually, as above.
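For readers on Python 3.3 or later, contextlib.ExitStack does allow a with block over a variable number of files. The sketch below combines it with the zip-over-csv.reader idea from the question. The function name paste_csv and its out parameter are made up for illustration, and it assumes every file has the same number of rows (zip stops at the shortest file).

```python
import csv
import sys
from contextlib import ExitStack


def paste_csv(paths, out=sys.stdout):
    """Write the rows of all CSV files in `paths` side by side to `out`."""
    with ExitStack() as stack:
        # ExitStack closes every file registered here when the block exits.
        readers = [csv.reader(stack.enter_context(open(p, newline="")))
                   for p in paths]
        writer = csv.writer(out, lineterminator="\n")
        # zip(*readers) yields one tuple of rows (one row per file) at a time.
        for rows in zip(*readers):
            # Flatten the per-file rows into a single output row.
            merged = [field for row in rows for field in row]
            writer.writerow(merged)
```

Calling paste_csv(['csv1', 'csv2', 'csv3']) would print the merged rows to standard output. Going through csv.reader and csv.writer rather than joining raw lines keeps quoted fields that contain commas intact.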



Answer 2:

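# Open both files; the with block closes them when done
with open("file1.csv", "r") as file1, open("file2.csv", "r") as file2:
    for line in file1:
        # print() already appends a newline, so none is added here
        print(line.strip().strip(",") + "," + file2.readline().strip())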

This can be extended to as many files as you wish: just keep adding to the print statement. Instead of printing, you can also append to a list or do whatever else you wish. You may have to worry about the files having different lengths; I did not, as you did not specify.
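If the files can differ in length, one possible approach, sketched here for Python 3 (in Python 2 the function is itertools.izip_longest), is to pad exhausted files with an empty string, which is roughly what bash paste does. The helper name paste_lines and the empty-string padding are assumptions for illustration. Note that the padding is a single empty field, so column alignment is not preserved once a multi-column file runs out.

```python
from itertools import zip_longest


def paste_lines(paths, sep=","):
    """Yield merged lines; exhausted files contribute an empty string."""
    files = [open(p) for p in paths]
    try:
        # zip_longest keeps going until the longest file is exhausted.
        for lines in zip_longest(*files, fillvalue=""):
            # Drop only the line ending, then join one line from each file.
            yield sep.join(line.rstrip("\r\n") for line in lines)
    finally:
        # Runs when the generator finishes or is closed.
        for f in files:
            f.close()
```

Usage would look like `for row in paste_lines(["file1.csv", "file2.csv"]): print(row)`.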



Answer 3:

You could try pandas.

In your case, the groups [a,b,c,d] and [e,f] can each be treated as a pandas DataFrame, and joining them is easy because pandas has a function called concat.

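import pandas as pd

# define group [a-d] as df1; header=None because the files have no header row
df1 = pd.read_csv('1.csv', header=None)
# define group [e-f] as df2
df2 = pd.read_csv('2.csv', header=None)

# concat takes a list of DataFrames; axis=1 places them side by side
merged = pd.concat([df1, df2], axis=1)
# to_csv with no path returns the CSV text; skip the index and column labels
print(merged.to_csv(header=False, index=False))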


Tags: python csv zip