Loading multiple csv files of a folder into one da

2020-01-29 12:29发布

问题:

i have multiple csv files saved in one folder with the same column layout and want to load it into python as a dataframe in pandas.

The question is really simliar to this thread.

I am using the following code:

import glob
import pandas as pd
salesdata = pd.DataFrame()
for f in glob.glob("TransactionData\Promorelevant\*.csv"):
    appenddata = pd.read_csv(f, header=None, sep=";")
    salesdata = salesdata.append(appenddata,ignore_index=True)

Is there a better solution for it with another package?

This is taking to much time.

Thanks

回答1:

I suggest use list comprehension with concat:

import glob
import pandas as pd

files = glob.glob("TransactionData\Promorelevant*.csv")
dfs = [pd.read_csv(f, header=None, sep=";") for f in files]

salesdata = pd.concat(dfs,ignore_index=True)


回答2:

With a help from link to actual answer

This seems to be the best one liner:

import glob, os    
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "*.csv"))))


回答3:

Maybe using bash will be faster:

head -n 1 "TransactionData/Promorelevant/0.csv" > merged.csv
tail -q -n +2 TransactionData/Promorelevant*.csv >> merged.csv

Or if using from within a jupyter notebook

!head -n 1 "TransactionData/Promorelevant/0.csv" > merged.csv
!tail -q -n +2 "TransactionData/Promorelevant*.csv" >> merged.csv

The idea being that you won't need to parse anything.

The first command copies the header of one of the files. You can skip this line if you don't have a header. Tail skips the headers for all the files and adds them to the csv.

Appending in Python is probably more expensive.

Of course, make sure your parse is still valid using pandas.

pd.read_csv("merged.csv")

Curious to your benchmark.



回答4:

i checked all this approaches except the bash one with the time function (only one run, and also note that the files are on a shared drive).

Here are the results:

My approach: 1220.49

List comphrension+concat: 1135.53

concat+map+join: 1116.31

I will go for list comphrension+concat which will save me some minutes and i feel quite familiar with.

Thanks for your ideas.