I would like to read several csv files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far:
import glob
import pandas as pd
# get data file names
path = r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")
dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))
# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)
I guess I need some help with the for loop?
Edit: I googled my way into https://stackoverflow.com/a/21232849/186078. However, of late I am finding it faster to do any manipulation using numpy and then assign the result once to a dataframe, rather than manipulating the dataframe itself iteratively, and it seems to work in this solution too.
I sincerely want anyone hitting this page to consider this approach, but I don't want to attach this huge piece of code as a comment and make it less readable.
You can leverage numpy to really speed up the dataframe concatenation.
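The snippet for this is not shown here, so the following is only a minimal sketch of the idea and not the original code. It assumes every file has the same columns in the same order; `concat_csvs_numpy` is a hypothetical helper name. Whether this is actually faster than `pd.concat` will depend on your data, so time it on your own files.

```python
import glob
import os

import numpy as np
import pandas as pd


def concat_csvs_numpy(folder):
    """Stack the raw values of every CSV in `folder` and build one DataFrame."""
    # Sort so the row order is reproducible across runs.
    filenames = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not filenames:
        raise ValueError(f"no CSV files found in {folder!r}")

    frames = [pd.read_csv(name) for name in filenames]
    columns = frames[0].columns

    # Assumption: every file has exactly these columns, in this order.
    for name, frame in zip(filenames, frames):
        if not frame.columns.equals(columns):
            raise ValueError(f"{name} has different columns")

    # Stack the underlying arrays once, then assign to a DataFrame once.
    stacked = np.vstack([frame.to_numpy() for frame in frames])
    return pd.DataFrame(stacked, columns=columns)
```

Note that if the columns have mixed types (say, strings and numbers), the stacked array falls back to `object` dtype, and you may need to convert the columns of the result afterwards.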
Timing stats:
The Dask library can read a dataframe from multiple files:
(Source: http://dask.pydata.org/en/latest/examples/dataframe-csv.html)
The Dask dataframes implement a subset of the Pandas dataframe API. If all the data fits into memory, you can call df.compute() to convert the dataframe into a Pandas dataframe.
If you have the same columns in all your csv files then you can try the code below. I have added header=0 so that after reading csv first row can be assigned as the column names.
If you want to search recursively (Python 3.5 or above), you can do the following:
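A minimal sketch of the recursive search, assuming all files share the same columns. `concat_csvs_matching` is a hypothetical name and the pattern in the usage comment is only a placeholder. It also passes `header=0`, so the first row of each file is used for the column names.

```python
from glob import iglob

import pandas as pd


def concat_csvs_matching(pattern):
    """Concatenate every CSV matched by `pattern`; "**" matches nested folders."""
    # recursive=True is what makes "**" span any number of subdirectories.
    # pd.concat raises ValueError if the pattern matches no files.
    all_rec = iglob(pattern, recursive=True)
    dataframes = (pd.read_csv(f, header=0) for f in all_rec)
    return pd.concat(dataframes, ignore_index=True)


# Usage (placeholder pattern):
# big_dataframe = concat_csvs_matching(r"C:\your\path\**\*.csv")
```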
Note that the three last lines can be expressed in one single line:
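As a sketch, the three steps (iterate over the matching paths, read each file, concatenate) collapse into a single expression like this; `concat_csvs_one_line` is a hypothetical name and `pattern` is whatever glob pattern you are using.

```python
from glob import iglob

import pandas as pd


def concat_csvs_one_line(pattern):
    # Find, read and concatenate in one expression.
    return pd.concat((pd.read_csv(f) for f in iglob(pattern, recursive=True)), ignore_index=True)
```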
You can find the documentation of ** in the Python glob module documentation. Also, I used iglob instead of glob, as it returns an iterator instead of a list.
EDIT: Multiplatform recursive function:
You can wrap the above into a multiplatform function (Linux, Windows, Mac), so you can do:
Here is the function:
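The function body is not shown here, so this is a sketch of what such a function might look like rather than the original. The name `read_df_rec`, its parameters, and the directory in the usage comment are assumptions. Building the pattern with `os.path.join` is what keeps it independent of the path separator.

```python
import os
from glob import iglob

import pandas as pd


def read_df_rec(path, fn_pattern="*.csv"):
    """Concatenate all files under `path` (at any depth) matching `fn_pattern`."""
    # os.path.join picks the right separator on Linux, Windows and macOS.
    pattern = os.path.join(path, "**", fn_pattern)
    return pd.concat(
        (pd.read_csv(f) for f in iglob(pattern, recursive=True)),
        ignore_index=True,
    )


# Usage (hypothetical directory):
# df = read_df_rec(r"C:\user\your\path", "*.csv")
```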
An alternative to darindaCoder's answer:
If the multiple csv files are zipped, you may use zipfile to read all and concatenate as below:
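A minimal sketch of the zipfile approach, assuming all CSV members of the archive share the same columns. `concat_csvs_from_zip` is a hypothetical name, as is the archive name in the usage comment.

```python
import zipfile

import pandas as pd


def concat_csvs_from_zip(zip_path):
    """Read every .csv member of a zip archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directories and non-CSV members.
            if not name.lower().endswith(".csv"):
                continue
            # archive.open returns a binary file-like object,
            # which read_csv can consume directly.
            with archive.open(name) as member:
                frames.append(pd.read_csv(member))
    if not frames:
        raise ValueError(f"no CSV members found in {zip_path!r}")
    return pd.concat(frames, ignore_index=True)


# Usage (hypothetical archive name):
# df = concat_csvs_from_zip("data.zip")
```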