I would like to read multiple CSV files (hundreds of files, hundreds of lines each, all with the same number of columns) from a target directory into a single pandas DataFrame.
The code below works, but it is too slow: it takes minutes to read 30 files, so loading all of my files would take far too long. What can I change to make it faster?
Besides, in the replace call, I want to replace a "_" (I don't know its encoding, but it is not a normal one) with a "-" (a normal UTF-8 hyphen). How can I do that? I use coding=latin-1 because I have French accents in the files.
# coding=latin-1
import pandas as pd
import glob

pd.set_option('expand_frame_repr', False)
path = r'D:\Python27\mypfe\data_test'
allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_, index_col=None, header=0, sep=';', dayfirst=True,
                     parse_dates=['HeurePrevue', 'HeureDebutTrajet', 'HeureArriveeSurSite', 'HeureEffective'])
    df.drop(labels=['aPaye', 'MethodePaiement', 'ArgentPercu'], axis=1, inplace=True)
    df['Sens'].replace("\n", "-", inplace=True, regex=True)
    list_.append(df)
    print "fichier lu:", file_
frame = pd.concat(list_)
print frame
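
For the unknown character, one thing that might help is to look up its Unicode code point first. The snippet below is only a minimal sketch using the standard library: the helper name `describe_non_ascii` and the sample string are made up for illustration, and it assumes the text has already been decoded to Unicode (for example through the `encoding` parameter of `read_csv`).

```python
# -*- coding: utf-8 -*-
import unicodedata


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        # Keep only characters outside the ASCII range, once each, in order.
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in seen]


# Hypothetical value standing in for one cell of the 'Sens' column.
sample = u"Aller \u2013 Retour"
info = describe_non_ascii(sample)
print(info)
```

For this made-up sample the helper reports U+2013 (EN DASH). Once the real code point is known, its escape (such as `u"\u2013"`) could presumably be used as the pattern in the `replace` call instead of `"\n"`.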