I am trying to merge & append different timeseries, importing them from csv files. I have tried the following basic code:
import pandas as pd
import numpy as np
import glob
import csv
import os
path = r'./A08_csv' # use your path
#all_files = glob.glob(os.path.join(path, "A08_B1_T5.csv"))
df5 = pd.read_csv('./A08_csv/A08_B1_T5.csv', parse_dates={'Date Time'})
df6 = pd.read_csv('./A08_csv/A08_B1_T6.csv', parse_dates={'Date Time'})
print len(df5)
print len(df6)
df = pd.concat([df5],[df6], join='outer')
print len(df)
and the result is:
12755 (df5)
24770 (df6)
12755 (df)
Shouldn't df as long as the longest of the two files (which have lots of rows in common, in terms of values on ['Date Time'] column)??
I have tried to index the data based on datetime, adding this line:
#df5.set_index(pd.DatetimeIndex(df5['Date Time']))
However I received the error:
KeyError: 'Date Time'
Any clue on why this happens?
I think you need:
df5.set_index(['Date Time'], inplace=True)
Or better in read_csv
add parameter index_col
:
import pandas as pd
import io
temp=u"""Date Time,a
2010-01-27 16:00:00,2.0
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp), index_col=['Date Time'], parse_dates=['Date Time'])
print (df)
a
Date Time
2010-01-27 16:00:00 2.0
2010-01-27 16:10:00 2.2
2010-01-27 16:30:00 1.7
print (df.index)
DatetimeIndex(['2010-01-27 16:00:00', '2010-01-27 16:10:00',
'2010-01-27 16:30:00'],
dtype='datetime64[ns]', name='Date Time', freq=None)
Another solution is add to paramaters column by order - if column Date Time
is first, add 0
to index_col
and parse_dates
(python count from 0
):
import pandas as pd
import io
temp=u"""Date Time,a
2010-01-27 16:00:00,2.0
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp), index_col=0, parse_dates=[0])
print (df)
a
Date Time
2010-01-27 16:00:00 2.0
2010-01-27 16:10:00 2.2
2010-01-27 16:30:00 1.7
print (df.index)
DatetimeIndex(['2010-01-27 16:00:00', '2010-01-27 16:10:00',
'2010-01-27 16:30:00'],
dtype='datetime64[ns]', name='Date Time', freq=None)
This is incorrect:
pd.concat([df5],[df6], join='outer')
The second argument to concat
is the axis
. Instead, you want:
pd.concat([df5, df6], join='outer')