I have a CSV file. When I read it into a pandas data frame, it looks like this:
import pandas as pd
data = pd.read_csv('test1.csv')
print(data)
The output looks like:
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
Is there a way to split the data frame based on the value in the result column? I.e., if result=0, the row goes to a new data frame data_0:
   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0
and if result=1, it goes to a data frame data_1:
   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1
Is there a pandas function that can do that, or do I have to write my own loop to create the two data frames? Thanks a lot!
Pandas lets you slice and manipulate the data in a very straightforward way. You can do the same as in Yakym's answer, accessing the column by key instead of by attribute name.
data_0 = data[data['result'] == 0]
data_1 = data[data['result'] == 1]
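One detail worth noting: plain boolean selection keeps the original row labels (0 and 3 for the result=0 rows), while the desired output in the question is numbered from 0. A minimal sketch of getting that, assuming the sample values from the question are built in memory instead of read from test1.csv, and assuming result only ever holds 0 or 1 (so that `~mask` means result == 1):

```python
import pandas as pd

# Sample data copied from the question; stands in for test1.csv
data = pd.DataFrame({
    'v1': [12, 34, 32, 7],
    'v2': [31, 52, 4, 89],
    'v3': [31, 4, 5, 2],
    'result': [0, 1, 1, 0],
})

# Boolean mask: True where result is 0
mask = data['result'] == 0

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
data_0 = data[mask].reset_index(drop=True)
# ~mask selects the remaining rows (assumes result is only ever 0 or 1)
data_1 = data[~mask].reset_index(drop=True)

print(data_0)
print(data_1)
```

If result can take other values, compare against each value explicitly rather than negating the mask.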
You can even add new columns computed from the existing ones, e.g.:
data['v_sum'] = data['v1'] + data['v2'] + data['v3']
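If there are many columns to add up, a row-wise sum over a column subset may be shorter to write. This is only a sketch of an alternative, again using the question's sample values built in memory rather than the CSV file:

```python
import pandas as pd

# Sample data copied from the question; stands in for test1.csv
data = pd.DataFrame({
    'v1': [12, 34, 32, 7],
    'v2': [31, 52, 4, 89],
    'v3': [31, 4, 5, 2],
    'result': [0, 1, 1, 0],
})

# Select the value columns and sum across each row (axis=1)
data['v_sum'] = data[['v1', 'v2', 'v3']].sum(axis=1)

print(data)
```

For this data it gives the same column as adding the three columns one by one.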
You can try creating a dictionary of DataFrames with groupby, if the column result has many different values:
print(data)
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0
# Build one DataFrame per distinct value of 'result', keyed as 'data_<value>'
datas = {}
for i, g in data.groupby('result'):
    # reset_index(drop=True) renumbers each group's rows from 0
    datas['data_' + str(i)] = g.reset_index(drop=True)
print(datas['data_0'])
   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0
print(datas['data_1'])
   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1
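The same idea can probably be written more compactly as a dict comprehension. A rough sketch, with the question's sample values built in memory; the name `frames` is just a placeholder chosen for this example:

```python
import pandas as pd

# Sample data copied from the question; stands in for test1.csv
data = pd.DataFrame({
    'v1': [12, 34, 32, 7],
    'v2': [31, 52, 4, 89],
    'v3': [31, 4, 5, 2],
    'result': [0, 1, 1, 0],
})

# Iterating over a groupby yields (group value, sub-DataFrame) pairs
frames = {
    f'data_{key}': group.reset_index(drop=True)
    for key, group in data.groupby('result')
}

print(frames['data_0'])
print(frames['data_1'])
```

This works for any number of distinct values in result, since one entry is created per group.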
df1 = data[data.result==0]
df2 = data[data.result==1]
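One caveat with attribute access like data.result: as far as I know it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. A small sketch of the difference; the column named count here is made up purely to show the clash and is not part of the question's data:

```python
import pandas as pd

# Hypothetical frame: 'count' is an invented column that clashes with DataFrame.count
df = pd.DataFrame({'result': [0, 1, 1, 0], 'count': [5, 6, 7, 8]})

# No clash: attribute access and key access return the same column
same = df.result.equals(df['result'])

# Clash: df.count is the DataFrame method, so only key access reaches the column
attr_is_method = callable(df.count)
count_column = df['count']

print(same, attr_is_method)
print(count_column)
```

Key access (data['result']) avoids the problem, which is one reason to prefer it in reusable code.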