python pandas : split a data frame based on a colu

Posted 2020-07-23 03:13

Question:

I have a CSV file. When I read it into a pandas data frame, it looks like this:

import pandas as pd

data = pd.read_csv('test1.csv')
print(data)

The output looks like:

   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0

Is there a way to split the data frame based on the value in the result column? I.e., if result=0, the row goes to a new data frame data_0:

   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0

and if result=1, it goes to a data frame data_1:

   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1

Is there a pandas function that can do that, or do I have to write my own loop to create the two data frames? Thanks a lot!

Answer 1:

Pandas lets you slice and manipulate the data in a very straightforward way. You can do the same as Yakym does, accessing the column by key instead of by attribute name.

data_0 = data[data['result'] == 0]
data_1 = data[data['result'] == 1]
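
Note that a boolean mask keeps the original row labels, whereas the desired output in the question is renumbered from 0. A minimal sketch of one way to get that, assuming the sample data is built inline rather than read from test1.csv, and assuming result only ever holds 0 or 1 (so the inverted mask selects the 1 rows):

```python
import pandas as pd

# Sample data from the question, built inline instead of read from test1.csv.
data = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})

# The mask is a boolean Series, True where result equals 0.
mask = data["result"] == 0

# Filtering keeps the original labels (0 and 3), so renumber with reset_index.
data_0 = data[mask].reset_index(drop=True)
# ~mask selects every row where result is not 0 (here: the rows with 1).
data_1 = data[~mask].reset_index(drop=True)

print(data_0)
print(data_1)
```

If result can take more than two values, an explicit comparison per value (or groupby) is safer than inverting the mask.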

You can even add new columns computed from the existing ones, e.g.:

data['v_sum'] = data['v1'] + data['v2'] + data['v3']
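
As a self-contained illustration of the column sum, here is a small sketch using the sample data from the question. The column name v_sum_alt is only a hypothetical name for comparing the two spellings:

```python
import pandas as pd

# Sample data from the question, built inline instead of read from test1.csv.
data = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})

# Column labels are strings, so they must be quoted inside the brackets.
data["v_sum"] = data["v1"] + data["v2"] + data["v3"]

# Same result: select the three columns and sum across each row (axis=1).
data["v_sum_alt"] = data[["v1", "v2", "v3"]].sum(axis=1)

print(data)
```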


Answer 2:

You can try creating a dictionary of DataFrames with groupby, which helps if the result column has many different values:

print(data)
   v1  v2  v3  result
0  12  31  31       0
1  34  52   4       1
2  32   4   5       1
3   7  89   2       0

datas = {}
for i, g in data.groupby('result'):
    #print('data_' + str(i))
    #print(g)
    # Store each group under a key such as 'data_0', with its rows renumbered from 0.
    datas['data_' + str(i)] = g.reset_index(drop=True)

print(datas['data_0'])
   v1  v2  v3  result
0  12  31  31       0
1   7  89   2       0

print(datas['data_1'])
   v1  v2  v3  result
0  34  52   4       1
1  32   4   5       1
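
The same dictionary can also be built in one expression. This is only a sketch of an equivalent spelling, written for Python 3 and using the sample data from the question built inline:

```python
import pandas as pd

# Sample data from the question, built inline instead of read from test1.csv.
data = pd.DataFrame({
    "v1": [12, 34, 32, 7],
    "v2": [31, 52, 4, 89],
    "v3": [31, 4, 5, 2],
    "result": [0, 1, 1, 0],
})

# Iterating over a groupby yields (key, sub-frame) pairs, one per distinct
# value in the result column; each sub-frame is renumbered from 0.
datas = {
    f"data_{key}": group.reset_index(drop=True)
    for key, group in data.groupby("result")
}

print(sorted(datas))
print(datas["data_1"])
```

Because the keys come from the data, this works unchanged when result has more than two distinct values.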


Answer 3:

df1 = data[data.result==0]
df2 = data[data.result==1]

Have a look at this.
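
Attribute access such as data.result is convenient, but it only reaches a column when the name is a valid Python identifier and does not clash with an existing DataFrame attribute. A small sketch of the clash, using a hypothetical frame with a column named count (not part of the question's data):

```python
import pandas as pd

# Hypothetical frame whose column name collides with the DataFrame.count method.
df = pd.DataFrame({"count": [1, 2, 1], "v1": [10, 20, 30]})

# df.count refers to the method, not the column, so it is callable.
print(callable(df.count))

# Key access always refers to the column, so the filter works as intended.
ones = df[df["count"] == 1]
print(ones)
```

For a column like result, both spellings select the same rows; key access is simply the more general one.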