Filter item iteration in dataframe (with FOR or an

Posted 2019-08-26 12:57

Question:

I have the following dataframe:

import pandas as pd

d = pd.DataFrame([['A', 1989, 100],
                  ['A', 1990, 200],
                  ['A', 2017, 100],
                  ['B', 1989, 500],
                  ['B', 1990, 200],
                  ['C', 1990, 200],
                  ['C', 19870, 400]],
                 columns=['Univers', 'year', 'amount'])

  Univers   year  amount
0       A   1989     100
1       A   1990     200
2       A   2017     100
3       B   1989     500
4       B   1990     200
5       C   1990     200
6       C  19870     400
.
.
.

I would like to filter by Univers. So far I have done this only for A, with d2 = d[d['Univers'] == 'A']:

  Univers  year  amount
0       A  1989     100
1       A  1990     200
2       A  2017     100

Now imagine I have a thousand distinct values in the Univers column (each with its corresponding rows in the dataframe). How can I do this for the remaining values in Univers using a for loop (or any other approach)?

Answer 1:

Option 1
Perform a groupby on Univers, since you need to save each group.

for i, g in d.groupby('Univers'):
    g.to_csv('{}.csv'.format(i))

This generates 3 files -

A.csv

  Univers  year  amount
0       A  1989     100
1       A  1990     200
2       A  2017     100 

B.csv

  Univers  year  amount
3       B  1989     500
4       B  1990     200 

C.csv

  Univers   year  amount
5       C   1990     200
6       C  19870     400   
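If the groups are needed in memory rather than on disk, the same groupby can feed a dict instead of to_csv. This is a minimal sketch using the question's frame d; the name groups is a hypothetical choice, not something from the original answer.

```python
import pandas as pd

d = pd.DataFrame([['A', 1989, 100],
                  ['A', 1990, 200],
                  ['A', 2017, 100],
                  ['B', 1989, 500],
                  ['B', 1990, 200],
                  ['C', 1990, 200],
                  ['C', 19870, 400]],
                 columns=['Univers', 'year', 'amount'])

# Map each distinct Univers value to the sub-frame holding its rows.
# 'groups' is a hypothetical name chosen for this sketch.
groups = {key: frame for key, frame in d.groupby('Univers')}

# Look up any single group by its key.
print(groups['B'])
```

Each value in the dict keeps the original row index, just like the CSV files above.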

Option 2
Another alternative would be to call pd.Series.unique and then filter on this condition -

for v in d.Univers.unique():
    d[d.Univers == v].to_csv('{}.csv'.format(v))

Which does the same thing. You can also use query/eval to perform filtering.
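For the query route, a rough sketch might look like the following. It assumes the question's frame d and collects the results in a dict called subsets (a hypothetical name) rather than writing files; inside the query string, @v refers to the Python loop variable v.

```python
import pandas as pd

d = pd.DataFrame([['A', 1989, 100],
                  ['A', 1990, 200],
                  ['A', 2017, 100],
                  ['B', 1989, 500],
                  ['B', 1990, 200],
                  ['C', 1990, 200],
                  ['C', 19870, 400]],
                 columns=['Univers', 'year', 'amount'])

# 'subsets' is a hypothetical container for this sketch.
subsets = {}
for v in d['Univers'].unique():
    # '@v' makes query() look up the local variable v.
    subsets[v] = d.query('Univers == @v')

print(subsets['C'])
```

Boolean indexing and query select the same rows here; which one to use is mostly a matter of readability.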



Answer 2:

This is a simple (and probably not optimized) way:

key_name = 'Univers'
univers = set(d[key_name])
for uni in univers:
    print(d[d[key_name] == uni])

Output (a set is unordered, so the groups may print in a different order):

  Univers  year  amount
0       A  1989     100
1       A  1990     200
2       A  2017     100

  Univers   year  amount
5       C   1990     200
6       C  19870     400

  Univers  year  amount
3       B  1989     500
4       B  1990     200


Answer 3:

I'm assuming that you have a list of acceptable values for "Univers" in another dataframe, let's say x:

  Univers   Col2
        A  test1
        B  test2
        C  test3

You can join the two dataframes and keep only the rows you need. Approximate syntax: result = pd.merge(d, x, on='Univers'). With the default inner join, only rows whose Univers value appears in x survive. Is that what you wanted?
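A join-based filter along these lines could be sketched as below. It uses pd.merge, the pandas function that accepts on=, and assumes a hypothetical x that lists only A and B so that the effect of the filter is visible. The isin variant is an alternative that is not part of the original answer; it keeps d's original index and does not add Col2.

```python
import pandas as pd

d = pd.DataFrame([['A', 1989, 100],
                  ['A', 1990, 200],
                  ['A', 2017, 100],
                  ['B', 1989, 500],
                  ['B', 1990, 200],
                  ['C', 1990, 200],
                  ['C', 19870, 400]],
                 columns=['Univers', 'year', 'amount'])

# Hypothetical lookup frame of acceptable values; 'C' is left out on purpose.
x = pd.DataFrame({'Univers': ['A', 'B'], 'Col2': ['test1', 'test2']})

# Inner join (the default): rows of d with no match in x are dropped,
# and Col2 from x is attached to the rows that remain.
result = pd.merge(d, x, on='Univers')

# Alternative: filter only, without bringing in columns from x.
filtered = d[d['Univers'].isin(x['Univers'])]

print(result)
print(filtered)
```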