I have this dataframe:
import pandas as pd

d = {'city': ['Barcelona', 'Madrid', 'Rome', 'Torino', 'London', 'Liverpool', 'Manchester', 'Paris'],
     'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
     'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
     'amount': [8, 7, 6, 5, 4, 3, 2, 1]}
df = pd.DataFrame(d)
I want to obtain this for each country:
españa = {'city': ['Barcelona', 'Madrid'],
          'revenue': [1, 2],
          'amount': [8, 7]}
ES = pd.DataFrame(españa)
So that in the end I will have 4 dataframes named ES,IT,UK and FR.
I have tried this so far:
a = set(df.loc[:]["country"])
for country in a:
    country = df.loc[(df["country"] == country), ['date', 'sum']]
But that only gave me one dataframe with one value.
The loop gave you all four data frames, but you threw the first three into the garbage.
You iterate through `a` with the variable `country`, but then destroy that value in the next statement, `country = ...`. Then you return to the top of the loop, reset `country` to the next two-letter abbreviation, and continue this conflict through all four nations.
If you need four data frames, you need to keep each one in a separate place. For instance:
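A minimal sketch of that idea, assuming the `df` from the question. The name `frames` is hypothetical, and the selected columns are taken from the desired output in the question rather than from the `['date','sum']` of the attempt, which the sample frame does not contain:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

# `frames` (hypothetical name) keeps each result in its own slot.
frames = {}
for code in df['country'].unique():
    # The result is stored under the country code instead of
    # overwriting the loop variable.
    frames[code] = df.loc[df['country'] == code, ['city', 'revenue', 'amount']]

es = frames['ES']  # the rows for Barcelona and Madrid
```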
Now you have a dictionary with four data frames, each one indexed by its country code. Does that help?
You can use a dictionary comprehension with `groupby`, which yields one (country code, sub-frame) pair per group.
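A minimal sketch of such a comprehension, assuming the `df` from the question. The name `by_country` is hypothetical, and dropping the `country` column and resetting the index are optional choices made here to match the desired output:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

# Iterating over a groupby object gives (key, sub-frame) pairs.
by_country = {
    code: group.drop(columns='country').reset_index(drop=True)
    for code, group in df.groupby('country')
}
```

Here `by_country['ES']` plays the role of the `ES` variable from the question.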
`country` is the loop variable, and it is being overwritten on every iteration.
In order to generate 4 different dataframes, try using a generator function.
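def country_df_generator(data):
    # Yield one sub-frame per distinct country code.
    for country in data['country'].unique():
        # Column names follow the question's attempt; adjust them to your frame.
        yield data.loc[data['country'] == country, ['date', 'sum']]

# Pass the full data frame, e.g. df from the question.
countries = country_df_generator(df)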
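The generator above does no filtering until it is iterated, and it yields the frames without their country codes. One way to consume it is sketched below, assuming the `df` from the question; `country_frames` is a hypothetical variant that also yields the code, so the results can be collected into a dictionary:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'city': ['Barcelona', 'Madrid', 'Rome', 'Torino',
             'London', 'Liverpool', 'Manchester', 'Paris'],
    'country': ['ES', 'ES', 'IT', 'IT', 'UK', 'UK', 'UK', 'FR'],
    'revenue': [1, 2, 3, 4, 5, 6, 7, 8],
    'amount': [8, 7, 6, 5, 4, 3, 2, 1],
})

def country_frames(data, columns):
    """Lazily yield (country_code, sub_frame) pairs."""
    for code in data['country'].unique():
        yield code, data.loc[data['country'] == code, columns]

# dict() drives the generator and keys each frame by its country code.
frames = dict(country_frames(df, ['city', 'revenue', 'amount']))
```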