I have a Pandas dataframe with thousands of rows. One of its columns contains the customer names, alongside their records. I want to create an individual dataframe for each customer based on their unique names. I got the unique names into a list:
customerNames = DataFrame['customer name'].unique().tolist()
This gives the following list:
['Name1', 'Name2', 'Name3', 'Name4']
I tried a loop that takes each unique name from the list above, creates a dataframe for that name, and assigns the dataframe to the customer name. So, for example, when I write Name3, it should give Name3's data as a separate dataframe:
for x in customerNames:
    x = DataFrame.loc[DataFrame['customer name'] == x]
x
The lines above returned only the dataframe for Name4 and skipped the rest.
How can I solve this problem?
To create a dataframe for each unique value in a column, create a dict of dataframes, where each key is a unique value from the column of choice and the value is a dataframe. Each dataframe can then be accessed by its key (e.g. df_names['Name1']). Calling .groupby() creates a generator, which can be unpacked: k is a unique value in the column and v is the data associated with each k.
This can be done with a for-loop and .groupby, or with a Python dictionary comprehension using .groupby.
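A minimal sketch of both .groupby variants follows. The sample data and the 'amount' column are made up for illustration; only the 'customer name' column and the df_names key access come from the thread.

```python
import pandas as pd

# Hypothetical sample data; only 'customer name' is taken from the question.
df = pd.DataFrame({
    'customer name': ['Name1', 'Name2', 'Name1', 'Name3', 'Name4', 'Name2'],
    'amount': [10, 20, 30, 40, 50, 60],
})

# for-loop: iterating over groupby yields (key, sub-dataframe) pairs
df_names = {}
for k, v in df.groupby('customer name'):
    df_names[k] = v

# the same result as a dictionary comprehension
df_names_comp = {k: v for k, v in df.groupby('customer name')}

# each customer's rows are now reachable by name
print(df_names['Name1'])
```

Either dict can then be indexed with any of the unique names, so nothing is overwritten between iterations.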
In testing, .groupby is faster than .unique: 104 ms compared to 392 ms in one reported timing, and 147 ms compared to 1.53 s in another. A for-loop is slightly faster than a comprehension, particularly for more unique column values or lots of rows (e.g. 10M).
The same dict can also be built using .unique.
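For comparison, the .unique variant might look like the sketch below. It filters the full dataframe once per name, which is a plausible reason it scales worse than .groupby. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only 'customer name' is taken from the question.
df = pd.DataFrame({
    'customer name': ['Name1', 'Name2', 'Name1', 'Name3', 'Name4', 'Name2'],
    'amount': [10, 20, 30, 40, 50, 60],
})

# one boolean-mask filter per unique name
df_names = {
    name: df[df['customer name'] == name]
    for name in df['customer name'].unique()
}

print(df_names['Name2'])
```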
Maybe I misunderstand you, but if your loop gives the right output only for the last list entry, that is because your output statement is outside the indented body of the loop. With the output moved inside the loop, you get one dataframe per name.
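As a rough sketch of that fix, reusing the variable names from the question (the sample data and the name subset are assumptions, not part of the original post):

```python
import pandas as pd

# Hypothetical sample data, named DataFrame to match the question's variable.
DataFrame = pd.DataFrame({
    'customer name': ['Name1', 'Name2', 'Name1', 'Name3', 'Name4', 'Name2'],
    'amount': [10, 20, 30, 40, 50, 60],
})
customerNames = DataFrame['customer name'].unique().tolist()

for x in customerNames:
    # use a separate variable so the loop variable is not overwritten
    subset = DataFrame.loc[DataFrame['customer name'] == x]
    # the output is inside the loop body, so every customer is shown
    print(subset)
```

This only displays each dataframe in turn; after the loop, subset still holds just the last customer's rows.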
Or, if you don't like loops, you could go with df.isin, which is better explained under: How to implement 'in' and 'not in' for Pandas dataframe
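The snippet for this approach did not survive, so here is one possible reading: a sketch that uses Series.isin to build a boolean mask. The sample data and the wanted list are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only 'customer name' is taken from the question.
df = pd.DataFrame({
    'customer name': ['Name1', 'Name2', 'Name1', 'Name3', 'Name4', 'Name2'],
    'amount': [10, 20, 30, 40, 50, 60],
})

# hypothetical selection of customers to keep
wanted = ['Name1', 'Name3']

# isin returns a boolean Series that is True where the name is in the list
selected = df[df['customer name'].isin(wanted)]
print(selected)
```

Note that this returns a single filtered dataframe rather than one dataframe per customer.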
Your current iteration overwrites x twice every time it runs: the for loop assigns a customer name to x, and then you assign a dataframe to it. To be able to call each dataframe later by name, try storing them in a dictionary.
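Something along these lines should work. The dictionary name customer_frames and the sample data are made up for illustration; the filter expression is the one from the question.

```python
import pandas as pd

# Hypothetical sample data, named DataFrame to match the question's variable.
DataFrame = pd.DataFrame({
    'customer name': ['Name1', 'Name2', 'Name1', 'Name3', 'Name4', 'Name2'],
    'amount': [10, 20, 30, 40, 50, 60],
})
customerNames = DataFrame['customer name'].unique().tolist()

# hypothetical name: maps each customer name to that customer's rows
customer_frames = {}
for name in customerNames:
    customer_frames[name] = DataFrame.loc[DataFrame['customer name'] == name]

# look a customer up by name instead of relying on a variable called Name3
print(customer_frames['Name3'])
```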