I am pulling a subset of data from a column based on conditions in another column being met.
I can get the correct values back, but the result is a pandas.core.frame.DataFrame. How do I convert that to a list?
import pandas as pd
tst = pd.read_csv('C:\\SomeCSV.csv')
lookupValue = tst['SomeCol'] == "SomeValue"
ID = tst[lookupValue][['SomeCol']]
#How To convert ID to a list
I'd like to clarify a few things:
1. The simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top-voted answer leads off with using pandas.Series.values.tolist(), since as far as I can tell it adds syntax/confusion with no added benefit.
2. tst[lookupValue][['SomeCol']] is a dataframe (as stated in the question), not a series (as stated in a comment to the question). This is because tst[lookupValue] is a dataframe, and slicing it with [['SomeCol']] asks for a list of columns (a list that happens to have a length of 1), resulting in a dataframe being returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], then you are asking for just that one column rather than a list of columns, and thus you get a series back.
3. You need a series to use pandas.Series.tolist(), so you should definitely skip the second set of brackets in this case. FYI, if you ever end up with a one-column dataframe that isn't easily avoidable like this, you can use pandas.DataFrame.squeeze() to convert it to a series.
4. tst[lookupValue]['SomeCol'] gets a subset of a particular column via chained slicing. It slices once to get a dataframe with only certain rows left, and then it slices again to get a certain column. You can get away with it here since you are just reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a series).
5. Using the syntax from #4, you can do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()
Demo Code:
Result:
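A minimal sketch of such a demo, assuming a small made-up frame in place of the CSV (the data and the OtherCol column are hypothetical; the names tst, lookupValue and ID follow the question):

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question.
tst = pd.DataFrame({
    "SomeCol": ["SomeValue", "Other", "SomeValue"],
    "OtherCol": [10, 20, 30],
})
lookupValue = tst["SomeCol"] == "SomeValue"

as_frame = tst[lookupValue][["SomeCol"]]      # list of columns -> DataFrame
as_series = tst[lookupValue]["SomeCol"]       # single column label -> Series
squeezed = as_frame.squeeze(axis="columns")   # one-column DataFrame -> Series

# Rows and column selected in a single .loc call, then converted to a list.
ID = tst.loc[lookupValue, "SomeCol"].tolist()

print(type(as_frame).__name__, type(as_series).__name__, type(squeezed).__name__)
# DataFrame Series Series
print(ID)
# ['SomeValue', 'SomeValue']
```

Passing axis="columns" to squeeze() is a defensive choice in this sketch: it keeps the result a series even if the filter happens to leave only one row.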
Use .values to get a numpy.array and then .tolist() to get a list.
For example:
Result:
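As a hedged illustration of the two steps, assuming a small made-up frame named df with integer columns a and b:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"a": [1, 3, 5, 7, 4], "b": [3, 5, 6, 2, 4]})

arr = df["a"].values      # numpy.ndarray holding the column's values
a_list = arr.tolist()     # plain Python list

print(type(arr).__name__)
# ndarray
print(a_list)
# [1, 3, 5, 7, 4]
```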
or you can just use
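Calling tolist() on the column itself is the shorter form. A sketch, again assuming a made-up frame df, that also checks both routes give the same list:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"a": [1, 3, 5, 7, 4]})

direct = df["a"].tolist()             # Series.tolist(), no .values step
via_values = df["a"].values.tolist()  # the longer route

print(direct == via_values)
# True
```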
To drop duplicates you can do one of the following:
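A few options are sketched here on a made-up column. Which of them was originally meant is an assumption; the difference worth noting is whether the order of first appearance is kept.

```python
import pandas as pd

# Made-up data with repeated values.
df = pd.DataFrame({"a": [4, 1, 4, 3, 1]})

# Keeps the first occurrence of each value, in original order.
ordered = df["a"].drop_duplicates().tolist()

# Series.unique() also returns values in order of appearance.
also_ordered = df["a"].unique().tolist()

# A set removes duplicates but does not guarantee any order.
unordered = list(set(df["a"]))

print(ordered)
# [4, 1, 3]
print(sorted(unordered))
# [1, 3, 4]
```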
You can use pandas.Series.tolist, e.g.:
Run:
You will get
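A minimal sketch of that usage, assuming a made-up two-column frame df:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

result = df["a"].tolist()
print(result)
# [1, 2, 3]
```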
The above solution is good if all the data is of the same dtype. NumPy arrays are homogeneous containers. When you do df.values, the output is a NumPy array. So if the data has int and float columns, the output will have a single common dtype (float in that case), and the columns will lose their original dtypes. Consider a df with one int column and one float column.
So if you want to keep the original dtypes, you can do something like
This will return each row as a string.
Then split each row to get a list of lists. Each element after splitting is a string (unicode), so we need to convert it to the required datatype.
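The steps above can be sketched as follows. This is an illustration under assumptions, not necessarily the exact code meant here: the frame is made up, DataFrame.to_csv is assumed as the way to get one string per row, and parse_row with its casters argument is a hypothetical helper.

```python
import pandas as pd

# Made-up frame with one float column and one int column.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4, 5, 6]})

# The whole-frame array is upcast to float64, so b's ints become floats.
mixed = df.values.tolist()

# Assumed approach: to_csv with no path returns the CSV text,
# which splits into one string per row.
row_strs = df.to_csv(header=False, index=False).splitlines()

def parse_row(row_str, casters=(float, int)):
    """Split one CSV row and cast each cell with its column's caster.

    A plain split(",") is only safe when no cell contains a comma.
    """
    return [cast(cell) for cast, cell in zip(casters, row_str.split(","))]

rows = [parse_row(r) for r in row_strs]

print(mixed[0])
# [1.0, 4.0]
print(rows[0])
# [1.0, 4]
```

The casters tuple has to be written by hand to match the column order, which is the main cost of going through strings.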