I am trying to understand where my Python script goes wrong. I have a pandas Series (diagnoses) of lists, each a list of strings (never empty). I verified this with diagnoses.map(type) and
for x in diagnoses[0]:
    print(type(x))
Yet when I map a lambda function over this Series of lists, I get TypeError: 'float' object is not iterable.
Imagine the data looking like this:
LopNr AR var3 var4 var5 var6 var7 var8 var9 var10 DIAGNOS
6 2011 S834
6 2011 K21 S834
And the code is:
from pandas import *
tobacco = lambda lst: any( (((x >= 'C30') and (x<'C40')) or ((x >= 'F17') and (x<'F18'))) for x in lst)
treatments = read_table(filename,usecols=[0,1,10])
diagnoses = treatments['DIAGNOS'].str.split(' ')
treatments['tobacco'] = diagnoses.map(tobacco)
What is going on, and how can I fix this?
PS: The same code definitely runs on a very similar Series if I import the source text file with IOpro first and build a dataframe from that adapter (see below). I am not sure why that would change the relevant datatypes; as far as I could verify, the pandas Series has lists of strings in either case. This is with Python 2.7.6 and pandas 0.13.1.
import iopro
adapter = iopro.text_adapter(filename,parser='csv',field_names=True,output='dataframe',delimiter='\t')
treatments = adapter[['LopNr','AR','DIAGNOS']][:]
The TypeError: 'float' object is not iterable could happen if the data is missing a value for DIAGNOS, for example when one row has an empty DIAGNOS field. pandas then reads that field as NaN.
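A minimal sketch of such a case, assuming a hypothetical three-column, tab-separated sample whose second row has an empty DIAGNOS field. The sample values, the reduced column set, and the use of io.StringIO in place of the real file are illustrative assumptions, not the asker's actual data:

```python
import io
import pandas as pd

# Hypothetical sample: the second data row has no DIAGNOS value.
raw = u"LopNr\tAR\tDIAGNOS\n6\t2011\tS834\n6\t2011\t\n6\t2011\tK21 S834\n"

treatments = pd.read_csv(io.StringIO(raw), sep="\t")

# The empty field is read as a missing value (NaN).
missing = treatments["DIAGNOS"].isnull()

# The missing entry is a float, not a string.
value = treatments["DIAGNOS"][1]
print(missing.tolist(), type(value))
```

Checking isnull() on the raw column is a quick way to see whether any row lacks a diagnosis before the strings are split.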
The NaN in the DIAGNOS column is the source of the problem, since str.split(' ') preserves the NaN.
The NaN gets passed to the tobacco function when diagnoses.map(tobacco) is called. Since NaN is a float and not iterable, the for x in lst loop raises the TypeError.
To avoid this error, replace the NaNs in treatments['DIAGNOS'] before splitting.
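The failure and one possible fix can be sketched as follows. This assumes a small hypothetical DIAGNOS column and uses fillna('') as the replacement, which is one option among several and not necessarily the exact call the answer originally showed. The tobacco function is the lambda from the question written as a def with chained comparisons:

```python
import numpy as np
import pandas as pd

def tobacco(lst):
    # Same test as the lambda in the question: any code in C30-C39 or F17.
    return any(("C30" <= x < "C40") or ("F17" <= x < "F18") for x in lst)

# Hypothetical DIAGNOS column with one missing value.
diagnos = pd.Series(["S834", np.nan, "C34 S834"])

# str.split leaves the NaN in place, so mapping tobacco over it fails.
split_with_nan = diagnos.str.split(" ")
try:
    split_with_nan.map(tobacco)
    raised = False
except TypeError:
    raised = True

# Assumed fix: fill the NaNs with '' before splitting.
# Note that ''.split(' ') gives [''], a one-element list, not an empty list.
diagnoses = diagnos.fillna("").str.split(" ")
flags = diagnoses.map(tobacco)
print(raised, flags.tolist())
```

Because the empty string compares below 'C30' and 'F17', rows without a diagnosis end up flagged False rather than raising an error.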