I'm using Python 2.7 on Mac OS X Lion. I'm using IPython with the Pandas 0.11.0, NumPy and Statsmodels packages.
I'm writing a function that lets the user run logistic regression on a file, specifying which variables to use in building the model, which of them should be transformed into dummy variables, and which one should be the dependent variable.
When I do the following, for instance:
cols_to_keep = []
print (df.columns)
i = eval(raw_input('How many of these variables would you like to use in logistic regression?: '))
while i != 0:
    i = i - 1
    print (df.columns)
    addTo = raw_input('Enter a variable for this list that you would like to keep and use in logistic regression.: ')
    cols_to_keep.append(addTo)
I run into problems down the road, specifically when I ask the user to pick the dependent variable from a list and then need to remove that variable from the list of training variables:
print (df.columns)
dependent = raw_input('Which of these columns would you like to be the dependent variable?: ')
training.remove(dependent)
After inserting a print statement, I found that the list of training variables looks like this:
('these are the traing variables: ', ['access', u'age_age6574', u'age_age75plus', u'sex_male', u'stage_late', u'death_death'])
It appears that a u has been placed before each user-specified variable.
My question is: why does this happen, and how do I fix or work around it so that, when the user specifies the dependent variable, it is actually removed from the list? This also occurs in every other place where a user specifies a variable and it is added to a list, which is confusing if I ever need the user to look at the list.
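For reference, here is a minimal sketch of the removal step, assuming the column names are plain ASCII. In Python 2 the u prefix only marks a unicode string in the list's printed representation, and an ASCII unicode string compares equal to the matching str, so a lookup by name should still find it. The helper name remove_dependent is hypothetical, the sample list is copied from the printed output above, and the strip() call is an assumption that typed input may carry stray whitespace.

```python
from __future__ import print_function


def remove_dependent(training, dependent):
    """Return a copy of training without dependent (hypothetical helper)."""
    # Assumption: input typed at a prompt may carry stray whitespace.
    name = dependent.strip()
    if name not in training:
        raise ValueError('%r is not one of %r' % (name, training))
    return [col for col in training if col != name]


# Sample list copied from the printed output above; on Python 2 it mixes
# str and unicode entries.
training = ['access', u'age_age6574', u'age_age75plus',
            u'sex_male', u'stage_late', u'death_death']

# The hard-coded string stands in for the value returned by raw_input.
remaining = remove_dependent(training, 'death_death ')
print(remaining)

# Converting to str gives a display without the u prefix (ASCII names only).
print([str(col) for col in remaining])
```

Raising a ValueError with the list contents in the message is one possible design choice here; it makes a mistyped or missing name visible instead of letting the removal fail without context.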