Python 2.7 - IPython 'raw_input' and appen

Posted 2019-06-05 13:20

Question:

I'm using Python 2.7 on Mac OS X Lion. I'm using IPython with the pandas 0.11.0, NumPy and statsmodels packages.

I'm writing a function that lets the user run a logistic regression on a file, specifying which variables to use in building the model, which of them should be transformed into dummy variables, and which variable should be the dependent variable.

When I do the following, for instance:

    cols_to_keep = []
    print (df.columns)
    # int() parses the count without evaluating arbitrary user input, unlike eval()
    i = int(raw_input('How many of these variables would you like to use in logistic regression?: '))
    # "> 0" also terminates if the user enters a negative number
    while i > 0:
        i = i - 1
        print (df.columns)
        addTo = raw_input('Enter a variable for this list that you would like to keep and use in logistic regression.: ')
        cols_to_keep.append(addTo)

I end up running into problems down the road, specifically when I ask the user to specify the dependent variable from a list and then need to take that variable out of the list of training variables:

print (df.columns)

dependent = raw_input('Which of these columns would you like to be the dependent variable?: ')
training.remove(dependent)

After inserting a print statement, I found that the list of training variables looks like this:

('these are the traing variables: ', ['access', u'age_age6574', u'age_age75plus', u'sex_male', u'stage_late', u'death_death'])

It appears that a u has been placed before each user-specified variable.

My question is: why does this happen, and how do I fix or get around it so that, when the user specifies the dependent variable, it is actually removed from the list? This also occurs in every other instance where a user specifies a variable and it is added to a list, which creates confusion if I ever need the user to look at the list.

Answer 1:

Those are just unicode strings, as opposed to byte strings. There is nothing wrong, and the content of the string is not affected. The u'text' prefix is only there so that you can tell byte strings and unicode strings apart in Python 2 when you look at the repr. If you print the string, you will see no difference. This is reversed in Python 3, where "text" means a unicode string, while b"bytes" means a byte string.
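To make this concrete, here is a minimal sketch that reuses the list from the question. It assumes the column names are plain ASCII, as they are in the question; the variable names are only illustrative. It runs unchanged on Python 2.7 and on Python 3.3+, since both accept the u'' prefix.

```python
# The list from the question: one byte string followed by unicode strings.
training = ['access', u'age_age6574', u'age_age75plus',
            u'sex_male', u'stage_late', u'death_death']

# With ASCII content, a unicode string and a plain string compare equal,
# so the u prefix does not get in the way of list.remove().
print(u'death_death' == 'death_death')

dependent = 'death_death'
training.remove(dependent)

# Printing each element (rather than the repr of the whole list) shows no prefix.
for name in training:
    print(name)
```

If the user should never see the prefix, printing the names one by one, or joined with ', '.join(training), avoids the list repr entirely.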

If you really want to coerce them to bytestrings (unlikely), you can do:

def ensure_str(s):
    if isinstance(s, unicode):
        s = s.encode('utf-8')
    return s

s = ensure_str(raw_input("prompt >"))
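Since the prefix itself does not stop remove() from working, a failed removal more likely comes from something else, for example stray whitespace or a misspelled name in what the user typed. That is a guess, as the question does not show the actual error. The sketch below uses a hypothetical helper, remove_column (not part of pandas or of the original code), that trims the input and checks membership before removing:

```python
def remove_column(columns, name):
    """Remove name from columns after trimming whitespace.

    Returns True if the name was found and removed, False otherwise.
    """
    name = name.strip()
    if name in columns:
        columns.remove(name)
        return True
    return False


# Hypothetical example data; in the question the name would come from raw_input().
training = ['access', u'sex_male', u'death_death']
removed = remove_column(training, ' death_death ')
print(removed)
print(training)
```

Returning a flag instead of letting remove() raise ValueError makes it easy to ask the user again when the name is not in the list.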