Question:

I tried the following on Codecademy's Python lesson:

hobbies = []

# Add your code below!
for i in range(3):
    Hobby = str(raw_input("Enter a hobby:"))
    hobbies.append(Hobby)

print hobbies

This works fine, but if I instead try

Hobby = raw_input("Enter a hobby:")

I get [u'Hobby1', u'Hobby2', u'Hobby3']. Where are the extra u's coming from?

Answer 1:

The question's subject line might be a bit misleading: Python 2's raw_input() normally returns a byte string, NOT a Unicode string.

However, it could return a Unicode string if it or sys.stdin has been altered or replaced (by an application, or as part of an alternative implementation of Python).

Therefore, I believe @ByteCommander is on the right track with his comment:

Maybe this has something to do with the console it's running in?

The Python used by Codecademy is ostensibly 2.7, but (a) it was implemented by compiling the Python interpreter to JavaScript using Emscripten and (b) it's running in the browser; so between those factors, there could very well be some string encoding and decoding injected by Codecademy that isn't present in plain-vanilla CPython.

Note: I have not used Codecademy myself nor do I have any inside knowledge of its inner workings.

Answer 2:

The u prefix means the value is a Unicode string. You can also call raw_input().encode('utf8') to convert it to a byte string.

Edited: I checked in Python 2.7, and raw_input() returns a byte string, not a Unicode string. So the problem here is something else.

Edited: raw_input() returns a Unicode string if sys.stdin.encoding is a Unicode encoding.

In the Codecademy Python environment, sys.stdin.encoding and sys.stdout.encoding are both None, and the default encoding is ascii.

Python uses this default encoding only if it cannot find a proper encoding from the environment.
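To check what a given environment reports, the three values mentioned above can be printed directly. This is a minimal sketch that should run under both Python 2 and Python 3; getattr() is used as a precaution, on the assumption that a replaced sys.stdin or sys.stdout object may not define an encoding attribute at all.

```python
import sys

# A replaced stream object may not define .encoding, so fall back to None.
stdin_encoding = getattr(sys.stdin, 'encoding', None)
stdout_encoding = getattr(sys.stdout, 'encoding', None)
default_encoding = sys.getdefaultencoding()

print('sys.stdin.encoding:  %r' % (stdin_encoding,))
print('sys.stdout.encoding: %r' % (stdout_encoding,))
print('default encoding:    %r' % (default_encoding,))
```

The printed values depend entirely on where the script runs, so no particular output should be expected.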

Answer 3:

Where are the extra u's coming from?

raw_input() returns Unicode strings in your environment.
repr() is called for each item of a list when you print the list (that is, when it is converted to a string).
The text representation (repr()) of a Unicode string is the same as a Unicode literal in Python: u'abc'.

That is why print [raw_input()] may produce [u'abc'].
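The points about repr() can be checked without raw_input(). In this minimal sketch (the helper name list_text is made up for illustration), rebuilding the list's text from repr() of each item gives the same result as str() of the list. Under Python 2 each item shows up as u'...'; under Python 3, where every str is already Unicode, no prefix is shown.

```python
def list_text(items):
    """Rebuild what printing a list shows: repr() of every item."""
    return '[' + ', '.join(repr(item) for item in items) + ']'

hobbies = [u'Hobby1', u'Hobby2', u'Hobby3']

# str() of a list (what print uses) calls repr() on each element.
matches = list_text(hobbies) == str(hobbies)
print(matches)

# Printing an item on its own uses str(), so only the text is shown.
print(hobbies[0])
```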

You don't see u'' in the first code example because str(unicode_string) calls the equivalent of unicode_string.encode(sys.getdefaultencoding()), i.e., it converts Unicode strings to bytestrings. Don't do it unless you mean it.
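A small sketch of why the implicit conversion is risky, assuming the default encoding is ascii (the usual default in Python 2): encoding fails as soon as the text contains a non-ASCII character, while an explicit UTF-8 encode succeeds. Here encode('ascii') is only a stand-in for what str() would do implicitly in Python 2.

```python
text = u'\N{SNOWMAN}'

# Stand-in for Python 2's str(text) when the default encoding is ascii.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# An explicit encoding that can represent the character works.
utf8_bytes = text.encode('utf-8')

print(ascii_ok)
print(len(utf8_bytes))
```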

Can `raw_input()` return `unicode`?

Yes:

#!/usr/bin/env python2
"""Demonstrate that raw_input() can return Unicode."""
import sys

class UnicodeFile:
    def readline(self, n=-1):
        return u'\N{SNOWMAN}'

sys.stdin = UnicodeFile()
s = raw_input()
print type(s)
print s

Output:

<type 'unicode'>
☃

A practical example is the win-unicode-console package, which can replace raw_input() to support entering Unicode characters outside the range of the console codepage on Windows. Related: here's why sys.stdout should be replaced.

May `raw_input()` return `unicode`?

Yes.

raw_input() is documented to return a string:

The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that.

A string in Python 2 is either a bytestring or a Unicode string: isinstance(s, basestring).

The CPython implementation of raw_input() supports Unicode strings explicitly: builtin_raw_input() can call PyFile_GetLine(), and PyFile_GetLine() considers both bytestrings and Unicode strings to be strings; it raises TypeError("object.readline() returned non-string") otherwise.

Answer 4:

You could encode the strings before appending them to your list:

hobbies = []

# Add your code below!
for i in range(3):
    Hobby = raw_input("Enter a hobby:")
    # Encode the Unicode string to a UTF-8 bytestring before storing it.
    hobbies.append(Hobby.encode('utf-8'))

print hobbies
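If the goal is only to display the hobbies without the u prefix, an alternative that avoids encoding is to format the items yourself instead of printing the list. This is a minimal sketch with hard-coded values standing in for the raw_input() results; it should behave the same under Python 2 and Python 3.

```python
# Hard-coded stand-ins for the values raw_input() returned.
hobbies = [u'Hobby1', u'Hobby2', u'Hobby3']

# join() builds one text string from the items, so no repr() is involved.
joined = u', '.join(hobbies)
print(joined)
```

The list itself still holds Unicode strings; only the way they are displayed changes.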