I tried the following on Codecademy's Python lesson:
hobbies = []
# Add your code below!
for i in range(3):
    Hobby = str(raw_input("Enter a hobby:"))
    hobbies.append(Hobby)
print hobbies
With this, it works fine, but if instead I try
    Hobby = raw_input("Enter a hobby:")
I get [u'Hobby1', u'Hobby2', u'Hobby3']. Where are the extra u's coming from?
You could encode the strings before appending them to your list:
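A minimal sketch of that idea, assuming UTF-8 is the encoding you want. The hard-coded Unicode values stand in for what raw_input() returned in the question, so the snippet runs without any interactive input:

```python
# Stand-ins for the Unicode strings that raw_input() returned in the question.
entered = [u'Hobby1', u'Hobby2', u'Hobby3']

hobbies = []
for hobby in entered:
    # encode() turns a Unicode string into a byte string
    # (the plain str type on Python 2, bytes on Python 3).
    hobbies.append(hobby.encode('utf-8'))

print(hobbies)
```

On Python 2 the printed list no longer shows the u prefix, because its items are ordinary byte strings.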
The question's subject line might be a bit misleading: Python 2's raw_input() normally returns a byte string, NOT a Unicode string. However, it could return a Unicode string if it or sys.stdin has been altered or replaced (by an application, or as part of an alternative implementation of Python). Therefore, I believe @ByteCommander is on the right track with his comment:
The Python used by Codecademy is ostensibly 2.7, but (a) it was implemented by compiling the Python interpreter to JavaScript using Emscripten and (b) it's running in the browser; so between those factors, there could very well be some string encoding and decoding injected by Codecademy that isn't present in plain-vanilla CPython.
Note: I have not used Codecademy myself nor do I have any inside knowledge of its inner workings.
'u' means it's a Unicode string. You can also use raw_input().encode('utf8') to convert it to a byte string.
Edited: I checked in Python 2.7, and raw_input() returns a byte string, not a Unicode string, so the problem is something else here.
Edited: raw_input() returns unicode if sys.stdin.encoding is unicode.
In the Codecademy Python environment, sys.stdin.encoding and sys.stdout.encoding are both None, and the default encoding scheme is ascii.
Python will use this default encoding only if it is unable to find a proper encoding scheme from the environment.
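To check what a given interpreter reports, something like the following could be used. This is only a sketch: describe_encodings is a hypothetical helper name, and getattr is used because a replaced sys.stdin or sys.stdout may have no encoding attribute at all.

```python
import sys


def describe_encodings():
    """Collect the encodings the interpreter reports for its standard streams."""
    return {
        'stdin': getattr(sys.stdin, 'encoding', None),
        'stdout': getattr(sys.stdout, 'encoding', None),
        'default': sys.getdefaultencoding(),
    }


for name, value in sorted(describe_encodings().items()):
    print('%s: %r' % (name, value))
```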
raw_input() returns Unicode strings in your environment
repr() is called for each item of a list if you print it (convert to string)
the text representation (repr()) of a Unicode string is the same as a Unicode literal in Python: u'abc'
that is why print [raw_input()] may produce: [u'abc']
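The point about repr() can be checked without any input at all. The sketch below uses a hard-coded Unicode string in place of raw_input(); it runs on both Python 2 and Python 3, although only Python 2 shows the u prefix in the repr:

```python
item = u'abc'

# Converting a list to text uses repr() for every item...
list_text = str([item])
print(list_text)

# ...while printing the item itself shows the string without quotes or prefix.
print(item)

# The list text is just the items' repr() wrapped in brackets.
print(list_text == '[' + repr(item) + ']')
```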
You don't see u'' in the first code example because str(unicode_string) calls the equivalent of unicode_string.encode(sys.getdefaultencoding()), i.e., it converts Unicode strings to bytestrings. Don't do it unless you mean it.
Can raw_input() return unicode? Yes:
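One way this could happen, sketched under the assumption that something has replaced sys.stdin with an object whose readline() returns Unicode. UnicodeStdin and read_from_fake_stdin are hypothetical names used only for illustration, and the raw_input/input fallback is there so the snippet also runs on Python 3:

```python
import sys

try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input      # Python 3 renamed raw_input() to input()


class UnicodeStdin(object):
    """Hypothetical sys.stdin replacement whose readline() returns Unicode."""

    def readline(self, size=-1):
        return u'\u2603\n'  # SNOWMAN followed by a newline


def read_from_fake_stdin():
    """Read one line while sys.stdin is temporarily replaced."""
    original_stdin = sys.stdin
    sys.stdin = UnicodeStdin()
    try:
        return read_line()
    finally:
        sys.stdin = original_stdin  # always restore the real stream


value = read_from_fake_stdin()
# On Python 2 the type name is 'unicode'; on Python 3 every str is Unicode.
print(type(value).__name__)
```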
Output:
The practical example is the win-unicode-console package, which can replace raw_input() to support entering Unicode characters outside of the range of a console codepage on Windows. Related: here's why sys.stdout should be replaced.
May raw_input() return unicode? Yes.
raw_input() is documented to return a string:
A string in Python 2 is either a bytestring or a Unicode string: isinstance(s, basestring).
The CPython implementation of raw_input() supports Unicode strings explicitly: builtin_raw_input() can call PyFile_GetLine(), and PyFile_GetLine() considers bytestrings and Unicode strings to be strings; it raises TypeError("object.readline() returned non-string") otherwise.
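The TypeError branch can be observed from Python code as well. This is a rough sketch, assuming sys.stdin has been replaced by an object whose readline() returns something that is not a string; NonStringStdin and try_read are hypothetical names, and the raw_input/input fallback only exists so the snippet also runs on Python 3:

```python
import sys

try:
    read_line = raw_input  # Python 2
except NameError:
    read_line = input      # Python 3 renamed raw_input() to input()


class NonStringStdin(object):
    """Hypothetical sys.stdin replacement whose readline() returns an int."""

    def readline(self, size=-1):
        return 42


def try_read(fake_stdin):
    """Read one line from fake_stdin; return the TypeError if one is raised."""
    original_stdin = sys.stdin
    sys.stdin = fake_stdin
    try:
        return read_line()
    except TypeError as error:
        return error
    finally:
        sys.stdin = original_stdin  # always restore the real stream


result = try_read(NonStringStdin())
print(repr(result))
```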