My code is
import codecs
f = codecs.open(r'C:\Users\Admin\Desktop\nepali.txt', 'r', 'UTF-8')
nepali = f.read().split()
for i in nepali:
    print i
This displays the words in the file:
यो
किताब
टेबुल
मा
छ
यो
एक
किताब
हो
केटा
But when I try to create a list of the words with this code:
file=codecs.open(r'C:\Users\Admin\Desktop\nepali.txt', 'r', 'UTF-8')
nepali = list(file.read().split())
print nepali
The output is now displayed like this:
[u'\ufeff\u092f\u094b', u'\u0915\u093f\u0924\u093e\u092c', u'\u091f\u0947\u092c\u0941\u0932', u'\u092e\u093e', u'\u091b', u'\u092f\u094b', u'\u090f\u0915', u'\u0915\u093f\u0924\u093e\u092c', u'\u0939\u094b',]
The output should look like:
[यो, किताब, टेबुल, मा, छ, यो, एक, किताब, हो]
You are looking at the output of the repr() function, which is always used for displaying the contents of containers. The output is meant for debugging, not end-user displays; any non-printable or non-ASCII codepoint is represented by an escape sequence (which can, depending on the codepoint, be a single character escape like \t or \n, or use 2, 4, or 8 hex digits, like \xe5, \u2603 or \U0001f4e2).
You'll have to produce the output manually:
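A minimal sketch of one way to do that, assuming nepali holds the list of unicode words from the question. The first three words are hard-coded here (as escapes, with the byte order mark left out) so the snippet runs without nepali.txt, and print is called with a single parenthesised argument so the same line works on Python 2 and Python 3:

```python
# Sample words copied from the question's output, hard-coded so the
# snippet runs without the original file.
nepali = [u'\u092f\u094b', u'\u0915\u093f\u0924\u093e\u092c',
          u'\u091f\u0947\u092c\u0941\u0932']

# Join the words with ', ' and wrap the result in square brackets.
# repr() is never called, so the characters are not escaped.
formatted = u'[{}]'.format(u', '.join(nepali))
print(formatted)
```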
This produces a unicode string formatted to look like a list object, but without using repr(), simply by adding square brackets around the strings, joined with ', ' (comma and space).
Demo:
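A sketch of such a demo, using the nine words exactly as they appear in the question's output. The first word starts with u'\ufeff', a byte order mark that the 'UTF-8' codec leaves in place; stripping it before display is an assumption about what is wanted, not something the question asked for:

```python
# Word list copied from the question's output, including the leading
# byte order mark on the first word.
nepali = [u'\ufeff\u092f\u094b', u'\u0915\u093f\u0924\u093e\u092c',
          u'\u091f\u0947\u092c\u0941\u0932', u'\u092e\u093e', u'\u091b',
          u'\u092f\u094b', u'\u090f\u0915', u'\u0915\u093f\u0924\u093e\u092c',
          u'\u0939\u094b']

# Assumption: the BOM should not appear in the display, so remove it.
words = [w.lstrip(u'\ufeff') for w in nepali]

# Build the list-like string by hand instead of relying on repr().
as_text = u'[{}]'.format(u', '.join(words))
print(as_text)
```

On a terminal that can display Devanagari this prints [यो, किताब, टेबुल, मा, छ, यो, एक, किताब, हो]. Opening the file with the 'utf-8-sig' codec instead of 'UTF-8' should drop the byte order mark at read time, which would make the lstrip() step unnecessary.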
However, if you want to show this to an end-user, why use the square brackets at all?
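Without the brackets, the join on its own is enough. A minimal sketch, again with a few of the question's words hard-coded for illustration; the variable names one_line and per_line are made up for this example:

```python
# Sample words taken from the question's output, hard-coded for illustration.
nepali = [u'\u092f\u094b', u'\u0915\u093f\u0924\u093e\u092c',
          u'\u091f\u0947\u092c\u0941\u0932']

# Comma-separated on one line, with no list-style brackets.
one_line = u', '.join(nepali)
print(one_line)

# Or one word per line, matching what the loop in the question prints.
per_line = u'\n'.join(nepali)
print(per_line)
```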