It's been a long day and I'm a bit stumped.
I'm reading a binary file that contains lots of wide-char strings, and I want to dump these out as Python unicode strings. (To unpack the non-string data I'm using the struct module, but I don't know how to do the same with the strings.)
For example, reading the word "Series":
myfile = open("test.lei", "rb")
myfile.seek(44)
data = myfile.read(12)
# data is now 'S\x00e\x00r\x00i\x00e\x00s\x00'
How can I decode that raw wide-char data into a Python unicode string?
Edit: I'm using Python 2.6
>>> data = 'S\x00e\x00r\x00i\x00e\x00s\x00'
>>> data.decode('utf-16')
u'Series'
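Because the question already uses struct for the other fields, the string bytes can be pulled out the same way with struct's 's' format and then decoded. The following is a minimal sketch, assuming (as in the example) a 12-byte little-endian UTF-16 field at offset 44. The buffer and the helper name read_wide_string are hypothetical stand-ins. It uses 'utf-16-le' rather than 'utf-16' because, without a byte-order mark, 'utf-16' falls back to the machine's native byte order.

```python
import struct

def read_wide_string(buf, offset, nbytes):
    """Decode a fixed-size little-endian UTF-16 field from a byte buffer.

    Hypothetical helper: assumes the field is nbytes long and starts at offset.
    """
    # "%ds" builds a format like "12s": take nbytes raw bytes, no conversion
    (raw,) = struct.unpack_from("%ds" % nbytes, buf, offset)
    # Pin the byte order instead of relying on the native-order fallback
    return raw.decode("utf-16-le")

# Hypothetical buffer: 44 bytes of other data, then "Series" as UTF-16-LE
buf = b"\x00" * 44 + b"S\x00e\x00r\x00i\x00e\x00s\x00"
print(read_wide_string(buf, 44, 12))  # Series
```

With a real file, buf would simply be the result of myfile.read() on the file opened in "rb" mode.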
If the string is known to contain no characters above U+00FF (and the data is little-endian, as here), another option is to drop the zero bytes by slicing. This produces a byte string rather than a unicode object:
>>> 'S\x00e\x00r\x00i\x00e\x00s\x00'[::2]
'Series'
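The slicing trick is only safe when every high byte is zero, so you might want to check that before relying on it. The following is a minimal sketch, assuming little-endian UTF-16 input. The helper name narrow_if_latin1 is hypothetical.

```python
def narrow_if_latin1(data):
    """Return data[::2] if every high (odd-index) byte is zero, else None.

    Assumes little-endian UTF-16; dropping nonzero high bytes would
    corrupt any character above U+00FF.
    """
    if len(data) % 2:
        return None  # not a whole number of 2-byte code units
    high_bytes = data[1::2]
    if high_bytes != b"\x00" * len(high_bytes):
        return None
    return data[::2]

print(narrow_if_latin1(b"S\x00e\x00r\x00i\x00e\x00s\x00"))  # b'Series' under Python 3
print(narrow_if_latin1(b"\xac\x20"))  # None: U+20AC (euro sign) has a nonzero high byte
```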
I also recommend calling rstrip('\x00') after decode to remove any trailing '\x00' characters, unless, of course, you need them.
>>> data = 'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
>>> print '"%s"' % data.decode('utf-16').rstrip('\x00')
"Some Data"
Without rstrip('\x00'), the two trailing NUL characters stay in the result (a terminal may show them as blanks or not at all):
>>> data.decode('utf-16')
u'Some Data\x00\x00'
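rstrip('\x00') removes only trailing NULs. If a fixed-size field might contain leftover bytes after the terminating NUL, cutting at the first NUL is a common alternative. The following is a minimal sketch, assuming the field is NUL-terminated little-endian UTF-16. The helper name decode_padded is hypothetical.

```python
def decode_padded(raw):
    """Decode a NUL-padded little-endian UTF-16 field, keeping text up to the first NUL."""
    text = raw.decode("utf-16-le")
    # split() with maxsplit=1: element [0] is everything before the first NUL
    return text.split("\x00", 1)[0]

data = b"S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00"
print('"%s"' % decode_padded(data))  # "Some Data"

# Bytes after the terminator are dropped; rstrip('\x00') would keep them
garbage = b"H\x00i\x00\x00\x00x\x00"
print(decode_padded(garbage))  # Hi
```

Which approach is right depends on the file format. If a NUL can legitimately appear inside a string, rstrip is the safer choice.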
Hmm, why do you say "open" is preferable to "file"? I see this in the reference (Python 2.5):
3.9 File Objects
File objects are implemented using C's stdio package and can be created with the built-in constructor file() described in section 2.1, "Built-in Functions." (footnote 3.6)
Footnote 3.6: file() is new in Python 2.2. The older built-in open() is an alias for file().