my troubles with ConfigParser continue. It seems it doesn't support Unicode very well. The config file is indeed saved as UTF-8, but when ConfigParser reads it it seems to be encoded into something else. I assumed it was latin-1 and I thougt overriding optionxform
could help:
-- configfile.cfg --
[rules]
Häjsan = 3
☃ = my snowman
-- myapp.py --
# -*- coding: utf-8 -*-
import ConfigParser
def _optionxform(s):
try:
newstr = s.decode('latin-1')
newstr = newstr.encode('utf-8')
return newstr
except Exception, e:
print e
cfg = ConfigParser.ConfigParser()
cfg.optionxform = _optionxform
cfg.read("myconfig")
Of course, when I read the config I get:
'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
I've tried a couple of different variations of decoding 's' but the point seems moot, since it really should be a unicode object from the beginning. After all, the config file is UTF-8? I have confirmed that's something is wrong in the way ConfigParser reads the file by stubbing it out with this DummyConfig class. If I use that then everything is nice unicode, fine and dandy.
-- config.py --
# -*- coding: utf-8 -*-
apa = {'rules': [(u'Häjsan', 3), (u'☃', u'my snowman')]}
class DummyConfig(object):
def sections(self):
return apa.keys()
def items(self, section):
return apa[section]
def add_section(self, apa):
pass
def set(self, *args):
pass
Any ideas what could be causing this or suggestions of other config modules that supports Unicode better are most welcome. I don't want to use sys.setdefaultencoding()
!
The ConfigParser.readfp()
method can take a file object, have you tried opening the file object with the correct encoding using the codecs module before sending it to ConfigParser like below:
cfg.readfp(codecs.open("myconfig", "r", "utf8"))
For Python 3.2 or above, readfp()
is deprecated. Use read_file()
instead.
Try to overwrite the write
function in RawConfigParser()
like this:
class ConfigWithCoder(RawConfigParser):
def write(self, fp):
"""Write an .ini-format representation of the configuration state."""
if self._defaults:
fp.write("[%s]\n" % "DEFAULT")
for (key, value) in self._defaults.items():
fp.write("%s = %s\n" % (key, str(value).replace('\n', '\n\t')))
fp.write("\n")
for section in self._sections:
fp.write("[%s]\n" % section)
for (key, value) in self._sections[section].items():
if key == "__name__":
continue
if (value is not None) or (self._optcre == self.OPTCRE):
if type(value) == unicode:
value = ''.join(value).encode('utf-8')
else:
value = str(value)
value = value.replace('\n', '\n\t')
key = " = ".join((key, value))
fp.write("%s\n" % (key))
fp.write("\n")
The config module is broken when reading and writing unicode strings as values. I tried to fix it, but got caught up in the odd way the parser works.
Seems to be a problem with the ConfigParser version for python 2x, and version for 3x is free of this problem. In this issue of the Python Bug Tracker, the status is Closed + WONTFIX.
I've fixed it editing the ConfigParser.py file. In the write method (about the line 412), change:
key = " = ".join((key, str(value).replace('\n', '\n\t')))
by
key = " = ".join((key, str(value).decode('utf-8').replace('\n', '\n\t')))
I don't know if it's a real solution, but tested in Windows 7 and Ubuntu 15.04, works like a charm, and I can share and work with the same .ini file in both systems.
In python 3.2 encoding
parameter was introduced to read()
, so it can now be used as:
cfg.read("myconfig", encoding='utf-8')
what I did is just:
file_name = file_name.decode("utf-8")
cfg.read(file_name)