ConfigParser with Unicode items

2019-03-18 00:16发布

问题:

my troubles with ConfigParser continue. It seems it doesn't support Unicode very well. The config file is indeed saved as UTF-8, but when ConfigParser reads it it seems to be encoded into something else. I assumed it was latin-1 and I thougt overriding optionxform could help:

-- configfile.cfg -- 
[rules]
Häjsan = 3
☃ = my snowman

-- myapp.py --
# -*- coding: utf-8 -*-  
import ConfigParser

def _optionxform(s):
    try:
        newstr = s.decode('latin-1')
        newstr = newstr.encode('utf-8')
        return newstr
    except Exception, e:
        print e

cfg = ConfigParser.ConfigParser()
cfg.optionxform = _optionxform    
cfg.read("myconfig") 

Of course, when I read the config I get:

'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I've tried a couple of different variations of decoding 's' but the point seems moot, since it really should be a unicode object from the beginning. After all, the config file is UTF-8? I have confirmed that's something is wrong in the way ConfigParser reads the file by stubbing it out with this DummyConfig class. If I use that then everything is nice unicode, fine and dandy.

-- config.py --
# -*- coding: utf-8 -*-                
apa = {'rules': [(u'Häjsan', 3), (u'☃', u'my snowman')]}

class DummyConfig(object):
    def sections(self):
        return apa.keys()
    def items(self, section):
       return apa[section]
    def add_section(self, apa):
        pass  
    def set(self, *args):
        pass  

Any ideas what could be causing this or suggestions of other config modules that supports Unicode better are most welcome. I don't want to use sys.setdefaultencoding()!

回答1:

The ConfigParser.readfp() method can take a file object, have you tried opening the file object with the correct encoding using the codecs module before sending it to ConfigParser like below:

cfg.readfp(codecs.open("myconfig", "r", "utf8"))

For Python 3.2 or above, readfp() is deprecated. Use read_file() instead.



回答2:

Try to overwrite the write function in RawConfigParser() like this:

class ConfigWithCoder(RawConfigParser):
def write(self, fp):
    """Write an .ini-format representation of the configuration state."""
    if self._defaults:
        fp.write("[%s]\n" % "DEFAULT")
        for (key, value) in self._defaults.items():
            fp.write("%s = %s\n" % (key, str(value).replace('\n', '\n\t')))
        fp.write("\n")
    for section in self._sections:
        fp.write("[%s]\n" % section)
        for (key, value) in self._sections[section].items():
            if key == "__name__":
                continue
            if (value is not None) or (self._optcre == self.OPTCRE):
                if type(value) == unicode:
                    value = ''.join(value).encode('utf-8')
                else:
                    value = str(value)
                value = value.replace('\n', '\n\t')
                key = " = ".join((key, value))
            fp.write("%s\n" % (key))
        fp.write("\n")


回答3:

The config module is broken when reading and writing unicode strings as values. I tried to fix it, but got caught up in the odd way the parser works.



回答4:

Seems to be a problem with the ConfigParser version for python 2x, and version for 3x is free of this problem. In this issue of the Python Bug Tracker, the status is Closed + WONTFIX.

I've fixed it editing the ConfigParser.py file. In the write method (about the line 412), change:

key = " = ".join((key, str(value).replace('\n', '\n\t')))

by

key = " = ".join((key, str(value).decode('utf-8').replace('\n', '\n\t')))

I don't know if it's a real solution, but tested in Windows 7 and Ubuntu 15.04, works like a charm, and I can share and work with the same .ini file in both systems.



回答5:

In python 3.2 encoding parameter was introduced to read(), so it can now be used as:

cfg.read("myconfig", encoding='utf-8')


回答6:

what I did is just:

file_name = file_name.decode("utf-8")
cfg.read(file_name)