SQLite: remove non-UTF-8 characters

Posted 2019-02-25 16:56

I have an SQLite database that contains some strange characters that are not valid UTF-8, and I would like to remove them, but I have no idea how to go about it. I searched around and found people suggesting REGEXP, as in MySQL, but that threw an error saying REGEXP wasn't recognized.

Here is the error I get:

sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'table_name' with text ...

Thanks for the help

2 Answers
等我变得足够好
#2 · 2019-02-25 17:16

django.utils.encoding has a great set of robust Unicode encoding and decoding functions.

仙女界的扛把子
#3 · 2019-02-25 17:18

Well, if you really want to shoehorn a rich Unicode string into a plain ASCII string (and don't mind some goofs), you could use this:

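import unicodedata as ud

def shoehorn_unicode_into_ascii(s):
    # NFKD splits accented letters into base letter + combining mark, and the
    # ASCII encode then drops the marks. Characters with no ASCII decomposition,
    # like ß‘’“”, are dropped entirely.
    # On Python 3, encode() returns bytes, so decode back to get a str.
    return ud.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')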

For a more complete solution (with somewhat fewer goofs, but requiring the third-party module unidecode), see this answer.

Really, though, the best solution is to work with Unicode data throughout your code as much as possible, and drop to a byte encoding only when necessary.
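For the database in the question, one possible approach is to clean the stored values once so that everything read afterwards is valid Unicode. The following is only a sketch using Python's built-in sqlite3 module, not something tested against the asker's data. The helper name clean_text_column and the demo table and column names are hypothetical, and it assumes the table has a rowid, the column holds TEXT rather than BLOB data, and the identifiers come from trusted code.

```python
import sqlite3

def clean_text_column(conn, table, column):
    """Drop bytes that are not valid UTF-8 from a TEXT column (hypothetical helper).

    Assumes: the table has a rowid, the column stores TEXT (not BLOBs),
    and `table`/`column` are trusted identifiers, since they are
    interpolated into the SQL text.
    """
    previous_factory = conn.text_factory
    # Hand back raw bytes so sqlite3 does not raise on undecodable text.
    conn.text_factory = bytes
    fixed = 0
    try:
        rows = conn.execute(
            f'SELECT rowid, "{column}" FROM "{table}"'
        ).fetchall()
        for rowid, raw in rows:
            if not isinstance(raw, bytes):
                continue  # NULLs and numbers need no cleaning
            text = raw.decode("utf-8", errors="ignore")
            if text.encode("utf-8") != raw:
                # Only rewrite rows that actually contained invalid bytes.
                conn.execute(
                    f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?',
                    (text, rowid),
                )
                fixed += 1
        conn.commit()
    finally:
        conn.text_factory = previous_factory
    return fixed

# Demo on an in-memory database with one deliberately broken value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (name TEXT)")
# CAST stores the bytes as TEXT unchanged, which reproduces the bad data.
conn.execute("INSERT INTO demo VALUES (CAST(? AS TEXT))", (b"caf\xe9 ok",))
conn.execute("INSERT INTO demo VALUES (?)", ("plain",))
fixed = clean_text_column(conn, "demo", "name")
```

Using errors="replace" instead of errors="ignore" would keep a visible U+FFFD marker where each bad byte was, which may be preferable if you want to review the damaged rows before deleting anything. Back up the database file before running something like this on real data.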
