ruby `encode': "\xC3" from ASCII-8BIT to UTF-8

Posted 2019-01-19 10:16

Question:

Hannibal episodes in tvdb have weird characters in them.

For example:

Œuf

So ruby spits out:

./manifesto.rb:19:in `encode': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from ./manifesto.rb:19:in `to_json'
    from ./manifesto.rb:19:in `<main>'

Line 19 is:

puts @tree.to_json

Is there a way to deal with these non-UTF-8 characters? I'd rather not replace them; can I convert them, or ignore them? I don't know, any help is appreciated.

The weird part is that the script works fine via cron. Running it manually produces the error.

Answer 1:

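It seems the strings in the object are tagged with the wrong encoding. ASCII-8BIT is Ruby's label for binary data, so to_json cannot transcode those strings to UTF-8. Tag the strings in @tree with their real encoding, for instance ISO-8859-1 instead of ASCII-8BIT, by calling force_encoding('ISO-8859-1') on them. Note that force_encoding is a String method, so it has to be applied to the strings inside @tree rather than to the container itself.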
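
As a rough illustration of retagging the strings, here is a minimal sketch. The helper name retag_strings and the sample data are hypothetical, and it assumes the bytes are really UTF-8 (the "\xC3" in the error is a typical UTF-8 lead byte). If the source data is genuinely ISO-8859-1, pass Encoding::ISO_8859_1 instead.

```ruby
require 'json'

# Hypothetical helper: recursively retag every String in a nested Hash/Array.
# force_encoding only changes the encoding label, it does not touch the bytes.
def retag_strings(obj, encoding = Encoding::UTF_8)
  case obj
  when String
    obj.dup.force_encoding(encoding)
  when Hash
    obj.each_with_object({}) do |(key, value), result|
      result[retag_strings(key, encoding)] = retag_strings(value, encoding)
    end
  when Array
    obj.map { |value| retag_strings(value, encoding) }
  else
    obj # numbers, symbols, nil, booleans are left alone
  end
end

# Sample data standing in for @tree: the UTF-8 bytes of "Œuf" labelled as binary.
tree = { 'episodes' => ["\xC5\x92uf".b] }

puts JSON.generate(retag_strings(tree))
```

The helper returns a new structure and leaves the original untouched, so the binary-tagged data is still available if the guess about the encoding turns out to be wrong.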

To see Ruby's current default external encoding, run:

Encoding.default_external

If running the script with sudo solves the problem, the cause was the default encoding, so to resolve it set the proper default encoding in one of two ways:

  1. In Ruby, change the default external encoding to UTF-8 (or another suitable one):

    Encoding.default_external = Encoding::UTF_8
    
  2. In bash, check which locale variables are set in the environment where the script works:

    $ sudo env|grep UTF-8
    LC_ALL=ru_RU.UTF-8
    LANG=ru_RU.UTF-8
    

    Then set the same variables in .bashrc, substituting your own locale for ru_RU, along these lines:

    export LC_ALL=ru_RU.UTF-8
    export LANG=ru_RU.UTF-8
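
Because the script behaves differently under cron and from an interactive shell, it may help to print the relevant settings in both environments and compare them. A minimal sketch, where encoding_report is a hypothetical helper name and not part of Ruby:

```ruby
# Hypothetical helper: collect the locale variables and Ruby defaults that
# commonly differ between cron, sudo, and an interactive shell.
def encoding_report(env = ENV)
  {
    'LANG' => env['LANG'],
    'LC_ALL' => env['LC_ALL'],
    'default_external' => Encoding.default_external.name,
    # default_internal is nil unless something has set it explicitly
    'default_internal' => Encoding.default_internal&.name
  }
end

encoding_report.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Running this once from the shell and once from the cron job should show which setting differs between the two.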
    


Answer 2:

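File.open(yml_file, 'w') should be changed to File.open(yml_file, 'wb'). Binary mode sets the file's external encoding to ASCII-8BIT, so the bytes are written as-is without any transcoding.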
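
A minimal sketch of the binary-mode write, under the assumption that the data is a binary-tagged string like the one in the question. The helper name write_binary and the file name are hypothetical:

```ruby
require 'tmpdir'

# Hypothetical helper: write a string to disk byte for byte.
# 'wb' opens the file in binary mode, so no encoding conversion is attempted.
def write_binary(path, data)
  File.open(path, 'wb') { |file| file.write(data) }
end

# UTF-8 bytes of "Œuf" labelled as binary, standing in for the real data.
payload = "\xC5\x92uf\n".b

Dir.mktmpdir do |dir|
  path = File.join(dir, 'episode.yml')
  write_binary(path, payload)

  # Read the raw bytes back to confirm nothing was altered on the way out.
  puts File.binread(path) == payload
end
```

This sidesteps the conversion rather than fixing the label, so anything that later reads the file still needs to know the real encoding of the contents.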



Answer 3:

I just suffered through a number of hours trying to fix a similar problem. I'd checked my locales, database encoding, everything I could think of and was still getting ASCII-8BIT encoded data from the database.

Well, it turns out that if you store text in a binary column, it will automatically be returned as ASCII-8BIT encoded text. That makes sense, but it can (obviously) cause problems in your application.

It can be fixed by changing the column type back to :text in your migrations.