Invalid byte sequence in UTF-8 (ArgumentError)

Posted 2019-04-19 19:28

Question:

I'm trying to run a Ruby script, and I always get an error on this line:

file_content.gsub(/dr/i,'med')

Here I'm trying to replace "dr" with "med".

The error is:

program.rb:4:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)

Why is that, and how can I fix it?

I'm working on a Mac OS X Yosemite machine with Ruby 2.2.1p85.

Answer 1:

Your string probably contains bytes that are not valid UTF-8, so replace the invalid bytes before calling gsub:

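unless file_content.valid_encoding?
  # Round-trip through UTF-16BE so that invalid bytes are replaced with "?"
  file_content = file_content.encode("UTF-16BE", invalid: :replace, replace: "?").encode("UTF-8")
end
# gsub returns a new string, so keep the result
file_content = file_content.gsub(/dr/i, 'med')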

See "Ruby 2.0.0 String#Match ArgumentError: invalid byte sequence in UTF-8".
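
As an alternative that is not part of the answer above, Ruby 2.1 and later (which includes the asker's 2.2.1) provide String#scrub, which replaces invalid bytes without the UTF-16 round trip. The following is a minimal sketch, not a definitive fix. The sample string is made up for illustration, and the "?" replacement character is an assumption carried over from the answer.

```ruby
# Hypothetical sample data: the byte 0xFF can never appear in valid UTF-8.
file_content = "Dr. Smith\xFF and dr. Jones".dup.force_encoding("UTF-8")

# The string is tagged as UTF-8 but contains an invalid byte.
puts file_content.valid_encoding?

# String#scrub returns a copy in which each invalid byte sequence
# is replaced by the given string ("?" here, an arbitrary choice).
clean = file_content.scrub("?")

# gsub returns a new string, so assign the result.
result = clean.gsub(/dr/i, "med")
puts result
```

If the file is really in another encoding (for example ISO-8859-1), scrubbing discards information that a proper conversion with String#encode would keep, so it is worth checking where the file came from first.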