Ruby 1.9 - Invalid multibyte character (utf-8)

I have a ruby file with only these two lines:

# encoding: utf-8
puts "—"

When I run it with ruby test_enc.rb it fails with:

test_enc.rb:2: invalid multibyte char (UTF-8)
test_enc.rb:2: unterminated string meets end of file

I don't know how to properly specify the character code of — (emdash), but vim tells me it is 151, Hex 97, Octal 227. It fails the same way with other characters like ã as well, so I doubt it is related specifically to that character. I am running on Windows XP and the version of ruby I'm using is:

ruby 1.9.1p430 (2010-08-16 revision 28998) [i386-mingw32]

I feel like there is something very obvious I am missing here. Any ideas?

EDIT: Learned a valuable lesson about assumptions today - specifically assuming your editor IS using UTF-8 without actually checking it. Oops!

Thanks for the quick and accurate replies all!

EDIT AGAIN: The 'setting up vim properly for utf-8' grew too big and wasn't really relevant to this question, so it is now a separate question.

标签： ruby vim encoding utf-8 character-encoding

2条回答

欢心

2楼-- · 2019-05-31 10:54

Your file is in latin1. Ruby is right.

emdash would be encoded on two bytes not one in UTF-8.

0人赞添加讨论(0) 举报

老娘就宠你

3楼-- · 2019-05-31 11:08

Given that Ruby is explicitly calling your attention to UTF-8, I strongly suspect that you haven't actually written out a UTF-8 file to start with. Make sure that Vim (or whatever text editor you're using to create the file) is really set to write out UTF-8.

Note that in UTF-8, any non-ASCII character will be represented by multiple bytes, not a single byte as you've described from the Vim diagnostics. I'd recommend using a binary file editor (or dump, or whatever) to really show what's in the text file though. Something that doesn't already have some preconceived notion of the encoding - something that isn't even trying to think of it as a text file.

Notepad lets you write out a file in UTF-8, so you might want to try that just to see what happens. (I don't have Ruby installed myself, otherwise I'd try it for you.)

0人赞添加讨论(0) 举报

Ruby 1.9 - Invalid multibyte character (utf-8)

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间