I want to port a Rails app from Ruby 1.8.7 to 1.9.2. Some of the files contain umlauts like ä/ö/ü, both within strings and comments. The files were saved as UTF-8 but without a BOM (byte order mark) at the beginning.
As you might know, Ruby 1.9 refuses to parse these files and fails with the error: invalid multibyte char (US-ASCII)
I googled and read a lot, but the only solutions seem to be to
- insert a BOM or
- insert
# coding: utf-8
at the beginning of each file.
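To make the second option concrete, this is roughly the kind of script I would have to run over the whole app. It is only a sketch: the helper name add_magic_comment and the idea of passing directories on the command line are my own assumptions, not something Rails or RVM provides. It leaves pure-ASCII files and files that already declare an encoding untouched, and it keeps a shebang on the first line.

```ruby
# Sketch: prepend a "# coding: utf-8" magic comment to Ruby sources that
# contain non-ASCII bytes and do not declare an encoding yet.
# add_magic_comment is a hypothetical helper, not part of any library.
MAGIC_COMMENT = "# coding: utf-8\n"

# Returns the source with the magic comment added, or the source unchanged
# if it is pure ASCII or already starts with an encoding declaration.
def add_magic_comment(source)
  bytes = source.dup.force_encoding("BINARY")
  return source if bytes.ascii_only?

  lines = bytes.lines.to_a
  # A shebang has to stay on line 1; the magic comment may then go on line 2.
  shebang = lines.first.start_with?("#!") ? lines.shift : ""
  return source if lines.first.to_s =~ /\A#.*coding[:=]\s*[\w.-]+/

  (shebang + MAGIC_COMMENT + lines.join).force_encoding(source.encoding)
end

# Assumed usage: ruby add_magic_comment.rb app lib test
if __FILE__ == $0
  ARGV.each do |dir|
    Dir.glob(File.join(dir, "**", "*.rb")).each do |path|
      original = File.open(path, "rb") { |f| f.read }
      updated  = add_magic_comment(original)
      File.open(path, "wb") { |f| f.write(updated) } unless updated == original
    end
  end
end
```

This would work, but it means touching every affected file, which is what I would like to avoid.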
My editor of choice (gEdit) doesn't seem to insert a BOM. I also read that having a BOM is bad practice because it may break some editors, and it also breaks shell scripts that rely on the shebang notation.
EDIT: The BOM also breaks the Ruby 1.8.7 parser, which reports "syntax error, unexpected kEND, expecting $end (SyntaxError)" for the file!
I tried forcing the external encoding with ruby -Eutf-8:utf-8, but this seems to be ignored when calling rake (I tried: /home/malte/.rvm/gems/ruby-1.9.2-p180/bin/rake test).
So my question is:
As RVM is building Ruby 1.9 from source anyway, is there a build option or a patch to change the default source encoding from US-ASCII to UTF-8?
I took a quick look at the source code but couldn't find the line where the default is set (I'm no C expert, though).