I have been given some HTML files that use the Mac OS Roman file encoding. The files contain French text, but in an editor many of the accented characters look strange (i.e. not French):
Si cette option est sÈlectionnÈe, <removed> tentera de communiquer avec votre tÈlescope seulement ‡ líaide díun ...
The capital E with grave accent (È) does display properly in the browser as é, as do the other strange characters.
I also have some UTF-8 French files that look normal in an editor (é looks like é). What I'd like to do is convert all the Mac Roman files to UTF-8 for easier maintenance.
Simply changing the file encoding in the editor doesn't do this. The strange characters are still strange.
Short of making a conversion dictionary and doing a Find/Replace on all the files, is there a way to do this?
To actually answer the question "Converting Mac Roman character to equivalent UTF-8"
Convert the encoding of the file from Mac OS Roman to UTF-8:
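As a minimal sketch of that conversion in Python: the helper name `macroman_to_utf8` and the file names are made up for illustration, and the code assumes the input really is Mac OS Roman (Python's `mac_roman` codec).

```python
from pathlib import Path


def macroman_to_utf8(src: str, dst: str) -> None:
    """Re-encode a Mac OS Roman file as UTF-8 (hypothetical helper)."""
    # Read raw bytes so no newline translation happens, then interpret
    # them as Mac OS Roman to get the actual characters.
    text = Path(src).read_bytes().decode("mac_roman")
    # Write the same characters back out, this time encoded as UTF-8.
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical file names; only runs if the input file exists.
    if Path("page.html").exists():
        macroman_to_utf8("page.html", "page-utf8.html")
```

If the result still looks wrong afterwards, the source was probably not Mac OS Roman in the first place.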
C++ code:
If your editor isn’t showing it correctly when you specify the encoding, you have given it the wrong encoding. You need to figure out what encoding you really have.
You appear to have a byte valued 0xE9 where you need a Unicode LATIN SMALL LETTER E WITH ACUTE character. A MacRoman 0xE9 byte is a LATIN CAPITAL LETTER E WITH GRAVE character, which is what your editor is displaying because you said it was MacRoman. But it is not. However, Unicode code point U+00E9 is indeed LATIN SMALL LETTER E WITH ACUTE. Therefore, it is not MacRoman that you have there, but almost certainly ISO-8859-1 or ISO-8859-15.
So do the conversion from ISO-8859-1 (or ISO-8859-15) to UTF-8, not from MacRoman.
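The byte-level reasoning can be checked with a short Python sketch. The helper names `repair_mojibake` and `to_utf8` are illustrative only. The default of `cp1252` (Windows-1252) is an assumption on my part: it agrees with ISO-8859-1 for the accented letters, and the `í` shown where an apostrophe belongs corresponds to byte 0x92, which is a right single quotation mark in Windows-1252 but a control character in ISO-8859-1. Pass `"iso-8859-1"` or `"iso-8859-15"` if that fits your files better.

```python
# The same byte means different characters under different encodings.
raw = b"\xe9"
as_macroman = raw.decode("mac_roman")   # LATIN CAPITAL LETTER E WITH GRAVE
as_latin1 = raw.decode("iso-8859-1")    # LATIN SMALL LETTER E WITH ACUTE


def repair_mojibake(shown: str, real_encoding: str = "cp1252") -> str:
    """Undo a wrong MacRoman decode (hypothetical helper).

    Encoding back to MacRoman recovers the original bytes, which are
    then decoded with the encoding the file was really written in.
    """
    return shown.encode("mac_roman").decode(real_encoding)


def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode raw file bytes as UTF-8 (hypothetical helper).

    Raises UnicodeDecodeError if a byte is undefined in source_encoding,
    which is a useful hint that the guess about the encoding is wrong.
    """
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    print(as_macroman, as_latin1)
    print(repair_mojibake("sÈlectionnÈe"))
```

Applied to the fragment in the question, `repair_mojibake("‡ líaide díun")` gives `à l’aide d’un`, which suggests the guess about the real encoding is right.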
To convert lots of old Java source files in a directory tree, this worked for me. Note that the command changes files recursively in every directory below the one you cd into. Make sure you are in the right directory and that you have a backup of your files (and your computer) first. When you know what you are doing, correct the rm statement. Hope this helps somebody; it took me hours to get the tiny details right:
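A comparable recursive conversion can be sketched in Python. This is not the command referred to above, only an illustration: the function name `convert_tree`, the `.bak` backup suffix, and the default source encoding `mac_roman` are all assumptions to adjust for your own files.

```python
from pathlib import Path
from typing import List


def convert_tree(root: str, suffix: str = ".java",
                 source_encoding: str = "mac_roman") -> List[Path]:
    """Convert every file under root ending in suffix to UTF-8, in place.

    Hypothetical helper. Each changed file keeps its original bytes in a
    sibling "<name>.bak" file, and files that already have such a backup
    are skipped so a second run does not convert them twice.
    """
    converted = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        if not path.is_file():
            continue
        backup = path.with_name(path.name + ".bak")
        if backup.exists():
            continue  # already converted on an earlier run
        data = path.read_bytes()
        new_data = data.decode(source_encoding).encode("utf-8")
        if new_data == data:
            continue  # pure ASCII, nothing to change
        backup.write_bytes(data)    # keep the original next to the file
        path.write_bytes(new_data)  # overwrite with the UTF-8 version
        converted.append(path)
    return converted


if __name__ == "__main__":
    # Like the shell version, this starts from the current directory.
    for changed in convert_tree("."):
        print("converted", changed)
```

Delete the `.bak` files only after checking the converted files in an editor.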