I'm seeing weird behaviour in my params, which are passed as UTF-8, but the special characters are not handled well. Instead of one special character, I get two characters: the plain letter plus the accent.
Parameters: {"name"=>"Mylène.png", "_cardbiz_session"=>"be1d5b7a2f27c7c4979ac4c16fe8fc82", "authenticity_token"=>"9vmJ02DjgKYCpoBNUcWwUlpxDXA8ddcoALHXyT6wrnM=", "asset"=>{"file"=># < ActionDispatch::Http::UploadedFile:0x007f94d38d37d0 @original_filename="Mylène.png", @content_type="image/png", @headers="Content-Disposition: form-data; name=\"asset[file]\"; filename=\"Myle\xCC\x80ne.png\"\r\nContent-Type: image/png\r\n", @tempfile=# < File:/var/folders/q5/yvy_v9bn5wl_s5ccy_35qsmw0000gn/T/RackMultipart20130805-51100-1eh07dp > >}, "id"=>"copie-de-sm"}
I log this:
- logger.debug file_name
- logger.debug file_name.chars.map(&:to_s).inspect
Each time, same result:
- Mylène
- ["M", "y", "l", "e", "̀", "n", "e"]
As I try to match the file name against existing names that are properly encoded in UTF-8, you can see my problem ;)
- Encodings are UTF-8 everywhere.
- Working under Ruby 1.9.3 and Rails 3.2.14.
- Added # encoding: utf-8 at the top of every file involved.
If anyone has an idea, I'll take it!
I also opened an issue here: https://github.com/carrierwaveuploader/carrierwave/issues/1185 but I'm not sure whether it's a CarrierWave issue or me missing something...
It seems to be linked to Mac OS X.
https://www.ruby-forum.com/topic/4407424 explains it and refers to https://bugs.ruby-lang.org/issues/7267 for more details and discussion.
Mac OS X decomposes special characters (the utf8-mac form) instead of sending them precomposed as in plain utf-8...
Since you can't know the encoding of a file name, just presuppose it.
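A minimal sketch of that idea, assuming the incoming name really is in the decomposed utf8-mac form: Ruby ships a UTF8-MAC encoding, and transcoding from it to UTF-8 yields the precomposed characters. The helper name normalize_mac_filename is hypothetical, not part of Rails or CarrierWave.

```ruby
# encoding: utf-8

# Hypothetical helper: presuppose the name is UTF8-MAC (decomposed, as sent
# from Mac OS X) and transcode it to plain UTF-8, which composes "e" + U+0300
# into a single U+00E8.
def normalize_mac_filename(name)
  name.encode("UTF-8", "UTF8-MAC")
end

decomposed = "Myle\u0300ne.png"   # "e" followed by a combining grave accent
puts normalize_mac_filename(decomposed).chars.to_a.inspect
```

Names that are already precomposed should pass through unchanged, so applying this to every uploaded file name before matching is a reasonable starting point.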
Thanks to our Linux guy, on whose machine it works properly. ;)
Perhaps you have a combining character and a problem with Unicode equivalence.
When I check the codepoints, I get Myl\u00E8ne.png, but I think that's a conversion problem from copying the text. It would be helpful if you could provide a file with the raw data. I expect you have a combining grave accent and an e.
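To make the difference visible, here is a small Ruby sketch that lists the codepoints of both forms. The decomposed sample is an assumption built from the bytes in the Content-Disposition header of the question (\xCC\x80 is U+0300 in UTF-8), and codepoints_of is a hypothetical helper name.

```ruby
# encoding: utf-8

# Hypothetical helper: list a string's codepoints in U+XXXX notation.
def codepoints_of(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

decomposed = "Myle\u0300ne.png"  # "e" + combining grave accent (U+0300)
composed   = "Myl\u00E8ne.png"   # single precomposed U+00E8

puts codepoints_of(decomposed).inspect
puts codepoints_of(composed).inspect
puts decomposed == composed       # the two forms look alike but compare unequal
```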
The solution would be Unicode normalization. (Sorry, I don't know how to do it in Ruby. Perhaps somebody else has an answer for that.)
You found your problem, so you don't need this any longer.
But in the meantime I found a mechanism to normalize Unicode strings:
Maybe there is an easier way, but so far I have found none.
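For readers on a newer Ruby, a hedged sketch of normalization: String#unicode_normalize is in the standard library from Ruby 2.2 on, so it is not available on the 1.9.3 used in the question. On Rails 3.2, ActiveSupport's mb_chars.normalize(:c) was, as far as I know, the closest equivalent. The helper name nfc is hypothetical.

```ruby
# encoding: utf-8

# Assumes Ruby 2.2 or later; String#unicode_normalize does not exist on 1.9.3.
# Hypothetical helper: bring a string into NFC (precomposed) form.
def nfc(str)
  str.unicode_normalize(:nfc)
end

uploaded = "Myle\u0300ne.png"  # decomposed, as in the question
stored   = "Myl\u00E8ne.png"   # precomposed, as an existing name would be

puts nfc(uploaded) == nfc(stored)
```

Normalizing both sides before comparing avoids having to guess which form either name arrived in.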