I've got to send text to a print service that only accepts certain special characters, e.g. ï. My client somehow inputs text in such a way that the letters look the same but have different underlying Unicode code points, and are therefore not processed correctly by the print service. Example:
Mine: ï (Unicode \u00EF)
Theirs: ï (Unicode \u0069\u0308). If you copy-paste the two into the Chrome address bar, for example, they look identical, just as they do in textareas.
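Because both strings render identically, one way to see what the client actually sent is to list the code points of the text. A minimal sketch (codePoints is a hypothetical helper name, not a library function):

```javascript
// Hypothetical helper: return each code point of a string as "U+XXXX",
// so that visually identical strings can be told apart.
function codePoints(str) {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const mine = "\u00EF";         // precomposed ï
const theirs = "\u0069\u0308"; // i followed by a combining diaeresis

console.log(codePoints(mine));   // [ 'U+00EF' ]
console.log(codePoints(theirs)); // [ 'U+0069', 'U+0308' ]
console.log(mine === theirs);    // false, even though both render as ï
```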
How can I convert all special characters from "their style" to "my style" (Dutch keyboard layout on Windows)? I guess this has something to do with the OS or keyboard layout, but I cannot find a list of the differences, or anything else related to this issue. Does anyone have a suggestion on how to proceed?
As correctly pointed out in the comments, there are two ways (or "normalization forms") to represent accented characters in Unicode:
\u00EF == ï (a single precomposed code point)
i + ¨ == i + \u0308 == ï (a base letter followed by a combining mark)
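In JavaScript the two forms differ in length and compare unequal, and the normalize function mentioned below converts between them. A short sketch, assuming a runtime where String.prototype.normalize is available:

```javascript
const composed = "\u00EF";         // ï as a single code point (U+00EF)
const decomposed = "\u0069\u0308"; // i followed by U+0308 COMBINING DIAERESIS

// NFC composes a base letter plus combining mark into the precomposed
// character where one exists; NFD splits it back apart.
const nfc = decomposed.normalize("NFC");
const nfd = composed.normalize("NFD");

console.log(composed.length, decomposed.length); // 1 2
console.log(nfc === composed);   // true
console.log(nfd === decomposed); // true
```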
ES6 adds a dedicated function that converts strings between normalization forms: String.prototype.normalize. If your environment doesn't support normalize yet, look around for shims.

\u00EF is ï, the Latin Small Letter I with Diaeresis (and \u0020 is the Space character).
\u0069\u0308 is the Latin Small Letter I followed by the Combining Diaeresis.
Normalization (specifically to NFC, canonical composition) is needed to transform the second, two-character sequence into the first. You will need some utility to perform this normalization before you send the text to your print service.
See JavaScript Unicode normalization for some options.
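A minimal sketch of such a pre-send step. It assumes, for illustration only, that the print service accepts just Latin-1 characters (U+0000 to U+00FF); the real accepted set isn't stated, and prepareForPrintService is a hypothetical name. It normalizes to NFC and reports anything still outside the assumed range:

```javascript
// Hypothetical helper. ASSUMPTION: the print service accepts only
// Latin-1 code points (U+0000 to U+00FF); adjust to the real set.
function prepareForPrintService(text) {
  // NFC turns sequences like "i" + U+0308 into the single code point U+00EF.
  const normalized = text.normalize("NFC");

  // Collect characters still outside the assumed range, e.g. a combining
  // mark with no precomposed equivalent, or a character such as the euro sign.
  const unsupported = [];
  for (const ch of normalized) {
    if (ch.codePointAt(0) > 0xff) unsupported.push(ch);
  }
  return { text: normalized, unsupported };
}

const ok = prepareForPrintService("na\u0069\u0308ef"); // "naïef", decomposed
console.log(ok.text === "na\u00EFef", ok.unsupported);  // true []

const bad = prepareForPrintService("prijs: 5 \u20AC");
console.log(bad.unsupported); // [ '€' ]
```

Anything reported in unsupported has no precomposed Latin-1 equivalent, so it needs a separate decision (replace, strip, or reject) before printing.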