For a poor man's implementation of near-collation-correct sorting on the client side I need a JavaScript function that does efficient single character replacement in a string.
Here is what I mean (note that this applies to German text, other languages sort differently):
native sorting gets it wrong: a b c o u z ä ö ü collation-correct would be: a ä b c o ö u ü z
Basically, I need all occurrences of "ä" of a given string replaced with "a" (and so on). This way the result of native sorting would be very close to what a user would expect (or what a database would return).
Other languages have facilities to do just that: Python supplies str.translate()
, in Perl there is tr/…/…/
, XPath has a function translate()
, ColdFusion has ReplaceList()
. But what about JavaScript?
Here is what I have right now.
// s would be a rather short string (something like
// 200 characters at max, most of the time much less)
function makeSortString(s) {
var translate = {
"ä": "a", "ö": "o", "ü": "u",
"Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
};
var translate_re = /[öäüÖÄÜ]/g;
return ( s.replace(translate_re, function(match) {
return translate[match];
}) );
}
For starters, I don't like the fact that the regex is rebuilt every time I call the function. I guess a closure can help in this regard, but I don't seem to get the hang of it for some reason.
Can someone think of something more efficient?
Answers below fall in two categories:
- String replacement functions of varying degrees of completeness and efficiency (what I was originally asking about)
- A late mention of
String#localeCompare
, which is widely supported among JS engines and could solve this category of problem much more elegantly.
Answer os Crisalin is almost perfect. Just improved the performance to avoid create new RegExp on each run.
Usage:
I've solved it another way, if you like.
Here I used two arrays where searchChars containing which will be replaced and replaceChars containing desired characters.
I made a Prototype Version of this:
Use like:
This will will change the String to a_o_u_A_O_U_ss
The correct terminology for such accents is Diacritics. After Googling this term, I found this function which is part of
backbone.paginator
. It has a very complete collection of Diacritics and replaces them with their most intuitive ascii character. I found this to be the most complete Javascript solution available today.The full function for future reference:
I think this might work a little cleaner/better (though I haven't test it's performance):
Or if you are still too worried about performance, let's get the best of both worlds:
EDIT (by @Tomalak)
I appreciate the idea. However, there are several things wrong with the implementation, as outlined in the comment below.
Here is how I would implement it.
I can't speak to what you are trying to do specifically with the function itself, but if you don't like the regex being built every time, here are two solutions and some caveats about each.
Here is one way to do this:
This will obviously make the regex a property of the function itself. The only thing you may not like about this (or you may, I guess it depends) is that the regex can now be modified outside of the function's body. So, someone could do this to modify the interally-used regex:
So, there is that option.
One way to get a closure, and thus prevent someone from modifying the regex, would be to define this as an anonymous function assignment like this:
Hopefully this is useful to you.
UPDATE: It's early and I don't know why I didn't see the obvious before, but it might also be useful to put you
translate
object in a closure as well: