I'm trying to sort a List of objects by the String field "country". Each country name is in its native language:
- Argentina
- Australia
- Österreich
- Ελλάδα
- България ...
What I want is for "България", for instance, to appear after the "A*" countries, since the letter 'Б' corresponds to the Latin 'B'. I'm using the default Collator, but non-Latin names still end up last in the list.
Here's my code so far:
private static final Comparator<DomainTO> DOMAIN_COUNTRY_COMPARATOR =
        new Comparator<DomainTO>() {
            @Override
            public int compare(DomainTO t, DomainTO t1) {
                // Collator for the JVM's default locale
                Collator defaultCollator = Collator.getInstance();
                return defaultCollator.compare(t.getCountry(), t1.getCountry());
            }
        };
Perhaps you can compare the normalized Strings. Something like this:
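A minimal sketch of that idea, assuming "normalized" means decomposing each name with java.text.Normalizer and dropping the combining marks. The class and method names (NormalizedSortSketch, stripDiacritics, BY_NORMALIZED) are made up for illustration. Note that this only removes diacritics from Latin-based letters (Ö becomes O); it does not map Cyrillic or Greek letters to Latin ones, so those names still sort after the Latin ones.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NormalizedSortSketch {

    // Decompose accented letters (NFD) and remove the combining marks,
    // e.g. "Österreich" becomes "Osterreich".
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Compare by the stripped form, ignoring case; the original strings are kept.
    static final Comparator<String> BY_NORMALIZED =
            Comparator.comparing(NormalizedSortSketch::stripDiacritics,
                                 String.CASE_INSENSITIVE_ORDER);

    public static void main(String[] args) {
        List<String> countries = new ArrayList<>(
                Arrays.asList("Österreich", "Australia", "България", "Argentina"));
        countries.sort(BY_NORMALIZED);
        // Cyrillic is untouched by this normalization, so България stays last.
        System.out.println(countries);
    }
}
```

For the DomainTO objects from the question, the same key function could presumably be applied to getCountry() inside the comparator.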
See the related question about normalizing: "Converting Java String to ascii" (that question is linked to several similar ones).
How do we sort words from different languages? There are many alphabets (English, Russian, German, etc.), and each has its own ordered list of letters. It is easy to sort words that come from a single alphabet. But is it possible to merge all these alphabets into one?
I think it is not possible to do this in a way that everyone would accept. Take the English and Russian alphabets as an example. Russian letters can be mapped to English letters (at least most of them), but after this mapping their order changes. This favors one alphabet over the other. Why not map English letters to Russian instead?
Another issue is that there are special letters. In German there is Ö between O and P, and in Polish there is Ó in the same place. So we have the following relations:
O < Ö < P (German)
O < Ó < P (Polish)
But what is the relation between Ö and Ó? If there were a country named Ósterreich, should it come before or after Österreich? So it is impossible to define universal rules for sorting words from different languages.
All we can do is map all alphabets to one chosen alphabet. And this is what the OP is trying to do.
The chosen one is the Latin alphabet, and the other alphabets have to be mapped to it. The problem is that this mapping is often ambiguous. Only most Russian or Greek letters can be mapped easily.
Arabic or Asian languages are a much bigger problem. And we should remember that when mapping from one alphabet to another we often lose something.
So how can we do such sorting?
Then we could sort by the Latin name and display the native names.
Code:
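What follows is only a sketch of the approach, not a definitive implementation: each name gets a Latin sort key built from a letter table, the list is sorted by that key, and the native name is what gets displayed. The class name TransliterationSortSketch, the method toLatin and the particular romanization table are assumptions chosen for illustration. The table covers the Russian alphabet only, so the Bulgarian vowel ъ is simply dropped, as the Russian hard sign would be.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class TransliterationSortSketch {

    // Lower-case Russian alphabet and one possible Latin rendering per letter
    // (an illustrative romanization, not an official standard).
    private static final String CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";
    private static final String[] LATIN = {
        "a", "b", "v", "g", "d", "e", "e", "zh", "z", "i", "y",
        "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f",
        "kh", "ts", "ch", "sh", "shch", "", "y", "", "e", "yu", "ya"
    };

    // Build a lower-case Latin sort key; characters not in the table are kept as-is.
    static String toLatin(String name) {
        StringBuilder key = new StringBuilder();
        for (char c : name.toLowerCase(Locale.ROOT).toCharArray()) {
            int idx = CYRILLIC.indexOf(c);
            key.append(idx >= 0 ? LATIN[idx] : String.valueOf(c));
        }
        return key.toString();
    }

    // Sort by the Latin key while keeping the native names for display.
    static final Comparator<String> BY_LATIN_KEY =
            Comparator.comparing(TransliterationSortSketch::toLatin);

    public static void main(String[] args) {
        List<String> countries = new ArrayList<>(Arrays.asList(
                "Argentina", "Australia", "Österreich", "Ελλάδα", "България"));
        countries.sort(BY_LATIN_KEY);
        // България now lands after the "A*" countries; Greek is still unmapped.
        System.out.println(countries);
    }
}
```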
This way we have converted all the letters of the Russian alphabet. Now we would have to add similar code for other alphabets, and Russian was the simplest one.
But assume that we succeeded and managed to sort words from all the languages of the world this way.
But what are the consequences of such sorting? Before we answer this question, let's ask what the intention behind it was. The OP didn't state the reasons for sorting this way, but we can deduce them:
So let's answer the question: does this sorting make it easier for a person who knows only their native language to find a specific country?
But if we sorted all these names this way, there could be a problem. If I saw ايران, I would not be able to decide whether to go up or down the list, so in this example the sorting is not helpful. A worse scenario is when I encounter Волгоград on the list. I don't know the Russian alphabet, so I would assume that I am near the letter 'B' when in fact I am close to the end of the list. Then I would choose the wrong direction.
Summary:
Sorting country names written in different languages is difficult to define and implement. And once implemented, it would be either unhelpful or harmful.