Sorting Strings with non-Western characters

Posted 2019-01-23 20:24

I wanted to print sorted Polish names of all available languages.

import java.util.*;

public class Tmp
{
  public static void main(String... args)
  {
    Locale.setDefault(new Locale("pl","PL"));
    Locale[] locales = Locale.getAvailableLocales();
    ArrayList<String> langs = new ArrayList<String>();
    for(Locale loc: locales) {
      String  lng = loc.getDisplayLanguage();
      if(!lng.trim().equals("") && ! langs.contains(lng)){
        langs.add(lng);
      }
    }
    Collections.sort(langs);
    for(String str: langs){
      System.out.println(str);
    }
  }
}

Unfortunately, I have an issue with the sorting part. The output is:

:
:
kataloński
koreański
litewski
macedoński
:
:
węgierski
włoski
łotewski

Unfortunately in Polish ł comes after l and before m so the output should be:

:
:
kataloński
koreański
litewski
łotewski
macedoński
:
:
węgierski
włoski

How can I accomplish that? Is there a universal, language-independent method (say I later want to display and sort this list in another language with different sorting rules)?

5 answers
Aperson
#2 · 2019-01-23 20:54

Unfortunately in Polish ł comes after l and before m so the output should be:

You can define your own Comparable or Comparator implementation.

Or also this might help you:

欢心
#3 · 2019-01-23 21:03

You should pass a Collator to the sort method:

import java.text.Collator;

// sort according to the default locale
Collections.sort(langs, Collator.getInstance());

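Without a Collator, strings are compared by the numeric values of their UTF-16 code units (roughly, Unicode code point order), and that is not the correct alphabetical order in practically any language. In Polish, for example, ł (U+0142) has a higher value than every ASCII letter, so it ends up after z.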
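To make the difference concrete, here is a small sketch that compares two of the names from the question, first with String.compareTo and then with a Polish Collator. The class and method names are made up for illustration. The \u escapes stand for ł and ń so the source does not depend on file encoding, and the sketch assumes the runtime ships Polish collation data, as standard desktop JDKs do.

```java
import java.text.Collator;
import java.util.Locale;

class OrderingDemo {
    // Natural String order: compares UTF-16 code unit values.
    static int natural(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    // Locale-aware order: delegates to the Collator for the given locale.
    static int collated(String a, String b, Locale locale) {
        return Integer.signum(Collator.getInstance(locale).compare(a, b));
    }

    public static void main(String[] args) {
        String lotewski = "\u0142otewski";      // lotewski with l-stroke (U+0142)
        String macedonski = "macedo\u0144ski";  // macedonski with n-acute (U+0144)
        Locale polish = Locale.forLanguageTag("pl-PL");

        // U+0142 is above every ASCII letter, so it sorts after 'm' here.
        System.out.println("natural:  " + natural(lotewski, macedonski));
        // With Polish rules, l-stroke sorts between l and m.
        System.out.println("collated: " + collated(lotewski, macedonski, polish));
    }
}
```

A positive result means the first string sorts after the second, a negative one that it sorts before.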

孤傲高冷的网名
#4 · 2019-01-23 21:03

Try:

import java.text.Collator;

Collections.sort(langs, Collator.getInstance(new Locale("pl", "PL")));

It will produce:

...
litewski
łotewski
...

See the Collator API documentation for details.
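For the second part of the question (doing the same for another language), one possible sketch is to pass the locale in as a parameter and use it both for the display names and for the Collator, instead of changing the default locale. The class name LanguageNames and the method sortedLanguageNames are hypothetical, not part of any library.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LanguageNames {
    // Returns language names displayed in 'display', sorted by that locale's rules.
    static List<String> sortedLanguageNames(Locale display) {
        // A set removes duplicates without the linear langs.contains() scan.
        Set<String> unique = new LinkedHashSet<String>();
        for (Locale loc : Locale.getAvailableLocales()) {
            String name = loc.getDisplayLanguage(display);
            if (!name.trim().isEmpty()) {
                unique.add(name);
            }
        }
        List<String> names = new ArrayList<String>(unique);
        // The same locale drives both the names and the sort order.
        Collections.sort(names, Collator.getInstance(display));
        return names;
    }

    public static void main(String[] args) {
        for (String name : sortedLanguageNames(Locale.forLanguageTag("pl-PL"))) {
            System.out.println(name);
        }
    }
}
```

Calling it with a different locale should give the names and the ordering for that language, as far as the runtime has collation data for it.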

家丑人穷心不美
#5 · 2019-01-23 21:03

I'm dealing with the same problem. I found that the locale Collator solution works fine on Android 7.0 but not on earlier Android versions, so I implemented the following algorithm. It is pretty fast (I sort more than 3000 strings) and it works on earlier Android versions too.

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Dictionary is the poster's own class (not shown); only getOriginal() is used here.
public class SortBasedOnName implements Comparator<Dictionary> {

    // Maps each character to its rank in the Polish sort order.
    private static final Map<Character, Integer> myPolCharTable = new HashMap<Character, Integer>();
    static {
        myPolCharTable.put(' ',0x0020);
        myPolCharTable.put('!',0x0021);
        myPolCharTable.put('"',0x0022);
        // ... (entries omitted in the original post)

        myPolCharTable.put('a',0x0040);
        myPolCharTable.put('ą',0x0041);
        myPolCharTable.put('b',0x0042);
        myPolCharTable.put('c',0x0043);
        myPolCharTable.put('ć',0x0044);
        // ... (entries omitted in the original post)

        myPolCharTable.put('{',0x0066);
        myPolCharTable.put('|',0x0067);
        myPolCharTable.put('}',0x0068);
    }

    @Override
    public int compare(Dictionary dd1, Dictionary dd2) {
        return strCompareWithDiacritics(dd1.getOriginal(), dd2.getOriginal());
    }

    // Rank of a character. Characters missing from the table sort after all
    // mapped ones instead of causing a NullPointerException on unboxing.
    private static int rank(char c) {
        Integer r = myPolCharTable.get(c);
        return (r != null) ? r : 0x10000 + c;
    }

    private int strCompareWithDiacritics(String s1, String s2) {
        s1 = s1.toLowerCase();
        s2 = s2.toLowerCase();
        int length = Math.min(s1.length(), s2.length());

        // The first differing character decides the order.
        for (int i = 0; i < length; i++) {
            int r1 = rank(s1.charAt(i));
            int r2 = rank(s2.charAt(i));
            if (r1 != r2) {
                return (r1 < r2) ? -1 : 1;
            }
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return s1.length() - s2.length();
    }
}
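Since the class above depends on my Dictionary type, here is a more compact, self-contained sketch of the same rank-table idea that works directly on strings. It is a variation, not the code I actually use: the class name AlphabetComparator is made up, the ranks are derived from the position of each letter in an alphabet string, and characters outside that alphabet are assumed to sort before all letters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AlphabetComparator implements Comparator<String> {
    // Polish alphabet plus q, v, x, written with escapes for the nine
    // diacritic letters (a-ogonek, c-acute, e-ogonek, l-stroke, n-acute,
    // o-acute, s-acute, z-acute, z-dot) so file encoding does not matter.
    static final String POLISH =
        "a\u0105bc\u0107de\u0119fghijkl\u0142mn\u0144o\u00F3pqrs\u015Btuvwxyz\u017A\u017C";

    private final Map<Character, Integer> rank = new HashMap<Character, Integer>();

    AlphabetComparator(String alphabet) {
        // A letter's rank is its position in the alphabet string.
        for (int i = 0; i < alphabet.length(); i++) {
            rank.put(alphabet.charAt(i), i);
        }
    }

    // Unmapped characters get a negative rank, so they sort before all
    // letters, ordered among themselves by char value (an assumption).
    private int rankOf(char c) {
        Integer r = rank.get(c);
        return (r != null) ? r : c - 0x10000;
    }

    @Override
    public int compare(String s1, String s2) {
        String a = s1.toLowerCase(Locale.ROOT);
        String b = s2.toLowerCase(Locale.ROOT);
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int ra = rankOf(a.charAt(i));
            int rb = rankOf(b.charAt(i));
            if (ra != rb) {
                return (ra < rb) ? -1 : 1;
            }
        }
        // Common prefix: the shorter string sorts first.
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(Arrays.asList(
            "w\u0142oski", "\u0142otewski", "macedo\u0144ski", "litewski"));
        Collections.sort(names, new AlphabetComparator(POLISH));
        System.out.println(names);
    }
}
```

Unlike a Collator, this ignores case entirely and knows nothing about any alphabet other than the one passed in, so it is only a fallback for platforms where the Collator gives wrong results.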
[account banned]
#6 · 2019-01-23 21:07

Have a look at java.text.Collator.getInstance(Locale). You need to supply the Polish locale in your case. Collators implement the Comparator interface, so you can use them in sort APIs and in sorted data structures such as TreeSet.
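As a rough sketch of the TreeSet variant: a TreeSet built with a Collator keeps its elements sorted and unique, which would replace both the langs.contains() check and the Collections.sort() call from the question. The class and method names below are invented for the example, and it assumes the runtime has Polish collation data.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class SortedLanguageSet {
    // Collects the names into a TreeSet ordered by the locale's Collator.
    static List<String> sortedUnique(Iterable<String> names, Locale locale) {
        Set<String> sorted = new TreeSet<String>(Collator.getInstance(locale));
        for (String name : names) {
            sorted.add(name); // duplicates are dropped by the set
        }
        return new ArrayList<String>(sorted);
    }

    public static void main(String[] args) {
        // \u0142 is l-stroke, \u0144 is n-acute.
        List<String> input = Arrays.asList(
            "w\u0142oski", "\u0142otewski", "litewski", "macedo\u0144ski", "litewski");
        System.out.println(sortedUnique(input, Locale.forLanguageTag("pl-PL")));
    }
}
```

One thing to keep in mind: the set treats two strings as duplicates whenever the Collator considers them equal, which depends on the Collator's strength setting.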
