Efficient replacement of all unsupported chars in

Possible Duplicate:
Converting Symbols, Accent Letters to English Alphabet

I need to replace all accented characters, such as

"à", "é", "ì", "ò", "ù"

with

"a'", "e'", "i'", "o'", "u'"...

because of an issue with reloading nested strings with accented characters after they've been saved.

Is there a way to do this without a separate string replacement for each character?

For example, I would prefer to avoid doing

text = text.replace("à", "a'");
text = text.replace("è", "e'");
text = text.replace("ì", "i'");
text = text.replace("ò", "o'");
text = text.replace("ù", "u'");

etc.

Tags: java android string replace

4 answers

虎瘦雄心在

#2 · 2019-08-05 05:39

I tried the approach from this post, and it seems to work.

// requires: import java.text.Normalizer;
str = Normalizer.normalize(str, Normalizer.Form.NFD);
str = str.replaceAll("\\p{InCombiningDiacriticalMarks}+", "'");

Edit: Replacing the combining diacritical marks has a side effect, though: you can no longer distinguish between À, Á and Â.
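To make the side effect concrete, here is a minimal sketch of the same two steps wrapped in a method. The class and method names (AccentNormalizer, replaceMarks) are made up for illustration, and the input strings use Unicode escapes only so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

class AccentNormalizer {
    // Decompose to NFD, then replace each run of combining marks with an apostrophe.
    static String replaceMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "'");
    }

    public static void main(String[] args) {
        // a-grave, e-acute, i-grave, o-grave, u-grave
        System.out.println(replaceMarks("\u00e0\u00e9\u00ec\u00f2\u00f9")); // a'e'i'o'u'
        // A-grave, A-acute, A-circumflex all collapse to the same output
        System.out.println(replaceMarks("\u00c0 \u00c1 \u00c2"));           // A' A' A'
    }
}
```

Grave, acute and circumflex all end up as the same apostrophe, so the original accent cannot be recovered from the output.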


乱世女痞

#3 · 2019-08-05 05:43

After reading the comments on the main post, I think a better option would be to fix the underlying problem (which seems to be encoding related) rather than try to cover up the symptoms.
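If the root cause really is a charset mismatch between saving and loading (an assumption; the question does not confirm it), the difference can be sketched like this. The class and method names are hypothetical, and UTF-8 / ISO-8859-1 are just one plausible pair of mismatched charsets.

```java
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Save and load with the same explicit charset: the text survives.
    static String roundTrip(String text) {
        byte[] saved = text.getBytes(StandardCharsets.UTF_8);
        return new String(saved, StandardCharsets.UTF_8);
    }

    // Save as UTF-8 but load as ISO-8859-1: each accented char becomes two wrong chars.
    static String mismatched(String text) {
        byte[] saved = text.getBytes(StandardCharsets.UTF_8);
        return new String(saved, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e0\u00e9\u00ec\u00f2\u00f9";
        System.out.println(roundTrip(original).equals(original));  // true
        System.out.println(mismatched(original).equals(original)); // false
    }
}
```

If fixing the encoding is not an option, a replacement mapping is the fallback.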

Also, the mapping approach below still requires a manual, explicit mapping, which might make it less ideal than nandeesh's answer, which uses a regular-expression Unicode character class.

Here is a skeleton for code to perform the mapping. It is slightly more complicated than a char-to-char mapping, because each replacement is a two-character string.

This code tries to avoid creating extra Strings. It may or may not be "more efficient". Try it with the real data/usage. YMMV.

// Returns the replacement for an accented char, or null if the char is not mapped.
String mapAccentChar (char ch) {
    switch (ch) {
        case 'à': return "a'";
        // etc
    }
    return null;
}

// Builds the result in a single pass, appending unmapped chars unchanged.
String mapAccents (String input) {
  StringBuilder sb = new StringBuilder();
  int l = input.length();
  for (int i = 0; i < l; i++) {
    char ch = input.charAt(i);
    String mapped = mapAccentChar(ch);
    if (mapped != null) {
      sb.append(mapped);
    } else {
      sb.append(ch);
    }
  }
  return sb.toString();
}
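For anyone who wants something to paste and run, here is one way the skeleton might be filled in. This is only a sketch: it swaps the switch for a lookup table, the class name AccentMapper and the table contents are my own choices, and the table covers just a handful of lowercase vowels. The characters are written as Unicode escapes so the file compiles the same way regardless of source encoding.

```java
import java.util.HashMap;
import java.util.Map;

class AccentMapper {
    // Illustrative table only; extend it with whatever characters the data contains.
    private static final Map<Character, String> ACCENTS = new HashMap<>();
    static {
        ACCENTS.put('\u00e0', "a'"); // a with grave
        ACCENTS.put('\u00e8', "e'"); // e with grave
        ACCENTS.put('\u00e9', "e'"); // e with acute
        ACCENTS.put('\u00ec', "i'"); // i with grave
        ACCENTS.put('\u00f2', "o'"); // o with grave
        ACCENTS.put('\u00f9', "u'"); // u with grave
    }

    // Single pass over the input; unmapped chars are copied through unchanged.
    static String mapAccents(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            String mapped = ACCENTS.get(ch);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mapAccents("citt\u00e0")); // citta'
    }
}
```

Whether the table or the switch is faster depends on the data, so measure before choosing.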

[This account has been banned]

#4 · 2019-08-05 05:52

If you don't mind adding commons-lang as a dependency, try StringUtils.replaceEach. I believe the following performs the same task:

import org.apache.commons.lang.StringUtils;

public class ReplaceEachTest
{
   public static void main(String [] args)
   {
      String text = "àéìòùàéìòù";
      String [] searchList = {"à", "é", "ì", "ò", "ù"};
      String [] replaceList = {"a'", "e'", "i'", "o'", "u'"};
      String newtext = StringUtils.replaceEach(text, searchList, replaceList);
      System.out.println(newtext);
   }
}

This example prints a'e'i'o'u'a'e'i'o'u'. In general, however, I agree that since you are creating a custom character translation, you will need a solution where you explicitly specify the replacement for each character of interest.

My previous answer using replaceChars is no good because it only handles one-to-one character replacement.


淡お忘

#5 · 2019-08-05 05:53

Since there is no strict correlation between the ASCII value of a char and the value of its accented version, your replacement approach seems to me the most straightforward way.
