I have a problem trimming whitespace around Chinese characters. I tried to log the content, and here is what it looks like:
When I display it in a TextView, the Chinese characters show up correctly, but there is whitespace before and after the text. Can someone help me encode/decode this? Thanks in advance.
EDIT 1 : Added screenshot of result.
EDIT 2 : Added content charset in response.
HttpProtocolParams.setContentCharset(params, HTTP.UTF_8);
However, I still get the square characters when logging, and when the text is displayed in the XML layout the square characters become whitespace.
EDIT 3 : Added my working solution.
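private String removeWhiteSpace(String oldString) {
    // Return null or empty input unchanged (the earlier version returned null for "").
    if (oldString == null || oldString.isEmpty()) {
        return oldString;
    }
    String newString = oldString;
    char c = oldString.charAt(0);
    // Character.isWhitespace() is true for the ideographic space (U+3000),
    // which String.trim() does not strip on its own.
    if (Character.isWhitespace(c)) {
        // Replace every occurrence of that character with an ASCII space.
        newString = oldString.replace(c, ' ');
    }
    // trim() can now remove the leading and trailing ASCII spaces.
    return newString.trim();
}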
Chinese and Japanese don't use the regular space character ' '. These languages use their own space, which is the same width as the characters: the ideographic space '　' (U+3000). You should write a manual trim function that checks for that character at the beginning and end of the string.
You may be able to use the character directly if you save your source file as Unicode. Otherwise, use its Unicode code point, which in Java can be written as the escape '\u3000', and check whether that character is at the beginning or end of the string.
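A minimal sketch of the manual trim described above, assuming only the ASCII control/space range and U+3000 need to be stripped. The class name `IdeographicTrim` and its methods are hypothetical, made up for illustration:

```java
// Hypothetical helper; the class and method names are illustrative only.
final class IdeographicTrim {
    // The ideographic (full-width) space, written as a Unicode escape so the
    // source file's encoding does not matter.
    private static final char IDEOGRAPHIC_SPACE = '\u3000';

    // Strips leading and trailing characters that String.trim() would strip,
    // plus the ideographic space. Interior characters are left untouched.
    static String trim(String s) {
        if (s == null) {
            return null;
        }
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) {
            start++;
        }
        while (end > start && isTrimmable(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    private static boolean isTrimmable(char c) {
        // Same rule as String.trim() (anything up to the ASCII space),
        // extended with the ideographic space.
        return c <= ' ' || c == IDEOGRAPHIC_SPACE;
    }

    public static void main(String[] args) {
        // Two Chinese characters padded with ideographic and ASCII spaces.
        String padded = "\u3000\u3000\u4e2d\u6587\u3000 ";
        System.out.println("[" + trim(padded) + "]");
    }
}
```

Unlike replacing the character throughout the string, this leaves any ideographic spaces inside the text as they are.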
The following link says that the ideographic space is 0xE38080 in UTF-8 and 0x3000 in UTF-16, and that Java's Character.isSpaceChar() returns true for it. I would have thought String.trim() would use this property to decide what to trim, but it only strips characters up to U+0020.
http://www.fileformat.info/info/unicode/char/3000/index.htm
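The properties mentioned above can be checked with a short program. This is only a sketch; the class name `IdeographicSpaceFacts` and the helper `utf8Hex` are hypothetical names chosen for the example:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class IdeographicSpaceFacts {
    static final char IDEOGRAPHIC_SPACE = '\u3000';

    // Returns the UTF-8 encoding of a single char as lowercase hex.
    static String utf8Hex(char c) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Character.isSpaceChar(IDEOGRAPHIC_SPACE));  // true
        System.out.println(Character.isWhitespace(IDEOGRAPHIC_SPACE)); // true
        System.out.println(utf8Hex(IDEOGRAPHIC_SPACE));                // e38080
        // trim() leaves the leading U+3000 in place, so the length stays 4.
        System.out.println("\u3000abc".trim().length());               // 4
    }
}
```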
You can use Google's Guava library for this; its CharMatcher class can trim Unicode whitespace, including the ideographic space.
You can read more about this here:
How to properly trim whitespaces from a string in Java?
To trim the full-width (double-byte) Unicode space, use String.replace():
replace the double-byte space with a single-byte ASCII space, then trim as usual. 0x3000 is the hexadecimal value of the Unicode IDEOGRAPHIC SPACE.
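As a rough sketch of that replace-then-trim idea (the class name `FullWidthSpaceReplace` and the method `replaceAndTrim` are hypothetical, not taken from the original answer):

```java
// Hypothetical helper; the class and method names are illustrative only.
final class FullWidthSpaceReplace {
    // Replaces every ideographic space (U+3000) with an ASCII space and then
    // lets String.trim() strip the leading and trailing spaces.
    static String replaceAndTrim(String s) {
        return s.replace('\u3000', ' ').trim();
    }

    public static void main(String[] args) {
        String padded = "\u3000\u4e2d\u3000\u6587\u3000";
        System.out.println("[" + replaceAndTrim(padded) + "]");
    }
}
```

Note that this also turns any ideographic space inside the text into an ASCII space, which may or may not be what you want for display.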