How can I get the UTF-8 code of a char in Java? I have the char 'a' and I want the value 97. I have the char 'é' and I want the value 233.
Here is a table for more values.
I tried Character.getNumericValue(a), but for 'a' it gives me 10 and not 97. Any idea why?
This seems very basic but any help would be appreciated!
There is an open source library, MgntUtils, that has a utility class StringUnicodeEncoderDecoder. That class provides static methods that convert any String into a Unicode sequence and vice versa. Very simple and useful. To convert a String you just do:
For example, the String "Hello World" will be converted into
"\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"
It works with any language. Here is the link to the article that explains all the details about the library: MgntUtils. Look for the subtitle "String Unicode converter". The article gives you links to Maven Central, where you can get the artifacts, and to GitHub, where you can get the project itself. The library comes with well-written Javadoc and source code.
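If you would rather not add a dependency, output of the same shape can be produced in plain Java. The sketch below is not the MgntUtils implementation; the class and method names are made up for illustration, and it simply prints each UTF-16 code unit as a four-digit hex escape.

```java
// Plain-Java sketch; NOT the MgntUtils implementation.
// Class and method names are hypothetical, chosen for illustration only.
class UnicodeEscapeSketch {

    // Renders every UTF-16 code unit of the input as a backslash-u escape
    // followed by four lowercase hex digits.
    static String toUnicodeSequence(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnicodeSequence("Hello World"));
    }
}
```

Because it works on code units, a character outside the BMP comes out as two escapes (its surrogate pair).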
Those "UTF-8" codes are no such thing. They're actually just Unicode values, as per the Unicode code charts.
So an 'é' is actually U+00E9 - in UTF-8 it would be represented by two bytes { 0xc3, 0xa9 }.
Now to get the Unicode value - or to be more precise the UTF-16 value, as that's what Java uses internally - you just need to convert the value to an integer:
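The original snippet is not reproduced here, so the following is only a minimal sketch of that conversion. The class and method names are made up, and 'é' is written as an escape so the example does not depend on the source file encoding. It also prints the two UTF-8 bytes mentioned above for comparison.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; class and method names are hypothetical.
class CharCodeSketch {

    // Widening a char to int yields its UTF-16 code unit value.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('a'));        // prints 97
        System.out.println(codeOf('\u00e9'));   // U+00E9, prints 233

        // The same character encoded as UTF-8 is two bytes: 0xc3 0xa9.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            System.out.printf("0x%02x ", b & 0xff);
        }
        System.out.println();
    }
}
```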
Your question is unclear. Do you want the Unicode codepoint for a particular character (which is the example you gave), or do you want to translate a Unicode codepoint into a UTF-8 byte sequence?
If the former, then I recommend the code charts at http://www.unicode.org/
If the latter, then the following program will do it:
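The program from the original answer is not preserved here. As a stand-in, here is a minimal sketch that leans on the standard library's UTF-8 encoder rather than doing the bit manipulation by hand; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; not the original answer's program.
class Utf8BytesSketch {

    // Encodes one Unicode code point as UTF-8 via the standard library.
    // Character.toChars throws IllegalArgumentException for invalid code points.
    static byte[] toUtf8(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated lowercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(toUtf8(0x61)));     // prints 61
        System.out.println(hex(toUtf8(0xE9)));     // prints c3 a9
        System.out.println(hex(toUtf8(0x1F600)));  // prints f0 9f 98 80
    }
}
```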
(there's also an online Unicode to UTF8 page, but I don't have the URL on this machine)
char is actually a numeric type containing the Unicode value (UTF-16, to be exact - you need two chars to represent characters outside the BMP) of the character. You can do everything with it that you can do with an int. Character.getNumericValue() tries to interpret the character as a digit.
My method to do it is something like this:
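The snippet that followed is missing, so this is only one possible way to do it, not necessarily the poster's. It assumes you start from a String and uses String.codePointAt, which also copes with characters outside the BMP. It prints the result of Character.getNumericValue as well, to show where the 10 in the question comes from. The class and method names are made up.

```java
// Illustrative sketch; class and method names are hypothetical.
class CodePointSketch {

    // Returns the code point of the first character in s. Unlike charAt,
    // codePointAt combines a surrogate pair into a single value.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // getNumericValue reads the character as a digit; the letters a-z
        // count as the digits 10 through 35, hence 10 for 'a'.
        System.out.println(Character.getNumericValue('a'));  // prints 10
        System.out.println(Character.getNumericValue('7'));  // prints 7

        // The code value is simply the char widened to int.
        System.out.println(firstCodePoint("a"));             // prints 97
        System.out.println(firstCodePoint("\u00e9"));        // prints 233

        // U+1F600 lies outside the BMP, so it occupies two chars.
        String smiley = "\uD83D\uDE00";
        System.out.println(smiley.length());                 // prints 2
        System.out.println(firstCodePoint(smiley));          // prints 128512
    }
}
```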
You can write a simple loop to list the available characters together with their code values, like this:
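The loop itself is missing from the post; a sketch of what it might look like follows. The range 0x20 to 0xFF, the output format, and the class and method names are all assumptions made for illustration. Note that the values printed are Unicode code points rather than UTF-8 bytes.

```java
// Illustrative sketch; the range and output format are assumptions.
class ListCharsSketch {

    // Builds one line per printable code point in [from, to],
    // in the form "U+0061 97 a".
    static String list(int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int cp = from; cp <= to; cp++) {
            // Skip unassigned code points and control characters.
            if (!Character.isDefined(cp) || Character.isISOControl(cp)) {
                continue;
            }
            sb.append(String.format("U+%04X %d %s\n", cp, cp, new String(Character.toChars(cp))));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Basic Latin and Latin-1 Supplement; widen the range as needed.
        System.out.print(list(0x20, 0xFF));
    }
}
```

Whether the characters display correctly depends on the console's encoding and font, not on the loop.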