I need to parse UTF-8 input (from a text file) character by character, where by "character" I mean a full Unicode code point, not Java's char.
What approach should I use?
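Since Java 8 there's CharSequence.codePoints(), which returns an IntStream of Unicode code points, with surrogate pairs already combined into a single value. Because String implements CharSequence, you can call it directly on the decoded file contents.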
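Here is a minimal sketch of that approach. It assumes the whole file fits in memory, because it reads everything into one String before iterating. The class name `CodePointDemo`, the helper `codePoints`, and the default file name `input.txt` are hypothetical. It sticks to Java 8 APIs (`Files.readAllBytes` rather than the Java 11 `Files.readString`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class CodePointDemo {

    // Hypothetical helper: collects the code points of a string.
    // codePoints() merges each surrogate pair into one int value.
    public static List<Integer> codePoints(CharSequence text) {
        return text.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a hypothetical default; decode the bytes explicitly as UTF-8
        Path path = Paths.get(args.length > 0 ? args[0] : "input.txt");
        String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);

        // Print each code point as U+XXXX followed by the character itself
        content.codePoints().forEach(cp ->
                System.out.println(String.format("U+%04X %s", cp, new String(Character.toChars(cp)))));
    }
}
```

With this approach "a€😀" gives three code points (U+0061, U+20AC, U+1F600), even though the String has a length() of 4. For very large files, a streaming reader avoids loading everything at once.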
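You can also do this with an InputStreamReader (created with an explicit UTF-8 charset) by using its read() method. Note, however, that read() returns a single UTF-16 code unit (a Java char as an int, or -1 at end of stream), not a code point. Characters outside the Basic Multilingual Plane, such as most emoji, arrive as two surrogate units that you have to combine yourself, for example with Character.toCodePoint(). Check out more here: http://docs.oracle.com/javase/tutorial/i18n/text/stream.html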
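A streaming variant could look like the sketch below. It reads one UTF-16 unit at a time and merges a high surrogate with the low surrogate that follows it. The names `CodePointReader`, `forEachCodePoint`, and the default file `input.txt` are hypothetical. Unpaired surrogates are passed through unchanged, which is one possible policy among several. In practice a UTF-8 InputStreamReader replaces malformed input with U+FFFD, so those surrogates mainly matter for other Reader sources.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.function.IntConsumer;

public class CodePointReader {

    private static final int NONE = -2; // marks "no code unit buffered"

    // Hypothetical helper: passes each code point from the reader to action,
    // combining surrogate pairs returned by read() into one code point.
    public static void forEachCodePoint(Reader reader, IntConsumer action) throws IOException {
        int pending = NONE; // a unit read ahead that still needs processing
        while (true) {
            int c = (pending != NONE) ? pending : reader.read();
            pending = NONE;
            if (c == -1) {
                return; // end of stream
            }
            if (Character.isHighSurrogate((char) c)) {
                int next = reader.read();
                if (next != -1 && Character.isLowSurrogate((char) next)) {
                    action.accept(Character.toCodePoint((char) c, (char) next));
                    continue;
                }
                // Unpaired high surrogate: emit it as-is, then re-examine next (or EOF)
                pending = next;
            }
            action.accept(c);
        }
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a hypothetical default file name
        try (InputStream in = new FileInputStream(args.length > 0 ? args[0] : "input.txt");
             Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            forEachCodePoint(reader, cp -> System.out.printf("U+%04X%n", cp));
        }
    }
}
```

Wrapping the InputStreamReader in a BufferedReader keeps the per-unit read() calls cheap.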