How do I determine a word boundary in Unicode stre

Posted 2019-08-31 01:28

Question:

I'm reading a Unicode stream and would rather not pass the entire string through a regex. Is there a simple, reliable character I can use to break words across languages?

My byte array will most likely be encoded in UTF-16 or UTF-8.

Answer 1:

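If you are using Java, you can use java.text.BreakIterator. BreakIterator.getWordInstance() returns an iterator that reports word boundaries for a given locale, so you do not have to rely on a single separator character. No single character works across languages, since some scripts (Chinese, Japanese, Thai) do not put spaces between words.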
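A minimal sketch of how that could look, assuming the bytes are UTF-8 and have already been read into memory. The class name WordBoundaries and the helper words() are hypothetical names chosen for illustration; only BreakIterator and the other JDK classes are real. The iterator also reports punctuation and whitespace as separate segments, so this sketch keeps only segments that contain at least one letter or digit.

```java
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundaries {

    // Hypothetical helper: splits text at the word boundaries reported by
    // BreakIterator and drops segments that are only punctuation or whitespace.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Keep the segment only if it contains a letter or digit.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the input bytes are UTF-8. For UTF-16 input,
        // use StandardCharsets.UTF_16 instead.
        byte[] bytes = "Hello, world! How are you?".getBytes(StandardCharsets.UTF_8);
        String text = new String(bytes, StandardCharsets.UTF_8);

        System.out.println(words(text, Locale.ENGLISH));
        // prints [Hello, world, How, are, you]
    }
}
```

BreakIterator works on a decoded String (or CharacterIterator), not on raw bytes, so the byte array has to be decoded with the right charset first. For a true stream, one option is to buffer up to a boundary that is safe to split on, such as a line break, and run the iterator on each buffered chunk.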