Trying to split this string "主楼怎么走" into separate characters (I need an array) using mb_split with no luck... Any suggestions?
Thank you!
Try a regular expression with the 'u' (UTF-8) option.
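A minimal sketch of the 'u' approach, assuming the input is valid UTF-8. The function name split_chars is only an illustrative choice, not something from the original answer:

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
// An empty pattern with the 'u' modifier matches at every code point
// boundary; PREG_SPLIT_NO_EMPTY drops the empty strings at both ends.
function split_chars(string $str): array
{
    $parts = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false when the input is not valid UTF-8.
    if ($parts === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $parts;
}

print_r(split_chars('主楼怎么走'));
```

This only needs the PCRE extension, which is part of the PHP core, so it does not depend on mbstring settings.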
An ugly way to do it is:
You could also retry your mb_split approach after setting the encoding beforehand. Note that mb_split takes its encoding from mb_regex_encoding(), not only from the internal encoding.
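You can use the grapheme functions (PHP 5.3 or intl 1.0) and IntlBreakIterator (PHP 5.5 or intl 3.0). The difference between intl on one side and mbstring and a plain PCRE split on the other is the unit they work with: the intl functions operate on grapheme clusters, while the others operate on code points.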
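To illustrate the code point versus grapheme cluster distinction, here is a rough sketch. It is not the original answer's code. So that it runs without the intl extension, it uses PCRE's \X escape as a stand-in for the grapheme functions, and only calls grapheme_strlen when intl happens to be loaded. The function names are hypothetical:

```php
<?php
// Hypothetical helpers for comparing the two ways of splitting.

// Split by code point (what a '//u' split or mb_substr would give you).
function split_code_points(string $str): array
{
    return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
}

// Split by extended grapheme cluster using PCRE's \X escape.
// This stands in for intl's grapheme_* functions in this sketch.
function split_graphemes(string $str): array
{
    preg_match_all('/\X/u', $str, $matches);
    return $matches[0];
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
// but two code points.
$text = "e\u{0301}";

echo count(split_code_points($text)), "\n";
echo count(split_graphemes($text)), "\n";

// Only available when the intl extension is loaded.
if (function_exists('grapheme_strlen')) {
    echo grapheme_strlen($text), "\n";
}
```

For the string in the question the two splits give the same result, since each of its characters is a single code point. The difference only shows up with combining marks and similar sequences.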
When working in a production environment, you need to replace invalid byte sequences with a substitute character, since almost all grapheme and mbstring functions can't handle invalid byte sequences. If you are interested, see my past answer: https://stackoverflow.com/a/13695364/531320
If you don't care about performance, htmlspecialchars and htmlspecialchars_decode can be used for this. The advantage of this approach is that it supports various encodings other than UTF-8.
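A possible sketch of the htmlspecialchars round trip, assuming UTF-8 input and the ENT_SUBSTITUTE flag (available since PHP 5.4). The helper name replace_invalid_utf8 is made up for this example:

```php
<?php
// Hypothetical helper: replace invalid byte sequences with U+FFFD.
// htmlspecialchars() with ENT_SUBSTITUTE swaps each invalid sequence for
// the replacement character instead of returning an empty string;
// htmlspecialchars_decode() then undoes the entity escaping again.
function replace_invalid_utf8(string $str): string
{
    $escaped = htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return htmlspecialchars_decode($escaped, ENT_QUOTES);
}

// 0xFF can never appear in valid UTF-8.
$dirty = "a\xFFb";
$clean = replace_invalid_utf8($dirty);

var_dump(preg_match('//u', $dirty)); // false: PCRE rejects the invalid input
var_dump(preg_match('//u', $clean)); // int(1): the cleaned string is valid
```

Passing a different encoding name as the third argument of htmlspecialchars is what makes this usable for encodings other than UTF-8, within the set of encodings that function supports.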
If you want to learn the UTF-8 specification, splitting the string by manipulating its bytes yourself is a good way to practice.
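As one way to do that exercise, the sketch below reads the lead byte of each character to decide how many bytes it occupies. It assumes the input is already valid UTF-8 and does not validate continuation bytes. The function name is hypothetical:

```php
<?php
// Hypothetical helper: split valid UTF-8 by inspecting lead bytes only.
function utf8_split_bytes(string $str): array
{
    $chars = [];
    $length = strlen($str); // length in bytes, not characters

    for ($i = 0; $i < $length; $i += $width) {
        $byte = ord($str[$i]);

        if ($byte < 0x80) {
            $width = 1;           // 0xxxxxxx: ASCII
        } elseif ($byte >= 0xC2 && $byte <= 0xDF) {
            $width = 2;           // 110xxxxx: two-byte sequence
        } elseif ($byte >= 0xE0 && $byte <= 0xEF) {
            $width = 3;           // 1110xxxx: three-byte sequence
        } elseif ($byte >= 0xF0 && $byte <= 0xF4) {
            $width = 4;           // 11110xxx: four-byte sequence
        } else {
            // Continuation byte in lead position, or a byte that is
            // never valid in UTF-8 (0xC0, 0xC1, 0xF5..0xFF).
            throw new InvalidArgumentException("Invalid lead byte at offset $i");
        }

        $chars[] = substr($str, $i, $width);
    }

    return $chars;
}

print_r(utf8_split_bytes('主楼怎么走'));
```

Each character in the question's string is encoded in three bytes, so the loop advances three bytes at a time for that input.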
The benchmark results for these functions are here.
The benchmark code is here.
Assuming you have set the desired encoding and regular expression encoding for the MB functions (such as to UTF-8), you could use my method from my String class library.
By wrapping the mb_split() function in a method, I make it much easier to use. Simply invoke it with the desired value for the variable $pattern. Remember to set the character encoding appropriately for your task.
In the case of my wrapper method, supply the empty string to the method like so.
In the direct PHP case ...