To be more precise, I need to know whether (and, if possible, how) I can tell if a given string contains double-byte characters. Basically, I need to open a pop-up to display a given text, which can contain double-byte characters such as Chinese or Japanese. In that case, the window needs to be larger than it would be for English or ASCII text. Does anyone have a clue?
Actually, all of the characters are Unicode, at least from the JavaScript engine's perspective.
Unfortunately, the mere presence of characters in a particular Unicode range won't be enough to determine that you need more space. Many characters with code points well above the ASCII range take up roughly the same amount of space as ASCII characters. Typographic quotes, characters with diacritics, certain punctuation symbols, and various currency symbols are outside of the low ASCII range and are scattered across quite disparate places in the Unicode Basic Multilingual Plane.
Generally, projects that I've worked on elect to provide extra space for all languages, or sometimes use JavaScript to determine whether a window with auto-scrollbar CSS attributes actually has content tall enough to trigger a scrollbar.
If detecting the presence of, or count of, CJK characters will be adequate to determine you need a bit of extra space, you could construct a regex using the following ranges: [\u3300-\u9fff\uf900-\ufaff], and use that to extract a count of the number of characters that match. (This is a little excessively coarse, and misses all the non-BMP cases, probably excludes some other relevant ranges, and most likely includes some irrelevant characters, but it's a starting point).
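As a minimal sketch of that counting approach, using exactly the ranges above (the names `CJK_PATTERN` and `countCjkCharacters` are hypothetical, chosen only for illustration):

```javascript
// Coarse CJK pattern built from the ranges quoted above (BMP only).
const CJK_PATTERN = /[\u3300-\u9fff\uf900-\ufaff]/g;

// Hypothetical helper: returns how many characters in `text` match the pattern.
function countCjkCharacters(text) {
  // With the `g` flag, match() returns every match, or null when there is none.
  const matches = text.match(CJK_PATTERN);
  return matches === null ? 0 : matches.length;
}
```

Note that hiragana (U+3040 to U+309F), for example, falls outside these ranges, so a kana-only string would count as zero; widen the character class if that matters for your text.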
Again, you're only going to be able to manage a rough heuristic without something along the lines of a full text rendering engine, because what you really want is something like GDI's MeasureString (or any other text rendering engine's equivalent). It's been a while since I've done so, but I think the closest HTML/DOM equivalent is setting a width on a div and requesting the height (cut and paste reuse, so apologies if this contains errors):
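A minimal sketch of that div-based measurement, assuming a browser DOM is available. The function name `measureTextHeight` and its parameters are hypothetical; the `document` object is passed in explicitly so the same code can target the pop-up's own document:

```javascript
// Hypothetical helper: renders `text` in a hidden div of fixed width
// and reports the height the browser laid it out at, in pixels.
function measureTextHeight(doc, text, widthPx) {
  const probe = doc.createElement('div');
  probe.style.position = 'absolute';   // keep it out of the normal flow
  probe.style.visibility = 'hidden';   // laid out, but not shown
  probe.style.width = widthPx + 'px';
  probe.textContent = text;

  doc.body.appendChild(probe);
  const height = probe.offsetHeight;   // only meaningful once attached
  doc.body.removeChild(probe);
  return height;
}
```

The result depends on the fonts and styles that apply to the probe element, so it should inherit the same CSS as the element that will eventually display the text.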
I used mikesamuel's answer on this one. However, I noticed, perhaps because of this form, that there should be only one escape slash before the u, e.g. \u and not \\u, to make this work correctly. Works for me :)
Here is a benchmark test: http://jsben.ch/NKjKd
This is much faster:
than this:
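JavaScript holds text internally as a sequence of 16-bit code units (UCS-2/UTF-16), which covers the Basic Multilingual Plane directly; characters outside it are stored as surrogate pairs, that is, two code units each.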
But that's not really germane to your question. One solution might be to loop through the string and examine the character codes at each position:
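A minimal sketch of such a loop, assuming that any code unit above 255 (that is, beyond Latin-1) should count as "double byte"; that threshold is an assumption and can be adjusted:

```javascript
// Returns true as soon as any UTF-16 code unit in `str` is above 255.
function isDoubleByte(str) {
  for (let i = 0, n = str.length; i < n; i++) {
    if (str.charCodeAt(i) > 255) {
      return true; // stop at the first hit
    }
  }
  return false;
}
```

For example, `isDoubleByte("hello")` is `false` and `isDoubleByte("abc中")` is `true`.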
This might not be as fast as you would like.
Why not let the window resize itself based on the runtime height/width?
Run something like this in your pop-up:
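A rough sketch, assuming the pop-up was opened with `window.open` (browsers commonly ignore `resizeTo` for other windows). The function name `fitWindowToContent` is hypothetical:

```javascript
// Hypothetical helper: resizes `win` so its viewport fits the rendered content.
function fitWindowToContent(win) {
  const root = win.document.documentElement;

  // Space taken by window borders, toolbars, and so on.
  const chromeWidth = win.outerWidth - win.innerWidth;
  const chromeHeight = win.outerHeight - win.innerHeight;

  // resizeTo() sets the outer size, so add the chrome back on.
  win.resizeTo(root.scrollWidth + chromeWidth, root.scrollHeight + chromeHeight);
}
```

In the pop-up itself this would typically be called once the content has rendered, for example `window.addEventListener('load', () => fitWindowToContent(window));`.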
I have benchmarked the two functions in the top answers and thought I would share the results. Here is the test code I used:
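A sketch of what such a comparison can look like. The sample string, the iteration count, and the names `hasDoubleByteRegex` and `timeIt` are assumptions for illustration, and timings will vary by engine and input:

```javascript
// Loop-based check: true if any code unit is above 255.
function isDoubleByte(str) {
  for (let i = 0, n = str.length; i < n; i++) {
    if (str.charCodeAt(i) > 255) return true;
  }
  return false;
}

// Regex-based check (hypothetical name): same condition via a character class.
function hasDoubleByteRegex(str) {
  return /[^\u0000-\u00ff]/.test(str);
}

// Runs fn(input) `iterations` times and returns elapsed milliseconds.
function timeIt(fn, input, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn(input);
  return Date.now() - start;
}

// Assumed sample: mostly ASCII, with one CJK character at the very end.
const sample = 'The quick brown fox jumps over the lazy dog. '.repeat(20) + '中';

console.log('loop :', timeIt(isDoubleByte, sample, 20000), 'ms');
console.log('regex:', timeIt(hasDoubleByteRegex, sample, 20000), 'ms');
```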
When running this I got:
So for this particular string the regex solution is about 3 times faster.
However, note that for a string whose first character is a double-byte character, isDoubleByte() returns right away and so is much faster than the regex (which still has the overhead of the regular expression). For instance, for the string 中国, I got these results:
To get the best of both worlds, it's probably better to combine both:
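A minimal sketch of such a combination, assuming the same "above 255" threshold in both the quick check and the regex (the function name `containsDoubleByte` is hypothetical):

```javascript
// Quick check on the first code unit, then fall back to the regex.
function containsDoubleByte(str) {
  // Fast path: charCodeAt(0) is NaN for an empty string, so this is simply false.
  if (str.charCodeAt(0) > 255) return true;

  // Slow path: scan the whole string for anything outside Latin-1.
  return /[^\u0000-\u00ff]/.test(str);
}
```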
In that case, if the first character is Chinese (which is likely if the whole text is Chinese), the function will be fast and return right away. If not, it will run the regex, which is still faster than checking each character individually.