Getting the length of a string that contains Unicode characters

Posted 2020-03-26 06:29

I’m using this character, double sharp '

3 answers
Bombasti
Reply #2 · 2020-03-26 06:47
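// Number of Unicode code points: a character outside the BMP counts once.
String.prototype.codes = function() { return [...this].length };
// Number of user-perceived characters (grapheme clusters),
// counted with the grapheme-splitter npm package.
String.prototype.chars = function() {
    let GraphemeSplitter = require('grapheme-splitter');
    // String(this) unwraps the String object into a primitive string.
    return (new GraphemeSplitter()).countGraphemes(String(this));
}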

console.log("F                                                                    
我想做一个坏孩纸
Reply #3 · 2020-03-26 06:49

To summarize my comments:

That is simply the length of the string, measured in UTF-16 code units.

Some characters are built from several code units or combining marks, even though each one looks like a single character. "̉mủt̉ả̉̉̉t̉ẻd̉W̉ỏ̉r̉̉d̉̉".length == 24
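To make the point about combining marks concrete, here is a minimal sketch. The sample strings are chosen for illustration and are not taken from the question; it assumes a JavaScript engine with `String.prototype.normalize` (ES2015 or later).

```javascript
// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one character.
const decomposed = "e\u0301";

// .length counts UTF-16 code units, so the combining mark is counted too.
console.log(decomposed.length);                   // 2

// Spreading iterates code points; the mark is still its own code point.
console.log([...decomposed].length);              // 2

// NFC normalization composes the pair into U+00E9 because a precomposed form exists.
console.log(decomposed.normalize("NFC").length);  // 1

// "m" + U+0309 COMBINING HOOK ABOVE has no precomposed form,
// so normalization cannot shorten it.
const hooked = "m\u0309";
console.log(hooked.normalize("NFC").length);      // 2
```

Normalization only helps where Unicode defines a precomposed character, so it is not a general way to count what a reader sees as characters.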

This (great) blog post has a function that returns the expected length in many cases:

function fancyCount(str){
  const joiner = "\u{200D}";
  const split = str.split(joiner);
  let count = 0;
    
  for(const s of split){
    //removing the variation selectors
    const num = Array.from(s.split(/[\ufe00-\ufe0f]/).join("")).length;
    count += num;
  }
    
  //assuming the joiners are used appropriately
  return count / split.length;
}

console.log(fancyCount("F                                                                    
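The function above is a heuristic built around joiners and variation selectors. As an alternative sketch, not something the blog post or this answer proposes, the built-in `Intl.Segmenter` can count grapheme clusters directly. This assumes a runtime that ships it (for example Node.js 16+ or a current browser); `countGraphemes` is a hypothetical helper name used only here.

```javascript
// Count user-perceived characters (grapheme clusters) with Intl.Segmenter.
// countGraphemes is a hypothetical helper name for this sketch.
function countGraphemes(str) {
  // Locale left undefined so the runtime's default locale is used.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  // segment() returns an iterable with one entry per grapheme cluster.
  for (const _segment of segmenter.segment(str)) {
    count += 1;
  }
  return count;
}

// "F" followed by U+1D12A: 3 UTF-16 code units, 2 grapheme clusters.
const sample = "F" + String.fromCodePoint(0x1d12a);
console.log(sample.length);           // 3
console.log(countGraphemes(sample));  // 2

// A family emoji joined with U+200D ZERO WIDTH JOINER is one cluster.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(countGraphemes(family));  // 1
```

Unlike dividing by the number of joiner-separated parts, this also handles strings that mix plain letters with joined sequences.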
欢心
Reply #4 · 2020-03-26 07:07

JavaScript (and Java) strings use UTF-16 encoding.

Unicode code point U+0046 (F) is encoded in UTF-16 using 1 code unit: 0x0046

Unicode code point U+1D12A (
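The code-unit counts for the two code points named in this answer can be checked with a short sketch of the standard UTF-16 encoding rule. `toUtf16Units` and `hex` are hypothetical helper names introduced for illustration, and the sketch assumes the input is a valid code point that is not itself a surrogate.

```javascript
// Split a code point into the UTF-16 code units that encode it.
// toUtf16Units is a hypothetical helper name used only for this sketch.
function toUtf16Units(codePoint) {
  // Code points up to U+FFFF fit in a single 16-bit code unit.
  if (codePoint <= 0xffff) {
    return [codePoint];
  }
  // Higher code points are split into a surrogate pair.
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

// Format code units as 4-digit hexadecimal strings.
const hex = (units) =>
  units.map((u) => "0x" + u.toString(16).toUpperCase().padStart(4, "0"));

console.log(hex(toUtf16Units(0x0046)));   // [ '0x0046' ]
console.log(hex(toUtf16Units(0x1d12a)));  // [ '0xD834', '0xDD2A' ]

// .length reports code units, so the second character alone has length 2.
console.log(String.fromCodePoint(0x1d12a).length);  // 2
```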
