How many bytes in a JavaScript string?-第2页回答

I have a javascript string which is about 500K when being sent from the server in UTF-8. How can I tell its size in JavaScript?

I know that JavaScript uses UCS-2, so does that mean 2 bytes per character. However, does it depend on the JavaScript implementation? Or on the page encoding or maybe content-type?

标签： javascript string size byte

12条回答

女痞

2楼-- · 2019-01-04 09:20

If you're using node.js, there is a simpler solution using buffers :

function getBinarySize(string) {
    return Buffer.byteLength(string, 'utf8');
}

There is a npm lib for that : https://www.npmjs.org/package/utf8-binary-cutter (from yours faithfully)

0人赞添加讨论(0) 举报

成全新的幸福

3楼-- · 2019-01-04 09:24

You can use the Blob to get the string size in bytes.

Examples:

console.info(
  new Blob(['


             
            
                                  
            
            
            
            
            
            叛逆                          
            
             
             4楼-- · 2019-01-04 09:26
             
             
             
                          
             
                                                                          
The size of a JavaScript string is


Pre-ES6:  2 bytes per character 
ES6 and later: 2 bytes per character,
or 5 or more bytes per character





Pre-ES6

Always 2 bytes per character. UTF-16 is not allowed because the spec says "values must be 16-bit unsigned integers". Since UTF-16 strings can use 3 or 4 byte characters, it would violate 2 byte requirement. Crucially, while UTF-16 cannot be fully supported, the standard does require that the two byte characters used are valid UTF-16 characters. In other words, Pre-ES6 JavaScript strings support a subset of UTF-16 characters.



ES6 and later

2 bytes per character, or 5 or more bytes per character.  The additional sizes come into play because ES6 (ECMAScript 6) adds support for Unicode code point escapes. Using a unicode escape looks like this: \u{1D306}

Practical notes  


This doesn't relate to the internal implemention of a particular engine. For
example, some engines use data structures and libraries with full
UTF-16 support, but what they provide externally doesn't have to be
full UTF-16 support. Also an engine may provide external UTF-16
support as well but is not mandated to do so.
For ES6, practically speaking characters will never be more than 5
bytes long (2 bytes for the escape point + 3 bytes for the Unicode
code point) because the latest version of Unicode only has 136,755
possible characters, which fits easily into 3 bytes.  However this is
technically not limited by the standard so in principal a single
character could use say, 4 bytes for the code point and 6 bytes
total.
Most of the code examples here for calculating byte size don't seem to take into account ES6 Unicode code point escapes, so the results could be incorrect in some cases.

    
                                                                    
                                                        
            
              
                查看更多
                
             
              0人赞

                                                     添加讨论(0)

                                                                                                            
                               举报
                
                
                
                  
                


                        
                            

                               
             
                        
               
            

                            
                            
                                 加载中...
                            
                        

                
             
            
                                  
            
            
            
            
            
            兄弟一词,经得起流年.                          
            
             
             5楼-- · 2019-01-04 09:26
             
             
             
                          
             
                                                                          
You can try this:

  var b = str.match(/[^\x00-\xff]/g);
  return (str.length + (!b ? 0: b.length)); 


It worked for me.
    
                                                                    
                                                        
            
              
                查看更多
                
             
              0人赞

                                                     添加讨论(0)

                                                                                                            
                               举报
                
                
                
                  
                


                        
                            

                               
             
                        
               
            

                            
                            
                                 加载中...
                            
                        

                
             
            
                                  
            
            
            
            
            
            趁早两清                          
            
             
             6楼-- · 2019-01-04 09:27
             
             
             
                          
             
                                                                          
Note that if you're targeting node.js you can use Buffer.from(string).length:



var str = "\u2620"; // => "☠"
str.length; // => 1 (character)
Buffer.from(str).length // => 3 (bytes)

    
                                                                    
                                                        
            
              
                查看更多
                
             
              0人赞

                                                     添加讨论(0)

                                                                                                            
                               举报
                
                
                
                  
                


                        
                            

                               
             
                        
               
            

                            
                            
                                 加载中...
                            
                        

                
             
            
                                  
            
            
            
            
            
            手持菜刀，她持情操                          
            
             
             7楼-- · 2019-01-04 09:27
             
             
             
                          
             
                                                                          
The answer from Lauri Oherd works well for most strings seen in the wild, but will fail if the string contains lone characters in the surrogate pair range, 0xD800 to 0xDFFF. E.g.

byteCount(String.fromCharCode(55555))
// URIError: URI malformed


This longer function should handle all strings:

function bytes (str) {
  var bytes=0, len=str.length, codePoint, next, i;

  for (i=0; i < len; i++) {
    codePoint = str.charCodeAt(i);

    // Lone surrogates cannot be passed to encodeURI
    if (codePoint >= 0xD800 && codePoint < 0xE000) {
      if (codePoint < 0xDC00 && i + 1 < len) {
        next = str.charCodeAt(i + 1);

        if (next >= 0xDC00 && next < 0xE000) {
          bytes += 4;
          i++;
          continue;
        }
      }
    }

    bytes += (codePoint < 0x80 ? 1 : (codePoint < 0x800 ? 2 : 3));
  }

  return bytes;
}


E.g.

bytes(String.fromCharCode(55555))
// 3


It will correctly calculate the size for strings containing surrogate pairs:

bytes(String.fromCharCode(55555, 57000))
// 4 (not 6)


The results can be compared with Node's built-in function Buffer.byteLength:

Buffer.byteLength(String.fromCharCode(55555), 'utf8')
// 3

Buffer.byteLength(String.fromCharCode(55555, 57000), 'utf8')
// 4 (not 6)

    
                                                                    
                                                        
            
              
                查看更多
                
             
              0人赞

                                                     添加讨论(0)

                                                                                                            
                               举报
                
                
                
                  
                


                        
                            

                               
             
                        
               
            

                            
                            
                                 加载中...
                            
                        

                
   上一页
1
2


     
                      登录 后发表回答



   
   
   
  
   相关问题
      
    
    
   
   

     


   
   Is there a limit to how many levels you can nest i   

   



     


   
   How to toggle on Order in ReactJS   

   



     


   
   void before promise syntax   

   



     


   
   Keeping track of variable instances   

   



     


   
   how to split a list into a given number of sub-lis   

   



        
      
    查看全部
   
   
  
   相关文章
 
   
   

     


   
   实时推送的大数据量（通过websocket)，造成页面数据加载比较慢，应该怎么改善？   

     


   
   反爬能检测到JS模拟的键盘输入吗   

     


   
   VUE的v-for中深入响应式原理的问题   

     


   
   做一个留言板，求动态修改数据文件的思路？   

     


   
   vue的data()中的值能否递归调用   

     


   
   浅拷贝的问题   

     


   
   javascript案例隐藏密码--有关元素选取的问题   

     


   
   JavaScript让一个变量影响另一个变量内容   

        
        
    查看全部
                 收藏的人(5)





  
    
      
      采纳回答
    
    

     
        
        
        
            
                向帮助了您的知道网友说句感谢的话吧!
            
            
                
                    
                        非常感谢!

How many bytes in a JavaScript string?

The size of a JavaScript string is

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间