UTF-8 to Unicode Code Points

Is there a function that converts a UTF-8 string to Unicode code points, leaving non-special characters as normal letters and numbers?

I.e. the German word "tchüß" would be rendered as something like "tch\20AC\21AC" (please note that I am making the Unicode codes up).

EDIT: I am experimenting with the following function, but although this one works well with ASCII 32-127, it seems to fail for double byte chars:

function strToHex ($string)
{
    $hex = '';
    for ($i = 0; $i < mb_strlen ($string, "utf-8"); $i++)
    {
        $id = ord (mb_substr ($string, $i, 1, "utf-8"));
        $hex .= ($id <= 128) ? mb_substr ($string, $i, 1, "utf-8") : "&#" . $id . ";";
    }

    return ($hex);
}

Any ideas?

EDIT 2: Found a solution: the PHP ord() function does not work for multibyte characters. Use this instead: http://nl.php.net/manual/en/function.ord.php#78032
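As a rough sketch of what a corrected version of the function above could look like: it walks the string one character at a time and uses mb_ord() instead of ord(). This assumes PHP 7.2 or later (where mb_ord() is built in) and valid UTF-8 input; the name strToEntities is made up for illustration.

```php
<?php
// Sketch: keep ASCII characters as-is and replace every other character
// with a decimal numeric entity. Assumes PHP 7.2+ and valid UTF-8 input.
function strToEntities(string $string): string
{
    // Split into whole characters; the /u flag makes PCRE treat the input as UTF-8.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    $out = '';
    foreach ($chars as $char) {
        // mb_ord() returns the code point of the whole (possibly multibyte) character.
        $codePoint = mb_ord($char, 'UTF-8');
        $out .= ($codePoint < 128) ? $char : '&#' . $codePoint . ';';
    }

    return $out;
}

echo strToEntities('tchüß'), "\n"; // tch&#252;&#223;
```

Note the comparison is `< 128` rather than `<= 128`, since ASCII ends at 127.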

Tags: php unicode utf-8

8 Answers

贼婆χ

#2 · 2019-01-06 22:15

I guess you're going to print out your strings on a website?

I'm storing all my databases in UTF-8 and calling htmlentities($string) before output.

Maybe you have to try htmlentities(utf8_encode($string));


别忘想泡老子

#3 · 2019-01-06 22:16

With PHP 7, there is a new IntlChar::ord() to find the Unicode Code Point from a given UTF-8 character:

var_dump(sprintf('U+%04X', IntlChar::ord('ß')));

# Outputs: string(6) "U+00DF"
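To apply this to a whole string in the style the question asks for, something like the following might work. It is only a sketch: it assumes the intl extension is loaded and the input is valid UTF-8, and the function name escapeNonAscii and the \XXXX output format are made up for illustration.

```php
<?php
// Sketch: replace each non-ASCII character with a backslash followed by its
// hexadecimal code point; ASCII characters pass through unchanged.
// Assumes the intl extension (IntlChar) is available.
function escapeNonAscii(string $string): string
{
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u', // one non-ASCII code point per match (UTF-8 mode)
        function (array $m): string {
            return sprintf('\\%04X', IntlChar::ord($m[0]));
        },
        $string
    );
    if ($result === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $result;
}

echo escapeNonAscii('tchüß'), "\n"; // tch\00FC\00DF
```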


唯我独甜

#4 · 2019-01-06 22:16

I once created a function called _convert() which safely encodes everything to UTF-8.


来，给爷笑一个

#5 · 2019-01-06 22:16

Tested on PHP 5.6

/**
 * @param string $utf8char
 * @return string
 */
function toUnicodeCodePoint($utf8char)
{
    return 'U+' . dechex(mb_ord($utf8char));
}

/**
 * Fallback for PHP < 7.2; newer versions ship a built-in mb_ord(),
 * and redeclaring it would be a fatal error.
 *
 * @see https://github.com/symfony/polyfill-mbstring
 * @param string $s
 * @return int
 */
if (!function_exists('mb_ord')) {
    function mb_ord($s)
    {
        // Read up to four bytes; the lead byte determines the sequence length.
        $code = ($s = unpack('C*', substr($s, 0, 4))) ? $s[1] : 0;
        if (0xF0 <= $code) {
            return (($code - 0xF0) << 18) + (($s[2] - 0x80) << 12) + (($s[3] - 0x80) << 6) + $s[4] - 0x80;
        }
        if (0xE0 <= $code) {
            return (($code - 0xE0) << 12) + (($s[2] - 0x80) << 6) + $s[3] - 0x80;
        }
        if (0xC0 <= $code) {
            return (($code - 0xC0) << 6) + $s[2] - 0x80;
        }

        return $code;
    }
}

echo toUnicodeCodePoint('

不美不萌又怎样

#6 · 2019-01-06 22:23
                                                                          
Converting one character set to another can be done with iconv:

http://php.net/manual/en/function.iconv.php

Note that UTF-8 is already a Unicode encoding.

Another way is simply using htmlentities with the right character set:

http://php.net/manual/en/function.htmlentities.php
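
As a rough illustration of the iconv route: converting to UTF-32 gives one 32-bit unit per character, and each unit's integer value is the code point. The sketch below assumes the iconv extension is available and supports the "UTF-32BE" target; the function name codePointsViaIconv is made up for illustration.

```php
<?php
// Sketch: list the decimal code points of a UTF-8 string via iconv.
// Assumes the iconv extension supports the "UTF-32BE" encoding name.
function codePointsViaIconv(string $utf8): array
{
    $utf32 = iconv('UTF-8', 'UTF-32BE', $utf8);
    if ($utf32 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // 'N*' reads consecutive unsigned 32-bit big-endian integers;
    // array_values() turns unpack()'s 1-based keys into a plain list.
    return array_values(unpack('N*', $utf32));
}

print_r(codePointsViaIconv('tchüß')); // 116, 99, 104, 252, 223
```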
    
                                                                    
                                                        
            
              
Viruses.

#7 · 2019-01-06 22:25
                                                                          
For people looking to find the Unicode code point for any character, this might be useful. You can then encode the string however you want, replacing certain characters with escape codes and leaving others in their binary form (e.g. printable ASCII characters), depending on the context in which you want to use it.

From: Mapping codepoints to Unicode encoding forms


  The mapping for UTF-32 is, essentially, the identity mapping: the
  32-bit code unit used to encode a codepoint has the same integer value
  as the codepoint itself.


/**
 * Convert a string into an array of decimal Unicode code points.
 *
 * @param $string   [string] The string to convert to codepoints
 * @param $encoding [string] The encoding of $string
 *
 * @return [array] Array of decimal codepoints for every character of $string
 */
function toCodePoint( $string, $encoding )
{
    $utf32  = mb_convert_encoding( $string, 'UTF-32', $encoding );
    $length = mb_strlen( $utf32, 'UTF-32' );
    $result = [];

    for( $i = 0; $i < $length; ++$i )
    {
        $result[] = hexdec( bin2hex( mb_substr( $utf32, $i, 1, 'UTF-32' ) ) );
    }

    return $result;
}

    
                                                                    
                                                        
            
              