Foreign characters and LDAP. What encoding/charset?

Posted 2020-07-11 05:28

Question:

I am parsing XML with simplexml_load_string() and using the data in it to update Active Directory (AD) objects via LDAP.

Example XML (simplified):

<?xml version="1.0" encoding="UTF-8"?>
<users>
    <user>Bìlbö Bággįnš</user>
    <user>Gãńdåłf Thê Gręât</user>
    <user>Śām Wīšë</user>
</users>

I first run ldap_search() to find a single user and then change their attributes. Writing the above values straight into AD via LDAP results in badly mangled characters.

For example: Bìlbö Bággįnš

I've tried the following functions, to no avail:

utf8_encode($str);
utf8_decode($str);
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
iconv("UTF-8", "ASCII//TRANSLIT", $str);
iconv("UTF-8", "T.61", $str);

Ideally, I don't want to do any of these string conversions. UTF-8 should be fine, right?!
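As a sanity check, a sketch like the following can confirm that the strings coming out of SimpleXML really are valid UTF-8 before they are sent anywhere. It reuses the sample XML from above; the variable names are only illustrative. The bytes are printed as hex so the result does not depend on how the terminal or browser decodes the output.

```php
<?php
// Sketch: check that SimpleXML hands back valid UTF-8 strings.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<users>
    <user>Bìlbö Bággįnš</user>
    <user>Gãńdåłf Thê Gręât</user>
    <user>Śām Wīšë</user>
</users>
XML;

$users = simplexml_load_string($xml);

$names = [];
foreach ($users->user as $user) {
    $name = (string) $user; // cast the SimpleXMLElement to a plain string
    // preg_match() with the /u modifier returns 1 only for valid UTF-8 input
    $names[$name] = preg_match('//u', $name) === 1;
}

foreach ($names as $name => $isUtf8) {
    echo bin2hex($name), ' ', $isUtf8 ? 'valid UTF-8' : 'NOT valid UTF-8', PHP_EOL;
}
```

If every value is reported as valid UTF-8 here, the mangling is more likely to happen later, for example in how the output is displayed or in how the LDAP connection is configured.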

I've also noticed the following: I have printed out the values to see how they come out. curl-ing the script in CLI will show the correct characters, but web browsers show the same as AD.

What's going on? Should I be looking at something else, e.g. URL encoding? I'm hoping this is down to a simple mistake on my end.

EDIT: I entered in these characters using AD admin GUI to see how they would come out. I can read them via LDAP fine. Correct characters are displayed when in a browser. curl-ing via CLI will show question marks instead of foreign characters. Passing one of these returned values into mb_detect_encoding() will return UTF-8.

I decided to immediately modify the same object by not writing in a new string, but just reversing the existing value and saving the object. This works fine - I see the correct value (reversed) in AD.

  • Developing on Mac OS X 10.7 Lion - PHP 5.4.3
  • Running production on: Red Hat 6 - PHP 5.4.3
  • AD server: Windows 2003

UPDATE: After a few months, I was still unable to find a solution to this problem. In the end, I went with replacing the characters with their non-accented equivalents (NOT ideal, I know).

Answer 1:

Are you using LDAP v3?

ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);

LDAPv3 uses UTF-8 by default and expects requests and responses to be encoded in it. See here: http://technet.microsoft.com/en-us/library/cc961766.aspx
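To make this concrete, here is a minimal sketch of switching to LDAPv3 before binding and then writing a UTF-8 value unchanged. The helper names buildReplaceEntry() and replaceAttribute() and all of their parameters are hypothetical and not part of the original question; the ldap_* calls are the standard PHP LDAP extension functions.

```php
<?php
// Hypothetical helper: builds the attribute array for ldap_mod_replace()
// and refuses values that are not valid UTF-8.
function buildReplaceEntry($attribute, $value)
{
    if (preg_match('//u', $value) !== 1) {
        throw new InvalidArgumentException("Value for $attribute is not valid UTF-8");
    }
    return [$attribute => [$value]];
}

// Hypothetical helper: connects, switches to LDAPv3, binds, and replaces one attribute.
function replaceAttribute($host, $bindDn, $bindPassword, $userDn, $attribute, $value)
{
    $ldap = ldap_connect($host);
    // Set the protocol version before binding, since it is sent with the bind request.
    ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);
    if (!ldap_bind($ldap, $bindDn, $bindPassword)) {
        throw new RuntimeException('Bind failed: ' . ldap_error($ldap));
    }
    // The UTF-8 string is passed through as-is, with no utf8_encode()/iconv() step.
    $ok = ldap_mod_replace($ldap, $userDn, buildReplaceEntry($attribute, $value));
    $error = $ok ? null : ldap_error($ldap);
    ldap_unbind($ldap);
    if (!$ok) {
        throw new RuntimeException('Modify failed: ' . $error);
    }
}
```

The point of the sketch is that, under LDAPv3, no conversion step should be needed between the XML value and the call to ldap_mod_replace().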



Answer 2:

Here is the solution that worked for me. Do the following:

1.) First make sure you are using LDAP protocol version 3, which uses UTF-8 by default:

ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);

2.) If you want to change a user's password, then make sure that the "use TLS" option is set to true and "use SSL" is set to false, i.e. use StartTLS on the plain LDAP connection:

ldap_start_tls($ldapConnection);

3.) I used port number 389.

4.) Use the PHP function ldap_mod_replace to replace the user's password.

5.) Use the following function to encode your $password:

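function encodePassword($password)
{
    // AD expects the password wrapped in double quotes.
    $password = "\"" . $password . "\"";
    $encoded = "";
    // Append a NUL byte to every byte: this yields UTF-16LE for single-byte (ISO-8859-1) input.
    for ($i = 0; $i < strlen($password); $i++) {
        $encoded .= $password[$i] . "\000";
    }
    return $encoded;
}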

6.) Use the following logic to change user's password:

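$password = "test";
// encodePassword() expects ISO-8859-1 input, so decode UTF-8 first.
if (mb_detect_encoding($password) == 'UTF-8')
{
    $password = utf8_decode($password);
}

$add = array();
$add["unicodePwd"][0] = encodePassword($password);

// @ suppresses the PHP warning; the return value is checked instead.
$result = @ldap_mod_replace($ldapConnection, $userDn, $add);
if ($result === false) {
    // Failure: handle the error, e.g. log ldap_error($ldapConnection).
} else {
    // Success: continue with your own logic.
}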

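7.) Please note that encodePassword does not produce UTF-8: it wraps the password in double quotes and appends a NUL byte to every byte, which yields the UTF-16LE string that AD's unicodePwd attribute expects, but only for single-byte (ISO-8859-1) input. If your password is UTF-8 encoded, you have to decode it to ISO-8859-1 before passing it to encodePassword. That is why I wrote these lines: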

if(mb_detect_encoding($password) == 'UTF-8')
{
    $password = utf8_decode($password);
}

This code worked for me with German umlauts in the password: äüößÄÜ, etc.
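As a possible alternative to the decode-then-pad approach, the quoted password can be converted from UTF-8 to UTF-16LE in one step. This is only a sketch: encodeUnicodePwd() is a hypothetical name, the input is assumed to be UTF-8, and the mbstring extension is assumed to be loaded. Unlike utf8_decode(), this conversion also keeps characters that do not exist in ISO-8859-1.

```php
<?php
// Hypothetical alternative to encodePassword(): takes a UTF-8 password and
// returns the double-quoted UTF-16LE byte string for the unicodePwd attribute.
function encodeUnicodePwd($utf8Password)
{
    return mb_convert_encoding('"' . $utf8Password . '"', 'UTF-16LE', 'UTF-8');
}

// Build the entry in the same shape as the $add array used with ldap_mod_replace().
$entry = ['unicodePwd' => [encodeUnicodePwd('äüößÄÜ')]];

// Print the bytes as hex, since the raw UTF-16LE string is not readable in a terminal.
echo bin2hex($entry['unicodePwd'][0]), PHP_EOL;
```

For passwords that contain only ISO-8859-1 characters, this should produce the same bytes as calling utf8_decode() followed by encodePassword().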



Answer 3:

I've managed to add foreign characters in LDAP with two steps:

  • add the user only with ASCII characters (iconv "ASCII//TRANSLIT")

  • use ldapmodify to update the field(s) with UTF-8 characters

LDAPv3 is UTF-8, but the tool I used (from smbldap-tools) was not dealing with it properly.
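The second step can be sketched as generating an LDIF change record for ldapmodify. In LDIF, values containing non-ASCII bytes are written base64-encoded using the "::" form. The helper name buildLdifReplace(), the DN, and the attribute below are placeholders for illustration, and the sketch assumes the DN itself is plain ASCII.

```php
<?php
// Hypothetical helper: builds an LDIF "modify" record that replaces one attribute.
// The value is always base64-encoded (the "::" form), which LDIF uses for
// values that contain non-ASCII bytes.
function buildLdifReplace($dn, $attribute, $utf8Value)
{
    return 'dn: ' . $dn . "\n"
        . "changetype: modify\n"
        . 'replace: ' . $attribute . "\n"
        . $attribute . ':: ' . base64_encode($utf8Value) . "\n"
        . "-\n";
}

// Placeholder DN and value, for illustration only.
echo buildLdifReplace('uid=bilbo,ou=People,dc=example,dc=com', 'displayName', 'Bìlbö Bággįnš');
```

The output could be saved to a file and passed to ldapmodify with its -f option, together with whatever bind options your directory requires.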



Answer 4:

Another thing to mention for those stumbling across this:

If your text is already in UTF-8, do NOT attempt to re-encode it. As the remarks on the documentation page for utf8_encode note, re-encoding an already encoded string results in garbled text. Additionally, the function only converts from one specific encoding to another (ISO-8859-1 to UTF-8).

You could easily test if you need to UTF-8 encode the string by doing something like:

if (!preg_match('//u', $value)) {
    // do your encoding process...
}
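A fuller sketch of that check might look like the following. The helper name ensureUtf8() is hypothetical, and the fallback source encoding (ISO-8859-1) is an assumption you would need to adjust to your actual data; mbstring is assumed to be available. The last part shows what happens when an already valid UTF-8 string is encoded a second time.

```php
<?php
// Hypothetical helper: returns $value unchanged when it is already valid UTF-8,
// otherwise converts it from an assumed source encoding (ISO-8859-1 by default).
function ensureUtf8($value, $assumedSource = 'ISO-8859-1')
{
    if (preg_match('//u', $value) === 1) {
        return $value; // already valid UTF-8: do not touch it
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

$alreadyUtf8 = 'Bìlbö';          // UTF-8 bytes (this file is assumed to be saved as UTF-8)
$latin1      = "B\xEClb\xF6";    // the same text as ISO-8859-1 bytes

echo bin2hex(ensureUtf8($alreadyUtf8)), PHP_EOL; // left as-is
echo bin2hex(ensureUtf8($latin1)), PHP_EOL;      // converted to UTF-8

// Encoding valid UTF-8 a second time produces mojibake such as "Ã¶" for "ö".
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');
echo $doubleEncoded, PHP_EOL;
```

Note that the check only tells you whether the bytes form valid UTF-8. It cannot tell which encoding an invalid string was originally in, so the fallback encoding remains a guess.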

As for the characters showing correctly on the CLI but not on a web page, make sure you set the correct charset in your headers:

header('Content-type: text/html; charset=utf-8');