I am parsing XML with simplexml_load_string() and using the data within it to update Active Directory (AD) objects via LDAP.
Example XML (simplified):
<?xml version="1.0" encoding="UTF-8"?>
<users>
<user>Bìlbö Bággįnš</user>
<user>Gãńdåłf Thê Gręât</user>
<user>Śām Wīšë</user>
</users>
I first run an ldap_search() to find a single user and then proceed to change their attributes. Pumping the above values straight into AD, using LDAP, will result in some pretty mangled characters showing up.
For example: Bìlbö Bággįnš
I've tried the following functions, to no avail:
utf8_encode($str);
utf8_decode($str);
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
iconv("UTF-8", "ASCII//TRANSLIT", $str);
iconv("UTF-8", "T.61", $str);
Ideally, I don't want to do any of these string conversions. UTF-8 should be fine, right?!
I've also noticed the following:
I have printed out the values to see how they come out. curl-ing the script in CLI will show the correct characters, but web browsers show the same as AD.
What's going on? Should I be looking at something else, eg. URL encoding?
I'm hoping this is down to a simple mistake on my end.
EDIT:
I entered these characters using the AD admin GUI to see how they would come out. I can read them via LDAP fine. Correct characters are displayed when in a browser. curl-ing via CLI will show question marks instead of foreign characters. Passing one of these returned values into mb_detect_encoding() will return UTF-8.
I decided to immediately modify the same object by not writing in a new string, but just reversing the existing value and saving the object. This works fine - I see the correct value (reversed) in AD.
- Developing on Mac OS X 10.7 Lion - PHP 5.4.3
- Running production on: Red Hat 6 - PHP 5.4.3
- AD server: Windows 2003
UPDATE:
After a few months, I was unable to find the answer/solution to this problem.
In the end, I went with replacing characters with their non-accented equivalents (NOT ideal, I know).
Are you using LDAP v3?
ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);
LDAPv3 uses UTF-8 by default and expects both requests and responses to be in that encoding. See here: http://technet.microsoft.com/en-us/library/cc961766.aspx
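To make this concrete, here is a minimal sketch of forcing protocol version 3 and only sending values that are verifiably UTF-8. The helper names connectLdapV3 and buildUtf8Modification, the example URI, and the displayName attribute are assumptions for illustration, not something from the question; the mbstring and ldap extensions are assumed to be loaded.

```php
<?php
// Hypothetical helper: connect, switch to LDAPv3, then bind.
function connectLdapV3(string $uri, string $bindDn, string $bindPassword)
{
    $ldap = ldap_connect($uri);
    if ($ldap === false) {
        throw new RuntimeException("Invalid LDAP URI: $uri");
    }
    // The protocol version is normally set before binding.
    ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);
    if (!@ldap_bind($ldap, $bindDn, $bindPassword)) {
        throw new RuntimeException('Bind failed: ' . ldap_error($ldap));
    }
    return $ldap;
}

// Hypothetical helper: refuse to send anything that is not valid UTF-8.
function buildUtf8Modification(string $attribute, string $value): array
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException("Value for $attribute is not valid UTF-8");
    }
    return [$attribute => [$value]];
}

// Usage (assumed host, credentials, and DN):
// $ldap = connectLdapV3('ldap://ad.example.com:389', $bindDn, $bindPassword);
// ldap_mod_replace($ldap, $userDn, buildUtf8Modification('displayName', 'Bìlbö Bággįnš'));
```

Failing early on invalid input makes it easier to see whether the mangling happens before the value reaches LDAP or afterwards.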
Here is a solution that worked for me. Do the following:
1.) First make sure you are using LDAP protocol version 3, which uses UTF-8 by default:
ldap_set_option($ldap, LDAP_OPT_PROTOCOL_VERSION, 3);
2.) If you want to change a user's password, make sure that the "use TLS" option is set to true and "use SSL" to false.
ldap_start_tls($ldapConnection);
3.) I used port number 389.
4.) Use the PHP function ldap_mod_replace to replace the user's password.
5.) Use the following function to encode your $password:
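function encodePassword($password)
{
    // AD expects the password wrapped in double quotes
    $password = "\"" . $password . "\"";
    $encoded = "";
    // Append a null byte to every byte (single-byte input becomes UTF-16LE)
    for ($i = 0; $i < strlen($password); $i++) {
        $encoded .= $password[$i] . "\000";
    }
    return $encoded;
}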
6.) Use the following logic to change the user's password:
$password = "test";
// encodePassword() expects single-byte input, so convert UTF-8 to ISO-8859-1 first
if (mb_detect_encoding($password) == 'UTF-8') {
    $password = utf8_decode($password);
}
$add = array();
$add["unicodePwd"][0] = encodePassword($password);
$result = @ldap_mod_replace($ldapConnection, $userDn, $add);
if ($result === false) {
    // your action on failure
} else {
    // your action on success
}
7.) Please note that encodePassword produces the UTF-16LE encoding that the unicodePwd attribute expects, but only for single-byte (ISO-8859-1) input, because it simply appends a null byte to each byte of $password. If your password is UTF-8 encoded, you have to decode it before passing it to encodePassword. That is why I wrote these lines:
if(mb_detect_encoding($password) == 'UTF-8')
{
$password = utf8_decode($password);
}
This code worked for me with passwords containing German umlauts: äüößÄÜ etc.
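As an alternative to the byte-by-byte loop, the quoted UTF-16LE value can also be produced directly from a UTF-8 string with mb_convert_encoding, which removes the need for the utf8_decode step and also covers characters outside ISO-8859-1. This is a minimal sketch, not the method used above; the function name encodeAdPassword is hypothetical and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical alternative to encodePassword(): takes a UTF-8 password,
// wraps it in double quotes, and converts the result to UTF-16LE.
function encodeAdPassword(string $utf8Password): string
{
    return mb_convert_encoding('"' . $utf8Password . '"', 'UTF-16LE', 'UTF-8');
}

// Inspect the bytes that would be sent for the password "test".
echo bin2hex(encodeAdPassword('test')), "\n"; // 220074006500730074002200
```

The result would be assigned to $add["unicodePwd"][0] in the same way as in step 6.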
I've managed to add foreign characters in LDAP with two steps:
LDAPv3 is UTF-8, but the tool I used (from smbldap-tools) was not dealing with it properly.
Another thing to mention for those stumbling across this:
If your text is already in UTF-8, then do NOT attempt to re-encode it. Note the remarks on the doc page for utf8_encode: re-encoding an already encoded string will result in garbled text. Additionally, the function only converts from one specific encoding (ISO-8859-1) to UTF-8.
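The effect of re-encoding is easy to see at the byte level. The sketch below uses mb_convert_encoding with ISO-8859-1 as the source, which performs the same conversion as utf8_encode, on a string that is already UTF-8; it assumes the mbstring extension is loaded.

```php
<?php
// "ö" already encoded as UTF-8 (two bytes: C3 B6).
$utf8 = "\xC3\xB6";

// Treat those two bytes as ISO-8859-1 characters and encode them again.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3b6
echo bin2hex($double), "\n"; // c383c2b6
```

The four resulting bytes are the UTF-8 encodings of "Ã" and "¶", which is the kind of garbling a double-encoded string produces.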
You could easily test if you need to UTF-8 encode the string by doing something like:
if (!preg_match('//u', $value)) {
// do your encoding process...
}
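Filled in, that check might look like the sketch below. The function name ensureUtf8 and the default source encoding of ISO-8859-1 are assumptions; if the real source encoding is something else, the conversion will produce the wrong characters.

```php
<?php
// Hypothetical helper: return the value unchanged when it is already valid
// UTF-8, otherwise convert it from an assumed source encoding.
function ensureUtf8(string $value, string $assumedSource = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() returns false for invalid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedSource);
}

echo bin2hex(ensureUtf8("\xC3\xA4")), "\n"; // c3a4 (already UTF-8, untouched)
echo bin2hex(ensureUtf8("\xE4")), "\n";     // c3a4 (ISO-8859-1 "ä", converted)
```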
If the characters display correctly on the CLI but not on a web page, make sure you are setting the correct charset in your headers:
header('Content-type: text/html; charset=utf-8');