I am new here, so I apologize if I am doing anything wrong.
I have a form which submits user input to another page. The user is expected to type ä, ö, é, etc. I have placed all of the following in the document:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
header('Content-Type:text/html; charset=UTF-8');
<form action="whatever.php" accept-charset="UTF-8">
I even tried:
ini_set('default_charset', 'UTF-8');
When the other page loads, I need to check what the user input with something like:
if ( $_POST['field'] == $check ) {
...
}
But if he inputs something like 'München', PHP will compare 'München' with 'München' and will never trigger TRUE even though it should. Since UTF-8 is specified everywhere, I am guessing that the server is converting to something else (Windows-1252, as I read on another thread) because it does not support, or is not configured for, UTF-8. I am using Apache on a local server before I load into production; I have not changed (and don't know how to change) any of the default settings. I've been working on Windows 7, editing with Notepad++ and encoding my files in ANSI.
If I run bin2hex('München') I get '4dc3bc6e6368656e'.
If I echo $_POST['field']; it displays 'München' correctly.
I have searched everywhere for an explanation; all I find is that I should include the tags/headers I already have.
Any help is much appreciated.
This is due to the character encoding of the PHP file(s).
The hardcoded München is stored with the character encoding of the source file(s), in this case ANSI, and when that value is compared to the UTF-8 encoded value provided in the $_POST variable, the two will, quite naturally, differ.

The solution to your problem is one of:

1. Serve your pages as windows-1252. Change content="text/html; charset=UTF-8" to content="text/html; charset=windows-1252" whenever serving HTML data.
2. Since plain English letters and numbers are encoded identically in UTF-8 and windows-1252, more or less only hardcode values that include only English letters and numbers. UTF-8 values would have to be read from a source that ensures they are UTF-8 encoded (for instance a database set to use UTF-8 as storage encoding as well as connection encoding).
3. Convert hardcoded values with utf8_encode(), for instance $value = utf8_encode('München');
4. Save the PHP source file(s) as UTF-8.

Either solution 1 or 4 would be my preferred solution, especially if multiple people are involved in the project.
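The mismatch described above can be reproduced in a few lines. This is only a minimal sketch: the variable names are made up for illustration, the byte sequences are written with escapes so the result does not depend on how the example file itself is saved, and it assumes the mbstring extension is available. It uses mb_convert_encoding() rather than utf8_encode(), because utf8_encode() converts from ISO-8859-1 (not windows-1252) and is deprecated as of PHP 8.2.

```php
<?php
// 'München' as UTF-8 bytes, as a UTF-8 form would submit it.
$fromForm = "M\xC3\xBCnchen";

// 'München' as windows-1252 bytes, as an "ANSI" source file would store it.
$fromSource = "M\xFCnchen";

echo bin2hex($fromForm), "\n";   // 4dc3bc6e6368656e
echo bin2hex($fromSource), "\n"; // 4dfc6e6368656e

// Same text, different bytes: the comparison fails.
var_dump($fromForm === $fromSource); // bool(false)

// Convert the hardcoded value to UTF-8 before comparing (assumes mbstring).
$converted = mb_convert_encoding($fromSource, 'UTF-8', 'Windows-1252');
var_dump($fromForm === $converted);  // bool(true)
```

Saving the source file as UTF-8 (solution 4) makes the conversion step unnecessary, since the hardcoded literal then already has the same bytes as the submitted value.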
As a side note, some text editors (notably Notepad++) have the option of using either UTF-8 or UTF-8 without BOM. The BOM (Byte Order Mark) is pointless in UTF-8 and will cause problems when writing headers in PHP (most often when doing a redirect). This is because the BOM sits right in front of the initial <?php, causing the server to send the BOM just as it would had there been any other character in front. The difference is that you'd notice a character in front, but the BOM isn't displayed.

Rule of thumb: Always use UTF-8 without BOM.
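Because the BOM is invisible in most editors, it can help to check for it programmatically. The following is a rough sketch; the constant and function names are hypothetical helpers, not part of PHP. The UTF-8 BOM is the three bytes EF BB BF at the very start of a file.

```php
<?php
// Hypothetical helpers for illustration only.
const UTF8_BOM = "\xEF\xBB\xBF";

// True if the given bytes start with the UTF-8 byte order mark.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Returns the bytes with a leading UTF-8 BOM removed, if one is present.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

$withBom = UTF8_BOM . "<?php echo 'hi';";
var_dump(has_utf8_bom($withBom));                          // bool(true)
var_dump(strip_utf8_bom($withBom) === "<?php echo 'hi';"); // bool(true)
```

In practice you could pass the result of file_get_contents() for a suspect script to has_utf8_bom() to see whether the editor saved it with a BOM.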
Another solution that may be helpful: in Apache, you can place a directive called AddDefaultCharset in your configuration file (httpd.conf) or in .htaccess. It looks like this:
AddDefaultCharset utf-8
http://httpd.apache.org/docs/2.0/mod/core.html#adddefaultcharset
That will override any other default charsets.
I've used Unicode characters in my forms and files many times and have not had any problems so far. Try these steps and check the result:

1. Remove header('Content-Type:text/html; charset=UTF-8'); from your HTML form code.
2. Use <form action="whatever.php"> without accept-charset="UTF-8". (It's better to specify the method for sending data in your form tag.)
3. Put <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> in the <head> tag.

I have always set up my projects as described here and have not had any problems with Unicode strings.
You are facing many different problems at the same time; let's start with the simplest one.

Problem 1) You say that echo $_POST['field']; will display it correctly? What do you mean by "display"? It can be displayed correctly in two different cases, so the fact that echo $_POST['field']; is correct tells you nothing.

Problem 2) You are using
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
header('Content-Type:text/html; charset=UTF-8');
Is this PHP code? If it is, it will be an error because the header must be set before sending out any byte. If you do this you will not set the Content-Type header and PHP should generate a warning.

Problem 3) You are using
<form action="whatever.php" accept-charset="UTF-8">
Some browsers (IE, mostly) ignore accept-charset if they can coerce the data to be sent in ASCII or ISO Latin-1. So the data will be in UTF-8 and declared as ISO Latin-1 or ISO Latin-1 and sent as ISO Latin-1 (but this second case is not your case).
Have a look at https://stackoverflow.com/a/8547004/449288 to see how to solve this problem.

Problem 4) Which strings are you comparing? For example, if you have
if ( $_POST['field'] == 'München' ) {
...
}
The result of this code will depend on the encoding of the PHP file. If the file is encoded in ISO Latin-1 and the $_POST correctly contains UTF-8 data, the == will compare different bytes and will return false.

I changed "mbstring.detect_order = pass" in my php.ini file and it worked.