Hi, I am saving mostly English and German characters into a MySQL database which is currently set to the UTF-8 charset.
I am assuming that I should use the latin1 charset for this type of data, is that correct?
If so, how can I change the charset to correct the German characters that are now saved in UTF-8?
UPDATE
Maybe it is a retrieval problem then... When I export data from the DB via PHP, of course I get UTF-8 back. Can I make the retrieval give me latin1?
UPDATE 1
OK, I am building a website. The HTML encoding is UTF-8 and the DB is UTF-8. Now I want to run some exports and extract data, which should be returned in an Excel sheet. The data is UTF-8, but here I need the characters to be latin1, or the encoding of the Excel sheet extracted from the DB needs to be such that Töst shows up as Töst. Right now I get the data like this -> TÃ¶st
UPDATE 2
I am using the following PHP script to do the dump:
http://www.fundisom.com/phparadise/php/databases/mySQL_to_excel
on line 48 I have changed the code to
header("Content-Type: application/$file_type; charset=utf-8");
no change in behaviour.
How would I solve the issue?
Almost Solution
<?php
$text = "ö is a valid UTF-8 character";
echo 'Original : ', $text, PHP_EOL;
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
echo 'IGNORE : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;
echo 'Plain : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;
?>
This is what I need, I think... but I need to check it in the context of the PHP script... tomorrow :-)
I agree with the previous answers that UTF-8 is a good choice for most applications.
Beware the traps that might be awaiting you, though! You'll want to be careful that you use a consistent character encoding throughout your system (input forms, output web pages, other front ends that might access or change the data).
I have spent some unpleasant hours trying to figure out why a simple β or é was mangled on my web page, only to find that something somewhere had goofed up an encoding. I've even seen cases of text that gets run through multiple encoders--once turning a single quotation mark into eight bytes.
Bottom line, don't assume the correct translation will be done; be explicit about character encoding throughout your project.
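To make the "eight bytes" story concrete, here is a minimal sketch of one way such growth can happen: the UTF-8 bytes of a curly quotation mark are misread as Windows-1252 and then re-encoded as UTF-8. The helper name misreadAsCp1252 is made up for this illustration, and the snippet assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: treat UTF-8 bytes as if they were Windows-1252
// and encode the result as UTF-8 again (a classic double-encoding bug).
function misreadAsCp1252(string $utf8Bytes): string
{
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
}

// U+2019 RIGHT SINGLE QUOTATION MARK, written as raw bytes so the
// example does not depend on the encoding of this source file.
$quote = "\xE2\x80\x99";
$once  = misreadAsCp1252($quote);

echo 'bytes before: ', strlen($quote), PHP_EOL; // 3
echo 'bytes after : ', strlen($once), PHP_EOL;  // 8
echo 'hex after   : ', bin2hex($once), PHP_EOL; // c3a2e282ace284a2
```

Each of the three original bytes is read as its own character, and those characters take two, three, and three bytes in UTF-8, which is how one quotation mark ends up as eight bytes.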
Edit: I see in your update you've already started to discover this particular joy. :)
Once you are using multi-byte characters like UTF-8, there is no turning back...
The closest you can get is iconv,
like this:
<?php
$text = "ü is still a valid ISO-8859-1";
echo 'Original : ', $text, PHP_EOL;
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
echo 'IGNORE : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;
echo 'Plain : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;
?>
details : http://php.net/manual/en/function.iconv.php
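Applied to the export from the question, the conversion would probably happen per cell, just before each value is written to the Excel output. The following is only a sketch under that assumption: utf8ToLatin1 and utf8RowToLatin1 are hypothetical helper names, not part of the linked export script, and falling back to the unconverted value when iconv fails is just one possible choice.

```php
<?php
// Hypothetical helper: convert one UTF-8 value to ISO-8859-1.
// iconv() returns false on failure; in that case the value is left as-is.
function utf8ToLatin1(string $cell): string
{
    $converted = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $cell);
    return $converted === false ? $cell : $converted;
}

// Hypothetical helper: convert every cell of one result row.
function utf8RowToLatin1(array $row): array
{
    return array_map('utf8ToLatin1', $row);
}

// "Töst" and "Straße" as raw UTF-8 bytes, as they would come from the DB.
$row    = ["T\xC3\xB6st", "Stra\xC3\x9Fe"];
$latin1 = utf8RowToLatin1($row);

echo bin2hex($latin1[0]), PHP_EOL; // 54f67374     (ö is the single byte f6)
echo bin2hex($latin1[1]), PHP_EOL; // 537472616465 (ß is the single byte df)
```

If the cells are converted like this, the charset declared in the Content-Type header should say ISO-8859-1 as well, so that the header and the bytes agree.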
With UTF-8 you can store any character supported by Unicode, so you shouldn't have any problem using it to store just latin1 characters (which are just a really small subset of what Unicode supports).
So, for information storage you're OK; if you need to do any conversions when retrieving the data, that depends on the connector you use to get the data from the DB and on how your programming language handles strings.
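A quick way to convince yourself of the subset claim is to push every possible latin1 byte through UTF-8 and back. This is only an illustrative sketch using iconv, with variable names chosen for the example:

```php
<?php
// Check that every ISO-8859-1 byte survives a round trip through UTF-8.
$allRoundTrip = true;
for ($byte = 0x00; $byte <= 0xFF; $byte++) {
    $latin1 = chr($byte);
    $utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1);
    $back   = iconv('UTF-8', 'ISO-8859-1', $utf8);
    if ($back !== $latin1) {
        $allRoundTrip = false;
    }
}

var_dump($allRoundTrip);
```

The reverse direction is where data can get lost: a UTF-8 string may contain characters that latin1 simply cannot represent.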
For the update: assuming that you're using PHP to produce web pages, can't you just send the right HTTP header specifying that your page is encoded in UTF-8?
UTF-8 is the very best choice for all intents and purposes. Unless you have a really pressing reason to go for latin1 (e.g. compatibility with other applications), go for UTF-8.
There are several UTF-8 collations that handle umlauts and sort orders differently (see here for a list). You may need to pick one over the other depending on your requirements. They all can store umlauts, though.