I have a form with a textarea. Users enter a block of text which is stored in a database.
Occasionally a user will paste text from Word containing smart quotes or emdashes. Those characters appear in the database as: –, ’, “ ,â€
What function should I call on the input string to convert smart quotes to regular quotes and emdashes to regular dashes?
I am working in PHP.
Update: Thanks for all of the great responses so far. The page on Joel's site about encodings is very informative: http://www.joelonsoftware.com/articles/Unicode.html
Some notes on my environment:
The MySQL database is using UTF-8 encoding. Likewise, the HTML pages that display the content are using UTF-8 (Update:) by explicitly setting the meta content-type.
On those pages the smart quotes and emdashes appear as a diamond with question mark.
Solution:
Thanks again for the responses. The solution was twofold:
- Make sure the database and HTML files were explicitly set to use UTF-8 encoding.
- Use
htmlspecialchars()
instead ofhtmlentities()
.
In my experience, it's easier to just accept the smart quotes and make sure you're using the same encoding everywhere. To start, add this to your form tag:
accept-charset="utf-8"