Convert HTML entities and special characters to UTF-8 text in PHP

Posted 2019-06-03 14:49

There are a lot of questions and documentation about converting HTML entities and special characters to UTF-8 text in PHP. There is also the PHP documentation itself, such as the pages for htmlspecialchars_decode() and html_entity_decode(). However, I could not find any function or solution that clearly describes how to convert any HTML entities and special characters to UTF-8 text. All of them state something like "if you want to do this, then do that". None of them states "to get pure UTF-8 text that humans can read, do this".

The reason I am asking is that I don't really have a test case. I am reading from a database, and it is multilingual. The only guarantee is that the text is HTML-encoded, and I need to convert it to UTF-8 in a way that can be read by people who understand those languages. How can I do that? What is the proper way to decode the input so that it is pure text?

Thanks.


Update

Here is an update, as it is clear from the comments that I was not asking the question properly. My DB contains text. I would like to convert that text (which contains HTML entities and special characters) to UTF-8 text that I can display to the end user on a web page. The text in the database is written in multiple languages (such as French, Arabic, English, etc.), and any of it can contain HTML entities for special characters. So how can I convert all of that to UTF-8 text that can be read by people who understand those languages? I would like to replace those entities with the actual characters they represent.

1 answer
姐就是有狂的资本
Reply #2 · 2019-06-03 15:26

This works for me for decoding entities to UTF-8:

html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

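Edit: The "trick" is the combination of flags in the second parameter, together with passing the encoding explicitly as the third parameter. If you just call html_entity_decode($str), the result depends on PHP's defaults: before PHP 5.4 the default charset was ISO-8859-1, so the output would not be UTF-8.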
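
To make the call above concrete, here is a minimal sketch of it applied to multilingual input. The helper name decode_entities_to_utf8() and the sample strings are made up for illustration and are not from the question; it also assumes the stored text is entity-encoded only once.

```php
<?php
// Hypothetical helper (name is an assumption, not part of the original answer).
function decode_entities_to_utf8(string $text): string
{
    // ENT_QUOTES: decode both double-quote and single-quote entities.
    // ENT_HTML5: use the HTML5 named-entity table.
    // 'UTF-8': produce UTF-8 output regardless of PHP's configured default.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Made-up sample rows standing in for text read from the database.
$samples = [
    'caf&eacute;',                             // French, named entity
    '&#1605;&#1585;&#1581;&#1576;&#1575;',     // Arabic, numeric entities
    'Tom &amp; Jerry&#039;s &quot;show&quot;', // English, quotes and ampersand
];

foreach ($samples as $sample) {
    echo decode_entities_to_utf8($sample), PHP_EOL;
}
```

If the decoded text is then printed into an HTML page, the page itself has to be served as UTF-8 for the characters to display correctly.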
