PHP to RTF: é becomes Ã©

Posted 2019-03-30 13:47

Using this RTF class, I see my special characters getting converted; for example, é becomes \'C3\'A9 (that part is probably not the problem).

Once I send the RTF to the browser with PHP headers, the resulting character (é) shows up as Ã©.

header("Content-type: application/rtf; charset=utf-8");
header("Content-Disposition: attachment; filename=$file_rtf");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private",false); 

Strange! For what it's worth, my file is saved as UTF-8.

I had a similar problem when exporting to Excel, but I solved that using

$text = mb_convert_encoding($text,'utf-16','utf-8');

This does not work for RTF. Thanks for any help.

PS. My file is saved with UTF-8 encoding through DW, and my MySQL default charset is also UTF-8. I don't have a problem when I display a character that comes straight from the database; the problem appears only when I type a special character directly into the page before sending the headers.

Cheers.

4 answers
Anthone
#2 · 2019-03-30 14:00

You should ask the author of the script, but judging from the documentation and the encodings it provides, it is not UTF-8 friendly. So you could try converting the text to your code page (e.g. cp1251) and using one of the encodings available in this class to see which gives the best results.

放我归山
#3 · 2019-03-30 14:03

Well, finally I solved it using:

mb_convert_encoding($text,'ISO-8859-15','utf-8');
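The conversion above turns each accented character into a single byte, which is the form RTF `\'hh` escapes expect. A minimal sketch of that idea follows; `rtfEscapeSingleByte` is a hypothetical helper written for illustration, not part of the RTF class from the question, and it assumes the mbstring extension is loaded and that the RTF header declares a compatible code page.

```php
<?php
// Hypothetical helper (an assumption, not the RTF class from the question).
// Converts UTF-8 text to ISO-8859-15 and writes every byte >= 0x80 as \'hh.
function rtfEscapeSingleByte(string $utf8Text): string
{
    // Same conversion as in the answer: one byte per accented character.
    $latin = mb_convert_encoding($utf8Text, 'ISO-8859-15', 'UTF-8');

    $out = '';
    foreach (str_split($latin) as $byte) {
        $code = ord($byte);
        if ($code >= 0x80) {
            // Non-ASCII byte: hexadecimal escape.
            $out .= sprintf("\\'%02x", $code);
        } elseif ($byte === '\\' || $byte === '{' || $byte === '}') {
            // Characters with special meaning in RTF get a backslash.
            $out .= '\\' . $byte;
        } else {
            $out .= $byte;
        }
    }
    return $out;
}

// "café" written with byte escapes so the file's own encoding does not matter.
echo rtfEscapeSingleByte("caf\xC3\xA9"), "\n"; // caf\'e9
```

Without the conversion, the two UTF-8 bytes of é would be escaped separately as `\'c3\'a9`, which matches the output described in the question.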
贼婆χ
#4 · 2019-03-30 14:11

Hmm, I'm having the same problem pulling from MySQL. My page is encoded in UTF-8, as is my database. I'm even forcing mysqli into UTF-8 mode by putting

if (!$mysqli->set_charset("utf8")) { echo "failed to set utf8: " . $mysqli->error; } else { echo "utf8 on"; }

Nothing helps; I keep getting Ã©.

This was my solution. It's not elegant, but it works.

echo str_replace('Ã©', 'é', $mySQLResult);
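The replacement above handles only one character. A more general sketch, assuming the text was double-encoded through ISO-8859-1 and that mbstring is available, reverses one layer of encoding and keeps the result only if it is still valid UTF-8. `fixDoubleEncoded` is a hypothetical name, and the check is a heuristic: text that legitimately contains a sequence such as "Ã©" would be altered too.

```php
<?php
// Hypothetical helper (an assumption): undo one layer of UTF-8 double-encoding.
function fixDoubleEncoded(string $text): string
{
    // Map each code point back to a single byte (U+00C3 -> 0xC3, and so on).
    $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // Correctly encoded text usually turns into invalid UTF-8 here,
    // so in that case the input is returned unchanged.
    return mb_check_encoding($candidate, 'UTF-8') ? $candidate : $text;
}

$broken = "\xC3\x83\xC2\xA9";               // displays as "Ã©"
echo bin2hex(fixDoubleEncoded($broken)), "\n"; // c3a9, the UTF-8 bytes of "é"
```

Repairing the output this way hides the symptom; finding the step that re-encodes the text is still the cleaner fix.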
我欲成王,谁敢阻挡
#5 · 2019-03-30 14:16

You’re getting double-encoded. A \N{LATIN SMALL LETTER E WITH ACUTE} character is code point U+00E9. In UTF-8, that is \xC3\xA9.

But if you turn around and treat those two bytes as distinct code points U+00C3 and U+00A9, those are \N{LATIN CAPITAL LETTER A WITH TILDE} and \N{COPYRIGHT SIGN}, respectively.

Once those two code points are in turn re-encoded as UTF-8, you get the byte sequence \xC3\x83\xC2\xA9, which is what you are seeing.
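The byte sequences described here can be reproduced in a few lines of PHP. This is only an illustrative sketch: it assumes the mbstring extension is loaded and uses ISO-8859-1 as the encoding the bytes are mistaken for.

```php
<?php
// "é" (U+00E9) as UTF-8 bytes, written with escapes so the result
// does not depend on how this source file is saved.
$utf8 = "\xC3\xA9";

// Misread the two bytes as ISO-8859-1 characters and encode them as UTF-8 again.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($double), "\n"; // c383c2a9
```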

Are you on a Windows system? They often seem to double-encode things.
