UTF-8 incorrectly displayed in Lua/ Corona

In Lua, for an iPad Corona project, I'm requesting a UTF-8 server text file (containing Chinese characters) using network.request, but the result when displayed in the console or in the app shows as "garbage". Google Chrome, for instance, displays the same UTF-8 page fine, as I'm setting the http header when the server sends this (using PHP) to 'Content-Type: text/plain; charset=utf-8' (and there's no BOM, byte order mark either). The "garbage" I'm seeing in Lua looks similar to when I "force" Chrome to render the page as ISO-8859-1 using the options menu.

Does anyone have any help or pointers? If all else fails, how would I convert the "garbage" string back to its UTF-8 origins within Lua?

Thanks for any help!

标签： unicode utf-8 lua corona

3条回答

走好不送

2楼-- · 2019-04-16 08:01

Lua doesn't know anything about UTF-8; Lua strings are just sequences of bytes. It sounds like Corona itself is parsing the strings as ISO8859-1. The most likely cause for this is them doing something really stupid and naive like treating each byte of the string as a Unicode code point.

I'm afraid I don't know Corona, so can't provide any specific solutions, but I'd suggest looking to see what functions it's got that involve encodings --- there may be a specific function to render a string with a particular encoding, for example.

0人赞添加讨论(0) 举报

Juvenile、少年°

3楼-- · 2019-04-16 08:04

Can you show the code for your network.request() call?

If you're downloading a html page, you should use network.download().

0人赞添加讨论(0) 举报

Juvenile、少年°

4楼-- · 2019-04-16 08:14

I had this exact same problem, except with Japanese characters. Although Lua doesn't support UTF-8, Corona acts like it does. What that means is that... if you pass a UTF-8 String to display.newText(...), it should display properly. Now, if you output to the console, it will actually print out the raw bytes of the String. And, if you try to print the length of the string, it will actually print out the number of bytes.

So, in summary, Lua treats all strings as an array of bytes. It knows nothing about UTF-8. Some Corona API methods, when passed UTF-8 strings, will display the strings correctly.

I had issues when I mixed UTF-8 with plain ASCII characters, which I believe confused Corona (what I mean is that I mixed English characters with Japanese characters... still all UTF-8, though). I have a hunch that each character in the string must be of the same length in bytes for Corona to display it properly. Try printing out one character at a time to see if that helps. Please feel free to post comments here if you run into trouble. I'd like to figure this issue out myself, too.

0人赞添加讨论(0) 举报

UTF-8 incorrectly displayed in Lua/ Corona

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间