Check if UTF-8 string is valid in Qt

Posted 2019-04-18 22:30

In Qt, is there a way to check if a byte array is a valid UTF-8 sequence?

It seems that QString::fromUtf8() silently suppresses or replaces invalid sequences, without notifying the caller that there were any. This is from its documentation:

However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed.

Tags: c++ qt utf-8

2 answers

在下西门庆

#2 · 2019-04-18 23:13

Try QTextCodec::toUnicode() and pass it a ConverterState instance. ConverterState has members such as invalidChars. They are not documented via doxygen, but I assume they are public API, since the QTextCodec documentation mentions them.

Sample code:

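#include <QDebug>
#include <QTextCodec>

QTextCodec::ConverterState state;
QTextCodec *codec = QTextCodec::codecForName("UTF-8");
const QString text = codec->toUnicode(byteArray.constData(), byteArray.size(), &state);
// invalidChars counts malformed sequences; remainingChars is non-zero when
// the input stops in the middle of a multi-byte sequence.
if (state.invalidChars > 0 || state.remainingChars > 0) {
    qDebug() << "Not a valid UTF-8 sequence.";
}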
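
To make concrete what counts as an "invalid sequence", here is a minimal Qt-free sketch of a well-formedness check that follows the byte ranges in RFC 3629. It is not Qt's implementation, and its verdict may differ from the codec's in edge cases. The names isWellFormedUtf8 and checkExamples are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Qt. Returns true if `bytes` is well-formed
// UTF-8: no stray continuation bytes, no overlong encodings, no surrogates
// (U+D800..U+DFFF), nothing above U+10FFFF, and no truncated tail.
bool isWellFormedUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead
        unsigned char lo = 0x80; // allowed range of the FIRST continuation byte
        unsigned char hi = 0xBF;

        if (lead <= 0x7F) {
            extra = 0;                                   // ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;                                   // 2-byte sequence
        } else if (lead == 0xE0) {
            extra = 2; lo = 0xA0;                        // rejects overlong forms
        } else if (lead >= 0xE1 && lead <= 0xEC) {
            extra = 2;
        } else if (lead == 0xED) {
            extra = 2; hi = 0x9F;                        // rejects surrogates
        } else if (lead == 0xEE || lead == 0xEF) {
            extra = 2;
        } else if (lead == 0xF0) {
            extra = 3; lo = 0x90;                        // rejects overlong forms
        } else if (lead >= 0xF1 && lead <= 0xF3) {
            extra = 3;
        } else if (lead == 0xF4) {
            extra = 3; hi = 0x8F;                        // rejects > U+10FFFF
        } else {
            return false;                                // 0x80..0xC1, 0xF5..0xFF
        }

        if (n - i - 1 < extra)
            return false;                                // truncated sequence

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            const unsigned char minByte = (k == 1) ? lo : 0x80;
            const unsigned char maxByte = (k == 1) ? hi : 0xBF;
            if (c < minByte || c > maxByte)
                return false;                            // bad continuation byte
        }
        i += extra + 1;
    }
    return true;
}

// A few illustrative inputs.
void checkExamples()
{
    assert(isWellFormedUtf8("hello"));
    assert(isWellFormedUtf8("\xC3\xA9"));       // U+00E9
    assert(!isWellFormedUtf8("\xC3"));          // truncated 2-byte sequence
    assert(!isWellFormedUtf8("\xC0\xAF"));      // overlong encoding of '/'
}
```

With a QByteArray, the input could be passed as std::string(byteArray.constData(), byteArray.size()). The truncated-tail case is the one that the ConverterState approach reports through remainingChars rather than invalidChars.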


趁早两清

#3 · 2019-04-18 23:16

The ConverterState approach, already described here by Frank Osterfeld, works even if the text has no BOM (byte order mark) (*).

(*) Unlike QTextCodec::codecForUtfText(), which needs a BOM in the text in order to detect that it is UTF-8.
