In Qt, is there a way to check if a byte array is a valid UTF-8 sequence?
It seems that QString::fromUtf8() silently suppresses or replaces invalid sequences, without notifying the caller that there were any. This is from its documentation:
However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed.
Try QTextCodec::toUnicode() and pass it a ConverterState instance. ConverterState has members such as invalidChars. They are not documented via Doxygen, but I assume they are public API, since they are mentioned in the QTextCodec documentation.
The ConverterState approach, already suggested here by Frank Osterfeld, works even if the text has no BOM (byte order mark) (*).
(*) Unlike QTextCodec::codecForUtfText(), which needs a BOM in the text to recognize it as UTF-8.