Unknown characters

Posted 2020-04-08 01:40

Question:

I read a string from a file with the encoding "UTF-8" and need to match it against a regular expression. The first character of the file is #, but in the string the first character is '' (an invisible, empty-looking symbol). I converted it to bytes with the charset "UTF-8" and got [-17, -69, -65]. Does anyone know what this is and how to handle it with regular expressions?
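To see what those three bytes are, here is a minimal Java sketch (the class and method names are made up for illustration, and `#` stands in for whatever expression the real code uses). Decoding [-17, -69, -65] as UTF-8 gives a single invisible character, U+FEFF. A regex can then either tolerate it with an optional `\uFEFF?` prefix or strip it with `replaceFirst`.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class BomDemo {
    // Optional U+FEFF at the very start, followed by '#'.
    // "#" is a placeholder for the real expression.
    private static final Pattern HASH_START = Pattern.compile("^\\uFEFF?#");

    // Formats the first char of the string as "U+XXXX".
    static String firstCharCode(String s) {
        return String.format("U+%04X", (int) s.charAt(0));
    }

    // True if the line starts with '#', whether or not a BOM precedes it.
    static boolean startsWithHash(String line) {
        return HASH_START.matcher(line).lookingAt();
    }

    // Removes one leading U+FEFF; other strings are returned unchanged.
    static String stripBom(String s) {
        return s.replaceFirst("^\\uFEFF", "");
    }

    public static void main(String[] args) {
        // The three signed bytes from the question (0xEF 0xBB 0xBF).
        byte[] bytes = {-17, -69, -65};
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.length());          // 1
        System.out.println(firstCharCode(decoded));    // U+FEFF

        String line = "\uFEFF#comment";                // as read from a BOM-prefixed file
        System.out.println(startsWithHash(line));      // true
        System.out.println(stripBom(line));            // #comment
    }
}
```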

Answer 1:

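Some editors (like Notepad) add a BOM (byte order mark) signature when saving UTF-8 text. The bytes 0xEF, 0xBB, 0xBF are the UTF-8 encoding of U+FEFF, the BOM character; the [-17, -69, -65] you see are the same bytes printed as signed Java bytes. Java's UTF-8 decoder does not remove the BOM, so it ends up as an invisible first character of the string. You should check for the bytes 0xEF, 0xBB, 0xBF before reading the string from such a file and skip them if they exist.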
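A minimal sketch of that check in Java (requires Java 9+ for `readNBytes`; the class and method names are hypothetical). It peeks at the first three bytes through a `PushbackInputStream`, drops them only if they are exactly 0xEF 0xBB 0xBF, and otherwise pushes them back before decoding the rest as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomSkipper {
    // The UTF-8 BOM as unsigned byte values.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    // Returns a UTF-8 Reader over raw, with a leading BOM (if any) skipped.
    static Reader openUtf8SkippingBom(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = in.readNBytes(head, 0, head.length);
        boolean hasBom = n == UTF8_BOM.length;
        for (int i = 0; hasBom && i < n; i++) {
            hasBom = (head[i] & 0xFF) == UTF8_BOM[i]; // compare as unsigned
        }
        if (!hasBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    // Reads the whole stream into a String, with the BOM removed.
    static String readAll(InputStream raw) throws IOException {
        try (Reader r = openUtf8SkippingBom(raw)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int k;
            while ((k = r.read(buf)) != -1) {
                sb.append(buf, 0, k);
            }
            return sb.toString();
        }
    }

    // In-memory convenience wrapper around readAll.
    static String decodeSkippingBom(byte[] bytes) {
        try {
            return readAll(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '#', 'x'};
        byte[] withoutBom = {'#', 'x'};
        System.out.println(decodeSkippingBom(withBom));    // #x
        System.out.println(decodeSkippingBom(withoutBom)); // #x
    }
}
```

For a real file you could pass `Files.newInputStream(path)` to `openUtf8SkippingBom`; the resulting string then starts with `#`, so the original regex can stay unchanged.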

Another way is not to use Notepad for editing UTF-8 text at all: use another program, such as Notepad++ or Kate, in which you can control whether a BOM is added.