Elegant way to search for UTF-8 files with BOM?

For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:

#!/bin/bash
find . -type f |
while IFS= read -r file
do
    # Compare the first three bytes with the UTF-8 BOM (EF BB BF)
    if [ "$(head -c 3 -- "$file")" = $'\xef\xbb\xbf' ]
    then
        echo "found BOM in: $file"
    fi
done

Or, if you prefer short, unreadable one-liners:

find . -type f|while IFS= read -r file;do [ "$(head -c3 -- "$file")" = $'\xef\xbb\xbf' ] && echo "found BOM in: $file";done

It doesn't work with filenames that contain a line break, but such files are not to be expected anyway.
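The line-break limitation can be lifted by having find emit NUL-terminated names and reading them with read -d ''. A minimal sketch, assuming bash and a find that supports -print0 (GNU and BSD find both do); the function name find_bom_files is only an illustrative choice:

```shell
#!/usr/bin/env bash
# Sketch: list regular files under a directory (default: .) whose first
# three bytes are the UTF-8 BOM (EF BB BF). Names are NUL-delimited, so
# filenames containing newlines are handled. find_bom_files is a
# hypothetical name chosen for this example.
find_bom_files() {
    local bom=$'\xef\xbb\xbf' file
    find "${1:-.}" -type f -print0 |
    while IFS= read -r -d '' file; do
        # head -c 3 reads at most the first three bytes of the file
        if [ "$(head -c 3 -- "$file")" = "$bom" ]; then
            printf 'found BOM in: %s\n' "$file"
        fi
    done
}
```

Usage: find_bom_files /path/to/project (with no argument it searches the current directory).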

Is there any shorter or more elegant solution?

Alternatively, are there any text editors, or macros for text editors, that can do this?

Tags: php utf-8 shell text-editor

11 answers

可以哭但决不认输i

Reply #2 · 2020-01-23 05:55

The best and easiest way to do this on Windows:

Total Commander → go to project's root dir → find files (Alt + F7) → file types *.* → Find text "EF BB BF" → check 'Hex' checkbox → search

And you get the list :)

爷的心禁止访问

Reply #3 · 2020-01-23 05:56

If you are looking for UTF files, the file command works: it reports the detected encoding of each file. Any UTF-8 file containing non-ASCII characters will show up as UTF-8, whether or not it has a BOM; files that do start with a BOM are additionally described as "(with BOM)".

file *.php | grep UTF

That won't work recursively, though. You could probably rig up a command to make it recursive, but I just searched each level individually, like this, until I ran out of levels:

file */*.php | grep UTF
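To make the file approach recursive, find can hand the matching files to file in batches, and grepping for "(with BOM)" instead of "UTF" keeps only files that actually start with a BOM. A sketch, assuming libmagic's file, whose description of such files contains "(with BOM)" (the surrounding wording differs between versions, e.g. "UTF-8 Unicode (with BOM) text" vs. "Unicode text, UTF-8 (with BOM) text"); find_php_bom_files is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: recursively list .php files that file(1) describes as having a
# BOM. Assumes libmagic's file, whose description of such files contains
# "(with BOM)". find_php_bom_files is a hypothetical name.
find_php_bom_files() {
    # "-exec ... {} +" passes many files to a single file invocation;
    # "--" is defensive, so names starting with "-" are not read as options.
    find "${1:-.}" -type f -name '*.php' -exec file -- {} + |
        grep -F '(with BOM)'
}
```

Each output line has the form "path: description", so the file name is everything before the first ": ".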

时光不老，我们不散

Reply #4 · 2020-01-23 06:01

I used this to strip the BOM from JavaScript files only (requires GNU sed):

# "1s" limits the substitution to the first line of each file
find . -type f -iname '*.js' -exec sed -i '1s/^\xEF\xBB\xBF//' {} +
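The sed command above depends on sed understanding \xEF-style escapes, which GNU sed does but BSD/macOS sed does not. Where that is a problem, a sketch using head and tail (the -c option is available in both the GNU and BSD versions; the $'...' quoting needs bash) strips the first three bytes only from files that really start with a BOM. It writes the result back through the original file, so its permissions are kept; strip_bom is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: strip a leading UTF-8 BOM from each file given as an argument,
# without relying on GNU sed. Only files whose first three bytes are
# EF BB BF are rewritten. strip_bom is a hypothetical helper name.
strip_bom() {
    local file tmp
    for file in "$@"; do
        [ "$(head -c 3 -- "$file")" = $'\xef\xbb\xbf' ] || continue
        tmp=$(mktemp) || return 1
        # tail -c +4 copies everything from the fourth byte onward;
        # writing back with cat keeps the original file's permissions.
        tail -c +4 -- "$file" > "$tmp" && cat -- "$tmp" > "$file"
        rm -f -- "$tmp"
    done
}
```

Usage: strip_bom *.js, or recursively: find . -type f -iname '*.js' -print0 | while IFS= read -r -d '' f; do strip_bom "$f"; done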

淡お忘

Reply #5 · 2020-01-23 06:03

If you can accept some false positives (non-text files, or the unlikely case of a ZWNBSP, U+FEFF, in the middle of a file), you can use grep:

grep -rlF $'\xef\xbb\xbf' .
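The false positives can be filtered out afterwards by checking that the three bytes really sit at the start of each file grep reports. A sketch, assuming bash and a grep that supports --null (GNU grep does); find_bom_files_grep is an illustrative name:

```shell
#!/usr/bin/env bash
# Sketch: use grep's fast recursive search, then keep only the files whose
# first three bytes are the BOM, discarding hits where EF BB BF appears
# later in the file. Assumes grep --null (NUL after each file name).
# find_bom_files_grep is a hypothetical name.
find_bom_files_grep() {
    local bom=$'\xef\xbb\xbf' file
    grep -rlF --null -- "$bom" "${1:-.}" |
    while IFS= read -r -d '' file; do
        if [ "$(head -c 3 -- "$file")" = "$bom" ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

grep still reads every file in full, so this mainly improves accuracy rather than speed.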

骚年 ilove

Reply #6 · 2020-01-23 06:07

find . -type f -print0 | xargs -0r awk '
    /^\xEF\xBB\xBF/ {print FILENAME}
    {nextfile}'

Most of the solutions above test more than the first line of each file, even if some (such as Marcus's solution) filter the results afterwards. This solution tests only the first line of each file, so it should be somewhat faster.
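A variant of the same first-line-only awk idea, sketched under a few assumptions: find's -exec ... {} + batches the files without needing GNU xargs -r, LC_ALL=C makes awk match raw bytes rather than multibyte characters, and the BOM is written with octal escapes (\357\273\277 is EF BB BF), which POSIX awk defines, unlike \x. It assumes an awk with nextfile (gawk, mawk and the BSD/macOS awk have it); find_bom_files_awk is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: the first-line-only awk check, run via find -exec instead of
# xargs. LC_ALL=C makes awk compare bytes, and \357\273\277 is the octal
# spelling of EF BB BF. find_bom_files_awk is a hypothetical name.
find_bom_files_awk() {
    find "${1:-.}" -type f -exec env LC_ALL=C awk '
        # Print the name of a file whose first line starts with the BOM...
        /^\357\273\277/ { print FILENAME }
        # ...and skip the rest of that file either way.
        { nextfile }' {} +
}
```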

Juvenile、少年°

Reply #7 · 2020-01-23 06:07

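You can use grep to find them and Perl to strip them out. Reading each file whole (-0777) and anchoring with \A removes only a BOM at the very start of a file, and --null / -0 keep filenames with spaces intact:

grep -rlF --null $'\xEF\xBB\xBF' . | xargs -0 perl -0777 -i -pe 's/\A\xEF\xBB\xBF//'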
