可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:
find -type f |
while read file
do
if [ "`head -c 3 -- "$file"`" == $'\xef\xbb\xbf' ]
then
echo "found BOM in: $file"
fi
done
Or, if you prefer short, unreadable one-liners:
find -type f|while read file;do [ "`head -c3 -- "$file"`" == $'\xef\xbb\xbf' ] && echo "found BOM in: $file";done
It doesn't work with filenames that contain a line break,
but such files are not to be expected anyway.
Is there any shorter or more elegant solution?
Are there any interesting text editors or macros for text editors?
回答1:
What about this one simple command which not just finds but clears the nasty BOM? :)
find . -type f -exec sed '1s/^\xEF\xBB\xBF//' -i {} \;
I love "find" :)
Warning The above will modify binary files which contain those three characters.
If you want just to show BOM files, use this one:
grep -rl $'\xEF\xBB\xBF' .
回答2:
The best and easiest way to do this on Windows:
Total Commander → go to project's root dir → find files (Alt + F7) → file types *.* → Find text "EF BB BF" → check 'Hex' checkbox → search
And you get the list :)
回答3:
find . -type f -print0 | xargs -0r awk '
/^\xEF\xBB\xBF/ {print FILENAME}
{nextfile}'
Most of the solutions given above test more than the first line of the file, even if some (such as Marcus's solution) then filter the results. This solution only tests the first line of each file so it should be a bit quicker.
回答4:
If you accept some false positives (in case there are non-text files, or in the unlikely case there is a ZWNBSP in the middle of a file), you can use grep:
fgrep -rl `echo -ne '\xef\xbb\xbf'` .
回答5:
I would use something like:
grep -orHbm1 "^`echo -ne '\xef\xbb\xbf'`" . | sed '/:0:/!d;s/:0:.*//'
Which will ensure that the BOM occurs starting at the first byte of the file.
回答6:
You can use grep
to find them and Perl to strip them out like so:
grep -rl $'\xEF\xBB\xBF' . | xargs perl -i -pe 's{\xEF\xBB\xBF}{}'
回答7:
For a Windows user, see this (good PHP script for finding the BOM
in your project).
回答8:
An overkill solution to this is phptags
(not the vi
tool with the same name), which specifically looks for PHP scripts:
phptags --warn ./
Will output something like:
./invalid.php: TRAILING whitespace ("?>\n")
./invalid.php: UTF-8 BOM alone ("\xEF\xBB\xBF")
And the --whitespace
mode will automatically fix such issues (recursively, but asserts that it only rewrites .php scripts.)
回答9:
find -type f -print0 | xargs -0 grep -l `printf '^\xef\xbb\xbf'` | sed 's/^/found BOM in: /'
find -print0
puts a null \0 between each file name instead of using new lines
xargs -0
expects null separated arguments instead of line separated
grep -l
lists the files which match the regex
- The regex
^\xeff\xbb\xbf
isn't entirely correct, as it will match non-BOMed UTF-8 files if they have zero width spaces at the start of a line
回答10:
I used this to correct only JavaScript files:
find . -iname *.js -type f -exec sed 's/^\xEF\xBB\xBF//' -i.bak {} \; -exec rm {}.bak \;
回答11:
If you are looking for UTF files, the file command works. It will tell you what the encoding of the file is. If there are any non ASCII characters in there it will come up with UTF.
file *.php | grep UTF
That won't work recursively though. You can probably rig up some fancy command to make it recursive, but I just searched each level individually like the following, until I ran out of levels.
file */*.php | grep UTF