I want to do this:
findstr /s /c:some-symbol *
or the grep equivalent
grep -R some-symbol *
but I need the utility to autodetect files encoded in UTF-16 (and friends) and search them appropriately. My files even have the byte-order mark (FF FE) in them, so I'm not even looking for heroic autodetection.
Any suggestions?
I'm referring to Windows Vista and XP.
can be replaced with the following character encoding aware command:
Thanks for the suggestions. I was referring to Windows Vista and XP.
I also discovered this workaround, using the free Sysinternals strings.exe:
Strings.exe extracts all of the strings it finds (from binaries, but it works fine with text files too) and prepends each result with a filename and colon, so take that into account in the regexp (or use cut or another step in the pipeline). The -s makes it do a recursive extraction and -b just suppresses the banner message.
Ultimately I'm still kind of surprised that the flagship searching utilities GNU grep and findstr don't handle Unicode character encodings natively.
In later versions of Windows, UTF-16 is supported out of the box. If not, try changing the active code page with the chcp command.
In my case, findstr alone was failing for UTF-16 files, but it worked with type:
According to this blog article by Damon Cortesi, grep doesn't work with UTF-16 files, as you found out. However, it presents this workaround:
This is obviously for Unix; I'm not sure what the equivalent on Windows would be. The author of that article also provides a shell script to do the above, which you can find on GitHub here.
This only greps files that are UTF-16. You'd also grep your ASCII files the normal way.
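For illustration, here is a rough sketch of how both cases could be handled in one pass. This is not the script from the article: the function name grep_bom_aware is made up for this example, and it assumes GNU grep (for --label), an iconv that honors the BOM when given plain "UTF-16", and files that really do start with a BOM. It checks the first two bytes of each file, decodes UTF-16 files to UTF-8 before matching, and greps everything else directly.

```shell
# Hypothetical helper (name invented for this sketch): recursively search
# for a pattern, decoding files that start with a UTF-16 BOM.
# Usage: grep_bom_aware PATTERN [DIRECTORY]
grep_bom_aware() {
    pattern=$1
    root=${2:-.}
    find "$root" -type f | while IFS= read -r f; do
        # First two bytes as hex: "fffe" (little-endian) or "feff" (big-endian).
        bom=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
        case $bom in
            fffe|feff)
                # Plain "UTF-16" lets iconv pick the byte order from the BOM.
                # --label makes grep print the real file name instead of
                # "(standard input)". A non-match is not treated as an error.
                iconv -f UTF-16 -t UTF-8 "$f" \
                    | grep -H --label="$f" -- "$pattern" || true
                ;;
            *)
                # No BOM: assume an ASCII-compatible encoding and grep directly.
                grep -H -- "$pattern" "$f" || true
                ;;
        esac
    done
}
```

Calling grep_bom_aware some-symbol . from the top of the tree should print each match as filename:line, for both kinds of file. UTF-16 files without a BOM would fall through to the plain grep branch and be missed, so this only covers the case described in the question.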
A workaround is to convert your UTF-16 files to ASCII or ANSI.
Then you can use FINDSTR.
On Windows, you can also use find.exe.
The only problem is that this prints file names followed by matches. You can filter them by piping to findstr.