This question is a follow-up to another question about selectively appending lines from one file to another.
The regex that I'm using works fine for matching the lines to keep and the lines to discard. The problem is that the file was composed from a bunch of other files, and sometimes the line I want to keep started out as the first line of a UTF-8 encoded file. This means that the findstr command returns something like:
ï»¿LineToKeep that started out as the first line in its file
LineToKeep another
LineToKeep more lines
ï»¿LineToKeep that started out as the first line in its file
LineToKeep more
Apart from the BOM bytes, every line is guaranteed to begin with "LineToKeep". How can I get rid of those three UTF-8 BOM bytes, since these Windows shell commands can't handle them properly?
I'm hoping for a way to remove them in place, or perhaps a modification to the findstr command from that previous question.
Since I know each line must begin with either "LineToKeep" or the three BOM bytes followed by "LineToKeep", I figure there's a way to compute something like if (Line[3:13] == "LineToKeep") { Line = Line[3:]; } for every line.
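For reference, the per-line check described above can be sketched in Python. This is only an illustration of the idea, not a findstr or cmd solution: the names strip_bom and strip_boms_in_place are made up for this example, and it assumes the file is small enough to read into memory. It works on raw bytes so that line endings are left untouched.

```python
import codecs

BOM = codecs.BOM_UTF8      # the three bytes b"\xef\xbb\xbf"
PREFIX = b"LineToKeep"     # every kept line starts with this, per the question


def strip_bom(line: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, but only when the expected prefix follows it."""
    if line.startswith(BOM + PREFIX):
        return line[len(BOM):]
    return line


def strip_boms_in_place(path: str) -> None:
    """Rewrite the file at `path` with per-line BOMs removed (hypothetical helper)."""
    # Binary mode keeps CRLF/LF line endings exactly as they were.
    with open(path, "rb") as f:
        lines = f.readlines()
    with open(path, "wb") as f:
        f.writelines(strip_bom(line) for line in lines)
```

Calling strip_boms_in_place on the findstr output file would then leave lines without a BOM unchanged and trim only the lines where the BOM is immediately followed by "LineToKeep".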
Another alternative, from the Unix world, that removes the BOM from the file in place:
This requires downloading sed 4.4 for Windows from https://github.com/mbuilov/sed-windows, which offers working -z and -b options that prevent corruption of line endings.

I ended up calling PowerShell from Windows cmd: