I have a binary file and want to extract part of it, starting from a known byte string (e.g. FF D8 FF D0) and ending with a known byte string (AF FF D9).
In the past I've used `dd` to cut part of a binary file from the beginning or the end, but it doesn't seem to support what I need. What terminal tool can do this?
This should work with standard tools (`xxd`, `tr`, `grep`, `awk`, `dd`). It correctly handles the "pattern split across lines" issue, and it only matches the pattern at byte-aligned offsets (not at nibble offsets).
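A minimal sketch of this approach, assuming bash, the `xxd` that ships with vim, and GNU `grep` (for `-o` and `-b`); the function names `aligned_offset` and `extract_between` are hypothetical. The file is converted once into a temporary file holding a single lowercase hex line, so a pattern can never be split across lines. `grep -ob` reports offsets in hex characters, so only even offsets (byte-aligned) are kept and halved. Patterns are passed as lowercase hex without spaces. Since `grep -o` reports non-overlapping matches, a nibble-shifted hit could in theory hide an overlapping aligned one; that cannot happen with the two patterns from the question.

```bash
#!/usr/bin/env bash
# Sketch: byte-aligned pattern search with xxd/tr/grep/awk, extraction with dd.

# aligned_offset HEXFILE PATTERN MIN: print the first byte offset >= MIN where
# PATTERN (lowercase hex, no spaces) starts on a byte boundary; nothing if none.
aligned_offset() {
    grep -ob "$2" "$1" |
        # grep -b counts hex characters: keep even offsets only, convert to bytes.
        awk -F: -v min="$3" '$1 % 2 == 0 && $1 / 2 >= min { print $1 / 2; exit }'
}

# extract_between IN OUT START_HEX END_HEX: copy START..END (inclusive) to OUT.
# Returns 1 if either pattern is missing.
extract_between() {
    local in=$1 out=$2 start=$3 end=$4 hex s e
    hex=$(mktemp) || return 1
    # Convert only once, into a temporary file holding one long hex line.
    xxd -p "$in" | tr -d '\n' > "$hex"
    s=$(aligned_offset "$hex" "$start" 0)
    # Look for the end pattern only after the start pattern.
    [ -n "$s" ] && e=$(aligned_offset "$hex" "$end" $(( s + ${#start} / 2 )))
    rm -f "$hex"
    [ -n "$s" ] && [ -n "$e" ] || return 1
    # bs=1 is simple but slow on big files.
    dd if="$in" of="$out" bs=1 skip="$s" count=$(( e + ${#end} / 2 - s )) 2>/dev/null
}
```

Usage: `extract_between input.bin part.bin ffd8ffd0 afffd9` (file names are placeholders). On large files, GNU `dd` also accepts `iflag=skip_bytes,count_bytes`, which lets you keep byte offsets while using a bigger `bs`.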
Note: the script above uses a temporary file to avoid doing the binary-to-hex conversion twice. As a space/time trade-off, you can instead pipe the output of `xxd` directly into the two `grep` calls. A one-liner is also possible, at the expense of clarity.

One could also use `tee` and a named pipe to avoid both storing a temporary file and converting the input twice, but I'm not sure it would be faster (`xxd` is fast), and it is certainly more complex to write.

See this link for a way to do a binary grep. Once you have the start and end offsets, you should be able to get what you need with `dd`.

In a single pipe:
The idea is to use `awk` between two `xxd` calls to select the part of the file that is needed. Once the 1st pattern is found, `awk` prints the bytes until the 2nd pattern is found, then exits.

The case where the 1st pattern is found but the 2nd is not must also be handled. This is done in the `END` block of the `awk` script, which returns a non-zero exit status. That status is caught by bash's `${PIPESTATUS[1]}`, in which case I decided to delete the new file. Note that an empty file also means that nothing was found.
Locate the start/end position, then extract the range.
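One way to do the locating step without converting to hex, assuming GNU `grep` built with PCRE support (`-P`; BSD/macOS `grep` lacks it). Running it in the C locale makes `\xff` match a single raw byte, `-a` treats the binary as text, and `-ob` prints the byte offset of every match. The range is then cut with `tail -c`/`head -c` instead of `dd`. The names `first_offset`, `locate_extract`, `START_RE` and `END_RE` are hypothetical. Because `grep` still works line by line, a pattern that contains a newline byte (`0a`) would not be found this way.

```bash
#!/usr/bin/env bash
# Patterns from the question as PCRE byte escapes, plus their lengths in bytes.
START_RE='\xff\xd8\xff\xd0'; START_LEN=4
END_RE='\xaf\xff\xd9';       END_LEN=3

# first_offset FILE REGEX MIN: first byte offset >= MIN where REGEX matches.
first_offset() {
    LC_ALL=C grep -obaP "$2" "$1" |
        LC_ALL=C awk -F: -v min="$3" '$1 + 0 >= min + 0 { print $1; exit }'
}

# locate_extract IN OUT: copy START..END (inclusive) to OUT, 1 if not found.
locate_extract() {
    local in=$1 out=$2 s e
    s=$(first_offset "$in" "$START_RE" 0)
    [ -n "$s" ] || return 1
    # The end pattern must come after the start pattern.
    e=$(first_offset "$in" "$END_RE" $(( s + START_LEN )))
    [ -n "$e" ] || return 1
    # tail -c +N is 1-based, hence s + 1; the length includes the end pattern.
    tail -c +$(( s + 1 )) "$in" | head -c $(( e + END_LEN - s )) > "$out"
}
```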
Another solution, in `sed`, but using less memory:

The 1st `sed` prints from `ff d8 ff d0` till the end of the file. Note that you need as many `N` in `-e '1{N;N;N}'` as there are bytes in your 1st pattern, minus one.

The 2nd `sed` prints from the beginning of its input to `af ff d9`. Note again that you need as many `N` in `-e '1{N;N}'` as there are bytes in your 2nd pattern, minus one.

Again, a test is needed to check whether the 2nd pattern was found, and to delete the file if it was not.
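A sketch of such a two-`sed` pipeline, assuming bash and `xxd`; the function name `sed_extract` is hypothetical and this is not necessarily the answer's original code. Each `sed` keeps a window of as many lines (bytes) as its pattern, built with the `N` count described above and slid with `N;D`. This version uses the POSIX `q` (print, then quit) instead of GNU `Q`, so it checks afterwards that the output really ends with the 2nd pattern, which also covers the "not found" test.

```bash
#!/usr/bin/env bash
# sed_extract IN OUT: keeps ff d8 ff d0 ... af ff d9 (patterns hard-coded).
sed_extract() {
    xxd -p -c1 "$1" |
        # 1st sed: 4-byte window (3 N). When it equals the start pattern,
        # print it, then print every following line until end of input.
        sed -n -e '1{N;N;N;}' \
               -e '/^ff\nd8\nff\nd0$/{' -e ':a' -e 'p;n;ba' -e '}' \
               -e 'N;D' |
        # 2nd sed: 3-byte window (2 N). Print the oldest byte as the window
        # slides; on the end pattern, q prints the window and quits.
        sed -e '1{N;N;}' -e '/^af\nff\nd9$/q' -e 'P;N;D' |
        xxd -r -p > "$2"
    # If the end pattern was never found, OUT does not end with it.
    if [ "$(tail -c 3 "$2" | xxd -p)" != "afffd9" ]; then rm -f "$2"; return 1; fi
}
```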
Note that the `Q` command is a GNU extension to `sed`. If you do not have it, you need to discard the rest of the file once the pattern is found (in a loop like the 1st `sed`, but without printing), and check after the hex-to-binary conversion that `new_file` ends with the right pattern.

A variation on the `awk` solution, assuming that your binary file, once converted to hex with spaces, fits in memory:
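A minimal sketch of such a variation, assuming bash, `xxd`, and an `awk` without a small record-length limit (gawk, for instance); the function name `mem_extract` is hypothetical. `tr` joins all bytes into one space-separated record, and surrounding every byte with spaces means `index()` can only match on byte boundaries, never half-way through a byte.

```bash
#!/usr/bin/env bash
# mem_extract IN OUT: keeps "ff d8 ff d0" ... "af ff d9" (patterns hard-coded).
mem_extract() {
    local rc
    xxd -p -c1 "$1" | tr '\n' ' ' |
    awk -v p1="ff d8 ff d0" -v p2="af ff d9" '
        {
            s = " " $0                              # every byte is now " hh "
            i = index(s, " " p1 " ")                # spaces => byte-aligned only
            if (i == 0) exit
            after = substr(s, i + length(p1) + 1)   # text after p1, starts with " "
            j = index(after, " " p2 " ")
            if (j == 0) exit
            print p1 substr(after, 1, j + length(p2))
            found = 1
        }
        END { if (!found) exit 1 }
    ' | xxd -r -p > "$2"
    rc=${PIPESTATUS[2]}   # exit status of awk, the 3rd command of the pipe
    if [ "$rc" -ne 0 ]; then rm -f "$2"; return 1; fi
}
```

The whole hex text, about three times the size of the input, is held in memory at once, which is exactly the assumption stated above.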