可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a 2GB file in raw format. I want to search for all appearance of a specific HEX value "355A3C2F74696D653E" AND collect the following 28 characters.

Example: 355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135

In this case I want the output: "323031312D30342D32365431343A34373A30322D31343A34373A3135" or better: 2011-04-26T14:47:02-14:47:15

I have tried with

xxd -u InputFile | grep '355A3C2F74696D653E' | cut -c 1-28 > OutputFile.txt

and

xxd -u -ps -c 4000000 InputFile | grep '355A3C2F74696D653E' | cut -b 1-28 > OutputFile.txt

But I can't get it working.

Can anybody give me a hint?

回答1:

If your grep supports -P parameter then you could simply use the below command.

$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{28}'
323031312D30342D32365431343A

For 56 chars,

$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{56}'
323031312D30342D32365431343A34373A30322D31343A34373A3135

回答2:

As you are using xxd it seems to me that you want to search the file as if it were binary data. I'd recommend using a more powerful programming language for this; the Unix shell tools assume there are line endings and that the text is mostly 7-bit ASCII. Consider using Python:

#!/usr/bin/python
import mmap
fd = open("file_to_search", "rb")
needle = "\x35\x5A\x3C\x2F\x74\x69\x6D\x65\x3E"
haystack = mmap.mmap(fd.fileno(), length = 0, access = mmap.ACCESS_READ)
i = haystack.find(needle)
while i >= 0:
    i += len(needle)
    print (haystack[i : i + 28])
    i = haystack.find(needle, i)

回答3:

Why convert to hex first? See if this awk script works for you. It looks for the string you want to match on, then prints the next 28 characters. Special characters are escaped with a backslash in the pattern.

Adapted from this post: Grep characters before and after match?

I added some blank lines for readability.

VirtualBox:~$ cat data.dat

Thisis a test of somerandom characters before thestringI want5Z</time>2011-04-26T14:47:02-14:47:15plus somemoredata

VirtualBox:~$ cat test.sh

awk '/5Z\<\/time\>/ {
  match($0, /5Z\<\/time\>/); print substr($0, RSTART + 9, 28);
}' data.dat

VirtualBox:~$ ./test.sh

2011-04-26T14:47:02-14:47:15

VirtualBox:~$

EDIT: I just realized something. The regular expression will need to be tweaked to be non-greedy, etc and between that and awk need to be tweaked to handle multiple occurrences as you need them. Perhaps some of the folks more up on awk can chime in with improvements as I am real rusty. An approach to consider anyway.