Question:
I'm trying to copy a chunk from one binary file into a new file. I have the byte offset and length of the chunk I want to grab.
I have tried using the dd utility, but this seems to read and discard the data up to the offset rather than just seeking (I guess because dd is for copying/converting blocks of data). This makes it quite slow, and slower the higher the offset. This is the command I tried:
dd if=inputfile ibs=1 skip=$offset count=$datalength of=outputfile
I guess I could write a small perl/python/whatever script to open the file, seek to the offset, then read and write the required amount of data in chunks.
Is there a utility that supports something like this?
Answer 1:
Yes, it's awkward to do this with dd today. We're considering adding skip_bytes and count_bytes parameters to dd in coreutils to help. The following should work, though:
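#!/bin/sh
# Usage: script infile skip length  (the chunk is written to stdout)
bs=100000
infile=$1
skip=$2
length=$3
(
  # Seek to the byte offset without copying anything
  dd bs=1 skip="$skip" count=0
  # Copy the full blocks
  dd bs="$bs" count=$((length / bs))
  # Copy the remaining partial block, if any (dd rejects bs=0)
  if [ $((length % bs)) -gt 0 ]; then
    dd bs=$((length % bs)) count=1
  fi
) < "$infile"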
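The arithmetic behind the three dd calls can be checked in isolation. The Python sketch below is only an illustration: the function name split_length is hypothetical and not part of coreutils. It splits a byte length into the number of full blocks (the count= of the second dd) and the leftover bytes (the bs= of the third dd).

```python
def split_length(length, block_size=100000):
    """Return (full_blocks, remainder) for copying `length` bytes."""
    if length < 0 or block_size <= 0:
        raise ValueError("length must be >= 0 and block_size > 0")
    # For non-negative integers, divmod gives the same values as the
    # shell expressions $((length / bs)) and $((length % bs)).
    return divmod(length, block_size)


if __name__ == "__main__":
    print(split_length(250000))   # (2, 50000)
    print(split_length(200000))   # (2, 0)
```

A remainder of 0 means the length is an exact multiple of the block size, so no final partial-block copy is needed.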
Answer 2:
You can use tail -c +N to output the input starting at byte N (counting from 1), which trims the leading N-1 bytes. Then you can use head -c M to output only the first M bytes of what remains.
$ echo "hello world 1234567890" | tail -c+9 | head -c6
rld 12
So using your variables, and assuming $offset is a zero-based byte offset (as with dd's skip), it would probably be:
tail -c +$((offset + 1)) inputfile | head -c $datalength > outputfile
Ah, didn't see it had to seek. Leaving this as CW.
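The indexing used by tail -c +N in this answer is easy to get wrong by one, so here is a small Python sketch that emulates the pipeline on an in-memory byte string. The helper name tail_head is made up for illustration, and it assumes N is at least 1.

```python
def tail_head(data, n, m):
    """Emulate `tail -c +n | head -c m` on a bytes object."""
    if n < 1 or m < 0:
        raise ValueError("n must be >= 1 and m must be >= 0")
    # tail -c +n starts output at the n-th byte, counting from 1,
    # so the 0-based slice start is n - 1.
    start = n - 1
    return data[start:start + m]


if __name__ == "__main__":
    sample = b"hello world 1234567890\n"   # echo appends a newline
    print(tail_head(sample, 9, 6))          # b'rld 12'
```

Note that this only models which bytes come out. Unlike a seek, the real pipeline still reads through the skipped data.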
Answer 3:
Thanks for the other answers. Unfortunately, I'm not in a position to install additional software, so the ddrescue option is out. The head/tail solution is interesting (I didn't realise you could supply + to tail), but scanning through the data makes it quite slow.
I ended up writing a small python script to do what I wanted. The buffer size should probably be tuned to be the same as some external buffer setting, but using the value below is performant enough on my system.
#!/usr/bin/env python3
import sys

BUFFER_SIZE = 100000

# Read args
if len(sys.argv) < 4:
    print("Usage: %s input_file start_pos length" % (sys.argv[0],), file=sys.stderr)
    sys.exit(1)
input_filename = sys.argv[1]
start_pos = int(sys.argv[2])
length = int(sys.argv[3])

# Open file in binary mode and seek to start pos
with open(input_filename, "rb") as infile:
    infile.seek(start_pos)
    # Read and write data in chunks
    while length > 0:
        # Read data
        chunk = infile.read(min(BUFFER_SIZE, length))
        amount_read = len(chunk)
        # Check for EOF
        if not amount_read:
            print("Reached EOF, exiting...", file=sys.stderr)
            sys.exit(1)
        # Write the raw bytes to stdout
        sys.stdout.buffer.write(chunk)
        length -= amount_read
Answer 4:
According to man dd on FreeBSD:
skip=n
Skip n blocks from the beginning of the input before copying.
On input which supports seeks, an lseek(2) operation is used.
Otherwise, input data is read and discarded. For pipes, the
correct number of bytes is read. For all other devices, the
correct number of blocks is read without distinguishing between
a partial or complete block being read.
Using dtruss, I verified that it does use lseek() on an input file on Mac OS X.
If you just think that it is slow then I agree with the comment that this would be due to the 1-byte block size.
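Since dd counts skip= and count= in blocks, one way to avoid the 1-byte block size is to pick a block size that divides both the offset and the length. The Python sketch below is an illustration and not something proposed in the answers here: dd_block_args is a hypothetical helper that uses the greatest common divisor as the block size. It only helps when the offset and length share a large common factor. For an arbitrary offset the result may well be 1, and nothing is gained.

```python
import math


def dd_block_args(offset, length):
    """Return (bs, skip, count) with bs*skip == offset and bs*count == length."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # gcd(0, length) == length, so an offset of 0 yields a single block.
    bs = math.gcd(offset, length)
    return bs, offset // bs, length // bs


if __name__ == "__main__":
    bs, skip, count = dd_block_args(1048576, 4194304)
    # The file names below are the placeholders used in the question.
    print("dd if=inputfile bs=%d skip=%d count=%d of=outputfile" % (bs, skip, count))
```

dd buffers one block at a time, so if the common divisor turns out to be very large, a smaller divisor of it may be a better choice for bs.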
Answer 5:
You can use the --input-position=POS option of ddrescue.
Answer 6:
You can try the hexdump command:
hexdump -v <File Path> -c -n <No of bytes to read> -s <Start Offset> | awk '{$1=""; print $0}' | sed 's/ //g'
Example: read 100 bytes from 'mycorefile' starting at offset 100.
# hexdump -v -c mycorefile -n 100 -s 100 | awk '{$1=""; print $0}' | sed 's/ //g'
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0001\0\0\0005\0\0\0\0020003\0
\0\0\0\0\0\0@\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0 003\0
\0\0\0\0\0020\0\0\0\0\0\0001\0\0\0
006\0\0\0\0020003\0\0\0\0\0\0220c\0
\0\0\0\0
Then, if you want, use another script to join all the lines of the output into a single line.
If you simply want to see the contents:
# /usr/bin/hexdump -v -C mycorefile -n 100 -s 100
00000064 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000074 00 00 00 00 01 00 00 00 05 00 00 00 00 10 03 00 |................|
00000084 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 |......@.........|
00000094 00 00 00 00 00 00 00 00 00 00 00 00 00 a0 03 00 |................|
000000a4 00 00 00 00 00 10 00 00 00 00 00 00 01 00 00 00 |................|
000000b4 06 00 00 00 00 10 03 00 00 00 00 00 00 90 63 00 |..............c.|
000000c4 00 00 00 00 |....|
000000c8
#
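For viewing a chunk without hexdump, the same seek-then-read idea can be sketched in Python. This is a rough illustration only: hex_lines is a hypothetical helper, and its output is laid out roughly in the style of the listing above rather than being guaranteed identical to hexdump -C.

```python
import sys


def hex_lines(path, offset, length, width=16):
    """Yield hex-view lines for `length` bytes of `path` starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)           # jump straight to the offset
        data = f.read(length)    # may be shorter than `length` near EOF
    for i in range(0, len(data), width):
        row = data[i:i + width]
        hex_part = " ".join("%02x" % b for b in row)
        # Printable ASCII is shown as-is, everything else as '.'
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        # Pad the hex column so the text column lines up on short rows
        yield "%08x  %-*s  |%s|" % (offset + i, width * 3 - 1, hex_part, text)


if __name__ == "__main__" and len(sys.argv) == 4:
    for line in hex_lines(sys.argv[1], int(sys.argv[2]), int(sys.argv[3])):
        print(line)
```

Run as a script, it takes the file path, start offset, and byte count as its three arguments.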