How to grab an arbitrary chunk from a file on unix

You can use tail -c+N to output everything from byte N onward (bytes are counted from 1, so this trims the leading N-1 bytes), then use head -cM to keep only the first M bytes of that.

$ echo "hello world 1234567890" | tail -c+9 | head -c6
rld 12

So using your variables, and assuming $offset is the zero-based number of bytes to skip, it would probably be:

tail -c+$((offset + 1)) inputfile | head -c$datalength > outputfile
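To double-check the offset arithmetic, here is a minimal Python sketch of which bytes that kind of pipeline selects. It assumes the usual tail behaviour where +N means "start at the N-th byte, counting from 1"; the function name tail_head is made up for illustration.

```python
def tail_head(data: bytes, n: int, m: int) -> bytes:
    """Emulate `tail -c +n | head -c m` on an in-memory byte string.

    `tail -c +n` starts its output at the n-th byte (1-based),
    so it drops n - 1 leading bytes, not n.
    """
    start = max(n - 1, 0)
    return data[start:start + m]


# echo appends a trailing newline, so it is included in the sample
sample = b"hello world 1234567890\n"
print(tail_head(sample, 9, 6))  # b'rld 12'
```

So to skip a zero-based offset of K bytes, the value passed to tail would be K + 1.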

Ah, I didn't see that it had to seek rather than read through the data. Leaving this as community wiki.

Answer 3:

Thanks for the other answers. Unfortunately, I'm not in a position to install additional software, so the ddrescue option is out. The head/tail solution is interesting (I didn't realise you could supply + to tail), but scanning through the data makes it quite slow.

I ended up writing a small python script to do what I wanted. The buffer size should probably be tuned to be the same as some external buffer setting, but using the value below is performant enough on my system.

#!/usr/bin/env python3

import sys

BUFFER_SIZE = 100000

# Read args
if len(sys.argv) < 4:
    print("Usage: %s input_file start_pos length" % (sys.argv[0],), file=sys.stderr)
    sys.exit(1)
input_filename = sys.argv[1]
start_pos = int(sys.argv[2])
length = int(sys.argv[3])

# Open file in binary mode and seek to start pos
with open(input_filename, "rb") as infile:
    infile.seek(start_pos)

    # Read and write data in chunks
    while length > 0:
        # Read at most BUFFER_SIZE bytes, never more than what is still needed
        chunk = infile.read(min(BUFFER_SIZE, length))
        amount_read = len(chunk)

        # Check for EOF
        if not amount_read:
            print("Reached EOF, exiting...", file=sys.stderr)
            sys.exit(1)

        # Write the raw bytes to stdout
        sys.stdout.buffer.write(chunk)
        length -= amount_read

Answer 4:

According to man dd on FreeBSD:

skip=n

Skip n blocks from the beginning of the input before copying. On input which supports seeks, an lseek(2) operation is used. Otherwise, input data is read and discarded. For pipes, the correct number of bytes is read. For all other devices, the correct number of blocks is read without distinguishing between a partial or complete block being read.

Using dtruss I verified that it does use lseek() on an input file on Mac OS X. If you just think that it is slow then I agree with the comment that this would be due to the 1-byte block size.
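To illustrate why the seek itself is cheap and the block size is what matters, here is a rough Python sketch of the behaviour the man page describes for a seekable input. It is not how dd is actually implemented (real dd also deals with partial reads, output blocks, conversions and so on); the function name dd_like and its parameters are hypothetical and simply mirror the bs, skip and count operands.

```python
import os


def dd_like(path: str, bs: int, skip: int, count: int) -> bytes:
    """Rough emulation of `dd if=path bs=bs skip=skip count=count` on a regular file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # A single lseek jumps past skip * bs bytes without reading them
        os.lseek(fd, skip * bs, os.SEEK_SET)
        blocks = []
        for _ in range(count):
            block = os.read(fd, bs)  # one read call per block
            if not block:
                break  # reached EOF
            blocks.append(block)
        return b"".join(blocks)
    finally:
        os.close(fd)
```

With bs=1, copying one megabyte means roughly a million read calls, one per byte, which would explain the slowness even though the skip is a single seek.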

Answer 5:

You can use the

--input-position=POS

option of ddrescue.

Answer 6:

You can try the hexdump command:

 hexdump  -v <File Path> -c -n <No of bytes to read> -s <Start Offset> | awk '{$1=""; print $0}' | sed 's/ //g'

Example: read 100 bytes from 'mycorefile', starting at offset 100.

# hexdump  -v -c  mycorefile -n 100 -s 100 | awk '{$1=""; print $0}' | sed 's/ //g'
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0001\0\0\0005\0\0\0\0020003\0
\0\0\0\0\0\0@\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0 003\0
\0\0\0\0\0020\0\0\0\0\0\0001\0\0\0
006\0\0\0\0020003\0\0\0\0\0\0220c\0
\0\0\0\0

Then, if you want, use another script to join all the lines of the output into a single line.
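That joining step could be as small as the following Python sketch. The function name join_lines is made up for illustration, and the sample text is just two short lines in the same style as the output above.

```python
def join_lines(text: str) -> str:
    """Concatenate all lines of `text`, dropping the line terminators."""
    return "".join(text.splitlines())


# Two short lines in the style of the filtered hexdump output
sample = r"\0\0\0\0" + "\n" + r"\0001\0" + "\n"
print(join_lines(sample))
```

In a pipeline, the same function could be fed with sys.stdin.read() instead of the sample string.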

If you simply want to view the contents:

# /usr/bin/hexdump  -v -C  mycorefile -n 100 -s 100
00000064  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000074  00 00 00 00 01 00 00 00  05 00 00 00 00 10 03 00  |................|
00000084  00 00 00 00 00 00 40 00  00 00 00 00 00 00 00 00  |......@.........|
00000094  00 00 00 00 00 00 00 00  00 00 00 00 00 a0 03 00  |................|
000000a4  00 00 00 00 00 10 00 00  00 00 00 00 01 00 00 00  |................|
000000b4  06 00 00 00 00 10 03 00  00 00 00 00 00 90 63 00  |..............c.|
000000c4  00 00 00 00                                       |....|
000000c8
#