How to grab an arbitrary chunk from a file on unix

Posted 2020-02-23 08:55

I'm trying to copy a chunk from one binary file into a new file. I have the byte offset and length of the chunk I want to grab.

I have tried using the dd utility, but this seems to read and discard the data up to the offset, rather than just seeking (I guess because dd is for copying/converting blocks of data). This makes it quite slow (and slower the higher the offset). This is the command I tried:

dd if=inputfile ibs=1 skip=$offset count=$datalength of=outputfile

I guess I could write a small perl/python/whatever script to open the file, seek to the offset, then read and write the required amount of data in chunks.

Is there a utility that supports something like this?

Tags: bash shell unix
6 Answers
地球回转人心会变
#2 · 2020-02-23 08:59

You can use tail -c+N to output everything from byte N onwards (i.e. skip the first N-1 bytes of the input), then pipe that into head -cM to keep only the first M bytes.

$ echo "hello world 1234567890" | tail -c+9 | head -c6
rld 12

So using your variables it would probably be something like this (note that tail's -c+N is 1-based, so a zero-based byte offset, which is what dd's skip= counts, needs +1):

tail -c+$((offset + 1)) inputfile | head -c$datalength > outputfile
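
A quick sanity check of the arithmetic, using a throwaway file (the file name and numbers here are just illustrative):

printf 'ABCDEFGHIJ' > sample.bin      # 10 bytes, zero-based offsets 0-9
offset=3; datalength=4
tail -c+$((offset + 1)) sample.bin | head -c$datalength    # prints DEFG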


Ah, didn't see it had to seek. Leaving this as CW.

[this account has been suspended]
#3 · 2020-02-23 09:07

Yes, it's awkward to do this with dd today. We're considering adding skip_bytes and count_bytes parameters to dd in coreutils to help. The following should work, though:

#!/bin/sh
# Copies $length bytes, starting at byte offset $skip of $infile, to stdout.

bs=100000
infile=$1
skip=$2
length=$3

(
  # All three dd invocations share the same stdin, so the file position
  # carries over from one to the next.
  dd bs=1 skip="$skip" count=0               # seek to the offset without copying
  dd bs="$bs" count=$(($length / $bs))       # copy the bulk in large blocks
  if [ $(($length % $bs)) -ne 0 ]; then
    dd bs=$(($length % $bs)) count=1         # copy the trailing partial block
  fi
) < "$infile"
乱世女痞
#4 · 2020-02-23 09:08

According to man dd on FreeBSD:

skip=n
        Skip n blocks from the beginning of the input before copying.  On
        input which supports seeks, an lseek(2) operation is used.
        Otherwise, input data is read and discarded.  For pipes, the
        correct number of bytes is read.  For all other devices, the
        correct number of blocks is read without distinguishing between a
        partial or complete block being read.

Using dtruss I verified that it does use lseek() on an input file on Mac OS X. If you just think that it is slow then I agree with the comment that this would be due to the 1-byte block size.
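
On Linux you can make the same check with strace instead of dtruss; a rough sketch, with an illustrative file name and offset:

strace -e trace=lseek dd if=inputfile bs=1 skip=1000000 count=0 of=/dev/null

On a seekable input you should see dd seek past the skipped bytes rather than reading them one at a time.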

Rolldiameter
#5 · 2020-02-23 09:11

You can try the hexdump command:

 hexdump  -v <File Path> -c -n <No of bytes to read> -s <Start Offset> | awk '{$1=""; print $0}' | sed 's/ //g'

Example: read 100 bytes from 'mycorefile' starting at offset 100.

# hexdump  -v -c  mycorefile -n 100 -s 100 | awk '{$1=""; print $0}' | sed 's/ //g'
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0001\0\0\0005\0\0\0\0020003\0
\0\0\0\0\0\0@\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0 003\0
\0\0\0\0\0020\0\0\0\0\0\0001\0\0\0
006\0\0\0\0020003\0\0\0\0\0\0220c\0
\0\0\0\0

Then, if you want the output on a single line, join the lines with another command, as sketched below.
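
For example, piping through tr strips the newlines without needing a separate script (same hypothetical mycorefile as above):

hexdump -v -c mycorefile -n 100 -s 100 | awk '{$1=""; print $0}' | sed 's/ //g' | tr -d '\n'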

If you simply want to see the contents:

# /usr/bin/hexdump  -v -C  mycorefile -n 100 -s 100
00000064  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000074  00 00 00 00 01 00 00 00  05 00 00 00 00 10 03 00  |................|
00000084  00 00 00 00 00 00 40 00  00 00 00 00 00 00 00 00  |......@.........|
00000094  00 00 00 00 00 00 00 00  00 00 00 00 00 a0 03 00  |................|
000000a4  00 00 00 00 00 10 00 00  00 00 00 00 01 00 00 00  |................|
000000b4  06 00 00 00 00 10 03 00  00 00 00 00 00 90 63 00  |..............c.|
000000c4  00 00 00 00                                       |....|
000000c8
#
Juvenile、少年°
#6 · 2020-02-23 09:21

Thanks for the other answers. Unfortunately, I'm not in a position to install additional software, so the ddrescue option is out. The head/tail solution is interesting (I didn't realise you could supply + to tail), but scanning through the data makes it quite slow.

I ended up writing a small python script to do what I wanted. The buffer size should probably be tuned to match some external buffer setting, but the value below performs well enough on my system.

#!/usr/local/bin/python

import sys

BUFFER_SIZE = 100000

# Read args
if len(sys.argv) < 4:
    print >> sys.stderr, "Usage: %s input_file start_pos length" % (sys.argv[0],)
    sys.exit(1)
input_filename = sys.argv[1]
start_pos = int(sys.argv[2])
length = int(sys.argv[3])

# Open file and seek to start pos
input = open(input_filename, 'rb')  # open in binary mode
input.seek(start_pos)

# Read and write data in chunks
while length > 0:
    # Read data
    buffer = input.read(min(BUFFER_SIZE, length))
    amount_read = len(buffer)

    # Check for EOF
    if not amount_read:
        print >> sys.stderr, "Reached EOF, exiting..."
        sys.exit(1)

    # Write data
    sys.stdout.write(buffer)
    length -= amount_read
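
Invocation looks something like this (the script name here is hypothetical; the extracted chunk goes to stdout):

python extract_chunk.py inputfile "$offset" "$datalength" > outputfile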
在下西门庆
#7 · 2020-02-23 09:25

You can use the

--input-position=POS

option of ddrescue.
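
A sketch of a full command, assuming a reasonably recent GNU ddrescue: --size caps how many bytes are copied, and --output-position=0 keeps the chunk from being written at the input offset inside the output file:

ddrescue --input-position=$offset --size=$datalength --output-position=0 inputfile outputfile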
