I have the following example file, input.txt:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.
Now I can easily grep for a word and get its byte offset:
$ grep -ob incididunt /dev/null input.txt
input.txt:80:incididunt
Sadly, the information about the line contents and about the searched word gets lost: I only know the filename and the byte offset 80. I want to print the whole line of the file that contains that byte offset.
Ideally, I would like a script.sh that takes two parameters, a file name and a byte offset, and outputs the line containing that offset:
$ ./script.sh input.txt 80
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
More examples:
For the file=input.txt and the byte offset=130 the output should be:
enim ad minim veniam, quis nostrud exercitation ullamco laboris
For the file=input.txt and any byte offset between 195 and 253 the output should be:
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
For the file=input.txt and the byte offset=400 the output should be:
sunt in culpa qui officia deserunt mollit anim id est laborum.
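For illustration, the interface described above could be sketched with awk by accumulating line lengths until the running total passes the offset. This is only a sketch under assumptions: the function name line_at_offset_awk is hypothetical, the offset is taken to be 0-based (as grep -b reports it), lines are assumed to end in a single \n, and LC_ALL=C is set so that length() counts bytes rather than characters. Whether it reproduces the exact offsets listed above depends on the exact bytes in input.txt (for example, trailing whitespace).

```shell
# Hypothetical helper: print the line of file $1 that contains 0-based byte offset $2.
line_at_offset_awk() {
    # LC_ALL=C makes length() count bytes instead of multibyte characters.
    LC_ALL=C awk -v off="$2" '
        # "end" is the offset just past the newline of the current line.
        { end += length($0) + 1 }
        # The first line whose end lies beyond the offset contains it.
        off + 0 < end { print; exit }
    ' "$1"
}
```

Wrapped in a script.sh, the body would simply be line_at_offset_awk "$1" "$2". An offset past the end of the file prints nothing.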
I have tried:
I can print from the byte offset up to the end of the line with GNU sed, but that misses the eiusmod tempor part. I can't think of a way to go "back" in the file to fetch the part from the preceding newline up to that byte offset.
$ sed -z 's/.\{80\}\([^\n]*\).*/\1\n/' input.txt
incididunt ut labore et dolore magna aliqua. Ut
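One possible way around going "back" is to turn the byte offset into a line number first: count the newlines in the bytes before the offset, then print the line after that many newlines. The sketch below assumes a head that supports -c (GNU and BSD do, although it is not required by POSIX), a 0-based offset, and a hypothetical function name line_at_offset.

```shell
# Hypothetical helper: print the line of file $1 that contains 0-based byte offset $2.
line_at_offset() {
    local file=$1 offset=$2 n
    # Number of newlines strictly before the offset = number of complete lines before it.
    n=$(head -c "$offset" "$file" | wc -l)
    # The offset therefore falls on line n+1; print it and stop reading.
    sed -n "$((n + 1)){p;q;}" "$file"
}
```

For example, line_at_offset input.txt 80 would print the line holding byte 80. The file is read twice, which seems acceptable for small inputs.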
I can read character by character, remember the position of the last newline, and print from the last newline up to the next one. That will not work with the shell's read, as it omits newlines. I think I can get it to work using dd, but surely there must be a simpler solution.
set -- input.txt 80
exec 10<"$1"
pos=0
lastnewlinepos=0
for ((i=0; i<$2; ++i)); do
    IFS= read -r -u 10 -N 1 c
    pos=$((pos+1))
    # this will not work..., read omits newlines
    if [ "$c" = $'\n' ]; then
        # position just past the newline, i.e. where the current line starts
        lastnewlinepos="$pos"
    fi
done
# as I know the last newline before the offset, it's ok to use this now
sed -z 's/.\{'"$lastnewlinepos"'\}\([^\n]*\).*/\1\n/' "$1"
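The bookkeeping in the loop above (tracking where the current line starts) can also be done one line at a time instead of one byte at a time, which avoids the question of how read treats newlines. The following is a minimal pure-bash sketch, assuming a 0-based offset, \n line endings, and no NUL bytes in the file. The function name line_at_offset_bash is hypothetical.

```shell
# Hypothetical helper: print the line of file $1 that contains 0-based byte offset $2.
line_at_offset_bash() {
    local LC_ALL=C              # make ${#line} count bytes, not characters
    local file=$1 offset=$2 line end=0
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # end = offset just past this line's newline
        end=$((end + ${#line} + 1))
        if (( offset < end )); then
            printf '%s\n' "$line"
            return 0
        fi
    done < "$file"
    return 1                    # offset lies beyond the end of the file
}
```

This reads the file only up to the matching line, but a shell loop is likely to be slow on large files compared with awk or sed.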
How can I print the whole line that "contains" a given byte offset inside a file, using bash and *nix-specific tools?