I'd like to know if there is any way to check if there is a string inside a pdf
file using a shell script? I was looking for something like:
if [search(string,pdf_file)] > 0 then
echo "exist"
fi
Each letter within a PDF doc is typically set individually. Therefore, you have to convert the .pdf to text, which will reduce the text to a simple stream.
I would try this:
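A minimal sketch of such a pipeline, assuming `pdftotext` (from poppler-utils) is installed. The function name `pdf_contains`, the file name, and the phrase in the usage comment are placeholders for illustration only.

```shell
# pdf_contains PATTERN FILE
# Exit status 0 if PATTERN matches the text extracted from FILE, 1 otherwise.
pdf_contains() {
    # "-" makes pdftotext write the text to stdout instead of a file.
    # tr replaces newlines with spaces so a phrase that was broken
    # across lines can still match.
    # grep -q prints nothing and only sets the exit status.
    pdftotext "$2" - 2>/dev/null | tr '\n' ' ' | grep -q -- "$1"
}

# Usage (hypothetical file and phrase):
#   if pdf_contains 'annual \+report' report.pdf; then
#       echo "exist"
#   fi
```

Note that `\+` in a basic regular expression is a GNU grep extension; with other grep implementations, `grep -E 'annual +report'` may be the safer spelling.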
`tr` joins line breaks. `\+` allows for one or more space characters between words. Finally, `grep -q` only returns an exit status of 0 or 1 based on a match; it does not print matching lines.

As Simon nicely pointed out, you can simply convert the PDF to plain text using `pdftotext` and then search for what you're looking for. After conversion, you may use `grep`, a bash regex, or any variation you want.

This approach converts the .pdf files page-wise, so the occurrences of the search string `$query` can be located more specifically. `pdftotext -f $p -l $p` limits the range to be converted to a single page, identified by the number `$p`. `grep --color=always` preserves the match highlights in the subsequent `echo`. `fileid=""` just makes sure the file name of the .pdf document is printed only once for multiple matches.
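The page-wise search described above could look roughly like the sketch below. It assumes `pdftotext` and `pdfinfo` (both from poppler-utils) are available and that grep supports `--color=always`. The function name `search_pdf_pages` and the output layout are illustrative choices; `query`, `p`, and `fileid` follow the names used in the explanation, and `printf` stands in for `echo` to avoid backslash surprises in some shells.

```shell
# search_pdf_pages QUERY FILE...
# Prints matching lines grouped by page; the file name is printed once
# per file, before its first match.
search_pdf_pages() {
    query=$1
    shift
    for file in "$@"; do
        # pdfinfo reports a line such as "Pages:          12"
        pages=$(pdfinfo "$file" 2>/dev/null | awk '/^Pages:/ {print $2}')
        fileid=$file
        p=1
        while [ "$p" -le "${pages:-0}" ]; do
            # -f and -l restrict the conversion to page $p only.
            # --color=always keeps the highlights even though grep's
            # output is captured rather than written to a terminal.
            matches=$(pdftotext -f "$p" -l "$p" "$file" - 2>/dev/null |
                      grep --color=always -- "$query")
            if [ -n "$matches" ]; then
                # Print the file name only before the first matching page.
                if [ -n "$fileid" ]; then
                    printf '%s\n' "$fileid"
                    fileid=""
                fi
                printf 'page %s:\n%s\n' "$p" "$matches"
            fi
            p=$((p + 1))
        done
    done
}

# Usage (hypothetical): search_pdf_pages 'invoice' *.pdf
```

The variables are deliberately left global to keep the sketch POSIX-compatible; in bash they could be declared with `local`.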