I need to crop a pdf document using the linux shell and then extract the text just in that cropped pdf.
My idea was to crop a pdf using pdfcrop linux tool and then use a txt2pdf text extractor tool to extract the text just in the cropped area, but i've realized that i'm thinking on images, and when i try to do this the result is the same than doing it over the original, not cropped, pdf.
I guess it's a layer problem. As the pdf format works with layers, if i don't "crop" all the layers, the result is gonna include all the information from all the layers, which i don't want.
I would appreciate so much if someone has any idea of how i could do a real "all layers cropping" in a pdf. If its possible or if i should start thinking on another solution.
TY