Searched quite a bit but as I couldn't find a solution for this kind of problem, hence posting a clear question on the same. Most answers cover image/text extraction which are comparatively easier.
I've a requirement of extracting tables and graphs as text (csv) and images respectively from PDFs.
Can anyone help me with an efficient python 3.6 code to solve the same?
Till now I could achieve extracting jpgs using startmark = b"\xff\xd8" and endmark = b"\xff\xd9", but not all tables and graphs in a PDF are plain jpgs, hence my code fails badly in achieving that.
Example, I want to extract table from page 11 and graphs from page 12 as image or something which is feasible from the below given link. How to go about it?
https://hartmannazurecdn.azureedge.net/media/2369/annual-report-2017.pdf