I have used this code to convert a PDF to text:
input1 = '//Home//Sai Krishna Dubagunta.pdf'
output = '//Home//Me.txt'
os.system(("pdftotext %s %s") %( input1, output))
I have created the Home directory and pasted the source file in it.
The output I get is
1
And no .txt file was created. Where is the problem?
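Your expression
("pdftotext %s %s") % (input1, output)
will translate to
pdftotext //Home//Sai Krishna Dubagunta.pdf //Home//Me.txt
which means that the first parameter passed to pdftotext is //Home//Sai, and the second parameter is Krishna. That obviously won't work. Enclose the parameters in quotes:
os.system("pdftotext '%s' '%s'" % (input1, output))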
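As an alternative to quoting by hand, here is a minimal sketch that passes the arguments as a list through subprocess, so no shell ever splits the path at its spaces. The helper names build_command and convert are hypothetical, chosen only for illustration, and the sketch assumes the pdftotext command-line tool is installed and on the PATH.

```python
import shlex
import subprocess

input1 = '//Home//Sai Krishna Dubagunta.pdf'
output = '//Home//Me.txt'

# What a shell makes of the unquoted command string: the path is split at spaces.
naive_tokens = shlex.split("pdftotext %s %s" % (input1, output))
print(naive_tokens)


def build_command(pdf_path, txt_path):
    """Return the argument list; each path stays a single argument."""
    return ["pdftotext", pdf_path, txt_path]


def convert(pdf_path, txt_path):
    """Run pdftotext without a shell and raise if it reports a failure."""
    result = subprocess.run(
        build_command(pdf_path, txt_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError("pdftotext failed: " + result.stderr.strip())
    return txt_path
```

Calling convert(input1, output) would then run the conversion. Unlike os.system, a failure surfaces as an exception carrying the tool's error message rather than as a bare exit status.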
There are various Python packages for extracting the text from a PDF.
pdftotext
The pdftotext package seems to work pretty well, but it has no options, e.g. to extract bounding boxes.
Installation
For Ubuntu:
Minimal Working Example
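A minimal sketch of how the pdftotext Python package is typically used, assuming it has been installed with pip. The function names pdf_to_text and join_pages are hypothetical helpers introduced here for illustration. The third-party import sits inside the function so that the rest of the snippet loads even without the package.

```python
def join_pages(pages):
    """Join per-page strings into one text, separated by blank lines."""
    return "\n\n".join(pages)


def pdf_to_text(pdf_path):
    """Read a PDF with the pdftotext package and return its text."""
    import pdftotext  # third-party: pip install pdftotext

    with open(pdf_path, "rb") as f:
        pdf = pdftotext.PDF(f)  # behaves like a sequence of page strings
        return join_pages(pdf)
```

For the file from the question, this would be called as pdf_to_text('//Home//Sai Krishna Dubagunta.pdf'), and the returned string could then be written to Me.txt.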
PDF miner
Install it with pip install pdfminer.six. A minimal working example is here.
I think the pdftotext command takes only one argument. Try using:
and see what happens. Hope this helps.
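For the pdfminer.six route mentioned above, a minimal sketch could look like the following. It assumes pdfminer.six is installed and uses its high-level extract_text function. The helper names default_output_path and pdf_to_text_file are hypothetical, and writing the output next to the PDF is a design choice of this sketch only.

```python
from pathlib import Path


def default_output_path(pdf_path):
    """Derive the .txt path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".txt")


def pdf_to_text_file(pdf_path, txt_path=None):
    """Extract text with pdfminer.six and write it to a text file."""
    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    target = Path(txt_path) if txt_path else default_output_path(pdf_path)
    text = extract_text(str(pdf_path))
    target.write_text(text, encoding="utf-8")
    return target
```

Because the path is passed as a Python string rather than through a shell, spaces in the file name need no quoting here.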