I'm trying to write a function which splits a pdf into separate pages. From this SO answer. I copied a simple function which splits a pdf into separate pages:
def splitPdf(file_):
pdf = PdfFileReader(file_)
pages = []
for i in range(pdf.getNumPages()):
output = PdfFileWriter()
output.addPage(pdf.getPage(i))
with open("document-page%s.pdf" % i, "wb") as outputStream:
output.write(outputStream)
return pages
This however, writes the new PDFs to file, instead of returning a list of the new PDFs as file variables. So I changed the line of output.write(outputStream)
to:
pages.append(outputStream)
When trying to write the elements in the pages list however, I get a ValueError: I/O operation on closed file
.
Does anybody know how I can add the new files to the list and return them, instead of writing them to file? All tips are welcome!
It is not completely clear what you mean by "list of PDFs as file variables. If you want to create strings instead of files with PDF contents, and return a list of such strings, replace open()
with StringIO
and call getvalue()
to obtain the contents:
import cStringIO
def splitPdf(file_):
pdf = PdfFileReader(file_)
pages = []
for i in range(pdf.getNumPages()):
output = PdfFileWriter()
output.addPage(pdf.getPage(i))
io = cStringIO.StringIO()
output.write(io)
pages.append(io.getvalue())
return pages
You can use the in-memory binary streams in the io
module. This will store the pdf files in your memory.
import io
def splitPdf(file_):
pdf = PdfFileReader(file_)
pages = []
for i in range(pdf.getNumPages()):
outputStream = io.BytesIO()
output = PdfFileWriter()
output.addPage(pdf.getPage(i))
output.write(outputStream)
# Move the stream position to the beginning,
# making it easier for other code to read
outputStream.seek(0)
pages.append(outputStream)
return pages
To later write the objects to a file, use shutil.copyfileobj
:
import shutil
with open('page0.pdf', 'wb') as out:
shutil.copyfileobj(pages[0], out)
Haven't used PdfFileWriter, but think that this should work.
def splitPdf(file_):
pdf = PdfFileReader(file_)
pages = []
for i in range(pdf.getNumPages()):
output = PdfFileWriter()
output.addPage(pdf.getPage(i))
pages.append(output)
return pages
def writePdf(pages):
i = 1
for p in pages:
with open("document-page%s.pdf" % i, "wb") as outputStream:
p.write(outputStream)
i += 1