How to “write to variable” instead of “to file” in

Posted 2019-07-20 19:01

Question:

I'm trying to write a function which splits a PDF into separate pages. From this SO answer, I copied a simple function that does this:

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
    return pages

This, however, writes the new PDFs to disk instead of returning a list of the new PDFs as file variables. So I changed the line output.write(outputStream) to:

pages.append(outputStream)

When I then try to write the elements in the pages list, however, I get a ValueError: I/O operation on closed file.
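The error arises because the with block closes the file as soon as it exits, so every object stored in pages is already closed by the time it is used. A minimal sketch that reproduces this with the standard library only (the helper name collect_closed_handles and the temporary directory are illustrative, not part of the original code):

```python
import os
import tempfile


def collect_closed_handles(directory, count):
    """Mimic the modified loop: store file objects opened inside a with block."""
    handles = []
    for i in range(count):
        path = os.path.join(directory, "document-page%s.pdf" % i)
        with open(path, "wb") as output_stream:
            handles.append(output_stream)  # the file is still open here
        # Leaving the with block closes output_stream.
    return handles


with tempfile.TemporaryDirectory() as tmpdir:
    handles = collect_closed_handles(tmpdir, 2)

# Every stored handle refers to a file that has already been closed.
all_closed = all(h.closed for h in handles)

# Writing to a closed handle raises ValueError.
try:
    handles[0].write(b"data")
    raised_value_error = False
except ValueError:
    raised_value_error = True

print(all_closed, raised_value_error)
```

Storing the file object itself therefore cannot work; the contents have to be captured somewhere that outlives the with block.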

Does anybody know how I can add the new files to the list and return them, instead of writing them to file? All tips are welcome!

Answer 1:

It is not completely clear what you mean by "list of PDFs as file variables". If you want to create strings with the PDF contents instead of files, and return a list of such strings, replace open() with StringIO and call getvalue() to obtain the contents:

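import cStringIO  # Python 2 only

# Assumption: PyPDF2 < 3.0, which provides these class names
from PyPDF2 import PdfFileReader, PdfFileWriter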

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        io = cStringIO.StringIO()
        output.write(io)
        pages.append(io.getvalue())
    return pages
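The cStringIO module exists only in Python 2. In Python 3, the same pattern can be sketched with io.BytesIO, whose getvalue() returns the buffer contents as bytes. In the sketch below, StubWriter is a hypothetical stand-in for PdfFileWriter (anything with a write(stream) method) so that the example runs without a PDF library, and render_to_bytes is an illustrative helper name:

```python
import io


class StubWriter:
    """Hypothetical stand-in for PdfFileWriter: any object with write(stream)."""

    def __init__(self, payload):
        self.payload = payload

    def write(self, stream):
        stream.write(self.payload)


def render_to_bytes(writer):
    """Let the writer write into an in-memory buffer and return the bytes."""
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


# One bytes object per "page", analogous to the list returned by splitPdf.
pages = [
    render_to_bytes(StubWriter(b"page-" + str(i).encode("ascii")))
    for i in range(3)
]
print(pages)
```

Returning bytes rather than stream objects means the caller never has to worry about stream positions or closed handles.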


Answer 2:

You can use the in-memory binary streams in the io module. This keeps each PDF in memory as a file-like object instead of writing it to disk.

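import io

# Assumption: PyPDF2 < 3.0, which provides these class names
from PyPDF2 import PdfFileReader, PdfFileWriter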

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        outputStream = io.BytesIO()

        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        output.write(outputStream)

        # Move the stream position to the beginning,
        # making it easier for other code to read
        outputStream.seek(0)

        pages.append(outputStream)
    return pages

To later write the objects to a file, use shutil.copyfileobj:

import shutil

with open('page0.pdf', 'wb') as out:
    shutil.copyfileobj(pages[0], out)
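To save every stream rather than just the first, the same call can be placed in a loop. The following is a minimal sketch in which plain BytesIO objects with dummy contents stand in for the streams returned by splitPdf; the helper name save_streams and the pageN.pdf naming are assumptions made for illustration:

```python
import io
import os
import shutil
import tempfile


def save_streams(streams, directory):
    """Copy each in-memory stream to its own file and return the paths."""
    paths = []
    for i, stream in enumerate(streams):
        stream.seek(0)  # make sure copying starts at the beginning
        path = os.path.join(directory, "page%d.pdf" % i)
        with open(path, "wb") as out:
            shutil.copyfileobj(stream, out)
        paths.append(path)
    return paths


# Dummy streams standing in for the list returned by splitPdf.
streams = [io.BytesIO(b"first page bytes"), io.BytesIO(b"second page bytes")]

with tempfile.TemporaryDirectory() as tmpdir:
    paths = save_streams(streams, tmpdir)
    written = []
    for path in paths:
        with open(path, "rb") as f:
            written.append(f.read())

file_names = [os.path.basename(p) for p in paths]
print(file_names)
```

Calling seek(0) inside the helper makes it safe to use even if some other code has already read from a stream.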


Answer 3:

I haven't used PdfFileWriter, but I think this should work: keep the PdfFileWriter objects themselves in the list and write them out later.

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        pages.append(output)
    return pages

def writePdf(pages):
    # Number the output files starting from 1
    for i, p in enumerate(pages, start=1):
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            p.write(outputStream)