Importing a PDF into R with the "tm" package

Posted 2020-08-04 10:05

Question:

I have a working example that reads a PDF into the R workspace through the "tm" package, but I do not understand how the code works, so I cannot adapt it to import the PDF I actually want. The PDF imported in the following code is the "tm" vignette.

The code is:

if(file.exists(Sys.which("pdftotext"))) {
    pdf <- readPDF(PdftotextOptions = "-layout")(elem = list(uri = vignette("tm")$pdf),
                                                 language = "en",
                                                 id = "id1")
    pdf[1:13]
}

The code above reads the "tm" vignette, but the PDF I am trying to import is a different file. How should I change the code to bring my own PDF into the workspace? minn is the PDF document I am trying to import.

My attempt looks like this:

if(file.exists(Sys.which("pdftotext"))) {
    pdf <- readPDF(PdftotextOptions = "-layout")(elem = list(uri = vignette("minn")$pdf),
                                                 language = "en",
                                                 id = "id1")
    pdf[1:13]
}

Answer 1:

It seems the problem was with the PDF I was trying to read. In any case, the code goes as below. Thanks to Thomas for the lead. The link for the PDF is "http://www.wine-economics.org/workingpapers/AAWE_WP16.pdf".

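library(tm)
# Build a reader function that passes the -layout option to pdftotext
tt <- readPDF(PdftotextOptions = "-layout")
# Apply the reader to AAWE_WP16.pdf, given as a path relative to the working directory
rr <- tt(elem = list(uri = "AAWE_WP16.pdf"), language = "en", id = "id1")
rr[1:15]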
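
On the "how does this code work" part of the question: as far as I can tell, readPDF() does not read anything itself. It returns a reader function, and that function receives the file location as a plain path string in elem$uri. vignette("tm")$pdf in the question is only one way of obtaining such a path, for a vignette that ships with a package, which is presumably why it cannot locate an arbitrary file such as minn. Below is a minimal base-R sketch of the same two-step pattern. Note that make_reader is a hypothetical stand-in that I wrote for illustration; it is not part of "tm" and it does not parse PDFs.

```r
# Hypothetical stand-in for the pattern behind readPDF(); base R only.
# It mimics the calling convention and does NOT extract any PDF text.
make_reader <- function(options = "") {
  # Step 1: the outer call only remembers the options and returns a function
  function(elem, language, id) {
    # Step 2: the returned function is handed the file path in elem$uri
    list(id = id, language = language, uri = elem$uri, options = options)
  }
}

# Build the reader once (assumed analogous to readPDF(PdftotextOptions = "-layout"))
reader <- make_reader(options = "-layout")

# Point it at a file by giving the path as a plain string
doc <- reader(elem = list(uri = "AAWE_WP16.pdf"), language = "en", id = "id1")
doc$uri
```

So, to import another PDF, the only part that should need to change is the string given as uri, provided the file exists at that path and pdftotext can process it.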


Tags: r pdf tm