How to scrape a downloaded PDF file with R

Posted 2019-08-18 22:32

Question:

I’ve recently gotten into scraping (and programming in general) for my internship, and I came across PDF scraping. Every time I try to read a scanned PDF with R, I can never get it to work. I’ve tried using the file.choose() function to no avail. Do I need to change my working directory, or how can I get the PDF from my files into R? The code looks something like this:

    > library(pdftools)
    > text=pdf_text("C:/Users/myname/Documents/renewalscan.pdf")
    > text
    [1] ""

Also, using pdftables gives me this error:

    > library(pdftables)
    > convert_pdf("C:/Users/myname/Documents/renewalscan.pdf","my.csv")
    Error in get_content(input_file, format, api_key) : 
    Bad Request (HTTP 400).

Answer 1:

You should use the packages pdftools and pdftables.

If you are trying to read the text inside the PDF, use the pdf_text() function. Its argument is the path (on your computer or on the web) to the PDF. For example:

    tt = pdf_text("C:/Users/Smith/Documents/my_file.pdf")

It would be nice if you were more specific and also gave us a reproducible example.
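
Below is a minimal sketch of how this might look in practice, assuming a text-based (not scanned) PDF at a hypothetical path; pdf_text() returns a character vector with one element per page, and a scanned PDF with no text layer yields only empty strings, as in the question:

    library(pdftools)

    # Hypothetical path for illustration; point this at your own PDF
    path <- "C:/Users/Smith/Documents/my_file.pdf"

    # pdf_text() returns one character string per page
    pages <- pdf_text(path)
    length(pages)   # number of pages

    # A scanned (image-only) PDF has no text layer, so pages will be
    # empty strings, which is why the question's output is "".
    # For a text-based PDF, split a page into lines for inspection:
    first_page_lines <- strsplit(pages[1], "\n")[[1]]
    head(first_page_lines)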



Answer 2:

To use the PDFTables R package, you need to run the following command:

    convert_pdf('test/index.pdf', output_file = NULL, format = "xlsx-single", message = TRUE, api_key = "insert_API_key")
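
As a rough sketch, assuming you have registered for a PDFTables API key (the key below is a placeholder), the call from the question could be written like this, with the output format set explicitly:

    library(pdftables)

    # Placeholder key for illustration; substitute the key from your
    # PDFTables account. A missing or invalid key is one common reason
    # the API rejects the request.
    my_key <- "insert_API_key"

    # Convert the PDF from the question to CSV; format also accepts
    # "xlsx-single" and "xlsx-multiple" for Excel output.
    convert_pdf("C:/Users/myname/Documents/renewalscan.pdf",
                output_file = "my.csv",
                format      = "csv",
                api_key     = my_key)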