What is the best perl module to extract text from

Posted 2020-07-11 05:32

What is the best way to extract text from a pdf?

Tags: perl pdf text extraction

1 answer

Reply #2 · 2020-07-11 06:21

The CAM::PDF module is pretty useful for extracting text and maintaining some information about where it came from in the document. It installs /usr/local/bin/getpdftext.pl which demonstrates simple extraction. However, CAM::PDF can only read PDFs that are completely valid.
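As a rough illustration of page-by-page extraction with CAM::PDF, here is a minimal sketch. It is not the code of getpdftext.pl; the helper name pdf_page_texts is made up for this example. It assumes CAM::PDF is installed and that the file paths are passed on the command line.

```perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper: return a list of [page_number, text] pairs,
# so each chunk of text stays tied to the page it came from.
sub pdf_page_texts {
    my ($path) = @_;

    # new() returns undef when the file cannot be opened or parsed
    my $pdf = CAM::PDF->new($path)
        or die "Cannot parse $path: $CAM::PDF::errstr\n";

    my @pages;
    for my $n (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($n);
        push @pages, [$n, defined $text ? $text : ''];
    }
    return @pages;
}

# Print the text of every PDF named on the command line, page by page
for my $file (@ARGV) {
    for my $page (pdf_page_texts($file)) {
        my ($n, $text) = @$page;
        print "=== $file, page $n ===\n$text\n";
    }
}
```

Saved as, say, pagetext.pl, this would be run as "perl pagetext.pl foo.pdf". A PDF that CAM::PDF cannot parse ends the script with the module's error message.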

If you are dealing with ill-formed PDFs, you may need a more lenient parser, such as the pdftotext command-line tool. Running it on foo.pdf writes the text to foo.txt, which you can then read into Perl.
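One possible way to wire pdftotext into a Perl script is sketched below. It assumes the pdftotext binary is on your PATH; the helper name pdftotext_string and the use of a temporary file instead of foo.txt are choices made for this example only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run pdftotext on one PDF and return its text.
sub pdftotext_string {
    my ($pdf_path) = @_;

    # Temporary output file, removed automatically when the script exits
    my ($fh, $txt_path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
    close $fh;

    # List form of system() avoids the shell, so odd file names are safe
    system('pdftotext', '-enc', 'UTF-8', $pdf_path, $txt_path) == 0
        or die "pdftotext failed on $pdf_path\n";

    open my $in, '<:encoding(UTF-8)', $txt_path
        or die "Cannot read $txt_path: $!\n";
    local $/;                     # slurp mode: read the whole file at once
    my $text = <$in>;
    close $in;
    return defined $text ? $text : '';
}

binmode STDOUT, ':encoding(UTF-8)';
print pdftotext_string($_) for @ARGV;
```

If you would rather keep foo.txt around, pass your own output path to pdftotext instead of the temporary file.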
