Read pdf files with php

I have a large PDF file that is a floor map for a building. It has layers for all the office furniture including text boxes of seat location.

My goal is to read this file with PHP, search the document for text layers, get their contents and coordinates in the file. This way I can map out seat locations -> x/y coordinates.

Is there any way to do this via PHP? (Or even Ruby or Python if that's what's necessary)

Tags: php pdf

5 answers

爱死公子算了

#2 · 2019-01-01 02:39

There is a PHP library (pdfparser) that does exactly what you want.

Project website: http://www.pdfparser.org/

GitHub: https://github.com/smalot/pdfparser

Demo page/API: http://www.pdfparser.org/demo

After including pdfparser in your project you can get all text from mypdf.pdf like so:

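<?php
// Load Composer's autoloader (assumes pdfparser was installed via Composer).
require 'vendor/autoload.php';

// The library's classes live in the \Smalot\PdfParser namespace.
$parser = new \Smalot\PdfParser\Parser();
$pdf    = $parser->parseFile('mypdf.pdf');
$text   = $pdf->getText();
echo $text; // all text from mypdf.pdf
?>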

Similarly, you can get the metadata from the PDF as well as the PDF objects (for example, images).
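Since the question asks for coordinates as well as text, here is a minimal sketch of how that could look. It assumes a pdfparser version whose Page class provides getDataTm(), which (per the library's usage docs) returns entries shaped like [[a, b, c, d, x, y], 'text']. The helper name mapTextToCoordinates and the sample data are hypothetical, made up for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Turn entries shaped like [[a, b, c, d, x, y], 'text'] into a
 * text => ['x' => float, 'y' => float] map.
 * Hypothetical helper; if the same label appears twice, the last one wins.
 */
function mapTextToCoordinates(array $dataTm): array
{
    $map = [];
    foreach ($dataTm as $entry) {
        [$matrix, $text] = $entry;
        $label = trim((string) $text);
        if ($label === '') {
            continue; // skip empty text fragments
        }
        // Elements 4 and 5 of the transformation matrix are the x/y offsets.
        $map[$label] = ['x' => (float) $matrix[4], 'y' => (float) $matrix[5]];
    }
    return $map;
}

// Made-up sample standing in for $page->getDataTm() output.
$sample = [
    [['1', '0', '0', '1', '201.96', '720.68'], 'Seat 101'],
    [['1', '0', '0', '1', '250.5', '700'], 'Seat 102'],
    [['1', '0', '0', '1', '10', '10'], '   '],
];

$seats = mapTextToCoordinates($sample);
print_r($seats);

// With the real library (assumption: getDataTm() is available), roughly:
// foreach ($pdf->getPages() as $page) {
//     $seats = mapTextToCoordinates($page->getDataTm());
// }
```

The coordinates are in PDF units, and the origin is typically the bottom-left corner of the page, so you may need to flip the y axis before drawing them on an image.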


何处买醉

#3 · 2019-01-01 02:42

You might also want to try Apache PDFBox: http://pdfbox.apache.org/. A working example can be found at https://www.jinises.com


高级女魔头

#4 · 2019-01-01 02:50

Hmm ... not exactly PHP, but you could call a program from PHP to convert the PDF to a temporary HTML file and then parse the resulting file with PHP. I've done something similar for a project of mine, and this is the program I used:

PdfToHtml

What's cool about the program is that it will spit out the text elements in <div> tags with absolute position coordinates. It seems like this is exactly what you are trying to do.
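To give an idea of the parsing step, here is a rough sketch using PHP's DOM extension. The markup in the sample (inline style with top/left in pixels) is an assumption about what the converter emits, so check it against your actual output. The function name extractPositionedText is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Collect text plus top/left offsets from <div> elements that carry
 * inline positioning. Hypothetical helper; assumes styles such as
 * "position:absolute;top:120px;left:85px".
 */
function extractPositionedText(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Converter output is rarely strict HTML, so collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        $style = $div->getAttribute('style');
        // The lookbehind avoids matching margin-top, padding-left, etc.
        if (!preg_match('/(?<![-\w])top\s*:\s*(-?\d+(?:\.\d+)?)/i', $style, $top)
            || !preg_match('/(?<![-\w])left\s*:\s*(-?\d+(?:\.\d+)?)/i', $style, $left)) {
            continue; // not a positioned text element
        }
        $text = trim($div->textContent);
        if ($text !== '') {
            $items[] = ['text' => $text, 'x' => (float) $left[1], 'y' => (float) $top[1]];
        }
    }
    return $items;
}

// Made-up sample standing in for the converter's HTML output.
$html = <<<HTML
<html><body>
<div style="position:absolute;top:120px;left:85px"><span>Seat 101</span></div>
<div style="position:absolute;top:120px;left:240px"><span>Seat 102</span></div>
<div class="page">no position here</div>
</body></html>
HTML;

$items = extractPositionedText($html);
print_r($items);
```

In practice the HTML would come from running the converter first, for example via exec() with escapeshellarg() around the file path. The exact command and flags depend on which build of the tool you have, so they are left out here.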


春风洒进眼中

#5 · 2019-01-01 02:52

Check out FPDF (with FPDI):

http://www.fpdf.org/

http://www.setasign.de/products/pdf-php-solutions/fpdi/

These will let you open a PDF and add content to it in PHP. I'm guessing you can also use their functionality to search through the existing content for the values you need.

Another possible library is TCPDF: http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=tcpdf

Update to add a more modern library: PDF Parser


梦该遗忘

#6 · 2019-01-01 02:54

Your question says: "I have a large PDF file that is a floor map for a building."

I'm afraid this might be harder than you expect.

The library most people currently use to parse PDFs in PHP is smalot/pdfparser, and it is known to have issues with large files.

I'm also looking for a solid PHP library that can parse PDFs without memory spikes that force you to disable PHP's memory limit, as a lot of "developers" do (which I don't think is advisable).

See this issue for more details about smalot's performance: https://github.com/smalot/pdfparser/issues/163
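If you want to check how close a parse gets to the limit before touching any configuration, something like the following sketch can help. The helper name runWithMemoryReport is hypothetical, and the workload below is only a stand-in for the real parse call.

```php
<?php
declare(strict_types=1);

/**
 * Run a task and report the process's peak memory afterwards.
 * Hypothetical helper; the peak is process-wide, so earlier work
 * in the same script is included in peak_bytes.
 */
function runWithMemoryReport(callable $task): array
{
    $peakBefore = memory_get_peak_usage(true);
    $result     = $task();
    $peakAfter  = memory_get_peak_usage(true);

    return [
        'result'            => $result,
        'peak_bytes'        => $peakAfter,
        'peak_growth_bytes' => $peakAfter - $peakBefore,
        'memory_limit'      => ini_get('memory_limit'),
    ];
}

// Stand-in workload; swap in e.g. fn () => $parser->parseFile('mypdf.pdf').
$report = runWithMemoryReport(fn () => strlen(str_repeat('x', 5 * 1024 * 1024)));

printf(
    "peak: %.1f MiB (limit: %s)\n",
    $report['peak_bytes'] / 1048576,
    $report['memory_limit']
);
```

This only measures the problem; it does not make the parser use less memory.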

