iText API for PDF comparison

Posted 2019-01-27 11:07

Question:

Can I use the iText API to compare two PDF files? I have gone through various approaches on Stack Overflow for comparing PDF files, such as standalone tools and utilities like ImageMagick. The PDFs I wish to compare are financial reports containing graphs, tables, and text. We have to compare a large number of files and would like to do it with a command-line utility. There is a ComparePDF command-line tool, but it only reports whether two files contain differences. We would like to print a log of the differences between the files. Can we accomplish this with iText?

Answer 1:

What do you want to compare? iText could be used to compare structure and syntax, but two different PDFs that look identical to the human eye may have a completely different structure and syntax internally.

At iText, we have written JUnit tests that use Ghostscript to create images of each page. These images are then compared to each other pixel by pixel.
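As a rough illustration of the pixel-by-pixel step only, here is a minimal sketch using the standard `java.awt.image.BufferedImage` class. It assumes the pages have already been rendered to images (for example by Ghostscript) and loaded into memory. The class and method names (`PixelDiff`, `countDifferentPixels`) are made up for this example and are not part of iText or of its test suite.

```java
import java.awt.image.BufferedImage;

public class PixelDiff {
    /**
     * Counts the pixels whose ARGB values differ between two page renderings.
     * Returns -1 when the images do not have the same dimensions.
     * (Hypothetical helper, not an iText API.)
     */
    public static long countDifferentPixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return -1;
        }
        long diff = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    diff++;
                }
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Two blank 4x4 "pages"; the second one has a single white pixel.
        BufferedImage page1 = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage page2 = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        page2.setRGB(1, 2, 0xFFFFFF);
        System.out.println("Differing pixels: " + countDifferentPixels(page1, page2));
    }
}
```

An exact comparison like this flags any rendering difference, however small, so in practice you may want a tolerance threshold for anti-aliasing noise.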

We also use iText in JUnit tests, but these tests look at the structure and the syntax more than at the content.



Answer 2:

You can use the Myers O(ND) diff algorithm for PDF comparison. Neither the iText nor the PDFBox API provides a method for PDF comparison, but you can extract the text of the files, along with its coordinates, using iText, and then apply the Myers O(ND) diff algorithm to find the differences and highlight the changes.
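To make the diff step concrete, here is a minimal sketch of the greedy Myers O(ND) algorithm applied to lists of text lines. It assumes the text has already been extracted from both PDFs and split into lines; extraction and coordinates are left out. The class name `MyersDiff` and the log format ("- " for lines only in the first file, "+ " for lines only in the second) are choices made for this example. It keeps a copy of the search state for every edit distance, which is simple but memory-hungry for very large inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MyersDiff {
    /** Returns an edit log: "  x" unchanged, "- x" only in a, "+ x" only in b. */
    public static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size(), max = n + m;
        int offset = max + 1;                 // shifts diagonal k into a valid array index
        int[] v = new int[2 * max + 3];       // v[offset + k] = furthest x reached on diagonal k
        List<int[]> trace = new ArrayList<>();

        search:
        for (int d = 0; d <= max; d++) {
            trace.add(v.clone());             // state before exploring edit distance d
            for (int k = -d; k <= d; k += 2) {
                int x;
                if (k == -d || (k != d && v[offset + k - 1] < v[offset + k + 1])) {
                    x = v[offset + k + 1];      // move down: take a line from b
                } else {
                    x = v[offset + k - 1] + 1;  // move right: drop a line from a
                }
                int y = x - k;
                while (x < n && y < m && a.get(x).equals(b.get(y))) {
                    x++;                        // follow the diagonal of equal lines
                    y++;
                }
                v[offset + k] = x;
                if (x >= n && y >= m) {
                    break search;               // reached the end of both inputs
                }
            }
        }

        // Walk back through the saved states to recover the edit script.
        List<String> log = new ArrayList<>();
        int x = n, y = m;
        for (int d = trace.size() - 1; d >= 0; d--) {
            int[] prev = trace.get(d);
            int k = x - y;
            int prevK = (k == -d || (k != d && prev[offset + k - 1] < prev[offset + k + 1]))
                    ? k + 1 : k - 1;
            int prevX = prev[offset + prevK];
            int prevY = prevX - prevK;
            while (x > prevX && y > prevY) {
                x--;
                y--;
                log.add("  " + a.get(x));
            }
            if (d > 0) {
                log.add(x == prevX ? "+ " + b.get(prevY) : "- " + a.get(prevX));
            }
            x = prevX;
            y = prevY;
        }
        Collections.reverse(log);
        return log;
    }

    public static void main(String[] args) {
        List<String> oldLines = Arrays.asList("A", "B", "C");
        List<String> newLines = Arrays.asList("A", "X", "C");
        diff(oldLines, newLines).forEach(System.out::println);
    }
}
```

Running `main` on the two three-line inputs prints the unchanged line `A`, then `- B`, then `+ X`, then the unchanged line `C`. The same log could be written to a file per PDF pair when comparing in batch from the command line.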



Tags: java pdf itext