Can I use the iText API for comparing two PDF files? I have gone through various approaches on Stack Overflow for comparing PDF files, including dedicated tools and utilities such as ImageMagick. The PDFs I wish to compare are financial reports containing graphs, tables, and text. We have to compare a large number of files and would like to do it through a command-line utility. There is a ComparePDF command-line tool, but it only reports whether two files contain differences. We would like to print a log of the differences between the files. Can we accomplish this with iText?
Question:
Answer 1:
What do you want to compare? iText could be used to compare structure and syntax, but two different PDFs that look identical to the human eye may have a completely different structure and syntax internally.
At iText, we have written JUnit tests that use Ghostscript to create images of each page. These images are then compared to each other pixel by pixel.
We also use iText in JUnit tests, but these tests look at the structure and the syntax more than at the content.
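The pixel-by-pixel comparison described above can be sketched roughly as follows. This is only an illustration, not the code used in the iText test suite: it assumes each page has already been rendered to an image (for example by Ghostscript) and loaded as a `BufferedImage`, and the class name `PageImageDiff` and method `countDifferingPixels` are hypothetical.

```java
import java.awt.image.BufferedImage;

public class PageImageDiff {

    /**
     * Counts the pixels whose ARGB values differ between two rendered pages.
     * Returns -1 when the images do not have the same dimensions, since a
     * per-pixel comparison is not meaningful in that case.
     */
    public static long countDifferingPixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return -1;
        }
        long differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                // getRGB returns the pixel as a packed ARGB int
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }
}
```

A result of 0 means the two pages rendered identically. Any other value can be written to a log together with the file name and page number, which gives a per-page record of where the documents differ visually.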
Answer 2:
You can use the Myers O(ND) diff algorithm for PDF comparison. Neither the iText nor the PDFBox API provides a method for comparing PDFs, but you can extract the text of both files, together with its coordinates, using iText, and then run the Myers O(ND) diff algorithm on the extracted text to find the differences and highlight the changes.
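As a rough sketch of the diff step, the example below compares two lists of already-extracted text lines and produces a log of removed and added lines. Note the assumptions: the text extraction with iText is not shown, the class name `TextLineDiff` is hypothetical, and for brevity a simple longest-common-subsequence table is used in place of the Myers O(ND) algorithm. It also yields a shortest insert/delete script, but takes O(NM) time and memory, so Myers is the better choice for long documents.

```java
import java.util.ArrayList;
import java.util.List;

public class TextLineDiff {

    /**
     * Returns a log of differences between two line lists:
     * "- line" for lines only in oldLines, "+ line" for lines only in newLines.
     */
    public static List<String> diff(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size();
        int m = newLines.size();

        // lcs[i][j] = length of the longest common subsequence of
        // oldLines[i..] and newLines[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (oldLines.get(i).equals(newLines.get(j))) {
                    lcs[i][j] = lcs[i + 1][j + 1] + 1;
                } else {
                    lcs[i][j] = Math.max(lcs[i + 1][j], lcs[i][j + 1]);
                }
            }
        }

        // Walk the table from the start, emitting deletions and insertions
        List<String> log = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (oldLines.get(i).equals(newLines.get(j))) {
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                log.add("- " + oldLines.get(i++));
            } else {
                log.add("+ " + newLines.get(j++));
            }
        }
        while (i < n) {
            log.add("- " + oldLines.get(i++));
        }
        while (j < m) {
            log.add("+ " + newLines.get(j++));
        }
        return log;
    }
}
```

An empty log means the extracted text of the two files is identical. If coordinates are extracted along with the text, they can be carried with each line and used afterwards to highlight the changed regions.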