Question:
Is there a .dll I can use that takes a .pdf file as input and produces an .html file as output? I want to convert .pdf to .html. My colleague says it's very difficult to go step by step, extracting text/fonts/images/margins/links etc. from the PDF and then creating a new HTML file with the same content; he says it's nearly impossible. So I was wondering: is there some DLL I could add as a reference to do that?
Answer 1:
Writing a program to do it is definitely not trivial. If you don't find a .NET library for this (I couldn't, at least not a free one), I would just download PDFToHTML and invoke it programmatically to get the HTML.
If you have time to spare and/or PDFToHTML does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature, free PDF library. I've used it in the past to manipulate PDFs (merging, creating, etc.).
UPDATE
As noted in the comment by Quandary, the PDFSharp library offers a more relaxed license (MIT) than the commercial or AGPL licenses offered by iText. Keep this in mind when choosing your library. I have not used PDFSharp myself, so I don't know how the two compare in terms of functionality.
Answer 2:
You can download this free tool: PDFToHTML
Then, in your program, just start a new process that runs the executable with the PDF file as an argument. I just tested it and it seems to work OK.
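A minimal sketch of the "start a process and run the executable" approach, written in Python purely for illustration (in .NET the same pattern would go through `System.Diagnostics.Process`). It assumes the converter is installed as a `pdftohtml` executable on the PATH and accepts `[options] <PDF-file> <html-file>`, as the Poppler build of pdftohtml does; run `pdftohtml -h` to confirm the syntax for your version. The function names `build_pdftohtml_command` and `convert_pdf_to_html` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Sequence


def build_pdftohtml_command(exe: str, pdf_path: str, out_base: str,
                            extra_args: Sequence[str] = ()) -> list[str]:
    """Build the argument list: exe [options] <PDF-file> <html-file>.

    Assumes the converter's usage is `pdftohtml [options] <PDF> <html>`.
    """
    return [exe, *extra_args, pdf_path, out_base]


def convert_pdf_to_html(pdf_path: str, out_base: str,
                        exe: str = "pdftohtml",
                        extra_args: Sequence[str] = (),
                        timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run the external converter as a child process; raise on failure."""
    # Fail early with a clear message if the executable is not installed.
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"converter not found on PATH: {exe}")
    if not Path(pdf_path).is_file():
        raise FileNotFoundError(f"input PDF not found: {pdf_path}")
    cmd = build_pdftohtml_command(exe, pdf_path, out_base, extra_args)
    # Passing a list (no shell) avoids quoting problems with spaces in paths;
    # check=True raises CalledProcessError if the converter exits non-zero.
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=True)
```

For example, `convert_pdf_to_html("report.pdf", "report", extra_args=["-noframes"])` would ask for a single HTML file instead of a frameset, if your build supports `-noframes`. The names of the generated files depend on the converter version and options, so check the output directory rather than assuming them.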
Answer 3:
If you don't mind paying, Aspose offers a very good solution; it's what we use at my company.
http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/key-features.aspx