C#: converting PDF to HTML [closed]

Posted 2019-03-10 22:54

Question:

Is there a .dll that takes a .pdf file as input and produces an .html file as output? I want to convert PDF to HTML. My colleague says that doing it step by step (extracting the text, fonts, images, margins, links, etc. from the PDF and then building a new HTML file with the same content) is very difficult, nearly impossible. So I was wondering whether there is a DLL I can reference to do this.

Answer 1:

Writing a program to do this is definitely not trivial. If you can't find a .NET library for it (I couldn't, at least not a free one), I would just download PDFToHtml and invoke it programmatically to get the HTML.

If you have the time to spare, or PDFToHtml does not produce acceptable output for you, you could use iText to write the program yourself. It's a very mature, free PDF library. I've used it in the past to manipulate PDFs (merging, creating, etc.).

UPDATE

As noted in the comment by Quandary, the PDFSharp library offers a more relaxed license (MIT) than the commercial or AGPL licenses offered by iText. Keep this in mind when choosing your library. I have not used PDFSharp myself, and I don't know how the two compare in terms of functionality.



Answer 2:

You can download this free tool: PDFToHTML

Then, in your program, just start a new process that runs the executable, passing it the PDF file. I just tested it and it seems to work OK.
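Starting the converter as a child process could look roughly like the sketch below. It is only an illustration: the class name `PdfToHtmlRunner`, the executable path, and the argument order (input PDF first, then an output path) are assumptions, so check the usage text of the tool you actually download. `ProcessStartInfo.ArgumentList` needs .NET Core 2.1 or later; on .NET Framework you would set the `Arguments` string instead.

```csharp
using System;
using System.Diagnostics;

public static class PdfToHtmlRunner
{
    // Describes the child process without starting it.
    // Assumption: the tool takes "<input.pdf> <output>" as its arguments.
    public static ProcessStartInfo BuildStartInfo(string exePath, string pdfPath, string outputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            UseShellExecute = false,       // required for stream redirection
            RedirectStandardError = true,  // capture the tool's error messages
            CreateNoWindow = true
        };
        // ArgumentList quotes each argument, so paths with spaces are safe.
        startInfo.ArgumentList.Add(pdfPath);
        startInfo.ArgumentList.Add(outputPath);
        return startInfo;
    }

    // Runs the converter, waits for it to finish, and throws if it
    // reports a non-zero exit code.
    public static void Convert(string exePath, string pdfPath, string outputPath)
    {
        using (var process = Process.Start(BuildStartInfo(exePath, pdfPath, outputPath)))
        {
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    $"Converter exited with code {process.ExitCode}: {errors}");
            }
        }
    }
}
```

A call such as `PdfToHtmlRunner.Convert(@"C:\tools\pdftohtml.exe", "input.pdf", "output.html")` would then block until the conversion finishes; the paths here are placeholders.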



Answer 3:

If you don't mind paying, Aspose offers a very good solution. It is what we use at my company.

http://www.aspose.com/categories/.net-components/aspose.pdf-for-.net/key-features.aspx



Tags: c# html pdf dll