MS Word documents to RTF documents

Posted 2019-09-11 12:22

Question:

I have a problem: my application must convert MS Word documents (imported from another system) into RTF documents, so that they can be manipulated with the OOo APIs without errors caused by format incompatibilities.

My question: how can I manipulate MS Word documents directly from my Java application? Are there APIs (such as POI or OOo) that let me do this without any format incompatibility?

My system runs on Linux servers (as is typical for public-facing production systems), and the only office software installed is OOo.

Using the OOo Java APIs I can open, manipulate and save the documents, but lately I have been seeing many problems caused by incompatibilities between the closed MS Word format and the OpenDocument format used by OOo (I am referring to swriter). In many cases, such as lists with particular bullets (e.g., '-') or nested lists, page numbering (e.g., the "1 of x" format), and many other formatting options, the output document produced by the manipulation shows errors that I believe are due to incompatibilities between the two formats.
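A minimal sketch of the conversion step on its own, run from Java by launching the installed office suite in headless mode. It assumes a `soffice` binary on the PATH that supports the `--headless --convert-to` options (LibreOffice does; older OOo builds may not), and the class name `DocToRtf` is purely illustrative. It does not remove the rendering differences described above, since the same import filter is used.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class DocToRtf {

    // Builds the command line for a headless conversion to RTF.
    // Assumption: "soffice" is on the PATH and understands --convert-to.
    public static List<String> buildCommand(Path input, Path outDir) {
        return Arrays.asList(
                "soffice",
                "--headless",
                "--convert-to", "rtf",
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Runs the conversion and returns the exit code of the soffice process.
    public static int convert(Path input, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO() // show soffice messages on our own stdout/stderr
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DocToRtf <input.doc> <output-dir>");
            return;
        }
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("soffice exited with code " + exit);
    }
}
```

Building the argument list separately from running it keeps the command easy to inspect and log before a real document is touched.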

I am now studying Apache POI's capabilities to find out whether I can open MS Word documents with it and save them in RTF, which is an interchange format that should reduce the incompatibilities to a minimum.

Have you had the same problem? Can you point me to an open source Java library more powerful than POI? Or can you suggest a combined approach, such as POI + iText, for the MS Word to RTF conversion step?

Answer 1:

When I was asked to provide a way to reliably convert a doc to a TIFF, I did some research. There are a number of libraries out there, both free and commercial, that claim to be able to render MS Word documents. None of them provides 100% accurate rendering.

The way I had to do it was to run MS Word in a wrapper and drive it through OLE Automation to do what I needed. Running Word in the background has quite a few gotchas in itself, but with thoughtful design you can make it work.

Your case is even easier than mine, because all you need to do is open the doc and then use Save As.

Edit

@Paolo - There you go. I've been through the same thing: evaluating various packages, OO included, and finding that they are mmmm... less than precise. Of course, it all depends on how strict your customers are about document formatting. Mine were extremely picky, down to the margin sizes and picture positioning.

Another option would be to give the customer a list of known imprecisions and get their approval of it. Unfortunately, with every new doc you run the risk of hitting a new one.



Answer 2:

Docvert lets you set up a web service to convert Word documents to Open Office format. It craps out on the OLE objects though.