Question:
I can generate a PDF from a .docx file using docx4j, but I need to convert a .doc file to PDF, including its images and tables. Is there any way to convert .doc to .docx in Java, or .doc directly to PDF?
Answer 1:
docx4j contains org.docx4j.convert.in.Doc, which uses POI to read the .doc, but it is a proof of concept, not production ready code. Last I checked, there were limits to POI's HWPF parsing of a binary .doc.
Further to mqchen's comment, you can use LibreOffice or OpenOffice to convert doc to docx. But if you are going to use LibreOffice or OpenOffice, you may as well use it to convert both .doc and .docx directly to PDF. Google 'jodconverter'.
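As a rough sketch of the LibreOffice route without JODConverter, the headless command line can be driven from plain Java with ProcessBuilder. This assumes a LibreOffice install whose soffice binary is on the PATH; the class name SofficeConvert and its methods are made up for illustration and are not part of any library.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shells out to LibreOffice to convert a document to PDF.
class SofficeConvert {
    // Builds the headless LibreOffice command line.
    // Assumption: the "soffice" executable is on the PATH.
    static List<String> buildCommand(Path input, Path outDir) {
        return Arrays.asList(
                "soffice", "--headless",
                "--convert-to", "pdf",
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Runs the conversion and returns the exit code of the soffice process.
    static int convert(Path input, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    // Usage: java SofficeConvert <input file> <output directory>
    public static void main(String[] args) throws Exception {
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("soffice exited with code " + exit);
    }
}
```

The same command should work for both .doc and .docx input, since LibreOffice does the rendering in either case. JODConverter wraps this kind of office process management for you, which is likely the better choice for anything long-running.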
Answer 2:
Cribbing off the POI unit tests, I came up with this to extract the text from a Word document:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public String getText(String document) {
    // A binary .doc is an OLE2 file, not a ZIP archive, so read it directly
    try (InputStream is = new FileInputStream(document)) {
        HWPFDocument doc = new HWPFDocument(is);
        WordExtractor extractor = new WordExtractor(doc);
        // Return the extracted text instead of discarding it
        return extractor.getText();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
And then, cribbing off the PDFBox user's guide for creation:
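import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream; // PDFBox 1.x; in 2.x the class is in org.apache.pdfbox.pdmodel
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

PDDocument pdDoc = new PDDocument();
PDPage page = new PDPage();
pdDoc.addPage(page);
PDFont font = PDType1Font.HELVETICA_BOLD;
// The content stream takes the PDF document being built, not the source path
PDPageContentStream contentStream = new PDPageContentStream(pdDoc, page);
contentStream.beginText();
contentStream.setFont(font, 12);
// Coordinates are in points, measured from the bottom-left corner of the page
contentStream.moveTextPositionByAmount(100, 700);
// drawText writes a single run of text; it does not wrap long text or start new lines
contentStream.drawText(getText(documentPath));
contentStream.endText();
contentStream.close();
pdDoc.save("foo.pdf");
pdDoc.close();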
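I hope that points you in the right direction, even if it doesn't solve the problem entirely. Note that this approach carries over plain text only, so images and tables are lost.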
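Because drawText writes everything as one run, the extracted text would need to be split into lines before drawing. Here is a minimal sketch of that step using only the standard library. It wraps by character count, which is a simplifying assumption; a real layout would measure widths with the font's metrics. The class name LineWrapper and the limit of 15 characters are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: breaks extracted text into lines short enough to draw.
class LineWrapper {
    // Splits text into lines of at most maxChars characters, breaking at whitespace.
    // Existing line breaks (\r\n, \r or \n) always start a new line.
    // A single word longer than maxChars is kept whole on its own line.
    static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        for (String paragraph : text.split("\\r\\n|\\r|\\n")) {
            StringBuilder current = new StringBuilder();
            for (String word : paragraph.trim().split("\\s+")) {
                // Flush the current line if adding this word would exceed the limit
                if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : wrap("The quick brown fox jumps over the lazy dog", 15)) {
            System.out.println(line);
        }
    }
}
```

Each returned line could then be drawn with its own drawText call, followed by a moveTextPositionByAmount(0, -leading) to step down the page, where leading is a line height you choose (for example, a little more than the font size). Starting a new page when the text runs off the bottom is left out of this sketch.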
Answer 3:
You can use jWordConvert for this.
jWordConvert is a Java library that can read and render Word documents natively to convert to PDF, to convert to images, or to print the documents automatically.
Details can be found at http://www.qoppa.com/wordconvert/