I have HTML5 source code (with full doctype/head/body) that needs to be converted to a Word DOCX file. The HTML file is a generated page with minimal formatting (H1/H2/P) and images (IMG).
Each image is wrapped in a FIGURE tag that contains the IMG tag (with its SRC attribute), followed by a FIGCAPTION tag that contains the caption for the image, similar to this (from https://www.w3schools.com/tags/tag_figcaption.asp):
<figure>
<img src="img_pulpit.jpg" alt="The Pulpit Rock" width="304" height="228">
<figcaption>Fig1. - A view of the pulpit rock in Norway.</figcaption>
</figure>
The image and caption show properly when the HTML5 page is viewed in a browser.
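For reference, the FIGURE/FIGCAPTION pairing can be pulled out with nothing more than Python's standard-library html.parser. This is a minimal sketch, assuming Python is available and that each FIGURE holds one IMG and at most one FIGCAPTION; the names FigureCollector and collect_figures are hypothetical. It only shows what a converter would have to map (image source plus caption text); it does not produce a DOCX.

```python
from html.parser import HTMLParser


class FigureCollector(HTMLParser):
    """Hypothetical helper: collects src/alt/caption for every <figure>."""

    def __init__(self):
        super().__init__()
        self.figures = []         # one dict per <figure>: src, alt, caption
        self._current = None      # figure being built, None outside <figure>
        self._in_caption = False  # True while inside <figcaption>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self._current = {"src": None, "alt": None, "caption": ""}
        elif tag == "img" and self._current is not None:
            # <img> is a void element, so only the start tag is seen.
            self._current["src"] = attrs.get("src")
            self._current["alt"] = attrs.get("alt")
        elif tag == "figcaption" and self._current is not None:
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._current is not None:
            self._current["caption"] = self._current["caption"].strip()
            self.figures.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Accumulate caption text (may arrive in several chunks).
        if self._in_caption and self._current is not None:
            self._current["caption"] += data


def collect_figures(html):
    """Return a list of {"src", "alt", "caption"} dicts, in document order."""
    parser = FigureCollector()
    parser.feed(html)
    parser.close()
    return parser.figures


if __name__ == "__main__":
    sample = """<figure>
<img src="img_pulpit.jpg" alt="The Pulpit Rock" width="304" height="228">
<figcaption>Fig1. - A view of the pulpit rock in Norway.</figcaption>
</figure>"""
    print(collect_figures(sample))
```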
The problem arises when importing that HTML5 document into Word 2010 and saving it as DOCX (File, Open, then File, Save As DOCX). The caption (figcaption) is not converted into a DOCX image caption; instead it is displayed separately from (outside of) the image. If you look at the image's properties in Word, the caption is not there; it is just text that is not 'part of' the image.
How do I get the figcaption text to become the caption of the image in the DOCX file?
(I don't have HTML-to-DOCX converters such as Pandoc available; I have tried several HTML-to-DOCX JS converters, and they don't solve the problem. Note that the issue is not displaying the HTML in a browser, but converting HTML that contains figure/figcaption tags into DOCX.)
Added: the intent is to get the pictures, with their captions, into the DOCX along with additional text content. The pictures need to be side by side, not in separate 'rows'.
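For the side-by-side requirement, one possible workaround (an untested assumption for Word 2010) is to pre-process the HTML before opening it in Word, replacing a group of figures with a single-row table that has one image plus its caption per cell, since Word imports HTML tables as Word tables. As far as I know, a caption added in Word via Insert Caption is itself an ordinary paragraph in the built-in Caption style, and Word's own "Save as Web Page" output marks such paragraphs with class="MsoCaption"; whether Word 2010 maps that class back to the Caption style on import is also an assumption. The function name figures_to_table is hypothetical, and the (src, caption) pairs are assumed to have been extracted already.

```python
from html import escape


def figures_to_table(figures):
    """Hypothetical helper: render (src, caption) pairs as a one-row table.

    Assumption: Word keeps imported table cells side by side, so each image
    stays directly above its own caption. The caption paragraph uses the
    MsoCaption class (assumed to map to Word's Caption style on import).
    """
    cells = []
    for src, caption in figures:
        cells.append(
            '<td style="vertical-align:top; text-align:center">'
            f'<img src="{escape(src, quote=True)}">'
            f'<p class="MsoCaption">{escape(caption)}</p>'
            "</td>"
        )
    return "<table><tr>" + "".join(cells) + "</tr></table>"


if __name__ == "__main__":
    row = figures_to_table([
        ("img_pulpit.jpg", "Fig1. - A view of the pulpit rock in Norway."),
        ("img_other.jpg", "Fig2. - Another picture."),  # hypothetical second image
    ])
    print(row)
```

The resulting markup would replace the original FIGURE elements in the HTML file that is then opened in Word and saved as DOCX; additional text content before and after the table is unaffected.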