We are currently working on a PDF version of a newspaper at work. We have a .NET website that captures the articles to publish and stores the entered content as HTML, so we can keep styles like bold, underline and strikethrough.
Once this is stored in the database, we are planning to use InDesign to create the PDF. We currently have a template built, but when we generate an XML document and import it into InDesign, the HTML tags are just written out as text. Is there a way around this, so that InDesign renders the tags as they would appear in HTML? We only need a few simple ones, like bold, strikethrough, underline and center align.
Thanks.
We have had some bad experiences importing XML into InDesign directly.
If you are still having trouble with this issue, check out the open source Ickmull code library. It converts an XHTML file to an IDML file, which can then be opened in InDesign. This might be a better web-to-print workflow for you.
http://code.google.com/p/ickmull/
Maybe you can use a Markdown to InDesign translator as a starting point: http://www.jongware.com/markdownid.html
Pandoc now supports export to ICML (Adobe InCopy's XML format, which can be "placed" in InDesign documents). To convert HTML to ICML, run:
pandoc -s -f html -t icml -o my.icml my.html
See Importing Markdown in InDesign in the pandoc wiki for details on the workflow.
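If many articles have to go through pandoc, the conversion can be scripted. This is a minimal Python sketch, assuming pandoc is installed and on PATH, and that each article has already been exported from the database as its own .html file. The folder names and the helper functions are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_icml_command(html_path, icml_path):
    """Build the pandoc command line for one HTML -> ICML conversion."""
    # -s asks pandoc for a standalone file rather than a bare fragment.
    return ["pandoc", "-s", "-f", "html", "-t", "icml",
            "-o", str(icml_path), str(html_path)]


def convert_folder(src_dir, out_dir):
    """Convert every *.html file in src_dir to an .icml file in out_dir."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for html_file in sorted(Path(src_dir).glob("*.html")):
        icml_file = out / (html_file.stem + ".icml")
        # check=True raises CalledProcessError if pandoc reports an error.
        subprocess.run(pandoc_icml_command(html_file, icml_file), check=True)
```

For example, convert_folder("articles_html", "articles_icml") would write one .icml file per article, ready to be placed into the InDesign template.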
This is an old question, but the problem is probably perennial.
Here is an easy real-world technique. It may not be perfectly suited to an automatic workflow, but is perfect for occasional use.
1. Copy the HTML code, for example from the browser's source view. Omit the head section, CSS, menus, etc., and copy only the relevant content, which may be enclosed in a series of div, section or other container tags.
2. Paste it into a plain text document (Notepad on Windows, TextEdit on Mac) and save it as a plain text file with a .html extension.
3. Open the .html file with LibreOffice. I tried versions 4 and 6, and both parse HTML just fine. You get a document with paragraph styles (like headings) and character styles (like bold and italic). Optionally, select all and change the font to Times New Roman. Save it as a .docx file, or some other file type.
4. Import this into InDesign with the options for preserving styles and formatting and importing styles automatically. You get a document with paragraph styles and character styles, which you can edit as you wish.
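Since the content in the question already lives in a database as HTML fragments, the copy, paste and save steps can be scripted. This is a minimal Python sketch, assuming the fragment has already been loaded as a string (fetching it from the database is not shown); the function names and file names are hypothetical. It only wraps the fragment in a bare HTML page so LibreOffice has a complete file to open:

```python
from html import escape
from pathlib import Path


def build_html_document(fragment, title="Article"):
    """Wrap a stored HTML fragment (no head, CSS or menus) in a bare page."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'  # declare the encoding the file is written in
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{fragment}\n"  # the article body is inserted unchanged
        "</body>\n</html>\n"
    )


def save_html_document(fragment, path, title="Article"):
    """Write the wrapped fragment as UTF-8 text with a .html extension."""
    target = Path(path).with_suffix(".html")
    target.write_text(build_html_document(fragment, title), encoding="utf-8")
    return target
```

For example, save_html_document(article_html, "article-123") writes article-123.html, which is then opened in LibreOffice and saved as .docx as described above.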
This tool is a decent HTML-to-InDesign importer: https://www.id-extras.com/html-import-script
It may take some rework, but it brings in styles that you can edit and has saved me a bunch of time.
You'll need to translate the HTML tags into CharacterStyles, and apply those to the XML on import.
The tricky thing is that CharacterStyles can't be nested the way HTML tags can, so you need to make a CharacterStyle for each combination that might be present. Alternatively, you can use a script to apply styles to specific runs of text.
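A minimal Python sketch of the one-style-per-combination idea, using only the standard library. It flattens nested inline tags into runs, builds one combined name per run (for example Bold_Underline), and wraps each styled run in an XML element of that name. The tag-to-style table, the underscore naming scheme and the Article root element are assumptions; the idea is that the template defines a CharacterStyle with each of these names, so the elements can be matched by name in InDesign's Map Tags to Styles dialog. Center alignment is a paragraph attribute, so it belongs in a paragraph style and is not handled here.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical mapping from HTML inline tags to style name parts.
TAG_TO_STYLE = {
    "b": "Bold", "strong": "Bold",
    "i": "Italic", "em": "Italic",
    "u": "Underline",
    "s": "Strikethrough", "strike": "Strikethrough", "del": "Strikethrough",
}
# Fixed order, so <b><u> and <u><b> produce the same combined name.
STYLE_ORDER = ["Bold", "Italic", "Underline", "Strikethrough"]


class StyleRunParser(HTMLParser):
    """Flattens nested inline HTML into (style_name_or_None, text) runs."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.open_tags = []
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_STYLE:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Remove the most recently opened matching tag, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        active = {TAG_TO_STYLE[t] for t in self.open_tags}
        name = "_".join(s for s in STYLE_ORDER if s in active) or None
        # Merge with the previous run when the style has not changed.
        if self.runs and self.runs[-1][0] == name:
            self.runs[-1] = (name, self.runs[-1][1] + data)
        else:
            self.runs.append((name, data))


def html_to_runs(fragment):
    parser = StyleRunParser()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return parser.runs


def required_styles(runs):
    """Every combined CharacterStyle name the template needs to define."""
    return sorted({name for name, _ in runs if name is not None})


def runs_to_xml(runs, root="Article"):
    """Wrap each styled run in an element named after its style."""
    parts = []
    for name, text in runs:
        if name is None:
            parts.append(escape(text))
        else:
            parts.append(f"<{name}>{escape(text)}</{name}>")
    return f"<{root}>{''.join(parts)}</{root}>"
```

Running required_styles over all articles once gives the list of combined CharacterStyles to create in the template; anything missing would otherwise come through unstyled.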