I am using itext7 version 7.1.2 and itext7.pdfhtml version 2.0.2 to produce a PDF from HTML containing elements which must not break across pages (e.g. graphs and their accompanying text).
I first tried explicit page breaks, which worked in our legacy iTextSharp solution (page-break-before: always on any element containing content that should not be separated), but these don't work at all. I then tried the preferable page-break-inside: avoid as a style on the element containing the content that must not break across pages. Here is a simplified version of the code, which converts the inline HTML to a PDF and writes it to your "My Documents" folder:
using iText.Html2pdf;
using iText.Kernel.Geom;
using iText.Kernel.Pdf;
using iText.Layout;
using iText.Layout.Element;
using System;
using System.Linq;

namespace IText7Html2PdfPageBreakTester
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            var html = @"<html>
<head>
</head>
<body>
    <div style=""font-size: 60pt"">
        Some Initial Text.
    </div>
    <div style=""page-break-inside: avoid; font-size: 120pt"">
        This text should all be on the same page.
    </div>
</body>
</html>";
            var pdfFilePath = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "Example PDF.pdf");
            Console.WriteLine($"Converting example HTML to PDF and writing the PDF to: \"{pdfFilePath}\".");
            using (var pdfWriter = new PdfWriter(pdfFilePath))
            {
                using (var pdfDocument = new PdfDocument(pdfWriter))
                {
                    var converterProperties = new ConverterProperties();
                    pdfDocument.SetDefaultPageSize(PageSize.A4);
                    using (var document = new Document(pdfDocument))
                    {
                        // NOTE: If this line is commented out, the "page-break-inside: avoid" style behaves as expected.
                        document.SetMargins(40, 40, 40, 40);
                        // Add each converted block-level element to the document in order.
                        foreach (var element in HtmlConverter.ConvertToElements(html, converterProperties).OfType<IBlockElement>())
                            document.Add(element);
                    }
                }
            }
            Console.WriteLine($"PDF written to: \"{pdfFilePath}\".");
        }
    }
}
Note that I was able to achieve the desired behaviour if no margins were set on the document; however, it is a business requirement that margins are set, so how can I set these margins and still keep the page-break-inside: avoid behaviour?
As a workaround, I have also tried creating a custom ITagWorker to interpret a custom <pageBreak/> element, but I had no luck getting the ProcessorContext.GetPdfDocument().AddNewPage() method to actually add a page.
Supplement: if you substitute the html variable with the following, you can see that neither page-break-before: always nor page-break-after: always works as expected, regardless of whether margins have been set on the document.
var html = @"<html>
<head>
</head>
<body>
    <div style=""page-break-after: always"">
        Some Initial Text.
    </div>
    <div>
        This text should be on a new page.
    </div>
    <div style=""page-break-before: always; font-size: 60pt"">
        This text should be on a further new page.
    </div>
    <div style=""page-break-inside: avoid; font-size: 120pt"">
        This text should all be on the same page.
    </div>
</body>
</html>";