html tag not accepted in itextsharp and text out o

Posted 2019-08-19 05:36

Question:

I created a table with iTextSharp and filled it with data from my database. Everything works, but some of the data contains HTML tags, so the table shows the raw tags instead of formatted text. Some of the text also runs outside the table border.

Here is some code:

PdfPTable table4 = new PdfPTable(3);

//Header row: bold black Helvetica on a light gray background
PdfPCell cell8 = new PdfPCell(new Phrase("Protocol", new Font(FontFactory.GetFont("Helvetica", 12f, Font.BOLD, new BaseColor(0, 0, 0)))));
cell8.BackgroundColor = new BaseColor(242, 242, 242);
table4.AddCell(cell8);
PdfPCell cell9 = new PdfPCell(new Phrase("Port", new Font(FontFactory.GetFont("Helvetica", 12f, Font.BOLD, new BaseColor(0, 0, 0)))));
cell9.BackgroundColor = new BaseColor(242, 242, 242);
table4.AddCell(cell9);
PdfPCell cell10 = new PdfPCell(new Phrase("Service", new Font(FontFactory.GetFont("Helvetica", 12f, Font.BOLD, new BaseColor(0, 0, 0)))));
cell10.BackgroundColor = new BaseColor(242, 242, 242);
table4.AddCell(cell10);

//One row per record; "var" because the items expose Protocol, Port and Service, so they cannot be int
foreach (var t in myprotocol)
{
    table4.AddCell(t.Protocol);
    table4.AddCell(t.Port.ToString());
    table4.AddCell(t.Service);
}
document.Add(table4);

Answer 1:

When you add content manually, whether it's a table, a Paragraph, a Chunk or something else, iTextSharp always inserts the content exactly as is. In other words, it does not parse HTML.

If all you want to do is strip out the HTML tags, then see this post and use either a regex (no dependencies, but a few edge cases could break it) or the HtmlAgilityPack (in my opinion a lot of unnecessary overhead) to remove them.
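
As a rough illustration of the regex route, here is a minimal sketch. The class and method names (HtmlText, StripTags) are made up for this example and are not part of iTextSharp. It assumes the markup is simple enough that anything between < and > can be treated as a tag, and it uses System.Net.WebUtility (available from .NET 4.0) to decode entities such as &amp;.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for this example only; not an iTextSharp API.
public static class HtmlText
{
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace anything that looks like a tag with a space so that text on
        // either side of a block tag such as </p><p> is not glued together.
        // Known edge case: a literal '<' or '>' inside the text confuses this.
        string withoutTags = Regex.Replace(html, "<[^>]*>", " ");

        // Turn entities such as &amp; or &lt; back into plain characters.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // Collapse runs of spaces, tabs and line breaks into single spaces.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

In the loop from the question this would be used as table4.AddCell(HtmlText.StripTags(t.Service)). Swapping tags for a space rather than for nothing is a judgment call: it keeps paragraphs apart but also puts a space where an inline tag such as <strong> sat in the middle of a word.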

If you want to interpret the tags (for instance, bolding text when <strong> is encountered), then you'll have to look at the HTMLWorker object. Here's a post that goes into a little detail on it.
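
For the HTMLWorker route, a sketch along the following lines may be a starting point. It assumes the iTextSharp 5.x HTMLWorker class in the iTextSharp.text.html.simpleparser namespace and its static ParseToList method. HtmlCells and FromHtml are names invented for this example, and I have not checked how well HTMLWorker copes with every tag in your data.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;

// Hypothetical helper for this example only; not an iTextSharp API.
public static class HtmlCells
{
    // Builds a table cell whose content comes from parsing an HTML fragment.
    public static PdfPCell FromHtml(string html)
    {
        PdfPCell cell = new PdfPCell();

        using (StringReader reader = new StringReader(html ?? string.Empty))
        {
            // Passing null for the style sheet asks HTMLWorker for its defaults.
            foreach (IElement element in HTMLWorker.ParseToList(reader, null))
            {
                // AddElement puts the cell into composite mode, so each parsed
                // element (paragraph, list, ...) keeps its own formatting.
                cell.AddElement(element);
            }
        }

        return cell;
    }
}
```

In the loop from the question, table4.AddCell(HtmlCells.FromHtml(t.Service)) would then take the place of table4.AddCell(t.Service).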

EDIT

Below is sample code that tries to overflow the table's boundaries but doesn't on my test machine. It creates four data rows; the third and fourth make some convoluted attempts at breaking the table's boundaries, without success. (You'll see the convoluted part where I inject some returns, tabs and special Unicode spaces.)

(This code must be run in full, not cherry-picked, for it to work correctly. It targets iTextSharp 5.1.1.0.)

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        //Sample object that mimics the OP's structure
        public class SampleObject
        {
            public string Protocol { get; set; }
            public int Port { get; set; }
            public string Service { get; set; }
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            //Create some sample data that mimics the OP's data structure and includes some edge cases that could (but don't) cause things to blow up
            List<SampleObject> myprotocol = new List<SampleObject>();
            //General text
            myprotocol.Add(new SampleObject { Protocol = "Short text", Port = 80, Service = "This is a test" });
            //Long text w/ HTML
            myprotocol.Add(new SampleObject { Protocol = "Long HTML text", Port = 81, Service = string.Format("<p>{0}{0}<p>{1}Configure the database server to only allow acces to trusted systems.{0}{1}For Example, the PCI DSS standard requires you the place the database in an{0}{1}internal network zone, segregated from the DMZ.{0}</p>", "\r\n", "\t") });
            //Long text w/ spaces replaced by Unicode FEFF which is a zero-width non-breaking space
            myprotocol.Add(new SampleObject { Protocol = "Long HTML text with zero width no-break space", Port = 82, Service = string.Format("<p>{0}{0}<p>{1}Configure the database server to only allow acces to trusted systems.{0}{1}For Example, the PCI DSS standard requires you the place the database in an{0}{1}internal network zone, segregated from the DMZ.{0}</p>", "\r\n", "\t").Replace(" ", "\uFEFF") });
            //Long text w/ spaces replaced by Unicode 0020, which is just a regular space, so this Replace is a no-op (a true non-breaking space would be U+00A0)
            myprotocol.Add(new SampleObject { Protocol = "Long HTML text with non-breaking space", Port = 83, Service = string.Format("<p>{0}{0}<p>{1}Configure the database server to only allow acces to trusted systems.{0}{1}For Example, the PCI DSS standard requires you the place the database in an{0}{1}internal network zone, segregated from the DMZ.{0}</p>", "\r\n", "\t").Replace(" ", "\u0020") });

            using (iTextSharp.text.Document Doc = new iTextSharp.text.Document(PageSize.LETTER))
            {
                using (FileStream FS = new FileStream(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "TableTest.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read))
                {
                    using (PdfWriter writer = PdfWriter.GetInstance(Doc, FS))
                    {
                        Doc.Open();

                        Doc.NewPage();

                        PdfPTable table4 = new PdfPTable(3);
                        table4.SetWidths(new float[] { 0.9f, 1f, 1.2f });

                        PdfPCell cell8 = new PdfPCell(new Phrase("Protocol", new iTextSharp.text.Font(FontFactory.GetFont("Helvetica", 12.0f, iTextSharp.text.Font.BOLD, new BaseColor(0, 0, 0)))));
                        cell8.BackgroundColor = new BaseColor(242, 242, 242);
                        table4.AddCell(cell8);

                        PdfPCell cell9 = new PdfPCell(new Phrase("Port", new iTextSharp.text.Font(FontFactory.GetFont("Helvetica", 12f, iTextSharp.text.Font.BOLD, new BaseColor(0, 0, 0)))));
                        cell9.BackgroundColor = new BaseColor(242, 242, 242);
                        table4.AddCell(cell9);

                        PdfPCell cell10 = new PdfPCell(new Phrase("Service", new iTextSharp.text.Font(FontFactory.GetFont("Helvetica", 12f, iTextSharp.text.Font.BOLD, new BaseColor(0, 0, 0)))));
                        cell10.BackgroundColor = new BaseColor(242, 242, 242);
                        table4.AddCell(cell10);

                        foreach (SampleObject t in myprotocol)
                        {
                            table4.AddCell(t.Protocol);
                            table4.AddCell(t.Port.ToString());
                            table4.AddCell(t.Service);
                        }

                        Doc.Add(table4);

                        Doc.Close();
                    }
                }
            }

            this.Close();
        }
    }
}


Tags: html itext