如何利用iText Java中的PDF阅读表?(How to read a Table in a P

2019-06-25 03:49发布

我没有使用java.I希望在使用iText的Java库PDF文件阅读PDF表格处理太多的想法。 如何进行?

Answer 1:

您可以提取从内容流中的文本,但对于普通的PDF文件,其结果将是纯文本(没有任何结构)。 如果有页面上的表,该表将不会被承认。 你会得到的内容和一些白色的空间,但是这不是一个表结构! 只有当你有一个标签PDF,您可以获得一个XML文件。 如果PDF包含被认定为表标签的标签,这将反映在PDF中。

这就是我发现在这里



Answer 2:

对于从PDF文件读取表的内容,您只需要通过使用任何API将PDF转换成文本文件(我用PdfTextExtracter.getTextFromPage() iText的),然后读取您的Java程序,txt文件。 看完后的主要任务已经完成。 你必须筛选所需的数据,您可以通过持续使用的划分方法做String类,直到你找到你想要的记录。

下面是我的代码中,我已经提取从PDF文件中的记录的一部分,并将其写入到.csv文件。 你可以在这里查看PDF文件: http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf

public static void genrateCsvMonth_Region(String pdfpath, String csvpath) {
        try {
            String line = null;
            // Appending Header in CSV file...
            BufferedWriter writer1 = new BufferedWriter(new FileWriter(csvpath,
                    true));
            writer1.close();
            // Checking whether file is empty or not..
            BufferedReader br = new BufferedReader(new FileReader(csvpath));
                         if ((line = br.readLine()) == null) {
                BufferedWriter writer = new BufferedWriter(new FileWriter(
                        csvpath, true));
                writer.append("REGION,");
                writer.append("YEAR,");
                writer.append("MONTH,");
                writer.append("THERMAL,");
                writer.append("NUCLEAR,");
                writer.append("HYDRO,");
                writer.append("TOTAL\n");
                writer.close();
            }
            // Reading the pdf file..
            PdfReader reader = new PdfReader(pdfpath);
            BufferedWriter writer = new BufferedWriter(new FileWriter(csvpath,
                    true));

            // Extracting records from page into String..
            String page = PdfTextExtractor.getTextFromPage(reader, 1);
            // Extracting month and Year from String..
            String period1[] = page.split("PEROID");
            String period2[] = period1[0].split(":");
            String month[] = period2[1].split("-");
            String period3[] = month[1].split("ENERGY");
            String year[] = period3[0].split("VIS");

            // Extracting Northen region
            String northen[] = page.split("NORTHEN REGION");
            String nthermal1[] = northen[0].split("THERMAL");
            String nthermal2[] = nthermal1[1].split(" ");

            String nnuclear1[] = northen[0].split("NUCLEAR");
            String nnuclear2[] = nnuclear1[1].split(" ");

            String nhydro1[] = northen[0].split("HYDRO");
            String nhydro2[] = nhydro1[1].split(" ");

            String ntotal1[] = northen[0].split("TOTAL");
            String ntotal2[] = ntotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("NORTHEN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nthermal2[4] + ",");
            writer.append(nnuclear2[4] + ",");
            writer.append(nhydro2[4] + ",");
            writer.append(ntotal2[4] + "\n");

            // Extracting Western region
            String western[] = page.split("WESTERN");

            String wthermal1[] = western[1].split("THERMAL");
            String wthermal2[] = wthermal1[1].split(" ");

            String wnuclear1[] = western[1].split("NUCLEAR");
            String wnuclear2[] = wnuclear1[1].split(" ");

            String whydro1[] = western[1].split("HYDRO");
            String whydro2[] = whydro1[1].split(" ");

            String wtotal1[] = western[1].split("TOTAL");
            String wtotal2[] = wtotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("WESTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(wthermal2[4] + ",");
            writer.append(wnuclear2[4] + ",");
            writer.append(whydro2[4] + ",");
            writer.append(wtotal2[4] + "\n");

            // Extracting Southern Region
            String southern[] = page.split("SOUTHERN");

            String sthermal1[] = southern[1].split("THERMAL");
            String sthermal2[] = sthermal1[1].split(" ");

            String snuclear1[] = southern[1].split("NUCLEAR");
            String snuclear2[] = snuclear1[1].split(" ");

            String shydro1[] = southern[1].split("HYDRO");
            String shydro2[] = shydro1[1].split(" ");

            String stotal1[] = southern[1].split("TOTAL");
            String stotal2[] = stotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("SOUTHERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(sthermal2[4] + ",");
            writer.append(snuclear2[4] + ",");
            writer.append(shydro2[4] + ",");
            writer.append(stotal2[4] + "\n");

            // Extracting eastern region
            String eastern[] = page.split("EASTERN");

            String ethermal1[] = eastern[1].split("THERMAL");
            String ethermal2[] = ethermal1[1].split(" ");

            String ehydro1[] = eastern[1].split("HYDRO");
            String ehydro2[] = ehydro1[1].split(" ");

            String etotal1[] = eastern[1].split("TOTAL");
            String etotal2[] = etotal1[1].split(" ");
            // Appending filtered data into CSV file..
            writer.append("EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(ethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(ehydro2[4] + ",");
            writer.append(etotal2[4] + "\n");

            // Extracting northernEastern region
            String neestern[] = page.split("NORTH");

            String nethermal1[] = neestern[2].split("THERMAL");
            String nethermal2[] = nethermal1[1].split(" ");

            String nehydro1[] = neestern[2].split("HYDRO");
            String nehydro2[] = nehydro1[1].split(" ");

            String netotal1[] = neestern[2].split("TOTAL");
            String netotal2[] = netotal1[1].split(" ");

            writer.append("NORTH EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(nehydro2[4] + ",");
            writer.append(netotal2[4] + "\n");
            writer.close();

        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

    }


Answer 3:

我的解决办法

package com.geek.tutorial.itext.table;
import java.io.FileOutputStream;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;

public class SimplePDFTable
{
    public SimplePDFTable() throws Exception
    {
        Document document = new Document();
        PdfWriter.getInstance(document, 
            new FileOutputStream("SimplePDFTable.pdf"));
        document.open();
        PdfPTable table = new PdfPTable(2); // Code 1
        // Code 2
        table.addCell("1");
        table.addCell("2");
        // Code 3
        table.addCell("3");
        table.addCell("4");
        // Code 4
        table.addCell("5");
        table.addCell("6");
        // Code 5
        document.add(table);        
        document.close();
    }

    public static void main(String[] args)
    {    
        try
        {
            SimplePDFTable pdfTable = new SimplePDFTable();
        }
        catch(Exception e)
        {
            System.out.println(e);
        }
    }
}


文章来源: How to read a Table in a PDF using iText java?
标签: java pdf itext