我没有使用java.I希望在使用iText的Java库PDF文件阅读PDF表格处理太多的想法。 如何进行?
Answer 1:
您可以提取从内容流中的文本,但对于普通的PDF文件,其结果将是纯文本(没有任何结构)。 如果有页面上的表,该表将不会被承认。 你会得到的内容和一些白色的空间,但是这不是一个表结构! 只有当你有一个标签PDF,您可以获得一个XML文件。 如果PDF包含被认定为表标签的标签,这将反映在PDF中。
这就是我发现在这里
Answer 2:
对于从PDF文件读取表的内容,您只需要通过使用任何API将PDF转换成文本文件(我用PdfTextExtracter.getTextFromPage()
iText的),然后读取您的Java程序,txt文件。 看完后的主要任务已经完成。 你必须筛选所需的数据,您可以通过持续使用的划分方法做String
类,直到你找到你想要的记录。
下面是我的代码中,我已经提取从PDF文件中的记录的一部分,并将其写入到.csv文件。 你可以在这里查看PDF文件: http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf
public static void genrateCsvMonth_Region(String pdfpath, String csvpath) {
try {
String line = null;
// Appending Header in CSV file...
BufferedWriter writer1 = new BufferedWriter(new FileWriter(csvpath,
true));
writer1.close();
// Checking whether file is empty or not..
BufferedReader br = new BufferedReader(new FileReader(csvpath));
if ((line = br.readLine()) == null) {
BufferedWriter writer = new BufferedWriter(new FileWriter(
csvpath, true));
writer.append("REGION,");
writer.append("YEAR,");
writer.append("MONTH,");
writer.append("THERMAL,");
writer.append("NUCLEAR,");
writer.append("HYDRO,");
writer.append("TOTAL\n");
writer.close();
}
// Reading the pdf file..
PdfReader reader = new PdfReader(pdfpath);
BufferedWriter writer = new BufferedWriter(new FileWriter(csvpath,
true));
// Extracting records from page into String..
String page = PdfTextExtractor.getTextFromPage(reader, 1);
// Extracting month and Year from String..
String period1[] = page.split("PEROID");
String period2[] = period1[0].split(":");
String month[] = period2[1].split("-");
String period3[] = month[1].split("ENERGY");
String year[] = period3[0].split("VIS");
// Extracting Northen region
String northen[] = page.split("NORTHEN REGION");
String nthermal1[] = northen[0].split("THERMAL");
String nthermal2[] = nthermal1[1].split(" ");
String nnuclear1[] = northen[0].split("NUCLEAR");
String nnuclear2[] = nnuclear1[1].split(" ");
String nhydro1[] = northen[0].split("HYDRO");
String nhydro2[] = nhydro1[1].split(" ");
String ntotal1[] = northen[0].split("TOTAL");
String ntotal2[] = ntotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("NORTHEN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(nthermal2[4] + ",");
writer.append(nnuclear2[4] + ",");
writer.append(nhydro2[4] + ",");
writer.append(ntotal2[4] + "\n");
// Extracting Western region
String western[] = page.split("WESTERN");
String wthermal1[] = western[1].split("THERMAL");
String wthermal2[] = wthermal1[1].split(" ");
String wnuclear1[] = western[1].split("NUCLEAR");
String wnuclear2[] = wnuclear1[1].split(" ");
String whydro1[] = western[1].split("HYDRO");
String whydro2[] = whydro1[1].split(" ");
String wtotal1[] = western[1].split("TOTAL");
String wtotal2[] = wtotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("WESTERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(wthermal2[4] + ",");
writer.append(wnuclear2[4] + ",");
writer.append(whydro2[4] + ",");
writer.append(wtotal2[4] + "\n");
// Extracting Southern Region
String southern[] = page.split("SOUTHERN");
String sthermal1[] = southern[1].split("THERMAL");
String sthermal2[] = sthermal1[1].split(" ");
String snuclear1[] = southern[1].split("NUCLEAR");
String snuclear2[] = snuclear1[1].split(" ");
String shydro1[] = southern[1].split("HYDRO");
String shydro2[] = shydro1[1].split(" ");
String stotal1[] = southern[1].split("TOTAL");
String stotal2[] = stotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("SOUTHERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(sthermal2[4] + ",");
writer.append(snuclear2[4] + ",");
writer.append(shydro2[4] + ",");
writer.append(stotal2[4] + "\n");
// Extracting eastern region
String eastern[] = page.split("EASTERN");
String ethermal1[] = eastern[1].split("THERMAL");
String ethermal2[] = ethermal1[1].split(" ");
String ehydro1[] = eastern[1].split("HYDRO");
String ehydro2[] = ehydro1[1].split(" ");
String etotal1[] = eastern[1].split("TOTAL");
String etotal2[] = etotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("EASTERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(ethermal2[4] + ",");
writer.append(" " + ",");
writer.append(ehydro2[4] + ",");
writer.append(etotal2[4] + "\n");
// Extracting northernEastern region
String neestern[] = page.split("NORTH");
String nethermal1[] = neestern[2].split("THERMAL");
String nethermal2[] = nethermal1[1].split(" ");
String nehydro1[] = neestern[2].split("HYDRO");
String nehydro2[] = nehydro1[1].split(" ");
String netotal1[] = neestern[2].split("TOTAL");
String netotal2[] = netotal1[1].split(" ");
writer.append("NORTH EASTERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(nethermal2[4] + ",");
writer.append(" " + ",");
writer.append(nehydro2[4] + ",");
writer.append(netotal2[4] + "\n");
writer.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
Answer 3:
我的解决办法
package com.geek.tutorial.itext.table;
import java.io.FileOutputStream;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
public class SimplePDFTable
{
public SimplePDFTable() throws Exception
{
Document document = new Document();
PdfWriter.getInstance(document,
new FileOutputStream("SimplePDFTable.pdf"));
document.open();
PdfPTable table = new PdfPTable(2); // Code 1
// Code 2
table.addCell("1");
table.addCell("2");
// Code 3
table.addCell("3");
table.addCell("4");
// Code 4
table.addCell("5");
table.addCell("6");
// Code 5
document.add(table);
document.close();
}
public static void main(String[] args)
{
try
{
SimplePDFTable pdfTable = new SimplePDFTable();
}
catch(Exception e)
{
System.out.println(e);
}
}
}
文章来源: How to read a Table in a PDF using iText java?