Error when trying to Parse an excel xml file with

Posted 2019-09-08 10:38

Question:

I'm trying to parse the XML data from this tutorial, but I keep getting an error:

Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 40; Premature end of file.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at convert.ExcelXmlReader.getAndParseFile(ExcelXmlReader.java:60)
at convert.ExcelXmlReader.main(ExcelXmlReader.java:32)

I can download the file, and I edited the tutorial's code so that I could format my XML. My end goal is to import this into Access, but I'm having trouble just parsing it.

Also, their code added an XML version and encoding declaration, but my XML file already has one, so I took that part out. I'm not sure what else I might need to do.

 private static void getAndParseFile() throws Exception {
        System.out.println("getAndParseFile");
        String fileName="C:\\Users\\windowsUserName\\Downloads\\F7BAH1P2_List.xml";

        File file = new File(fileName);
        removeLineFromFile(file.getAbsolutePath());

        System.out.println("Finished Removing Lines");


        String fileContent = IOUtils.toString(new FileInputStream(file));
        SAXParserFactory parserFactor = SAXParserFactory.newInstance();
        SAXParser parser = parserFactor.newSAXParser();
        SAXHandler handler = new SAXHandler();

        ByteArrayInputStream bis = new ByteArrayInputStream(fileContent.getBytes());

        parser.parse(bis, handler); // Apparently the error happens here

        Workbook workbook = new HSSFWorkbook();
        Sheet sheet = workbook.createSheet();

        //Converts all rows to POI rows 
        int rowCount = 0;
        for (XmlRow subsRow : handler.xmlRowList) {
            Row row = sheet.createRow(rowCount);
            int cellCount = 0;
            for (String cellValue : subsRow.cellList) {
                Cell cell = row.createCell(cellCount);
                cell.setCellValue(cellValue);
                cellCount++;
            }
            rowCount++;
        }

        String fileOutPath = "C:\\Users\\windowsUserName\\Downloads\\fileOut.xls";
        FileOutputStream fout = new FileOutputStream(fileOutPath);
        workbook.write(fout);
        workbook.close();
        fout.close();

        if (file.exists()) {
            System.out.println("delete file-> " + file.getAbsolutePath());
            if (!file.delete()) {
                System.out.println("file '" + file.getAbsolutePath() + "' was not deleted!");
            }
        }
        System.out.println("getAndParseFile finished, processed " + " substances!");
    }

This is their SAXHandler.java file. I don't know how to edit it, but I think it's right? I do see both "Row" and "Data" in my XML file as well.

package convert;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import java.util.ArrayList;
import java.util.List;

class SAXHandler extends DefaultHandler {

    List<XmlRow> xmlRowList = new ArrayList<>();
    XmlRow xmlRow = null;
    String content = null;

    @Override
    //Finds start of Row
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("row"))
                xmlRow = new XmlRow();
    }

    @Override
    //Finds end of Row tag
    public void endElement(String uri, String localName, String qName) throws SAXException {
        switch (qName) {
            case "Row": //if it's the </row>,
                xmlRowList.add(xmlRow);  //add this row in the rowlist?
                break;
            case "Data": //if it is </data>
                xmlRow.cellList.add(content); //
                break;
        }
    }

    @Override
    //Gets data between the tags.
    public void characters(char[] ch, int start, int length) throws SAXException {
        content = String.copyValueOf(ch, start, length).trim();
    }
}

The Excel/XML file:

<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
<Cell><Data ss:Type="String">WR Status</Data></Cell>
<Cell><Data ss:Type="String">Request Plant</Data></Cell>
<Cell><Data ss:Type="String">Request #</Data></Cell>    
<Cell><Data ss:Type="String">Item#</Data></Cell>
<Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
<Cell><Data ss:Type="String">WR Description</Data></Cell>
<Cell><Data ss:Type="String">W/O No</Data></Cell>
<Cell><Data ss:Type="String">Charge Plant</Data></Cell>
<Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
<Cell><Data ss:Type="String">Equip NO</Data></Cell>
<Cell><Data ss:Type="String">Equipment Name</Data></Cell>
<Cell><Data ss:Type="String">Required Date</Data></Cell>
<Cell><Data ss:Type="String">WO Type</Data></Cell>
<Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
<Cell><Data ss:Type="String">Exec. Plant</Data></Cell>  
<Cell><Data ss:Type="String">Plant1</Data></Cell>
<Cell><Data ss:Type="String">Area</Data></Cell>
<Cell><Data ss:Type="String">Confirmed</Data></Cell>
<Cell><Data ss:Type="String">WO Status</Data></Cell>
<Cell><Data ss:Type="String">W/R Requester</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>

I was looking at other answers, but they all say that this error happens only when the XML file has something in front of the XML declaration. There's nothing there; I checked. I also removed the spaces (tab entries), and the error still happens.

removeLineFromFile is modified from the tutorial. Basically, it removes the rows at the beginning and end that contain no real data (2 at the beginning, 2 at the end). It first checks whether they have already been removed.

private static void removeLineFromFile(String file) {

        BufferedReader br = null;
        PrintWriter pw = null;
        try {
            File inFile = new File(file);
            if (!inFile.isFile()) {
                return;
            }

            br = new BufferedReader(new FileReader(file));

            String line = null;
            int totalRows=0;
            boolean continueMethod = false;
            //Count total number of rows in file
            while ((line = br.readLine()) != null) {
                //check if file is already formatted
                if (line.contains("List for Work")){
                    continueMethod = true;
                }

                if (line.toLowerCase().contains("</row>")){
                        ++totalRows;
                    }
                }

            if (continueMethod)
            {
                //Create a temporary file to hold the file with deleted lines.
                File tempFile = new File(inFile.getAbsolutePath() + ".tmp");
                pw = new PrintWriter(new FileWriter(tempFile));

                line = null;
                br.close();
                br = null;
                br = new BufferedReader(new FileReader(file));
                boolean ignoreMe = false;
                int rowCounter = 0;
                int rowCloser = 0;
                //begin cycling through file and writing to new one.
                while((line = br.readLine()) != null)
                {
                    //if runs into a row, count it.
                    if (line.toLowerCase().contains("<row>")){
                        rowCounter++;
                    }
                    if (line.toLowerCase().contains("</row>")){
                        rowCloser++;
                    }
                    //Delete the first two, and last two lines
                    if ((rowCounter == 1 ) || (rowCounter == 2) || (rowCounter == (totalRows-1)) || (rowCounter == totalRows))
                    {
                        ignoreMe = true;
                        //If it reached the last closing tag, exit out of this to allow it to write the rest of the file.
                        if (rowCloser==totalRows)
                            rowCounter++;                   
                    }
                    else
                    {
                        ignoreMe = false;
                    }
                    //copy over other lines
                    if (!ignoreMe)
                    {
                        pw.println(line);
                        pw.flush();
                    }
                }   
                br.close();
                pw.close();
                //Delete the original file
                if (!inFile.delete()) {
                    System.out.println("Could not delete original file");
                    return;
                }

                //Rename the new file to the filename the original file had.
                if (!tempFile.renameTo(inFile))
                    System.out.println("Could not rename temp file");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

The XML file before running removeLineFromFile:

<?xml version="1.0" encoding="utf-16"?>
<?mso-application progid="Excel.Sheet"?>

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>marc</Author>
<LastAuthor>ESDI</LastAuthor>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>7560</WindowHeight>
<WindowWidth>12300</WindowWidth>
<WindowTopX>360</WindowTopX>
<WindowTopY>135</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s21">
<NumberFormat ss:Format="Short Date"/>
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table x:FullColumns="1" x:FullRows="1">
<Row>
<Cell><Data ss:Type="String">List for Work Request(F7BAH1P)</Data></Cell>
</Row>
<Row>
</Row>
            <Row>
    <Cell><Data ss:Type="String">Crt. Dte</Data></Cell>
    <Cell><Data ss:Type="String">WR Status</Data></Cell>
    <Cell><Data ss:Type="String">Request Plant</Data></Cell>
    <Cell><Data ss:Type="String">Request #</Data></Cell>    
    <Cell><Data ss:Type="String">Item#</Data></Cell>
    <Cell><Data ss:Type="String">Request Cost Center</Data></Cell>
    <Cell><Data ss:Type="String">WR Description</Data></Cell>
    <Cell><Data ss:Type="String">W/O No</Data></Cell>
    <Cell><Data ss:Type="String">Charge Plant</Data></Cell>
    <Cell><Data ss:Type="String">Charge Cost Center</Data></Cell>
    <Cell><Data ss:Type="String">Equip NO</Data></Cell>
    <Cell><Data ss:Type="String">Equipment Name</Data></Cell>
    <Cell><Data ss:Type="String">Required Date</Data></Cell>
    <Cell><Data ss:Type="String">WO Type</Data></Cell>
    <Cell><Data ss:Type="String">Exec. C/C</Data></Cell>
    <Cell><Data ss:Type="String">Exec. Plant</Data></Cell>  
    <Cell><Data ss:Type="String">Plant1</Data></Cell>
    <Cell><Data ss:Type="String">Area</Data></Cell>
    <Cell><Data ss:Type="String">Confirmed</Data></Cell>
    <Cell><Data ss:Type="String">WO Status</Data></Cell>
    <Cell><Data ss:Type="String">W/R Requester</Data></Cell>

            </Row>






 <Row>
</Row>
<Row>
<Cell><Data ss:Type="String">Count: 244</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Selected/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>

Answer 1:

It looks like you are having character set conversion issues.

The code you have for reading a file in is as follows:

    String fileContent = IOUtils.toString(new FileInputStream(file));
    // SAX parser creation omitted.
    ByteArrayInputStream bis = new ByteArrayInputStream(fileContent.getBytes());

    parser.parse(bis, handler); // Apparently the error happens here

You read a file in as a string using the default character set, and then convert it back to bytes again using the default character set, before passing the resulting byte-array input stream to the SAX parser. The XML file specifies a character set of UTF-16, and I'm guessing that your default character set is not UTF-16, so it would be wrong to read a UTF-16 file in as if it used some other character set.
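To illustrate the round trip described above, here is a minimal sketch that decodes UTF-16 bytes with a different charset and encodes them again, then compares the result with the original bytes. The class and method names are made up for this example, and UTF-8 stands in for "the default character set" purely as an assumption; with other defaults the outcome can differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Decode the bytes with an assumed charset and encode them again with the
    // same charset. This mimics IOUtils.toString(stream) followed by
    // String.getBytes(), where both calls silently use one default charset.
    static byte[] roundTrip(byte[] original, Charset assumed) {
        String decoded = new String(original, assumed);
        return decoded.getBytes(assumed);
    }

    public static void main(String[] args) {
        // Java's UTF_16 encoder writes a byte order mark plus two bytes per char.
        byte[] utf16 = "<Row/>".getBytes(StandardCharsets.UTF_16);

        // Assumption for illustration only: the platform default is UTF-8.
        byte[] viaUtf8 = roundTrip(utf16, StandardCharsets.UTF_8);
        byte[] viaUtf16 = roundTrip(utf16, StandardCharsets.UTF_16);

        System.out.println("UTF-8 round trip preserved bytes:  "
                + Arrays.equals(utf16, viaUtf8));
        System.out.println("UTF-16 round trip preserved bytes: "
                + Arrays.equals(utf16, viaUtf16));
    }
}
```

If the bytes handed to the parser no longer match the encoding named in the XML declaration, the parser has little chance of decoding the document correctly.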

You could try specifying a character set of UTF-16 in the calls to IOUtils.toString() and in fileContent.getBytes(), but to be honest it's much simpler to avoid any character-set issues altogether by passing a FileInputStream directly to the parser:

    parser.parse(new FileInputStream(file), handler); 
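For completeness, the first option mentioned above (naming UTF-16 explicitly) might look roughly like the sketch below. It uses only JDK classes instead of IOUtils and hands the parser a Reader, so the parser receives characters that are already decoded. It assumes the file really is UTF-16 encoded; the class name, the handler, and the sample document are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ExplicitCharsetSketch {

    // Hypothetical handler that just accumulates all character data.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Decodes the file as UTF-16 ourselves and parses from the resulting Reader.
    static String textContent(Path xmlFile)
            throws IOException, SAXException, ParserConfigurationException {
        TextCollector handler = new TextCollector();
        try (Reader reader = Files.newBufferedReader(xmlFile, StandardCharsets.UTF_16)) {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(reader), handler);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Write a small UTF-16 sample so the sketch runs without the real export.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?>"
                + "<Workbook><Row><Data>Crt. Dte</Data></Row></Workbook>";
        Path sample = Files.createTempFile("sample", ".xml");
        try {
            Files.write(sample, xml.getBytes(StandardCharsets.UTF_16));
            System.out.println(textContent(sample));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

The drawback is that the charset is now hard-coded in two places (the file and the code), which is why passing the raw stream is usually the simpler choice.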

I'll leave it up to you to modify the code to ensure that the FileInputStream gets closed once it has been finished with.
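One way to handle the closing is try-with-resources, sketched below. The handler here is a simplified stand-in for the question's SAXHandler, and the class name and sample document are assumptions made so the example can run on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamParseSketch {

    // Hypothetical stand-in for the question's SAXHandler: records element names.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            names.add(qName);
        }
    }

    // Hands the raw bytes to the parser, which works out the encoding itself
    // from the byte order mark and the XML declaration.
    static List<String> elementNames(Path xmlFile)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NameCollector handler = new NameCollector();
        try (InputStream in = Files.newInputStream(xmlFile)) {
            parser.parse(in, handler);
        } // the stream is closed here, even if parse() throws
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        // Write a small UTF-16 sample so the sketch runs without the real export.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?>"
                + "<Workbook><Row><Data>Crt. Dte</Data></Row></Workbook>";
        Path sample = Files.createTempFile("sample", ".xml");
        try {
            Files.write(sample, xml.getBytes(StandardCharsets.UTF_16));
            System.out.println(elementNames(sample));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

In the question's getAndParseFile, the same pattern would wrap the existing parser.parse call, with new FileInputStream(file) declared in the try header.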



Tags: java xml sax