How to read password protected excel files using P

2019-08-03 14:36发布

问题:

I want to read password protected excel files (.xls and .xlsx) with Apache POI. I am not using the usermodel (org.apache.poi.ss.usermodel) but an Event API to process xls and xlsx files (to address the memory footprint issue).

I am implementing HSSFListener and overriding its processRecord(Record record) method for xls files. For xlsx files I am using javax.xml.parsers.SAXParser and org.xml.sax.XMLReader.

If I use below code to read .xls file:

Biff8EncryptionKey.setCurrentUserPassword("password");
        POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(this.getFileName()));
        MissingRecordAwareHSSFListener listener = new MissingRecordAwareHSSFListener(this);
        formatListener = new FormatTrackingHSSFListener(listener);

        HSSFEventFactory factory = new HSSFEventFactory();
        HSSFRequest request = new HSSFRequest();

        request.addListenerForAllRecords(formatListener);
        rowsReadSet.clear();
        factory.processWorkbookEvents(request, fs);

I get this exception:

Exception in thread "Thread-6" org.apache.poi.EncryptedDocumentException: HSSF does not currently support CryptoAPI encryption
    at org.apache.poi.hssf.record.FilePassRecord$Rc4KeyData.read(FilePassRecord.java:65)
    at org.apache.poi.hssf.record.FilePassRecord.<init>(FilePassRecord.java:193)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:87)
    at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:338)
    at org.apache.poi.hssf.record.RecordFactoryInputStream$StreamEncryptionInfo.<init>(RecordFactoryInputStream.java:74)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.<init>(RecordFactoryInputStream.java:207)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:136)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:103)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:62)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:50)
    at com.mycompany.component.reader.MSExcelReader.readxls(MSExcelReader.java:300)
    at com.mycompany.component.reader.MSExcelReader.run(MSExcelReader.java:274)
    at java.lang.Thread.run(Unknown Source)

I will post the code for .xlsx later once I get reading .xls in place.

I am using JDK7 with Apache POI 3.11. Can someone please help?

[EDITED]

Another question is, I read here that

POI will not be able to read encrypted workbooks - that means that if you have protected the entire workbook (and not just a sheet), then it won't be able to read it. Otherwise, it should work.

Is this true? So with POI 3.11, cannot I read password protected file where password is set for entire workbook (generally this is done by Save As -> Tools -> General options)?

[EDITED]:

If I set the sheet password (using Review -> Protect Sheet ribbon option) and read the file using event model, it works fine. However it fails if I set the password for entire workbook (using Review -> Protect workbook ribbon option) or set the password for file (using Save As -> Tools -> General options). Below are the exceptions I get for both the approaches:

1. Using Review -> Protect Workbook ribbon option

Exception in thread "Thread-6" org.apache.poi.EncryptedDocumentException: Supplied password is invalid for salt/verifier/verifierHash
    at org.apache.poi.hssf.record.RecordFactoryInputStream$StreamEncryptionInfo.createDecryptingStream(RecordFactoryInputStream.java:127)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.<init>(RecordFactoryInputStream.java:209)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:136)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:103)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:62)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:50)

I can read this file by supplying Decryptor.DEFAULT_PASSWORD or "VelvetSweatshop" string as password (which is a default one). So why it cannot read with the manually set password string?

2. Using Save As -> Tools -> General Options

Exception in thread "Thread-6" org.apache.poi.EncryptedDocumentException: HSSF does not currently support CryptoAPI encryption
    at org.apache.poi.hssf.record.FilePassRecord$Rc4KeyData.read(FilePassRecord.java:65)
    at org.apache.poi.hssf.record.FilePassRecord.<init>(FilePassRecord.java:193)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.poi.hssf.record.RecordFactory$ReflectionConstructorRecordCreator.create(RecordFactory.java:87)
    at org.apache.poi.hssf.record.RecordFactory.createSingleRecord(RecordFactory.java:338)
    at org.apache.poi.hssf.record.RecordFactoryInputStream$StreamEncryptionInfo.<init>(RecordFactoryInputStream.java:74)
    at org.apache.poi.hssf.record.RecordFactoryInputStream.<init>(RecordFactoryInputStream.java:207)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:136)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:103)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:62)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:50)

I am using the same password for all the three approaches mentioned above and using MS Office Professional Plus 2013. Why doesn't it work using the first approach above (using Review -> Protect Workbook). I am getting password is incorrect in the exception. The exception using second approach clearly specifies that HSSF doesn't support the encryption, so that is ok. But I expect it should work if Workbook is also protected (using Review -> Protect Workbook) if it can read the Sheet with the password (which is set using Review -> Protect Sheet). Can experts clarify please?

[EDITED]

Ok, I was wrong when I said that it works with usermodel. With above two approaches (listed in edited section before), it doesn't work with usermodel either. I get same exception with Approach 1. Using Review -> Protect Workbook ribbon option and Approach 2. Using Save As -> Tools -> General Options. Whereas I can read the file for which Sheet password is set (similar observation is seen with even model too). Please see below the sample testcase. I didn't find an option to attach excel file, so can't do that, but any simple excel file can be tested with below testcase (although test conditions will change according to the input).

Simple Testcase with user model and event model:

import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import junit.framework.TestCase;
import org.apache.poi.hssf.eventusermodel.HSSFEventFactory;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
import org.apache.poi.hssf.eventusermodel.HSSFRequest;
import org.apache.poi.hssf.record.BoundSheetRecord;
import org.apache.poi.hssf.record.NumberRecord;
import org.apache.poi.hssf.record.Record;
import org.apache.poi.hssf.record.crypto.Biff8EncryptionKey;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.ss.usermodel.Cell;

/**
 * Testing for {@link HSSFEventFactory}
 */
public final class TestHSSFEventFactory extends TestCase {

    private String[] fileNames = {"C:\\XLS\\General_Password.xls",
            "C:\\XLS\\Sheet_Password.xls",
            "C:\\XLS\\Workbook_Password.xls"};

    private static class MockHSSFListener implements HSSFListener {
        private final List<Record> records = new ArrayList<Record>();

        public MockHSSFListener() {}
        public Record[] getRecords() {
            Record[] result = new Record[records.size()];
            records.toArray(result);
            return result;
        }

        public void processRecord(Record record) {
            records.add(record);
        }
    }

    public void testWithPasswordProtectedWorkbooksUserModel() throws Exception {
        // XOR/RC4 decryption for xls
        Biff8EncryptionKey.setCurrentUserPassword("4Sys-Tem");
        NPOIFSFileSystem nfs = new NPOIFSFileSystem(new File(fileNames[2]), true);
        HSSFWorkbook hwb = new HSSFWorkbook(nfs.getRoot(), true);
        HSSFSheet sheet = hwb.getSheetAt(0);
        HSSFRow row = sheet.getRow(2);
        Cell cell1 = row.getCell(3);
        row = sheet.getRow(3);
        Cell cell2 = row.getCell(3);
        row = sheet.getRow(4);
        Cell cell3 = row.getCell(3);

        assertEquals("17000.0", cell1.toString());
        assertEquals("7500.0", cell2.toString());
        assertEquals("5000.0", cell3.toString());

        Biff8EncryptionKey.setCurrentUserPassword(null);
    }

    public void testWithPasswordProtectedWorkbooksEvenModel() throws Exception {
        // With the password, is properly processed
        Biff8EncryptionKey.setCurrentUserPassword("4Sys-Tem");
        HSSFRequest req = new HSSFRequest();
        MockHSSFListener mockListen = new MockHSSFListener();
        POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(fileNames[2]));
        HSSFEventFactory factory = new HSSFEventFactory();
        req.addListenerForAllRecords(mockListen);
        factory.processWorkbookEvents(req, fs);

        // Check we got the sheet and the contents
        Record[] recs = mockListen.getRecords();
        assertTrue( recs.length > 50 );

        // Has one sheet, with values 1,2,3 in column A rows 1-3
        boolean hasSheet=false, hasA1=false, hasA2=false, hasA3=false;
        for (Record r : recs) {
            if (r instanceof BoundSheetRecord) {
                BoundSheetRecord bsr = (BoundSheetRecord)r;
                assertEquals("Trade Data", bsr.getSheetname());
                hasSheet = true;
            }
            if (r instanceof NumberRecord) {
                NumberRecord nr = (NumberRecord)r;
                if (nr.getColumn() == 3 && nr.getRow() == 2) {
                    assertEquals(17000, (int)nr.getValue());
                    hasA1 = true;
                }
                if (nr.getColumn() == 3 && nr.getRow() == 3) {
                    assertEquals(7500, (int)nr.getValue());
                    hasA2 = true;
                }
                if (nr.getColumn() == 3 && nr.getRow() == 4) {
                    assertEquals(5000, (int)nr.getValue());
                    hasA3 = true;
                }
            }
        }

        assertTrue("Sheet record not found", hasSheet);
        assertTrue("Numeric record for A1 not found", hasA1);
        assertTrue("Numeric record for A2 not found", hasA2);
        assertTrue("Numeric record for A3 not found", hasA3);
    }
}