Efficient way to search records from an excel file

2019-07-23 13:58发布

问题:

I am creating a web application which allows you to search some records from excel sheets.Excel sheets which need to be searched is a large one , it has 100k rows and 500 columns.

What approach should be use for this? I just want to allow users to perform search on this excel sheet,SO should I use database for it(for it first I need to set excel sheet values in database)? Or Is there any other better approach than this?

Does Apache POI provides any such direct method for searching excel files?

What will be most efficient approach for this scenario.Help with respective links will be appreciated

回答1:

Since the excel file you are dealing with is big, I would suggest you to use XSSF + SAX approach. With this approach you can stream XML files that store the data inside excel files. You can capture the data from the events generated by SAX parser. Compare the search data against each row. For each search, you will have to stream through the entire XML, but it is really fast.

------------EDITED----------------------------------

In detail,

Rename the .xlsx file as .zip. It will give you a zip file. Open it and you will see a folder called "xl". Inside "xl" folder concentrate on the following things.

worksheets folder : contains the xml representation of each sheet. sharedStrings.xml : Common place for storing all the string values across sheets. styles.xml : common place for holding the styles(color, formula...) applied to cells. workbook.xml : Contains the information about all sheets in the .xlsx file

Now start the journey from workbook.xm. Inside this file all sheets are covered in tag. sheets are represented as <sheet> Eg: <sheet name="General" sheetId="15" r:id="rId1"/> Take r:id, that will help you to access the General.xml which represents the General sheet of .xlsx file.

Once you have r:id, http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api will help you to read the XML file and get its contents.

OPCPackage pkg = OPCPackage.open(filename); //opens the .xlsx file
XSSFReader r = new XSSFReader( pkg ); //reads the xml fiels indide the .xlsx file
XMLReader parser = fetchSheetParser(sst); //SAX parser for parsing the XML files.

Feed the parser with XMl read using XSSFReader, write an event handler that implements DefaultHandler interface to capture the events generated by XMLReader.

SharedStringsTable sharedStringsTable = reader.getSharedStringsTable(); // using this we can access sharedStrings.xml 
StylesTable stylesTable = reader.getStylesTable(); // using this we can access styles.xml 

Since we are streaming the XML, we dont have to worry about memory consumption.


回答2:

maybe you can try to import your excel spreadsheet content to MySQL or NoSQL (e.g. MongoDB). First you want to parse your Excel into database, then create a query interface. I canot thinking of an effective way for querying spreadsheets