How to check a number in a string contains a date

2019-03-04 13:33发布

问题:

I am parsing an excel file which contains many dates like 13-4-2021 and some numbers in this 3,7%,2,65% format.So i am parsing that excel file and i am getting the data in a string to write them in a text file.So my problem is that i am getting the date in a whole number like 44299, while it is actually in 04/13/2021 format in the excel sheet.And another case is i have some numbers with percentage like 3,7%,2,65% which are coming like 3.6999999999999998E-2.So i can convert the number to a date using

SimpleDateFormat("MM/dd/yyyy").format(javaDate)

Here is the code i am using

private static class SheetHandler extends DefaultHandler {
    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;
    private int rowNumber;

    private SheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        try {

            // row => row
            if(name.equals("row")) {
                   if (attributes.getValue("r") != null) {
                    rowNumber = Integer.valueOf(attributes.getValue("r"));
                   } else {
                    rowNumber++;
                   }
                   //System.out.println("row: " + rowNumber);
                  }

        if (rowNumber > 6) {

        // c => cell
        if(name.equals("c")) {
            // Print the cell reference 

            //System.out.print(attributes.getValue("r") + " - ");
            // Figure out if the value is an index in the SST
            String cellType = attributes.getValue("t");
            if(cellType != null && cellType.equals("s")) {
                nextIsString = true; 
            } else {
                nextIsString = false;
              }

        }
        // Clear contents cache
        lastContents = "";
        }
        }catch(Exception e) {
            e.printStackTrace();
        }
    }

    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // Process the last contents as required.
        // Do now, as characters() may be called more than once

        if (rowNumber > 6) {


        if(nextIsString) {
            int idx = Integer.parseInt(lastContents);

            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
        }
        // v => contents of a cell
        // Output after we've seen the string contents
        if(name.equals("v")) {
           // System.out.println(lastContents);

            if(!lastContents.isEmpty() ) // Here i am putting the values to a list to process 

                pickUpExcelValues.add(lastContents);
            }
        }
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        lastContents += new String(ch, start, length);
    }
}

But how i will check the string contains lets say 44299 is a date? And also i have no idea how to convert this 3.6999999999999998E-2 to 3,7% while writing to a text file.If anybody have any idea please help.

回答1:

This question needs further explanations.

At first, it is related to How to skip the rows in a xlsm file using apache event user model which was answered.

But if one wants using the examples from XSSF and SAX (Event API) then one needs basic knowledge about the XML used in the Office Open XML.

And the ExampleEventUserModel is a very low level example showing the streaming principle. For extending this to taking formats into account needs parsing the styles table too and then using DataFormatter.

The following is a complete example which is doing exactly this. But there is a fuller example, including support for fetching number formatting information and applying it to numeric cells (eg to format dates or percentages). Please see the XLSX2CSV example in svn.

import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.model.StylesTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.apache.poi.xssf.usermodel.XSSFCellStyle;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.BuiltinFormats;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.util.SAXHelper;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;


public class ExampleEventUserModel {
 public void processOneSheet(String filename) throws Exception {
  OPCPackage pkg = OPCPackage.open(filename);
  XSSFReader r = new XSSFReader( pkg );
  SharedStringsTable sst = r.getSharedStringsTable();

  StylesTable st = r.getStylesTable();
  XMLReader parser = fetchSheetParser(sst, st);

  // To look up the Sheet Name / Sheet Order / rID,
  //  you need to process the core Workbook stream.
  // Normally it's of the form rId# or rSheet#
  InputStream sheet2 = r.getSheet("rId2");
  InputSource sheetSource = new InputSource(sheet2);
  parser.parse(sheetSource);
  sheet2.close();
 }

 public void processAllSheets(String filename) throws Exception {
  OPCPackage pkg = OPCPackage.open(filename);
  XSSFReader r = new XSSFReader( pkg );
  SharedStringsTable sst = r.getSharedStringsTable();

  StylesTable st = r.getStylesTable();
  XMLReader parser = fetchSheetParser(sst, st);

  Iterator<InputStream> sheets = r.getSheetsData();
  while(sheets.hasNext()) {
   System.out.println("Processing new sheet:\n");
   InputStream sheet = sheets.next();
   InputSource sheetSource = new InputSource(sheet);
   parser.parse(sheetSource);
   sheet.close();
   System.out.println("");
  }
 }

 public XMLReader fetchSheetParser(SharedStringsTable sst, StylesTable st) throws SAXException, ParserConfigurationException {
/*
  XMLReader parser =
  XMLReaderFactory.createXMLReader(
       "org.apache.xerces.parsers.SAXParser"
  );
*/
  XMLReader parser = SAXHelper.newXMLReader();
  ContentHandler handler = new SheetHandler(sst, st);
  parser.setContentHandler(handler);
  return parser;
 }


 /** 
  * See org.xml.sax.helpers.DefaultHandler javadocs 
  */
 private static class SheetHandler extends DefaultHandler {
  private SharedStringsTable sst;
  private StylesTable st;
  private String lastContents;
  private boolean nextIsString;
  private boolean nextIsStyledNumeric;
  private boolean inlineStr;
  private int styleIndex;
  private DataFormatter formatter;

  private int rowNumber;

  private SheetHandler(SharedStringsTable sst, StylesTable st) {
   this.sst = sst;
   this.st = st;
   this.rowNumber = 0;
   this.formatter = new DataFormatter(java.util.Locale.US, true);
   this.styleIndex = 0;
  }

  public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {

   // row => row
   if(name.equals("row")) {
    if (attributes.getValue("r") != null) {
     rowNumber = Integer.valueOf(attributes.getValue("r"));
    } else {
     rowNumber++;
    }
    System.out.println("row: " + rowNumber);
   }

   if (rowNumber > 6) {

    // c => cell
    if(name.equals("c")) {
     // Print the cell reference
     System.out.print(attributes.getValue("r") + " - ");

     String cellType = attributes.getValue("t");

     // Figure out if the value is an index in the SST
     nextIsString = false;
     if(cellType != null && cellType.equals("s")) {
      nextIsString = true;
     } 

     // Figure out if the value is an inline string     
     inlineStr = false;
     if(cellType != null && cellType.equals("inlineStr")) {
      inlineStr = true;
     } 

     // Figure out if the value is an styled numeric value or date
     nextIsStyledNumeric = false;
     if(cellType != null && cellType.equals("n") || cellType == null) {
      String cellStyle = attributes.getValue("s");
      if (cellStyle != null) {
       styleIndex = Integer.parseInt(cellStyle);
       nextIsStyledNumeric = true;
      }
     } 
    }
   }

   // Clear contents cache
   lastContents = "";
  }

  public void endElement(String uri, String localName, String name)
            throws SAXException {
   if (rowNumber > 6) {

    // Process the last contents as required.
    // Do now, as characters() may be called more than once

    // If the value is in the shared string table, get it
    if(nextIsString) {
     int idx = Integer.parseInt(lastContents);
     lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
     nextIsString = false;
    }

    // v => contents of a cell
    // Output after we've seen the string contents
    if(name.equals("v") || (inlineStr && name.equals("c"))) {
     // If the value is styled numeric, use DataFormatter to formaat it
     if (nextIsStyledNumeric) {
      XSSFCellStyle style = st.getStyleAt(styleIndex);
      int formatIndex = style.getDataFormat();
      String formatString = style.getDataFormatString();
      if (formatString == null) {
       // formatString could not be found, so it must be a builtin format.
       formatString = BuiltinFormats.getBuiltinFormat(formatIndex);
      }
      double value = Double.valueOf(lastContents);
      lastContents = formatter.formatRawCellContents(value, formatIndex, formatString);
      nextIsStyledNumeric = false;
     } 
     // Print out the contents
     System.out.println(lastContents);
    }
   }
  }

  public void characters(char[] ch, int start, int length)
            throws SAXException {
   //collect each character part to the content
   lastContents += new String(ch, start, length);
  }
 }

 public static void main(String[] args) throws Exception {
  ExampleEventUserModel example = new ExampleEventUserModel();
  //example.processOneSheet(args[0]);
  example.processAllSheets(args[0]);
 }
}


回答2:

This function wraps the two cases (a percentage or a date):

private static String convert(String s) {
    if(s.indexOf('E') > -1) {   // It's a percentage
        String[] components = s.split("E");
        double num = Double.parseDouble(components[0]) * Math.pow(10, Integer.parseInt(components[1]));
        //return String.valueOf(num);                   // will return i.e. "0.037"
        return Math.round(num * 10000.0) / 100.0 + "%"; // will return i.e. "3.7%"
    }
    else {  // It's a date
        SimpleDateFormat sdf = new SimpleDateFormat("MM/dd/yyyy");
        GregorianCalendar gc = new GregorianCalendar(1900, 0, 0);
        gc.add(Calendar.DATE, Integer.parseInt(s) - 1);
        Date date = gc.getTime();
        return sdf.format(date);
    }
}

Note that the serial number dates in Excel represent the days that have elapsed since January 1st 1900, hence the conversion I've used.

Let me know how it works for you



回答3:

It seems to me that you can distinguish your strings from Excel in a way similar to this:

private static void checkNumber(String fromExcel) {
    try {
        double asNumber = Double.parseDouble(fromExcel);
        if (asNumber >= 0 && asNumber <= 1) {
            System.out.println("Percentage: " + asNumber * 100 + " %");
        }
        long asWholeNumber = Math.round(asNumber);
        try {
            LocalDate asDate = LocalDate.of(1899, Month.DECEMBER, 30)
                    .plusDays(asWholeNumber);
            if (asDate.isAfter(LocalDate.of(2000, Month.DECEMBER, 31)) 
                    && asDate.isBefore(LocalDate.of(2035, Month.JANUARY, 1))) {
                System.out.println("Date: " + asDate);
            }
        } catch (DateTimeException dte) {
            System.out.println("Unidentified: " + fromExcel);
        }
    } catch (NumberFormatException nfe) {
        System.out.println("Unidentified: " + fromExcel);
    }
}

Try this method out:

    checkNumber("44299");
    checkNumber("3.6999999999999998E-2");

This prints:

Date: 2021-04-13
Percentage: 3.6999999999999997 %

That there are two possible interpretations for the strings should not keep you from doing a validation and catching strings that do not conform with either interpretation, so I am trying some filtering of each case. Please note that if you include 1899 in the accepted dates, "0" and "1" will be accepted both as dates and as percentages.

I am using and recommending LocalDate from java.time, the modern Java date and time API, for handling dates. The modern API is so much nicer to work with then the outdated Date and GregorianCalendar.

Gary’ Student’s idea of getting a formatted cell might be a more correct way to go.

Question: My code needs to be compatible with Java 6; can I use java.time?

EDIT: Yes, java.time can work nicely in Java 6.

  • In Java 8 and later and on new Android devices (from API level 26, I’m told) the new API comes built-in.
  • In Java 6 and 7 get the ThreeTen Backport, the backport of the new classes (ThreeTen for JSR 310, where the modern API was first described). EDIT 2: Make sure you import the date and time classes and exceptions from package org.threeten.bp and subpackages.
  • On (older) Android, use the Android edition of ThreeTen Backport. It’s called ThreeTenABP. Also here make sure you import the date and time classes from package org.threeten.bp and subpackages.

Links

  • Oracle tutorial: Date Time, explaining how to use java.time.
  • ThreeTen Backport project
  • ThreeTenABP, Android edition of ThreeTen Backport
  • Question: How to use ThreeTenABP in Android Project, with a very thorough explanation.
  • Java Specification Request (JSR) 310.