Keep the nature of array formulas when using openp

Posted 2020-01-20 05:41

Question:

I am working on an Excel workbook in which I need to find, using openpyxl, the median of all values in one column where another column holds a specific string. For this I combined MEDIAN and IF, which has to be entered with Ctrl+Shift+Enter so that Excel treats it as an array formula.

But openpyxl does not preserve the array-formula behaviour when it saves the file.

Example Data:

Values | IDS
3.5    |  1234    
2.5    |  1234    
6.5    |  5687    
7.5    |  1234    
9.5    |  1234    
1.0    |  7894

For the above data, I should be able to find the median of only those values whose adjacent ID is 1234.

I have to use openpyxl, so I need a non-array formula that can go through the list of IDs and find the median of the values matching each ID.

Answer 1:

openpyxl supports limited parsing of formulas embedded in cells. The openpyxl.formula package contains a Tokenizer class that breaks a formula into its constituent tokens. Each token represents one piece of an Excel formula.

Tokens have three attributes:

  • value: The string value parsed that led to this token (the piece of formula text exactly as you would write it in Excel, for example MEDIAN( or $B$1:$B$6 in our MEDIAN-IF formula)
  • type: A string identifying the type of token
  • subtype: A string identifying subtype of the token (optional, and defaults to “”)

Your MEDIAN-IF formula may differ; the code below is for reference only. It suggests how to parse (or, say, validate) an array formula using openpyxl. Usage is as follows:

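>>> from openpyxl.formula import Tokenizer
>>> # Leave off the {} that Excel displays: the tokenizer expects a leading "="
>>> tok = Tokenizer("""=MEDIAN(IF($B$1:$B$6="1234",$A$1:$A$6,""))""")
>>> if not tok.items:  # older openpyxl releases only parse on request
...     tok.parse()
>>> for t in tok.items:  # one Token per piece of the formula
...     print(t.value, t.type, t.subtype)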

Note that openpyxl gives you either the formula or the value stored for it, which you select with the data_only flag when opening a workbook. However, openpyxl does not and will not calculate the result of a formula. For greater control over Excel from Python, use either of the following:

  1. pycel
  2. xlwings
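
Because openpyxl does not evaluate formulas, another option is to compute the per-ID medians in Python and write the plain numbers into the sheet. The following is a minimal sketch using only the standard library; the rows are hard-coded from the example data, and medians_by_id is a hypothetical helper name. In a real workbook the pairs would presumably come from something like ws.iter_rows(values_only=True).

```python
from collections import defaultdict
from statistics import median

# Rows copied from the example data in the question: (value, id).
rows = [
    (3.5, "1234"),
    (2.5, "1234"),
    (6.5, "5687"),
    (7.5, "1234"),
    (9.5, "1234"),
    (1.0, "7894"),
]


def medians_by_id(pairs):
    """Group values by ID and return {id: median of that ID's values}."""
    grouped = defaultdict(list)
    for value, id_ in pairs:
        grouped[id_].append(value)
    return {id_: median(values) for id_, values in grouped.items()}


result = medians_by_id(rows)
print(result["1234"])  # 5.5, the median of 2.5, 3.5, 7.5 and 9.5
```

This avoids array formulas altogether, at the cost of the medians not updating when the data is later edited in Excel.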

Attribution: openpyxl documentation (Tokenizer)