I am working on an Excel workbook, using openpyxl, where I need the median of all values in one column for which another column equals a specific string. For this I combined MEDIAN and IF, which has to be entered with Ctrl+Shift+Enter so that Excel treats it as an array formula.
But openpyxl does not preserve the formula's array nature when it saves the file.
Example Data:
Values | IDS
3.5 | 1234
2.5 | 1234
6.5 | 5687
7.5 | 1234
9.5 | 1234
1.0 | 7894
For the above data, I should get the median of only those values whose adjacent ID is 1234.
I have to use openpyxl, so I need a non-array formula that I can apply in a loop over a list of IDs to get the median of the values matching each ID.
openpyxl supports limited parsing of formulas embedded in cells. The openpyxl.formula package contains a Tokenizer class that breaks a formula into its constituent tokens. Each Token object represents one token in an Excel formula.
Tokens have three attributes:
- value: The string value parsed that led to this token (a piece of the formula text as you would write it in Excel, for example $B$1:$B$6 in the Median-IF formula)
- type: A string identifying the type of token
- subtype: A string identifying the subtype of the token (optional, defaults to "")
Your Median-IF formula may differ; the code below is for reference only. It suggests how to parse (or, say, validate) the formula text using openpyxl. Usage is as follows:
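>>> from openpyxl.formula import Tokenizer
>>> # Pass the formula without the surrounding braces; a string that does not start with "=" is treated as a literal.
>>> tok = Tokenizer("""=MEDIAN(IF($B$1:$B$6="1234",$A$1:$A$6,""))""")
>>> # In recent openpyxl releases the constructor parses the formula, so the tokens are already in tok.items.
>>> for t in tok.items:
...     print(t.value, t.type, t.subtype)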
Kindly note that openpyxl reads either the formula or its last cached value, which you select with the data_only flag when opening a workbook. However, openpyxl does not and will not calculate the result of a formula. For greater control over Excel from Python, use either of the following:
- pycel
- xlwings
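Since openpyxl will not evaluate the formula, another option is to compute the medians in Python and store the results as plain values. The following is a minimal sketch using only the standard library, not something taken from the openpyxl documentation. The rows list and the median_by_id helper are hypothetical names introduced for illustration, and the IDs are assumed to be stored as strings, as in the question.

```python
from collections import defaultdict
from statistics import median

# Sample rows copied from the question as (value, id) pairs.
# Assumption: IDs are strings, matching the "1234" comparison in the formula.
rows = [
    (3.5, "1234"),
    (2.5, "1234"),
    (6.5, "5687"),
    (7.5, "1234"),
    (9.5, "1234"),
    (1.0, "7894"),
]


def median_by_id(pairs):
    """Group values by id and return {id: median of that id's values}."""
    groups = defaultdict(list)
    for value, key in pairs:
        groups[key].append(value)
    return {key: median(values) for key, values in groups.items()}


medians = median_by_id(rows)
print(medians["1234"])  # prints 5.5
```

The resulting numbers could then be written into cells as ordinary values, which avoids the array formula altogether at the cost of the result no longer updating when the data changes in Excel.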
Attribution: openpyxl documentation (Tokenizer)