Excel formula evaluation in Pandas

2019-08-11 22:54发布

Say i have a dict like this whose key's are cell references in excel and value's are either formula or integer.

input_dict = {
    "B25": "=B2*B4",
    "C25": "=C2*C4",
    "D25": "=D2*D4",
    "E25": "=E2*E4",
    "F25": "=F2*F4",
    "G25": "=G2*G4",
    "B22": 0,
    "C22": "=SUM(B22:B28)",
    "D22": "=SUM(C22:C28)",
    "E22": "=SUM(D22:D28)",
    "F22": "=SUM(E22:E28)",
    "G22": "=SUM(F22:F28)",
    "B28": "=B19*B20*B22",
    "C28": "=C19*C20*C22",
    "D28": "=D19*D20*D22",
    "E28": "=E19*E20*E22",
    "F28": "=F19*F20*F22",
    "G28": "=G19*G20*G22",
    "B2": 1000,
    "C2": 900,
    "D2": 880,
    "E2": 860,
    "F2": 840,
    "G2": 800,
    "B4": 0.95,
    "C4": 0.90,
    "D4": 0.80,
    "E4": 0.80,
    "F4": 0.70,
    "G4": 0.60,
    "B19": 0.001,
    "C19": 0.001,
    "D19": 0.001,
    "E19": 0.001,
    "F19": 0.001,
    "G19": 0.002,
    "B20": 4,
    "C20": 3,
    "D20": 4,
    "E20": 4,
    "F20": 3,
    "G20": 4
  }

How to perform the calculation on this type of data.

My approach was to convert the dict to DataFrame and perform the calculations on it but i am stuck.

df = pd.DataFrame(list(input_dict.items()))

The output of df is shown below.

    0   1
0   G22 =SUM(F22:F28)
1   G4  0.6
2   F2  840
3   D2  880
4   C20 3
5   C4  0.9
6   B28 =B19*B20*B22
7   F25 =F2*F4
8   B25 =B2*B4
9   G25 =G2*G4
10  C28 =C19*C20*C22
11  G28 =G19*G20*G22
12  F22 =SUM(E22:E28)
13  C25 =C2*C4
14  B19 0.001
15  E4  0.8
16  D22 =SUM(C22:C28)
17  D4  0.8
18  G2  800
19  E28 =E19*E20*E22
20  D20 4
21  G20 4
22  E25 =E2*E4
23  F20 3
24  G19 0.002
25  E22 =SUM(D22:D28)
26  C2  900
27  D25 =D2*D4
28  E2  860
29  D28 =D19*D20*D22
30  C19 0.001
31  F28 =F19*F20*F22
32  B20 4
33  B2  1000
34  F4  0.7
35  E19 0.001
36  D19 0.001
37  B4  0.95
38  B22 0
39  F19 0.001
40  C22 =SUM(B22:B28)
41  E20 4

How to perform excel like calculations in Python ?

The expected output is as below

{
    "B25": "950",
    "C25": "810",
    "D25": "704",
    "E25": "688",
    "F25": "588",
    "G25": "480",
    "B22": 0,
    "C22": 950,
    "D22": 1757.15,
    "E22": 2454.1214,
    "F22": 3710.908,
    "G22": 4161.220736,
    "B28": 0,
    "C28": -2.85,
    "D28": -7.0286,
    "E28": -9.8164856,
    "F28": -9.396914743,
    "G28": -29.687264,
    "B2": 1000,
    "C2": 900,
    "D2": 880,
    "E2": 860,
    "F2": 840,
    "G2": 800,
    "B4": 0.95,
    "C4": 0.90,
    "D4": 0.80,
    "E4": 0.80,
    "F4": 0.70,
    "G4": 0.60,
    "B19": 0.001,
    "C19": 0.001,
    "D19": 0.001,
    "E19": 0.001,
    "F19": 0.001,
    "G19": 0.002,
    "B20": 4,
    "C20": 3,
    "D20": 4,
    "E20": 4,
    "F20": 3,
    "G20": 4
  }

2条回答
Viruses.
2楼-- · 2019-08-11 23:21

You could use regex (regular expressions) and Python's eval function.

Let's assume we have

d = {'A1': '=A2+A3',
     'A2': '=SUM(A3:A5)',
     'A3': 3,
     'A4': 6,
     'A5': -1,
     ...}

the entire function would look like

import re
def g(s):
    """Excel-like evaluation with recurrence"""
    if isinstance(s,(int, float)):
        return s
    s=re.sub(r'=', '', s)
    s=re.sub(r'SUM\(([A-Z])([0-9]):([A-Z])([0-9])\)','sum([g(d[chr(i)+str(j)]) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)])',s)
    s=re.sub(r'([A-Z][0-9])',r'g(d["\1"])',s)
    return eval(s)

For example

>>> print(g(d['A1']))
11

Let's have a look at the single steps:

  • First of all we get rid of the =. One could alternatively write a test and only evaluate the formula if it starts with a =, up to the reader.
  • With re.sub(r'([A-Z][0-9])', r'g(d["\g<1>"])', any_string) one replaces a group with one capital letter and a number (e.x. 'A3') with the dictionary look-up of it (e.x. 'g(d["A3"])')

    • we need to apply g() again if the new cell value is still a formula (recurrence)
    • Note: If there are entries like 'A$3' or '$AB$4' as they occur in Excel, one could use r'$?([A-Z]+)$?([0-9])' as search pattern and r'd["\g<1>\g<2>"]' as substitution.
  • Then we can evaluate this string with eval(). Up to now one can use all implemented Python operations such as +, -, *, /, //, %, etc.

  • All other functions need to be implemented manually by substituting them with Python expressions. Her is an example for SUM(A3:B10):

    With r'SUM\(([A-Z])([0-9]):([A-Z])([0-9])\)' we search for the formula.

    [chr(i)+str(j) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)]] gives us all table indices that are in the sum. Then we apply g(d[...]) on each of them (recurrence) and take the sum.

This can be extended to any Excel formula of course.

查看更多
欢心
3楼-- · 2019-08-11 23:30

You'll need something to parse Excel formulas and convert them into a form which allows executing the calculations.

A quick search brings up pycel as the most promising Python library for this. It does not support all of Excel's functions and syntax, but it should probably support what you need, and it definitely supports the formulas in the example you posted.

Also see this answer to a similar SO question. As it mentions, you can also actually connect to Excel, have it execute all of the calculations, and then just read the results. One way to do this is using the win32com library, as detailed in the previously mentioned answer.

查看更多
登录 后发表回答