Say i have a dict like this whose key's are cell references in excel and value's are either formula or integer.
input_dict = {
"B25": "=B2*B4",
"C25": "=C2*C4",
"D25": "=D2*D4",
"E25": "=E2*E4",
"F25": "=F2*F4",
"G25": "=G2*G4",
"B22": 0,
"C22": "=SUM(B22:B28)",
"D22": "=SUM(C22:C28)",
"E22": "=SUM(D22:D28)",
"F22": "=SUM(E22:E28)",
"G22": "=SUM(F22:F28)",
"B28": "=B19*B20*B22",
"C28": "=C19*C20*C22",
"D28": "=D19*D20*D22",
"E28": "=E19*E20*E22",
"F28": "=F19*F20*F22",
"G28": "=G19*G20*G22",
"B2": 1000,
"C2": 900,
"D2": 880,
"E2": 860,
"F2": 840,
"G2": 800,
"B4": 0.95,
"C4": 0.90,
"D4": 0.80,
"E4": 0.80,
"F4": 0.70,
"G4": 0.60,
"B19": 0.001,
"C19": 0.001,
"D19": 0.001,
"E19": 0.001,
"F19": 0.001,
"G19": 0.002,
"B20": 4,
"C20": 3,
"D20": 4,
"E20": 4,
"F20": 3,
"G20": 4
}
How to perform the calculation on this type of data.
My approach was to convert the dict to DataFrame and perform the calculations on it but i am stuck.
df = pd.DataFrame(list(input_dict.items()))
The output of df is shown below.
0 1
0 G22 =SUM(F22:F28)
1 G4 0.6
2 F2 840
3 D2 880
4 C20 3
5 C4 0.9
6 B28 =B19*B20*B22
7 F25 =F2*F4
8 B25 =B2*B4
9 G25 =G2*G4
10 C28 =C19*C20*C22
11 G28 =G19*G20*G22
12 F22 =SUM(E22:E28)
13 C25 =C2*C4
14 B19 0.001
15 E4 0.8
16 D22 =SUM(C22:C28)
17 D4 0.8
18 G2 800
19 E28 =E19*E20*E22
20 D20 4
21 G20 4
22 E25 =E2*E4
23 F20 3
24 G19 0.002
25 E22 =SUM(D22:D28)
26 C2 900
27 D25 =D2*D4
28 E2 860
29 D28 =D19*D20*D22
30 C19 0.001
31 F28 =F19*F20*F22
32 B20 4
33 B2 1000
34 F4 0.7
35 E19 0.001
36 D19 0.001
37 B4 0.95
38 B22 0
39 F19 0.001
40 C22 =SUM(B22:B28)
41 E20 4
How to perform excel like calculations in Python ?
The expected output is as below
{
"B25": "950",
"C25": "810",
"D25": "704",
"E25": "688",
"F25": "588",
"G25": "480",
"B22": 0,
"C22": 950,
"D22": 1757.15,
"E22": 2454.1214,
"F22": 3710.908,
"G22": 4161.220736,
"B28": 0,
"C28": -2.85,
"D28": -7.0286,
"E28": -9.8164856,
"F28": -9.396914743,
"G28": -29.687264,
"B2": 1000,
"C2": 900,
"D2": 880,
"E2": 860,
"F2": 840,
"G2": 800,
"B4": 0.95,
"C4": 0.90,
"D4": 0.80,
"E4": 0.80,
"F4": 0.70,
"G4": 0.60,
"B19": 0.001,
"C19": 0.001,
"D19": 0.001,
"E19": 0.001,
"F19": 0.001,
"G19": 0.002,
"B20": 4,
"C20": 3,
"D20": 4,
"E20": 4,
"F20": 3,
"G20": 4
}
You could use regex (regular expressions) and Python's
eval
function.Let's assume we have
the entire function would look like
For example
Let's have a look at the single steps:
=
. One could alternatively write a test and only evaluate the formula if it starts with a=
, up to the reader.With
re.sub(r'([A-Z][0-9])', r'g(d["\g<1>"])', any_string)
one replaces a group with one capital letter and a number (e.x. 'A3') with the dictionary look-up of it (e.x. 'g(d["A3"])')g()
again if the new cell value is still a formula (recurrence)r'$?([A-Z]+)$?([0-9])'
as search pattern andr'd["\g<1>\g<2>"]'
as substitution.Then we can evaluate this string with
eval()
. Up to now one can use all implemented Python operations such as+, -, *, /, //, %, etc.
All other functions need to be implemented manually by substituting them with Python expressions. Her is an example for
SUM(A3:B10)
:With
r'SUM\(([A-Z])([0-9]):([A-Z])([0-9])\)'
we search for the formula.[chr(i)+str(j) for j in range(\g<2>,\g<4>+1) for i in range(ord("\g<1>"), ord("\g<3>")+1)]]
gives us all table indices that are in the sum. Then we applyg(d[...])
on each of them (recurrence) and take the sum.This can be extended to any Excel formula of course.
You'll need something to parse Excel formulas and convert them into a form which allows executing the calculations.
A quick search brings up pycel as the most promising Python library for this. It does not support all of Excel's functions and syntax, but it should probably support what you need, and it definitely supports the formulas in the example you posted.
Also see this answer to a similar SO question. As it mentions, you can also actually connect to Excel, have it execute all of the calculations, and then just read the results. One way to do this is using the win32com library, as detailed in the previously mentioned answer.