I am new to Python and working on a project that I could use some help on. So I am trying to modify an existing excel workbook in order to compare stock data. Luckily, there was a program online that retrieved all the data I need and I have successful been able to pull the data and write the data into a new excel file. However, the goal is to pull the data and put it into an existing excel file. Furthermore, I need to overwrite the cell values in the existing file. I believe xlwings is able to do this and I think my code is on the right track, but I ran into an unexpected error. The error I get is:
TypeError: Objects of type 'Period' can not be converted to a COM VARIANT (but obtaining the buffer() of this object could)
I was wondering if anyone knew why this error came up? Also, does anyone know how to fix it? Is it fixable? Is my code wrong? Any help or guidance is appreciated. Thank you.
import good_morning as gm
import pandas as pd
import xlwings as xw
#import income statement, balance sheet, and cash flow of AAPL
fd = gm.FinancialsDownloader()
fd_frames = fd.download('AAPL')
#Creates a DataFrame for only the balance sheet
df1 = pd.DataFrame(list(fd_frames.values())[0])
#connects to workbook I want to modify
wb = xw.Book(r'C:\Users\vince\Project\Spreadsheet.xlsm')
#sheet I would like to modify
sht = wb.sheets[1]
#modifies & overwrites values in my spreadsheet(this is where I get the type_error)
sht.range('M6').value = df1
Data Types:
type(fd_frames)
>>> <class 'dict'>
fd_frames.values())[0].info()
>>> <class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 8 columns):
parent_index 22 non-null int64
title 22 non-null object
2012 19 non-null float64
2013 20 non-null float64
2014 20 non-null float64
2015 20 non-null float64
2016 20 non-null float64
2017 20 non-null float64
dtypes: float64(6), int64(1), object(1)
memory usage: 1.5+ KB
Comments: You have a Dict of pandas.DataFrame
.
Selecting from a Dict using list(fd_frames.values())[0]
does lead to unpredictable Results. Show the Keys of the Dict and choose the one you interested off using these Key, e.g.:
print(fd_frames.keys())
>>> dict_keys(['key_1', 'key_2', 'key_n']
df_2 = fd_frames['key_2']
Beside this, neither of the Dimension in your pandas.DataFrame
does match M6:M30
= 25. There are only 8 columns with 20 Values. Therfore you have to align your Worksheet Range to 20 Rows. To write Column 2017 to the Worksheet, e.g.:
wb['M6:M25'] = df_2['2017'].values
Note: I have updated the code below to accept numpy.ndarray
also.
Question: ... the goal is to pull the data and put it into an existing excel file
Update a Workbooks Worksheet Range with List Values.
Using: OpenPyXL: A Python library to read/write Excel 2010 xlsx/xlsm files
Note: Observe how the List Values have to be arranged!
param values: List: *[row 1(col1, ... ,coln), ..., row n(col1, ... ,coln)]`
from openpyxl import Workbook, load_workbook
class UpdateWorkbook(object):
def __init__(self, fname, worksheet=0):
self.fname = fname
self.wb = load_workbook(fname)
self.ws = self.wb.worksheets[worksheet]
def save(self):
self.wb.save(self.fname)
def __setitem__(self, _range, values):
"""
Assign Values to a Worksheet Range
:param _range: String e.g ['M6:M30']
:param values: List: [row 1(col1, ... ,coln), ..., row n(col1, ... ,coln)]
:return: None
"""
def _gen_value():
for value in values:
yield value
if not isinstance(values, (list, numpy.ndarray)):
raise ValueError('Values Type Error: Values have to be "list": values={}'.
format(type(values)))
if isinstance(values, numpy.ndarray) and values.ndim > 1:
raise ValueError('Values Type Error: Values of Type numpy.ndarray must have ndim=1; values.ndim={}'.
format(values.ndim))
from openpyxl.utils import range_boundaries
min_col, min_row, max_col, max_row = range_boundaries(_range)
cols = ((max_col - min_col)+1)
rows = ((max_row - min_row)+1)
if cols * rows != len(values):
raise ValueError('Number of List Values:{} does not match Range({}):{}'.
format(len(values), _range, cols * rows))
value = _gen_value()
for row_cells in self.ws.iter_rows(min_col=min_col, min_row=min_row,
max_col=max_col, max_row=max_row):
for cell in row_cells:
cell.value = value.__next__()
Usage
wb = UpdateWorkbook(r'C:\Users\vince\Project\Spreadsheet.xlsx', worksheet=1)
df_2 = fd_frames['key_2']
wb['M6:M25'] = df_2['2017'].values
wb.save()
Tested with Python:3.4.2 - openpyxl:2.4.1 - LibreOffice:4.3.3.2
Here's how I do a similar procedure for other Stack explorers:
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
... create your pandas dataframe df...
# Writing from pandas back to an existing EXCEL workbook
# Load workbook
wb = load_workbook(filename=target, read_only=False, keep_vba=True)
ws = wb['Sheet1']
# Overwrite Existing data in sheet with a dataframe.
rows = dataframe_to_rows(df, index=False, header=True)
for r_idx, row in enumerate(rows, 1):
for c_idx, value in enumerate(row, 1):
ws.cell(row=r_idx, column=c_idx, value=value)
# Save file
wb.save('outfile.xlsm')