Overwriting data to an existing workbook using Pyt

2019-08-05 02:11发布

问题:

I am new to Python and working on a project that I could use some help on. So I am trying to modify an existing excel workbook in order to compare stock data. Luckily, there was a program online that retrieved all the data I need and I have successful been able to pull the data and write the data into a new excel file. However, the goal is to pull the data and put it into an existing excel file. Furthermore, I need to overwrite the cell values in the existing file. I believe xlwings is able to do this and I think my code is on the right track, but I ran into an unexpected error. The error I get is:

TypeError: Objects of type 'Period' can not be converted to a COM VARIANT (but obtaining the buffer() of this object could)

I was wondering if anyone knew why this error came up? Also, does anyone know how to fix it? Is it fixable? Is my code wrong? Any help or guidance is appreciated. Thank you.

import good_morning as gm
import pandas as pd
import xlwings as xw

#import income statement, balance sheet, and cash flow of AAPL
fd = gm.FinancialsDownloader()
fd_frames = fd.download('AAPL')

#Creates a DataFrame for only the balance sheet
df1 = pd.DataFrame(list(fd_frames.values())[0])

#connects to workbook I want to modify 
wb = xw.Book(r'C:\Users\vince\Project\Spreadsheet.xlsm')

#sheet I would like to modify
sht = wb.sheets[1]

#modifies & overwrites values in my spreadsheet(this is where I get the type_error)
sht.range('M6').value = df1

Data Types:

type(fd_frames)
>>> <class 'dict'>
fd_frames.values())[0].info()
>>> <class 'pandas.core.frame.DataFrame'> 
RangeIndex: 22 entries, 0 to 21 
Data columns (total 8 columns): 
parent_index 22 non-null int64 
title 22 non-null object 
2012 19 non-null float64 
2013 20 non-null float64 
2014 20 non-null float64 
2015 20 non-null float64 
2016 20 non-null float64 
2017 20 non-null float64 
dtypes: float64(6), int64(1), object(1) 
memory usage: 1.5+ KB

回答1:

Comments: You have a Dict of pandas.DataFrame.

Selecting from a Dict using list(fd_frames.values())[0] does lead to unpredictable Results. Show the Keys of the Dict and choose the one you interested off using these Key, e.g.:

 print(fd_frames.keys())
 >>> dict_keys(['key_1', 'key_2', 'key_n']
 df_2 = fd_frames['key_2']

Beside this, neither of the Dimension in your pandas.DataFrame does match M6:M30 = 25. There are only 8 columns with 20 Values. Therfore you have to align your Worksheet Range to 20 Rows. To write Column 2017 to the Worksheet, e.g.:

wb['M6:M25'] = df_2['2017'].values

Note: I have updated the code below to accept numpy.ndarray also.


Question: ... the goal is to pull the data and put it into an existing excel file

Update a Workbooks Worksheet Range with List Values.
Using: OpenPyXL: A Python library to read/write Excel 2010 xlsx/xlsm files

Note: Observe how the List Values have to be arranged!
param values: List: *[row 1(col1, ... ,coln), ..., row n(col1, ... ,coln)]`

from openpyxl import Workbook, load_workbook

class UpdateWorkbook(object):
    def __init__(self, fname, worksheet=0):
        self.fname = fname
        self.wb = load_workbook(fname)
        self.ws = self.wb.worksheets[worksheet]

    def save(self):
        self.wb.save(self.fname)

    def __setitem__(self, _range, values):
        """
         Assign Values to a Worksheet Range
        :param _range:  String e.g ['M6:M30']
        :param values: List: [row 1(col1, ... ,coln), ..., row n(col1, ... ,coln)]
        :return: None
        """

        def _gen_value():
            for value in values:
                yield value

            if not isinstance(values, (list, numpy.ndarray)):
                raise ValueError('Values Type Error: Values have to be "list": values={}'.
                                  format(type(values)))
            if isinstance(values, numpy.ndarray) and values.ndim > 1:
                raise ValueError('Values Type Error: Values of Type numpy.ndarray must have ndim=1; values.ndim={}'.
                                  format(values.ndim))

        from openpyxl.utils import range_boundaries
        min_col, min_row, max_col, max_row = range_boundaries(_range)
        cols = ((max_col - min_col)+1)
        rows = ((max_row - min_row)+1)
        if cols * rows != len(values):
            raise ValueError('Number of List Values:{} does not match Range({}):{}'.
                             format(len(values), _range, cols * rows))

        value = _gen_value()
        for row_cells in self.ws.iter_rows(min_col=min_col, min_row=min_row,
                                           max_col=max_col, max_row=max_row):
            for cell in row_cells:
                cell.value = value.__next__()

Usage

wb = UpdateWorkbook(r'C:\Users\vince\Project\Spreadsheet.xlsx', worksheet=1)
df_2 = fd_frames['key_2']
wb['M6:M25'] = df_2['2017'].values
wb.save()

Tested with Python:3.4.2 - openpyxl:2.4.1 - LibreOffice:4.3.3.2



回答2:

Here's how I do a similar procedure for other Stack explorers:

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

... create your pandas dataframe df...

# Writing from pandas back to an existing EXCEL workbook
# Load workbook
wb = load_workbook(filename=target, read_only=False, keep_vba=True)
ws = wb['Sheet1']

# Overwrite Existing data in sheet with a dataframe.
rows = dataframe_to_rows(df, index=False, header=True)

for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
         ws.cell(row=r_idx, column=c_idx, value=value)

# Save file
wb.save('outfile.xlsm')