Extracting Hyperlinks From Excel (.xlsx) with Pyth

2019-01-26 11:42发布

问题:

I have been looking at mostly the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True for .xlsx files, so I can not use the xlrd hyperlink_map function. So I turned to openpyxl, but have also had no luck extracting a hyperlink from an excel file with it. Test code below (the test file contains a simple hyperlink to google with hyperlink text set to "test"):

import openpyxl

wb = openpyxl.load_workbook('testFile.xlsx')

ws = wb.get_sheet_by_name('Sheet1')

r = 0
c = 0

print ws.cell(row = r, column = c). value
print ws.cell(row = r, column = c). hyperlink
print ws.cell(row = r, column = c). hyperlink_rel_id

Output:

test

None

I guess openpyxl does not currently support formatting completely either? Is there some other library I can use to extract hyperlink information from Excel (.xlsx) files?

回答1:

In my experience getting good .xlsx interaction requires moving to IronPython. This lets you work with the Common Language Runtime (clr) and interact directly with excel'

http://ironpython.net/

import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()

wb = excel.Workbooks.Open('testFile.xlsx')
ws = wb.Worksheets['Sheet1']

address = ws.Cells(row, col).Hyperlinks.Item(1).Address


回答2:

FYI, the problem with openpyxl is an actual bug.

And, yes, xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx.



回答3:

This should be possible with openpyxl now:

import openpyxl

wb = openpyxl.load_workbook('yourfile.xlsm')
ws = wb.get_sheet_by_name('Sheet1')
print(ws.cell(row=2, column=1).hyperlink.target)


回答4:

Starting from at least version openpyxl-2.4.0b1 this bug https://bitbucket.org/openpyxl/openpyxl/issue/152/hyperlink-returns-empty-string-instead-of was fixed. Now it's return for cell Hyperlink object:

hl_obj = ws.row(col).hyperlink  # getting Hyperlink object for Cell
#hl_obj = ws.cell(row = r, column = c).hyperlink This could be used as well.
if hl_obj:
    print(hl_obj.display)
    print(hl_obj.target)
    print(hl_obj.tooltip) # you can see it when hovering mouse on hyperlink in Excel
    print(hl_obj) # to see other stuff if you need


回答5:

A successful solution I've worked with is to install unoconv on the server and implement a method that invokes this command line tool via the subprocess module to convert the file from xlsx to xls since hyperlink_map.get() works with xls.



回答6:

For direct manipulation of Excel files it's also worth looking at the excellent XlWings library.



回答7:

If instead of just .hyperlink, doing .hyperlink.target should work. I was getting a 'None' as well from using just ".hyperlink" on the cell object before that.