How to detect “strikethrough” style from xlsx file

Posted 2019-07-14 06:12

I need to find the cells that have "strikethrough" formatting when importing an Excel file into R.

Is there a way to detect them? Both R and Python approaches are welcome.

3 answers
爷、活的狠高调
#2 · 2019-07-14 06:12

R solution

The tidyxl package can help here.

Example: test.xlsx, with data in A1:A4 of the first sheet.

[Image: Excel screenshot of test.xlsx]

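library(tidyxl)

# Read the formatting definitions and the cells of the workbook
formats <- xlsx_formats("test.xlsx")
cells <- xlsx_cells("test.xlsx")

# IDs of the local formats whose font has strikethrough set
strike <- which(formats$local$font$strike)
# Addresses (column 2) of the cells that use one of those formats
cells[cells$local_format_id %in% strike, 2]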

# A tibble: 2 x 1
#   address
#   <chr>  
# 1 A2     
# 2 A4   
兄弟一词,经得起流年.
#3 · 2019-07-14 06:22

I found the method below:

# Assuming rows 1-10 of column A all contain the value "A", and the 5th "A" has strikethrough applied

from openpyxl import load_workbook

TEST_wb = load_workbook(filename='TEST.xlsx')
TEST_wb_s = TEST_wb.active

# Check every cell in column A, from row 1 to the last used row
for i in range(1, TEST_wb_s.max_row + 1):
    ck_range_A = TEST_wb_s['A' + str(i)]
    if ck_range_A.font.strikethrough:
        print('YES')
    else:
        print('NO')

But this doesn't report the location (in this case, the row numbers), so with many results it is hard to tell which cells contain "strikethrough". How can I vectorize the result of the check?
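
One possible way to get the locations, offered as a minimal sketch rather than a definitive answer: build lists instead of printing inside the loop. The helper names `strikethrough_rows` and `strikethrough_flags` are hypothetical; they only assume that each cell exposes `.row` and `.font.strike`, as openpyxl cells do.

```python
def strikethrough_rows(cells):
    """Return the row numbers of the cells whose font has strikethrough set."""
    # font.strike can be True, False or None; None counts as "not struck".
    return [cell.row for cell in cells if cell.font.strike]


def strikethrough_flags(cells):
    """Return one boolean per cell, in order (a YES/NO column as a list)."""
    return [bool(cell.font.strike) for cell in cells]
```

With the worksheet from the code above, `strikethrough_rows(TEST_wb_s['A'])` should return `[5]` for the workbook described, since `TEST_wb_s['A']` yields the cells of column A.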

劫难
#4 · 2019-07-14 06:24

I present below a small sample program that filters out text with strikethrough applied, using the openpyxl package (I tested it on version 2.5.6 with Python 3.7.0). Sorry it took so long to get back to you.

import openpyxl as opx


def ignore_strikethrough(cell):
    # Keep the cell only if its font does not have strikethrough applied
    return not cell.font.strike


wb = opx.load_workbook('test.xlsx')
ws = wb.active
colA = ws['A']
fColA = filter(ignore_strikethrough, colA)
for i in fColA:
    # coordinate (e.g. "A1") is used instead of column + row, because
    # cell.column is a letter in openpyxl 2.5.x but a number from 2.6 onwards
    print("Cell {0} has value {1}".format(i.coordinate, i.value))
    print(i.col_idx)

I tested it on a new workbook with the default worksheets, with the letters a, b, c, d, e in the first five rows of column A, where I had applied strikethrough formatting to b and d. This program filters out the cells in column A that have strikethrough applied to the font, and then prints the location and value of each remaining cell. The col_idx property returns the 1-based numeric column value.
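
The same check can be extended from one column to a whole sheet. The sketch below is an illustration under assumptions, not part of the tested program above: `split_by_strikethrough` and `load_split` are hypothetical names, and the code relies only on the `coordinate`, `value` and `font.strike` attributes of openpyxl cells and on `ws.iter_rows()` yielding rows of cells.

```python
def split_by_strikethrough(rows):
    """Split cells into (kept, struck) dicts mapping coordinate -> value.

    `rows` is any iterable of rows of cells, e.g. ws.iter_rows() in openpyxl.
    """
    kept, struck = {}, {}
    for row in rows:
        for cell in row:
            if cell.value is None:
                continue  # skip empty cells
            # font.strike may be True, False or None; None counts as not struck
            target = struck if cell.font.strike else kept
            target[cell.coordinate] = cell.value
    return kept, struck


def load_split(path):
    """Hypothetical wrapper: open `path` and split its active sheet."""
    # Imported here so split_by_strikethrough itself needs no third-party package.
    import openpyxl

    ws = openpyxl.load_workbook(path).active
    return split_by_strikethrough(ws.iter_rows())
```

For the test workbook described above, `load_split('test.xlsx')` should put A1, A3 and A5 in the first dict and A2 and A4 in the second. Note that this only looks at the cell-level font, so it says nothing about strikethrough applied to part of a cell's text.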
