I have a Word file (.docx) containing a table of data, and I am trying to create a pandas DataFrame from that table. I have used the docx and pandas modules, but I could not create a DataFrame.
from docx import Document
document = Document('req.docx')
for table in document.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
I also tried to read the table as a DataFrame with pd.read_table("path of the file").
I can read the data cell by cell, but I want to read the entire table or a particular column. Thanks in advance.
docx always reads data from Word tables as text (strings). If we want to parse data with correct dtypes we can do one of the following:
- specify dtype for all columns (not flexible)
- let pd.read_csv() guess/infer correct dtypes (I've chosen this way)

Many thanks to @Anton vBR for improving the function!
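The pd.read_csv() approach could look roughly like the sketch below. It is not the original function: the names rows_to_df and read_docx_table, and the table_index and header parameters, are assumptions made for illustration. The idea is to dump the cell text into an in-memory CSV buffer and let pd.read_csv() infer the dtypes.

```python
import csv
import io

import pandas as pd


def rows_to_df(rows, header=True, **kwargs):
    """Build a DataFrame from rows of strings, letting pd.read_csv infer dtypes.

    Hypothetical helper: `rows` is a list of lists of cell text.
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)  # the csv module handles quoting of commas etc.
    buffer.seek(0)
    return pd.read_csv(buffer, header=0 if header else None, **kwargs)


def read_docx_table(path, table_index=0, header=True, **kwargs):
    """Read one table of a .docx file into a DataFrame (hypothetical helper)."""
    # python-docx is imported here so that rows_to_df can be used without it.
    from docx import Document

    table = Document(path).tables[table_index]
    rows = [[cell.text for cell in row.cells] for row in table.rows]
    return rows_to_df(rows, header=header, **kwargs)
```

With this in place, df = read_docx_table('req.docx') would return the first table as a DataFrame, and a single column can then be selected as df['column name']. Extra keyword arguments are passed straight through to pd.read_csv().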
NOTE: you may want to add more checks and exception catching...
Examples:
parsing dates:
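Since the chosen approach goes through pd.read_csv(), date columns can be handled with its parse_dates argument. The snippet below is only an illustration: the CSV text is made-up data standing in for the cell text of a Word table, and the column names are assumptions.

```python
import io

import pandas as pd

# Made-up cell text, as it might look after dumping a Word table to CSV.
csv_text = "id,date,amount\n1,2017-01-15,10.5\n2,2017-02-20,7.25\n"

# parse_dates tells read_csv which columns to convert to datetime64.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.dtypes)
```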