csv & xlsx files import to pandas data frame: spee

Published 2019-04-10 16:43


神经病院院长


Question:

Reading data (just 20000 numbers) from an xlsx file takes forever:

import pandas as pd
xlsxfile = pd.ExcelFile("myfile.xlsx")
data = xlsxfile.parse('Sheet1', index_col = None, header = None)

takes about 9 seconds.

If I save the same file in csv format it takes ~25ms:

import pandas as pd
csvfile = "myfile.csv"
data = pd.read_csv(csvfile, index_col = None, header = None)

Is this an issue of openpyxl or am I missing something? Are there any alternatives?

Answer 1:

xlrd has support for .xlsx files, and this answer suggests that at least the beta version of xlrd with .xlsx support was quicker than openpyxl.

The current stable version of pandas (0.11.0) uses openpyxl for .xlsx files, but this has been changed for the next release. If you want to give it a go, you can download the dev version from GitHub.
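
To check whether the parser really is the bottleneck on a given installation, one could time the same read under each available engine and compare it with the csv read. The following is only a rough sketch: the helper names best_time and compare_readers are hypothetical, the file names come from the question, and the list of engines is an assumption, since which engines are installed and which can open .xlsx files depends on the pandas and parser versions in use.

```python
import time


def best_time(fn, repeat=3):
    """Return the fastest wall-clock time, in seconds, of `repeat` calls to fn()."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_readers(xlsx_path, csv_path, sheet="Sheet1", engines=("openpyxl",)):
    """Time pd.read_csv against pd.read_excel for each engine name given.

    The engine names are an assumption: pass whichever parsers are
    installed alongside pandas in the environment being tested.
    """
    import pandas as pd  # imported here so best_time() stays usable on its own

    results = {
        "read_csv": best_time(
            lambda: pd.read_csv(csv_path, index_col=None, header=None)
        )
    }
    for engine in engines:
        label = "read_excel[" + engine + "]"
        try:
            results[label] = best_time(
                lambda: pd.read_excel(
                    xlsx_path,
                    sheet_name=sheet,
                    index_col=None,
                    header=None,
                    engine=engine,
                )
            )
        except (ImportError, ValueError) as exc:
            # Engine missing or rejected by pandas: record the reason instead.
            results[label] = "unavailable: " + str(exc)
    return results
```

Calling compare_readers("myfile.xlsx", "myfile.csv") returns a dict mapping each reader to its best time in seconds. Taking the minimum of a few runs reduces noise from disk caching and other processes.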

Tags: python csv pandas xlsx openpyxl
