Performance issue with the CSV type provider from FSharp.Data

I am trying to learn more about the FSharp.Data project by using it for reading a CSV file. The CSV file is a simplified version of the data from the digit recognizer competition on Kaggle.

When I read the CSV file, which contains 785 columns and 113 rows (including the header row), the following two lines of code execute really slowly:

type trainingSet = CsvProvider<"Data/trainSmall.csv", ",", CacheRows=false>
let data = trainingSet.Load("Data/trainSmall.csv")

When I send the first line to F# Interactive, it returns in about 10 seconds, whereas when I send the second line it takes more than 5 minutes before the interactive prompt replies.

I am running the code on my 2013 MacBook Pro with a 2.6 GHz i5 processor and 16 GB of RAM, using F# 3.0 and Xamarin Studio. I have tried the same experiment with Windows 7 / VS2013 running in a VM on the same hardware, and the results are comparable. When I do the exact same thing with R on the same machine, it is so fast that I cannot time it with an ordinary watch.

Please advise me on the proper usage of the CSV type provider from FSharp.Data!

Tags: csv f# type-providers f#-3.0 f#-data

2 Answers

戒情不戒烟

Reply #2 · 2019-07-07 03:01

Hmm, the second line is supposed to do almost nothing, as the rows are read on demand. Something is wrong there. Can you please submit an issue on GitHub with a repro file?


放荡不羁爱自由

Reply #3 · 2019-07-07 03:04

I recommend that you don't use CsvProvider for this. You're loading a matrix, so you get no benefit from having the type of each column inferred, as they are all the same. You can still use the CSV parser of F# Data through CsvFile. CsvProvider is optimized for files with few columns but potentially many rows. The generated code will try to build a tuple with 785 elements for your example, which just won't work.
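As a rough illustration of the CsvFile route, here is a minimal sketch. It assumes a script environment that supports `#r "nuget: ..."` (newer than the F# 3.0 mentioned in the question), that every cell is an integer, and that the first column is the label. The inline `sample` text and the names `rows`, `labels`, and `pixels` are made up for the example; `CsvFile.Load("Data/trainSmall.csv")` would read the real file instead of parsing a string.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical stand-in for Data/trainSmall.csv: a label column followed by pixel columns.
let sample = "label,pixel0,pixel1,pixel2\n1,0,255,0\n7,12,0,34\n"

// CsvFile.Parse takes the CSV text directly and treats the first line as headers by default.
let csv = CsvFile.Parse(sample)

// Each row exposes its cells as a string array, so convert them to ints by hand
// (assumption: every cell holds an integer).
let rows =
    csv.Rows
    |> Seq.map (fun row -> row.Columns |> Array.map int)
    |> Seq.toArray

// Split each row into the label (first cell) and the pixel values (remaining cells).
let labels = rows |> Array.map (fun r -> r.[0])
let pixels = rows |> Array.map (fun r -> r.[1..])
```

Because no per-column types are generated, the number of columns does not affect compile time; the cost is that the string-to-int conversion is written by hand.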

