I am learning F# and the FSharp.Data library. I have a task in which I need to read 20 CSV files. Each file has a different number of columns, but the records share the same nature: each is keyed on a date string, and all the remaining columns are floats. I need to do some statistical calculations on the float columns before persisting the results to the database. I got all the plumbing logic working:
- read in the CSV via the FSharp.Data CSV type provider,
- use reflection to get the type of each column field; together with the header names, these are fed into a pattern match that decides the relevant calculation logic,
- SqlBulkCopy the result.
However, I ended up with 20 functions (one per CSV file).
The solution is far from acceptable. I thought I could create a generic top-level function as the driver to loop through all the files. However, after days of attempts I am getting nowhere.
The FSharp.Data CSV type provider has the following pattern:
type Stocks = CsvProvider<"../docs/MSFT.csv">
let msft = Stocks.Load("http://ichart.finance.yahoo.com/table.csv?s=MSFT")
msft.Data |> Seq.map(fun row -> do something with row)
...
I have tried:
let mainfunc (typefile:string) (datafile:string) =
    let msft = CsvProvider<typefile>.Load(datafile)
    ....
This doesn't work: the CsvProvider complains that typefile is not a valid constant expression. I am guessing the type provider needs the file to deduce the column types at coding time, so the type inference cannot be deferred until mainfunc is called with the relevant information.
I then tried to pass the type into mainfunc as a parameter. Neither
let mainfunc (typeProvider:CsvProvider<"../docs/MSFT.csv">) =
....
nor
let mainfunc<typeProvider:CsvProvider<"../docs/MSFT.csv">> =
....
worked.
I then tried to pass msft, obtained from
type Stocks = CsvProvider<"../docs/MSFT.csv">
let msft = Stocks.Load("http://ichart.finance.yahoo.com/table.csv?s=MSFT")
into a mainFunc. According to IntelliSense, msft has a type of CsvProvider<...> and msft.Data has a type of seq<CsvProvider<...>>. I have tried to declare an input parameter with each of these as its explicit type, but neither compiles.
Can anyone please help and point me in the right direction? Am I missing something fundamental here? Any .NET type or class can be used in an F# function to explicitly specify a parameter type, but can I do the same with a type from a type provider?
If the answer to the above question is no, what are the alternatives for making the logic generic enough to handle 20 or even 200 different files?
This is related to Type annotation for using a F# TypeProvider type e.g. FSharp.Data.JsonProvider<...>.DomainTypes.Url
Even though IntelliSense shows you CsvProvider<...>, to reference the msft type in a type annotation you have to use Stocks, and for msft.Data, instead of CsvProvider<...>.Row, you have to use Stocks.Row.
If you want to do something dynamic, you can get the column names with msft.Headers, and you can get the types of the columns using Microsoft.FSharp.Reflection.FSharpType.GetTupleElements(typeof<Stocks.Row>) (this works because the row is erased to a tuple at runtime).
EDIT:
If the formats are incompatible, and you're dealing with dynamic data that doesn't conform to a common format, you might want to use CsvFile instead (http://fsharp.github.io/FSharp.Data/library/CsvFile.html), but you'll lose all the type safety of the type provider. You might also consider using Deedle instead (http://bluemountaincapital.github.io/Deedle/).
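A minimal sketch of both routes, written as an F# script. It rests on several assumptions: an inline sample string stands in for ../docs/MSFT.csv so the script is self-contained, a recent FSharp.Data is loaded from NuGet (recent versions expose the rows as .Rows, while the version in the question used .Data), and averageClose and columnMeans are hypothetical helper names. The first half annotates a function with Stocks.Row; the second half is an untyped driver over CsvFile that treats the first column as the date key and parses every other column as a float.

```fsharp
#r "nuget: FSharp.Data"
open System
open System.Globalization
open FSharp.Data
open Microsoft.FSharp.Reflection

// Assumption: inline sample text replaces "../docs/MSFT.csv" for illustration.
[<Literal>]
let Sample = "Date,Open,Close\n2012-01-27,29.0,30.0\n2012-01-26,31.0,32.0"

type Stocks = CsvProvider<Sample>

// Typed route: annotate with the alias Stocks.Row, not CsvProvider<...>.Row.
let averageClose (rows: seq<Stocks.Row>) : float =
    rows |> Seq.averageBy (fun row -> float row.Close)

let typedAverage = averageClose (Stocks.Parse(Sample).Rows)

// Column types via reflection on the row type, which is erased to a tuple.
let columnTypes = FSharpType.GetTupleElements(typeof<Stocks.Row>)

// Untyped route: one function for any file whose first column is the date key
// and whose remaining columns are floats. No compile-time checking of columns.
let columnMeans (csv: CsvFile) : (string * float) list =
    let headers = defaultArg csv.Headers [||]
    // Materialise the rows once so they are not re-read per column.
    let rows = csv.Rows |> Seq.toArray
    [ for i in 1 .. headers.Length - 1 ->
        let values =
            rows
            |> Array.map (fun row ->
                Double.Parse(row.Columns.[i], CultureInfo.InvariantCulture))
        // Array.average throws on an empty file; guard for that in real code.
        headers.[i], Array.average values ]

// For real files, CsvFile.Load(path) would replace CsvFile.Parse.
let means = columnMeans (CsvFile.Parse Sample)

printfn "typed average close: %f" typedAverage
columnTypes |> Array.iter (fun t -> printfn "column type: %s" t.Name)
means |> List.iter (fun (name, mean) -> printfn "%s: %f" name mean)
```

With the untyped version, looping over 20 or 200 files comes down to mapping columnMeans over a list of paths, at the cost of discovering malformed columns only at runtime.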