Merge multiple lists of data together by common ID

Posted 2020-02-12 05:15

I have multiple lists of data from 4 different sources with a common set of IDs that I would like to merge together, based on ID, basically ending up with a new list, one for each ID and a single entry for each source.

The objects in the output list from each of the 4 sources look something like this:

type data = {ID : int; value : decimal;}

so, for example I would have:

let sourceA = [data1; data2; data3]
let sourceB = [data1; data2; data3]
let sourceC = [data1; data2; data3]
let sourceD = [data1; data2; data3]

(I realize this code is not valid, just trying to give a basic idea... the lists are actually pulled and generated from a database)

I would then like to take sourceA, sourceB, sourceC and sourceD and process them into a list containing objects something like this:

type dataByID = {ID : int; valueA : decimal; valueB : decimal; valueC : decimal; valueD : decimal; }

...so that I can then print them out in a CSV, with the first column being the ID and columns 2-5 being data from sources A-D corresponding to the ID in that row.

I'm totally new to F#, so what would be the best way to process this data so that I match up all the source data values by ID?

Tags: f# f#-data
1 answer
倾城 Initia
#2 · 2020-02-12 06:01

It seems that you could simply concatenate all the lists and then use Seq.groupBy to get a sequence that pairs each unique ID from the input lists with all values associated with that ID. This can be done using something like:

let data = 
  [ data1; data2; data3; data4 ]   // Create list of lists of items 
  |> Seq.concat                    // Concatenate to get a single list of items
  |> Seq.groupBy (fun d -> d.ID)   // Group elements by ID

seq { for id, values in data ->
        // id is the ID and values is a sequence with all values
        // (that come from any data source)
        id, values }
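
As a minimal sketch of what the grouping produces, assuming two small made-up lists in place of the database results (the sample values and the name grouped are illustrative only, not part of the question):

```fsharp
type data = { ID : int; value : decimal }

// Hypothetical sample data standing in for two of the data sources
let data1 = [ { ID = 1; value = 10.0m }; { ID = 2; value = 20.0m } ]
let data2 = [ { ID = 1; value = 11.0m }; { ID = 2; value = 21.0m } ]

let grouped =
  [ data1; data2 ]
  |> Seq.concat                    // One flat sequence of records
  |> Seq.groupBy (fun d -> d.ID)   // Pairs of (ID, records with that ID)
  |> Seq.map (fun (id, values) ->
      // Keep only the decimal values and materialize them as a list
      id, (values |> Seq.map (fun d -> d.value) |> List.ofSeq))
  |> List.ofSeq

// grouped = [ (1, [10.0M; 11.0M]); (2, [20.0M; 21.0M]) ]
```

At this point you know which values belong to each ID, but not which source each value came from.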

If you want to associate the source (whether it was data1, data2, etc.) with the value, then you can first use a map operation to add an index of the data source:

let addIndex i data = 
  data |> Seq.map (fun v -> i, v)

let data = 
  [ addIndex 1 data1;
    addIndex 2 data2;
    addIndex 3 data3;
    addIndex 4 data4 ]
  |> Seq.concat
  |> Seq.groupBy (fun (index, d) -> d.ID)

Now, data also contains the index of the data source (from 1 to 4), so when iterating over the values, you can use the index to find out which data source each item comes from. An even nicer version can be written using Seq.mapi to iterate over the list of data sources and add the index to all the values automatically. Note that Seq.mapi counts from 0, so in that version the indices run from 0 to 3:

let data = 
  [ data1; data2; data3; data4 ]
  |> Seq.mapi addIndex             // Tag every item with the index of its source
  |> Seq.concat
  |> Seq.groupBy (fun (index, d) -> d.ID)
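
To get from the grouped data to the dataByID records and CSV rows that the question asks for, one possible sketch follows. The sample values, the valueFrom helper, the names merged and csvLines, and the choice of 0 for an ID that is missing from a source are all assumptions made for illustration; the source index is zero-based, as produced by Seq.mapi.

```fsharp
type data = { ID : int; value : decimal }

// Hypothetical sample data standing in for the four database results
let sourceA = [ { ID = 1; value = 1.5m }; { ID = 2; value = 2.5m } ]
let sourceB = [ { ID = 1; value = 10.0m }; { ID = 2; value = 20.0m } ]
let sourceC = [ { ID = 2; value = 200.0m }; { ID = 1; value = 100.0m } ]
let sourceD = [ { ID = 1; value = 7.0m } ]   // ID 2 is deliberately missing here

type dataByID =
  { ID : int; valueA : decimal; valueB : decimal; valueC : decimal; valueD : decimal }

let merged =
  [ sourceA; sourceB; sourceC; sourceD ]
  |> Seq.mapi (fun index source -> source |> Seq.map (fun d -> index, d))
  |> Seq.concat
  |> Seq.groupBy (fun (_, d) -> d.ID)
  |> Seq.map (fun (id, items) ->
      // Find the value that came from the source with the given index;
      // falling back to 0 for a missing entry is an assumption of this sketch
      let valueFrom index =
        items
        |> Seq.tryPick (fun (i, d) -> if i = index then Some d.value else None)
        |> Option.defaultValue 0m
      { ID = id; valueA = valueFrom 0; valueB = valueFrom 1; valueC = valueFrom 2; valueD = valueFrom 3 })
  |> Seq.sortBy (fun row -> row.ID)
  |> List.ofSeq

// One CSV line per ID: the ID first, then the values from sources A to D
let csvLines =
  merged
  |> List.map (fun r -> sprintf "%d,%M,%M,%M,%M" r.ID r.valueA r.valueB r.valueC r.valueD)
```

If a missing value should stay distinguishable from a real zero, the fields of dataByID could be declared as decimal option instead and the Option.defaultValue step dropped.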