I have a data structure that I have loaded from JSON that resembles the one below:
json_in = [
    Dict("customer" => "cust1", "transactions" => 1:10^6),
    Dict("customer" => "cust2", "transactions" => 1:10^6),
    Dict("customer" => "cust3", "transactions" => 1:10^6),
]
I know of two methods to collapse the transactions into one array:
@time methodA = reduce(vcat,[cust["transactions"] for cust in json_in])
@time methodB = vcat(json_in[1]["transactions"],json_in[2]["transactions"],json_in[3]["transactions"])
However, the timing of methodA is ~0.22 s vs ~0.02 s for methodB on my computer. I intend to perform this thousands of times, so the 10x speedup is a big deal.
I can see that methodB is not very robust, as it can only deal with exactly 3 Dicts (customers), so even though it's performant it doesn't generalise.
What would be the most efficient way to concatenate arrays that are elements of an array of Dicts?
As @Gnimuc states in his comment, you should not benchmark in global scope, and benchmarks are best done using BenchmarkTools.jl. Here are the timings done right:
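A minimal sketch of how those benchmarks can be set up (the wrapper functions are my additions; the $ interpolates the global variable into the benchmark so that global-scope overhead is not measured):

using BenchmarkTools

methodA(d) = reduce(vcat, [cust["transactions"] for cust in d])
methodB(d) = vcat(d[1]["transactions"], d[2]["transactions"], d[3]["transactions"])

@btime methodA($json_in);
@btime methodB($json_in);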
Method B is still about twice as fast. That is exactly because it is more specialized: it works only on an array with exactly three elements.
An alternative solution that might work well here is a MappedArray (from the MappedArrays.jl package), which creates a lazy view into the original array:
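A sketch of that idea; the variable name transactions is mine:

using MappedArrays

# Lazy array whose i-th element is json_in[i]["transactions"]; no data is copied.
transactions = mappedarray(cust -> cust["transactions"], json_in)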
Of course this doesn't concatenate the arrays, but you can concatenate views using the CatViews.jl package:
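A sketch building on the transactions mapped array above (CatView takes the vectors to concatenate as individual arguments, so they are splatted in; fine for a handful of customers):

using CatViews

# A lazy, zero-copy view that acts like the concatenation of all transaction arrays.
methodC = CatView(transactions...)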
Because it doesn't allocate, it is roughly 300x faster than method B (though it is possible the result is slower to use because of non-locality; that is worth benchmarking too).
Thanks for the help. After some research I came up with the idea of inline-expanding the code using macros (see the sketch below), and it performs pretty well on the benchmarks (on JuliaBox.com, 21 Sep 2017).
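Since the original macro is not shown here, the following is a hedged reconstruction of the idea: a macro that, for a customer count n known at parse time, expands into the same splatted vcat call as methodB (the macro name is illustrative):

# Expands to vcat(json[1]["transactions"], ..., json[n]["transactions"]).
macro inline_vcat(json, n::Int)
    args = [:($(esc(json))[$i]["transactions"]) for i in 1:n]
    return :(vcat($(args...)))
end

methodD = @inline_vcat json_in 3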
One shortcoming of this method is that if there is a large number (~1 million) of customers in the JSON, the generated code will be long and parsing it would, I assume, take a long time as well. Hence it's probably not a good idea for large datasets.