I have the following JSON:
{
"groups" : [
{
"values": "21",
"date": "2013-02-22"
},
{
"values": "25",
"date": "2013-02-22"
},
{
"values": "20",
"date": "2013-02-22"
},
{
"values": "19",
"date": "2013-02-22"
},
{
"values": "42",
"date": "2013-02-10"
},
{
"values": "30",
"date": "2013-02-10"
},
{
"values": "11",
"date": "2013-02-10"
}
]
}
I have the values and the date already extracted in a Ruby Class. I want to find the "highest" and "lowest" value for every date. How do I do that?
Also I want to create parallel arrays for the same. For instance:
low = [12, 22, 11, 45]
high = [34, 50, 15, 60]
dates = ["2013-02-22", "2013-02-10", "2013-02-06", "2013-02-01"]
I would also like to display all the values for every date.
Could someone please give me some direction for this?
You can group_by :date and iterate through the dates. Then create an array of :values in each group. Then use minmax to get the proper values and transpose the final array to get your arrays and assign them to dates, low and high.
json = {
"groups": [
{ "values": "21", "date": "2013-02-22" },
{ "values": "25", "date": "2013-02-22" },
{ "values": "20", "date": "2013-02-22" },
{ "values": "19", "date": "2013-02-22" },
{ "values": "42", "date": "2013-02-10" },
{ "values": "30", "date": "2013-02-10" },
{ "values": "11", "date": "2013-02-10" }
]
}
dates, low, high = json[:groups].group_by { |g| g[:date] }.map do |date, grouped|
  # Convert to integers so minmax compares numerically rather than as strings
  values = grouped.map { |group| group[:values].to_i }
  [date, *values.minmax]
end.transpose
# => [["2013-02-22", "2013-02-10"], [19, 11], [25, 42]]
dates
# => ["2013-02-22", "2013-02-10"]
low
# => [19, 11]
high
# => [25, 42]
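The question also asks how to display all the values for every date. The same group_by idea covers that; what follows is only a minimal sketch, assuming Ruby 2.4 or later (for Hash#transform_values) and a shortened copy of the sample data. The names groups and values_by_date are illustrative, not part of the original code.

```ruby
# Shortened sample data in the same shape as json[:groups] (symbol keys).
groups = [
  { values: "21", date: "2013-02-22" },
  { values: "25", date: "2013-02-22" },
  { values: "42", date: "2013-02-10" },
  { values: "11", date: "2013-02-10" }
]

# Map each date to the list of integer values recorded for it.
values_by_date = groups
  .group_by { |g| g[:date] }
  .transform_values { |grouped| grouped.map { |g| g[:values].to_i } }

# Print one line per date with all of its values.
values_by_date.each do |date, values|
  puts "#{date}: #{values.join(', ')}"
end
```

Because a Ruby hash keeps insertion order, the dates come out in the order they first appear in the input.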
If str is your JSON string:
require 'json'
arr = JSON.parse(str)["groups"]
#=> [{"values"=>"21", "date"=>"2013-02-22"},
# {"values"=>"25", "date"=>"2013-02-22"},
# {"values"=>"20", "date"=>"2013-02-22"},
# {"values"=>"19", "date"=>"2013-02-22"},
# {"values"=>"42", "date"=>"2013-02-10"},
# {"values"=>"30", "date"=>"2013-02-10"},
# {"values"=>"11", "date"=>"2013-02-10"}]
by_date = arr.each_with_object(Hash.new {|h,k| h[k] = []}) { |g,h|
h[g["date"]] << g["values"].to_i }
# => {"2013-02-22"=>[21, 25, 20, 19], "2013-02-10"=>[42, 30, 11]}
dates = by_date.keys
#=> ["2013-02-22", "2013-02-10"]
min_vals, max_vals = *by_date.map { |_,vals| vals.minmax }.transpose
#=> [[19, 11], [25, 42]]
min_vals
#=> [19, 11]
max_vals
#=> [25, 42]
The method Enumerable#each_with_object takes an argument that is the initial value of the object that will be constructed and returned by the method. Its value is given by the second block variable, h. I made that argument an empty hash with a default value given by the block:
{|h,k| h[k] = []}
What is the "default value"? All it means is that if the hash h does not have a key k, reading h[k] runs the block, which stores an empty array under k and returns it. Let's see how that works here.
Initially, h #=> {} and each_with_object sets the first block variable, g, equal to the first element of arr:
g = {"values"=>"21", "date"=>"2013-02-22"}
and the block calculation is performed:
h[g["date"]] << g["values"].to_i
#=> h["2013-02-22"] << 21
Since h does not have a key "2013-02-22", h["2013-02-22"] is first set equal to the default value, an empty array:
h["2013-02-22"] = []
then
h["2013-02-22"] << 21
#=> [21]
h #=> {"2013-02-22"=>[21]}
When the next value of arr is passed to the block:
g = {"values"=>"25", "date"=>"2013-02-22"}
and h is as above. So now the block calculation is:
h[g["date"]] << g["values"].to_i
#=> h["2013-02-22"] << 25
#=> [21, 25]
h #=> {"2013-02-22"=>[21, 25]}
The default value is not used this time, as h has a key "2013-02-22".
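The walkthrough above can be replayed in isolation. This is a small sketch rather than part of the solution; the names hash, key, plain and plain_lookup are chosen here for illustration only.

```ruby
# Hash.new with a block: the block runs only when a missing key is read.
h = Hash.new { |hash, key| hash[key] = [] }

h["2013-02-22"] << 21   # key missing: the block stores [] first, then 21 is appended
h["2013-02-22"] << 25   # key present: the block is not called
h["2013-02-10"] << 42   # new key: the block runs again

# A plain hash returns nil for a missing key, so calling << on the
# result would raise NoMethodError.
plain = {}
plain_lookup = plain["2013-02-22"]
```

After these lines h holds one array per date, which is the shape by_date has in the answer.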
One other thing may require explanation: the "splat" * in:
min_vals, max_vals = *by_date.map { |_,vals| vals.minmax }.transpose
We see that:
by_date.map { |date, vals| vals.minmax }.transpose
#=> [[19, 11], [25, 42]]
If *by_date.map { |date, vals| vals.minmax }.transpose is on the right side of an assignment, the splat expands the array so that its two elements, [19, 11] and [25, 42], are assigned to the variables on the left side using parallel assignment. (When the right side is a single array, Ruby performs the same destructuring even without the splat.) The weird and wonderful splat operator needs to be in every Rubyist's bag of tricks.
Since I'm not using date in the block calculation, I've drawn attention to that by replacing date with the local variable _.
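To make the parallel assignment concrete, here is a small sketch using a literal array of [min, max] pairs in place of the map result. The variable names pairs, first, second, a, b, mins and maxes are illustrative only.

```ruby
pairs = [[19, 25], [11, 42]]   # one [min, max] pair per date

# Parallel assignment destructures the outer array, with or without the splat.
first, second = *pairs
a, b = pairs

# transpose regroups the pairs into all minimums and all maximums.
mins, maxes = pairs.transpose
```

Here first and a are both [19, 25], while mins is [19, 11] and maxes is [25, 42]; transpose is what turns per-date pairs into the two parallel arrays.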
Edit: To answer the question you posted in a comment, if:
id = [1,1,1,2,2,3,4]
high = [100,100,100,90,90,100,100]
low = [20,20,20,10,10,30,40]
and I understand your question correctly, you could first compute:
indices = id.each_with_index.to_a.uniq(&:first).map(&:last)
#=> [0, 3, 5, 6]
Then the three arrays you want are:
id.values_at(*indices)
#=> [1, 2, 3, 4]
high.values_at(*indices)
#=> [100, 90, 100, 100]
low.values_at(*indices)
#=> [20, 10, 30, 40]
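As a possible alternative to computing indices, the three arrays can be zipped into rows, de-duplicated by id, and split apart again. This is a sketch under the same reading of the comment's question (keep the first entry for each id); the names uniq_id, uniq_high and uniq_low are made up for the example.

```ruby
id   = [1, 1, 1, 2, 2, 3, 4]
high = [100, 100, 100, 90, 90, 100, 100]
low  = [20, 20, 20, 10, 10, 30, 40]

# Bundle the parallel arrays into [id, high, low] rows, keep the first row
# seen for each id, then split the rows back into three arrays.
uniq_id, uniq_high, uniq_low = id.zip(high, low).uniq(&:first).transpose
```

This yields the same three arrays as the values_at approach, at the cost of building the intermediate rows.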