I have a sorted array:
[
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]
I would like to get something like this, but it does not have to be a hash:
[
  {:error => 'FATAL <error title="Request timed out.">', :count => 2},
  {:error => 'FATAL <error title="There is insufficient system memory to run this query.">', :count => 1}
]
The following code prints what you asked for. I'll let you decide how to use it to generate the hash you are looking for:
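# sample array
a = ['FATAL <error title="Request timed out.">',
     'FATAL <error title="Request timed out.">',
     'FATAL <error title="There is insufficient system memory to run this query.">']
# make the hash default to 0 so that += will work correctly
b = Hash.new(0)
# iterate over the array, counting duplicate entries
a.each do |v|
  b[v] += 1
end
# print each distinct entry with its count
b.each do |k, v|
  puts "#{k} appears #{v} times"
end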
Note: I just noticed you said the array is already sorted. The above code does not require sorting. Using that property may produce faster code.
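As a rough sketch of how the sorted property could be used: because equal entries are adjacent, you can split the array into runs and measure each run, without building a hash. This assumes Ruby 2.3 or later for chunk_while, and I have not measured whether it is actually faster. The names runs and counts are only illustrative.

```ruby
# Sample data from the question; this approach relies on the array being sorted.
a = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# chunk_while (Ruby 2.3+) splits the array into runs of equal neighbours.
runs = a.chunk_while { |left, right| left == right }

# Turn each run into one entry in the shape the question asks for.
counts = runs.map { |run| { error: run.first, count: run.size } }

counts.each { |entry| puts "#{entry[:count]}: #{entry[:error]}" }
```

If the array were not sorted, the same value could show up in more than one run, so the hash-based version above is the safer choice in that case.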
You can do this very succinctly (in one line) by using inject:
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient ...">']
b = a.inject(Hash.new(0)) {|h,i| h[i] += 1; h }
b.to_a.each {|error,count| puts "#{count}: #{error}" }
Will produce (on Ruby 1.9 or later, where hashes keep insertion order):
2: FATAL <error title="Request timed out.">
1: FATAL <error title="There is insufficient ...">
If you have an array like this:
words = ["aa","bb","cc","bb","bb","cc"]
where you need to count duplicate elements, a one line solution is:
result = words.each_with_object(Hash.new(0)) { |word,counts| counts[word] += 1 }
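To get from a counts hash like this to the array-of-hashes shape in the question, one more map should be enough. This is a sketch that applies the same one-liner to the question's error strings; the variable names errors, counts and report are just for illustration.

```ruby
# Error strings from the question.
errors = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]

# Same counting one-liner as above: Hash.new(0) makes missing keys start at 0.
counts = errors.each_with_object(Hash.new(0)) { |error, acc| acc[error] += 1 }

# Reshape each key/value pair into a {:error, :count} hash.
report = counts.map { |error, count| { error: error, count: count } }

report.each { |row| puts row.inspect }
```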
A different approach to the answers above, using Enumerable#group_by.
[1, 2, 2, 3, 3, 3, 4].group_by(&:itself).map { |k,v| [k, v.count] }.to_h
# {1=>1, 2=>2, 3=>3, 4=>1}
Breaking that into its different method calls:
a = [1, 2, 2, 3, 3, 3, 4]
a = a.group_by(&:itself) # {1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
a = a.map { |k,v| [k, v.count] } # [[1, 1], [2, 2], [3, 3], [4, 1]]
a = a.to_h # {1=>1, 2=>2, 3=>3, 4=>1}
Enumerable#group_by was added in Ruby 1.8.7; the &:itself shorthand needs Ruby 2.2 or later.
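Not part of the answer above, but worth knowing if you are on Ruby 2.7 or later: Enumerable#tally does the group-and-count step in a single call. A minimal sketch using the same sample values:

```ruby
a = [1, 2, 2, 3, 3, 3, 4]

# tally (Ruby 2.7+) returns a hash mapping each distinct element to its count.
counts = a.tally

# The group_by version from above, for comparison.
grouped = a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h

puts counts.inspect
puts counts == grouped
```

On older Rubies tally is not available, so the group_by form is the one to use there.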
How about the following:
things = [1, 2, 2, 3, 3, 3, 4]
things.uniq.map{|t| [t,things.count(t)]}.to_h
It sort of feels cleaner and more descriptive of what we're actually trying to do.
I suspect it would also perform better with large collections than the ones that iterate over each value.
Benchmark Performance test:
a = (1...1000000).map { rand(100)}
                       user     system      total        real
inject             7.670000   0.010000   7.680000 (  7.985289)
array count        0.040000   0.000000   0.040000 (  0.036650)
each_with_object   0.210000   0.000000   0.210000 (  0.214731)
group_by           0.220000   0.000000   0.220000 (  0.218581)
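So array count is quite a bit faster in this test. Keep in mind that the test data has only 100 distinct values: things.count(t) rescans the whole array once per distinct value, so its cost grows with the number of distinct values.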
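The benchmark code itself is not shown, so here is a guess at what such a harness could look like, using Benchmark.bm from the standard library. The labels match the table, but the array size is reduced from the original to keep the run short, and the lambdas are my reading of the approaches in this thread, so treat the whole thing as an assumption and expect different timings.

```ruby
require 'benchmark'

# Assumption: smaller than the original (1...1000000) so the sketch runs quickly.
a = (1...100_000).map { rand(100) }

# One lambda per approach discussed in the answers.
approaches = {
  'inject'           => -> { a.inject(Hash.new(0)) { |h, i| h[i] += 1; h } },
  'array count'      => -> { a.uniq.map { |t| [t, a.count(t)] }.to_h },
  'each_with_object' => -> { a.each_with_object(Hash.new(0)) { |i, h| h[i] += 1 } },
  'group_by'         => -> { a.group_by(&:itself).map { |k, v| [k, v.count] }.to_h }
}

# Keep each result so the approaches can be checked against each other.
results = {}
Benchmark.bm(18) do |x|
  approaches.each do |label, code|
    x.report(label) { results[label] = code.call }
  end
end
```

To see how the number of distinct values affects the ranking, try changing rand(100) to something much larger and run it again.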
Personally I would do it this way:
# myprogram.rb
a = ['FATAL <error title="Request timed out.">',
'FATAL <error title="Request timed out.">',
'FATAL <error title="There is insufficient system memory to run this query.">']
puts a
Then run the program and pipe it to uniq -c:
ruby myprogram.rb | uniq -c
2 FATAL <error title="Request timed out.">
1 FATAL <error title="There is insufficient system memory to run this query.">
a = [1,1,1,2,2,3]
a.uniq.inject([]){|r, i| r << { :error => i, :count => a.select{ |b| b == i }.size } }
=> [{:error=>1, :count=>3}, {:error=>2, :count=>2}, {:error=>3, :count=>1}]
From Ruby 2.2 you can use itself, and from Ruby 2.4 transform_values:
array.group_by(&:itself).transform_values(&:count)
With some more detail:
array = [
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="Request timed out.">',
  'FATAL <error title="There is insufficient system memory to run this query.">'
]
array.group_by(&:itself).transform_values(&:count)
=> { "FATAL <error title=\"Request timed out.\">"=>2,
     "FATAL <error title=\"There is insufficient system memory to run this query.\">"=>1 }
If you want to use this often, I suggest doing this:
# lib/core_extensions/array/duplicates_counter
module CoreExtensions
  module Array
    module DuplicatesCounter
      # Returns a hash of element => count, most frequent first.
      def count_duplicates
        each_with_object(Hash.new(0)) { |element, counter| counter[element] += 1 }.sort_by { |k, v| -v }.to_h
      end
    end
  end
end
Load it with
Array.include CoreExtensions::Array::DuplicatesCounter
And then use it from anywhere with just:
the_ar = %w(a a a a a a a chao chao chao hola hola mundo hola chao cachacho hola)
the_ar.count_duplicates
# {
#   "a" => 7,
#   "chao" => 4,
#   "hola" => 4,
#   "mundo" => 1,
#   "cachacho" => 1
# }
Simple implementation:
(errors_hash = {}).default = 0
array_of_errors.each { |error| errors_hash[error] += 1 }
Here is the sample array:
a = ["aa", "bb", "cc", "bb", "bb", "cc"]
- Select all the unique keys.
- For each key, we'll accumulate them into a hash to get something like this:
{'bb' => ['bb', 'bb']}
res = a.uniq.inject({}) {|accu, uni| accu.merge({ uni => a.select{|i| i == uni } })}
{"aa"=>["aa"], "bb"=>["bb", "bb", "bb"], "cc"=>["cc", "cc"]}
Now you are able to do things like: