I have 2 array of hashes:
actual = [{"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"},
{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"},
{"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}]
expected = [{"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"NUMBER"},
{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"NUMBER"},
{"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}]
I need to compare these 2 hashes and find out the ones for which the column_data_type
differs.
to compare we can directly use:
diff = actual - expected
This will print the output as:
{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"}
{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"}
My expected output is that in the result i want to print the actual and expected datatype, means the datatypes for the missing `column_name' from both the actual and expected array of hashes, something like:
{"column_name"=>"NONINTERESTEXPENSE", "expected_column_data_type"=>"NUMBER", "actual_column_data_type" => "VARCHAR"}
{"column_name"=>"TRANSACTIONDATE", "expected_column_data_type"=>"NUMBER","actual_column_data_type" => "TIMESTAMP" }
(expected - actual).
concat(actual - expected).
group_by { |column| column['column_name'] }.
map do |name, (expected, actual)|
{
'column_name' => name,
'expected_column_data_type' => expected['column_data_type'],
'actual_column_data_type' => actual['column_data_type'],
}
end
This will work irrespective of order of hashes in your array.
diff = []
expected.each do |elem|
column_name = elem['column_name']
column_type = elem['column_data_type']
match = actual.detect { |elem2| elem2['column_name'] == column_name }
if column_type != match['column_data_type']
diff << { 'column_name' => column_name,
'expected_column_data_type' => column_type,
'actual_column_data_type' => match['column_data_type'] }
end
end
p diff
[actual, expected].map { |a| a.map(&:dup).map(&:values) }
.map(&Hash.method(:[]))
.reduce do |actual, expected|
actual.merge(expected) do |k, o, n|
o == n ? nil : {name: k, actual: o, expected: n}
end
end.values.compact
#⇒ [
# [0] {
# :name => "NONINTERESTEXPENSE",
# :actual => "VARCHAR",
# :expected => "NUMBER"
# },
# [1] {
# :name => "TRANSACTIONDATE",
# :actual => "TIMESTAMP",
# :expected => "NUMBER"
# }
# ]
The method above easily expandable to merge N arrays (use reduce.with_index
and merge
with key "value_from_#{idx}"
.)
What about this?
def select(hashes_array, column_name)
hashes_array.select { |h| h["column_name"] == column_name }.first
end
diff = (expected - actual).map do |h|
{
"column_name" => h["column_name"],
"expected_column_data_type" => select(expected, h["column_name"])["column_data_type"],
"actual_column_data_type" => select(actual, h["column_name"])["column_data_type"],
}
end
PS: surely this code can be improved to look more elegant
Code
def convert(actual, expected)
hashify(actual-expected, "actual_data_type").
merge(hashify(expected-actual, "expected_data_type")) { |_,a,e| a.merge(e) }.values
end
def hashify(arr, key)
arr.each_with_object({}) { |g,h| h[g["column_name"]] =
{ "column_name"=>g["column_name"], key=>g["column_data_type"] } }
end
Example
actual = [
{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"},
{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"},
{"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
{"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}
]
expected = [
{"column_name"=>"NONINTERESTINCOME", "column_data_type"=>"NUMBER"},
{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"NUMBER"},
{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"NUMBER"},
{"column_name"=>"UPDATEDATE", "column_data_type"=>"TIMESTAMP"}
]
convert(actual, expected)
#=> [{"column_name"=>"TRANSACTIONDATE",
# "actual_data_type"=>"TIMESTAMP", "expected_data_type"=>"NUMBER"},
# {"column_name"=>"NONINTERESTEXPENSE",
# "actual_data_type"=>"VARCHAR", "expected_data_type"=>"NUMBER"}]
Explanation
For the example above the steps are as follows.
First hashify
actual
and expected
.
f = actual-expected
#=> [{"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"TIMESTAMP"},
# {"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"VARCHAR"}]
g = hashify(f, "actual_data_type")
#=> {"TRANSACTIONDATE"=>{"column_name"=>"TRANSACTIONDATE",
# "actual_data_type"=>"TIMESTAMP"},
# "NONINTERESTEXPENSE"=>{ "column_name"=>"NONINTERESTEXPENSE",
# "actual_data_type"=>"VARCHAR"}}
h = expected-actual
#=> [{"column_name"=>"NONINTERESTEXPENSE", "column_data_type"=>"NUMBER"},
# {"column_name"=>"TRANSACTIONDATE", "column_data_type"=>"NUMBER"}]
i = hashify(h, "expected_data_type")
#=> {"NONINTERESTEXPENSE"=>{"column_name"=>"NONINTERESTEXPENSE",
# "expected_data_type"=>"NUMBER"},
# "TRANSACTIONDATE"=>{"column_name"=>"TRANSACTIONDATE",
# "expected_data_type"=>"NUMBER"}}
Next merge g
and i
using the form of Hash#merge that employs a block to determine the values of keys that are present in both hashes being merged. See the doc for the definitions of the three block variables (the first of which, the common key, I've represented by an underscore to signify that it is not used in the block calculation).
j = g.merge(i) { |_,a,e| a.merge(e) }
#=> {"TRANSACTIONDATE"=>{"column_name"=>"TRANSACTIONDATE",
# "actual_data_type"=>"TIMESTAMP", "expected_data_type"=>"NUMBER"},
# "NONINTERESTEXPENSE"=>{"column_name"=>"NONINTERESTEXPENSE",
# "actual_data_type"=>"VARCHAR", "expected_data_type"=>"NUMBER"}}
Lastly, drop the keys.
k = j.values
#=> [{"column_name"=>"TRANSACTIONDATE", "actual_data_type"=>"TIMESTAMP",
# "expected_data_type"=>"NUMBER"},
# {"column_name"=>"NONINTERESTEXPENSE", "actual_data_type"=>"VARCHAR",
# "expected_data_type"=>"NUMBER"}]