I have a hash as follows:
my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);
(The structure is actually more complex than that; I'm simplifying it here.)
I wish to sort "globally" on timestamp and then on the keys of the inner hashes (one, two, three, adam). But the keys of the inner hashes are dynamic; I don't know what they are going to be until the data is read from files.
I want the sorted output of the above hash to be:
00:09:30,C3,adam
00:09:30,B2,two
00:12:30,B2,one
00:13:45,C3,three
I've looked at many questions/answers regarding sorting hashes by keys and/or values, but I haven't been able to figure it out when the key names are not known ahead of time. (Or maybe I'm just not understanding it.)
What I'm doing for now is two steps.
Flattening the hash into an array:
my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}
And then doing the sort:
for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}
I'm wondering if there is a more concise, elegant, efficient way of doing this?
This type of question might be better suited to the Programmers Stack Exchange site or the Code Review one. Since it is asking about implementation, I think it's fine to ask here. The sites tend to have some overlap.
As @DondiMichaelStroma pointed out, and as you already know, your code works great! However, there is more than one way to do it. For me, if this was in a small script, I would probably leave it as is and move on to the next part of the project. If this was in a more professional code base, I would make some changes.
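For comparison, if concision is the main goal, the flatten and sort steps can be folded into one expression with nested maps. This is only a sketch of that idea using the sample data from the question; the variable names @rows and $outer are mine, and it does the same work as the two-step version rather than anything cleverer.

```perl
use strict;
use warnings;

my %data = (
    B2 => { one   => { timestamp => '00:12:30' }, two  => { timestamp => '00:09:30' } },
    C3 => { three => { timestamp => '00:13:45' }, adam => { timestamp => '00:09:30' } },
);

# Flatten to [timestamp, outer key, inner key] rows, then sort by
# timestamp and break ties on the inner key.
my @rows =
    sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] }
    map {
        my $outer = $_;
        map { [ $data{$outer}{$_}{timestamp}, $outer, $_ ] } keys %{ $data{$outer} };
    } keys %data;

print join( ',', @$_ ), "\n" for @rows;
```

Whether this reads better than the explicit loops is a matter of taste; it is shorter, but the nesting takes a moment to untangle.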
For me, when writing for a professional code base, I try to keep a few things in mind.
- Readability
- Efficiency when it matters
- Not gold-plating it
- Unit Testing
So let's take a look at your code:
my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);
The way data is defined is excellent and nicely formatted. This may not be how %data is built in your code, but maybe a unit test would have a hash like that.
my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}
for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}
The variable names could be more descriptive, and the @flattened array has some redundant data in it. Printing it with Data::Dumper, you can see we have C3 and B2 in multiple places.
$VAR1 = [
          '00:13:45',
          'C3',
          'three'
        ];
$VAR2 = [
          '00:09:30',
          'C3',
          'adam'
        ];
$VAR3 = [
          '00:12:30',
          'B2',
          'one'
        ];
$VAR4 = [
          '00:09:30',
          'B2',
          'two'
        ];
Maybe this isn't a big deal, or maybe you want to keep the functionality of getting all the data under the key B2.
Here's another way we could store that data:
my %flattened = (
    'B2' => [['one',  '00:12:30'],
             ['two',  '00:09:30']],
    'C3' => [['three','00:13:45'],
             ['adam', '00:09:30']]
);
It may make the sorting more complicated, but it makes the data structure simpler! Maybe this is getting closer to gold-plating, or maybe you'd benefit from this data structure in another part of the code. My preference is to keep data structures simple, and add extra code if needed when processing them. If you decide you need to dump %flattened to a log file, you might appreciate not seeing duplicate data.
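As a small illustration of what this layout buys you, here is a sketch of pulling out everything stored under one outer key. It assumes the [name, timestamp] pair order shown above; @b2_entries is just an illustrative name.

```perl
use strict;
use warnings;

my %flattened = (
    'B2' => [ [ 'one',   '00:12:30' ], [ 'two',  '00:09:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

# All entries recorded under 'B2', earliest timestamp first.
# Each entry is a [name, timestamp] pair, so index 1 is the timestamp.
my @b2_entries = sort { $a->[1] cmp $b->[1] } @{ $flattened{B2} };

printf "%s at %s\n", @$_ for @b2_entries;
```

With the fully flattened array you would have to grep through every row to get the same answer.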
Implementation
Design: I think we want to keep this as two distinct operations. This will help code clarity and we could test each function individually. The first function would convert between the data formats we want to use, and the second function would sort the data. These functions should be in a Perl module, and we can use Test::More to do the unit testing. I don't know where we are calling these functions from, so let's pretend we are calling them from main.pl, and we can put the functions in a module called Helper.pm. These names should be more descriptive, but again I'm not sure what the application is here! Great names lead to readable code.
main.pl
This is what main.pl could look like. Even though there are no comments, the descriptive names can make it self-documenting. These names could still be improved too!
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);
my %data = populate_data();
my @sorted_data = @{ sort_by_times_then_names( convert_to_simple_format( \%data ) ) };
print Dumper(@sorted_data);
Utilities/Helper.pm
Is this readable and elegant? I think it could use some improvements. More descriptive variable names would help in this module as well. However, it is easily testable, and keeps our main code clean and data structures simple.
package Utilities::Helper;
use strict;
use warnings;
use Exporter qw(import);
our @EXPORT_OK = qw(sort_by_times_then_names convert_to_simple_format);
# We could put a comment here explaining the expected input and output formats.
sub sort_by_times_then_names {
    my ( $data_ref ) = @_;
    # Here we use a map/sort chain in the style of the Schwartzian Transform.
    # Normally, we would just be sorting an array. But here we
    # are converting the hash into an array and then sorting it.
    # Maybe that should be broken up into two steps to make it more clear!
    #my @sorted = map { $_ } we don't actually need this map
    my @sorted = sort {
            $a->[2] cmp $b->[2]        # sort by timestamp
            ||
            $a->[1] cmp $b->[1]        # then sort by name
        }
        map { my $outer_key=$_;        # convert $data_ref to an array of arrays
            map {                      # first element is the outer_key
                [$outer_key, @{$_}]    # second element is the name
            }                          # third element is the timestamp
            @{$data_ref->{$_}}
        }
        keys %{$data_ref};
    # If you want the elements in a different order in the array,
    # you could modify the above code or change it when you print it.
    return \@sorted;
}
# We could put a comment here explaining the expected input and output formats.
sub convert_to_simple_format {
    my ( $data_ref ) = @_;
    my %reformatted_data;
    # $outer_key and $inner_key could be renamed to more accurately describe the data they represent.
    # Are they names? IDs? Places? License plate numbers?
    # Maybe we want to keep it generic so this function can handle different kinds of data.
    # I still like the idea of using nested for loops for this logic, because it is clear and intuitive.
    for my $outer_key ( keys %{$data_ref} ) {
        for my $inner_key ( keys %{$data_ref->{$outer_key}} ) {
            push @{$reformatted_data{$outer_key}},
                [$inner_key, $data_ref->{$outer_key}{$inner_key}{timestamp}];
        }
    }
    return \%reformatted_data;
}
1;
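For reference, the full Schwartzian Transform has three stages: a map that pairs each item with a precomputed sort key, a sort on that key, and a final map that strips the key off again. The sketch below shows that shape on rows like the ones sort_by_times_then_names returns. It assumes the timestamps are HH:MM:SS strings, and to_seconds is a hypothetical helper of mine, not part of Helper.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: convert 'HH:MM:SS' to a number of seconds.
sub to_seconds {
    my ($timestamp) = @_;
    my ( $hours, $minutes, $seconds ) = split /:/, $timestamp;
    return $hours * 3600 + $minutes * 60 + $seconds;
}

# Rows in [outer key, name, timestamp] order.
my @rows = (
    [ 'B2', 'one',   '00:12:30' ],
    [ 'B2', 'two',   '00:09:30' ],
    [ 'C3', 'three', '00:13:45' ],
    [ 'C3', 'adam',  '00:09:30' ],
);

# Read the chain bottom-up:
#   1. pair each row with its key: [seconds, row]
#   2. sort on the cached key, breaking ties on the name
#   3. unwrap the original row
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] || $a->[1][1] cmp $b->[1][1] }
    map  { [ to_seconds( $_->[2] ), $_ ] }
    @rows;

print join( ',', @$_ ), "\n" for @sorted;
```

The payoff is that the key is computed once per row instead of once per comparison. For zero-padded timestamps a plain cmp already sorts correctly, so this is only worth it when the key is expensive to compute.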
run_unit_tests.pl
Finally, let's implement some unit testing. This might be more than you were looking for with this question, but I think clean seams for testing are part of elegant code and I want to demonstrate that. Test::More is really great for this. I'll even throw in a test harness and formatter so we can get some elegant output. You can use TAP::Formatter::Console if you don't have TAP::Formatter::JUnit installed.
#!/usr/bin/env perl
use strict;
use warnings;
use TAP::Harness;
my $harness = TAP::Harness->new({
    formatter_class => 'TAP::Formatter::JUnit',
    merge           => 1,
    verbosity       => 1,
    normalize       => 1,
    color           => 1,
    timer           => 1,
});
$harness->runtests('t/helper.t');
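If TAP::Formatter::JUnit isn't installed, a variant along these lines should work with the default formatter, which is TAP::Formatter::Console when the output goes to a terminal. Treat it as a sketch: the lib setting is my assumption about the directory layout (the script is run from the directory that contains Utilities/ and t/), and the file check is only there so the script does nothing if that assumption is wrong.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use TAP::Harness;

# No formatter_class is given, so TAP::Harness picks its default formatter.
# lib => ['.'] is an assumption about the layout: it lets t/helper.t find
# Utilities/Helper.pm when this script is run from the project root.
my $harness = TAP::Harness->new({
    lib       => ['.'],
    verbosity => 1,
    color     => 1,
    timer     => 1,
});

# Only run the tests if the file is where we assume it is.
$harness->runtests('t/helper.t') if -e 't/helper.t';
```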
t/helper.t
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);
my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);
my %formatted_data = %{ convert_to_simple_format( \%data ) };
my %expected_formatted_data = (
    'B2' => [['one',  '00:12:30'],
             ['two',  '00:09:30']],
    'C3' => [['three','00:13:45'],
             ['adam', '00:09:30']]
);
is_deeply(\%formatted_data, \%expected_formatted_data, "convert_to_simple_format test");
my @sorted_data = @{ sort_by_times_then_names( \%formatted_data ) };
my @expected_sorted_data = ( ['C3','adam', '00:09:30'],
                             ['B2','two',  '00:09:30'],
                             ['B2','one',  '00:12:30'],
                             ['C3','thee', '00:13:45'] # intentional typo to demonstrate the failure output
                           );
is_deeply(\@sorted_data, \@expected_sorted_data, "sort_by_times_then_names test");
done_testing;
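One caveat with the first test: keys returns hash keys in no guaranteed order, so the pairs inside each list built by convert_to_simple_format can come back in a different order from run to run, and is_deeply compares arrays position by position. A sketch of one way around that is to normalize the order before comparing. The helper name with_sorted_entries is mine, and %formatted_data below is a hand-written stand-in for the function's output so the example runs on its own.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: sort each inner list by name so the comparison
# does not depend on the order in which keys() returned the names.
sub with_sorted_entries {
    my ($data_ref) = @_;
    my %normalized;
    for my $outer_key ( keys %{$data_ref} ) {
        $normalized{$outer_key} =
            [ sort { $a->[0] cmp $b->[0] } @{ $data_ref->{$outer_key} } ];
    }
    return \%normalized;
}

# Stand-in for the output of convert_to_simple_format, in an arbitrary order.
my %formatted_data = (
    'B2' => [ [ 'two',   '00:09:30' ], [ 'one',  '00:12:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

# Expected data, written with each list already sorted by name.
my %expected_formatted_data = (
    'B2' => [ [ 'one',  '00:12:30' ], [ 'two',   '00:09:30' ] ],
    'C3' => [ [ 'adam', '00:09:30' ], [ 'three', '00:13:45' ] ],
);

is_deeply(
    with_sorted_entries( \%formatted_data ),
    \%expected_formatted_data,
    'convert_to_simple_format test, ignoring entry order'
);

done_testing;
```

The second test doesn't need this, because sort_by_times_then_names fully determines the order of its output for this data.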
Test Output
The nice thing about testing this way is that it will tell you what is wrong when a test fails.
<testsuites>
<testsuite failures="1"
errors="1"
time="0.0478239059448242"
tests="2"
name="helper_t">
<testcase time="0.0452120304107666"
name="1 - convert_to_simple_format test"></testcase>
<testcase time="0.000266075134277344"
name="2 - sort_by_times_then_names test">
<failure type="TestFailed"
message="not ok 2 - sort_by_times_then_names test"><![CDATA[not o
k 2 - sort_by_times_then_names test
# Failed test 'sort_by_times_then_names test'
# at t/helper.t line 45.
# Structures begin differing at:
# $got->[3][1] = 'three'
# $expected->[3][1] = 'thee']]></failure>
</testcase>
<testcase time="0.00154280662536621" name="(teardown)" />
<system-out><![CDATA[ok 1 - convert_to_simple_format test
not ok 2 - sort_by_times_then_names test
# Failed test 'sort_by_times_then_names test'
# at t/helper.t line 45.
# Structures begin differing at:
# $got->[3][1] = 'three'
# $expected->[3][1] = 'thee'
1..2
]]></system-out>
<system-err><![CDATA[Dubious, test returned 1 (wstat 256, 0x100)
]]></system-err>
<error message="Dubious, test returned 1 (wstat 256, 0x100)" />
</testsuite>
</testsuites>
In summary, I prefer readable and clear over concise. Sometimes you can make less efficient code that is easier to write and logically simpler. Putting ugly code inside functions is a great way to hide it! It isn't worth messing around with code to save 15ms when you run it. If your data set is large enough that performance becomes an issue, Perl might not be the right tool for the job. If you are really looking for some concise code, post a challenge over at the Code Golf Stack Exchange.