Hash of Arrays in Perl
The values of a hash can be any scalar, including references to arrays.
For example, what if you have a bunch of people and each person has a list of scores?
Another interesting example would be a bunch of people, each person belonging to 1 or more groups.
How would you store these in a file, and how would you store them in memory?
Hash of Arrays - for one directional data
In the first example we have a bunch of people and each person has a list of scores. I call it one-directional because usually the only interesting operation is looking up the scores of one person. There is not much interest in finding all the people with a given score. (Although it might be interesting to find people in score ranges.)
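As a rough sketch of the score-range idea, assume the scores are already in a hash of array references (building one from a file comes next). The inline data and the `$low` and `$high` bounds are assumptions made for this illustration only.

```perl
use strict;
use warnings;

# Inline sample data, so the snippet runs on its own.
my %scores_of = (
    Mary   => [23, 17, 99, 11],
    Joe    => [74, 29],
    Barbie => [99, 97, 28],
);

my ($low, $high) = (90, 100);   # hypothetical range, inclusive

# Collect the names that have at least one score inside the range.
my @matches;
for my $name (sort keys %scores_of) {
    # grep in scalar context returns the number of matching scores
    my $hits = grep { $_ >= $low && $_ <= $high } @{ $scores_of{$name} };
    push @matches, $name if $hits;
}

print "@matches\n";   # Barbie Mary
```

This has to visit every person, which is why the structure is a poor fit if such reverse lookups are frequent.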
Anyway, one can store this data in a file in many ways, for example like this:
examples/data/name_score.txt
Mary:23,17,99,11
Joe:74,29
Barbie:99,97,28
On purpose it is not a real CSV file. That would be too easy.
Each line has a name followed by a colon and then a comma-separated list of numbers. Each line can have a different number of values.
We can write a script like this to read in the data.
examples/names_and_scores.pl
use 5.010;
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $filename = shift || 'examples/data/name_score.txt';

my %scores_of;

open my $fh, '<', $filename or die "Could not open '$filename': $!";
while (my $line = <$fh>) {
    chomp $line;
    # Split the line into the name and the string of scores
    my ($name, $scores_str) = split /:/, $line;
    my @scores = split /,/, $scores_str;
    # Store a reference to this line's array under the name
    $scores_of{ $name } = \@scores;
}

print Dumper \%scores_of;

say '-------------';

# Go over the scores of a single person
my $name = 'Mary';
for my $score (@{ $scores_of{ $name } }) {
    say $score;
}
We read the file line by line, first splitting each line in two at the colon, and then splitting the scores into as many pieces as there are values in the given line.
%scores_of is a hash of arrays, or more precisely, a hash of array references.
The backslash \ in front of the @ character returns a reference to the array.
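A small sketch of what taking a reference means, using made-up values: the backslash gives a reference to the very same array, while square brackets around the array create a reference to a new anonymous copy. The script above can safely use the backslash because `my @scores` is declared inside the loop, so every iteration gets a fresh array.

```perl
use strict;
use warnings;

my @scores = (23, 17);
my %scores_of;

$scores_of{Mary} = \@scores;      # reference to the very same array
$scores_of{Joe}  = [ @scores ];   # reference to a new anonymous copy

push @scores, 99;                 # also visible through Mary's reference

print scalar @{ $scores_of{Mary} }, "\n";   # 3
print scalar @{ $scores_of{Joe} },  "\n";   # 2
```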
The call to Dumper shows what we have in the hash. After that there is a small example showing how to go over the values of a single person.
The output of the above script will look like this (the order of the keys in the dump may differ, as hashes are not ordered):
examples/names_and_scores.txt
$VAR1 = {
          'Joe' => [
                     '74',
                     '29'
                   ],
          'Mary' => [
                      '23',
                      '17',
                      '99',
                      '11'
                    ],
          'Barbie' => [
                        '99',
                        '97',
                        '28'
                      ]
        };
-------------
23
17
99
11
You might want to read about how to dereference a reference to a hash or to an array in Perl, and about array references.
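For quick orientation, here is a minimal sketch of the most common ways to reach into a hash of array references. The data is an inline copy of the sample file, and the variable names `$first`, `$count`, and `@copy` are picked only for this illustration.

```perl
use strict;
use warnings;

# Inline copy of the sample data, so the snippet runs on its own.
my %scores_of = (
    Mary   => [23, 17, 99, 11],
    Joe    => [74, 29],
    Barbie => [99, 97, 28],
);

my $first = $scores_of{Mary}[0];            # a single element: 23
my $count = scalar @{ $scores_of{Joe} };    # the number of scores: 2
my @copy  = @{ $scores_of{Barbie} };        # the whole array, copied

# Visit every person; the keys are sorted to get a predictable order.
for my $name (sort keys %scores_of) {
    print "$name: @{ $scores_of{$name} }\n";
}
```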
Hash of Arrays - two directional data
In the second example we have a bunch of people, each person belonging to 1 or more groups. This is slightly different from the previous one, as in this case I can easily imagine two different ways to look at the data:
- Getting all the groups a person belongs to
- Getting all the people who belong to a group
In a file this data can be stored in a similar way to what we had in the first example, but we'll have a strange feeling that we are duplicating a lot of data. For example, we'll have this:
examples/data/name_group.txt
Mary:Mathematics,Chemistry,Sport,Physics,Spanish
Joe:Sport,Theatre
Barbie:Mathematics,Chemistry,Physics,Hungarian,Hebrew
In the previous example we would not complain even if several people had the same score. Here, on the other hand, we would probably protest the fact that we repeat group names several times.
Unfortunately, in a plain text file we don't have a lot of other options.
In a relational database (the kind you query using SQL), this would probably be represented using 3 tables. A table with all the names:
examples/data/name_group/people.txt
id,name
1,Mary
2,Joe
3,Barbie
A table with all the groups (or subjects):
examples/data/name_group/groups.txt
id,group
1,Mathematics
2,Chemistry
3,Sport
4,Physics
5,Spanish
6,Theatre
7,Hungarian
8,Hebrew
Each one of the tables would have two columns. One for the actual value and one for a unique ID.
Then we would have a third table mapping between the two tables.
examples/data/name_group/name_groups.txt
name,group
1,1
1,2
1,3
1,4
1,5
2,3
2,6
3,1
3,2
3,4
3,7
3,8
Now if we wanted to list all the groups of a person we could look it up in the database.
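To make the three-table lookup concrete without an actual database, here is a rough sketch that holds the same three tables in plain Perl structures. The names `%name_of`, `%group_of`, `@name_groups`, and the helper `groups_of_person` are all hypothetical, invented for this illustration.

```perl
use strict;
use warnings;

# The three tables from above, held inline instead of in a real database.
my %name_of  = (1 => 'Mary', 2 => 'Joe', 3 => 'Barbie');
my %group_of = (
    1 => 'Mathematics', 2 => 'Chemistry', 3 => 'Sport',     4 => 'Physics',
    5 => 'Spanish',     6 => 'Theatre',   7 => 'Hungarian', 8 => 'Hebrew',
);
# Each pair is [person id, group id], as in the mapping table.
my @name_groups = (
    [1, 1], [1, 2], [1, 3], [1, 4], [1, 5],
    [2, 3], [2, 6],
    [3, 1], [3, 2], [3, 4], [3, 7], [3, 8],
);

# Hypothetical helper: list the group names of one person, given the name.
sub groups_of_person {
    my ($person) = @_;
    # Find the id that belongs to the name
    my ($id) = grep { $name_of{$_} eq $person } keys %name_of;
    return () if not defined $id;
    # Pick the mapping rows of that id and translate the group ids to names
    return map { $group_of{ $_->[1] } } grep { $_->[0] == $id } @name_groups;
}

print join(', ', groups_of_person('Joe')), "\n";   # Sport, Theatre
```

Every group name is stored only once here, at the price of an extra lookup step through the mapping table.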
However, we are using flat-file storage, and our question was how to represent this in the memory of our Perl program.
One way would be to use an in-memory SQL database, but that's a different story.
If we would like to represent this with plain Perl data structures, we can't do that without lots of repetition. Normally, unless we have a lot of data, this should not be a problem. (If we have too much data we might run out of memory because of the repetitions.)
examples/names_and_groups.pl
use 5.010;
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $filename = shift || 'examples/data/name_group.txt';

my %groups_of;
my %members_of;

open my $fh, '<', $filename or die "Could not open '$filename': $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $groups_str) = split /:/, $line;
    my @groups = split /,/, $groups_str;
    # Person => groups, the direction the file is laid out in
    $groups_of{ $name } = \@groups;
    # Group => people, the reverse direction
    for my $group (@groups) {
        push @{ $members_of{$group} }, $name;
    }
}

print Dumper \%groups_of;
say '-------------';
print Dumper \%members_of;
We create two hashes to allow for the lookup in both directions.
To fill the %groups_of hash we use the same code as we had earlier. That's the easier part, as the data in the data file was laid out that way.
Filling the %members_of hash needs an inner for-loop that goes over all the groups of the current person and adds the person to the right group, relying on autovivification to create the array references where necessary.
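A minimal sketch of autovivification on its own, using made-up entries: dereferencing a hash value that does not exist yet in a `push` makes Perl create the array reference for us. The `Theatre` part spells out the explicit equivalent for comparison.

```perl
use strict;
use warnings;

my %members_of;

# No 'Sport' key exists yet; push creates the array reference on demand.
push @{ $members_of{Sport} }, 'Mary';
push @{ $members_of{Sport} }, 'Joe';

# The explicit equivalent of what autovivification does for us:
$members_of{Theatre} = [] if not exists $members_of{Theatre};
push @{ $members_of{Theatre} }, 'Joe';

print "Sport: @{ $members_of{Sport} }\n";       # Sport: Mary Joe
print "Theatre: @{ $members_of{Theatre} }\n";   # Theatre: Joe
```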
The output of this script looks like this:
examples/names_and_groups.txt
$VAR1 = {
          'Mary' => [
                      'Mathematics',
                      'Chemistry',
                      'Sport',
                      'Physics',
                      'Spanish'
                    ],
          'Barbie' => [
                        'Mathematics',
                        'Chemistry',
                        'Physics',
                        'Hungarian',
                        'Hebrew'
                      ],
          'Joe' => [
                     'Sport',
                     'Theatre'
                   ]
        };
-------------
$VAR1 = {
          'Hungarian' => [
                           'Barbie'
                         ],
          'Mathematics' => [
                             'Mary',
                             'Barbie'
                           ],
          'Theatre' => [
                         'Joe'
                       ],
          'Physics' => [
                         'Mary',
                         'Barbie'
                       ],
          'Sport' => [
                       'Mary',
                       'Joe'
                     ],
          'Hebrew' => [
                        'Barbie'
                      ],
          'Chemistry' => [
                           'Mary',
                           'Barbie'
                         ],
          'Spanish' => [
                         'Mary'
                       ]
        };
Of course you don't have to have both hashes, only the one that you will really use. I just wanted to show both of them in a single example.
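If only one of the hashes is kept, the other can still be derived from it later. The following is a sketch under the assumption that %groups_of is already filled; the inline data is a shortened, made-up subset of the sample file.

```perl
use strict;
use warnings;

# Shortened inline data, so the snippet runs on its own.
my %groups_of = (
    Mary => ['Mathematics', 'Sport'],
    Joe  => ['Sport', 'Theatre'],
);

# Invert the mapping: for every person, add the name to each of their groups.
my %members_of;
for my $name (sort keys %groups_of) {
    for my $group (@{ $groups_of{$name} }) {
        push @{ $members_of{$group} }, $name;
    }
}

for my $group (sort keys %members_of) {
    print "$group: @{ $members_of{$group} }\n";
}
```

Sorting the names first means the members of each group come out in alphabetical order instead of depending on the internal order of the hash.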
Published on 2019-04-16