A few days ago, one of the Perl Maven readers asked me about indexing in arrays in Perl, and then he was wondering if arrays that only have elements in some high index are sparse arrays? In other words if we have an array with a single value at index 1,000,000 does Perl allocate space for all the preceding 1,000,000 elements, or is does it take up place for the single element in the array.

In yet other words, how much memory does this use?

my @a;
$a[1_000_000] = 1;

We could read the documentation and probably find the answer somewhere, or we could do a little experiment.

In another article we have already used Devel::Size to check the memory use of Perl variables, let's use the same tool to check the arrays:

examples/memory_of_sparse_arrays.pl

use strict;
use warnings;
use 5.010;

use Devel::Size qw(size total_size);

my @a;
both('ARRAY', \@a);            #   64   64

my @b;
$b[0] = 1;
both('ARRAY-0', \@b);          #   96   120

my @c;
$c[100] = 1;
both('ARRAY-100', \@c);        #   872   896

my @d;
$d[1000] = 1;
both('ARRAY-1,000', \@d);      #  8072  8096

my @e;
$e[1_000_000] = 1;
both('ARRAY-1,000,000', \@e);  # 8000072 8000096


sub both {
    my ($name, $ref) = @_;
    printf "%-25s %5d %5d\n", $name, size($ref), total_size($ref);
}

As you can see the memory usage of the array growth as the index of the single element in the array growth. So the arrays in Perl don't have any special memory-saving algorithm for sparse arrays.

If we need to hold a few values in a data structure with various random large indexes, we can use a hash instead. The size of the hash only growth as the length of the key. So the key 1,000,000 that has 7 characters in it will take up 6 bytes more than the key 0 which has 1 character in it.

examples/memory_of_sparse_hash.pl

use strict;
use warnings;
use 5.010;

use Devel::Size qw(size total_size);

my %a;
both('HASH', \%a);             #   120   120

my %b;
$b{0} = 1;
both('HASH-0', \%b);           #   179   203

my %c;
$c{100} = 1;
both('HASH-100', \%c);         #  181   205


my %e;
$e{1_000_000} = 1;
both('HASH-1,000,000', \%e);   #  185   209

sub both {
    my ($name, $ref) = @_;
    printf "%-25s %5d %5d\n", $name, size($ref), total_size($ref);
}

The drawback of using hash is that we can't use array-operations on it such as push, pop, etc. and that the values are not sorted. We can fetch the keys of the hash, but they will return in random order.

We can of course use the spaceship operator to sort the keys numerically:

examples/sparse_hash.pl

use strict;
use warnings;
use 5.010;

my %h = (
   23 => 'twenty three',
   42 => 'forty two',
   1_000 => 'thousand',
   1_000_000 => 'million',
);

for my $k (keys %h) {
    say $k;
}

say '';
for my $k (sort {$a <=> $b} keys %h) {
    say $k;
}

The result is this:

42
23
1000
1000000

23
42
1000
1000000

You might also use a sorted hash in Perl using Tie::IxHash, though I don't remember ever really needing it.