What is autovivification?
Autovivification is both a wonderful blessing and a curse in Perl. It eliminates a lot of the code otherwise required to initialize deep data structures, but if you come from a very strict world, it can freak you out at first.
Or it can make you wonder how you ever lived without it.
The word autovivification itself comes from the word vivify, which means "to bring to life".
The simple cases for hashes
The simplest form of autovivification is when you have a hash and you set the value of a key that did not exist before.
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %phone_of;
print Dumper \%phone_of;

$phone_of{Foo} = '123-456';   # the key 'Foo' is created here
print Dumper \%phone_of;
$VAR1 = {};
$VAR1 = {
          'Foo' => '123-456'
        };
As a Perl programmer, you are probably no longer surprised by this.
The same happens when you use a scalar variable as a reference to a hash:
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $phone_of;
print Dumper $phone_of;

$phone_of->{Foo} = '123-456';   # $phone_of becomes a hash reference
print Dumper $phone_of;
A slightly more surprising version of this is when we use the auto-increment operator ++ on a hash element that did not exist before.
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %counter;
print Dumper \%counter;

$counter{Foo}++;   # no warning, even though the value was undef
print Dumper \%counter;
$VAR1 = {};
$VAR1 = {
          'Foo' => 1
        };
Perl treats the nonexistent value as undef. When undef is used in a numerical operation, it acts as if it were 0. In most cases this would generate a "Use of uninitialized value" warning, but the auto-increment operator specifically works on undef without complaining.
The resulting value is then assigned back to the hash, creating the key.
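To see the difference described above, here is a small sketch (not from the original article) that collects warnings through $SIG{__WARN__} and compares ++ with an explicit addition on a missing key. The keys 'Foo' and 'Bar' are just illustrative:

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my %counter;
$counter{Foo}++;                    # undef is treated as 0, no warning
$counter{Bar} = $counter{Bar} + 1;  # undef used in addition: warns

print "Foo=$counter{Foo} Bar=$counter{Bar}\n";
print scalar(@warnings), " warning(s)\n";
```

Both keys end up with the value 1, but only the explicit addition triggers the "Use of uninitialized value" warning.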
The simple cases for arrays
In the case of an array, if you assign a value to a nonexistent element, or use auto-increment on such an element, Perl will automatically enlarge the array up to the required index, leaving every intermediate element undef.
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my @counter;
print Dumper \@counter;

$counter[1] = 20;   # index 0 is filled with undef
$counter[3]++;      # index 2 is filled with undef
print Dumper \@counter;
The output looks like this:
$VAR1 = [];
$VAR1 = [
          undef,
          20,
          undef,
          1
        ];
This means that writing $counter[1_000_000]++; will enlarge the array to 1,000,001 elements, almost all of them undef. Such a sparse array wastes a lot of memory. In such cases a hash would probably be a better data structure.
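A quick way to see the size difference is to compare the number of array elements with the number of hash keys after a single increment. This is a minimal sketch written for this explanation, not taken from the original article:

```perl
use strict;
use warnings;

# One increment on a far-away index makes the array grow to that size.
my @counter;
$counter[1_000_000]++;
print scalar(@counter), " array elements\n";

# The same increment on a hash creates exactly one key.
my %counter;
$counter{1_000_000}++;
print scalar(keys %counter), " hash key(s)\n";
```

The array reports 1000001 elements, while the hash holds a single key.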
Complex data structures
Autovivification gets really interesting with deep data structures. Even when creating a two-dimensional hash, you can just write $people{Foo}{phone} = '123-456'; and Perl will create the internal hash for the 'Foo' key:
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %people;
print Dumper \%people;

$people{Foo}{phone} = '123-456';   # the inner hash for 'Foo' is created
print Dumper \%people;
This results in:
$VAR1 = {};
$VAR1 = {
          'Foo' => {
                     'phone' => '123-456'
                   }
        };
This is a good thing as you don't have to explicitly create the internal hash.
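For comparison, here is a sketch of what the same assignment would look like if we created the inner hash by hand. The variable names %explicit and %auto are made up for this illustration:

```perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

# Without relying on autovivification, we first create the inner hash by hand...
my %explicit;
$explicit{Foo} = {} unless exists $explicit{Foo};
$explicit{Foo}{phone} = '123-456';

# ...with autovivification a single assignment does both steps.
my %auto;
$auto{Foo}{phone} = '123-456';

print Dumper \%explicit, \%auto;
```

Both hashes end up with the same structure. The explicit version just needs an extra line for every level of nesting.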
It even works on undefined scalars:
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $people;
print Dumper $people;

$people->{Foo}{phone} = '123-456';   # $people becomes a hash reference
print Dumper $people;
When we created the $people scalar, Perl did not yet know that it would become a reference to a hash. As you can see from the following printout, it was still just undef:
$VAR1 = undef;
$VAR1 = {
          'Foo' => {
                     'phone' => '123-456'
                   }
        };
Once we used it as a reference to a hash, Perl autovivified it into one.
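One way to check this yourself is with the built-in ref function, which returns 'HASH' for a hash reference. This sketch, written for this explanation, shows that the single assignment brought two levels to life: $people itself and $people->{Foo}:

```perl
use strict;
use warnings;

my $people;
print defined $people ? "defined\n" : "undef\n";

# One assignment autovivifies two levels: $people and $people->{Foo}.
$people->{Foo}{phone} = '123-456';
print ref($people), "\n";
print ref($people->{Foo}), "\n";
```

It prints "undef" followed by "HASH" twice.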
The same would happen if we used it as a reference to an array, but if we try to do one after the other:
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $people;
print Dumper $people;

# autovivify $people into a hash reference
$people->{Foo}{phone} = '123-456';
print Dumper $people;

$people->[0] = 23;   # dies: $people is already a HASH reference
print Dumper $people;
We will get an exception.
$VAR1 = undef;
$VAR1 = {
          'Foo' => {
                     'phone' => '123-456'
                   }
        };
Not an ARRAY reference at autovivification.pl line 12.
Once $people has become a reference to a hash, it will not automatically turn into a reference to an array. This is, of course, a good thing, as it helps us catch some really erroneous code.
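If a script has to survive such a mistake, one option is to trap the error with eval, or to look at ref before using the reference. The following is a minimal sketch written for this explanation, not the article's own code:

```perl
use strict;
use warnings;

my $people;
$people->{Foo}{phone} = '123-456';   # $people is now a HASH reference

# Trap the run-time error with eval instead of letting it end the script.
my $ok = eval { $people->[0] = 23; 1 };
if ($ok) {
    print "assigned\n";
} else {
    print "error: $@";
}

# Alternatively, look at the reference type before using it.
print ref($people) eq 'ARRAY' ? "array\n" : "not an array\n";
```

The eval returns false and $@ holds the "Not an ARRAY reference" message, while $people stays a hash reference.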
Autovivification and accessing elements
As you can see, the autovivification described above can save quite a lot of typing. It also seems quite logical, even if surprising at first.
Unfortunately, there are other cases where it is mostly surprising and not that logical.
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %people;
print Dumper \%people;

# only checking, yet $people{Foo} gets created
if (exists $people{Foo}{phone}) {
    print "Good, Foo might have a phone\n";
} else {
    print "Foo has no phone\n";
}
print Dumper \%people;
In this case we don't assign any value to the hash; we just check whether one of the internal values (the phone of Foo) exists. It does not exist (Foo has no phone), but the internal hash was created by this operation:
$VAR1 = {};
Foo has no phone
$VAR1 = {
          'Foo' => {}
        };
This is quite unfortunate.
This means we have changed the state of the %people hash just by observing it.
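The same happens with a plain read, not only with exists. This sketch, written for this explanation, contrasts a top-level exists, which creates nothing, with a nested read, which creates the intermediate hash:

```perl
use strict;
use warnings;

my %people;

# A top-level exists does not create anything...
my $has_bar = exists $people{Bar};

# ...but merely reading a nested element creates the intermediate hash.
my $phone = $people{Foo}{phone};

print join(', ', sort keys %people), "\n";
```

Only 'Foo' shows up in the list of keys: the intermediate level was autovivified, while the innermost element (phone) was not.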
Autovivification and deleting elements
It looks even worse if we are trying to delete an element that does not exist:
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %people;
print Dumper \%people;

delete $people{Foo}{phone};   # creates $people{Foo} on the way
print Dumper \%people;
The result is:
$VAR1 = {};
$VAR1 = {
          'Foo' => {}
        };
In an attempt to reach the element that needs to be deleted, Perl created the internal hash of 'Foo'.
A bug
I think these undesirable cases are now generally considered to be a bug in Perl. Unfortunately, it is very unlikely that this bug will be fixed in Perl 5, as there is a lot of code out in the wild (both on CPAN and in companies) that relies on this behavior. Correcting the behavior would break a lot of code.
The way to avoid this is simple; it "just" takes more typing:
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %people;
print Dumper \%people;

# look inside only if the outer key already exists
if (exists $people{Bar}) {
    if ($people{Bar}{phone}) {
        print "Check Bar...\n";
    }
}

if (exists $people{Foo}) {
    delete $people{Foo}{phone};
}

print Dumper \%people;
If we first check whether the key exists in the outer hash, and only then check the phone number or try to delete it, the internal hash will not be created.
This will keep the outer hash clean:
$VAR1 = {};
$VAR1 = {};
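For deeper structures, the same check-one-level-at-a-time idea can be wrapped in a small helper. The name exists_path is hypothetical; it is not part of Perl or of any CPAN module, just a sketch of the technique described above:

```perl
use strict;
use warnings;

# Hypothetical helper: returns true if the chain of keys exists, checking
# one level at a time so that no intermediate hash is autovivified.
sub exists_path {
    my ($ref, @keys) = @_;
    for my $key (@keys) {
        return 0 unless ref($ref) eq 'HASH' && exists $ref->{$key};
        $ref = $ref->{$key};
    }
    return 1;
}

my %people = (Bar => { phone => '555-1234' });

my $foo = exists_path(\%people, 'Foo', 'phone') ? 'yes' : 'no';
my $bar = exists_path(\%people, 'Bar', 'phone') ? 'yes' : 'no';
print "Foo: $foo\nBar: $bar\n";
print join(', ', sort keys %people), "\n";   # 'Foo' was not created
```

Because each level is tested with exists before the code steps into it, asking about 'Foo' leaves %people untouched.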
no autovivification
As an alternative, there is a pragma on CPAN called autovivification that can turn off the bad effects in a lexical scope (within a pair of curly braces), but it is not widely used:
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %people;
print Dumper \%people;

delete $people{Foo}{phone};   # outside the block: always autovivifies
{
    #no autovivification;
    delete $people{Bar}{phone};
}
print Dumper \%people;
Running the above code will have the undesired effect for both 'Foo' and 'Bar':
$VAR1 = {};
$VAR1 = {
          'Bar' => {},
          'Foo' => {}
        };
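If you remove the # and run the script again (with the autovivification module installed from CPAN), the undesired effect is gone for 'Bar'. 'Foo' is still created, because its delete is outside the block: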
$VAR1 = {};
$VAR1 = {
          'Foo' => {}
        };
Conclusion
Autovivification is a good thing, but we need to be careful not to create unnecessary elements while checking (or deleting) internal elements.
At least now we know why they were created.
Published on 2015-01-21