Recently I received two very different questions, where the answer is quite similar:

  • How can we get index of particular value(element) of array quickly?
  • Given a list of input values in @user_array how can I find their reference in the @all_values array?

Index of a single element

Let's see an example for the first one:

Given an array

my @planets = qw(
   Mercury
   Venus
   Earth
   Mars
   Ceres
   Jupiter
   Saturn
   Uranus
   Neptune
   Pluto
   Charon
);

Given the value Mars we should return the number 3 because $planets[3] is Mars.

The quick answer is to use the first_index method of List::MoreUtils

use strict;
use warnings;
use 5.010;
use List::MoreUtils qw(first_index);

my @planets = qw(
   Mercury
   Venus
   Earth
   Mars
   Ceres
   Jupiter
   Saturn
   Uranus
   Neptune
   Pluto
   Charon
);

say first_index { $_ eq 'Mars' } @planets;

Get list of indexes

Given the above array and another array listing elements we would like to return the list of the indexes of these elements:

use strict;
use warnings;
use 5.010;

my @planets = qw(
   Mercury
   Venus
   Earth
   Mars
   Ceres
   Jupiter
   Saturn
   Uranus
   Neptune
   Pluto
   Charon
);

use List::MoreUtils qw(first_index);

my @input = qw(Mars Pluto Alderaan Venus);

my @indexes;
foreach my $place (@input) {
    push @indexes, first_index { $_ eq $place } @planets;
}

use Data::Dumper qw(Dumper);
print Dumper \@indexes;

In this code we loop over all the planet names we would like to locate in the array of planets, and for each one of them we call the first_index function as we did in the first example. We then use the Data::Dumper module to print out the results.

$VAR1 = [
          3,
          9,
          -1,
          1
        ];

As you can see the 3rd element of the resulting array is -1. That's because we had Alderaan, in the list, a planet that is was not in the original list of planets.

Implementing manually - index of one element

Though you are probably better off using the first_index function from List::MoreUtils, it might be interesting to see how we could solve the above questions if we did not have that module available.

The code with List::MoreUtils:

use List::MoreUtils qw(first_index);
say first_index { $_ eq 'Mars' } @planets;

The code with core perl functions:

my ($index) = grep { $planets[$_] eq 'Mars' } (0 .. @planets-1);
say defined $index ? $index : -1;

The built-in grep function can filter the values of list or array based on some condition. As we are looking for the index of the specific value we need to filter the potential indexes of all the elements. The 0 .. @planet-1 expression generates the list of whole numbers between 0 and one less than the number of elements in the @planet array. As the indexing of an array starts by 0, this will be the largest index available in the array.

As grep goes over all the elements of the 0 .. @planets-1 list, in each turn the current element will be placed in the $_ variable. The condition then checks if the element of the @planet array in that position is equal to the planet we are looking for. In list-context grep will return the list of all the values (all the indexes in our case) that passed the test. In scalar-context it would return the number of elements passed, which is not very interesting for us. Thus we put the parentheses around the variable on the left-hand-side my ($index) = to create the list-context. By putting only one scalar variable in the parentheses, we will only capture the first value returned by grep. This distinction is only interesting if there could be more values in the original array matching the searched value and if we were interested all the indexes and not just the first one.

With this the grep expression will work, but there is another difference. first_index will return -1 in case it did not find any matching value, while grep will return an empty list and thus $index will be undefined. This can be useful if we later want to check if there was any value by writing:

if (defined $index) {
    ...
}

but if we would like to imitate the same behavior as we had with first_index we can use the ternary operator:

say defined $index ? $index : -1;

Implementing manually - list of indexes

my @idxs;
foreach my $place (@input) {
    my ($index) = grep { $planets[$_] eq $place } (0 .. @planets-1);
    push @idxs, defined $index ? $index : -1;
}

In this code we used the two lines created for the one-element version together with the ternary operator code to have -1 where the element was not found in the @planets array

Performance issues?

One of the big differences between the first_index solutions and the manual solutions is that the first_index function will return the index immediately when it was found, while grep will go over the list of all the planets before returning the result. Even if the first element already matched. This is not a problem if the @planets array is very short or if we execute this code rarely, but in other cases this might have a performance penalty.

grep is basically foreach loop, so if we look at the most recent solution, we can see that we have two loops (a foreach loop and a grep inside it) which means the complexity of the code is O(n*m) where n is the number of elements in @planets and m is the number of elements in @input. The first_index solution is better, but in the worst-case situation (when all the values in the @input array are towards the end of the @planets array) the complexity is similar.

There is another solution we can use. In this solution first we create a look-up table mapping planet names to indexes. This is the %planet_index hash. Then in the second step we create the list of the indexes corresponding to the values in the @input array.

We use the map function of Perl to create pairs of "planet-name" => "index". We need to call reverse on the indexes before call map in order to ensure that if the same value appears twice in the @planets array we take the one with the smaller index. (The second assignment to the %planet_index array will overwrite the first one.)

In the second row we call map again. We could use a simple look-up like in this code: my @idxs = map { $planet_index{$_} } @input;, but that would put undef for the values where the planet does not exists. Instead of that we used the defined-or operator // introduced in perl 5.010 to put -1 instead of any undef values.

my %planet_index = map { $planets[$_] => $_ } reverse 0 .. @planets-1;
my @idxs = map { $planet_index{$_} // -1 } @input;

The complexity of this solution is O(n+m), which, for large m and n values is much smaller than O(m*n).

Profiling and Benchmarking

Of course computing the complexity is one thing, but actually comparing the performance of two solutions is a totally different issue. We won't discuss the details in this article, but before any "optimization" we should first use a profiler, probably the Devel::NYTProf profiler to see if this part of the application has any measurable impact on the overall performance of the application. If yes, then we can use the Benchmark module to compare the performance of two or more solutions.