When a Perl script is executed, the user can pass arguments on the command line in various ways. For example: perl program.pl file1.txt file2.txt, or perl program.pl from-address to-address file1.txt file2.txt, or the most common and most useful way:

perl program.pl -vd --from from-address --to to-address file1.txt file2.txt

How can we deal with this information?

When the script starts to run, Perl automatically creates an array called @ARGV and puts all the space-separated values from the command line in it. It won't include perl, and it won't include the name of our script (program.pl in our case); that is placed in the $0 variable. @ARGV only includes the values located after the name of the script.

In the above case @ARGV will contain: ('-vd', '--from', 'from-address', '--to', 'to-address', 'file1.txt', 'file2.txt')
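To see this mapping for ourselves, here is a minimal sketch. It assigns the example values to @ARGV by hand (in a real run Perl fills the array itself), then prints $0 and every element with its index:

```perl
use strict;
use warnings;

# Simulate the command line from the example above. In a real run Perl
# fills @ARGV itself; assigning to it here only makes the sketch
# reproducible without typing the arguments.
@ARGV = ('-vd', '--from', 'from-address', '--to', 'to-address',
         'file1.txt', 'file2.txt');

print "Script name: $0\n";
print "Number of arguments: ", scalar(@ARGV), "\n";

# Print each argument together with its index in @ARGV.
for my $i (0 .. $#ARGV) {
    print "$i: $ARGV[$i]\n";
}
```

Note that -vd arrives as a single element. Splitting it into -v and -d is the job of whatever processes @ARGV afterwards.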

We can access @ARGV manually as described in the article about @ARGV, but there are a number of modules that will handle most of the work for you. In this article we'll look at Getopt::Long, a module that comes with the standard installation of Perl.

Explain the command line

Before doing that, let's see what we actually expect from command line processing.

  • Long names with values: we would like to be able to accept parameters with long names followed by a value. For example --to VALUE. ("Long" is relative here, it just means more than 1 character.)
  • Long names without value: we would like to accept flags that turn something on by their mere presence. For example --verbose.
  • Short names (or single-character names) with or without values: the two cases above, written as -t VALUE and -v.
  • Combining short names: -vd should be understood as -v -d. So we want to be able to differentiate between "long names" and "multiple short names combined". The difference is that "long names" start with a double dash --, while short names, even if several of them are combined, start with a single dash -.
  • Non-affiliated values, values that don't have any name starting with a dash in front of them. For example file1.txt file2.txt.

There can be lots of other requirements and Getopt::Long can handle quite a few of them, but we'll focus on the basics.

Getopt::Long

Getopt::Long exports a function called GetOptions that can process the content of @ARGV based on the configuration we give it. It returns true or false, indicating whether the processing was successful. During processing it removes the items it has successfully recognized from @ARGV. We'll take a look at possible errors later on. For now, let's see a small example that we save in cli.pl:

use strict;
use warnings;
use 5.010;
use Getopt::Long qw(GetOptions);

my $source_address;
GetOptions('from=s' => \$source_address) or die "Usage: $0 --from NAME\n";
if ($source_address) {
    say $source_address;
}

After loading the module we declare a variable called $source_address, where the value of the --from command line flag will be stored. We call GetOptions with key-value pairs. The key (in this case there is only one) is the description of the flag. Here from=s declares that we are expecting a command line parameter called --from with a string after it. Because in Perl numbers can also be seen as strings, this basically means "pass me any value". This declaration is then mapped to the variable we declared earlier. In case the syntax is unclear: => is the "fat arrow" (also called "fat comma") you might be familiar with from hashes, and the back-slash \ in front of the variable indicates that we are passing a reference to the variable. You don't need to understand references in order to understand this code. Just remember that the variables on the right-hand side of the "fat arrow" operators need to have a back-slash when calling GetOptions.
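If you are curious why the back-slash is needed, the following sketch may help. It has nothing to do with Getopt::Long itself: set_through_reference is a hypothetical helper written only for this illustration. It receives a reference and assigns through it, which is roughly how a function can fill a variable that belongs to its caller:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Getopt::Long): stores a value in the
# variable that the received reference points to.
sub set_through_reference {
    my ($ref, $value) = @_;
    ${$ref} = $value;    # dereference and assign to the caller's variable
    return;
}

my $source_address;
set_through_reference(\$source_address, 'Foo');
print "$source_address\n";
```

Without the back-slash the function would only receive the current value of the variable (undef here) and would have no way to tell where to store the result.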

We can run this program in several ways: perl cli.pl --from Foo will print "Foo". The value passed after the --from flag is assigned to the $source_address variable. On the other hand, running perl cli.pl will not print anything, as we have not passed any value.

If we run it as perl cli.pl Foo it won't print anything either, as GetOptions only deals with options that start with a dash (-). (This is actually configurable, but let's not get into that now.)
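A small sketch of that last case. To keep it runnable without typing arguments, it uses GetOptionsFromArray, a sibling of GetOptions in Getopt::Long that parses an array we pass in instead of @ARGV. The @args array is just a stand-in for the command line perl cli.pl Foo:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the command line "perl cli.pl Foo".
my @args = ('Foo');

my $source_address;
my $ok = GetOptionsFromArray(\@args, 'from=s' => \$source_address);

print "success: ", ($ok ? 'yes' : 'no'), "\n";
print "from: ", (defined $source_address ? $source_address : '(not set)'), "\n";
print "left over: @args\n";
```

The call succeeds, $source_address stays undefined, and Foo is left untouched in the array.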

Failures

So when will the short-circuit or die kick in?

Unknown option

One case is when we run the script and pass something that looks like a parameter name (that is, something that starts with a dash -) but that has not been declared when calling GetOptions. For example:

perl cli.pl --to Bar

Unknown option: to
Usage: cli.pl --from NAME

The first line is a warning printed by GetOptions, the second line is the string we generated using die.

Option requires an argument

Another case is when we run the script, pass --from, but without passing any value after it:

perl cli.pl --from

In that case the output will look like this:

Option from requires an argument
Usage: cli.pl --from NAME

Here too, the first line came from GetOptions and the second line from our call to die. When we called GetOptions we explicitly said, with =s, that we are expecting a string after --from.
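Both failures can be reproduced in a single script. The sketch below is only an illustration: try_parse is a hypothetical helper, and it relies on the fact that Getopt::Long reports its errors with warn, so they can be collected through $SIG{__WARN__}. GetOptionsFromArray is used in place of GetOptions so we can feed it several argument lists in a row:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse a list of arguments and return the success
# flag together with any messages Getopt::Long emitted through warn().
sub try_parse {
    my @args = @_;
    my @messages;
    local $SIG{__WARN__} = sub { push @messages, @_ };
    my $from;
    my $ok = GetOptionsFromArray(\@args, 'from=s' => \$from);
    return ($ok, @messages);
}

# Two failing command lines from above and one that succeeds.
for my $case (['--to', 'Bar'], ['--from'], ['--from', 'Foo']) {
    my ($ok, @messages) = try_parse(@$case);
    print "@$case => ", ($ok ? 'ok' : 'failed'), "\n";
    print "    $_" for @messages;
}
```

In a real script the plain or die shown earlier is usually enough. Trapping the messages is mainly useful in tests.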

Default values

Often we would like to give a default value to one of the options. For example in the case of the --from field we might want it to default to the word 'Maven'. We can do it by assigning this value to the $source_address variable before calling GetOptions. For example, at the time we declare it using my.

my $source_address = 'Maven';
GetOptions('from=s' => \$source_address) or die "Usage: $0 --from NAME\n";
if ($source_address) {
    say $source_address;
}

If the user does not pass the --from flag then GetOptions will not modify the value in the $source_address variable. Running perl cli.pl will result in "Maven".
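One thing to watch for in the examples above: if ($source_address) is a truth test, so a value such as 0 or the empty string would not be printed even though the user passed it. A hedged sketch of the difference, using GetOptionsFromArray with a stand-in array for the command line perl cli.pl --from 0:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the command line "perl cli.pl --from 0".
my @args = ('--from', '0');

my $source_address = 'Maven';
GetOptionsFromArray(\@args, 'from=s' => \$source_address)
    or die "Usage: $0 --from NAME\n";

# A plain truth test would skip the value "0"; defined() does not.
my $shown_by_truth_test = $source_address ? 1 : 0;
my $shown_by_defined    = defined $source_address ? 1 : 0;

print "truth test prints the value: ", ($shown_by_truth_test ? 'yes' : 'no'), "\n";
print "defined test prints the value: ", ($shown_by_defined ? 'yes' : 'no'), "\n";
```

Whether this matters depends on the values your script should accept. For names and addresses the simple truth test is usually fine.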

Flags without value

In addition to parameters that require a value, we would also like to allow flags: names that make a difference by their mere presence. These are used when we want to allow the users to turn on debugging, or to set the verbosity of the script.

use strict;
use warnings;
use 5.010;
use Getopt::Long qw(GetOptions);

my $debug;
GetOptions('debug' => \$debug) or die "Usage: $0 --debug\n";
say $debug ? 'debug' : 'no debug';

Originally the $debug variable contains undef, which is considered false in Perl. If the user passes the --debug flag, the corresponding variable is set to a true value. (Getopt::Long sets it to the number 1, but we should only rely on the fact that it evaluates to true.) We then use the ternary operator to decide what to print.

The various ways we call it and the output they produce:

$ perl cli.pl 
no debug

$ perl cli.pl --debug
debug

$ perl cli.pl --debug hello
debug

The last example shows that a value placed after such a flag does not belong to it: hello is not treated as the value of --debug, it simply stays in @ARGV.
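We can check what happens to hello with a short sketch. As before, GetOptionsFromArray and the @args array are only stand-ins that let us run the example without typing perl cli.pl --debug hello:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the command line "perl cli.pl --debug hello".
my @args = ('--debug', 'hello');

my $debug;
GetOptionsFromArray(\@args, 'debug' => \$debug)
    or die "Usage: $0 --debug\n";

print "debug: $debug\n";
print "left over: @args\n";
```

The flag is removed from the array and hello is left behind, ready to be handled as a regular argument.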

Multiple flags

Obviously, most scripts need to handle more than one flag. In those cases we still call GetOptions once and provide it with all the parameters. Combining the above two cases, we get a larger example:

use strict;
use warnings;
use 5.010;
use Getopt::Long qw(GetOptions);

my $debug;
my $source_address = 'Maven';
GetOptions(
    'from=s' => \$source_address,
    'debug' => \$debug,
) or die "Usage: $0 --debug  --from NAME\n";

say $debug ? 'debug' : 'no debug';
if ($source_address) {
    say $source_address;
}

Running without any parameter will leave $debug as undef and the $source_address as 'Maven':

$ perl cli.pl 
no debug
Maven

Passing --debug will set $debug to true, but will leave $source_address as 'Maven':

$ perl cli.pl --debug
debug
Maven

Passing --from Foo will set the $source_address but leave $debug as undef:

$ perl cli.pl  --from Foo
no debug
Foo

If we provide both parameters, each will set its respective variable:

$ perl cli.pl --debug --from Foo
debug
Foo

The order of the parameters on the command line does not matter:

$ perl cli.pl  --from Foo --debug
debug
Foo

Short names

Getopt::Long automatically handles shortening of the option names up to ambiguity. We can run the above script in the following manner:

$ perl cli.pl --fr Foo --deb
debug
Foo

We can even shorten the names to a single character:

$ perl cli.pl --f Foo --d
debug
Foo

and in that case we can even use single-dash - prefixes:

$ perl cli.pl -f Foo -d
debug
Foo

These, however, are not real single-character options, and as such they cannot be combined:

$ perl cli.pl -df Foo
Unknown option: df
Usage: cli.pl --debug  --from NAME
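The "up to ambiguity" part deserves a quick illustration. In the sketch below --file is a hypothetical second option added only to collide with --from, and the warning from Getopt::Long is collected through $SIG{__WARN__} so we can print it ourselves. Again GetOptionsFromArray stands in for GetOptions:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for "perl cli.pl --f Foo" in a script that has two options
# starting with the letter f.
my @args = ('--f', 'Foo');

# Collect the messages Getopt::Long emits through warn().
my @messages;
local $SIG{__WARN__} = sub { push @messages, @_ };

my ($source_address, $file);    # --file is a hypothetical second option
my $ok = GetOptionsFromArray(
    \@args,
    'from=s' => \$source_address,
    'file=s' => \$file,
);

print $ok ? "parsed\n" : "failed\n";
print @messages;
```

With both options declared, --f can no longer be resolved, while --fr and --fi would still work.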

Single-character options

In order to combine them we need to do two things. First, we need to declare the options as real single-character options. We can do this by providing alternate, single-character names in the definition of the options:

GetOptions(
    'from|f=s' => \$source_address,
    'debug|d' => \$debug,
) or die "Usage: $0 --debug  --from NAME\n";

The second thing is that we need to enable the gnu_getopt configuration option of Getopt::Long by calling Getopt::Long::Configure qw(gnu_getopt); right after loading the module:

use Getopt::Long qw(GetOptions);
Getopt::Long::Configure qw(gnu_getopt);

After doing that we can run:

$ perl cli.pl -df Foo
debug
Foo

The full version of the script with the above changes looks like this:

use strict;
use warnings;
use 5.010;
use Getopt::Long qw(GetOptions);
Getopt::Long::Configure qw(gnu_getopt);
use Data::Dumper;

my $debug;
my $source_address = 'Maven';
GetOptions(
    'from|f=s' => \$source_address,
    'debug|d' => \$debug,
) or die "Usage: $0 --debug  --from NAME\n";

say $debug ? 'debug' : 'no debug';
if ($source_address) {
    say $source_address;
}
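To compare a few spellings side by side, here is a sketch that feeds several argument lists to the same option definitions. parse_args is a hypothetical helper, and GetOptionsFromArray is used instead of GetOptions so everything runs in one go:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
Getopt::Long::Configure qw(gnu_getopt);

# Hypothetical helper: parse one argument list and describe the result.
sub parse_args {
    my @args = @_;
    my $debug;
    my $source_address = 'Maven';
    GetOptionsFromArray(
        \@args,
        'from|f=s' => \$source_address,
        'debug|d'  => \$debug,
    ) or return 'error';
    return ($debug ? 'debug' : 'no debug') . " / $source_address";
}

# Combined short names, separate short names, and long names.
for my $case (['-df', 'Foo'], ['-d', '-f', 'Foo'], ['--debug', '--from', 'Foo']) {
    print "@$case => ", parse_args(@$case), "\n";
}
```

All three spellings should lead to the same result, which is exactly what we wanted from the requirements list at the beginning.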

Non-affiliated values

The GetOptions function only handles the parameters that start with a dash and, where relevant, their corresponding values. Once it has processed the options it removes them from @ARGV. (Both the option name and the option value are removed.) Any other, non-affiliated values on the command line stay in @ARGV. Hence if we add Data::Dumper to our script and use it to print the content of @ARGV at the end (print Dumper \@ARGV), as in this script:

use strict;
use warnings;
use 5.010;
use Getopt::Long qw(GetOptions);
use Data::Dumper;

my $debug;
my $source_address = 'Maven';
GetOptions(
    'from=s' => \$source_address,
    'debug' => \$debug,
) or die "Usage: $0 --debug  --from NAME\n";

say $debug ? 'debug' : 'no debug';
if ($source_address) {
    say $source_address;
}
print Dumper \@ARGV;

We get the following results:

$ perl cli.pl  -f Foo -d file1.txt file2.txt
debug
Foo
$VAR1 = [
          'file1.txt',
          'file2.txt'
        ];

After processing the options, file1.txt and file2.txt were left in @ARGV. We can now do whatever we want with them. For example, we can iterate over the @ARGV array using foreach.
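A minimal sketch of such a loop. The assignment to @ARGV only simulates the command line perl cli.pl --debug file1.txt file2.txt, and the @seen array exists just to record what the loop visited:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptions);

# Stand-in for "perl cli.pl --debug file1.txt file2.txt".
@ARGV = ('--debug', 'file1.txt', 'file2.txt');

my $debug;
GetOptions('debug' => \$debug) or die "Usage: $0 --debug FILE...\n";

# Whatever GetOptions did not recognize as an option is still in @ARGV.
my @seen;
foreach my $file (@ARGV) {
    print "Would process $file\n";
    push @seen, $file;
}
```

In a real script the body of the loop would open and process each file instead of just printing its name.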

Advanced

Getopt::Long has tons of other options. You might want to check out the documentation.

There are also other solutions. For example, if you are using Moo for lightweight object-oriented programming, you could take a look at MooX::Options, explained in a number of advanced articles such as Switching to Moo - adding command line parameters and Writing Command line scripts and accepting command line parameters using Moo.