Perl and CPAN provide a plethora of tools to help you deal with directories and files in a portable way. Using them also makes your intentions much clearer than some random regexp pattern that "works" for you.
Motivation
We live in a world where most developers can do just fine pretending that everything runs on a kind of a Unixy operating system. This seems to have led to programmers forgetting, or never learning, some important lessons in how to deal portably with operations involving files and directories.
Even if you do not care about your code being able to run on a system that has different rules, using the facilities Perl and CPAN provide would add a much needed amount clarity to your code which I regard as a worthy goal in and of itself.
A couple of cases in point
Can you tell what each of the following lines of code does?
(my $coretests = $0) =~ s'[^/]+\.t'coretests.pm';
and
$self->{base} ||= $0 =~ m!(.*)/! ? $1 : ".";
Example A is from version.pm and example B is from TestML.
A lot of modules on CPAN depend on version.pm
. On the other
hand, TestML
seems to be a sneaky dependency for the new
Inline::C
via Pegex. I am mentioning this to underline the fact
that these are not modules no one cares about.
Yet, their test code includes these unportable and opaque lines. I only became aware of these due to unexpected test failures on my perl 5.20.1 built using Microsoft's Visual Studio 2013 Community Edition on my Windows 8.1 system.
But, let's get back to the topic: What do these lines do?
Example A is an attempt to construct a path to
the file coretests.pm
which is assumed to be in the same directory as the
current script. This is similar to
adding a directory relative to the current script's location.
This file whose path is thus constructed is then required on the
subsequent line. Of course, there are many ways the substitution
s'[^/]+\.t'coretests.pm'
can fail to work as intended, but the
most straightforward cause of failure would be due to code being run on a
system that does not use / as a directory separator in file names.
It is true that internal Windows APIs do not mind if you give them Unix style paths. But, programs invoked via the shell do not offer that luxury. Instead, most Windows console programs, following the DOS tradition, use the / character for command line options.
When this test script is invoked as ```t\00impl-pp.t, the
substitution ends up replacing the entire path in $coretests
with
'coretests.pm'
, and therefore the following
require
looks for this file in the wrong directory.
Similarly, the code in example B simply tries
to capture the path to the directory containing the current executable. Once
again, this fails on Windows because the path in $0
is unlikely
to contain a / in the right spot. In fact, much hilarity can
ensue if the path contains a mixture of ** and / as
directory separators.
Solutions
There are a number of ways of figuring out the directory in which the currently running script is located. Each and every one of these would be a better solution than these regular expression patterns.
First, their use would make it clear to the person reading your code the intent behind the code. Second, if some other operating system with some other directory separator became popular, you wouldn't have to locate each and every place where you have used a string operation on a file name to make your code work again. Third, these methods are likely to be a lot more robust than your regular expression based solution.
For example, suppose you were running the test file from version.pm in the following fashion:
prove product/stage.test/version-xyz/t/test.t
What would the substitution do then? Instead of having to consider this question anew every time you want to construct the path to a file in the same directory as the current script, you can use the facilities offered by Perl and CPAN, and enjoy the benefits of the correctness and clarity they offer.
Here are some alternatives. This is not an exhaustive list. In fact, I have purposefully omitted a few for the sake of stimulating some discussion.
Good old $FindBin::Bin
FindBin has been in the core
since 5.00307. It used to have an annoying aspect which
has been fixed
in recent Perl distributions. The bug resulted in the $PATH
being
searched if $0
contained a relative path, so you may want to
avoid it in code that is expected to run on older ```perls.
You can then use File::Spec->catfile
:
use FindBin qw( $Bin );
use File::Spec;
my $coretests = File::Spec->catfile($Bin, 'coretests.pm');
Even older File::Basename
File::Basename has been in the core since 5.000. You can simply do:
use File::Basename ();
my $bindir = File::Basename::dirname($0);
This function tries to emulate the shell function by the same name, and you can't rely on whether the returned path includes a trailing directory separator, so, it may not be suitable in all circumstances, but, if anything does go wrong, at least the person who is trying to diagnose your code will know what you were trying to do.
Keep in mind that, as with $FindBin::Bin
we still need a facility
to portably concatenate a file name to this path. Therefore, you may just
want to move directly to File::Spec
.
One could also take advantage of the lib module to avoid having to explicitly construct the path to the file to be required:
use File::Basename ();
use lib File::Basename::dirname( $0 );
# ...
require coretests;
File::Spec
File::Spec has been in the core since 5.00405. You can simply do:
my ($volume, $bindir, undef) = File::Spec->splitpath($0);
then use
my $coretests = File::Spec->catpath($volume, $bindir, 'coretest.pm');
and your code will do the right thing on all operating systems
File::Spec
knows about.
The File::Spec
solution is portable, but it does feel a little
clunky due to the need to explicitly handle the possibility that the path may
include a volume name.
You can eliminate the temporary variables, but the resulting code is no less clunky:
my $coretests = File::Spec->catpath(
(File::Spec->splitpath($0))[0,1], 'coretests.pm'
);
Especially if your code deals with a lot of filesystem related operations,
and you are comfortable adding a non-core dependency to your project, you
may want to consider Path::Class
or Path::Tiny
.
Path::Class
Path::Class is a beautiful module. Using it, you can simply write:
my $coretests = file(file($0)->parent, 'coretests.pm');
Path::Class
uses File::Spec
internally, but hides
a lot more of the ugliness. It also provides various convenience methods so
you don't have to, say, re-invent slurp
in every new module.
Path::Tiny
Path::Tiny is an elegant module that offers a nice, clean interface. It makes no guarantees for anything other than Unix-like, and Win32 systems. It does allow you to write:
my $coretests = path($0)->parent->child('coretests.pm');
to obtain the path to a file that is in the same directory as the current script.
Conclusion
In this post, motivated by a couple of examples, we looked at the question of how to compose the path to a file that is in the same directory as the currently executing script.
As is always the case with Perl, there are multiple ways of doing this. They are all better than rolling your own incomplete method based on a simple regular expression pattern. Not only does using using these modules make your code more portable, and easier to understand, they have the benefit of catering to corner cases you may not consider when you are busy banging out regular expression patterns.
I care about this because I like Perl, and I consider it a missed
opportunity when cpanm Some::Module
doesn't work due to an obscure
test failure in some other module because of an unwarranted assumption that
Unix style paths work everywhere.
Keep in mind the advice from perldoc perlport:
If all this is intimidating, have no (well, maybe only a little) fear. There are modules that can help. The File::Spec modules provide methods to do the Right Thing on whatever platform happens to be running the program.
Notes:
The bug in ```TestML was fixed in 0.52.
A pull request against version has been submitted.