Understanding recursive subroutines was one of the major break-through in my programming studies. It took me quite a while to understand them, but once that happened, suddenly a lot of things became easier. As if a strict limitation was released from me.

I hope the following explanation will help you too:

A better solution

Just to be clear, if you are reading this because you really want to go through a directory tree, then you should probably read the article about Path::Iterator::Rule. What you read here is a more manual approach to the problem, that helps us learn about recursion.

The problem:

Traversing a directory tree

We have a directory tree hat looks like this:

root
root/a
root/a/aa.txt
root/a/foo
root/a/foo/bar.txt
root/a/foo/baz.txt
root/a/foo.txt
root/b.txt
root/c

Given the path to the root of this directory tree, we would like to be able to go through all the elements and do something with each one of them. For example we would like to print them. Even better, we would like to print the above representation of the directory structure.

Doing

Traversing a directory tree recursively

Traversing a tree - for example a directory tree - can become complex. At least until we realize, that the traversing function does not need to know where it is in the tree. For the traversing subroutine every sub-tree should be the same.

We can have two approaches thinking about this. Let's start with a bottom-up approach.

Let's try to print the content of a directory that has only files in it. No subdirectories. The following code prints out a flat directory. The script can get a path as its command line parameter or defaults to the current directory. Inside the traverse subroutine we called it a $thing because the user might pass the name of a file instead of the name of the directory. Therefore we even check if the given thing is a directory and return immediately, without even giving a warning if it is not: return if not -d $thing;.

Then, we use the opendir function to open the directory, and then use the readdir function to read the entries of this directory one-by-one. Every directory listing contains the . representing the current directory, and .. representing the parent director. We want to skip them, so we call next when we encounter either of those. Then we print whatever we found in the directory.

examples/flat.pl

use strict;
use warnings;
use 5.010;

my $path = shift || '.';

traverse($path);

sub traverse {
    my ($thing) = @_;

    return if not -d $thing;
    opendir my $dh, $thing or die;
    while (my $sub = readdir $dh) {
        next if $sub eq '.' or $sub eq '..';
        say "$thing/$sub";
    }
    close $dh;
    return;
}

What happens if we run it with a path to one of the bottom directories? It will print the content.

$ perl flat.pl root/a/foo
root/a/foo/bar.txt
root/a/foo/baz.txt

What happens if we go one step up, and call with the parent directory of that directory: It prints out all the files and directories immediately in the root/a directory.

$ perl flat.pl root/a
root/a/aa.txt
root/a/foo
root/a/foo.txt

What we need now is to combine the two. What we need is that when the script recognizes that it found a directory (root/a/foo in our case), then it should run the same function passing to it the path to root/a/foo. That is we need out function to recursively call itself.

We would like to see an output like this:

$ perl tree.pl root/a
root/a/aa.txt
root/a/foo
root/a/foo/bar.txt
root/a/foo/baz.txt
root/a/foo.txt

For this the only thing we need to do is to add a call to traverse("$thing/$sub");, right after we printed the name of that directory. The full script will look like this:

examples/tree.pl

use strict;
use warnings;
use 5.010;

my $path = shift || '.';

traverse($path);

sub traverse {
    my ($thing) = @_;

    return if not -d $thing;
    opendir my $dh, $thing or die;
    while (my $sub = readdir $dh) {
        next if $sub eq '.' or $sub eq '..';
        say "$thing/$sub";
        traverse("$thing/$sub");
    }
    close $dh;
    return;
}

If we happen to call this on the root directory, we'll see the following:

$ perl tree.pl root
root/a
root/a/aa.txt
root/a/foo
root/a/foo/bar.txt
root/a/foo/baz.txt
root/a/foo.txt
root/b.txt
root/c

Almost exactly what we wanted to create. the only difference is that this printout does not include the root directory we passed as a parameter to the script.

Let's see another approach that might help further see how the recursion works.

Top down approach

In this approach we start by creating a subroutine called traverse() that can get the name of a directory or a file and print it out.

use strict;
use warnings;
use 5.010;

my $path = shift || '.';

traverse($path);

sub traverse {
    my ($thing) = @_;

    say $thing;
    return;
}

We then take this subroutine and extend it. If the given $thing is not a directory then we are done after the printing. If it is a directory then we go over its immediate content using the opendir/readdir pair. Each item is again needs the exact same treatment as the current one gets, so we can call traverse() for each item in the while loop: This is the additional code:

    opendir my $dh, $thing or die;
    while (my $sub = readdir $dh) {
        next if $sub eq '.' or $sub eq '..';

        traverse("$thing/$sub");
    }
    close $dh;

Each one of the calls to traverse() will print the current item and if it was a directory, it will go over all the elements in it and call traverse() on each subdirectory.

And so on.

It could go on forever but we know at some point it will reach the leafs of the tree, a directory in which there are no more directories. Then it won't call itself again, and slowly it will return to the top-most level.

The condition when the recursive function is not called again, in our case the directories without subdirectories, is usually called the halting condition of the recursion. A function without such halting condition, will recurse forever. (Well, not exactly forever as most system would run out of memory before that. In addition Perl will give a warning Deep recursion on subroutine when the code reaches 100 level depth. If use warnings; is in effect.

The full code of the top-down approach looks like this:

examples/recurse.pl

use strict;
use warnings;
use 5.010;

my $path = shift || '.';

traverse($path);

sub traverse {
    my ($thing) = @_;

    say $thing;
    return if not -d $thing;
    opendir my $dh, $thing or die;
    while (my $sub = readdir $dh) {
        next if $sub eq '.' or $sub eq '..';

        traverse("$thing/$sub");
    }
    close $dh;
    return;
}

And the result of its run is exactly what we wanted:

$ perl recurse.pl root
root
root/a
root/a/aa.txt
root/a/foo
root/a/foo/bar.txt
root/a/foo/baz.txt
root/a/foo.txt
root/b.txt
root/c

Recursive function

Just to put it together in a concise way. A recursive function has two major parts. The first one checks for some condition and returns if that condition was met. This is called the halting condition, or stop condition. Then at some later point in the function it calls itself with a different set of parameters than it was called earlier.

Other solution using a queue

Besides using recursion this problem can also be solved by using a queue.

And remember, if you really want to traverse a directory tree, you are probably better off using a module that already does that.