After writing the article comparing the speed of grep with Python regexes, and arriving at the conclusion that grep is 50-100 times faster than Python, I thought: what about Perl?

I created two scripts that read a large file and run a very simple regex on each line. I know that for this we don't even need a regex, as we can use the index function in Perl or the index or find methods in Python, but this is a good exercise.
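To illustrate the regex-free alternative mentioned above, here is a minimal sketch in Python. The helper names and the sample lines are made up for this illustration and are not part of the benchmark scripts. It only checks that the `in` operator and `str.find` give the same answer as the regex for the single-character pattern `y`.

```python
import re

def has_y_regex(line):
    # Same test as in the benchmark: is there a 'y' anywhere in the line?
    return re.search(r'y', line) is not None

def has_y_in(line):
    # Plain substring test with the `in` operator, no regex involved.
    return 'y' in line

def has_y_find(line):
    # str.find returns -1 when the substring is absent.
    return line.find('y') != -1

# Hypothetical sample lines, not taken from the real input file.
samples = ["python\n", "perl\n", "", "yes and no\n"]
for line in samples:
    assert has_y_regex(line) == has_y_in(line) == has_y_find(line)
```

Note that `str.index` would also work, but it raises `ValueError` instead of returning -1 when nothing is found.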

I also ran the same regex several times on each line (the LIMIT parameter of the scripts), as that was important for the original case comparing grep and Python.

In the original article you can also find the script that generated the text file we are parsing.

Grep with Python regexes

examples/grep_speed.py

import sys
import re

if len(sys.argv) != 3:
    exit(f"{sys.argv[0]} FILENAME LIMIT")

_, filename, limit = sys.argv

with open(filename) as fh:
    for line in fh:
        for _ in range(int(limit)):
            if re.search(r'y', line):
                print(line)

Grep with Perl regexes

examples/grep_speed.pl

use 5.030;

die "Usage: $0 FILENAME LIMIT" if @ARGV != 2;

my ($filename, $limit) = @ARGV;

open (my $fh, '<', $filename) or die;
while (my $line = <$fh>) {
    for (1..$limit) {
        print($line) if $line =~ /y/;
    }
}

Comparing the speed

$ time python examples/grep_speed.py a.txt 20

real  0m9.610s
user  0m9.590s
sys   0m0.020s

$ time perl examples/grep_speed.pl a.txt 20

real    0m1.275s
user    0m1.253s
sys     0m0.021s

Perl is about 8 times faster than Python.

More complex Python regex

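This regex is slightly more complex: it captures any single character, then requires a y followed by the same character again (a backreference), so it matches strings such as oyo. An expression like this would be quite difficult to implement without regexes.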
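To make the behavior of this pattern concrete, here is a small sketch that tries it on a few strings. The `matches` helper and the sample strings are invented for this illustration; only the pattern itself comes from the script below.

```python
import re

# Same pattern as in the script: capture any one character, then 'y',
# then the captured character again (the backreference \1).
pattern = re.compile(r'(.)y\1')

def matches(text):
    # True if the pattern is found anywhere in the text.
    return pattern.search(text) is not None

# Hypothetical sample strings.
for text in ["eye", "oyo", "kayak", "xyz", "yy"]:
    print(f"{text!r}: {matches(text)}")
```

The first three strings contain a `y` surrounded by the same character on both sides, so they match. `xyz` fails because the characters around the `y` differ, and `yy` fails because the pattern needs three characters.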

examples/grep_speed_oxo.py

import sys
import re

if len(sys.argv) != 3:
    exit(f"{sys.argv[0]} FILENAME LIMIT")

_, filename, limit = sys.argv

with open(filename) as fh:
    for line in fh:
        for _ in range(int(limit)):
            if re.search(r'(.)y\1', line):
                print(line)

More complex Perl regex

examples/grep_speed_oxo.pl

use 5.030;

die "Usage: $0 FILENAME LIMIT" if @ARGV != 2;

my ($filename, $limit) = @ARGV;

open (my $fh, '<', $filename) or die;
while (my $line = <$fh>) {
    for (1..$limit) {
        print($line) if $line =~ /(.)y\1/;
    }
}

Comparing the speed of the more complex examples

$ time python examples/grep_speed_oxo.py a.txt 20

real   0m24.472s
user   0m24.401s
sys    0m0.036s

$ time perl examples/grep_speed_oxo.pl a.txt 20

real  0m1.239s
user  0m1.227s
sys   0m0.012s

Here Perl is about 20 times faster than Python.
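The "about 8 times" and "about 20 times" figures are simply the ratio of the wall-clock ("real") times reported by `time`. A short sketch of that arithmetic follows; the `speedup` helper and the dictionary layout are made up for this illustration, and the numbers are copied from the runs above.

```python
# Wall-clock ("real") times in seconds, copied from the measurements above.
timings = {
    "simple regex": {"python": 9.610, "perl": 1.275},
    "backreference regex": {"python": 24.472, "perl": 1.239},
}

def speedup(python_seconds, perl_seconds):
    # How many times longer the Python run took than the Perl run.
    return python_seconds / perl_seconds

for name, t in timings.items():
    ratio = speedup(t["python"], t["perl"])
    print(f"{name}: Perl is {ratio:.1f} times faster")
```

These ratios include interpreter start-up and file reading, not just the time spent in the regex engine.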

Version information

$ python -V
Python 3.8.2

$ perl -v
This is perl 5, version 30, subversion 1 (v5.30.1) built for x86_64-linux

Conclusion

The regex engine in Perl is much faster than the regex engine of Python: about 8 to 20 times faster in these tests.

They are both slower than grep, judging by the measurements I made when I compared Python with grep.