Moving from CGI to PSGI and Starman

There are still some organizations out there that run applications written in Perl using plain old CGI. While it is a reasonable way to run small applications, there are a lot of benefits in moving to some of the new frameworks such as Perl Dancer or Mojolicious.

However, that move might require a lot of changes to the code base. An intermediate step that is often enough to reap many of the benefits is to move to Plack/PSGI. That makes the code easier to test, and it can then be served by an application server such as Starman.

In this article we take a simple CGI script and convert it to use Plack/PSGI. We also demonstrate how the old script runs as a plain CGI script, and how the new version can run both as a CGI script and when loaded by the application server.

We used Rex to deploy the application to a DigitalOcean Droplet running Ubuntu 20.04.

The layout of the project

To be clear, this layout is not a requirement in any way. It was just convenient for the deployment with Rex.

The main scripts are located in files/var/cgi-bin/app.cgi and files/var/cgi-bin/app.psgi and they are deployed to /var/cgi-bin/app.cgi and /var/cgi-bin/app.psgi respectively.

The files/etc/apache2/sites-enabled/apache.conf file is the configuration we need for Apache. It is copied to /etc/apache2/sites-enabled/apache.conf

For the version with Starman we could have switched to Nginx, but there was not a lot of value in the additional headache.

files/etc/systemd/system/starman.service is copied to /etc/systemd/system/starman.service and it is used to configure Starman as a service (also known as a daemon).

There are two test files in the t/ directory to verify the CGI and Plack/PSGI scripts.

Finally there is the Rexfile that includes the deployment script.

Of course you can use any other tool for deployment, but including the one I used makes it easier for you to check the solution yourself and it was certainly easier for me to develop it.

.
├── files
│   ├── etc
│   │   ├── apache2
│   │   │   └── sites-enabled
│   │   │       └── apache.conf
│   │   └── systemd
│   │       └── system
│   │           └── starman.service
│   └── var
│       └── cgi-bin
│           ├── app.cgi
│           └── app.psgi
├── Rexfile
└── t
    ├── cgi.t
    └── psgi.t

The original CGI script

examples/cgi-to-psgi/files/var/cgi-bin/app.cgi

#!/usr/bin/env perl
use strict;
use warnings;

use CGI qw(-utf8);
my $q = CGI->new;
print $q->header(-charset => 'utf8');

my $html;
if ($q->param('pid')) {
    $html = $$;
} else {
    my $name = $q->param('name') || '';
    $html = "Hello $name\n";
}

print $html;

The script uses the CGI.pm module.

It has two cases. If the parameter pid is passed to the server, then it sends back the current process ID. We are using this to show that a CGI script will create a new process on every invocation.

The second case is when pid is not supplied. The user can send a parameter called name with some content and the "application" will echo it back, prefixing it with the word "Hello".

To make it simpler to read, there is only one place in the code that prints the HTML.

Test for the CGI script

examples/cgi-to-psgi/t/cgi.t

use strict;
use warnings;
use Test::More;
use Test::LongString;
use Capture::Tiny qw(capture);

{
    my ($out, $err, $exit) = capture { system q{files/var/cgi-bin/app.cgi} };
    is $exit, 0;
    is $err, '';
    is_string $out, "Content-Type: text/html; charset=utf8\r\n\r\nHello \n";
}

{
    my ($out, $err, $exit) = capture { system q{files/var/cgi-bin/app.cgi name='Foo Bar'} };
    is $exit, 0;
    is $err, '';
    is_string $out, "Content-Type: text/html; charset=utf8\r\n\r\nHello Foo Bar\n";
}

done_testing;

To verify that the code works properly we wrote a test script. It executes the CGI script on the command line, passing the parameters as name=value pairs, which CGI.pm accepts when a script is run from the command line. You can run it either as perl t/cgi.t or, better yet, as prove t/cgi.t

The PSGI version

examples/cgi-to-psgi/files/var/cgi-bin/app.psgi

#!/usr/bin/env plackup
use strict;
use warnings;

use Plack::Request;

my $app = sub {
    my $env = shift;

    my $request = Plack::Request->new($env);
    my $html;
    if ($request->param('pid')) {
        $html = $$;
    } else {
        my $name = $request->param('name') || '';
        $html = "Hello $name\n";
    }

    return [
      '200',
      [ 'Content-Type' => 'text/html; charset=utf8' ],
      [ $html ],
    ];
};

Getting the parameters supplied by the user is quite similar in the PSGI version as well, but instead of using the CGI.pm module we use the Plack::Request module and the whole thing is inside a function.
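To see the parameter handling in isolation, here is a minimal sketch that builds a PSGI environment by hand and passes it to Plack::Request, without any server involved. It assumes that Plack and HTTP::Request::Common are installed; the @seen array and the request values are only there for illustration and are not part of the application.

```perl
use strict;
use warnings;
use Plack::Request;
use HTTP::Request::Common qw(GET POST);
use HTTP::Message::PSGI;    # adds a to_psgi() method to HTTP::Request objects

# Collect [method, name] pairs for one GET and one POST request.
my @seen;
for my $http_request (GET('/?name=Foo'), POST('/', { name => 'Bar' })) {
    my $env     = $http_request->to_psgi;          # the PSGI environment hash
    my $request = Plack::Request->new($env);
    push @seen, [ $request->method, $request->param('name') ];
}

printf "%-4s name=%s\n", @$_ for @seen;
```

The same param call returns the value whether it arrived in the query string of a GET request or in the body of a POST request, which is why the application code does not need to distinguish between the two.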

Instead of printing the resulting HTML, we return a 3-element array reference: the first element is the HTTP status code, the 2nd element is an array reference holding the HTTP headers as name-value pairs, and the 3rd element is an array reference holding the HTML.

The first line has also changed, as this application must be executed by the plackup command when running in CGI mode.

Thinking about it now, I am not sure how you would run this on Windows. You would probably need to associate the .psgi extension with plackup there.
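Because a PSGI application is just a code reference, the shape of the response can be inspected with nothing but core Perl. The following is a small sketch, not the application from this article: the app, the /demo path, and the hand-written environment hash are made up for the illustration, and a real server passes in a much larger environment.

```perl
use strict;
use warnings;

# A tiny PSGI application: takes the environment hash, returns the response.
my $app = sub {
    my ($env) = @_;
    my $text = "Hello from $env->{PATH_INFO}\n";
    return [
        200,                                                # status code
        [ 'Content-Type' => 'text/plain; charset=utf-8' ],  # headers as pairs
        [ $text ],                                          # body chunks
    ];
};

# Call it directly with a minimal, hand-made environment.
my $res = $app->({ REQUEST_METHOD => 'GET', PATH_INFO => '/demo' });

my ($status, $headers, $body) = @$res;
my %h = @$headers;

print "status: $status\n";
print "type:   $h{'Content-Type'}\n";
print "body:   ", join('', @$body);
```

This is also what makes PSGI code easy to test: the application never touches STDOUT, it only returns a data structure.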

In order to try it on your own computer and during development you can run it with:

plackup files/var/cgi-bin/app.psgi

It will print

HTTP::Server::PSGI: Accepting connections at http://0:5000/

and then you can visit the application by browsing to http://0:5000/. You can stop it by pressing Ctrl-C. You can also try http://0.0.0.0:5000/?name=Foo and http://0.0.0.0:5000/?pid=1.

Test for the PSGI version

examples/cgi-to-psgi/t/psgi.t

use strict;
use warnings;
use Test::More;
use Test::LongString;
use Plack::Test;
use Plack::Util;
use HTTP::Request::Common;

my $app = Plack::Util::load_psgi 'files/var/cgi-bin/app.psgi';

my $test = Plack::Test->create($app);

{
    my $res = $test->request(GET '/');
    is $res->header('Content-Type'), 'text/html; charset=utf8', 'content type';
    is $res->status_line, '200 OK', 'Status';
    is $res->content, "Hello \n", 'Content';
}

{
    my $res = $test->request(GET '/?name=Foo Bar');
    is $res->header('Content-Type'), 'text/html; charset=utf8', 'content type';
    is $res->status_line, '200 OK', 'Status';
    is $res->content, "Hello Foo Bar\n", 'Content';
}

{
    my $res = $test->request(POST '/', { name => 'Foo Bar'});
    is $res->header('Content-Type'), 'text/html; charset=utf8', 'content type';
    is $res->status_line, '200 OK', 'Status';
    is $res->content, "Hello Foo Bar\n", 'Content';
}


done_testing;

This is the test script for the PSGI version. We use load_psgi to load our application into the memory of the test script. From that we create a test object, and then we use the test object to send in various requests.

These tests demonstrate that it is quite easy to send different data in a GET and a POST request and then to verify the results.

You can run it either as perl t/psgi.t or better yet as prove t/psgi.t

You could run all the tests by just typing

prove
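Neither test file exercises the pid branch. A possible way to cover it is sketched below. To keep the example self-contained it uses an inline, simplified copy of the application instead of loading app.psgi, so treat the $app here as an assumption made for the illustration. It needs Plack and HTTP::Request::Common to be installed.

```perl
use strict;
use warnings;
use Test::More;
use Plack::Test;
use Plack::Request;
use HTTP::Request::Common qw(GET);

# Simplified stand-in for app.psgi that keeps only the pid logic.
my $app = sub {
    my $request = Plack::Request->new(shift);
    my $html = $request->param('pid') ? $$ : "Hello \n";
    return [ 200, [ 'Content-Type' => 'text/html; charset=utf8' ], [ $html ] ];
};

my $test = Plack::Test->create($app);
my $res  = $test->request(GET '/?pid=1');

is   $res->code, 200, 'Status';
like $res->content, qr/\A\d+\z/, 'Content is a process ID';

done_testing;
```

The test only checks that the response looks like a process ID rather than comparing it to a fixed value, since the number depends on the process that happens to run the application.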

The Apache configuration file

examples/cgi-to-psgi/files/etc/apache2/sites-enabled/apache.conf

<VirtualHost *:80>
    ServerAdmin webmaster@localhost
    DocumentRoot /var/www

    ScriptAlias /cgi-bin/ /var/cgi-bin/
    <Directory "/var/cgi-bin">
        AllowOverride None
        Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
        Require all granted
    </Directory>

    ProxyPass        /starman/ http://localhost:81/
    ProxyPassReverse /starman/ http://localhost:81/

    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

The lines with ProxyPass in them are only needed for the Starman-version. The /var/cgi-bin mapping is only needed for the CGI version and for running the PSGI version in CGI mode.

The Systemd configuration file

examples/cgi-to-psgi/files/etc/systemd/system/starman.service

[Unit]
Description=Starman

[Service]
# Environment=key=value

Type=simple
User=root
Group=root
ExecStart=starman --workers 3 --port 81 /var/cgi-bin/app.psgi
Restart=always
WorkingDirectory=/var/cgi-bin/
Nice=19
LimitNOFILE=16384

[Install]
WantedBy=multi-user.target

This configuration file is needed to create a Starman service (or daemon, as it is usually called in the Unix/Linux world).

It contains the instructions for running the starman command. In this version Starman is configured to listen on port 81 for all traffic, so it can also be reached directly from the outside. This makes debugging easier, but it is not a recommended choice for production.

In a real production environment it would either listen only to requests coming from localhost or use a Unix socket.

We also configured it with 3 workers so it will be able to handle 3 concurrent requests. (On the PerlMaven server we use 20 workers to handle the load.)

The Rexfile to install it all

examples/cgi-to-psgi/Rexfile

use strict;
use warnings;
use Rex -feature => [qw( 1.4 exec_autodie)];

# Plan:
# Setup and configure Apache
# Add a simple CGI script written in Perl using CGI.pm
# Convert the script to use PSGI and run it as CGI.
# Set up Starman to run the PSGI without the CGI.
# Map Apache as a reverse proxy to be able to serve from Starman as well.

group all => qw(104.131.31.71);
user 'root';

desc 'Just printing hostname';
task 'print_hostname', sub {
    say run('hostname');
};

desc 'Install Apache';
task 'install_apache', sub {
    update_package_db;
    pkg 'apache2', ensure => 'present';
    service 'apache2', ensure => 'started';
};

desc 'Setup Apache';
task 'setup_apache',  sub {
    file '/etc/apache2/sites-enabled/000-default.conf', ensure => 'absent';
    file '/etc/apache2/sites-enabled/apache.conf',
        source => 'files/etc/apache2/sites-enabled/apache.conf';

    for my $module (qw(cgid proxy proxy_http)) {
        a2enmod($module);
    }

    # TODO collect information if the files were changed and if modules had to be enabled and only restart if something really changed.
    service 'apache2' => 'restart';
};

desc 'Setup CGI';
task 'setup_cgi', sub {
    # mkdir
    file '/var/cgi-bin', ensure => 'directory';

    #copy file, set mode
    file '/var/cgi-bin/app.cgi',
        source => 'files/var/cgi-bin/app.cgi',
        mode => '0755';

    # install CGI.pm
    my $packages = ['libcgi-pm-perl'];
    pkg $packages, ensure => 'present';

    # Verify that we can run /var/cgi-bin/app.cgi
    eval {
        my $out = run('/var/cgi-bin/app.cgi');
        die "Wrong output $out" if index($out, 'Content-Type: text/html; charset=utf8') < 0;
    };
    die "Could not run CGI script on the command line $@" if $@;
};

desc 'Setup PSGI as CGI';
task 'setup_psgi', sub {
    #copy file, set mode
    file '/var/cgi-bin/app.psgi',
        source => 'files/var/cgi-bin/app.psgi',
        mode => '0755';

    # install  Plack
    my $packages =  ['libplack-perl'];
    pkg $packages, ensure => 'present';
};

desc 'Install Starman and configure it as a service';
task 'setup_starman', sub {
    my $packages = ['starman'];
    pkg $packages, ensure => 'present';

    file '/etc/systemd/system/starman.service',
        source => 'files/etc/systemd/system/starman.service';

    service 'starman', ensure => 'started';
};


desc 'Access the CGI script through the public URL';
no_ssh task 'verify_cgi', sub {
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $server = connection->server;
    my $res_get = $ua->get("http://$server/cgi-bin/app.cgi?name=Foo");
    my $html_get = $res_get->decoded_content;
    die "Wrong output '$html_get'" if $html_get ne "Hello Foo\n";

    my $res_post = $ua->post("http://$server/cgi-bin/app.cgi", {name => 'Foo' });
    my $html_post = $res_post->decoded_content;
    die "Wrong output '$html_post'" if $html_post ne "Hello Foo\n";
};

desc 'Access the PSGI script in CGI mode';
no_ssh task 'verify_psgi', sub {
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $server = connection->server;
    my $res_get = $ua->get("http://$server/cgi-bin/app.psgi?name=Foo");
    my $html_get = $res_get->decoded_content;
    die "Wrong output '$html_get'" if $html_get ne "Hello Foo\n";

    my $res_post = $ua->post("http://$server/cgi-bin/app.psgi", {name => 'Foo' });
    my $html_post = $res_post->decoded_content;
    die "Wrong output '$html_post'" if $html_post ne "Hello Foo\n";
};

desc 'Access the PSGI script via Starman';
no_ssh task 'verify_starman', sub {
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $server = connection->server;

    {
        my $res_get = $ua->get("http://$server:81?name=Foo");
        my $html_get = $res_get->decoded_content;
        die "Wrong output '$html_get'" if $html_get ne "Hello Foo\n";
    }

    {
        my $res_post = $ua->post("http://$server:81", {name => 'Foo' });
        my $html_post = $res_post->decoded_content;
        die "Wrong output '$html_post'" if $html_post ne "Hello Foo\n";
    }

    {
        my $res_get = $ua->get("http://$server/starman/?name=Foo");
        my $html_get = $res_get->decoded_content;
        die "Wrong output '$html_get'" if $html_get ne "Hello Foo\n";
    }

    {
        my $res_post = $ua->post("http://$server/starman/", {name => 'Foo' });
        my $html_post = $res_post->decoded_content;
        die "Wrong output '$html_post'" if $html_post ne "Hello Foo\n";
    }
};


task 'setup', sub {
    for my $task (qw(
        install_apache
        setup_apache
        setup_cgi
        setup_psgi
        setup_starman
        verify_cgi
        verify_psgi
        verify_starman
        )) {
        do_task($task); # run task?, batch
    }
};

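sub a2enmod {
    my ($module) = @_;
    # apache2ctl -M lists modules as e.g. "proxy_http_module (shared)",
    # so match the full "<name>_module" token to detect an enabled module.
    run("a2enmod $module", unless => qq{apache2ctl -M | grep -P '\\b${module}_module\\b'});
}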

# vim: syntax=perl

In order to install the whole demo you need a Virtual Private Server (VPS) running Ubuntu. I used Ubuntu 20.04 running at DigitalOcean. They call their VPSes "droplets".

I installed Rex (Rexify) on my local computer, replaced the IP address in the group all line of the Rexfile with the IP address of the newly created remote host, and then ran the following command:

rex -g all setup

It takes a few minutes to install everything, and then it also verifies the installation.

I've added some comments to the Rexfile. Basically, we execute the "setup" task, which executes all the other tasks in the given order.

Verify the results

Once the installation is done you can access the 4 versions (after replacing IP with the correct IP address) as:

http://IP/cgi-bin/app.cgi    - The regular CGI implementation.
http://IP/cgi-bin/app.psgi   - The PSGI implementation running in CGI mode.
http://IP:81                 - The Starman directly, enabled only for debugging.
http://IP/starman/           - The Starman via the Apache server using a Reverse Proxy setting.

To verify manually that everything works, type in:

http://IP/cgi-bin/app.cgi?name=Foo Bar
http://IP/cgi-bin/app.psgi?name=Foo Bar

http://IP:81?name=Foo Bar
http://IP/starman/?name=Foo Bar

To compare the persistence of the Starman solution with the CGI-mode solutions, try the following requests, reloading each one several times. The first 2 will show a constantly changing (usually growing) number, as each request is handled by a separate process.

The 3rd and 4th will randomly show any of 3 fixed numbers as there are 3 workers waiting to handle the requests.

http://IP/cgi-bin/app.cgi?pid=1
http://IP/cgi-bin/app.psgi?pid=1

http://IP:81/?pid=1
http://IP/starman/?pid=1
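If you would rather not reload the page by hand, the same check can be scripted. This is only a sketch: pid_histogram is a helper name I made up for this example, the URL is taken from the command line, and the number of requests (10) is arbitrary. It uses HTTP::Tiny, which ships with Perl.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Call $fetch $count times and count how often each returned PID was seen.
# $fetch is a code reference returning a PID, or undef on failure.
sub pid_histogram {
    my ($fetch, $count) = @_;
    my %seen;
    for (1 .. $count) {
        my $pid = $fetch->();
        $seen{$pid}++ if defined $pid;
    }
    return \%seen;
}

# Usage: perl check_pids.pl 'http://IP/starman/?pid=1'
if (@ARGV) {
    my $url  = shift @ARGV;
    my $http = HTTP::Tiny->new(timeout => 5);
    my $hist = pid_histogram(sub {
        my $res = $http->get($url);
        return $res->{success} ? $res->{content} : undef;
    }, 10);
    printf "%s: %d distinct PIDs in 10 requests\n", $url, scalar keys %$hist;
}
```

Against the two CGI URLs you would expect as many distinct PIDs as requests, while against the two Starman URLs you would expect at most 3, one per worker.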

Written by
Gabor Szabo

Published on 2021-04-08
