Start writing the Markua parser in Perl


Markua is a Magical Typewriter. It is a Markdown-inspired format for writing books. It was created by Peter Armstrong and is used by LeanPub for writing books.

In this project I am going to create a Markua parser in Perl 5, or at least start doing it and implement enough of it so that I can start writing the Perl Maven articles in Markua. That will allow me to easily include Perl Maven articles in an eBook published on LeanPub, for example to create the eBook of the Perl Maven Tutorial.

Set up Git and GitHub repository

On my local disk I created a new directory called "perl5-markua-parser", and in it a README.md file, which is a readme file written in Markdown format that GitHub displays nicely.

$ mkdir perl5-markua-parser
$ cd perl5-markua-parser

# Created README.md using vim

The README.md file:

examples/markua-parser/605d7df/README.md

# Markua Parser

[Markua](https://leanpub.com/markua/) is a Markdown-inspired format to write books.

This module implements parsing (part of) the Markua specification.

I set it up as a local Git repository and committed the first change:

$ git init
$ git add README.md
$ git commit -m "start with a readme"

Then I created a new repository on GitHub called perl5-markua-parser, told my local git repository about the remote repository, and pushed out the first change.

$ git remote add origin git@github.com:szabgab/perl5-markua-parser.git
$ git push -u origin master


Create constructor and test it

Before we start writing the parser, let's create the skeleton of the module with a constructor and a test case for it. I created a directory called "lib/Markua" and a file called "Parser.pm" in it.

$ mkdir -p lib/Markua

examples/markua-parser/532b1b1/lib/Markua/Parser.pm

package Markua::Parser;
use strict;
use warnings;

sub new {
    my ($class) = @_;
    my $self = bless {}, $class;
    return $self;
}


1;

For details, read getting started with classic Perl OOP or constructor in core Perl.

I saved the corresponding test in a new directory called 't':

$ mkdir t

examples/markua-parser/532b1b1/t/01-test.t

use strict;
use warnings;

use Test::More;
use Markua::Parser;

plan tests => 1;

my $m = Markua::Parser->new;
isa_ok $m, 'Markua::Parser';

Nothing fancy. Just checking if the generated object is an instance of the class.
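Roughly speaking, isa_ok checks that the value is an object whose class is, or inherits from, the given class. The sketch below makes the same check by hand with blessed from the core Scalar::Util module and the universal isa method. The inline package only repeats the constructor shown above so that the snippet runs on its own; it is an illustration, not how Test::More is implemented.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# The same minimal constructor as in lib/Markua/Parser.pm, inlined here
package Markua::Parser;
sub new {
    my ($class) = @_;
    my $self = bless {}, $class;
    return $self;
}

package main;

my $m = Markua::Parser->new;

# blessed() returns the class name for an object and undef for a plain reference
my $class = blessed($m);
print "class: $class\n";                      # class: Markua::Parser

# isa() is true for the class itself and for any of its parent classes
my $is_parser = $m->isa('Markua::Parser') ? 1 : 0;
print "isa Markua::Parser: $is_parser\n";     # isa Markua::Parser: 1

# A plain (unblessed) hash reference is not an object at all
my $plain = {};
my $kind  = defined blessed($plain) ? 'object' : 'not an object';
print "$kind\n";                              # not an object
```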

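We can run the tests by typing the following. The -l flag tells prove to add the lib directory to @INC, so the test can find Markua::Parser: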

$ prove -l

$ git add .
$ git commit -m "create module with constructor and test it"


Start parsing

Before writing the parser, let's write a simple test-case for it. In the 't' directory I've created a subdirectory called 'input' where we are going to store the sample input files.

$ mkdir t/input

In there I've created a simple Markua file:

examples/markua-parser/491850e/t/input/heading1.md

# Heading One

The parser is expected to create a Perl data structure.

I've also created a directory called 't/dom' that will contain the expected data structures in JSON format. (DOM stands for Document Object Model.)

$ mkdir t/dom

In there I've placed the first such JSON file:

examples/markua-parser/491850e/t/dom/heading1.json

[
    {
        "tag" : "h1",
        "text" : "Heading One"
    }
]

In the test file we load two modules, Path::Tiny for easy reading of the JSON file and JSON::MaybeXS to parse the JSON string.

use JSON::MaybeXS qw(decode_json);
use Path::Tiny qw(path);

The test code itself is another two lines:

my $result = $m->parse_file('t/input/heading1.md');
is_deeply $result, decode_json( path('t/dom/heading1.json')->slurp_utf8 );

In the first line we use the not yet implemented parse_file method that receives the path to the Markua file and returns the data structure. Or so it will do once we implement it. The second line uses the is_deeply function from Test::More to compare the data structure generated by the Markua parser to the expected data structure that was read in from the JSON file and converted to a Perl data structure by decode_json.
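To see what the comparison works with, here is a self-contained sketch that decodes the content of t/dom/heading1.json (inlined as a string) and compares it with a hand-written Perl structure of the shape parse_file is expected to return. It uses the core JSON::PP module instead of JSON::MaybeXS; both export a decode_json function. One detail worth knowing: decode_json expects UTF-8 encoded bytes, while slurp_utf8 returns already-decoded characters. With the pure-ASCII heading1.json this makes no difference, but for JSON files containing non-ASCII text, reading them with Path::Tiny's slurp_raw would be the safer pairing.

```perl
use strict;
use warnings;
use Test::More;
use JSON::PP qw(decode_json);   # core module, used here in place of JSON::MaybeXS

# Contents of t/dom/heading1.json, inlined so the example is self-contained
my $json = <<'END_JSON';
[
    {
        "tag" : "h1",
        "text" : "Heading One"
    }
]
END_JSON

# A JSON array of objects becomes a reference to an array of hash references
my $expected = decode_json($json);

# The structure parse_file should produce for t/input/heading1.md
my $result = [ { tag => 'h1', text => 'Heading One' } ];

# is_deeply walks both structures recursively and reports the first difference
is_deeply $result, $expected, 'parsed DOM matches the expected JSON';

done_testing;
```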

The full test file is here:

examples/markua-parser/491850e/t/01-test.t

use strict;
use warnings;

use Test::More;
use JSON::MaybeXS qw(decode_json);
use Path::Tiny qw(path);
use Markua::Parser;

plan tests => 2;

my $m = Markua::Parser->new;
isa_ok $m, 'Markua::Parser';

my $result = $m->parse_file('t/input/heading1.md');
is_deeply $result, decode_json( path('t/dom/heading1.json')->slurp_utf8 );

Finally, the implementation of the parser itself uses Path::Tiny to read in the Markua source file and then uses a regex to parse the lines. It is very simple, but it works for the first test case:

examples/markua-parser/491850e/lib/Markua/Parser.pm

package Markua::Parser;
use strict;
use warnings;
use Path::Tiny qw(path);

sub new {
    my ($class) = @_;
    my $self = bless {}, $class;
    return $self;
}

sub parse_file {
    my ($self, $filename) = @_;
    my @entries;
    for my $line (path($filename)->lines_utf8) {
        if ($line =~ /^# (\S.*)/) {
            push @entries, {
                tag => 'h1',
                text => $1,
            };
        }
    }
    return \@entries;
}


1;

The parse_file method expects two parameters: the instance object representing the current parser and the name of the file to be parsed.

We create an empty array called @entries that will hold the parsed DOM.

Then we use the lines_utf8 method of the Path::Tiny object to read in all the lines of the Markua file and go over them line-by-line using a for loop.
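For comparison, a rough core-Perl counterpart of path($filename)->lines_utf8 opens the file with a UTF-8 decoding layer and reads it in list context. This is a sketch only; File::Temp is used just to create a throw-away input file with the same content as t/input/heading1.md. As with lines_utf8 under its default options, every line keeps its trailing newline.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary Markua file; UNLINK => 1 deletes it when the program exits
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} "# Heading One\n";
close $out or die "Could not close '$filename': $!";

# Decode the file as UTF-8 and read all of its lines at once
open my $in, '<:encoding(UTF-8)', $filename
    or die "Could not open '$filename': $!";
my @lines = <$in>;
close $in;

# No chomp happens here: show the newline at the end of each line explicitly
for my $line (@lines) {
    (my $shown = $line) =~ s/\n/\\n/g;
    print "line: $shown\n";
}
```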

In the /^# (\S.*)/ regex, the leading ^ anchors the match to the beginning of the string. The # and the space after it tell the regex to match those two characters immediately after the beginning of the string. Whatever is matched by the part of the regex within the pair of parentheses () will be saved in the variable $1. Inside the parentheses, \S means any non-whitespace character, . means any character (except newline), and * tells the dot to match 0 or more times. In other words, the part inside the parentheses matches a string of any length; it just has to start with something visible. (So there can't be two spaces after the initial #.) Because . does not match a newline, the trailing newline that lines_utf8 leaves on each line does not end up in $1.
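A quick way to check these rules is to run the same regex against a few sample lines. The sample lines below are made up for illustration; each keeps the trailing newline, as lines_utf8 would.

```perl
use strict;
use warnings;

my @samples = (
    "# Heading One\n",   # matches: $1 is "Heading One", without the newline
    "#  Two spaces\n",   # no match: \S cannot match the second space
    "#No space\n",       # no match: the space after # is required
    "## Heading Two\n",  # no match: the second character must be a space
);

my %captured;
for my $line (@samples) {
    if ($line =~ /^# (\S.*)/) {
        $captured{$line} = $1;
        print "match:    '$1'\n";
    }
    else {
        (my $shown = $line) =~ s/\n//;
        print "no match: '$shown'\n";
    }
}
```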

I am not sure if this is the correct regex for the specification of Markua, for that I'd need to read it more thoroughly, but for now it works for us and it satisfies our test. We can always improve it later.

If the regex matches, we create a reference to an anonymous hash holding the name of the tag (h1) and its "text", which is whatever followed the "# ". We push (append) this hash reference to the @entries array.

At the end we return a reference to the @entries array.
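To look at the loop and the returned reference in isolation, the sketch below applies the same regex to a list of lines instead of a file. parse_lines is a hypothetical helper made up for this illustration; it is not part of Markua::Parser. The caller receives a single array reference and dereferences it to reach the individual hashes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Markua::Parser): the body of parse_file,
# but taking the lines directly instead of reading them from a file
sub parse_lines {
    my (@lines) = @_;
    my @entries;
    for my $line (@lines) {
        if ($line =~ /^# (\S.*)/) {
            # An anonymous hash describing one element of the DOM
            push @entries, {
                tag  => 'h1',
                text => $1,
            };
        }
    }
    # Return one scalar (a reference) instead of copying the whole list
    return \@entries;
}

my $dom = parse_lines("# Heading One\n", "Some text\n", "# Heading Two\n");

# Dereference the array reference to count entries and reach into the hashes
my $count = scalar @$dom;
print "entries: $count\n";                # entries: 2
for my $entry (@$dom) {
    print "$entry->{tag}: $entry->{text}\n";
}
```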

$ git add .
$ git commit -m "first parsing of an h1 tag"
$ git push


To be continued

In the meantime go and support the crowdfunding campaign.


Written by
Gabor Szabo

Published on 2018-03-02
