Parse numbered list in Markua


I am not sure if we have fully implemented the bulleted list parsing, but we can set that aside for now and start implementing the numbered lists. They seem to be a lot more complicated than the bulleted lists. There are a lot more options.

Also, I have a feeling that there are some issues in the spec, or just something that I have not understood yet. I've asked Peter, the author, about them. We'll see.

Anyway, let's start with some simple cases, learn more as we go, and make progress that way.

Start with simple test cases

As before, we start by creating the test cases.

examples/markua-parser/60c8214/t/input/numbered-list-1-3.md


1. First line
2. Second line
3. Third line


1) First row
2) Second row 
3) Third row

Processing numbered lists

In the case of the bulleted lists we constantly checked whether the list was still as expected, and we maintained an attribute called "ok" to indicate whether it was still ok or whether there was a problem.

In the case of the numbered lists this is going to be a lot harder, because the numbers can be ascending, descending, or fixed. To know which one applies, and whether the list behaves properly, we need the information about the whole list.
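To make the three cases concrete, here is a small sketch of how the numbers of a finished list could be classified. The helper name numbering_direction is hypothetical, and it assumes that ascending and descending lists change by exactly one at each step. That is my reading of the cases above, not something confirmed from the spec.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Markua::Parser: classify the numbers of
# a finished list as 'ascending' (+1 per step), 'descending' (-1 per step)
# or 'fixed' (all the same). Returns undef if there is no consistent pattern.
sub numbering_direction {
    my @numbers = @_;
    return 'ascending' if @numbers < 2;   # assumption: a one-item list counts as ascending

    my %name_of_step = (1 => 'ascending', -1 => 'descending', 0 => 'fixed');
    my $step = $numbers[1] - $numbers[0];
    return if not exists $name_of_step{$step};

    # every following step has to be the same as the first one
    for my $i (2 .. $#numbers) {
        return if $numbers[$i] - $numbers[$i - 1] != $step;
    }
    return $name_of_step{$step};
}

for my $numbers ([1, 2, 3], [3, 2, 1], [1, 1, 1], [1, 3, 2]) {
    my $direction = numbering_direction(@$numbers);
    print "@$numbers: ", (defined $direction ? $direction : 'not a valid list'), "\n";
}
```

Because the decision needs every number, a function like this can only be called once the whole list has been collected.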

So let's delay deciding whether the list is ok until the end of processing the list, in the save_tag function.

Every time we encounter a line that looks like part of a numbered list, we store the parsed data structure about it in the "list" attribute, including the original line as "raw". That will be useful in case we need to revert the whole thing to a simple paragraph entry.
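A minimal sketch of that fallback, assuming each stored item keeps its original line under "raw" as described above. The helper list_to_paragraph is hypothetical; it mirrors what save_tag already does for a failed bulleted list: join the raw lines and strip the trailing newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the collected numbered-list items back into
# a plain paragraph entry, using the original lines stored in 'raw'.
sub list_to_paragraph {
    my ($list) = @_;                          # array ref of item hashes
    my $text = join '', map { $_->{raw} } @$list;
    $text =~ s/\n+\z//;                       # save_tag strips trailing newlines too
    return { tag => 'p', text => $text };
}

my @list = (
    { number => 1, text => 'First line',  raw => "1. First line\n" },
    { number => 3, text => 'Second line', raw => "3. Second line\n" },
);
my $paragraph = list_to_paragraph(\@list);
print "$paragraph->{tag}: $paragraph->{text}\n";
```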

In save_tag we should verify that the numbers are correct, but for now we don't do that. We just add a TODO note to remind us to come back to this later. For now we assume that the list is correct.

We remove the attributes that we don't really need to save ("raw", "sep", "space"). They will be useful in the verification step that we are skipping for now.
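When the verification step is eventually written, the "sep" and "space" values could be compared across items the same way the bulleted list compares "bullet" and "space". The sketch below is only an assumption about what that check might look like; same_sep_and_space is a hypothetical name, not part of the parser.

```perl
use strict;
use warnings;

# Hypothetical check, not part of Markua::Parser: all items of one numbered
# list should use the same separator ('.' or ')') and the same spacing.
sub same_sep_and_space {
    my ($list) = @_;               # array ref of parsed items
    return 1 if not @$list;        # an empty list has nothing to disagree about
    my ($first, @rest) = @$list;
    for my $row (@rest) {
        return 0 if $row->{sep} ne $first->{sep}
                 or $row->{space} ne $first->{space};
    }
    return 1;
}

my @same  = ({ sep => '.', space => ' ' }, { sep => '.', space => ' ' });
my @mixed = ({ sep => '.', space => ' ' }, { sep => ')', space => ' ' });
for my $list (\@same, \@mixed) {
    my $ok = same_sep_and_space($list);
    print $ok ? "ok\n" : "not ok\n";
}
```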

examples/markua-parser/60c8214/lib/Markua/Parser.pm

package Markua::Parser;
use strict;
use warnings;
use Path::Tiny qw(path);

our $VERSION = 0.01;

sub new {
    my ($class) = @_;
    my $self = bless {}, $class;
    return $self;
}

sub parse_file {
    my ($self, $filename) = @_;
    my $path = path($filename);
    my $dir = $path->parent->stringify;
    my @entries;
    my @errors;
    my $cnt = 0;

    $self->{text} = '';

    for my $line ($path->lines_utf8) {
        $cnt++;
        if ($line =~ /^(#{1,6}) (\S.*)/) {
            push @entries, {
                tag => 'h' . length($1),
                text => $2,
            };
            next;
        }

        # numbered list
        if ($line =~ m{\A(\d+)([.\)])( {1,4}|\t)(\S.*)}) {
            my ($number, $sep, $space, $text) = ($1, $2, $3, $4);
            if (not $self->{tag}) {
                $self->{tag} = 'numbered-list';
                $self->{list} = [];
            }

            if ($self->{tag} eq 'numbered-list') {
                push @{ $self->{list} }, {
                        number => $number,
                        sep    => $sep,
                        space  => $space,
                        text   => $text,
                        raw    => $line,
                };
                next;
            }

            die "What to do if a numbered list starts in the middle of another element?";
        }

        # bulleted list
        if ($line =~ m{\A([\*-])( {1,4}|\t)(\S.*)}) {
            my ($bullet, $space, $text) = ($1, $2, $3);
            if (not $self->{tag}) {
                $self->{tag} = 'list';
                $self->{list}{type} = 'bulleted';
                $self->{list}{bullet} = $bullet;
                $self->{list}{space} = $space;
                $self->{list}{ok} = 1;
                $self->{list}{items} = [$text];
                $self->{list}{raw} = [$line];
                next;
            }

            if ($self->{tag} eq 'list') {
                if ($self->{list}{type} ne 'bulleted' or
                    $self->{list}{bullet} ne $bullet  or
                    $self->{list}{space} ne $space) {
                    $self->{list}{ok} = 0;
                }
                push @{ $self->{list}{raw} }, $line;
                push @{ $self->{list}{items} }, $text;
                next;
            }

            die "What to do if a bulleted list starts in the middle of another element?";
        }

# I should remember to always use \A instead of ^ even though here we are really parsing lines so those two are the same
        if ($line =~ /\A ! \[([^\]]*)\]    \(([^\)]+)\)  \s* \Z/x) {
            my $title = $1;
            my $file_to_include = $2;
            eval {
                my $text = path("$dir/$file_to_include")->slurp_utf8;
                push @entries, {
                    tag   => 'code',
                    title => $title,
                    text  => $text,
                };
            };
            if ($@) {
                push @errors, {
                    row => $cnt,
                    line => $line,
                    error => "Could not read included file '$file_to_include'",
                };
            }
            next;
        }

        # anything else defaults to paragraph
        if ($line =~ /\S/) {
            $self->{tag} = 'p';
            $self->{text} .= $line;
            next;
        }

        if ($line =~ /^\s*$/) {
            $self->save_tag(\@entries);
            next;
        }

        push @errors, {
            row => $cnt,
            line => $line,
        }
    }
    $self->save_tag(\@entries);
    return \@entries, \@errors;
}

sub save_tag {
    my ($self, $entries) = @_;

    if ($self->{tag} and $self->{tag} eq 'numbered-list') {
        # TODO: verify that it is a proper list
        for my $row (@{ $self->{list} }) {
            delete $row->{raw};
            delete $row->{sep};
            delete $row->{space};
        }
        push @$entries, {
            tag => $self->{tag},
            list => $self->{list},
        };
        $self->{tag} = undef;
        delete $self->{list};
        return;
    }


    if ($self->{tag} and $self->{tag} eq 'list') {
        if ($self->{list}{ok}) {
            delete $self->{list}{raw};
            delete $self->{list}{ok};
            delete $self->{list}{space};
            delete $self->{list}{bullet};
            push @$entries, {
                tag => $self->{tag},
                list => $self->{list},
            };
            $self->{tag} = undef;
            delete $self->{list};
            return;
        }

        # If it is a failed list, convert it to paragraph
        $self->{tag} = 'p';
        $self->{text} = join '', @{ $self->{list}{raw} };
        delete $self->{list};
    }

    if ($self->{tag}) {
        $self->{text} =~ s/\n+\Z//;
        push @$entries, {
            tag => $self->{tag},
            text => $self->{text},
        };
        $self->{tag} = undef;
        $self->{text} = '';
    }
    return;
}


1;
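To see what the numbered-list regex captures, it can be pulled out into a small function and run on a few sample lines. parse_numbered_item is a hypothetical wrapper around the same regex used in parse_file; the sample lines still end with a newline, as lines_utf8 returns them.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the numbered-list regex used in parse_file.
# Returns a hash ref with the captured parts, or undef if the line is not
# a numbered-list item.
sub parse_numbered_item {
    my ($line) = @_;
    return if $line !~ m{\A(\d+)([.\)])( {1,4}|\t)(\S.*)};
    return { number => $1, sep => $2, space => $3, text => $4 };
}

for my $line ("1. First line\n", "2) Second row \n", "3.Third\n", "10)\tTenth\n") {
    my $item = parse_numbered_item($line);
    if ($item) {
        print "number=$item->{number} sep=$item->{sep} text='$item->{text}'\n";
    }
    else {
        print "not a list item: $line";
    }
}
```

The second sample explains why the expected DOM contains "Second row " with a trailing space: (\S.*) keeps everything up to the newline. The third one is rejected because the regex requires one to four spaces or a tab after the separator.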

Expected DOM

When we run perl bin/generate_test_expectations.pl, it generates the expected DOM, which looks like this:

examples/markua-parser/60c8214/t/dom/numbered-list-1-3.json

[
   {
      "list" : [
         {
            "number" : "1",
            "text" : "First line"
         },
         {
            "number" : "2",
            "text" : "Second line"
         },
         {
            "number" : "3",
            "text" : "Third line"
         }
      ],
      "tag" : "numbered-list"
   },
   {
      "list" : [
         {
            "number" : "1",
            "text" : "First row"
         },
         {
            "number" : "2",
            "text" : "Second row "
         },
         {
            "number" : "3",
            "text" : "Third row"
         }
      ],
      "tag" : "numbered-list"
   }
]
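The script bin/generate_test_expectations.pl is not shown in this article. Judging by the layout (three-space indentation, " : " between keys and values, sorted keys), the file looks like the output of a pretty-printing, key-sorting JSON encoder. A sketch using the core JSON::PP module, under that assumption:

```perl
use strict;
use warnings;
use JSON::PP;

# Assumption: this only illustrates one way to get the layout shown above;
# it is not the actual generate_test_expectations.pl.
my @entries = (
    {
        tag  => 'numbered-list',
        list => [
            { number => '1', text => 'First line' },
            { number => '2', text => 'Second line' },
        ],
    },
);

# pretty: 3-space indentation and " : " after keys; canonical: sorted keys,
# which is why "list" comes before "tag" and "number" before "text".
my $json = JSON::PP->new->pretty->canonical;
print $json->encode(\@entries);
```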

New commit

git add .
git commit -m "parse numbered list"
git push

commit

Coveralls reports: coverage decreased (-16.9%) to 80.435% for commit: parse numbered list

Looking at the report, I realized I had forgotten to add the new test case to the list in the t/01-test.t file.

This is the second time I've made this mistake. How could I avoid it?

Avoid forgetting to add a test case

I could stop keeping a fixed list of test cases and just go over the files in the t/input/ directory. That would work, but then I don't have control over the order of the test cases. Which, thinking about it again, might not be an issue.

The alternative would be to add a test that checks whether the list in the test file matches the full list of files in the t/input directory. This would also make sure I don't skip any of the test files by mistake. Then again, if I do skip some test cases, the test coverage will fall and I'll know about it.
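A minimal sketch of that guard test using Test::More. In t/01-test.t the file names would come from glob 't/input/*.md'; here both lists are hard-coded (an assumption, so the example runs on its own), and the file names are reduced to case names before the comparison.

```perl
use strict;
use warnings;
use Test::More;

# Hard-coded stand-ins: in the real test @files would be glob 't/input/*.md'.
my @cases = ('heading1', 'headers', 'paragraphs');
my @files = ('t/input/headers.md', 't/input/heading1.md', 't/input/paragraphs.md');

# keep only the part between the last '/' and '.md'
my @on_disk = sort map { m{([^/]+)\.md\z} ? $1 : () } @files;

is_deeply([sort @cases], \@on_disk, 'every file in t/input is listed in @cases');

done_testing;
```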

Both solutions assume that I want to run all the test cases. That is true now, though if I ever need to reimplement the parser I might want to start by reimplementing only some of the test cases.

That's not happening now, so I guess I should not worry about it.

So in t/01-test.t I replaced

my @cases = ('heading1', 'headers', 'paragraphs', 'include', 'bulleted-list', 'bulleted-list-dash');

with

my @cases = sort map { substr $_, 8, -3 } glob 't/input/*.md';
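The numbers in the substr call come from the path: 8 is the length of 't/input/' and the negative length -3 leaves off the '.md' at the end. The sketch below checks that, and also shows File::Basename's basename as one alternative that does not depend on the length of the directory name; that swap is my suggestion, not what the commit uses.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $file = 't/input/numbered-list-1-3.md';

# offset 8 skips 't/input/', length -3 drops the trailing '.md'
print substr($file, 8, -3), "\n";

# basename with a suffix strips both the directory and '.md'
print basename($file, '.md'), "\n";
```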

git add .
git commit -m"read the list of test cases from the disk"
git push

commit


Written by
Gabor Szabo

Published on 2020-05-06

If you have any comments or questions, feel free to post them on the source of this page in GitHub.
