Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Perl

Preface

Content

  • Introduction to writing Perl
  • Many code snippets that can be used as base or as example
  • Introduction to basic syntax
  • Many examples explained
  • Many examples where not all the syntax is explained

Introdcution

Installation

Editors

IDEs

Perl on the command line

  • -e|options
  • -v|options

On the command line one can type:

  • perl -e "print 42"
  • perl -v
  • perl -V
  • perl -E "say 42"

See Perl One-liners.

First script

  • string
  • number
  • "
  • ;
  • print
  • \n
  • sh-bang
#!/usr/bin/env perl
use v5.8.7;   # use 5.008007;
use strict;
use warnings;

print "Hello world\n";
print 42, "\n";

run it by typing perl hello_world.pl

On unix you can also make it executable: chmod u+x hello_world.pl and then run like: ./hello_world.pl

A couple of notes

  • Strings and numbers
  • Strings must be quoted, you can use special characters such as "\n"
  • The print statement (Output) - gets comma delimitered list of things
  • ; after every statement

Safety net

  • -w|options
  • strict
  • warnings
  • diagnostics
#!/usr/bin/perl
use strict;
use warnings;

You should always use them as they are a safety net helping reduce mistakes.

It is usually very hard to add this safety net after you already have some code.

If the warnings you get don't make sense add

use diagnostics;

line and you will get more verbose warnings.

Why are use warnings and use strict so important even in small (< 100 lines) scripts ?

First sub

  • sub
#!/usr/bin/env perl
use v5.8.7;   # use 5.008007;
use strict;
use warnings;

sub main {
    print "Hello world\n";
    print 42, "\n";
}

main();

  • Putting all your code in small(!) functions will make your code better.

First say

  • say
#!/usr/bin/env perl
use v5.10;   # use 5.010;
use strict;
use warnings;

sub main {
    say "Hello world";
    say 42;
}

main();


print vs. say

say  "hello";    # print a newline at the end. (available since perl 5.10)
print "hello";   # print as it is
print "hello\n"; # print a newline at the end.

Debugging

  • -d|options

  • debug

  • Devel::ptkdb

  • Running script perl mycode.pl

  • Use the built-in debugger perl -d mycode.pl

  • Install Devel::ptkdb and run ** perl -d:ptkdb mycode.pl**

  • IDEs have debuggers

  • Include print statements in the critical places of your code.

  • use Data::Dumper qw(Dumper);

Avoid global variables

  • Not really Perl specific, but declare variables as late as possible!
  • Don't use global variables!
  • You will thank yourself a year later.

Keep it Simple - KISS

  • A script has to be simple, readable and reusable.
  • Sophisticated optimization and algorithms are less interesting.

Comments

# Comments for other developers

print 42; # the answer

Perl documentation

  • perldoc

  • perldoc perl

  • perldoc perlsyn

  • perldoc perlfunc

  • perldoc -f print

  • perldoc -q sort

  • perldoc perlrun

  • perldoc strict

  • perldoc warnings

Web based: perldoc

Core Perl documentation and CPAN module documentation

POD - Plain Old Documentation

  • POD
  • =head1
  • =head2
#!/usr/bin/perl
use strict;
use warnings;

print "Hello, there is no more code here\n";

=head1 Explaining how PODs work

Documentation starts any time there is a  =tag
at the beginning of a line (tag can be any word)
and ends where there is a =cut at the beginning
of a line.

Around the =tags you have to add empty rows.

A few example tags:

 Main heading           =head1

 Subtitle               =head2

 Start of indentation   =over 4

 element                =item *

 end of indentation     =back

Documentation of PODs can be found in B<perldoc perlpod>

See a few examples:

=head1 Main heading

text after main heading

=head2 Less important title

more text

 some text shown verbatim
 more verbatim text typed in indented to the right

=over 4

=item *

Issue

=item *

Other issue

=back

documentation ends here

=cut

print "Just documentation\n";

  • perl examples/firstperl/documentation.pl
  • perldoc examples/firstperl/documentation.pl

POD - Plain Old Documentation

Exercise: Hello world

Try your environment:

  • Make sure you have access to the right version of Perl (5.8.x) or newer
  • Check you can read the documentation. (perldoc)
  • Check if you have a good editor with syntax highlighting
  • Write a simple script that prints Hello world
  • Add comments to your code
  • Add user documentation to your code

Scalars

Scalars intro

A single piece of data either a number or a string is called a 'scalar' in Perl.

Scalar values

  • undef
  • a number
  • a string
  • a reference to any other data structure or function

undef

  • undef in Perl
  • NULL in SQL
  • None in Python
  • null in PHP or JavaScript

Numbers - integers, real or floating-point

integer (decimal)

No need for quotes around the value.

26
1_234_567                # like 1,234,567 in human writing

integer (hex/oct/binary)

0x1a            # hex     also written as    hex("1a");
032             # oct     also written as    oct("32");
0b11010         # binary  also written as    oct("0b11010");
                # all 3 equal to 26 decimal

real or floating-point

3.5e+3          # 3500

Strings

"Hello world";

or

'Hello world';

Scalar variables (use my)

  • my

  • scalar variables

  • Scalar variables always start with a $ sign, name is alphanumeric (a-zA-Z0-9) and underscore (_)

  • A scalar variable can hold a string, a number, a reference to another data-structure, undef

  • Value assignment to variable is done by the = sign

  • Use the my keyword to declare variables (optional but recommended)

$this_is_a_long_scalar_variable
$ThisIsAlsoGoodButWeUseItLessInPerl
$h
$H             # $h and $H are two different variables
#!/usr/bin/perl
use strict;
use warnings;

my $greeting   = "Hello world\n";
my $the_answer = 42;
my $name;                   # undef
print $greeting;
print $the_answer, "\n";

$name = 'Foo Bar';
print $name, "\n";

$the_answer = "Hi, you two";

The value can be changed any time.

Scalar variables

Sigils and variables

  • $
  • @
  • %
  • &

Variables in Perl are Sigil + variable name:

  • $ - scalar
  • @ - array
  • % - hash
  • & - function (only used in references)
      • everything (just in very extreme code)
$name_of_a_scalar
@name_of_an_array
%name_of_a_hash

Common error messages

  • undef
Global symbol "$x" requires explicit package name at ..

You need to declare the variable $x by my.

Use of uninitialized value $x in ... at ...

$x contained undef.

Name "main::x" used only once: possible typo at ...

What it said. It probably refers to $x.

Can't locate Module/NameX.pm in @INC (@INC contains: ... )

You probably have "use Module::NameX" in your code meaning you are trying to load Module::NameX. Either there is a typo in the name of the module (e.g. in our case it is probably called Module::Name) or you need to install Module::NameX.

Scalar found where operator expected at ...

Probably a , is missing between parameters of a function?

Greeting with a name, Variable interpolation

  • interpolation|scalar
#!/usr/bin/perl
use strict;
use warnings;

my $name = "Foo";
print "Hello ", $name, " - how are you ?\n";


print "Hello $name - how are you ?\n";

User Input

  • <STDIN>
  • STDIN
#!/usr/bin/perl
use strict;
use warnings;

print "Enter your name, please: ";
my $name = <STDIN>;
print "Hello $name - how are you ?\n";

  • STDIN - Standard Input (usually it is the keyboard)
  • Reading one line (till ENTER) from STDIN
$ perl examples/scalars/read_from_stdin.pl
Enter your name, please: Foo
Hello Foo
 - how are you ?

There is this problem of the newline

chomp

  • chomp
#!/usr/bin/perl
use strict;
use warnings;

print "Enter your name, please: ";
my $name = <STDIN>;
chomp $name;
print "Hello $name - how are you ?\n";

chomp will remove the new line "\n" character from the end of the string if there was one.

Numerical Operators

  • operators|numerical
  • /

  • %
  • assignment
  • shorthand
  • =
  • ==
  • *=
  • print
  • variable interpolation
  • operator precedence
  • operator associativity
#!/usr/bin/perl
use strict;
use warnings;

my $x = 3;
my $y = 11;

my $z = $x + $y;
print "$z\n";         # 14

$z = $x * $y;
print "$z\n";         # 33
print $y / $x, "\n";  # 3.66666666666667

$z = $y % $x;         # (modulus)
print "$z\n";         # 2

$z += 14;             # is the same as   $z = $z + 14;
print "$z\n";         # 16

$z++;                 # is the same as   $z = $z + 1;
$z--;                 # is the same as   $z = $z - 1;

$z = 23 ** 2;         # exponentiation
print "$z\n";         # 529

See also perldoc perlop for all the operators.

Autoincrement

  • auto-increment
  • auto-decrement
  • ++
  • --
#!/usr/bin/perl
use strict;
use warnings;

my $x = 7;

# Postfix ++ increments AFTER the OLD value was used
my $y = $x++;  
print "y = $y, x = $x\n";     # y = 7,  x = 8,


$x = 7;

$y = ++$x; 
print "y = $y, x = $x\n";     # y = 8, x = 8

String Operators

  • operators|string
  • .
  • x
#!/usr/bin/perl
use strict;
use warnings;

my $x = "Hello";
my $y = "World";

# . is the concatenation operator, ataching ons string after the other
my $z = $x . " " . $y;  #       the same as "$x $y"
print $z, "\n";         # Hello World

my $w = "Take " . (2 + 3);     # you cannot write "Take (2 + 3)" here
print "$w\n";           # Take 5

$z .= "! ";             #       the same as  $z = $z . "! ";
print "'$z'\n";         # 'Hello World! '

# x is the string repetition operator
my $q = $z x 3;
print "'$q'\n";         # 'Hello World! Hello World! Hello World! '

See also perldoc perlop for all the operators.

String - Number conversion

  • automatic conversion
#!/usr/bin/perl
use strict;
use warnings;

print 3   . "", "\n";
print 3.1 . "", "\n";

print "3"   + 0, "\n";
print "3.1" + 0, "\n";

print "3x"  + 0, "\n"; # warning: Argument "3x" isn't numeric in addition (+)
print "3\n" + 0, "\n"; 
print "3x7" + 0, "\n"; # warning: Argument "3x7" isn't numeric in addition (+)

print ""    + 0, "\n"; # warning: Argument "" isn't numeric in addition (+)
print "z"   + 0, "\n"; # warning: Argument "z" isn't numeric in addition (+)
print "z7"  + 0, "\n"; # warning: Argument "z7" isn't numeric in addition (+)

Conditional statements: if

  • if
  • ==
  • indentation

Sometimes based on some condition a piece of code has to be executed or not.

#!/usr/bin/perl
use strict;
use warnings;

print "First number: ";
my $x = <STDIN>;
chomp $x;

print "Second number: ";
my $y = <STDIN>;
chomp $y;

if ($y == 0) {
    print "Cannot divide by zero\n";
} else {
    my $z = $x / $y;
    print "The result is $z\n";
}

Syntax of if statement

  • else
  • elsif
  • indentation

{} are always required

if (COND) {
    STATEMENTs;
}


if (COND) {
    STATEMENTs;
} else {
    STATEMENTs;
}

if (COND_1) {
    A_STATEMENTs;
} else {
    if (COND_2) {
        B_STATEMENTs;
    } else {
        if (COND_3) {
            C_STATEMENTs;
        } else {
            D_STATEMENTs;
        }
    }
}


if (COND_1) {
    A_STATEMENTs;
} elsif (COND_2) {
    B_STATEMENTs;
} elsif (COND_3) {
    C_STATEMENTs;
} else {
    D_STATEMENTs;
}

Comparing scalars in Perl

Comparison operators

  • comparison operators
  • ASCII
  • ==
  • <=
  • eq
  • ne
  • lt

Two sets of relation operators. One is to compare numerically the other is to compare based on the ASCII table

| Numeric | String (ASCII) | Meaning | | == | eq | equal | | != | ne | not equal | | < | lt | less than | | > | gt | greater than | | <= | le | less than or equal | | >= | ge | greater than or equal |

Compare values - examples

| Expression | Value | | "12.0" == 12 | TRUE | | "12.0" eq 12 | FALSE | | 2 < 3 | TRUE | | 2 lt 3 | TRUE | | 12 > 3 | TRUE | | 12 gt 3 | FALSE ! | | "foo" == "" | TRUE ! (Warning) | | "foo" eq "" | FALSE | | "foo" == "bar" | TRUE ! (Warning) | | "foo" eq "bar" | FALSE |

When reading from STDIN you can always expect a string

#!/usr/bin/perl
use strict;
use warnings;

my $input = <STDIN>;
chomp $input;
if ($input == "") {     # wrong! use eq
    # empty string 
}

undef, the initial value and defined

$q = $x + 1;        # is 1 + warning as the default number is 0
$w = $y . "abc";    # is 'abc' + warning as the default string is ""
$z++;               # is 1 - no warning

You can check if the variable already has a value or if it still undef:

if (not defined $x) {   # to avoid warning
     $x = 0;
}
$x = $x + 1;

Boolean expressions (logical operators)

  • and
  • or
  • not
  • xor
  • &&
  • ||

| and | && | | or | || | | not | ! | | xor | |

if (COND and COND) {
}

if (COND or COND) {
}

if (not COND) {
}

See also perldoc perlop for precedence and associativity tables and/or use () to define the order of evaluation.

Boolean values: TRUE and FALSE

  • undef
  • defined
  • boolean values
  • TRUE
  • FALSE
if ($z) {
    # $z is true
}
The FALSE values:

undef
""
0  0.0  00000  0e+10
"0"

All other values, such as the following are TRUE:

1
"00"
"0\n"

In many cases the separation must be between "real" values and undefined values. For that you can use the defined function:

if (defined $x) {
    # $x is defined (not undef)
}

Your Salary is in Danger - Short-Circuit

  • short circuit
If perl already knows the final value of a boolean expression after computing
only part of it, perl will NOT calculate the rest of the expression:
if ($my_money > 1_000_000 or $my_salary > 10_000) {
    # I can live well
}
if ($my_money > 1_000_000 or $my_salary++ > 10_000) {
    # I can live well
}

Exercise: Rectangular

Write a program that computes the area of a rectangular ($length * $width) and prints it. Use two variables and hard code their values.

Exercise: Rectangular prompt

Modify the previous program to prompt for the two values (on two separate lines)

Exercise: Rectangular warn

Modify the previous area-calculator program to print a warning if one of the values is negative and make the area 0 sized.

Exercise: Concatenation

Script that gets two strings (on two separate lines) and prints the concatenated version.

Exercises: Simple Calculator

Write a script that will ask for a number, an operator (+,*,-,/) and another number. Compute the result and print it out.

Solution: Rectangular

#!/usr/bin/perl
use strict;
use warnings;

my $length = 3;
my $width  = 5;
my $area   = $length * $width;
print "$area\n";

Solution: Rectangular prompt

#!/usr/bin/perl
use strict;
use warnings;

print "length: ";
my $length = <STDIN>;
print "width: ";
my $width  = <STDIN>;
my $area   = $length * $width;
print "$area\n";

Solution: Rectangular warn

#!/usr/bin/perl
use strict;
use warnings;

print "length: ";
my $length = <STDIN>;
chomp($length);

print "width: ";
my $width = <STDIN>;
chomp($width);

if ( $length < 0 ) {
    $length = 0;
    print "WARN: The first value was negative\n";
}
if ( $width < 0 ) {
    $width = 0;
    print "WARN: The second value was negative\n";
}
my $area = $length * $width;
print "The area is $area\n";

Solution: Concatenation

#!/usr/bin/perl
use strict;
use warnings;

print "Type in two strings, I'll print them out concatenated\n\n";
print "The first string: ";
my $first = <STDIN>;
chomp($first);

print "The second string: ";
my $second = <STDIN>;
 
print ("\nThe concatenated string: ", $first . $second);
 
# alternative solution:
# print ("The concatenated string: $first$second");

# don't forget to chomp the input strings.
# if you use them as numbers then you don't have to chomp.

Solution: Simple Calculator

#!/usr/bin/perl
use strict;
use warnings;

print "Type in 2 numbers and an operator and I'll print the results\n\n";
 
print "First number: ";
my $first = <STDIN>;
chomp($first);
 
print "Second number: ";
my $other = <STDIN>;
chomp($other);
 
print "The operator: ";
my $oper = <STDIN>;
chomp($oper);
 
my $result;
if ($oper eq "+") { $result = $first + $other; }
if ($oper eq "-") { $result = $first - $other; }
if ($oper eq "*") { $result = $first * $other; }
if ($oper eq "/") {
    if ($other == 0) {
        print "\nCannot divide by 0\n";
        $result = "ERROR";
    } else {
        $result = $first / $other;
    }
}
 
print "\nResult of $first $oper $other = $result\n";

# What if the given operator is not one of the 4 ?

Solution: Simple Calculator (using eval)

#!/usr/bin/perl
use strict;
use warnings;

print "Type in 2 numbers and an operator and I'll print the results\n\n";
 
print "First number: ";
my $first = <STDIN>;
chomp($first);
 
print "Second number: ";
my $other = <STDIN>;
chomp($other);
 
print "The operator: ";
my $oper = <STDIN>;
chomp($oper);
 
my $result = eval "$first $oper $other";
 
print "\nResult of $first $oper $other = $result\n";

String functions (length, lc, uc, index)

  • length

  • lc

  • uc

  • ucfirst

  • lcfirst

  • length STRING - number of characters

  • lc STRING - lower case

  • uc STRING - upper case

  • lcfirst STRING - lower case

  • ucfirst STRING - upper case

use 5.010;
use strict;
use warnings;

my $name = 'Foo Bar';
say length $name;                 # 7
say length "Hello World!\n";      # 13
say length 19;                    # 2

String functions (index, rindex)

  • index

  • rindex

  • index STRING, SUBSTRING - the location of a substring given its content

  • rindex STRING, SUBSTRING - the location of a substring given its content

#!/usr/bin/perl
use strict;
use warnings; 

my $s = "The black cat jumped from the green tree";

print index $s, "ac";                         # 6
print "\n";

print index $s, "e";                          # 2
print "\n";
print index $s, "e", 3;                       # 18
print "\n";

print index $s, "dog";                        # -1
print "\n";

print rindex $s, "e";                         # 39
print "\n";
print rindex $s, "e", 38;                     # 38
print "\n";
print rindex $s, "e", 37;                     # 33
print "\n";

String functions: length, lc, uc, index, substr

String functions (substr)

  • substr

substr STRING, OFFSET, LENGTH - the content of a substring given its location

#!/usr/bin/perl
use strict;
use warnings; 

my $s = "The black cat climbed the green tree";
my $z;
$z = substr $s, 4, 5;
print "$z\n";                         # $z = black
$z = substr $s, 4, -11;
print "$z\n";                         # $z = black cat climbed the 
$z = substr $s, 14;
print "$z\n";                         # $z = climbed the green tree
$z = substr $s, -4;
print "$z\n";                         # $z = tree
$z = substr $s, -4, 2;
print "$z\n";                         # $z = tr

$z = substr $s, 14, 7, "jumped from";
print "$z\n";                         # $z = climbed
print "$s\n";          # $s = The black cat jumped from the green tree

Strings - Double quoted

  • qq
use 5.010;
use strict;
use warnings;

say "normal string";               # normal string
say "two\nlines";                  # two
                                     # lines
say "another 'string'";            # another 'string'

my $name = "Foo";
say "Hello $name, how are you?";   # Hello Foo, how are you?

# say "His "real" name is Foo";      # ERROR - Bareword found where operator expected
say "His \"real\" name is Foo";    # His "real" name is Foo


say "His \"real\" name is \"$name\"";  # His "real" name is "Foo"
say qq(His "real" name is "$name");    # His "real" name is "Foo"

say qq(His "real" name is ($name));    # His "real" name is (Foo)
# say qq(His "real" name )is( $name);    # ERROR - Bareword found where operator expected
say qq[His "real" name )is ($name)];   # His "real" name )is (Foo)

In double quoted strings you can use the following:

  • Backslash escapes sequences like \n \t see in perldoc perlop
  • Variable interpolation

Strings - Single quoted

  • q
print 'one string';               # one string
print 'a\n';                      # a\n
print 'a $name';                  # a $name
print 'another "string"';         # another "string"

There are only two special characters in this kind of string the ' and the \ at the end of the string

print 'a'b';                      # ERROR - perl will see the string 'a'
                                  #         and something attached to it
print 'a\'b';                     # a'b

print q(His "variable" name '$name'\n);     # His "variable" name is '$name'\n

Scope of variables

  • scope

Variables defined within a block {} are hiding more global variables with the same name. They are destructed when leaving the block.

#!/usr/bin/perl
use strict;
use warnings;

{
    my $email = 'foo@bar.com';
    print "$email\n";     # foo@bar.com
}
# print $email;
# $email does not exists
# Global symbol "$email" requires explicit package name at ...

Scope of variables - 2

#!/usr/bin/perl
use strict;
use warnings;

my $lname = "Bar";
print "$lname\n";        # Bar

{
    print "$lname\n";    # Bar
    $lname = "Other";
    print "$lname\n";    # Other
}
print "$lname\n";        # Other

Scope of variables - 3

#!/usr/bin/perl
use strict;
use warnings;

my $fname = "Foo";
print "$fname\n";        # Foo

{
    print "$fname\n";    # Foo

    my $fname  = "Other";
    print "$fname\n";    # Other
}
print "$fname\n";        # Foo


Scope of variables - 4

#!/usr/bin/perl
use strict;
use warnings;

{
    my $name  = "Foo";
    print "$name\n";    # Foo
}
{
    my $name  = "Other";
    print "$name\n";    # Other
}

Scope of variables - 5

#!/usr/bin/perl
use strict;
use warnings;

my $fname  = "Foo";
print "$fname\n";    # Foo

package Other;
use strict;
use warnings;

print "$fname\n";    # Foo
my $lname = 'Bar';
print "$lname\n";    # Bar


package main;

print "$fname\n";    # Foo
print "$lname\n";    # Bar

Random numbers

#!/usr/bin/perl
use strict;
use warnings;

my $num = rand();       # returns a random number: 0 <= NUMBER < 1
my $n   = rand(100);    # returns a number: 0 <= NUMBER < 100
my $i   = int(3.12);    # returns the whole part of the number (3 in this case)

my $number = int(rand(100)); # returns a whole number: 0 <= NUMBER < 100

Here documents - double quotes

  • here document
  • <<
use strict;
use warnings;

my $fname = 'Foo';
my $lname = 'Bar';

my $str = <<"END_REPORT";
Daily user report
------------------

User details:
  Name: $fname $lname

END_REPORT

print $str;
Daily user report
------------------

User details:
  Name: Foo Bar

Here documents - single quotes

use strict;
use warnings;

my $str = <<'END_STRING';
Some text

$this_is_not_interpolated

more text

END_STRING

print $str;
Some text

$this_is_not_interpolated

more text

Exercise: Number Guessing game

Using the rand() function the computer "thinks" about a whole number between 1 and 200.

The user has to guess the number. After the user types in his guess the computer tells if this was bigger or smaller than the number it generated.

At this point there is no need to allow the user to guess several times.

Solution: Number Guessing game

#!/usr/bin/perl 
use strict;
use warnings;

my $N = 200;

my $hidden = 1 + int rand $N;
print "Please guess between 1 and $N\n";
my $guess = <STDIN>;
chomp $guess;

if ($guess < $hidden) {
    print "$guess is too small\n";
}
if ($guess > $hidden) {
    print "$guess is too big\n";
}
if ($guess == $hidden) {
    print "Heureka!\n";
}

Files

die, warn, exit

  • die
  • warn
  • exit
  • STDERR

exit() - exits from the program

warn() - writes to STDERR

die() - writes to STDERR and exits from the program (raising exception)

warn "This is a warning";
This is a warning at script.pl line 132.

If no \n at the end of the string both warn and die add the
name of file and line number and possibly the chunk of the input.

while loop

Another tool to create a loop is by using while.

while (EXPRESSION) { BLOCK }
my $i = 100;
while ($i > 0) {
    # do something
    $i--;
}
while (1) {
    # a nice infinite loop
}

Loop controls: next, last

  • next

  • last

  • next - evaluate the loop condition and if it is true go to the next iteration. (continue in other languages)

  • last - exit the loop. (break in other languages)

  • redo - start the iteration again without evaluating the loop condition.

{% embed include file="src/examples/perlarrays/loop_control.txt)

#!/usr/bin/perl
use strict;
use warnings;

my $counter = 0;
my $total = 0;
while (1) {
    $counter++;
    my $num = rand(1);

    # print "Debug: $num  $total\n";

    if ($num < 0.2) {
        next;
    }

    $total += $num;
    if ($total > 3) {
        last;
    }

    ### next jumps here ###
}
### last jumps here ###
print "Counter: $counter Total: $total\n"
'First line'
'Line after empty line'
'Another text line'
Number of non empty rows: 3 out of a total of 6

Opening file for reading

  • open
While working over most of the operating systems today, no program can
access a file directly. This is in order to allow the Operating System
to apply user rights.

Before you can read from a file you have to ask the Operating System to "open"
it for you. When opening a file you provide a variable that will become your
handle to the opened file. It is called a filehandle.
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "input.txt";
open(my $fh, "<", $filename);

close $fh;

Open and read from text files

Opening a file

  • append
  • write
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "some_filename";
open(my $fhb, "<",  $filename);          # read
open(my $fhc, ">",  $filename);          # write
open(my $fhd, ">>", $filename);          # append
open(my $fhe, "+<", $filename);          # read and write

Appending to files

Opening a file - error handling

  • $!

  • $! - error message from the Operating system

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "input.txt";
if (open my $in, "<", $filename) {
    # do your thing here
    # no need to explicitly close the file
} else {
    warn "Could not open file '$filename'. $!";
}
# here the $in filehandle is not accessible anymore

A more Perlish way to open a file and exit with error message if you could not open the file:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "input.txt";
open(my $fh, "<", $filename) or die "Could not open file '$filename'. $!";
# do your thing here
close $fh;

Opening a missing file

#!/usr/bin/perl
use strict;
use warnings;

if (open my $fh, '<', "other_data.txt") {
    # should do here something
} else {
    warn $!;
}

The error message we get:

No such file or directory at examples/files-perl/open_missing_file.pl line 7.

Read one line from a file

#!/usr/bin/perl
use strict;
use warnings;

# Reading a line from a file (or rather from a filehandle)
my $filename = "input.txt";
if (open my $data, "<", $filename) {
    my $line = <$data>; 
    print $line;

} else {
    warn "Could not open file '$filename': $!";
}

Process an entire file line by line (while, cat)

  • while

  • while - executes as long as there is something in $line, as long as there are lines in the file

  • Loop over file (name hard-coded) and print every line (UNIX cat)

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "input.txt";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
    print $line;
}

Instead of printing the line you could do anything with it.

Write to a file

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "report.txt";
open my $fh, '>', $filename or die "Could not open file '$filename' $!";

my $title = "Report by: Foo Bar"; 
print $fh "$title\n";
print $fh "-" x length $title, "\n";

Writing to files with Perl

Sum of numbers in a file

3
7
23
-17
98
12
#!/usr/bin/perl
use strict;
use warnings;

# given a file with a number on each row, print the sum of the numbers

my $sum = 0;
my $filename = "numbers.txt";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
    $sum += $line;
}
print "The total value is $sum\n";

Analyze the Apache log file

127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET / HTTP/1.1" 500 606 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET /favicon.ico HTTP/1.1" 200 766 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET / HTTP/1.1" 500 612 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET /favicon.ico HTTP/1.1" 200 766 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:53:10 +0300] "GET / HTTP/1.1" 500 612 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/1.0" 200 3700 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /style.css HTTP/1.1" 200 614 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /img/pti-round.jpg HTTP/1.1" 200 17524 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:21 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:54:51 +0300] "GET / HTTP/1.1" 200 34 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:54:51 +0300] "GET /favicon.ico HTTP/1.1" 200 11514 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:54:53 +0300] "GET /cgi/pti.pl HTTP/1.1" 500 617 "http://contact.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/0.9" 200 3700 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:58:27 +0300] "GET / HTTP/1.1" 200 3700 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:58:34 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880 "http://pti.local/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
217.0.22.3 - - [10/Apr/2007:10:58:45 +0300] "GET /talks/Fundamentals/read-excel-file.html HTTP/1.1" 404 311 "http://pti.local/unix_sysadmin.html" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.3) Gecko/20061201 Firefox/2.0.0.3 (Ubuntu-feisty)"
#!/usr/bin/perl
use strict;
use warnings;

my $file = "examples/files/apache_access.log";
open my $fh, '<', $file or die "Could not open '$file': $!";


my $local  = 0;
my $remote = 0;
while (my $line = <$fh>) {
    my $length = index ($line, " ");
    my $ip = substr($line, 0, $length);
    if ($ip eq "127.0.0.1") {
        $local++;
    } else {
        $remote++;
    }
}

print "Local: $local Remote: $remote\n";

Encoding and UTF-8

use strict;
use warnings;

my $file = "data.txt";

open(my $fh, '<:encoding(utf8)', $file) or die "Could not open '$file'\n";
...

Open files in the old way

In old version of perl (before 5.6) we could not use scalar variables as file handles so we used uppercase letters such as XYZ or INPUT, QQRQ or FILEHANDLE.

Also the function had only 2 parameters.

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "data.txt";

open(FH, ">$filename") or die;
print FH "data";
close FH;

open(FH, $filename) or die;
my $line = <FH>;
close FH;

Security problems.

Being global, difficult to pass as parameter to functions.

Don't Open Files in the old way

Binary mode

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "input.txt";
open(my $fh, "<", $filename) or die;
binmode($fh);

Reading from file, read, eof

  • read
  • eof

In Perl we usually care about lines of input so the above is enough. Still some like to read files with chunks of arbitrary length. read puts the read string to the variable passed to the function and returns the number of characters actually read READ_LENGTH = read FILEHANDLE,SCALAR,LENGTH

#!/usr/bin/perl
use strict;
use warnings;

# reading in 30 characters:

open my $in, "<", $0 or die $!;
my $expected = 30;
my $buf;
my $actual = read $in, $buf, $expected;
if ($actual < $expected) {
    print "reached end of file\n";
}

# returns TRUE if we are at the end of file.
eof($in)

EOF - End of file in Perl

tell, seek

  • tell
  • seek

For our purposes a file is a line of characters. After a bunch of read and/or write operations we need to tell where are we on that line ?

LOCATION = tell FILEHANDLE

We might also want to move within that file

 seek FILEHANDLE, OFFSET, WHENCE
 
 WHENCE:
     0 from beginning of file
     1 from current location
     2 from end of file
 OFFSET: 
     +/- number of bytes to move

the important values are:

seek $fh, 0,0;    # go to the beginning of the file
seek $fh, 0,2;    # go to the end of the file

truncate

  • truncate
# Sometimes you need to 
truncate FILEHANDLE, LENGTH;
#!/usr/bin/perl
use strict;
use warnings;

my $new = $ARGV[0];

my $filename = "file.txt";
open my $fh, "+<", $filename or die "Could not open $!\n";
my $old = <$fh>;

seek $fh, 0, 0;              # move to the beginning of the file 
print $fh $new;
truncate $fh, length $new;   # cut the file to the new size

Exercise: Add more statistics

Take the script from the previous example (examples/files-perl/count_sum.pl" %} and in addition to the sum of the numbers print also

minimum
maximum
average

median and standard deviation are probably too difficult for now.

Exercise: Write report to file

Take the exercise creating statistics of the numbers.txt file and write the results to the numbers.out file.

minimum: -17
maximum:  98
total:   126
count:     6
average:  21

You might need to look up the documentation of the printf command in order to align the columns.

Exercise: Analyze Apache - number of successful hits

In the Apache log file (examples/files-perl/apache_access.log) after the "GET something HTTP/1.1" part there is the result code of the requests. 200 is OK the rest might be some failure.

Please create a report showing how many of the hits were successful (200) and how many were something else.

Could you put all the lines in either of the categories?

Solution: Add more statistics

#!/usr/bin/perl
use strict;
use warnings;

my $total = 0;
my $count = 0;
my $min;
my $max;

my $filename = "numbers.txt";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
    chomp $line;
    $total += $line;

    if (not $count) {
        $min = $line;
        $max = $line;
    }
    $count++;

    if ($line < $min) {
        $min = $line;
    }
    if ($line > $max) {
        $max = $line;
    }
}


if (not defined $min) {
    print "No values were given\n";
} else {
    print "Min: $min   Max: $max   Total: $total   count: $count   Average: ",
          $total / $count, "\n";
}

Solution: Write report to file

#!/usr/bin/perl
use strict;
use warnings;

my $total = 0;
my $count = 0;
my $min;
my $max;

my $filename = "examples/files/numbers.txt";
open(my $fh, "<", $filename) or die "Could not open '$filename'\n";
while (my $line = <$fh>) {
    chomp $line;
    $total += $line;

    if (not $count) {
        $min = $line;
        $max = $line;
    }
    $count++;

    if ($line < $min) {
        $min = $line;
    }
    if ($line > $max) {
        $max = $line;
    }
}

open my $out, '>', 'numbers.out';
if (not defined $min) {
    print $out "No values were given\n";
} else {
    printf($out "Minimum: %5s\n", $min);
    printf($out "Maximum: %5s\n", $max);
    printf($out "Total:   %5s\n", $total);
    printf($out "Count:   %5s\n", $count);
    printf($out "Average: %5s\n", $total / $count);
}

Solution: Analyze Apache - number of successful hits

#!/usr/bin/perl
use strict;
use warnings;

my $file = "examples/files/apache_access.log";
open my $fh, '<', $file or die "Could not open '$file': $!";


my $good    = 0;
my $bad     = 0;
my $invalid = 0;
while (my $line = <$fh>) {
    chomp $line;
    my $request = q( HTTP/1.1" );
    my $start = index ($line, $request);
    if ($start < 0) {
        $request = q( HTTP/1.0" );
        $start = index ($line, $request);
    }
    if ($start < 0) {
        #print "ERROR: Unrecognized Line: $line\n";
        $invalid++;
        next;
    }

    my $end = index($line, " ", $start + length($request));
    my $result = substr($line,
        $start + length($request),
        $end - $start - length($request));
        #print "$start, $end '$result'\n";
    if ($result eq "200") {
        $good++;
    } else {
        $bad++;
    }
}

print "Good: $good\n";
print "Bad:  $bad\n";
print "Invalid: $invalid\n";


# Disclaimer: this is not an optimal solution.
# We will see a much better one after learning functions, regular expressions

Lists and Arrays

Lists and Arrays intro

While a scalar is single value a list is a bunch of single values. A scalar value can be stored in a scalar variable. A list can be stored in a list variable. It is called and array.

List Literals, list ranges

  • ()
  • ..
  • qw
A list is an ordered series of scalar values separated by comma and enclosed in parentheses.
The scalar values themselves can be references to other data structures.
(An array, explained later is a variable holding the content of a list.)
Examples of lists:
(1, 5.2, "apple pie")      # 3 values
('string', 42, 2.3, undef, ['one', 'two'], { 'name' => 'Foo Bar' })  # 6 values

($x, $y, $z)          # We can also use scalar variables as elements of a list
($x, 3, "foobar")     # or we can mix them

Special case, all of them "words":

("apple", "banana", "peach", "blueberry")   # is the same as
qw(apple banana peach blueberry)            # quote word

Special case, range operator:

(1 .. 10)                # same as
(1,2,3,4,5,6,7,8,9,10)

Perl Arrays

List Assignment

my ($x, $y, $z);

my ($x, $y, $z) = (1, "apple pie", 3.14);

($x, $y, $z) = f();   # where f() returns (2, 3, 7);
                      # nearly the same as $x=2; $y=3; $z=7;
($x, $y, $z) = f();   # where f() returns (8, 1, 5, 9);
                      # ignore 9
($x, $y, $z) = f();   # where f() returns (3, 4);
                      # $z will be undef
A regular question on job interviews:
How can we swap the values of 2 variables, let say $x and $y?

loop over elements of list with foreach

  • foreach

  • for

  • list

  • foreach ITEM (LIST) {BLOCK}

  • my - in the foreach loop

#!/usr/bin/perl
use strict;
use warnings;

foreach my $color ("Blue", "Yellow", "Brown", "White") {
    print "$color\n";
}

Blue
Yellow
Brown
White

Create an Array, loop over with foreach

  • @
#!/usr/bin/perl
use strict;
use warnings;

my @colors = ("Blue", "Yellow", "Brown", "White");
print "@colors\n";

foreach my $color (@colors) {
    print "$color\n";
}

Blue Yellow Brown White
Blue
Yellow
Brown
White

Perl for loop explained with examples

Array Assignment

You can also mix the variables on the right side and if there are arrays on the right side the whole thing becomes one flat array !

use strict;
use warnings;

my @x = ('foo', 'bar');
my @y = ('moose', 'barney');
my @z = (@x, @y);             # ('foo', 'bar', 'moose', 'barney');
print "@z\n";                 # foo bar moose barney

my $owner = 'Moose';
my @tenants = qw(Foo Bar);
my @people = ($owner, 'Baz', @tenants);  # ('Moose', 'Baz', 'Foo', 'Bar')
print "@people\n";                       # Moose Baz Foo Bar

my ($x, @y, @z);
($x, @y)     = f(); # where f() returns (1, 2, 3, 4);
                    # $x is 1;  @y is (2, 3, 4)
($x, @y, @z) = f(); # where f() returns (1, 2, 3, 4);
                    # $x is 1;  @y is (2, 3, 4)  @z is empty: ()

@z = ();            # Emptying an array

Debugging an array

  • Data::Dumper
  • Dumper
  • @
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my @names_1 = ('Moose', 'Barney', 'Foo', 'Bar');
print "@names_1\n";         # Moose Barney Foo Bar


my @names_2 = ('Moose', 'Barney', 'Foo Bar');
print "@names_2\n";         # Moose Barney Foo Bar

print Dumper \@names_1;

print Dumper \@names_2;

Moose Barney Foo Bar
Moose Barney Foo Bar
$VAR1 = [
          'Moose',
          'Barney',
          'Foo',
          'Bar'
        ];
$VAR1 = [
          'Moose',
          'Barney',
          'Foo Bar'
        ];

foreach loop on numbers

foreach my $i (1..10) {
    print "$i\n";
}
1
2
3
4
5
6
7
8
9
10

Array index (menu)

  • $#array

  • $array[0]

  • $#array - the largest index

  • $array[1] - array elements are scalar

#!/usr/bin/perl
use strict;
use warnings;

my $color;

my @colors = ("Blue", "Yellow", "Brown", "White");
print "Please select a number:\n";
foreach my $i (0..$#colors) {
    print "$i) $colors[$i]\n";
}
my $num = <STDIN>;
chomp($num);
if (defined $colors[$num]) {
    $color = $colors[$num];
} else {
    print "Bad selection\n";
    exit;
}

print "The selected color is $color\n";

  • $array[-1]}

Load Module

  • Scalar::Util

  • looks_like_number

  • is_number

  • looks_like_number

#!/usr/bin/perl
use strict;
use warnings;

use Scalar::Util qw(looks_like_number);


my $color;

my @colors = ("Blue", "Yellow", "Brown", "White");
print "Please select a number:\n";
foreach my $i (0..$#colors) {
    print "$i) $colors[$i]\n";
}
my $num = <STDIN>;
chomp($num);
if (looks_like_number($num) and defined $colors[$num]) {
    $color = $colors[$num];
} else {
    print "Bad selection\n";
    exit;
}

print "The selected color is $color\n";

Command line parameters

  • @ARGV

  • $0

  • @ARGV - all the arguments on the command line

  • $ARGV[0] - the first argument

  • $0 - name of the program

  • perl read_argv.pl blue

#!/usr/bin/perl
use strict;
use warnings;

my $color = $ARGV[0];

if (not defined $color) {
    my @colors = ("Blue", "Yellow", "Brown", "White");

    print "Please select a number:\n";
    foreach my $i (0..$#colors) {
        print "$i) $colors[$i]\n";
    }
    my $num = <STDIN>;
    chomp($num);
    if (defined $colors[$num]) {
        $color = $colors[$num];
    } else {
        print "Bad selection\n";
        exit;
    }
}

print "The selected color is $color\n";

Process command line parameters, use modules

  • use

  • |reference

  • Getopt::Long

  • scalar reference

  • perl process_command_line.pl --color blue

#!/usr/bin/perl
use strict;
use warnings;

use Getopt::Long qw(GetOptions);

my $color;
GetOptions("color=s" => \$color) or die "Usage: $0 [--color COLOR]\n";

if (not defined $color) {
    my @colors = ("Blue", "Yellow", "Brown", "White");
    print "Please select a number:\n";
    foreach my $i (0..$#colors) {
        print "$i) $colors[$i]\n";
    }
    my $num = <STDIN>;
    chomp($num);
    if (defined $colors[$num]) {
        $color = $colors[$num];
    } else {
        print "Bad selection\n";
        exit;
    }
}

print "The selected color is $color\n";

Module documentation

perldoc Getopt::Long

perldoc Cwd

process CSV file

  • split

  • CSV

  • split

{% embed include file="src/examples/perlarrays/process_csv_file.csv)

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'process_csv_file.csv';
if (defined $ARGV[0]) {
    $file = $ARGV[0];
}
   
my $sum = 0;
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
    chomp $line;

    my @fields = split /;/, $line;
    $sum += $fields[2];
}
print "$sum\n";

process csv file (short version)

while (<>) {
    $sum += (split /;/)[2];
}
print "$sum\n";


Use the following command to run the script:

`perl examples/perlarrays/process_csv_file_short.pl examples/perlarrays/process_csv_file.csv`

One-liner sum numbers in CSV file

perl -n -a -F; -e '$sum += $F[2]; END {print $sum}' examples/perlarrays/process_csv_file.csv

-n  = loop over lines but do NOT print them
-a  = autosplit by ' ' and assign to @F
-F; = replace the split string by ';'

The END block gets executed at the end of the execution and only once.

process csv file using Text::CSV

  • Text::CSV
  • Text::CSV_XS
What if there is a field with embedded comma (,)?

{% embed include file="src/examples/perlarrays/process_csv_file_module.csv)

#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;
my $csv = Text::CSV->new({
    sep_char  => ';'
});

my $file = 'process_csv_file_module.csv';
if (defined $ARGV[0]) {
    $file = $ARGV[0];
}
 
my $sum = 0;
open(my $data, '<', $file) or die "Could not open '$file'\n";
while (my $line = <$data>) {
    chomp $line;

    if ($csv->parse($line)) {

        my @fields = $csv->fields();
        $sum += $fields[2];

    } else {
        warn "Line could not be parsed: $line\n";
    }
}
print "$sum\n";


process csv file using Text::CSV

  • Text::CSV
  • Text::CSV_XS
What if there is a field with embedded newline?

{% embed include file="src/examples/perlarrays/process_csv_file_module_newline.csv)

#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;

my $file = 'process_csv_file_module.csv';
if (defined $ARGV[0]) {
    $file = $ARGV[0];
}

my $csv = Text::CSV->new ({
    binary    => 1,
    auto_diag => 1,
    sep_char  => ';'
});

 
my $sum = 0;
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file'\n";
while (my $fields = $csv->getline( $data )) {
    $sum += $fields->[2];
}
print "$sum\n";

Join

  • join
my @fields = qw(Foo Bar foo@bar.com);
my $line = join ";", @fields;
print "$line\n";     # Foo;Bar;foo@bar.com

join

Labels

Normally the last and next keywords are related to the innermost loop. In some cases that's not good. Perl allows us to define labels in front of loops and then to use those labels in conjunction with last or next to go to the last or next iteration of the specified loop.

use strict;
use warnings;


OUTER:
while (1) {
    print "start outer\n";
    while (1) {
        print "start inner\n";
        last OUTER;
        print "end inner\n";
    }
    print "end outer\n";
}
print "done\n";

Exercise: Make the color selector user friendly

Take the examples/perlarrays/process_command_line.pl script ( the color selector ) and make it more user friendly by showing the numbers starting from 1 (and not from 0).

Exercise: improve the color selector

Take the examples/perlarrays/process_command_line.pl script and improve it in several ways:

  1. Check if the value given on the command line is indeed one of the possible values and don't let other colors pass.
  2. Allow a --force flag that will disregard the previously implemented restriction. Meaning
  3. --color Purple should still report error but
  4. --color Purple --force should accept this color as well.

Exercise: Improve the Number Guessing game

Let the user guess several times (with responses each time) till he finds the hidden number. Base your code on the solution from the previous exercise. ( examples/scalars/number_guessing.pl )

Allow the user to type
n   - skip this game and start a new one (generate new number to guess)
s   - show the hidden value (cheat)
d   - debug mode 
      (It is a toggle. 
       Pressing once the system starts to show the current
       number to guess every time before asking the user for new input
       pressing again, turns off the behavior.
       )
m   - move mode
      (It is a toggle.
       Pressing once the object will start to move a little bit after
       every step. Pressing again will turn this feature off.)
x   - exit

Now I can tell you that what you have is actually a 1 dimensional space fight and you are trying to guess the distance of the enemy space ship. As it is not a sitting duck, after every shot the spaceship can randomly move +2-2.

Extra exercise:

  • For training purposes you might want to limit the outer space to 0-100.
  • Make sure the enemy does not wander off the training field.
  • Give warning if the user shoots out of space.
  • Keep track of the minimum and maximum number of hits (in a file).

Solution: improved color selector

#!/usr/bin/perl
use strict;
use warnings;

use Getopt::Long qw(GetOptions);

my $color;
my $force;
GetOptions(
        "color=s"    => \$color,
        "force"      => \$force,
) or exit;


my @colors = ("Blue", "Yellow", "Brown", "White");

if ($color and not $force) {
    my $valid_color;
    foreach my $c (@colors) {
        if ($c eq $color) {
            $valid_color = 1;
            next;
        }
    }
    if (not $valid_color) {
        print "The color '$color' is not valid.\n";
        $color = '';
    }
}


if (not $color) {
    print "Please select a number:\n";
    foreach my $i (0..$#colors) {
        print "$i) $colors[$i]\n";
    }
    my $num = <STDIN>;
    chomp($num);
    if (defined $colors[$num]) {
        $color = $colors[$num];
    } else {
        print "Bad selection\n";
        exit;
    }
}

print "The selected color is $color\n";

Solution: Improve the Number Guessing game

#!/usr/bin/perl 
use strict;
use warnings;

my $N = 200;
my $debug;

my $moving;
my $scores = 'scores.txt';
my ($min, $max);
if (open my $in, '<', $scores) {
    ($min, $max) = <$in>;
    chomp ($min, $max);
}

print <<'END_TXT';
x - exit
q - quit
n - next game
s - show target
d - toggle debug mode
m - toggle - allow object to move or not
END_TXT

GAME:
while (1) {
    my $hidden = 1 + int rand $N;
    my $count = 0;
    while (1) {
        print "Please guess between 1 and $N :";
        if ($debug) {
            print " ($hidden) ";
        }
        print "\n";
        my $guess = <STDIN>;
        $count++;
        chomp $guess;
        if ($guess eq 'x' or $guess eq 'q') {
            last GAME;
        }
        if ($guess eq 'n') {
            last;
        }
        if ($guess eq 'd') {
            $debug = $debug ? 0 : 1;
            next;
        }
        if ($guess eq 'm') {
            $moving = $moving ? 0 : 1;
            next;
        }
        if ($guess eq "s") {
            print "The hidden value is $hidden\n";
            next;
            # is it fair to let the player keep playing after seeing the result?
            # if not replace the next by last
        }
        if ($guess < 1 or $guess > $N) {
            warn "You shot ($guess) in the outer space.\n";
        }
        if ($guess < $hidden) {
            print "$guess is too small\n";
        }
        if ($guess > $hidden) {
            print "$guess is too big\n";
        }
        if ($guess == $hidden) {
            print "Heureka!\n";
            last;
        }
        if ($moving) {
            $hidden += int(rand 5)-2;
            # move target by -2 .. +2

            # don't wander off the space
            if ($hidden > $N) {
                $hidden = $N;
            }
            if ($hidden < 1) {
                $hidden = 1;
            }
        }
    }
    if (not defined $min) {
        $min = $count;
    }
    if (not defined $max) {
        $max = $count;
    }
    $min = $count < $min ? $count : $min;
    $max = $count > $max ? $count : $max;
}

if (defined $min) {
    open my $out, '>', $scores or die;
    print $out "$min\n";
    print $out "$max\n";
}

Advanced Arrays

The year 19100

First, let's talk about time.

  • time}
  • localtime}
  • gmtime}
$t = time();         # 1021924103
                     # returns a 10 digit long number,
                     # the number of seconds since 00:00:00 UTC, January 1, 1970

$x = localtime($t);  # returns a string like           Thu Feb 30 14:15:53 1998
$z = localtime();    # returns the string for the current time

$z = localtime(time - 60*60*24*365);
   # returns the string for a year ago, same time, well almost

@y = localtime($t);  # an array of time values:
                     # 53 15 14 30 1 98 4 61 0
                     # the 9 values are the following:

#  0    1    2     3     4    5     6     7     8    (the index)
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

# The localtime function is aware of what is on the left side of the = sign !!!!
# OK but where does that 19100 come from ?
$mon  0..11
$min  0..59
$sec  0..60
$year YEAR-1900         # for example 2000-1900 = 100
                        # but people used  "19$year"   instead of 1900 + $year
                        # which is          19100      instead of 2000

gmtime is the same just gives the time as it is in Greenwich. The year of 19100

SCALAR and LIST Context

  • SCALAR context
  • LIST context
my @a = ("zero", "one", "two", "three");
my @b = @a;           # LIST context
my $c = @a;           # SCALAR context

if (@a) {
}

while (@a) {
}

Scalar and List context in Perl

Context Sensitivity

  • scalar()}
Every operator creates a 'context' let's see a few examples

Assignment to a scalar variable creates SCALAR context:
$x = localtime();
$x = @z;
$x = SCALAR

Assignment to an array creates LIST context:
@y = localtime();
@y = @z;
@y = LIST
                            # Expressions providing SCALAR context
$x = SCALAR;
$y[3] = SCALAR;
8 + SCALAR
"Foo: " . SCALAR
if (SCALAR) { ... }
while (SCALAR) { ... }
scalar(SCALAR)

                            # Expressions providing LIST context:
@a = LIST;
($x, $y) = LIST;
($x) = LIST;
foreach $x (LIST) {...}
join ";", LIST
print LIST

                            # example
@a = qw(One Two Three);
print @a;                   # OneTwoThree       print LIST
print 0 + @a;               # 3                 SCALAR + SCALAR
print scalar(@a);           # 3                 scalar(SCALAR)

see also perldoc -f function-name

Filehandle in scalar and list context

#!/usr/bin/perl
use strict;
use warnings;

my $file = "numbers.txt";

open(my $fh, '<', $file) or die "Could not open '$file'";
# reading in SCALAR context (line by line) and processing each line
while (my $row = <$fh>) {
    chomp $row;
    print "READ: $row\n";
}


open (my $other_fh, '<', $file) or die "Could not open '$file'";
# reading in LIST context all the lines at once
my @rows = <$other_fh>;
chomp @rows;
print "READ ", scalar(@rows), " lines\n";

slurp mode

  • $/
  • slurp
#!/usr/bin/perl
use strict;
use warnings;

my $file = "numbers.txt";
# slurp mode
my $all;
{
    open(my $fh, '<', $file) or die "Could not open '$file'\n";
    local $/ = undef;
    $all = <$fh>;
}


File::Slurp

  • File::Slurp
  • slurp
#!/usr/bin/perl
use strict;
use warnings;

use File::Slurp qw(slurp);

my $filename = $ARGV[0];
if (not defined $filename) {
    die "Usage: $0 FILENAME\n";
}

my $text = slurp($filename);


my @lines = slurp($filename);

Diamond operator

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <>) {
    print $line;
}

pop, push

  • pop
  • push
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("Foo", "Bar", "Baz");

my $last_name = pop @names;

print "$last_name\n";        # Baz
print "@names\n";            # Foo Bar


push @names, "Moo";

print "@names\n";            # Foo Bar Moo
1
27
-13
2

push example

use strict;
use warnings;

my $file = 'data.txt';

my @positive;
open my $fh, '<', $file or die "Could not open $file : $!";
while (my $line = <$fh>) {
   if ($line > 0) {
     push @positive, $line;
   }
}
print @positive;

stack (pop, push) Reverse Polish Calculator

  • stack
#!/usr/bin/perl
use strict;
use warnings;

my @stack;
while (1) {

    print '$ ';
    my $in = <STDIN>;
    chomp $in;

    if ($in eq "q") { last; }

    if ($in eq "c") { 
            pop @stack; 
            next;
            }   # fetch the last value
    if ($in eq "*") { 
            my $x = pop(@stack);
            my $y = pop(@stack);
            push(@stack, $x*$y);
            next;
            }
    if ($in eq "+") { 
            my $x = pop(@stack);
            my $y = pop(@stack);
            push(@stack, $x + $y);
            next;
            }
    if ($in eq "/") {
            my $x = pop(@stack);
            my $y = pop(@stack);
            push(@stack, $y /  $x); 
            next;
            }
    if ($in eq "-") { 
            my $x = pop(@stack);
            my $y = pop(@stack);
            push(@stack, $y - $x);
            next;
            }
    if ($in eq "=") { 
            print pop(@stack), "\n"; 
            next;
            }
    push @stack, $in;
}

shift, unshift

  • shift
  • unshift
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("Foo", "Bar", "Baz");
my $first = shift @names;

print "$first\n";              # Foo
print "@names\n";              # Bar Baz



unshift @names, "Moo";
print "@names\n";              # Moo Bar Baz

FIRST = shift ARRAY;
unshift ARRAY, VALUEs;

queue (shift, push)

  • queue
#!/usr/bin/perl
use strict;
use warnings;

my @people = ("Foo", "Bar");
while (@people) {
    my $next_person = shift @people;
    print "$next_person\n"; # do something with this person

    print "Type in more people:";
    while (my $new = <STDIN>) {
        chomp $new;
        if ($new eq "") {
            last;
        }
        push @people, $new;
    }
    print "\n";
}

shift

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 FILENAME\n";

shift defaults to shift @ARGV

Another usage of the short circuit

Slight bug.
Does it matter?

reverse

  • reverse
use strict;
use warnings;

# LIST context
my @names = qw(Foo Bar Baz);
my @reverses_names = reverse @names;
print "@reverses_names\n";    #  Baz Bar Foo

# SCALAR context
my $string = "abcd";
my $reversed_string = reverse $string;
print "$reversed_string\n";   # dcba

Sort

  • sort
  • $a
  • $b
  • cmp
  • <=>
  • spaceship operator
#!/usr/bin/perl
use strict;
use warnings;

my @words = qw(Foo Moo Bar);

my @sorted_words = sort @words;
print "@sorted_words\n";  # Bar Foo Moo

my @data = (11, 2, 23, 12);
my @sorted = sort @data;
print "@sorted\n";         # 11 12 2 23

my @sorted_ascii = sort {$a cmp $b} @data;

my @sorted_numeric = sort {$a <=> $b} @data;
print "@sorted_numeric\n";  # 2 11 12 23

my @sorted_by_length
    = sort {length($a) <=> length($b)} @data;

my @sorted_by_length_and_ascii
    = sort {
               length($a) <=> length($b)
                       or
               $a cmp $b
           } @data;

my @sorted_by_abc = sort {lc($a) cmp lc($b)} @data;

my @sorted_abc_ascii
    = sort {
           lc($a) cmp lc($b)
                   or
               $a cmp $b
           } @data;

Sorting arrays in Perl

Ternary operator

  • ternary operator
  • ? :
my $var;
if (COND) {
    $var = A;
} else {
    $var = B;
}

my $var = COND ? A : B;

Count digits

23 34 9512341
3 34 2452345 5353 67 22
42136357013412
42 5 65 64 
#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 filename\n";

my @count;

open(my $fh, "<", $filename)
    or die "Could not open '$filename': $!";

while (my $line = <$fh>) {
    chomp $line;
    my @chars = split //, $line;
    foreach my $c (@chars) {
        if ($c ne " ") {
            $count[$c]++;
        }
    }
}

foreach my $i (0..9) {
    print "$i ", ($count[$i] ? $count[$i] : 0), "\n";
}

Exercise: Color selector

Take the solution from the previous chapter (you can use the file examples/perlarrays/color_selector.pl ) and add the following features:

  1. Read the names of the colors from a file called colors.txt
  2. Let the user pass the name of the color file using the --filename FILENAME option.

Exercise: sort numbers

Take the file examples/perlarrays/count_digits.txt from the previous example and sort the numbers (not the digits).

Exercise: sort mixed string

In a file we have the string where each string has a single letter at the beginning and then a number. Sort them based on the number only, disregarding the letter.

Input:

A4
B1
A27
A3

Expected output:

B1
A3
A4
A27

File:

A1
A27
C1
B1
B12
A38
B3
A3

Expected output from sample file

A1
C1
B1
B3
A3
B12
A27
A38

Exercise: sort mixed string 2

Take the above example and change it so now we'll take in account the letter too. Sort the strings based on the first letter, and among values with the same leading letter, sort them according to the numbers.

Input

A4
B1
A3
A27

Expected output

A3
A4
A27
B1

Expected output from sample file

A1
A3
A27
A38
B1
B3
B12
C1

Solution: color selector files

#!/usr/bin/perl
use strict;
use warnings;

use Getopt::Long qw(GetOptions);

my $color;
my $filename = "examples/colors.txt";
my $force;
GetOptions(
        "color=s"    => \$color,
        "filename=s" => \$filename,
        "force"      => \$force,
) or exit;

open(my $fh, "<", $filename)
    or die "Could not open '$filename' for reading: $!";

my @colors = <$fh>;
chomp @colors;

if ($color and not $force) {
    my $valid_color;
    foreach my $c (@colors) {
        if ($c eq $color) {
            $valid_color = 1;
            next;
        }
    }
    if (not $valid_color) {
        print "The color '$color' is not valid.\n";
        $color = '';
    }
}


if (not $color) {
    print "Please select a number:\n";
    foreach my $i (0..$#colors) {
        print "$i) $colors[$i]\n";
    }
    my $num = <STDIN>;
    chomp($num);
    if (defined $colors[$num]) {
        $color = $colors[$num];
    } else {
        print "Bad selection\n";
        exit;
    }
}

print "The selected color is $color\n";

Solution: sort numbers

#!/usr/bin/perl 
use strict;
use warnings;

my $filename = shift or die "Usage: $0 filename\n";

open(my $fh, "<", $filename)
    or die "Could not open '$filename': $!";

my @numbers;
while (my $line = <$fh>) {
    chomp $line;
    my @num = split / /, $line;
    push @numbers, @num;
}

my @sorted = sort {$a <=> $b} @numbers;

foreach my $n (@sorted) {
    print "$n\n";
}

Solution: sort mixed strings

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'sort_mixed_strings.txt';
if (@ARGV) {
    $file = shift;
}
open(my $fh, '<', $file) or die "Could not open '$file'\n";

my @data = <$fh>;
chomp @data;

my @sorted = sort { substr($a, 1) <=> substr($b, 1) } @data;

foreach my $v (@sorted) {
    print "$v\n";
}

Solution: sort mixed strings

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'sort_mixed_strings.txt';
if (@ARGV) {
    $file = shift;
}
open(my $fh, '<', $file) or die "Could not open '$file'\n";

my @data = <$fh>;
chomp @data;

my @sorted = sort { 
            substr($a, 0, 1) cmp substr($b, 0, 1) 
            or
            substr($a, 1) <=> substr($b, 1) } @data;
foreach my $v (@sorted) {
    print "$v\n";
}

List::Util

List::Util provides functions such as

  • max
  • min
  • sum

It resides in a distribution called Scalar-List-Utils

Advanced: Multi dimensional array

  • @
#!/usr/bin/perl
use strict;
use warnings;

my @matrix;

$matrix[0][0] = 0;
$matrix[1][1] = 11;
$matrix[1][2] = 12;

#print "$matrix\n";
print "$matrix[0]\n";                 # ARRAY(0x814dd90)
print "$matrix[1][1]\n";              # 11

use Data::Dumper qw(Dumper);
print Dumper \@matrix;
ARRAY(0x5a7b4c)
11
$VAR1 = [
          [
            0
          ],
          [
            undef,
            11,
            12
          ]
        ];
Actually what we have is a simple array and each element of that
array can be another (anonymous) array (reference).

Functions and Subroutines

Why to use subroutines?

We use subroutines in order to

  • make the script more modular
  • reuse code for repeated tasks
  • easier to debug
  • easier to maintain

Subroutines

  • sub
  • subroutines
  • functions
  • return
#!/usr/bin/perl
use strict;
use warnings;

my $sum = add(2, 3);
print "$sum\n";
print add(5, 8), "\n";

my $result = add2(4, 7);
print "$result\n";

sub add {
    my ($x, $y) = @_;

    my $z = $x+$y;
    return $z;
}

sub add2 {
    my $x = shift;
    my $y = shift;

    return $x+$y;
}

sub add_ugly {
    return $_[0]+$_[1];
}


  • no prototypes needed
  • no signature
  • parentheses are optional
  • functions always return a value

Variable number of parameters

#!/usr/bin/perl
use strict;
use warnings;

print sum(3, 7, 11, 21), "\n";
print sum(1, 2, 3), "\n";

sub sum {
    my $sum = 0;
    foreach my $v (@_) {
        $sum += $v;
    }
    return $sum;
}

Subroutines and functions in Perl

Recursive subroutines: factorial

  • recursive

Mathematical definition of factorial:

n! = n * (n-1) * (n-2) * ... 1

1! = 1
n! = n * (n-1)!
#!/usr/bin/perl
use strict;
use warnings;

my $n = shift or die "Usage: $0 NUMBER\n";

my $result = factorial($n);
print $result;

sub factorial {
    my ($n) = @_;
    if ($n == 1) {
        return 1;
    }
    my $prev = factorial($n - 1);
    return $n * $prev;
}

Recursive subroutines: Fibonacci

Recursive definition of Fibonacci

f(1) = 1
f(2) = 1
f(n) = f(n-1) + f(n-2)
#!/usr/bin/perl
use strict;
use warnings;

sub fib {
    my $n = shift;

    if ($n == 1 or $n == 2) {
        return 1 
    }

    return (fib($n-1)+fib($n-2));            # recursive calling
}

print fib(10);                             # calling the function

Sort using a function

#!/usr/bin/perl
use strict;
use warnings;

my @data = qw(foo Foo Bar Moo FooBar Name Moose Apple Ape More Peach);
my @sorted = sort by_length @data;

print "@data\n";
print "@sorted\n";

# foo Foo Bar Moo FooBar Name Moose Apple Ape More Peach
# Ape Bar Foo foo Moo More Name Apple Moose Peach FooBar

sub by_length {
    length($a) <=> length($b)
    or
    lc($a) cmp lc($b)
    or
    $a cmp $b
}

Return a list

  • return|list
Perl allows us to return any number of values.
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = fibonacci(10);
print "@numbers\n";

sub fibonacci {
    my $num = shift;

    if ($num == 1) {
        return (1);
    } 
    if ($num == 2) {
        return (1, 1);
    }
    my @fib = (1, 1);
    foreach (3..$num) {
        push @fib, $fib[-1]+$fib[-2];
    }
    return @fib;
}

Return several elements

#!/usr/bin/perl
use strict;
use warnings;

my ($sum, $multi) = calc(3, 4);
print "Sum:   $sum\n";    # 7
print "Multi: $multi\n";  # 12

sub calc {
    my ($x, $y) = @_;

    return $x + $y, $x * $y;
}

Error handling with eval - files

  • eval {};

  • die|catching

  • try|eval

  • die - raise exception or throw exception

  • eval - catch exception - try in other languages)

#!/usr/bin/perl
use strict;
use warnings;

my @files = ('abc.txt', $0);

foreach my $file (@files) {
    eval {
        process($file);
        print "   processing of $file done\n";
    };
    if ($@) {
        my $err = $@;
        chomp $err;
        warn "Error seen: $err";
    }
}

sub process {
    my ($file) = @_;
    open my $fh, '<', $file or die "Could not open '$file' $!";
    local $/ = undef;
    my $all = <$fh>;
    close $fh;
    print "Size of $file: ", length($all), "\n";
}

Error handling with eval - two calls

#!/usr/bin/perl
use strict;
use warnings;


my $result;
my $x = 19;
my $y = 23;

eval {
    $result = unstable_add_function($x, $y);
    print "unstable done\n";
};
if ($@) {
    chomp $@;
    warn "Exception '$@' received\n";
    $result = slow_but_stable_add($x, $y);
    print "slow done\n";
}

print "Result: $result\n";
    


sub unstable_add_function {
    if (rand() < 0.2) {
        die "broken";
    }
    return $_[0]+$_[1];
}

sub slow_but_stable_add {
    sleep (2);
    return $_[0]+$_[1];
}

Exercise: Subroutines

Take the solution from the subroutines chapter (examples/files-perl/statistics.pl" %} and move the code in a subroutine called main()

Take the solution from the subroutines chapter ( examples/files-perl/write_report_to_file.pl ) and create 3 subroutines. read_file() and write_file() for the two main parts of the script and main() that will call these two subroutines.

Take ( examples/arrays/color_selector.pl ) and create two functions. One that handles the case when the user provided a value on the command line, and another function that shows the menu and accepts the value from there. Maybe also add a main() function to wrap the rest of the code as well.

Exercise: Number guessing in sub

Take the solution of the improved number guessing game from examples/arrays/number_guessing.pl and change it so some parts of it will be functions.

Specifically you can create subs for

  1. the moving of the spaceship
  2. checking the hit How would you indicate the need to 'last' from within the function?

Solution: Number guessing in sub

#!/usr/bin/perl 
use strict;
use warnings;

my $N = 200;
my $debug;

my $moving;
my $scores = 'scores.txt';
my ($min, $max);
if (open my $in, '<', $scores) {
    ($min, $max) = <$in>;
    chomp ($min, $max);
}

print <<'END_TXT';
x - exit
q - quit
n - next game
s - show target
d - toggle debug mode
m - toggle - allow object to move or not
END_TXT

GAME:
while (1) {
    my $hidden = 1 + int rand $N;
    my $count = 0;
    while (1) {
        print "Please guess between 1 and $N :";
        if ($debug) {
            print " ($hidden) ";
        }
        print "\n";
        my $guess = <STDIN>;
        $count++;
        chomp $guess;
        if ($guess eq 'x' or $guess eq 'q') {
            last GAME;
        }
        if ($guess eq 'n') {
            last;
        }
        if ($guess eq 'd') {
            $debug = $debug ? 0 : 1;
            next;
        }
        if ($guess eq 'm') {
            $moving = $moving ? 0 : 1;
            next;
        }
        if ($guess eq "s") {
            print "The hidden value is $hidden\n";
            next;
            # is it fair to let the player keep playing after seeing the result?
            # if not replace the next by last
        }
        if (check_hit($hidden, $guess)) {
            last;
        }

        if ($moving) {
            $hidden = move($hidden);
        }
    }
    if (not defined $min) {
        $min = $count;
    }
    if (not defined $max) {
        $max = $count;
    }
    $min = $count < $min ? $count : $min;
    $max = $count > $max ? $count : $max;
}

if (defined $min) {
    open my $out, '>', $scores or die;
    print $out "$min\n";
    print $out "$max\n";
}

sub move {
    my ($hidden) = @_;
    $hidden += int(rand 5)-2;  # move a bit
    
    # don't wander off the space
    if ($hidden > $N) {
        $hidden = $N;
    }
    if ($hidden < 1) {
        $hidden = 1;
    }
    return $hidden;
}

sub check_hit {
    my ($hidden, $guess) = @_;

    if ($guess < 1 or $guess > $N) {
        warn "You shot ($guess) in the outer space.\n";
    }
    if ($guess < $hidden) {
        print "$guess is too small\n";
    }
    if ($guess > $hidden) {
        print "$guess is too big\n";
    }
    if ($guess == $hidden) {
        print "Heureka!\n";
        return 1;
    }
    return;
}

Hashes

What is a hash?

  • %

  • hash

  • associative array

  • Unordered group of key/value pairs where

  • key is a unique string

  • value is any scalar

  • Associative Arrays in PHP

  • Dictionary in Python

Uses of hashes

Mapping of single feature of many similar items:

  • phone book (name => phone number)
  • worker list (ID number => name)
  • CGI: (fieldname => field value)

Features of an object:

  • Information about a person (fname, lname, email, phone, ...)

Creating hashes

my %user;
%user = ("fname", "Foo", "lname", "Bar");

my %user = (
    "fname", "Foo", 
    "lname", "Bar",
    );

my %user = (
    fname => "Foo",
    lname => "Bar",
    );

print $user{"fname"}, "\n";
print $user{fname}, "\n";

my $key = "fname";
print $user{$key}, "\n";


$user{fname} = 'Moo';
$user{email} = 'foo@bar.com';

Create hash from an array

my %user = qw(fname Foo lname Bar);

my @person = qw(fname Foo lname Bar);
my %user = @person;

my @foobar = %user;
print "@foobar\n"; # fname Foo lname Bar
                   # or
print "@foobar\n"; # lname Bar fname Foo 



$user{phone} = '123-456';     # change the value of one element
                              # or add key/value pair

%user = (phone => '123-456'); # change the hash
                              # remove all previous elements from the hash
                              # add a single key/value pair

Hash in scalar context

A hash in LIST context returns its keys and values.

my @foobar = %user;

In SCALAR context:

if (%user) {
    # the hash is not empty
}

Fetching data from hash

  • keys
  • sort keys
my @fields = keys %user;
foreach my $field (@fields) {
    print "$field    $user{$field}\n";
}


foreach my $field (keys %user) {
    print "$field    $user{$field}\n";
}


my @fields = keys %user;
my @sorted_fields = sort @fields;
foreach my $field (@sorted_fields) {
    print "$field    $user{$field}\n";
}

foreach my $field (sort keys %user) {
    print "$field    $user{$field}\n";
}

exists, delete hash element

  • exists
  • delete
  • Data::Dumper
#!/usr/bin/perl 
use strict;
use warnings;

my %phones;
$phones{Foo} = '111';
$phones{Bar} = '222';
$phones{Moo} = undef;

print defined $phones{Foo} ? "Foo: $phones{Foo}\n" : "Foo not defined\n";
print defined $phones{Moo} ? "Moo: $phones{Moo}\n" : "Moo not defined\n";
print defined $phones{Baz} ? "Baz: $phones{Baz}\n" : "Baz not defined\n";

print exists $phones{Moo}  ? "Moo exists\n"        : "Moo does not exist\n";
print exists $phones{Baz}  ? "Baz exists\n"        : "Baz does not exist\n";

delete $phones{Foo};
print exists $phones{Foo}  ? "Foo exists\n"        : "Foo does not exist\n";


Foo: 111
Moo not defined
Baz not defined
Moo exists
Baz does not exist
Foo does not exist

Multi dimensional hashes

  • Data::Dumper
#!/usr/bin/perl
use strict;
use warnings;

my %grades;
$grades{"Foo Bar"}{Mathematics}   = 97;
$grades{"Foo Bar"}{Literature}    = 67;
$grades{"Peti Bar"}{Literature}   = 88;
$grades{"Peti Bar"}{Mathematics}  = 82;
$grades{"Peti Bar"}{Art}          = 99;

#foreach my $name (sort keys %grades) {
#    foreach my $subject (keys %{$grades{$name}}) {
#        print "$name, $subject: $grades{$name}{$subject}\n";
#    }
#}

use Data::Dumper qw(Dumper);

print Dumper \%grades;
$VAR1 = {
          'Peti Bar' => {
                          'Art' => 99,
                          'Literature' => 88,
                          'Mathematics' => 82
                        },
          'Foo Bar' => {
                         'Literature' => 67,
                         'Mathematics' => 97
                       }
        };

Count words

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 filename\n";

my %count;

open(my $fh, "<", $filename)
    or die "Could not open '$filename': $!"; 
while (my $line = <$fh>) {
    chomp $line;
    my @words = split / /, $line;
    foreach my $word (@words) {
        $count{$word}++;
    }
}



foreach my $word (keys %count) {
    print "$word : $count{$word}\n";
}

Exercise: Parse HTTP values

You get one line like the following:
fname=Foo&amp;lname=Bar&amp;phone=123&amp;email=foo@bar.com

Build a hash table from it so:
print $h{fname};      #  Foo 
print $h{lname};      #  Bar
...

Start with this file:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);
 
my @input = (
    'fname=Foo&lname=Bar&email=foo@bar.com',
    'ip=127.0.0.1&machine=foobar',
);

foreach my $str (@input) {
    process($str);
}

sub process {

    my %data;

    # put your code here

    print Dumper \%data;
}
$VAR1 = {
          'email' => 'foo@bar.com',
          'lname' => 'Bar',
          'fname' => 'Foo'
        };
$VAR1 = {
          'ip' => '127.0.0.1',
          'machine' => 'foobar'
        };

Exercise: Improve the color selector

In the external file where we defined the colors, for each color
keep also a character that will be used to display the menu:

{% embed include file="src/examples/hashes/color_map.txt)

When displaying the menu show:

y) yellow
z) brown
b) black
e) blue

and wait till the user selects the appropriate letter.

Allow the user to provide a color on the command line
using the --color option and check if that is one of
the valid colors.

When reading in the file describing the menu options,
check for duplicate use of the same letter.

Exercise: Display scores

Read in a file where on each line there is a name and a score
with a comma between them.
Print them sorted based on name.
Then also print sorted based on score.
Foo,23
Bar,70
Baz,92
Bozo,17
Gozo,52
Dardon,20
Mekodra,23

Exercise: Analyze Apache log file

In the files section earlier we had a an example counting how many hits came from localhost and from other places.

Please improve that analyzer to provide a report: which client IP address were used and how many hits were from each IP address.

The log file can be found here: examples/files-perl/apache_access.log

Exercise: Parse variable width fields

# In a log file there are rows in which the first 16 and last 16 characters
# describe addresses while everything in between describes several commands
# Each command is built up by a leading character (A, B, C, D, etc) and a number
# of digits. The number of digits depend on the leading character.
#
# In this example we split up the data to commands and count how many times
# each command type was given.
#
1234567890123456A001B0002D00004C0000051234567890123456
1234567890123456A001A002D00004C0000051234567890123456

Solution: Parse HTTP values

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper qw(Dumper);

my @input = (
    'fname=Foo&lname=Bar&email=foo@bar.com',
    'ip=127.0.0.1&machine=foobar',
);

foreach my $str (@input) {
    process($str);
}

sub process {
    my $str = shift;

    my @pairs = split /&/, $str;
    my %data;
    foreach my $p (@pairs) {
        my ($k, $v) = split /=/, $p;
        $data{$k} = $v;
    }
    print Dumper \%data;
}

Solution: Improve the color selector

#!/usr/bin/perl
use strict;
use warnings;

use Getopt::Long qw(GetOptions);

my $color;
my $filename = "examples/color_map.txt";
GetOptions(
        "color=s"    => \$color,
        "filename=s" => \$filename,
);

my %colors;

open(my $fh, "<", $filename)
    or die "Could not open '$filename' for reading: $!";

while (my $line = <$fh>) {
    chomp $line;
    my ($color_name, $letter) = split / /, $line;
    if ($colors{$letter}) {
        warn 
            sprintf "%s appears to be allocated to both %s and %s\n",
                $letter, $colors{$letter}, $color_name
    } else {
        $colors{$letter} = $color_name;
    }
}

if ($color) {
    my $valid_color;
    foreach my $c (values %colors) {
        if ($c eq $color) {
            $valid_color = 1;
            next;
        }
    }
    if (not $valid_color) {
        print "The color '$color' is not valid.\n";
        $color = '';
    }
}


if (not $color) {
    print "Please select a number:\n";
    foreach my $k (sort keys %colors) {
        print "$k) $colors{$k}\n";
    }
    my $letter = <STDIN>;
    chomp($letter);
    if ($colors{$letter}) {
        $color = $colors{$letter};
    } else {
        print "Bad selection\n";
        exit;
    }
}

print "The selected color is $color\n";

Solution: Display scores

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift or die "Usage: $0 FILENAME\n";

open(my $fh, "<", $filename) or die "Could not open '$filename'\n";

my %score_of;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $score) = split /,/, $line;
    $score_of{$name} = $score;
}

foreach my $name (sort keys %score_of) {
    printf "%-10s %s\n", $name, $score_of{$name};
}
print "--------------------------\n";

foreach my $name (sort { 
                        $score_of{$b} <=> $score_of{$a} 
                       } keys %score_of) {
    printf "%-10s %s\n", $name, $score_of{$name};
}


Solution: Analyze Apache log file

#!/usr/bin/perl
use strict;
use warnings;

# examples/files/apache_access.log

my $file = shift
    or die "Usage: $0 FILENAME\n";
open my $fh, '<', $file or die "Could not open '$file': $!";

my %count;
while (my $line = <$fh>) {
    chomp $line;
    my $length = index ($line, " ");
    my $ip = substr($line, 0, $length);
    $count{$ip}++;   
}

foreach my $ip (keys %count) {
    print "$ip   $count{$ip}\n";
}

Solution: Parse variable width fields

#!/usr/bin/perl
use strict;
use warnings;

my %count;

my %length = (
     A => 3,
     B => 4,
     C => 6,
     D => 5,
);

my $filename = "examples/hashes/variable_width_fields.log";
if ($ARGV[0]) {
    $filename = $ARGV[0];
}

open my $data, '<', $filename or die "Could not open '$filename' $!";

LINE:
while (my $line = <$data>) {
    chomp $line;
    if (substr($line, 0, 1) eq "#") {
        next;
    }
    my $cmds = substr($line, 16, -16);
    while ($cmds) {
        my $c = substr($cmds, 0, 1, "");
        if (not defined $length{$c}) {
            warn "....";
            next LINE;
        }
        my $cmd = substr($cmds, 0, $length{$c}, "");
 
        $count{$c}++;
        print "$c : $cmd\n";
    }
}

print "-" x 80, "\n";
foreach my $c (keys %count) {
    print "$c  $count{$c}\n";
}

Shell to Perl

Shell and Perl intro

Manipulating Files and Directories

File test or -X operators

  • -e
  • -f
  • -d
  • -M
Before we try to read from a file or try to write to a file
we might want to check our rights, if we can do the required action at all.

For this there is a bunch of so called -X operators. Usually you use them in
an if statement:
if (-e "file.txt") {
    print "File exists !\n";
}
  • -e File (or directory) exists
  • -r File (or directory) is readable by this user
  • -w File (or directory) is writable by this user
  • -x File (or directory) is executable by this user
  • -f Entry is a file
  • -d Entry is a directory
  • -l Entry is a symbolic link
  • -s Size of the file (hence also means 'file is not empty')
  • -M Number of days between the modification date of a file and the start time of our script
Hence -s can be used either in an if statement or like this:
$size = -s $filename;
There are more such operators see `perldoc -f -x`

Running External Programs

  • system
  • ``
  • qx

system() can execute any external program. You pass to it the same string as you would type on the command line.

It returns 0 on success and the exit code of the external program on failure. Hence the strange way we check if it fails.

Passing the program name and the parameters as an array is more secure as it does not involve invocation of a shell. There is no shell processing involved;

system("some_app.exe --option");

See perldoc -f system for more error handling

my $result = `some_app.exe --option`;
my @result = `some_app.exe --option`;

backticks `` are also know as qx{}

Open pipe for reading

  • -|

Connect to STDOUT of find

#!/usr/bin/perl
use strict;
use warnings;

my @resp;
open my $ph, '-|', 'find /' or die;
while (my $line = <$ph>) {
    push @resp, $line;
    last if $line =~ /root/;
}
close $ph;
print @resp;

Open pipe for writing

  • |-

Connect to STDIN of grep

#!/usr/bin/perl
use strict;
use warnings;

open my $ph, '|-', 'grep Perl ' or die "Cannot start process";
print $ph "aa\n";
print $ph "Perl was here\n";
print $ph "bb\n";
print $ph "ccPerl\n";
close $ph;

Pager

Connect to STDIN for paging

use strict;
use warnings;

if (not @ARGV) {
    die "Usage: $0 command    to page the output of the given command \n";
}

my $SIZE = 10;
my $cnt = $SIZE;
open my $ph, '-|', "@ARGV" or die;
while (my $line = <$ph>) {
    print $line;
    $cnt--;
    if ($cnt <= 0) {
        print "---";
        my $in = <STDIN>;
        chomp $in;
        if ($in eq ' ') {
            $cnt = $SIZE;
        } elsif ($in eq 'q') {
            last;
        } else {
            $cnt = 1;
        }
    }
}
close $ph;

sendmail.pl

Connect to STDIN of sendmail

#!/usr/bin/perl
use strict;
use warnings;

sendmail('Info <info@perlmaven.com>',
    'hello world',
    'text...',
    'Noreply <noreply@perlmaven.com>');

sub sendmail {
    my ($tofield, $subject, $text, $fromfield) = @_;
    my $mailprog = "/usr/lib/sendmail";

    open my $ph, '|-', "$mailprog -t -oi" or die $!;
    print $ph "To: $tofield\n";
    print $ph "From: $fromfield\n";
    print $ph "Reply-To: $fromfield\n";
    print $ph "Subject: $subject\n";
    print $ph "\n";
    print $ph "$text";
    close $ph;
    return ;
}

# Warning: do not use the above script in an environment
# where anyone can supply the fields in the header
# (To, From, Reply-To, Subject in this case)
# as this can create an open relay.

UNIX commands from the inside

  • unlink
  • rm
  • del
  • rename
  • chmod
  • chown
  • cd
  • chdir
  • rmdir
  • ln
  • link
  • symlink
  • readlink
  • glob
  • %ENV
  • $ENV{HOME}
You can run every external command using `system`
but it makes it platform dependent and might have more security implications.

The following calls are available from Perl.
There are more but we won't cover them now.

| | | UNIX | DOS | | unlink | FILENAME | rm | del | | rename | OLDFILE, NEWFILE | mv | ren | | chmod | MODE, FILE | chmod | - | | chown | UID, GID, FILE | chown | - | | chdir | DIRNAME | cd | cd | | mkdir | DIRNAME, PERM | mkdir | mkdir | | rmdir | DIRNAME | rmdir | rmdir | | link | OLDNAME, NEWNAME | ln | - | | symlink | OLDNAME, NEWNAME | ln -s | - | | readlink | LINKNAME | ls -l | - | | glob | WILDCARDS | ls -1 | dir | | opendir, readdir | | ls -1 | dir | | %ENV, $ENV{HOME} | | | |

my $uid = getpwnam($username);
my $gid = getgrnam($groupname);

How to remove, copy or rename a file with Perl

File globbing (wildcards)

#!/usr/bin/perl
use strict;
use warnings;

# File globbing
my @xml_files_in_current_dir = glob "*.xml";

my $bin_dir = "/home/foo/bin";
my @perl_files = glob "$bin_dir/*.pl $bin_dir/*.pm";

# my @xml_files_using_old_syntax = <*.xml>;

Rename files

  • ls

  • glob - directory listing

#!/usr/bin/perl
use strict;
use warnings;

foreach my $file (glob "*.xml") {
    my $new = substr($file, 0, -3) . "html";
    rename $file, $new;
}

Directory handles

  • opendir
  • readdir

For a platform independent approach use opendir and readdir.

In order to read the content of a directory (that is the list of the files) first we have to open the directory similarly to the way we opened a file but using the opendir function This way we get a directory handle which we can use in subsequent operations.

Once the directory was opened successfully we can use the function readdir in a loop to get the names of the files in that directory

#!/usr/bin/perl
use strict;
use warnings;

my $dir = shift or die "Usage: $0 DIRECTORY\n";

opendir my $dh, $dir or die "Cannot open $dir: $!\n";
while (my $entry = readdir $dh) {
    if ($entry eq "." or $entry eq "..") {
        next;
    }
    print "$entry\n";
}
closedir $dh;

in LIST context readdir returns all the files in the directory.

opendir(my $dh, "/etc") or die $!;
@files = readdir $dh;

File::HomeDir

#!/usr/bin/perl 
use strict;
use warnings;

use File::HomeDir;
my $home = File::HomeDir->my_home;
my $docs = File::HomeDir->my_documents;

print "$home\n";
print "$docs\n";

More UNIX commands implemented in modules

  • pwd
  • cwd
  • basename
  • dirname
  • cp
  • copy
  • mv
  • move

| Module | Usage | Comment | | Cwd | $dir = cwd; | current working directory | | File::Copy | copy "oldfile", "newfile"; | | | | move "oldfile", "newfile"; | this works between file systems as well | | File::Basename | basename "/a/b/c/file.pl"; | file.pl | | | dirname "/a/b/c/file.pl"; | /a/b/c | | File::Path | mkpath("a/b/c") | (like mkdir -p) | | | rmtree("/") | | | File::Find | | | | File::Find::Rule | | |

File::Spec

  • File::Spec
  • catfile
use strict;
use warnings;

use File::Spec;
use Cwd qw(cwd);

my $dir = cwd;
my $f = File::Spec->catfile($dir, 'admin', 'project.txt');

print "$f\n";          # ...\admin\project.txt on Windows
                       # .../admin/project.txt on Linux

use File::Spec::Functions;

my $f = catfile($dir, 'admin', 'project.txt');

Change Copyright text in every source file in a directory hierarchy

  • File::Find

  • find

  • File::Find

  • reference to subroutine

#!/usr/bin/perl
use strict;
use warnings;

use File::Find qw(find);
use File::Slurp qw(read_file write_file);

my $dir = $ARGV[0] || '.';


find( \&change_file, $dir);
 

sub change_file {
    my $name= $_;

    if (not -f $name) {
        return;
    }
    if (substr($name, -3) ne ".pl") {
        return;
    }
    print "$name\n";
    my $data = read_file($name);

    $data =~ s/Copyright Old/Copyright New/g;

    # Let's not ruin our example files....
    my $backup = "$name.bak";
    rename $name, $backup, 
    write_file($name, $data);

    return;
}

Exercise: Tree

  • tree
Implement tree: prints a tree structure of a given directory.
All filenames are printed and subdirectories are properly indented.
$ tree.pl .
.
  subdir_1
    file_1_in_subdir_1
    file_2_in_subdir_1
  subdir_2
    subdir_2_1
      file_1_in_subdir_2_1
    file_1_in_subdir_2
Implement the previous one using File::Find
Implement the previous one using File::Find::Rule

Exercise: Check for UID 0

In UNIX/Linux the information about users is kept in the /etc/passwd file. Each line represents a user. The fields in each line are as follows: username, password,UID,GID,Gecos,home directory,shell Today the passwords are usually kept separately hence in this file you will only see an x in the second field.

When someone breaks in to a UNIX/Linux machine she might try to setup a user with UID 0 in order to gain root (superuser) access to the machine. Please check the following file and print a message if there is a user with 0 as UID which is NOT the root user.

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
cracker:x:0:99:games:/usr/games:/bin/sh
games:x:5:60:games:/usr/games:/bin/sh

Exercise: CPU load

Check if the load on the computer is over a certain threshold and log the event to syslog. See the 3rd number from the right of the uptime command or the first number in the /proc/loadavg file on linux. The file looks like this:

0.15 0.26 0.72 1/128 9230

A sample file

Exercise: MANIFEST file

We have a file called MANIFEST listing all the required files in a distribution. One file on every line.

Check if all the file exist.

Check if there are files that are not listed in MANIFEST

Optionally allow for wildcards in the file ?

Example MANIFEST file:

Makefile.PL
README
MANIFEST
lib/Module.pm

Expected solution:

File 'Makefile.PL' listed in MANIFEST is missing from package
File 'CHANGES' on disk but is not listed in MANIFEST

Solutions: Tree

  • File::Find
  • File::Find::Rule
#!/usr/bin/perl
use strict;
use warnings;

my $dir = '.';
if (@ARGV) {
    $dir = $ARGV[0];
}
traverse_dir('', $dir, 0);

sub traverse_dir {
    my ($dir, $thing, $depth) = @_;
    my $path = ($dir ? "$dir/$thing" : $thing);
    print " " x ($depth*3), "$thing\n";
    return if not -d $path;

    if (opendir my $dh, $path) {
        while (my $entry = readdir $dh) {
            next if $entry eq "." or $entry eq "..";
            traverse_dir ($path, $entry, $depth+1);
        }
    } else {
        print " " x ($depth*3-3), "#### Could not open $dir\n";
    }
    return;
}

#!/usr/bin/perl
use strict;
use warnings;

use File::Find;

if (not @ARGV) {
    @ARGV = (".");
}

find (\&find_name, @ARGV);

sub find_name {
    # $File::Find::name looks like: 
    # dir/subdir/subdir/file.ext
    # split at / and at \  for both unix and windows
    my @directories = split m{[/\\]}, $File::Find::name;

    print "  " x (@directories -1); # () needed for precedence
    print "$_\n";
    return;
}

#!/usr/bin/perl
use strict;
use warnings;

use File::Find::Rule;
my $dir = '.';
if (@ARGV) {
    $dir = shift;
}

foreach my $thing (File::Find::Rule->in($dir)) {
    my @parts = split m{/}, $thing;
    print "  " x @parts;
    print "$parts[-1]\n";
}

Solutions: Check for UID 0

#!/usr/bin/perl
use strict;
use warnings;

# Look for users with 0 uid

open my $pw, "/etc/passwd" or die "Cannot open /etc/passwd\n";

while (<$pw>) {
    my @items = split /:/;
    if ($items[2] =~ /^0*$/) {
        print ;
    }
}

# Alternatively writen:
#    if ((split /:/)[2] =~ /^0*$/) {



Solution: CPU load

#!/usr/bin/perl
use strict;
use warnings;

my $threshold = 0.01;

open my $fh, "/proc/loadavg" or die "Cannot read /proc/loadavg\n";
my $line = <$fh>;
my $load = (split " ", $line)[0];
if ($load > $threshold) {
    system "/usr/bin/logger", "-p", "crit", "High LOAD: $load\n";
}

# you can add here the code for logging in a file and sending e-mail
# but be careful as if the load if high you won't be able to send the
# email.


Solution: MANIFEST file

#!/usr/bin/perl
use strict;
use warnings;

use File::Find::Rule;
use File::Spec;


my $dir = shift or die "Usage: $0 DIR\n";

if (not -d $dir) {
    die "'$dir' is not a directory"; 
}

my $manifest_file = File::Spec->catfile($dir, 'MANIFEST');
if (not -f $manifest_file) {
    die "We have not found the '$manifest_file' file\n";
}

open my $fh, '<', $manifest_file;
my @manifest = <$fh>;
chomp @manifest;

foreach my $mfile (@manifest) {
    if (not -e File::Spec->catfile($dir, $mfile)) {
        warn "File '$mfile' listed in MANIFEST is missing from package\n";
    }

}

my %manifest;
foreach my $mfile (@manifest) {
    $manifest{$mfile} = 1;
}

# advanced solution of the above:
# my %manifest = map { $_ => 1 } @manifest;

my @files = File::Find::Rule->file->relative->in($dir);

foreach my $file (@files) {
    if (not $manifest{$file}) {
        warn "File '$file' on disk but is not listed in MANIFEST\n";
    }
}

Using Perl modules, using CPAN

Using modules exporting functions

use Cwd;
my $path = cwd;

Probably better this way, so the reader will know where each function comes from and we reduce the risk of redefining other functions by importing exactly the functions we want.

use Cwd ('cwd');
my $path = cwd;

also written as

use Cwd qw(cwd);
my $path = cwd;

You can also make sure not to import anything and the use fully qualified names.

use Cwd ();
my $path = Cwd::cwd;

Using Object Oriented modules

#!/usr/bin/perl 
use strict;
use warnings;

use WWW::Mechanize;

my $mech = WWW::Mechanize->new;

$mech->get('http://perlmaven.com/');

$mech->follow_link( text_regex => qr/tutorial/ );

print $mech->content;


Selecting Module to use

Evaluating modules, getting information about them

Some interesting CPAN Modules: DateTime

Days passed

use strict;
use warnings;


use DateTime;

my $d1 = DateTime->new(year => 2010, month => 8, day => 7, second => 37);
my $d2 = DateTime->new(year => 2010, month => 6, day => 17);

my $sec = $d2->subtract_datetime_absolute($d1)->seconds;
print $sec/60/60/24, "\n";

my $dur = $d1->delta_days($d2);
print $dur->in_units('days'), "\n";


Some interesting CPAN Modules: Databases

Some interesting CPAN Modules: Web

Some interesting CPAN Modules: common file format

Some interesting CPAN Modules

Installing modules

  • Use the packaging system of your OS
  • cpan
  • cpanm from cpanmin.us
  • Ask your sysadmin.

Installing modules from the os vendor

Let's install WWW::Mechanize

Debian/Ubuntu:

aptitude search mechanize | grep perl

sudo aptitude install libwww-mechanize-perl

Fedora / RedHat

yum search Mechanize | grep perl

yum install WWW-Mechanize

ActivePerl:

ppm install WWW::Mechanize

Using CPAN.pm

$ cpan WWW::Mechanize

or

$ cpan
cpan> install WWW::Mechanize

or

$ perl -MCPAN -eshell
cpan> install WWW::Mechanize

Configure CPAN.pm

Need to configure CPAN.pm:

$ cpan
cpan> o conf      to show the configuration options 

cpan> o conf urllist  http://cpan.pair.com/       a CPAN mirror that is close by
cpan> o conf prerequisites_policy follow
cpan> o conf makepl_arg  (PREFIX=... LIB=...)

cpan> o conf commit       # to save the changes

other interesting commands

cpan> install Module::Name

cpan> look Module::Name    gets to subshell
cpan> force install Module::Name
cpan> notest install Module::Name
cpan> test Module::Name

Installing modules manually

Download the tar.gz file from metacpan.org:

$ wget URL
$ tar xzf distribution.tar.gz
$ cd distribution
$ perl Makefile.PL
$ make
$ make test
$ make install  (as root or with sudo)

Without root rights

perl Makefile.PL PREFIX=/home/foobar/perlib LIB=/home/foobar/perlib/lib

Installing modules manually using Build.PL

Module::Build
perl Build.PL --install_base /home/foo/perl5 \
   --install_path lib=/home/foo/perl5/lib
perl Build
perl Build test
perl Build install

On Windows

On Strawberry Perl it would be dmake instead of make and there is also nmake in some installations.

      # on Unix
$ gunzip distribution.tar.gz
$ tar xf distribution.tar

Changing @INC

Set the environment variable PERL5LIB or PERLLIB for all the scripts

export PERL5LIB=/path/to/lib

Adding a path to the beginning of @INC. Good for the specific script

BEGIN {
   unshift @INC, '/path/to/lib';
}

The same but with the standard tool:

use lib '/path/to/lib';           # good for the specific script

On the command line. Good for this invocation only.

perl -I /path/to/lib script.pl

How to change @INC to find Perl modules in non-standard locations

Changing @INC - Relative PATH

Directory layout

  script/app.pl
  lib/My/Module.pm
# relative path
use FindBin;
use File::Spec;
use lib File::Spec->catfile($FindBin::Bin, '..', 'lib');
# relative path
use File::Spec;
use File::Basename;
use lib File::Spec->catfile(
            File::Basename::dirname(File::Spec->rel2abs($0)),
            '..',
            'lib');

local::lib

Download local::lib and follow the bootstrapping technique.

$ wget http://cpan.metacpan.org/authors/id/A/AP/APEIRON/local-lib-1.008004.tar.gz
$ tar xzf local-lib-1.008004.tar.gz
$ cd local-lib-1.008004
$ perl Makefile.PL --bootstrap
$ make test
$ make install
$ echo 'eval $(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)' >>~/.bashrc
$ source ~/.bashrc

$ cpan WWW::Mechanize

It will install itself in ~/perl5/lib you will need to add some code to ~/.bashrc as well to make it visible to all the perl scripts from your user. Then you can start installing modules using the regular cpan client.

CPAN.pm

export PERL5LIB=/home/gabor/perl5lib/lib

# This is CPAN.pm's systemwide configuration file. This file provides
# defaults for users, and the values can be changed in a per-user
# configuration file. The user-config file is being looked for as
# ~/.cpan/CPAN/MyConfig.pm.

$CPAN::Config = {
  'build_cache' => q[10],
  'build_dir' => q[/home/gabor/.cpan/build],
  'cache_metadata' => q[1],
  'cpan_home' => q[/home/gabor/.cpan],
  'dontload_hash' => {  },
  'ftp' => q[/usr/kerberos/bin/ftp],
  'ftp_proxy' => q[],
  'getcwd' => q[cwd],
  'gpg' => q[/usr/bin/gpg],
  'gzip' => q[/bin/gzip],
  'histfile' => q[/home/gabor/.cpan/histfile],
  'histsize' => q[100],
  'http_proxy' => q[],
  'inactivity_timeout' => q[0],
  'index_expire' => q[1],
  'inhibit_startup_message' => q[0],
  'keep_source_where' => q[/home/gabor/.cpan/sources],
  'links' => q[/usr/bin/links],
  'make' => q[/usr/bin/make],
  'make_arg' => q[],
  'make_install_arg' => q[],
  'makepl_arg' => q[PREFIX=/home/gabor/perl5lib LIB=/home/gabor/perl5lib/lib],
  'ncftpget' => q[/usr/bin/ncftpget],
  'no_proxy' => q[],
  'pager' => q[/usr/bin/less],
  'prerequisites_policy' => q[follow],
  'scan_cache' => q[atstart],
  'shell' => q[/bin/bash],
  'tar' => q[/bin/tar],
  'term_is_latin' => q[1],
  'unzip' => q[/usr/bin/unzip],
  'urllist' => [q[http://mirror.mirimar.net/cpan/]],
  'wget' => q[/usr/bin/wget],
};
1;
__END__

Installing modules on ActivePerl

C:> ppm
ppm> install Name-Of-Module

in case it returns a list of modules, pick up the correct number:

ppm> install 3

There are additional sites with ppm repositories once can find on Kobes Search

Add the repository to ppm and install modules from that place as well:

ppm> rep add uwin http://theoryx5.uwinnipeg.ca/ppms/
ppm> install IO-Socket-SSL 

in ActiveState 5.6.x

ppm> set rep name URL

In case the computer is behind a company proxy you can configure the http_proxy environment variable and ppm will use the proxy:

set http_proxy=http://proxy.company.com:8080

CPAN::Reporter

cpan> install CPAN::Reporter
cpan> reload cpan
cpan> o conf init test_report
cpan> o conf commit

Exercise: setup local::lib

Follow the instruction in the documentation of local::lib to bootstrap on your machine.

Exercise: Module installation

Install the Acme::EyeDrops module from CPAN and write a script
to draw a camel. As you are not root, you might need to install it in
a local subdirectory.

Create a simple script that does some simple computation.
Create a script using Acme::EyeDrops that will use the above simple script as
source.
Save your camel in a file.
Run the file containing the camel using Perl.

Exercise: Read Excel file

There are two Excel files included, read the data from the two files.

  • examples/spreadsheet.xls
  • examples/person.xls

Solution: Module installation

#!/usr/bin/perl

=pod

    search.cpan.org
    search Acme::EyeDrops
    download the latest Acme-EyeDrops gziped file
    (for me it was Acme-EyeDrops-1.01.tar.gz)


    mkdir modules   (create a local directory where we'll install the module)
    tar xzf Acme-EyeDrops-1.01.tar.gz
    cd Acme-EyeDrops-1.01
    perl Makefile.PL PREFIX=/home/user/modules LIB=/home/user/module/lib
         (the full path to the directory you created for the modules)
    make
    make test
    make install


    Create a script called hello_world.pl that asks for your name and then
    prints Hello NAME.


    Run this script. See the camel.
    Now run this script  and redirect to another file
    perl acme_camel.pl > camel.pl
    Now run the camel:
    perl camel.pl

=cut

use strict;
use warnings;

use lib qw (/home/user/modules/lib/);
use Acme::EyeDrops qw (sightly);

print sightly({
        Shape       => 'camel',
        SourceFile  => 'hello_world.pl',
    });


Solution: Read Excel file

#!/usr/bin/perl 
use strict;
use warningds;

use Spreadsheet::ParseExcel::Simple;
my $xls = Spreadsheet::ParseExcel::Simple->read("spreadsheet.xls");
foreach my $sheet ($xls->sheets) {
    while ($sheet->has_data) {
        my @data = $sheet->next_row;
        print join "|", @data;
        print "\n";
    }
}

Applications

Simple uses of Perl

After leaning the syntax of the language let's see a few simple ways to use it in real life tasks.

Create Unix user account

#!/usr/bin/env perl
use strict;
use warnings;

# --fname Foo --lname Bar
# /usr/sbin/adduser --home /opt/bfoo --gecos "Foo Bar" bfoo

my $adduser = '/usr/sbin/adduser';

use Getopt::Long qw(GetOptions);

if (not @ARGV) {
    usage();
}

my %opts;
GetOptions(\%opts,
    'fname=s',
    'lname=s',
    'run',
) or usage();

if (not $opts{fname} or $opts{fname} !~ /^[a-zA-Z]+$/) {
    usage("First name must be alphabetic");
}
if (not $opts{lname} or $opts{lname} !~ /^[a-zA-Z]+$/) {
    usage("Last name must be alphabetic");
}
my $username = lc( substr($opts{lname}, 0, 1) . $opts{fname});
my $home     = "/opt/$username";

print "Username: $username\n";

my $cmd = qq($adduser --home $home --gecos "$opts{fname} $opts{lname}" $username);

print "$cmd\n";
if ($opts{run}) {
    system $cmd;
} else {
    print "need to add the --run flag to actually execute\n";
}


sub usage {
    my ($msg) = @_;
    if ($msg) {
        print "$msg\n\n";
    }
    print "Usage: $0 --fname FirstName --lname LastName --run\n";
    exit;
}

Reporting file system diskspace usage (df)

  • df
#!/usr/bin/env perl
use strict;
use warnings;

use Filesys::DfPortable qw(dfportable);

#my $df = dfportable("/", 1024 * 1024 * 1024);

my $df = dfportable("/", 1024);
print "Total Size:             $df->{blocks} K\n";
print "Available:              $df->{bfree} K\n";
print "Used:                   $df->{bused} K\n";
print "Percent Full:           $df->{per} %\n";
print "Total available to me:  $df->{bavail} K\n";

$ perl diskspace.pl ; df /
Total Size:             48062440 K
Available:              38720692 K
Used:                   9341748 K
Percent Full:           20 %
Total available to me:  36279216 K

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1             48062440   9341748  36279216  21% /

Reporting diskspace usage on the mail server

  • du
#!/usr/bin/env perl
use strict;
use warnings;


#
# Reporting disk usage on the mail server
#
# Run the script in a cron job
#
#  1) Report to Boss if there are people with large files
#    
#  2) If a user has a file that is too big then ask him to remove the
#      large e-mail from the mail server via web access
#      This one has not been implemented yet
#
######################################################

use Mail::Sendmail      qw(sendmail);
use Filesys::DfPortable qw(dfportable);

################## Limit Definitions
# the size of the /var/spool/mail/username file   in bytes
my $report_to_boss_limit = 1_000_000;
my $report_to_user_limit = 500_000;
my $domain     = '@company.com';
my $boss_email = 'boss@company.com';
my $from_email = 'Disk Usage Report <sysadmin@company.com>';
my $disk_space_percantage = 80;


my %file_size;
# each user has a file in that directory
foreach my $path (glob "/var/spool/mail/*") {
    if ($path =~ /Save/) {          # disregard the Save directory
        next;
    }
    if ($path =~ /\.pop$/) {        # disregard temporary .pop files
        next;
    }

    $file_size{$path} = -s $path;
}


my $txt = "";
# sort files by size
foreach my $path (sort {$file_size{$b} <=> $file_size{$a}} keys %file_size) {
   my $name = $path;
   $name =~ s{/var/spool/mail/}{};

   if ($file_size{$path} > $report_to_boss_limit) {
      $txt .= "$name\t\t" . int ($file_size{$path}/1_000_000) . " MB\n";
   }
   if ($file_size{$path} > $report_to_user_limit) {
     my $msg = "You are currently using $file_size{$path} bytes\n";
     $msg .= "Please reduce it to under $report_to_user_limit\n";
     sendmail (
          To      => "$name$domain",
          From    => $from_email,
          Subject => 'Disk Usage Report' . localtime(),
          Message => $msg,
     );
   }
}

my @disks = qw(/ /boot);
foreach my $disk (@disks) {
   my $df = dfportable($disk, 1024);
   if ($df->{per} > $disk_space_percantage) {
      $txt .= "\n\nDiskspace is low\n\nUsing ";
      $txt .= $df->{per} . "\% of the space on $disk\n";
   }
}

if ($txt) {
   $txt = "Disk Usage of /var/spool/mail on the incoming mail server\n" .
          "Reporting users over $report_to_boss_limit bytes\n\n" .
          $txt;
   sendmail (
        To      => $boss_email,
        From    => $from_email,
        Subject => 'Disk Usage Report' . localtime(),
        Message => $txt,
   );
}

A du like script

  • du
#!/usr/bin/env perl
use strict;
use warnings;

use Filesys::DiskUsage qw(du);

if (not @ARGV) {
    die "Usage: $0 DIRs\n";
}

my %sizes = du({'make-hash' => 1}, @ARGV);
foreach my $entry (sort { $sizes{$a} <=> $sizes{$b} } keys %sizes) {
    print "$entry => $sizes{$entry}\n";
}

Send files by e-mail

  • MIME
#!/usr/bin/env perl
use strict;
use warnings;

use File::Basename qw(basename);
use File::Slurp qw(read_file);
use Getopt::Long qw(GetOptions);
use MIME::Lite;
use Pod::Usage qw(pod2usage);

my $text    = <<'END_TEXT';
<html>
<head>
 <title>Hello</title>
</head>
<body>
 <h1>World</h1>
</body>
</html>
END_TEXT

my %opt;
GetOptions(\%opt,
  'from=s',
  'to=s',
  'cc=s',
  'subject=s',
  'textfile=s',
  'smtp=s',
) or pod2usage();
if (not $opt{from} or
    not $opt{to} or 
    not $opt{subject}
    ) {
  pod2usage();
}
if ($opt{textfile}) {
  $text = read_file( $opt{textfile} );
}

send_files(\%opt, $opt{subject}, $text, @ARGV);

sub send_files {    
  my ($opt, $subject, $message_body, @files) = @_;
  
  my $msg = MIME::Lite->new(
      From    => $opt->{from},
      To      => $opt->{to},
      Cc      => $opt->{cc},
      Subject => $subject,
      Type    => 'multipart/mixed'
  ) or die "Error creating multipart container: $!\n";
  $msg->attach(
     Type => ($message_body =~ /<html>/ ? 'text/html' : 'text/plain'),
     Data => $message_body
  ) or die "Error adding the text message part: $!\n";
  
  foreach my $filename (@files) {
    $msg->attach(
      Type => ($filename =~ /\.xls$/ ?  'application/xls' : 'text/plain'),
      Path => $filename,
      Filename    => basename($filename),
      Disposition => 'attachment'
    ) or die "Error adding $filename: $!\n";
  }
  if ($opt->{smtp}) {
    $msg->send('smtp', $opt->{smtp}, Timeout => 60) or die $!;
  } else {
    $msg->send or die $!;
  }

  return;
}

=head1 SYNOPSIS

Sending and e-mail with or without attachements

perl send_files.pl 
--from from@company.com
--to   to@company.com
--subject "Subject line"
report.xls
 
--textfile path/to/content.txt
--smtp HOSTNAME


=cut

Read Excel file

  • Excel
#!/usr/bin/env perl
use strict;
use warnings;


use Spreadsheet::ParseExcel::Simple qw();
my $xls = Spreadsheet::ParseExcel::Simple->read("spreadsheet.xls");
foreach my $sheet ($xls->sheets) {
    while ($sheet->has_data) {
        my @data = $sheet->next_row;
        print join "|", @data;
        print "\n";
    }
}

How to create an Excel file with Perl.

Process file with fixed width records

#!/usr/bin/env perl
use strict;
use warnings;
  
# You need to parse a log file where the fields are fixed length long 
# and have no delimiters
# The definition is as follows:
# LABEL:       4 chars
# SOURCE:      8 digits
# DESTINATION: 8 digits
# TYPE:        4 chars
# VALUE:       8 digits
my $file = shift or die "Usage: $0 pack.txt\n";

open(my $data, '<', $file) or die "Could not open '$file'\n";
while (my $line = <$data>) {
    print $line;
    chomp $line;
    my ($label, $source, $dest, $type, $value) = unpack ("A4 A8 A8 A4 A8", $line);
    print "LABEL: $label SOURCE: $source DEST: $dest TYPE: $type VALUE: $value\n";
}

XALD37845566974923342XYZ24023984
YQRW49327408234028434ERD24448009

Process file with multiline records

device   =    234234
name     =    Big
address  =    115.6.79.8
class    =    B

device   =    234224
name     =    Big Blue
address  =    115.6.69.8
class    =    B
alias    =    Foxbox

device   =    234235
name     =    Big Green box
address  =    115.6.79.1
class    =    G
owner    =    Boss

device   =    334235
name     =    Small Yellow
address  =    115.6.79.10
class    =    Y
#!/usr/bin/env perl
use strict;
use warnings;

=head1 DESCRIPTION

File have sections separated by empty lines
Each section has several   field = value entries like this:
Given a value of the name field print out all the values in this section

device   =    234234
name     =    Big
address  =    115.6.79.8
class    =    B

=cut

if (@ARGV != 2) {
    die "\n  Usage: $0 filename name\n  Try:   $0 examples/config.txt Big\n\n";
}
my ($filename, $name) = @ARGV;

open(my $fh, "<", $filename) or die "Could not open '$filename' $!";
my %data;
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^\s*$/ and %data) {
        if ($data{name} eq $name) {
            foreach my $k (keys %data) {
                printf "%-10s = %s\n", $k, $data{$k};
            }
            exit;
        }
        %data = ();
    } else {
        my ($field, $value) = split /\s*=\s*/, $line;
        $data{$field} = $value;
    }
}

How to read a CSV file using Perl?

Process multi field csv file

  • csv
Name,ID,Input,Output
Big Venta,12,Left,Right
Small Elevator,7343124,Bottom,Top
Giant Ant,423235,Lower floor,Upper floor 
#!/usr/bin/env perl
use strict;
use warnings;

use Text::CSV_XS qw();
use Data::Dumper qw(Dumper);

my $filename = shift or die "Usage: $0 FILENAME\n";
open(my $fh, "<", $filename) or die "Could not open '$filename': $!";

my $csv = Text::CSV_XS->new;

my $key = "Name";

my $header = <$fh>;
chomp $header;
$csv->parse($header);
my @header = $csv->fields;

my %data;

while (my $line = <$fh>) {
    chomp $line;
    $csv->parse($line);
    my @cols = $csv->fields;
    my %h;
    @h{@header} = @cols;

    $data{$h{$key}} = \%h;
}

print Dumper \%data;



Fetch web page

  • http
  • web
  • GET
  • LWP::Simple
#!/usr/bin/env perl
use strict;
use warnings;

use LWP::Simple qw(get);

my $page = get "http://perlmaven.com/";
if ($page) {
    print "Site is alive\n";
} else {
    print "Site is not accessible\n";
}

Generate web page

  • CGI
  • HTML::Template

We are building the HTML pages from a template utilizing the HTML::Template module from CPAN. Besides the plain HTML the template has additional TMPL_* tags that will be filled by the values by HTML::Template.

{% embed include file="src/examples/applications/html.tmpl)

This is a simple Perl script that should be installed to a CGIExec enabled directory of Apache. When the user hits this page the first time it displays a white page with only entry-box and a submit button on it. the user can fill the box.

#!/usr/bin/env perl
use strict;
use warnings;

use CGI;
use HTML::Template;

my $template = HTML::Template->new(filename => "examples/html.tmpl");
my $q = CGI->new;
print $q->header;


if ($q->param("text")) {
    my $text = $q->param("text");
    $template->param(echo => $text);
}
print $template->output

Parse XML file

  • XML
  • XML::Simple
<people>
  <person id="1">
    <fname>Josef</fname>
    <lname>Kiss</lname>
    <idnum>4342324234</idnum>
    <children>
      <child>
        <id>3</id>
      </child>
    </children>
  </person>
  <person id="2">
    <fname>Peter</fname>
    <lname>Kiss</lname>
    <idnum>8768996</idnum>
  </person>
  <person id="3">
    <fname>Zoltan</fname>
    <lname>Kiss</lname>
    <idnum>09808760</idnum>
  </person>
</people>
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Simple qw(XMLin);

my $xml = XMLin("examples/simple.xml", ForceArray => 1);
#use Data::Dumper qw(Dumper);
#print Dumper $xml;
#exit;

print join "-", keys %{$xml->{person}};
print "\n";

foreach my $id (keys %{$xml->{person}}) {
    printf "%-10s %-10s %-10s\n",
        $xml->{person}{$id}{fname}[0],
        $xml->{person}{$id}{lname}[0],
        $xml->{person}{$id}{idnum}[0];
}

Database access using DBI and DBD::SQLite

  • database
  • DBI
  • DBD
  • SQLite
#!/usr/bin/env perl
use strict;
use warnings;

use DBI;

my $dbfile = "sample.db";

unlink $dbfile if -e $dbfile;
my $sql = <<'SCHEMA';
CREATE TABLE people (
   id INTEGER PRIMARY KEY,
   fname VARCHAR(100),
   lname VARCHAR(100)
);
SCHEMA

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","", {});
$dbh->do($sql);

#!/usr/bin/env perl
use strict;
use warnings;

use Getopt::Long qw(GetOptions);
use DBI          qw();

my $action;
GetOptions("action=s" => \$action);
if (not $action or $action !~ /^(insert|selecta|selecth)$/) {
    print <<"USAGE";
Usage:
      $0 --action insert|selecta|selecth

USAGE
    exit;
}
my $dbfile = "sample.db";

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","", {});

if ($action eq "insert") {
    insert();
}
if ($action eq "selecta") {
    fetch_arrays();
}
if ($action eq "selecth") {
    fetch_hashref();
}


sub insert {
    my @people = (
        ['Foo', 'Bar'],
        ['Apple', 'Pie'],
    );
    foreach my $person (@people) {
        $dbh->do("INSERT INTO people (id, fname, lname) VALUES(?, ?, ?)",
            undef,
            1, @$person);
    }
    return;
}

sub fetch_arrays {
    my $sth = $dbh->prepare("SELECT lname, fname FROM people WHERE id = ?");
    $sth->execute(1);
    while (my @result = $sth->fetchrow_array()) {
        print "lname: $result[0], fname: $result[1]\n";
    }
    $sth->finish;
    return;
}

sub fetch_hashref {
    my $sth = $dbh->prepare("SELECT lname, fname FROM people WHERE id = ?");
    $sth->execute(1);
    while (my $result = $sth->fetchrow_hashref("NAME_lc")) {
        print "lname: $result->{lname}, fname: $result->{fname}\n";
    }
    $sth->finish;
    return;
}

Simple Database access using Perl DBI and SQL

Net::LDAP

  • LDAP
  • Net::LDAP
#!/usr/bin/env perl
use strict;
use warnings;

use Net::LDAP;
my $server = "ldap.itd.umich.edu";
my $ldap = Net::LDAP->new( $server ) or die $@;
$ldap->bind;

my $result = $ldap->search(
    base   => "",
    filter => "(&(cn=Jame*) (sn=Woodw*))",
    attr   => ["mail"],
);

$result->code && die $result->error;

printf "COUNT: %s\n", $result->count;

foreach my $entry ($result->entries) {
    $entry->dump;
}
print "===============================================\n";

foreach my $entry ($result->entries) {
    printf "%s <%s>\n",
        $entry->get_value("displayName"),
        ($entry->get_value("mail") || '');
    $entry->add ( "nephew" => "Foo Bar" );
    $entry->replace ( "mail" => 'foo@bar.com');
    my $res = $entry->update($ldap);
    if ($res->code) {
        warn "Failed to update: " . $res->error . "\n";
    };
}

my $res = $ldap->add(
    'cn=root, o=University of Michigan, c=US',
    attr => [
        cn   => 'Foo Bar',
        ou   => 'My Place in the Universe',
        mail => 'mail@address.com',
    ],
);
if ($res->code) {
    warn "Failed to add: " . $res->error . "\n";
}

$ldap->unbind;

Reading from LDAP in Perl using Net::LDAP

Tie::File

  • Tie::File
#!/usr/bin/env perl
use strict;
use warnings;

use Tie::File;

tie my @file, 'Tie::File', "data.txt" or die $!;
$file[7] = "Hello";

Extra slides

List all defined variables

#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;

my $x;
my @y;

our $xx;

foreach my $variable (sort keys %main::) {
    say $variable;
}

Perl internal variables

  • $! - error of the last system call e.g. open file.
  • $/ - input record separator, slurp mode.
  • $_ - the default variable for many operations.
  • $0 - Name of the current program.
  • $1, $2, .. Regex match variables.
  • $] - version number of Perl
  • $a, $b - variables of sort.
  • @ARGV - command line parameters.
  • @_ - parameters of the current function.
  • @INC - list of directories to search for modules.
  • %INC - loaded modules and the path to them on the disk.
  • %ENV - environment variables
  • %SIG - signal handlers
  • Full list at perldoc perlvar

What is Perl ?

  • Created by Larry Wall in 1987

  • Scripting language replacing awk, sed and shell scripts

  • Glue language - putting together tools provided for the OS

  • Free software distributed under GPL and the Artistic licenses

  • Available for most of the platforms (UNIX/Linux/Windows/Macintosh/VMS/main frame/...)

  • Generally speaking machine and OS independent but the system calls are OS specific

  • Interpreted language compiled to a machine independent byte tree and ...

  • ... Executed immediately by the interpreter

  • C-like syntax

What does Perl stand for ?

  • Perl is not an acronym it is what we call a backronym
  • Practical Extraction and Reporting Language
  • Pathologically Eclectic Rubbish Lister
  • Objective: Making Easy Things Easy and Difficult Things Possible
  • Motto: TMTOWTDI = There's More Than One Way To Do It

Scripting language ?

Some people talk about writing code in Perl as scripting and others as programming. Some say Perl is a scripting language others that it is a programming language.

Actually there is no such definition as scripting language. The real definitions are:

  • Compiled languages
  • Interpreted languages

Is Perl Interpreted or Compiled Language

  • Compiled languages: C, C++
  • Compiled and interpreted languages: Java, Perl
  • Interpreted languages: BASIC (or Shell)

Perl is in the middle as it is compiled to a byte tree (similarly to Java) that is never saved as a file. It also requires the interpreter to be available with the Perl program.

Advantages of a compiled language:

  • Can be faster especially in short applications
  • Does not need a compiler/interpreter/Virtual Machine on the users machine (it can be distributed on its own)

Disadvantages of a compiled language:

  • It has to be compiled separately to every platform
  • Difficult to make it platform independent

Why use Perl ?

  • Easy - for daily use
  • Nearly unlimited - for high-level programming (no real time, no kernel, no device drivers please)
  • Mostly fast - fast enough for the tasks you would do with Perl
  • Can solve problems - the symbol of Perl is the camel, ugly but can handle difficult environment

What is it good for ?

  • QA (test automation)
  • System Administration
  • Configuration Management
  • Web backend (CGI/Database access)
  • Quick and dirty programs (automating simple daily jobs)
  • Biotechnology
  • Prototypes

Its strengths

  • Text processing (pattern matching)
  • List processing
  • Database access
  • System language

Redirection by the user

In order to read a real file we have to wait for a later chapter or you can ask the user of the script to use redirection on the command line.

The user of out program might do the following to redirect the input and/or the output of our program.

myperl.pl &lt; in.txt
myperl.pl &gt; out.txt
myperl.pl &lt; in.txt  &gt; out.txt

for loop

  • for
for (INITIALIZE; TEST; INCREMENT) {
    BODY;
}

for (my $i=0; $i &lt; 10; $i++) {
     print "$i\n";
}

# but a foreach loop might be better for the same looping:
foreach my $i (0..9) {
    print "$i\n";
}

ASCII table

  • chr
  • ord
foreach my $i (32..128) {
	print $i, " ", chr($i), "\n";
}

                           # the inverse function is ord()
print ord('a');            # 97

default variable $_

  • $_

Default variable

my @fruits = qw(apple banana peach);
foreach my $fruit (@fruits) {
	print $fruit;
}
foreach my $name (@people) {
    print $name;
}

foreach (@people) {
    print;
}

foreach $_ (@people) {
    print $_;
}

Return values

Perl functions always returns a value. Either explicitly by calling return or the result of the last computation.

#!/usr/bin/perl

sub loop {
    $i = 0;
    while ($i < 10) {
       $i++;
    }
}
print loop();            # what is this value ?
print $i;                # 10

Solving the above problem: always use return Add return $i; before the end of the subroutine.

UNIX file system, hard links symbolic links

  • What is in a Directory ?
  • What is in an Inode ?
  • What is a symbolic link ?
  • What is a hard link ?
What links can be between different partitions ?

(hint: only symbolic links as hard links are bound to the inode
number which is local to each partition)

stat, lstat

  • stat
  • lstat

In order to get information from the inode table you can use the stat system call

 ARRAY = stat FILEHANDLE| FILENAME
 @fields = stat ($filename);
 @fields = stat ($fh);

 $fields[4]  is the UID
 $fields[7]  is the size in bytes
  0 dev      device number of file system
  1 ino      inode number
  2 mode     file mode  (type and permissions)
  3 nlink    number of (hard) links to the file
  4 uid      numeric user ID of file's owner
  5 gid      numeric group ID of file's owner
  6 rdev     the device identifier (special files only)
  7 size     total size of file, in bytes
  8 atime    last access time in seconds since the epoch
  9 mtime    last modify time in seconds since the epoch
 10 ctime    inode change time (NOT creation time!) in seconds since the epoch
 11 blksize  preferred block size for file system I/O
 12 blocks   actual number of blocks allocated

 for symbolic links use lstat

Compare values

#!/usr/bin/perl
use strict;
use warnings;

my $first  = <STDIN>;
chomp $first;
my $other  = <STDIN>;
chomp $other;

if ($first == $other) {
    print "The two numbers are the same\n";
} else {
    print "The two numbers are NOT the same\n";
}

if ($first eq $other) {
    print "The two strings are the same\n";
} else {
    print "The two strings are NOT the same\n";
}

if ($first > $other) {
    print "First is a BIGGER number\n";
} else {
    print "First is a smaller number\n";
}

if ($first gt $other) {
    print "First is a BIGGER string\n";
} else {
    print "First is a smaller string\n";
}

Bitwise Operations

  • bitwise
  • &
  • |
  • ~
  • ^
  • <<
  • >>
The following operators work on the binary representation of the values

 &amp;    AND
 
 |    OR

 ^    XOR

 ~    NOT  (complement)

 >>   right shift

 &lt;&lt;   left shift

bitwise examples

#!/usr/bin/perl
use strict;
use warnings;

my $x = 5;        # 00101
my $y = 3;        # 00011

my $z = $x & $y;  # 00001    = 1

$z = $x | $y;     # 00111    = 7

$z = $x ^ $y;     # 00110    = 6

$z = ~ 1;         # 11111111111111111111111111111110   (32 chars)  = 4294967294


$z = $x << 2;     # 10100    = 20

$z = $x >> 2;     # 00001    = 1

printf "%b", $x; # in order to print the binary form of a value

### Bitwise operators on strings
$a = "h l o";
$b = " e l ";
print $a | $b, "\n";
print $a ^ $b, "\n";

# hello
# HELLO

Filename creation based on date

 # creating a filename based on the date YYYY-MM-DD-HH-MM-SS
 @t = (localtime)[5,4,3,2,1,0];
 $t[0]+=1900;
 $t[1]++;
 $filename = sprintf "%d-%0.2d-%0.2d-%0.2d-%0.2d-%0.2d", @t;

Debugging prints

# Scalar:
print STDERR "Debug: '$a'\n";


# Array:
print STDERR "Debug: @a\n";

print STDERR "Debug: ", join "|", @a, "\n";


# Hash:
print STDERR "Debug: ", join "|", %h, "\n";

print STDERR "Debug: "; 
foreach (keys %h) { print STDERR "$_=>$h{$_}|";
print STDERR "\n";


print STDERR "Debug: "; 
print STDERR "$_=>$h{$_}|" for keys %h;
print STDERR "\n";

Where is perl installed on your system ?

It can be installed in any place but usually you'll find it in one of the following places:

on UNIX:

/usr/bin/perl
/usr/local/bin/perl

on Windows:

c:\perl\bin\perl.exe

In addition you'll find a bunch of Perl standard modules installed somewhere on the system.

Send e-mail

  • mail

  • sendmail

  • Mail::Sendmail

  • Mail::Sendmail

#!/usr/bin/perl
use strict;
use warnings;

use Getopt::Long "GetOptions";
use Mail::Sendmail "sendmail";

my $to;
my $from;
my $help;
my $file;

GetOptions(
    "to=s"   => \$to,
    "from=s" => \$from,
    "help"   => \$help,
    "file=s" => \$file,
);

if ($help) {
    usage();
}
if ($to and $from and $file) {
    my ($subject, $message) = read_file($file);
    my %mail = (
        To      => $to,
        From    => $from,
        Subject => $subject,
        Message => $message,
    );
    sendmail(%mail) or die $Mail::Sendmail::error;
} else {
    usage();
}





sub usage {
    print "Usage: $0\n";
    print "        --to   TO\n";
    print "        --from FROM\n";
    print "        --file FILE\n";
    print "\n";
    print "        --help\n";
    print "\n";
    print "The given FILE  is going to be the content of the e-mail\n";
    print "The first line of the file should be:\n";
    print "Subject: and the subject itself\n";
    print "\n";
    exit;
}

sub read_file {
    my ($file) = @_;
    open(my $fh, "<", $file) or die "Could not open '$file'\n";
    my $subject = <$fh>;
    local $/ = undef;
    my $message = <$fh>;
    $subject =~ s/^Subject: //;

    return ($subject, $message);
}


XML using Perl

XML sample data

We are going to use the following XML file and will try to answer te various questions that are listed in the comment at the top of the file.

<!--
    - List all the currencies
    - What is the currency in the USA
    - What is the currency in the country with id 2 ?
    - Replace the currency in Hungary to Euro
    - Add another country called Beregengocia with currency 
-->
<data>
  <country id="1">
    <name>USA</name>
    <languages>
      <language id="1">English</language>
      <language id="2">Spanish</language>
    </languages>
    <currency>USD</currency>
  </country>
  <country id="2">
    <name>Israel</name>
    <languages>
      <language id="1">Hebrew</language>
      <language id="2">English</language>
      <language id="3">Arabic</language>
      <language id="4">Russian</language>
    </languages>
    <currency>NIS</currency>
  </country>
  <country id="3">
    <name>Hungary</name>
    <languages>
      <language id="1">Hungarian</language>
    </languages>
    <currency>HUF</currency>
  </country>
</data>

XML Definitions

  • Tags, attributes
  • Well formed
  <tag>text</tag>

or an empty element:

  <tag />
  • Valid XML (DTD, XML Schema, RelexNG, Schematron)
  • Namespaces, Entity references,

XML processing modes

  • DOM - build an in-memory representation of the XML (tree)
  • SAX - process the XML chunk-by-chunk (stream)

XML::Simple

Let's see some of the parameters controlling the way XML is read into memory by XML::Simple.

XML::Simple is, well, relatively simple but it can still be used in a number of ways. The most straight forward way is to import the XMLin function, and pass a filename to it or a scalar with XML data in it. In addition the XMLin subroutine accepts several options. Two of them are the most important: ForceArray and KeyAttr. Also interesting is KeepRoot.

ForceArray defaults to 0 KeyAttr defaults to ['name', 'key', 'id'] KeepRoot defaults to 0

'Array folding' feature is controlled by KeyAttr

#!/usr/bin/perl
use strict;
use warnings;

use XML::Simple qw(XMLin);
use Data::Dumper;
my $data = <<'END_XML';
<people>
  <person id="1">
     <name>Foo</name>
  </person>
  <person id="2">
     <name>Bar</name>
     <email id="3">bar@home</email>
     <picture src="img.png" />
  </person>
</people>
END_XML

my $xml = XMLin($data,
        ForceArray => 1,
        KeyAttr => [],
        #KeepRoot => 1,
        );

$Data::Dumper::Sortkeys = 1; 
print Dumper $xml;

print "The name of the first  person: $xml->{person}[0]{name}[0]\n";
print "The name of the second person: $xml->{person}[1]{name}[0]\n";

print "The id of the first  person: $xml->{person}[0]{id}\n";
print "The id of the second person: $xml->{person}[1]{id}\n";

print "The id of the first email of the second person: $xml->{person}[1]{email}[0]{id}\n";
print "The first email of the second person: $xml->{person}[1]{email}[0]{content}\n";
print "The srt of the first picture of the second person: $xml->{person}[1]{picture}[0]{src}\n";

# There still is a difference on how to get to the content of an element with attributes
# (the email) and the content of an element without attribute (name).

Without any options this will generate something that is apparently a mess. This is caused by the fact that KeyAttr is trying to be nice and uses the value of the 'name' tag and the 'id' attribute as the keys to the hash it is generating.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Simple;# qw(:strict);
use Data::Dumper;
my $xml = XMLin('examples/data.xml',
        ForceArray => 0, 
        KeyAttr => [],
        );
print Dumper $xml;

XML::Simple dumped

$VAR1 = {
          'country' => {
                       'Israel' => {
                                   'currency' => 'NIS',
                                   'languages' => {
                                                  'language' => [
                                                                'Hebrew',
                                                                'English',
                                                                'Arabic',
                                                                'Russian'
                                                              ]
                                                },
                                   'id' => '2'
                                 },
                       'Hungary' => {
                                    'currency' => 'HUF',
                                    'languages' => {
                                                   'language' => 'Hungarian'
                                                 },
                                    'id' => '3'
                                  },
                       'USA' => {
                                'currency' => 'USD',
                                'languages' => {
                                               'language' => [
                                                             'English',
                                                             'Spanish'
                                                           ]
                                             },
                                'id' => '1'
                              }
                     }
        };

XML::Simple processing

#!/usr/bin/perl
use strict;
use warnings;

use XML::Simple qw(XMLin XMLout);
use Data::Dumper;
my $xml = XMLin('examples/data.xml',
        ForceArray => 0, 
        KeyAttr => [],
);

print "List all currencies\n";

# KeyAttr is the default
#foreach my $country_name (keys %{ $xml->{country} }) {
#    printf "%-10s %s\n", 
#        $country_name, $xml->{country}->{$country_name}->{currency};
#}

# KeyAttr => [],
foreach my $country (@{ $xml->{country} }) {
    printf "%-10s %s\n", $country->{name}, $country->{currency}
}

#printf "The currency of the USA is %s.\n", $xml->{country}->{USA}->{currency};
foreach my $country (@{ $xml->{country} }) {
    if ($country->{name} eq 'USA') {
        printf "The currency of the USA is %s.\n", $country->{currency};
        last;
    }
}


#foreach my $country_name (keys %{ $xml->{country} }) {
#    if ($xml->{country}->{$country_name}->{id} == 2) {
#        printf "The currency of country id 2 is %s.\n", 
#            $xml->{country}->{$country_name}->{currency};
#        last;
#    }
#}

foreach my $country (@{ $xml->{country} }) {
    if ($country->{id} == 2) {
        printf "The currency of country id 2 is %s.\n", 
            $country->{currency};
        last;
    }
}


#$xml->{country}->{Hungary}->{currency} = 'Euro';
foreach my $country (@{ $xml->{country} }) {
    if ($country->{name} eq 'Hungary') {
        $country->{currency} = 'Euro';
        last;
    }
}

XMLout($xml, 
        OutputFile => 'out.xml', 
        RootName   => 'data',
        KeyAttr    => [],
        NoAttr     => 1,
);
        #KeyAttr    => ['id'],
        #NoAttr     => 0,


XML::Parser

Originally written by Larry Wall, using expat written by James Clark allow several styles of parsing such as Stream, Object, Tree.

XML::Parser tree

#!/usr/bin/perl
use strict;
use warnings;

use XML::Parser;
use Data::Dumper;

my $parser = XML::Parser->new(Style => 'Tree');
my $tree = $parser->parsefile('examples/data.xml');

$Data::Dumper::Indent = 0;
print Dumper $tree;

__END__
print "\n----------------\n";
print "$tree->[0]\n";             #   data

print "$tree->[1][3]\n";          #   country
print "$tree->[1][4][0]{id}\n";   #   1
print "$tree->[1][4][3]\n";       #   name
print "$tree->[1][4][4][2]\n";    #   USA
print "$tree->[1][4][7]\n";       #   languages

print "$tree->[1][7]\n";          #   country

Run the code and see what is dumped out

Read the documentation of XML::Parser (search Tree)

XML::Parser tree results

$VAR1 =
[
    'data',
    [
        {},
        0,
        '',
        'country',
        [
            {'id' => '1'},
            0,
            '',
            'name',
            [
                {},
                0,
                'USA'
            ],
            0,
            '',
            'languages',
            [
                {},
                0,
                '',
                'language',
                [
                    {},
                    0,
                    'English'
                ],
                0,
                '',
                'language',
                [
                    {},
                    0,
                    'Spanish'
                ],
                0,
                ''
            ],
            0,
            '',
            'currency',
            [
                {},
                0,
                'USD'
            ],
            0,
            ''
        ],
        0,
        '',
        'country',
        [
            {'id' => '2'},
            0,
            '',
            'name',[{},0,'Israel'],....
        ],
        0,
        '',
        'country',
        [
            {'id' => '3'},
            0,
            '',
            'name',[{},0,'Hungary'],....
        ],
    ]
];

XML::Parser

#!/usr/bin/perl
use strict;
use warnings;

use XML::Parser;

my $parser = XML::Parser->new(Style => 'Tree');
my $tree = $parser->parsefile('examples/data.xml');

# List all the currencies
foreach my $country (get_tag_subtrees($tree->[1], 'country')) {
    my ($name_tree) = get_tag_subtrees($country, 'name');
    my $name = $name_tree->[2];
    foreach my $currency (get_tag_subtrees($country, 'currency')) {
        printf "%-10s %s\n", $name, $currency->[2];
    }
}


# get a country with a given name
foreach my $country (get_tag_subtrees($tree->[1], 'country')) {
    my ($name_tree) = get_tag_subtrees($country, 'name');
    next if $name_tree->[2] ne 'USA';
    foreach my $currency (get_tag_subtrees($country, 'currency')) {
        printf "The currency in the USA is %s.\n", $currency->[2];
    }
}

foreach my $country (get_tag_subtrees($tree->[1], 'country')) {
    next if $country->[0]{id} ne '2';
    foreach my $currency (get_tag_subtrees($country, 'currency')) {
        printf "The currency in the country with id 2 is %s.\n", $currency->[2];
    }
}



sub get_tag_subtrees {
    my ($tree, $name) = @_;

    my @subtrees;

    foreach my $i (1.. (@{ $tree }-1) / 2) {
        if ($tree->[2*$i-1] eq $name) {
            push @subtrees, $tree->[2*$i];
        }
    }
    return @subtrees;
}

XML::LibXML

Low level library, Perl binding to libxml2 supports the standard
XML processing mode called Document Object Mode (DOM).
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;

my $parser = XML::LibXML->new;
my $doc = $parser->parse_file('examples/data.xml');
print $doc;

process_node( $doc->getDocumentElement, 0);

sub process_node {
    my ($node, $depth) = @_;
    print "$node\n";
    return unless( $node->nodeType eq &XML_ELEMENT_NODE );
    print "  " x $depth, $node->nodeName, "\n";
    foreach my $child ( $node->getChildnodes ) {
        process_node( $child, $depth+1 );
    }
}



Other Node types (taken from the source code)

XML_COMMENT_NODE
XML_TEXT_NODE
XML_CDATA_SECTION_NODE
XML_ELEMENT_NODE
XML_ENTITY_REF_NODE
XML_DOCUMENT_NODE
XML_HTML_DOCUMENT_NODE
XML_DOCUMENT_FRAG_NODE
XML_PI_NODE
XML_XINCLUDE_START
XML_XINCLUDE_END
XML_DTD_NODE

XML::XPath

Built on XML::Parser
#!/usr/bin/perl
use strict;
use warnings;

use XML::XPath;
my $xml = XML::XPath->new(filename => 'examples/data.xml');
my $name_nodes = $xml->find('/data/country');
foreach my $node ($name_nodes->get_nodelist) {
    my $nn = $node->find('name');
    foreach my $x ($nn->get_nodelist) {
        print XML::XPath::XMLParser::as_string($x);
        print "\n";
    }
}

XML::Twig

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;
my $twig = XML::Twig->new(
    pretty_print => 'indented',
);
$twig->parsefile('examples/data.xml');

# print the content again (look at the pretty_print option)
# $twig->print;


my $root = $twig->root;
# print $root->name, "\n";        # data
# print $root->gi, "\n";          # data


#print $twig, "\n";               # HASH(0x.....)
#print $root->twig, "\n";         # HASH)0x.....)

#my $elem = $root->first_child;
#print $elem->gi, "\n";           # country
#print $elem->att('id'), "\n";    # 1

# replace the name of the root element by a new name
#$root->set_tag('new data');
#$twig->print;

use strict;
use warnings;

my $t= XML::Twig->new();
  $t->parse( '<d><title>title</title><para>p 1</para><para>p 2</para></d>');
  my $root= $t->root;
  $root->set_tag( 'html');              # change doc to html
  $title= $root->first_child( 'title'); # get the title
  $title->set_tag( 'h1');               # turn it into h1
  my @para= $root->children( 'para');   # get the para children
  foreach my $para (@para)
    { $para->set_tag( 'p'); }           # turn them into p
  $t->print;                            # output the document

XML::Writer

Close to handwriting of the XML but can also check well formedness.

#!/usr/bin/perl
use strict;
use warnings;

use XML::Writer;

open my $out, '>', 'out.xml' or die;

my $writer = XML::Writer->new( OUTPUT => $out, NEWLINES => 0 );

$writer->startTag('data');
$writer->startTag('country', 'id' => 1);
$writer->startTag('name');
$writer->characters('USA');
$writer->endTag('name');
$writer->endTag('country');
$writer->endTag('data');
$writer->end();

close $out;

XML::Dumper

#!/usr/bin/perl
use strict;
use warnings;

use XML::Dumper;

open my $out, '>', 'out.xml' or die;

my $dumper = XML::Dumper->new();

my $perl = {
    people => [
        {
            name => 'Foo',
            phone => 123,
        },
        {
            name => 'Bar',
            phone => 456,
        },
        {
            name => 'Baz',
            phone => 789,
        },
    ],
};

my $xml = $dumper->pl2xml($perl);
print $xml;

my $perl_again = $dumper->xml2pl( $xml );
use Data::Dumper;
print Dumper $perl, $perl_again;

Exercise: parse glade.xml

{% embed include file="src/examples/xml/entry.glade)

Given the XML file created by Glade write the following functions:

my @names   = get_all_widget_names(); # all the names
my @widgets = get_all_widges();       # all the objects
my $widget  = get_widget($name);      # an object representing the widget
$widget->get_handler($name);          # name of the handler function
$widget->get_property($name);
$widget->set_property($name, $value);

DTD validation

Using XML::LibXML;

XML::Checker

Specialized XML modules

SOAP::Lite
Gtk2::GladeXML
XML::RSS
XML::RSS::SimpleGen
XML::Atom
XML::Atom::SimpleFeed

Resources

See CPAN for uncounted number of XML related modules.

The perl-xml mailing list
http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/perl-xml

http://xmltwig.com/

http://perl-xml.sourceforge.net/

http://www.xml.com

"Perl and XML" (first ed) By Erik T. Ray, Jason McIntosh, Published by O'Reilly

Many of our examples are taken from these places either directly or
after some transformation.

Parsing HTML File

HTML::Parser - fetch title

HTML::Parser

use strict;
use warnings;

use HTML::Parser ();

sub start_handler {
    my ($elem, $self) = @_;
#    print "start $elem\n";
    return if $elem ne "title";
    $self->handler(text => sub { print shift }, "dtext");
    $self->handler(end  => sub { shift->eof if shift eq "title"; }, "tagname,self");
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, "tagname,self");
my $filename = shift || 'title_source.html';
$p->parse_file($filename) or die "Could not parse '$filename' $!";
print "\n";
  
<head>
<title>This is the title tag</title>
</head>
<body>
cont</body>

HTML::Parser

use strict;
use warnings;
use HTML::Parser;
use Data::Dumper qw(Dumper);

my $p = HTML::Parser->new(
	api_version => 3,
	start_h => [ \&start, "event, self, tagname, attr, "],
	text_h  => [ \&text, "event, self, dtext"],
	end_h   => [ 'end',   "event, self, tagname"], # no point in attr
);

my $html = <<'END_HTML';

<body>
<ul>
 <li>first elem</li>
  <li>second <a href="http://url" id=42>link</a> elem 
 <li>third <b>bold</b> elem</li>
 <!-- <li>commented out elem -->
 <li>5th elem</li>
</ul>
<img src="/path/to/img.png" />

</body>
END_HTML


# callback can be either referencfe to subroutine (or anonymous sub) 
# or name of sub

# does not call "end" when </li> was missed out, even if new <li> starts
# element that is both opening and closing tag will get a '/' key with a '/' value 
#	empty_element_tags => 1,
#       will remove that entry and generate an end call after the start call

# in the attributes of the opening tag but no call to 'end'


sub start {
	print Dumper \@_;
}
sub end {
	print Dumper \@_;
}
sub text {
	print Dumper \@_;
}

$p->parse($html);

HTML::Encode

use strict;
use warnings;

use HTML::Entities qw(decode_entities encode_entities);


print decode_entities("x&eacute;x"), "\n";
print decode_entities("x&aring;x"), "\n";
print encode_entities(decode_entities("x&aring;x")), "\n";


print encode_entities('This expression $x < 3'), "\n";

Autoclose

use strict;
use warnings;

use HTML::Parser ();

sub event_handler {
    my ($event, $elem) = @_;
    print "$event $elem\n";
}


my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&event_handler, "event, tagname");
$p->handler( end   => \&event_handler, "event, tagname");
$p->parse('<head><title>abc</title></head>');
$p->eof;
print "----\n";
$p->parse('<head><title>abc</head>');
$p->eof;

print "----\n";
$p->parse('<ul><li>abc</li><li>def</ul>');
$p->eof;
exit;

Appendix

Resources, How to get support for Perl ?

Recommended Books

  • Randal L. Schwartz: Lerning Perl - O'Reilly This book covers a bit more than the Fundamentals course
  • Modern Perl book
  • Tom Christiansen: Perl Cookbook - O'Reilly Huge repository of examples and explanations covering lots of areas where you'll use Perl.
  • Larry Wall: Programming Perl - O'Reilly The ultimate source for Perl written by the experts. This is NOT for the beginner !
  • Peter J. Scott: Perl Medic (Addison Wesley) For people who need to maintain Perl code or who need to write maintainable Perl code.
  • Damian Conway: Perl Best Practices - O'Reilly

Perl books

More about files

Exercise: Create a counter

Each time I run the script I want to get a higher number.

Exercise: Create multiple counters

Create multiple counters separated by newlines
I run the script like this:   counter.pl 3
This will increment the counter #3 by one and print the value.

Exercises: List old logfiles

List all the logfiles in the current directory that are older than 3 days
List all the log files in this directory and subdirectories that
are more than 3 days old.

Solution: Create a counter

#!/usr/bin/perl
use strict;
use warnings;

my $filename = "counter.txt";
if (not -e $filename) {
    open my $fh, ">", $filename or die "Could not create counter file: $!";
    print $fh 0;
}

open my $fh, "+<", $filename or die "Could not open counter: $!\n";
my $c = <$fh>;
chomp $c;

seek $fh, 0, 0;
truncate $fh, 0;
$c++;
print $c;

print $fh $c;
close $fh;

Solution: Create multiple counters

#!/usr/bin/perl
use strict;
use warnings;

unless (@ARGV) {
    print "Usage: $0 <counter-id>\n";
    exit;
}

my $id = shift @ARGV;
$id--;               # because we index the counters from 1 and the array is from 0

my $filename = "multiple_counter.txt";
if (not -e $filename) {
    open my $fh, ">", $filename or die "Could not create counter file: $!";
    print $fh 0;
}

open my $fh, "+<", $filename or die "Could not open counter: $!\n";
my @c = <$fh>;
chomp @c;

seek $fh, 0, 0;   # move to the beginning of the file 
truncate $fh, 0;  # cut the file to a certain size

$c[$id]++;
print $c[$id];

foreach my $v (@c) {
    if (defined $v) {
            print $fh "$v\n";
    } else {
        print $fh "\n"; 
    }
}
close $fh;

Solutions: List old logfiles

perl -e 'for (<*.log>) {print "$_\n" if -M $_ > 3}'

#!/usr/bin/perl
use strict;
use warnings;

use File::Find 'find';

find({
    wanted   => \&old_files,
    no_chdir => 1,
}, $ARGV[0] || '.');

sub old_files {
    if (substr($_, -4) ne ".log") {
        return;
    }
    if (-M $_ > 3) {
        print "$_\n";
    }
    return;
}

Testing Demo for Perl

Testing Demo AUT

AUT = Application under Testing

{% embed include file="src/examples/testing-demo/lib/MyMath.pm)

use strict;
use warnings;
use feature 'say';

use MyMath qw(add);

if (scalar @ARGV != 2) {
    die "Usage: $0 X Y\n";
}

say add(@ARGV);


use strict;
use warnings;
use feature 'say';

use MyMath qw(multiply);

if (scalar @ARGV != 2) {
    die "Usage: $0 X Y\n";
}

say multiply(@ARGV);


perl -I lib add.pl
perl -I lib multiply.pl

Testing Demo using Test::More (success)

cpanm Test::More
use strict;
use warnings;

use Test::More;
use MyMath qw(add);

is add(2, 2), 4;

done_testing;

$ prove -l t/01-add.t
t/01-add.t .. ok
All tests successful.
Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.00 sys +  0.04 cusr  0.00 csys =  0.06 CPU)
Result: PASS

Testing Demo using Test::More (failure)

use strict;
use warnings;

use Test::More;
use MyMath qw(add);

is add(2, 2), 4;
is add(2, 3), 5;
is add(0, 0), 0;

done_testing;

$ prove -l t/02-add.t
t/02-add.t .. 1/?
#   Failed test at t/02-add.t line 8.
#          got: '6'
#     expected: '5'
# Looks like you failed 1 test of 3.
t/02-add.t .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests

Test Summary Report
-------------------
t/02-add.t (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1
Files=1, Tests=3,  1 wallclock secs ( 0.01 usr  0.02 sys +  0.04 cusr  0.00 csys =  0.07 CPU)
Result: FAIL

Testing Demo - Test coverage report

cpanm Devel::Cover
cover -delete
HARNESS_PERL_SWITCHES=-MDevel::Cover prove -l t/01-add.t
cover
$ cover
Reading database from .../cover_db

------------- ------ ------ ------ ------ ------ ------ ------
File            stmt   bran   cond    sub    pod   time  total
------------- ------ ------ ------ ------ ------ ------ ------
lib/MyMath.pm   84.6    n/a    n/a   80.0    0.0    0.0   75.0
t/01-add.t     100.0    n/a    n/a  100.0    n/a   99.9  100.0
Total           92.5    n/a    n/a   88.8    0.0  100.0   86.8
------------- ------ ------ ------ ------ ------ ------ ------

HTML output written to .../cover_db/coverage.html done.

Open cover_db/coverage.html.

File processing

Filehandle in scalar and list context

#!/usr/bin/perl
use strict;
use warnings;

my $file = "numbers.txt";

open(my $fh, '<', $file) or die "Could not open '$file'";
# reading in SCALAR context (line by line) and processing each line
while (my $row = <$fh>) {
    chomp $row;
    print "READ: $row\n";
}


open (my $other_fh, '<', $file) or die "Could not open '$file'";
# reading in LIST context all the lines at once
my @rows = <$other_fh>;
chomp @rows;
print "READ ", scalar(@rows), " lines\n";

slurp mode

#!/usr/bin/perl
use strict;
use warnings;

my $file = "numbers.txt";
# slurp mode
my $all;
{
    open(my $fh, '<', $file) or die "Could not open '$file'\n";
    local $/ = undef;
    $all = <$fh>;
}


Diamond operator

#!/usr/bin/perl
use strict;
use warnings;

while (my $line = <>) {
    print $line;
}

File::Slurp

#!/usr/bin/perl
use strict;
use warnings;

use File::Slurp qw(slurp);

my $filename = $ARGV[0];
if (not defined $filename) {
    die "Usage: $0 FILENAME\n";
}

my $text = slurp($filename);


my @lines = slurp($filename);

Other

Read CSV file as array of hashes

  • csv

The csv function that can be imported from Text::CSV can read a CSV into memory, creating an array of hashes.

Element from the first row will be used as the keys of these hashes and elements from all the other rows will be used as the values of these hashes.

Planet name,Distance (AU),Mass
Mercury,0.4,0.055
Venus,0.7,0.815
Earth,1,1
Mars,1.5,0.107
Ceres,2.77,0.00015
Jupiter,5.2,318
Saturn,9.5,95
Uranus,19.6,14
Neptune,30,17
Pluto,39,0.00218
Charon,39,0.000254
use strict;
use warnings;

use Text::CSV qw(csv);
use Data::Dumper qw(Dumper);

my $filename = 'planets.csv';
my $solar_system = csv(
    in => $filename,
    headers => 'auto');

print Dumper $solar_system;
$VAR1 = [
          {
            'Planet name' => 'Mercury',
            'Mass' => '0.055',
            'Distance (AU)' => '0.4'
          },
          {
            'Planet name' => 'Venus',
            'Mass' => '0.815',
            'Distance (AU)' => '0.7'
          },
          {
            'Distance (AU)' => '1',
            'Planet name' => 'Earth',
            'Mass' => '1'
          },
          {
            'Planet name' => 'Mars',
            'Mass' => '0.107',
            'Distance (AU)' => '1.5'
          },
          {
            'Planet name' => 'Ceres',
            'Mass' => '0.00015',
            'Distance (AU)' => '2.77'
          },
          {
            'Distance (AU)' => '5.2',
            'Mass' => '318',
            'Planet name' => 'Jupiter'
          },
          {
            'Distance (AU)' => '9.5',
            'Mass' => '95',
            'Planet name' => 'Saturn'
          },
          {
            'Distance (AU)' => '19.6',
            'Mass' => '14',
            'Planet name' => 'Uranus'
          },
          {
            'Mass' => '17',
            'Planet name' => 'Neptune',
            'Distance (AU)' => '30'
          },
          {
            'Distance (AU)' => '39',
            'Mass' => '0.00218',
            'Planet name' => 'Pluto'
          },
          {
            'Mass' => '0.000254',
            'Planet name' => 'Charon',
            'Distance (AU)' => '39'
          }
        ];

Sort array using sort and sort_by

use strict;
use warnings;

use List::UtilsBy qw(sort_by);

my @numbers = (2, -3, 4);
my @sorted = sort @numbers;
print "@sorted\n";

@sorted = sort { abs($a) <=> abs($b) } @numbers;
print "@sorted\n";

@sorted = sort_by { abs($_) }  @numbers;
print "@sorted\n";



use strict;
use warnings;

use List::UtilsBy qw(sort_by);

sub func {
    my ($val) = @_;
    print "func $val\n";
    return $val;
}

my @sorted = sort { func($a) <=> func($b) } (4, 3, 2, 1);
print "@sorted\n";
print "\n";  # looks like func is called n * log(n)

@sorted = sort_by { func($_) } (4, 3, 2, 1);
print "@sorted\n";
print "\n";  # func is only called n times

use strict;
use warnings;

use Text::CSV qw(csv);
use Data::Dumper qw(Dumper);
use List::UtilsBy qw(sort_by);

my $filename = 'planets.csv';
my $solar_system = csv(
    in => $filename,
    headers => 'auto');

#print Dumper $solar_system;

#my @sorted = sort_by { $_->{'Planet name'} }  @$solar_system;
my @sorted = sort_by { $_->{'Mass'} }  @$solar_system;
print Dumper \@sorted;

my @sorted  = sort { $a->{'Planet name'} cmp $b->{'Planet name'} }
    @$solar_system;

Range iterator

use strict;
use warnings;

use Range::Iter qw(range_iter);

my $iter = range_iter(1, 10, 2);
while (my $val = $iter->()) {
    print "$val\n"; # 1, 3, 5, 7, 9
}

Benchmark

use strict;
use warnings;
use 5.010;

use Benchmark qw(:hireswallclock);
#say(timeit(1, sub { count(1000)  })->real);
#say(timeit(1, sub { count(10000)  })->real);

#timethis(1, sub { count(10000) });   # (warning: too few iterations for a reliable count)
#timethis(100, sub { count(1000) });

timethese(1, {
    '1' => sub { count(1000) },
    '10' => sub { count(10000) },
});


sub count {
    my ($n) = @_;
    for my $i (1 .. $n) {
        for my $j (1 .. $n) {
            my $square = $i+$j;
        }
    }
}


Bubble sort

use strict;
use warnings;
use 5.010;
use Benchmark qw(:hireswallclock);

my @numbers = (1 .. 4);
print "@numbers\n";
my @sorted = bubble_sort(@numbers);
print "@sorted\n";

#timethese(1, {
#    '1000'  => sub { bubble_sort(1..1000) },
#    '10000' => sub { bubble_sort(1..10000) },
#});

#say(timeit(10, sub { bubble_sort(1..1000)  })->real);

sub bubble_sort {
    my @items = @_;
    for my $ix (0 .. $#items) {
        for my $jx (0 .. $#items-$ix-1) {
            if ($items[$jx] < $items[$jx+1]) {
                ($items[$jx], $items[$jx+1]) = ($items[$jx+1], $items[$jx]);
                print "@items\n";
            }
        }
    }
    return @items;
}

Insert sort

use strict;
use warnings;
use 5.010;
use Benchmark qw(:hireswallclock);

my @numbers = (1 .. 4);
print "@numbers\n";
my @sorted = insert_sort(@numbers);
print "@sorted\n";

#timethese(1, {
#    '1000'  => sub { insert_sort(1..1000) },
#    '10000' => sub { insert_sort(1..10000) },
#});

#say(timeit(10, sub { insert_sort(1..1000)  })->real);

sub insert_sort {
    my @items = @_;
    for my $ix (1 .. $#items) {
        my $current_item = $items[$ix];
        my $jx = $ix - 1;
        while ($jx >= 0 and $current_item > $items[$jx]) {
            $items[$jx+1] = $items[$jx];
            $jx--;
        }
        $items[$jx+1] = $current_item;
        #print "@items\n";
    }
    return @items;
}

Merge sort

use strict;
use warnings;
use 5.010;
use Benchmark qw(:hireswallclock);

my @numbers = (1 .. 4);
print "@numbers\n";
my @sorted = merge_sort(@numbers);
print "@sorted\n";

#timethese(1, {
#    '1000'   => sub { merge_sort(1..1000) },
#    '10000'  => sub { merge_sort(1..10000) },
#    '100000' => sub { merge_sort(1..100000) },
#});

#say(timeit(10, sub { merge_sort(1..1000)  })->real);

sub merge_sort {
    my @items = @_;
    #print "@items\n";
    if (scalar(@items) <= 1) {
        return @items;
    }
    my $half = int(scalar(@items) / 2);
    my @left = merge_sort(@items[0 .. $half-1]);
    my @right = merge_sort(@items[$half .. $#items]);

    my @sorted_items = merge_sorted_arrays(\@left, \@right);
    return @sorted_items;
}

sub merge_sorted_arrays {
    my ($left, $right) = @_;

    my @sorted_items;

    my ($i, $j) = (0, 0);
    while ($i < @$left and $j < @$right) {
        if ($left->[$i] > $right->[$j]) {
            push @sorted_items, $left->[$i];
            $i++;
        } else {
            push @sorted_items, $right->[$j];
            $j++;
        }
    }
    if ($i < @$left) {
        push @sorted_items, @$left[$i .. $#$left];
    } else {
        push @sorted_items, @$right[$j .. $#$right];
    }

    return @sorted_items;
}
  • It can run out of the recursion limit of Perl

Quicksort

use strict;
use warnings;
use 5.010;
use Benchmark qw(:hireswallclock);

my @numbers = (1 .. 4);
print "@numbers\n";
my @sorted = quick_sort(@numbers);
print "@sorted\n";

#timethese(1, {
#    '1000'   => sub { quick_sort(1..1000) },
#    '10000'  => sub { quick_sort(1..10000) },
#    '100000' => sub { quick_sort(1..100000) },
#});
#
#say(timeit(10, sub { quick_sort(1..1000)  })->real);

sub quick_sort {
    my @items = @_;
    #print "@items\n";
    if (scalar(@items) <= 1) {
        return @items;
    }
    my $pivot  = int(scalar(@items) / 2);  # this could be the first element or a random element.
    my @left  = grep { $_ > $items[$picot] } @items;
    my @right = grep { $_ < $items[$picot] } @items;
    my @same  = grep { $_ == $items[$picot] } @items;

    return quick_sort(@left), @same, quick_sort(@right);
}

  • The worst-case is if we happen to select the smallest element every time, then the complexity is O(n^2)
  • It can run out of the recursion limit of Perl

Timsort

  • Timsort
  • Created for Python by Tim Peters

Traverse directory tree

use strict;
use warnings;
use File::Spec::Functions qw(catfile);

my ($name) = @ARGV;
die "Usage: $0 DIR/FILE\n" if not $name;

my $result = dir_walk($name);

sub dir_walk {
    my ($name) = @_;

    if (-f $name) {
        print "$name\n";
        return;
    }
    if (-d $name) {
        if (opendir my $dh, $name) {
            while (my $subname = readdir $dh) {
                next if $subname eq '.' or $subname eq '..';
                my $resuld = dir_walk(catfile($name, $subname));
            }
        } else {
            warn "Could not open dir '$name'";
        }
        return;
    }
    warn "The '$name' is not a file and not a directory. Skipping.";
    return;
}

Traverse directory tree with call back

use strict;
use warnings;
use File::Spec::Functions qw(catfile);

my ($name) = @ARGV;
die "Usage: $0 DIR/FILE\n" if not $name;

#my $result = dir_walk($name, \&print_name);
#
#sub  print_name {
#    print "$_[0]\n";
#}

{
    my $total = 0;
    sub collect {
        my ($filename) = @_;
        $total += -s $filename;
    }
    sub total {
        return $total;
    }
}

my $result = dir_walk($name, \&collect);
print total(), "\n";

sub dir_walk {
    my ($name, $cb) = @_;

    if (-f $name) {
        $cb->($name);
        return;
    }
    if (-d $name) {
        if (opendir my $dh, $name) {
            while (my $subname = readdir $dh) {
                next if $subname eq '.' or $subname eq '..';
                my $resuld = dir_walk(catfile($name, $subname), $cb);
            }
        } else {
            warn "Could not open dir '$name'";
        }
        return;
    }
    warn "The '$name' is not a file and not a directory. Skipping.";
    return;
}

Linear search in unorderd array

use strict;
use warnings;

my @planets = qw(Mercury Venus Earth Mars Jupiter Saturn);

sub search {
    my ($name, @items) = @_;

    for my $ix (0 .. $#items) {
        if ($items[$ix] eq $name) {
            return $ix;
        }
    }
    return;
}
print search('Mars', @planets), "\n";
print search('Pluto', @planets), "\n";

Binary search in sorted array

use strict;
use warnings;

my @planets = qw(Earth Jupiter Mars Mercury Saturn Venus);

sub search {
    my ($name, @items) = @_;

    my $left = 0;
    my $right = $#items;
    while ($left < $right) {
        my $current = $left + int(($right-$left)/2);
        if ($items[$current] lt $name) {
            $left = $current+1;
            next;
        }
        if ($items[$current] gt $name) {
            $right = $current-1;
            next;
        }
        return $current;
    }

    return;
}
print search('Mars', @planets), "\n";
print search('Pluto', @planets), "\n";

# If there are duplicate matching elements then depending on the length of the array
# this might return the index of any of those items.

Convert curl command with -F to LWP::UserAgent

curl -X POST -F field=value -F name=Corion "https://httpbin.org/post" -H  "accept: application/json"
use strict;
use warnings;

use LWP::UserAgent ();
use HTTP::Request::Common qw(POST);

my $ua = LWP::UserAgent->new(timeout => 10);
my $url = 'https://httpbin.org/post';
my %content = (
    field => 'value',
    name => 'Corion',
);

my $request = POST $url, 'Content-Type' => 'form-data', Content => \%content;
my $response = $ua->request($request);
print $response->decoded_content;

Modify time anomality in two files

use strict;
use warnings;
use feature 'say';
use Time::HiRes qw(sleep time stat);

{
    open(my $fh, '>', 'first.txt');
    $fh->flush;
    close $fh;
}

say "before ", time;
#sleep(0.0006);
say "after  ", time;

{
    open(my $fh, '>', 'second.txt');
    $fh->flush;
    close $fh;
}

my $first = (stat('first.txt'))[9];
my $second = (stat('second.txt'))[9];
say "first  $first";
say "second $second";
say $first == $second ? "same" : "different";

use strict;
use warnings;
use feature 'say';
use Time::HiRes qw(stat);

{
    open(my $fh, '>', 'first.txt');
}
{
    open(my $fh, '>', 'second.txt');
    $fh->flush;
    close $fh;
}

my $first = (stat('first.txt'))[9];
my $second = (stat('second.txt'))[9];
say "first  $first";
say "second $second";
say $first == $second ? "same" : "different";