CSV (Comma-Separated Values) is one of the most common file formats because it makes it easy to represent table-like data, similar to what you would put in an Excel file or in a relational database that you query with SQL.

CSV file with multi-line fields

In this example we have simple fields separated by commas, and we also have a field that contains both a comma and a newline as part of its value. That field is wrapped in double quotes (") to make it clear that it is a single unit.

examples/data/multiline.csv

Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
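The quoting rule can also be seen from the writing side. The following is a minimal sketch, not part of the original examples, that builds one record with the combine and string methods of Text::CSV. The variable names @row and $line are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;

# binary => 1 is needed so that a newline embedded in a field is accepted
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# The second field contains both a comma and a newline
my @row = ('Kuka', "Hofeherke,\nalma", 100, 'Kiralyno');

# combine builds one CSV record from the fields; string returns it
$csv->combine(@row) or die "Could not combine the fields\n";
my $line = $csv->string;

print "$line\n";
```

Only the field that needs protection is wrapped in quotes, so the printed record should span two physical lines, just like the third record in multiline.csv.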

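The following script expects the path to the CSV file on the command line and prints the content of each record using Data::Dumper. A record that contains a multi-line field spans more than one line of the file, but it is still returned as a single row.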

examples/read_and_print_multiline_csv.pl

#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;
use Data::Dumper qw(Dumper);

my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";

my $csv = Text::CSV->new ({
  binary    => 1,
  auto_diag => 1,
  sep_char  => ','    # not really needed as this is the default
});

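# Open the file as UTF-8 text
open(my $data, '<:encoding(UTF-8)', $file) or die "Could not open '$file' $!\n";

# getline returns one record at a time, even if it spans several lines of the file
while (my $fields = $csv->getline( $data )) {
    print Dumper $fields;
}

# If the loop stopped before the end of the file, there was a parse error
if (not $csv->eof) {
    $csv->error_diag();
}
close $data;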


The output will look like this:

$VAR1 = [
          'Tudor',
          'Vidor',
          '10',
          'Hapci'
        ];
$VAR1 = [
          'Szundi',
          'Morgo',
          '7',
          'Szende'
        ];
$VAR1 = [
          'Kuka',
          'Hofeherke,
alma',
          '100',
          'Kiralyno'
        ];
$VAR1 = [
          'Boszorkany',
          'Herceg',
          '9',
          'Meselo'
        ];

If you'd like to access the individual elements in each row, you can use the syntax $fields->[2], which accesses the 3rd element of the current row (indexing starts from 0).
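As a small illustration of this kind of access, here is a sketch that adds up the 3rd column of the sample data. To keep it self-contained, it reads the data from a string through an in-memory filehandle instead of from a file. The $content, $total and $rows variables are assumptions made for this example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;

# Sample data copied from multiline.csv; a real script would open the file instead
my $content = <<'END_CSV';
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
END_CSV

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Open an in-memory filehandle on the string
open(my $data, '<', \$content) or die "Could not open in-memory file: $!\n";

my $total = 0;
my $rows  = 0;
while (my $fields = $csv->getline($data)) {
    $total += $fields->[2];    # 3rd element of the current row
    $rows++;
}
close $data;

print "Rows: $rows\n";
print "Total: $total\n";
```

The data takes up 5 physical lines, but the script should count 4 rows, because the quoted multi-line field belongs to a single record.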

For more details see the article explaining how to read a CSV file using Perl.

Text::CSV or Text::CSV_XS?

Text::CSV comes with a pure-Perl implementation (Text::CSV_PP), which means you can "install" it by downloading and unpacking the distribution, without needing a C compiler. Text::CSV_XS implements the CSV parser in C, which makes it a lot faster.

Luckily, when you use Text::CSV it checks whether Text::CSV_XS is installed, and if it is, the faster one is used automatically.

So unless you want to force your users to always use Text::CSV_XS, you are probably better off using Text::CSV and letting your users decide whether they want to "pay the price" of installing the compiled module in exchange for the extra speed.
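If you are curious which implementation was picked on a given machine, the Text::CSV documentation describes a backend class method that reports it. A minimal sketch (the $backend variable name is only illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;

# backend returns the name of the module that does the actual parsing
my $backend = Text::CSV->backend;

print "Text::CSV is using $backend\n";
```

The result depends on what is installed: it should be Text::CSV_XS when that module is available and Text::CSV_PP otherwise.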

Alternative modules

  • Text::CSV: a wrapper that comes with a pure-Perl implementation
  • Text::CSV_XS: implemented in C, which makes it a lot faster
  • DBD::CSV: use SQL statements to access the data
  • Spreadsheet::Read: a wrapper around Text::CSV and other spreadsheet readers to make your code nicer
