CSV (which stands for comma-separated values) is one of the most common file formats, as it can easily represent table-like data, similar to what you would put in an Excel file or in a relational database that you query with SQL.
CSV file with multi-line fields
In this example most fields are simple values separated by commas, but one field contains both a comma and a newline as part of its value.
That field is wrapped in double quotes (") so that the parser treats it as a single unit.
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
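This script expects the path to the CSV file as a command-line argument and prints the content of each record using Data::Dumper. A record is not necessarily one physical line, because a quoted field may span several lines.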
examples/read_and_print_multiline_csv.pl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper qw(Dumper);
my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";
my $csv = Text::CSV->new ({
  binary    => 1,
  auto_diag => 1,
  sep_char  => ','    # not really needed as this is the default
});
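# Open the file with strict UTF-8 decoding.
open(my $data, '<:encoding(UTF-8)', $file) or die "Could not open '$file': $!\n";
# getline returns one record at a time, even if a quoted field spans several lines.
while (my $fields = $csv->getline( $data )) {
    print Dumper $fields;
}
# If the loop ended before the end of the file, a parse error stopped it.
if (not $csv->eof) {
    $csv->error_diag();
}
close $data;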
The output will look like this:
$VAR1 = [
          'Tudor',
          'Vidor',
          '10',
          'Hapci'
        ];
$VAR1 = [
          'Szundi',
          'Morgo',
          '7',
          'Szende'
        ];
$VAR1 = [
          'Kuka',
          'Hofeherke,
alma',
          '100',
          'Kiralyno'
        ];
$VAR1 = [
          'Boszorkany',
          'Herceg',
          '9',
          'Meselo'
        ];
If you'd like to access an individual element of a row, use the syntax $fields->[2], which returns the 3rd element of the current row (indexing starts from 0).
For more details see the article explaining how to read a CSV file using Perl.
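As a small illustration of accessing individual elements, the following sketch adds up the 3rd column of the sample data. To keep it self-contained it reads the rows from an in-memory string instead of a file given on the command line; that choice, and the variable names $sample and $sum, are assumptions made only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# The sample rows from above, held in a string so the sketch needs no file.
my $sample = <<'END_CSV';
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
END_CSV

my $csv = Text::CSV->new ({
  binary    => 1,
  auto_diag => 1,
});

# Open a read-only filehandle on the string.
open(my $data, '<', \$sample) or die "Could not open in-memory data: $!\n";

my $sum = 0;
while (my $fields = $csv->getline( $data )) {
    # Index 2 is the 3rd field of the current record.
    $sum += $fields->[2];
}
close $data;

print "Total: $sum\n";   # prints "Total: 126"
```

The multi-line field in the third record does not disturb the column positions, because getline returns it as a single element.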
Text::CSV or Text::CSV_XS?
Text::CSV comes with a pure-Perl implementation (Text::CSV_PP), which means you can "install" it by downloading and unzipping the distribution, without needing a C compiler. Text::CSV_XS implements the CSV parser in C, which makes it a lot faster.
Luckily, Text::CSV checks whether Text::CSV_XS is installed and, if it is, uses the faster one automatically.
So unless you want to force your users to always use Text::CSV_XS, you are probably better off using Text::CSV and letting your users decide whether they want to "pay the price" of installing the compiled module.
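If you are curious which implementation Text::CSV picked on a given machine, a short check along the following lines should tell you. It relies on the backend and is_xs class methods described in the Text::CSV documentation; the output depends on what is installed, so none is shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Name of the module that does the actual parsing,
# either Text::CSV_XS or Text::CSV_PP.
my $backend = Text::CSV->backend;
print "Backend: $backend\n";

# True if the C-based implementation is in use.
print "Using XS: ", (Text::CSV->is_xs ? "yes" : "no"), "\n";
```

The rest of your code stays the same whichever backend is reported, since both are used through the same Text::CSV interface.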
Alternative modules
- Text::CSV: comes with a pure-Perl implementation and uses Text::CSV_XS automatically when it is installed
- Text::CSV_XS: implemented in C, which makes it a lot faster
- DBD::CSV: lets you use SQL statements to access the data
- Spreadsheet::Read: a wrapper around Text::CSV and other spreadsheet readers that gives your code one uniform interface
