CSV (which stands for comma-separated values) is one of the most common file formats because it can easily represent table-like data, similar to what you would put in an Excel file or in a relational database that you query with SQL.
CSV file with multi-line fields
In this example we have simple fields separated by commas, and we also have a field that contains both a comma and a newline as part of its value.
That field is wrapped in double quotes (") to make it clear it is a single unit.
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
This script expects the path to the CSV file as its command-line argument and prints the content of each record using Data::Dumper.
examples/read_and_print_multiline_csv.pl
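#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;
use Data::Dumper qw(Dumper);

my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";

my $csv = Text::CSV->new ({
    binary    => 1,   # allow newlines and other binary characters in quoted fields
    auto_diag => 1,   # report parse errors automatically
    sep_char  => ','  # not really needed as this is the default
});

open(my $data, '<:encoding(UTF-8)', $file) or die "Could not open '$file' $!\n";
# getline reads one full record, which may span several physical lines
while (my $fields = $csv->getline( $data )) {
    print Dumper $fields;
}
# getline also returns false on a parse error, so check that we really reached the end
if (not $csv->eof) {
    $csv->error_diag();
}
close $data;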
The output will look like this:
$VAR1 = [
          'Tudor',
          'Vidor',
          '10',
          'Hapci'
        ];
$VAR1 = [
          'Szundi',
          'Morgo',
          '7',
          'Szende'
        ];
$VAR1 = [
          'Kuka',
          'Hofeherke,
alma',
          '100',
          'Kiralyno'
        ];
$VAR1 = [
          'Boszorkany',
          'Herceg',
          '9',
          'Meselo'
        ];
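Note that the sample file has five physical lines, but the output shows only four records, because the quoted field spans two lines. The following is a minimal sketch of that difference, assuming Text::CSV is installed. To stay self-contained it reads the sample data from a string through an in-memory filehandle instead of a file on disk; the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# The sample data from above, held in a string so the sketch is self-contained.
my $content = <<'END_CSV';
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
END_CSV

# Count physical lines: every newline character ends one.
my $physical_lines = () = $content =~ /\n/g;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Open a read-only in-memory filehandle on the string.
open(my $data, '<', \$content) or die "Could not open in-memory file: $!\n";

# Each successful getline call returns one complete CSV record.
my $records = 0;
while (my $fields = $csv->getline($data)) {
    $records++;
}
close $data;

print "physical lines: $physical_lines\n";
print "CSV records:    $records\n";
```

With the sample data this should report 5 physical lines and 4 CSV records. This is also why splitting the file on newlines and then on commas is not enough for this kind of data.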
If you'd like to access the individual elements in each row you can do it with the following syntax:
$fields->[2];
which would access the 3rd element in the current row (indexing starts from 0).
For more details see the article explaining how to read a CSV file using Perl.
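As a small illustration of working with the individual elements, here is a sketch that sums the third column and reports rows whose second field contains a newline. It assumes Text::CSV is installed and, to stay self-contained, reads the sample data from a string rather than from a file; the variable names are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Same sample data as above, embedded so the sketch runs on its own.
my $content = <<'END_CSV';
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
END_CSV

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open(my $data, '<', \$content) or die "Could not open in-memory file: $!\n";

my $sum = 0;
while (my $fields = $csv->getline($data)) {
    # $fields is an array reference; index 2 is the third column.
    $sum += $fields->[2];

    # The embedded newline is part of the field value itself.
    if ($fields->[1] =~ /\n/) {
        print "$fields->[0] has a multi-line second field\n";
    }
}
close $data;

print "Sum of the third column: $sum\n";
```

For the sample data the sum is 126 (10 + 7 + 100 + 9), and only the row starting with Kuka is reported as having a multi-line field.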
Text::CSV or Text::CSV_XS?
Text::CSV ships with a pure-Perl parser (Text::CSV_PP), which means you can "install" it by downloading and unzipping the distribution, without needing a C compiler. Text::CSV_XS implements the CSV parser in C, which makes it a lot faster.
Luckily, when you use Text::CSV it checks whether Text::CSV_XS is installed, and if it is, the faster one is used automatically.
So unless you want to force your users to always use Text::CSV_XS, you are probably better off using Text::CSV and letting your users decide whether they want to "pay the price" of installing the compiled module.
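If you are curious which implementation is actually in use on a given machine, Text::CSV can report it. A minimal sketch using the backend, is_xs and is_pp class methods described in the Text::CSV documentation; what it prints depends on what is installed on your system:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# backend() returns the name of the module doing the actual parsing.
my $backend = Text::CSV->backend;
print "Text::CSV is using: $backend\n";

if (Text::CSV->is_xs) {
    print "The C implementation (Text::CSV_XS) is in use.\n";
}
elsif (Text::CSV->is_pp) {
    print "The pure-Perl implementation (Text::CSV_PP) is in use.\n";
}
```

Because the choice is made when the module is loaded, the rest of your code stays the same whichever backend is picked.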
Alternative modules
- Text::CSV - a wrapper that uses Text::CSV_XS when it is installed and falls back to its bundled pure-Perl parser otherwise
- Text::CSV_XS - implemented in C, which makes it a lot faster
- DBD::CSV - use SQL statements to access the data
- Spreadsheet::Read - a wrapper around Text::CSV and other spreadsheet readers to make your code nicer