CSV - Comma Separated Values and Perl
CSV (which stands for comma-separated values) is one of the most common file formats, as it can easily represent table-like data, similar to what you would put in an Excel file or in a relational database that you query with SQL.
CSV file with multi-line fields
In this example we have simple fields separated by commas, and we also have a field that contains both a comma and a newline as part of the value. That field is wrapped in double quotes (") to make it clear that it is a single unit.
examples/data/multiline.csv
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
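To see where the quotes come from, here is a minimal sketch that builds one such record with the combine and string methods of Text::CSV. The field values are taken from the sample data; the exact position of the line break inside the field is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;

# binary => 1 allows embedded newlines inside a field
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# The second field contains both a comma and a newline
# (the position of the newline is assumed for this sketch).
$csv->combine('Kuka', "Hofeherke,\nalma", 100, 'Kiralyno')
    or die "Could not combine the fields\n";

# string() returns the record as it would be written to a CSV file
my $line = $csv->string;
print "$line\n";
```

Only the field that needs protection is wrapped in quotes, so this should print Kuka,"Hofeherke, followed on the next line by alma",100,Kiralyno.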
The following script expects the path to the CSV file as a command-line argument and prints the fields of each record using Data::Dumper.
examples/read_and_print_multiline_csv.pl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;
use Data::Dumper qw(Dumper);

my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";

my $csv = Text::CSV->new ({
    binary    => 1,
    auto_diag => 1,
    sep_char  => ','    # not really needed as this is the default
});

open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";
while (my $fields = $csv->getline( $data )) {
    print Dumper $fields;
}
if (not $csv->eof) {
    $csv->error_diag();
}
close $data;
The output will look like this:
$VAR1 = [
          'Tudor',
          'Vidor',
          '10',
          'Hapci'
        ];
$VAR1 = [
          'Szundi',
          'Morgo',
          '7',
          'Szende'
        ];
$VAR1 = [
          'Kuka',
          'Hofeherke,
alma',
          '100',
          'Kiralyno'
        ];
$VAR1 = [
          'Boszorkany',
          'Herceg',
          '9',
          'Meselo'
        ];
If you'd like to access the individual elements in each row you can do it with the following syntax: $fields->[2]; which would access the 3rd element in the current row (indexing starts from 0).
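As a small illustration of this indexing, the sketch below adds up the 3rd column of every row. To keep it self-contained, the sample data is inlined and read through an in-memory file handle instead of a file given on the command line; the line break inside the quoted field is assumed to fall right after the comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;

# Inlined copy of the sample data (an assumption of this sketch;
# the real script would read the file named on the command line).
my $content = <<'END_CSV';
Tudor,Vidor,10,Hapci
Szundi,Morgo,7,Szende
Kuka,"Hofeherke,
alma",100,Kiralyno
Boszorkany,Herceg,9,Meselo
END_CSV

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Open the string as an in-memory file
open(my $data, '<', \$content) or die "Could not open in-memory file: $!\n";

my $sum = 0;
while (my $fields = $csv->getline($data)) {
    $sum += $fields->[2];    # 3rd field of the current row (index 2)
}
close $data;

print "Sum of the 3rd column: $sum\n";
```

With the four rows above this adds 10, 7, 100 and 9, so it should report 126. The multi-line field does not disturb the column positions, because getline returns one record at a time rather than one physical line.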
For more details see the article explaining how to read a CSV file using Perl.
Text::CSV or Text::CSV_XS ?
Text::CSV can work as a pure-Perl implementation (it bundles the pure-Perl parser Text::CSV_PP), which means you can "install" it by downloading and unzipping the distribution. Text::CSV_XS implements the CSV parser in C, which makes it a lot faster.
Luckily, Text::CSV checks whether Text::CSV_XS is installed, and if it is, the faster one is used automatically.
So unless you want to force your users to always use Text::CSV_XS, you are probably better off using Text::CSV and letting your users decide whether they want to "pay the price" of installing the compiled module.
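If you are curious which parser ended up being loaded on a given machine, a short check along these lines might help. It relies on the backend and is_xs class methods described in the Text::CSV documentation; the printed messages are just illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV;

# Name of the module that actually does the parsing:
# Text::CSV_XS if it is installed, Text::CSV_PP otherwise.
my $backend = Text::CSV->backend;
print "Text::CSV is using $backend\n";

if (Text::CSV->is_xs) {
    print "The fast C implementation is in use\n";
} else {
    print "The pure-Perl implementation is in use\n";
}
```

The output depends on what is installed, so the same script can report a different backend on a different machine.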
Alternative modules
- Text::CSV: works as a pure-Perl implementation
- Text::CSV_XS: implemented in C, which makes it a lot faster
- DBD::CSV: use SQL statements to access the data
- Spreadsheet::Read: a wrapper around Text::CSV and other spreadsheet readers to make your code nicer
Related Articles
- Split CSV file into multiple small CSV files
- Multiple command line counters with plain TSV text file back-end
- How to read a CSV file using Perl?
- How to calculate the balance of bank accounts in a CSV file, using Perl?
- Calculating bank balance, take two: DBD::CSV
- Process CSV file (screencast)
- Process CSV file using Text::CSV_XS (screencast)
- Process CSV file short version (screencast)
- One-liner sum of column in CSV
- How to replace a column in a CSV file using Perl
- How to splice a CSV file in Perl (filter columns of CSV file)
Published on 2016-05-28