What if you get an error like in the title while reading a perfectly good looking CSV file?

Someone sent me this CSV file:

examples/data/input_file_actual_data.csv

IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isEconomicallyDisadvantagedWomenOwnedSmallBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isJointVentureWomenOwnedSmallBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isJointVentureEconomicallyDisadvantagedWomenOwnedSmallBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isVeteranOwned,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isServiceRelatedDisabledVeteranOwnedBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,award_idv_flag,CHAR(1),Not Null,Indicates whether record is from the Award or Closed_Award Atom Feed Entry ("A") or IDV or Closed_IDV Atom Feed Entry ("I").,

and a 100 lines long script asking why does he get the above error.

I've never encountered the above error, so the first thing I did was creating a minimal script that only reads the CSV file to see if that exhibits the problem:

examples/minimal_csv_reader.pl

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";

my $csv = Text::CSV->new ({ binary    => 1, auto_diag => 1, sep_char  => ',' });
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";

while (my $fields = $csv->getline( $data )) {
}

It did, though the error I got was slightly different. I got:

CSV_PP ERROR: 2034 - EIF - Loose unescaped quote @ rec 5 pos 194

It seems the difference is that the other person had Text::CSV_XS installed and I did not, so in my case Text::CSV was using the PP (Pure Perl) backend.

I was still baffled. I opened the CSV file went to line 5 position 194 and there was a quote character ". At first I though it might not be an ASCII character, just one that looks similar, but using a small script that prints the ord of the characters I eliminated that possibility.

That's when it stroke me, CSV files can have their values within quotes and then they can even have newlines inside the field. However in this case the quote character was not the first character of a field. It did not come immediately after a field-separator (which is comma in this case).

So either we need to change the CSV file and escape the quote characters. By default the escape character is a double quote character, so we have to duplicate them, but this will only work if the whole field is insied double quotes:

examples/data/input_file_changed_data.csv

IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isEconomicallyDisadvantagedWomenOwnedSmallBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isJointVentureWomenOwnedSmallBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isJointVentureEconomicallyDisadvantagedWomenOwnedSmallBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isVeteranOwned,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,isServiceRelatedDisabledVeteranOwnedBusiness,VARCHAR(255),Null,Reference FPDS data dictionary (XML Tag Name).,
IMPORT.fpds_awards_raw,Raw import data from FPDS stored as the XML Tag Name (as-is).,award_idv_flag,CHAR(1),Not Null,"Indicates whether record is from the Award or Closed_Award Atom Feed Entry (""A"") or IDV or Closed_IDV Atom Feed Entry (""I"").",

Probably the better route is to tell the CSV reader that the double-quote " is not the escape character. We do that by explicitly saying what is the quote character:

examples/minimal_csv_reader_fixed.pl

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file = $ARGV[0] or die "Need to get CSV file on the command line\n";

my $csv = Text::CSV->new ({ binary    => 1, auto_diag => 1, sep_char  => ',', quote_char => q{'} });
open(my $data, '<:encoding(utf8)', $file) or die "Could not open '$file' $!\n";

while (my $fields = $csv->getline( $data )) {
}

A single-quote ' in this case.