Perl split - to cut up a string into pieces


PHP has the explode function; Python, Ruby and JavaScript all have split methods.

In Perl the function is called split.

Syntax of split

split REGEX, STRING will split the STRING at every match of the REGEX.

split REGEX, STRING, LIMIT where LIMIT is a positive number. This will split the STRING at every match of the REGEX, but will stop after it has found LIMIT-1 matches. So the number of elements it returns will be LIMIT or less.

split REGEX - If STRING is not given, split will work on the content of $_, the default variable of Perl, cutting it at every match of the REGEX.

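split without any parameter will split the content of $_ on white-space. It behaves like /\s+/ as REGEX, except that leading white-space is skipped, so the result does not start with an empty field.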
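The forms listed above can be tried side by side. The following is only a minimal sketch; the sample strings and the variable names (@default_fields, @regex_fields, @limited) are made up for illustration:

```perl
use strict;
use warnings;

my @default_fields;
my @regex_fields;

# Inside the loop $_ holds the string, so split can be called without STRING.
for ("  ab cd   ef") {
    @default_fields = split;         # no parameters: leading white-space is skipped
    @regex_fields   = split /\s+/;   # explicit regex: an empty first field is kept
}

# LIMIT of 3: stop after 2 matches, the rest stays in the last element.
my @limited = split /,/, "a,b,c,d", 3;

print scalar(@default_fields), "\n";   # 3
print scalar(@regex_fields), "\n";     # 4
print join('|', @limited), "\n";       # a|b|c,d
```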

Simple cases

split returns a list of strings:

use Data::Dumper qw(Dumper);

my $str = "ab cd ef gh ij";
my @words = split / /, $str;
print Dumper \@words;

The output is:

$VAR1 = [
          'ab',
          'cd',
          'ef',
          'gh',
          'ij'
        ];

Limit the number of parts

split can get a 3rd parameter that will limit the number of elements returned:

use Data::Dumper qw(Dumper);

my $str = "ab cd ef gh ij";
my @words = split / /, $str, 2;
print Dumper \@words;

The result:

$VAR1 = [
          'ab',
          'cd ef gh ij'
        ];

Assign to scalars

Instead of assigning the result to a single array, we can also assign it to a list of scalar variables:

my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
my ($username, $password, $uid, $gid, $real_name, $home, $shell) = split /:/, $str;
print "$username\n";
print "$real_name\n";

The output is like this:

root
System Administrator

Another way people often write this is the following: first they assign the results to an array, and then they copy the specific elements of the array:

my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
my @fields = split /:/, $str;
my $username = $fields[0];
my $real_name = $fields[4];
print "$username\n";
print "$real_name\n";

This is longer and I think less clear.

A slightly better way is to use an array slice:

my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
my @fields = split /:/, $str;
my ($username, $real_name) = @fields[0, 4];
print "$username\n";
print "$real_name\n";

Please note, in the array slice @fields[0, 4] we have a leading @ and not a leading $.

If we are really only interested in elements 0 and 4, then we could use an array slice on the fly:

Slice on the fly

my $str = "root:*:0:0:System Administrator:/var/root:/bin/sh";
my ($username, $real_name) = (split /:/, $str)[0, 4];
print "$username\n";
print "$real_name\n";

Here we don't build an array, but as we put the whole expression in parentheses, we can put an index on them and fetch only elements 0 and 4 from the temporary (and invisible) array that was created for us: (split /:/, $str)[0, 4]

Split on more complex regex

The separator of split is a regex. So far in the examples we used the very simple regex / / matching a single space. We can use any regex. For example, if we have strings that look like these:

fname    = Foo
lname =    Bar
email=foo@bar.com

We want to split at the = sign and disregard the spaces around it. We can use the following line:

my ($key, $value) = split /\s*=\s*/, $str;

This makes any white-space around the = sign part of the separator, so it is removed together with the = sign and does not end up in the pieces.
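Put together, the line above could be used like this. This is a sketch only: the @lines array and the %config hash are names invented for the example, and the LIMIT of 2 is an addition to the original one-liner so that a value containing another = sign would stay in one piece:

```perl
use strict;
use warnings;

# The three sample strings from above.
my @lines = (
    'fname    = Foo',
    'lname =    Bar',
    'email=foo@bar.com',
);

my %config;
for my $line (@lines) {
    # Split at the first = sign, swallowing the white-space around it.
    my ($key, $value) = split /\s*=\s*/, $line, 2;
    $config{$key} = $value;
}

print "$_: $config{$_}\n" for sort keys %config;
# email: foo@bar.com
# fname: Foo
# lname: Bar
```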

Split on multiple characters

For example we might have a string built up from pairs concatenated with &. The two parts of each pair are separated by =.

use Data::Dumper qw(Dumper);

my $str = 'fname=Foo&lname=Bar&email=foo@bar.com';
my @words = split /[=&]/, $str;
print Dumper \@words;

$VAR1 = [
          'fname',
          'Foo',
          'lname',
          'Bar',
          'email',
          'foo@bar.com'
        ];

Of course, if we know these are key-value pairs, then we might want to assign the result to a hash instead of an array:

use Data::Dumper qw(Dumper);

my $str = 'fname=Foo&lname=Bar&email=foo@bar.com';
my %user = split /[=&]/, $str;
print Dumper \%user;

And the result looks much better:

$VAR1 = {
          'fname' => 'Foo',
          'email' => 'foo@bar.com',
          'lname' => 'Bar'
        };
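Splitting on both characters at once assumes that no value contains an = or & sign. If that cannot be guaranteed, one possible alternative is to split in two steps: first on &, then each pair on the first = only. A minimal sketch, where the expr=a=b pair is made-up input to show the difference:

```perl
use strict;
use warnings;

my $str = 'fname=Foo&lname=Bar&expr=a=b';

my %user;
for my $pair (split /&/, $str) {
    # LIMIT 2: only the first = separates key and value.
    my ($key, $value) = split /=/, $pair, 2;
    $user{$key} = $value;
}

print "$_ => $user{$_}\n" for sort keys %user;
# expr => a=b
# fname => Foo
# lname => Bar
```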

Split on empty string

Splitting on the empty string (or empty regex, if you wish) is basically saying "split at every place where you find an empty string". Between every two characters there is an empty string, so splitting on an empty string will return the original string cut up into individual characters:

use Data::Dumper qw(Dumper);

my $str = "Hello World";
my @chars = split //, $str;

print Dumper \@chars;

$VAR1 = [
          'H',
          'e',
          'l',
          'l',
          'o',
          ' ',
          'W',
          'o',
          'r',
          'l',
          'd'
        ];
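Once the string is cut into characters, they can be processed one by one. As a small illustration (the %count hash is a name chosen for this sketch), this counts how many times each character appears:

```perl
use strict;
use warnings;

my $str = "Hello World";

# Count the occurrences of every character, including the space.
my %count;
$count{$_}++ for split //, $str;

print "l: $count{l}\n";   # l: 3
print "o: $count{o}\n";   # o: 2
```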

Including trailing empty fields

By default split will exclude any fields at the end of the string that are empty. However, you can pass -1 as the 3rd parameter. If the 3rd parameter is a positive number, it limits the number of fields returned. When it is -1, it instructs split to include all the fields, even the trailing empty ones.

examples/split_empty_trailing.pl

use strict;
use warnings;
use 5.010;
use Data::Dumper qw(Dumper);

say Dumper [split /;/, ";a;b;c"];
say Dumper [split /;/, ";a;b;c;"];
say Dumper [split /;/, ";a;b;c;;"];

say Dumper [split /;/, ";a;b;c;;", -1];

$VAR1 = [
          '',
          'a',
          'b',
          'c'
        ];

$VAR1 = [
          '',
          'a',
          'b',
          'c'
        ];

$VAR1 = [
          '',
          'a',
          'b',
          'c'
        ];

$VAR1 = [
          '',
          'a',
          'b',
          'c',
          '',
          ''
        ];
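This matters when the number of fields carries meaning, for example when every record is expected to have a fixed number of columns. A minimal sketch with a made-up record of five columns, the last two of them empty:

```perl
use strict;
use warnings;

my $record = "alice;;42;;";

my @default = split /;/, $record;        # trailing empty fields are dropped
my @all     = split /;/, $record, -1;    # trailing empty fields are kept

print scalar(@default), "\n";   # 3
print scalar(@all), "\n";       # 5
```

Note that the empty field in the middle is kept in both cases; only the trailing ones are affected.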

Beware of regex special characters

A common pitfall with split, especially if you use a string as the separator (split STRING, STRING) as in split ';', $line; is that even if you pass the first parameter as a string, it still behaves as a regex. So for example

split '|', $line;

is the same as

split /|/, $line;

and both will split the string character by character. The right way to split on a pipe | character is to escape the special regex character:

split /\|/, $line;
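The same problem appears when the separator arrives in a variable. One way to handle that case, sketched here with the hypothetical variables $sep and $line, is to quote the content of the variable with \Q and \E (or with the quotemeta function) so that it is matched literally:

```perl
use strict;
use warnings;

my $sep  = '|';
my $line = 'a|b|c';

# Unescaped: the pipe is regex alternation, so this splits character by character.
my @chars = split $sep, $line;

# Escaped with \Q...\E: the pipe is matched literally.
my @fields = split /\Q$sep\E/, $line;

print scalar(@chars), "\n";        # 5
print join(',', @fields), "\n";    # a,b,c
```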

Other examples

Though in the general case split is not the right tool for this job, it can be employed for reading simple CSV files. Check that article for much better ways to read a CSV or TSV file.

It is also a critical part of the example showing how to count words in a text file.

Another special case helps to retain the separator or parts of it.
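One way split can keep the separators is by putting capturing parentheses in the regex: whatever the parentheses capture is returned between the pieces. The following is a minimal sketch of that behavior and not necessarily the approach of the example referred to above:

```perl
use strict;
use warnings;

my $str = 'fname=Foo&lname=Bar';

# The parentheses make split return the matched separators as well.
my @parts = split /([=&])/, $str;

print join(' ', @parts), "\n";   # fname = Foo & lname = Bar
```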


Written by
Gabor Szabo

Published on 2013-12-15

