There are cases when it might be quite important to know how much each variable in Perl uses. For this Devel::Size module provides two functions. Both size and total_size accept a reference to a variable or a data structure. The difference between them is that in complex data structures (aka. arrays and hashes), size only returns the memory used by the structure, not by the data.

There are a few more caveats pointing out some differences between the memory Perl asked for, what Devel::Size can report, and what the operating system has actually allocated. If interested, there is a nice explanation in the documentation of Devel::Size

The following script tries to show some basic values:

examples/memory_of_variables.pl

use strict;
use warnings;
use 5.010;

use Devel::Size qw(size total_size);

my $x;
my @y;
my %z;

say '                           size  total_size';
both('SCALAR', \$x);        #    24    24
both('ARRAY',  \@y);        #    64    64
both('HASH',   \%z);        #   120   120
both('CODE', sub {} );      #  8452  8452
say '';

both('SCALAR', \$x);        #  24    24
$x = 'x';
both('SCALAR-1', \$x);      #  56    56
$x = 'x' x 15;
both('SCALAR-15', \$x);     #  56    56
$x = 'x' x 16;
both('SCALAR-16', \$x);     #  72    72
$x = 'x' x 31;
both('SCALAR-31', \$x);     #  72    72
$x = 'x' x 32;
both('SCALAR-32', \$x);     #  88    88
$x = '';
both('SCALAR=""', \$x);     #  88    88
$x = undef;
both('SCALAR=undef', \$x);  #  88    88
undef $x;
both('undef SCALAR', \$x);  #  40    40
say '';

both('ARRAY',  \@y);               #    64    64
@y = ('x');
both('ARRAY-1', \@y);              #    96   152
@y = ('x' x 15);
both('ARRAY-15', \@y);             #    96   152
@y = ('x' x 16);
both('ARRAY-16', \@y);             #    96   168
@y = ('x' x 31);
both('ARRAY-31', \@y);             #    96   168
@y = ('x' x 32);
both('ARRAY-32', \@y);             #    96   184
@y = ('x') x 2;
both('ARRAY-1-1', \@y);            #    96   208
@y = ('x') x 4;
both('ARRAY-1-1-1-1', \@y);        #    96   320
@y = ('x') x 5;
both('ARRAY-1-1-1-1-1', \@y);      #   104   384
@y = ('x') x 6;
both('ARRAY-1-1-1-1-1-1', \@y);    #   112   448
@y = ('x') x 7;
both('ARRAY-1-1-1-1-1-1-1', \@y);  #   128   520
@y = ();
both('ARRAY = ()', \@y);           #   128   128
undef @y;
both('undef ARRAY', \@y);          #    64    64
say('');

both('HASH',   \%z);                       #  120   120
%z = ('x' => undef);
both('HASH x => undef',   \%z);            #  179   203
%z = ('x' => "x");
both('HASH x => "x"',   \%z);              #  179   235
%z = ('x' x 10 => "x" x 20);
both('HASH "x" x 10 => "x" x 20',   \%z);  #  188   260
for my $c (qw(a b c d e f g h i)) {
    $z{$c x 10} = $c x 20;
}
both('HASH 10 * 10 + 10 * 20',   \%z);     #  864  1584
%z = ();
both('HASH=()',   \%z);                    #  184   184
undef %z;
both('undef HASH',   \%z);                 #  120   120
my $o = bless \%z,'Some::Very::Long::Class::Name::That::Probably::Noone::Uses';
both('blessed HASH', $o);                  #  120   120
say('');

both('CODE', sub {}  );                   #  8516  8516
both('CODE2', sub { my $w }  );           #  8612  8612
both('CODE3', sub { my $w = 'a' }  );     #  8820  8820

sub both {
    my ($name, $ref) = @_;
    printf "%-25s %5d %5d\n", $name, size($ref), total_size($ref);
}

The environment

These results were generated on 64 bit OSX, running perl 5.18.2 using Devel::Size 0.79. (BTW I got almost the same results when I ran the script on 5.18.1, except that the values for CODE-references were 8 bytes smaller.)

Some observations

The size of code-references look huge. I wonder if those number are correct.

Strangely bless does not change the size of the reference. Or at least, it is not reported.

Memory is allocated in 16 byte chunks for strings. Hence the memory used by a 1-character long string is the same as used by a 15-character long string.

Neither setting the string to the empty string ($x = '';), nor assigning undef to it ($x = undef;) reduced the memory usage. I had to call undef $x; for that. Even then it went back only to 40, instead of the original 24.

In arrays, every element uses 8 bytes + memory allocated to the scalar container + the data.

Setting @y = (); eliminated the memory allocation of the date (or at least total_size does not show it any more) Calling undef @y; also freed the memory allocated to the structure.

In hashes it's even more complex. I won't attempt to describe it. The documentation of Devel::Size has some explanation.

The actual results look like this

                          size    total_size
SCALAR                       24    24
ARRAY                        64    64
HASH                        120   120
CODE                       8452  8452

SCALAR                       24    24
SCALAR-1                     56    56
SCALAR-15                    56    56
SCALAR-16                    72    72
SCALAR-31                    72    72
SCALAR-32                    88    88
SCALAR=""                    88    88
SCALAR=undef                 88    88
undef SCALAR                 40    40

ARRAY                        64    64
ARRAY-1                      96   152
ARRAY-15                     96   152
ARRAY-16                     96   168
ARRAY-31                     96   168
ARRAY-32                     96   184
ARRAY-1-1                    96   208
ARRAY-1-1-1-1                96   320
ARRAY-1-1-1-1-1             104   384
ARRAY-1-1-1-1-1-1           112   448
ARRAY-1-1-1-1-1-1-1         128   520
ARRAY = ()                  128   128
undef ARRAY                  64    64

HASH                        120   120
HASH x => undef             179   203
HASH x => "x"               179   235
HASH "x" x 10 => "x" x 20   188   260
HASH 10 * 10 + 10 * 20      864  1584
HASH=()                     184   184
undef HASH                  120   120
blessed HASH                120   120

CODE                       8516  8516
CODE2                      8612  8612
CODE3                      8820  8820

Comments

Hi Gabor,

I am running a script, i am creating a string containing 20 thousand records to be written in xml file in each iteration of loop. Memory consumption is increasing even after using undef variable holding string. Is there any way to release memory after use the variable in each iteration.

---

In general when you assign new data to a varaible the old memory is reused so the problem might be elsewhere. Without seeing your code it is quite impossible to point to the problem.

---

my $start=1;my $end =300000; ($succesMsg) = &fetch_data($start,$end,$total_count,$dbh); ($msg) = &disconnect_db($dbh);

sub fetch_data { my $start = shift; my $end = shift; my $total_count = shift; my $dbh = shift; @Data_arr1 = (); if($dbh) { my $sqlsplit="select * from table where rn between $start and $end"; my $sthsplit = $dbh->prepare($sqlsplit); if($sthsplit->execute()) { while (my $hasplit = $sthsplit->fetchrow_hashref) { $global_record++; push(@Data_arr1,$hasplit); } ($succesMsg) = &splitquery(\@Data_arr1,$total_count,$global_record,$dbh,$start,$end); } } else { &print_oracle_error(__FILE__,__LINE__,"Cant connect to database","","$DBI::errstr"); } }

sub splitquery { my $arr_ref=shift; my $total_count=shift; my $global_record=shift; my $dbh = shift; my $start=shift; my $end=shift; my $final_xml = ''; my $final_xml1 = ''; @data_arr = @$arr_ref;

foreach my $ha (@data_arr) { $counter++; $record++; $fullpath = qq~https://somepath.html~; $xml = qq~$fullpath~; $xml .= qq~$largeimagePath$title~; $xml .= qq~\n~; $final_xml .= $xml; undef($xml); if($counter==20000) { $j++; $counter=0; $final_xml1=qq~\n\n$final_xml~; $file_name="some dynamic filePath";

unless(open (WRITE,">$file_name")) { my $log_message = "\nFailed To Open File \n$file_name \nAt LINE: ".__LINE__ ."\nIn FILE:".__FILE__; exit; } print WRITE $final_xml1; close(WRITE); undef($final_xml1); undef($final_xml); sleep(1); } } undef(@data_arr); undef($arr_ref); if($record==300000 && $global_record < $total_count) { $start= $global_record+1; $end = $global_record+300000; &fetch_data($start,$end,$total_count,$dbh); } else { #send mail }

}

--- This code runs for around 20000000 of data, i am fetching 300000 data in one iteration and writing that in xml file. Main issue is with $final_xml variable which is getting appended some text in each iteration.

---

One more thing, i am really very glad with your so fast response.

----

You don't need all those calls to undef and setting arrays to (), It is better to re-declare them using my in every iteration and/or in every call the the function. That will take care of reusing the memory.

It seems splitquery and fetch_date call each other, but it is unclear to me why? Why not only one calls the other? Isn't that recursion the source of the memory leak?

Also I'd put use strict and use warnings at the top of the code and clean up any errors/warnings. That can help track down issues. Calling $sthsplit->finish at the end of the fetch_data function might also help in case your version of the Oracle driver has some issues.

--- I had not sent you whole script, in complete script i am using strict and warnings. There are no warnings in code. Actually i am permitted to fetch only 3,00,000 data in one go so i am using recursion to fetch 3,00,000 data and create 15 xml files using that data, each xml file contains 20,000 records. There are 1,60,00,000 records in table. $final_xml variable is defined outside of the loop and script is appending text in each iteration till 20,000 records. This variable is not releasing memory due to which script is eating up all the server memory(6 GB) in no time.

--- If you don't paste the whole script then how do you expect me to understand it? Anyway, if you keep appending to $final_xml then why are you surprised it keeps growing? Maybe you need to write the partial xml out in chuncks or with a SAX XML creator. But then again, do you really want to create a file that is so big? What will be able to read it? --- Please excuse me for this, I had sent that part of script which is causing memory leakage. This script is used to create sitemap for a website, We store 20k records in each sitemap(xml) file, google crawlers use this to index. I keep on appending some string(e.g. URLs, images) to $final_xml variable for 20K records and then i write it to xml file and want to release this memory to be used in same process for next iteration of $final_xml variable to write in another xml file. Thus we are making around 900 xml files. Please let me know if my question is not clear. --- It is now a bit clearer. I don't see any reason for the recursion here, you could remove that and that might automatically fix the issue. Besides that I don't have any more ideas.


I am planning to store 5GB data in array and then start processing . Just wanted to know is there any limitation for storing in the array . --- AFAIK, just the size of the memory in your computer.

--- Hello, Thanks for replying .There are total 6 files having 500MB , while reading those and storing into an array its getting out of memory . Is there any way to solve this .

--- How much memory does your computer have? How much is free when you start running your program? Which Operating System are you running on? --- Its a linux 64 bit .. but not sure how much is free while running the program . --- What is the result of the

free -h

command? ---

its 64 GB and 59 GB is free. ---

Then it should work.

Try loading one file and one it is loaded while the program is still running check how much memory do you have then. That will help you see if a single file can be loaded and how much it really uses. You did not say what format is your file and how do you store the data in memory? Can you paste a snippet of your code?