Saving disk space with Perl

Disk space is cheap these days, but one way to use less of it is to work directly with gzipped files instead of decompressing them first. Here's a very straightforward example of Perl code that reads a gzipped file and writes a gzipped file.


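#!/usr/bin/perl

use strict;
use warnings;

my $infile = 'test.txt.gz';
# the three argument open is the preferred way; the list form runs
# gunzip directly, so the file name never passes through a shell
open(my $in,'-|','gunzip','-c',$infile) || die "Could not run gunzip on $infile: $!\n";

my $outfile = 'test.out.gz';
# gzip still goes through the shell here because of the > redirect
open(my $out,'|-',"gzip > $outfile") || die "Could not run gzip for $outfile: $!\n";

while(<$in>){
   chomp;
   # any per-line processing would go here
   print $out "$_\n";
}
# with a piped open, a failure inside gunzip or gzip (for example a
# missing input file) is only reported when the handle is closed
close($in) || die "gunzip failed on $infile\n";
close($out) || die "gzip failed on $outfile\n";

exit(0);

__END__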
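
The same copy can also be done without spawning gunzip and gzip at all. The following is a minimal sketch, assuming the IO::Compress::Gzip and IO::Uncompress::Gunzip modules are available (they ship with recent Perl releases). The copy_gz function name is made up for this illustration, and I have not timed this variant against the piped version.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use IO::Compress::Gzip qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Copy a gzipped file to a new gzipped file, line by line, inside the
# Perl process. Returns the number of lines copied.
# (copy_gz is a hypothetical helper name, not part of either module.)
sub copy_gz {
   my ($infile, $outfile) = @_;

   # MultiStream => 1 reads concatenated gzip streams as one file,
   # which is how gunzip -c treats them
   my $in = IO::Uncompress::Gunzip->new($infile, MultiStream => 1)
      or die "Could not open $infile: $GunzipError\n";
   my $out = IO::Compress::Gzip->new($outfile)
      or die "Could not open $outfile: $GzipError\n";

   my $lines = 0;
   while (my $line = <$in>){
      # any per-line processing would go here
      print {$out} $line;
      ++$lines;
   }
   close($in);
   # closing the output handle writes the gzip trailer
   close($out) or die "Could not finish writing $outfile: $GzipError\n";

   return $lines;
}

# run as: perl copy_gz.pl test.txt.gz test.out.gz
copy_gz(@ARGV) if @ARGV == 2;
```

Because both handles behave like ordinary filehandles, the loop body is the same as in the piped version; only the two open calls change.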

And here's some other code that just counts the number of lines in a file. As written it reads the uncompressed file; the commented-out lines read the gzipped copy through gunzip instead.

#!/usr/bin/perl

use strict;
use warnings;

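my $infile = 'big_whoop';
open(my $in,'<',$infile) || die "Could not open $infile: $!\n";

# to count the gzipped copy instead, swap in these two lines
#my $infile = 'big_whoop.gz';
#open(my $in,'-|','gunzip','-c',$infile) || die "Could not run gunzip on $infile: $!\n";

# start from a numeric zero rather than the string '0'
my $line_count = 0;
while(<$in>){
   chomp;
   ++$line_count;
}
close($in) || die "Error while reading $infile\n";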

print "$line_count\n";

exit(0);

__END__
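
Rather than commenting lines in and out, the choice between the two opens can be made from the file name. This is only a sketch under one assumption: a name ending in .gz is trusted to be gzipped, and the file contents are not inspected. The open_maybe_gz and count_lines names are made up for this example.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Return a read handle for $file, piping it through gunzip when the
# name ends in .gz (assumption: the suffix is trusted).
sub open_maybe_gz {
   my ($file) = @_;
   my $fh;
   if ($file =~ /\.gz\z/){
      # list form: $file goes to gunzip directly, not through a shell
      open($fh, '-|', 'gunzip', '-c', $file)
         or die "Could not run gunzip on $file: $!\n";
   }
   else{
      open($fh, '<', $file) or die "Could not open $file: $!\n";
   }
   return $fh;
}

# Count the lines in a plain or gzipped file.
sub count_lines {
   my ($file) = @_;
   my $fh = open_maybe_gz($file);
   my $line_count = 0;
   while (my $line = <$fh>){
      ++$line_count;
   }
   # for a pipe, close() returns false if gunzip exited with an error
   close($fh) or die "Error while reading $file\n";
   return $line_count;
}

# run as: perl count_lines.pl big_whoop big_whoop.gz
for my $file (@ARGV){
   print count_lines($file), "\n";
}
```

With both files given on the command line, the two counts should match, which is a quick check that the gzipped copy is intact.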

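Running both versions under time on a file of ~7.3 million lines shows the cost of decompressing on the fly: the gzipped run took about 1.2 seconds (roughly 50%) longer in wall-clock time. Its user time is higher than its real time, most likely because gunzip runs as a separate process alongside Perl.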

Gzipped result:

7320248

real 0m3.725s
user 0m5.313s
sys 0m0.844s

Not gzipped:

7320248

real 0m2.481s
user 0m2.151s
sys 0m0.328s




This work is licensed under a Creative Commons Attribution 4.0 International License.