Disk space is cheap these days, but here's one way to use less of it: work directly with gzipped files. Here's a very straightforward example of Perl code that reads a gzipped file and writes a gzipped file.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $infile = 'test.txt.gz';
    # the three-argument open is the preferred way
    # '-|' reads from the command's standard output
    open(my $in, '-|', "gunzip -c $infile") || die "Could not open $infile: $!\n";

    my $outfile = 'test.out.gz';
    # '|-' writes to the command's standard input
    open(my $out, '|-', "gzip >$outfile") || die "Could not gzip $outfile: $!\n";

    while (<$in>) {
        chomp;
        print $out "$_\n";
    }

    close($in);
    close($out);
    exit(0);
    __END__
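If you'd rather not depend on external gunzip and gzip commands, the same copy can probably be done with the IO::Uncompress::Gunzip and IO::Compress::Gzip modules that ship with Perl. This is only a minimal sketch: the `copy_gz` subroutine is a name made up for illustration, and the file names are the ones from the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Gzip qw($GzipError);

# copy_gz is a hypothetical helper: it copies one gzipped file to
# another, line by line, without starting a shell or a child process.
sub copy_gz {
    my ($infile, $outfile) = @_;

    my $in = IO::Uncompress::Gunzip->new($infile)
        or die "Could not open $infile: $GunzipError\n";
    my $out = IO::Compress::Gzip->new($outfile)
        or die "Could not gzip $outfile: $GzipError\n";

    # Both objects behave like ordinary filehandles.
    while (my $line = <$in>) {
        print {$out} $line;
    }

    $in->close();
    $out->close() or die "Could not finish $outfile: $GzipError\n";
    return;
}

# Only run the copy when the example input file is actually present.
copy_gz('test.txt.gz', 'test.out.gz') if -e 'test.txt.gz';
```

The trade-off is that decompression now happens inside the Perl process instead of in a separate gunzip process, so timings may differ from the piped version.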
And here's some other code that just counts the number of lines in a file, either gzipped or uncompressed. Swap the commented-out pair of lines with the pair above it to read the gzipped version.
    #!/usr/bin/perl
    use strict;
    use warnings;

    # plain file
    my $infile = 'big_whoop';
    open(my $in, '<', $infile) || die "Could not open $infile: $!\n";

    # gzipped file
    #my $infile = 'big_whoop.gz';
    #open(my $in, '-|', "gunzip -c $infile") || die "Could not open $infile: $!\n";

    my $line_count = 0;
    while (<$in>) {
        chomp;
        ++$line_count;
    }
    close($in);

    print "$line_count\n";
    exit(0);
    __END__
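Rather than commenting lines in and out, the script could pick the open mode from the file name. Here's a rough sketch of that idea. The `open_maybe_gz` and `count_lines` subroutines are names invented for this example, and it assumes a gunzip command is on the PATH. It also uses the list form of the piped open, which skips the shell, so spaces or other odd characters in a file name are passed through untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a read handle for a plain or gzipped file,
# deciding by the .gz extension.
sub open_maybe_gz {
    my ($path) = @_;
    my $fh;
    if ($path =~ /\.gz\z/) {
        # List form: the arguments go straight to gunzip, no shell involved.
        open($fh, '-|', 'gunzip', '-c', $path)
            or die "Could not run gunzip on $path: $!\n";
    }
    else {
        open($fh, '<', $path)
            or die "Could not open $path: $!\n";
    }
    return $fh;
}

# Hypothetical helper: count the lines in a plain or gzipped file.
sub count_lines {
    my ($path) = @_;
    my $fh    = open_maybe_gz($path);
    my $count = 0;
    while (my $line = <$fh>) {
        ++$count;
    }
    # For a pipe, close() also reports a non-zero exit from gunzip.
    close($fh) or die "Error while closing $path (status $?)\n";
    return $count;
}

# Usage: perl count_lines.pl big_whoop big_whoop.gz
print count_lines($_), "\n" for @ARGV;
```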
Using time, here is the difference between reading the gzipped and the uncompressed file when counting ~7.3 million lines:
Gzipped result:
7320248
real 0m3.725s
user 0m5.313s
sys 0m0.844s
Not gzipped result:
7320248
real 0m2.481s
user 0m2.151s
sys 0m0.328s
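In the gzipped run the user time is larger than the real time, which suggests that gunzip and Perl were working in parallel as separate processes. To see how the work is split, something like the following sketch might help. It assumes a gunzip command on the PATH, `timed_gz_count` is a name made up for this example, and the file name is the one used above. It measures wall-clock time with Time::HiRes and reads CPU time from Perl's built-in `times`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: count lines through a gunzip pipe and report
# where the time went.
sub timed_gz_count {
    my ($path) = @_;
    my $start = time;

    open(my $in, '-|', 'gunzip', '-c', $path)
        or die "Could not run gunzip on $path: $!\n";
    my $lines = 0;
    while (my $line = <$in>) {
        ++$lines;
    }
    # close() waits for gunzip, so its CPU time is available afterwards.
    close($in) or die "gunzip failed on $path (status $?)\n";

    # times returns totals for this process and for its finished children.
    my ($user, $sys, $child_user, $child_sys) = times;
    return {
        lines      => $lines,
        real       => time - $start,
        perl_cpu   => $user + $sys,               # this Perl process so far
        gunzip_cpu => $child_user + $child_sys,   # waited-for children
    };
}

my $infile = 'big_whoop.gz';
if (-e $infile) {
    my $r = timed_gz_count($infile);
    printf "%d lines, real %.3fs, perl CPU %.3fs, gunzip CPU %.3fs\n",
        @{$r}{qw(lines real perl_cpu gunzip_cpu)};
}
```

If the two CPU figures add up to more than the real time, the decompression overlapped with the counting, which would explain why the gzipped run took only about 1.2 seconds longer on the clock.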
