A poor man’s Caesar cipher in Perl

Updated 2014 May 7th

If you're looking for a poor man's version of the Caesar cipher in Perl, then here it is. Why is it the poor man's version? First, I'm a PhD student and second, I wrote the code.

#!/usr/bin/perl

#A poor man's Caesar cipher in Perl
#written poorly by Dave Tang

use strict;
use warnings;

#usage
my $usage = "Usage: $0 <encrypt|decrypt> <number> <infile>\n";
my $type = shift or die $usage;
my $number = shift or die $usage;
my $infile = shift or die $usage;

#contains "a b c d e f g h i j k l m n o p q r s t u v w x y z"
my @first = map { chr } ord ( 'a') .. ord ( 'z' );
#contains "b c d e f g h i j k l m n o p q r s t u v w x y z a"
my @second = map { chr } ord ( 'b') .. ord ( 'z' );
push(@second,'a');

#contains "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
my @first_cap = map { chr } ord ( 'A') .. ord ( 'Z' );
#contains "B C D E F G H I J K L M N O P Q R S T U V W X Y Z A"
my @second_cap = map { chr } ord ( 'B') .. ord ( 'Z' );
push(@second_cap,'A');

#%key is a hash mapping the alphabet
#to a shifted alphabet
my %key = ();
if ($type =~ /^e/i){
   for(my $i=0;$i<scalar(@first);++$i){
      $key{$first[$i]} = $second[$i];
   }
   for(my $i=0;$i<scalar(@first_cap);++$i){
      $key{$first_cap[$i]} = $second_cap[$i];
   }
} elsif ($type =~ /^d/i){
   for(my $i=0;$i<scalar(@first);++$i){
      $key{$second[$i]} = $first[$i];
   }
   for(my $i=0;$i<scalar(@first_cap);++$i){
      $key{$second_cap[$i]} = $first_cap[$i];
   }
} else {
   die "Unrecognised type: please enter encrypt or decrypt\n";
}

#to check what's inside %key
#when using encrypt
#use Data::Dumper;
#print Dumper \%key;
#$VAR1 = {
#          'S' => 'T',
#          'T' => 'U',
#          'a' => 'b',
#          'N' => 'O',
#          'K' => 'L',
#          'd' => 'e',
#          'Y' => 'Z',
#          'E' => 'F',
#          'j' => 'k',

#read input file
my @infile = ();
open(IN,'<',$infile) || die "Could not open $infile: $!\n";
while(<IN>){
   chomp;
   push(@infile,$_);
}
close(IN);

#now here's why this code is poor
#I perform the substitution $number of times
#to achieve the shifted outcome
my @mod = ();
for (1 .. $number){
   @mod = ();
   foreach my $line (@infile){
      my $length = length($line);
      my $mod = '';
      for(my $i=0;$i<$length;++$i){
         my $original = substr($line,$i,1);
         #if the letter is not from the alphabet
         #leave it alone
         if ($original =~ /[^a-zA-Z]/){
            $mod .= $original;
            next;
         } else {
            my $replaced = $key{$original};
            $mod .= $replaced;
         }
      }
      push(@mod,$mod);
   }
   @infile = @mod;
}

foreach my $line (@mod){
   print "$line\n";
}

exit(0);

To test it, I've created a file called test.txt, which contains this text (without the quotations):

"The quick brown fox jumps over the lazy dog."

#running the code without input parameters
#shows the usage
caesar_cipher.pl 
Usage: ./caesar_cipher.pl <encrypt|decrypt> <number> <infile>

#encrypt
caesar_cipher.pl e 5 test.txt > test_encrypted.txt

cat test_encrypted.txt 
Ymj vznhp gwtbs ktc ozrux tajw ymj qfed itl.

#returns the same result
caesar_cipher.pl e 31 test.txt
Ymj vznhp gwtbs ktc ozrux tajw ymj qfed itl.

#decrypt
caesar_cipher.pl d 5 test_encrypted.txt 
The quick brown fox jumps over the lazy dog.

A better version

The original code is bad because if we run it like so:

caesar_cipher.pl e 500000000 test.txt

it would take a long time. Here's a better version, which shifts the alphabet instead of using a loop:

#!/usr/bin/perl

#A better version of a poor man's Caesar cipher in Perl
#written by Dave Tang

use strict;
use warnings;

my $usage = "Usage: $0 <encrypt|decrypt> <number> <infile>\n";
my $type = shift or die $usage;
my $number = shift or die $usage;
my $infile = shift or die $usage;

$number = $number % 26;

my @lc_alphabet = ('a'..'z');
my @uc_alphabet = ('A'..'Z');

my @first = map { chr } ord ( 'a' ) .. ord ( 'z' );
my @second = map { chr } ord ( $lc_alphabet[$number] ) .. ord ( 'z' );
push(@second, @lc_alphabet[0..$number-1]);

my @first_cap = map { chr } ord ( 'A' ) .. ord ( 'Z' );
my @second_cap = map { chr } ord ( $uc_alphabet[$number] ) .. ord ( 'Z' );
push(@second_cap, @uc_alphabet[0..$number-1]);

my %key = ();
if ($type =~ /^e/i){
   for(my $i=0;$i<scalar(@first);++$i){
      $key{$first[$i]} = $second[$i];
   }
   for(my $i=0;$i<scalar(@first_cap);++$i){
      $key{$first_cap[$i]} = $second_cap[$i];
   }
} elsif ($type =~ /^d/i){
   for(my $i=0;$i<scalar(@first);++$i){
      $key{$second[$i]} = $first[$i];
   }
   for(my $i=0;$i<scalar(@first_cap);++$i){
      $key{$second_cap[$i]} = $first_cap[$i];
   }
} else {
   die "Unrecognised type: please enter encrypt or decrypt\n";
}
my @infile = ();
open(IN,'<',$infile) || die "Could not open $infile: $!\n";
while(<IN>){
   chomp;
   push(@infile,$_);
}
close(IN);

my @mod = ();
foreach my $line (@infile){
   my $length = length($line);
   my $mod = '';
   for(my $i=0;$i<$length;++$i){
      my $original = substr($line,$i,1);
      if ($original =~ /[^a-zA-Z]/){
         $mod .= $original;
         next;
      } else {
         my $replaced = $key{$original};
         $mod .= $replaced;
      }
   }
   push(@mod,$mod);
}

foreach my $line (@mod){
   print "$line\n";
}

exit(0);

Now if we run:

caesar_cipher.pl e 500000000 test.txt 
Znk waoiq hxuct lud pasvy ubkx znk rgfe jum.

it will return the encrypted text instantaneously.

Decrypting

Recently there was this tweet:

Is it simply text encrypted by a Caesar cipher? I've saved the text inside a file called test.txt:

cat test.txt 
tpfccdlfdtte pcaccplircdt dklpcfrp?qeiq lhpqlipqeodf gpwafopwprti izxndkiqpkii krirrifcapnc dxkdciqcafmd vkfpcadf.

for i in {1..26}; do caesar_cipher.pl d $i test.txt; done
soebbckecssd obzbbokhqbcs cjkobeqo?pdhp kgopkhopdnce fovzenovoqsh hywmcjhpojhh jqhqqhebzomb cwjcbhpbzelc ujeobzce.
rndaabjdbrrc nayaanjgpabr bijnadpn?ocgo jfnojgnocmbd enuydmnunprg gxvlbigonigg ipgppgdaynla bvibagoaydkb tidnaybd.
qmczzaicaqqb mzxzzmifozaq ahimzcom?nbfn iemnifmnblac dmtxclmtmoqf fwukahfnmhff hofoofczxmkz auhazfnzxcja shcmzxac.
plbyyzhbzppa lywyylhenyzp zghlybnl?maem hdlmhelmakzb clswbklslnpe evtjzgemlgee gnennebywljy ztgzyemywbiz rgblywzb.
okaxxygayooz kxvxxkgdmxyo yfgkxamk?lzdl gcklgdklzjya bkrvajkrkmod dusiyfdlkfdd fmdmmdaxvkix ysfyxdlxvahy qfakxvya.
njzwwxfzxnny jwuwwjfclwxn xefjwzlj?kyck fbjkfcjkyixz ajquzijqjlnc ctrhxeckjecc elcllczwujhw xrexwckwuzgx pezjwuxz.
miyvvweywmmx ivtvviebkvwm wdeivyki?jxbj eaijebijxhwy ziptyhipikmb bsqgwdbjidbb dkbkkbyvtigv wqdwvbjvtyfw odyivtwy.
lhxuuvdxvllw husuuhdajuvl vcdhuxjh?iwai dzhidahiwgvx yhosxghohjla arpfvcaihcaa cjajjaxushfu vpcvuaiusxev ncxhusvx.
kgwttucwukkv gtrttgczituk ubcgtwig?hvzh cyghczghvfuw xgnrwfgngikz zqoeubzhgbzz biziizwtrget uobutzhtrwdu mbwgtruw.
jfvsstbvtjju fsqssfbyhstj tabfsvhf?guyg bxfgbyfguetv wfmqvefmfhjy ypndtaygfayy ahyhhyvsqfds tnatsygsqvct lavfsqtv.
ieurrsausiit erprreaxgrsi szaeruge?ftxf awefaxeftdsu velpudelegix xomcszxfezxx zgxggxurpecr smzsrxfrpubs kzuerpsu.
hdtqqrztrhhs dqoqqdzwfqrh ryzdqtfd?eswe zvdezwdescrt udkotcdkdfhw wnlbrywedyww yfwffwtqodbq rlyrqweqotar jytdqort.
gcsppqysqggr cpnppcyvepqg qxycpsec?drvd yucdyvcdrbqs tcjnsbcjcegv vmkaqxvdcxvv xeveevspncap qkxqpvdpnszq ixscpnqs.
fbroopxrpffq bomoobxudopf pwxbordb?cquc xtbcxubcqapr sbimrabibdfu uljzpwucbwuu wduddurombzo pjwpoucomryp hwrbompr.
eaqnnowqoeep anlnnawtcnoe ovwanqca?bptb wsabwtabpzoq rahlqzahacet tkiyovtbavtt vctcctqnlayn oivontbnlqxo gvqanloq.
dzpmmnvpnddo zmkmmzvsbmnd nuvzmpbz?aosa vrzavszaoynp qzgkpyzgzbds sjhxnusazuss ubsbbspmkzxm nhunmsamkpwn fupzmknp.
cyollmuomccn yljllyuralmc mtuyloay?znrz uqyzuryznxmo pyfjoxyfyacr rigwmtrzytrr taraaroljywl mgtmlrzljovm etoyljmo.
bxnkkltnlbbm xkikkxtqzklb lstxknzx?ymqy tpxytqxymwln oxeinwxexzbq qhfvlsqyxsqq szqzzqnkixvk lfslkqykinul dsnxkiln.
awmjjksmkaal wjhjjwspyjka krswjmyw?xlpx sowxspwxlvkm nwdhmvwdwyap pgeukrpxwrpp rypyypmjhwuj kerkjpxjhmtk crmwjhkm.
zvliijrljzzk vigiivroxijz jqrvilxv?wkow rnvwrovwkujl mvcgluvcvxzo ofdtjqowvqoo qxoxxoligvti jdqjiowiglsj bqlvigjl.
yukhhiqkiyyj uhfhhuqnwhiy ipquhkwu?vjnv qmuvqnuvjtik lubfktubuwyn necsipnvupnn pwnwwnkhfush icpihnvhfkri apkuhfik.
xtjgghpjhxxi tgeggtpmvghx hoptgjvt?uimu pltupmtuishj ktaejstatvxm mdbrhomutomm ovmvvmjgetrg hbohgmugejqh zojtgehj.
wsiffgoigwwh sfdffsolufgw gnosfius?thlt okstolsthrgi jszdirszsuwl lcaqgnltsnll nuluulifdsqf gangfltfdipg ynisfdgi.
vrheefnhfvvg receernktefv fmnrehtr?sgks njrsnkrsgqfh irychqryrtvk kbzpfmksrmkk mtkttkhecrpe fzmfeksechof xmhrecfh.
uqgddemgeuuf qdbddqmjsdeu elmqdgsq?rfjr miqrmjqrfpeg hqxbgpqxqsuj jayoeljrqljj lsjssjgdbqod eyledjrdbgne wlgqdbeg.
tpfccdlfdtte pcaccplircdt dklpcfrp?qeiq lhpqlipqeodf gpwafopwprti izxndkiqpkii krirrifcapnc dxkdciqcafmd vkfpcadf.

OK, so it wasn't simply using a shifted key. If we are still convinced that the text is encrypted by a substitution cipher, we can build up random keys and check the outcome against a word list (on Linux there are files with a bunch of words in them; on Ubuntu it's the /etc/dictionaries-common/words file and on RHEL, it's the /usr/share/dict/words file).

#!/usr/bin/perl

use strict;
use warnings;

my $usage = "Usage: $0 <infile>\n";
my $infile = shift or die $usage;

my %dict = read_dict();
my @infile = read_infile($infile);

my $score = 0;
my $threshold = 3;
my $word_size = 4;
my $iteration = 1;

while($score <= $threshold){
   $score = 0;

   my %key = make_key();

   my @mod = ();
   foreach my $line (@infile){
      $line = lc($line);
      my $length = length($line);
      my $mod = '';
      for(my $i=0; $i<$length; ++$i){
         my $original = substr($line,$i,1);
         if ($original =~ /[^a-zA-Z]/){
            $mod .= $original;
            next;
         } else {
            my $replaced = $key{$original};
            $mod .= $replaced;
         }
      }
      push(@mod,$mod);
   }

   foreach my $line (@mod){
      my @word = ();
      $line =~ s/\s+//g;
      for (my $i=0; $i<=length($line)-$word_size; $i++){
         my $pword = substr($line, $i, $word_size);
         if (exists $dict{$pword}){
            push(@word, $pword);
            ++$score;
         }
      }
      print "$iteration\t$score\n";
      ++$iteration;
      if ($score > $threshold){
         use Data::Dumper;
         print "@word\n$line\n";
         print Dumper \%key;
      }
   }
}

exit(0);

#adopted from the Perl cookbook
sub shuffle {
   my $array = shift;
   my @array = @$array;
   for (my $i = scalar(@array)-1; $i>0; --$i) {
      my $j = int rand ($i);
      next if $i == $j;
      @array[$i,$j] = @array[$j,$i];
   }
   return(@array);
}

sub make_key {
   my @lc_alphabet = ('a'..'z');
   my @uc_alphabet = ('A'..'Z');
   my @number = (0..25);
   my @number_shuf = shuffle(\@number);
   my %key = ();
   for(my $i=0; $i<scalar(@number_shuf); ++$i){
      $key{$lc_alphabet[$number_shuf[$i]]} = $lc_alphabet[$number[$i]];
   }
   for(my $i=0; $i<scalar(@number_shuf); ++$i){
      $key{$uc_alphabet[$number_shuf[$i]]} = $uc_alphabet[$number[$i]];
   }
   return(%key);
}

sub read_infile {
   my ($infile) = @_;
   my @infile = ();
   open(IN, '<', $infile) || die "Could not open $infile: $!\n";
   while(<IN>){
      chomp;
      push(@infile,$_);
   }
   close(IN);
   return(@infile);
}

sub read_dict {
   #my $dict = '/etc/dictionaries-common/words';
   #my $dict = '/usr/share/dict/words';
   my $dict = 'my_dict.txt';
   my %dict = ();
   open(IN, '<', $dict) || die "Could not open $dict: $!\n";
   while(<IN>){
      chomp;
      my $word = $_;
      $word = lc($word);
      $dict{$word} = 1;
   }
   close(IN);
   return(%dict);
}

__END__

I created a text file, called first.txt, that contains the first part of the tweet before the question mark:

cat first.txt 
tpfccdlfdtte pcaccplircdt dklpcfrp

#run script
time check_cipher.pl first.txt

How many combinations?

Let's use R to determine how many key combinations:

#install if necessary
install.packages('gplots')
library(gtools)

#stick with 9 or smaller numbers
#or risk hanging your machine
for (my_number in 3:9){
  my_letter <- letters[1:my_number]
  my_perm <- permutations(n=my_number,r=my_number,v=my_letter)
  
  for (i in 1:my_number){
    my_perm <- my_perm[my_perm[,i]!=my_letter[i],]
  }
  print(paste(my_number, nrow(my_perm)))
}

[1] "3 2"
[1] "4 9"
[1] "5 44"
[1] "6 265"
[1] "7 1854"
[1] "8 14833"
[1] "9 133496"

This number series is a derangement:

In combinatorial mathematics, a derangement is a permutation of the elements of a set such that none of the elements appear in their original position.

The formula for calculating the number of derangements:

$$!n! \sum^{n}_{k=0} \frac{(-1)^k}{k!}$$

I implemented the derangement formula in R:

derangement <- function(n){
  (factorial(n)) * (sum((-1)^(0:n)/factorial(0:n)))
}

for (i in 1:10){
  print(derangement(i))
}

[1] 0
[1] 1
[1] 2
[1] 9
[1] 44
[1] 265
[1] 1854
[1] 14833
[1] 133496
[1] 1334961

derangement(26)
[1] 1.483626e+26

So blindly brute forcing my way to solving the tweet is not very feasible.

Conclusions

I wrote this post a while ago without much thought. Upon revisiting this post, I improved the code and thought about decrypting text that has been encrypted with a substitution cipher. I had no previous knowledge of the permutation known as derangement, as I found this out by thinking about how many keys would be possible using some grassroots coding with R and by googling the number series. I came to the conclusion that without smarter code, I won't be able to decrypt text just by randomly trying different substitution keys.

See also

quipqiup is a fast and automated cryptogram solver (although it couldn't solve "Znk waoiq hxuct lud pasvy ubkx znk rgfe jum.", it solved the tweet).




Creative Commons License
This work is licensed under a Creative Commons
Attribution 4.0 International License
.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.