devig.pl - Attempts to break 'le chiffre indechiffrable'.
devig.pl [ --help ] [ --file <filename> ] [ --charts ] [ --html ]
devig.pl, given input of plaintext encoded in some fashion of the vigenere cipher, attempts to 'best-guess' the key length, the key, and the decrypted text.
devig.pl scans the input text for repeating strings, indicating that (with luck) the same text has been encoded the same way (with the same letters of the key) at least a few times. Once devig.pl finds repeating sequences, it calculates the greatest common factor of the numbers (of all sequences) and assumes that number to be the key length.
Having calculated the key length (N), it takes every Nth character of the ciphertext, plots a letter distribution graph, and compares that to the standard letter distribution frequency chart for similarity. devig.pl rotates through graph permutations until the graph for the given iteration matches as closely as possible, and assumes that to be the index key letter for a given offset.
--help--file filename--charts--html
perl devig.pl [ options ] --html > outputfile.html
The Vigenere cipher, known by some as 'le chiffre indechiffrable' (French for 'the indecipherable cipher') is a method of encryption that uses a series of different Caesar ciphers based on the letters of a keyword, though there is some argument among cryptographic circles which incarnation of this particular polyalphabetic substitution can accurately be attributed to Blaise de Vigenere. The implementation that devig.pl attempts to break is the simpler form of a polyalphabetic substitution. To encipher, a table of alphabets (termed a tabula recta, or Vigenere table) is used. It consists of the alphabet written out 26 times in different rows, each alphabet shifted cyclicly to the left compared to the previous alphabet. Each letter of the plaintext is encrypted using the row from the Vigenere table corresponding to the next letter of the keyword.
Kent Cowgill - <http://www.kentcowgill.org/>
#!/usr/bin/perl use strict; use warnings;
use Getopt::Long; our $verbose = 0; our $charts = 0; our $file = '-'; our $html = 0; my %strseq; my $help = 0; my $result = GetOptions ( "help" => \$help, "html" => \$html, "verbose" => \$verbose, "charts" => \$charts, "file:s" => \$file,); if( $help ){ usage() && exit } my $fh; my $mykey; my $ciphertext; my $disttotal; my $chartincrementor = 0; # read in the ciphertext open( $fh, , $file ) or die "can't open file $file for deciphering - $!\n"; while( <$fh> ){ $ciphertext .= lc $_ } close( $fh ); $html && print html_header(); # build_charset creates a hash of string sequences that repeat for a given # length of string, in this case i'm interested in substrings from 3 to # 6 characters long. for my $length( 3 .. 6 ){ build_charset( $length, $ciphertext, \%strseq ) } # %disttable is a hash of a moderately sized standard letter frequency # distribution for english text. (out of roughly 1000 characters # of english text, 82 will be an 'a', 15 will be a 'b', etc.) my %disttable = qw( a 82 b 15 c 28 d 43 e 127 f 22 g 20 h 61 i 70 j 2 k 8 l 40 m 24 n 67 o 75 p 19 q 1 r 60 s 63 t 91 u 28 v 10 w 24 x 2 y 20 z 1 ); # generate a few key variables, first of which is a likely key length (N) for # a given ciphertext, based on finding the greatest common factor for the # intervals between repeating substrings, followed by the length of the # ciphertext, and how to transform the 1003 characters of the standard letter # frequency distribution into appropriate values for a text the size of # ciphertext. (each Nth character the ciphertext has been encoded by the # same key letter, and should have a letter frequency distribution similar # to the standard letter frequency distribution, but shifted X places.) my $keylen = get_likely_keylen( \%strseq ); my $totlen = length( $ciphertext ); my $len = int( $totlen / $keylen ); $disttotal += $disttable{$_} for keys %disttable; my $len_multi = $disttotal / $len / 100; # generate some nice display text. print boldline( "Cipher text for analysis:\n" ); print pre( $ciphertext ); print boldline( "\nLikely Key length: $keylen\n" ); print boldline( "Total length of ciphertext: $totlen\n" ); print boldline( "Total length of stand. freq. dist.: $disttotal\n" ); print boldline( "Length multiplier for graphing: $len_multi\n\n" ); print boldline( "Standard frequency distribution graph based " ."on $len (every ${keylen}'th) characters.\n\n" ); # show a graph of the standard letter frequency distribution for text the # length of every Nth character concatenated together. print disp_graph( build_graph( $len_multi, \%disttable ) ), "\n"; print $html ? "<hr>" : '='x52,"\n"; # this loop goes through every position N of the key length and finds the most # likely key letter based on rotating through permutations of a simple # representation of the letter frequency distribution of every Nth character # of the ciphertext. for my $offset( 0 .. ( $keylen - 1 ) ){ my $n = nth_char( $offset, $keylen ); my $len = length( $n ); print boldline( "Every ${keylen}'th char (offset: $offset): " ); print line( $n ); my %nth_dist = fill_dist( $n ); my @graph2 = build_graph( 1, \%nth_dist ); $mykey .= find_key( \@graph2, $len_multi, \%disttable ); if( $charts ){ print disp_graph( @graph2 ); print $html ? "<hr>" : '='x52,"\n" } } print boldline( "Likely key for cipher: $mykey\n" ); print $html ? "<hr>" : '='x52,"\n"; print boldline( "Likely plaintext:\n" ); # since this 'breaks' the encryption of text encrypted with possibly the # obfuscated implementation of the encryption algorithm, for ease in reversing # the encryption, i stole a few lines from the obfuscated implementation. # this just creates the tabula recta in the form of 26 arrays, @a = ( a, b, # c, etc. ); @b = ( b, c, d, etc. ); etc. but it doesn't run under strict :) no strict; @_ = ( 'a' .. 'z' ); map { @$_ = @_; if( $;++ ){ for $"( 2 .. $; ){ push @$_, shift@$_ } } } @_; use strict; # this just decrypts the ciphertext based on the likely key computed above. my $c = 0; my $subout; for( (split //, $ciphertext) ){ my $l = ( split //, $mykey )[$c]; no strict; my $s = join '', @$l; use strict; my $p = index( $s, $_ ); $subout .= $p >= 0 ? $main::a[$p] : $_; $c += $c < $keylen-1 ? 1 : -$keylen+1; } print pre( "$subout\n" ); $html && print html_footer(); exit; # when i was trying to come up with an obfuscated decryption program, i # posted a query to perlmonks.org to see how to accomplish what i was after # with a regex. japhy gave me closest to what i was looking for, # $string =~ m{(?{[pos]})([a-z]{3})(?:.*?(?{[@{$^R},pos]})\1)+(?{@loc=@{$^R}})}; # but TedPride suggested a much better alternative saving the pain of finding # multiple repeating strings sequences, which of course is what i wanted to do. sub build_charset { my( $length, $ciphertext, $strseq ) = @_; $$strseq{ substr( $ciphertext, $_, $length ) }++ for 0 .. ( length( $ciphertext ) - $length ) } # this handles finding the key letter for a given offset, and computing # likeliness percentages (for display purposes only). it cycles through # the alphabet, rotating the computed graph and figuring out which # rotation most closely matches the computed standard letter frequency # distribution graph for english text of a similar length. sub find_key{ my( $graph2find, $len_multi, $disttable ) = @_; my @comparator; for( 0 .. 25 ){ my $overlay_graph = overlay_graph( $graph2find, $len_multi, $disttable ); push @comparator, compare_graph( $overlay_graph, $$graph2find[0][0] ); swap_graph( $graph2find ) } my $accumulator; my @keys = map { my $percent = $accumulator > 0 ? int($_->[0]/$accumulator*100) : 0; $_->[1] . " ($percent\%)" } map { if( $_->[0] > 0 ){ $accumulator += $_->[0] } [ $_->[0], $_->[1] ] } sort { $b->[0] <=> $a->[0] } @comparator; print boldline( "Possible offset key in order of decreasing probability: " ); print line( join( ", ", @keys[0..5] ) . "\n\n" ); my $ret = $keys[0]; $ret =~ s/([a-z])\s\(.*/$1/; return $ret } # inelegantly (or if you're kind, elegantly ;) overlays one graph on another, # marking common graph elements with a '+', missing graph elements with a '-'. sub overlay_graph { my( $gr2, $len_multi, $disttable ) = @_; my @graph = build_graph( $len_multi, $disttable ); my $gr1 = \@graph; my $g1c = 0; for my $gr( @$gr2 ){ my $g2c=0; for my $gra( @$gr ){ my $graph1_char = $$gr1[$g1c][$g2c] || ''; my $graph2_char = $$gr2[$g1c][$g2c] || ''; if( $graph2_char eq '#' && $graph1_char eq '#' ){ $$gr1[$g1c][$g2c] = '+' } elsif( $graph2_char eq '#' && $graph1_char ne '#' ){ $$gr1[$g1c][$g2c] = '-' } $g2c++ } $g1c++ } return $gr1 } # succintly rotates the graph. sub swap_graph { my $graphref = shift; push @$graphref, shift @$graphref } # inelegantly (or if you're kind, elegantly ;) finds how highly similar # two graphs are when overlaid on top of each other. sub compare_graph { my( $graph, $which ) = @_; my $count = 0; for( @$graph ){ for( @$_ ){ /\+/ && $count++; /\-/ && $count-- } } return [ $count, $which ] } # creates the multi dimensional array to build the graph for later # comparison purposes sub build_graph { my( $len_multi, $letters ) = @_; my @graph; for( sort keys %$letters ){ push @graph, [ $_, '=', ( split //, '#' x int( $$letters{$_} * $len_multi ) ) ]; } return @graph } # returns either a plain text representation of a graph, or an # HTML representation. sub disp_graph { return $html ? html_graph( @_ ) : text_graph( @_ ); } # see comments for the text graph. the HTML/CSS chicanery here hides # the algorithm. sub html_graph { $chartincrementor++; my @graph = @_; my $offset = 15; my $incr = 15; my( $letter, $height, $ret ); my $increment = 0; ($ret=<<" EOR") =~ s/^ //g; <style type="text/css"> #chart$chartincrementor { display: block; position: relative; width:400; height: 230px; border: 1px solid gray; padding: 0; spacing: 0; } #chart$chartincrementor tr, #chart$chartincrementor td { position: absolute; bottom: 0; width: 15px; z-index:2; margin: 0; padding: 0; text-align: center; vertical-align: bottom; font: 11px Tahoma, Verdana, Arial, sans-serif } #chart$chartincrementor .letter { width: 15px; border: 1px solid; background-color: #999 } #chart$chartincrementor .letter p { margin: 0; padding: 0;} EOR for( @graph ){ $letter = shift @$_; $height = $offset * $#{$_} + 15; $ret .= "#chart$chartincrementor #$letter { left: ${increment}px; " ."height: ${height}px;}\n"; $increment += $incr; } $ret .= "</style><table id=\"chart$chartincrementor\" width=\"400\">" ."<tbody><tr>\n"; for( 'a' .. 'z' ){ $ret .= "<td class=\"letter\" id=\"$_\"><p>$_</p></td>\n" } $ret .= "</tr></tbody></table>\n"; return $ret } # very simply iterates backwards over the arrays starting from element # $limit, spitting out the apropriate text. sub text_graph { my @graph = @_; my $limit = 15; my $ret; for( 1 .. ( $limit + 1 ) ){ for my $letter( 0 .. 25 ){ my $c = $graph[$letter][$limit] || ' '; $ret .= "$c " } $limit--; $ret .= "\n" } return $ret } # returns a concatenation of every Nth character of the ciphertext. sub nth_char { my( $offset, $keylen ) = @_; my $count = 0; my $ret; for( split //, $ciphertext ){ $ret .= $_ unless( ( $count + ( $keylen - $offset ) ) % $keylen ); $count++ } $ret =~ s/[^a-z]//g; return $ret .= "\n" } # some letters never occur in a given string. since they don't occur, # this fills in anything in the distribution hash that might otherwise be # empty, causing the graph to look funny for missing letters. this appears # to be a hold out for when i was going to have the final version be an # obfuscation, as i can't tell what gets passed to it by looking at it... sub fill_dist { my %distribution; $distribution{$_}++ for grep /[a-z]/, split//, shift; for( 'a' .. 'z' ){ if( ! $distribution{$_} ){ $distribution{$_} = '0' } } return %distribution } # this goes through the repeated string sequences hash, discards # sequences repeated fewer than 3 times, discards sequences that # don't contain at least two letter characters, finds their location, # and computes the greatest common factor for those sequence positions. # this also contains code from TedPride's suggestion on perlmonks.org. sub get_likely_keylen { my $strseq = shift; my @grt_fct; print boldline( " Repeating string locations:\n" ); for( sort { $$strseq{$b} <=> $$strseq{$a} || $a cmp $b } keys %$strseq ){ last if $$strseq{$_} < 3; next unless $_ =~ y/a-z// > 2; my @factors; push @factors, ( pos( $ciphertext ) - 3 ) while $ciphertext =~ /$_/g; print boldline( " $_: " ); print line( join ', ', @factors ); my $gcf = mgcf( normalize( @factors ) ); print line( " (normalized: ". join( ", ", normalize( @factors ) ). ")" ); push @grt_fct, $gcf unless $gcf < 3; print line( "\n Greatest Common Factor: $gcf\n" ); } return mgcf( @grt_fct ) } # given an array ( 56, 156, 606 ) takes the first element off and # subtracts its value from every other element ( 100, 550 ). this # makes determining the greatest common factor a bit easier. sub normalize { my $o = shift; return map { $_ - $o } @_ } # returns the greatest common factor of 2 numbers. sub gcf and sub mgcf # borrowed from code posted to perlmonks.org by MrNobo1024 sub gcf { my( $x, $y ) = @_; ( $x, $y ) = ( $y, $x % $y ) while $y; return $x } # returns the greatest common factor for 3 or more numbers. sub mgcf { my $x = shift; $x = gcf( $x, shift ) while @_; return $x } # processes text either as html or not at all. sub boldline { return $html ? "<b>" . line( shift ) . "</b>" : shift } # processes text either as html or not at all. sub pre { return $html ? "<pre>\n" . shift() . "\n</pre>\n" : shift } # processes text either as html or not at all. sub line { my $val = shift; $html && $val =~ s/\n/<br>/g; return $val } # processes text either as html or not at all. sub output { my $val = shift; if( $html ){ $val =~ s/(\n)/<br>$1/g; $val = "<p>$val</p>\n"; } return $val } # self explanatory :) sub html_header { ( my $html=<<" EOH" ) =~ s/^ //g; <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html><head><title>cryptanalysis :: vigenére cipher</title> <style type="text/css"> body { background: #888; font-size: 10pt; color: #222; font-family: tahoma, verdana, helvetica, sans-serif } #main { border: solid 1px #222; padding: 25px; background: #fff } p { font-family: tahoma, verdana, helvetica, sans-serif; font-size: 10pt; color: #222 } </style> </head> <body bgcolor="#ffffff"> <div id="main"> EOH return $html } # self explanatory :) sub html_footer { return "</div></body></html>\n"; } sub usage { print <<" EOU"; devig.pl v.1 author: kent cowgill usage: devig.pl [ --help ] [ --file <file> ] [ --charts ] [ --html ] --help displays this text and exits. --file accepts a filename as input. otherwise reads STDIN (as in cat ciphertext | perl devig.pl [options]). --charts display all letter distribution charts for each set of Nth characters for frequency analysis. --html display output as HTML. capture output to a file with: perl devig.pl --html [options] > output.html EOU return 1; }