Fast Fourier Transform in Haskell

The Discrete Fourier Transform (DFT) converts a finite list of equally-spaced samples of a function into the list of coefficients of a finite combination of complex sinusoids, ordered by their frequencies, that has those same sample values.

The DFT is used to perform Fourier analysis in different applications like digital signal processing, polynomial multiplication, partial differential equations solving, large integers multiplication etc. It has also been used in statistical tests of random and pseudorandom number generators for cryptographic applications (see NIST Special Publication 800-22).

The Haskell function dft (shown below) evaluates the DFT for any number of points. Since the vectors [e^{2{\pi}kn/N} | n = 0,1..N-1] constitute an orthogonal basis over the set of N-dimensional complex vectors, the inverse DFT can be calculated by the same algorithm using the complex conjugate vectors (see function idft). In order to make the forward and the inverse transforms unitary, a normalization factor equal to \sqrt{1/N} is used in both functions.

-- Author: Joao H de A Franco (
-- Description: DFT implementation in Haskell
-- Date: 2013-Mar-28
-- License: Attribution-NonCommercial-ShareAlike 3.0 Unported
--          (CC BY-NC-SA 3.0)
module Aux.DFT (dft,idft) where
import Data.Complex (Complex((:+)))

dft, idft :: [Complex Float] -> [Complex Float]
dft = dft' 1
idft = dft' (-1)
dft' :: Float -> [Complex Float] -> [Complex Float]
dft' e x = let t = length x
               y = 1 / fromIntegral t
               dft'' k = let w = fromIntegral k * 2 * e * pi * y
                         in (sqrt y :+ 0) *
                             sum (zipWith (\n z -> z * exp (0 :+ n * w)) [0..] x)                         
           in map dft'' [0..t-1]

The DFT can be seen as a linear transformation where an real-valued N-tuple is multiplied by an N x N matrix returning a complex-valued N-tuple. This matrix operation takes N² scalar multiplications, which makes it very expensive to deal with large data sets.

Fast Fourier Transform is the generic name given to algorithms that compute the DFT (or its inverse) faster than O(N²). The most commonly used FFT algorithm is due to Cooley & Tukey (1965), a “divide and conquer” procedure that recursively breaks down a DFT of any composite size N = N1 * N2 into many smaller DFTs of sizes N1 and N2. The Cooley-Tukey and other FFTs schemes reduce the algorithm complexity from O(N²) to O(N log N).

The advent of FFT algorithms has revolutionized the use of the Discrete Fourier Transform, making it possible to use it on a variety of areas: applied mechanics, biomedical engineering, communications, electromagnetics, finance, instrumentation, numerical methods, radar, signal processing, sonics and acoustics etc.

It’s interesting to note, however, that the general idea behind the FFT algorithms was invented by Carl Friedrich Gauss (“The Prince of Mathematicians”) around 1805, when he was calculating the orbit of an asteroid. Amazing!

The Cooley–Tukey algorithm is usually employed to recursively divide a N-point transform into two N/2-point transforms and for this reason it is limited to cases where N is a power of two. The Haskell code below implements this technique, called decimation-in-time radix-2 (fft implements the forward transform and ifft the inverse one).

-- Author: Joao H de A Franco (
-- Description: FFT implementation in Haskell
-- Date: 2013-Mar-28
-- License: Attribution-NonCommercial-ShareAlike 3.0 Unported
--          (CC BY-NC-SA 3.0)
module Aux.FFT (fft,ifft) where
import Data.Complex(Complex((:+)))

fft, ifft :: [Complex Float] -> [Complex Float]
fft = fft' 1
ifft = fft' (-1)

fft' :: Float -> [Complex Float] -> [Complex Float]
fft' e xs = if pow2 (length xs) 
            then let z = sqrt (1 / fromIntegral (length xs))
                  in map ((z :+ 0) *) $ fft'' e xs
            else error "Number of points is not a power of 2"
            where pow2 n
                     | n == 1 || n == 2 = True
                     | otherwise = n `mod` 2 == 0 && pow2 (n `div` 2)

fft'' :: Float -> [Complex Float] -> [Complex Float]
fft'' _ [] = []
fft'' _ [x] = [x]
fft'' e xs = fft'' e (evens xs) <+> t (fft'' e (odds xs))
  where (<+>) r s = zipWith (+) (r ++ r) (s ++ map negate s)
        evens [] = []
        evens [u] = [u]
        evens (v:_:vs) = v:evens vs
        odds = evens . drop 1
        n = 2 * pi / fromIntegral (length xs)
        t = zipWith (\k z -> z * exp (0 :+ k * n * e)) ([0..] :: [Float])

Besides classic references on this subject such as Brigham’s and Bracewell’s books, I recommend the excellent lectures on The Fourier Transforms and Its Applications (EE261) by Prof. Brad Osgood from Stanford University. As a matter of fact, I’ve watched only lectures 20 to 22, as they cover DFT and the basics of FFT algorithms, but most probably the remaining lectures should be equally good. Extensive course materials (syllabus, handouts, past exams, exercises & solutions) are available.

The box below shows some simple tests to check the correctness of the above implementations.

-- Author: Joao H de A Franco (
-- Description: Test module for FFT implementation in Haskell
-- Date: 2013-Mar-28
-- License: Attribution-NonCommercial-ShareAlike 3.0 Unported
--          (CC BY-NC-SA 3.0)
module Main where
import Data.Complex (Complex((:+)),realPart,imagPart)
import Aux.DFT (dft,idft)
import Aux.FFT (fft,ifft)

main :: IO ()
main = print (if test1 && test2 && test3 then "ok" else "nok")

-- 7-point DFT (prime number) compared to Mathematica's evaluation
test1 :: Bool
test1 = let mathematica1 = [3.40168 :+ 0, (-0.566947) :+ 0.564962,
                            (-0.566947) :+ (-1.41899), (-0.566947) :+ 0.645967,
                            (-0.566947) :+ (-0.645967), (-0.566947) :+ 1.41899,
                            (-0.566947) :+ (-0.564962)]
        in map (roundComplex 3 0.001) (dft [0,1,2,3,0,1,2]) == 
           map (roundComplex 3 0.001) mathematica1

-- 8-point FFT (power of 2) compared to Mathematica's evaluation
test2 :: Bool
test2 = let mathematica2 = [4.24264 :+ 0, 0 :+ 0,
                            (-1.41421) :+ (-1.41421), 0 :+ 0, 
                            (-1.41421) :+ 0, 0 :+ 0,
                            (-1.41421) :+ 1.41421, 0 :+ 0]
        in map (roundComplex 3 0.001) (fft [0,1,2,3,0,1,2,3]) == 
           map (roundComplex 3 0.001) mathematica2

-- 512-point DFT compared to FFT
test3 :: Bool
test3 = let list = take 512 $ cycle ([0,1,2,3] :: [Complex Float])
            z1 = map (roundComplex 2 0.001) $ dft list
            z2 = map (roundComplex 2 0.001) $ fft list
         in z1 == z2

-- 128-point time-shifted impulse forward & inverse DFT (auto-test)
test4 :: Bool
test4 = all (==True) t4
        where t4 = do
              n <- [0..127]
              let x = impulse n 128 :: [Complex Float]
              return $ map (roundComplex 3 0.001) (idft (dft x)) == x

-- 1024-point time-shifted impulse forward & inverse FFT (auto-test)         
test5 :: Bool
test5 = all (==True) t5
        where t4 = do
              n <- [0..1023]
              let x = impulse n 1024 :: [Complex Float]
              return $ map (roundComplex 3 0.001) (ifft (fft x)) == x

-- Discrete Dirac delta generator
impulse :: Num a => Int -> Int -> [a]
impulse n m = replicate n 0 ++ 1:replicate (m - n - 1) 0

-- Rounding function
roundComplex :: Int -> Float -> Complex Float -> Complex Float
roundComplex n e x = let zeroFloat e' f = if abs f < e' then 0 else f
                         trim = roundFloat n . zeroFloat e
                         roundFloat m y = fromIntegral (round $ y * 10 ^ m :: Int) / 10 ^^ m
                     in trim (realPart x) :+ trim (imagPart x)

You can run the above FFT script to evaluate the forward and inverse FFT for a small data set using the facilities provided by, an online compiler/interpreter for Haskell and other languages.

A quick benchmark shows the speed-up achieved by the radix-2 FFT over the classic DFT: 18 ms vs. 2.5 s for 4,096 points (two orders of magnitude!). FFTW, a C subroutine library for computing the DFT in one or more dimensions, is much faster, as it takes only 0.7 ms for processing 4,096 points on the same Linux system.

RC4 coded in Haskell

After using Python in some small projects, my attention turned to a much more attractive language, Haskell. Haskell is a functional programming language with non-strict (“lazy”) semantics and strong static typing. It is relatively new (1990) and surprisingly well-defined, despite being created by a committee. The most popular Haskell compiler & interpreter, GHC (the Glasgow Haskell Compiler), is open-source software, available for Windows, Mac and Linux platforms.

The language is named after Haskell Brooks Curry, whose work in mathematical logic serves as a foundation for functional languages. Haskell and other functional languages are based on the lambda calculus which, according to the Church–Turing thesis, is as powerful as the Turing machine (that serves as a paradigm for imperative languages like C, C++, Java and others) for expressing computable functions.

Learning how to program in Haskell has been a rewarding experience for me. At some point, it reminded myself of a famous quote by Benjamin Whorf, “language shapes the way we think and determines what we can think about.” I will comment on my learning experience in Haskell in another post. For the moment, I would like to comment about the RC4 cryptographic algorithm and its implementation in Haskell. RC4 is a remarkably simple algorithm (already coded in Python in another post), and thus the obvious choice as the first cryptographic algorithm to be coded in a new programming language.

As the next box shows [1], it is possible to code the RC4 algorithm in less than ten lines of Haskell. The resulting code, although totally functional (no pun intended), definitely does not meet software quality standards, but does really show Haskell conciseness.

module Crypto.RC4 where
import Data.Char;import Data.Array
prga=prga' 0 0 where prga' i j s=let{i'=succ i`mod`256;j'=(s!i'+j)`mod`256;
s' = swap i' j' s;t = s'!((s!i'+s!j') `mod` 256)} in chr t : prga' i' j' s'
ksa k = ksa' 0 0 $ array (0,255) [(i,i)|i<-[0..255]] where {ksa' 256 _ s=s;
ksa' i j s = let {i'= i `mod` length k; j' = (j+s!i+ord(k!!i')) `mod` 256}
in ksa' (succ i) j' (swap j' i s)}
swap i j s = if i /= j then s // [(i, s ! j), (j, s ! i)] else s

The RC4 stream cipher algorithm

RC4, a variable key-size stream cipher with byte-oriented operations, is the most widely used stream cipher algorithm. It was designed by Ronald Rivest in 1987 for RSA Data Security (now RSA Security, a division of EMC²). RC4, also known as ARC4 [2], was kept as a trade secret until 1994, when it was posted anonymously to the cypherpunks mailing list.

RC4 creates a finite-state machine that uses a state array of 256 bytes. Initially, this array is set to the identity permutation (see pseudo-code below). The next step is performed by the Key Scheduling Algorithm (KSA), which traverses the state array swapping its elements based on the key, which size can be defined anywhere in the range of 40 to 2,048 bits [3]. As soon as KSA finishes its job, the key can be discarded since its contribution has been incorporated into the state array.

The state array is then ready for the Pseudo-Random Generation Algorithm (PRGA) to generate the stream cipher that will be xor’d with the plaintext (encryption) or with the ciphertext (decryption). Every time a byte is produced, the state array is permuted once more [4].

############ RC4 pseudo-code (extracted from Wikipedia) ############

# RC4 array state initialization
for i from 0 to 255
    S[i] := i

# Key Scheduling Algorithm (KSA)
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]

# Pseudo-Random Generation Algorithm (PRGA)
i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    output K

RC4 implementation in Haskell

In the expanded version (see box below), the following aspects were changed:

  • An extra module, Data.Bits, was imported to support the xor operation required by the encrypt function, absent in the first program, since it is not related to the RC4 algorithm itself (its task is just to combine the keystream generated by any stream cipher with the cleartext or ciphertext digits).
  • Only external objects (functions, types, data and value constructors) effectively used were imported from modules Data.Char, Data.Array and Data.Bits.
  • All top-level functions declared were assigned type signatures (those lines containing the double-colon operator). Although the Haskell compiler can (most of time) infer types automatically, it is strongly recommended to state them explicitly to assist the compiler in generating a more efficient code and to provide a better documentation.
  • Curly braces and semi-colons were substituted for indentation to structure code (unlike Python, indentation is not compulsory in Haskell).
  • A short descriptive comment was added to the four top-level functions (encrypt, prga, ksa and swap).
-- Author: Joao H de A Franco (
-- Description: RC4 implementation in Haskell
-- Date: 2013-Feb-22
-- License: Attribution-NonCommercial-ShareAlike 3.0 Unported
--          (CC BY-NC-SA 3.0)
module Crypto.RC4 (encrypt) where
import Data.Array (Array, array, (//), (!))
import Data.Char (chr, ord)
import Data.Bits (xor)

type ArrayRC4 = Array Int Int
type Index = Int
type Key = String
type Stream = String

-- Encrypt (or decrypt) using RC4 keystream
encrypt :: Key -> String -> String
encrypt k c = map chr $ zipWith xor (map ord (stream k)) (map ord c)
              where stream = prga . ksa

-- Pseudo-random generation algorithm (PRGA)                             
prga :: ArrayRC4 -> Stream
prga = prga' 0 0
  where prga' i j s = let i' = succ i `mod` 256
                          j' = (s ! i' + j) `mod` 256
                          s' = swap i' j' s
                          t = s' ! ((s ! i' + s ! j') `mod` 256)
                       in chr t : prga' i' j' s'

-- Key Scheduling Algorithm (KSA)
ksa :: Key -> ArrayRC4
ksa k = ksa' 0 0 $ array (0,255) [(i,i) | <- [0..255]]
  where ksa' 256 _ s = s
        ksa' i j s = let i' = i `mod` length k
                         j' = (j + s ! i + ord (k !! i')) `mod` 256
                     in ksa' (succ i) j' (swap j' i s)

-- Swap the values of S[i] and S[j]
swap :: Index -> Index -> ArrayRC4 -> ArrayRC4
swap i j s = if i /= j then s // [(i, s ! j), (j, s ! i)] else s

Let’s go over the code. Initially, the state array is created and set to the identity permutation, which is done by the expression (listed inside ksa body)

array (0,255) [(i,i) | i<- [0..255]]

which creates a value of the type ArrayRC4 (which is a synonym for Array Int Int) with the help of the array function imported from the Data.Array module. This function requires two parameters: the array bounds and the array contents. The array bounds (the lowest and highest indices in the array) are expressed as the Haskell tuple (0,255), while its contents are specified by a list of tuples of the form (index, value): [(0,0),(1,1),(2,2)...(255,255)], that can be expressed much more concisely as a list comprehension: [(i,i)|<-[0..255]].

Being Haskell a strong static typed language, every expression to be computed must have a well defined type during compilation time. Since indices and elements change places in both functions, ksa and prga, their types must be exactly the same, Int in this case. Although we haven’t explicitly declared the type of the array state, the Haskell compiler will infer it from the type declarations ksa :: Key -> ArrayRC4 (which states that the function ksa produces an output of type ArrayRC4) and ksa' 256 _ s = s (which requires that ksa' last parameter to be of type ArrayRC4).

The swap function

-- Swap values of S[i] and S[j]
swap :: Index -> Index -> ArrayRC4 -> ArrayRC4
swap i j s = if i /= j then s // [(i, s ! j), (j, s ! i)] else s

The swap function accepts two indices (of type Index, a synonym for Int) and the state array (of type ArrayRC4) as inputs and produces a new, updated state array after swapping the contents of the two specified array elements. Arrays of type Data.Array.Array (the only array type supported by Haskell 98) are, like any other pure functional data structures, immutables, so once created they cannot be changed, only queried. In order to do its job, swap uses the incremental array update infix operator, //, imported from the  Data.Array module. For example, an array a with an element i updated to value v is written a//[(i,v)].

This array update is not very efficient (both in terms of time and memory), as a new copy of the array must be carried out before any changes can be applied. On the other hand, those arrays can be used in pure functional code [6]. The second operator imported from the Data.Array module is the array subscripting infix operator ! (as an example, the expression a!i returns the value at index i of an array a).

The ksa function

-- Key Scheduling Algorithm (KSA)
ksa :: Key -> ArrayRC4
ksa k = ksa' 0 0 $ array (0,255) [(i,i) | <- [0..255]]
  where ksa' 256 _ s = s
        ksa' i j s = let i' = i `mod` length k
                         j' = (j + s ! i + ord (k !! i')) `mod` 256
                     in ksa' (succ i) j' (swap j' i s)

The ksa function accepts a key (of type Key, a synonym for String) as input and outputs the state array (of type ArrayRC4). Since there is no “loop” construction in Haskell, recursion is used instead. The ksa function just set the initial parameters for an auxiliary, recursive function, ksa', that does the hard work, visiting every single element of the array state. Its parameters are i (first index and counter), j (second index) and s (the state array). The stop condition (otherwise the recursion will never stop) is met, via Haskell’s pattern match, when i is equal to 256.

Two additional comments:

  • List subscripting is done by the infix operator !!, made available by the Prelude module (a standard module imported by default into all Haskell modules).
  • If a function (or constructor) takes two or more arguments, like mod, it can be used in infix form by enclosing its name in backtick characters (`mod`).

The prga function

-- Pseudo-random generation algorithm (PRGA)                         
prga :: ArrayRC4 -> Stream
prga = prga' 0 0
  where prga' i j s = let i' = succ i `mod` 256
                          j' = (s ! i' + j) `mod` 256
                          s' = swap i' j' s
                          t = chr $ s' ! ((s ! i' + s ! j') `mod` 256)
                       in t : prga' i' j' s'

The prga function accepts the initialized state array (of type ArrayRC4) as input and outputs the keystream (of type Stream, a synonym for String), represented in Haskell by a list of characters of (potentially) infinite length. This is no problem for non-strict languages like Haskell, Miranda and Clean. In particular, Haskell, and also Miranda, adopt the approach known as “lazy evaluation” (or “call-by-need”): an expression is evaluated only when its value is required.

The encrypt function

-- Encrypt (or decrypt) using RC4 keystream
encrypt :: Key -> String -> String
encrypt k c = map chr $ zipWith xor (map ord (stream k)) (map ord c)
              where stream k' = prga (ksa k')

The encrypt function defines the encryption/decryption service interface: it accepts the key (of type Key) and the plaintext/ciphertext (of type String) and returns the ciphertext/plaintext (also of type String). It is interesting to examine how the expression below is evaluated.

-- zipWith xor (map ord (stream k)) (map ord c)

The zipWith function (exported by the Prelude module) accepts a binary function and two lists and produces a third list which elements are calculated applying the function to the elements occurring at the same position in both lists (in this particular case, the binary function is xor, and the three lists are of type [Int]). If the first two lists are of different length, the longer one is truncated to make them of the same size.

The first list, the keystream (represented as a list of integers), has potentially infinite length, while the second one, the cleartext/ciphertext (also represented as a list of integers), has finite length. Haskell’s lazy evaluation of this expression will demand from the prga function a keystream exactly as long as that of the cleartext or ciphertext being encrypted (decrypted).

The Main module

Since RC4 is a proprietary algorithm, there are no official test suite for validating RC4 implementations [5]. A quick check can be performed, however, using the unofficial test vectors available at Wikipedia.

The program below executes the three tests described at Wikipedia and a fourth that checks whether the encrypt function is an involution (a function that it is own inverse) testing a single point of its domain. The keys and plaintext shown in the code are ASCII while the keystream and ciphertext are in hexadecimal.

-- Author: Joao H de A Franco (
-- Description: Test module for RC4 implementation in Haskell
-- Date: 2013-Feb-22
-- License: Attribution-NonCommercial-ShareAlike 3.0 Unported
--          (CC BY-NC-SA 3.0)
module Main where
import Crypto.RC4 (encrypt)
import Numeric (showHex)
import Data.Char (ord)

-- The first threee test-vectors are from Wikipedia
-- (
main :: IO ()
main = do
-- Test #1 ----------------------------------------
  let t1 = encrypt "Key" "Plaintext"
  check (convert t1 == "bbf316e8d940af0ad3") 1
-- Test #2 ----------------------------------------
  let t2 = encrypt "Wiki" "pedia"
  check (convert t2 == "1021bf0420") 2
-- Test #3 ----------------------------------------
  let t3 = encrypt "Secret" "Attack at dawn"
  check (convert t3 == "45a01f645fc35b383552544b9bf5") 3
-- Test #4 ----------------------------------------
  let t4 = encrypt "bla-bla-bla" "Do you want to know a secret?"
  let t4' = encrypt "bla-bla-bla" t4
  check (t4' == "Do you want to know a secret?") 4

check :: Bool -> Int -> IO ()
check f t = putStrLn $ "Test " ++ show t ++ if f then ": ok" else ": nok"

convert :: String -> String
convert = concatMap (showHex' . ord)
          where showHex' x = let y = showHex x ""
                              in if length y == 1 then '0' : y else y


The main function (in the Main module) can be evaluated using GHCi (GHC’s interactive environment), in which Haskell expressions can be interactively evaluated and programs can be interpreted. The output is shown below:

GHCi, version 7.4.2:
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package array- ... linking ... done.
[2 of 2] Compiling Main (/home/jhaf/workspace/RC4a/src/Main.hs, interpreted)
Ok, modules loaded: Main, Crypto.RC4
*Main> main
Test 1: ok
Test 2: ok
Test 3: ok
Test 4: ok

Note 1 – You will see no syntax highlighting in all boxes showing Haskell code because it is not a programming language supported by Although Haskell is not particularly popular — it is in position #33 in the TIOBE Programming Community Index — other languages ranked worse than Haskell in the TIOBE index (Scala, Erlang, F#, Clojure, VB and Powershell) are indeed supported by

Note 2 – The name RC4 is a trademark of RSA Security. For this reason, RC4 is often referred to as ARCFOUR or ARC4 (alleged RC4) to avoid trademark problems. RSA Security has never officially released the algorithm.

Note 3 – Until 2000, when the U.S. government relaxed export regulations on cryptography, only encryption software with keys up to 40 bits could be exported by U.S. companies. While this lower limit is just a convention, the upper one is real: if the key is greater than 2,048 bits (which happens to be 8 x 256), the excess bytes will be just dropped by the KSA.

Note 4 – An interesting RC4 tutorial with animation is available at Marc Loiseau’s blog).

Note 5 – Haskell libraries with other, more complex, implementations of arrays – but which have far more features – are available, some of which are mutable and/or strict.

Note 6 – As RSA was the sole primary vendor of cryptographic kits that include implementations of RC4, which was not an approved algorithm for use by the U.S. government, there was no (strong) demand for a test suite for validating RC4 implementations.

Four chess problems

More than stating the importance of time, material sacrifices in chess combinations are tangible proofs of the superiority of mind over matter. Your task, dear reader, is to find out the solution of each of the following chess problems. The first two are composed problems and the other two were extracted from (unknown) real games.

Problem 1 - White to play and mate in two.
V. Melnichenko (1958)

Problem 2 - White to play and mate in two.
S. Loyd (1877)

Problem 3 - White to play and mate in three.

Problem 4 - White to play and mate in three.

Solutions: P1: 1.Bh5 / P2: 1.Qa1 / P3: 1.Ng6 hg6 2.Nd5 ed5 3.Qe5# / P4: 1.Qa6 ba6 2.Nb5 Ka8 3.Rg8#

My favorite quotes (II)

“If Java had true garbage collection, most programs would delete themselves upon execution.”
Robert Sewell

“If debugging is the process of removing bugs, then programming must be the process of putting them in.”
Edsger W. Dijkstra

“Programming can be fun, so can cryptography; however they should not be combined.”
Kreitzberg and Shneiderman

“Science is what we understand well enough to explain to a computer. Art is everything else we do.”
Donald Knuth

“In C++ it’s harder to shoot yourself in the foot, but when you do, you blow off your whole leg.”
Bjarne Stroustrup

“Software is getting slower more rapidly than hardware becomes faster.”
Niklaus Wirth

“A computer once beat me at chess, but it was no match for me at kick boxing.”
Emo Philips

“A refund for defective software might be nice, except it would bankrupt the entire software industry in the first year.”
Andrew Tanenbaum

“Quantum mechanic Seth Lloyd says the universe is one giant, hackable computer. Let’s hope it’s not running Windows.”
Kevin Kelly

“Any sufficiently advanced technology is indistinguishable from magic.”
Arthur Clarke

“Any sufficiently advanced bug is indistinguishable from a feature.”
Rich Kulawiec

“From a programmer’s point of view, the user is a peripheral that types when you issue a read request.”
Peter Williams

“The teaching of BASIC should be rated as a criminal offence: it mutilates the mind beyond recovery.”
Edsger Dijkstra

“Real Programmers always confuse Christmas and Halloween because Oct31 == Dec25.”
Andrew Rutherford

“When your work speaks for itself, don’t interrupt.”
Henry Kaiser

“Real men don’t use backups. They post their stuff on a public ftp server and let the rest of the world make copies.”
Linus Torvalds

“Premature optimization is the root of all evil.”
Donald Knuth

“GOTO is a four letter word.”
Edsger Dijkstra

“unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep”

“If it wasn’t for C, we’d be writing programs in BASI, PASAL, and OBOL.”

“SQL, Lisp, and Haskell are the only programming languages that I’ve seen where one spends more time thinking than typing.”
Philip Greenspun

“There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.”
Tony Hoare

“If you don’t want to be replaced by a computer, don’t act like one.”
Arno Penzias

“God exists since mathematics is consistent, and the Devil exists since we cannot prove it.”
André Weil

“Machines should work. People should think.”
Richard Hamming

“Good judgement comes from experience, and experience comes from bad judgement.”
Fred Brooks

“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein

Linpack Users’ Guide

Multiplication over the binary finite field GF(2^m)

The multGF2() function shown in the Python script below implements the element (polynomial) multiplication over a binary finite field GF(2^{m}). The second function, setGF2(), sets the three constants needed for its colleague to perform its multiplication task: "mask1" and "mask2" (used in “and” operations) and "polyred", a polynomial constant used in the polynomial reduction of the product. These constants are defined from the parameters passed to setGF2(): "degree" (the extension degree of the binary field GF(2^{m})) and "irPoly" (the irreducible polynomial used for the product reduction). The version of multGF2() in the post AES implementation in Python, dubbed “mult()“, was configured specifically for the AES binary finite field used in the MixColumns operation, GF(2^{8})/x^{8} + x^{4} + x^{3} + x + 1.

# Author: Joao H de A Franco (
# Description: Binary finite field multiplication in Python 3
# Date: 2012-02-16
# License: Attribution-NonCommercial-ShareAlike 3.0 Unported
#          (CC BY-NC-SA 3.0)
from functools import reduce

# constants used in the multGF2 function
mask1 = mask2 = polyred = None

def setGF2(degree, irPoly):
    """Define parameters of binary finite field GF(2^m)/g(x)
       - degree: extension degree of binary field
       - irPoly: coefficients of irreducible polynomial g(x)
    global mask1, mask2, polyred
    mask1 = mask2 = 1 << degree
    mask2 -= 1
    if sum(irPoly) <= len(irPoly):
        polyred = reduce(lambda x, y: (x << 1) + y, irPoly[1:])    
        polyred = poly2Int(irPoly[1:]) 
def multGF2(p1, p2):
    """Multiply two polynomials in GF(2^m)/g(x)"""
    p = 0
    while p2:
        if p2 & 1:
            p ^= p1
        p1 <<= 1
        if p1 & mask1:
            p1 ^= polyred
        p2 >>= 1
    return p & mask2

#                        Auxiliary formatting functions
def int2Poly(bInt):
    """Convert a "big" integer into a "high-degree" polynomial"""
    exp = 0
    poly = []
    while bInt:
        if bInt & 1:
        exp += 1
        bInt >>= 1
    return poly[::-1]

def poly2Int(hdPoly):
    """Convert a "high-degree" polynomial into a "big" integer"""
    bigInt = 0
    for exp in hdPoly:
        bigInt += 1 << exp
    return bigInt

def i2P(sInt):
    """Convert a "small" integer into a "low-degree" polynomial"""
    return [(sInt >> i) & 1 for i in reversed(range(sInt.bit_length()))]

def p2I(ldPoly):
    """Convert a "low-degree" polynomial into a "small" integer"""
    return reduce(lambda x, y: (x << 1) + y, ldPoly)

def ldMultGF2(p1, p2):
    """Multiply two "low-degree" polynomials in GF(2^n)/g(x)"""
    return multGF2(p2I(p1), p2I(p2))

def hdMultGF2(p1, p2):
    """Multiply two "high-degree" polynomials in GF(2^n)/g(x)"""
    return multGF2(poly2Int(p1), poly2Int(p2))

if __name__ == "__main__":
    # Define binary field GF(2^3)/x^3 + x + 1
    setGF2(3, [1, 0, 1, 1])
    # Alternative way to define GF(2^3)/x^3 + x + 1
    setGF2(3, i2P(0b1011))    
    # Check if (x + 1)(x^2 + 1) == x^2
    assert ldMultGF2([1, 1], [1, 0, 1]) == p2I([1, 0, 0])

    # Check if (x^2 + x + 1)(x^2 + 1) == x^2 + x
    assert ldMultGF2([1, 1, 1], [1, 0, 1]) == p2I([1, 1, 0])
    # Define binary field GF(2^8)/x^8 + x^4 + x^3 + x + 1
    setGF2(8, [1, 0, 0, 0, 1, 1, 0, 1, 1])
    # Alternative way to define GF(2^8)/x^8 + x^4 + x^3 + x + 1
    setGF2(8, i2P(0b100011011))
    # Check if (x)(x^7 + x^2 + x + 1) == x^4 + x^2 + 1
    assert ldMultGF2([1, 0], [1, 0, 0, 0, 0, 1, 1, 1]) == p2I([1, 0, 1, 0, 1])
    # Check if (x + 1)(x^6 + x^5 + x^3 + x^2 + x) == x^7 + x^5 + x^4 + x
    assert ldMultGF2([1, 1], [1, 1, 0, 1, 1, 1, 0]) == p2I([1, 0, 1, 1, 0, 0, 1, 0])

    # Define binary field GF(2^571)/x^571 + x^10 + x^5 + x^2 + x
    setGF2(571, [571, 10, 5, 2, 1])

    # Calculate the product of two polynomials in GF(2^571)/x^571 + x^10 + x^5 + x^2 + x,
    # x^518 + x^447 + x^320 + x^209 + x^119 + x + 1 and x^287 + x^145 + x^82 + + x^44
    print(int2Poly(hdMultGF2([518, 447, 320, 209, 119, 1, 0], [287, 145, 82, 44])))

The output of the above script is:

[562, 529, 496, 491, 465, 406, 402, 364, 354, 291, 288, 287, 264, 253, 244,
239, 235, 201, 173, 168, 165, 164, 163, 146, 145, 102, 97, 94, 93, 83, 82,
46, 45, 44, 41, 39, 38, 37, 34, 30, 26, 23, 22]

This means that the product of the polynomials x^{518} + x^{447} + x^{320} + x^{209} + x^{119} + x + 1 and x^{287} + x^{145} + x^{82} + x^{44} is:

x^562 + x^529 + x^496 + x^491 + x^465 + x^406 + x^402 + x^364 + x^354 +
x^291 + x^288 + x^287 + x^264 + x^253 + x^244 + x^239 + x^236 + x^235 +
x^201 + x^173 + x^168 + x^165 + x^164 + x^163 + x^146 + x^145 + x^102 +
x^97 + x^94 + x^93 + x^83 + x^82 + x^46 + x^45 + x^44 + x^41 + x^39 +
x^38 + x^37 + x^34 + x^30 + x^26 + x^23 + x^22

An elegant algorithm for calculating Hamming distance

Generally speaking, the Hamming distance between two strings of the same length is the number of positions in which the corresponding symbols are different. This metric, proposed by Richard W. Hamming in his seminal paper “Error Detecting and Error Correcting Codes”, Bell System Technical Journal 26(2):147-160 (1950), also expresses the least number of (intended or unintended) substitutions needed to change one string to another.

Besides being used in computer- and communications-related fields such as information theory, coding theory, and cryptography, the Hamming distance concept has also found its way into genomics for the comparison of genomic sequences. Its importance can also be judged by the fact that modern microprocessors have a specific instruction (POPCNT, “population count”) for counting the number of bits set to 1.

Assuming two bit strings of equal length, x and y, the “obvious” algorithm to calculate the Hamming distance between them is to count the number of “1” bits in the result of the expression “x xor y”, as shown in the following Python code:

def hamming1(x,y):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x) == len(y)
    count,z = 0,int(x,2)^int(y,2)
    while z:
        if z&1:
            count += 1
        z >>= 1
    return count

While searching the web the other day, I have come across a very elegant algorithm for the same purpose, which can be coded in Python as follows:

def hamming2(x,y):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x) == len(y)
    count,z = 0,int(x,2)^int(y,2)
    while z:
        count += 1
        z &= z-1 # magic!
    return count

Apart from being 50% faster than the first one, its simplicity is awesome! After being counted, the lowest-order nonzero bit is cleared like magic! When I started a new search later, this time looking for the algorithm’s authorship, I finally learned that it was proposed by Peter Wegner in 1960 (“A technique for counting ones in a binary computer“, Communications of the ACM 3 (5): 322).

Where did I find this information? The answer was, talking again about “obvious” things, in the very same Wikipedia’s entry for “Hamming distance“! This little gem has been there for at least a year (according to the entry’s history), just waiting to be discovered! Quoting Antoine de Saint Exupéry, “You know you have achieved perfection in design, not when you have nothing more to add, but when you have nothing more to take away.”

Simplified AES implementation in Python

Simplified AES, created by Edward Schaefer and two of his students at Santa Clara University in 2003, is described in the paper “A Simplified AES Algorithm and Its Linear and Differential Cryptoanalyses”, Cryptologia, Vol. XXVII (2), pages 148-177. Like Simplified DES, its purpose is educational, since its key and block size are very small (16-bit and 8-bit, respectively). Nonetheless, due to its simplicity, it is possible for students to encrypt or decrypt a block doing all operations by hand, making it easier for them to understand the structure and operation of its elder brother, AES.

# Author: Joao H de A Franco (
# Description: Simplified AES implementation in Python 3
# Date: 2012-02-11
# License: Attribution-NonCommercial-ShareAlike 3.0 Unported
#          (CC BY-NC-SA 3.0)
import sys

# S-Box
sBox  = [0x9, 0x4, 0xa, 0xb, 0xd, 0x1, 0x8, 0x5,
         0x6, 0x2, 0x0, 0x3, 0xc, 0xe, 0xf, 0x7]

# Inverse S-Box
sBoxI = [0xa, 0x5, 0x9, 0xb, 0x1, 0x7, 0x8, 0xf,
         0x6, 0x0, 0x2, 0x3, 0xc, 0x4, 0xd, 0xe]

# Round keys: K0 = w0 + w1; K1 = w2 + w3; K2 = w4 + w5
w = [None] * 6

def mult(p1, p2):
    """Multiply two polynomials in GF(2^4)/x^4 + x + 1"""
    p = 0
    while p2:
        if p2 & 0b1:
            p ^= p1
        p1 <<= 1
        if p1 & 0b10000:
            p1 ^= 0b11
        p2 >>= 1
    return p & 0b1111

def intToVec(n):
    """Convert a 2-byte integer into a 4-element vector"""
    return [n >> 12, (n >> 4) & 0xf, (n >> 8) & 0xf,  n & 0xf]            

def vecToInt(m):
    """Convert a 4-element vector into 2-byte integer"""
    return (m[0] << 12) + (m[2] << 8) + (m[1] << 4) + m[3]

def addKey(s1, s2):
    """Add two keys in GF(2^4)"""   
    return [i ^ j for i, j in zip(s1, s2)]
def sub4NibList(sbox, s):
    """Nibble substitution function"""
    return [sbox[e] for e in s]
def shiftRow(s):
    """ShiftRow function"""
    return [s[0], s[1], s[3], s[2]]

def keyExp(key):
    """Generate the three round keys"""
    def sub2Nib(b):
        """Swap each nibble and substitute it using sBox"""
        return sBox[b >> 4] + (sBox[b & 0x0f] << 4)

    Rcon1, Rcon2 = 0b10000000, 0b00110000
    w[0] = (key & 0xff00) >> 8
    w[1] = key & 0x00ff
    w[2] = w[0] ^ Rcon1 ^ sub2Nib(w[1])
    w[3] = w[2] ^ w[1]
    w[4] = w[2] ^ Rcon2 ^ sub2Nib(w[3])
    w[5] = w[4] ^ w[3]

def encrypt(ptext):
    """Encrypt plaintext block"""
    def mixCol(s):
        return [s[0] ^ mult(4, s[2]), s[1] ^ mult(4, s[3]),
                s[2] ^ mult(4, s[0]), s[3] ^ mult(4, s[1])]    
    state = intToVec(((w[0] << 8) + w[1]) ^ ptext)
    state = mixCol(shiftRow(sub4NibList(sBox, state)))
    state = addKey(intToVec((w[2] << 8) + w[3]), state)
    state = shiftRow(sub4NibList(sBox, state))
    return vecToInt(addKey(intToVec((w[4] << 8) + w[5]), state))
def decrypt(ctext):
    """Decrypt ciphertext block"""
    def iMixCol(s):
        return [mult(9, s[0]) ^ mult(2, s[2]), mult(9, s[1]) ^ mult(2, s[3]),
                mult(9, s[2]) ^ mult(2, s[0]), mult(9, s[3]) ^ mult(2, s[1])]
    state = intToVec(((w[4] << 8) + w[5]) ^ ctext)
    state = sub4NibList(sBoxI, shiftRow(state))
    state = iMixCol(addKey(intToVec((w[2] << 8) + w[3]), state))
    state = sub4NibList(sBoxI, shiftRow(state))
    return vecToInt(addKey(intToVec((w[0] << 8) + w[1]), state))

if __name__ == '__main__':
    # Test vectors from "Simplified AES" (Steven Gordon)
    # (
    plaintext = 0b1101011100101000
    key = 0b0100101011110101
    ciphertext = 0b0010010011101100
        assert encrypt(plaintext) == ciphertext
    except AssertionError:
        print("Encryption error")
        print(encrypt(plaintext), ciphertext)
        assert decrypt(ciphertext) == plaintext
    except AssertionError:
        print("Decryption error")
        print(decrypt(ciphertext), plaintext)
    print("Test ok!")