# Untwisted: A Cryptol Implementation of Keccak - Part 1

The Matryoshka and the Hermetic Sponge Strategy

## Introduction

In the last Untwisted issue, we explored how to bitslice a cryptographic algorithm, writing a vectorized TEA implementation in C. This time we will play with Cryptol, digging into the elegant design of Keccak, a candidate to the SHA-3 cryptographic hash algorithm competition. This will provide us the opportunity to learn more about the Cryptol specification language and see how its expressive power matches the purity and the clearness of the Keccak's reference specification.

## Cryptol

Cryptol is a purely functional domain specific language, developed over the past decade by Galois for the NSA, for the design, implementation and verification of cryptographic algorithms. As announced by Galois:

Cryptol allows a cryptographer to:

- Create a reference specification and associated formal model.
- Quickly refine the specification, in Cryptol, to one or more implementations, trading off space, time, and other performance metrics.
- Compile the implementation for multiple targets, including: C/C++, Haskell, and VHDL/Verilog.
- Equivalence check an implementation against the reference specification, including implementations not produced by Cryptol.

With its features set, Cryptol can significantly contribute towards meeting the challenges arising from the integration of cryptographic components into hardware and software systems, such as: their test and verification, the unambiguous specification of algorithms, and the retargeting of cryptographic implementations to new platforms.

The relationship between languages and information security has a long and interesting lineage. Martin Abadi reminds us how "objects and types have long been used for protection against incompetence and malice, at least since the 1970s". Distributed systems sometimes benefited fromabstractions for communication on secure channels provided by programming languages. Cryptol strenghten this relationship in the cryptographic setting.

## Keccak

Keccak is a family of cryptographic hash functions or, more accurately, sponge functions, designed by Guido Bertoni, Joan Daemen, Gilles Van Assche (STMicrosystems) and Michaël Peeters (NXP Semiconductors).

With a construction based on a permutation *f* from a set of 7 permutations that should not have any structural distinguishers, Keccak provides provable security against all generic attacks and is one of the five finalists in the NIST's SHA-3 cryptographic hash algorithm competition,

Any instance of the Keccak sponge function family makes use of one of the seven Keccak-f permutations, denoted

Keccak-f[b], whereb ∈ {25, 50, 100, 200, 400, 800, 1600}is the width of the permutation. These Keccak-f permutations are iterated constructions consisting of a sequence of almost identical rounds. The number of roundsnrdepends on the permutation width, and is given by nr = 12 + 2l, where 2^l = b/25. This gives 24 rounds for Keccak-f[1600].

In pseudo code:

Keccak-f[b](A) for i in 0 ... nr - 1 A = Round[b](A, RC[i]) return A

All Keccak members proposed for SHA-3 standardization make use of the same permutation: Keccak-f[1600].

Let's implement Keccak-f in Cryptol, starting from giving a representation to its permutation state.

## Parts of the State

The Keccak-f[b] permutation steps operate on a three dimensional state array of elements in GF(2), namely a[5][5][w], with w = 2^l. In the specifications the authors refer to a bit in position (x, y, z) with the expression a[x][y][z], where x, y ∈[0..4] and z ∈ [0..z-1].

The diagram below represents the permutation state array and is convenient for introducing the terminology used in both the original specifications and the implementation below.

The one dimensional parts are:

- A row is a set of 5 bits with constant y and z coordinates.
- A column is a set of 5 bits with constant x and z coordinates.
- A lane is a set of w bits with constant x and y coordinates.

The two dimensional parts are:

- A sheet is a set of 5w bits with constant x coordinate.
- A plane is a set of 5w bits with constant y coordinate.
- A slice is a set of 25 bits with constant z coordinate.

Structuring the permutation state in Cryptol is pretty straightforward. We can conveniently represent it as a three dimensional array of bytes, with signature [5][5][w][8].

[ [ [0 .. z] [0 .. z] [0 .. z] [0 .. z] [0 .. z] ] [ [0 .. z] [0 .. z] [0 .. z] [0 .. z] [0 .. z] ] [ [0 .. z] [0 .. z] [0 .. z] [0 .. z] [0 .. z] ] [ [0 .. z] [0 .. z] [0 .. z] [0 .. z] [0 .. z] ] [ [0 .. z] [0 .. z] [0 .. z] [0 .. z] [0 .. z] ] ]

In the multidimensional array above, the inner elements are lanes of length w ([][][z]), grouped in sheets along the second dimension ([][y][z]).

## The step mappings of Keccak-f

A Keccak-f round is composed from a sequence of dedicated mappings, each one with its particular task operating on a state array of 5 x 5 lanes. All allowed lane lengths are powers of two - w ∈ {1, 2, 4, 8, 16, 32, 64} (b = 25w); hence shorter lanes divides longer lane lengths. As "the propagation structures for smaller versions are embedded as symmetric structure in larger width version" the authors call it a Matryoshka structure.

Round[b](A, RC) step θ step ρ and π step χ step ι return A

With the exception of ι, all step mappings of the Keccak-f round function are translation-invariant in the direction of the z axis. Let's review the mappings one by one, constrasting their Cryptol implementation to the original specifications.

## The θ mapping

The first step in the round function of Keccak is linear mapping, translation-invariant in all directions. It provides diffusion by adding to each bit a[x][y][z] the bitwise sum of the parities of two columns: that of a[x-1][][z] and that of a[x+1][][z-1]. In pseudo-code theta can be specified as:

for x = 0 to 4 do C[x] = a[x,0] for y = 1 to 4 do C[x] = C[x] ⊕ a[x,y] end for end for for x = 0 to 4 do D[x] = C[x - 1] ⊕ ROT(C[x + 1], 1) for y = 0 to 4 do A[x, y] = a[x, y] ⊕ D[x] end for end for

Or more succintly - using the notation adopted by the specification summary - as:

C[x] = A[x,0] ⊕ A[x,1] ⊕ A[x,2] ⊕ A[x,3] ⊕ A[x,4], ∀ x in 0...4 D[x] = C[x - 1] ⊕ ROT(C[x + 1], 1), ∀ x in 0...4 A[x,y] = A[x,y] ⊕ D[x], ∀ (x, y) in (0...4, 0...4)

It is remarkable how much Cryptol can approach the elegance of the former notations:

theta : [5][5][8][8] -> [5][5][8][8]; theta (A) = A' where { /* * Over higher-dimensional sequences the exclusive-or operates * element-wise. Here the innermost elements are lanes [][][z]. * They are processed in sheets by accessing the permutation * state array via the index operator. */ C = (A @ 0) ^ (A @ 1) ^ (A @ 2) ^ (A @ 3) ^ (A @ 4); /* Rotate left by 1 each lane in C */ D = [] # [| splitBy(8, join(C @ x) <<< 1) || x <- [0..4] |]; /* The pivot needs explicit typing, otherwise some index * evaluations would fail. * For instance, x + 4 % 5 = 0, where x is 4 */ A' = [| [| (A @ y @ x) ^ (D @ (((x : [8]) + 1) % 5)) ^ (C @ (((x : [8]) + 4) % 5)) || y <- [0..4] |] || x <- [0..4] |]; };

Computing the intermediate value C couldn't be more sweet. As you have probably noticed, here we are not iterating over the lanes adding their elements modulo-2 - as a straighforward implementation in other languages would have asked. We rather add up directly the sheets. In fact, when used over higher-dimensional sequences the Cryptol exclusive-or operator operates element-wise. That's great!

In Cryptol we construct sequences and interate over them using sequence comprehensions. Sequences are described by drawing and combining elements from other sequences, referred as generators. This is what we do to compute the intermediate variable D and to updated the state A', nesting two sequence comprehensions. The *join* function concatenates the elements of a sequence. With *splitBy* sequence are split n-ways.

As you will find out playing with Cryptol, when coupled with recursive definitions, sequence comprehensions allows to "express sophisticated recurrence relations that apper often in cryptographic algorithms".

## The ρ and π step

With the ρ mapping Keccak aims at providing inter-slice dispersion. This step consists of translations within lanes that are best described by the pseudo-code below:

A[0, 0] = a[0, 0] (x y) = (1 0) for t = 0 to 23 do A[x, y] = ROT(a[x, y], (t + 1)(t + 2)/2) (x y) = ((0 1)(2 3))(x y) end for

The ρ step is coupled with the π mapping to provide long-term diffusion by a transposition of the lanes. The pseudo-code for the π step is:

for x = 0 to 4 do for y = 0 to 4 do (X Y) = ((0 1)(2 3))(x y) A[X, Y ] = a[x, y] end for end for

Both mappings can be specified as:

B[y, 2x + 3y] = ROT(A[x, y], r[x, y]), ∀(x, y) in (0...4, 0...4)

Again, the Cryptol implementation is straightforward:

rho : [5][5][8][8] -> [5][5][8][8]; rho (A) = A' where { A' = [| [| splitBy(8, join(A @ x @ y) <<< (KeccakRhoOffsets@x@y)) || y <- [0..4] |] || x <- [0..4] |]; }; piLookupIndex = [[[0x0 0x0] [0x3 0x0] [0x1 0x0] [0x4 0x0] [0x2 0x0]] [[0x1 0x1] [0x4 0x1] [0x2 0x1] [0x0 0x1] [0x3 0x1]] [[0x2 0x2] [0x0 0x2] [0x3 0x2] [0x1 0x2] [0x4 0x2]] [[0x3 0x3] [0x1 0x3] [0x4 0x3] [0x2 0x3] [0x0 0x3]] [[0x4 0x4] [0x2 0x4] [0x0 0x4] [0x3 0x4] [0x1 0x4]]]; pi : [5][5][8][8]; pi (A) = A' where { S = [] # [| (A @ x @ y) || [x y] <- join(piLookupIndex) |]; A' = groupBy(5, S); };

## The χ mapping

The only nonlinear mapping in Keccak-f, χ has algebraic degree two and is translation-invariant in all directions. It can be seen as the parallel application of 5w S-boxes operating on 5-bit rows, and is defined as follows:

A[x,y] = B[x,y] ⊕ ((NOT B[x+1, y]) AND B[x+2, y]), ∀(x,y) in (0...4,0...4)

Or In pseudo-code:

for y = 0 to 4 do for x = 0 to 4 do A[x, y] = a[x,y] ⊕ ((NOT a[x+1, y]) AND a[x+2, y]) end for end for

We can implement this in Cryptol like this:

/* Arithmetic in Cryptol is modulo the size of the word. * Here we need to explicitly make sure the cursors stay * in range */ chi : [5][5][8][8] -> [5][5][8][8]; chi (A) = [| [| (A @ x @ y) ^ ((~(A @ ((x+1)%5) @ y )) & (A @ ((x+2)%5) @ y)) || y <- [0..4] |] || x <- [0..4] |];

## Iota

With the step , Keccak aims at disrupting symmetries that could be exploited by attackers. At each round round-constants are taken from the ouput of a maximum-length LFSR and added to the state array.

RC = (x^t mod x^8 + x^6+ x^5 + x^4 + 1) mod x in GF(2)[x]

A[0, 0] = A[0, 0] ⊕ RC

In Cryptol we can write:

repl (xs, i, y) = [| if k == i then y else x || k <- [0.. (width xs)] || x <- xs |]; iota : ([5][5][8][8], [8]) -> [5][5][8][8]; iota (A, indexRound) = A' where { S = repl(A@0, 0, (A@0@0) ^ splitBy(8,(KeccakRoundConstants@indexRound))) # A @ 1 # A @ 2 # A @ 3 # A @ 4; A' = groupBy(5, S); };

The generators of the repl sequence comprehension are combined in parallel. Parallel generators are invaluable when we need to implement element-wise operations between sequences. With them we iterate for as many elements exist in the shortest group.

The round constants can be initialized like follows:

type LFSRState = {state : [9]; result : Bit}; LFSR86540 : [9] -> LFSRState; LFSR86540(State) = if (State & 0x01 != 0) then ({state=State'; result=True }) else ({state=State'; result=False} ) where { State' = LFSR86540_Update(State); }; LFSR86540_Update(State) = if (State & 0x80 != 0) then ( /* * Cryptol support polynomial arithmetic over polynomial with * binary coefficients. Keccak make use of a primitive * polynomial over GF(2). Here we specify the polynomial term * inside <| |> brakets and mask the result with 0xff in order * to have a byte-wide state. */ ((State << 1) ^ <| x^8 + x^6 + x^5 + x^4 + 1 |>) & 0xff ) else ( State << 1 ); iterateLFSR(x) = nr where { outs = [ {state = x; result = True; } ] # [| LFSR86540(((prev) : LFSRState ).state) || i <- [0..23], j <- [0..6] || prev <- outs |]; nr = drop(1, outs); }; KeccakInitializeRoundConstants : [24][64]; KeccakInitializeRoundConstants = Constants where { KeccakRoundConstants : [24][64]; KeccakRoundConstants = splitBy(24, 0 : [1536]); W = [ KeccakRoundConstants ] # [| if ((state : LFSRState).result) then repl(prev, i, (prev @ i) ^ (1<<((1<<j) - 1))) else prev || i <- [0..23], j <- [0..6] || prev <- W || state <- iterateLFSR 0x01 |]; Constants = W @ 168; };

Few more remarks on the code above before concluding this first part.

You might have noticed there are multiple generators for few of the sequence comprehensions above. In such a case, the elements produced form the cartesian product of the two sequences.

Modular arithmetic is not the only one to be supported by Cryptol. The Galois language provides support also for polynomial arithmetic over polynomials with binary coefficients. Keccak makes use of a primitive polynomial over GF(2) in the LFSR, used to generate the round constants for the Iota mapping. I believe the notation could hardly have been more elegant. In the implementation of the LFSR, the result is masked with 0xff in order to have byte-wide state.

## Conclusions

I have found the language designed by Galois to be relevant to industrial players. Notational conventions used for earlier cryptographic standards (eg, DES) has shown to be not appropriate for the description of algorithms designed also for fast software implementation. With its DSL, Cryptol takes us closer to the unambiguous specification of cryptographic algorithms and I believe that its expressive power matches very well the beauty of the Keccak's specifications.

In the second part of this series, we will give a closer look to the Hermetic Sponge Strategy and implement the absorbing and squeezing phases of the sponge construction.

Are you eager to see more implementations of current or former SHA-3 candidates in Cryptol? Here are Skein and MD6. Enjoy.

## Comments

## Ketan

Posted 5 months, 24 days ago.

Hello Sir,

I am Ketan Dhimmar doing M.E in "VLSI & ENBEDDED SYSTEM DESIGN" from Gujarat Technological University. Currently, I am doing intern at C-DAC,pune,Maharahshtra and working a project on "sha3(keccak)".. I know the whole algorithm of Sha1 and sha2 but i am facing problems in understanding sha3 algorithm as it is totally different from sha1 and sha2. can you help me how padding process is being done in sha3??? can you help by giving example of any message and how actually it is being digested using sha3 algorithm??? or any reference from where i can get this type of things....I searched a lot but i could not get the things..and yes your article about

"Untwisted: A Cryptol Implementation of Keccak - Part 1" is littel bit easy to understand compare to other sites..

Thanks & Regards,

Ketan Dhimmar.