Case Study #2: SHA-256
I. Introduction
The SHA-2 (Secure Hash Algorithm) is a set of cryptographic hash functions designed by the National Security Agency and SHA-256 is one of these algorithms. These algorithms were defined semi-formally in documents produced by NIST. Cryptographic algorithms are good candidates for hardware implementation because they typically possess components whose performance benefits from hardware acceleration. But they are also prime candidates for formal verification because of their critical role in maintaining security and integrity in a wide variety of systems.
This case study explores the model-driven design and synthesis of a hardware accelerator for the SHA-256 cryptographic hash function. The approach is semantics-driven, starting from a reference semantics for SHA-256. The whole derivation takes place within the Haskell functional language and so the hardware accelerator produced by the ReWire compiler is formally verifiable with respect to the reference semantics using equational reasoning in Haskell.
Overview
The development in this case study proceeds along the same lines as in Case Study #1. That is, first we develop a reference semantics for SHA-256 in Haskell, show how to transform it into a ReWire implementation, and, finally, show how the resulting implementation is synthesized. The code that we make use of in this is as follows:
- ReWirePrelude.hs: some standard definitions.
- Reference256.hs: the reference semantics for SHA-256 in Haskell.
- TestHarness.hs: harness for testing specifications.
- FirstImplementation.hs and SecondImplementation.hs: two derived implementations of SHA256 in Haskell.
- RWFirstImplementation.hs and RWSecondImplementation.hs: two derived implementations of SHA256 in ReWire.
- RWFirstImplementation.vhd and RWSecondImplementation.vhd: two VHDL implementations of SHA256 generated by the ReWire compiler.
- Counter.hs, Globals256.hs, MetaprogrammingRW.hs, and prims.vhd: various and sundry files whose purpose will be explained along the way.
- Tarball of All This Code.
II. Haskell Reference Semantics for SHA-256
The Top Level
The input to the SHA-256 algorithm is assumed to be a sequence of bytes and its output is a digest of
eight 32-bit words. Respectively, these are represented in Haskell as the types [Char]
(i.e., a list of ASCII characters) and Oct Word32
.
A Haskell function implementing this algorithm,
refsha256
, is presented below; its type is [Char] -> Oct Word32
.
The function refsha256
is the composition of functions pad
, sha256
, and runM
which are discussed in more detail below.
refsha256 :: [Char] -> Oct Word32
refsha256 = runM . sha256 . pad
pad :: [Char] -> [Hex Word32]
sha256 :: [Hex Word32] -> M (Oct Word32)
runM :: M a -> a
data Oct a = Oct a a a a a a a a
data Hex a = Hex a a a a a a a a a a a a a a a a
The SHA-256 algorithm first pads the input sequence so that the result has length as a multiple of 512 bits.
The padded input is then parsed into 512 bit blocks,
$M^{(1)},\ldots,M^{(N)}$. Each of these blocks, $M^{(i)}$, is further parsed into sixteen 32-bit words,
$M_0^{(i)},\ldots,M_{15}^{(i)}$ and represented in Haskell with the type Hex Word32
— thus, the type of the pad
function. The code for pad
(as well as for runM
) is found in TestHarness.hs
; their definitions would not be illuminating here.
The function sha256
implements the main loop of the SHA-256 algorithm. It takes the sequence of blocks formatted by pad
(i.e., a list of type Hex Word32
) as input and produces a computation of a digest (i.e., a computation of type M (Oct Word32)
) as output. Here, M
is a monad, which is a construction used for representing side-effects in the pure functional language Haskell. We define M
and its use below.
Testing Against the NIST Standard
The NIST document defining SHA-256 provides three example hash computations against which one may test implementations of the algorithm. The transcript below summarizes the tests applied to refsha256
. The tests apply refsha256
to messages msg1
, msg2
, and msg3
; msg1
is just "abc"
while the other two are too long to print easily here.
ghci> refsha256 msg1
Oct 3128432319 2399260650 1094795486 1571693091
2953011619 2518121116 3021012833 4060091821
ghci> hashed1
Oct 3128432319 2399260650 1094795486 1571693091
2953011619 2518121116 3021012833 4060091821
ghci> refsha256 msg2
Oct 613247585 3523623096 3854575251 205414457
2738676825 1694441831 4142722516 433784513
ghci> hashed2
Oct 613247585 3523623096 3854575251 205414457
2738676825 1694441831 4142722516 433784513
ghci> refsha256 msg3
Oct 3452399196 2568289170 2174863330 2228698727
4051737160 2761367566 74267084 3339791568
ghci> hashed3
Oct 3452399196 2568289170 2174863330 2228698727
4051737160 2761367566 74267084 3339791568
In the above transcript, a test harness is loaded into the GHC (Glasgow Haskell Compiler) interpreter. First the refsha256
is applied, producing the hash digest calculated by the reference semantics. Then, the published hash digests from NIST, Appendix B, are displayed—these are hashed1
, hashed2
, and hashed3
, respectively—and note that they are identical to the corresponding output of refsha256
. By default, these digests are displayed in decimal format rather than the hexadecimal format in which they appear in NIST.
SHA-256 Logical Operators
SHA-256 defines a number of operations on 32-bit words called logical operators in Sections 3.2 and 4.1.2 of NIST. These are excerpted in part below:
Each of these functions transliterates simply into Haskell using built-in functions and data from the Haskell libraries, Data.Bit
and Data.Word
.
Within the code listing below, .&.
, complement
, and xor
are, resp., the bitwise and, complement and exclusive or operations. The rotateR
and shiftR
functions are the rotate right and shift right operations.
ch :: Word32 -> Word32 -> Word32 -> Word32
ch x y z = (x .&. y) `xor` (complement x .&. z)
maj :: Word32 -> Word32 -> Word32 -> Word32
maj x y z = (x .&. y) `xor` (x .&. z) `xor` (y .&. z)
bigsigma0 :: Word32 -> Word32
bigsigma0 x = (rotateR x 2) `xor` (rotateR x 13) `xor` (rotateR x 22)
bigsigma1 :: Word32 -> Word32
bigsigma1 x = (rotateR x 6) `xor` (rotateR x 11) `xor` (rotateR x 25)
sigma0 :: Word32 -> Word32
sigma0 x = (rotateR x 7) `xor` (rotateR x 18) `xor` (shiftR x 3)
sigma1 :: Word32 -> Word32
sigma1 x = (rotateR x 17) `xor` (rotateR x 19) `xor` (shiftR x 10)
Constants and Counters
The inner loop of the SHA-256 algorithm is, in effect, a for-loop counting from $0$ to $63$. To allow expression of this in Haskell, we define a counter type Ctr
below.
It would be possible in the reference semantics to use the built-in Int
type as a loop variable, but, anticipating the shift to hardware implementation, a discrete, finite type like Ctr
makes more sense.
The SHA-256 algorithm also makes use of 64 constants (these are defined in Section 4.2.2 of NIST). We index these constants via the seed
function. This code appears in Counter.hs
.
data Ctr = C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 |
C16 | C17 | C18 | C19 | C20 | C21 | C22 | C23 |
C24 | C25 | C26 | C27 | C28 | C29 | C30 | C31 |
C32 | C33 | C34 | C35 | C36 | C37 | C38 | C39 |
C40 | C41 | C42 | C43 | C44 | C45 | C46 | C47 |
C48 | C49 | C50 | C51 | C52 | C53 | C54 | C55 |
C56 | C57 | C58 | C59 | C60 | C61 | C62 | C63
incCtr :: Ctr -> Ctr
incCtr C0 = C1
...stuff deleted...
incCtr C62 = C63
incCtr C63 = C0
seed :: Ctr -> Word32
seed C0 = 0x428a2f98
...stuff deleted...
seed C63 = 0xc67178f2
Memory Layout of SHA-256
SHA-256 is an imperative algorithm, meaning that it involves assignments to registers, loops, etc. To program in imperative style in Haskell, one normally uses state monads. This section will provide a quick refresher of the fundamentals of monadic programming in Haskell for the sake of being as self-contained as possible. For readers requiring more information, please consult, for example, Liang 1995.
There are four types of storage in the algorithm:
- Intermediate Digest: temporary storage for a digest value; of type
Oct Word32
. - Current Block: the current, formatted block; of type
Hex Word32
. - Digest: the value of the digest; of type
Oct Word32
. - Loop Counter: the current value of the loop counter register of type
Ctr
.
To create storable registers for each of these types, we create a monad M
using the state monad transformer StateT
in the code below. Notice that, for each type of register above, there is a corresponding application of the state monad transformer. This stack of StateT
applications applies to the Identity
monad, which has no registers at all.
type M =
StateT (Oct Word32) -- Intermediate Digest
(StateT (Hex Word32) -- Current Block
(StateT (Oct Word32) -- Digest
(StateT Ctr Identity))) -- Loop Counter
For each register type, there are corresponding read and write operations in M
for manipulating the register. These operations are prepended with get
and put
, respectively, and are defined below.
Paying attention to the types of the operation is the best way to understand them. For example,
to read the current digest value, one uses getDigest :: M (Oct Word32)
.
This type signifies that getDigest
is an M
computation that returns an Oct Word32
value.
To update the current value of the digest register, one uses putDigest :: Oct Word32 -> M()
.
For a given
d :: Oct Word32
, (putDigest d) :: M()
is a computation that sets the current value of the digest register to d
.
getIntDig :: M (Oct Word32) -- Intermediate Digest Operations
getIntDig = get
putIntDig :: Oct Word32 -> M ()
putIntDig = put
getBlock :: M (Hex Word32) -- Current Block Operations
getBlock = lift get
putBlock :: Hex Word32 -> M ()
putBlock = lift . put
getDigest :: M (Oct Word32) -- Digest Operations
getDigest = lift (lift get)
putDigest :: Oct Word32 -> M ()
putDigest = lift . lift . put
getCtr :: M Ctr -- Loop Counter Operations
getCtr = lift (lift (lift get))
putCtr :: Ctr -> M ()
putCtr = lift . lift . lift . put
To chain together these operations, we use Haskell’s do
notation. In the code below, first the current digest is read with getDigest
, then the current intermediate digest is read with getIntDig
, and finally they are added together component-wise and stored back in the digest register using putDigest
. This intermediate
operation is part of the Haskell reference semantics for SHA-256.
intermediate :: M ()
intermediate =
do
Oct h1 h2 h3 h4 h5 h6 h7 h8 <- getDigest
Oct a b c d e f g h <- getIntDig
putDigest (Oct (a+h1) (b+h2) (c+h3) (d+h4)
(e+h5) (f+h6) (g+h7) (h+h8))
There is one more M
operation to describe: return :: a -> M a
. This polymorphic operation takes a value v
of some type a
and produces an M
computation that simply returns v
. The return
operation produces no side effects and is, in some sense, a "do nothing" operation.
Main Function
Below is displayed the Haskell code for the main loop of the SHA-256 algorithm, which is represented by the recursive list function sha256
.
The sha256
function first initializes the digest and then passes the parsed input on to the mainloop
function. The mainloop
function is a straightforward rendering of the pseudo-code defining the main
loop of SHA-256 (see Section 6.2.2 of NIST).
The code for the message scheduling and compression routines can be found in Reference256.hs
.
sha256 :: [Hex Word32] -> M (Oct Word32)
sha256 hws = do
putDigest initialSHA256State
mainloop hws
getDigest
mainloop :: [Hex Word32] -> M ()
mainloop [] = return ()
mainloop (hw32 : hw32s) = do
hi_1 <- getDigest
putIntDig hi_1
putBlock hw32
putCtr C0
innerloop
mainloop hw32s
innerloop :: M ()
innerloop = do
ctr <- getCtr
s <- sched
compress (seed ctr) s
putCtr (incCtr ctr)
case ctr of
C63 -> intermediate
_ -> innerloop
-- SHA-256 message scheduling and compression routines
sched :: M Word32
compress :: Word32 -> Word32 -> M ()
III. The First Implementation
The reference semantics as a whole will not translate directly to hardware: it makes extensive use of list types in Haskell (i.e., list types) that are of potentially unbounded size. Hardware circuits, which require a finite storage footprint, cannot accommodate such types generally. However, large parts of the reference semantics may be re-used in a ReWire specification of a circuit that computes the sha256
function from the end of the previous section, albeit with a different calling convention to eliminate lists.
Hardware Accelerator for SHA-256 in Haskell
The code below presents the input and output signals for the hardware accelerator device; these are the Inp
and Out
types, resp.
The Init
input signal tells the accelerator to initialize the current block and begin a new hash computation. The Load
signal instructs the accelerator to load new input into the current block register. The DigestQ
signal is used to return the computed hash digest.
The DigestR
output signal returns the computed hash digest and the Nix
output signal is the default output of the accelerator.
The code below also presents the definition of the SHA-256 accelerator.
The most important thing to note is that devsha256
reuses most of the code directly from the reference semantics sha256
.
First, devsha256
outputs the Nix
signal, receives an input i
and passes it to dev
. The function dev
handles each Inp
signal as described in the previous paragraph.
data Inp = Init (Hex Word32) | Load (Hex Word32) | DigestQ
data Out = DigestR (Oct Word32) | Nix
type Device = ReacT Inp Out M ()
devsha256 :: Device
devsha256 = do
i <- signal Nix
dev i
dev :: Inp -> Device
dev (Init hw32) = do
lift (do
putDigest initialSHA256State
hi_1 <- getDigest
putIntDig hi_1
putBlock hw32
putCtr C0)
signal Nix
innerloop
dev (Load hw32) = do
lift (do
hi_1 <- getDigest
putIntDig hi_1
putBlock hw32
putCtr C0)
signal Nix
innerloop
dev DigestQ = do
h_n <- lift getDigest
i <- signal (DigestR h_n)
dev i
innerloop :: Device
innerloop = do
ctr <- lift (do
c <- getCtr
s <- sched
compress (seed c) s
putCtr (incCtr c)
return c)
i <- signal Nix
case ctr of
C63 -> do lift intermediate
dev i
_ -> innerloop
Top Level for devsha256
The code below presents the top-level functions for testing the devsha256
accelerator in Haskell. Below, we discuss the calling convention for devsha256
, which is encapsulated as the format
function. The simdev
function takes a Device
and a list of Inp
s and computes the final Out
signal produced by the device on those inputs. The code for both of the functions is found in TestHarness.hs
.
Importantly, the hardware accelerator may be tested in Haskell: using godev256
produces the same results as the refsha256
did in Section II.
godev256 :: [Char] -> Out
godev256 = runM . simdev devsha256 . format . pad
format :: [Hex Word32] -> [Inp]
simdev :: Device -> [Inp] -> M Out
To call the main function from the reference semantics, one simply applies the function sha256
to the list of input blocks.
That is, ${\mathtt{sha256~ [{\mathit{M}}^{(\mathit{1})},\ldots,{\mathit{M}}^{(\mathit{N})}]}}$ computes a digest.
The device version, devsha256
, requires that this input be formatted differently, and this is accomplished with the function, format :: [Hex Word32] -> [Inp]
.
The code for format
is defined in TestHarness.hs
, but the
action of the format
function is described below:
This calling convention is analogous to those found in publications presenting SHA-256 accelerators hand-coded in VHDL (e.g.,
Chaves06,
Algredo13,
Ting02,
McEvoy06,
Garcia14,
Sklavos05, and
Kahri15).
Note that the sixty four DigestQ
signals are used only as padding to wait for the inner loop within the device to finish. The final DigestQ
will result in the completed hash value being returned.
Hardware Accelerator for SHA-256 in ReWire
This section highlights the transformations required of the first implementation (i.e., FirstImplementation.hs
) to produce a ReWire implementation (i.e., RWFirstImplementation.hs
).
These changes proceed along the following lines:
- Recasting the Primitive Types and Operations. We have, heretofore, used the
Word32
type and related operations from the Haskell librariesData.Bit
andData.Word
. To produce a synthesizable implementation, we implement our own type,W32
, and related operations. In this part, we see how to link a ReWire type to an implementation in VHDL using the ReWire keywordnativeVhdl
. - Eliminating
type
synonyms. This mainly involves eliminating the use ofM
. - Adding a
start
symbol. This will involve theextrude
primitive used whenever devices have internal state (i.e., use the state monad transformer).
Recasting the Primitive Types and Operations
Up to this point, we have used the Word32
type provided by the Haskell libraries, Data.Bit
and Data.Word
. This has been very convenient for the rapid prototyping of the SHA-256 algorithm in Haskell, because those libraries have nice syntax for 32-bit words (e.g., 0x0
) and common operations (e.g., +
and .&.
) are already defined. In this section, we describe the conversion of the specification from Word32
to the W32
type provided in the ReWire prelude (i.e., ReWirePrelude.hs
). This conversion is necessary to transform the SHA-256 specification into proper ReWire.
Constants
With the Haskell libraries Data.Bit
and Data.Word
, one can write hexadecimal constants for 32 bit words by prepending the constant with 0x
, as in the following code we have already seen:
seed :: Ctr -> Word32
seed C0 = 0x428a2f98
seed C1 = 0x71374491
seed C2 = 0xb5c0fbcf
...
This syntax for constants is not available from the ReWire front end, and so each constant required by the SHA-256 algorithm must be defined as a value of type W32
, as in the following code (from MetaprogrammingRW.hs
).
w428a2f98 :: W32
w428a2f98 = W32 Zero One Zero Zero Zero Zero One Zero One Zero Zero Zero One Zero One Zero Zero Zero One Zero One One One One One Zero Zero One One Zero Zero Zero
While this is a bit tedious to perform these instantiations, the burden was alleviated somewhat by Haskell code to generate them — see the commented code in MetaprogrammingRW.hs
. Once all the necessary constants have been defined, the Word32
constants were replaced with their corresponding W32
constants; please note the change of type declarations as well.
seed :: Ctr -> W32
seed C0 = w428a2f98
seed C1 = w71374491
seed C2 = wb5c0fbcf
...
Native VHDL Implementations of W32
operations
The Data.Bit
and Data.Word
Haskell libraries also provided implementations for standard operations: addition (+
), bitwise and (.&.
), bitwise exclusive or (xor
), and bitwise complement (complement
) as well as rotate and shift operations.
These we replace, respectively, by w32Plus
, w32And
, w32Xor
, and w32Not
. The rotate and shift operations are replaced by operations from ReWirePrelude.hs
. So, for example, the code for the SHA-256 logical operators now becomes the following:
ch :: W32 -> W32 -> W32 -> W32
ch x y z = (x `w32And` y) `w32Xor` (w32Not x `w32And` z)
maj :: W32 -> W32 -> W32 -> W32
maj x y z = (x `w32And` y) `w32Xor` (x `w32And` z) `w32Xor` (y `w32And` z)
bigsigma0 :: W32 -> W32
bigsigma0 x = (rotateR2 x) `w32Xor` (rotateR13 x) `w32Xor` (rotateR22 x)
bigsigma1 :: W32 -> W32
bigsigma1 x = (rotateR6 x) `w32Xor` (rotateR11 x) `w32Xor` (rotateR25 x)
sigma0 :: W32 -> W32
sigma0 x = (rotateR7 x) `w32Xor` (rotateR18 x) `w32Xor` (shiftR3 x)
sigma1 :: W32 -> W32
sigma1 x = (rotateR17 x) `w32Xor` (rotateR19 x) `w32Xor` (shiftR10 x)
The question now remains as to how we implement w32Plus
, w32And
, w32Xor
, w32Not
, and incCtr
? For these operations, we choose to implement them directly in VHDL. Accomplishing this in ReWire comes in two parts: writing the type signatures of the operations using nativeVhdl
and providing VHDL implementations of the operations in a separate file.
The following shows the first step (in RWFirstImplementation.hs
):
w32Plus :: W32 -> W32 -> W32
{-# INLINE w32Plus #-}
w32Plus = nativeVhdl "w32Plus" w32Plus
w32Xor :: W32 -> W32 -> W32
{-# INLINE w32Xor #-}
w32Xor = nativeVhdl "w32Xor" w32Xor
w32And :: W32 -> W32 -> W32
{-# INLINE w32And #-}
w32And = nativeVhdl "w32And" w32And
w32Not :: W32 -> W32
{-# INLINE w32Not #-}
w32Not = nativeVhdl "w32Not" w32Not
incCtr :: Ctr -> Ctr
{-# INLINE incCtr #-}
incCtr = nativeVhdl "incCtr" incCtr
Above, the nativeVhdl
keyword tells the ReWire compiler that it will find the implementations of these operations in a separate VHDL file. The second step defines each operation, as in the following file, prims.vhd
:
Eliminating type
synonyms
There are two type
synonyms used in FirstImplementation.hs
; these are:
type M = StateT (Oct Word32)
(StateT (Hex Word32)
(StateT (Oct Word32)
(StateT Ctr Identity)))
type Dev i o = ReacT i o M
This means that, in RWFirstImplementation.hs
, each occurrence of M
is replaced by:
StT (Oct Word32) (StT (Hex Word32) (StT (Oct Word32) (StT Ctr I)))
and each occurrence of Dev Inp Out
is replaced by
ReT Inp Out (StT (Oct Word32) (StT (Hex Word32) (StT (Oct Word32) (StT Ctr I))))
There are three small syntactic changes to note. In ReWire, the built-in type constructors for the state and reactive resumption monad transformers are StT
and ReT
(instead of StateT
and ReacT
). Also, as we have seen before, the built-in contructor for the identity monad in ReWire is I
rather than Identity
. Eliminating this distinction without a difference is another example of a small tweek to the language which is on our to-do list.
Adding a start
symbol
The device specification in RWFirstImplementation.hs
is identical to that of FirstImplementation.hs
, although we affix an ` to it for ease of presentation:
devsha256' :: ReT Inp Out (StT (Oct W32) (StT (Hex W32) (StT (Oct W32) (StT Ctr I)))) ()
devsha256' = signal Nix >>= \ i -> dev i
This subsection illustrates how we produce a declaration compilable to VHDL from this.
The ReWire compiler can compile device expressions of type ReT i o I t
for appropriate input, output, and return types, i
, o
, and t
. So, clearly, some function must be applied to devsha256
to produce something of the right type for the compiler. ReWire has a built-in function for doing just that called extrude
.
The ReWire built-in function extrude
is used to provide a device defined with StT s
with an initial state of type s
. For any monad m
, it has type:
extrude :: ReT i o (StT s m) a -> s -> ReT i o m (a,s)
In a call (extrude dev s0)
, extrude
, in effect, initializes the s
-state to s0
in device dev
.
By using multiple applications of extrude
, we can initialize all the states in a device specification. For devsha256'
, this will require four applications of extrude
. Therefore, we can define the start
symbol as:
start :: ReT Inp Out I (((((), Oct W32), Hex W32), Oct W32), Ctr)
start = extrude
(extrude
(extrude
(extrude
devsha256'
(Oct w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000))
(Hex w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000))
(Oct w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000 w00000000))
C0
In the above definition, we have initialize the intermediate digest register, the current block register, the digest register, and the counter register to appropriate forms of 0. Observe that start
now possesses a type that the ReWire compiler can translate to VHDL.
IV. Second Implementation: Refactoring for Implementation
Recall that the inputs to the devsha256
are 512 bits wide—i.e., a Hex Word32
value requires 512 bits ($= 16 \times 32)$ to represent. Similarly, the Oct Word32
outputs of devsha256
require 256 ($= 8 \times 32$) bits to represent.
Consequently, we would need an FPGA with at least 768 (= 512+256) I/O pins in order to synthesize devsha256 directly; this is an unrealistically large number.
To bring the pin count down, we can refactor the devsha256 specification to marshal and unmarshal the input and output 64 bits at a time. If we want to target a smaller FPGA, then we can refactor the devsha256'
specification to marshal and unmarshal the input and output to and from the board, thereby cutting the number of required I/O pins.
Marshaling and Unmarshaling
The code below displays the new definitions of the Inp
and Out
data types to accomplish the marshaling and unmarshaling of data on and off the targeted board. Each load signal (i.e., either Init
or Load0
through Load7
) contains two 32 bit words, so that words are loaded two at a time onto the targeted board. Each output digest signal (i.e., DigestR
) off loads two 32 bit words per cycle, so that it takes four digest request signals (i.e., DigestQ0
through DigestQ3
) to output one digest. The calling convention for the new marshaling accelerator device changes as well, and it is defined similarly to the format
function from the previous section (and can be found in TestHarness.hs
). The code for the refactored device is found in RWSecondImplementation.hs
.
data Inp = Init Word32 Word32 | Load0 Word32 Word32 |
Load1 Word32 Word32 | Load2 Word32 Word32 |
Load3 Word32 Word32 | Load4 Word32 Word32 |
Load5 Word32 Word32 | Load6 Word32 Word32 |
Load7 Word32 Word32 |
DigestQ0 | DigestQ1 | DigestQ2 | DigestQ3
data Out = DigestR Word32 Word32 | Nix
The following is the new implementation, devsha256''
that marshals and unmarshals data blocks off of the circuit. The loading and digest functions (i.e., load0
, digest0
, etc.) can be found in the ReWire specification RWSecondImplementation.hs
.
devsha256'' :: ReT Inp Out (StT (Oct W32) (StT (Hex W32) (StT (Oct W32) (StT Ctr I)))) ()
devsha256'' = do i <- signal Nix
dev i
dev :: Inp -> ReT Inp Out (StT (Oct W32) (StT (Hex W32) (StT (Oct W32) (StT Ctr I)))) ()
dev (Init w1 w2) = do
lift (do
putDigest initialSHA256State
hw <- getBlock
putBlock (load0 w1 w2 hw))
i <- signal Nix
dev i
dev (Load0 w1 w2) = do
lift (do
hw <- getBlock
putBlock (load0 w1 w2 hw))
i <- signal Nix
dev i
...stuff deleted...
dev (Load7 w1 w2) = do
lift (do
putCtr C0
hi_1 <- getDigest
putIntDig hi_1
hw <- getBlock
putBlock (load7 w1 w2 hw))
signal Nix
innerloop
dev DigestQ0 = do
h_n <- lift getDigest
i <- signal (digest0 h_n)
dev i
...stuff deleted...
dev DigestQ3 = do
h_n <- lift getDigest
i <- signal (digest3 h_n)
dev i
Synthesis
The devsha256''
accelerator has been tested in Haskell to show that it computes the same hashes as the reference semantics.
Using the ReWire compiler, we can produce synthesizable VHDL and synthesize to a FPGA target — here we target the Xilinx Spartan-3E XC3S500E-4FG320 featured on the Xilinx Spartan-3E Starter Kit.
Note that this board possesses sufficient I/O pins to accommodate devsha256''
.
The table below presents the synthesis numbers for devsha256''
for the Xilinx ISE toolset targeting the Spartan-3E. The maximum clock rate was 60 MHz.
There are 76 cycles per output, and so the total throughput for devsha256''
is calculated as:
This throughput for the devsha256''
accelerator is on the low end of the performance spectrum of the hand-crafted VHDL implementations reported in Chaves06 — higher than Sklavos and Koufopavlou Sklavos05 (376 Mbps) and lower than Chaves et al. (1370 Mbps).
The aforementioned prior works are not targeting the same FPGA, and so the above is something of a sort of "apples-to-oranges" comparison.
The performance numbers should be taken merely as indicative that performance is reasonable for a rapidly developed prototype.