
FPGA accelerated search of CBG patterns in DNA strings

Giuseppe Li

4th Year Project Report
Artificial Intelligence and Computer Science

School of Informatics
University of Edinburgh

2018


Abstract

Advances in bioinformatics, such as the completion of the Human Genome Project, have increased the amount of data biologists have to process in their research. DNA pattern matching, the problem of finding a certain pattern within a long sequence of DNA, is a costly operation frequently required in research, especially when matching is approximate. This project proposes a programmable logic design that efficiently performs approximate pattern matching of CBG patterns. The strength of the approach relies on the reconfigurability of an FPGA, which allows the creation of circuits tailored to the specific search operation, and on the level of parallelism achievable by the fabric of logic blocks. The theoretical results are promising and show the potential of the approach.


Acknowledgements

My first thanks go to Nigel Topham. His course on Computer Design has been a major factor in inspiring me to expand my knowledge of computer system architectures, an interest I hope to continue in my further studies and career. Ultimately this led to my decision to undertake a project involving FPGAs, and I am truly honoured to have Prof. Topham as supervisor of this project.

I want to thank my classmates and friends for supporting me in the past four years of my life. I find it amazing how fast time has passed, four years in which I have made so many memories. As most of us will be leaving Edinburgh, I wish everyone the best for the future and hope we will come together one day to recall what were probably the best years of our lives.

I am thankful to the University for providing me with an education and with opportunities such as attending Robocup 2015 as part of Edinferno (Institute of Perception, Action and Behaviour).

Finally I would like to dedicate my work to my grandfather Fulai, who has suffered from irreversible paraplegia since January 2018, and my grandmother Ruyao, who sustained a severe head injury in December 2017. They were the ones who raised me, and they always encouraged me to do my best in my studies. Since my childhood my grandfather has been telling me he wanted to see me at a graduation ceremony before passing away, and this means a lot to me. I wish you both could come to Scotland this summer to see me. I hope I make you proud.

I truly appreciate the support of my parents and family in these hard times, who made so many sacrifices in order for me to continue my fourth year at university.


Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Giuseppe Li)

Table of Contents

1 Introduction

2 Background
2.1 Bioinformatics
2.2 Approximate pattern matching
2.3 Character Classes and Bounded Gaps Pattern Matching
2.4 Field Programmable Gate Array
2.5 Previous work

3 Platform design
3.1 FPGA modules design
3.1.1 Comparison Logic Unit module
3.1.2 CLU complex module
3.1.3 HDL generation
3.2 Support applications
3.2.1 DNA string encoding
3.2.2 Results decoding
3.3 Hardware platform
3.3.1 Zynq-7000 ARM/FPGA SoC board
3.4 Standalone operation
3.4.1 Challenges

4 Proof of concept implementation
4.1 Software tools
4.2 Hardware platform
4.2.1 Pmod SSD
4.3 FPGA modules design
4.3.1 CLU module
4.3.2 CLU complex module
4.3.3 String buffers module
4.3.4 SSD driver module
4.3.5 SSD wrapper module
4.3.6 Top module
4.3.7 SystemVerilog
4.3.8 Verification
4.4 Support applications
4.4.1 DNA string encoder string_to_bin.py
4.4.2 Results decoder decoder.py
4.4.3 Random DNA string generator rand_dna_gen.py
4.4.4 System deployment make_sys.py
4.4.5 Vivado TCL script setup.tcl
4.5 User guide
4.5.1 Setup
4.5.2 Execution

5 Experimental methodology
5.1 Baseline
5.1.1 Regex implementation regex.py
5.1.2 CBG pattern search implementation search.py
5.1.3 Computer system specifications
5.2 Simulated testbench implementation
5.2.1 Testbench runtime estimation
5.3 Runtime benchmark scenarios

6 Results
6.1 Runtime benchmarks
6.2 Proof of concept resource utilisation
6.2.1 CLB utilisation
6.2.2 Timing
6.2.3 Power usage

7 Discussion
7.1 Zynq AP SoC PS-PL implementation
7.2 Proof of concept limitations
7.3 Vivado Simulation tool
7.4 Vivado 2015.2 instabilities
7.5 Design adaptability and scalability

8 Conclusion

Bibliography

Chapter 1

Introduction

This project includes the following contributions:

• The automated generation of a customised, task-specific combinational circuit which performs multiple CBG pattern matching operations on multiple sections of the searched string concurrently.

• An overview of a potential system design developed for the Zynq AP SoC. The system contains the aforementioned combinational circuit and is designed to provide a streamlined workflow to the user.

• An implementation of a system on the Zynq’s Programmable Logic sub-architecture that features the aforementioned circuit and can be deployed by a "one-click" solution using Python scripts.

• A performance analysis of a theoretical testbench featuring the circuit design, compared against the benchmarked performance of some equivalent software solutions.

• An analysis of the FPGA resource utilisation of a large system deployed using the "one-click" solution.


Chapter 2

Background

2.1 Bioinformatics

Since the early 1970s [1], bioinformatics has aided many aspects of biological research involving genomics and proteomics, the large-scale study of genomes and proteins respectively. Researchers use pattern matching to determine the presence of a particular sequence, such as a gene or a protein, within a usually long string, such as a genome or a proteome. Further applications of pattern matching in biology include correlating DNA or RNA sequences with proteins and bioengineering [2]. Researchers tend to avoid exact pattern matching in bioinformatics, as DNA sequences and proteins diverge through mutation or evolution [3].

The project focuses on approximate matching between DNA strings. Deoxyribonucleic acid (DNA) is a chain of organic molecules called nucleotides. DNA stores biological information that allows the production of proteins, amongst other vital functions. A gene is a sequence of DNA which codes for a molecule that has a function. Nucleotides are usually classified by a substructure called a nucleobase. In DNA there are four kinds of nucleobases: adenine (A), cytosine (C), guanine (G), and thymine (T). Hence a DNA sequence is described by a string of characters classifying its nucleotides in order. Figure 2.1 provides an example of such a sequence.

ATGACGTGGGGA

Figure 2.1: Example of a DNA polynucleotide

DNA strings feature single-nucleotide polymorphisms (SNPs) [4]. These are variations of a single nucleotide that occur at a specific position in the genome. The design of an approximate matching algorithm should account for the presence of SNPs.

2.2 Approximate pattern matching

The problem of approximate matching can be defined in multiple ways. The simplest measure for comparing two strings is their Hamming distance, which is the number of differing characters without insertions or deletions (indels). There are also algorithms that use dynamic programming (DP) techniques to account for indels [5]. DP is a method for solving complex problems by splitting the task into smaller sub-problems and computing the final result from a combination of their solutions. Over the past 40 years many DP approximate matching (AM) algorithms have been studied and developed for use in bioinformatics [6]. The Needleman-Wunsch (NW) algorithm and its variations, such as the Smith-Waterman (SW) technique for local alignment, present two strings as a 2D array indexed by the character positions. For each element in the array a score is computed which heuristically determines the goodness of a match between two characters, one from each string. The solution aims to return an optimal path across the array that minimises or maximises the overall score. Figure 2.2 shows the comparison of two DNA sequences and the solution obtained by the NW algorithm.

GCGATcT-

GC-ATtTA

Figure 2.2: Example of sequence alignment and respective NW solution. Figure taken from [6].
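The two measures described above can be sketched in a few lines of Python. This is a minimal illustration only: the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions chosen for the example and are not taken from [5] or [6], and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score in the Needleman-Wunsch style.

    The 2D score array is filled row by row; only the previous row is kept.
    The scoring values are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]   # first row: only gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # first column: only gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

The Hamming distance only compares characters at equal positions, whereas the DP score can absorb an indel at the cost of one gap penalty, which is why the latter suits sequences of different length.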

DP AM solutions have proved effective in sequence alignment for bioinformatics [7]. Parallel computing has often been exploited to achieve greater performance. Researchers at the University of Edinburgh in 1987 [8] implemented a string matching algorithm based on SW on the I.C.L. DAP, a large 64 × 64 processor-array machine. Similarly, experiments with the SW technique conducted by researchers at Boston University [6] reported speed-ups from 180 to 500 times on a Xilinx XC2VP50-5 FPGA over a high-end PC from 2004 across various settings and applications.

Alternative approximate matching algorithms have been developed that are capable of optimising the search within longer strings [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. These methods are not accelerated by parallel computing but achieve performance improvements, usually via heuristic filtering during runtime and/or data preprocessing.


2.3 Character Classes and Bounded Gaps Pattern Matching

In this project I approach the task of approximate matching by representing patterns as Classes of characters and Bounded size Gaps (CBGs). Figure 2.3 shows an example of a protein CBG pattern. Square brackets [ ] indicate a character corresponding to any of the included letters, and X(2,3) a gap of length between 2 and 3.

[RK]--X(2,3)--[DE]--X(2,3)--Y

Figure 2.3: Example of a CBG pattern of a PROSITE protein site (taken from [19, 20])

The CBG pattern given in the example would correspond to the following regular expression: (R|K) · X · X · (X|ε) · (D|E) · X · X · (X|ε) · Y.
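As an illustration of this correspondence, the following sketch translates a CBG pattern written in the notation of Figure 2.3 into the syntax of Python's `re` module. The function name `cbg_to_regex` is hypothetical, the code assumes tokens are separated by `--`, and it is written for Python 3 (it uses `re.fullmatch`).

```python
import re


def cbg_to_regex(cbg):
    """Translate a CBG pattern such as '[RK]--X(2,3)--Y' into a regex string."""
    parts = []
    for token in cbg.split("--"):
        gap = re.fullmatch(r"X\((\d+)(?:,(\d+))?\)", token)
        if gap:
            lo = gap.group(1)
            hi = gap.group(2) or lo      # X(n) is a gap of exactly n characters
            parts.append(".{%s,%s}" % (lo, hi))
        else:
            parts.append(token)          # a literal character or a [class]
    return "".join(parts)


regex = cbg_to_regex("[RK]--X(2,3)--[DE]--X(2,3)--Y")
print(regex)
```

A gap X(2,3) becomes the bounded repetition `.{2,3}`, which is the compact form of X · X · (X|ε).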

Navarro in [19] introduces CBG searching algorithms as methods to efficiently search protein patterns. The paper proposes CBGs as a more appropriate representation for PROSITE protein patterns than regular expressions (REs). CBG is a less expressive representation than REs, as it contains only a limited subset of the RE operators. According to Navarro [19], this allows CBG matching algorithms to be much simpler and faster than general RE search techniques.

2.4 Field Programmable Gate Array

Field Programmable Gate Arrays (FPGAs) are a form of programmable logic chip. An FPGA features an integrated circuit designed to be configured after manufacturing.

The architecture of an FPGA consists of an array of Configurable Logic Blocks (CLBs) and an interconnect infrastructure. CLBs contain a small set of Look Up Tables (LUTs) and D-type flip-flop memories. A LUT is an efficient way to encode a Boolean logic function: it stores the output of the function for every combination of its inputs. Multiple CLBs can hence be interconnected to perform complex combinational functions.
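The idea of a LUT can be modelled in software as a table indexed by the input bits. The sketch below is a conceptual illustration, not a description of the Xilinx hardware; the helper names `make_lut` and `lut_read` are hypothetical.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Tabulate func over every combination of n_inputs bits (2**n entries)."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]


def lut_read(lut, *bits):
    """Return the stored output, using the input bits as the table address."""
    address = 0
    for bit in bits:                 # first input is the most significant bit
        address = (address << 1) | bit
    return lut[address]


# Example: a 4-input LUT deciding whether two 2-bit values are equal.
equal2 = make_lut(lambda a1, a0, b1, b0: a1 == b1 and a0 == b0, 4)
print(equal2)
```

Once the table is filled, evaluating the function costs a single lookup regardless of how complicated the original expression was.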

The behaviour of an FPGA is defined by the user using a Hardware Description Language (HDL). This provides a high level description of the structure and behaviour of an electronic circuit. The HDL description is then synthesised to produce a netlist, a low level specification of electronic components and their connections. Similarly to a compiler for software programming languages, synthesis may perform optimisation routines over the HDL description that preserve the behaviour defined by the user. The netlist can then be implemented on the specific FPGA via place-and-route software which, as the name suggests, selects, programmes and connects the CLBs according to the requested configuration and the optimal routing for the interconnect.

The level of parallelism achievable via the large number of CLBs, together with their reprogrammability, are the properties that make an FPGA well suited to the acceleration of CBG pattern matching in my project.


2.5 Previous work

This project uses several concepts introduced by A. Lipson and S. Hazelhurst in [20]. The problem being addressed by my project is defined by Lipson and Hazelhurst as (Definition 1 of [20]):

Given a sequence of characters $s = s_0 s_1 s_2 \ldots s_{n-1}$ to be searched for a pattern, a pattern $p = p_0 p_1 \ldots p_{m-1}$ (where $m \ll n$) and a gap parameter $g$, $p$ matches $s$ at position $k$ if:

• For $i = 0, \ldots, m-1$ there exists $j_i$ such that $p_i = s_{j_i}$;

• $k = j_0 < j_1 < \ldots < j_{m-1} \le k + (m + g)$.

The problem is to find the positions where the pattern matches the sequence.

The solution proposed in [20], and one of its key contributions, is the generation of a combinational circuit $C_{m,g}$ with a high degree of parallelism to do the matching. The circuit evaluates, in fact, all possible CBG patterns given m and g. For example, figure 2.4 includes all possible CBG patterns for a circuit given m = 3 and g = 1.

[p0]--[p1]--[p2]--X(1)
[p0]--[p1]--X(1)--[p2]
[p0]--X(1)--[p1]--[p2]
X(1)--[p0]--[p1]--[p2]

Figure 2.4: Possible CBG patterns for $C_{3,1}$

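Given the pattern and the sequence, the CBG patterns can be used to construct a boolean expression. Figure 2.5 represents the DNF expression for the example presented earlier.

$$
\begin{aligned}
       & (b_0 = p_0 \land b_1 = p_1 \land b_2 = p_2) \\
\lor\; & (b_0 = p_0 \land b_1 = p_1 \land b_3 = p_2) \\
\lor\; & (b_0 = p_0 \land b_2 = p_1 \land b_3 = p_2) \\
\lor\; & (b_1 = p_0 \land b_2 = p_1 \land b_3 = p_2)
\end{aligned}
$$

Figure 2.5: DNF expression for $C_{3,1}$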

The authors create an algorithm that generates Binary Decision Diagrams (BDDs) [21] representing the expression. The algorithm is implemented in the functional language FL and relies on recursion. FL is suited for generating BDDs as, internally, FL represents all boolean expressions as BDDs [21].

The generated BDD can then be converted into a VHDL description, which is then given as input to the FPGA design tools.

Their proposed solution works as follows [20]:

• Produce a circuit $C_{p,g}$ that has a buffer of size m+g characters. ... The strength of our approach is that for each pattern being searched, we build a tailor-made circuit for the matching.

• Feed the first m+g characters into the circuit.

• Repeat the following on each clock cycle: The circuit detects whether the pattern (with possible g gaps) exists in the buffer and outputs an answer. The next character from the sequence is fed into the circuit.
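The quoted steps can be modelled in software as a sliding buffer that advances by one character per cycle. The following is a behavioural sketch only, not the circuit of [20]: the names `in_buffer` and `stream_search` are hypothetical, and the check that the pattern "exists in the buffer" is assumed to mean that the pattern appears in order within the m+g buffered characters, with the remaining g characters treated as gaps.

```python
from collections import deque


def in_buffer(pattern, window):
    """True if pattern appears in window, in order, allowing skipped characters."""
    chars = iter(window)
    return all(c in chars for c in pattern)   # each 'in' consumes the iterator


def stream_search(s, pattern, g):
    """Return the buffer start positions at which the pattern is detected."""
    size = len(pattern) + g
    buf = deque(s[:size], maxlen=size)        # feed the first m+g characters
    hits = []
    for k in range(len(s) - size + 1):        # one iteration per clock cycle
        if in_buffer(pattern, buf):
            hits.append(k)
        if k + size < len(s):
            buf.append(s[k + size])           # feed the next character
    return hits


print(stream_search("ATGACGTGGGGA", "ACG", 1))
```

Because a gap may precede the first pattern character inside the buffer, a single occurrence of the pattern can be reported at more than one buffer position.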

Lipson and Hazelhurst propose an extension to the BDD generating algorithm which creates circuits allowing parallel search of r positions for a higher degree of parallelism. Their experiments, however, showed that the BDD size grows superlinearly with respect to the repeat factor r.

The paper includes the recursive method in FL used to generate BDDs for a basic circuit (when the repeat factor r is 1). It does not include any other code or pseudo-code.

In this project, the following issues listed by the authors in their conclusion are pursued:

• Improving the techniques for generating VHDL code from the BDDs. The technique we used was very simplistic, and so we believe improvements can be made by using more intelligent approaches, and a combination of approaches.

• Exploiting higher levels of parallelism.

Chapter 3

Platform design

As mentioned in Chapter 1, the goal of the project is to develop a system capable of accelerating the search of CBG patterns in a long DNA string. Specifically, the solution in this project aims to provide an easy and streamlined workflow for a user who may not necessarily be familiar with FPGAs or computer programming. The design should also be modular and scalable where possible.

3.1 FPGA modules design

The proposed solution is a combinational circuit with a greater degree of parallelism than the solution described in [20] by Lipson and Hazelhurst. The key differences and contributions are:

• Comparison Logic Unit (CLU): A CLU is a module representing a circuit analogous to the solution proposed by [20]. While representing equivalent boolean expressions, the HDL description of a CLU is derived from a Disjunctive Normal Form (DNF) expression rather than from BDDs.

• CLU complex: the module contains multiple independent and identical CLUs. This allows multiple positions to be evaluated for matches within the same comparison cycle. The analogous solution proposed in [20], which involved the expansion of the BDD circuit to accommodate r positions for concurrent evaluation, was shown to grow superlinearly in terms of BDD size. The solution proposed in this project features linear size growth with respect to the repeat factor r.

Given a string s, a pattern p of length m and an integer gap parameter g, the proposed solution works as follows:

• The system produces r Comparison Logic Units (CLUs) $C_{m,g}$, each featuring an input of size m+g characters. The repeat factor r may be limited by the available resources and specification of the FPGA.

• Feed the first m+g+(r−1) characters into the string buffer. From the buffer, each CLU compares one of r overlapping sequences of characters of length m+g.

• On each clock cycle, feed the next r characters into the string buffer.
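The steps above can be sketched as a behavioural model in Python. This is an illustration under stated assumptions, not the hardware implementation: the names `clu` and `clu_complex_search` are hypothetical, CLU y is assumed to read the segment starting y characters into the buffer, and segments that run past the end of the string are assumed to report no match.

```python
def clu(pattern, segment):
    """Model of one CLU: pattern found in order within the m+g segment."""
    chars = iter(segment)
    return all(c in chars for c in pattern)


def clu_complex_search(s, pattern, g, r):
    """Return, for each comparison cycle, the r-wide list of CLU outputs."""
    width = len(pattern) + g
    cycles = []
    # The buffer advances by r characters per cycle.
    for base in range(0, max(len(s) - width + 1, 0), r):
        buf = s[base:base + width + r - 1]       # m + g + (r - 1) characters
        outs = []
        for y in range(r):                       # r overlapping segments
            seg = buf[y:y + width]
            outs.append(len(seg) == width and clu(pattern, seg))
        cycles.append(outs)
    return cycles


print(clu_complex_search("ATGACGTGGGGA", "ACG", 1, 3))
```

With r = 3 the twelve-character example string is covered in three cycles instead of the nine needed when a single window advances one character at a time.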


3.1.1 Comparison Logic Unit module

The CLU $C_{m,g}$ is a combinational circuit evaluating concurrently all possible CBG patterns on a segment of the searched string for the given pattern and gap parameter.

• MODULE PARAMETERS:

– m: Length of the pattern string.

– g: Search gap parameter.

• INPUTS:

– STR: A segment of the searched string of length m+g.

– PAT: The pattern string.

• OUTPUTS:

– OUT: One bit wire representing any match of PAT in STR.

The algorithm for constructing such a circuit takes a different approach from the recursive method described in [20]. The problem of generating all possible CBG patterns can be rephrased as: given a pattern of length m, in how many ways is it possible to insert g gaps into the pattern?

Equivalently, the same problem can be approached as: given a string of length m+g, in how many ways is it possible to choose g indices in the string to be assigned as gaps?

The algorithm uses this last formulation to generate the CBG patterns. The solution can be summarised as follows:

• Given a list of indices for a string of length m+g, construct a list of combinations containing each of the $\binom{m+g}{g}$ possible ways of choosing g indices from the string.

• For each combination in the list, a substring is constructed by removing from the string the characters at the chosen indices. A CBG pattern match occurs when the substring equals the pattern string. This is represented as a conjunctive clause of comparisons between the characters of the substring and the respective characters of the pattern.

• The CLU output is represented by the disjunction of all clauses constructed in the previous step. Therefore, the circuit output is evaluated by a boolean expression in DNF.
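The steps above can be sketched with `itertools.combinations`. This is a minimal software illustration of the clause construction, not the project's generator script: the function names and the representation of a clause as a list of (string index, pattern index) pairs are assumptions made for the example.

```python
from itertools import combinations


def cbg_clauses(m, g):
    """One clause per way of choosing g gap indices among m + g positions.

    Each clause is a list of (string_index, pattern_index) pairs whose
    characters must all be equal for the clause to hold.
    """
    clauses = []
    for gaps in combinations(range(m + g), g):
        kept = [i for i in range(m + g) if i not in gaps]   # remove the gaps
        clauses.append(list(zip(kept, range(m))))
    return clauses


def clu_output(segment, pattern, g):
    """Evaluate the DNF: the disjunction of all conjunctive clauses."""
    return any(all(segment[i] == pattern[j] for i, j in clause)
               for clause in cbg_clauses(len(pattern), g))


for clause in cbg_clauses(3, 1):
    print(clause)
```

For m = 3 and g = 1 the four clauses pair string indices with pattern indices exactly as in the DNF expression of Figure 2.5.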

A design strength of [20] is the compactness of the BDD representation. BDDs take advantage of the fact that many of the CBG patterns share significant structure.

While DNF may not be the most efficient representation of a CLU, many HDL synthesis tools, including Vivado Design Suite [22], perform hierarchical optimisation during LUT mapping, resulting in a more efficient use of CLBs in the FPGA fabric. The hierarchical structure of a CLU netlist can be observed in figure 3.1. The circuits synthesised from a boolean expression represented as a BDD or in DNF should hence be equivalent.

Figure 3.1: Schematic of a netlist of a CLU constructed by parameters m = 3 and g = 1.

3.1.2 CLU complex module

The CLU complex is a combinational circuit consisting of multiple CLUs, each performing concurrent evaluations of different overlapping segments of the searched string.




• MODULE PARAMETERS:

– m: Length of the pattern string.

– g: Search gap parameter.

– r: Repeat factor (number of instantiated CLUs).

• INPUTS:

– STR: A segment of the searched string of length m+g+r−1.

– PAT: The pattern string.


• OUTPUTS:

– OUT: A r wide bus representing the outputs of each CLU.

The module instantiates multiple CLUs, assigning each of them a segment of length m+g of STR as input and a wire of OUT as output. Figure 3.2 represents a CLU complex constructed by parameters m = 3, g = 1 and r = 3.

Compared with the repeat factor approach of [20], a CLU complex is linearly scalable, allowing better resource utilisation on an FPGA. Given the size of modern FPGAs, the CLU complex solution can potentially provide much greater parallelism than the circuit designed in [20].

3.1.3 HDL generation

The design of the CLU and CLU complex modules is tailored to the parameters given for the search task. For large values of m and g, writing the entire DNF expression for $C_{m,g}$ by hand can be tedious. For example, the DNF representing $C_{10,5}$ contains $\binom{10+5}{5} = 3003$ conjunctive clauses, each containing $10 \times 2$ literals.


Figure 3.2: Schematic of a netlist of a CLU complex constructed by parameters m = 3, g = 1 and r = 3.

Hence the solution proposed in this project aims to automate the process of generating the FPGA modules.
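To illustrate what such automation could look like, the sketch below emits SystemVerilog-style text for a CLU from the parameters m and g. It is a hypothetical example and not the project's generator: the function name `clu_sv`, the port widths, the 2-bit character width and the layout in which character i occupies bits `[2*i +: 2]` of the bus are all assumptions made for the illustration.

```python
from itertools import combinations


def clu_sv(m, g, bits=2):
    """Return SystemVerilog-style source text for a CLU with a DNF output."""
    n = m + g
    terms = []
    for gaps in combinations(range(n), g):
        kept = [i for i in range(n) if i not in gaps]
        # One comparison per pattern character; slices select one character.
        eqs = ["(STR[%d +: %d] == PAT[%d +: %d])" % (bits * i, bits, bits * j, bits)
               for j, i in enumerate(kept)]
        terms.append("(" + " && ".join(eqs) + ")")
    body = "\n        || ".join(terms)
    return (
        "module clu (\n"
        "    input  logic [%d:0] STR,\n"
        "    input  logic [%d:0] PAT,\n"
        "    output logic OUT\n"
        ");\n"
        "    assign OUT = %s;\n"
        "endmodule\n" % (bits * n - 1, bits * m - 1, body)
    )


print(clu_sv(3, 1))
```

Generating the text programmatically keeps the effort constant as m and g grow, whereas the number of clauses to write by hand grows with the binomial coefficient.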

3.2 Support applications

3.2.1 DNA string encoding

In order to utilise the FPGA resources more efficiently, the sequences of DNA are encoded to an alternative alphabet. Given that the original character set contains 4 elements, the DNA sequence can be defined by a string of 2-bit characters. Hence we encode:

• A→ 00

• C→ 01

• G→ 10

• T → 11
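The mapping above can be applied with a dictionary lookup. This is a minimal sketch: the function name `encode` and the choice of returning a string of '0' and '1' characters are assumptions for illustration, and the output format of the project's own encoder script may differ.

```python
# 2-bit code for each nucleobase, as listed above.
ENCODING = {"A": "00", "C": "01", "G": "10", "T": "11"}


def encode(dna):
    """Encode a DNA string as a string of bits, two bits per nucleobase."""
    return "".join(ENCODING[base] for base in dna.upper())


print(encode("ACGT"))
```

Any character outside the four nucleobases raises a `KeyError`, which makes malformed input visible rather than silently encoding it.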


3.2.2 Results decoding

The output of the CLU complex shall be stored at each comparison cycle. The results only indicate that at comparison cycle x, the y-th CLU indicated a match. To allow the user to obtain the results in a readable format, a results decoding tool is implemented.

The programme shall return to the user each position k in the searched string where the pattern was matched, given the cycle number x and the respective state of the CLU complex output.
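A minimal sketch of this decoding step is shown below. It assumes zero-based numbering of cycles and CLUs, that the string buffer advances by r characters per cycle, and that CLU y reads the segment starting y characters into the buffer, so that k = x·r + y. The function name `decode` and the list-of-lists input format are hypothetical.

```python
def decode(cycle_outputs, r):
    """Convert per-cycle CLU outputs into positions in the searched string.

    cycle_outputs[x][y] is truthy when CLU y reported a match in cycle x.
    """
    positions = []
    for x, outs in enumerate(cycle_outputs):
        for y, hit in enumerate(outs):
            if hit:
                positions.append(x * r + y)   # k = x * r + y
    return positions


print(decode([[0, 0, 1], [1, 0, 0], [0, 0, 0]], r=3))
```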

3.3 Hardware platform

For this project I was provided with a Zybo (Zynq™ Board) manufactured by Digilent Inc.

Figure 3.3: Digilent ZYBO [23].

3.3.1 Zynq-7000 ARM/FPGA SoC board

The Zybo features a Xilinx Z-7010 based on the Xilinx® All Programmable System-on-Chip architecture (AP SoC), which incorporates a dual-core ARM Cortex-A9 processor with a Xilinx 7-series Artix equivalent FPGA [24]. The main features of the Zybo relevant to this project are:

• 650 MHz dual-core Cortex-A9 processor

• 512 MB of DDR3 memory with 1050 Mbps bandwidth

• High-bandwidth SDIO controller


• Reprogrammable logic equivalent to Artix-7 FPGA

– 4,400 logic slices, each with four 6-input LUTs and 8 flip-flops

– 240 KB of fast block RAM

– Two clock management tiles, each with a phase-locked loop (PLL) and mixed-mode clock manager (MMCM)

– Internal clock speeds exceeding 450 MHz

The Zynq AP SoC is divided into two distinct subsystems: the Processing System (PS) and Programmable Logic (PL), as shown in figure 3.4.

Figure 3.4: Zynq AP SoC architecture [24].

The PS interfaces with the PL via the Advanced Microcontroller Bus Architecture (AMBA) Interconnect. The PL can be implemented to trigger interrupts to the processor and perform DMA accesses to the onboard DDR3 memory [24].

Unlike standalone FPGAs, the PL, which contains a Xilinx 7-series Artix FPGA, must be configured by the PS processor or via the JTAG port upon system boot. One of the relevant features in the project design is the capability to load the PL bitstream from a Zynq Boot Image on an SD card.


3.4 Standalone operation

The following design was not implemented, for reasons explained in Chapter 7. Hence, this is an overview of a system that was originally envisioned to be implemented on a ZYBO. This design takes advantage of the advanced features provided by the SoC.

The standalone design would allow the ZYBO to perform the search task without user intervention during runtime and, given a 5 V power source, independently of a computer.

The following is an outline of the operations performed by the PS and PL during runtime.

• PS tasks:

– Handles the configuration of the PL upon boot.

– Loads the encoded string and pattern files from the SD card into the DDR RAM. These memory locations will be directly accessed by the PL.

– Generates a "PS ready" signal destined for the PL. This is required in order to keep track of the section of the string currently being matched. The signal is issued each time the string position counter (SPC) increases.

– Takes a bus (or a representation of a bus) from the PL as input, in which each wire indicates the match state of one CLU in the PL. If any of the wires is high, the PS stores the current SPC value and the corresponding CLU indices in a linked list in system memory (on-chip or DDR).

– At the completion of the string matching, saves the results from memory to a file on the SD card.

• PL tasks:

– Contains the CLU complex circuitry.

– Contains two buffers: a string buffer (SB), which handles the stream of the string, and a static buffer containing the pattern.

– Issues requests to the Direct Memory Access (DMA) controller of the DDR memory for loading the buffers.

– At each "PS ready" signal, loads r characters from the onboard RAM into SB.

3.4.1 Challenges

Listed here are some of the initial challenges that need to be addressed in order to implement the standalone design described above.


3.4.1.1 PS programming

Using the Xilinx Software Development Kit (SDK), as described by the Zynq Software Developers Guide [25], programmes for the PS subsystem can be developed either as bare-metal applications that do not require an OS, or as user applications for the open source Linux OS.

Executing a full OS on the PS would incur significant performance overhead during the booting sequence and at runtime. Advantages of using Linux include the support of all peripherals in the PS (including the PL and the SD card interface) through the Linux kernel, and advanced process management features. Bare-metal applications, implemented in C/C++, interface with peripherals via Xilinx proprietary libraries [26].

3.4.1.2 SD card boot mode

The Xilinx Wiki website provides an article on boot image preparation [27] using the Xilinx SDK for both standalone (bare-metal) and Linux applications. In SD card boot mode, the PL can be configured according to a bitstream file contained on the SD card.

3.4.1.3 PL Direct Memory Access

Chapter 9 of the Zynq Technical Reference Manual [28] provides a description of the DMA controller, accessible via the Advanced eXtensible Interface (AXI) of the AMBA interconnect. An article on the FPGAdeveloper.com website [29] provides a PL DMA tutorial.

3.4.1.4 PS-PL AXI interface

Chapter 2 of the Zynq Technical Reference Manual [28] provides a description of the interfaces and signals that can be deployed between the PS and PL.

Chapter 4

Proof of concept implementation

Owing to the complexity of the standalone design discussed in Chapter 3, the FPGA implementation of this project is limited to the usage of the Zynq’s PL architecture.

This implementation should therefore be considered a "proof of concept", as it does not provide advantages in terms of runtime performance compared to a baseline software solution.

Instead, the objectives of this implementation are:

• To demonstrate the correctness of the CLU and CLU complex behaviour on a physical implementation.

• To demonstrate the correctness of the software applications complementing the system.

• To provide a streamlined and intuitive user interface for deploying the solution.

The implementation does not include work of others unless otherwise stated.

4.1 Software tools

The software required for the implementation is:

• Vivado 2015.2 (Linux) [30]. The directory of the installation must be included in the $PATH variable of the BASH environment.

• Python 2.7 and standard libraries.

4.2 Hardware platform

The system used for development of this implementation is the Zybo [24]. The implementation utilises the programmable logic section of the Zynq AP SoC.

4.2.1 Pmod SSD

The concept uses two Digilent Pmod Seven Segment Displays (SSDs) [31]. The SSDs are used to output the results from the CLU complex.


Figure 4.1: Digilent Pmod SSD [31].

4.3 FPGA modules design

The directory of the project includes a subdirectory HDL_gen containing all the scripts used to generate the HDL code.

The scripts allow the implementation to be easily configured for the specific pattern matching task.

An example of generated files for a system configuration for patterns of length 3, gap parameter 2 and repeat factor 16 is provided in the Appendix.

4.3.1 CLU module




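• MODULE PARAMETERS:

– m: Length of the pattern string.

– g: Search gap parameter.

• INPUTS:

– STR: A segment of the searched string of length m+g.

– PAT: The pattern string.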


• OUTPUTS:

– OUT: One bit wire representing any match of PAT in STR.

The design of the CLU module is described in Chapter 3.

The script clu_gen.py generates the file clu.sv which describes the module.

4.3.2 CLU complex module






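• MODULE PARAMETERS:

– m: Length of the pattern string.

– g: Search gap parameter.

– r: Repeat factor (number of instantiated CLUs).

• INPUTS:

– STR: A segment of the searched string of length m+g+r−1.

– PAT: The pattern string.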


• OUTPUTS:

– OUT: An r-bit wide bus representing the outputs of each CLU.

The module uses a generate loop to instantiate the CLUs. The design of the CLU complex module is described in Chapter 3. The script clu_complex_gen.py generates the file clu_complex.sv, which describes the module.
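The behaviour of the CLU complex can be modelled in a few lines of Python. This is only a sketch under assumptions: each CLU is modelled as an in-order (gapped) match test, which is equivalent to the OR of all clauses in the Appendix clu.sv example, and CLU number 0 is assumed to drive the most significant bit of OUT, as in the Appendix clu_complex.sv listing. Function names are hypothetical.

```python
def is_gapped_match(pattern, window):
    """True if the pattern characters occur in the window in order,
    possibly separated by gap characters."""
    it = iter(window)
    # `ch in it` consumes the iterator up to the first occurrence of ch,
    # so later pattern characters must be found further to the right.
    return all(ch in it for ch in pattern)


def clu_complex_out(segment, pattern, g, r):
    """Model of the r-bit OUT bus as an integer (CLU 0 is the MSB)."""
    width = len(pattern) + g
    assert len(segment) == width + r - 1
    out = 0
    for clu_no in range(r):
        if is_gapped_match(pattern, segment[clu_no:clu_no + width]):
            out |= 1 << (r - 1 - clu_no)
    return out


# m = 3, g = 2, r = 16: the segment has 3 + 2 + 16 - 1 = 20 characters.
print(format(clu_complex_out("ACT" + "G" * 17, "ACT", 2, 16), "04X"))  # 8000
```

Only the first CLU sees the pattern in this example, so only the most significant bit of OUT is set.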

4.3.3 String buffers module





– str_len: Length of the searched string.

• INPUTS:

– CLK: System clock.

– RESET: System reset.

– BUTTONS: Two buttons used to proceed to the following comparison cycle.

• OUTPUTS:

– STR: A segment of the searched string of length m+g+r−1 for the current comparison cycle.


– DONE: Signal indicating the completion of the matching task.

– LEDS: Indicates to the user which button should be pressed to advance to the following comparison cycle.

The module is responsible for providing the CLU complex with a substring of m+g+r−1 characters from the input string, together with the pattern string.

The module contains registers (D-type flip-flops) that store the entire input and pattern strings. The module uses the Verilog system task $readmemb to read and store the encoded DNA string files string.list and pat.list. Since $readmemb cannot be used to store data on registers wired to the module outputs, the strings are loaded into internal registers and copied to the outputs.


A counter tracks the current comparison cycle. To advance to the next cycle, the user has to press a button. If the counter is even, the user presses the left button (BTN3), otherwise the right button (BTN2). Alternating the buttons prevents the user from mistakenly advancing multiple comparison cycles. The left and right LEDs (LD3 and LD0 respectively) remind the user which button to press.

The STR output of the module is determined by a case statement dependent on the counter. Each time the counter advances, the window assigned to STR moves forward by r positions in the input string. When the entire input string has been processed, the module asserts the DONE signal.
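The window selected by the case statement for each counter value can be reproduced with a short Python sketch. It assumes the stored string has already been padded so that the windows tile it exactly, as in the buffer.sv example in the Appendix; the function name is hypothetical.

```python
def buffer_windows(str_len, m, g, r):
    """(first, last) index of the STR window for each counter value."""
    width = m + g + r - 1
    # Assumption: the stored string was padded so the last window fits exactly.
    assert str_len >= width and (str_len - width) % r == 0
    n_cycles = (str_len - width) // r + 1
    return [(c * r, c * r + width - 1) for c in range(n_cycles)]


print(buffer_windows(100, 3, 2, 16))
# [(0, 19), (16, 35), (32, 51), (48, 67), (64, 83), (80, 99)]
```

With a stored string of 100 characters, m = 3, g = 2 and r = 16, the ranges are the same as the six case branches of the Appendix buffer.sv listing.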

The script buf_gen.py generates the file buffer.sv, which describes the module.

4.3.4 SSD driver module

The SSD driver module is based on the ssd_driver.v module written by Nigel Topham, provided for Practical #2 of the INF3 Computer Design course (2017/2018), The University of Edinburgh [32]. The original module allows an 8-bit integer input to be displayed on a Pmod SSD as a 2-digit decimal number.

• INPUTS:

– done: The signal indicates the completion of the search task.

– ssd_input: An 8-bit integer.

– ssd_c: An approximately 59.6046 Hz clock signal.

• OUTPUTS:

– ssd_a: The SSD representation of a hexadecimal digit.

The module has been modified to display 2 hexadecimal digits. The left SSD displays the digit of the most significant 4 bits of the input integer, while the right SSD displays the least significant 4 bits.

The on/off settings of each SSD segment for displaying the extra hexadecimal digits A, b, C, d, E, F have been created. This was done by reverse-engineering the settings of the decimal digits: from the binary format of the settings, I mapped each bit index to a segment on the SSD.

If the done signal is asserted, the SSD will display "- -" to indicate completion of the task to the user.

4.3.5 SSD wrapper module



• INPUTS:

– CLK: System clock.

– DONE: The signal indicates the completion of the search task.


– OUT: Output from the CLU complex in which each wire represents the state of each CLU.

• OUTPUTS:

– SSD_A: The digits currently being displayed by the two Pmod SSDs.

– SSD_C: Indicates which SSD of the Pmod SSDs is currently being displayed.

The module instantiates two SSD driver modules. Each of the modules drives a Pmod SSD. The driver of the display on the left takes the 8 most significant bits of OUT, while the right display takes the 8 least significant bits. If the width of the OUT bus is less than 16 (when r < 16), the bus is extended to 16 bits before being assigned to the SSD drivers.

The module generates a 59.6046 Hz clock signal for the SSD drivers and Pmod SSDs to indicate which of the two digits on a Pmod SSD is currently being displayed.
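The split of OUT across the two displays, and the quoted refresh frequency, can be checked with a small Python sketch. Two points are assumptions rather than statements from the design: the sketch only handles the full 16-bit case, and it assumes the 59.6046 Hz signal is obtained by dividing the 125 MHz system clock by 2^21, which is numerically consistent with the quoted value. The function names are hypothetical.

```python
def ssd_digits(out_bits):
    """Four hex digits shown for a 16-bit OUT value, left to right."""
    assert 0 <= out_bits < (1 << 16)
    left_byte = (out_bits >> 8) & 0xFF  # 8 most significant bits, left Pmod SSD
    right_byte = out_bits & 0xFF        # 8 least significant bits, right Pmod SSD
    return format(left_byte, "02X") + format(right_byte, "02X")


def divided_clock_hz(f_clk_hz=125_000_000, divider_bits=21):
    """Frequency after dividing the system clock by 2**divider_bits (assumed)."""
    return f_clk_hz / (1 << divider_bits)


print(ssd_digits(0x8001))            # 8001
print(round(divided_clock_hz(), 4))  # 59.6046
```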

The four hexadecimal digits displayed on the Pmod SSDs limit the proof of concept solution to a repeat factor of 16 or less.

The script ssd_wrapper_gen.py generates the file ssd_wrapper.sv, which describes the module.

4.3.6 Top module


– m: Length of the pattern string.

– g: Search gap parameter.

– r: Repeat factor (number of instantiated CLUs).

• INPUTS:

– CLK: System clock.

– RESET: System reset.

– BUTTONS: Two buttons used to proceed to the following comparison cycle.

• OUTPUTS:

– SSD_A_0: The digit currently being displayed by the right Pmod SSD.

– SSD_A_1: The digit currently being displayed by the left Pmod SSD.

– SSD_C: Indicates which SSD of the Pmod SSDs is currently being displayed.

– LEDS: Indicates to the user which button should be pressed to advance to the following comparison cycle.

The module instantiates and connects the buffer, CLU complex and SSD wrapper modules.

The I/O of the top module are assigned in the constraints file [33]. The pin arrangement of the Pmod ports used by the Pmod SSDs is available at [24].

The script top_gen.py generates the file top.sv, which describes the module.


4.3.7 SystemVerilog

The HDL files are generated in the SystemVerilog language. SystemVerilog has been chosen as it features the same syntax as Verilog, but allows module I/O buses to be defined as unpacked arrays. This allows an abstraction over the I/O buses transporting encoded DNA strings, since an index of the bus array indicates a 2-bit character rather than a single wire.

Vivado 2015.2 synthesis supports a synthesisable subset of SystemVerilog [22].

4.3.8 Verification

Vivado provides a Behavioural Simulation tool. The simulator has been useful for verifying the behaviour of each new module. However, on several occasions during the development process, the HDL code for a module behaved correctly in simulation, but was either not synthesisable or behaved differently after synthesis. In the latter case, Vivado Synthesis may or may not display warning messages.

4.4 Support applications

These are applications supporting the execution of a pattern matching task on the proof of concept system.

4.4.1 DNA string encoder string_to_bin.py

The design of the DNA string encoder is described in Chapter 3. The encoder takes as input a text file containing a DNA string encoded in the alphabet {A,C,G,T}. The output is a text file in which each line contains two binary digits representing the corresponding string character.

For input strings (strings on which the pattern will be matched), the output file features empty lines. These mark each increment of the comparison cycle counter, and therefore the boundaries between the substrings matched in successive comparison cycles. The encoder may also add a few dummy characters to the end of the output file in case, for the last comparison cycle, the remaining characters are not enough to fill the CLU complex substring input.
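The encoding step can be sketched in Python as follows. This is not string_to_bin.py: the 2-bit code assigned to each nucleotide is defined in Chapter 3 and the mapping below is a placeholder, and the choice of padding character and the exact placement of the empty lines are also assumptions made for illustration.

```python
import math

# Hypothetical 2-bit mapping; the actual codes are those defined in Chapter 3.
ENCODING = {"A": "00", "C": "01", "G": "10", "T": "11"}


def encode_input(dna, m, g, r, pad_char="A"):
    """Lines of the encoded input string file, padded for the last cycle."""
    width = m + g + r - 1
    n_cycles = max(0, math.ceil((len(dna) - width) / r)) + 1
    # Pad so that the window of the last comparison cycle is completely filled.
    padded_len = (n_cycles - 1) * r + width
    padded = dna + pad_char * (padded_len - len(dna))
    lines = []
    for pos, ch in enumerate(padded):
        # Assumed layout: an empty line before each new window start.
        if pos > 0 and pos % r == 0 and pos // r < n_cycles:
            lines.append("")
        lines.append(ENCODING[ch])
    return lines


lines = encode_input("ACGT" * 22 + "AC", m=3, g=2, r=16)  # 90 characters
print(sum(1 for line in lines if line))                   # 100
```

A 90-character string with m = 3, g = 2 and r = 16 is padded to 100 characters, enough for six windows of 20 characters advancing by 16.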

4.4.2 Results decoder decoder.py

The design of the results decoder is described in Chapter 3. The result decoder takes as inputs the original input string file and a result file. The result file should contain on each line the 4-digit hexadecimal number displayed by the Pmod SSDs on each comparison cycle. Provided the search parameters m, g and r, the application prints every matching position k, followed by the m+g long substring containing the match.

At each comparison cycle, r new characters enter the string buffer window for comparison by the CLU complex. Given the parameter r and the current cycle count i (given by the line number of the result file, starting from 0), it is possible to calculate the starting position of the current comparison cycle as i × r. For each result, the hexadecimal number is converted into binary form. The index j of each "1" present in the binary form represents a match of the CLU comparing from substring index j. Hence the matching position k is given by the formula:

$$k = i \times r + j$$

The decoder discards any match in which the matching substring contains characters added by the DNA string encoder.
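The decoding rule can be sketched in Python as below. This is an illustration rather than decoder.py itself. It assumes r = 16 and that index j is counted from the most significant bit of the displayed number, which agrees with the OUT[15 − cluNo] wiring in the Appendix; how a narrower bus is aligned when r < 16 is not covered. Function and constant names are hypothetical.

```python
R = 16  # repeat factor assumed by this sketch


def decode_results(hex_lines, dna, m, g):
    """Return (k, substring) for every match reported in the result lines."""
    matches = []
    for i, line in enumerate(hex_lines):
        bits = format(int(line.strip(), 16), "016b")  # MSB first
        for j, bit in enumerate(bits):
            if bit == "1":
                k = i * R + j
                # Discard matches that reach into the encoder's padding.
                if k + m + g <= len(dna):
                    matches.append((k, dna[k:k + m + g]))
    return matches


dna = "ACTGG" + "G" * 28  # 33 characters
print(decode_results(["8000", "0001"], dna, m=3, g=2))
# [(0, 'ACTGG')]
```

In the example, the second line reports a match at k = 1 × 16 + 15 = 31, but the 5-character substring would end beyond the 33-character input, so the match is discarded.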

4.4.3 Random DNA string generator rand_dna_gen.py

The application generates a text file containing a string of random characters from the alphabet {A,C,G,T}. The length of the DNA string is decided by the user.

The strings generated by this application have been used in this project for development, testing and experimental purposes.
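A generator of this kind can be written in a few lines of Python. The sketch below is not rand_dna_gen.py; the function name and the optional seed argument (added so that test strings can be reproduced) are assumptions.

```python
import random


def random_dna(length, seed=None):
    """Random string of the given length over the alphabet {A, C, G, T}."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


print(len(random_dna(500, seed=1)))  # 500
```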

4.4.4 System deployment make_sys.py

The script provides a "one-click" solution for deploying the proof of concept implementation onto the ZYBO.

The inputs for launching the script are:

• Input DNA string file,

• DNA Pattern string file,

• CBG gap parameter g,

• Repeat factor r (limited to 16), and

• An option for programming the FPGA.

The script then executes the following tasks:

• Encoding of the input and pattern strings.

• Generation of all HDL modules (buffer, CLU, CLU complex, SSD wrapper and top). The generated code is saved in a subdirectory referenced by a Vivado project.

• If requested, the script starts a BASH environment in which Vivado is launched using a batch Tcl script [34]. The Tcl script opens a Vivado project and contains a series of commands including:

– the synthesis of the HDL code,

– its implementation, and

– the programming of the generated bitstream onto a connected ZYBO.
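Launching Vivado in batch mode from Python can be sketched as follows. This is an assumption about how such a step might look, not the contents of make_sys.py; it relies only on the standard vivado command-line options -mode batch and -source, and the function names are hypothetical.

```python
import shutil
import subprocess


def vivado_batch_command(tcl_script="setup.tcl"):
    """Command line that runs a Tcl script in Vivado's batch mode."""
    return ["vivado", "-mode", "batch", "-source", tcl_script]


def run_vivado(tcl_script="setup.tcl"):
    """Run the Tcl script, failing early if Vivado is not on $PATH."""
    if shutil.which("vivado") is None:
        raise RuntimeError("vivado not found: add its installation directory to $PATH")
    subprocess.run(vivado_batch_command(tcl_script), check=True)


print(" ".join(vivado_batch_command()))  # vivado -mode batch -source setup.tcl
```

The early check mirrors the requirement in Section 4.1 that the Vivado installation directory is included in $PATH.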

4.4.5 Vivado TCL script setup.tcl

Xilinx provides a reference guide for Vivado’s Tcl interface [35].


4.5 User guide

The user guide is meant to provide instructions to a user who may not be familiar with computing. It is assumed that the required software tools are installed on the user's computer. The solution should work with other versions of Vivado and Python, however this has not been tested. The solution has been tested on multiple computers.

4.5.1 Setup

4.5.1.1 ZYBO jumper settings

Select the QSPI boot mode using the jumper on the top right of the ZYBO by placing the jumper between the central two pins.

Select the USB power source by placing the jumper between the pins labelled "USB" and "VU5V0".

4.5.1.2 Installation of the Pmod SSDs

The two Pmod SSDs plug into the Pmod sockets present on the lower section of the ZYBO (Pmod connectors "JE", "JD", "JC" and "JB"). The SSDs must be connected to the lower row of pins in the Pmod connectors.

4.5.1.3 Project download

Clone the git repository at [36]. The repository provides a Vivado project template and the scripts for the deployment of this proof of concept implementation.

4.5.2 Execution

1. Connect a USB cable between the "PROG UART" port on the ZYBO and a USB port on the computer.

2. From the root directory of the git repository, execute the following command in a terminal: python make_sys.py <input string file path> <pattern string file path> full

3. The script will ask the user to enter the CBG pattern search gap parameter and the repeat factor for the CLU complex (figure 4.2).

4. Once the script has finished and the ZYBO is ready (figure 4.3), write the hexadecimal number shown on the SSDs into a "results" text file.

5. Advance to the next comparison cycle by pressing the left or right button ("BTN3" and "BTN2" respectively), as indicated by the LEDs.

6. On a new line, write the hexadecimal number into the text file. Return to step 5.

7. The search of the input string is complete when the SSDs display "- - - -". It is now possible to save and close the text file.

8. From the root directory of the git repository, execute the following command in a terminal: python decoder.py <results file path> <input string file path> <pattern string file path>

9. The script will ask again for the CBG pattern search gap parameter and the repeat factor for the CLU complex.

10. All match results should be printed on the terminal (figure 4.4).

It is possible to restart from the initial comparison cycle by pressing "BTN0" at any point after the FPGA is programmed.

Figure 4.2: Execution of the make_sys.py script.


Figure 4.3: Once the FPGA is programmed, the Pmod SSDs should display a four-digit hexadecimal number.

Figure 4.4: Execution of the decoder.py script.

Chapter 5

Experimental methodology

In this project I benchmark two Python baseline implementations of single-threaded iterative algorithms against the simulated performance of an FPGA system featuring the CLU complex design. The baseline implementations also provide a means of verifying the correctness of the results provided by the proof of concept system in Chapter 4.

5.1 Baseline

The algorithms use standard Python libraries and do not include work of others.

5.1.1 Regex implementation regex.py

The implementation uses the Python standard library re. The application takes a DNA input string and a pattern string as inputs. Given a gap parameter g, the application generates a list of all possible CBGs in a regular expression format.

Iteratively, the algorithm performs a regular expression search, using patterns from the list of CBGs, over a substring of size m+g of the input string until a match is returned. The substring window over the input string is then advanced by 1 character.
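The approach described above can be sketched as follows. This is not regex.py: the way each CBG is written as a regular expression (a fixed-length expression with "." at the gap positions) is an assumption made for illustration, and the function names are hypothetical.

```python
import re
from itertools import combinations


def cbg_regexes(pattern, g):
    """One compiled regex per CBG: pattern characters at the chosen
    positions of the m+g window, '.' at the gap positions."""
    m = len(pattern)
    regexes = []
    for idx in combinations(range(m + g), m):
        chars = ["."] * (m + g)
        for k, i in enumerate(idx):
            chars[i] = pattern[k]
        regexes.append(re.compile("".join(chars)))
    return regexes


def regex_search(dna, pattern, g):
    """Start positions of all windows of size m+g matched by some CBG."""
    width = len(pattern) + g
    regexes = cbg_regexes(pattern, g)
    hits = []
    for k in range(len(dna) - width + 1):
        window = dna[k:k + width]
        # any() stops at the first CBG that matches the window.
        if any(rx.fullmatch(window) for rx in regexes):
            hits.append(k)
    return hits


print(regex_search("GGACTGG", "ACT", 2))  # [0, 1, 2]
```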

5.1.2 CBG pattern search implementation search.py

The application takes a DNA input string and a pattern string as inputs. Given a gap parameter g, the application generates a list of all possible CBGs. Each CBG is a list of tuples. The first item i of the tuple indicates a character index from the substring, the second is the character from the pattern to be matched against the character from the substring indexed by i.

Iteratively, the algorithm performs a matching algorithm, using patterns from the list of CBGs, over a substring of size m+g of the input string until a match is returned. The substring window over the input string is then advanced by 1 character.

The matching algorithm between a substring and a CBG iteratively compares each substring character indexed by the CBG with the corresponding pattern character. The loop is terminated if, at a comparison, the two characters are not equal. The early termination of the CBG matching algorithm should give this implementation a performance advantage over regex.py.
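A minimal sketch of this tuple-based search is given below. It follows the description above but is not search.py itself; in particular, the order in which the CBGs are generated and the function names are assumptions.

```python
from itertools import combinations


def cbg_list(pattern, g):
    """Each CBG is a list of (window index, pattern character) tuples."""
    m = len(pattern)
    return [list(zip(idx, pattern)) for idx in combinations(range(m + g), m)]


def matches_cbg(window, cbg):
    """Compare indexed characters, stopping at the first mismatch."""
    for i, ch in cbg:
        if window[i] != ch:
            return False  # early termination
    return True


def cbg_search(dna, pattern, g):
    """Start positions of all windows of size m+g matched by some CBG."""
    width = len(pattern) + g
    cbgs = cbg_list(pattern, g)
    hits = []
    for k in range(len(dna) - width + 1):
        window = dna[k:k + width]
        if any(matches_cbg(window, cbg) for cbg in cbgs):
            hits.append(k)
    return hits


print(cbg_list("ACT", 2)[0])            # [(0, 'A'), (1, 'C'), (2, 'T')]
print(cbg_search("GGACTGG", "ACT", 2))  # [0, 1, 2]
```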

5.1.3 Computer system specifications

The system on which the baseline benchmarks are run features an Intel® Core™ i5-6500 processor with 4 cores and 4 threads, a 3.20 GHz base clock speed and a 3.60 GHz boost speed.

Baseline benchmark results are reported as the average of multiple runs.

5.2 Simulated testbench implementation

A simulated implementation should be able to take full advantage of the combinational design of the CLU complex. In fact, the testbench should perform a comparison cycle at every clock cycle. This means the search over an input string advances by r (repeat factor) characters every clock period.

The testbench instantiates the following modules described in Chapter 4:

• the CLU complex module,

• a modified buffer module in which the comparison cycle counter is advanced at every clock cycle.

The clock period of the testbench is set to 8 ns, which corresponds to the 125 MHz clock used in the concept design in Chapter 4.

The testbench uses the Verilog system task $fwrite to store the results at each comparison cycle [37].

5.2.1 Testbench runtime estimation

It is possible to mathematically estimate the runtime required for the testbench system to complete the search task.

Given that initially m+g+r−1 (width of the substring window) positions of the input string are evaluated, and the substring window loads r new characters at each comparison cycle, the number of comparison cycles required to complete the search task is given by the formula:

$$n_{\mathrm{cycles}} = \left\lceil \frac{\mathit{str\_len} - (m + g + r - 1)}{r} \right\rceil + 1$$

where str_len represents the number of characters of the input string.

In the testbench system, a comparison cycle is evaluated at each clock cycle. Given Tclock, the period of a clock cycle, the runtime estimation is provided by the formula:

$$\mathit{runtime} = n_{\mathrm{cycles}} \times T_{\mathrm{clock}}$$
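The two formulas can be evaluated directly. The short Python sketch below does so for two illustrative parameter sets and the 8 ns clock period stated above; the function names are hypothetical.

```python
import math


def n_cycles(str_len, m, g, r):
    """Number of comparison cycles needed to cover the input string."""
    return math.ceil((str_len - (m + g + r - 1)) / r) + 1


def runtime_ns(str_len, m, g, r, t_clock_ns=8):
    """Estimated testbench runtime in nanoseconds."""
    return n_cycles(str_len, m, g, r) * t_clock_ns


print(runtime_ns(500, 10, 5, 16))    # 248
print(runtime_ns(100000, 5, 3, 32))  # 25000
```

For str_len = 500, m = 10, g = 5 and r = 16, the window is 30 characters wide, so ⌈470/16⌉ + 1 = 31 cycles are needed, giving 31 × 8 ns = 248 ns.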


5.3 Runtime benchmark scenarios

Three test cases have been created for the benchmarks. The parameters are:

• str_len: length of the input string.

• m: length of the pattern string.

• g: CBG gap parameter.

• r: repeat factor.

Scenario  str_len  m   g  r
1         500      10  5  16
2         100000   5   3  32
3         100000   3   3  64

The input and pattern strings are randomly generated by rand_dna_gen.py, described in Chapter 4.

CLUs become smaller and less complex as the m and g parameters decrease (see Chapter 3). Given a limited amount of CLB resources in the FPGA fabric, it is therefore possible to increase the repeat factor for smaller m and g, including more CLUs in the CLU complex and making greater use of the available resources.

Chapter 6

Results

6.1 Runtime benchmarks

The reported results of the software baseline solutions represent the average of three individual runs. The results of the testbench solution described in Chapter 5 are mathematical estimations of the runtimes. The clock frequency is set to 125 MHz (the clock of the proof of concept implementation in Chapter 4), hence Tclock is 8 ns.

It was not possible to use the Simulation tools of Vivado as, with simulations as large as those defined by the testing scenarios (Chapter 5), the computer system would freeze and crash.

Scenario  regex.py  search.py  Testbench
1         51369ms   223ms      248ns
2         3888ms    829ms      25000ns
3         1340ms    294ms      12488ns

Table 6.1: Runtime periods for each testing scenario

It should be noted that results for the testbench solution are reported in nanoseconds, while the runtimes of the baseline solutions are in milliseconds.

6.2 Proof of concept resource utilisation

Vivado provides information about the utilisation of the FPGA resources for a given HDL description.

Hence a proof of concept system (Chapter 4) is built using make_sys.py with the following parameters:

• Input string length: 5000

• Pattern string length: 10

• CBG gap parameter: 5

• Repeat factor: 16


6.2.1 CLB utilisation

The Vivado Implementation tool provides a report on the utilisation of implemented netlists. For the system generated above, the summary of the utilisation report is:

Resource         Utilisation  Available  Utilisation %
Slice LUTs       11624        17600      66.05
Slice registers  338          35200      0.96
Memory           1            60         1.67
IO               22           100        22.00
Clocking         1            32         3.12

Table 6.2: Utilisation summary of the proof of concept system

This shows the capabilities of modern FPGA devices. The DNF expression representing the implemented CLU C10,5 contains 3003 conjunctive clauses, as demonstrated in Chapter 3. This complicated circuitry is replicated 16 times in the CLU complex module.

6.2.2 Timing

The Worst Negative Slack (WNS) value corresponds to the worst slack of all the timing paths. The WNS value reported by the Timing Summary of the Vivado Implementation tool is 2.736 ns.

Since this value is positive, all timing paths meet the 8 ns clock constraint with margin. This indicates that there is a possibility of increasing the system clock frequency from 125 MHz, which would improve the performance.
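A first-order estimate of the available headroom can be obtained by subtracting the slack from the clock period. The sketch below is an approximation only: it assumes the critical path delay stays the same at the higher frequency, which would need to be confirmed by re-running the implementation with a tighter clock constraint. The function name is hypothetical.

```python
def max_clock_mhz(period_ns, wns_ns):
    """Approximate maximum clock frequency implied by a positive slack."""
    min_period_ns = period_ns - wns_ns  # delay of the slowest timing path
    return 1000.0 / min_period_ns


print(round(max_clock_mhz(8.0, 2.736), 1))  # 190.0
```

Under this assumption the slowest path takes about 8 − 2.736 = 5.264 ns, which corresponds to a clock of roughly 190 MHz.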

6.2.3 Power usage

The Vivado Implementation tool performs power analysis from the implemented netlist. The total on-chip power consumed by the FPGA, in the proof of concept design specified above, is estimated to be 0.296 W.

This power consumption is roughly two orders of magnitude lower than that of modern desktop processors. For example, the TDP rating of the i5-6500 processor featured in the system used for baseline benchmarking is 65 W [38]. It should be noted that TDP ratings usually refer to a full CPU utilisation scenario, and the baseline applications do not fully utilise all the available cores of the processor.

Chapter 7

Discussion

7.1 Zynq AP SoC PS-PL implementation

The original plan of this project involved the utilisation of the Processing System sub-architecture of the Zynq chip. Such a design would allow standalone operation of the proposed system.

Due to challenges partly listed in Chapter 3, the standalone design has been discarded. Even if such a design were implemented, the potential performance of the PL design would be severely bottlenecked by the interactions with the PS and DDR memory. As the project focuses on FPGA implementations, the aim is to demonstrate the performance achievable by using configurable logic.

7.2 Proof of concept limitations

The main limitation of the "proof of concept" implementation described in Chapter 4 is its result output mechanism. It requires the user to manually record the results shown on the Pmod SSDs at each comparison cycle. This is a less than ideal solution for any user who would like to perform DNA string pattern matching. The displays, which can show at most a 4-digit hexadecimal integer, limit the repeat factor of the CLU complex to 16.

One of the properties of the CLU complex design is its scalability. Increasing the number of CLUs instantiated in the CLU complex would achieve a greater level of parallelism. As shown in Chapter 6, the FPGA featured in the Zynq provides more CLB resources than the proof of concept configuration uses (66.05% of the slice LUTs at a repeat factor of 16).

7.3 Vivado Simulation tool

The Vivado Simulation tool is an event-driven HDL simulator [39]. The tool was unable to execute the testbench described in Chapter 5, probably because a single event (e.g. the clock state change) would trigger numerous events. Hence simulating a single clock period would require large amounts of computational resources.

The tool has been used to develop each HDL module, as the simulator is one of the few debug tools available for development. However, in multiple instances during the development of this project, the simulated behaviour did not correspond to the synthesised behaviour.

7.4 Vivado 2015.2 instabilities

On the computer system described in Chapter 5, Vivado has repeatedly behaved abnormally.

When a large HDL file is included amongst the project sources (e.g. clu.sv with m = 10 and g = 10), the programme would immediately exit and return a segmentation fault error message.

When a large encoded DNA string file is being used (e.g. a DNA string of 10000 characters), the computer OS would freeze. The same behaviour has often been encountered when trying to simulate the testbench described in Chapter 5.

7.5 Design adaptability and scalability

The main contribution of this project is the CLU complex design and its automated HDL code generation. The design allows the combinational circuit to be scalable across various FPGAs featuring different amounts of resources.

Chapter 8

Conclusion

The project extends some of the ideas and addresses some of the issues presented by Lipson and Hazelhurst [20]. The implementation does not provide a complete solution for a user, however it demonstrates the capability of a programmable logic solution compared to a software CBG pattern search algorithm. Benefits include:

• Several orders of magnitude smaller runtime.

• Orders of magnitude smaller power consumption.

The efficiency of the design can be attributed to the configuration of the circuit being "tailor made" to the search task.

Research literature on DP algorithms for DNA and protein sequence pattern matching is widely available, however the proposed approach does not rely on DP.

I would like to pursue this project by:

• Extending the design of the proposed solution to allow the instantiation of multiple CLU complexes. This would allow multiple different patterns to be searched concurrently.

• Supporting the full IUPAC code [40]. The alphabet of possible characters is defined in this project as the set {A,C,G,T}. Often in bioinformatics applications, patterns may be given containing characters that indicate a set of possible nucleotides (e.g. IUPAC character R indicates "A or G").


Bibliography

[1] Paulien Hogeweg. The roots of bioinformatics in theoretical biology. PLOS Computational Biology, 7(3):1–5, 03 2011. doi: 10.1371/journal.pcbi.1002021. URL https://doi.org/10.1371/journal.pcbi.1002021.

[2] TK Attwood, A Gisel, Nils-Einar Eriksson, and Erik Bongcam-Rudloff. Concepts, historical milestones and the central place of bioinformatics in modern biology: a european perspective. In Bioinformatics-trends and methodologies. InTech, 2011.

[3] Gonzalo Navarro. A guided tour to approximate string matching. ACM computing surveys (CSUR), 33(1):31–88, 2001.

[4] Andrew D Johnson. Single-nucleotide polymorphism bioinformatics: a comprehensive review of resources. Circulation: Cardiovascular Genetics, 2(5):530–536, 2009.

[5] Maya Gokhale, William Holmes, Andrew Kopser, Sara Lucas, Ronald Minnich, Douglas Sweely, and Daniel Lopresti. Building and using a highly parallel programmable logic array. Computer, 24(1):81–89, 1991.

[6] Tom Van Court and Martin C Herbordt. Families of fpga-based accelerators for approximate string matching.

[7] Stefan Dydel and Piotr Bała. Large scale protein sequence alignment using fpga reprogrammable logic devices. In International Conference on Field Programmable Logic and Applications, pages 23–32. Springer, 2004.

[8] AFW Coulson, JF Collins, and A Lyall. Protein and nucleic acid sequence database searching: a suitable case for parallel processing. The Computer Journal, 30(5):420–424, 1987.

[9] Costas S Iliopoulos, Laurent Mouchard, and M Sohel Rahman. A new approach to pattern matching in degenerate dna/rna sequences and distributed pattern matching. Mathematics in Computer Science, 1(4):557–569, 2008.

[10] Gonzalo Navarro and Mathieu Raffinot. Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences. Cambridge University Press, 2002.


[11] Lei Chen, Shiyong Lu, and Jeffrey Ram. Compressed pattern matching in dna sequences. In Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE, pages 62–68. IEEE, 2004.

[12] Sun Wu and Udi Manber. Agrep–a fast approximate pattern-matching tool. In Usenix Winter 1992 Technical Conference, pages 153–162, 1992.

[13] Sun Kim and Yanggon Kim. A fast multiple string-pattern matching algorithm. In Proceedings of 17th AoM/IAoM Conference on Computer Science, pages 44–49, 1999.

[14] Jun-ichi Aoe. Computer algorithms: string pattern matching strategies, volume 55. John Wiley & Sons, 1994.

[15] William I Chang and Eugene L Lawler. Approximate string matching in sublinear expected time. In Foundations of Computer Science, 1990. Proceedings., 31st Annual Symposium on, pages 116–124. IEEE, 1990.

[16] Doron Betel and Christopher WV Hogue. Kangaroo–a pattern-matching program for biological sequences. BMC bioinformatics, 3(1):20, 2002.

[17] Alain Laferriere, Daniel Gautheret, and Robert Cedergren. An rna pattern matching program with enhanced performance and portability. Bioinformatics, 10(2):211–212, 1994.

[18] Gonzalo Navarro and Mathieu Raffinot. A general practical approach to pattern matching over ziv-lempel compressed text. In Annual Symposium on Combinatorial Pattern Matching, pages 14–36. Springer, Berlin, Heidelberg, 1999.

[19] Gonzalo Navarro and Mathieu Raffinot. Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of Computational Biology, 10(6):903–923, 2003.

[20] Asher Lipson and Scott Hazelhurst. Dna pattern matching using fpgas. In Proceedings of the twelth annual symposium of the Pattern Recognition Association of South Africa, pages 180–185, 2001.

[21] Randal E Bryant. Symbolic boolean manipulation with ordered binary-decision diagrams. ACM Computing Surveys (CSUR), 24(3):293–318, 1992.

[22] Xilinx®. Vivado Design Suite User Guide Synthesis UG901 (v2015.3), 2015. URL https://www.xilinx.com/support/documentation/sw_manuals/xilinx2015_3/ug901-vivado-synthesis.pdf.

[23] Digilent Inc. Zybo [reference.digilentinc]. URL https://reference.digilentinc.com/_media/zybo:zybo_rm.pdf. Last visited 11-03-2018.

[24] Digilent Inc. ZYBO™ FPGA Board Reference Manual, 2016. URL https://reference.digilentinc.com/_media/zybo:zybo_rm.pdf.


[25] Xilinx®. Zynq-7000 All Programmable SoC Software Developers Guide UG821 (v12.0), 2015. URL https://www.xilinx.com/support/documentation/user_guides/ug821-zynq-7000-swdev.pdf.

[26] Xilinx®. OS and Libraries Document Collection UG643 (v2014.4), 2014. URL https://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_4/oslib_rm.pdf.

[27] Xilinx®. Xilinx wiki - prepare boot image. URL http://www.wiki.xilinx.com/Prepare+boot+image. Last visited 14-03-2018.

[28] Xilinx®. Zynq-7000 All Programmable SoC Technical Reference Manual UG585 (v1.12.1), 2017. URL https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf.

[29] Jeff Johnson. Using the axi dma in vivado. URL http://www.fpgadeveloper.com/2014/08/using-the-axi-dma-in-vivado.html. Last visited 14-03-2018.

[30] Xilinx®. Vivado 2015.2: Full installer for linux. URL https://www.xilinx.com/member/forms/download/xef.html?filename=Xilinx_Vivado_SDK_Lin_2015.2_0626_1.tar.gz&akdm=1. Last visited 11-04-2018.

[31] Digilent Inc. Pmod SSD reference manual. URL https://reference.digilentinc.com/reference/pmod/pmodssd/reference-manual. Last visited 11-04-2018.

[32] Nigel Topham. INF3 Computer Design (2017/2018) practical #2 zipped project. URL http://www.inf.ed.ac.uk/teaching/courses/cd/Resources_files/prac2.zip. Last visited 11-04-2018.

[33] Nigel Topham. INF3 Computer Design (2017/2018) practical #1 zybo default .xdc file. URL http://www.inf.ed.ac.uk/teaching/courses/cd/Resources_files/zybo_default.xdc. Last visited 11-04-2018.

[34] Xilinx®. Vivado Design Suite User Guide Design Flows Overview UG892 (v2016.2), 2016. URL https://www.xilinx.com/support/documentation/sw_manuals/xilinx2016_2/ug892-vivado-design-flows-overview.pdf.

[35] Xilinx®. Vivado Design Suite Tcl Command Reference Guide UG835 (v2017.1), 2017. URL https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_1/ug835-vivado-tcl-commands.pdf.

[36] Giuseppe Li. 4th year honours project. URL https://github.com/beppe712/honours-project.

[37] Baltimore County University of Maryland. Verilog system tasks and functions. URL https://www.csee.umbc.edu/portal/help/VHDL/verilog/system.html. Last visited 12-04-2018.


[38] Intel®. Intel® Core™ i5-6500 processor. URL https://ark.intel.com/products/88184/Intel-Core-i5-6500-Processor-6M-Cache-up-to-3_60-GHz. Last visited 12-04-2018.

[39] Xilinx®. Verilog simulator. URL https://www.xilinx.com/products/design-tools/vivado/simulator.html. Last visited 12-04-2018.

[40] Iupac codes. URL https://www.bioinformatics.org/sms/iupac.html. Last visited 12-04-2018.

Appendix

Generated HDL modules

buffer.sv

`timescale 1ns / 1ps
`default_nettype wire
//////////////////////////////////////////////////////////////////////////////////
// Company: The University of Edinburgh
// Engineer: Giuseppe Li (s1402587)
//
// Create Date: 2018-04-12 02:47:27
// Design name: FPGA accelerator for DNA pattern matching
// Module Name: buffer
// Project Name: 4th Year Honours Project (BSc (Hons) Artificial Intelligence and Computer Science)
// Description: String and Pattern buffer
//
// Dependences: None
//
// Additional Comments: This file has been automatically generated
//////////////////////////////////////////////////////////////////////////////////

module buffer(
    input CLK,
    input RESET,
    input [1:0] BUTTONS,
    output reg [1:0] STR[0:19],
    output reg [1:0] PAT[0:2],
    output reg DONE,
    output [1:0] LEDS
);

reg [1:0] str_mem[0:99];
reg [1:0] pat_mem[0:2];
reg [2:0] count;

initial begin
    $readmemb("pat.list", pat_mem);
    $readmemb("string.list", str_mem);
    count = 0;
    DONE = 0;
end

always @(posedge CLK or posedge RESET) begin
    if (RESET) begin
        count <= 0;
        DONE <= 0;
    end
    else if (count >= 6)
        DONE <= 1;
    else if (((count[0] == 1'b0) && BUTTONS[0]) || ((count[0] == 1'b1) && BUTTONS[1]))
        count <= count + 1;
end

always @(posedge CLK) begin
    case (count)
        3'd0: STR <= str_mem[0:19];
        3'd1: STR <= str_mem[16:35];
        3'd2: STR <= str_mem[32:51];
        3'd3: STR <= str_mem[48:67];
        3'd4: STR <= str_mem[64:83];
        3'd5: STR <= str_mem[80:99];
    endcase
    PAT <= pat_mem[0:2];
end

assign LEDS = (count[0] == 1'b0) ? 2'b01 : 2'b10;

endmodule


clu.sv

`timescale 1ns / 1ps
`default_nettype wire
//////////////////////////////////////////////////////////////////////////////////
// Company: The University of Edinburgh
// Engineer: Giuseppe Li (s1402587)
//
// Create Date: 2018-04-12 02:47:27
// Design name: FPGA accelerator for DNA pattern matching
// Module Name: clu
// Project Name: 4th Year Honours Project (BSc (Hons) Artificial Intelligence and Computer Science)
// Description: Comparison Logic Unit
//
// Dependencies: None
//
// Additional Comments: This file has been automatically generated
//////////////////////////////////////////////////////////////////////////////////

module clu(
    input [1:0] STR[0:4],
    input [1:0] PAT[0:2],
    output OUT
);

    // One product term for each of the 10 ways of choosing three
    // increasing positions of STR to compare against PAT[0..2].
    assign OUT = (((STR[2] == PAT[0]) && (STR[3] == PAT[1]) && (STR[4] == PAT[2]))
               || ((STR[1] == PAT[0]) && (STR[3] == PAT[1]) && (STR[4] == PAT[2]))
               || ((STR[1] == PAT[0]) && (STR[2] == PAT[1]) && (STR[4] == PAT[2]))
               || ((STR[1] == PAT[0]) && (STR[2] == PAT[1]) && (STR[3] == PAT[2]))
               || ((STR[0] == PAT[0]) && (STR[3] == PAT[1]) && (STR[4] == PAT[2]))
               || ((STR[0] == PAT[0]) && (STR[2] == PAT[1]) && (STR[4] == PAT[2]))
               || ((STR[0] == PAT[0]) && (STR[2] == PAT[1]) && (STR[3] == PAT[2]))
               || ((STR[0] == PAT[0]) && (STR[1] == PAT[1]) && (STR[4] == PAT[2]))
               || ((STR[0] == PAT[0]) && (STR[1] == PAT[1]) && (STR[3] == PAT[2]))
               || ((STR[0] == PAT[0]) && (STR[1] == PAT[1]) && (STR[2] == PAT[2])));

endmodule
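The ten product terms in clu cover every choice of three increasing positions out of five, so OUT appears to be asserted exactly when PAT occurs in the STR window in order, with gaps allowed. A minimal software reference model of that reading is sketched below. It assumes symbols are given as characters rather than the 2-bit codes used in hardware, and the function names are hypothetical.

```python
from itertools import combinations


def clu_match(window, pattern):
    """Mirror of the clu expression: try every increasing choice of positions."""
    return any(
        all(window[i] == p for i, p in zip(positions, pattern))
        for positions in combinations(range(len(window)), len(pattern))
    )


def is_subsequence(window, pattern):
    """Equivalent greedy check: scan the window once, matching symbols in order."""
    remaining = iter(window)
    # `sym in remaining` consumes the iterator up to and including the match.
    return all(sym in remaining for sym in pattern)
```

For a 5-symbol window and a 3-symbol pattern, clu_match evaluates the same ten comparisons as the hardware expression, while is_subsequence gives the same answer with a single pass and could serve as a golden model when simulating the module.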


clu_complex.sv

`timescale 1ns / 1ps
`default_nettype wire
//////////////////////////////////////////////////////////////////////////////////
// Company: The University of Edinburgh
// Engineer: Giuseppe Li (s1402587)
//
// Create Date: 2018-04-12 02:47:27
// Design name: FPGA accelerator for DNA pattern matching
// Module Name: clu_complex
// Project Name: 4th Year Honours Project (BSc (Hons) Artificial Intelligence and Computer Science)
// Description: Comparison Logic Unit
//
// Dependencies: clu
//
// Additional Comments: This file has been automatically generated
//////////////////////////////////////////////////////////////////////////////////

module clu_complex(
    input [1:0] STR[0:19],
    input [1:0] PAT[0:2],
    output [15:0] OUT
);

    genvar cluNo;

    // One CLU per 5-symbol window; the window starting at STR[0] drives OUT[15].
    generate
        for (cluNo = 0; cluNo < 16; cluNo = cluNo + 1)
        begin: CLUInstantiation
            clu u_clu (.STR(STR[cluNo:cluNo + 4]), .PAT(PAT), .OUT(OUT[15 - cluNo]));
        end
    endgenerate

endmodule
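The bit ordering of OUT is easy to get wrong when comparing simulation output with the two-byte value shown on the displays, so a small software model may help. The sketch below is an illustration only: it assumes character symbols instead of 2-bit codes, treats each CLU as an in-order (gapped) match of the pattern within its window, and uses hypothetical function names.

```python
def window_matches(window, pattern):
    """True if pattern occurs in window in order, gaps allowed."""
    remaining = iter(window)
    return all(sym in remaining for sym in pattern)


def clu_complex_out(segment, pattern):
    """16-bit result for one 20-symbol segment, as an integer."""
    assert len(segment) == 20 and len(pattern) == 3
    out = 0
    for clu_no in range(16):
        # CLU number clu_no sees segment[clu_no .. clu_no + 4]
        # and drives bit 15 - clu_no, so window 0 is the MSB.
        if window_matches(segment[clu_no:clu_no + 5], pattern):
            out |= 1 << (15 - clu_no)
    return out
```

For example, with the segment "ACG" followed by seventeen "T" symbols, the pattern "ACG" matches only the first window, so the model returns 0x8000.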


ssd_driver.v

`timescale 1ns / 1ps
//////////////////////////////////////////////////////////////////////////////////
// Company: The University of Edinburgh
// Engineer: Nigel Topham, Giuseppe Li
//
// Create Date: 18.03.2018 12:36:42
// Design Name: FPGA accelerator for DNA pattern matching
// Module Name: ssd_driver
// Project Name: 4th Year Honours Project (BSc (Hons) Artificial Intelligence and Computer Science)
// Target Devices: Zynq-7010
// Tool Versions: 2015.2
// Description: Module to drive a seven-segment display from 8-bit integer
//
// Dependencies: None
//
// Revision:
// Revision 1.0 - File Created
// Additional Comments:
//////////////////////////////////////////////////////////////////////////////////

module ssd_driver(
    input done,
    input [7:0] ssd_input,
    input ssd_c,
    output [6:0] ssd_a
);

    // Define the on/off settings for each segment of a seven-segment digit

    localparam BLANK   = 7'h00; //
    localparam ZERO    = 7'h3f; // 0
    localparam ONE     = 7'h06; // 1
    localparam TWO     = 7'h5b; // 2
    localparam THREE   = 7'h4f; // 3
    localparam FOUR    = 7'h66; // 4
    localparam FIVE    = 7'h6d; // 5
    localparam SIX     = 7'h7d; // 6
    localparam SEVEN   = 7'h07; // 7
    localparam EIGHT   = 7'h7f; // 8
    localparam NINE    = 7'h6f; // 9
    localparam ALFA    = 7'h77; // A
    localparam BRAVO   = 7'h7c; // b
    localparam CHARLIE = 7'h39; // C
    localparam DELTA   = 7'h5e; // d
    localparam ECHO    = 7'h79; // E
    localparam FOXTROT = 7'h71; // F
    localparam DASH    = 7'h40; // -

    reg [13:0] ssd_segments;

    always @*
    begin: ssd_mapping_PROC
        if (done == 1)
            ssd_segments = { DASH, DASH };
        else begin
            case (ssd_input[7:4])
                4'h0: ssd_segments[13:7] = ZERO;
                4'h1: ssd_segments[13:7] = ONE;
                4'h2: ssd_segments[13:7] = TWO;
                4'h3: ssd_segments[13:7] = THREE;
                4'h4: ssd_segments[13:7] = FOUR;
                4'h5: ssd_segments[13:7] = FIVE;
                4'h6: ssd_segments[13:7] = SIX;
                4'h7: ssd_segments[13:7] = SEVEN;
                4'h8: ssd_segments[13:7] = EIGHT;
                4'h9: ssd_segments[13:7] = NINE;
                4'ha: ssd_segments[13:7] = ALFA;
                4'hb: ssd_segments[13:7] = BRAVO;
                4'hc: ssd_segments[13:7] = CHARLIE;
                4'hd: ssd_segments[13:7] = DELTA;
                4'he: ssd_segments[13:7] = ECHO;
                4'hf: ssd_segments[13:7] = FOXTROT;
            endcase
            case (ssd_input[3:0])
                4'h0: ssd_segments[6:0] = ZERO;
                4'h1: ssd_segments[6:0] = ONE;
                4'h2: ssd_segments[6:0] = TWO;
                4'h3: ssd_segments[6:0] = THREE;
                4'h4: ssd_segments[6:0] = FOUR;
                4'h5: ssd_segments[6:0] = FIVE;
                4'h6: ssd_segments[6:0] = SIX;
                4'h7: ssd_segments[6:0] = SEVEN;
                4'h8: ssd_segments[6:0] = EIGHT;
                4'h9: ssd_segments[6:0] = NINE;
                4'ha: ssd_segments[6:0] = ALFA;
                4'hb: ssd_segments[6:0] = BRAVO;
                4'hc: ssd_segments[6:0] = CHARLIE;
                4'hd: ssd_segments[6:0] = DELTA;
                4'he: ssd_segments[6:0] = ECHO;
                4'hf: ssd_segments[6:0] = FOXTROT;
            endcase
        end
    end // ssd_mapping_PROC

    // Time-division multiplex the two digit outputs for the SSD

    assign ssd_a = (ssd_c == 1'b1) ? ssd_segments[13:7] : ssd_segments[6:0];

endmodule


ssd_wrapper.sv

`timescale 1ns / 1ps
`default_nettype wire
//////////////////////////////////////////////////////////////////////////////////
// Company: The University of Edinburgh
// Engineer: Giuseppe Li (s1402587)
//
// Create Date: 2018-04-12 02:47:27
// Design name: FPGA accelerator for DNA pattern matching
// Module Name: ssd_wrapper
// Project Name: 4th Year Honours Project (BSc (Hons) Artificial Intelligence and Computer Science)
// Description: Two Seven Segment Display Wrapper
//
// Dependencies: ssd_driver
//
// Additional Comments: This file has been automatically generated
//////////////////////////////////////////////////////////////////////////////////

module ssd_wrapper(
    input CLK,
    input DONE,
    input [15:0] OUT,
    output [6:0] SSD_A[1:0],
    output SSD_C
);

    wire [15:0] digits;

    assign digits = OUT;

    reg [20:0] counter_r;

    initial
        counter_r = 0;

    always @(posedge CLK)
        counter_r <= counter_r + 1;

    // The top bit of the free-running counter selects which digit is driven.
    assign SSD_C = counter_r[20];

    genvar ssdNo;

    generate
        for (ssdNo = 0; ssdNo < 2; ssdNo = ssdNo + 1)
        begin: SSDInstantiation
            ssd_driver u_ssd_driver (
                .done (DONE),
                .ssd_input (digits[(ssdNo * 8)+7 : (ssdNo * 8)]),
                .ssd_c (SSD_C),
                .ssd_a (SSD_A[ssdNo])
            );
        end
    endgenerate

endmodule


top.sv

`timescale 1ns / 1ps
`default_nettype wire
//////////////////////////////////////////////////////////////////////////////////
// Company: The University of Edinburgh
// Engineer: Giuseppe Li (s1402587)
//
// Create Date: 2018-04-12 02:47:27
// Design name: FPGA accelerator for DNA pattern matching
// Module Name: top
// Project Name: 4th Year Honours Project (BSc (Hons) Artificial Intelligence and Computer Science)
// Description: Top level module
//
// Dependencies: clu, clu_complex, buffer, ssd_driver, ssd_wrapper
//
// Additional Comments: This file has been automatically generated
//////////////////////////////////////////////////////////////////////////////////

module top(
    input CLK,
    input RESET,
    input [1:0] BUTTONS,
    output [6:0] SSD_A_0,
    output [6:0] SSD_A_1,
    output [1:0] SSD_C,
    output [1:0] LEDS
);

    wire [1:0] STR[0:19];
    wire [1:0] PAT[0:2];
    wire DONE;
    wire [15:0] OUT;
    wire SSD_C_SINGLE;

    buffer u_buffer(
        .CLK (CLK),
        .RESET (RESET),
        .BUTTONS(BUTTONS),
        .STR (STR),
        .PAT (PAT),
        .DONE (DONE),
        .LEDS (LEDS)
    );

    clu_complex u_clu_complex(
        .STR (STR),
        .PAT (PAT),
        .OUT (OUT)
    );

    ssd_wrapper u_ssd_wrapper(
        .CLK (CLK),
        .DONE (DONE),
        .OUT (OUT),
        .SSD_A ({SSD_A_1, SSD_A_0}),
        .SSD_C (SSD_C_SINGLE)
    );

    assign SSD_C = {2{SSD_C_SINGLE}};

endmodule
