

Volume 3, Spl. Issue 1 (2016) e-ISSN: 1694-2310 | p-ISSN: 1694-2426

367 BUEST, Baddi RIEECE -2016

Design and Implementation of Execute Unit for Processor having Very Long Instruction Word based Architecture

Sanjeev Kumar1, Tejinder Singh2, Narender Kumar3

1,2,3Department of Electronics and Communication Engineering, Baddi, Solan (H.P.)

[email protected], [email protected], [email protected]

Abstract: This paper presents the design and implementation of an execute unit for a VLIW processor. Microprocessor architecture has evolved from complex instruction set computing (CISC) to reduced instruction set computing (RISC), then to combined RISC-CISC designs, and currently to very long instruction word (VLIW) designs. Processor word width has likewise grown from 8 bits to 16 bits, 32 bits, and currently 64 bits. In this paper we carry out the hardware design and implementation of the execute unit for a 32-bit microprocessor capable of executing four operations per instruction word, targeting ASIC and FPGA technology. The design of the VLIW microprocessor begins with the technical specifications, which cover area utilization, voltage requirements, performance requirements, the instruction set, and the details of operation for each instruction. From these specifications we derive the architecture and microarchitecture, which consist of four pipes running in parallel so that four operations execute in parallel.

Keywords: Very Long Instruction Word (VLIW), Complex Instruction Set Computing (CISC), Reduced Instruction Set Computing (RISC).

I. INTRODUCTION

Microprocessors and microcontrollers are used in everyday electronic systems, both industrial and consumer. Complex electronic systems such as computers, ATMs, financial systems, transaction systems, control systems, and database systems all use some form of microcontroller or microprocessor as the core of the system. Consumer electronic systems such as home security systems, chip-based credit cards, microwave ovens, cars, cell phones, PDAs, refrigerators, and other daily appliances likewise have a microcontroller or microprocessor at their core. Microprocessors and microcontrollers are very similar in nature. From a top-level perspective, a microprocessor is the core of a microcontroller: a microcontroller consists of a microprocessor as its central processing unit (CPU) with peripheral logic surrounding the microprocessor core. A microprocessor can therefore be viewed as the building block of a microcontroller. A microcontroller has many uses. It is commonly used to provide a system-level solution for tasks such as controlling a car's electronic system, home security systems, ATM systems, communication systems, daily consumer appliances (such as microwave ovens and washing machines), and many others. Because of this wide range of applications, microcontrollers are in great demand.

II. TYPES OF MICROPROCESSOR ARCHITECTURES

Present-day microprocessors typically run at clock speeds ranging from hundreds of megahertz to several gigahertz. They have also grown from 8 bits to 16, 32, and 64 bits. The architecture of a microprocessor has likewise evolved from CISC to RISC and VLIW.

CISC (complex instruction set computing) is based on the concept of using as few instructions as possible when programming a microprocessor. CISC instruction sets are large, with instructions ranging from basic to complex. CISC microprocessors were widely used in the early days of microprocessor history [9]. RISC (reduced instruction set computing) microprocessors are very different from CISC microprocessors. RISC keeps the instruction set as simple as possible so that the microprocessor's program is written using only simple instructions. This idea was presented by John Cocke of IBM Research, who noticed that most complex instructions in the CISC instruction set were seldom used while the basic instructions were heavily utilized. Beyond CISC and RISC, there is a further generation of microprocessors based on a concept called very long instruction word (VLIW). VLIW microprocessors exploit instruction level parallelism (ILP), that is, the execution of multiple instructions in parallel.

Many applications in the multimedia domain contain a large amount of instruction level parallelism (ILP), because they typically consist of many independent, repetitive calculations. VLIW processors exploit ILP by means of a compiler that is completely aware of the target processor architecture [10]. VLIW microprocessors are not the only type of microprocessor that executes multiple instructions in parallel: superscalar and superpipelined CISC/RISC microprocessors can also achieve parallel execution of instructions. To achieve high performance, the concept of pipelining is introduced into microprocessor architecture. In pipelining, a microprocessor is divided into multiple pipe stages, and all stages operate simultaneously, each on a different instruction. When a stage has completed executing its instruction, it passes the result to the next stage for further processing and takes another instruction from its preceding stage. Instruction execution in a pipelined microprocessor has four basic stages:

1. fetch: This stage of the pipeline fetches the instruction/data from the instruction cache/memory.

2. decode: This stage of the pipeline decodes the instruction fetched by the fetch stage. The decode stage also fetches register data from the register file.

3. execute: This stage of the pipeline executes the instruction. This is the stage where the ALU (arithmetic logic unit) is located.

4. writeback: This stage of the pipeline writes data into the register file.
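The overlap between the four stages can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions that are not taken from the paper: an ideal pipeline with no stalls or hazards, and one new instruction entering the fetch stage every clock cycle. The function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["fetch", "decode", "execute", "writeback"]


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle mapping stage name -> instruction index.

    A stage that holds no instruction in a given cycle maps to None.
    Assumes an ideal pipeline: no stalls, one instruction issued per cycle.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            # The instruction in this stage entered fetch `depth` cycles ago.
            instr = cycle - depth
            row[stage] = instr if 0 <= instr < num_instructions else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(3)):
        cells = ", ".join(
            f"{stage}=I{instr}" if instr is not None else f"{stage}=--"
            for stage, instr in row.items()
        )
        print(f"cycle {cycle}: {cells}")
```

In this idealized model, N instructions finish in N + 3 cycles instead of 4N, which is the throughput gain that pipelining is meant to provide.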

To achieve multiple instruction execution, multiple pipes can be put together to form a superscalar microprocessor. A superscalar microprocessor increases in complexity but allows multiple instructions to be executed in parallel. VLIW microprocessors instead use a long instruction word that combines several operations into one single instruction word. This allows a VLIW microprocessor to execute multiple operations in parallel.

Although both superscalar pipelined and VLIW microprocessors can execute multiple instructions in parallel, the two types are very different, and each has its own set of advantages and disadvantages.

A. Instruction Layout for VLIW Instruction Set

The operation code consists of 8 bits, with the most significant bit reserved for future expansion. The remaining seven bits encode the 36 possible operations. Similarly, each internal register is assigned eight address bits, with the two most significant bits reserved for future expansion and taken as zero for simplicity.

TABLE I. INSTRUCTION SET LAYOUT

Bit [31:24]      Bit [23:16]       Bit [15:8]        Bit [7:0]
Operation code   Source1 address   Source2 address   Destination address

The source1, source2, and destination address columns hold internal register addresses. The VLIW microprocessor has 40 internal registers, each defined with its own register address.
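The field layout of Table I can be sketched as a pair of pack/unpack helpers in Python. This is a minimal sketch, not the authors' HDL: the function names are hypothetical, and it assumes that operations are numbered 0 to 35 and registers 0 to 39, which the paper does not state. With that numbering the reserved upper bits of each field are automatically zero.

```python
NUM_OPERATIONS = 36  # operation count given in the paper
NUM_REGISTERS = 40   # internal register count given in the paper


def encode_operation(opcode, src1, src2, dest):
    """Pack one operation into a 32-bit word using the Table I layout.

    Assumption: opcodes are numbered 0..35 and registers 0..39.
    """
    if not 0 <= opcode < NUM_OPERATIONS:
        raise ValueError(f"opcode out of range: {opcode}")
    for reg in (src1, src2, dest):
        if not 0 <= reg < NUM_REGISTERS:
            raise ValueError(f"register address out of range: {reg}")
    # Bits [31:24] opcode, [23:16] source1, [15:8] source2, [7:0] destination.
    return (opcode << 24) | (src1 << 16) | (src2 << 8) | dest


def decode_operation(word):
    """Split a 32-bit operation word back into its four 8-bit fields."""
    return {
        "opcode": (word >> 24) & 0xFF,
        "src1": (word >> 16) & 0xFF,
        "src2": (word >> 8) & 0xFF,
        "dest": word & 0xFF,
    }


if __name__ == "__main__":
    word = encode_operation(opcode=5, src1=1, src2=2, dest=39)
    print(f"{word:#010x}", decode_operation(word))
```

Range checking at encode time is a design choice of this sketch; it keeps the reserved bits clear without a separate masking step.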

B. Architectural Specifications of Processor

The microprocessor fetches instructions from an external instruction cache into its internal instruction buffers and decoders. The instruction is then passed on to multiple execution units, which allows multiple operations to be executed in parallel. The VLIW microprocessor is architected around a four-stage pipeline, with the following characteristics:

1. The VLIW microprocessor is architected to take advantage of pipelining.

2. Each 32-bit VLIW instruction word consists of four operations. To maximize performance, the architecture is built to execute the four operations in parallel. Each operation is numbered and assigned to a pipe: pipe1 executes operation 1, pipe2 executes operation 2, pipe3 executes operation 3, and pipe4 executes operation 4.

3. Each operation is split into four stages: fetch, decode, execute, and writeback. Four stages are chosen to keep the architecture simple yet efficient. The fetch stage fetches the VLIW instruction and data from external devices such as memory. The decode stage decodes the VLIW instruction to determine which operation each pipe needs to execute. The execute stage executes the operation decoded by the decode stage. The writeback stage (the last stage of the pipe) writes the results of the execution into internal registers.

4. All four operations share a set of forty 32-bit internal registers, which form a register file. Data are read from the register file during the decode stage and written into the register file during the writeback stage.

Fig. 1. VLIW Top level architecture [10]

Upon completion of an operation, the final stage (writeback) writes the result of the operation into the register file or, for a read operation, drives the data to the output of the VLIW microprocessor. Figure 1 shows the interface signal diagram of the VLIW microprocessor.
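The interaction between the four pipes and the shared register file can be sketched behaviorally in Python. This is a sketch under stated assumptions rather than the paper's design: the operation names (`add`, `sub`, `and`, `or`) are hypothetical stand-ins because the 36 operations are not listed here, the function `run_vliw_word` is invented for illustration, and the handling of two pipes writing the same destination (the later pipe wins) is an arbitrary choice.

```python
import operator

MASK32 = 0xFFFFFFFF   # registers are 32 bits wide
NUM_REGISTERS = 40    # shared register file size from the paper
NUM_PIPES = 4         # one pipe per operation in the instruction word

# Hypothetical operation names; the paper's instruction set is not listed here.
OPERATIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def run_vliw_word(register_file, operations):
    """Run one instruction word of four (name, src1, src2, dest) operations."""
    if len(operations) != NUM_PIPES:
        raise ValueError("a VLIW word must carry exactly four operations")
    # Decode stage: every pipe reads its sources before any result is written.
    decoded = [
        (OPERATIONS[name], register_file[src1], register_file[src2], dest)
        for name, src1, src2, dest in operations
    ]
    # Execute stage: each pipe computes its result, truncated to 32 bits.
    results = [(dest, fn(a, b) & MASK32) for fn, a, b, dest in decoded]
    # Writeback stage: results are committed to the shared register file.
    for dest, value in results:
        register_file[dest] = value
    return register_file


if __name__ == "__main__":
    regs = [0] * NUM_REGISTERS
    regs[1], regs[2] = 7, 3
    run_vliw_word(regs, [("add", 1, 2, 10), ("sub", 1, 2, 11),
                         ("and", 1, 2, 12), ("or", 1, 2, 13)])
    print(regs[10:14])
```

Reading all sources before writing any result mirrors the description above, in which the register file is read in the decode stage and written in the writeback stage.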

III. SIMULATION AND SYNTHESIS RESULTS

The VLIW microprocessor consists of four stages (fetch, decode, execute, and writeback). For ease of understanding, each operation is numbered and assigned to a pipe: pipe1 executes operation 1, pipe2 executes operation 2, pipe3 executes operation 3, and pipe4 executes operation 4. All four operations within the VLIW instruction word have access to a register file of forty 32-bit registers.

A. Execute Unit

The execute module is the most complicated module in the VLIW microprocessor. Its function is to execute the operations of the VLIW instruction.

Fig. 2. Data flow representation of Execute module using Modelsim

Fig. 3. Simulation result of Execution Unit


Fig. 4. Chip layout of Execution unit

IV. CONCLUSION

This paper presented the design and implementation of an execute unit for a VLIW processor. The architecture exploits instruction level parallelism (ILP): four operations are executed in parallel. The design was verified using Xilinx ISE 9.2i, ModelSim SE 5.7g, and HDL Designer from Mentor Graphics.

TABLE II. SYNTHESIS RESULTS OF EXECUTE UNIT OF VLIW MICROPROCESSOR

Performance                      Execute
Bonded IO                        683
Throughput                       75 MIPS
Speed                            74.995 MHz
Estimated junction temperature   25 °C
Gate delay                       10.703 ns
Net delay                        4.909 ns
Power consumption                334 mW
Setup time                       10.703 ns
Gate count                       345793
Hold time                        3.670 ns
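The speed and throughput rows of the synthesis results can be related with a short calculation, sketched here in Python. The assumption that one instruction word issues per clock cycle is not stated in the paper; it is adopted only because it reproduces the reported figure of about 75 MIPS. The peak operations figure is derived from the four operations per word and is not a reported result. Function names are hypothetical.

```python
CLOCK_MHZ = 74.995        # "Speed" row of the synthesis results
OPERATIONS_PER_WORD = 4   # four operations per VLIW instruction word


def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def throughput_mips(freq_mhz, words_per_cycle=1.0):
    """Millions of instruction words per second.

    Assumption: one instruction word issues per clock cycle by default.
    """
    return freq_mhz * words_per_cycle


def peak_operations_mops(freq_mhz, ops_per_word=OPERATIONS_PER_WORD):
    """Derived upper bound on operations per second, in millions."""
    return throughput_mips(freq_mhz) * ops_per_word


if __name__ == "__main__":
    print(f"clock period : {clock_period_ns(CLOCK_MHZ):.3f} ns")
    print(f"throughput   : {throughput_mips(CLOCK_MHZ):.3f} MIPS")
    print(f"peak ops rate: {peak_operations_mops(CLOCK_MHZ):.2f} MOPS")
```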

V. FUTURE WORK

The present work designs the execute unit for a 32-bit VLIW processor. In the future this design can be extended to 64 bits. The present design uses forty 32-bit general purpose registers; a future design could use sixty-four 32-bit registers, and the number of instructions could also be increased. The processor currently supports only integer operations, so a floating point ALU and a 32-bit floating point multiplier could be added, since floating point offers a wider range than integers and more flexibility in scientific calculations. To increase processor speed, the architecture should be pipelined, with registers inserted between every two design units. The VLIW processor can also be redesigned for low power. A VLIW microprocessor can further be designed for DSP applications and for MPEG (Moving Picture Experts Group) audio/video applications.

REFERENCES

[1] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", Sun Microsystems Inc., California, USA, 2003.

[2] M. Morris Mano, "Computer System Architecture", Prentice-Hall of India Private Limited, 1986.

[3] Harry F. Jordan, "Computer System Design & Architecture", Prentice Hall, 2nd edition, December 6, 2003.

[4] Israel Koren, "Computer Arithmetic Algorithms", A K Peters/CRC Press, 2nd edition, November 30, 2001.

[5] Katherine Compton, "An Introduction to Reconfigurable Computing", Department of Electrical & Computer Engineering, Northwestern University, April 2008.

[6] John L. Hennessy, David A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann, 5th edition, September 30, 2011.

[7] Joseph A. Fisher, Paolo Faraboschi, Cliff Young, "Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools", New York: Morgan Kaufmann, 2004.

[8] Joseph A. Fisher, "Embedded Computing: A VLIW Approach to Architecture", Morgan Kaufmann, 1st edition, December 31, 2004.

[9] Kai Hwang, Faye A. Briggs, "Computer Architecture and Parallel Processing", Sung Kung Computer Book Company, 1986.

[10] Stephen Wong, Thijs van As, "ρ-VEX: A Reconfigurable and Extensible VLIW Processor", IEEE International Conference on Field-Programmable Technology (ICFPT'08), 2008.

[11] R. Seshasayanan, Dr. S. K. Srivatsa, "Implementation of Novel Pipeline VLIW Architecture on FPGA", International Journal of Computer Science and Security, vol. 7, no. 7, pp. 264-268, July 2007.