Sisteme Integrate Ver5

2009 – 2011

Cosmin Ionete

Dragos Surlea

Nicolae Neagu

FACULTY OF AUTOMATION

EMBEDDED SYSTEMS


Contents

1. Embedded Systems Architecture
   1.1 What is an embedded system?
   1.2 Microprocessor and Microcontroller Architectures
   1.3 Microprocessor/Microcontroller Basics
      1.3.1 What is a microprocessor?
         1.3.1.2 Microprocessor Fundamentals
      1.3.2 What is a microcontroller?
      1.3.3 Some differences between microprocessors and microcontrollers
   1.4 Compiling, Linking, and Locating
      1.4.1 The Build Process
      1.4.2 Compiling
      1.4.3 Linking
      1.4.4 Locating
      1.4.5 Downloading and Debugging
      1.4.6 Emulators
2. Fixed-Point vs. Floating-Point Numbers: Fundamentals
   2.1 About Fixed-Point Numbers
   2.2 Scaling
      2.2.1 Quantization, Range and Precision
      2.2.2 Recommendations for Arithmetic and Scaling
3. Microcontroller CPU, Interrupts, Memory, and I/O
   3.1 CPU – Central Processing Unit
   3.2 Interrupts
      3.2.1 Vectored Interrupts & Non-Vectored Interrupts
      3.2.2 Interrupt Priority
      3.2.3 Serial communication with polling and interrupts
   3.3 On-Chip Memory
      3.3.1 Read-Only Memory (ROM)
      3.3.2 Random-Access Memory (RAM)
      3.3.3 Hybrid Types
   3.4 I/O
      3.4.1 Study of External Peripherals
         3.4.1.1 Initialize the Hardware
      3.4.2 Peripheral devices
         3.4.2.1 Control and Status Registers
         3.4.2.2 The Device Driver Philosophy
5. Address Decoding
6. Flip-Flops, Registers, Counters
   6.1 Flip-Flops
      6.1.1 RS Flip-Flops
      6.1.2 Gated D Latch
      6.1.3 Master-Slave and Edge-Triggered D Flip-Flops
      6.1.4 D Flip-Flops with Clear and Preset
      6.1.5 T Flip-Flop
      6.1.6 JK Flip-Flop
   6.2 Registers
      6.2.1 Shift Register
      6.2.2 Parallel-Access Shift Register
   6.3 Counters
      6.3.1 Asynchronous Counters
         6.3.1.1 Up-Counter with T Flip-Flops
         6.3.1.2 Down-Counter with T Flip-Flops
      6.3.2 Synchronous Counters
         6.3.2.1 Synchronous Counter with T Flip-Flops
         6.3.2.2 Synchronous Counter with D Flip-Flops
      6.3.3 Counters with Parallel Load
7. Timers/Counters
   7.1 Reloading a Timer
   7.2 Input Capture Timer
   7.3 Watchdog Timer
   7.4 Using Timers
8. PWM Control
   8.1 Examples and Description
   8.2 Concepts of Pulse Width Modulation (PWM)
   8.3 PWM Study
9. DAC and ADC
   9.1 Digital-to-Analog Converters (DAC)
   9.2 Analog-to-Digital Converters (ADC)
      9.2.1 Reference Voltage
      9.2.2 Resolution
10. Communication
   10.1 UART
      10.1.1 Synchronous Serial Transmission
      10.1.2 Asynchronous Serial Transmission
   10.2 RS232
   10.3 Serial Peripheral Interface
   10.4 Local Interconnect Network (LIN)
   10.5 Controller Area Network
11. IDE – Integrated Development Environment
   11.1 Source Code Editor
   11.2 Compiler
      11.2.1 Front End
      11.2.2 Back End
   11.3 Linker
   11.4 Debugger
12. Real-Time Operating Systems
   12.1 Introduction
   12.2 Defining an RTOS
   12.3 The Scheduler
      12.3.1 Schedulable Entities
      12.3.2 Multitasking
      12.3.3 The Context Switch
      12.3.4 The Dispatcher
      12.3.5 Scheduling Algorithms
   12.4 Objects
      12.4.1 Tasks
         12.4.1.1 Introduction
         12.4.1.2 Defining a Task
         12.4.1.3 Task States and Scheduling
         12.4.1.4 Typical Task Operations
      12.4.2 Semaphores
         12.4.2.1 Introduction
         12.4.2.2 Defining Semaphores
         12.4.2.3 Typical Semaphore Operations
   12.5 Services


1. Embedded Systems Architecture

1.1 What is an embedded system?

An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions. It is usually embedded as part of a complete device that includes hardware and mechanical parts. In contrast, a general-purpose computer, such as a personal computer, can perform many different tasks depending on its programming. Because an embedded system is dedicated to specific tasks, design engineers can optimize it, reducing the size and cost of the product or increasing its reliability and performance. Complexity varies from low, with a single microcontroller chip, to very high, with multiple units, peripherals, and networks mounted inside a large chassis or enclosure.

In general, "embedded system" is not an exactly defined term, as many systems have some element of programmability. For example, handheld computers share some elements with embedded systems (such as the operating systems and microprocessors that power them) but are not truly embedded systems, because they allow different applications to be loaded and peripherals to be connected.

Some of the actual commercial applications of embedded systems include:

Market – Embedded Device
Automotive: ignition system, engine control, brake system (Antilock Braking System), interior/exterior lights
Consumer Electronics: set-top boxes (DVDs, VCRs, cable boxes, etc.), kitchen appliances (refrigerators, toasters, microwave ovens), cameras, handheld tools, remote control devices, security systems, Global Positioning Systems (GPS), cordless and cellular phones
Industrial Control: robotics and control systems (manufacturing), electronic measurement instruments (e.g., digital multimeters, frequency synthesisers, and oscilloscopes)
Medical: infusion pumps, dialysis machines, prosthetic devices, hearing aids, cardiac monitors
Networking: routers, hubs, gateways
Office Automation: fax machines, monitors, scanners, photocopiers, printers


Selecting a particular processor for a given application is usually a function of the designer’s familiarity with a particular architecture. While there are many variations in the details and specific features, there are two general categories of devices: microprocessors and microcontrollers. The key difference between a microprocessor and a microcontroller is that a microprocessor contains only a central processing unit (CPU) while a microcontroller has memory and I/O on the chip in addition to a CPU. Microcontrollers are generally used for dedicated tasks. Microcomputer is a general term that applies to complete computer systems implemented with either a microprocessor or microcontroller.

1.2 Microprocessor and Microcontroller Architectures

Microprocessors are generally utilized for relatively high-performance applications where cost and size are not critical selection criteria. Because microprocessor chips have their entire function dedicated to the CPU and thus have room for more circuitry to increase execution speed, they can achieve very high levels of processing power. However, microprocessors require external memory and I/O hardware. Microprocessor chips are used in desktop PCs and workstations, where software compatibility, performance, generality, and flexibility are important.

By contrast, microcontroller chips are usually designed to minimize the total chip count and cost by incorporating memory and I/O on the chip. They are often "application specialized" at the expense of flexibility. In some cases, the microcontroller has enough resources on-chip that it is the only IC required for a product. Examples of single-chip applications include the key fob used to arm a security system, a toaster, or hand-held games. The hardware interfaces of both devices have much in common, and those of the microcontrollers are generally a simplified subset of the microprocessor's. The primary design goals for each type of chip can be summarized this way:
• microprocessors are most flexible
• microcontrollers are most compact

Microcontroller Architectures


A. Princeton (Von Neumann) vs. Harvard

There are also differences in the basic CPU architectures used, and these tend to reflect the application. Microprocessor-based machines usually have a von Neumann architecture, with a single memory for both programs and data, to allow maximum flexibility in the allocation of memory. Microcontroller chips, on the other hand, frequently embody the Harvard architecture, which has separate memories for programs and data. Figure 1.1 illustrates this difference.

Figure 1.1 - At left, the von Neumann architecture: the CPU has a single interface to a combined program and data memory. At right, the Harvard architecture: the CPU has separate program and data memories.


Princeton architecture:

• All memory space is on the same bus
• Every location has a unique address, so instructions and data are treated the same way
• Possible bottleneck between instruction and data fetches; overcome with instruction prefetching (overlapping, pipelining) and/or instruction/data caches
• Simplifies processor design: one memory interface
• More reliable: fewer things can fail
• RAM can be used for both data and instruction storage
• Greater flexibility in the design of software (especially real-time OS)

One advantage the Harvard architecture has for embedded applications is due to the two types of memory used in embedded systems. A fixed program and constants can be stored in non-volatile ROM, while working variable data storage can reside in volatile RAM. Volatile memory loses its contents when power is removed; non-volatile ROM maintains its contents even after power is removed.

The Harvard architecture also has the potential advantage of a separate interface allowing twice the memory transfer rate, since instruction fetches can occur in parallel with data transfers. Unfortunately, in most Harvard architecture machines, the memory is connected to the CPU using a bus that limits the parallelism to a single bus.

A typical embedded computer consists of the CPU, memory, and I/O, most often connected by means of a shared bus. The peripherals on a microcontroller chip are typically timers, counters, serial or parallel data ports, and analog-to-digital and digital-to-analog converters integrated directly on the chip. The performance of these peripherals is generally less than that of the dedicated peripheral chips frequently used with microprocessor chips. However, having the bus connections, CPU, memory, and I/O functions on one chip has several advantages:
- Fewer chips are required, since most functions are already present on the processor chip.
- Lower cost and smaller size result from a simpler design.
- Lower power requirements, because on-chip power requirements are much smaller than external loads.
- Fewer external connections are required, because most are made on-chip, and most of the chip connections can be used for I/O.
- More pins on the chip are available for user I/O, since they aren't needed for the bus.
- Overall reliability is higher, since there are fewer components and interconnections.

Of course there are disadvantages too, including:
- Reduced flexibility, since you can't easily change the functions designed into the chip.
- Expansion of memory or I/O is limited or impossible.
- Limited data transfer rates due to practical size and speed limits for a single chip.
- Lower-performance I/O because of design compromises to fit everything on one chip.

The von Neumann machine, with only one memory, requires all instruction and data transfers to occur on the same interface. This is sometimes referred to as the "von Neumann bottleneck," and in common computer architectures it is the primary upper limit to processor throughput. As noted above, the Harvard architecture's separate interfaces can potentially double the memory transfer rate, though in practice a single bus usually limits this. The memory separation is still used to advantage in microcontrollers: the program is usually stored in non-volatile memory (so the program is not lost when power is removed), and temporary data storage is in volatile memory.


Non-volatile memories, such as read-only memory (ROM) are used in both types of systems to store permanent programs. In a desktop PC, ROMs are used to store just the start-up or bootstrap programs and hardware specific programs. Volatile random access memory (RAM) can be read and written easily, but it loses its contents when power is removed. RAM is used to store both application programs and data in PCs that need to be able to run many different programs. In a dedicated embedded computer, however, the programs are stored permanently in ROM where they will always be available. Microcontroller chips that are used in dedicated applications generally use ROM for program storage and RAM for data storage.

B. CISC vs. RISC

• CISC (Complex Instruction Set Computer):
- tends to have many instructions in the instruction set
- can carry out complex operations (many used very infrequently)
- many instructions are very long (many bits) and require many clock cycles

• RISC (Reduced Instruction Set Computer):
- few, simple instructions
- short (few bits) and fast
- often orthogonal instruction sets: all registers can be read/written/used in the same way, allowing great power and flexibility
- example: PICmicro; many other microcontrollers use RISC

Some microprocessors offer both CISC and RISC features.

C. Microcoded versus Hardwired processors

The internal ("under the cover") design of a processor:

Microcoded:
• a processor within a processor
• the signals required to execute each instruction are "fetched" from an internal "control ROM" memory
• allows great flexibility in the instruction set
• easier to design
• slower than hardwired


Hardwired:
• the signals required to execute an instruction are generated by logic gates (combinational circuitry), the "control matrix"
• faster
• less flexible

1.3 Microprocessor/Microcontroller Basics

Microprocessors vs. microcontrollers:
• Microprocessors:
- high end of the market, where performance matters
- high power dissipation
- high cost
- need peripheral devices to work
- mostly used in microcomputers
• Microcontrollers:
- targeted at the low end of the market, where raw performance matters less
- low power dissipation
- low cost
- memory plus I/O devices, all integrated into one chip
- mostly used in embedded systems

1.3.1 What is a microprocessor?

A microprocessor is a device that integrates a number of useful functions into a single IC package, among them:
- the ability to execute a stored set of instructions to carry out user-defined tasks;
- the ability to access external memory chips to read/write data from/to memory;
- the ability to interface with I/O devices.

There are three groups of signals, or buses, that connect the CPU to the other major components:
- Data bus
- Address bus
- Control bus


• The concepts of address and data are fundamental to the operation of the microprocessor.
• Memory consists of locations uniquely identified by the CPU through their addresses.
• The CPU communicates with those addresses to read and write data.
• The communications go via the buses.
• The CPU is responsible for control of the address, data and control buses (it is the bus 'master').
• All devices are attached to the data bus, so there is a potential clash.
• Devices connected to the data bus can therefore be driven to a high-impedance state.
• The ability of a device to set its output to logic 1, logic 0, or a high-impedance state is an essential feature of common bus systems; such a device is termed a tristate device.

A. Data bus - transfers the data associated with the processing function of the microprocessor (8 lines, typically). The data bus width is defined as the number of bits that can be transferred on the bus at one time; this defines the processor's "word size." Many chip vendors define the word size based on the width of an internal data bus: a processor with eight data bus pins is an 8-bit CPU. Both instructions and data are transferred on the data bus one "word" at a time, which allows the re-use of the same connections for many different types of information. Due to packaging limitations, the number of connections or pins on a chip is limited; by sharing the pins in this way, the number of pins required is reduced, at the expense of increased complexity in the external circuits. Many processors take this a step further and share some or all of the data bus pins to carry address information as well. This is referred to as a multiplexed address/data bus. Processors that have multiplexed address/data buses require an external address latch to separate and hold the address information stable for the duration of a data transfer. The processor controls the direction of data transfer on the data bus (read/write).

B. Address bus - contains the address of a specific memory location for accessing (reading/writing) stored data (16 lines, typically). The address bus is a set of wires used to point to the memory or I/O location that is to be read from or written to. The address signals must generally be held at a constant value for some period of time before, during, and after the data is transferred. In most cases, the processor actively drives the address bus with either instruction or data addresses.


Memory Read and Write Cycles

• Hardware control lines are used by the CPU to control reads from and writes to memory.
• The active-low signal RD is asserted for a read cycle.
• The active-low signal WR indicates a write.
• The RD and WR signals supply timing information to the memory device.

Read cycle

It lasts 2 cycles of the clock signal:
1. The address of the required memory location is put on the address bus (by the CPU) at the rising edge.
2. While the device is held at the 'tristate' level, the control bus issues the 'read' signal (active low) to the device (the 2nd cycle begins).
3. After a delay, valid data is placed on the data bus.
4. The levels on the data bus are sampled by the CPU at the falling edge of the 2nd cycle.

Write cycle

1. The CPU places the address on the address bus at the rising edge.
2. Decoding logic selects the correct device.
3. At the rising edge of the 2nd cycle, the CPU outputs the data onto the data bus and sets the WRITE control bus signal active (LOW).

Note: memory devices and other I/O components have static logic and do not depend on the clock signal; they read data from the data bus when the write signal goes high (inactive), so the data must be valid for that transition.

C. Control bus - carries the control signals to the memory and the I/O devices (the number of lines is arbitrary, often around 15). The control bus is an assortment of signals that determine what kind of information is on the data bus and, in conjunction with the address bus, where the data will go. Most of the design process is concerned with the logic and timing of the control signals. The timing analysis is primarily involved with the relative timing between these control signals and the appearance and disappearance of data and addresses on their respective buses.


1.3.1.2 Microprocessor Fundamentals

- MPU register set and internal architecture
- MPU buses
- Memory considerations
- MPU interfacing

The CPU
• processes the data by executing a program stored in the memory
• performs a sequence of fetch-and-execute operations
• consists of: Control Unit + ALU + Registers
• is responsible for the control of the address, data and control buses (a 'master')
• all actions within the µP are synchronised to the CPU via a clock signal
• the clock signal is a logic square wave that drives all the circuitry in the µP, typically 1 to 30 MHz or higher

The Control Unit

• determines the timing and sequence of operations
• generates the timing signals used to fetch program instructions from memory and to execute them
• is also responsible for decoding instructions
• supplies the control signals to read and write data into registers, controls the ALU, and drives external control signals

The ALU
• the arithmetic and logic unit (ALU) is responsible for data manipulation
• arithmetic operations; logic operations (AND, OR, XOR, etc.)
• bit shifting, rotating, incrementing, decrementing, negating, complementing, addition, etc.


Registers

• Registers hold the data/addresses that the CPU currently uses; they are special (small and fast) memory locations on the CPU.
• Accumulator register: temporarily stores input to the ALU, and is sometimes used for I/O operations. It may be 8, 16 or 32 bits wide.
• Flags register (or status register): individual bits in the register are called flags; they reflect the conditions of the latest ALU operation and are used by subsequent jump and branch instructions.
• General-purpose registers: temporary storage for data or addresses; not assigned any specific task.
• Program counter: tracks the CPU's position in the program. The width of the program counter is the same as that of the address bus.
• Instruction register: stores the instruction while it is decoded; not accessible by the programmer.
• Index registers: hold the address of an operand when the indexed addressing mode is used.
• Stack pointer register: holds the address of the next memory location in the stack in RAM. The stack is a special area of RAM with last-in first-out (LIFO, or FILO) organisation; it is used during subroutine calls and interrupts.

Types of registers:

Stack
• Part of memory where program data can be stored by a simple PUSH operation
• Data is restored by a POP
• The stack is in main memory and is defined by the program
• The Stack Pointer (SP) keeps track of the next location available on the stack
• Organised as a FILO buffer (see the sketch below)
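To make the LIFO behaviour concrete, here is a minimal C sketch of a descending stack of the kind many CPUs maintain in RAM; the 256-byte size and the names ram, sp, push and pop are purely illustrative, not taken from any particular processor.

    #include <stdio.h>

    /* A tiny model of a descending stack: SP starts just past the top of
       the stack area, is decremented on PUSH and incremented on POP. */
    static unsigned char ram[256];
    static unsigned int sp = sizeof ram;            /* the stack pointer */

    static void push(unsigned char v) { ram[--sp] = v; }
    static unsigned char pop(void)    { return ram[sp++]; }

    int main(void)
    {
        push(0x11);
        push(0x22);
        printf("%#x\n", (unsigned)pop());   /* 0x22: last in, first out */
        printf("%#x\n", (unsigned)pop());   /* 0x11 */
        return 0;
    }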

General Registers
• A small set of internal registers used for temporary data storage
• The CU ensures that data from the correct register is presented to the CPU
• The CU ensures that data is written back to the correct register
• The accumulator usually holds the ALU result


Status or Flags Register

• CF - Carry Flag: 1 = there is a carry out from the most significant bit; 0 = no carry out from the MSB
• PF - Parity Flag: 1 = low byte has an even number of 1 bits; 0 = low byte has odd parity
• AF - Auxiliary carry Flag: 1 = carry out from bit 3 on addition (or borrow into bit 3 on subtraction); 0 = otherwise
• ZF - Zero Flag: 1 = zero result; 0 = non-zero result
• SF - Sign Flag: 1 = MSB is 1 (negative); 0 = MSB is 0 (positive)
• TF - Trap Flag: used by debuggers for single-step operation; 1 = trap on; 0 = trap off
• IF - Interrupt Flag: 1 = interrupts enabled; 0 = disabled
• OF - Overflow Flag: 1 = signed overflow occurred; 0 = no overflow

Flag bits are set by instructions.

Flag bits are the basis of conditional jump instructions.
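To show how such flags are used in practice, the sketch below defines bit masks for a status word and tests them the way conditional-branch logic would; the bit positions follow the 8086 FLAGS layout implied by the list above and would differ on another CPU.

    #include <stdio.h>

    /* Flag bit positions as in the 8086 FLAGS register. */
    #define CF (1u << 0)    /* carry           */
    #define PF (1u << 2)    /* parity          */
    #define AF (1u << 4)    /* auxiliary carry */
    #define ZF (1u << 6)    /* zero            */
    #define SF (1u << 7)    /* sign            */
    #define OF (1u << 11)   /* signed overflow */

    int main(void)
    {
        unsigned int flags = ZF | CF;   /* e.g. a zero result with a carry out */

        if (flags & ZF)                 /* what a "jump if zero" tests */
            printf("zero result\n");

        /* After a compare, SF != OF is the classic signed less-than test. */
        if (((flags & SF) != 0) != ((flags & OF) != 0))
            printf("signed less-than\n");
        return 0;
    }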

The program status word (PSW) is an area of memory or a hardware register which contains information about the program state used by the operating system and the underlying hardware. It will normally include a pointer (address) to the next instruction to be executed. The program status word typically contains an error status field and condition codes such as the interrupt enable/disable bit and a supervisor/user mode bit.

The PSW contains information such as:
• condition code bits (set by various comparison instructions)
• the CPU priority mode:
- user mode: only a subset of instructions and features is accessible
- kernel mode: all instructions and features are accessible

The program status word is 32 bits in length and contains the information required for proper program execution: the instruction address, the condition code, and other fields. In general, the PSW is used to control instruction sequencing and to hold and indicate the status of the system in relation to the program currently being executed. The active or controlling PSW is called the current PSW. By storing the current PSW during an interruption, the status of the CPU can be preserved for subsequent inspection; by loading a new PSW or part of a PSW, the state of the CPU can be initialized or changed.

1.3.2 What is a microcontroller?

- A common component in modern electronic systems
- A microprocessor-based device
- Basically, a device which integrates a number of the components of a microprocessor system onto a single chip: a single-chip computer
- Completely self-contained, with memory and I/O
- Needs only to be supplied with power and clocking

Its primary role is to provide inexpensive, programmable logic control and interfacing to external devices, e.g., turning devices on and off and monitoring external conditions.

A microcontroller combines on the same chip:

• the CPU core
• I/O
• memory: program memory (PROM/EPROM/EEPROM/Flash) and variable RAM memory


Most microcontrollers will also combine other devices such as:

• A timer module to allow the microcontroller to perform tasks for certain time periods.
• Serial I/O (UART) for data flow between the microcontroller and devices such as a PC or another microcontroller.
• Analog input and output (e.g., to receive data from sensors or to control motors).
• Interrupt capability (from a variety of sources).
• Bus/external memory interfaces (for RAM or ROM).
• A built-in monitor/debugger program.
• Support for external peripherals (e.g., I/O and bus extenders).
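As a concrete illustration of how software drives such on-chip peripherals, the sketch below toggles an output pin through memory-mapped registers; the register addresses, register layout, and pin number are hypothetical stand-ins for values that would come from a real microcontroller's datasheet.

    #include <stdint.h>

    /* Hypothetical memory-mapped GPIO registers (invented addresses). */
    #define GPIO_DIR (*(volatile uint32_t *)0x40010000u)  /* direction: 1 = output */
    #define GPIO_OUT (*(volatile uint32_t *)0x40010004u)  /* output data register  */

    #define LED_PIN (1u << 5)                             /* assumed pin number */

    static void led_init(void) { GPIO_DIR |= LED_PIN; }   /* configure pin as output */
    static void led_on(void)   { GPIO_OUT |= LED_PIN; }   /* drive the pin high */
    static void led_off(void)  { GPIO_OUT &= ~LED_PIN; }  /* drive the pin low  */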

Figure: a typical microcontroller - the different sub-units integrated onto the microcontroller chip.

The heart of the microcontroller is the CPU core.

1.3.3 Some differences between microprocessors and microcontrollers

MP: suited to processing information in computer systems.
MC: suited to controlling I/O devices with a minimum component count.

Instruction sets:
MP: processing-intensive; powerful addressing modes; instructions to perform complex operations and manipulate large volumes of data (the processing capability of MCs never approaches that of MPs); large instructions, e.g., 80x86 instructions up to 7 bytes long.
MC: cater to the control of inputs and outputs; instructions to set/clear bits; boolean operations (AND, OR, XOR, NOT, jump if a bit is set/cleared), etc.; extremely compact instructions, many implemented in one byte (the control program must often fit in the small on-chip ROM).


Instruction sets:

• The set of instructions given to the µP to execute a task is called an instruction set.
• Generally, instructions can be classified into the following categories:
- Data transfer
- Arithmetic
- Logical
- Program control
• They differ depending on the manufacturer, but some are reasonably common to most µPs.

A. Data transfer
1. Load - reads the content of a specified memory location and copies it to the specified register in the CPU.
2. Store - copies the current contents of a specified register into a specified memory location.

B. Arithmetic
3. Add - adds the contents of a specified memory location to the data in some register.
4. Decrement - subtracts 1 from the content of a specified location.
5. Compare - indicates whether the contents of a register are greater than, less than, or the same as the contents of a specified memory location. The result appears as a flag in the status register.

C. Logical
6. AND - carries out the logical AND operation with the contents of a specified memory location and the data in some register.
7. OR - carries out the logical OR operation with the contents of a specified memory location and the data in some register.
8. EXCLUSIVE OR - similar to 6, but for exclusive OR.
9. Logical shift - moves the pattern of bits in the register one place to the left or right, moving a zero (0) into the end of the number.
10. Arithmetic shift - moves the pattern of bits one place left/right, but copies the end bit into the vacancy created by the shift (the difference is illustrated in the sketch after this list).

D. Program control
11. Jump - changes the sequence in which the program is executed: the program counter jumps to some specified (non-sequential) location.
12. Branch - a conditional instruction, e.g. 'branch if zero' or 'branch if plus'; it is taken only if the right conditions are met.
13. Halt - stops all further microprocessor activities.
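The difference between a logical and an arithmetic shift (instructions 9 and 10) can be demonstrated in C; note that in C a right shift of a negative signed value is implementation-defined, though most compilers implement it as an arithmetic shift.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t u = 0xF0;           /* bit pattern 1111 0000       */
        int8_t  s = (int8_t)0xF0;   /* same bit pattern, value -16 */

        /* Logical shift right: a zero moves into the vacated bit. */
        printf("logical:    %#04x\n", (unsigned)(u >> 1));   /* 0x78 */

        /* Arithmetic shift right: the sign (end) bit is copied into
           the vacancy, so the value stays negative. */
        printf("arithmetic: %d\n", (int)(s >> 1));           /* -8 on most compilers */
        return 0;
    }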


Hardware and instruction set support:
MC: built-in I/O operations, event timing, enabling and setting up priority levels for interrupts caused by external stimuli.
MP: usually requires external circuitry to do similar things (e.g., the 8255 PPI, 8254 PIT, 8259 PIC).

Bus widths:
MP: very wide; large memory address spaces (> 4 GBytes); lots of data (data bus 32, 64, or 128 bits wide).
MC: narrow; relatively small memory address spaces (typically kBytes); less data (data bus typically 4, 8, or 16 bits wide).

Clock rates:
MP: very fast (> 1 GHz).
MC: relatively slow (typically 10-20 MHz), since most I/O devices being controlled are relatively slow.

Cost:
MP: expensive (often > $100).
MC: cheap (often $1-$10): 4-bit < $1.00; 8-bit $1.00-$8.00; 16/32-bit $6.00-$20.00.


1.4 Compiling, Linking, and Locating

1.4.1 The Build Process

There are a lot of things that software development tools can do automatically when the target platform is well defined. This automation is possible because the tools can exploit features of the hardware and operating system on which your program will execute. For example, if all of your programs will be executed on IBM-compatible PCs running DOS, your compiler can automate (and, therefore, hide from your view) certain aspects of the software build process.


Embedded software development tools, on the other hand, can rarely make assumptions about the target platform. Instead, the user must provide some of his own knowledge of the system to the tools by giving them more explicit instructions. The term "target platform" is best understood to include not only the hardware but also the operating system that forms the basic runtime environment for your software. If no operating system is present (as is sometimes the case in an embedded system), the target platform is simply the processor on which the program will run.

The process of converting the source code representation of your embedded software into an executable binary image involves three distinct steps. First, each of the source files must be compiled or assembled into an object file. Second, all of the object files that result from the first step must be linked together to produce a single object file, called the relocatable program. Finally, physical memory addresses must be assigned to the relative offsets within the relocatable program in a process called relocation. The result of this third step is a file that contains an executable binary image that is ready to be run on the embedded system.

The embedded software development process just described is illustrated in the figure below, with the three steps shown from top to bottom and the tools that perform them shown in boxes with rounded corners. Each of these development tools takes one or more files as input and produces a single output file. More specific information about these tools and the files they produce is provided in the sections that follow.

Each of the steps of the embedded software build process is a transformation performed by software running on a general-purpose computer. To distinguish this development computer (usually a PC or Unix workstation) from the target embedded system, it is referred to as the host computer. In other words, the compiler, assembler, linker, and locator are all pieces of software that run on a host computer, rather than on the embedded system itself. Yet, despite the fact that they run on some other computer platform, these tools combine their efforts to produce an executable binary image that will execute properly only on the target embedded system. This split of responsibilities is shown in the figure below.


1.4.2 Compiling

The job of a compiler is mainly to translate programs written in some human-readable language into an equivalent set of opcodes for a particular processor. In that sense, an assembler is also a compiler (you might call it an "assembly language compiler"), but one that performs a much simpler one-to-one translation from one line of human-readable mnemonics to the equivalent opcode. Everything in this section applies equally to compilers and assemblers; together these tools make up the first step of the embedded software build process.

Of course, each processor has its own unique machine language, so you need to choose a compiler that is capable of producing programs for your specific target processor. In the embedded systems case, this compiler almost always runs on the host computer; it simply doesn't make sense to execute the compiler on the embedded system itself. A compiler such as this, which runs on one computer platform and produces code for another, is called a cross-compiler. The use of a cross-compiler is one of the defining features of embedded software development.

Regardless of the input language (C/C++, assembly, or any other), the output of the cross-compiler will be an object file. This is a specially formatted binary file that contains the set of instructions and data resulting from the language translation process. Although parts of this file contain executable code, the object file is not intended to be executed directly; in fact, the internal structure of an object file emphasizes the incompleteness of the larger program. The contents of an object file can be thought of as a very large, flexible data structure. The structure of the file is usually defined by a standard format like the Common Object File Format (COFF) or Extended Linker Format (ELF). If you'll be using more than one compiler (i.e., you'll be writing parts of your program in different source languages), you need to make sure that each is capable of producing object files in the same format. Although many compilers (particularly those that run on Unix platforms) support standard object file formats like COFF and ELF (gcc supports both), there are also some that produce object files only in proprietary formats. If you're using one of the compilers in the latter group, you might find that you need to buy all of your other development tools from the same vendor.

Most object files begin with a header that describes the sections that follow. Each of these sections contains one or more blocks of code or data that originated within the original source file, but these blocks have been regrouped by the compiler into related sections. For example, all of the code blocks are collected into a section called text, initialized global variables (and their initial values) into a section called data, and uninitialized global variables into a section called bss. There is also usually a symbol table somewhere in the object file that contains the names and locations of all the variables and functions referenced within the source file. Parts of this table may be incomplete, because not all of the variables and functions are always defined in the same file: these are the symbols that refer to variables and functions defined in other source files, and it is up to the linker to resolve such unresolved references.
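As an illustration of how a compiler distributes one source file across these sections, consider the following C fragment; the placements noted in the comments are the conventional ones, and foo is a deliberately unresolved symbol of the kind the linker must later fix up.

    int initialized = 42;     /* .data: initialized global; its value is
                                 stored in the object file                */
    int uninitialized;        /* .bss: uninitialized global; only its
                                 size is recorded                         */

    extern int foo;           /* declared but not defined here: an
                                 unresolved symbol left for the linker    */

    int sum(int x)            /* .text: executable code */
    {
        return x + initialized + foo;
    }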

1.4.3 Linking

All of the object files resulting from step one (compiling) must be combined in a special way before the program can be executed. The object files themselves are individually incomplete, most notably in that some of the internal variable and function references have not yet been resolved. The job of the linker is to combine these object files and, in the process, to resolve all of the unresolved symbols.

The output of the linker is a new object file that contains all of the code and data from the input object files and is in the same object file format. It does this by merging the text, data, and bss sections of the input files. So, when the linker is finished executing, all of the machine language code from all of the input object files will be in the text section of the new file, and all of the initialized and uninitialized variables will reside in the new data and bss sections, respectively.

While the linker is in the process of merging the section contents, it is also on the lookout for unresolved symbols. For example, if one object file contains an unresolved reference to a variable named foo and a variable with that same name is declared in one of the other object files, the linker will match them up. The unresolved reference will be replaced with a reference to the actual variable; in other words, if foo is located at offset 14 of the output data section, its entry in the symbol table will now contain that address.

The GNU linker (ld) runs on all of the same host platforms as the GNU compiler. It is essentially a command-line tool that takes the names of all the object files to be linked together as arguments. For embedded development, a special object file that contains the compiled startup code must also be included within this list.

Startup Code

One of the things that traditional software development tools do automatically is to insert startup code: a small block of assembly language code that prepares the way for the execution of software written in a high-level language. Each high-level language has its own set of expectations about the runtime environment. For example, C and C++ both utilize an implicit stack; space for the stack has to be allocated and initialized before software written in either language can be properly executed. That is just one of the responsibilities assigned to startup code for C/C++ programs.

Most cross-compilers for embedded systems include an assembly language file called startup.asm, crt0.s (short for C runtime), or something similar. The location and contents of this file are usually described in the documentation supplied with the compiler. Startup code for C/C++ programs usually consists of the following actions, performed in the order described:
1. Disable all interrupts.
2. Copy any initialized data from ROM to RAM.
3. Zero the uninitialized data area.
4. Allocate space for and initialize the stack.
5. Initialize the processor's stack pointer.
6. Create and initialize the heap.
7. Execute the constructors and initializers for all global variables (C++ only).
8. Enable interrupts.
9. Call main.

Typically, the startup code will also include a few instructions after the call to main. These instructions will be executed only in the event that the high-level language program exits (i.e., the call to main returns). Depending on the nature of the embedded system, you might want to use these instructions to halt the processor, reset the entire system, or transfer control to a debugging tool.
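For illustration, here is a minimal sketch of such startup code, written in C for readability (real crt0 files are usually assembly, and several steps are omitted); the section-boundary symbol names are assumptions modelled on common linker-script conventions, not a standard.

    #include <stdint.h>
    #include <string.h>

    /* Symbols assumed to be defined by the linker script. */
    extern uint8_t __data_load[], __data_start[], __data_end[];
    extern uint8_t __bss_start[], __bss_end[];

    int main(void);

    void _start(void)
    {
        /* Step 2: copy initialized data from ROM to its RAM addresses. */
        memcpy(__data_start, __data_load, (size_t)(__data_end - __data_start));

        /* Step 3: zero the uninitialized data area (.bss). */
        memset(__bss_start, 0, (size_t)(__bss_end - __bss_start));

        /* Steps 1 and 4-8 (interrupts, stack, heap, C++ constructors)
           are omitted here: they are processor- and toolchain-specific. */

        /* Step 9: call main; if it ever returns, spin forever. */
        main();
        for (;;)
            ;
    }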


Because the startup code is not inserted automatically, the programmer must usually assemble it himself and include the resulting object file among the list of input files to the linker. He might even need to give the linker a special command-line option to prevent it from inserting the usual startup code.

If the same symbol is declared in more than one object file, the linker is unable to proceed; it will likely appeal to the programmer (by displaying an error message) and exit. However, if a symbol reference instead remains unresolved after all of the object files have been merged, the linker will try to resolve the reference on its own. The reference might be to a function that is part of the standard library, so the linker will open each of the libraries described to it on the command line (in the order provided) and examine their symbol tables. If it finds a function with that name, the reference will be resolved by including the associated code and data sections within the output object file.

After merging all of the code and data sections and resolving all of the symbol references, the linker produces a special "relocatable" copy of the program. In other words, the program is complete except for one thing: no memory addresses have yet been assigned to the code and data sections within. If you weren't working on an embedded system, you'd be finished building your software now.

But embedded programmers aren't generally finished with the build process at this point. Even if your embedded system includes an operating system, you'll probably still need an absolutely located binary image. In fact, if there is an operating system, the code and data of which it consists are most likely within the relocatable program too. The entire embedded application, including the operating system, is almost always statically linked together and executed as a single binary image.

1.4.4 Locating

The tool that performs the conversion from relocatable program to executable binary image is called a locator. Of the three steps, this tool takes responsibility for the easiest one; in fact, you will have to do most of the work in this step yourself, by providing information about the memory on the target board as input to the locator. The locator uses this information to assign physical memory addresses to each of the code and data sections within the relocatable program. It then produces an output file that contains a binary memory image that can be loaded into the target ROM.

In many cases, the locator is a separate development tool. In the case of the GNU tools, however, this functionality is built right into the linker. Try not to be confused by this one particular implementation: whether you are writing software for a general-purpose computer or an embedded system, at some point the sections of your relocatable program must have actual addresses assigned to them. In the first case, the operating system does it for you at load time; in the second, you must perform the step with a special tool. This is true even if the locator is a part of the linker.

The memory information required by the GNU linker can be passed to it in the form of a linker script. Such scripts are sometimes used to control the exact order of the code and data sections within the relocatable program.
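For example, a minimal GNU linker script for a hypothetical board might look like the following sketch; the memory origins and lengths are invented, and a real board's values would come from its memory map.

    /* Hypothetical memory map: adjust ORIGIN/LENGTH for the real board. */
    MEMORY
    {
        rom (rx)  : ORIGIN = 0x00000000, LENGTH = 256K
        ram (rwx) : ORIGIN = 0x20000000, LENGTH = 64K
    }

    SECTIONS
    {
        .text : { *(.text*) } > rom            /* code into ROM */
        .data : { *(.data*) } > ram AT > rom   /* initialized data: runs in RAM,
                                                  initial values stored in ROM for
                                                  the startup code to copy over */
        .bss  : { *(.bss*)  } > ram            /* zero-initialized data in RAM */
    }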

1.4.5 Downloading and Debugging

Once you have an executable binary image stored as a file on the host computer, you will need a way to download that image to the embedded system and execute it. The executable binary image is usually loaded into a memory device on the target board and executed from there. And if you have the right tools at your disposal, it will be possible to set breakpoints in the program or to observe its execution in less intrusive ways. This section describes various techniques for downloading, executing, and debugging embedded software.


One of the most obvious ways to download your embedded software is to load the binary image into a read-only memory device and insert that chip into a socket on the target board. Obviously, the contents of a truly read-only memory device could not be overwritten; however, embedded systems commonly employ special read-only memory devices that can be programmed (or reprogrammed) with the help of a special piece of equipment called a device programmer. A device programmer is a computer system that has several memory sockets on top (of varying shapes and sizes) and is capable of programming memory devices of all sorts.

In an ideal development scenario, the device programmer would be connected to the same network as the host computer, so that files containing executable binary images could be easily transferred to it for ROM programming. After the binary image has been transferred to the device programmer, the memory chip is placed into the appropriately sized and shaped socket and the device type is selected from an on-screen menu. The actual device programming process can take anywhere from a few seconds to several minutes, depending on the size of the binary image and the type of memory device you are using.

After you program the ROM, it is ready to be inserted into its socket on the board. Of course, this shouldn't be done while the embedded system is still powered on: the power should be turned off and reapplied only after the chip has been carefully inserted. As soon as power is applied, the processor will begin to fetch and execute the code stored inside the ROM. However, beware that each type of processor has its own rules about the location of its first instruction. If your program doesn't appear to be working, there could be something wrong with your reset code; you must always ensure that the binary image you've loaded into the ROM satisfies the target processor's reset rules.

A development board may include a special in-circuit-programmable memory, called Flash memory, that does not have to be removed from the board to be reprogrammed. In that case, software that can perform the device programming function (the monitor) is already installed in another memory device on the board: the board actually has two read-only memory devices, one of which (a true ROM) contains a simple program that allows the user to in-circuit program the other (a Flash memory device). All the host computer needs to talk to the monitor program is a serial port and a terminal program.

The biggest disadvantage of this download technique is that there is no easy way to debug software that is executing out of ROM. The processor fetches and executes the instructions at a high rate of speed and provides no way for you to view the internal state of the program. This might be fine once you know that your software works and you're ready to deploy the system, but it's not very helpful during software development.

Remote Debuggers

If available, a remote debugger can be used to download, execute, and debug embedded software over a serial port or network connection between the host and target. The frontend of a remote debugger looks just like any other debugger that you might have used: it usually has a text- or GUI-based main window and several smaller windows for the source code, register contents, and other relevant information about the executing program. In the case of embedded systems, however, the debugger and the software being debugged are executing on two different computer systems.

A remote debugger actually consists of two pieces of software. The frontend runs on the host computer and provides the human interface just described. But there is also a hidden backend that runs on the target processor and communicates with the frontend over a communications link of some sort. The backend provides for low-level control of the target processor and is usually called the debug monitor. The figure below shows how these two components work together.


The debug monitor resides in ROM, having been placed there in the manner described earlier (either by you or at the factory), and is automatically started whenever the target processor is reset. It monitors the communications link to the host computer and responds to requests from the remote debugger running there. Of course, these requests and the monitor's responses must conform to some predefined communications protocol and are typically of a very low-level nature. Examples of requests the remote debugger can make are "read register x," "modify register y," "read n bytes of memory starting at address," and "modify the data at address." The remote debugger combines sequences of these low-level commands to accomplish high-level debugging tasks like downloading a program, single-stepping through it, and setting breakpoints. Communication between the frontend and the debug monitor is byte-oriented and designed for transmission over a serial connection such as RS-232 or USB.

Remote debuggers are among the most commonly used downloading and testing tools during development of embedded software, mainly because of their low cost. Embedded software developers already have the requisite host computer, and the price of a remote debugger frontend does not add significantly to the cost of a suite of cross-development tools (compiler, linker, locator, etc.). Finally, the suppliers of remote debuggers often give away the source code for their debug monitors, in order to increase the size of their installed user base. As shipped, the Keil board includes a free debug monitor in Flash memory. Together with host software provided by the vendor, this debug monitor can be used to download programs directly into target RAM and execute them.
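To make the request/response idea concrete, here is a minimal sketch of how a debug monitor's command loop might be structured. The command codes, wire format, and serial I/O helpers are illustrative assumptions, not the protocol of any particular monitor:

#include <stdint.h>

/* Hypothetical serial primitives supplied by the board support code. */
extern uint8_t serial_getc(void);
extern void    serial_putc(uint8_t c);

/* Illustrative command bytes; a real monitor defines its own protocol. */
#define CMD_READ_MEM   0x01   /* addr(4), len(1)            -> len data bytes */
#define CMD_WRITE_MEM  0x02   /* addr(4), len(1), len bytes -> ack            */

static uint32_t get_u32(void)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | serial_getc();    /* big-endian on the wire */
    return v;
}

/* Endless command loop: the frontend sends low-level requests,
 * the monitor executes them and answers over the serial link.   */
void monitor_loop(void)
{
    for (;;) {
        uint8_t cmd = serial_getc();
        uint32_t addr;
        uint8_t len;

        switch (cmd) {
        case CMD_READ_MEM:
            addr = get_u32();
            len  = serial_getc();
            while (len--)
                serial_putc(*(volatile uint8_t *)addr++);
            break;
        case CMD_WRITE_MEM:
            addr = get_u32();
            len  = serial_getc();
            while (len--)
                *(volatile uint8_t *)addr++ = serial_getc();
            serial_putc(0x06);           /* ACK */
            break;
        default:
            serial_putc(0x15);           /* NAK: unknown command */
            break;
        }
    }
}

A real monitor layers framing, checksums, and commands for registers, breakpoints, and execution control on top of this skeleton.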

1.4.6 Emulators

Remote debuggers are helpful for monitoring and controlling the state of embedded software, but only an in-circuit emulator (ICE) allows you to examine the state of the processor on which that program is running. In fact, an ICE actually takes the place of, or emulates, the processor on your target board. It is itself an embedded system, with its own copy of the target processor, RAM, ROM, and its own embedded software. As a result, in-circuit emulators are usually pretty expensive, often more expensive than the target hardware. But they are a powerful tool, and in a tight debugging spot nothing else will help you get the job done better.

Like a debug monitor, an emulator uses a remote debugger for its human interface. In some cases, it is even possible to use the same debugger frontend for both. But because the emulator has its own copy of the target processor, it is possible to monitor and control the state of the processor in real time. This allows the emulator to support such powerful debugging features as hardware breakpoints and real-time tracing, in addition to the features provided by any debug monitor.

With a debug monitor, you can set breakpoints in your program. However, these software breakpoints are restricted to instruction fetches: the equivalent of the command "stop execution if this instruction is about to be fetched." Emulators, by contrast, also support hardware breakpoints, which allow you to stop execution in response to a wide variety of events. These events include not only instruction fetches, but also memory and I/O reads and writes, and interrupts. For example, you might set a hardware breakpoint on the event "variable foo contains 15 and register AX becomes 0."

Another useful feature of an in-circuit emulator is real-time tracing. Typically, an emulator incorporates a large block of special-purpose RAM that is dedicated to storing information about each of the processor cycles that are executed. This feature allows you to see in exactly what order things happened, so it can help you answer questions such as: did the timer interrupt occur before or after the variable bar became 94? In addition, it is usually possible either to restrict the information that is stored or to post-process the data prior to viewing it, in order to cut down on the amount of trace data to be examined.

ROM Emulators

One other type of emulator is worth mentioning at this point. A ROM emulator is a device that emulates a read-only memory device. Like an ICE, it is an embedded system that connects to the target and communicates with the host. However, this time the target connection is via a ROM socket. To the embedded processor, it looks like any other read-only memory device. But to the remote debugger, it looks like a debug monitor.

ROM emulators have several advantages over debug monitors. First, no one has to port the debug monitor code to your particular target hardware. Second, the ROM emulator supplies its own serial or network connection to the host, so it is not necessary to use the target's own, usually limited, resources. And finally, the ROM emulator is a true replacement for the original ROM, so none of the target's memory is used up by the debug monitor code.

Simulators and Other Tools

Of course, many other debugging tools are available to you, including simulators, logic analyzers, and oscilloscopes. A simulator is a completely host-based program that simulates the functionality and instruction set of the target processor. The human interface is usually the same as or similar to that of the remote debugger. In fact, it might be possible to use one debugger frontend for the simulator backend as well, as shown in the figure below. Although simulators have many disadvantages, they are quite valuable in the earlier stages of a project, when there is not yet any actual hardware for the programmers to experiment with.

By far the biggest disadvantage of a simulator is that it simulates only the processor, and embedded systems frequently contain one or more other important peripherals. Interaction with these devices can sometimes be imitated with simulator scripts or other workarounds, but such workarounds are often more trouble to create than they are worth. So you probably won't do too much with the simulator once you have the actual embedded hardware available to you.


Once you have access to your target hardware, and especially during hardware debugging, logic analyzers and oscilloscopes can be indispensable debugging tools. They are most useful for debugging the interactions between the processor and other chips on the board. Because they can only view signals that lie outside the processor, however, they cannot control the flow of execution of your software the way a debugger or an emulator can. This makes these tools significantly less useful by themselves. But coupled with a software debugging tool like a remote debugger or an emulator, they can be extremely valuable.

An oscilloscope is another piece of laboratory equipment for hardware debugging, one that is used to examine any electrical signal, analog or digital, on any piece of hardware. Oscilloscopes are sometimes useful for quickly observing the voltage on a particular pin or, in the absence of a logic analyzer, for something slightly more complex. However, the number of inputs is much smaller (there are usually about four) and advanced triggering logic is not often available. As a result, an oscilloscope will be useful to you only rarely as a software debugging tool.

Most of the debugging tools described in this chapter will be used at some point or another in every embedded project: oscilloscopes and logic analyzers most often to debug hardware problems, simulators during the early stages of software development, and debug monitors and emulators during the actual software debugging. To be most effective, you should understand what each tool is for and when and where to apply it for the greatest impact.

Programming
- Generally done in either the core's native assembly language or C
- Sometimes HLL support (often BASIC) is available
- Assemblers/linkers often supplied free by the micro's manufacturer
- C compilers vary from free and very buggy to very expensive and only moderately buggy
- Environments generally not friendly or reliable

Downloading
- Program development usually done on a PC
- Software tools must produce a file to download to the MC's EPROM
- Several standard formats (e.g., binary, hex)
- EPROM burner often necessary
- Can download the program to an EPROM emulator, but to reprogram a real EPROM you must use a UV eraser first
- Flash memory programmers make this easier:
  - Very easy to reprogram with an inexpensive "in-circuit debugger" that interacts with the MC via 3 pins + power + ground
  - Or can be programmed/debugged with a resident monitor program, using the on-chip UART for communications with the PC
  - No burner or UV eraser needed; no expensive quartz window required
  - Expedites the program-test-erase-reprogram code development cycle

Monitor
- A program module that communicates with PC software
- Typically uses a serial port to talk to a PC's terminal program
- Capabilities vary widely
- Usually can send/receive text and ASCII-converted numbers
- Often has commands to examine/change registers, memory locations, I/O ports


2. Fixed points vs. Floating point numbers. Fundamentals

2.1 About Fixed-Point Numbers

Fixed-point numbers are stored in data types that are characterized by their word size in bits, binary point, and whether they are signed or unsigned. The Simulink® Fixed Point™ software supports integers, fractionals, and generalized fixed-point numbers. The main difference among these data types is their default binary point.

Note: Fixed-point word sizes up to 128 bits are supported.

A common representation of a binary fixed-point number (either signed or unsigned) is shown in the following figure.

where

* bi are the binary digits (bits).

* The size of the word in bits is given by ws.

* The most significant bit (MSB) is the leftmost bit, and is represented by location b(ws-1).

* The least significant bit (LSB) is the rightmost bit, and is represented by location b0.

* The binary point is shown four places to the left of the LSB.

Signed Fixed-Point Numbers

Computer hardware typically represents the negation of a binary fixed-point number in three different ways: sign/magnitude, one's complement, and two's complement. Two's complement is the preferred representation of signed fixed-point numbers and is supported by the Simulink Fixed Point software.

Negation using two's complement consists of a bit inversion (translation into one's complement) followed by the addition of a one. For example, the two's complement of 000101 is 111011.
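For illustration, the same negation can be checked in C; the 6-bit mask is just an assumption made to mirror the six-digit example above:

#include <stdio.h>

int main(void)
{
    unsigned x = 0x05;                    /* 000101 in binary             */
    unsigned neg = (~x + 1u) & 0x3Fu;     /* invert, add one, keep 6 bits */
    printf("two's complement of 000101 is %02X\n", neg);  /* 3B = 111011 */
    return 0;
}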

Whether a fixed-point value is signed or unsigned is usually not encoded explicitly within the binary word; that is, there is no sign bit. Instead, the sign information is implicitly defined within the computer architecture.

Binary Point Interpretation

The binary point is the means by which fixed-point numbers are scaled. It is usually the software that determines the binary point. When performing basic math functions such as addition or subtraction, the hardware uses the same logic circuits regardless of the value of the scale factor. In essence, the logic circuits have no knowledge of a scale factor. They perform signed or unsigned fixed-point binary algebra as if the binary point were to the right of b0.

Within the Simulink Fixed Point software, the main difference between fixed-point data types is the default binary point. For integers and fractionals, the binary point is fixed at the default value. For generalized fixed-point data types, you must either explicitly specify the scaling by configuring dialog box parameters, or inherit the scaling from another block. The sections that follow describe the supported fixed-point data types.

Integers

The default binary point for signed and unsigned integer data types is assumed to be just to the right of the LSB. You specify unsigned and signed integers with the uint and sint functions, respectively.

Fractionals

The default binary point for unsigned fractional data types is just to the left of the MSB, while for signed fractionals the binary point is just to the right of the MSB. If you specify guard bits, then they lie to the left of the binary point. You specify unsigned and signed fractional numbers with the ufrac and sfrac functions, respectively.

Generalized Fixed-Point Numbers

For signed and unsigned generalized fixed-point numbers, there is no default binary point. You specify unsigned and signed generalized fixed-point numbers with the ufix and sfix functions, respectively.

Note: You can also use the fixdt function to create integer, fractional, and generalized fixed-point objects.

2.2 Scaling

The dynamic range of fixed-point numbers is much less than that of floating-point numbers with equivalent word sizes. To avoid overflow conditions and minimize quantization errors, fixed-point numbers must be scaled.

With the Simulink Fixed Point software, you can select a fixed-point data type whose scaling is defined by its default binary point, or you can select a generalized fixed-point data type and choose an arbitrary linear scaling that suits your needs. This section presents the scaling choices available for generalized fixed-point data types.

A fixed-point number can be represented by a general [Slope Bias] encoding scheme

V ≈ V~ = S · Q + B

where

* V is an arbitrarily precise real-world value.

* V~ is the approximate real-world value.

* Q is an integer that encodes V.


* S = F · 2^E is the slope.

* B is the bias.

The slope is partitioned into two components:

* 2^E specifies the binary point. E is the fixed power-of-two exponent.

* F is the fractional slope. It is normalized such that 1 ≤ F < 2. With the fractional slope, the value assigned to the LSB of the number representation can be any value S with 2^E ≤ S < 2^(E+1). The slope is the value assigned to the LSB of the representation.

Note: S and B are constants and do not show up in the computer hardware directly; only the quantization value Q is stored in computer memory (as a variable).

Binary-Point-Only Scaling

As the name implies, binary-point-only (or power-of-two) scaling involves moving only the binary point within the generalized fixed-point word. The advantage of this scaling mode is that the number of processor arithmetic operations is minimized.

With binary-point-only scaling, the components of the general [Slope Bias] formula have these values:

* F = 1

* S = 2^E

* B = 0

That is, the scaling of the quantized real-world number is defined only by the slope S, which is restricted to a power of two.

In the Simulink Fixed Point software, you specify binary-point-only scaling with the syntax 2^-E where E is unrestricted. This creates a MATLAB® structure with a bias B = 0 and a fractional slope F = 1.0. For example, the syntax 2^-10 defines a scaling such that the binary point is at a location 10 places to the left of the least significant bit.
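On the implementation side, binary-point-only scaling is what a C programmer usually does by hand with integer types: Q is stored as a plain integer, and the scale factor lives only in the code that converts to and from real-world values. A minimal sketch (the helper names are ours, not a Simulink API) with S = 2^-7 and floor rounding:

#include <stdint.h>
#include <stdio.h>
#include <math.h>

#define FRAC_BITS 7                        /* S = 2^-7, F = 1, B = 0 */

/* Real-world value -> stored integer Q (floor rounding, as in the examples). */
static int16_t to_fix(double v)   { return (int16_t)floor(v * (1 << FRAC_BITS)); }

/* Stored integer Q -> approximate real-world value V~ = S * Q. */
static double  from_fix(int16_t q) { return (double)q / (1 << FRAC_BITS); }

int main(void)
{
    int16_t q = to_fix(0.33333);
    printf("Q = %d, V~ = %f\n", q, from_fix(q));  /* Q = 42, V~ = 0.328125 */
    return 0;
}

This reproduces the fi(0.33333,1,10,7,'RoundMode','floor') result worked out later in this section: Q = 42, i.e., 42 · 2^-7 = 0.328125.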

[Slope Bias] Scaling

When you scale by slope and bias, the slope S and bias B of the quantized real-world number can take on any value. You specify scaling by slope and bias with the syntax [slope bias], which creates a MATLAB structure with the given slope and bias. For example, a [Slope Bias] scaling specified by [5/9 10] defines a slope of 5/9 and a bias of 10. The slope must be a positive number.

For example, S = 5/9 = (10/9) · 2^(-1), that is, F = 10/9 (which satisfies 1 ≤ F < 2) and E = -1.

Examples:

1. Let’s represent x = 3.3333e-002


The number x is converted to a signed, 10-bit generalized fixed-point data type with binary-point-only scaling of 2^-7 (that is, the binary point is located seven places to the left of the rightmost bit). The fractional bits are obtained by repeatedly doubling the value and taking off the integer part (left column: running value; right column: bits accumulated so far):

0.033333 0.

0.066666 0.0

0.133332 0.00

0.266664 0.000

0.533328 0.0000

1.066656 ->0.066656 0.00001

0.133312 0.000010

0.266624 0.0000100

0.533248 0.00001000

1.066496 -> 0.066496 0.000010001

0.132992 0.0000100010

etc

We use a 10-bit generalized fixed-point data type with binary-point-only scaling of 2^-7, so we have 7 bits for the fractional part and 3 bits for the signed integer part:

x = 000.0000100(010...) ≈ 000.0000100 = 2^-5 = 0.03125

2. Let’s represent x = 3.3333e-001

0.33333 0.

0.66666 0.0

1.33332 ->0.33332 0.01

0.66664 0.010

1.33328 -> 0.33328 0.0101

0.66656 0.01010

1.33312 -> 0.33312 0.010101

0.66624 0.0101010

1.33248 -> 0.33248 0.01010101

0.66496 0.010101010

1.32992 -> 0.32992 0.0101010101

etc

x = 000.0101010(101...) ≈ 000.0101010 = 0.328125


3. Let’s represent x = 3.3333e-003

0.0033333 0.

0.0066666 0.0

0.0133332 0.00

0.0266664 0.000

0.0533328 0.0000

0.1066656 0.00000

0.2133312 0.000000

0.4266624 0.0000000

0.8533248 0.00000000

1.7066496 -> 0.7066496 0.000000001

1.4132992 -> 0.4132992 0.0000000011

etc

x = 000.0000000(011...) ≈ 000.0000000 = 0

fi(v,s,w,f) returns a fixed-point object with value v, signedness s, word length w, and fraction length f.

fi(0.33333,1,10,7, 'RoundMode','floor')

ans =

0.328125000000000

DataTypeMode: Fixed-point: binary point scaling

Signed: true

WordLength: 10

FractionLength: 7

RoundMode: floor

OverflowMode: saturate

ProductMode: FullPrecision

MaxProductWordLength: 128

SumMode: FullPrecision

MaxSumWordLength: 128

CastBeforeSum: true


>> 1/4+1/16+1/64

ans =

0.328125000000000

By default, the RoundMode is Nearest.

fi(0.33333,1,10,7)

ans =

0.335937500000000

>> (1/4+1/16+1/64)+1/128

ans =

0.335937500000000

Another example:

m = [3.3333e-005 3.3333e-006 3.3333e-007 3.3333e-008
     3.3333e-004 3.3333e-005 3.3333e-006 3.3333e-007
     3.3333e-003 3.3333e-004 3.3333e-005 3.3333e-006
     3.3333e-002 3.3333e-003 3.3333e-004 3.3333e-005
     3.3333e-001 3.3333e-002 3.3333e-003 3.3333e-004]

We use a 10-bit word length with a 7-bit fraction:

>>x=2^-7

x =

0.007812500000000

>>round(m/x)*x

ans =

0 0 0 0

0 0 0 0

0 0 0 0

0.031250000000000 0 0 0

0.335937500000000 0.031250000000000 0 0

The same result can be obtained with:


>>fi(m,0,10,7)

>> M=m/m(5,1) %relative to the maximum value of the matrix

M =

0.0001 0.0000 0.0000 0.0000

0.0010 0.0001 0.0000 0.0000

0.0100 0.0010 0.0001 0.0000

0.1000 0.0100 0.0010 0.0001

1.0000 0.1000 0.0100 0.0010

>> fi(M,0,10,7)

ans =

0 0 0 0

0 0 0 0

0.0078 0 0 0

0.1016 0.0078 0 0

1.0000 0.1016 0.0078 0

>>fi(M,0,10,7)*round(m(5,1)/x)*x

ans =

0 0 0 0

0 0 0 0

0.0026 0 0 0

0.0341 0.0026 0 0

0.3359 0.0341 0.0026 0


2.2.1 Quantization, Range and Precision

Introduction


From the previous analysis of fixed-point variables scaled within the general [Slope Bias] encoding scheme, you can conclude

* Addition, subtraction, multiplication, and division can be very involved unless certain choices are made for the biases and slopes.

* Binary-point-only scaling guarantees simpler math, but generally sacrifices some precision.

Note that the previous formulas don't show the following:

* Constants and variables are represented with a finite number of bits.

* Variables are either signed or unsigned.

* Rounding and overflow handling schemes. You must make these decisions before an actual fixed-point realization is achieved.

A. Quantization

The quantization Q of a real-world value V is represented by a weighted sum of bits.


Within the context of the general [Slope Bias] encoding scheme, the value of an unsigned fixed-point quantity is given by

V ≈ V~ = S · ( b(ws-1)·2^(ws-1) + ... + b1·2^1 + b0·2^0 ) + B

while the value of a signed (two's complement) fixed-point quantity is given by

V ≈ V~ = S · ( -b(ws-1)·2^(ws-1) + b(ws-2)·2^(ws-2) + ... + b1·2^1 + b0·2^0 ) + B

where

* bi are binary digits, with bi ∈ {0, 1}.

* The word size in bits is given by ws, with ws = 1,2,3,...,128.

* S is given by F · 2^E, where the scaling is unrestricted because the binary point does not have to be contiguous with the word.

The bi are called bit multipliers and the 2^i are called the weights.

Example: Fixed-Point Format

The formats for 8-bit signed and unsigned fixed-point values are shown in the following figure.

Note that you cannot discern whether these numbers are signed or unsigned data types merely by inspection since this information is not explicitly encoded within the word.

The binary number 0011.0101 yields the same value for the unsigned and two's complement representations because the MSB = 0. Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the value is

2^1 + 2^0 + 2^-2 + 2^-4 = 3.3125

Conversely, the binary number 1011.0101 yields different values for the unsigned and two's complement representations, since the MSB = 1.

Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the unsigned value is

2^3 + 2^1 + 2^0 + 2^-2 + 2^-4 = 11.3125

while the two's complement value is

-2^3 + 2^1 + 2^0 + 2^-2 + 2^-4 = -4.6875

B. Range and Precision

The range of a number gives the limits of the representation, while the precision gives the distance between successive numbers in the representation. The range and precision of a fixed-point number depend on the length of the word and the scaling.

Range

The range of representable numbers for an unsigned and two's complement fixed-point number of size ws, scaling S, and bias B is illustrated in the following figure.

For both the signed and unsigned fixed-point numbers of any data type, the number of different bit patterns is 2^ws.

For example, if the fixed-point data type is an integer with scaling defined by S = 1 and B = 0, then the maximum unsigned value is 2^ws - 1, because zero must also be represented. In two's complement, negative numbers must be represented as well as zero, so the maximum value is 2^(ws-1) - 1. Additionally, since there is only one representation for zero, there must be an unequal number of positive and negative numbers. This means there is a representation for -2^(ws-1) but not for 2^(ws-1).
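A quick numeric check of these range formulas, here for ws = 8 (a minimal sketch; any word size works the same way):

#include <stdio.h>

int main(void)
{
    int ws = 8;
    long patterns = 1L << ws;                      /* 2^ws bit patterns */
    printf("unsigned: 0 .. %ld\n", patterns - 1);  /* 0 .. 255          */
    printf("signed:   %ld .. %ld\n",               /* -128 .. 127       */
           -(1L << (ws - 1)), (1L << (ws - 1)) - 1);
    return 0;
}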

Precision

The precision (scaling) of integer and fractional data types is specified by the default binary point. For generalized fixed-point data types, the scaling must be explicitly defined as either [Slope Bias] or binary-point-only. In either case, the precision is given by the slope.


Fixed-Point Data Type Parameters

The low limit, high limit, and default binary-point-only scaling for the supported fixed-point data types discussed in Binary Point Interpretation are given in the following table. See Limitations on Precision and Limitations on Range for more information.

Fixed-Point Data Type Range and Default Scaling

Range of an 8-Bit Fixed-Point Data Type — Binary-Point-Only Scaling

The precision, range of signed values, and range of unsigned values for an 8-bit generalized fixed-point data type with binary-point-only scaling follow. Note that the first scaling value (2^1) represents a binary point that is not contiguous with the word.

Range of an 8-Bit Fixed-Point Data Type — [Slope Bias] Scaling

The precision and range of signed and unsigned values for an 8-bit fixed-point data type using [Slope Bias] scaling follow. The slope starts at a value of 1.25 and the bias is 1.0 for all slopes. Note that the slope is the same as the precision.


Fixed-Point Data Type and Scaling Notation

The following table provides a key for various symbols that may appear in Simulink products to indicate the data type and scaling of a fixed-point value.

2.2.2 Recommendations for Arithmetic and Scaling

Introduction

The sections that follow describe the relationship between arithmetic operations and fixed-point scaling, and offer some basic recommendations that may be appropriate for your fixed-point design. For each arithmetic operation,

* The general [Slope Bias] encoding scheme described in Scaling is used.

* The scaling of the result is automatically selected based on the scaling of the two inputs. In other words, the scaling is inherited.

* Scaling choices are based on

o Minimizing the number of arithmetic operations of the result

o Maximizing the precision of the result

Additionally, binary-point-only scaling is presented as a special case of the general encoding scheme.

In embedded systems, the scaling of variables at the hardware interface (the ADC or DAC) is fixed. However, for most other variables, the scaling is something you can choose to give the best design. When scaling fixed-point variables, it is important to remember that


* Your scaling choices depend on the particular design you are simulating.

* There is no best scaling approach. All choices have associated advantages and disadvantages. It is the goal of this section to expose these advantages and disadvantages to you.

Addition

Consider the addition of two real-world values:

Va = Vb + Vc

These values are represented by the general [Slope Bias] encoding scheme described in Scaling:

Vx = Fx · 2^Ex · Qx + Bx,   x = a, b, c

In a fixed-point system, the addition of values results in finding the variable Qa:

Qa = (Fb/Fa) · 2^(Eb-Ea) · Qb + (Fc/Fa) · 2^(Ec-Ea) · Qc + (Bb + Bc - Ba) / (Fa · 2^Ea)

This formula shows

* In general, Qa is not computed through a simple addition of Qb and Qc.

* In general, there are two multiplications of a constant and a variable, two additions, and some additional bit shifting.

Inherited Scaling for Speed

In the process of finding the scaling of the sum, one reasonable goal is to simplify the calculations. Simplifying the calculations should reduce the number of operations, thereby increasing execution speed. The following choices can help to minimize the number of arithmetic operations:

* Set Ba = Bb + Bc. This eliminates one addition.

* Set Fa = Fb or Fa = Fc. Either choice eliminates one of the two constant times variable multiplications.

The resulting formula is

Qa = 2^(Eb-Ea) · Qb + (Fc/Fb) · 2^(Ec-Ea) · Qc   (choosing Fa = Fb)

or, symmetrically,

Qa = (Fb/Fc) · 2^(Eb-Ea) · Qb + 2^(Ec-Ea) · Qc   (choosing Fa = Fc)

These equations appear to be equivalent. However, your choice of rounding and precision may make one choice stand out over the other. To further simplify matters, you could choose Ea = Ec or Ea = Eb. This will eliminate some bit shifting.


Inherited Scaling for Maximum Precision

In the process of finding the scaling of the sum, one reasonable goal is maximum precision. You can determine the maximum-precision scaling if the range of the variable is known: the range of a fixed-point operation can be determined from the minimum and maximum values of its inputs. For a summation, you can determine the range from

min(Va) = min(Vb) + min(Vc)
max(Va) = max(Vb) + max(Vc)

You can now derive the maximum-precision slope:

Sa = (max(Va) - min(Va)) / (2^ws - 1)

where ws is the word size of the result.

In most cases the input and output word sizes are much greater than one, and the slope then depends only on the sizes of the input and output words. The corresponding bias is

Ba = min(Va) - Sa · min(Qa)

The value of the bias depends on whether the inputs and output are signed or unsigned numbers.

If the inputs and output are all unsigned, then the minimum values for these variables are all zero and the bias reduces to a particularly simple form:

Ba = 0

If the inputs and the output are all signed, then min(Qa) = -2^(ws-1) and the bias becomes

Ba = min(Vb) + min(Vc) + Sa · 2^(ws-1)

Binary-Point-Only Scaling

For binary-point-only scaling (F = 1, B = 0), finding Qa results in this simple expression:

Qa = 2^(Eb-Ea) · Qb + 2^(Ec-Ea) · Qc

This scaling choice results in only one addition and some bit shifting. The avoidance of any multiplications is a big advantage of binary-point-only scaling.
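In C, this is the familiar shift-and-add pattern of Q-format arithmetic. The fraction lengths below are arbitrary choices for illustration:

#include <stdint.h>

/* Qb uses 7 fraction bits (Eb = -7), Qc uses 4 (Ec = -4).
 * The result Qa uses 7 fraction bits (Ea = -7): with F = 1 and B = 0,
 * Qa = Qb * 2^(Eb-Ea) + Qc * 2^(Ec-Ea) = Qb + (Qc << 3).               */
static int16_t add_q7_q4(int16_t qb, int16_t qc)
{
    return (int16_t)(qb + (qc << 3));   /* one shift, one addition */
}

For example, with qb = 42 (0.328125 in the 7-fraction-bit format) and qc = 5 (0.3125 in the 4-fraction-bit format), the function returns 82, i.e., 82 · 2^-7 = 0.640625, as expected.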


3. Microcontroller CPU, Interrupts, Memory, and I/O

The interconnection between the CPU, memory, and I/O on the address and data buses is generally a one-to-one connection. The hard part is designing the appropriate circuitry to adapt the control signals present on each device to be compatible with those of the other devices. The most basic control signals are generated by the CPU to control the data transfers between the CPU and memory, and between the CPU and I/O devices. The four most common types of CPU-controlled data transfers are:

- CPU reads data/instructions from memory (memory read)

- CPU writes data to memory (memory write)

- CPU reads data from an input device (I/O read)

- CPU writes data to an output device (I/O write)

3.1 CPU – Central Processing Unit

The four major CPU components are:

- the arithmetic logic unit (ALU), which contains the circuitry to perform simple arithmetic and logical operations on the inputs;
- registers, a type of fast memory;
- the control unit (CU), the circuitry that controls the flow of data through the processor and coordinates the activities of the other units within it (in a way, it is the "brain within the brain");
- the internal CPU buses, which interconnect the ALU, registers, and the CU.

Figure 1.2 presents the internal block diagram of the V850 CPU.

Figure 1.2 – Internal block diagram of V850ES CPU

- The general-purpose registers can be used to store a data variable or an address variable.
- The program counter holds the instruction address during program execution.
- The system registers control the status of the CPU and hold interrupt information.


- The program status word (PSW) is an area of memory or a hardware register which contains information about program state used by the operating system and the underlying hardware. It will normally include a pointer (address) to the next instruction to be executed. The program status word typically contains an error status field and condition codes such as the interrupt enable/disable bit and a supervisor / user mode bit.

Registers

Registers are simply a combination of various flip-flops that can be used to temporarily store data or to delay signals. A storage register is a form of fast programmable internal processor memory usually used to temporarily store, copy, and modify operands that are immediately or frequently used by the system. Shift registers delay signals by passing the signals between the various internal flip-flops with every clock pulse.

Registers are made up of a set of flip-flops that can be activated either individually or as a set. In fact, it is the number of flip-flops in each register that is actually used to describe a processor (for example, a 32-bit processor has working registers that are 32 bits wide and contain 32 flip-flops, a 16-bit processor has working registers that are 16 bits wide and contain 16 flip-flops, and so on). The number of flip-flops within these registers also determines the width of the data buses used in the system.

While ISA designs do not all use registers in the same way to process data, register storage typically falls under one of two categories: general purpose or special purpose. General-purpose registers can be used to store and manipulate any type of data determined by the programmer, whereas special-purpose registers can only be used in a manner specified by the ISA, including holding results for specific types of computations, having predetermined flags (single bits within a register that can act and be controlled independently), acting as counters (registers that can be programmed to change state, that is, increment, asynchronously or synchronously after a specified length of time), and controlling I/O ports (registers managing the external I/O pins connected to the body of the processor and to board I/O). Shift registers are inherently special purpose, because of their limited functionality.

The number of registers, the types of registers, and the size of the data that these registers can store (8-bit, 16-bit, 32-bit, and so forth) vary depending on the CPU, according to the ISA definitions. In the cycle of fetching and executing instructions, the CPU's registers have to be fast, so as to quickly feed data to the ALU, for example, and to receive data from the CPU's internal data bus. Registers are also multi-ported so as to be able to both receive and transmit data to these CPU components.

3.2 Interrupts

Now that you know the names and addresses of the memory and peripherals attached to the processor, it is time to learn how to communicate with the latter. There are two basic communication techniques: polling and interrupts. In either case, the processor usually issues some sort of command to the device, by way of the memory or I/O space, and waits for the device to complete the assigned task. For example, the processor might ask a timer to count down from 1000 to 0. Once the countdown begins, the processor is interested in just one thing: is the timer finished counting yet?

If polling is used, then the processor repeatedly checks to see if the task has been completed. This is analogous to the small child who repeatedly asks "are we there yet?" throughout a long trip. Like the child, the processor spends a large amount of otherwise useful time asking the question and getting a negative response. To implement polling in software, you need only create a loop that reads the status register of the device in question, as in the sketch below. The second communication technique uses interrupts.
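A minimal polling loop might look like this in C; the register address and status bit are hypothetical stand-ins for a real device's:

#include <stdint.h>

/* Hypothetical memory-mapped timer: address and DONE bit are assumptions. */
#define TIMER_STATUS   (*(volatile uint8_t *)0xFF10)
#define TIMER_DONE     0x01

void wait_for_timer(void)
{
    /* "Are we there yet?" -- spin until the device sets its DONE flag. */
    while ((TIMER_STATUS & TIMER_DONE) == 0)
        ;   /* busy-wait: the CPU does no useful work here */
}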


An interrupt is an asynchronous electrical signal from a peripheral to the processor. When interrupts are used, the processor issues commands to the peripheral exactly as before, but then waits for an interrupt to signal completion of the assigned work. While the processor is waiting for the interrupt to arrive, it is free to continue working on other things. When the interrupt signal is finally asserted, the processor temporarily sets aside its current work and executes a small piece of software called the interrupt service routine (ISR). When the ISR completes, the processor returns to the work that was interrupted.

Of course, this isn't all automatic. The programmer must write the ISR and "install" and enable it so that it will be executed when the relevant interrupt occurs. The first few times you do this, it will be a significant challenge. But, even so, the use of interrupts generally decreases the complexity of one's overall code by giving it a better structure. Rather than device polling being embedded within an unrelated part of the program, the two pieces of code remain appropriately separate.

On the whole, interrupts are a much more efficient use of the processor than polling. The processor is able to use a larger percentage of its waiting time to perform useful work. However, there is some overhead associated with each interrupt. It takes a good bit of time, relative to the length of time it takes to execute an opcode, to put aside the processor's current work and transfer control to the interrupt service routine. Many of the processor's registers must be saved in memory, and lower-priority interrupts must be disabled. So in practice both methods are used frequently. Interrupts are used when efficiency is paramount or multiple devices must be monitored simultaneously. Polling is used when the processor must respond to some event more quickly than is possible using interrupts.

DEFINITIONS

• Interrupt - Hardware-supported asynchronous transfer of control to an interrupt vector
• Interrupt Vector - Dedicated location in memory that specifies the address execution jumps to
• Interrupt Handler - Code that is reachable from an interrupt vector
• Interrupt Controller - Peripheral device that manages interrupts for the processor
• Pending - Firing condition met and noticed, but the interrupt handler has not begun to execute
• Interrupt Latency - Time from an interrupt's firing condition being met to the start of execution of the interrupt handler
• Nested Interrupt - Occurs when one interrupt handler preempts another
• Reentrant Interrupt - Multiple invocations of a single interrupt handler are concurrently active

An interrupt is an asynchronous signal from hardware indicating the need for attention, or a synchronous event in software indicating the need for a change in execution. Hardware interrupts are triggered by a physical event, such as the closure of a switch, that causes a specific subroutine to be called. They can be thought of as a sort of hardware-initiated subroutine call. They can and do occur at any time in the program, depending on when the event occurs. These are referred to as asynchronous events because they may occur during the execution of any part of the program. Interrupts allow programs to respond to an event when it occurs.

A software interrupt is a special subroutine call. It is synchronous, meaning that it always occurs at the same time and place in the program that is interrupted. It is frequently used as a quick and simple way to do a subroutine call for accessing programs such as the operating system and I/O programs. Software interrupts are usually implemented as instructions in the instruction set, which cause a context switch to an interrupt handler similar to a hardware interrupt.

Interrupts can be categorized into: maskable interrupt (IRQ), non-maskable interrupt (NMI), interprocessor interrupt (IPI), software interrupt, and spurious interrupt.

- A maskable interrupt (IRQ) is a hardware interrupt that may be ignored by setting a bit in an interrupt mask register's (IMR) bit-mask.


- Likewise, a non-maskable interrupt (NMI) is a hardware interrupt that does not have a bit-mask associated with it, meaning that it can never be ignored. NMIs are often used for timers, especially watchdog timers.
- An interprocessor interrupt (IPI) is a special case of interrupt that is generated by one processor to interrupt another processor in a multiprocessor system.
- A software interrupt is an interrupt generated within a processor by executing an instruction. Software interrupts are often used to implement system calls, because they implement a subroutine call with a CPU ring-level change.
- A spurious interrupt is a hardware interrupt that is unwanted. Spurious interrupts are typically generated by system conditions such as electrical interference on an interrupt line or incorrectly designed hardware.

An interrupt can notify the processor when an analog-to-digital converter (ADC) has new data, when a timer rolls over, when a direct memory access (DMA) transfer is complete, when another processor wants to communicate, or when almost any asynchronous event happens. The interrupt hardware is initialized and programmed by the system software. When an interrupt is acknowledged, that process is performed by hardware internal to the processor and the interrupt controller integrated circuit (IC), if any.

When an interrupt occurs, the on-chip hardware performs the following functions:

• It saves the program counter (the address the processor was executing when the interrupt occurred) on the stack. Some processors save other information as well, such as register contents.
• It executes an interrupt acknowledge cycle to get a vector from the interrupting peripheral, depending on the processor and the specific type of interrupt.
• It branches to a predetermined address specific to that particular interrupt. The destination address is the interrupt service routine (ISR, or sometimes ISP for interrupt service process). The ISR performs whatever functions are required and then returns.

When the return code is executed, the processor performs the following tasks:

• It retrieves the return address and any other saved information from the stack.
• It resumes execution at the return address.


The return address, in nearly all cases, is the address that would have been executed next if the interrupt had not occurred. If the implementation is correct, the code that was interrupted will not even know that an interrupt occurred. The hardware part of this process occurs at hardware speed: microseconds, or even tens of nanoseconds for a fast CPU with a high clock rate.

Re-entrant code, or a re-entrant routine, is code that can be interrupted at any point when partially complete, then called by another process, and later returned to at the point where it was interrupted, completing the original function without any errors. Non-re-entrant code, however, cannot be interrupted and then called again without problems. An example of a program that is not re-entrant is one that uses a fixed memory address to store a temporary result. If the program is interrupted while the temporary variable is in use and the routine is then called again, the value in the temporary variable will be changed. When execution returns to the point where it was interrupted, the temporary variable will have the wrong value. In order to be re-entrant, a program must keep a separate copy of all internal variables for each invocation. Re-entrant code is required for any subroutine that must be available to more than one interrupt-driven task.

Interrupts can be processed between the execution of instructions by the CPU any time they are enabled. Most CPUs check for the presence of an interrupt request at the end of every instruction. If interrupts are enabled, the processor saves the contents of the program counter (PC) on the stack and loads the PC with the address of the ISR. Some CPUs allow certain instructions that take a long time to process, such as a block move instruction, to be interrupted partway through.
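The fixed-memory-address problem described above is easy to reproduce in C. A minimal sketch:

/* NON-re-entrant: 'tmp' lives at one fixed address. If an interrupt
 * arrives mid-call and the ISR calls swap_bad() again, the first
 * call's temporary value is overwritten.                             */
static int tmp;
void swap_bad(int *a, int *b)
{
    tmp = *a;      /* an interrupt plus a nested call corrupts tmp */
    *a  = *b;
    *b  = tmp;
}

/* Re-entrant: each invocation gets its own copy of 'tmp' on the stack. */
void swap_ok(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}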

3.1.1.1 Vectored Interrupts & Non-Vectored Interrupts

Interrupt Map

Most embedded systems have only a handful of interrupts. Associated with each of these are an interrupt pin (on the outside of the processor chip) and an ISR. In order for the processor to execute the correct ISR, a mapping must exist between interrupt pins and ISRs. This mapping usually takes the form of an interrupt vector table. The vector table is usually just an array of pointers to functions, located at some known memory address. The processor uses the interrupt type (a unique number associated with each interrupt pin) as its index into this array. The value stored at that location in the vector table is usually just the address of the ISR to be executed.

It is important to initialize the interrupt vector table correctly. (If it is done incorrectly, the ISR might be executed in response to the wrong interrupt, or never executed at all.) The first part of this process is to create an interrupt map that organizes the relevant information. An interrupt map is a table that contains a list of interrupt types and the devices to which they refer. This information should be included in the documentation provided with the board.

In a vectored interrupt system, the interrupt request is accompanied by an identifier, referred to as a vector or interrupt vector number, that defines the source of the interrupt. The vector is a pointer that is used as an index into a table known as the interrupt vector table. This table contains the addresses of the ISRs that are to be executed when the corresponding interrupts are processed. When a vectored interrupt is processed, the CPU goes through the following sequence of events to begin execution of the ISR:

- After acknowledging the interrupt, the CPU receives the vector number.
- The CPU converts the vector into a memory address in the vector table.
- The ISR address is fetched from the vector table and placed in the program counter.

For example, when an external event occurs, the interrupting device activates the IRQ input to the interrupt controller, which then requests an interrupt cycle from the CPU. When the CPU acknowledges the interrupt, the interrupt controller passes the vector number to the CPU. The CPU converts the vector number to a memory address. This address points to the place in memory that in turn contains the address of the ISR.

For systems with non-vectored interrupts, there is only one interrupt service routine entry point, and the ISR code must determine what caused the interrupt if there are multiple interrupt sources in the system. When an interrupt occurs, a call to a fixed location is executed, and that begins execution of the ISR. It is possible to have multiple interrupts pointing to the same ISR. The first act of such an ISR is to determine which interrupt occurred and branch to the appropriate handler. Serial I/O ports frequently have one vector for both transmit and receive interrupts.
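A vector table really is just an array of function pointers. A minimal C sketch; the table size, the interrupt type numbers, and the mechanism for placing the array at the address the hardware expects (linker script or vendor pragma) are all assumptions that vary by processor and toolchain:

typedef void (*isr_t)(void);

void timer_isr(void)   { /* handle timer rollover   */ }
void uart_isr(void)    { /* handle serial port      */ }
void default_isr(void) { /* unexpected interrupt    */ }

/* The hardware indexes this array by interrupt type; every entry is
 * initialized so that a stray interrupt lands in default_isr rather
 * than at a random address.                                          */
isr_t vector_table[8] = {
    default_isr,   /* type 0 */
    timer_isr,     /* type 1: timer rollover */
    uart_isr,      /* type 2: serial port    */
    default_isr, default_isr, default_isr, default_isr, default_isr
};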

3.1.1.2 Interrupt Priority

There are a number of variations in the way interrupts can be handled by the processor. These variations include how multiple interrupts are handled, whether they can be turned off, and how they are triggered. Some processors allow multiple (nested) interrupts, meaning the CPU can handle multiple interrupts simultaneously. In other words, interrupts can interrupt interrupts. When multiple interrupts are sent to the CPU, some method must be used to determine which is handled first. Here are the most common prioritization schemes currently in use:

- Fixed (static) multi-level priority. This uses a priority encoder to assign priorities, with the highest-priority interrupt processed first. Nested interrupts allow an ISR itself to be interrupted by a higher-priority device. Interrupts from lower-priority devices are ignored until the higher-priority ISR is completed. This is the most common method of assigning priorities to interrupts.
- Variable (dynamic) multi-level priority. One problem with fixed priority is that one type of event can "dominate" the CPU to the exclusion of other events. The solution is to rotate priority each time an event occurs. This ensures that no interrupt gets "locked out" and that all interrupts will eventually be processed. This scheme is good for multi-user systems because eventually everyone gets priority.
- Equal single-level priority. If an interrupt occurs while another is being serviced, the new interrupt gains control of the processor.

3.1.1.3 Serial communication with polling and interrupts

Depending on the interrupt strategy, the parts of the endless loop are structured differently.


(Flowchart: Serial send/receive without interrupts. An initialization part performs the serial communication initialization and other initializations; the endless loop then runs Part 1, the serial receive part, Part 2, the serial transmit part, and Part 3.)

If we want to further detail the serial receive/transmit parts, we can implement:

(Flowchart: Serial receive/transmit without interrupts (polling). Same loop structure as above, but the receive part spins in a wait loop on "Byte received?" before taking the byte from the receive buffer, and the transmit part spins in a wait loop on "Byte transmitted?" before putting the next byte in the transmit buffer.)


(Flowchart: same loop structure, but the "Byte received?" test no longer blocks: if no byte has arrived, execution simply continues with the rest of the loop. Only the transmit part still uses a wait loop on "Byte transmitted?" before putting the byte in the transmit buffer.)

Polling: the CPU periodically checks each device to see if it needs service.

• Takes CPU time even when no requests are pending
• Overhead may be reduced at the expense of response time
• Can be efficient if events arrive rapidly

"Polling is like picking up your phone every few seconds to see if you have a call. ..."

The main drawback of this implementation is that, while the program spins in a wait loop, no other program parts can execute. For example, if no byte is received, the program stays in the receive loop forever. We can eliminate the wait loop from the receive part, as in the implementation shown above.

Implementation of interrupts is the answer to the above problems.

Definition: An interrupt is an event external to the currently executing process that causes a change in the normal flow of instruction execution;

An interrupt is usually generated by hardware devices external to the CPU (UART for example)

• Key point is that interrupts are asynchronous w.r.t. current process

• Typically indicate that some device needs service


It is very important to underline that an interrupt must first be activated in order to be used. To use polling for a certain module, the interrupt corresponding to that module (the UART, for example) must be disabled. If it is not, the interrupt is serviced by default.

For example, from the table below, we can use serial communication in interrupt mode by setting EA = 1 and ES = 1. Then we have to write an interrupt service routine that is called by the interrupt.

To use serial communication in polling mode, we have to set at least ES = 0 to disable the serial interrupt specifically, or disable all interrupts by setting EA = 0.


(Flowchart: Serial communication in polling mode. The initialization part sets ES = 0 and performs the serial and other initializations. On the hardware side, a received byte sets RI = 1 and a transmitted byte sets TI = 1. In the software loop, the receive part tests (RI == 1) and, if so, clears RI = 0 and takes the byte with char X = SBUF; the transmit part tests (TI == 1) and, if so, clears TI = 0 and writes SBUF = char X; Parts 1-3 run in between.)
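The polling-mode flowchart translates almost line for line into Keil C51 code for an 8051-family device. A sketch; the serial-port mode and baud-rate setup are omitted and assumed to be done in the initialization part:

#include <reg51.h>   /* Keil C51 SFR definitions: ES, EA, RI, TI, SBUF */

char poll_receive(void)            /* blocking receive, as in the flowchart */
{
    while (RI == 0)                /* wait until the hardware sets RI */
        ;
    RI = 0;                        /* the software must clear the flag */
    return SBUF;                   /* take the byte from the receive buffer */
}

void poll_transmit(char x)         /* blocking transmit */
{
    SBUF = x;                      /* put the byte in the transmit buffer */
    while (TI == 0)                /* wait until the hardware sets TI */
        ;
    TI = 0;
}

void main(void)
{
    ES = 0;                        /* serial interrupt disabled: polling mode */
    /* ... serial port initialization (mode, baud rate) goes here ... */
    for (;;) {
        poll_transmit(poll_receive());   /* echo loop standing in for Parts 1-3 */
    }
}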


(Flowchart: Serial communication in interrupt mode. The initialization part sets ES = 1 and EA = 1. The hardware sets RI = 1 when a byte is received and TI = 1 when a byte has been transmitted, generating the receive and transmit interrupts. The receive ISR reads char X = SBUF and clears RI = 0; the transmit ISR clears TI = 0 and writes SBUF = char X. The software endless loop runs Parts 1-3 undisturbed.)
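The interrupt-mode flowchart, again as a Keil C51 sketch. On the 8051, the serial port uses interrupt number 4 and a single vector for both receive and transmit, so the ISR must test RI and TI itself; the rx_flag handshake with the main loop is our own illustrative convention:

#include <reg51.h>   /* Keil C51: the 8051 serial interrupt is number 4 */

volatile char rx_byte;             /* last received byte             */
volatile bit  rx_flag = 0;         /* set by the ISR, cleared by main */

void serial_isr(void) interrupt 4  /* one vector for RX and TX        */
{
    if (RI) {                      /* receive part of the ISR         */
        RI = 0;
        rx_byte = SBUF;            /* char X = SBUF                   */
        rx_flag = 1;
    }
    if (TI) {                      /* transmit part of the ISR        */
        TI = 0;                    /* the next byte could be loaded here */
    }
}

void main(void)
{
    /* ... serial port initialization (mode, baud rate) goes here ... */
    ES = 1;                        /* enable the serial interrupt     */
    EA = 1;                        /* global interrupt enable         */
    for (;;) {
        /* Parts 1-3 run freely; received data arrives via the ISR   */
    }
}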

3.3 On-Chip Memory

The CPU goes to memory to get what it needs to process, because it is in memory that all of the data and instructions to be executed by the system are stored. Embedded platforms have a memory hierarchy, a collection of different types of memory, each with unique speeds, sizes, and usages (see Figure 1.3). Some of this memory can be physically integrated on the processor, such as registers, read-only memory (ROM), certain types of random access memory (RAM) and level-1 cache.


Figure 1.3 – Memory hierarchy

Types of Memory

Many types of memory devices are available for use in modern computer systems. As an embedded software engineer, you must be aware of the differences between them and understand how to use each type effectively. In our discussion, we will approach these devices from a software viewpoint. As you are reading, try to keep in mind that the development of these devices took several decades and that there are significant physical differences in the underlying hardware. The names of the memory types frequently reflect the historical nature of the development process and are often more confusing than insightful.

Most software developers think of memory as being either random-access (RAM) or read-only (ROM). But, in fact, there are subtypes of each and even a third class of hybrid memories. In a RAM device, the data stored at each memory location can be read or written, as desired. In a ROM device, the data stored at each memory location can be read at will, but never written. In some cases, it is possible to overwrite the data in a ROM-like device. Such devices are called hybrid memories because they exhibit some of the characteristics of both RAM and ROM. The figure below provides a classification system for the memory devices that are commonly found in embedded systems.


3.3.1 Read-Only Memory (ROM)

Types of ROM

Memories in the ROM family are distinguished by the methods used to write new data to them (usually called programming) and by the number of times they can be rewritten. This classification reflects the evolution of ROM devices from hardwired to one-time programmable to erasable-and-programmable. A common feature across all these devices is their ability to retain data and programs forever, even during a power failure.

The very first ROMs were hardwired devices that contained a preprogrammed set of data or instructions. The contents of the ROM had to be specified before chip production, so that the actual data could be used to arrange the transistors inside the chip. Hardwired memories are still used, though they are now called "masked ROMs" to distinguish them from other types of ROM. The main advantage of a masked ROM is low production cost. Unfortunately, the cost is low only when hundreds of thousands of copies of the same ROM are required.

One step up from the masked ROM is the PROM (programmable ROM), which is purchased in an unprogrammed state. If you were to look at the contents of an unprogrammed PROM, you would see that the data is made up entirely of 1's. The process of writing your data to the PROM involves a special piece of equipment called a device programmer, which writes data to the device one word at a time by applying an electrical charge to the input pins of the chip. Once a PROM has been programmed in this way, its contents can never be changed. If the code or data stored in the PROM must be changed, the current device must be discarded. As a result, PROMs are also known as one-time programmable (OTP) devices.

An EPROM (erasable-and-programmable ROM) is programmed in exactly the same manner as a PROM. However, EPROMs can be erased and reprogrammed repeatedly. To erase an EPROM, you simply expose the device to a strong source of ultraviolet light. (There is a "window" in the top of the device to let the ultraviolet light reach the silicon.) By doing this, you essentially reset the entire chip to its initial, unprogrammed state. Though more expensive than PROMs, their ability to be reprogrammed makes EPROMs an essential part of the software development and testing process.

On-chip ROM is memory integrated into a processor that contains data or instructions that remain even when there is no power in the system, and is therefore considered nonvolatile memory (NVM). The content of on-chip ROM usually can only be read by the system it is used in. The most common types of on-chip ROM include:

- MROM (mask ROM), which is ROM (with data content) that is permanently etched into the microchip during the manufacturing of the processor and cannot be modified later.
- PROMs (programmable ROM), or OTPs (one-time programmables), a type of ROM that can be integrated on-chip and that is one-time programmable by a PROM programmer (in other words, it can be programmed outside the manufacturing factory).
- EPROM (erasable programmable ROM), which is ROM that can be integrated on a processor, whose content can be erased and reprogrammed more than once (the number of times erasure and reuse can occur depends on the processor). The content of EPROM is written using a special separate device and erased, in its entirety, by a device that shines intense ultraviolet light through the chip's built-in window.
- EEPROM (electrically erasable programmable ROM), which, like EPROM, can be erased and reprogrammed more than once; the number of times erasure and reuse can occur depends on the processor. Unlike EPROMs, the content of EEPROM can be written and erased without any special devices, while the embedded system is functioning, and it can be erased selectively (at the byte level) rather than only in its entirety.

A cheaper and faster variation of the EEPROM is Flash memory. Where EEPROMs are written and erased at the byte level, Flash can be written and erased in blocks or sectors (a group of bytes). Like EEPROM, Flash can be erased while still in the embedded device.

3.3.2 Random-Access Memory (RAM)

RAM (random access memory), commonly referred to as main memory, is memory in which any location can be accessed directly (randomly, rather than sequentially from some starting point), and whose content can be changed more than once (how many times depends on the hardware). Unlike ROM, the contents of RAM are erased if RAM loses power, meaning RAM is volatile.

There are two important memory devices in the RAM family: static RAM (SRAM) and dynamic RAM (DRAM). The main difference between them is the lifetime of the data stored. SRAM retains its contents as long as electrical power is applied to the chip. However, if the power is turned off or lost temporarily, its contents will be lost forever. DRAM, on the other hand, has an extremely short data lifetime, usually less than a quarter of a second. This is true even when power is applied constantly.

In short, SRAM has all the properties of the memory you think of when you hear the word RAM. Compared to that, DRAM sounds kind of useless. What good is a memory device that retains its contents for only a fraction of a second? By itself, such a volatile memory is indeed worthless. However, a simple piece of hardware called a DRAM controller can be used to make DRAM behave more like SRAM. (See DRAM Controllers later in this chapter.) The job of the DRAM controller is to periodically refresh the data stored in the DRAM. By refreshing the data several times a second, the DRAM controller keeps the contents of memory alive for as long as they are needed. So, DRAM is as useful as SRAM after all.

DRAM Controllers

If your embedded system includes DRAM, there is probably a DRAM controller on board (or on-chip) as well. The DRAM controller is an extra piece of hardware placed between the processor and the memory chips. Its main purpose is to perform the refresh operations required to keep your data alive in the DRAM. However, it cannot do this properly without some help from you. One of the first things your software must do is initialize the DRAM controller. If you do not have any other RAM in the system, you must do this before creating the stack or heap. As a result, this initialization code is usually written in assembly language and placed within the hardware initialization module. Almost all DRAM controllers require a short initialization sequence that consists of one or more setup commands. The setup commands tell the controller about the hardware interface to the DRAM and how frequently the data there must be refreshed. To determine the initialization sequence for your particular system, consult the designer of the board or read the databooks that describe the DRAM and DRAM controller. If the DRAM in your system does not appear to be working properly, it could be that the DRAM controller either is not initialized or has been initialized incorrectly.
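For illustration only, here is what such a setup sequence might look like in C. Every register address, name, and value below is invented; the real sequence comes from your databooks, and on real hardware this code would normally be assembly that runs before the stack exists:

#include <stdint.h>

/* Hypothetical DRAM controller registers; real addresses and values
   must come from the board designer or the databooks. */
#define DRAMC_CONFIG  (*(volatile uint16_t *)0xFF000010u)
#define DRAMC_REFRESH (*(volatile uint16_t *)0xFF000012u)

void dram_init(void)
{
    DRAMC_CONFIG  = 0x0042;  /* invented: bus width, banks, timing      */
    DRAMC_REFRESH = 0x0200;  /* invented: refresh interval, in clocks   */
}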

As shown in Figure 1.4, SRAM memory cells are made up of transistor-based flip-flop circuitry that typically holds its data due to a moving current being switched bi-directionally on a pair of inverting gates in the circuit, until power is cut off or the data is overwritten.

Figure 1.4 – 6 Transistor SRAM cell

As shown in Figure 1.5, DRAM memory cells are circuits with capacitors that hold a charge in place (the charges or lack thereof reflecting data). DRAM capacitors need to be refreshed frequently with power in order to maintain their respective charges, and to recharge capacitors after DRAM is read (reading DRAM discharges the capacitor). The cycle of discharging and recharging of memory cells is why this type of RAM is called dynamic.

Figure 1.5 – DRAM (capacitor based) memory cell

One of the major differences between SRAM and DRAM lies in the makeup of the DRAM memory array itself. The capacitors in the memory array of DRAM cannot hold a charge (data) indefinitely; the charge gradually dissipates over time, so an additional mechanism is required to refresh DRAM and maintain the integrity of the data. This mechanism reads the data in DRAM before it is lost, via a sense amplification circuit that senses the charge stored within the memory cell, and writes it back onto the DRAM circuitry. Ironically, reading the cell also discharges the capacitor, even though reading is part of the very process of correcting for the capacitor's gradual discharge. A memory controller in the embedded system typically manages DRAM's recharging and discharging cycle by initiating refreshes and keeping track of the refresh sequence of events.

It is this refresh cycling mechanism, which discharges and recharges memory cells, that gives this type of RAM its name, "dynamic" RAM (DRAM); the fact that the charge in SRAM stays put is the basis for its name, "static" RAM (SRAM). The same additional recharge circuitry is what makes DRAM slower than SRAM. (Note that SRAM is usually slower than registers, because the transistors within the flip-flop are usually smaller and thus do not carry as much current as those typically used within registers.) SRAMs also usually consume less power than DRAMs, since no extra energy is needed for refreshes. On the flip side, DRAM is typically cheaper than SRAM, because its capacitor-based cell is simpler than SRAM's multi-transistor flip-flop cell. DRAM can also hold more data than SRAM, since DRAM circuitry is much smaller than SRAM circuitry and more of it can be integrated into an IC.

DRAM is usually the "main" memory, used in larger quantities, and is also used for video RAM; DRAMs used for display memory are commonly referred to as frame buffers. SRAM, because it is more expensive, is typically used in smaller quantities, but because it is also the fastest type of RAM, it is used in external cache and video memory (when processing certain types of graphics, and given a more generous budget, a system can implement better-performing RAM).

When deciding which type of RAM to use, a system designer must consider access time and cost. SRAM devices offer extremely fast access times (approximately four times faster than DRAM) but are much more expensive to produce. Generally, SRAM is used only where access speed is extremely important. A lower cost per byte makes DRAM attractive whenever large amounts of RAM are required. Many embedded systems include both types: a small block of SRAM (a few hundred kilobytes) along a critical data path and a much larger block of DRAM (in the megabytes) for everything else.

Reading speed

Although the relative speed of RAM vs. ROM has varied over time, as of 2007 large RAM chips can be read faster than most ROMs. For this reason (and to make for uniform access), ROM content is sometimes copied to RAM, or "shadowed," before its first use, and subsequently read from RAM.

Writing speed

For those types of ROM that can be electrically modified, writing speed is always much slower than reading speed, and it may require unusually high voltage, the movement of jumper plugs to apply write-enable signals, and special lock/unlock command codes. Modern NAND Flash achieves the highest write speeds of any rewritable ROM technology, with speeds as high as 15 MiB/s (roughly 8 ns per bit), by allowing (indeed requiring) large blocks of memory cells to be written simultaneously.

3.3.3 Hybrid Types

As memory technology has matured in recent years, the line between RAM and ROM devices has blurred. There are now several types of memory that combine the best features of both. These devices do not belong to either group and can be collectively referred to as hybrid memory devices. Hybrid memories can be read and written as desired, like RAM, but maintain their contents without electrical power, just like ROM. Two of the hybrid devices, EEPROM and Flash, are descendants of ROM devices; the third, NVRAM, is a modified version of SRAM.

EEPROMs are electrically erasable and programmable. Internally, they are similar to EPROMs, but the erase operation is accomplished electrically rather than by exposure to ultraviolet light. Any byte within an EEPROM can be erased and rewritten. Once written, the new data will remain in the device forever, or at least until it is electrically erased. The tradeoff for this improved functionality is mainly higher cost. Write cycles are also significantly longer than writes to a RAM, so you wouldn't want to use an EEPROM for your main system memory.

Flash memory is the most recent advancement in memory technology. It combines all the best features of the memory devices described thus far: Flash devices are high density, low cost, nonvolatile, fast (to read, but not to write), and electrically reprogrammable. These advantages are overwhelming, and the use of Flash memory has increased dramatically in embedded systems as a direct result. From a software viewpoint, Flash and EEPROM technologies are very similar. The major difference is that Flash devices can be erased only one sector at a time, not byte by byte. Typical sector sizes are in the range of 256 bytes to 16 kilobytes. Despite this disadvantage, Flash is much more popular than EEPROM and is rapidly displacing many of the ROM devices as well.

The third member of the hybrid memory class is NVRAM (nonvolatile RAM). Nonvolatility is also a characteristic of the ROM and hybrid memories discussed earlier; however, an NVRAM is physically very different from those devices. An NVRAM is usually just an SRAM with a battery backup. When the power is turned on, the NVRAM operates just like any other SRAM. But when the power is turned off, the NVRAM draws just enough electrical power from the battery to retain its current contents. NVRAM is fairly common in embedded systems. However, it is very expensive (even more expensive than SRAM), so its applications are typically limited to the storage of a few hundred bytes of system-critical information that cannot be stored in any better way.

Direct Memory Access

Direct memory access (DMA) is a technique for transferring blocks of data directly between two hardware devices. In the absence of DMA, the processor must read the data from one device and write it to the other, one byte or word at a time. If the amount of data to be transferred is large, or the frequency of transfers is high, the rest of the software might never get a chance to run. However, if a DMA controller is present it is possible to have it perform the entire transfer, with little assistance from the processor. Here's how DMA works. When a block of data needs to be transferred, the processor provides the DMA controller with the source and destination addresses and the total number of bytes. The DMA controller then transfers the data from the source to the destination automatically. After each byte is copied, each address is incremented and the number of bytes remaining is reduced by one. When the number of bytes remaining reaches zero, the block transfer ends and the DMA controller sends an interrupt to the processor. In a typical DMA scenario, the block of data is transferred directly to or from memory. For example, a network controller might want to place an incoming network packet into memory as it arrives, but only notify the processor once the entire packet has been received. By using DMA, the processor can spend more time processing the data once it arrives and less time transferring it between devices. The processor and DMA controller must share the address and data buses during this time, but this is handled automatically by the hardware and the processor is otherwise uninvolved with the actual transfer.
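The sequence just described can be made concrete with a short sketch. Everything here (the register layout, addresses, and the START bit) is hypothetical; a real controller's databook defines the actual interface:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-to-memory DMA controller registers. */
typedef struct {
    uint32_t src;       /* source address            */
    uint32_t dst;       /* destination address       */
    uint32_t count;     /* bytes remaining           */
    uint32_t control;   /* start/busy/interrupt bits */
} dma_regs_t;

static volatile dma_regs_t *const dma = (volatile dma_regs_t *)0xFF002000u;

void dma_transfer(const void *src, void *dst, size_t nbytes)
{
    dma->src     = (uint32_t)(uintptr_t)src;
    dma->dst     = (uint32_t)(uintptr_t)dst;
    dma->count   = (uint32_t)nbytes;
    dma->control = 0x1u;   /* invented START bit */
    /* The controller now copies the block on its own; completion is
       signaled by an interrupt (or by polling an invented DONE bit). */
}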

Memory Management

Goals:

• Protect the programs from each other, and the kernel from the programs.
• Perform relocation.

Relocation: the user program thinks it has the whole address space, from address 0x0 to 0xffffffff, but really it only has a part of the physical memory. Virtual addresses therefore need to be mapped onto physical addresses. This is performed by the MMU (Memory Management Unit), but the OS must configure it.
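A toy model can make the mapping concrete. Real MMUs use page tables rather than a single base/limit pair, so this is only a simplified illustration of relocation plus protection:

#include <stdint.h>

typedef struct {
    uint32_t base;   /* start of the program's physical region */
    uint32_t limit;  /* size of the region in bytes            */
} mmu_region_t;

/* Returns the physical address; 0 is used as a fault sentinel here
   for simplicity (real hardware would raise a protection trap). */
uint32_t translate(const mmu_region_t *r, uint32_t vaddr)
{
    if (vaddr >= r->limit)
        return 0;            /* out of bounds: protection violation  */
    return r->base + vaddr;  /* relocation: offset into physical RAM */
}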

Context Switching

Whenever execution switches between a user program and the OS, a context switch occurs. The operating system must now:

• Save the PC, Stack Pointer, and PSW.
• Save the contents of the registers.
• Reprogram the MMU registers.
• Wait while the instructions in the CPU pipeline are flushed.
• Wait for cache lines to load from the new program's memory.

Context switching is therefore pretty expensive.

3.4 I/O

The entire point of an embedded microprocessor is to monitor or control some real-world event. To do this, the microprocessor must have I/O capability. Like a desktop computer without a monitor, printer, or keyboard, an embedded microprocessor without I/O is just a paperweight. The I/O from an embedded control system falls into two broad categories: digital and analog. However, at the microprocessor level, all I/O is digital. (Some microprocessor ICs have built-in ADCs, but the processor itself still works with digital values.) The simplest form of I/O is a register that the microprocessor can write to or a buffer that it can read.

Most of the peripherals require the use of a certain set of pins on the processor. In many cases, the majority of those pins can be used for their specific function (serial port receiver, timer output, DMA control signal, etc.), or they can be programmed to act as a simple input or output pin (PIO). This flexibility allows the silicon to be configured based on the needs of the design. For example, if you don't need two serial ports (and the processor comes with two), then the pins that are allocated to the second port (RX2, TX2, and maybe DTR2, CTS2, etc.) can be programmed to function as simple PIO pins and used to drive an LED or read a switch. Programmable pins are sometimes referred to as dual function. Note that this dual functionality should not be assumed: how each pin is configured, and whether it can be configured to run in different modes, depends on the processor implementation. Often a pin name is chosen to reflect the pin's dual personality. For example, if RX2 can be configured as a serial port 2 receiver or as a PIO pin, then it will probably be labeled RX2/PION (or something similar), where N is some number between one and M, and M is the number of PIO pins on the processor. Some microprocessors may be advertised as having a set of features but actually provide these features on dual-function pins. Hence, the full set of advertised features (two serial ports and 32 PIO lines) may not be simultaneously available (because the pins used for the second serial port are dual-functioned as PIO lines).
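A sketch of such a pin configuration might look as follows; the register name, address, and bit assignment are invented, since the real ones are processor-specific:

#include <stdint.h>

/* Hypothetical pin-function (mux) register. */
#define PINSEL      (*(volatile uint8_t *)0xFF000030u)
#define RX2_AS_PIO  0x04u   /* invented: 1 = pin acts as PIO, 0 = serial RX2 */

void rx2_configure_as_pio(void)
{
    PINSEL |= RX2_AS_PIO;   /* the second serial port's RX pin can now
                               drive an LED or read a switch */
}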

3.4.1 Study of External Peripherals

At this point, you've studied every aspect of the new hardware except the external peripherals. These are the hardware devices that reside outside the processor chip and communicate with it by way of interrupts and I/O or memory-mapped registers. Begin by making a list of the external peripherals. Depending on your application, this list might include LCD or keyboard controllers, A/D converters, network interface chips, or custom ASICs (application-specific integrated circuits). In the case of the Arcom board, the list contains just three items: the Zilog 85230 Serial Controller, a parallel port, and a debugger port. You should obtain a copy of the user's manual or databook for each device on your list. At this early stage of the project, your goal in reading these documents is to understand the basic functions of the device. What does the device do? What registers are used to issue commands and receive the results? What do the various bits and larger fields within these registers mean? When, if ever, does the device generate interrupts? How are interrupts acknowledged or cleared at the device?

When you are designing the embedded software, you should try to break the program down along device lines. It is usually a good idea to associate a software module called a device driver with each of the external peripherals. This is nothing more than a collection of software routines that control the operation of the peripheral and isolate the application software from the details of that particular hardware device.

3.4.1.1 Initialize the Hardware

The final step in getting to know your new hardware is to write some initialization software. This is your best opportunity to develop a close working relationship with the hardware, especially if you will be developing the remainder of the software in a high-level language. During hardware initialization it will be impossible to avoid using assembly language. However, after completing this step, you will be ready to begin writing small programs in C or C++. The hardware initialization must be executed before the startup code, which assumes that the hardware has already been initialized and concerns itself only with creating a proper runtime environment for high-level language programs. The figure below provides an overview of the entire initialization process, from processor reset through hardware initialization and C/C++ startup code to main.


The first stage of the initialization process is the reset code. This is a small piece of assembly (usually only two or three instructions) that the processor executes immediately after it is powered on or reset. The sole purpose of this code is to transfer control to the hardware initialization routine. The first instruction of the reset code must be placed at a specific location in memory, usually called the reset address, that is specified in the processor databook. Most of the actual hardware initialization takes place in the second stage. At this point, we need to inform the processor about its environment. This is also a good place to initialize the interrupt controller and other critical peripherals. Less critical hardware devices can be initialized when the associated device driver is started, usually from within main. Intel's 8051/80251 has several internal registers that must be programmed before any useful work can be done with the processor. These registers are responsible for setting up the memory and I/O maps and are part of the processor's internal chip-select unit. By programming the chip-select registers, you are essentially waking up each of the memory and I/O devices that are connected to the processor. Each chip-select register is associated with a single "chip enable" wire that runs from the processor to some other chip. The association between particular chip-selects and hardware devices must be established by the hardware designer. All you need to do is get a list of chip-select settings from him and load those settings into the chip-select registers. The third initialization stage contains the startup code, whose job is to prepare the way for code written in a high-level language. What matters here is only that the startup code calls main. From that point forward, all of your other software can be written in C or C++.
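As a sketch of this second stage, loading the chip-select registers could look like the following; the register addresses and values are invented stand-ins for the settings supplied by the hardware designer:

#include <stdint.h>

/* Invented chip-select registers; real addresses and values come
   from the processor databook and the board designer's list. */
#define CS0_REG (*(volatile uint16_t *)0xFF000040u)
#define CS1_REG (*(volatile uint16_t *)0xFF000042u)

void chip_select_init(void)
{
    CS0_REG = 0x80A5;  /* invented: wake up the ROM region  */
    CS1_REG = 0x81C3;  /* invented: wake up the RAM region  */
    /* ...then initialize the interrupt controller and other
       critical peripherals... */
}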

3.4.2 Peripheral devices

In addition to the processor and memory, most embedded systems contain a handful of other hardware devices. Some of these devices are specific to the application domain, while others, like timers and serial ports, are useful in a wide variety of systems. The most generically useful of these are often included within the same chip as the processor and are called internal, or on-chip, peripherals. Hardware devices that reside outside the processor chip are, therefore, said to be external peripherals. In this chapter we'll discuss the most common software issues that arise when interfacing to a peripheral of either type.

3.4.2.1 Control and Status Registers

The basic interface between an embedded processor and a peripheral device is a set of control and status registers. These registers are part of the peripheral hardware, and their locations, size, and individual meanings are features of the peripheral. For example, the registers within a serial controller are very different from those in a timer/counter. In this section, I'll describe how to manipulate the contents of these control and status registers directly from your C/C++ programs.

Depending upon the design of the processor and board, peripheral devices are located either in the processor's memory space or within the I/O space; in fact, it is common for embedded systems to include some peripherals of each type. These are called memory-mapped and I/O-mapped peripherals, respectively. Of the two types, memory-mapped peripherals are generally easier to work with and are increasingly popular. Memory-mapped control and status registers can be made to look just like ordinary variables. To do this, you simply declare a pointer to the register, or block of registers, and set the value of the pointer explicitly.

Note, however, that there is one very important difference between device registers and ordinary variables. The contents of a device register can change without the knowledge or intervention of your program, because the register contents can also be modified by the peripheral hardware. By contrast, the contents of a variable will not change unless your program modifies them explicitly. For that reason, we say that the contents of a device register are volatile, or subject to change without notice. The C/C++ keyword volatile should be used when declaring pointers to device registers. This warns the compiler not to make any assumptions about the data stored at that address. For example, if the compiler sees a write to the volatile location followed by another write to that same location, it will not assume that the first write is an unnecessary use of processor time. In other words, the keyword volatile instructs the optimization phase of the compiler to treat that variable as though its behavior cannot be predicted at compile time.

The primary disadvantage of the other type of device registers, I/O-mapped registers, is that there is no standard way to access them from C or C++. Such registers are accessible only with the help of special machine-language instructions, and these processor-specific instructions are not supported by the C or C++ language standards. So it is necessary to use special library routines or inline assembly (as we did in Chapter 2) to read and write the registers of an I/O-mapped device.
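A minimal sketch makes this concrete. The address, register width, and bit meaning below are invented; the use of volatile is the point:

#include <stdint.h>

/* A hypothetical 16-bit status register at an invented address.
   volatile tells the compiler the value can change on its own, so
   every read and write must really happen, in order. */
#define TIMER_STATUS (*(volatile uint16_t *)0xFF001002u)

int timer_expired(void)
{
    return (TIMER_STATUS & 0x0001u) != 0;  /* invented "expired" bit */
}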

3.4.2.2 The Device Driver Philosophy

When it comes to designing device drivers, you should always focus on one easily stated goal: hide the hardware completely. When you're finished, you want the device driver module to be the only piece of software in the entire system that reads or writes that particular device's control and status registers directly. In addition, if the device generates any interrupts, the interrupt service routine that responds to them should be an integral part of the device driver. In this section, I'll explain why I recommend this philosophy and how it can be achieved.

Of course, attempts to hide the hardware completely are difficult. Any programming interface you select will reflect the broad features of the device; that's to be expected. The goal should be to create a programming interface that would not need to be changed if the underlying peripheral were replaced with another in its general class. For example, all Flash memory devices share the concept of sectors (though the sector size can differ between chips). An erase operation can be performed only on an entire sector, and once erased, individual bytes or words can be rewritten. So the programming interface provided by the Flash driver example in the last chapter should work with any Flash memory device. The specific features of the AMD 29F010 are hidden from that level, as desired.

Device drivers for embedded systems are quite different from their workstation counterparts. In a modern computer workstation, device drivers are most often concerned with satisfying the requirements of the operating system. For example, workstation operating systems generally impose strict requirements on the software interface between themselves and a network card. The device driver for a particular network card must conform to this software interface, regardless of the features and capabilities of the underlying hardware. Application programs that want to use the network card are forced to use the networking API provided by the operating system and don't have direct access to the card itself. In this case, the goal of hiding the hardware completely is easily met.

By contrast, the application software in an embedded system can easily access your hardware. In fact, because all of the software is linked together into a single binary image, there is rarely even a distinction made between application software, operating system, and device drivers. The drawing of these lines and the enforcement of hardware access restrictions are purely the responsibilities of the software developers. Both are design decisions that the developers must consciously make. In other words, the implementers of embedded software can more easily cheat on the software design than their non-embedded peers.

The benefits of good device driver design are threefold:

• First, because of the modularization, the structure of the overall software is easier to understand.
• Second, because there is only one module that ever interacts directly with the peripheral's registers, the state of the hardware can be more accurately tracked.
• And, last but not least, software changes that result from hardware changes are localized to the device driver.

Each of these benefits can and will help to reduce the total number of bugs in your embedded software. But you have to be willing to put in a bit of extra effort at design time in order to realize such savings. If you agree with the philosophy of hiding all hardware specifics and interactions within the device driver, it will usually consist of the five components in the following list. To make driver implementation as simple and incremental as possible, these elements should be developed in the order in which they are presented. (A minimal sketch combining all five components follows the list.)

1. A data structure that overlays the memory-mapped control and status registers of the device

The first step in the driver development process is to create a C-style struct that looks just like the memory-mapped registers of your device. This usually involves studying the data book for the peripheral and creating a table of the control and status registers and their offsets. Then, beginning with the register at the lowest offset, start filling out the struct. (If one or more locations are unused or reserved, be sure to place dummy variables there to fill in the additional space.)

2. A set of variables to track the current state of the hardware and device driver

The second step in the driver development process is to figure out what variables you will need to track the state of the hardware and device driver. For example, in the case of the timer/counter unit described earlier, we'll probably need to know if the hardware has been initialized. And if it has been, we might also want to know the length of the running countdown. Some device drivers create more than one software device. This is a purely logical device that is implemented over the top of the basic peripheral hardware. For example, it is easy to imagine that more than one software timer could be created from a single timer/counter unit. The timer/counter unit would be configured to generate a periodic clock tick, and the device driver would then manage a set of software timers of various lengths by maintaining state information for each.

3. A routine to initialize the hardware to a known state

Once you know how you'll track the state of the physical and logical devices, it's time to start writing the functions that actually interact with and control the device. It is probably best to begin with the hardware initialization routine. You'll need that one first anyway, and it's a good way to get familiar with the device interaction.

4. A set of routines that, taken together, provide an API for users of the device driver


After you've successfully initialized the device, you can start adding other functionality to the driver. Hopefully, you've already settled on the names and purposes of the various routines, as well as their respective parameters and return values. All that's left to do now is implement and test each one. We'll see examples of such routines in the next section.

5. One or more interrupt service routines

It's best to design, implement, and test most of the device driver routines before enabling interrupts for the first time. Locating the source of interrupt-related problems can be quite challenging. And, if you add possible bugs in the other driver modules to the mix, it could even approach impossible. It's far better to use polling to get the guts of the driver working. That way you'll know how the device works (and that it is indeed working) when you start looking for the source of your interrupt problems. And there will almost certainly be some of those.
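Here is the minimal sketch promised above, combining the five components for a hypothetical memory-mapped timer. Every address, bit mask, and name is invented, and interrupt registration is left out because it is platform-specific:

#include <stdint.h>

/* 1. Register overlay (offsets invented for illustration). */
typedef struct {
    uint16_t count;     /* current count            */
    uint16_t maxcount;  /* reload value             */
    uint16_t _unused;   /* reserved: dummy filler   */
    uint16_t control;   /* mode and enable bits     */
} timer_reg_t;

static volatile timer_reg_t *const pTimer = (volatile timer_reg_t *)0xFF001000u;

/* 2. Driver state. */
static int timer_initialized = 0;

/* 3. Initialize the hardware to a known state. */
void timer_init(void)
{
    pTimer->control = 0;          /* stop the timer */
    pTimer->count   = 0;
    timer_initialized = 1;
}

/* 4. API for users of the driver. */
int timer_start(uint16_t ticks)
{
    if (!timer_initialized)
        return -1;
    pTimer->maxcount = ticks;
    pTimer->control  = 0xC000;    /* invented ENABLE|INTERRUPT bits */
    return 0;
}

/* 5. Interrupt service routine. */
void timer_isr(void)
{
    pTimer->control &= (uint16_t)~0x2000u;  /* invented: acknowledge the interrupt */
    /* signal expiration to the application here */
}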

5. Address Decoding

Reference: Ricardo Gutierrez-Osuna, Microprocessor-Based System Design, Wright State University.


6. Flip-Flops, Registers, Counters

6.1 Flip-Flops

6.1.1 RS Flip-Flops

When both inputs, R and S, are equal to 0, the latch maintains its existing state. This state may be either Qa = 0 and Qb = 1, or Qa = 1 and Qb = 0, which is indicated in the truth table by stating that the Qa and Qb outputs have values 0/1 and 1/0, respectively. Observe that Qa and Qb are complements of each other in this case. When R = 0 and S = 1, the latch is set into a state where Qa = 1 and Qb = 0. When R = 1 and S = 0, the latch is reset into a state where Qa = 0 and Qb = 1. The fourth possibility is to have R = S = 1. In this case both Qa and Qb will be 0.

The basic SR latch can serve as a useful memory element. It remembers its state when both the S and R inputs are 0, and it changes its state in response to changes in the signals on these inputs. The state changes occur at the time when the changes in the signals occur. If we cannot control the time of such changes, then we don't know when the latch may change its state.

Gated SR Latch with NAND Gates


6.1.2 Gated D Latch

We describe another gated latch that is even more useful in practice. It has a single data input, called D, and it stores the value on this input under the control of a clock signal. It is called a gated D latch.

6.1.3 Master-Slave and Edge-Triggered D Flip-Flops

In the level-sensitive latches, the state of the latch keeps changing according to the values of input signals during the period when the clock signal is active (equal to 1 in our examples).

As we will see, there is also a need for storage elements that can change their states no more than once during one clock cycle. We will discuss two types of circuits that exhibit such behavior.


Consider the circuit given above, which consists of two gated D latches. The first, called master, changes its state while Clock = 1. The second, called slave, changes its state while Clock = 0. The operation of the circuit is such that when the clock is high, the master tracks the value of the D input signal and the slave does not change. Thus the value of Qm follows any changes in D, and the value of Qs remains constant. When the clock signal changes to 0, the master stage stops following the changes in the D input. At the same time, the slave stage responds to the value of the signal Qm and changes state accordingly. Since Qm does not change while Clock = 0, the slave stage can undergo at most one change of state during a clock cycle. From the external observer's point of view, namely, the circuit connected to the output of the slave stage, the master-slave circuit changes its state at the negative-going edge of the clock. The negative edge is the edge where the clock signal changes from 1 to 0. Regardless of the number of changes in the D input to the master stage during one clock cycle, the observer of the Qs signal will see only the change that corresponds to the D input at the negative edge of the clock.

The above circuit is called a master-slave D flip-flop. The term flip-flop denotes a storage element that changes its output state at the edge of a controlling clock signal. A timing diagram and a graphical symbol for this flip-flop are also given. In the symbol we use the > mark to denote that the flip-flop responds to the "active edge" of the clock. We place a bubble on the clock input to indicate that the active edge for this particular circuit is the negative edge.


A positive-edge-triggered D flip-flop.

It requires only six NAND gates and, hence, fewer transistors. The operation of the circuit is as follows. When Clock = 0, the outputs of gates 2 and 3 are high. Thus P1 = P2 = 1, which maintains the output latch, comprising gates 5 and 6, in its present state. At the same time, the signal P3 is equal to D, and P4 is equal to its complement D̄. When Clock changes to 1, the following changes take place. The values of P3 and P4 are transmitted through gates 2 and 3 to cause P1 = D̄ and P2 = D, which sets Q = D and Q̄ = D̄. To operate reliably, P3 and P4 must be stable when Clock changes from 0 to 1. Hence the setup time of the flip-flop is equal to the delay from the D input through gates 4 and 1 to P3. The hold time is given by the delay through gate 3 because once P2 is stable, the changes in D no longer matter.

For proper operation it is necessary to show that, after Clock changes to 1, any further changes in D will not affect the output latch as long as Clock = 1. We have to consider two cases. Suppose first that D = 0 at the positive edge of the clock. Then P2 = 0, which will keep the output of gate 4 equal to 1 as long as Clock = 1, regardless of the value of the D input. The second case is if D = 1 at the positive edge of the clock. Then P1 = 0, which forces the outputs of gates 1 and 3 to be equal to 1, regardless of the D input. Therefore, the flip-flop ignores changes in the D input while Clock = 1.

6.1.4 D Flip-Flops with Clear and Preset

Flip-flops are often used for implementation of circuits that can have many possible states, where the response of the circuit depends not only on the present values of the circuit's inputs but also on the particular state that the circuit is in at that time. A simple example is a counter circuit that counts the number of occurrences of some event, perhaps passage of time. A counter comprises a number of flip-flops, whose outputs are interpreted as a number. The counter circuit has to be able to increment or decrement the number. It is also important to be able to force the counter into a known initial state (count).

Obviously, it must be possible to clear the count to zero, which means that all flip-flops must have Q = 0. It is equally useful to be able to preset each flip-flop to Q = 1, to insert some specific count as the initial value in the counter.


6.1.5 T Flip-Flop

The D flip-flop is a versatile storage element that can be used for many purposes. By including some simple logic circuitry to drive its input, the D flip-flop may appear to be a different type of storage element. An interesting modification is presented below.

This circuit uses a positive-edge-triggered D flip-flop. The feedback connections make the input signal D equal to either the value of Q or Q̄, under the control of the signal that is labeled T. On each positive edge of the clock, the flip-flop may change its state Q(t). If T = 0, then D = Q and the state will remain the same, that is, Q(t + 1) = Q(t). But if T = 1, then D = Q̄ and the new state will be Q(t + 1) = Q̄(t). Therefore, the overall operation of the circuit is that it retains its present state if T = 0, and it reverses its present state if T = 1.

The operation of the circuit is specified in the form of a truth table.

Any circuit that implements this truth table is called a T flip-flop. The name T flip-flop derives from the behavior of the circuit, which “toggles” its state when T = 1. The toggle feature makes the T flip-flop a useful element for building counter circuits.


6.1.6 JK Flip-Flop

Another interesting circuit can be derived from the figure above. Instead of using a single control input, T, we can use two inputs, J and K, as indicated in the figure below. For this circuit the input D is defined as

D = JQ̄ + K̄Q

A corresponding truth table is also given. The circuit is called a JK flip-flop. It combines the behaviors of SR and T flip-flops in a useful way: it behaves as the SR flip-flop, where J = S and K = R, for all input values except J = K = 1. For the latter case, which has to be avoided in the SR flip-flop, the JK flip-flop toggles its state like the T flip-flop.
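To see that the equation reproduces this behavior, here is the next-state function evaluated in C (a software illustration of the logic, not a hardware description):

#include <stdint.h>

/* D = J*notQ + notK*Q: hold (J=K=0), set (J=1,K=0),
   reset (J=0,K=1), toggle (J=K=1). All arguments are 0 or 1. */
uint8_t jk_next(uint8_t j, uint8_t k, uint8_t q)
{
    return (uint8_t)(((j & ~q) | (~k & q)) & 1u);
}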

The JK flip-flop is a versatile circuit. It can be used for straight storage purposes, just like the D and SR flip-flops. But it can also serve as a T flip-flop by connecting the J and K inputs together.

Summary of Terminology

We have used terminology that is quite common, but the reader should be aware that different interpretations of the terms latch and flip-flop can be found in the literature. Our terminology can be summarized as follows:

Basic latch is a feedback connection of two NOR gates or two NAND gates, which can store one bit of information. It can be set to 1 using the S input and reset to 0 using the R input.

Gated latch is a basic latch that includes input gating and a control input signal. The latch retains its existing state when the control input is equal to 0. Its state may be changed when the control signal is equal to 1. In our discussion we referred to the control input as the clock. We considered two types of gated latches:

• Gated SR latch uses the S and R inputs to set the latch to 1 or reset it to 0, respectively.

• Gated D latch uses the D input to force the latch into a state that has the same logic value as the D input.

A flip-flop is a storage element based on the gated latch principle, which can have its output state changed only on the edge of the controlling clock signal. We considered two types:

• Edge-triggered flip-flop is affected only by the input values present when the active edge of the clock occurs.

• Master-slave flip-flop is built with two gated latches. The master stage is active during half of the clock cycle, and the slave stage is active during the other half.

The output value of the flip-flop changes on the edge of the clock that activates the transfer into the slave stage. Master-slave flip-flops can be edge-triggered or level sensitive. If the master stage is a gated D latch, then it behaves as an edge-triggered flip-flop. If the master stage is a gated SR latch, then the flip-flop is level sensitive.

6.2 Registers

A flip-flop stores one bit of information. When a set of n flip-flops is used to store n bits of information, such as an n-bit number, we refer to these flip-flops as a register. A common clock is used for each flip-flop in a register, and each flip-flop operates as described in the previous sections. The term register is merely a convenience for referring to n-bit structures consisting of flip-flops.

6.2.1 Shift Register

We explained that a given number is multiplied by 2 if its bits are shifted one bit position to the left and a 0 is inserted as the new least-significant bit. Similarly, the number is divided by 2 if the bits are shifted one bit position to the right. A register that provides the ability to shift its contents is called a shift register.

The figure below shows a four-bit shift register that is used to shift its contents one bit position to the right. The data bits are loaded into the shift register in a serial fashion using the In input. The contents of each flip-flop are transferred to the next flip-flop at each positive edge of the clock. An illustration of the transfer is given below, which shows what happens when the signal values at In during eight consecutive clock cycles are 1, 0, 1, 1, 1, 0, 0, and 0, assuming that the initial state of all flip-flops is 0.


To implement a shift register, it is necessary to use either edge-triggered or master-slave flip-flops. The level-sensitive gated latches are not suitable, because a change in the value of In would propagate through more than one latch during the time when the clock is equal to 1.

6.2.2 Parallel-Access Shift Register

In computer systems it is often necessary to transfer n-bit data items. This may be done by transmitting all bits at once using n separate wires, in which case we say that the transfer is performed in parallel. But it is also possible to transfer all bits using a single wire, by performing the transfer one bit at a time, in n consecutive clock cycles. We refer to this scheme as serial transfer. To transfer an n-bit data item serially, we can use a shift register that can be loaded with all n bits in parallel (in one clock cycle). Then during the next n clock cycles, the contents of the register can be shifted out for serial transfer. The reverse operation is also needed: if bits are received serially, then after n clock cycles the contents of the register can be accessed in parallel as an n-bit item.

The figure below shows a four-bit shift register that allows parallel access. Instead of using the normal shift register connection, the D input of each flip-flop is connected to two different sources. One source is the preceding flip-flop, which is needed for the shift register operation. The other source is the external input that corresponds to the bit that is to be loaded into the flip-flop as a part of the parallel-load operation. The control signal Shift/Load is used to select the mode of operation. If Shift/Load = 0, then the circuit operates as a shift register. If Shift/Load = 1, then the parallel input data are loaded into the register. In both cases the action takes place on the positive edge of the clock.

We have chosen to label the flip-flop outputs as Q3, ..., Q0 because shift registers are often used to hold binary numbers. The contents of the register can be accessed in parallel by observing the outputs of all flip-flops. The flip-flops can also be accessed serially, by observing the values of Q0 during consecutive clock cycles while the contents are being shifted. A circuit in which data can be loaded in series and then accessed in parallel is called a series-to-parallel converter. Similarly, the opposite type of circuit is a parallel-to-series converter. The circuit presented here can perform both of these functions.

6.3 Counters

Counter circuits are used in digital systems for many purposes. They may count the number of occurrences of certain events, generate timing intervals for control of various tasks in a system, keep track of time elapsed between specific events, and so on.

Counters can be implemented using adder/subtractor circuits. However, since we only need to change the contents of a counter by 1, it is not necessary to use such elaborate circuits. Instead, we can use much simpler circuits that have a significantly lower cost. We will show how counter circuits can be designed using T and D flip-flops.

6.3.1 Asynchronous Counters

The simplest counter circuits can be built using T flip-flops because the toggle feature is naturally suited for the implementation of the counting operation.


6.3.1.1 Up-Counter with T Flip-Flops

Figure 7.20a gives a three-bit counter capable of counting from 0 to 7. The clock inputs of the three flip-flops are connected in cascade. The T input of each flip-flop is connected to a constant 1, which means that the state of the flip-flop will be reversed (toggled) at each positive edge of its clock. We are assuming that the purpose of this circuit is to count the number of pulses that occur on the primary input called Clock. Thus the clock input of the first flip-flop is connected to the Clock line. The other two flip-flops have their clock inputs driven by the Q̄ output of the preceding flip-flop. Therefore, they toggle their state whenever the preceding flip-flop changes its state from Q = 1 to Q = 0, which results in a positive edge of the Q̄ signal.

Figure 7.20b shows a timing diagram for the counter. The value of Q0 toggles once each clock cycle. The change takes place shortly after the positive edge of the Clock signal. The delay is caused by the propagation delay through the flip-flop. Since the second flip-flop is clocked by Q̄0, the value of Q1 changes shortly after the negative edge of the Q0 signal.

Similarly, the value of Q2 changes shortly after the negative edge of the Q1 signal. If we look at the values Q2Q1Q0 as the count, then the timing diagram indicates that the counting sequence is 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, and so on. This circuit is a modulo-8 counter. Because it counts in the upward direction, we call it an up-counter.

The counter in the figure above has three stages, each comprising a single flip-flop. Only the first stage responds directly to the Clock signal; we say that this stage is synchronized to the clock. The other two stages respond after an additional delay. For example, when Count = 3, the next clock pulse will cause the Count to go to 4. As indicated by the arrows in the timing diagram, this change requires the toggling of the states of all three flip-flops. The change in Q0 is observed only after a propagation delay from the positive edge of Clock. The Q1 and Q2 flip-flops have not yet changed; hence for a brief time the count is Q2Q1Q0 = 010. The change in Q1 appears after a second propagation delay, at which point the count is 000. Finally, the change in Q2 occurs after a third delay, at which point the stable state of the circuit is reached and the count is 100. Such a circuit is called an asynchronous counter, or a ripple counter.

6.3.1.2 Down-Counter with T Flip-Flops

A slight modification of the above circuit is presented below. The only difference is that the clock inputs of the second and third flip-flops are driven by the Q outputs of the preceding stages, rather than by the Q̄ outputs.

The timing diagram shows that this circuit counts in the sequence 0, 7, 6, 5, 4, 3, 2, 1, 0, 7, and so on. Because it counts in the downward direction, we say that it is a down-counter.

It is possible to combine the functionality of the above circuits to form a counter that can count either up or down. Such a counter is called an up/down-counter.

6.3.2 Synchronous Counters

The asynchronous counters are simple, but not very fast. If a counter with a larger number of bits is constructed in this manner, then the delays caused by the cascaded clocking scheme may become too long to meet the desired performance requirements. We can build a faster counter by clocking all flip-flops at the same time, using the approach described below.

6.3.2.1 Synchronous Counter with T Flip-Flops

Table 7.1 shows the contents of a three-bit up-counter for eight consecutive clock cycles, assuming that the count is initially 0. Observing the pattern of bits in each row of the table, it is apparent that bit Q0 changes on each clock cycle. Bit Q1 changes only when Q0 = 1. Bit Q2 changes only when both Q1 and Q0 are equal to 1. In general, for an n-bit up-counter, a given flip-flop changes its state only when all the preceding flip-flops are in the state Q = 1. Therefore, if we use T flip-flops to realize the counter, then the T inputs are defined as


T0 = 1
T1 = Q0
T2 = Q0Q1
T3 = Q0Q1Q2
...
Tn = Q0 · Q1 · · · Qn−1

Instead of using AND gates of increased size for each stage, which may lead to fan-in problems, we use a factored arrangement, as shown in the figure. This arrangement does not slow down the response of the counter, because all flip-flops change their states after a propagation delay from the positive edge of the clock. Note that a change in the value of Q0 may have to propagate through several AND gates to reach the flip-flops in the higher stages of the counter, which requires a certain amount of time. This time must not exceed the clock period. Actually, it must be less than the clock period minus the setup time for the flip-flops.


Enable and Clear Capability

The above counters change their contents in response to each clock pulse. Often it is desirable to be able to inhibit counting, so that the count remains in its present state. This may be accomplished by including an Enable control signal, as indicated below.

The circuit is the counter where the Enable signal controls directly the T input of the first flip-flop. Connecting the Enable also to the AND gate chain means that if Enable = 0, then all T inputs will be equal to 0. If Enable = 1, then the counter operates as explained previously.

In many applications it is necessary to start with the count equal to zero. This is easily achieved if the flip-flops can be cleared. The clear inputs on all flip-flops can be tied together and driven by a Clear control input.

6.3.2.2 Synchronous Counter with D Flip-Flops

While the toggle feature makes T flip-flops a natural choice for the implementation of counters, it is also possible to build counters using other types of flip-flops. The JK flip-flops can be used in exactly the same way as the T flip-flops because if the J and K inputs are tied together, a JK flip-flop becomes a T flip-flop. We will now consider using D flip-flops for this purpose.

It is not obvious how D flip-flops can be used to implement a counter. Here we will present a circuit structure that meets the requirements. The figure below gives a four-bit up-counter that counts in the sequence 0, 1, 2, ..., 14, 15, 0, 1, and so on. The count is indicated by the flip-flop outputs Q3Q2Q1Q0. If we assume that Enable = 1, then the D inputs of the flip-flops are defined by the expressions

D0 = Q̄0 = 1 ⊕ Q0

D1 = Q1 ⊕ Q0

D2 = Q2 ⊕ Q1Q0

D3 = Q3 ⊕ Q2Q1Q0

For a larger counter the ith stage is defined by

Di = Qi ⊕ Qi−1Qi−2 · · · Q1Q0

We will show how to derive these equations in Chapter 8.


We have included the Enable control signal so that the counter counts the clock pulses only if Enable = 1. In effect, the above equations are modified to implement the circuit in the figure as follows

D0 = Q0 ⊕ Enable
D1 = Q1 ⊕ Q0 · Enable
D2 = Q2 ⊕ Q1 · Q0 · Enable
D3 = Q3 ⊕ Q2 · Q1 · Q0 · Enable

The operation of the counter is based on our earlier observation that the state of the flip-flop in stage i changes only if all preceding flip-flops are in the state Q = 1. This makes the output of the AND gate that feeds stage i equal to 1, which causes the output of the XOR gate connected to Di to be equal to Q̄i. Otherwise, the output of the XOR gate provides Di = Qi, and the flip-flop remains in the same state.

This resembles the carry propagation in a carry-lookahead adder circuit; hence the AND-gate chain can be thought of as the carry chain. Even though the circuit is only a four-bit counter, we have included an extra AND gate that produces the "output carry." This signal makes it easy to concatenate two such four-bit counters to create an eight-bit counter.
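The same next-state logic can be checked in software. The following C sketch mirrors the XOR/AND-chain equations above; it is purely an illustration of the logic, not a hardware description:

#include <stdint.h>

/* Next state of the 4-bit synchronous up-counter; q holds Q3..Q0
   in bits 3..0, and enable is 0 or 1. */
uint8_t counter_next(uint8_t q, int enable)
{
    uint8_t carry = enable ? 1u : 0u;          /* T0 = Enable           */
    uint8_t next  = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t qi = (q >> i) & 1u;
        next |= (uint8_t)((qi ^ carry) << i);  /* Di = Qi xor carry     */
        carry &= qi;                           /* carry = Enable*Q0..Qi */
    }
    return next;   /* e.g. counter_next(15, 1) rolls over to 0 */
}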


6.3.3 Counters with Parallel Load

Often it is necessary to start counting with the initial count being equal to 0. This state can be achieved by using the capability to clear the flip-flops. But sometimes it is desirable to start with a different count. To allow this mode of operation, a counter circuit must have some inputs through which the initial count can be loaded.

Using the Clear and Preset inputs for this purpose is a possibility, but a better approach is discussed below.

A two-input multiplexer is inserted before each D input. One input to the multiplexer is used to provide the normal counting operation. The other input is a data bit that can be loaded directly into the flip-flop. A control input, Load, is used to choose the mode of operation. The circuit counts when Load = 0. A new initial value, D3D2D1D0, is loaded into the counter when Load = 1.

Reset Synchronization

We have already mentioned that it is important to be able to clear, or reset, the contents of a counter prior to commencing a counting operation. This can be done using the clear capability of the individual flip-flops. But we may also be interested in resetting the count to 0 during the normal counting process. An n-bit up-counter functions naturally as a modulo-2^n counter. Suppose that we wish to have a counter that counts modulo some base that is not a power of 2. For example, we may want to design a modulo-6 counter, for which the counting sequence is 0, 1, 2, 3, 4, 5, 0, 1, and so on.

The most straightforward approach is to recognize when the count reaches 5 and then reset the counter. An AND gate can be used to detect the occurrence of the count of 5.

Actually, it is sufficient to ascertain that Q2 = Q0 = 1, which is true only for 5 in our desired counting sequence. A circuit based on this approach is given below. It uses a three-bit synchronous counter of the type depicted in Figure 7.25. The parallel-load feature of the counter is used to reset its contents when the count reaches 5. The resetting action takes place at the positive clock edge after the count has reached 5. It involves loading D2D1D0 = 000 into the flip-flops. As seen in the timing diagram in Figure 7.26b, the desired counting sequence is achieved, with each value of the count being established for one full clock cycle. Because the counter is reset on the active edge of the clock, we say that this type of counter has a synchronous reset.


An alternative is to reset the counter asynchronously, using the Clear inputs of the flip-flops driven by a NAND gate that detects the count of 5. In that case the flip-flops are cleared to 0 a short time after the NAND gate has detected the count of 5. This time depends on the gate delays in the circuit, but not on the clock. Therefore, the signal values Q2Q1Q0 = 101 are maintained for a time that is much less than a clock cycle. Depending on a particular application of such a counter, this may be adequate, but it may also be completely unacceptable. For example, if the counter is used in a digital system where all operations in the system are synchronized by the same clock, then this narrow pulse denoting Count = 5 would not be seen by the rest of the system. To solve this problem, we could try to use a modulo-7 counter instead, assuming that the system would ignore the short pulse that denotes the count of 6. This is not a good way of designing circuits, because undesirable pulses often cause unforeseen difficulties in practice.

7. Timers/Counters

Timers and counters, which are present in most microcontroller chips, allow generation of pulses and interrupts at regular intervals. They can also be used to count pulses and measure event timing. Some of the more sophisticated versions can measure frequency, pulse width, and relative pulse timing on inputs. Outputs can be defined to have a given repetition rate, pulse width, and even complex sequences of pulses in some cases.

A simple timer consists of a loadable 8-bit counter. You could build this from a couple of 74HC161 counters or equivalent PLD logic. The microprocessor can write a value to the timer that is transferred to the counter outputs. If the counter is an UP counter, it counts up; a DOWN counter counts down. A typical timer embedded in a microcontroller or in a timer IC will have some means to start the timer once it is loaded, typically by setting a bit in a register. The clock input to the counter may be a derivative of the microprocessor clock or it may be a signal applied to one of the external pins. A real timer will also provide the outputs of the counter to the microprocessor so it can read the count. If the microprocessor loads this timer with a value of 0xFE and then starts the timer, it will count from FE to FF on the next clock. On the second clock, it will count from FF to 00 and generate an output. The output of the timer may set a flip-flop that the microprocessor can read, or it may generate an interrupt to the microprocessor, or both. The timer may stop once it has generated an output, or it may continue counting from 00 back to FF. The problem with a continuously running timer is that it will count from the loaded value the first time it counts up, but the second time it will start from 00.

7.1 Reloading a Timer

This timer has an 8-bit latch to hold the value written by the microprocessor. When the microprocessor writes to the latch, it also loads the counter. An OR gate also loads the timer when it rolls over from FF to 00. For this example, we will assume that the logic in the IC gets all the polarities and timings of the load signal correct so that there are no glitches or race conditions. The way this timer works is that the microprocessor writes a value to the latch (also loading it into the timer) and then starts the timer. When the timer rolls over from FF to 00, it generates an output (again, either a latched bit for the microprocessor to read or an interrupt). At the same time that the output is generated, the timer is loaded from the latch contents. Since the latch still holds the value written by the microprocessor, the counter will start counting again from the same point it did before. Now the timer will produce a regular output with the same accuracy as the input clock. This output could be used to generate a regular interrupt, to provide a baud rate clock to a UART, or to provide a signal to any device that needs a regular pulse. A variation of this feature used in some microcontrollers does not load the counter with the desired count value but instead loads it into a digital comparator. The comparator compares the counter value to the value written by the microprocessor. The counter starts at zero and counts up. When the count equals the value written by the microprocessor, the counter is reset to zero and the process repeats. The effect is the same as the timer just described.
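A small calculation shows how the reload value is chosen for such an 8-bit up-counting timer; the function name is invented for illustration:

#include <stdint.h>

/* ticks is the number of input clocks per desired period (1..256). */
uint8_t timer_reload_value(uint16_t ticks)
{
    return (uint8_t)(256u - ticks);  /* counter counts up to the FF->00 rollover */
}
/* Example: with a 1 MHz timer clock, a 250 us period needs 250 ticks,
   so the latch is loaded with 256 - 250 = 6; the counter then counts
   6, 7, ..., FF and rolls over every 250 clocks. */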

3.3.1.2 Input Capture Timer

In this case, the timer counts from zero to FF. When a pulse occurs on the capture input pin, the contents of the counter are transferred to an 8-bit latch and the counter is reset. The input pulse also generates an interrupt to the microprocessor. The timer is connected directly to the input pin; in an actual circuit, of course, there will be some gating and synchronizing logic to make sure all the timing is right. Similarly, the capture pin will not connect directly to a microprocessor interrupt but will be passed through some flip-flops, timing logic, interrupt controller logic, and so on. This configuration is typically used to measure the time between the leading edges of two pulses. The timer is run at a constant clock, usually a derivative of the microprocessor clock. Each time an edge occurs on the input capture pin, the processor is interrupted and the software reads the capture latch. The value in the latch is the number of clocks that occurred since the last pulse. Some microcontrollers do not reset the counter on an input capture but let the counter free-run. In those configurations, the software must remember the previous reading and subtract it from the new reading. When the counter rolls over from FF to 00, the software must recognize that fact and correct the numbers; if it doesn’t, negative values will result. Many microcontrollers that provide a capture-type timer also provide a means for the counter to generate an interrupt when it rolls over, which can simplify this software task.
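
For the free-running variant, the subtraction described above is naturally done in unsigned 8-bit arithmetic, which corrects a single rollover between pulses automatically; a sketch (the capture value would come from the part's capture latch):

    #include <stdint.h>

    static uint8_t last_capture;

    /* Called from the input-capture interrupt with the latched count.    */
    uint8_t capture_period(uint8_t capture)
    {
        /* Unsigned subtraction wraps modulo 256, so one rollover between */
        /* pulses is handled for free: 0x05 - 0xF0 = 0x15 (21 clocks).    */
        uint8_t delta = (uint8_t)(capture - last_capture);
        last_capture = capture;
        return delta;
    }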

3.3.1.3 Watchdog Timer

The watchdog timer (WDT) acts as a safety net for the system. If the software stops responding or attending to the task at hand, the watchdog timer detects that something is amiss and resets the processor automatically. The system might stop responding as a result of any number of difficult-to-detect hardware or firmware defects. For example, if an unusual condition causes a buffer overrun that corrupts the stack frame, some function’s return address could be overwritten. When that function completes, it then returns to the wrong spot, leaving the system utterly confused. Runaway pointers (firmware) or a glitch on the data bus (hardware) can cause similar crashes. Different external factors can cause “glitches.” For example, even a small electrostatic discharge near the device might cause enough interference to momentarily change the state of one bit on the address or data bus. Unfortunately, these kinds of defects can be very intermittent, making them easy to miss during the project’s system test stage.

The watchdog timer is a great protector. Its sole purpose is to monitor the CPU with a “you scratch my back and I’ll scratch yours” kind of relationship. The typical watchdog has an input pin that must be toggled periodically (for example, once every second). If the watchdog is not toggled within that period, it pulses one of its output pins. Typically, this output pin is tied either to the CPU’s reset line or to some nonmaskable interrupt (NMI), and the input pin is tied to an I/O line of the CPU. Consequently, if the firmware does not keep the watchdog input line toggling at the specified rate, the watchdog assumes that the firmware has stopped working, complains, and causes the CPU to be restarted.
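
In firmware, servicing the watchdog usually amounts to toggling the watchdog line from the main loop only, never from an interrupt handler (an interrupt can keep firing even when the main loop is hung). A sketch with an invented port register:

    #include <stdint.h>

    #define WDT_PORT (*(volatile uint8_t *)0x5000)  /* hypothetical I/O port      */
    #define WDT_PIN  0x01                           /* bit wired to the WDT input */

    int main(void)
    {
        for (;;) {
            /* do_work(): one pass of all normal processing */
            WDT_PORT ^= WDT_PIN;  /* toggle the watchdog line once per pass;    */
                                  /* if this loop stops, the WDT resets the CPU */
        }
    }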

3.3.1.4 Using Timers

Time-Based Temperature Measurement - An example that illustrates some of the important issues you must consider when using timers involves measurement of temperature. The Maxim MAX6576 is an IC that measures temperature. The MAX6576 has a single-wire output and produces a square wave with a period that is proportional to the absolute temperature in kelvin. The MAX6576 can operate from -40°C to +125°C. By connecting the TS0 and TS1 inputs to ground or Vcc in various combinations, the MAX6576 can be configured so that the period varies 10, 40, 160, or 640µs per degree. In the configuration shown, the period will vary by 40µs per degree. At 25°C, the period will be:

(25 + 273.15) x 40 = 11,926 microseconds, or 11.926ms

Say you connect this to a microprocessor using input capture mode. Let’s suppose the microprocessor is operating with a 4.096MHz crystal and using a prescaler of 256, so the timer gets a clock of 4.096MHz/256, or 16,000Hz. The counter increments every 62.5µs. For this application, it doesn’t matter whether the input capture occurs on the rising or falling edge of the MAX6576 output.

How accurately can you measure temperature with this arrangement? Since the MAX6576 changes 40µs per degree and the clock to the counter is 16,000Hz, each increment of the counter corresponds to 62.5/40, or 1.56 degrees. This is the best resolution you can get. If the temperature of the sensor is 25°C, the captured count value will be 11,926/62.5 = 190.8. Since the counter can only capture integral values, the actual count will be 190 (the .8 is dropped). For the count to be less than 190, the temperature must go to 23.7°C. Any changes between these two values cannot be read by the microprocessor.

If we decide that this is insufficient accuracy for our application, we might change the prescaler to 1, making the counter clock the same as the CPU clock, 4.096MHz. Now the counter increments every 244.1ns, and the resolution is 244.1ns/40µs, or .0061 degrees per counter increment. This is much better accuracy than the sensor itself has.

What happens in this configuration if the temperature goes from 25°C to 125°C? The period will go from 11,926µs to 15,926µs. This will result in a captured count of 65,232. The timer is 16 bits wide, so this is not a problem, but it is very close to the 65,535 upper limit of the counter. What happens at 125°C if we take the accuracy of the sensor itself into account? The MAX6576 has a typical accuracy of ±3°C at 125°C, but the maximum error is ±5°C. This means that, at 125°C, the output may actually indicate up to 130°C. At 130°C, the output period is 16,126µs. This corresponds to a count value of 66,052, which means the timer we are using would roll over from 65,535 to zero while sampling. The actual count that would be captured would be 516, indicating a much lower temperature than the MAX6576 is actually sensing.

There are several solutions to this specific problem: the timer prescaler could be changed, the configuration of the MAX6576 could be changed, or even the microprocessor crystal could be changed. You could also leave the hardware as-is and handle the error in software by detecting the rollover. The important point is to perform this type of analysis when you use timers in microprocessor designs.
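
The software-side fix mentioned above can be sketched as follows; the 40µs-per-degree slope and the 4.096MHz (244.14ns) counter clock are taken from the example, while the rollover flag is assumed to be set by a timer-overflow interrupt between captures:

    #include <stdint.h>

    #define NS_PER_COUNT   244.140625   /* 4.096 MHz counter clock      */
    #define US_PER_KELVIN  40.0         /* MAX6576 slope in this config */

    double capture_to_celsius(uint16_t count, int rolled_over)
    {
        /* Add back the 65,536 counts lost when the 16-bit timer wrapped. */
        double counts    = (double)count + (rolled_over ? 65536.0 : 0.0);
        double period_us = counts * NS_PER_COUNT / 1000.0;
        return period_us / US_PER_KELVIN - 273.15;  /* 40 us per kelvin  */
    }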


Another issue that arises from this example is that of sampling time. The system can only sample the temperature at a rate equal to the period of the output. As the temperature goes up, the time between samples also goes up. If several samples need to be averaged, the sampling rate goes down proportionally. While a worst-case sample time of 16ms is probably not unreasonable for a temperature measurement system, an analysis of the effects of sample time should be performed whenever the input signal itself determines the sampling rate.

Motor Control - Say you have a DC motor that is part of a microprocessor control system. The motor has an encoder that produces 100 pulses per revolution, and the microprocessor must control the speed of the motor from 10RPM to 2000RPM. Some undefined external source provides a command to the microprocessor to set motor speed. At 10RPM, the microprocessor will get pulses from the motor encoder at the following frequency:

10 Rev/Min × 100 Pulses/Rev × (1 Min / 60 Sec) = 16.6 Pulses/Sec

A similar calculation results in a frequency of 3333.33 pulses/sec at 2000RPM. If the input capture hardware is configured to generate an interrupt when the input pulse occurs, then the processor will get an interrupt every 60ms at 10RPM, and every 300µs at 2000RPM. Say we want to calculate motor speed by using a microcontroller with input capture capability to measure the time between encoder pulses. If the input capture is measured with a 1MHz reference clock, then the input capture registers will contain 1MHz/16.6Hz, or 60,024, at 10RPM. Similarly, the registers will contain a value of 300 at 2000RPM. The 100-count encoder produces one pulse every 3.6 degrees of rotation (360/100). This is true at any motor speed. However, the input capture reference clock is fixed, so its accuracy (in degrees of rotation) varies with the motor speed. At 10RPM, each reference clock corresponds to:

16.6 EncoderPulses/Sec × 3.6 Deg/EncoderPulse × (1/1,000,000) Sec/ReferenceClock = 60×10⁻⁶ Deg/ReferenceClock

At 2000RPM, this becomes .012 degrees. While either of these is probably adequate for a motor control application, the principle is important; at faster RPM, the accuracy of the reference clock with respect to the input signal is less.
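
The same arithmetic in C, using the 1MHz reference clock and 100-pulse encoder from the example:

    #include <stdint.h>

    #define REF_CLOCK_HZ   1000000UL   /* input-capture reference clock */
    #define PULSES_PER_REV 100UL       /* encoder resolution            */

    /* Clocks between encoder pulses -> RPM.                            */
    /* 60,024 clocks -> ~10 RPM and 300 clocks -> 2000 RPM, as above.   */
    uint32_t capture_to_rpm(uint32_t clocks_per_pulse)
    {
        if (clocks_per_pulse == 0)
            return 0;                  /* guard against divide-by-zero  */
        return (REF_CLOCK_HZ * 60UL) / (PULSES_PER_REV * clocks_per_pulse);
    }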

8. PWM Control

8.1 Examples and description


Pulse-width modulation control works by switching the power supplied to the motor on and off very rapidly. The DC voltage is converted to a square-wave signal, alternating between fully on (nearly 12V) and zero, giving the motor a series of power "kicks".

If the switching frequency is high enough, the motor runs at a steady speed due to its fly-wheel momentum.

By adjusting the duty cycle of the signal (modulating the width of the pulse, hence the 'PWM'), i.e. the fraction of time it is "on", the average power can be varied, and hence the motor speed.

Advantages are,

1. The output transistor is either on or off, not partly on as with normal regulation, so less power is wasted as heat and smaller heat-sinks can be used.
2. With a suitable circuit there is little voltage loss across the output transistor, so the top end of the control range gets nearer to the supply voltage than linear regulator circuits.
3. The full-power pulsing action will run fans at a much lower speed than an equivalent steady voltage.

Disadvantages:

1. Without adding extra circuitry, any fan speed signal is lost, as the fan electronics' power supply is no longer continuous.
2. The 12V "kicks" may be audible if the fan is not well-mounted, especially at low revs; a clicking or growling vibration at PWM frequency can be amplified by case panels. A way of overcoming this by "blunting" the square-wave pulse is described in Application Note #58 from Telcom, although some of advantage 3 is then lost.
3. Some authorities claim the pulsed power puts more stress on the fan bearings and windings, shortening its life.

An oscillator is used to generate a triangle or sawtooth waveform (green line). At low frequencies the motor speed tends to be jerky, at high frequencies the motor's inductance becomes significant and power is lost. Frequencies of 30-200Hz are commonly used.

A potentiometer is used to set a steady reference voltage (blue line).

A comparator compares the sawtooth voltage with the reference voltage. When the sawtooth voltage rises above the reference voltage, a power transistor is switched on. As it falls below the reference, it is switched off. This gives a square wave output to the fan motor.

If the potentiometer is adjusted to give a high reference voltage (raising the blue line), the sawtooth never reaches it, so output is zero. With a low reference, the comparator is always on, giving full power.

A simple PWM consists of an 8-bit up/down counter that counts from 00 to FF, then back down to 00, and an 8-bit comparator that compares the value in an 8-bit latch to the counter value. When the two values are equal, the comparator clocks the “D” flip-flop (again, timing logic makes sure everything works correctly). If the counter is counting up, a “1” is clocked into the “D” flip-flop. If the counter is counting down, a “0” is loaded. The flip-flop output is connected to one of the microcontroller output pins. Say the microprocessor writes a value of 0xFE into the latch. The counter counts from 00 to FE, where the PWM output goes to “1” because the counter bits match the latched value. The counter continues to FF, then back down through FE to zero. When the counter passes through FE,
the PWM output goes to zero. So in this case, the PWM output is high for two counts (FE and FF) out of 256, or about .78 percent duty cycle. If the microprocessor writes 0xF0 to the latch, the PWM output will be high from F0 to FF and back to F0, for a total of 30 counts, or 11.7 percent duty cycle. A more sophisticated PWM timer would include a second latch and comparator so the counter can reverse direction at values other than FF. In such a timer, this comparator would set the frequency of the PWM signal while the other comparator would set the duty cycle. Some microprocessors provide other means to generate PWM. Some microcontrollers don’t use an up/down counter but instead provide two comparators. After the first count value is reached, the counter is reset and the second comparator is used to indicate end-of-count. The output pin indicates which comparator is being used, so a PWM output can be generated by controlling the ratio of the comparator values.

PWM Output

Similar considerations apply to timer outputs. If you are using an 8-bit timer to generate a PWM signal, the output duty cycle can only be changed by one timer count, or 1 in 256. This results in a duty cycle resolution of about .4 percent. Note, though, that this applies only if the timer is allowed to run a full 256 counts. If you are using an 8-bit timer but only 100 counts for the PWM period, then one step is 1 percent of the total period. In this case, the best resolution you can get is 1 percent. This is sufficient for many applications but is inadequate for others. In an application in which you vary the PWM period and duty cycle, you need to be sure that the resolution at the fastest period (least number of timer counts per cycle) is adequate for the application.
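
The resolution trade-off can be made concrete with a small helper that converts a requested duty cycle into a compare value for a given period length (a sketch; no particular part's registers are assumed):

    #include <stdint.h>

    /* Compare value for a PWM timer with 'period_counts' counts per cycle. */
    /* With period_counts = 256 one step is ~0.4% duty; with 100 it is 1%.  */
    uint16_t pwm_compare_value(uint8_t duty_percent, uint16_t period_counts)
    {
        return (uint16_t)(((uint32_t)duty_percent * period_counts) / 100);
    }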


Principle


An example of PWM: the supply voltage (blue) modulated as a series of pulses results in a sine-like flux density waveform (red) in the magnetic circuit of an electromagnetic actuator. The smoothness of the resultant waveform can be controlled by the width and number of modulated impulses (per given cycle).

Fig. 1: a square wave, showing the definitions of ymin, ymax and D.

Pulse-width modulation uses a square wave whose pulse width is modulated, resulting in the variation of the average value of the waveform. If we consider a square waveform f(t) with a low value ymin, a high value ymax and a duty cycle D (see figure 1), the average value of the waveform is given by:

    y_avg = (1/T) · ∫ from 0 to T of f(t) dt

As f(t) is a square wave, its value is ymax for 0 < t < D·T and ymin for D·T < t < T. The above expression then becomes:

    y_avg = (1/T) · ( D·T·ymax + (1 − D)·T·ymin ) = D·ymax + (1 − D)·ymin

This latter expression can be fairly simplified in many cases where ymin = 0, as y_avg = D·ymax. From this, it is obvious that the average value of the signal (y_avg) is directly dependent on the duty cycle D.


Fig. 2: A simple method to generate the PWM pulse train corresponding to a given signal is the intersective PWM: the signal (here the green sinewave) is compared with a sawtooth waveform (blue). When the latter is less than the former, the PWM signal (magenta) is in high state (1). Otherwise it is in the low state (0).

The simplest way to generate a PWM signal is the intersective method, which requires only a sawtooth or a triangle waveform (easily generated using a simple oscillator) and a comparator. When the value of the reference signal (the green sine wave in figure 2) is more than the modulation waveform (blue), the PWM signal (magenta) is in the high state, otherwise it is in the low state.
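
A minimal software model of the intersective method, useful for plotting; the signal frequencies are assumed values, and in hardware the same comparison is done by the comparator:

    #include <math.h>
    #include <stdio.h>

    /* Intersective PWM: output is high while the reference exceeds the carrier. */
    int main(void)
    {
        const double f_ref = 1.0, f_carrier = 20.0;  /* assumed frequencies, Hz */
        for (int i = 0; i < 1000; i++) {
            double t         = i / 1000.0;                /* 1 ms time steps */
            double reference = 0.5 + 0.5 * sin(2.0 * 3.14159265358979 * f_ref * t);
            double carrier   = fmod(t * f_carrier, 1.0);  /* 0..1 sawtooth   */
            printf("%.3f %d\n", t, reference > carrier ? 1 : 0);
        }
        return 0;
    }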

Fig. 3 : Principle of the delta PWM. The output signal (blue) is compared with the limits (green). These limits correspond to the reference signal (red), offset by a given value. Every time the output signal reaches one of the limits, the PWM signal changes state.


Fig. 4 : Principle of the sigma-delta PWM. The top green waveform is the reference signal, on which the output signal (PWM, in the middle plot) is subtracted to form the error signal (blue, in top plot). This error is integrated (bottom plot), and when the integral of the error exceeds the limits (red lines), the output changes state.

Fig. 5 : Three types of PWM signals (blue): leading edge modulation (top), trailing edge modulation (middle) and centered pulses (both edges are modulated, bottom). The green lines are the sawtooth signals used to generate the PWM waveforms using the intersective method.

The square wave is a useful function for many applications such as Pulse Width Modulation (PWM). PWM is widely used in a variety of applications in measurement and digital control. It offers a simple method for digital control logic to create an analog equivalent.

The majority of microcontrollers today have built-in PWM capability, which facilitates the implementation of control. Using PWM in communication systems is also popular, due to the fact that the digital signal is more robust and less vulnerable to noise.


8.2 Concepts of Pulse Width Modulation (PWM)

PWM is a method of digitally encoding analog signal levels. The duty cycle of a square wave is modulated to encode a specific analog signal level using high-resolution counters. The PWM signal is still a digital signal because at the given instant of time, the full DC supply is either fully on or fully off.

The voltage or current source is supplied to the analog load by a repetitive series of ON and OFF pulses. The ON time is the period when the DC supply is applied to the load, and the OFF time is the period when the DC supply is switched off. If the available bandwidth is sufficient, any analog value can be encoded using PWM.

An analog signal has a continuously varying value, with infinite resolution in both time and magnitude, and it can be used to control many electronic devices directly. For example, in a simple analog radio, a knob is connected to a variable resistor. When turning the knob, the resistance goes down or up, and the current flowing through the resistor increases or decreases. Consequently, the current that drives the speaker is changed proportionally, thus increasing or decreasing the volume.

Although analog control may be considered intuitive and simple, it is not always economically attractive or practical. Analog circuits tend to drift over time and are very difficult to tune.

Precision analog circuits, which can solve such problems, may be large, heavy, and expensive.

Analog circuits tend to generate heat through power dissipation. The power dissipated is proportional to the voltage across the active elements multiplied by the current that flows through them. Analog circuitry can also be sensitive to noise because of its infinite resolution; even minor perturbations of an analog signal can change its value.

By controlling analog circuits digitally, system costs and power consumption can be drastically reduced. Many microcontrollers and digital signal processors (DSPs) already include PWM controllers in the chip, thus making implementation easier.

Frequency and Duty Cycle

Figure 1 illustrates a circuit established using a battery, a switch and an LED. This circuit turns on the LED for one second and then turns off the LED for one second using the switch control. The LED is ON for 50% of the period and OFF the other 50%. The period is defined as the total time it takes to complete one cycle (from OFF to ON state and back to OFF state).


The signal can be further characterized by the duty cycle, which is the ratio of the “ON” time divided by the period. A high duty cycle generates a bright LED while a small duty cycle generates a dimmer LED. The example shown in Figure 1 provides a 50% duty cycle.

In Figure 2, two waveforms with different frequencies produce the same amount of light. Note that the amount of light is independent of the frequency, but proportional to the duty cycle.

The frequency range you can use to control a circuit is limited by the response time of the circuit.

In the example shown in Figure 1, a low frequency can cause the LED to flash noticeably. A high frequency, in turn, can cause an inductive load to saturate.

For example, a transformer has a limited frequency range to transfer the energy efficiently. For some designs, harmonics (or beat frequencies) of the PWM frequency can get coupled into the analog circuitry, causing unwanted noise. If the right frequency is selected, the load being controlled will act as a stabilizer, a light will glow continuously and the momentum will allow a rotor to turn smoothly.

Generating PWM signals

The PWM signals are easy to generate using a comparator with a sine wave as one of the input signals. Figure 3 shows a sample block diagram of an analog PWM generator.

Figures 4 and 5 show the PWM output waveform (red line) generated by a comparator with two input signals: a sine wave (black line) and an input signal (gray line). The input signal of 0.5 VDC is the voltage reference to be compared with the sine wave to produce a PWM waveform.


With the steady-state reference voltage of 0.5 VDC, a PWM waveform with 50% duty cycle is generated.

If the reference voltage decreases to 0.25 VDC, the generated PWM waveform will have a higher duty cycle, as shown in Figure 5.

Advantages of Using PWM

PWM offers several advantages over analog control. For example, when using PWM to control the brightness of a lamp, the heat dissipated is less than with an analog control that converts the excess current to heat. Hence, less power is delivered to the load (light), which will prolong the life cycle of the load.

With a higher frequency rate, the light (load) brightness can be controlled as smoothly as an analog control.

Rotors can operate at a lower speed if they are controlled by PWM. Some of the rotors might not function with low analog current. When an analog current controls a rotor, it will not produce significant torque at low speed. The magnetic field that is created by the small current is insufficient to turn the rotor. On the other hand, a PWM current can create short pulses of magnetic flux at full strength that enables the rotor to turn at a slow speed.

Combining the ON/OFF (1/0) states with a varying voltage and duty cycle, PWM can output a desired voltage level. Thus, it can be used as a voltage regulator for many applications. When the desired voltage level is higher than the output voltage level, the state will be ON (1). On the other hand, the state will be OFF (0) when the desired voltage level is lower than the output voltage level. For example, PWM can be applied when a CPLD is used for simple voltage regulation, or with an FPGA for complex control algorithms using its internal DSP blocks.

In addition, the entire control circuit can be digitized using the PWM technique. This eliminates the need to use digital-to-analog converters in control circuitry. The digital control lines generated by PWM reduce the susceptibility of your circuit to interference.

The technology has become more pervasive as PWM controls are incorporated into low-cost microcontrollers. Microcontrollers offer simple commands to vary the duty cycle and frequency of the PWM control signal. PWM is also widely used in the communications field because digital signals are extremely immune to noise.

The popularity of PWM will continue to grow as the functionality becomes more common in microcontrollers and development tools. Hence, having a profound knowledge of PWM will make it easier to incorporate it in your designs.

In addition, when working on a PWM design, a U1252A handheld DMM can be a great tool for creating a waveform.

Glossary

Duty cycle – the percentage of time of a pulse train at its higher voltage

Period – total time taken before the signal repeats

Pulse width – total time during which the pulse is in the “true state”

8.3 PWM Study

In what follows, we aim to describe PWM control in detail.

1. Why PWM? To drive continuous-time systems, it is necessary to supply control signals that are continuous in time. In digital control practice this is done using digital-to-analog converters (DACs). This option is relatively expensive, and in embedded systems practice it is avoided. PWM has established itself as a method of generating control signals for continuous-time plants using digital outputs, which are available in large numbers on any microcontroller.

Although digital outputs provide only logic-level information, the time variable is what the PWM implementation uses to emulate an analog signal.

For illustration, we will carry out some experiments in Matlab/Simulink.

[Simulink model: a Pulse Generator (the digital command) and a Constant block of value 0.5 (the continuous command) each drive an identical plant, Transfer Fcn 1/(s+1); both outputs are displayed on a Scope.]

The model above implements the two types of control. On one hand, we have a (configurable) pulse generator that implements a PWM command. On the other hand, the same system is driven by a continuous command: a constant command value that represents the duty cycle (Pulse Width). In the example above, this value is 0.5 (because Pulse Width = 50%).

2. How is a PWM configured? As can be seen from the pulse generator's dialog interface, there are three important parameters:

a. the pulse amplitude
b. the pulse period
c. the pulse width (Pulse Width, the duty cycle)
The result of a simulation is:

a. The pulse amplitude is given by the High value of the digital output and is fixed by construction. Theoretically, it represents the maximum value of the command applied to the plant. In other words, if we choose a duty cycle of 100%, we apply a continuous quantity of amplitude 1.

b. The pulse period is a PWM parameter that must be adapted to the dynamics of the controlled system. In our example, one can see that the time between the extreme values of the oscillations is 2 seconds. If we keep the other parameters but reduce the period to 0.4 sec, we obtain:


In other words, by reducing the PWM period, the controlled system will exhibit oscillations of smaller amplitude but at higher frequencies, with consequences for the actuators. To eliminate this drawback, a filter is preferably introduced at the plant input:

[Simulink model: the Pulse Generator (Pulse Width = 50%, PWM period = 0.4) now drives the plant 1/(s+1) through a PWM filter 1/(0.2s+1); the continuous path, Constant 0.5 into 1/(s+1), is unchanged; both outputs go to the Scope.]

This PWM filter must not influence the dynamics of the controlled system, so its time constant must be smaller (5 times smaller in our case) than that of the controlled system.

The best recipe is to choose a filter with a time constant 10 times smaller than that of the controlled system (so that the dynamics of the controlled system are not influenced too much by the PWM filter), and a PWM period 10 times smaller than that of the filter (good filtering of the PWM), hence 100 times smaller than the time constant of the controlled system:

[Simulink model: the Pulse Generator (Pulse Width = 50%, PWM period = 0.01) drives the plant 1/(s+1) through a PWM filter 1/(0.1s+1); the continuous path, Constant 0.5 into 1/(s+1), is unchanged.]

One can see that the oscillations caused by the PWM have disappeared.
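
The sizing recipe above (filter time constant = T_plant/10, PWM period = T_filter/10) is simple enough to express directly; the values below are assumed for the 1/(s+1) plant used in these experiments:

    #include <stdio.h>

    int main(void)
    {
        double T_plant  = 1.0;             /* plant time constant, seconds    */
        double T_filter = T_plant / 10.0;  /* filter barely affects the plant */
        double T_pwm    = T_filter / 10.0; /* PWM well filtered by the filter */
        printf("filter tau = %.2f s, PWM period = %.2f s\n", T_filter, T_pwm);
        return 0;
    }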

In fact, Fourier analysis provides the answers to the issues above.

The periodic PWM signal x(t) can be represented (synthesized) as follows:

x_PWM(t) = C_0 + Σ_{n=1}^{∞} [ C_n·cos(n·ω_PWM·t) + S_n·sin(n·ω_PWM·t) ]

where ω_PWM = 2π·f_PWM = 2π/T_PWM.

The formulas for computing the coefficients in this case are:

C_0 = (1/T_PWM) · ∫ over T_PWM of x(t) dt

which represents the so-called DC component of the signal, and

C_n = (2/T_PWM) · ∫ over T_PWM of x(t)·cos(n·ω_PWM·t) dt

S_n = (2/T_PWM) · ∫ over T_PWM of x(t)·sin(n·ω_PWM·t) dt

By "∫ over T_PWM" we mean the integral from 0 to T_PWM, where T_PWM is the period of the periodic signal x_PWM(t).

[Bode diagram of the aperiodic plant: magnitude (dB) and phase (deg) versus frequency (rad/sec).]

In the Bode diagram above we have plotted the frequency characteristic of the aperiodic plant, where we considered ω_PWM = 10⁻¹ [rad/sec]. One can see that the plant's frequency characteristic lets the first 16 frequencies of the Fourier expansion pass through to the output.

If we increase the PWM frequency, e.g. to the value ω_PWM = 4·10⁻¹ [rad/sec], fewer frequencies will be found at the plant output.

[Bode diagram: magnitude (dB) and phase (deg) versus frequency (rad/sec), for ω_PWM = 4·10⁻¹ rad/sec.]

If we increase the PWM frequency to ω_PWM = 10 [rad/sec], we will notice that the plant filters out practically all the frequencies of the Fourier expansion; only the DC component, which is the one used for control, is found at the output.

[Bode diagram: magnitude (dB) and phase (deg) versus frequency (rad/sec), for ω_PWM = 10 rad/sec.]

c. The Pulse Width (duty cycle) of the PWM acts on the mean value of the command. A Pulse Width of 70% will drive the system to a steady-state regime varying around the value 0.7.

Indeed, since C_0 = (1/T_PWM) · ∫ over T_PWM of x(t) dt is the DC component of the Fourier expansion, any system driven by the PWM is acted upon by this component together with the other frequencies, multiples of the fundamental frequency. If those frequencies are filtered out by the plant (as in the case above), then only the DC component acts on the controlled system, exactly as with a DAC.

[Simulink model: the Pulse Generator (Pulse width = 70%, PWM period = 0.1) and a Constant block of value 0.7 each drive the plant 1/(s+1); both outputs go to the Scope.]

3. How is a PWM-based control system implemented? In principle, PWM is implemented to emulate a DAC, which is missing in embedded applications. The general structure of a control system with PWM command is presented below.

[Block diagram: the (PID) controller output is scaled to 0%...100% to become the duty cycle of a PWM generator, which drives the PWM-commanded system; the scaling plus the PWM generator together emulate a DAC.]

The digital control algorithm, e.g. a PID, modulates one of the parameters of the PWM generator. Since the PWM amplitude and period are parameters fixed technologically (TTL amplitude for the digital signal, 0/5V, and the PWM period, respectively), the parameter used for modulation is the Pulse Width (hence the name PWM). The controller output is scaled to provide a duty cycle of 0%...100% to the PWM generator.
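
A sketch of that scaling step: the controller output is saturated and mapped onto a 0...100% duty cycle before being written to the PWM generator (the duty register below is hypothetical):

    #include <stdint.h>

    #define PWM_DUTY_REG (*(volatile uint8_t *)0x6000)  /* hypothetical register */

    /* Map a controller output in [u_min, u_max] onto a 0..100% duty cycle. */
    void write_pwm_duty(double u, double u_min, double u_max)
    {
        if (u < u_min) u = u_min;           /* saturate the command       */
        if (u > u_max) u = u_max;
        double duty = 100.0 * (u - u_min) / (u_max - u_min);
        PWM_DUTY_REG = (uint8_t)duty;       /* 0..100, one unit = 1% duty */
    }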

4. How is the sampling period chosen for a PWM-based control system? Theoretically, when we implement a control system on an embedded platform, the control algorithm is computed from the dynamic model of the plant, based on a certain sampling period. From the implementation point of view, there is practically no link between the PWM period (T_PWM) and the sampling period T_e.


By construction, the PWM generator reads the value of the duty cycle at the beginning of its period, after which it generates the PWM signal. The process repeats with a periodicity dictated by T_PWM.

Several observations can be made:

1. The case T_PWM > T_e


In this case, the control system reads the error values and computes the command at a higher frequency than that of the PWM. Notice that some command values are computed needlessly by the controller.

2. The case T_PWM < T_e


In this case, the PWM sometimes reads the value of the duty cycle needlessly, since it changes at a lower frequency.

It follows that the most favorable case is the one in which T_e = T_PWM. In this case, every command computed by the controller will influence, with a constant delay, the PWM value.

In the other cases, one notices a variation (jitter) of the instants at which the output command can actually change, relative to the instants k·T_e at which the commands are computed, a fact which can negatively influence the quality of the control.


9. DAC and ADC

9.1 Digital-to-Analog Converters (DAC)

A DAC is a hardware device that takes a set of bits, typically from a processor, as input and produces an analog signal proportional to the digital input as output. Digital-to-analog converters might be as simple as an array of resistors configured in the typical 'R-2R' fashion, or a hybrid module that generates very precise results with many bits of resolution. In an ideal DAC, the numbers are output as a sequence of impulses that are then filtered by a reconstruction filter. This would, in principle, reproduce a sampled signal precisely up to the Nyquist frequency, although in practice a perfect reconstruction filter cannot be constructed, as it has infinite phase delay, and there are errors due to quantisation. The pulse-width modulator is the simplest DAC: a stable current or voltage is switched into a low-pass analog filter with a duration determined by the digital input code. This technique is often used for electric motor speed control, and is now becoming common in high-fidelity audio. DACs are at the beginning of the analog signal chain, which makes them very important to system performance. The most important characteristics of these devices are:

- Resolution: This is the number of possible output levels the DAC is designed to reproduce. This is usually stated as the number of bits it uses, which is the base-two logarithm of the number of levels. For instance, a 1-bit DAC is designed to reproduce 2 (2¹) levels, while an 8-bit DAC is designed for 256 (2⁸) levels. Resolution is related to the Effective Number of Bits, which is a measurement of the actual resolution attained by the DAC.

- Maximum sampling frequency: This is a measurement of the maximum speed at which the DAC's circuitry can operate and still produce the correct output. As stated in the Shannon-Nyquist sampling theorem, a signal must be sampled at over twice the frequency of the desired signal. For instance, to reproduce signals in the entire audible spectrum, which includes frequencies of up to 20 kHz, it is necessary to use DACs that operate at over 40 kHz.

- Monotonicity: This refers to the ability of the DAC's analog output to increase with an increase in digital code, or the converse. This characteristic is very important for DACs used as a low-frequency signal source or as a digitally programmable trim element.

- THD+N: This is a measurement of the distortion and noise introduced to the signal by the DAC. It is expressed as a percentage of the total power of unwanted harmonic distortion and noise that accompany the desired signal. This is a very important DAC characteristic for dynamic and small-signal DAC applications.
- Dynamic range: This is a measurement of the difference between the largest and smallest signals the DAC can reproduce, expressed in decibels. This is usually related to the DAC's resolution.
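
For reference, the ideal DAC transfer function implied by the resolution discussion above, as a small sketch:

    /* Ideal N-bit DAC: Vout = code / 2^N * Vref.            */
    /* e.g. an 8-bit DAC with Vref = 5 V: code 128 -> 2.5 V. */
    double dac_code_to_voltage(unsigned code, unsigned bits, double vref)
    {
        unsigned levels = 1u << bits;    /* 2^N output levels */
        return (double)code / (double)levels * vref;
    }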

9.2 Analog-to-Digital Converters (ADC)

Analog-to-digital converters (ADCs) do the exact opposite of DACs: they output a binary word that is a digital representation of an analog voltage or current. An 8-bit ADC converts an input into 256 steps. A 10-bit ADC produces 1024 steps. DACs and ADCs interface to a microprocessor just like other peripheral ICs. Parts are available with different bus interface types, including SPI and I²C. While the microprocessor side of a DAC or an ADC is the same as other parts, there are some special considerations when dealing with these analog devices, which we’ll discuss in this section.

9.2.1 Reference Voltage

The reference voltage is the maximum value that the ADC or DAC can convert. An 8-bit ADC can convert values from 0V to the reference voltage. This voltage range is divided into 256 values, or steps. The size of the step is given by the following equation:

Step size = Reference Voltage / 256 = 5V / 256 = 0.0195V, or 19.5mV

This is the step size of the converter. It also defines the converter’s resolution. Note that no ADC or DAC can be more accurate than its reference. If your reference is a zener diode with a 10 percent tolerance, it doesn’t matter how many bits of resolution you have, your product will have a 10 percent variation between units unless you perform some kind of calibration as part of production. Some microcontrollers have internal ADCs. Many of these permit you to provide an external reference, or they let you use the supply voltage as the reference. This typically frees the reference pin for use as another analog input. If the supply voltage is used as a reference and the supply voltage is 5V, measuring a 3V input would produce the following result:

Digital word = (Vin/Vref) x 255 = (3V/5V) x 255 = 153 decimal (0x99)

However, the result depends on the value of the 5V supply. If the supply voltage is high by 1 percent, it has a value of 5.05V. Now the value of the A/D conversion will be:

(3V/5.05V) x 255 = 151 decimal (0x97)

So a 1 percent change in the supply voltage causes the conversion result to change by two counts. Typical power supplies can vary by 2 or 3 percent, so power supply variations can have a significant effect on the results. The power supply output can vary with loading, especially if there is any significant drop in the cabling that connects the power supply to the microprocessor board. Thus, if your design needs all the analog inputs and cannot use an external reference, be sure power supply variations will not cause accuracy problems. One way to minimize such errors is to power the measured signal from the microcontroller supply.
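
The same conversion in C; the divisor of 255 follows the formula used in the text (some converters scale by 256 instead, so the datasheet has the final word):

    #include <stdint.h>

    /* 8-bit ADC reading -> input voltage, per Digital word = (Vin/Vref) x 255. */
    double adc_to_voltage(uint8_t reading, double vref)
    {
        return (double)reading * vref / 255.0;  /* 153 with Vref = 5V -> 3.0V */
    }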

9.2.2 Resolution

The resolution of an ADC or DAC is determined by the reference input and by the word width. The resolution defines the smallest voltage change that can be converted. As mentioned earlier, the resolution is the same as the smallest step size and can be calculated by dividing the reference voltage by the number of possible conversion values.


For the example we’ve been using so far, an 8-bit ADC with a 5V reference, the resolution is .0195V (19.5mV). This means that any input voltage below 19.5mV will result in an output of zero. Input voltages between 19.5mV and 39mV will result in an output of 1. Between 39mV and 58.6mV, the output will be 2. Resolution can be improved by reducing the reference input. Changing from 5V to 2.5V gives a resolution of 2.5/256, or 9.7mV. However, the maximum voltage that can be measured is now 2.5V instead of 5V.

The only way to increase resolution without changing the reference is to use an ADC with more bits. A 10-bit ADC using a 5V reference has 2¹⁰, or 1024, possible output codes. Thus, the resolution is 5V/1024, or 4.88mV.

The resolution also has implications for system design, especially in the area of noise. A 0-to-5V, 10-bit ADC with 4.88mV resolution will respond to 4.88mV of noise just like it will to a DC input of 4.88mV. If your input signal has 10mV of noise, you will not get anything like 10 bits of precision unless you take a number of samples and average them. This means you either have to ensure a very quiet input or allow time for multiple samples.
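
A sketch of the averaging approach just mentioned; adc_read() stands in for the part-specific conversion routine:

    #include <stdint.h>

    extern uint16_t adc_read(void);   /* placeholder: one raw conversion      */

    /* Average 2^shift samples to suppress noise, at the cost of sample time. */
    uint16_t adc_read_averaged(unsigned shift)
    {
        uint32_t sum = 0;
        for (uint32_t i = 0; i < (1UL << shift); i++)
            sum += adc_read();
        return (uint16_t)(sum >> shift);
    }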

10. Communication

10.1 UART

Serial transmission of digital information (bits) through a single wire or other medium is much more cost effective than parallel transmission through multiple wires. A UART is used to convert the transmitted information between its sequential and parallel form at each end of the link. Each UART contains a shift register which is the fundamental method of conversion between serial and parallel forms. The UART usually does not directly generate or receive the external signals used between different items of equipment. Typically, separate interface devices are used to convert the logic level signals of the UART to and from the external signaling levels.

10.1.1 Synchronous Serial Transmission

Synchronous serial transmission requires that the sender and receiver share a clock with one another, or that the sender provide a strobe or other timing signal so that the receiver knows when to “read” the next bit of the data. In most forms of synchronous serial communication, if there is no data available at a given instant to transmit, a fill character must be sent instead so that data is always being transmitted. Synchronous communication is usually more efficient because only data bits are transmitted between sender and receiver, but it can be more costly if extra wiring and circuits are required to share a clock signal between the sender and receiver. A form of synchronous transmission is used with printers and fixed disk devices in that the data is sent on one set of wires while a clock or strobe is sent on a different wire. Printers and fixed disk devices are not normally serial devices, because most fixed disk interface standards send an entire word of data for each clock or strobe signal by using a separate wire for each bit of the word. The standard serial communications hardware in the PC does not support synchronous operations.


10.1.2 Asynchronous Serial Transmission

Asynchronous transmission allows data to be transmitted without the sender having to send a clock signal to the receiver. Instead, the sender and receiver must agree on timing parameters in advance, and special bits are added to each word which are used to synchronize the sending and receiving units.

When a word is given to the UART for asynchronous transmission, a bit called the "Start Bit" is added to the beginning of each word that is to be transmitted. The Start Bit is used to alert the receiver that a word of data is about to be sent, and to force the clock in the receiver into synchronization with the clock in the transmitter. These two clocks must be accurate enough not to let the frequency drift by more than 10% during the transmission of the remaining bits in the word.

After the Start Bit, the individual bits of the word of data are sent, with the Least Significant Bit (LSB) being sent first. Each bit in the transmission is transmitted for exactly the same amount of time as all of the other bits, and the receiver “looks” at the wire approximately halfway through the period assigned to each bit to determine if the bit is a 1 or a 0. For example, if it takes two seconds to send each bit, the receiver will examine the signal to determine if it is a 1 or a 0 after one second has passed, then it will wait two seconds and then examine the value of the next bit, and so on. The sender does not know when the receiver has “looked” at the value of the bit. The sender only knows when the clock says to begin transmitting the next bit of the word.

When the entire data word has been sent, the transmitter may add a Parity Bit that the transmitter generates. The Parity Bit may be used by the receiver to perform simple error checking. Then at least one Stop Bit is sent by the transmitter. When the receiver has received all of the bits in the data word, it may check the Parity Bit (both sender and receiver must agree on whether a Parity Bit is to be used), and then the receiver looks for a Stop Bit. If the Stop Bit does not appear when it is supposed to, the UART considers the entire word to be garbled and will report a Framing Error to the host processor when the data word is read. The usual cause of a Framing Error is that the sender and receiver clocks were not running at the same speed, or that the signal was interrupted.

Regardless of whether the data was received correctly or not, the UART automatically discards the Start, Parity and Stop bits. If the sender and receiver are configured identically, these bits are not passed to the host. If another word is ready for transmission, the Start Bit for the new word can be sent as soon as the Stop Bit for the previous word has been sent. Because asynchronous data is “self synchronizing,” if there is no data to transmit, the transmission line can be idle.
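
One concrete piece of this timing machinery is the baud-rate divisor; classic UARTs sample each bit 16 times, so the divisor is the input clock divided by 16 times the baud rate (a sketch, assuming the traditional 16x convention):

    #include <stdint.h>

    /* Classic 16x-oversampling UART: divisor = f_clk / (16 * baud). */
    /* e.g. a 1.8432 MHz clock at 9600 baud gives a divisor of 12.   */
    uint16_t uart_divisor(uint32_t f_clk_hz, uint32_t baud)
    {
        return (uint16_t)(f_clk_hz / (16UL * baud));
    }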

10.2 RS232

Due to its relative simplicity and low hardware overhead (as compared to parallel interfacing), serial communication is used extensively within the electronics industry. Today, the most popular serial communications standard in use is certainly the EIA/TIA–232–E specification. This standard, which has been developed by the Electronic Industry Association and the Telecommunications Industry Association (EIA/TIA), is more popularly referred to simply as “RS–232”, where “RS” stands for “recommended standard”. In recent years, this prefix has been replaced with “EIA/TIA” to help identify the source of the standard. The official name of the EIA/TIA–232–E standard is “Interface Between Data Terminal Equipment and Data Circuit–Termination Equipment Employing Serial Binary Data Interchange”.
Although the name may sound intimidating, the standard is simply concerned with serial data communication between a host system (Data Terminal Equipment, or “DTE”) and a peripheral system (Data Circuit–Terminating Equipment, or “DCE”). The EIA/TIA–232–E standard which was introduced in 1962 has been updated four times since its introduction in order to better meet the needs of serial communication applications. The letter “E” in the standard’s name indicates that this is the fifth revision of the standard.

RS–232 SPECIFICATIONS

RS–232 is a “complete” standard. This means that the standard sets out to ensure compatibility between the host and peripheral systems by specifying 1) common voltage and signal levels, 2) common pin wiring configurations, and 3) a minimal amount of control information between the host and peripheral systems. Unlike many standards which simply specify the electrical characteristics of a given interface, RS–232 specifies electrical, functional, and mechanical characteristics in order to meet the above three criteria. Because the functional characteristics of the interface are covered by the standard, this essentially means that RS–232 has defined the function of the different signals that are used in the interface. These signals are divided into four different categories: common, data, control, and timing. Table 1 illustrates the signals that are defined by the RS–232 standard.

Table 1. – RS 232 Defined Signals

As can be seen from the table there is an overwhelming number of signals defined by the standard. The standard provides an abundance of control signals and supports a primary and secondary communications channel. Fortunately few applications, if any, require all of these defined signals. For example, only eight signals are used for a typical modem. Some simple applications
may require only four signals (two for data and two for handshaking) while others may require only data signals with no handshaking. The third area covered by RS–232 concerns the mechanical interface. In particular, RS–232 specifies a 25–pin connector. This is the minimum connector size that can accommodate all of the signals defined in the functional portion of the standard. The pin assignment for this connector is shown in Figure 1.6. The connector for DCE equipment is male for the connector housing and female for the connection pins. Likewise, the DTE connector is a female housing with male connection pins. Although RS–232 specifies a 25–position connector, it should be noted that often this connector is not used. This is due to the fact that most applications do not require all of the defined signals and therefore a 25–pin connector is larger than necessary. This being the case, it is very common for other connector types to be used. Perhaps the most popular is the 9–position DB9S connector which is also illustrated in Figure 1.6. This connector provides the means to transmit and receive the necessary signals for modem applications, for example. This will be discussed in more detail later.

Figure 1.6 – RS 232 Connector Pin Assignments

Most systems designed today do not operate using RS–232 voltage levels. Since this is the case, level conversion is necessary to implement RS–232 communication. Level conversion is performed by special RS–232 ICs. These ICs typically have line drivers that generate the voltage levels required by RS–232 and line receivers that can receive RS–232 voltage levels without being damaged. These line drivers and receivers typically invert the signal as well, since a logic 1 is represented by a low voltage level for RS–232 communication and likewise a logic 0 is represented by a high voltage level.

Figure 1.7 illustrates the function of an RS–232 line driver/receiver in a typical modem application. In this particular example, the signals necessary for serial communication are generated and received by the Universal Asynchronous Receiver/Transmitter (UART). The RS–232 line driver/receiver IC performs the level translation necessary between the CMOS/TTL and RS–232 interfaces.

The UART just mentioned performs the “overhead” tasks necessary for asynchronous serial communication. For example, the asynchronous nature of this type of communication usually requires that start and stop bits be initiated by the host system to indicate to the peripheral system when communication will start and stop. Parity bits are also often employed to ensure that the data sent has not been corrupted. The UART usually generates the start, stop, and parity bits when transmitting data and can detect communication errors upon receiving data. The UART also functions as the intermediary between byte–wide (parallel) and bit–wide
(serial) communication; it converts a byte of data into a serial bit stream for transmitting and converts a serial bit stream into a byte of data when receiving.

Figure 1.7 – Typical RS 232 Modem Application

Now that an elementary explanation of the TTL/CMOS to RS–232 interface has been provided, we can consider some “real world” RS–232 applications. It has already been noted that RS–232 applications rarely follow the RS–232 standard precisely, most significantly because many of the defined signals are not necessary for most applications. As such, the unnecessary signals are omitted. Many applications, such as a modem, require only nine signals (two data signals, six control signals, and ground). Other applications may require only five signals (two for data, two for handshaking, and ground), while others may require only data signals with no handshake control. We will begin our investigation of “real world” implementations by first considering the typical modem application.

RS–232 IN MODEM APPLICATIONS

Modem applications are one of the most popular uses for the RS–232 standard. Figure 1.8 illustrates a typical modem application utilizing the RS–232 interface standard. As can be seen in the diagram, the PC is the DTE and the modem is the DCE. Communication between each PC and its associated modem is accomplished using the RS–232 standard. Communication between the two modems is accomplished via telecommunication. It should be noted that although a microcomputer is usually the DTE in RS–232 applications, this is not mandatory according to a strict interpretation of the standard.


Figure 1.8 – Modem Communication Between Two PC’s

10.3 Serial Peripheral Interface

The Serial Peripheral Interface Bus, or SPI bus, is a synchronous serial data link standard named by Motorola that operates in full duplex mode. Devices communicate in master/slave mode, where the master device initiates the data frame. Multiple slave devices are allowed, with individual slave select (chip select) lines. Sometimes SPI is called a "four-wire" serial bus, contrasting with three-, two-, and one-wire serial buses. In the standard configuration for a slave device (see Figure 1.9), two control and two data lines are used. The data output SDO serves, on the one hand, for reading back data; on the other hand, it offers the possibility of cascading several devices. The data output of the preceding device then forms the data input for the next IC.

Figure 1.9 – SPI slave

There is a MASTER and a SLAVE mode. The MASTER device provides the clock signal and determines the state of the chip select lines, i.e. it activates the SLAVE it wants to communicate with; CS and SCLK are therefore outputs. The SLAVE device receives the clock and chip select from the MASTER; CS and SCLK are therefore inputs. This means there is one master, while the number of slaves is limited only by the number of chip selects.


An SPI device can be anything from a simple shift register up to an independent subsystem. The basic principle of a shift register is always present. Command codes as well as data values are serially transferred, pumped into a shift register, and are then internally available for parallel processing. Here we already see an important point, which must be considered in the philosophy of SPI bus systems: the length of the shift registers is not fixed, but can differ from device to device. Normally the shift registers are 8 bits long or integral multiples thereof. Of course there also exist shift registers with an odd number of bits; for example, two cascaded 9-bit EEPROMs can store 18 bits of data.

If an SPI device is not selected, its data output goes into a high-impedance state (hi-Z), so that it does not interfere with the currently activated devices. When cascading several SPI devices, they are treated as one slave and therefore connected to the same chip select. Thus there are two meaningful types of connection of master and slave devices. Figure 1.10 shows the type of connection for cascading several devices.

Figure 1.10 – Cascading several SPI devices

In Figure 1.10 the cascaded devices are evidently treated as one larger device and therefore receive the same chip select. The data output of the preceding device is tied to the data input of the next, thus forming a wider shift register. If independent slaves are to be connected to a master, another bus structure has to be chosen, as shown in Figure 1.11. Here, the clock and the SDI data lines are brought to each slave. The SDO data lines are also tied together and led back to the master. Only the chip selects are brought separately to each SPI device.


Figure 1.11 – Master with independent slaves
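
As a usage sketch for the cascaded arrangement of Figure 1.10, the routine below reuses spi_transfer_byte() from the previous sketch (PIN_CS is again a hypothetical pin) to shift two bytes through two cascaded 8-bit devices under a single chip select:

    /* Writing to two cascaded 8-bit SPI devices sharing one chip select.
     * The 16 bits behave like one long shift register, so the byte intended
     * for the *last* device in the chain must be clocked out first. */
    #define PIN_CS 3

    void spi_write_chain(uint8_t for_last_device, uint8_t for_first_device)
    {
        GPIO_CLR(PIN_CS);                    /* select the whole chain (active low) */
        spi_transfer_byte(for_last_device);  /* passes through device 1 into device 2 */
        spi_transfer_byte(for_first_device); /* ends up in device 1 */
        GPIO_SET(PIN_CS);                    /* rising CS latches both shift registers */
    }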

Last but not least, both types may be combined. It is also possible to connect two microcontrollers via SPI. For such a network, two protocol variants are possible: in the first, there is only one master and several slaves; in the second, each microcontroller can take the role of the master. For the selection of slaves, two versions would again be possible, but only one variant is supported by hardware. The hardware-supported variant uses the chip selects, while in the other the selection of the slaves is done by means of an ID packed into the frames; the assignment of the IDs is done in software. Only the selected slave drives its output; all other slaves are in the high-impedance state. The output remains active as long as the slave is selected by its address.

The first variant, named the single-master protocol, resembles normal master-slave communication; the microcontroller configured as a slave behaves like a normal peripheral device. The second possibility works with several masters and is therefore named the multi-master protocol. Each microcontroller has the possibility to take the role of the master and to address another microcontroller. One controller must permanently provide a clock signal. The MC68HC11 provides hardware error recognition, useful in multiple-master systems. There are two SPI system errors: the first occurs if several SPI devices want to become master at the same time; the other is a collision error that occurs, for example, when SPI devices work with different polarities. More details can be found in the MC68HC11 manual.

The SPI requires two control lines (CS and SCLK) and two data lines (SDI and SDO). Motorola names these lines MOSI (Master-Out-Slave-In) and MISO (Master-In-Slave-Out). The chip select line is named SS (Slave-Select). With CS (Chip-Select) the corresponding peripheral device is selected; this pin is mostly active-low. In the unselected state the SDO lines are hi-Z and therefore inactive. The master decides with which peripheral device it wants to communicate. The clock line SCLK is brought to the device whether it is selected or not; the clock serves as synchronization of the data communication.

The majority of SPI devices provide these four lines. Sometimes SDI and SDO are multiplexed, as for example in the temperature sensor LM74 from National Semiconductor, or one of these lines is missing. A peripheral device that need not or cannot be configured requires no input line, only a data output: as soon as it is selected, it starts sending data. In some ADCs the SDI line is therefore missing (e.g. the MCP3001 from Microchip). There are also devices that have no data output, for example LCD controllers (e.g. the COP472-3 from National Semiconductor), which can be configured but cannot send data or status messages.


Because there is no official specification of what exactly SPI is and what it is not, it is necessary to consult the data sheets of the components one wants to use. The permitted clock frequencies and the types of valid clock transitions are particularly important, since there are no general rules for the transitions on which data should be latched. Although not specified by Motorola, in practice four modes are used. These four modes are the combinations of CPOL and CPHA; they are listed in Table 2.

SPI mode   CPOL   CPHA
    0        0      0
    1        0      1
    2        1      0
    3        1      1

Table 2 – SPI Modes

If the phase of the clock is zero (CPHA = 0), data is latched at the rising edge of the clock with CPOL = 0, and at the falling edge of the clock with CPOL = 1. If CPHA = 1, the edges are reversed: CPOL = 0 means the falling edge, CPOL = 1 the rising edge. The microcontrollers from Motorola allow the polarity and the phase of the clock to be adjusted. A positive polarity results in data being latched at the rising edge of the clock; however, data is already put on the data line at the falling edge in order to have time to stabilize. Most peripherals that can only be slaves work with this configuration. If it becomes necessary to use the other polarity, the transitions are reversed. The edge rules reduce to a one-line test, as sketched below.
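
A sketch of the rule (always verify the actual latch edges against each slave's data sheet):

    /* Which clock edge latches data for a given SPI mode, following the
     * CPOL/CPHA rules of Table 2. */
    #include <stdbool.h>

    /* returns true if data is latched on the rising edge */
    bool spi_sample_on_rising_edge(int cpol, int cpha)
    {
        /* CPHA = 0: sample on the first clock edge; CPHA = 1: on the second.
         * With CPOL = 0 the first edge is rising; with CPOL = 1 it is falling. */
        return (cpol == cpha);   /* modes 0 and 3 latch on the rising edge */
    }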

10.4 Local Interconnect Network (LIN)

Many mechanical components in the automotive sector have been replaced or are now being replaced by intelligent mechatronic systems, and a lot of wires are needed to connect these components. To reduce the amount of wiring and to handle communications between these systems, many car manufacturers created different bus systems that were incompatible with each other. In order to have a standard sub-bus, car manufacturers in Europe formed a consortium to define a new communications standard for the automotive sector. The new bus, called the LIN bus, was designed for simple switching applications like car seats, door locks, sun roofs, rain sensors, mirrors and so on. The LIN bus is a sub-bus system based on a serial communications protocol. The bus is a single-master/multiple-slave bus that uses a single wire to transmit data. To reduce costs, components can be driven without crystal or ceramic resonators; time synchronization permits the correct transmission and reception of data. The system is based on a UART/SCI hardware interface that is common to most microcontrollers. The bus detects defective nodes in the network. Data checksum and parity check guarantee safety and error detection.

Features and possibilities

The LIN is a serial communications protocol which efficiently supports the control of mechatronic nodes in distributed automotive applications. The main properties of the LIN bus are:
• Single master with multiple slaves concept
• Low cost silicon implementation based on common UART/SCI interface hardware, an equivalent in software, or a pure state machine
• Self-synchronization without a quartz or ceramic resonator in the slave nodes
• Deterministic signal transmission with signal propagation time computable in advance
• Low cost single-wire implementation
• Speed up to 20 kbit/s
• Signal-based application interaction
• Predictable behavior


• Reconfigurability
• Transport layer and diagnostic support

Concept of operation

A cluster consists of one master task and several slave tasks. A master node contains the master task as well as a slave task. All other slave nodes contain a slave task only. A node may participate in more than one cluster. The term node relates to a single bus interface of a node if the node has multiple bus interfaces. A sample cluster with one master node and two slave nodes is depicted below:

Figure 1.12 – LIN sample cluster

The master task decides when and which frame shall be transferred on the bus. The slave tasks provide the data transported by each frame. Both the master task and the slave task are parts of the Frame handler.

Frames

A frame consists of a header (provided by the master task) and a response (provided by a slave task). The header consists of a break field and a sync field, followed by a frame identifier. The frame identifier uniquely defines the purpose of the frame. The slave task appointed to provide the response associated with the frame identifier transmits it, as depicted below. The response consists of a data field and a checksum field. The slave tasks interested in the data associated with the frame identifier receive the response, verify the checksum and use the data transported.

This results in the following desired features:
- System flexibility: Nodes can be added to the LIN cluster without requiring hardware or software changes in other slave nodes.
- Message routing: The content of a message is defined by the frame identifier (similar to CAN).
- Multicast: Any number of nodes can simultaneously receive and act upon a single frame.

Schedule table

The master task (in the master node) transmits headers based on a schedule table. The schedule table specifies the frames and the interval between the start of a frame and the start of the following frame. The master application may use different schedule tables and select among them; one possible layout is sketched below.
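
A minimal schedule table for the master task might look as follows. This is only a sketch: lin_send_header() is a hypothetical driver call, the frame identifiers are invented, and a 1 ms periodic system tick is assumed.

    /* A minimal LIN master schedule table (hypothetical IDs and driver call). */
    #include <stdint.h>

    struct lin_slot {
        uint8_t  frame_id;   /* 0..59: signal-carrying frames */
        uint16_t slot_ms;    /* time from this header to the next header */
    };

    static const struct lin_slot schedule[] = {
        { 0x10, 10 },        /* e.g. door-switch status frame */
        { 0x21, 10 },        /* e.g. mirror position frame */
        { 0x30, 20 },        /* e.g. rain-sensor frame */
    };

    extern void lin_send_header(uint8_t frame_id);  /* hypothetical driver call */

    void lin_master_tick(uint16_t elapsed_ms)       /* called every system tick */
    {
        static unsigned slot = 0;
        static uint16_t remaining = 0;

        if (remaining > elapsed_ms) { remaining -= elapsed_ms; return; }
        lin_send_header(schedule[slot].frame_id);   /* slaves supply the response */
        remaining = schedule[slot].slot_ms;
        slot = (slot + 1) % (sizeof schedule / sizeof schedule[0]);
    }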


Signal Management

A signal is transported in the data field of a frame.

Signal Types

A signal is either a scalar value or a byte array. A scalar signal is between 1 and 16 bits long. A one-bit scalar signal is called a Boolean signal. Scalar signals of 2 to 16 bits are treated as unsigned integers. A byte array is an array of between one and eight bytes. Each signal has exactly one publisher, i.e. it is always written by the same node in the cluster. Zero, one or multiple nodes may subscribe to the signal. All signals have initial values. The initial value for a published signal is valid until the node writes a new value to this signal. The initial value for a subscribed signal is valid until a new updated value is received from another node.

Signal Consistency

Scalar signal writing or reading must be atomic operations, i.e. it should never be possible for an application to receive a signal value that is partly updated. This also applies to byte arrays. However, no consistency is guaranteed between any signals.

Signal Packing

A signal is transmitted with the LSB first and the MSB last. There is no restriction on packing scalar signals over byte boundaries. Each byte in a byte array shall map to a single frame byte, starting with the lowest numbered data byte. Several signals can be packed into one frame as long as they do not overlap each other. Note that signal packing/unpacking is implemented more efficiently in software-based nodes if signals are byte aligned and/or if they do not cross byte boundaries. The same signal can be packed into multiple frames as long as the publisher of the signal is the same. If a node is receiving one signal packed into multiple frames, the latest received signal value is valid. Handling the same signal packed into frames on different LIN clusters is out of the scope of the specification. A minimal packing routine is sketched below.
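
A sketch of LSB-first packing at an arbitrary bit offset; in practice this code is usually generated from the LIN description file rather than written by hand.

    /* Pack a scalar signal LSB first into a LIN data field, starting at an
     * arbitrary bit offset within the frame's data bytes. */
    #include <stdint.h>

    void lin_pack_signal(uint8_t data[8], unsigned bit_offset,
                         unsigned bit_len, uint16_t value)
    {
        for (unsigned i = 0; i < bit_len; i++) {
            unsigned pos  = bit_offset + i;        /* absolute bit position */
            uint8_t  mask = (uint8_t)(1u << (pos % 8));
            if (value & (1u << i))                 /* LSB of the signal goes first */
                data[pos / 8] |= mask;
            else
                data[pos / 8] &= (uint8_t)~mask;
        }
    }

For example, packing a 10-bit value at bit offset 4 spans data bytes 0 and 1, which is permitted for scalar signals.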

Signal Reception and Transmission

The point in time when a signal is transmitted/received needs to be defined to help design tools and testing tools analyze the timing of signals. This means that all implementations will behave in a predictable way. The definitions below do not contain factors such as bit rate tolerance, jitter, buffer copy execution time, etc. These factors must be taken into account to get a more detailed analysis; the intention of the definitions below is to provide a basis for such analysis. The timing is different for a master node and a slave node. The reason is that the master node controls the schedule and knows which frame will be sent, while a slave node gets this information only when the header is transmitted on the bus. A signal is considered received and available to the application as follows:
• Master node - at the next time base tick after the maximum frame length. The master node updates its received signals periodically at the time base start (i.e. at task level).
• Slave node - when the checksum for the received frame is validated. The slave node updates its received signals directly after the frame is finished (i.e. at interrupt level).


A signal is considered transmitted (the latest point in time when the application may write to the signal) as follows:
• Master node - before the frame transmission is initiated.
• Slave node - when the ID for the frame is received.

Frame Structure

The structure of a frame is shown in Figure 1.13. The frame is constructed of a number of fields: one break field followed by four to eleven byte fields, labeled as in the figure. The time it takes to send a frame is the sum of the time to send each byte plus the response space and the inter-byte spaces. The header starts at the falling edge of the break field and ends after the stop bit of the protected identifier (PID) field. The response starts at the end of the stop bit of the PID field and ends after the stop bit of the checksum field. The inter-byte space is the time between the end of the stop bit of the preceding field and the start bit of the following byte. The response space is the inter-byte space between the PID field and the first data field in the response. Both of them must be non-negative. (A nominal frame-time computation is sketched after the figure.)

Figure 1.13 – LIN Frame Structure
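
Under the common assumption of a 34-bit nominal header (13-bit break, break delimiter, sync byte and PID byte) and 10 bits per response byte (8 data bits plus start and stop bits), the nominal frame length can be estimated as below; check the exact figures against the LIN revision in use.

    /* Nominal LIN frame time in bit times (an estimate under the stated
     * assumptions; inter-byte spaces and tolerances come on top). */
    #include <stdint.h>

    uint32_t lin_nominal_frame_bits(uint32_t n_data_bytes)
    {
        uint32_t header_bits   = 34;                       /* break + delim + sync + PID */
        uint32_t response_bits = 10 * (n_data_bytes + 1);  /* data bytes + checksum */
        return header_bits + response_bits;
    }

    /* e.g. 4 data bytes at 19200 bit/s: (34 + 50) / 19200 s ≈ 4.4 ms nominal. */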


Each byte field, except the break field, is transmitted as the byte field shown in Figure 1.14. The LSB of the data is sent first and the MSB last. The start bit is encoded as a bit with value zero (dominant) and the stop bit is encoded as a bit with value one (recessive).

Figure 1.14 – Byte Field

Break field

The break field is used to signal the beginning of a new frame. It is the only field that does not comply with Figure 1.14. A break field is always generated by the master task (in the master node) and it shall be at least 13 nominal bit times of dominant value, followed by a break delimiter, as shown in Figure 1.15. The break delimiter shall be at least one nominal bit time long. A slave node shall use a break detection threshold of 11 dominant local slave bit times. Slave nodes with a bit rate tolerance better than FTOL_RES_SLAVE (typically a crystal or ceramic resonator) may use a break detection threshold of 9.5 dominant nominal bit times.

Figure 1.15 – Break Field

Sync byte field

Sync is a byte field with the data value 0x55, as shown in Figure 1.16.

Figure 1.16 – Sync Field

A slave task shall always be able to detect the break/sync field sequence, even if it expects a byte field (assuming the byte fields are separated from each other). A desired, but not required, feature is to detect the break/sync field sequence even if the break is partially superimposed with a data byte. When a break/sync field sequence happens, the transfer in progress shall be aborted and processing of the new frame shall commence.

Protected identifier field

A protected identifier field consists of two sub-fields: the frame identifier and the parity. Bits 0 to 5 are the frame identifier and bits 6 and 7 are the parity.

Frame identifier

Six bits are reserved for the frame identifier; values in the range 0 to 63 can be used. The frame identifiers are split into three categories:
• values 0 to 59 (0x3B) are used for signal-carrying frames,
• 60 (0x3C) and 61 (0x3D) are used to carry diagnostic and configuration data,
• 62 (0x3E) and 63 (0x3F) are reserved for future protocol enhancements.

Parity

The parity is calculated on the frame identifier bits as shown in equations (1) and (2), where + denotes exclusive-or (a C routine implementing them follows Figure 1.17):

P0 = ID0 + ID1 + ID2 + ID4 (1)


P1 = ¬(ID1 + ID3 + ID4 + ID5) (2)

Mapping

The mapping of the bits (ID0 to ID5, P0 and P1) is shown in Figure 1.17.

Figure 1.17 – Protected Identifier Field
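
Equations (1) and (2), with + read as exclusive-or, translate directly into C:

    /* Compute the LIN protected identifier from a 6-bit frame identifier. */
    #include <stdint.h>

    uint8_t lin_protected_id(uint8_t id)     /* id in range 0..63 */
    {
        uint8_t b  = id & 0x3F;
        uint8_t p0 =  ((b >> 0) ^ (b >> 1) ^ (b >> 2) ^ (b >> 4)) & 1;  /* eq. (1) */
        uint8_t p1 = ~((b >> 1) ^ (b >> 3) ^ (b >> 4) ^ (b >> 5)) & 1;  /* eq. (2) */
        return (uint8_t)(b | (p0 << 6) | (p1 << 7));                    /* Figure 1.17 */
    }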

Data

A frame carries between one and eight bytes of data. The number of data bytes contained in a frame with a specific frame identifier shall be agreed upon by the publisher and all subscribers. A data byte is transmitted as part of a byte field, see Figure 1.14. For data entities longer than one byte, the entity LSB is contained in the byte sent first and the entity MSB in the byte sent last (little-endian). The data fields are labeled data 1, data 2, ... up to a maximum of data 8, see Figure 1.18.

Figure 1.18 – Data Field

Checksum

The last field of a frame is the checksum. The checksum contains the inverted eight-bit sum with carry over all data bytes, or over all data bytes and the protected identifier. Checksum calculation over the data bytes only is called the classic checksum; it is used for the master request frame, the slave response frame and communication with LIN 1.x slaves. An eight-bit sum with carry is equivalent to summing all values and subtracting 255 every time the sum becomes greater than or equal to 256. See section 2.8.3 of the LIN specification for examples of how to calculate the checksum. Checksum calculation over the data bytes and the protected identifier byte is called the enhanced checksum; it is used for communication with LIN 2.x slaves. The checksum is transmitted in a byte field. The use of the classic or enhanced checksum is managed by the master node and is determined per frame identifier: classic in communication with LIN 1.x slave nodes and enhanced in communication with LIN 2.x slave nodes. A minimal checksum routine is sketched below.
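
Both checksum variants reduce to the same routine; passing 0 for the protected identifier yields the classic checksum. A minimal sketch:

    /* LIN checksum: inverted eight-bit sum with carry. Pass pid = 0 for the
     * classic checksum, or the protected identifier for the enhanced one. */
    #include <stdint.h>

    uint8_t lin_checksum(uint8_t pid, const uint8_t *data, unsigned len)
    {
        uint16_t sum = pid;
        for (unsigned i = 0; i < len; i++) {
            sum += data[i];
            if (sum >= 256)             /* sum with carry: fold the carry back in */
                sum -= 255;
        }
        return (uint8_t)(~sum & 0xFF);  /* transmit the inverted sum */
    }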

10.5 Controller Area Network

The Controller Area Network (CAN) is a serial communications protocol which efficiently supports distributed real-time control with a very high level of security. Its domain of application ranges from high-speed networks to low-cost multiplex wiring. In automotive electronics, engine control units, sensors, anti-skid systems, etc. are connected using CAN with bit rates up to 1 Mbit/s. At the same time it is cost-effective to build CAN into vehicle body electronics, e.g. lamp clusters, electric windows, etc., to replace the wiring harness otherwise required. The intention of this specification is to achieve compatibility between any two CAN implementations. Compatibility, however, has different aspects regarding e.g. electrical features and the interpretation of data to be transferred. To achieve design transparency and implementation flexibility, CAN has been subdivided into different layers according to the ISO/OSI Reference Model:
• The Data Link Layer
  - the Logical Link Control (LLC) sub-layer
  - the Medium Access Control (MAC) sub-layer


• The Physical Layer
Note that in previous versions of the CAN specification the services and functions of the LLC and MAC sub-layers of the Data Link Layer were described in layers denoted as the 'object layer' and the 'transfer layer'. The scope of the LLC sub-layer is:
• to provide services for data transfer and for remote data request,
• to decide which messages received by the LLC sub-layer are actually to be accepted,
• to provide means for recovery management and overload notifications.
There is much freedom in defining object handling. The scope of the MAC sub-layer is mainly the transfer protocol, i.e. controlling the framing, performing arbitration, error checking, error signaling and fault confinement. Within the MAC sub-layer it is decided whether the bus is free for starting a new transmission or whether a reception is just starting. Some general features of the bit timing are also regarded as part of the MAC sub-layer. It is in the nature of the MAC sub-layer that there is no freedom for modifications. The scope of the physical layer is the actual transfer of the bits between the different nodes with respect to all electrical properties. Within one network the physical layer, of course, has to be the same for all nodes; there may be, however, much freedom in selecting a physical layer. The scope of this specification is to define the MAC sub-layer and a small part of the LLC sub-layer of the Data Link Layer and to describe the consequences of the CAN protocol on the surrounding layers.

Basic Concepts

CAN has the following properties:
• Prioritization of messages
• Guarantee of latency times
• Configuration flexibility
• Multicast reception with time synchronization
• System-wide data consistency
• Multi-master capability
• Error detection and signaling
• Automatic retransmission of corrupted messages as soon as the bus is idle again
• Distinction between temporary errors and permanent failures of nodes, and autonomous switching off of defective nodes

Layered Architecture of CAN according to the OSI Reference Model

• The Physical Layer defines how signals are actually transmitted and therefore deals with the description of Bit Timing, Bit Encoding, and Synchronization. Within this specification the Driver/Receiver Characteristics of the Physical Layer are not defined, so as to allow transmission medium and signal level implementations to be optimized for their application.
• The MAC sub-layer represents the kernel of the CAN protocol. It presents messages received from the LLC sub-layer and accepts messages to be transmitted to the LLC sub-layer. The MAC sub-layer is responsible for Message Framing, Arbitration, Acknowledgment, Error Detection and Signaling. The MAC sub-layer is supervised by a management entity called Fault Confinement, which is a self-checking mechanism for distinguishing short disturbances from permanent failures.
• The LLC sub-layer is concerned with Message Filtering, Overload Notification and Recovery Management.
The scope of this specification is to define the Data Link Layer and the consequences of the CAN protocol on the surrounding layers.

Messages


Information on the bus is sent in fixed-format messages of different but limited length. When the bus is free, any connected unit may start to transmit a new message.

Information Routing
In CAN systems a CAN node does not make use of any information about the system configuration (e.g. station addresses). This has several important consequences:
- System Flexibility: Nodes can be added to the CAN network without requiring any change in the software or hardware of any node and application layer.
- Message Routing: The content of a message is named by an IDENTIFIER. The IDENTIFIER does not indicate the destination of the message, but describes the meaning of the data, so that all nodes in the network are able to decide by Message Filtering whether the data is to be acted upon by them or not.
- Multicast: As a consequence of the concept of Message Filtering, any number of nodes can receive and simultaneously act upon the same message.
- Data Consistency: Within a CAN network it is guaranteed that a message is simultaneously accepted either by all nodes or by no node. Thus data consistency of a system is achieved by the concepts of multicast and by error handling.

Bit rate
The speed of CAN may be different in different systems. However, in a given system the bit rate is uniform and fixed.

Priorities
The IDENTIFIER defines a static message priority during bus access.

Remote Data Request
By sending a REMOTE FRAME, a node requiring data may request another node to send the corresponding DATA FRAME. The DATA FRAME and the corresponding REMOTE FRAME are named by the same IDENTIFIER.

Multi-master
When the bus is free, any unit may start to transmit a message. The unit with the message of higher priority to be transmitted gains bus access.

Arbitration
Whenever the bus is free, any unit may start to transmit a message. If two or more units start transmitting messages at the same time, the bus access conflict is resolved by bitwise arbitration using the IDENTIFIER. The mechanism of arbitration guarantees that neither information nor time is lost. If a DATA FRAME and a REMOTE FRAME with the same IDENTIFIER are initiated at the same time, the DATA FRAME prevails over the REMOTE FRAME. During arbitration every transmitter compares the level of the bit transmitted with the level that is monitored on the bus. If these levels are equal, the unit may continue to send. When a 'recessive' level is sent and a 'dominant' level is monitored (see Bus Values), the unit has lost arbitration and must withdraw without sending one more bit. For example, if one node transmits the 11-bit identifier 0x123 and another transmits 0x0F0, the second node wins at the third identifier bit, where its dominant bit overwrites the first node's recessive bit; the lower identifier therefore carries the higher priority.

Safety
In order to achieve the utmost safety of data transfer, powerful measures for error detection, signaling and self-checking are implemented in every CAN node.

Error Detection
For detecting errors the following measures have been taken (a bit-stuffing sketch follows the list):
- Monitoring (transmitters compare the bit levels to be transmitted with the bit levels detected on the bus)
- Cyclic Redundancy Check
- Bit Stuffing
- Message Frame Check
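
Bit stuffing is simple enough to sketch: after five consecutive bits of identical value, a CAN transmitter inserts one complementary bit. The routine below does this over an array of bits (0 = dominant, 1 = recessive); it is only an illustration, not driver code.

    /* CAN bit stuffing: insert a complementary bit after every run of
     * five identical bits. Returns the stuffed length. */
    #include <stddef.h>

    size_t can_bit_stuff(const unsigned char *in, size_t n, unsigned char *out)
    {
        size_t        o    = 0;
        int           run  = 0;
        unsigned char last = 2;              /* impossible value: no run yet */

        for (size_t i = 0; i < n; i++) {
            out[o++] = in[i];
            run  = (in[i] == last) ? run + 1 : 1;
            last = in[i];
            if (run == 5) {                  /* five equal bits: insert stuff bit */
                out[o++] = (unsigned char)!last;
                last = (unsigned char)!last; /* stuff bits count toward later runs */
                run  = 1;
            }
        }
        return o;
    }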


Performance of Error Detection
The error detection mechanisms have the following properties:
- All global errors are detected.
- All local errors at transmitters are detected.
- Up to 5 randomly distributed errors in a message are detected.
- Burst errors of length less than 15 in a message are detected.
- Errors of any odd number in a message are detected.
The total residual error probability for undetected corrupted messages is less than the message error rate × 4.7 × 10^-11.

Error Signaling and Recovery Time
Corrupted messages are flagged by any node detecting an error. Such messages are aborted and will be retransmitted automatically. The recovery time from detecting an error until the start of the next message is at most 31 bit times, if there is no further error.

Fault Confinement
CAN nodes are able to distinguish short disturbances from permanent failures. Defective nodes are switched off.

Connections
The CAN serial communication link is a bus to which a number of units may be connected. This number has no theoretical limit; practically, the total number of units will be limited by delay times and/or electrical loads on the bus line.

Single Channel
The bus consists of a single channel that carries bits, from which resynchronization information can be derived. The way in which this channel is implemented is not fixed in this specification, e.g. single wire (plus ground), two differential wires, optical fibers, etc.

Bus values
The bus can have one of two complementary logical values: 'dominant' or 'recessive'. During simultaneous transmission of 'dominant' and 'recessive' bits, the resulting bus value will be 'dominant'. For example, in case of a wired-AND implementation of the bus, the 'dominant' level would be represented by a logical '0' and the 'recessive' level by a logical '1'. Physical states (e.g. electrical voltage, light) that represent the logical levels are not given in this specification.

Acknowledgment
All receivers check the consistency of the message being received and will acknowledge a consistent message and flag an inconsistent message.

Sleep Mode / Wake-up
To reduce the system's power consumption, a CAN device may be set into sleep mode without any internal activity and with disconnected bus drivers. The sleep mode is finished with a wake-up by any bus activity or by internal conditions of the system. On wake-up, the internal activity is restarted, although the MAC sub-layer will wait for the system's oscillator to stabilize and then wait until it has synchronized itself to the bus activity (by checking for eleven consecutive 'recessive' bits) before the bus drivers are set to "on-bus" again.

Message Transfer

Frame Formats

There are two different formats, which differ in the length of the IDENTIFIER field: frames with an 11-bit IDENTIFIER are denoted Standard Frames, while frames containing a 29-bit IDENTIFIER are denoted Extended Frames.

Frame Types

Message transfer is manifested and controlled by four different frame types:
- A DATA FRAME carries data from a transmitter to the receivers.


- A REMOTE FRAME is transmitted by a bus unit to request the transmission of the DATA FRAME with the same IDENTIFIER.
- An ERROR FRAME is transmitted by any unit on detecting a bus error.
- An OVERLOAD FRAME is used to provide for an extra delay between the preceding and the succeeding DATA or REMOTE FRAMEs.
DATA FRAMEs and REMOTE FRAMEs can be used both in Standard Frame Format and Extended Frame Format; they are separated from preceding frames by an INTERFRAME SPACE.

DATA FRAME

A DATA FRAME is composed of seven different bit fields: START OF FRAME, ARBITRATION FIELD, CONTROL FIELD, DATA FIELD, CRC FIELD, ACK FIELD, and END OF FRAME. The DATA FIELD can be of length zero.

START OF FRAME (Standard Format as well as Extended Format)
The START OF FRAME (SOF) marks the beginning of DATA FRAMEs and REMOTE FRAMEs. It consists of a single 'dominant' bit. A station is only allowed to start transmission when the bus is idle (see 'INTERFRAME SPACING'). All stations have to synchronize to the leading edge caused by the START OF FRAME (see 'HARD SYNCHRONIZATION') of the station starting transmission first.

ARBITRATION FIELD
The format of the ARBITRATION FIELD is different for Standard Format and Extended Format frames:
- In Standard Format the ARBITRATION FIELD consists of the 11-bit IDENTIFIER and the RTR bit. The IDENTIFIER bits are denoted ID-28 ... ID-18.
- In Extended Format the ARBITRATION FIELD consists of the 29-bit IDENTIFIER, the SRR bit, the IDE bit, and the RTR bit. The IDENTIFIER bits are denoted ID-28 ... ID-0.
In order to distinguish between Standard Format and Extended Format, the reserved bit r1 of previous CAN specifications (versions 1.0–1.2) is now denoted as the IDE bit.


IDENTIFIER

IDENTIFIER - Standard Format
The IDENTIFIER's length is 11 bits and corresponds to the Base ID in Extended Format. These bits are transmitted in the order from ID-28 to ID-18. The least significant bit is ID-18. The 7 most significant bits (ID-28 - ID-22) must not be all 'recessive'.

IDENTIFIER - Extended Format
In contrast to the Standard Format, the Extended Format IDENTIFIER consists of 29 bits. The format comprises two sections: the Base ID with 11 bits and the Extended ID with 18 bits.

Base ID
The Base ID consists of 11 bits. It is transmitted in the order from ID-28 to ID-18 and is equivalent to the format of the Standard Identifier. The Base ID defines the Extended Frame's base priority.

Extended ID
The Extended ID consists of 18 bits. It is transmitted in the order from ID-17 to ID-0.

In a Standard Frame the IDENTIFIER is followed by the RTR bit.

RTR BIT (Standard Format as well as Extended Format)
Remote Transmission Request bit. In DATA FRAMEs the RTR bit has to be 'dominant'; within a REMOTE FRAME the RTR bit has to be 'recessive'. In an Extended Frame the Base ID is transmitted first, followed by the IDE bit and the SRR bit; the Extended ID is transmitted after the SRR bit.

SRR BIT (Extended Format)
Substitute Remote Request bit. The SRR is a recessive bit. It is transmitted in Extended Frames at the position of the RTR bit in Standard Frames and so substitutes for the RTR bit in the Standard Frame. Therefore, collisions of a Standard Frame and an Extended Frame, the Base ID of which is the same as the Standard Frame's Identifier, are resolved in such a way that the Standard Frame prevails over the Extended Frame.


IDE BIT (Extended Format)
Identifier Extension bit. The IDE bit belongs to:
- the ARBITRATION FIELD for the Extended Format,
- the CONTROL FIELD for the Standard Format.
The IDE bit in the Standard Format is transmitted 'dominant', whereas in the Extended Format the IDE bit is recessive.

CONTROL FIELD (Standard Format as well as Extended Format)
The CONTROL FIELD consists of six bits. The format of the CONTROL FIELD is different for Standard Format and Extended Format. Frames in Standard Format include the DATA LENGTH CODE, the IDE bit, which is transmitted 'dominant' (see above), and the reserved bit r0. Frames in the Extended Format include the DATA LENGTH CODE and two reserved bits r1 and r0. The reserved bits have to be sent 'dominant', but receivers accept 'dominant' and 'recessive' bits in all combinations.

DATA LENGTH CODE (Standard Format as well as Extended Format)
The number of bytes in the DATA FIELD is indicated by the DATA LENGTH CODE. This DATA LENGTH CODE is 4 bits wide and is transmitted within the CONTROL FIELD. The number of data bytes is coded in binary in the DATA LENGTH CODE bits, using the abbreviations d ('dominant', logical 0) and r ('recessive', logical 1); for example, 8 data bytes are coded as r d d d.

DATA FRAME: admissible numbers of data bytes: {0, 1, ..., 7, 8}. Other values may not be used.

DATA FIELD (Standard Format as well as Extended Format)


The DATA FIELD consists of the data to be transferred within a DATA FRAME. It can contain from 0 to 8 bytes, each of which contains 8 bits, transferred MSB first.

CRC FIELD (Standard Format as well as Extended Format)
The CRC FIELD contains the CRC SEQUENCE followed by a CRC DELIMITER.

CRC SEQUENCE (Standard Format as well as Extended Format)
The frame check sequence is derived from a cyclic redundancy code best suited for frames with bit counts less than 127 bits (a BCH code). In order to carry out the CRC calculation, the polynomial to be divided is defined as the polynomial whose coefficients are given by the destuffed bit stream consisting of START OF FRAME, ARBITRATION FIELD, CONTROL FIELD, DATA FIELD (if present) and, for the 15 lowest coefficients, by 0. This polynomial is divided (the coefficients are calculated modulo 2) by the generator polynomial:

    x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1

The remainder of this polynomial division is the CRC SEQUENCE transmitted over the bus. In order to implement this function, a 15-bit shift register CRC_RG(14:0) can be used. If NXTBIT denotes the next bit of the bit stream, given by the destuffed bit sequence from START OF FRAME until the end of the DATA FIELD, the CRC SEQUENCE is calculated as follows:

    CRC_RG = 0;                                  // initialize shift register
    REPEAT
        CRCNXT = NXTBIT EXOR CRC_RG(14);
        CRC_RG(14:1) = CRC_RG(13:0);             // shift left by 1 position
        CRC_RG(0) = 0;
        IF CRCNXT THEN
            CRC_RG(14:0) = CRC_RG(14:0) EXOR (4599hex);
        ENDIF
    UNTIL (CRC SEQUENCE starts or there is an ERROR condition)

After the transmission/reception of the last bit of the DATA FIELD, CRC_RG contains the CRC sequence. A C version of this routine is sketched at the end of this subsection.

CRC DELIMITER (Standard Format as well as Extended Format)
The CRC SEQUENCE is followed by the CRC DELIMITER, which consists of a single 'recessive' bit.

ACK FIELD (Standard Format as well as Extended Format)
The ACK FIELD is two bits long and contains the ACK SLOT and the ACK DELIMITER. In the ACK FIELD the transmitting station sends two 'recessive' bits. A RECEIVER which has received a valid message correctly reports this to the TRANSMITTER by sending a 'dominant' bit during the ACK SLOT (it sends 'ACK').
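
The CRC SEQUENCE pseudocode above, written out in C (a sketch; the function processes one destuffed bit at a time, exactly like the shift register):

    /* CAN CRC-15 update, one destuffed bit at a time. The constant 0x4599
     * carries the low 15 coefficients of the generator polynomial
     * x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1. */
    #include <stdint.h>

    uint16_t can_crc15_update(uint16_t crc, int next_bit)
    {
        int crcnxt = (next_bit ^ (crc >> 14)) & 1;  /* NXTBIT EXOR CRC_RG(14) */
        crc = (uint16_t)((crc << 1) & 0x7FFF);      /* 15-bit left shift, 0 in */
        if (crcnxt)
            crc ^= 0x4599;
        return crc;
    }

    /* Start with crc = 0 and feed every destuffed bit from START OF FRAME
     * through the end of the DATA FIELD; the result is the CRC SEQUENCE. */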


ACK SLOT
All stations having received the matching CRC SEQUENCE report this within the ACK SLOT by overwriting the 'recessive' bit of the TRANSMITTER with a 'dominant' bit.

ACK DELIMITER
The ACK DELIMITER is the second bit of the ACK FIELD and has to be a 'recessive' bit. As a consequence, the ACK SLOT is surrounded by two 'recessive' bits (CRC DELIMITER, ACK DELIMITER).

END OF FRAME (Standard Format as well as Extended Format)
Each DATA FRAME and REMOTE FRAME is delimited by a flag sequence consisting of seven 'recessive' bits.

11. IDE – Integrated Development Environment

An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. An IDE normally consists of a source code editor, a compiler and/or interpreter, build automation tools, and (usually) a debugger. Sometimes a version control system is integrated, as are various tools that simplify the construction of a GUI. Many modern IDEs also have a class browser, an object inspector, and a class hierarchy diagram, for use with object-oriented software development. Typically an IDE is dedicated to a specific programming language, so as to provide a feature set which most closely matches the programming paradigms of the language. However, some multiple-language IDEs are in use, such as Eclipse, recent versions of NetBeans, and Microsoft Visual Studio. IDEs present a single program in which all development is done. This program typically provides many features for authoring, modifying, compiling, deploying and debugging software. The aim is to abstract away the configuration necessary to piece together command line utilities into a cohesive unit, which theoretically reduces the time needed to learn a language and increases developer productivity. It is also thought that the tight integration of development tasks can further increase productivity; for example, code can be compiled while being written, providing instant feedback on syntax errors. While most modern IDEs are graphical, IDEs in use before the advent of windowing systems were text-based, using function keys or hotkeys to perform various tasks.

11.1 Source Code Editor

A source code editor is a text editor program designed specifically for editing source code of computer programs by programmers. It may be a standalone application or it may be built into an integrated development environment (IDE). Source code editors have features specifically designed to simplify and speed up input of source code, such as syntax highlighting, autocomplete and bracket matching functionality. These


editors also provide a convenient way to run a compiler, interpreter, debugger, or other program relevant to the software development process. So, while many text editors can be used to edit source code, if they don't enhance, automate or ease the editing of code, they are not "source code editors," but simply "text editors that can also be used to edit source code." Examples of source code editors: Notepad++ (Windows), PSPad (Windows), editix XML Editor (Windows, Linux, Mac OS X), Crimson Editor (Windows), EmEditor (Windows), UltraEdit (Windows), UNA (Windows, Linux, Mac OS X).

11.2 Compiler

A compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another language (the target language). The original sequence is usually called the source code and the output the object code. Commonly the output has a form suitable for processing by other programs (e.g., a linker), but it may be a human-readable text file. The most common reason for wanting to translate source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language or machine language). A program that translates from a low-level language to a higher-level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source-to-source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language. A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization.

11.2.1 Front end

The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which includes some of the following: - Line reconstruction. Languages which strop their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. - Lexical analysis breaks the source code text into small pieces called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier or symbol name. The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a lexical analyzer or scanner. - Preprocessing. Some languages, e.g., C, require a preprocessing phase which supports macro substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as Scheme support macro substitutions based on syntactic forms. - Syntax analysis involves parsing the token sequence to identify the syntactic structure of the program. This phase typically builds a parse tree, which replaces the linear sequence of tokens with


a tree structure built according to the rules of a formal grammar which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
- Semantic analysis is the phase in which the compiler adds semantic information to the parse tree and builds the symbol table. This phase performs semantic checks such as type checking (checking for type errors), object binding (associating variable and function references with their definitions), or definite assignment (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the parsing phase and logically precedes the code generation phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.

11.2.2 Back end

The term back end is sometimes confused with code generator because of the overlapping functionality of generating assembly code. Some literature uses middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators. The main phases of the back end include the following:
- Analysis: the gathering of program information from the intermediate representation derived from the input. Typical analyses are data flow analysis to build use-define chains, dependence analysis, alias analysis, pointer analysis, escape analysis, etc. Accurate analysis is the basis for any compiler optimization. The call graph and control flow graph are usually also built during the analysis phase.
- Optimization: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are inline expansion, dead code elimination, constant propagation, loop transformation, register allocation or even automatic parallelization.
- Code generation: the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory, and the selection and scheduling of appropriate machine instructions along with their associated addressing modes.
Examples of compilers: MULTI, Local C Compiler, LabWindows CVI, GCC, Sun Studio.

11.3 Linker

A linker or link editor is a program that takes one or more objects generated by compilers and assembles them into a single executable program. Linkers can take objects from a collection called a library. Some linkers do not include the whole library in the output; they only include the symbols that are referenced from other object files or libraries. Libraries exist for diverse purposes, and one or more system libraries are usually linked in by default. The linker also takes care of arranging the objects in a program's address space. This may involve relocating code that assumes a specific base address to another base. Since a compiler seldom knows where an object will reside, it often assumes a fixed base location (for example, zero). Relocating machine code may involve re-targeting of absolute jumps, loads and stores.


11.4 Debugger

A debugger is a program that is used to test and debug other programs. The code to be examined might alternatively be running on an instruction set simulator (ISS), a technique that allows great power in its ability to halt when specific conditions are encountered but which will typically be much slower than executing the code directly on the appropriate processor. When the program crashes, the debugger shows the position in the original code if it is a source-level debugger or symbolic debugger, commonly seen in integrated development environments. If it is a low-level debugger or a machine-language debugger it shows the line in the disassembly. (A "crash" happens when the program cannot continue because of a programming bug. For example, perhaps the program tried to use an instruction not available on the current version of the CPU or attempted access to unavailable or protected memory.) Typically, debuggers also offer more sophisticated functions such as running a program step by step (single-stepping), stopping (breaking) (pausing the program to examine the current state) at some kind of event by means of breakpoint, and tracking the values of some variables. Some debuggers have the ability to modify the state of the program while it is running, rather than merely to observe it. The importance of a good debugger cannot be overstated. Indeed, the existence and quality of such a tool for a given language and platform can often be the deciding factor in its use, even if another language/platform is better-suited to the task. However, it is also important to note that software can (and often does) behave differently running under a debugger than normally, due to the inevitable changes the presence of a debugger will make to a software program's internal timing. As a result, even with a good debugging tool, it is often very difficult to track down runtime problems in complex multi-threaded or distributed systems. Examples of debuggers: CodeView, DBG - A PHP Debugger and Profiler, DDD - Data Display Debugger, Eclipse, TotalView, GNU Debugger (GDB), Insight, Interactive Disassembler.

12. Real-Time Operating Systems

12.1 Introduction

A real-time operating system (RTOS) is the key to many embedded systems today and provides a software platform upon which to build applications. Not all embedded systems are designed with an RTOS. Some embedded systems with relatively simple hardware or a small amount of software application code might not require an RTOS. Many embedded systems with moderate-to-large software applications, however, require some form of scheduling, and these systems require an RTOS.

12.2 Defining an RTOS

A real-time operating system (RTOS) is a program that schedules execution in a timely manner, manages system resources, and provides a consistent foundation for developing application code. Application code designed on an RTOS can be quite diverse, ranging from a simple application for a digital stopwatch to a much more complex application for aircraft navigation. Good RTOSes, therefore, are scalable in order to meet different sets of requirements for different applications.


For example, in some applications, an RTOS comprises only a kernel, which is the core supervisory software that provides minimal logic, scheduling, and resource-management algorithms. Every RTOS has a kernel. On the other hand, an RTOS can be a combination of various modules, including the kernel, a file system, networking protocol stacks, and other components required for a particular application, as illustrated at a high level in Figure 2.1.

Figure 2.1: High-level view of an RTOS, its kernel, and other components found in embedded systems.

Most RTOS kernels contain the following components:
• Scheduler - is contained within each kernel and follows a set of algorithms that determine which task executes when. Some common examples of scheduling algorithms include round-robin and preemptive scheduling.
• Objects - are special kernel constructs that help developers create applications for real-time embedded systems. Common kernel objects include tasks, semaphores, and message queues.
• Services - are operations that the kernel performs on an object or, more generally, operations such as timing, interrupt handling, and resource management.
Figure 2.2 illustrates these components, each of which is described next.


Figure 2.2: Common components in an RTOS kernel, including objects, the scheduler, and some services. This diagram is highly simplified; remember that not all RTOS kernels conform to this exact set of objects, scheduling algorithms, and services.

12.3 The Scheduler

The scheduler is at the heart of every kernel. A scheduler provides the algorithms needed to determine which task executes when. To understand how scheduling works, this section describes the following topics:
• schedulable entities,
• multitasking,
• context switching,
• the dispatcher, and
• scheduling algorithms.

12.3.1 Schedulable Entities

A schedulable entity is a kernel object that can compete for execution time on a system, based on a predefined scheduling algorithm. Tasks and processes are all examples of schedulable entities found in most kernels. A task is an independent thread of execution that contains a sequence of independently schedulable instructions. Some kernels provide another type of a schedulable object called a process. Processes are similar to tasks in that they can independently compete for CPU execution time. Processes differ from tasks in that they provide better memory protection features, at the expense of performance and memory overhead. Note that message queues and semaphores are not schedulable entities. These items are inter-task communication objects used for synchronization and communication. So, how exactly does a scheduler handle multiple schedulable entities that need to run simultaneously? The answer is by multitasking. The multitasking discussions are carried out in the context of uniprocessor environments.


12.3.2 Multitasking

Multitasking is the ability of the operating system to handle multiple activities within set deadlines. A real-time kernel might have multiple tasks that it has to schedule to run. One such multitasking scenario is illustrated in Figure 2.3.

Figure 2.3: Multitasking using a context switch.

In this scenario, the kernel multitasks in such a way that many threads of execution appear to be running concurrently; however, the kernel is actually interleaving executions sequentially, based on a preset scheduling algorithm. The scheduler must ensure that the appropriate task runs at the right time. An important point to note here is that the tasks follow the kernel's scheduling algorithm, while interrupt service routines (ISRs) are triggered to run because of hardware interrupts and their established priorities. As the number of tasks to schedule increases, so do CPU performance requirements. This fact is due to increased switching between the contexts of the different threads of execution.

12.3.3 The Context Switch

Each task has its own context, which is the state of the CPU registers required each time it is scheduled to run. A context switch occurs when the scheduler switches from one task to another. To better understand what happens during a context switch, let's examine further what a typical kernel does in this scenario. Every time a new task is created, the kernel also creates and maintains an associated task control block (TCB). TCBs are system data structures that the kernel uses to maintain task-specific information. TCBs contain everything a kernel needs to know about a particular task. When a task is running, its context is highly dynamic. This dynamic context is maintained in the TCB. When the task is not running, its context is frozen within the TCB, to be restored the next time the task runs; a minimal TCB layout is sketched below.
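
A minimal TCB might look as follows. The exact fields are kernel-specific, so this is only a sketch, assuming a kernel that saves the CPU registers onto the task's stack during a context switch:

    /* A minimal task control block (sketch; fields vary per kernel). */
    #include <stdint.h>

    typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED } task_state_t;

    typedef struct tcb {
        uint32_t     *stack_ptr;   /* saved stack pointer; registers live on the stack */
        task_state_t  state;       /* ready, running, or blocked */
        uint8_t       priority;    /* 0 = highest in this sketch */
        struct tcb   *next;        /* link in the kernel's ready list */
        const char   *name;        /* for debugging */
    } tcb_t;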


A typical context switch scenario is illustrated in Figure 2.3: when the kernel's scheduler determines that it needs to stop running task 1 and start running task 2, it takes the following steps:
1. The kernel saves task 1's context information in its TCB.
2. It loads task 2's context information from its TCB, which becomes the current thread of execution.
3. The context of task 1 is frozen while task 2 executes; if the scheduler needs to run task 1 again, task 1 continues from where it left off just before the context switch.
The time it takes for the scheduler to switch from one task to another is the context switch time. It is relatively insignificant compared to most operations that a task performs. If an application's design includes frequent context switching, however, the application can incur unnecessary performance overhead. Therefore, design applications in a way that does not involve excess context switching. Every time an application makes a system call, the scheduler has an opportunity to determine if it needs to switch contexts. When the scheduler determines a context switch is necessary, it relies on an associated module, called the dispatcher, to make that switch happen.

12.3.4 The Dispatcher

The dispatcher is the part of the scheduler that performs context switching and changes the flow of execution. At any time an RTOS is running, the flow of execution, also known as the flow of control, is passing through one of three areas: through an application task, through an ISR, or through the kernel. When a task or ISR makes a system call, the flow of control passes to the kernel to execute one of the system routines provided by the kernel. When it is time to leave the kernel, the dispatcher is responsible for passing control to one of the tasks in the user's application. It will not necessarily be the same task that made the system call. It is the scheduling algorithms of the scheduler that determine which task executes next; it is the dispatcher that does the actual work of context switching and passing execution control. Depending on how the kernel is first entered, dispatching can happen differently. When a task makes system calls, the dispatcher is used to exit the kernel after every system call completes. In this case, the dispatcher is used on a call-by-call basis so that it can coordinate task-state transitions that any of the system calls might have caused. (One or more tasks may have become ready to run, for example.) On the other hand, if an ISR makes system calls, the dispatcher is bypassed until the ISR fully completes its execution. This process is true even if some resources have been freed that would normally trigger a context switch between tasks. These context switches do not take place because the ISR must complete without being interrupted by tasks. After the ISR completes execution, the kernel exits through the dispatcher so that it can then dispatch the correct task.

12.3.5 Scheduling Algorithms

As mentioned earlier, the scheduler determines which task runs by following a scheduling algorithm (also known as a scheduling policy). Most kernels today support two common scheduling algorithms:
• preemptive priority-based scheduling, and
• round-robin scheduling.
The RTOS manufacturer typically predefines these algorithms; however, in some cases, developers can create and define their own scheduling algorithms. Each algorithm is described next.

Preemptive Priority-Based Scheduling

Of the two scheduling algorithms introduced here, most real-time kernels use preemptive priority-based scheduling by default. As shown in Figure 2.4, with this type of scheduling the task


that gets to run at any point is the task with the highest priority among all other tasks ready to run in the system.

Figure 2.4: Preemptive priority-based scheduling.

Real-time kernels generally support 256 priority levels, in which 0 is the highest and 255 the lowest. Some kernels assign the priorities in reverse order, where 255 is the highest and 0 the lowest. Regardless, the concepts are basically the same. With a preemptive priority-based scheduler, each task has a priority, and the highest-priority task runs first. If a task with a priority higher than the current task becomes ready to run, the kernel immediately saves the current task's context in its TCB and switches to the higher-priority task. As shown in Figure 2.4, task 1 is preempted by higher-priority task 2, which is then preempted by task 3. When task 3 completes, task 2 resumes; likewise, when task 2 completes, task 1 resumes. Although tasks are assigned a priority when they are created, a task's priority can be changed dynamically using kernel-provided calls. The ability to change task priorities dynamically allows an embedded application the flexibility to adjust to external events as they occur, creating a true real-time, responsive system. Note, however, that misuse of this capability can lead to priority inversions, deadlock, and eventual system failure. A sketch of highest-priority task selection follows.
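
Selecting the next task to run can be sketched with the tcb_t from the previous example. Real kernels use priority bitmaps or per-priority ready queues for constant-time selection rather than the linear scan shown here:

    /* Pick the highest-priority ready task (lower number = higher priority
     * in this sketch). */
    tcb_t *sched_pick_next(tcb_t *ready_list)
    {
        tcb_t *best = 0;
        for (tcb_t *t = ready_list; t != 0; t = t->next)
            if (t->state == TASK_READY && (best == 0 || t->priority < best->priority))
                best = t;
        return best;        /* the dispatcher switches to this task */
    }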

Round-Robin Scheduling

Round-robin scheduling provides each task an equal share of the CPU execution time. Pure round-robin scheduling cannot satisfy real-time system requirements because, in real-time systems, tasks perform work of varying degrees of importance. Instead, preemptive priority-based scheduling can be augmented with round-robin scheduling, which uses time slicing to achieve equal allocation of the CPU for tasks of the same priority, as shown in Figure 2.5.

Figure 2.5: Round-robin and preemptive scheduling.

With time slicing, each task executes for a defined interval, or time slice, in an ongoing cycle, which is the round robin. A run-time counter tracks the time slice for each task, incrementing on every clock tick. When one task's time slice completes, the counter is cleared, and the task is placed at the end of the cycle. Newly added tasks of the same priority are placed at the end of the cycle, with their run-time counters initialized to 0. If a task in a round-robin cycle is preempted by a higher-priority task, its run-time count is saved and then restored when the interrupted task is again eligible for execution. This idea is illustrated in Figure 2.5, in which task 1 is preempted by a higher-priority task 4 but resumes where it left off when task 4 completes. A sketch of the corresponding tick handler follows.
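
The tick-handler side of time slicing might be sketched as below, with hypothetical ready-queue helpers. Note that a real kernel keeps the run-time counter per task (e.g. in the TCB) so that it can be saved and restored across preemption, as described above; the static counter here is a simplification.

    /* Round-robin time slicing at the clock tick (sketch; queue helpers
     * and TIME_SLICE_TICKS are hypothetical). */
    #define TIME_SLICE_TICKS 10

    extern tcb_t *current_task;
    extern void   queue_push_back(tcb_t *t);
    extern tcb_t *queue_pop_front(void);

    void sched_tick(void)
    {
        static unsigned slice_used = 0;

        if (++slice_used >= TIME_SLICE_TICKS) {
            slice_used = 0;
            queue_push_back(current_task);     /* to the end of the cycle */
            current_task = queue_pop_front();  /* next same-priority task runs */
        }
    }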


12.4 Objects

Kernel objects are special constructs that are the building blocks for application development for real-time embedded systems. The most common RTOS kernel objects are:

• Tasks are concurrent and independent threads of execution that can compete for CPU execution time.
• Semaphores are token-like objects that can be incremented or decremented by tasks for synchronization or mutual exclusion.
• Message Queues are buffer-like data structures that can be used for synchronization, mutual exclusion, and data exchange by passing messages between tasks.

Developers creating real-time embedded applications can combine these basic kernel objects (as well as others not mentioned here) to solve common real-time design problems, such as concurrency, activity synchronization, and data communication. These design problems and the kernel objects used to solve them are discussed in more detail in later chapters.

12.4.1 Tasks

12.4.1.1 Introduction

Simple software applications are typically designed to run sequentially, one instruction at a time, in a pre-determined chain of instructions. However, this scheme is inappropriate for real-time embedded applications, which generally handle multiple inputs and outputs within tight time constraints. Real-time embedded software applications must be designed for concurrency. Concurrent design requires developers to decompose an application into small, schedulable, and sequential program units. When done correctly, concurrent design allows system multitasking to meet performance and timing requirements for a real-time system. Most RTOS kernels provide task objects and task management services to facilitate designing concurrency within an application.

12.4.1.2 Defining a Task

A task is an independent thread of execution that can compete with other concurrent tasks for processor execution time. As mentioned earlier, developers decompose applications into multiple concurrent tasks to optimize the handling of inputs and outputs within set time constraints. A task is schedulable: it can compete for execution time on a system, based on a predefined scheduling algorithm. A task is defined by its distinct set of parameters and supporting data structures. Specifically, upon creation, each task has an associated name, a unique ID, a priority (if part of a preemptive scheduling plan), a task control block (TCB), a stack, and a task routine, as shown in Figure 2.6. Together, these components make up what is known as the task object.


Figure 2.6: A task, its associated parameters, and supporting data structures.

When the kernel first starts, it creates its own set of system tasks and allocates the appropriate priority for each from a set of reserved priority levels. The reserved priority levels refer to the priorities used internally by the RTOS for its system tasks. An application should avoid using these priority levels for its tasks because running application tasks at such levels may affect the overall system performance or behavior. For most RTOSes, these reserved priorities are not enforced. However, the kernel needs its system tasks and their reserved priority levels to operate, so these priorities should not be modified. Examples of system tasks include:

• initialization or startup task initializes the system and creates and starts system tasks,

• idle task uses up processor idle cycles when no other activity is present,

• logging task logs system messages,
• exception-handling task handles exceptions, and

• debug agent task allows debugging with a host debugger.

Note that other system tasks might be created during initialization, depending on what other components are included with the kernel. The idle task, which is created at kernel startup, is one system task that bears mention and should not be ignored. The idle task is set to the lowest priority, typically executes in an endless loop, and runs when either no other task can run or no other tasks exist, for the sole purpose of using idle processor cycles. The idle task is necessary because, while the processor is running, it always executes the instruction to which the program counter register points. Unless the processor can be suspended, the program counter must still point to valid instructions even when no tasks exist in the system or when no tasks can run. Therefore, the idle task ensures the processor program counter is always valid when no other tasks are running. In some cases, however, the kernel might allow a user-configured routine to run instead of the idle task in order to implement special requirements for a particular application. One example of a special requirement is power conservation. When no other tasks can run, the kernel can switch control to the user-supplied routine instead of to the idle task. In this case, the user-supplied routine acts like the idle task but instead initiates power conservation code, such as system suspension, after a period of idle time.
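A minimal sketch of this arrangement follows; idle_hook, power_save(), and idle_task() are names invented for the example, and the endless loop is broken out of only so the sketch terminates.

```c
#include <stdio.h>

/* Optional user-supplied routine; NULL means run the plain idle loop. */
static void (*idle_hook)(void) = NULL;

static void power_save(void) {
    /* e.g., execute the target CPU's wait-for-interrupt instruction */
    printf("entering low-power state\n");
}

/* The idle task: lowest priority, endless loop; it keeps the program
 * counter pointing at valid instructions when nothing else can run. */
static void idle_task(void) {
    for (;;) {
        if (idle_hook)
            idle_hook();   /* user-configured power conservation, etc. */
        break;             /* only so this sketch terminates */
    }
}

int main(void) {
    idle_hook = power_save;   /* register the power-conservation routine */
    idle_task();
    return 0;
}
```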

After the kernel has initialized and created all of the required tasks, the kernel jumps to a predefined entry point (such as a predefined function) that serves, in effect, as the beginning of the application. From the entry point, the developer can initialize and create other application tasks, as well as other kernel objects, which the application design might require. As the developer creates new tasks, the developer must assign each a task name, priority, stack size, and a task routine. The kernel does the rest by assigning each task a unique ID and creating an associated TCB and stack space in memory for it.
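An application-level creation call might look like the following sketch. The task_create() signature is hypothetical; real kernels differ in parameter order and naming, and the function is stubbed here only so the example is self-contained.

```c
#include <stdio.h>

typedef void (*task_fn_t)(void *arg);

/* Hypothetical API: the developer supplies name, priority, stack size,
 * and routine; the kernel allocates the TCB and stack and returns a
 * unique task ID. */
int task_create(const char *name, unsigned priority,
                unsigned stack_size, task_fn_t entry, void *arg);

static void sensor_task(void *arg) {
    (void)arg;
    /* read inputs, produce outputs, block, repeat ... */
}

int main(void) {
    int tid = task_create("tSensor", 80, 4096, sensor_task, NULL);
    printf("created task with ID %d\n", tid);
    return 0;
}

/* Stub so the sketch links; a real kernel implements this internally. */
int task_create(const char *name, unsigned priority,
                unsigned stack_size, task_fn_t entry, void *arg) {
    (void)name; (void)priority; (void)stack_size; (void)entry; (void)arg;
    return 1;  /* kernel-assigned unique ID */
}
```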

12.4.1.3 Task States and Scheduling

Whether it's a system task or an application task, at any time each task exists in one of a small number of states, including ready, running, or blocked. As the real-time embedded system runs, each task moves from one state to another, according to the logic of a simple finite state machine (FSM). Figure 2.7 illustrates a typical FSM for task execution states, with brief descriptions of state transitions.

Figure 2.7: A typical finite state machine for task execution states.

Although kernels can define task-state groupings differently, generally three main states are used in most typical preemptive-scheduling kernels, including:

• ready state-the task is ready to run but cannot because a higher priority task is executing.

• blocked state-the task has requested a resource that is not available, has requested to wait until some event occurs, or has delayed itself for some duration.

• running state-the task is the highest priority task and is running.

Note that some commercial kernels, such as the VxWorks kernel, define other, more granular states, such as suspended, pended, and delayed. In this case, pended and delayed are actually sub-states of the blocked state. A pended task is waiting for a needed resource to be freed; a delayed task is waiting for a timing delay to end. The suspended state exists for debugging purposes. For more detailed information on the way a particular RTOS kernel implements its FSM for each task, refer to the kernel's user manual. Regardless of how a kernel implements a task's FSM, it must maintain the current state of all tasks in a running system. As calls are made into the kernel by executing tasks, the kernel's scheduler first determines which tasks need to change states and then makes those changes. In some cases, the kernel changes the states of some tasks, but no context switching occurs because the state of the highest priority task is unaffected. In other cases, however, these state changes result in a context switch because the former highest priority task either gets blocked or is no longer the highest priority task. When this happens, the former running task is put into the blocked or ready state, and the new highest priority task starts to execute. The following sections describe the ready, running, and blocked states in more detail. These descriptions are based on a single-processor system and a kernel using a priority-based preemptive scheduling algorithm.
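For illustration, the three main states and their legal transitions reduce to a small C sketch; the enum and transition_ok() helper are invented names. The sketch also admits the blocked-to-running shortcut described later for an unblocked task that is now the highest priority.

```c
#include <stdio.h>
#include <stdbool.h>

typedef enum { STATE_READY, STATE_RUNNING, STATE_BLOCKED } state_t;

/* Transitions of the typical FSM in Figure 2.7: ready -> running;
 * running -> ready (preempted) or blocked (blocking call);
 * blocked -> ready, or straight to running if now highest priority. */
static bool transition_ok(state_t from, state_t to) {
    switch (from) {
    case STATE_READY:   return to == STATE_RUNNING;
    case STATE_RUNNING: return to == STATE_READY || to == STATE_BLOCKED;
    case STATE_BLOCKED: return to == STATE_READY || to == STATE_RUNNING;
    }
    return false;
}

int main(void) {
    /* A ready task cannot block without running first: */
    printf("ready -> blocked allowed? %s\n",
           transition_ok(STATE_READY, STATE_BLOCKED) ? "yes" : "no");
    return 0;
}
```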

Ready State

When a task is first created and made ready to run, the kernel puts it into the ready state. In this state, the task actively competes with all other ready tasks for the processor's execution time. As Figure 2.7 shows, tasks in the ready state cannot move directly to the blocked state. A task first needs to run so it can make a blocking call, which is a call to a function that cannot immediately run to completion, thus putting the task in the blocked state. Ready tasks, therefore, can only move to the running state. Because many tasks might be in the ready state, the kernel's scheduler uses the priority of each task to determine which task to move to the running state. For a kernel that supports only one task per priority level, the scheduling algorithm is straightforward: the highest priority task that is ready runs next. In this implementation, the kernel limits the number of tasks in an application to the number of priority levels. However, most kernels support more than one task per priority level, allowing many more tasks in an application. In this case, the scheduling algorithm is more complicated and involves maintaining a task-ready list. Some kernels maintain a separate task-ready list for each priority level; others have one combined list. Figure 2.8 illustrates, in a five-step scenario, how a kernel scheduler might use a task-ready list to move tasks from the ready state to the running state. This example assumes a single-processor system and a priority-based preemptive scheduling algorithm in which 255 is the lowest priority and 0 is the highest. Note that for simplicity this example does not show system tasks, such as the idle task.

Figure 2.8: Five steps showing the way a task-ready list works.


In this example, tasks 1, 2, 3, 4, and 5 are ready to run, and the kernel queues them by priority in a task-ready list. Task 1 is the highest priority task (70); tasks 2, 3, and 4 are at the next-highest priority level (80); and task 5 is the lowest priority (90). The following steps explain how a kernel might use the task-ready list to move tasks to and from the ready state:

1. Tasks 1, 2, 3, 4, and 5 are ready to run and are waiting in the task-ready list.
2. Because task 1 has the highest priority (70), it is the first task ready to run. If nothing higher is running, the kernel removes task 1 from the ready list and moves it to the running state.
3. During execution, task 1 makes a blocking call. As a result, the kernel moves task 1 to the blocked state; takes task 2, which is first in the list of the next-highest priority tasks (80), off the ready list; and moves task 2 to the running state.
4. Next, task 2 makes a blocking call. The kernel moves task 2 to the blocked state; takes task 3, which is next in line of the priority 80 tasks, off the ready list; and moves task 3 to the running state.
5. As task 3 runs, it frees the resource that task 2 requested. The kernel returns task 2 to the ready state and inserts it at the end of the list of tasks ready to run at priority level 80. Task 3 continues as the currently running task.

Although not illustrated here, if task 1 became unblocked at this point in the scenario, the kernel would move task 1 to the running state because its priority is higher than that of the currently running task (task 3). As with task 2 earlier, task 3 at this point would be moved to the ready state and inserted after task 2 (same priority of 80) and before task 5 (next priority of 90).
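A per-priority-level task-ready list of the kind used in this scenario can be sketched as follows; the fixed-size FIFO arrays and the make_ready()/take_highest() helpers are invented for the example and are far simpler than a real kernel's implementation.

```c
#include <stdio.h>
#include <string.h>

#define NPRIO 256   /* 0 = highest priority, 255 = lowest */
#define MAXQ  8     /* illustrative per-level capacity */

/* One FIFO list of task IDs per priority level (some kernels use a
 * single combined list instead). */
static int ready[NPRIO][MAXQ];
static int count[NPRIO];

static void make_ready(int prio, int task) {   /* insert at end of its level */
    ready[prio][count[prio]++] = task;
}

static int take_highest(void) {                /* move a task to running */
    for (int p = 0; p < NPRIO; p++) {
        if (count[p] > 0) {
            int task = ready[p][0];
            memmove(&ready[p][0], &ready[p][1], --count[p] * sizeof(int));
            return task;
        }
    }
    return -1;  /* nothing ready: the idle task would run */
}

int main(void) {
    make_ready(70, 1);                                /* the Figure 2.8 scenario */
    make_ready(80, 2); make_ready(80, 3); make_ready(80, 4);
    make_ready(90, 5);
    printf("runs first: task %d\n", take_highest());  /* task 1 (priority 70) */
    printf("runs next:  task %d\n", take_highest());  /* task 2 (first at 80) */
    return 0;
}
```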

Running State

On a single-processor system, only one task can run at a time. In this case, when a task is moved to the running state, the processor loads its registers with this task's context. The processor can then execute the task's instructions and manipulate the associated stack. A task can also move back to the ready state while it is running: this happens when the task is preempted by a higher priority task. In this case, the preempted task is put in the appropriate, priority-based location in the task-ready list, and the higher priority task is moved from the ready state to the running state. Unlike a ready task, a running task can move to the blocked state in any of the following ways:

• by making a call that requests an unavailable resource,

• by making a call that requests to wait for an event to occur, and

• by making a call to delay the task for some duration.

In each of these cases, the task is moved from the running state to the blocked state, as described next.

Blocked State

The possibility of blocked states is extremely important in real-time systems because without blocked states, lower priority tasks could not run. If higher priority tasks are not designed to block, CPU starvation can result. CPU starvation occurs when higher priority tasks use all of the CPU execution time and lower priority tasks do not get to run. A task can only move to the blocked state by making a blocking call, requesting that some blocking condition be met. A blocked task remains blocked until the blocking condition is met. (It probably ought to be called the unblocking condition, but blocking is the terminology in common use among real-time programmers.) Examples of how blocking conditions are met include the following:

• a semaphore token (described later) for which a task is waiting is released,

• a message, on which the task is waiting, arrives in a message queue, or

• a time delay imposed on the task expires.

When a task becomes unblocked, the task might move from the blocked state to the ready state if it is not the highest priority task. The task is then put into the task-ready list at the appropriate priority-based location, as described earlier. However, if the unblocked task is the highest priority task, the task moves directly to the running state (without going through the ready state) and preempts the currently running task. The preempted task is then moved to the ready state and put into the appropriate priority-based location in the task-ready list.

12.4.1.4 Typical Task Operations

In addition to providing a task object, kernels also provide task-management services. Task-management services include the actions that a kernel performs behind the scenes to support tasks, for example, creating and maintaining the TCB and task stacks. A kernel, however, also provides an API that allows developers to manipulate tasks. Some of the more common operations that developers can perform with a task object from within the application include:

• creating and deleting tasks,

• controlling task scheduling, and

• obtaining task information.

Developers should learn how to perform each of these operations for the kernel selected for the project. Each operation is briefly discussed next.

Task Creation and Deletion

The most fundamental operations that developers must learn are creating and deleting tasks. Developers typically create a task using one or two operations, depending on the kernel's API. Some kernels allow developers first to create a task and then start it. In this case, the task is first created and put into a suspended state; then, the task is moved to the ready state when it is started (made ready to run). Creating tasks in this manner might be useful for debugging or when special initialization needs to occur between the times that a task is created and started. However, in most cases, it is sufficient to create and start a task using one kernel call. The suspended state is similar to the blocked state, in that the suspended task is neither running nor ready to run. However, a task does not move into or out of the suspended state via the same operations that move a task to or from the blocked state. The exact nature of the suspended state varies between RTOSes. For the present purpose, it is sufficient to know that the task is not yet ready to run. Starting a task does not make it run immediately; it puts the task on the task-ready list.


Many kernels also provide user-configurable hooks, which are mechanisms that execute programmer-supplied functions at the time of specific kernel events. The programmer registers the function with the kernel by passing a function pointer to a kernel-provided API (a short sketch follows the list below). The kernel executes this function when the event of interest occurs. Such events can include:

• when a task is first created,

• when a task is suspended for any reason and a context switch occurs, and

• when a task is deleted.

Hooks are useful when executing special initialization code upon task creation, implementing status tracking or monitoring upon task context switches, or executing clean-up code upon task deletion.
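A sketch of the mechanism follows; register_delete_hook() and kernel_delete_task() are invented names standing in for whatever API the kernel actually provides.

```c
#include <stdio.h>

typedef void (*hook_fn_t)(int task_id);

static hook_fn_t delete_hook = NULL;

/* The application registers a function pointer with the kernel. */
static void register_delete_hook(hook_fn_t fn) { delete_hook = fn; }

/* Inside the kernel's task-deletion path, the hook is called first. */
static void kernel_delete_task(int task_id) {
    if (delete_hook)
        delete_hook(task_id);   /* programmer-supplied clean-up code */
    /* ... then free the task's TCB and stack ... */
}

/* Application-supplied function, e.g., for clean-up or status tracking. */
static void my_cleanup(int task_id) {
    printf("task %d is being deleted; releasing its resources\n", task_id);
}

int main(void) {
    register_delete_hook(my_cleanup);
    kernel_delete_task(7);
    return 0;
}
```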

Carefully consider how tasks are to be deleted in the embedded application. Many kernel implementations allow any task to delete any other task. During the deletion process, a kernel terminates the task and frees memory by deleting the task's TCB and stack. However, when tasks execute, they can acquire memory or access resources using other kernel objects. If the task is deleted incorrectly, the task might not get to release these resources. For example, assume that a task acquires a semaphore token to get exclusive access to a shared data structure. While the task is operating on this data structure, the task gets deleted. If not handled appropriately, this abrupt deletion of the operating task can result in:

• a corrupt data structure, due to an incomplete write operation,

• an unreleased semaphore, which will not be available for other tasks that might need to acquire it, and

• an inaccessible data structure, due to the unreleased semaphore.

As a result, premature deletion of a task can result in memory or resource leaks. A memory leak occurs when memory is acquired but not released, which causes the system to run out of memory eventually. A resource leak occurs when a resource is acquired but never released, which results in a memory leak because each resource takes up space in memory. Many kernels provide task-deletion locks, a pair of calls that protect a task from being prematurely deleted during a critical section of code.
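The pair of calls brackets the critical section, as in this sketch; task_delete_lock() and task_delete_unlock() are placeholder names, simulated with a counter so the example runs.

```c
#include <stdio.h>

static int delete_locked = 0;   /* simulated kernel state */

static void task_delete_lock(void)   { delete_locked++; }
static void task_delete_unlock(void) { delete_locked--; }

static void update_shared_structure(void) {
    task_delete_lock();    /* this task can no longer be deleted ... */
    /* acquire semaphore, modify shared data, release semaphore */
    task_delete_unlock();  /* ... until it leaves the critical section */
}

int main(void) {
    update_shared_structure();
    printf("deletion lock balanced: %s\n", delete_locked == 0 ? "yes" : "no");
    return 0;
}
```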

Task Scheduling

From the time a task is created to the time it is deleted, the task can move through various states resulting from program execution and kernel scheduling. Although much of this state changing is automatic, many kernels provide a set of API calls that allow developers to control when a task moves to a different state (Suspend, Resume, Delay, Restart, Get Priority, Set Priority, Preemption lock, Preemption unlock). Using manual scheduling, developers can suspend and resume tasks from within an application. Doing so might be important for debugging purposes or, as discussed earlier, for suspending a high-priority task so that lower priority tasks can execute. A developer might want to delay (block) a task, for example, to allow manual scheduling or to wait for an external condition that does not have an associated interrupt. Delaying a task causes it to relinquish the CPU and allow another task to execute. After the delay expires, the task is returned to the task-ready list after all other ready tasks at its priority level. A delayed task waiting for an external condition can wake up after a set time to check whether a specified condition or event has occurred, which is called polling.
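The polling pattern just described might look like the following sketch, with task_delay() as a placeholder for the kernel's delay call and the external condition simulated by a counter.

```c
#include <stdio.h>
#include <stdbool.h>

/* Placeholder for the kernel call that blocks the task for some ticks,
 * letting lower priority tasks run in the meantime. */
static void task_delay(int ticks) { printf("delaying %d ticks\n", ticks); }

/* External condition without an associated interrupt (simulated). */
static int  polls = 0;
static bool condition_met(void) { return ++polls >= 3; }

/* Polling: wake up periodically and check, instead of busy-waiting. */
int main(void) {
    while (!condition_met())
        task_delay(10);        /* relinquish the CPU between checks */
    printf("condition met after %d checks\n", polls);
    return 0;
}
```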


A developer might also want to restart a task, which is not the same as resuming a suspended task. Restarting a task begins the task as if it had not been previously executing. The internal state the task possessed at the time it was suspended (for example, the CPU registers used and the resources acquired) is lost when a task is restarted. By contrast, resuming a task begins the task in the same internal state it possessed when it was suspended. Restarting a task is useful during debugging or when reinitializing a task after a catastrophic error. During debugging, a developer can restart a task to step through its code again from start to finish. In the case of catastrophic error, the developer can restart a task and ensure that the system continues to operate without having to be completely reinitialized. Getting and setting a task's priority during execution lets developers control task scheduling manually. This process is helpful during a priority inversion, in which a lower priority task holds a shared resource that a higher priority task requires and is preempted by an unrelated medium-priority task. A simple fix for this problem is to dynamically increase the priority of the lower priority task to that of the higher priority task, allowing it to run and release the resource that the higher priority task requires, and then to restore the task's original priority. Finally, the kernel might support preemption locks, a pair of calls used to disable and enable preemption in applications. This feature can be useful if a task is executing in a critical section of code: one in which the task must not be preempted by other tasks.
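The manual priority-boost fix might be coded as in this sketch; task_get_priority() and task_set_priority() are placeholder names for the kernel's calls, simulated here with a single variable.

```c
#include <stdio.h>

static unsigned prio = 90;   /* simulated priority of the low-priority task */

static unsigned task_get_priority(int id)             { (void)id; return prio; }
static void     task_set_priority(int id, unsigned p) { (void)id; prio = p; }

int main(void) {
    int low_task = 2;
    unsigned original = task_get_priority(low_task);
    task_set_priority(low_task, 70);        /* boost to the blocked task's level */
    /* low_task now runs and releases the shared resource ... */
    task_set_priority(low_task, original);  /* ... then restore its priority */
    printf("priority restored to %u\n", task_get_priority(low_task));
    return 0;
}
```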

Obtaining Task Information

Kernels provide routines that allow developers to access task information (get the current task's ID, get the current task's TCB) within their applications. This information is useful for debugging and monitoring. One use is to obtain a particular task's ID, which is used to get more information about the task by getting its TCB. Obtaining a TCB, however, only takes a snapshot of the task context. If a task is not dormant (e.g., suspended), its context might be dynamic, and the snapshot information might change by the time it is used. Hence, use this functionality wisely, so that decisions aren't made in the application based on querying a constantly changing task context.

12.4.2 Semaphores

12.4.2.1 Introduction

Multiple concurrent threads of execution within an application must be able to synchronize their execution and coordinate mutually exclusive access to shared resources. To address these requirements, RTOS kernels provide a semaphore object and associated semaphore management services.

12.4.2.2 Defining Semaphores

A semaphore (sometimes called a semaphore token) is a kernel object that one or more threads of execution can acquire or release for the purposes of synchronization or mutual exclusion. When a semaphore is first created, the kernel assigns to it an associated semaphore control block (SCB), a unique ID, a value (binary or a count), and a task-waiting list, as shown in Figure 2.9.


Figure 2.9: A semaphore, its associated parameters, and supporting data structures.

A semaphore is like a key that allows a task to carry out some operation or to access a resource. If the task can acquire the semaphore, it can carry out the intended operation or access the resource. A single semaphore can be acquired a finite number of times. In this sense, acquiring a semaphore is like acquiring the duplicate of a key from an apartment manager: when the apartment manager runs out of duplicates, the manager can give out no more keys. Likewise, when a semaphore's limit is reached, it can no longer be acquired until someone gives a key back, that is, releases the semaphore. The kernel tracks the number of times a semaphore has been acquired or released by maintaining a token count, which is initialized to a value when the semaphore is created. As a task acquires the semaphore, the token count is decremented; as a task releases the semaphore, the count is incremented. If the token count reaches 0, the semaphore has no tokens left. A requesting task, therefore, cannot acquire the semaphore, and the task blocks if it chooses to wait for the semaphore to become available. The task-waiting list tracks all tasks blocked while waiting on an unavailable semaphore. These blocked tasks are kept in the task-waiting list in either first in/first out (FIFO) order or highest priority first order. When an unavailable semaphore becomes available, the kernel allows the first task in the task-waiting list to acquire it. The kernel moves this unblocked task either to the running state, if it is the highest priority task, or to the ready state, until it becomes the highest priority task and is able to run. Note that the exact implementation of a task-waiting list can vary from one kernel to another. A kernel can support many different types of semaphores, including binary, counting, and mutual-exclusion (mutex) semaphores.
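The token-count bookkeeping reduces to the following sketch. Only the count is modeled; a real kernel also maintains the SCB and blocks the caller on the task-waiting list instead of returning false. The type and function names are invented for the example.

```c
#include <stdio.h>
#include <stdbool.h>

typedef struct { int count; } sem_sketch_t;   /* token count only */

static bool sem_acquire(sem_sketch_t *s) {
    if (s->count == 0)
        return false;   /* no tokens left: the caller would block here */
    s->count--;         /* acquiring decrements the count */
    return true;
}

static void sem_release(sem_sketch_t *s) {
    s->count++;         /* releasing increments the count */
}

int main(void) {
    sem_sketch_t s = { 2 };              /* created with two tokens */
    printf("%d\n", sem_acquire(&s));     /* 1: acquired, count -> 1 */
    printf("%d\n", sem_acquire(&s));     /* 1: acquired, count -> 0 */
    printf("%d\n", sem_acquire(&s));     /* 0: unavailable, would block */
    sem_release(&s);                     /* count -> 1, available again */
    return 0;
}
```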

Binary Semaphores

A binary semaphore can have a value of either 0 or 1. When a binary semaphore's value is 0, the semaphore is considered unavailable (or empty); when the value is 1, the binary semaphore is considered available (or full). Note that when a binary semaphore is first created, it can be initialized to either available or unavailable (1 or 0, respectively). The state diagram of a binary semaphore is shown in Figure 2.10.

Figure 2.10: The state diagram of a binary semaphore.


Binary semaphores are treated as global resources, which means they are shared among all tasks that need them. Making the semaphore a global resource allows any task to release it, even if the task did not initially acquire it.
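For illustration, the two-state behavior reduces to the sketch below; try_acquire() and release() are invented names, and a real kernel would block the requesting task rather than return 0.

```c
#include <stdio.h>

/* Binary semaphore: 0 = unavailable (empty), 1 = available (full). */
static int binary_sem = 0;   /* created unavailable */

static int try_acquire(void) {
    if (binary_sem == 1) { binary_sem = 0; return 1; }
    return 0;                /* a real task would block here */
}

static void release(void) { binary_sem = 1; }  /* value saturates at 1 */

int main(void) {
    printf("acquire: %d\n", try_acquire());  /* 0: empty, task would block */
    release();       /* any task (or, as noted later, an ISR) may signal */
    printf("acquire: %d\n", try_acquire());  /* 1: full, task proceeds */
    return 0;
}
```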

Counting Semaphores

A counting semaphore uses a count to allow it to be acquired or released multiple times. When creating a counting semaphore, assign the semaphore a count that denotes the number of semaphore tokens it has initially. If the initial count is 0, the counting semaphore is created in the unavailable state. If the count is greater than 0, the semaphore is created in the available state, and the number of tokens it has equals its count, as shown in Figure 2.11.

Figure 2.11: The state diagram of a counting semaphore.

One or more tasks can continue to acquire a token from the counting semaphore until no tokens are left. When all the tokens are gone, the count equals 0, and the counting semaphore moves from the available state to the unavailable state. To move from the unavailable state back to the available state, a semaphore token must be released by any task. Note that, as with binary semaphores, counting semaphores are global resources that can be shared by all tasks that need them. This feature allows any task to release a counting semaphore token. Each release operation increments the count by one, even if the task making this call did not acquire a token in the first place. Some implementations of counting semaphores might allow the count to be bounded. A bounded count is a count in which the initial count set for the counting semaphore, determined when the semaphore was first created, acts as the maximum count for the semaphore. An unbounded count allows the counting semaphore to count beyond the initial count to the maximum value that can be held by the count's data type (e.g., an unsigned integer or an unsigned long value).

Mutual Exclusion (Mutex) Semaphores

A mutual exclusion (mutex) semaphore is a special binary semaphore that supports ownership, recursive access, task deletion safety, and one or more protocols for avoiding problems inherent to mutual exclusion. Figure 2.12 illustrates the state diagram of a mutex.


Figure 2.12: The state diagram of a mutual exclusion (mutex) semaphore.

As opposed to the available and unavailable states in binary and counting semaphores, the states of a mutex are unlocked or locked (0 or 1, respectively). A mutex is initially created in the unlocked state, in which it can be acquired by a task. After being acquired, the mutex moves to the locked state. Conversely, when the task releases the mutex, the mutex returns to the unlocked state. Note that some kernels might use the terms lock and unlock for a mutex instead of acquire and release. Depending on the implementation, a mutex can support additional features not found in binary or counting semaphores. These key differentiating features include ownership, recursive locking, task deletion safety, and priority inversion avoidance protocols.

Mutex Ownership

Ownership of a mutex is gained when a task first locks the mutex by acquiring it. Conversely, a task loses ownership of the mutex when it unlocks it by releasing it. When a task owns the mutex, it is not possible for any other task to lock or unlock that mutex. Contrast this concept with the binary semaphore, which can be released by any task, even a task that did not originally acquire the semaphore.

Recursive Locking

Many mutex implementations also support recursive locking, which allows the task that owns the mutex to acquire it multiple times in the locked state. Depending on the implementation, recursion within a mutex can be automatically built into the mutex, or it might need to be enabled explicitly when the mutex is first created. The mutex with recursive locking is called a recursive mutex. This type of mutex is most useful when a task requiring exclusive access to a shared resource calls one or more routines that also require access to the same resource. A recursive mutex allows nested attempts to lock the mutex to succeed, rather than cause deadlock, which is a condition in which two or more tasks are blocked and are waiting on mutually locked resources. The problem of recursion and deadlocks is discussed later in this chapter, as well as later in this book. As shown in Figure 2.12, when a recursive mutex is first locked, the kernel registers the task that locked it as the owner of the mutex. On successive attempts, the kernel uses an internal lock count associated with the mutex to track the number of times that the task currently owning the mutex has recursively acquired it. To properly unlock the mutex, it must be released the same number of times. In this example, a lock count tracks the two states of a mutex (0 for unlocked and 1 for locked), as well as the number of times it has been recursively locked (lock count > 1). In other implementations, a mutex might maintain two counts: a binary value to track its state, and a separate lock count to track the number of times it has been acquired in the lock state by the task that owns it. Do not confuse the counting facility for a locked mutex with the counting facility for a counting semaphore. The count used for the mutex tracks the number of times that the task owning the mutex has locked or unlocked the mutex. The count used for the counting semaphore tracks the number of tokens that have been acquired or released by any task. Additionally, the count for the mutex is always unbounded, which allows multiple recursive accesses.
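The owner-plus-lock-count bookkeeping might be sketched as follows; rmutex_t and its functions are invented names, and a real kernel would block a non-owner rather than return false.

```c
#include <stdio.h>
#include <stdbool.h>

/* Recursive mutex sketch: 0 = unlocked, 1 = locked, >1 = recursively
 * locked; it must be released as many times as it was acquired. */
typedef struct { int owner; unsigned lock_count; } rmutex_t;

static bool rmutex_lock(rmutex_t *m, int task) {
    if (m->lock_count == 0)
        m->owner = task;          /* first lock: register ownership */
    else if (m->owner != task)
        return false;             /* owned by another task: caller blocks */
    m->lock_count++;              /* nested lock by the owner succeeds */
    return true;
}

static void rmutex_unlock(rmutex_t *m, int task) {
    if (m->owner == task && m->lock_count > 0)
        m->lock_count--;          /* fully unlocked when the count hits 0 */
}

int main(void) {
    rmutex_t m = { 0, 0 };
    rmutex_lock(&m, 1);           /* task 1 locks the mutex */
    rmutex_lock(&m, 1);           /* nested call: no deadlock, count = 2 */
    rmutex_unlock(&m, 1);
    rmutex_unlock(&m, 1);         /* released the same number of times */
    printf("unlocked: %s\n", m.lock_count == 0 ? "yes" : "no");
    return 0;
}
```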


Task Deletion Safety

Some mutex implementations also have built-in task deletion safety. Premature task deletion is avoided by using task deletion locks when a task locks and unlocks a mutex. Enabling this capability within a mutex ensures that while a task owns the mutex, the task cannot be deleted. Typically, protection from premature deletion is enabled by setting the appropriate initialization options when creating the mutex.

Priority Inversion Avoidance

Priority inversion commonly happens in poorly designed real-time embedded applications. Priority inversion occurs when a higher priority task is blocked and is waiting for a resource being used by a lower priority task, which has itself been preempted by an unrelated medium-priority task. In this situation, the higher priority task's priority level has effectively been inverted to the lower priority task's level. Enabling certain protocols that are typically built into mutexes can help avoid priority inversion. Two common protocols used for avoiding priority inversion include:

• priority inheritance protocol: ensures that the priority level of the lower priority task that has acquired the mutex is raised to that of the higher priority task that has requested the mutex when inversion happens. The priority of the raised task is lowered to its original value after the task releases the mutex that the higher priority task requires.
• ceiling priority protocol: ensures that the priority level of the task that acquires the mutex is automatically set to the highest priority of all possible tasks that might request that mutex when it is first acquired, until it is released. When the mutex is released, the priority of the task is lowered to its original value.

12.4.2.3 Typical Semaphore Operations

Typical operations that developers might want to perform with the semaphores in an application include:

• creating and deleting semaphores,
• acquiring and releasing semaphores,
• clearing a semaphore's task-waiting list, and
• getting semaphore information.

Creating and Deleting Semaphores

Several things must be considered when creating and deleting semaphores. If a kernel supports different types of semaphores, different calls might be used for creating binary, counting, and mutex semaphores, as follows:

• binary: specify the initial semaphore state and the task-waiting order.
• counting: specify the initial semaphore count and the task-waiting order.
• mutex: specify the task-waiting order and enable task deletion safety, recursion, and priority-inversion avoidance protocols, if supported.

Semaphores can be deleted from within any task by specifying their IDs and making semaphore-deletion calls. Deleting a semaphore is not the same as releasing it. When a semaphore is deleted, blocked tasks in its task-waiting list are unblocked and moved either to the ready state or to the running state (if the unblocked task has the highest priority). Any tasks, however, that try to acquire the deleted semaphore return with an error because the semaphore no longer exists. Additionally, do not delete a semaphore while it is in use (e.g., acquired). This action might result in data corruption or other serious problems if the semaphore is protecting a shared resource or a critical section of code.


Acquiring and Releasing Semaphores

The operations for acquiring and releasing a semaphore might have different names, depending on the kernel: for example, take and give, sm_p and sm_v, pend and post, and lock and unlock. Regardless of the name, they all effectively acquire and release semaphores. Tasks typically make a request to acquire a semaphore in one of the following ways:

• Wait forever: the task remains blocked until it is able to acquire the semaphore.
• Wait with a timeout: the task remains blocked until it is able to acquire the semaphore or until a set interval of time, called the timeout interval, passes. At that point, the task is removed from the semaphore's task-waiting list and put in either the ready state or the running state.
• Do not wait: the task makes a request to acquire a semaphore token, but, if one is not available, the task does not block.

Note that ISRs can also release binary and counting semaphores. However, most kernels do not support ISRs locking and unlocking mutexes, as it is not meaningful to do so from an ISR. It is also not meaningful to acquire either binary or counting semaphores inside an ISR. Any task can release a binary or counting semaphore; however, a mutex can only be released (unlocked) by the task that first acquired (locked) it. Note that incorrectly releasing a binary or counting semaphore can result in losing mutually exclusive access to a shared resource or in an I/O device malfunction. For example, a task can gain access to a shared data structure by acquiring an associated semaphore. If a second task accidentally releases that semaphore, this step can potentially free a third task waiting for that same semaphore, allowing that third task to also gain access to the same data structure. Having multiple tasks trying to modify the same data structure at the same time results in corrupted data.
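Kernels often select among the three request modes through a timeout argument, as in this hypothetical sketch; WAIT_FOREVER, NO_WAIT, and sem_acquire_timed() are invented names (several kernels use similar conventions), and the stub merely simulates contention.

```c
#include <stdio.h>

#define WAIT_FOREVER (-1)   /* illustrative special timeout values */
#define NO_WAIT      (0)

/* Hypothetical call: returns 0 on success, -1 on timeout/unavailable. */
int sem_acquire_timed(int sem_id, int timeout_ticks);

int main(void) {
    int sem = 1;                              /* hypothetical semaphore ID */
    sem_acquire_timed(sem, WAIT_FOREVER);     /* block until available */
    if (sem_acquire_timed(sem, 100) < 0)      /* block at most 100 ticks */
        printf("timed out waiting for the semaphore\n");
    if (sem_acquire_timed(sem, NO_WAIT) < 0)  /* never block */
        printf("semaphore unavailable, continuing\n");
    return 0;
}

/* Stub so the sketch runs; a real kernel implements the blocking logic. */
int sem_acquire_timed(int sem_id, int timeout_ticks) {
    (void)sem_id;
    return timeout_ticks == WAIT_FOREVER ? 0 : -1;  /* simulate contention */
}
```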

Clearing Semaphore Task-Waiting Lists

To clear all tasks waiting on a semaphore's task-waiting list, some kernels support a flush operation. The flush operation is useful for broadcast signaling to a group of tasks. For example, a developer might design multiple tasks to complete certain activities first and then block while trying to acquire a common semaphore that is made unavailable. After the last task finishes doing what it needs to, the task can execute a semaphore flush operation on the common semaphore. This operation frees all tasks waiting in the semaphore's task-waiting list. The synchronization scenario just described is also called thread rendezvous, in which multiple tasks' executions need to meet at some point in time to synchronize execution control.
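A rendezvous built on a flush operation might be structured as in this sketch; the semaphore calls appear as comments with invented names, and the arrival counting is simulated so the example runs.

```c
#include <stdio.h>

#define NWORKERS 3

static int arrived = 0;   /* simulated count of tasks reaching the rendezvous */

static void worker_done(int id) {
    if (++arrived < NWORKERS) {
        printf("task %d blocks on the common semaphore\n", id);
        /* sem_acquire(common_sem, WAIT_FOREVER);  -- hypothetical call */
    } else {
        printf("task %d is last: flush releases all waiting tasks\n", id);
        /* sem_flush(common_sem);                  -- hypothetical call */
    }
}

int main(void) {
    for (int id = 1; id <= NWORKERS; id++)
        worker_done(id);
    return 0;
}
```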

Getting Semaphore Information

At some point in the application design, developers need to obtain semaphore information (show general information about the semaphore, get a list of IDs of tasks that are blocked on a semaphore) to perform monitoring or debugging. These operations are relatively straightforward but should be used judiciously, as the semaphore information might be dynamic at the time it is requested.

12.5 Services

Along with objects, most kernels provide services that help developers create applications for real-time embedded systems. These services comprise sets of API calls that can be used to perform operations on kernel objects or can be used in general to facilitate timer management, interrupt handling, device I/O, and memory management. Other services might also be provided; those listed here are the ones most commonly found in RTOS kernels.