Error detection and correction using SN54/74LS630 or SN54/74LS631



Application note


Error detection and correction circuits require numerous TTL chips, which makes them expensive to implement. Economies can be made with the 74LS630/1 circuit described by Dale Hunt and Thomas J Tyson

Although dynamic memory is much cheaper than static memory, bit for bit, many designers prefer to use static memory if they require highly reliable systems. Dynamic memory is prone to random errors caused by subatomic ionizing particles passing through a cell and destroying the stored charge. Many microcomputers employing dynamic memory use an extra parity bit per stored word to detect a single error. This detects a single error when a faulty word is read, and an interrupt can be generated to halt the processor. Detecting an error seems to leave one hardly better off than not detecting it. By adding several redundant bits per word, it becomes possible both to detect and correct errors. Error detecting and correcting circuits require rather a large number of TTL chips and are therefore expensive to implement. This application report from Texas Instruments describes the 74LS630 error detecting and correcting circuit, which enables engineers to produce economic error correcting memories. A.C.

microsystems, memory design, error correction

Texas Instruments Inc., 13500 North Central Expressway, PO Box 655012, Dallas, TX 75265, USA

ERROR DETECTION SCHEMES

Simple parity

The most common error detection scheme is simple parity. In 16-bit machines, the dataword is divided into two 8-bit bytes; the designer arbitrarily chooses odd or even parity and generates a ninth parity bit to be stored along with each 8-bit byte. Upon retrieval, the parity is checked and the data are accepted if the parity sense is correct. If the sense is in error, an attempt is made to rewrite the faulty data. Rewriting is the only recourse since this scheme cannot pinpoint the faulty bit. An error in a single bit of either byte will result in parity sense inversion. Two errors in the same byte will not be detected since no parity inversion results. This method is therefore valid only where the probability of a dual-bit error in a single byte is insignificant.

Checksum generation/checking

Memory errors can be detected by arithmetically summing a block of N memory words as they are stored, then repeating the summation during retrieval and comparing the 'checksums'. This is not a realtime error detection scheme and requires considerable CPU overhead for the summations. No error correction is possible.
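The two detection-only schemes above can be sketched in a few lines of Python. This is an illustration only: the function names, the choice of odd parity as the default and the 16-bit checksum width are assumptions made for the example, not details taken from the application note.

```python
def parity_bit(byte: int, odd: bool = True) -> int:
    """Ninth bit that gives the 9-bit group odd (or even) parity."""
    ones = bin(byte & 0xFF).count("1")
    bit = ones & 1                 # 1 if the byte already holds an odd number of ones
    return bit ^ 1 if odd else bit


def parity_ok(byte: int, stored_bit: int, odd: bool = True) -> bool:
    """Check a retrieved byte against its stored parity bit."""
    return parity_bit(byte, odd) == stored_bit


def checksum(block, width: int = 16) -> int:
    """Arithmetic sum of a block of words, truncated to `width` bits (assumed width)."""
    return sum(block) & ((1 << width) - 1)


stored = 0b10110010
p = parity_bit(stored)
one_bit_error = stored ^ 0b00000001    # a single flipped bit inverts the parity sense
two_bit_error = stored ^ 0b00000011    # two flipped bits in one byte leave it unchanged

block = [0x1234, 0xABCD, 0x0F0F]
saved_sum = checksum(block)            # computed as the block is stored
corrupted = [0x1234, 0xABCF, 0x0F0F]   # one word altered before retrieval
```

Here `parity_ok(one_bit_error, p)` is false while `parity_ok(two_bit_error, p)` is true, which is the blind spot described above. The checksum comparison catches the altered block but cannot say which word or bit changed.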

Increasing need for error correction

With the advent of the 64K DRAM, the complexity level of memory boards has reached the point where one can no longer afford to ignore the need for error correction. The memory system manufacturer must provide reasonable mean time between failure (MTBF) to the end user for large memory boards (64 kbyte or larger). This desired level of quality is impossible to achieve without error correction.

0141-9331/89/07473-08 $03.00 © 1989 Butterworth & Co.

In the past, the implementation of error correction with discrete medium-scale integration (MSI) logic was expensive both in terms of board space and package count. The SN54/74LS630 or SN54/74LS631 EDAC chip provides a simple solution to this problem for 16-bit machines.

Semiconductor DRAMs tend to fail at the package level rather than at a single memory location within the chip. The preferred DRAM organization for this error correction implementation is therefore 4K × 1, 16K × 1 or 64K × 1: each package then supplies only one bit of any stored word, so a package failure appears as a correctable single-bit error.

Realtime single-bit correction in conjunction with a reasonable maintenance schedule can control DRAM MTBF to the point where system reliability will depend almost entirely on peripheral support ICs and not on the DRAM itself.

In addition to single-bit error correction, the 'LS630/'LS631 EDAC detects all possible dual-bit errors. This reduces the chance that the CPU will use invalid data by several orders of magnitude.

WHY ERROR DETECTION AND CORRECTION USING 'LS630/1?

There are four advantages to using SN54/74LS630 or SN54/74LS631:

• improves system reliability
• virtually eliminates system downtime due to memory
• increases practical size of semiconductor memory systems
• decreases PC board complexity.

(Publishers) Ltd

Vol 13 No 7 September 1989 473


[Block diagram: function selector (S0, S1), checkbit I/O CB0-CB5, databit I/O DB0-DB15, parity generator, error detector, error decoder, error corrector, error flags SEF and DEF]

Memory   Select   EDAC function              Data I/O                 Checkword I/O           Error flags
cycle    S1  S0                                                                               SEF, DEF

Write    L   L    Generate checkword         Input data               Output checkword
Read     L   H    Read data and checkword    Input data               Input checkword
Read     H   H    Latch and flag errors      Latch data               Latch checkword         Enabled
Read     H   L    Correct dataword and       Output corrected data    Output syndrome bits    Enabled
                  generate syndrome bits

Figure 1. SN54/74LS630 or SN54/74LS631 functional block diagram and function table

Theory of operation

Figure 1 is an overall functional block diagram and function table for the SN54/74LS630 and SN54/74LS631. The operation of the 'LS630 can be broken into nine basic functions; the description and circuit of each are given below.

Operational description

Function selector

Ideally, S0 and S1 should be automatically generated by Write and Read signals from the CPU so that the EDAC/memory will appear transparent to the CPU (see Figure 2).
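As a rough illustration of the function selector, the function table of Figure 1 can be written as a lookup keyed on the (S1, S0) levels. The `controls` helper is a hypothetical controller rule, not the circuit of Figure 2, and treating the error flags as not enabled whenever S1 is low is an assumption based on the statement that the flags are enabled when S1 goes from L to H.

```python
from typing import NamedTuple


class Function(NamedTuple):
    name: str
    data_io: str
    checkword_io: str
    flags_enabled: bool   # assumption: flags are enabled only while S1 is high


# Keys are (S1, S0) levels; rows follow the function table of Figure 1.
FUNCTION_TABLE = {
    ("L", "L"): Function("generate checkword", "input data", "output checkword", False),
    ("L", "H"): Function("read data and checkword", "input data", "input checkword", False),
    ("H", "H"): Function("latch and flag errors", "latch data", "latch checkword", True),
    ("H", "L"): Function("correct dataword and generate syndrome bits",
                         "output corrected data", "output syndrome bits", True),
}


def select(s1: str, s0: str) -> Function:
    """Look up the EDAC function for a pair of select levels."""
    return FUNCTION_TABLE[(s1, s0)]


def controls(write: bool, data_valid: bool = False, correcting: bool = False):
    """Hypothetical controller rule that derives (S1, S0) from CPU-side conditions."""
    if write:
        return ("L", "L")
    if not data_valid:
        return ("L", "H")
    return ("H", "L") if correcting else ("H", "H")
```

For example, `select(*controls(write=True))` returns the 'generate checkword' row, so the CPU never has to drive S0 and S1 itself.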

6-bit check latch

During the Read cycle, as S1 goes from L to H, the 6-bit checkword is latched for parity checking against the 16-bit dataword from memory (see Figure 3).

6-bit check bit three-state buffers

During the Write cycle, these buffers present the 6-bit parity checkword for storage into memory. During the correction part of Read, these buffers transmit the error syndrome code to be used in locating the faulty memory chip. Many systems use a syndrome latch and LED display to assist R&M (see Figure 4).

Figure 2. Function selector (S0 and S1 in; outputs to the OE three-state controls of the DB0-DB15 and CB0-CB5 buffers, to the parity generator and to the error detector for latch and flag disable)

Figure 3. 6-bit check latch (output to the parity generator)

474 Microprocessors and Microsystems

Parity generator

During the Write cycle, the check bits from the internal latches are disregarded and the incoming dataword from the CPU is used to encode the 6-bit checkword. During the Read cycle, the internally stored check bits from memory are parity checked against a newly formed checkword generated by the parity tree using the stored data bits from memory. The result at the outputs of the parity generator is the error syndrome word in true and complement form. If no error occurs, the syndrome is all ones (see Figure 5).
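The parity generator can be modelled as six XOR trees over the data-bit groups listed in the Write-cycle equations of this note. The sketch below is a simplified model under two stated assumptions: the constant term of each equation (the parity sense) is ignored, and a disagreement is reported as a 1, whereas the device presents a syndrome in which all ones means no error. The function names are invented for the example.

```python
# Data bits feeding check bits CB0..CB5, copied from the Write-cycle equations.
# Assumption: plain XOR only; the parity sense of each check bit is ignored.
CHECK_GROUPS = [
    (0, 1, 3, 4, 8, 9, 10, 13),
    (0, 2, 3, 5, 6, 8, 11, 14),
    (1, 2, 4, 5, 7, 9, 12, 15),
    (0, 1, 2, 6, 7, 10, 11, 12),
    (3, 4, 5, 6, 7, 13, 14, 15),
    (8, 9, 10, 11, 12, 13, 14, 15),
]


def checkword(data: int) -> int:
    """6-bit checkword; bit n is the XOR of the data bits in CHECK_GROUPS[n]."""
    word = 0
    for n, group in enumerate(CHECK_GROUPS):
        parity = 0
        for bit in group:
            parity ^= (data >> bit) & 1
        word |= parity << n
    return word


def mismatch(data_read: int, check_read: int) -> int:
    """Bit n is 1 where the stored and the regenerated check bit n disagree."""
    return checkword(data_read) ^ (check_read & 0x3F)


data = 0xA5C3
stored_check = checkword(data)                    # Write cycle
clean = mismatch(data, stored_check)              # Read cycle, nothing changed
db0_flipped = mismatch(data ^ 1, stored_check)    # DB0 upset while in memory
```

With DB0 flipped, the disagreeing check bits are CB0, CB1 and CB3, the three equations in which DB0 appears. That three-bit pattern is what later identifies the faulty data bit.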

Error detector

The 6-input AND gate detects any error (any inverted syndrome bit), while the 6-input parity tree detects dual-bit errors (see Figure 6).
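A behavioural sketch of the error detector follows. It assumes the syndrome convention stated above (all ones means no error) and relies on a property that can be read off the check-bit equations given later in this note: every data bit appears in exactly three equations, so a single-bit error inverts one or three syndrome bits and a dual-bit error inverts an even number. The function name is hypothetical.

```python
def error_flags(syndrome: int) -> tuple[int, int]:
    """Return (SEF, DEF) for a 6-bit syndrome in which all ones means no error."""
    syndrome &= 0x3F
    inverted = bin(syndrome ^ 0x3F).count("1")   # number of syndrome bits that went low
    any_error = syndrome != 0x3F                 # role of the 6-input AND gate
    dual = any_error and inverted % 2 == 0       # role of the 6-input parity tree
    return int(any_error), int(dual)
```

SEF is raised for any error and DEF only for an even number of inverted bits, which matches the flow charts: SEF high and DEF low for a single-bit error, both high for a dual-bit error.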

Error decoder

The internal error decoder uses the same syndrome error code to pinpoint the 'correctable error' located within the 16-bit dataword from memory. The proper NAND gate output goes low to provide the inverting signal for the correct cell within the error corrector (see Figure 7).

Figure 4. 6-bit check bit three-state buffers (syndrome bits SB0-SB5 in, CB0-CB5 out)

Figure 5. Parity generator (inputs DB0-DB15 and CB0-CB5)

Error corrector

The 'correctable error' signal (CORR0 to CORR15) is applied to the corresponding exclusive-NOR gate to invert the erroneous bit (see Figure 8).
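The decoder and corrector can be sketched together. The model below assumes the data-bit groups of the check-bit equations in this note, a syndrome in which all ones means no error, and active-low CORR signals as described above. `PATTERN`, `decode` and `correct` are names invented for the example.

```python
# Data bits feeding check bits CB0..CB5 (from the check-bit equations in this note).
CHECK_GROUPS = [
    (0, 1, 3, 4, 8, 9, 10, 13),
    (0, 2, 3, 5, 6, 8, 11, 14),
    (1, 2, 4, 5, 7, 9, 12, 15),
    (0, 1, 2, 6, 7, 10, 11, 12),
    (3, 4, 5, 6, 7, 13, 14, 15),
    (8, 9, 10, 11, 12, 13, 14, 15),
]

# Pattern of inverted syndrome bits expected for an error in each data bit.
PATTERN = {
    bit: sum(1 << n for n, group in enumerate(CHECK_GROUPS) if bit in group)
    for bit in range(16)
}


def decode(syndrome: int) -> list[int]:
    """CORR0..CORR15 as active-low signals: a 0 marks the bit to be inverted."""
    inverted = (syndrome ^ 0x3F) & 0x3F
    return [0 if PATTERN[bit] == inverted else 1 for bit in range(16)]


def xnor(a: int, b: int) -> int:
    """Exclusive-NOR of two single bits."""
    return 1 - (a ^ b)


def correct(data: int, syndrome: int) -> int:
    """Pass each data bit through an XNOR gate with its CORR signal."""
    corr = decode(syndrome)
    out = 0
    for bit in range(16):
        out |= xnor((data >> bit) & 1, corr[bit]) << bit
    return out
```

With CORR high the XNOR gate passes the data bit unchanged; with CORR low it inverts it. A syndrome with a single inverted bit matches no data-bit pattern, so a check-bit error leaves the dataword untouched.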

16-bit data latch

During the Read cycle, the 16-bit dataword from memory is latched as S1 goes from L to H so that the memory word can be compared with the checkword for error decoding and correction (see Figure 9).

16-bit data three-state buffers

These buffers are active only during the correction cycle to present the corrected word to the CPU (see Figure 10).

Figure 6. Error detector (syndrome bits SB0-SB5 in true and complement form; outputs SEF and DEF)

Figure 7. Error decoder (syndrome bits SB0-SB5 in true and complement form feeding the NAND gates)

Figure 8. Error corrector (CORR0-CORR15 and DB0-DB15 in, CDO0-CDO15 out)

Exclusive-NOR truth table

A   B   Y
L   L   H
L   H   L
H   L   L
H   H   H

Figure 9. 16-bit data latch

Figure 10. 16-bit data three-state buffers (CDO0-CDO15 in, DB0-DB15 out)

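Flow charts of Read/Write cycles

Write cycle

S0 = S1 = L

• Present data to memory bus and EDAC
• Allow delay time for check bit formation and memory set-up time requirements
• Write the entire 22-bit word into memory

$$
\begin{aligned}
\mathrm{CB0} &= \mathrm{DB0} \oplus \mathrm{DB1} \oplus \mathrm{DB3} \oplus \mathrm{DB4} \oplus \mathrm{DB8} \oplus \mathrm{DB9} \oplus \mathrm{DB10} \oplus \mathrm{DB13} \oplus 1 \\
\mathrm{CB1} &= \mathrm{DB0} \oplus \mathrm{DB2} \oplus \mathrm{DB3} \oplus \mathrm{DB5} \oplus \mathrm{DB6} \oplus \mathrm{DB8} \oplus \mathrm{DB11} \oplus \mathrm{DB14} \oplus 1 \\
\mathrm{CB2} &= \mathrm{DB1} \oplus \mathrm{DB2} \oplus \mathrm{DB4} \oplus \mathrm{DB5} \oplus \mathrm{DB7} \oplus \mathrm{DB9} \oplus \mathrm{DB12} \oplus \mathrm{DB15} \oplus 1 \\
\mathrm{CB3} &= \mathrm{DB0} \oplus \mathrm{DB1} \oplus \mathrm{DB2} \oplus \mathrm{DB6} \oplus \mathrm{DB7} \oplus \mathrm{DB10} \oplus \mathrm{DB11} \oplus \mathrm{DB12} \oplus 1 \\
\mathrm{CB4} &= \mathrm{DB3} \oplus \mathrm{DB4} \oplus \mathrm{DB5} \oplus \mathrm{DB6} \oplus \mathrm{DB7} \oplus \mathrm{DB13} \oplus \mathrm{DB14} \oplus \mathrm{DB15} \oplus 1 \\
\mathrm{CB5} &= \mathrm{DB8} \oplus \mathrm{DB9} \oplus \mathrm{DB10} \oplus \mathrm{DB11} \oplus \mathrm{DB12} \oplus \mathrm{DB13} \oplus \mathrm{DB14} \oplus \mathrm{DB15} \oplus 1
\end{aligned}
$$

(The flow chart labels these equations 'Even parity' and 'Odd parity'.)

(With both S0 and S1 low, the check bits from the latches are excluded from the parity evaluation.)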

Read cycle (no error)

S1   S0
L    H
H    H

• Activate memory outputs while S0 = H, S1 = L
• Switch S1 from L to H to latch in the 22-bit word from memory and to enable the error flags
• Interrogate SEF and DEF to see if both are low (both will be low in this case)
• Accept the 16-bit dataword as valid

Note that this parity generator is identical with that of the Write cycle previously discussed, except that the check bits from the latch storage are used to help generate parity. (Old and new check bits are actually parity checked against each other.)

Since no error exists, all internal syndrome bits are true and no flag will occur.

Read cycle (single-bit error)

                       S1   S0
Read memory            L    H
Latch and flag         H    H
Correct and syndrome   H    L

• Activate memory outputs with S0 = H, S1 = L
• Switch S1 from L to H to latch and enable flags
• Interrogate SEF and DEF (in this case SEF will be high and DEF low)
• Place memory outputs into the high-impedance state (Hi-Z)
• Switch S0 from H to L to output corrected data and syndrome bits

Parity generation is identical with the Read cycle with no error. Syndrome bits are generated during error correction and may be used to pinpoint the faulty IC memory chip.

As the error flags are identical for either a data bit or check bit single error, the EDAC must be sent through the correction cycle and corrected data output to the bus. In the case of a single error in one of the six check bits, the corrected data will be identical with the data from the memory.

As repeated syndrome words point to hard errors, hardware can be added on the memory board to latch and display the syndrome word for R&M.

Read cycle (dual-bit error)

S1   S0
L    H
H    H

• Activate memory outputs while S0 = H, S1 = L
• Switch S1 from L to H to latch and flag
• Interrogate SEF and DEF (in this case both flags will be high)
• Interrupt CPU

Syndrome bits are useless in the case of dual-bit errors.
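A short enumeration suggests why the syndrome cannot locate a dual-bit error. It treats each of the 22 stored bits as a column of disturbed check bits, using the data-bit groups from the check-bit equations in this note. The parity sense is ignored and a disturbed bit is written as 1, both of which are modelling assumptions.

```python
from collections import Counter
from itertools import combinations

# Data bits feeding check bits CB0..CB5 (from the check-bit equations in this note).
CHECK_GROUPS = [
    (0, 1, 3, 4, 8, 9, 10, 13),
    (0, 2, 3, 5, 6, 8, 11, 14),
    (1, 2, 4, 5, 7, 9, 12, 15),
    (0, 1, 2, 6, 7, 10, 11, 12),
    (3, 4, 5, 6, 7, 13, 14, 15),
    (8, 9, 10, 11, 12, 13, 14, 15),
]

# One column per stored bit: which check comparisons an error in that bit disturbs.
columns = [
    sum(1 << n for n, group in enumerate(CHECK_GROUPS) if bit in group)
    for bit in range(16)
]
columns += [1 << n for n in range(6)]       # the six check bits themselves


def weight(pattern: int) -> int:
    """Number of disturbed syndrome bits in a pattern."""
    return bin(pattern).count("1")


# Syndrome pattern produced by every possible pair of simultaneous bit errors.
pair_syndromes = Counter(a ^ b for a, b in combinations(columns, 2))
most_shared = max(pair_syndromes.values())
```

Every pair produces a non-zero pattern of even weight, so a dual-bit error is always flagged and never mistaken for a single-bit error. However, 231 pairs share at most 31 even-weight patterns, so one pattern stands for several different pairs and cannot identify the faulty bits.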

Read cycle (gross errors, all zeros or all ones)

S1   S0
L    H
H    H

• Activate memory outputs while S0 = H, S1 = L
• Switch S1 from L to H to latch and flag
• Interrogate SEF and DEF (in this case both flags will be high)
• Interrupt CPU

Syndrome bits are useless in the case of gross errors.
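The four Read-cycle flow charts above differ only in what follows the interrogation of SEF and DEF. A minimal sketch of that decision is given below; the function name and the outcome strings are hypothetical.

```python
def read_cycle(sef: int, def_: int):
    """Return the (S1, S0) steps and the outcome of one Read cycle."""
    steps = [("L", "H"), ("H", "H")]       # read memory, then latch and flag
    if not sef and not def_:
        return steps, "accept data"
    if sef and not def_:
        # Memory outputs go to Hi-Z before S0 is switched from H to L.
        steps.append(("H", "L"))           # correct and output syndrome bits
        return steps, "accept corrected data"
    return steps, "interrupt CPU"          # dual-bit or gross error
```

Only the single-bit case adds the third step; dual-bit and gross errors end in the latch mode with an interrupt to the CPU.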

Syndrome bit-generation code

$$
\begin{aligned}
\mathrm{SB0} &= \mathrm{DB0} \oplus \mathrm{DB1} \oplus \mathrm{DB3} \oplus \mathrm{DB4} \oplus \mathrm{DB8} \oplus \mathrm{DB9} \oplus \mathrm{DB10} \oplus \mathrm{DB13} \oplus \mathrm{CB0} \\
\mathrm{SB1} &= \mathrm{DB0} \oplus \mathrm{DB2} \oplus \mathrm{DB3} \oplus \mathrm{DB5} \oplus \mathrm{DB6} \oplus \mathrm{DB8} \oplus \mathrm{DB11} \oplus \mathrm{DB14} \oplus \mathrm{CB1} \\
\mathrm{SB2} &= \mathrm{DB1} \oplus \mathrm{DB2} \oplus \mathrm{DB4} \oplus \mathrm{DB5} \oplus \mathrm{DB7} \oplus \mathrm{DB9} \oplus \mathrm{DB12} \oplus \mathrm{DB15} \oplus \mathrm{CB2} \\
\mathrm{SB3} &= \mathrm{DB0} \oplus \mathrm{DB1} \oplus \mathrm{DB2} \oplus \mathrm{DB6} \oplus \mathrm{DB7} \oplus \mathrm{DB10} \oplus \mathrm{DB11} \oplus \mathrm{DB12} \oplus \mathrm{CB3} \\
\mathrm{SB4} &= \mathrm{DB3} \oplus \mathrm{DB4} \oplus \mathrm{DB5} \oplus \mathrm{DB6} \oplus \mathrm{DB7} \oplus \mathrm{DB13} \oplus \mathrm{DB14} \oplus \mathrm{DB15} \oplus \mathrm{CB4} \\
\mathrm{SB5} &= \mathrm{DB8} \oplus \mathrm{DB9} \oplus \mathrm{DB10} \oplus \mathrm{DB11} \oplus \mathrm{DB12} \oplus \mathrm{DB13} \oplus \mathrm{DB14} \oplus \mathrm{DB15} \oplus \mathrm{CB5}
\end{aligned}
$$

Syndrome for check-bit error

$$
\begin{aligned}
\mathrm{CBERR0} &= \overline{\mathrm{CB0}} \cdot \mathrm{CB1} \cdot \mathrm{CB2} \cdot \mathrm{CB3} \cdot \mathrm{CB4} \cdot \mathrm{CB5} \\
\mathrm{CBERR1} &= \mathrm{CB0} \cdot \overline{\mathrm{CB1}} \cdot \mathrm{CB2} \cdot \mathrm{CB3} \cdot \mathrm{CB4} \cdot \mathrm{CB5} \\
\mathrm{CBERR2} &= \mathrm{CB0} \cdot \mathrm{CB1} \cdot \overline{\mathrm{CB2}} \cdot \mathrm{CB3} \cdot \mathrm{CB4} \cdot \mathrm{CB5} \\
\mathrm{CBERR3} &= \mathrm{CB0} \cdot \mathrm{CB1} \cdot \mathrm{CB2} \cdot \overline{\mathrm{CB3}} \cdot \mathrm{CB4} \cdot \mathrm{CB5} \\
\mathrm{CBERR4} &= \mathrm{CB0} \cdot \mathrm{CB1} \cdot \mathrm{CB2} \cdot \mathrm{CB3} \cdot \overline{\mathrm{CB4}} \cdot \mathrm{CB5} \\
\mathrm{CBERR5} &= \mathrm{CB0} \cdot \mathrm{CB1} \cdot \mathrm{CB2} \cdot \mathrm{CB3} \cdot \mathrm{CB4} \cdot \overline{\mathrm{CB5}}
\end{aligned}
$$

IMPLEMENTATION OF THE EDAC IN A TMS9900 MICROPROCESSOR-BASED SYSTEM

The operation of the EDAC can be made to seem transparent to the normal operation of the TMS9900. In other words, the EDAC can generate a 6-bit checkword and place this checkword on the memory bus during the normal memory Write cycle, and can also read, latch, flag errors (single- or dual-bit), and correct a single-bit error during a normal memory Read cycle.

Application

To use the EDAC in a memory board, some control logic was necessary to control the operation and timing of the EDAC and to modify the operation and timing of the memory. This logic has been called the EDAC controller. Figure 11 shows the block diagram of a memory board employing the EDAC device. In this system, the following assumptions were made.

• The memory system is 22 bits wide (16 bits for the dataword and 6 bits for the checkword) with separate inputs and outputs.
• The memory controller supplies the EDAC controller with a signal which is labelled MRDY. This signal will be active only during a memory cycle and when the memory is ready to read or write.
• The memory output bus is driven by three-state drivers (in this case the 'LS244 and 'LS367A) whose outputs are controlled by the EDAC control signal MBE and the memory controller.
• The microprocessor supplies the EDAC controller with the necessary memory control signals (i.e. MEMEN, WE, DBIN).

Figure 11. Block diagram of memory board using TI's SN54/74LS630 or SN54/74LS631 EDAC

Figure 12. Schematic of EDAC controller

Memory Write cycle

The EDAC controller (see Figure 12) employs the MEMEN and DBIN signals from the microprocessor to set the EDAC control signals (S1 and S0) to a low state. In this mode the EDAC reads the 16-bit dataword outputted by the microprocessor from the data bus. It then generates a 6-bit checkword and outputs this checkword to the 6-bit checkword bus. To ensure the memory reads a valid dataword and a valid checkword, the write enable signal WE is delayed by approximately 60 ns. (This is the worst case time it takes the EDAC to generate a checkword.) The resultant signal WED is the memory write cycle enable signal. If the memory system needs an extended write cycle because of a slow access time, the EDAC is still kept in the write mode until the memory Write cycle is finished. If the memory system requires an extended memory Write cycle, the write enable signal from the microprocessor need not be delayed.

Memory Read cycle

When the microprocessor enters the memory Read cycle, the EDAC input controls are set in the following states: S1 = L, S0 = H. In this mode, the EDAC reads the 22-bit word (16-bit dataword and 6-bit checkword) from memory. When the memory controller indicates that valid data are on the bus by setting MRDY high, the EDAC controller puts the EDAC in the latch mode (S1 = H, S0 = H). To avoid glitches on the EDAC error flags (i.e. DEF and SEF), valid data should be on the data bus approximately 30 ns before MRDY goes active. In other words, valid data should be set up about 30 ns before the EDAC is put in the latch mode. If there is no error (single or dual), the EDAC will remain in this mode until the end of the memory Read cycle. If there is a single error, the EDAC controller will put the EDAC into the error correct mode, and it will also send a signal MBE to the memory controller to disable the memory output bus. In the error correct mode (S1 = H, S0 = L), the EDAC will correct the data bit that was in error and it will also place six syndrome bits on the 6-bit bus, which is an error code to indicate the bit that was in error.

Figure 13. Memory Write and Read cycle timing with no wait state (waveforms: MEMEN, DBIN, WE, MRDY, S0, S1, SEF, DEF, MBE, SYSIN, WED, DATA)

If there is a dual error, the EDAC will remain in the latch mode (S1 = H, S0 = H) until the end of the memory Read cycle. The EDAC controller will set SYSIN high. This is an interrupt signal and should be an input to the system interrupt circuitry.

Figure 13 gives the timing waveforms for a memory Write cycle with no wait states and for a memory Read cycle with no errors and no wait states.

Figure 14 gives the timing waveform for a memory Read cycle with a single-bit error and no wait states.

Figure 14. Memory Read cycle with single-error timing

To simplify maintenance operations, a hardware indication of the error syndrome and address bank might be used. Figure 15 shows a simple circuit to perform this function. Once a single-bit error is flagged, the 'LS374 will latch in the syndrome bits and the address bank decode bits. This will inform the R&M technician which chip could be faulty; since soft errors might occur, the pinpointed device should still be tested. The switch S1 will clear (set low) all the LEDs. A limitation of this circuit is that only one error can be indicated, even though more than one single-bit error might occur between service checks. If there is a high probability that many single-bit errors might occur, a stack register might be employed to store the syndrome and address bank bits.
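The idea of storing several syndrome and address bank entries can be illustrated in software. The class below is a hypothetical stand-in for the stack register mentioned above, not a description of any circuit in this note; the depth, the method names and the repeat threshold are assumptions.

```python
from collections import Counter, deque


class SyndromeLog:
    """Bounded record of (syndrome, bank) pairs captured on single-bit errors."""

    def __init__(self, depth: int = 8):
        # Assumed behaviour: once full, the oldest entry is discarded.
        self.entries = deque(maxlen=depth)

    def record(self, syndrome: int, bank: int) -> None:
        """Store one flagged error, as the latch does when SEF is raised."""
        self.entries.append((syndrome & 0x3F, bank))

    def repeated(self, threshold: int = 2):
        """Entries seen at least `threshold` times, which suggest a hard error."""
        counts = Counter(self.entries)
        return [entry for entry, count in counts.items() if count >= threshold]

    def clear(self) -> None:
        """Equivalent of pressing the clear switch during a service check."""
        self.entries.clear()


log = SyndromeLog()
log.record(0b101001, bank=2)
log.record(0b111011, bank=0)
log.record(0b101001, bank=2)
suspects = log.repeated()
```

A syndrome that recurs for the same bank points to a hard fault in one package, whereas isolated entries are more likely to be soft errors.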

Figure 15. Hardware single-bit error mapping (inputs from EDAC: SEF, DEF, S1, error syndrome bits and bank address decode bits; LED outputs)