Class 09 Content Addressable Memories Cell Design and Peripheral Circuits

Upload: afya

Post on 14-Feb-2016


Page 1: Class 09 Content  Addressable Memories

Class 09: Content Addressable Memories

Cell Design and Peripheral Circuits

Page 2: Class 09 Content  Addressable Memories

Semiconductor Memory Classification

• Read-write memory (RWM)
  – Random access: SRAM, DRAM
  – Non-random access: FIFO, LIFO, shift register, CAM
• Non-volatile read-write memory (NVRWM): EPROM, E2PROM, FLASH
• Read-only memory (ROM): mask-programmed, programmable (PROM)

FIFO: First-in-first-out
LIFO: Last-in-first-out (stack)
CAM: Content addressable memory

Page 3: Class 09 Content  Addressable Memories

Memory Architecture: Decoders

[Figure: left, an array of N words (Word 0 ... Word N-1) of M bits each, built from storage cells, with one select signal S0 ... SN-1 per word and an M-bit input-output bus. Right, the same array with a decoder that generates the select signals from address bits A0 ... AK-1. Annotations: "pitch matched", "line too long".]

N words => N select signals: too many select signals.

A decoder reduces the number of select signals: K = log2 N.
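The relation K = log2 N can be checked with a small sketch. The function names `address_bits` and `decode` are illustrative and not from the slides; the one-hot list stands in for the select signals S0 ... SN-1.

```python
def address_bits(n_words: int) -> int:
    """Number of address lines K needed to select one of n_words words."""
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 2, without float rounding.
    return max(1, (n_words - 1).bit_length())


def decode(address: int, n_words: int) -> list[int]:
    """One-hot decoder model: returns S0..S(N-1) with exactly one signal set."""
    if not 0 <= address < n_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_words)]


print(address_bits(1024))  # 10 address lines instead of 1024 select signals
print(decode(2, 4))        # [0, 0, 1, 0]
```

With the decoder, the number of signals routed to the array grows with log2 N rather than N.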

Page 4: Class 09 Content  Addressable Memories

2D Memory Architecture

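[Figure: a 2D array of storage (RAM) cells at the intersections of word lines and bit lines. The row decoder takes the row address (labeled A0 ... Aj-1) and drives the word lines. The column address (labeled Aj ... Ak-1) feeds the column decoder. Sense amplifiers and read/write circuits sit between the array and the column decoder. Array dimensions are labeled 2^(k-j) and m·2^j. Input/Output is m bits wide.]

• Sense amplifiers / read-write circuits: amplify the bit line swing.
• Column decoder: selects the appropriate word from the memory row.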
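As a rough sketch of how a k-bit address is divided between the two decoders: the field assignment below (low j bits to the row decoder, remaining k-j bits to the column decoder) follows the labels as they appear in the figure and is an assumption; other designs assign the fields the other way round. `split_address` is a hypothetical helper.

```python
def split_address(addr: int, k: int, j: int) -> tuple[int, int]:
    """Split a k-bit address into (row, column) fields.

    Assumption: bits A0..A(j-1) form the row address and
    bits Aj..A(k-1) form the column address.
    """
    if not 0 <= addr < (1 << k):
        raise ValueError("address does not fit in k bits")
    row = addr & ((1 << j) - 1)  # low j bits
    col = addr >> j              # remaining k - j bits
    return row, col


print(split_address(0b11010110, k=8, j=5))  # (22, 6)
```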

Page 5: Class 09 Content  Addressable Memories

3D Memory Architecture

[Figure: several memory blocks sharing row address, column address, and block address inputs, with a common Input/Output bus (m bits).]

Advantages:
1. Shorter word and/or bit lines
2. Block address activates only 1 block, saving power

Page 6: Class 09 Content  Addressable Memories

Hierarchical Memory Architecture

[Figure: memory blocks connected to a global data bus. Row, column, and block addresses feed the blocks. A block selector, global amplifier/driver, I/O, and control circuitry complete the hierarchy.]

Advantages:
• Shorter wires within blocks
• Block address activates only 1 block: power management

Page 7: Class 09 Content  Addressable Memories

Read-Write Memories (RAM)

Static (SRAM)
• Data stored as long as supply is applied
• Large (6 transistors per cell)
• Fast
• Differential signal (more reliable)

Dynamic (DRAM)
• Periodic refresh required
• Small (1-3 transistors per cell) but slower
• Single ended (unless using a dummy cell to generate differential signals)

Page 8: Class 09 Content  Addressable Memories

Associative Memory

Page 9: Class 09 Content  Addressable Memories

What is CAM?

• Content Addressable Memory is a special kind of memory.
• Read operation in a traditional memory:
  – Input is the address of the content we are interested in.
  – Output is the content stored at that address.
• In a CAM it is the reverse:
  – Input is associated with something stored in the memory.
  – Output is the location where the associated content is stored.

Content Addressable Memory (input: 0 1 1 0 1, output: 0 1)

Address   Stored word
0 0       1 0 1 X X
0 1       0 1 1 0 X
1 0       0 1 1 X X
1 1       1 0 0 1 1

Traditional Memory (input: 0 1, output: 0 1 1 0 X)

Address   Stored word
0 0       1 0 1 X X
0 1       0 1 1 0 X
1 0       0 1 1 X X
1 1       1 0 0 1 1
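The two access styles in the figure can be mimicked in a few lines. This is a behavioral sketch only, using the four stored words from the slide; `ram_read`, `matches`, and `cam_search` are illustrative names, and "X" is treated as a don't-care bit.

```python
# Stored words at addresses 00, 01, 10, 11 (values taken from the slide).
TABLE = ["101XX", "0110X", "011XX", "10011"]


def ram_read(table: list[str], address: int) -> str:
    """Traditional memory: address in, content out."""
    return table[address]


def matches(stored: str, key: str) -> bool:
    """A stored bit matches if it equals the key bit or is a don't-care (X)."""
    return all(s == "X" or s == k for s, k in zip(stored, key))


def cam_search(table: list[str], key: str) -> list[int]:
    """CAM: content in, addresses of all matching entries out (lowest first)."""
    return [addr for addr, word in enumerate(table) if matches(word, key)]


hits = cam_search(TABLE, "01101")
print(hits)                   # [1, 2]
print(format(hits[0], "02b")) # 01
print(ram_read(TABLE, 0b01))  # 0110X
```

The key 01101 matches two entries here; reporting the lowest address reproduces the output 01 shown on the slide.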

Page 10: Class 09 Content  Addressable Memories

Types of CAMs

• Binary CAM (BCAM) stores only 0s and 1s.
  – Applications: MAC table consultation, layer 2 security-related VPN segregation.

• Ternary CAM (TCAM) stores 0s, 1s and don't cares.
  – Applications: cases that need wildcards, such as layer 3 and 4 classification for QoS and CoS purposes, and IP routing (longest prefix matching).

• Available sizes: 1Mb, 2Mb, 4.7Mb, 9.4Mb, and 18.8Mb.

• CAM entries are structured as multiples of 36 bits rather than 32 bits.

Page 11: Class 09 Content  Addressable Memories

CAM: Introduction

• CAM vs. RAM

RAM (address in: 4, data out: 10001101)

Address   Data
5         00110111
4         10001101
3         10111101
2         11001011
1         00001101
0         01010101

CAM (data in: 10001101, address out: 3)

Address   Data
5         11000111
4         00011101
3         10001101
2         11001011
1         00001101
0         01010101

Page 12: Class 09 Content  Addressable Memories

Memory Hierarchy

The overall goal of using a memory hierarchy is to obtain the highest-possible average access speed while minimizing the total cost of the entire memory system.

Multiprogramming refers to the existence of many programs in different parts of main memory at the same time.

Page 13: Class 09 Content  Addressable Memories

Main memory

Page 14: Class 09 Content  Addressable Memories

ROM Chip

Page 15: Class 09 Content  Addressable Memories

Memory Address Map

Memory Configuration (case study):

Required: 512 bytes of ROM + 512 bytes of RAM
Available chips: 512-byte ROM and 128-byte RAM

The designer of a computer system must calculate the amount of memory required for the particular application and assign it to either RAM or ROM.

The interconnection between memory and processor is then established from knowledge of the size of memory needed and the type of RAM and ROM chips available.

The addressing of memory can be established by means of a table that specifies the memory address assigned to each chip.

The table, called a memory address map, is a pictorial representation of assigned address space for each chip in the system.
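For the case study above, a memory address map can be generated by assigning consecutive address ranges to each chip. This is a minimal sketch: the chip ordering (four 128-byte RAM chips first, then the 512-byte ROM) and the helper name `address_map` are assumptions for illustration, not taken from the slides.

```python
def address_map(chips: list[tuple[str, int]]) -> list[tuple[str, str, str]]:
    """Assign consecutive address ranges, starting at 0, to (name, size) chips.

    Returns (name, first_address_hex, last_address_hex) per chip.
    """
    rows, base = [], 0
    for name, size in chips:
        rows.append((name, f"{base:04X}", f"{base + size - 1:04X}"))
        base += size
    return rows


# 512 bytes of RAM built from 128-byte chips, plus one 512-byte ROM chip.
CHIPS = [("RAM 1", 128), ("RAM 2", 128), ("RAM 3", 128), ("RAM 4", 128), ("ROM", 512)]

for name, first, last in address_map(CHIPS):
    print(f"{name}: {first} - {last}")
```

With this ordering, the RAM chips occupy 0000 to 01FF and the ROM occupies 0200 to 03FF, so 10 address bits cover the whole map.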

Page 16: Class 09 Content  Addressable Memories

Memory Address Map

Page 17: Class 09 Content  Addressable Memories

Associative Memory

The time required to find an item stored in memory can be reduced considerably if stored data can be identified for access by the content of the data itself rather than by an address.

A memory unit accessed by content is called an associative memory or Content Addressable Memory (CAM). This type of memory is accessed simultaneously and in parallel on the basis of data content rather than by a specific address or location.

When a word is written in an associative memory, no address is given. The memory is capable of finding an empty unused location to store the word. When a word is to be read from an associative memory, the content of the word or part of the word is specified.

The associative memory is uniquely suited to do parallel searches by data association. Moreover, searches can be done on an entire word or on a specific field within a word. Associative memories are used in applications where the search time is very critical and must be very short.

Page 18: Class 09 Content  Addressable Memories

Hardware Organization

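[Figure: block diagram of an associative memory. The argument register (A) and key register (K) feed the associative memory array and logic (m words, n bits per word). The match register (M) holds one bit per word. Signals: Input, Read, Write, Output.]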

Page 19: Class 09 Content  Addressable Memories

Associative memory of m words, n cells per word

[Figure: an m x n array of cells Cij. Column j (Bit j) receives argument bit Aj and key bit Kj. Row i (Word i) produces match bit Mi. Rows run from Word 1 to Word m, columns from Bit 1 to Bit n.]

Page 20: Class 09 Content  Addressable Memories

One Cell of Associative Memory

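[Figure: one cell Cij. A flip-flop Fij with R and S inputs stores the bit. Inputs: Input, Read, Write, argument bit Aj, key bit Kj. Outputs: Output, and the match logic output to Mi.]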

Page 21: Class 09 Content  Addressable Memories

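Match Logic Circuit

[Figure: match logic for word i. For each bit position j = 1 ... n, the stored values Fij and F'ij are combined with Aj and Kj; the results of all n positions are combined to produce Mi.]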
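The match logic can be sketched behaviorally. The code assumes the usual textbook formulation, in which bit j of word i agrees when Aj equals Fij or when key bit Kj is 0 (position masked), and Mi is the AND over all positions. That formulation, the sample data, and the function names are assumptions, since the slide shows only the circuit figure.

```python
def match_word(word: list[int], argument: list[int], key: list[int]) -> int:
    """M_i = AND over j of ((A_j == F_ij) or K_j == 0)."""
    return int(all(k == 0 or a == f for f, a, k in zip(word, argument, key)))


def match_register(memory: list[list[int]], argument: list[int], key: list[int]) -> list[int]:
    """Compute the match register M for every word (done in parallel in hardware)."""
    return [match_word(word, argument, key) for word in memory]


A = [1, 0, 1, 1, 1, 1, 1, 0, 0]       # argument register
K = [1, 1, 1, 0, 0, 0, 0, 0, 0]       # key register: compare only the first 3 bits
MEMORY = [
    [1, 0, 0, 1, 1, 1, 1, 0, 0],      # differs in an unmasked position
    [1, 0, 1, 0, 0, 0, 0, 0, 1],      # agrees in all unmasked positions
]
print(match_register(MEMORY, A, K))   # [0, 1]
```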

Page 22: Class 09 Content  Addressable Memories

CAM: Introduction

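• Binary CAM Cell

[Figure: binary CAM cell schematic. An SRAM storage cell (P1, P2, N1, N2, access transistors N3, N4) connects to bit lines BL1 and BL1c through word line WL, with internal nodes BL1_cell and BL1c_cell. Comparison transistors N5 to N8 connect search lines SL1 and SL1c to the match line ML.]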

Page 23: Class 09 Content  Addressable Memories

CAM: Introduction

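• Ternary CAM (TCAM)

Example 1 (input keyword: XXX01101, matches at addresses 4 and 1 on the bits 01101)

Address   Stored word
5         00X00111
4         01001101
3         00011101
2         110010X1
1         10101101
0         010X0101

Example 2 (input keyword: 10001101, matches at address 4 on the bits 1101 and at address 1 on the bits 0001101)

Address   Stored word
5         XXXXX111
4         XXXX1101
3         XXX11101
2         XX001011
1         X0001101
0         01010101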

Page 24: Class 09 Content  Addressable Memories

CAM: Introduction

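• TCAM Cell
  – Global masking: SLs
  – Local masking: BLs

BL1   BL2   Logic
0     1     0
1     0     1
1     1     X
0     0     N.A.

[Figure: TCAM cell. Two RAM cells share word line WL and connect to bit lines BL1 and BL2, with internal nodes BL1c and BL2c. The comparison logic connects search lines SL1 and SL2 to the match line ML.]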

Page 25: Class 09 Content  Addressable Memories

CAM: Introduction

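• DRAM-based TCAM Cell
  – Higher bit density
  – Slower table update
  – Expensive process
  – Refreshing circuitry
  – Scaling issues (leakage)

[Figure: DRAM-based TCAM cell schematic with bit lines BL1 and BL2, word line WL, search lines SL1 and SL2, match line ML, storage nodes BL1_cell and BL2_cell, and transistors N3 to N8.]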

Page 26: Class 09 Content  Addressable Memories

CAM: Introduction

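• SRAM-based TCAM Cell
  – Standard CMOS process
  – Fast table update
  – Large area (16T)

[Figure: SRAM-based TCAM cell schematic with bit lines BL1, BL1c, BL2, BL2c, word line WL, search lines SL1 and SL2, match line ML, and internal nodes BL1c_cell and BL2c_cell.]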

Page 27: Class 09 Content  Addressable Memories

CAM: Introduction

• Block diagram of a 256 x 144 TCAM

[Figure: 256 rows of CAM cells, CAM Cell (0) to CAM Cell (143) per row, with bit lines BL1c and BL2c per column. SL drivers broadcast search lines SL1(0), SL2(0) ... SL1(143), SL2(143). Each match line ML0 ... ML255 ends in a match line sense amplifier (MLSA) with output MLSO(0) ... MLSO(255).]

Page 28: Class 09 Content  Addressable Memories

CAM: Introduction

• Why low-power TCAMs?
  – Parallel search => very high power
  – Larger word size, larger number of entries => high power
  – Embedded applications (SoC)

Page 29: Class 09 Content  Addressable Memories

CAM: Design Techniques

• Cell Design: 12T Static TCAM cell*
  – '0' is retained by leakage (VWL ~ 200 mV)
  – High density
  – Leakage (3 orders)
  – Noise margin
  – Soft errors (node S)
  – Unsuitable for READ

Page 30: Class 09 Content  Addressable Memories

CAM: Design Techniques

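• Cell Design: NAND vs. NOR Type CAM
  – Low power
  – Charge-sharing
  – Slow

[Figure: NAND-type CAM, with CAM Cell (0) ... CAM Cell (N) connected in series along ML_NAND to a sense amplifier (SA), and NOR-type CAM, with CAM Cell (0) ... CAM Cell (N) connected in parallel on ML_NOR to a sense amplifier. Each cell shows bit lines BL1 and BL1c, word line WL, search lines SL1 and SL1c, and VDD.]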

Page 31: Class 09 Content  Addressable Memories

CAM: Design Techniques

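• MLSA Design: Conventional
  – Pre-charge ML to VDD
  – Match => VML = VDD
  – Mismatch => VML = 0

[Figure: conventional match line sense amplifier. The match line ML is pre-charged to VDD under control of PRE, and the output is MLSO.]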

Page 32: Class 09 Content  Addressable Memories

CAM: Design Techniques

• Low Power: Dual-ML TCAM
  – Same speed, 50% less energy (ideally)
  – Parasitic interconnects degrade both speed and energy
  – Additional ML increases coupling capacitance

Page 33: Class 09 Content  Addressable Memories

CAM: Design Techniques

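• Static Power Reduction
  – 16T TCAM: Leakage Paths*

[Figure: 16T TCAM cell schematic (transistors N1 to N12, P1 to P4) with word line WL, bit lines BL1, BL1c, BL2, BL2c, search lines SL1 and SL2, match line ML, and internal nodes BL1c_cell and BL2c_cell, annotated with the stored '0' and '1' node values that define the leakage paths.]

* N. Mohan, M. Sachdev, Proc. IEEE CCECE, pp. 711-714, May 2-5, 2004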

Page 34: Class 09 Content  Addressable Memories

CAM: Design Techniques

• Static Power Reduction
  – Side effects of VDD reduction in TCAM cells
    Speed: no change
    Dynamic power: no change
    Robustness: voltage margin (current-race sensing)

[Figure: voltage margin versus VDD, with waveforms ML [0], ML [1], and MLSO [0].]

Page 35: Class 09 Content  Addressable Memories

CAM for Routing Table Implementation

• CAM can be used as a search engine.
• We want to find matching contents in a database or table.
• Example: routing table

Source: http://pagiamtzis.com/cam/camintro.html

Page 36: Class 09 Content  Addressable Memories

Simplified CAM Block Diagram

• The input to the system is the search word.
• The search word is broadcast on the search lines.
• Each match line indicates whether there was a match between the search word and the stored word.
• The encoder specifies the match location.
• If there are multiple matches, a priority encoder selects the first match.
• The hit signal flags the case in which there is no match.
• The search word is long, ranging from 36 to 144 bits.
• Table size ranges from a few hundred entries to 32K.
• Address space: 7 to 15 bits.

Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey,” IEEE J. of Solid-State Circuits, March 2006
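The encoder stage can be illustrated with a short behavioral sketch. `priority_encode` and `address_width` are hypothetical names, and the polarity of `hit` (true when a match exists) is an assumption; real devices may flag the no-match case instead.

```python
from typing import Optional


def priority_encode(match_lines: list[int]) -> tuple[bool, Optional[int]]:
    """Return (hit, address) where address is the first asserted match line."""
    for address, ml in enumerate(match_lines):
        if ml:
            return True, address
    return False, None  # no stored word matched


def address_width(entries: int) -> int:
    """Bits needed to encode a match location among `entries` table rows."""
    return (entries - 1).bit_length()


print(priority_encode([0, 1, 1, 0]))  # (True, 1): lowest matching address wins
print(priority_encode([0, 0, 0, 0]))  # (False, None)
print(address_width(128), address_width(32 * 1024))  # 7 15
```

The last line shows that a 7 to 15 bit address space corresponds to tables of 128 to 32K entries.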

Page 37: Class 09 Content  Addressable Memories

CAM Memory Size

• Largest available: around 18 Mbit (single chip).

• Rule of thumb: the largest CAM chip is about half the size of the largest available SRAM chip. A typical CAM cell consists of two SRAM cells.

• Exponential growth rate in size.

Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey,” IEEE J. of Solid-State Circuits, March 2006

Page 38: Class 09 Content  Addressable Memories

CAM Basics

• The search-data word is loaded into the search-data register.
• All match lines are pre-charged high (temporary match state).
• Search line drivers broadcast the search word onto the differential search lines.
• Each CAM core cell compares its stored bit against the bit on the corresponding search lines.
• Match lines of words that have at least one mismatching bit discharge to ground.

Source: K. Pagiamtzis, A. Sheikholeslami, “Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey,” IEEE J. of Solid-State Circuits, March 2006
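The search sequence above can be modeled step by step. This is a behavioral sketch, not a circuit simulation; the stored words are illustrative 8-bit values and `search` is a hypothetical function name.

```python
def search(stored_words: list[str], search_word: str) -> list[int]:
    """Model one CAM search and return the final match line states."""
    # Step 1: pre-charge every match line high (temporary match state).
    match_lines = [1] * len(stored_words)
    # Step 2: the search word is broadcast to all rows; in hardware every
    # row compares at the same time, the loop only models that behavior.
    for row, word in enumerate(stored_words):
        for stored_bit, search_bit in zip(word, search_word):
            if stored_bit != search_bit:
                # Step 3: a single mismatching cell discharges the match line.
                match_lines[row] = 0
                break
    return match_lines


# Rows are addresses 0..5.
STORED = ["01010101", "00001101", "11001011", "10001101", "00011101", "11000111"]
lines = search(STORED, "10001101")
print(lines)           # [0, 0, 0, 1, 0, 0]
print(lines.index(1))  # 3
```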

Page 39: Class 09 Content  Addressable Memories

CAM Advantages

• They associate the input (comparand) with their memory contents in one clock cycle.

• They are configurable in multiple formats of width and depth of search data, which allows searches to be conducted in parallel.

• CAMs can be cascaded to increase the size of the lookup tables they can store.

• New entries can be added to their tables, so they can learn what they did not know before.

• They are one of the appropriate solutions for higher speeds.

Page 40: Class 09 Content  Addressable Memories

CAM Disadvantages

• They cost several hundred dollars per CAM, even in large quantities.

• They occupy a relatively large footprint on a card.

• They consume excessive power.

• Generic system engineering problems:
  – Interface with the network processor.
  – Simultaneous table update and lookup requests.

Page 41: Class 09 Content  Addressable Memories

CAM structure

• The comparand bus is 72 bytes wide and bidirectional.
• The result bus is an output.
• The command bus enables instructions to be loaded into the CAM.
• It has 8 configurable banks of memory.
• The NPU issues a command to the CAM.
• The CAM then performs an exact match or uses wildcard characters to extract relevant information.
• There are two sets of mask registers inside the CAM.

[Figure: CAM block diagram. Blocks: CAM control, global mask registers, CAM array of 72 bits x 131072 (72 bits x 16K x 8 structures), empty bit, priority encoder, flag control, output port control, control & status registers, I/O port control, decoder, pipeline execution control (command bus). Banks are mixable with 72 bits x 16384, 144 bits x 8192, 288 bits x 4096, or 576 bits x 2048.]
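The bank organizations listed in the block diagram all describe the same amount of storage, which a quick calculation confirms. The constants come from the diagram labels; the helper names are illustrative.

```python
# (width in bits, depth in entries) options for one bank, from the diagram.
CONFIGS = [(72, 16384), (144, 8192), (288, 4096), (576, 2048)]
BANKS = 8


def bits_per_bank(configs: list[tuple[int, int]]) -> int:
    """Return the bank capacity, checking that every organization agrees."""
    sizes = {width * depth for width, depth in configs}
    if len(sizes) != 1:
        raise ValueError("organizations describe different capacities")
    return sizes.pop()


per_bank = bits_per_bank(CONFIGS)
print(per_bank)          # 1179648 bits per bank
print(per_bank * BANKS)  # 9437184 bits in total, about 9.4 Mbit
print(16384 * BANKS)     # 131072 entries when every bank is 72 bits wide
```

Doubling the word width halves the number of entries, so wider keys trade directly against table depth.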

Page 42: Class 09 Content  Addressable Memories

CAM structure

There are global mask registers, which can remove specific bits from the comparison, and a mask register that is present in each location of memory.

The search result can be a single output (highest priority) or a burst of successive results.

The output port is 24 bytes wide.

Flag and control signals specify status of the banks of the memory.

They also enable us to cascade multiple chips.


Page 43: Class 09 Content  Addressable Memories

CAM Features

• CAM cascading:
  – We can cascade up to 8 pieces without incurring a performance penalty in search time (72 bits x 512K).
  – We can cascade up to 32 pieces with performance degradation (72 bits x 2M).

• Terminology:
  – Initializing the CAM: writing the table into the memory.
  – Learning: updating specific table entries.
  – Writing a search key to the CAM: search operation.

• Handling wider keys:
  – Most CAMs support 72-bit keys.
  – They can support wider keys in native hardware.

• Shorter keys: can be handled at the system level more efficiently.

Page 44: Class 09 Content  Addressable Memories

CAM Latency

• Clock rate is between 66 and 133 MHz.
• The clock speed determines maximum search capacity.
• Factors affecting the search performance:
  – Key size
  – Table size
• For the system designer, the total latency to retrieve data from the SRAM connected to the CAM is important.
• By using pipeline and multi-thread techniques for resource allocation we can ease the CAM speed requirements.

Source: IDT

Page 45: Class 09 Content  Addressable Memories

Management of Tables Inside a CAM

• It is important to squeeze as much information as we can into a CAM.
• Example from Netlogic application notes:
  – We want to store 4 tables of 32-bit-wide IP destination addresses.
  – The CAM is 128 bits wide.
  – If we store one address directly in every slot, 96 bits are wasted.
• We can arrange the 32-bit-wide tables next to each other.
  – Every 128-bit slot is partitioned into four 32-bit slots.
  – These are the 3rd, 2nd, 1st, and 0th tables going from left to right.
  – We use the global mask register to access only one of the tables.

MASK 3   00000000 FFFFFFFF FFFFFFFF FFFFFFFF
MASK 2   FFFFFFFF 00000000 FFFFFFFF FFFFFFFF
MASK 1   FFFFFFFF FFFFFFFF 00000000 FFFFFFFF
MASK 0   FFFFFFFF FFFFFFFF FFFFFFFF 00000000
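The table-packing scheme can be sketched with Python integers standing in for 128-bit CAM entries. The convention that a mask bit of 1 means "ignore this bit" is inferred from the MASK values above and is an assumption, as are the helper names and the sample addresses.

```python
WIDTH, SLOT_BITS = 128, 32
ALL_ONES = (1 << WIDTH) - 1


def global_mask(table_no: int) -> int:
    """Mask with zeros over the selected 32-bit slot and ones elsewhere."""
    return ALL_ONES & ~(0xFFFFFFFF << (SLOT_BITS * table_no))


def pack(t3: int, t2: int, t1: int, t0: int) -> int:
    """Place four 32-bit values side by side, table 3 leftmost."""
    return (t3 << 96) | (t2 << 64) | (t1 << 32) | t0


def search(entries: list[int], key: int, table_no: int) -> list[int]:
    """Compare only the bits of the selected table (mask bit 1 = ignored)."""
    care = ~global_mask(table_no) & ALL_ONES
    comparand = key << (SLOT_BITS * table_no)
    return [i for i, entry in enumerate(entries) if (entry & care) == comparand]


ENTRIES = [
    pack(0x0A000001, 0xC0A80001, 0x11111111, 0x22222222),
    pack(0x0A000002, 0xC0A80002, 0x33333333, 0x22222222),
]
print(f"{global_mask(3):032X}")         # 00000000FFFFFFFFFFFFFFFFFFFFFFFF
print(search(ENTRIES, 0xC0A80002, 2))   # [1]
print(search(ENTRIES, 0x22222222, 0))   # [0, 1]
```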

Page 46: Class 09 Content  Addressable Memories

Example Continued

• We can still use the mask register (not the global mask register) to do maximum prefix length match.

[Figure: a CAM array with bit positions 127 ... 0 and stored entries beginning 1 0 1 0 0 0 ..., 1 0 1 1 1 0 ..., 1 0 1 1 0 1 ..., 1 1 0 1 1 1 ..., each with its own mask bits. The comparand register holds 1 0 1 1 1 0 ... 0 1 1 1 0 and the global mask register holds 0 0 0 0 0 1 ... 1 1 1 1 1. One entry is marked MATCH FOUND.]
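Maximum prefix length matching with per-entry masks can be sketched as follows. The code assumes a common TCAM convention, where entries are ordered longest prefix first so that the first match is the longest; that ordering, the 8-bit width, and the sample prefixes are assumptions for illustration and are not taken from the figure.

```python
from typing import Optional


def prefix_entry(prefix: str, width: int) -> tuple[str, int, int]:
    """Build (prefix, value, care_mask) for a non-empty bit-string prefix."""
    n = len(prefix)
    value = int(prefix, 2) << (width - n)    # prefix bits aligned to the left
    care = ((1 << n) - 1) << (width - n)     # 1 = bit takes part in the compare
    return prefix, value, care


def longest_prefix_match(prefixes: list[str], address: int, width: int = 32) -> Optional[str]:
    """Return the longest stored prefix that matches the address, if any."""
    # Longest prefixes first, so the first hit plays the role of the
    # highest-priority match line.
    table = sorted((prefix_entry(p, width) for p in prefixes),
                   key=lambda entry: -len(entry[0]))
    for prefix, value, care in table:
        if (address & care) == value:
            return prefix
    return None


PREFIXES = ["10", "1011", "101101", "1101"]
print(longest_prefix_match(PREFIXES, 0b10111000, width=8))  # 1011
print(longest_prefix_match(PREFIXES, 0b01000000, width=8))  # None
```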