Maseeh College of Engineering and Computer Science 5/23/10 1
Inference at the Nano-scale
Dan Hammerstrom, Electrical and Computer Engineering
http://web.cecs.pdx.edu/~strom/talks/talks.html
What are the architectures of the processing components of the iKNOW?
Moore of the same?
Radical new devices and circuits create opportunities for radical new architectures and solutions
We Do Not Know How to Program Computers To Behave Intelligently – A Problem for the iKNOW Machine
In spite of the transistor bounty of Moore’s law, there is a large class of problems that computers still do not solve well
These problems involve capturing complex, structured relationships in the huge quantities of noisy, ambiguous data being input to the system
Our inability to adequately solve these problems constitutes a significant barrier to computer usage and to huge potential markets
The term Intelligent Signal Processing (ISP) has been used to describe algorithms and techniques that involve the creation, efficient representation, and effective utilization of large, complex models of semantic and syntactic data
A Prototypical Model of Intelligent Computing
Although an over-simplification, this diagram characterizes most intelligent computing implementations
[Diagram: Front End Signal Processing → Feature Extraction → Classification → Contextual Semantic Inference → Decision Making → Motor Control Programs/Subprograms. The “Front End” is DSP; the “Back End” is ISP.]
The Front End
Well understood, it is the realm of traditional digital signal and image processing
Often consists of applying the same computation over large arrays of elements; computation then tends to be data parallel, and communication tends to be local
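As an illustration (not from the talk), the front end’s “same computation over large arrays” pattern can be sketched as a small averaging filter slid across an image; each output depends only on a local neighborhood, so the work is data parallel and communication is local. The image and kernel here are arbitrary examples.

```python
import numpy as np

def smooth(image, kernel):
    """Apply the same small kernel at every position (data parallel:
    each output depends only on a local neighborhood of the input)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
box = np.ones((3, 3)) / 9.0          # 3x3 averaging filter
result = smooth(image, box)
print(result.shape)                  # (3, 3)
```

Because every output element is computed independently, the two loops could run fully in parallel on array hardware.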
The Back End
In the early days of computing, “Artificial Intelligence” focused on the representation and use of contextual and semantic information
Knowledge was generally represented by a set of rules
However, these systems were “brittle,” exhibiting limited flexibility, generalization, and graceful degradation
And they were “hand” made and unable to adapt dynamically (i.e., learn) within the context of most real-world applications
What Does The ISP (The “Back End”) Do?
The ISP Toolbox – Still mostly empty after all these years …
Conceptually one can think of “computational intelligence” as a spectrum
Where does iKNOW need to be on this spectrum?
[Diagram: a spectrum of increasing intelligence, from ISP through small “c” cognition (most mammals) to big “C” Cognition (humans). Computing is currently near the ISP end – a long way from our goal, and an even longer way from human cognition.]
Desirable Characteristics Of New ISP Algorithms
Efficiently represent and utilize high level knowledge
Low power, massively parallel, low precision implementations
Scaling - the scaling limitations of both symbolic and traditional neural network approaches constitute one of their major shortcomings
Adaptive – a critically important characteristic of real systems is incremental, integrative adaptation or learning during system operation
Self-organizing – creates knowledge structures automatically
Bayesian techniques do a lot of what we want
Bayesian networks express structured, graphical representations of the probabilistic relationships among several random variables
But they don’t scale well, generating the network is mostly done by hand, and exact inference is NP-hard
One promising solution is to use distributed data representations, where each coding unit participates in multiple distinct representations
A representation has a more “statistical” aspect to it by virtue of the ensemble of vectors in its representation
Fault tolerance, graceful degradation, incremental learning ...
So how do we build Bayesian networks with distributed representations?
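To make the Bayesian-network idea concrete, here is a minimal (hypothetical) two-variable network X → Y with binary states and made-up probability tables; on a graph this small, belief propagation reduces to Bayes’ rule.

```python
import numpy as np

# Hypothetical network X -> Y, binary states; all numbers are illustrative.
p_x = np.array([0.7, 0.3])                 # prior P(X)
p_y_given_x = np.array([[0.9, 0.1],        # P(Y | X=0)
                        [0.2, 0.8]])       # P(Y | X=1)

def posterior_x(y_obs):
    """Exact inference by Bayes' rule: P(X | Y=y) is proportional
    to P(Y=y | X) * P(X)."""
    unnorm = p_y_given_x[:, y_obs] * p_x
    return unnorm / unnorm.sum()

print(posterior_x(1))   # observing Y=1 shifts belief toward X=1
```

Exact inference stays cheap only while the graph is small and tree-like; for general networks it is NP-hard, which is the scaling problem noted above.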
A “Bayesian Memory”
Part A is a maximum-entropy, dimension-reducing, vector-quantizer-like structure
Self-supervised
Part B implements Bayesian Belief Propagation or some approximation
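A rough sketch of what Part A could look like, assuming a simple online, winner-take-all vector quantizer (the class name, sizes, and learning rate are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

class OnlineVQ:
    """Sketch of Part A: a winner-take-all vector quantizer whose
    codebook adapts incrementally (self-supervised)."""
    def __init__(self, n_codes, dim, lr=0.1):
        self.codebook = rng.normal(size=(n_codes, dim))
        self.lr = lr

    def step(self, x):
        # Winner-take-all: the nearest codeword wins ...
        dists = np.linalg.norm(self.codebook - x, axis=1)
        w = int(np.argmin(dists))
        # ... and moves toward the input (incremental learning).
        self.codebook[w] += self.lr * (x - self.codebook[w])
        return w   # the winner index is the reduced-dimension code

vq = OnlineVQ(n_codes=4, dim=8)
for _ in range(200):
    vq.step(rng.normal(size=8))
```

The winner index is a heavy dimension reduction (8 continuous values down to one of 4 codes), and the winner-take-all competition is the kind of self-organization the later slides return to.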
“Bayesian Memory” (BM) As A Building Block
Representation distribution results from partially overlapping connectivity
The network creates a spatial and temporal hierarchy – space and time dilate as one ascends the hierarchy
Inference requires as many backward paths as forward paths
Heavily based on the work of others, especially Hierarchical Temporal Memory (HTM, Dileep George and Jeff Hawkins at Numenta)
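The forward/backward message flow between BM nodes can be sketched with a hypothetical parent node and two children, each holding a discrete belief over three codes (the conditional probability table below is invented for illustration):

```python
import numpy as np

# Hypothetical fragment: one parent BM node with two child BM nodes.
# Row i, column j holds P(child = j | parent = i); shared by both children.
p_child_given_parent = np.array([[0.8, 0.1, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.1, 0.1, 0.8]])

def upward(lambda_child):
    """Forward path: a child's likelihood message to the parent."""
    return p_child_given_parent @ lambda_child

def downward(parent_belief):
    """Backward path: the parent's belief shapes a child's prior."""
    return p_child_given_parent.T @ parent_belief

lam1 = np.array([0.9, 0.05, 0.05])   # evidence at child 1
lam2 = np.array([0.1, 0.1, 0.8])     # evidence at child 2
parent = upward(lam1) * upward(lam2) # combine forward messages
parent /= parent.sum()
prior_for_child1 = downward(parent)  # feedback to child 1
```

Each link carries one message up and one message down, which is why a hierarchy of BM nodes needs as many backward paths as forward paths.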
[Diagram: a 4×4 array of BM building blocks arranged in a hierarchy]
From Big Brain by Gary Lynch and Rick Granger (Palgrave Macmillan, 2008):
“… the ‘front end’ circuits of the brain … specialize in their own particular visual and auditory inputs, the rest of the brain converts these to random-access encodings in association areas throughout cortex. … these areas take initial sensory information and construct grammars
“These are not grammars of linguistic elements, they are grammatical organizations (nested, hierarchical sequences of categories) of percepts – visual, auditory, …
“Processing proceeds by incrementally assembling these constructs … these grammars generate successively larger ‘proto-grammatical fragments,’ eventually constituting full grammars
“They thus are not built in the manner of most hand-made grammars; they are statistically assembled, coming to exhibit rule-like behavior of the kind expected for linguistic grammars”
Bayesian Memory
Such self-organization requires incremental learning and some form of competition (“winner take all”)
We know how to do this for spatial patterns with sub-threshold circuits and with nano-structures
For example, research under the DARPA SyNAPSE program suggests that certain configurations of memristors are capable of complex spatial learning at very small dimensions
But how to do this through time?
We are investigating memcapacitance as one possible mechanism for adding temporal information to VQ spatial templates
Time-Dependent Learning with the Memristor
[Figure: a biological synapse alongside a memristor-based synapse (Lu group, University of Michigan)]
Spike-timing-dependent plasticity – when two neurons fire together, they wire together.
G. Q. Bi and M. M. Poo, Annual Review of Neuroscience, 24, 139–166, 2001.
Jo et al., in preparation.
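The pair-based STDP rule behind the figure can be sketched as an exponential learning window; the amplitudes and time constant below are illustrative placeholders, not values from the cited work.

```python
import math

def stdp_dw(dt, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Pair-based STDP window (illustrative parameters).
    dt = t_post - t_pre in ms: if the presynaptic spike precedes the
    postsynaptic spike (dt > 0) the synapse strengthens; if it
    follows (dt < 0) the synapse weakens."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # potentiation
    elif dt < 0:
        return -a_minus * math.exp(dt / tau)   # depression
    return 0.0

print(stdp_dw(5.0), stdp_dw(-5.0))   # positive, then negative
```

A memristor approximates this because its conductance change depends on the overlap and timing of the voltage pulses applied across it.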
We are assuming that the computation model is implemented directly in the device and not indirectly via more traditional digital emulation
Exploit the massive parallelism and the analog, time-dependent, non-linear dynamics that many potential nano and molecular devices offer
More intimate mixing of storage and logic, move the data to the computation
But Wiring Algorithms Into Silicon Must be Done Very Carefully!!
[Chart: flexibility versus performance/price – general-purpose processors sit at the flexible, low-efficiency end; algorithms wired into silicon sit at the high-efficiency, inflexible end, with a technology barrier between them]
Amdahl is always waiting to ambush the unwary!
We Don’t Brake For Alternate Architectures
[Cartoon: the Intel silicon steamroller bearing down on alternate architectures]
Highly Optimized Commercial Silicon, Using State of the Art Process Technology And Generic Processor Architectures, Is Hard to Beat!
These structures lead to non-von Neumann architectures that have the potential to implement an important computation with high density and low power
In reality these processors will probably manifest themselves as heterogeneous “morphic” cores
It is not just how many operations per watt; it is what those operations are doing and how they contribute to the desired functionality of the system
Many believe that probabilistic inference will become an increasingly important “generic” computation
I believe that some kind of Inference Engine will be the microprocessor of the 21st Century
Our Immediate Goal – The Field Adaptable Bayesian Array (FABA) – A Potential iKNOW Building Block
[Diagram: an array of nanoscale mixed-signal VQ modules]
Each square is a single Bayesian Memory node
CMOS provides sparse inter-module connectivity, I/O, and signal amplification
Thousands of nodes
Bayesian Memory Inside!