Maseeh College of Engineering and Computer Science 5/23/10 1
Inference at the Nano-scale
Dan Hammerstrom, Electrical and Computer Engineering
http://web.cecs.pdx.edu/~strom/talks/talks.html
What are the architectures of the processing components of the iKNOW?
Moore of the same?
Radical new devices and circuits create opportunities for radical new architectures and solutions
We Do Not Know How to Program Computers To Behave Intelligently – A Problem for the iKNOW Machine
In spite of the transistor bounty of Moore’s law, there is a large class of problems that computers still do not solve well
These problems involve capturing complex, structured relationships in the huge quantities of noisy, ambiguous data being input to the system
Our inability to adequately solve these problems constitutes a significant barrier to computer usage and to huge potential markets
The term Intelligent Signal Processing (ISP) has been used to describe algorithms and techniques that involve the creation, efficient representation, and effective utilization of large, complex models of semantic and syntactic data
A Prototypical Model of Intelligent Computing
Although an over-simplification, this diagram characterizes most intelligent computing implementations
[Diagram: Front End Signal Processing → Feature Extraction → Classification → Contextual Semantic Inference → Decision Making → Motor Control Programs/Subprograms. The “Front End” is DSP; the “Back End” is ISP.]
The Front End
Well understood, it is the realm of traditional digital signal and image processing
Often consists of applying the same computation over large arrays of elements; computation then tends to be data parallel, and communication tends to be local
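As an illustration (not from the talk), the front end’s “same computation over large arrays” pattern can be sketched as a small averaging filter slid across an image; each output depends only on a local neighborhood, so the work is data parallel and communication is local. The image and kernel here are arbitrary examples.

```python
import numpy as np

def smooth(image, kernel):
    """Apply the same small kernel at every position (data parallel:
    each output depends only on a local neighborhood of the input)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
box = np.ones((3, 3)) / 9.0          # 3x3 averaging filter
result = smooth(image, box)
print(result.shape)                  # (3, 3)
```

Because every output element is computed independently, the two loops could run fully in parallel on array hardware.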
The Back End
In the early days of computing, “Artificial Intelligence” focused on the representation and use of contextual and semantic information
Knowledge was generally represented by a set of rules
However, these systems were “brittle,” exhibiting limited flexibility, generalization, and graceful degradation
And they were “hand” made and unable to adapt dynamically (i.e., learn) within the context of most real-world applications
What Does The ISP (The “Back End”) Do?
The ISP Toolbox – Still mostly empty after all these years …
Conceptually one can think of “computational intelligence” as a spectrum
Where does iKNOW need to be on this spectrum?
[Diagram: a spectrum of increasing intelligence, from ISP through small “c” cognition (most mammals) to big “C” Cognition (humans). Computing is currently near the ISP end – a long way from our goal, and an even longer way from human cognition.]
Desirable Characteristics Of New ISP Algorithms
Efficiently represent and utilize high level knowledge
Low power, massively parallel, low precision implementations
Scaling - the scaling limitations of both symbolic and traditional neural network approaches constitute one of their major shortcomings
Adaptive – a critically important characteristic of real systems is incremental, integrative adaptation or learning during system operation
Self-organizing – creates knowledge structures automatically
Bayesian techniques do a lot of what we want
Bayesian networks express structured, graphical representations of the probabilistic relationships among several random variables
But they don’t scale well, generating the network is mostly done by hand, and exact inference is NP-hard
One promising solution is to use distributed data representations, where each coding unit participates in multiple distinct representations
A representation has a more “statistical” aspect to it by virtue of the ensemble of vectors in its representation
Fault tolerance, graceful degradation, incremental learning ...
So how do we build Bayesian networks with distributed representations?
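To make the Bayesian-network idea concrete, here is a minimal (hypothetical) two-variable network X → Y with binary states and made-up probability tables; on a graph this small, belief propagation reduces to Bayes’ rule.

```python
import numpy as np

# Hypothetical network X -> Y, binary states; all numbers are illustrative.
p_x = np.array([0.7, 0.3])                 # prior P(X)
p_y_given_x = np.array([[0.9, 0.1],        # P(Y | X=0)
                        [0.2, 0.8]])       # P(Y | X=1)

def posterior_x(y_obs):
    """Exact inference by Bayes' rule: P(X | Y=y) is proportional
    to P(Y=y | X) * P(X)."""
    unnorm = p_y_given_x[:, y_obs] * p_x
    return unnorm / unnorm.sum()

print(posterior_x(1))   # observing Y=1 shifts belief toward X=1
```

Exact inference stays cheap only while the graph is small and tree-like; for general networks it is NP-hard, which is the scaling problem noted above.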
A “Bayesian Memory”
Part A is a maximum-entropy, dimension-reducing, vector-quantizer-like structure
Self-supervised
Part B implements Bayesian Belief Propagation or some approximation
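A rough sketch of what Part A could look like, assuming a simple online, winner-take-all vector quantizer (the class name, sizes, and learning rate are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

class OnlineVQ:
    """Sketch of Part A: a winner-take-all vector quantizer whose
    codebook adapts incrementally (self-supervised)."""
    def __init__(self, n_codes, dim, lr=0.1):
        self.codebook = rng.normal(size=(n_codes, dim))
        self.lr = lr

    def step(self, x):
        # Winner-take-all: the nearest codeword wins ...
        dists = np.linalg.norm(self.codebook - x, axis=1)
        w = int(np.argmin(dists))
        # ... and moves toward the input (incremental learning).
        self.codebook[w] += self.lr * (x - self.codebook[w])
        return w   # the winner index is the reduced-dimension code

vq = OnlineVQ(n_codes=4, dim=8)
for _ in range(200):
    vq.step(rng.normal(size=8))
```

The winner index is a heavy dimension reduction (8 continuous values down to one of 4 codes), and the winner-take-all competition is the kind of self-organization the later slides return to.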
“Bayesian Memory” (BM) As A Building Block
Representation distribution results from partially overlapping connectivity
The network creates a spatial and temporal hierarchy – space and time dilate as one ascends the hierarchy
Inference requires as many backward paths as forward paths
Heavily based on the work of others, especially Hierarchical Temporal Memory (HTM, Dileep George and Jeff Hawkins at Numenta)
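The forward/backward message flow between BM nodes can be sketched with a hypothetical parent node and two children, each holding a discrete belief over three codes (the conditional probability table below is invented for illustration):

```python
import numpy as np

# Hypothetical fragment: one parent BM node with two child BM nodes.
# Row i, column j holds P(child = j | parent = i); shared by both children.
p_child_given_parent = np.array([[0.8, 0.1, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.1, 0.1, 0.8]])

def upward(lambda_child):
    """Forward path: a child's likelihood message to the parent."""
    return p_child_given_parent @ lambda_child

def downward(parent_belief):
    """Backward path: the parent's belief shapes a child's prior."""
    return p_child_given_parent.T @ parent_belief

lam1 = np.array([0.9, 0.05, 0.05])   # evidence at child 1
lam2 = np.array([0.1, 0.1, 0.8])     # evidence at child 2
parent = upward(lam1) * upward(lam2) # combine forward messages
parent /= parent.sum()
prior_for_child1 = downward(parent)  # feedback to child 1
```

Each link carries one message up and one message down, which is why a hierarchy of BM nodes needs as many backward paths as forward paths.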
[Diagram: a 4×4 array of BM building blocks arranged in a hierarchy]
From Big Brain by Gary Lynch and Rick Granger (Palgrave Macmillan, 2008):
“… the ‘front end’ circuits of the brain … specialize in their own particular visual and auditory inputs, the rest of the brain converts these to random-access encodings in association areas throughout cortex. … these areas take initial sensory information and construct grammars
“These are not grammars of linguistic elements, they are grammatical organizations (nested, hierarchical sequences of categories) of percepts – visual, auditory, …
“Processing proceeds by incrementally assembling these constructs … these grammars generate successively larger ‘proto-grammatical fragments,’ eventually constituting full grammars
“They thus are not built in the manner of most hand-made grammars; they are statistically assembled, coming to exhibit rule-like behavior of the kind expected for linguistic grammars”
Bayesian Memory
Such self-organization requires incremental learning and some form of competition (“winner take all”)
We know how to do this for spatial patterns with sub-threshold circuits and with nano-structures
For example, research under the DARPA SyNAPSE program suggests that certain configurations of memristors are capable of complex spatial learning at very small dimensions
But how to do this through time?
We are investigating memcapacitance as one possible mechanism for adding temporal information to VQ spatial templates
Time-Dependent Learning with the Memristor
[Figure: a biological synapse alongside a memristor-based synapse (Lu group, University of Michigan)]
Spike-timing-dependent plasticity – when two neurons fire together, they wire together.
G. Q. Bi and M. M. Poo, Annual Review of Neuroscience, 24, 139–166, 2001.
Jo et al., in preparation.
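The pair-based STDP rule behind the figure can be sketched as an exponential learning window; the amplitudes and time constant below are illustrative placeholders, not values from the cited work.

```python
import math

def stdp_dw(dt, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Pair-based STDP window (illustrative parameters).
    dt = t_post - t_pre in ms: if the presynaptic spike precedes the
    postsynaptic spike (dt > 0) the synapse strengthens; if it
    follows (dt < 0) the synapse weakens."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # potentiation
    elif dt < 0:
        return -a_minus * math.exp(dt / tau)   # depression
    return 0.0

print(stdp_dw(5.0), stdp_dw(-5.0))   # positive, then negative
```

A memristor approximates this because its conductance change depends on the overlap and timing of the voltage pulses applied across it.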
We are assuming that the computation model is implemented directly in the device and not indirectly via more traditional digital emulation
Exploit the massive parallelism and the analog, time-dependent, non-linear dynamics that many potential nano and molecular devices offer
More intimate mixing of storage and logic, move the data to the computation
But Wiring Algorithms Into Silicon Must be Done Very Carefully!!
[Chart: flexibility versus performance/price – general-purpose processors sit at the flexible, low-efficiency end; algorithms wired into silicon sit at the high-efficiency, inflexible end, with a technology barrier between them]
Amdahl is always waiting to ambush the unwary!
We Don’t Brake For Alternate Architectures
[Cartoon: the Intel silicon steamroller bearing down on alternate architectures]
Highly Optimized Commercial Silicon, Using State of the Art Process Technology And Generic Processor Architectures, Is Hard to Beat!
These structures lead to non-von Neumann architectures that have the potential to implement an important computation with high density and low power
In reality these processors will probably manifest themselves as heterogeneous “morphic” cores
It is not just how many operations per watt; it is what those operations are doing and how they contribute to the desired functionality of the system
Many believe that probabilistic inference will become an increasingly important “generic” computation
I believe that some kind of Inference Engine will be the microprocessor of the 21st Century
Our Immediate Goal – The Field Adaptable Bayesian Array (FABA) – A Potential iKNOW Building Block
[Diagram: an array of nanoscale mixed-signal VQ modules]
Each square is a single Bayesian Memory node
CMOS provides sparse inter-module connectivity, I/O, and signal amplification
Thousands of nodes
Bayesian Memory Inside!