
Cognitive computing with non-volatile memory devices

Master Thesis

A. Fumarola

30th August 2016

Prof. Y. Leblebici †

Prof. C. F. Pirri ‡, Prof. C. Ricciardi ‡

Dr. G. W. Burr §, Dr. P. Narayanan §

†School of Engineering (STI), Institute of Electrical Engineering (IEL), EPFL Lausanne
‡Department of Applied Science and Technology (DISAT), Politecnico di Torino
§Science and Technology Department, Almaden Research Center, IBM Research

Abstract

The extreme flexibility of digital circuits has allowed modern processors based on the Von Neumann architecture not only to efficiently implement algorithms for a wide variety of problems, but to consistently improve system performance at an exponential rate. However, with continued aggressive device scaling constrained by power and voltage considerations, the time and energy spent transporting data between memory and processor (across the so-called "Von Neumann bottleneck") have become problematic for data-centric applications such as real-time image recognition and natural language processing.

One example of Non-Von Neumann (Non-VN) computing is the human brain. Characterized by its massively parallel architecture and adaptive elements (e.g. its synapses), the brain can outperform modern processors on many tasks involving unstructured data classification and pattern recognition. Artificial Neural Networks (ANNs), first conceived in the mid-1940s to mimic what was then known about neural systems, perform computations in a naturally parallel fashion. Modern Graphical Processing Units (GPUs) have greatly increased both the size of the networks and the datasets that can be trained in reasonable time. In turn, this has commensurately improved classification performance to the point that these systems are now becoming commercially pervasive. In contrast to power-hungry GPUs, IBM's TrueNorth chip is a flexible and modular non-VN tool for implementing forward inference of large ANNs at ultra-low power. Synaptic weights are typically trained off-line and transferred onto digital SRAM arrays to perform forward propagation of large and complex ANNs.

One path for extending such Non-VN systems towards full on-chip learning – and thus to provide accelerated ANN training at lower power than GPUs – is to move from reliable but binary SRAM devices to dense and analog (but less reliable) Non-Volatile Memory (NVM). Such research has been explored here at IBM Research-Almaden for the past several years, including a mixed hardware-software demonstration of a three-layer feedforward fully-connected ANN with 165 000 synapses. However, these experiments have been built around unidirectional phase-change memory (PCM). Because PCM only offers incremental conductance change in one direction, ANN training must be paused periodically and every single conductance measured and adjusted, which is inherently inefficient.

In my internship work, I used our customized ANN simulator to study the potential of bidirectional ReRAM devices based on Pr_xCa_{1-x}MnO_3 ("PCMO") materials for such on-chip learning applications. Real device data from our collaborators in Korea was used to obtain highly representative simulations of expected ANN performance. I obtained high classification accuracy for several different weight-update schemes and synaptic architectures. Several improvements over our previously published implementations will be discussed.


Sommario

The extreme flexibility of digital circuits has allowed today's processors, based on the Von Neumann architecture, not only to implement algorithms for a wide variety of problems, but also to constantly improve performance at an exponential rate. However, with the aggressive reduction of electronic device sizes subject to power and voltage limits, the time and energy spent transporting data from memory to processor (across the so-called "Von Neumann bottleneck") have become problematic for data-intensive applications, such as real-time image recognition or natural language analysis.

One example of a Non-Von Neumann (Non-VN) architecture is the human brain. Characterized by its extremely parallelized architecture and by plastic elements (e.g. the synapses), the brain can outperform modern processors in tasks such as the classification of unstructured data or the recognition of abstract patterns. Artificial Neural Networks, conceived in the mid-1940s to imitate what was then known about the nervous system, can perform computations in a naturally parallel way. Modern graphics processors (GPUs) have greatly increased the size of the networks and of the datasets that can be simulated in a reasonable time. Moreover, this has immensely improved classification performance, to the point that these systems are becoming commercially advantageous. IBM's TrueNorth chip is a flexible and modular non-VN system for implementing large neural networks which, unlike graphics processors, has an extremely low power consumption. The weights of the synaptic connections are trained separately and transferred onto an array of SRAM cells to perform forward evaluation of complex neural networks.

One path for extending these non-VN systems towards real-time learning – and thus providing training at lower power than GPUs – is to shift attention from SRAM (reliable but binary) to dense non-volatile memories (analog but less reliable). This field has been explored at IBM over recent years, including a hardware-software demonstration of a complex three-layer perceptron with 165 000 synapses. However, that experiment was built with unidirectional devices called phase-change memories (PCM). Since PCM offers only an increase of conductance (but not a decrease), the training of the neural network must be periodically halted and every single synapse measured and re-adjusted, an operation that is inherently inefficient.

In my internship work, I used our neural network simulator to study the application potential of bidirectional resistive memories based on Pr_xCa_{1-x}MnO_3 ("PCMO") for this on-chip learning. Experimental data from our collaborators in Korea were used to obtain highly faithful simulations of the expected neural network performance.


High classification accuracy was obtained for several learning methods and synaptic architectures. Many improvements over the group's previously published work will be discussed.


Résumé

Most modern processors are based on the Von Neumann architecture and its variants. The immense flexibility of digital circuits, coupled with the acceleration of processors and memory modules (RAM), has made it possible to efficiently implement algorithms dedicated to a wide variety of problems, and has brought significant performance improvements over recent decades. However, the aggressive scaling of circuits is becoming increasingly difficult, and nowadays the speed of interconnects sets the limits of the highest-end processors. Data must be transferred repeatedly from memory to processor, causing losses of time and energy that become critical in applications handling large quantities of data. This phenomenon, often called the Von Neumann bottleneck, has limited the development of popular solutions to this class of problems, such as real-time image recognition or natural language processing.

One example of a non-Von Neumann computer is the human brain. Characterized by a massively parallel architecture and plastic elements (such as the synapses), it can surpass modern processors in many tasks involving the classification of unstructured data and pattern recognition. Artificial neural networks (ANNs), conceived in the mid-1940s with the aim of imitating certain functionalities of neural systems, perform computations in a naturally parallel way and have been investigated as a viable alternative to conventional information processing. However, most ANN computations exploit graphics processors (GPUs) or large CPU clusters, whose inherently sequential structure limits the efficiency of training and restricts implementations to relatively small networks. A notable example of hardware dedicated specifically to ANNs is IBM's TrueNorth chip, whose scalability, extremely low energy consumption and modularity represent the state of the art in terms of functionality. The synaptic weights are trained off-line and transferred onto a digital SRAM array in order to perform forward propagation.

In this thesis, the work presented in [8, 5, 7] is revised and extended. A large three-layer perceptron (916 neurons and 165 000 synapses) is trained on-line and implemented using crossbar arrays composed of non-volatile memories (NVMs) as synaptic elements. While previous experiments and simulations were based on unidirectional phase-change memories (PCMs), the results shown here concern analog bidirectional (less reliable) ReRAMs based on PCMO materials. High classification accuracy (up to 93% on the training set and 90% on the test set) has been demonstrated for different types of weight update and synaptic architectures.


Several advantages over the implementations described in previous publications have been identified. In parallel with this work, which focuses on crossbar-compatible devices, the other part of the research group is involved in the design of neuromorphic peripheral circuits to implement on-chip backpropagation. The area and speed requirements for obtaining competitive performance have been taken into account.


Acknowledgement

Firstly, I would like to thank Prof. Leblebici, Prof. Pirri and Dr. Bozano for their efforts to make this internship possible. IBM Almaden is a truly inspiring place, and every moment spent in Silicon Valley was precious. Without any one of them, none of this would have been possible.

Thanks to Dr. G. Burr for his constant work as a scientist/mentor/sage. I have never seen anyone excel in all of these roles as he does. Dr. P. Narayanan and Dr. R. Shelby, thank you for your 'copious spare time'. The support of three great people like you was the pivot of my days in Almaden.

To Giulia and my family, who, even from far away, always took the time and effort to listen to me, even when I did not deserve it. Your existence is very precious to me, and I am glad that mine means the same to you.

"If I had one penny for every time that my office mates were complaining, now I would be a rich man" – Anon.
No need to add anything: Alexis, Lorenzo, Emeline and Vincent.

I acknowledge Carmelo, Irem and Severin for the previous work that theycarried out on the same subject. That helped me a lot during my stay inAlmaden.

To all the folks at IBM, always ready to cheer up my day: all the SPINAPSers in my corridor, the Nanoscale Fabrication Group and the guys of the database group.

Last but not least, thanks to Prof. Eros Pasero, whose advice during my bachelor's degree sparked my interest in neuromorphic computing.


Contents

1 Introduction
2 Beyond Von Neumann Computing
   2.1 The problem of computation
   2.2 The end of Moore's Law?
   2.3 The Big-Data era and the "Cognitive" shift
3 Artificial Neural Networks
   3.1 The Biological Inspiration
   3.2 The Perceptron
   3.3 The Backpropagation Algorithm
4 Non Volatile Memories
   4.1 Phase-Change Memory
   4.2 PCMO based devices
      4.2.1 Fabrication Process
      4.2.2 Resistive Switching in PCMO devices
      4.2.3 Improvements from previous devices
5 Hardware Neural Networks with NVMs as Synaptic Device
   5.1 Literature Review
      5.1.1 Analog MAC Operation
   5.2 Crossbar-compatible Weight Update
   5.3 The G-diamond Plot
   5.4 Jump Table Concept
6 Simulated PCMO Performance Results
   6.1 Jumpsize disparity in AlMo/PCMO RRAM
   6.2 Asymmetric Conductance Response
   6.3 Ideal Bidirectional NVM Performance
   6.4 Single Bidirectional NVM
7 Conclusion
Bibliography

Chapter 1

Introduction

The work presented herein is the result of a six-month internship carried out at IBM Research - Almaden in San Jose, California. IBM Research is one of the many companies heavily investing in alternative computation and cognitive systems. One of the big results of this research is, for example, Watson: a platform exploiting novel algorithms capable of naturally dealing with tasks that are usually very difficult for computers (e.g. context-aware search, language processing). One demonstration of its capabilities was given in the last years of its development, when Watson was made able to compete with humans (and win on a regular basis) in the game Jeopardy!. In order to beat a Jeopardy! champion, Watson had to show several skills, including natural language processing, hypothesis generation and uncertainty weighting.

Watson is just one side of the cognitive world that IBM is exploring: the cognitive revolution aims to change our lives and has the potential to become the next game changer. The Machine Intelligence group is exploring the field in different directions. A big share of the researchers is working on neo-cortical and context-aware learning; both the core algorithms and potential applications have been studied. A smaller part of the group is dealing with neuromorphic hardware, which comprises a set of novel architectures and devices able to efficiently implement those algorithms.

This thesis is part of a research effort focused on the creation of an analog mixed-signal VLSI system capable of accelerating machine learning algorithms. The system is based on a crossbar array of non-volatile memory devices with neuromorphic peripheral circuitry (whose function and description are not included in this work). Backpropagation-trained artificial neural networks (described in Chap. 3), due to their vast use in industry, represent the main target. This thesis is divided into seven chapters:

— Chapter 1 is an introduction to contextualize the work carried out;
— Chapter 2 defines and describes the (Non-)Von Neumann architecture, pointing out some of the reasons for its success and the issues that arose with the new cognitive approach;
— Chapter 3 provides some information about artificial neural networks, such as their mathematical structure and characteristic equations;
— Chapter 4 reviews some types of non-volatile memory (NVM), the devices used to build our system;
— Chapter 5 explains the working principle of accelerating machine learning with NVMs;
— Chapter 6 provides the simulated results of a benchmark problem;
— Chapter 7 concludes with what has been achieved in this work and the future perspectives of the project.

Part of this project was funded by the Research Frontiers Institute (RFI), a consortium of major companies (e.g. Samsung, Honda) which aims to share IBM expertise in cutting-edge research for faster integration into the industrial world. Funding of up to 6 million dollars over three years has been granted to this and 9 other selected projects.


Chapter 2

Beyond Von Neumann Computing

2.1 The problem of computation

The Von Neumann architecture is the predominant computing paradigm in modern processors. The flexibility of digital circuits, together with the complex high-level capabilities of the Von Neumann architecture, are two of the ingredients of its success in the computing industry.

However, different approaches to the problem of computation were studied and analysed in the early stages of research. During the 1930s and 1940s, an intense research activity was carried out by both mathematicians and information engineers on the theoretical and then practical aspects of computation [29].

2.2 The end of Moore’s Law?

Dennard's scaling law, a set of voltage and power considerations applied when reducing the physical size of transistors, helped to keep the pace set by a 50-year-old observation made by Gordon Moore about the capabilities of electronic systems [10]. The feature sizes achievable with advanced lithographic techniques and new semiconductor fabrication processes enabled digital processors to consistently become more powerful over the last decades. A set of processes and techniques capable of reaching a typical feature size (the minimum achievable distance within the system) is called a technology node. The interest in scaling down transistor dimensions arises since smaller devices can switch at higher frequency, thus improving computing performance.

The latest nodes have feature sizes, e.g. the channel length of transistors, of less than 20 nm (Samsung and Intel processes), where several physical phenomena undermine the electrostatic control of the gate over the channel. Generally, we refer to such effects as short-channel effects (SCEs).



Figure 2.1 – (a) While the speed of CPU and memory components is increasing together with the power consumption (closely following the prediction of Moore's law), the bandwidth of the bus connecting CPU and memory has become the limiting factor in Von Neumann architectures (b). In one of the proposed "distributed architecture" implementations (c), computation is done AT the data, limiting the amount of information transported during processing. From [21].

When aggressively scaling down the device size, other problems emerged: the bandwidth of interconnections quickly became the limiting factor to further increases in the speed of silicon chips [32]. In the Von Neumann architecture, which intrinsically requires data to be transferred back and forth between the memory and the computing unit [38], this is a critical limitation. In particular, for data-heavy applications such as unstructured data classification, pattern recognition and natural language processing, this so-called Von Neumann bottleneck has been recognized as a main antagonist to the development of efficient algorithms [21].

2.3 The Big-Data era and the ”Cognitive” shift

Information has always been a key element in the development of society. Since the beginning of civilization, restless efforts have been made to find efficient ways of storing and transmitting data. Spoken languages came first, as a development of the primitive verses and sounds that humans could produce, with our own brains serving as the storage element for the content conveyed. However, since minds are often subject to distortion and unpredictable loss of information, more reliable systems needed to be developed.



Figure 2.2 – Since the beginning of the digital era, the amount of data produced daily in the world has increased exponentially. The quantity of uncertain information (red line in the figure), comprising the unstructured data produced by millions of devices, is growing even faster. Currently, there is no established solution for effectively processing such an enormous amount of information.

The first written historical testimonies were graphic representations of human activities. Since then, written communication has deeply evolved and undergone several transformations, but it has remained the principal means of transmitting information among humans. A new revolutionary breakpoint was reached with the flourishing of the digital era, which enabled information, regardless of its content, to be stored as binary bits. In the last decade most data has been produced in this form, coming from highly diversified sources such as sensor measurements and mobile device data. Most of this information, unfortunately, cannot be analyzed in a straightforward way. A lot of effort has been put into research to tackle this problem, and a definitive solution is still far from being found.

Alternative computing paradigms have been explored in order to allow faster data processing. Artificial neural networks (ANNs) are one of these. ANNs are adaptive models that can be used, for example, to classify input patterns or to approximate multivariate functions. A deeper insight into neural networks will be given in Chapter 3. The ensemble of these novel approaches to data analysis is generally referred to as cognitive or adaptive computing.



The interesting characteristics of such techniques are the massive parallelism of data processing, event-driven computation and their multi-dimensional structure.

Cognitive algorithms are very demanding in terms of computation when implemented on Von Neumann machines: in most cases, matrices and vectors with up to millions of elements need to be multiplied and summed in an efficient way to achieve competitive performance. This is the reason why most of the research is carried out on implementing these techniques on fast GPUs or CPU racks, which allow a high degree of parallelism. However, the inherently sequential Von Neumann architecture represents a severe limitation in the execution of such algorithms. The data used for training and testing (two important concepts that will be described in Chapter 3) needs to be transported repeatedly from the memory, where it is stored, to the CPU, where the computation is performed, in a sequential fashion. A lot of research has also been carried out to increase the bandwidth of the interconnections for graphics and data-intensive applications [12].

An example of a naturally 'cognitive' machine is the human brain. Despite having much slower fundamental building blocks (e.g. neurons and synapses), it can outperform cutting-edge digital microprocessors in several tasks involving, in particular, quick unstructured data sorting and classification. The fact that in the biological brain data is stored in the same place where computation is performed is one of the features that determines this great advantage.

Brain-inspired electrical systems are often called neuromorphic systems, after Prof. Mead's original textbook 'Analog VLSI and Neural Systems' (1989) and other early papers [20]. Several attempts at designing and fabricating such chips have been made, some of which generated considerable hype in the scientific world. An example is IBM's TrueNorth neuromorphic chip, comprising 5.4 billion transistors that simulate the activity of 1 million neurons and 256 million synapses. TrueNorth is capable of implementing large-scale neural networks in an efficient and low-power fashion. The secret ingredient of this system is the use of SRAM memory cells to implement the connections between the different neural units. Despite being (relatively) large and volatile, SRAMs offer extremely fast access times and consumer-grade operational reliability. All these features represent a milestone in neuromorphic computing, but there are still some issues to be addressed. From the perspective of the SyNAPSE team, training of the neural networks is still performed off-chip on the kind of systems mentioned in the previous paragraphs.

The aim of this work, as well as of the rest of the project, is to demonstrate that it is possible to build an analog VLSI hardware system with on-line machine learning capabilities. If deployed at large scale, such a system would be a game changer in the field of data analysis, allowing, for example, real-time adaptive filtering, clustering and predictive control.


Chapter 3

Artificial Neural Networks

In this chapter, some theoretical insight into Artificial Neural Networks (ANNs) will be given. Diving into the mathematical details of this topic is beyond the scope of this work, but some preliminary knowledge of the properties and of the working mechanisms might clarify some concepts expressed in the following chapters.

3.1 The Biological Inspiration

Neuroscience has always been one of the most active fields of research in medicine and biology. Even after 100 years of discoveries (crowned by several Nobel prize winners and outstanding scientific achievements), our knowledge of the nervous system is far from complete. Nowadays many institutes propose joint research programs in which neuroscience, mathematics, physics and engineering try to unravel, with collaborative efforts, the complex veil that covers the subject (e.g. the Institute of Neuroinformatics in Zurich).

Figure 3.1 – Structure of the McCulloch-Pitts (MCP) neuron. Inhibitory and excitatory inputs (x_i) are discriminated by their weight (w_i), which can be equal to 1 or −1. The model comprises a summer and a threshold unit. Similarities with the biological structure of neuronal cells include the presence of multiple inputs (dendrites) and only one output (axon) for every neuron.



Figure 3.2 – Schematic of a threshold logic unit. The values x_i and the weights w_i are not binary, unlike in the case of the MCP. The bias b (in Eq. 3.1) is represented as a weight associated with a constant input.


The first breakthrough discoveries in the field date back to the beginning of the 20th century, when the first Nobel prize for neuroscience (just six years after the institution of the prize itself) was awarded for the thorough classification work on neurons. The first neuronal model was introduced in 1943 by Warren McCulloch and Walter Pitts [30, 13]. In their work, the two researchers developed a simple mathematical formulation that could reproduce some functionality of the observed cells. McCulloch-Pitts neurons (MCPs) are logical units with multiple excitatory and inhibitory binary inputs and one binary output. If the sum of the excitatory inputs exceeds a threshold value θ, the neuron is 'activated' and its output value is set to 1. However, if any of the inhibitory inputs is 1, or if the sum of the excitatory inputs is lower than the threshold, the neuron is 'not active' and its output is 0. In Fig. 3.1 the structure of the MCP neuron is shown and compared to the actual structure of biological neurons. The basic logic operations were proven using threshold logic and, despite the simple functionality, it was shown that multiple MCPs connected in a network can solve more complex problems [19].
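To make the MCP behaviour concrete, here is a minimal Python sketch of such a unit; the function name and the AND-gate example are illustrative, not taken from the original papers:

```python
def mcp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit: binary inputs, binary output.

    Any active inhibitory input vetoes the neuron; otherwise the unit
    fires (output 1) iff the sum of excitatory inputs exceeds the threshold.
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) > threshold else 0

# A 2-input AND gate: fires only when both excitatory inputs are active
assert mcp_neuron([1, 1], [], threshold=1.5) == 1
assert mcp_neuron([1, 0], [], threshold=1.5) == 0
```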

3.2 The Perceptron

Following the work of McCulloch and Pitts, more complex types of neurons were developed. Threshold logic units (TLUs) share the same backbone structure as McCulloch neurons, with the binary weights and inputs replaced by analog quantities [31]. The output of the unit is still a binary value, and it is computed according to Eq. 3.1:

f(x) = \begin{cases} 1 & \text{if } \sum_{i=1}^{N} x_i w_i + b > \theta \\ -1 & \text{otherwise} \end{cases}   (3.1)

The schematic of a typical threshold logic unit is shown in Fig. 3.2. Understanding the potential of such logic units, Rosenblatt and his colleagues developed a logic around TLUs.



Figure 3.3 – Schematic view of a generic multi-layer perceptron (MLP). Neurons are arranged in layers (superscripts 'A', 'B' and 'C'). Every layer is fully connected to the next one. The first (input) layer receives data from the outside, while the others (hidden and output) process internal neuron values. Synaptic weights define the strength of the neuronal connections and are adaptive quantities (they can be adjusted to improve the network performance).

Several architectures were proposed, involving different connection schemes between the computing units. The mathematical description of such networks is complex, in particular when feedback connections are involved. For this reason, only the so-called feedforward perceptrons are taken into account in this work. From a mathematical point of view, perceptrons can perform a binary classification, that is, sorting the data in the input space into two classes. A useful way of visualizing the operation of a perceptron is to imagine the input data space being separated by a hyperplane, whose orientation is related to how the weights of the neuron connections are tuned.

3.3 The Backpropagation Algorithm

A brief historical overview of the precursors of modern neural networks was given in Sections 3.1 and 3.2. However, the implementation of neural networks used herein (which is also one of the most commonly used in many fields of research) makes use of tools derived from the models mentioned above, even if subtle differences are present. In this section, we will focus on the type of network analyzed in this context.

Artificial neural networks are constituted by two main components:

— (Artificial) Neurons: characterized by their output or activation value, which is a function of the received net input;
— Synapses: adaptive elements associated with a synaptic weight; they establish connections between one upstream and one downstream neuron.

Fig. 3.3 shows a sketch of a generic ANN. The two components need to interact during the operation of the neural network, and their representative



values are calculated with the following three rules:

— Network topology: the connectivity model used for a neural network. The most common topologies include fully connected networks, where all the neurons are interconnected, or layered networks, in which neurons are arranged in layers and connections within the same layer are prohibited. The network connectivity describes which neurons can contribute to the value of other neurons;

— Activity rule: the function that binds the output value of the neuron to its net input. The most widely used is a multiply-accumulate operation followed by a non-linear squashing function, in the form expressed by Eq. 3.2:

x_j^B = f\left( \sum_{i=1}^{N} x_i^A \cdot w_{ij} \right)   (3.2)

Common squashing functions are the hyperbolic tangent and the rectified linear unit (ReLU);

— Learning rule: the synaptic weights need to be adjusted to optimize the network performance. The possible approaches are divided into categories that will be described later in this section. Generally, a cost function is defined, and the weights are changed so as to minimize the value of this function.

The third item on the list is the one that has most attracted the curiosity of researchers, as it allows impressive results to be achieved when correctly defined.

Soon after the first models for neurons were proposed (Sec. 3.1), shedding some light on how these cells produce their signals, other theories were suggested regarding the adaptation of such systems [15]. After observations on living neural tissue, Hebb (1949) concluded that the synaptic weight (i.e. the connection strength) between two neurons is strengthened when their output values are correlated, and weakened otherwise. It has to be noted that, in this case, the change in weight of every conductance depends only on local information concerning the downstream and upstream neurons, not requiring prior knowledge about the task being performed. This kind of learning is known as unsupervised learning. Backpropagation, on the other hand, is a supervised learning method because, as will be pointed out in the next paragraph, it demands additional information about the input data (labels or ground truth). Semi-supervised or reinforcement learning is most widely used for adaptive filtering and control applications, and it represents an intermediate algorithm between the two previously mentioned: the system receives a feedback signal (e.g. a performance evaluation) used to correct its behaviour. For example, if a robot uses a semi-supervised algorithm to learn how to walk, a fall or a staggering walk could represent negative feedback used to correct the model.
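Hebb's rule is local enough to be stated as a single update; a minimal sketch (the function and its parameter names are illustrative):

```python
import numpy as np

def hebbian_update(W, x_pre, x_post, eta=0.01):
    """Hebb's rule: each w_ij grows in proportion to the correlation of
    upstream (x_pre[i]) and downstream (x_post[j]) activity. The update
    is purely local: no labels or global error signal are required."""
    return W + eta * np.outer(x_pre, x_post)
```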


3.3. The Backpropagation Algorithm


Figure 3.4 – (a) The MNIST dataset of handwritten digits is a standard problem for testing the effectiveness of machine learning algorithms. Every pixel is represented by a number between 0 (pixel is dark) and 255 (pixel is white). Besides being considered a 'minor league' test, it serves the purpose of assessing the suitability of hardware synaptic devices and peripheral circuitry. The neural network (from [18]) receives as input a subset of 5000 grey-scale input images (called a training epoch). (b) A four-layer deep neural network is chosen as the classifier. Every input neuron receives a single pixel value, while every output neuron represents a different recognized digit. Image from [8].

In Fig. 3.4 the structure of the MLP and the database used as the classification task are reported. The MNIST dataset of handwritten digits was introduced in 1998 by the French computer scientist Yann LeCun. It comprises a set of 60 000 grey-scale pictures (1 pixel = 1 value) of handwritten numerical digits (from 0 to 9), called the training set, and another set of 10 000 different images called the test set. For reasons of speed, just the first 5000 examples of the training set will be used in the simulations presented herein (5000 images = 1 training epoch). A label is associated with every image, representing the digit shown in the picture (i.e. in this context there are 10 different labels, one for every digit). The task of image recognition consists in adapting the network connectivity, the synaptic weights, using only the pictures of the training set, and then measuring the recognition rate on the separate test set [18].

It has to be pointed out that the architecture of a neural network strongly depends on the assigned task. In this case, for example, the number of input units naturally follows the number of pixels present in the input pictures,



while the number of output neurons, having decided to use just one active neuron per digit, has to be equal to the number of possible digits. The number of layers, instead, does not follow a strict rule. In [22] it is shown that single-layer perceptrons cannot solve non-linearly separable problems, for which adding hidden layers and non-linear squashing functions is necessary in order to converge to a solution [2]. The choice of the squashing function is also important, and many different types can coexist within the same network. It is not straightforward to understand the impact of different squashing functions on the network behaviour and performance. However, some general constraints can be set, for example whether to choose bounded functions (e.g. if the value of the neurons is assumed to be restricted to a certain range) or unbounded functions [13].

As mentioned in this section, the backpropagation algorithm is a supervised learning technique which is widespread in research and industrial applications and, despite its simplicity, has been proven to solve several different problems in the field of machine learning. One of the reasons for its success is that it involves the calculation of exact derivatives of known functions, and it can be applied to fields other than ANNs [28].

From an algorithmic point of view, backpropagation presents three distinctsteps, which are repeated for every input example in every training epoch:

1. Forward Propagation: the input data is fed into the network (Eq. 3.3), and information flows from the first to the last layer. All the neuron values are computed in this step according to Eq. 3.4 (Fig. 3.5):

x_i^A = \text{InputPixels}   (3.3)

x_j^B = \tanh\left( \sum_{i=1}^{N} x_i^A \cdot w_{ij} \right)   (3.4)

2. Reverse Propagation: an error term \delta_j^D is generated at the output neurons by comparing the activation value with the expected result (Eq. 3.5). These terms are then propagated back to the first hidden layer (Fig. 3.6, Eq. 3.6):

\delta_j^D = x_j^D - g_j   (3.5)

\delta_k^C = \tanh'(x_k^C) \left( \sum_{j=1}^{N} \delta_j^D \cdot w_{kj} \right)   (3.6)

3. Weight Update: using the values computed in steps 1 and 2, the weight of every synaptic connection is updated so as to follow a gradient descent of the error function for the present example (Eq. 3.7):

\Delta w_{ij} = \eta \cdot x_i^A \cdot \delta_j^B   (3.7)
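The three steps map directly onto a few lines of NumPy. The sketch below assumes a single hidden layer and the tanh squashing function of Eqs. 3.3-3.7; the layer sizes, learning rate and random "image" are illustrative placeholders, not the customized simulator used in this work:

```python
import numpy as np

rng = np.random.default_rng(0)
N_A, N_B, N_C = 784, 125, 10                 # input, hidden, output sizes
w_AB = rng.uniform(-0.1, 0.1, (N_A, N_B))    # synaptic weight matrices
w_BC = rng.uniform(-0.1, 0.1, (N_B, N_C))
eta = 0.01                                   # learning rate

def train_one_example(x_A, g, w_AB, w_BC):
    """One backpropagation step for a single (image, one-hot target) pair."""
    # 1. Forward propagation (Eqs. 3.3-3.4)
    x_B = np.tanh(x_A @ w_AB)
    x_C = np.tanh(x_B @ w_BC)
    # 2. Reverse propagation (Eqs. 3.5-3.6); tanh'(z) = 1 - tanh(z)^2,
    #    written here in terms of the stored neuron outputs
    delta_C = x_C - g
    delta_B = (1.0 - x_B**2) * (delta_C @ w_BC.T)
    # 3. Weight update (Eq. 3.7): one rank-1 (outer-product) update per
    #    weight matrix, descending the error gradient for this example
    w_BC = w_BC - eta * np.outer(x_B, delta_C)
    w_AB = w_AB - eta * np.outer(x_A, delta_B)
    return w_AB, w_BC

# One step on a random "image" labelled as the digit 3
x = rng.uniform(0.0, 1.0, N_A)
g = np.eye(N_C)[3]
w_AB, w_BC = train_one_example(x, g, w_AB, w_BC)
```

Looping this step over the 5000-image training epoch would, in principle, reproduce the software baseline against which the NVM-based implementations of Chap. 5 are compared.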



Figure 3.5 – During forward evaluation, information flows from the input to the output layer. Artificial neurons perform a multiply-and-accumulate operation on their inputs and apply a non-linear squashing function to calculate their output value x_i. At the end of this phase, the network will have computed the output values of all the neurons. Image from [8].

If there exist one or more sets of synaptic weights capable of classifying the training set, backpropagation will likely converge to one of them. ANNs with such a learning algorithm will not necessarily find the optimal solution, as learning stops as soon as the local gradient is zero (Fig. 3.7). Different machine learning methods (such as support vector machines) or modifications of the backpropagation algorithm are able to take into account the classification margin of the solution.

An important aspect of how to efficiently implement ANNs in hardware, which will not be covered in this thesis, is the circuit implementation of all the quantities present in Eqs. 3.4 and 3.6 for every neuron. For example, the hyperbolic tangent of a signal is difficult to obtain on-chip in an area-efficient fashion without approximations [23]. Studies of the impact of such approximations on the network performance have been carried out by other members of the group, and the results are collected in [11, 24].



Figure 3.6 – During reverse propagation, information flows from the output to the input layer. The error term δ_j is computed at the output neurons and is equal to the difference between the output value of the neuron and the expected value (ground truth). The error is propagated to the previous layers and used, in combination with the output values calculated during the forward evaluation, to compute the weight change of the synaptic connections (Δw_ij). From [8].

Figure 3.7 – Example of different solutions for a binary classification task performed with two different models. The green line (H1 in the figure) is not a stable solution for an ANN, as some of the examples are still misclassified. The blue line (H2), despite being far from the optimal solution, is a possible separation line obtained with the backpropagation algorithm. The red line (H3) shows the boundary that maximizes the classification margin.


Chapter 4

Non Volatile Memories

Non-Volatile Memories (NVMs for short) are a class of devices capable of retaining information independently of the power supply. Several different technologies are currently employed to produce non-volatile memories, exploiting different physical principles of data storage. NVMs have been used for diverse applications through the years. Magnetic memories, such as tapes, floppy disks and the newer magnetic hard disks, have dominated the field of data storage for several years thanks to their high retention time and information density [3]. Other kinds of non-volatile memories comprise read-only memories (e.g. EEPROM), phase-change memories (optical discs, DVDs) and solid-state non-volatile memories, like FLASH and SSDs. In the following sections, a brief overview of the NVMs used in IC technology for VLSI implementation will be given. In this work, particular focus will be given to two classes of NVMs that can be implemented in ultra-dense crossbar arrays:

— Phase-change Memories: devices of vast interest in the context of novel VLSI storage and computing architectures. PCMs have also been used in the previous work on neuromorphic architectures [8, 7, 5], of which this thesis represents an extension;
— PCMO non-filamentary NVMs: a class of devices with very interesting current and power scalability. All the results presented in this thesis are obtained with such devices.

4.1 Phase-Change Memory

Phase-change materials are a special class of compounds whose electrical or optical properties change when switching between the amorphous and the crystalline state [27, 6]. The switching process is usually thermally activated, and can be controlled with electrical programming schemes [6] (see Fig. 4.2).



Figure 4.1 – (a) SET and RESET characteristics of PCM devices. The threshold voltage of the device is highlighted in the plot. (b) Structure of a volume-minimized cell (pore cell, bottom) and a contact-minimized cell (mushroom cell, top). From [27].

Figure 4.2 – Phase change is controlled by means of electrical pulses. Short, high-voltage pulses (RESET) lead to an abrupt temperature change in the material, where a volume is melted and rapidly quenched into an amorphous state. Lower-voltage, longer pulses allow recrystallization of the material (SET). A low-amplitude pulse is used to read out the resistance of the device, in order not to perturb its state. From [39].



Figure 4.3 – Overview of the structure and main properties of the AlMo/PCMO RRAM device. Differently from previous versions of the device, both an Al electrode and a Mo diffusion barrier are present, thus allowing very high ON/OFF ratios and improved stability (e.g. linearity in the resistive switching dynamics). The molybdenum layer increases the energy required to switch from the low-resistance state to the high-resistance state. Image courtesy of K. Moon and H. Hwang, POSTECH, South Korea.

The use of phase-change materials for information storage has attracted the attention of researchers since the 1960s. The first application exploiting changes in electrical conductivity was a U.S. patent in 1969 [37] using chalcogenide glass, the same material employed in CD-RWs and DVD-RWs, in which the refractive index is changed by means of laser heating. Remarkable progress has since been achieved with phase-change materials in thin-film technology and integrated systems. GST is the most commonly used material for nanoscale phase-change memories, with high-yield, high-density BEOL processes proven in several works [1, 14]. Scalability, CMOS-compatible large-array fabrication, long retention, endurance and switching performance are amongst the advantages of PCM devices over other types of non-volatile memories [6]. A typical current-voltage curve for a PCM device is shown in Figure 4.1 (a). It can be noted that electric current flow is impeded up to a certain voltage value, called the threshold voltage. This characteristic is of fundamental importance: without threshold switching, enormous power would be required to trigger the phase transition [6]. Most of the work on HNNs carried out by the Machine Intelligence group at IBM is focused on PCM devices. However, all the work presented in this thesis employs a different type of non-volatile memory device, described in Section 4.2.

4.2 PCMO based devices

All the data used for the simulation of PCMO devices as synapses in hardware neural networks, as well as the images present in this section, were provided by Prof. Hwang's group at POSTECH University, South Korea.



Figure 4.4 – TEM cross-section of the PCMO-based RRAM. The total thickness of the Al layer is ≈ 20 nm, while the Mo diffusion barrier is ≈ 3 nm. The perovskite crystal is ≈ 30 nm thick. Image courtesy of K. Moon and H. Hwang, POSTECH, South Korea.


4.2.1 Fabrication Process

1k-bit PCMO-based resistive memory arrays with a Pt/Mo/PCMO/Pt stack were fabricated for evaluation as synaptic devices. For device fabrication, a 50-nm-thick Pt layer for the bottom electrode and a 30-nm-thick polycrystalline PCMO film were deposited and patterned using conventional lithography and reactive ion etching. Next, an 80-nm-thick SiNx layer was deposited by chemical vapor deposition, and via-holes (ranging in size from 0.15 to 1.0 µm) were formed by conventional lithography and reactive ion etching. According to the type of device, either a 10-nm-thick Mo layer or a 20 nm/3 nm bilayer of Al and Mo was then deposited (see Figure 4.4). Eventually, an 80-nm-thick Pt layer for the top electrode was deposited and patterned by conventional lithography.

4.2.2 Resistive Switching in PCMO devices

Electrical characteristics of the PCMO-based resistive memory devices were measured using an Agilent B1500A [11]. In Figure 4.5 the I-V characteristic of an AlMo-based PCMO resistive device is reported. A hysteresis loop is clearly evident in the electrical characterization, placing PCMO-based resistive devices among what have been defined as memristive systems [9]. Resistive switching in PCMO-based devices is caused by the gradual drift of oxygen ions and vacancies in the polycrystalline PCMO layer, which changes the conductance of the device in an analog fashion. The charge carriers, oxygen ions and vacancies, are generated by a redox reaction taking place at the interfaces of the polycrystalline PCMO layer. To ensure proper operation of the device, the fabrication process parameters need to be finely adjusted [17]. Injection (removal) of oxygen ions takes place at the PCMO-oxide (-metal) interface through oxidation (reduction) reactions.


4.2. PCMO based devices

Figure 4.5 – Hysteresis I-V curve for AlMo/PCMO RRAM devices. The conductance can be adjusted in multiple analog states comprised between a HRS and a LRS. The absolute conductance of the LRS depends linearly on the active area of the device. Image courtesy of K. Moon and H. Hwang, POSTECH, South Korea.


Figure 4.6 – Examples of potentiation (a) and depression (b) characteristics for AlMo/PCMO RRAM devices. AlMo/PCMO devices allow their conductance state to be changed bidirectionally by firing programming pulses in opposite directions. In these examples, pulse widths of 100 µs for SET and 10 µs for RESET were used for different pulse amplitudes (from 2 to 4 V). A unique programming scheme will be chosen when implementing the devices in the hardware neural network. Image courtesy of K. Moon and H. Hwang, POSTECH, South Korea.



Figure 4.7 – (a) Measured switching energy vs. conductance of the AlMo/PCMO devices for different device sizes (ranging from 150 nm to 1 µm) with 1 V read pulses. (b) Data normalized with respect to the active area of the device. From [11].

Asymmetry in the device structure and in the oxidation-reduction reaction dynamics contributes to the asymmetry of the switching characteristics, but the gradual SET and RESET operations are of great interest for neuromorphic applications.

Measurements were taken to characterize the dependence of the switching characteristics on the dimension of the device (defined by the hole size in the patterning process), with the results shown in Figure 4.7. The switching energy (the current) increases with the conductance of the device, and depends linearly on the hole size used for fabrication. This feature is clearly visible in Fig. 4.7 (b), where the perfect overlap of the specific switching energy is highlighted.

4.2.3 Improvements from previous devices

Previous research concerning PCMO-based RRAM was carried out at Prof. Hwang's lab, both for basic electrical characterization [33, 34] and for potential neuromorphic applications [25, 26, 35]. Several material and device improvements were made to address, in particular, the low ON/OFF ratio, data retention and switching stability. In this subsection the two latest versions of PCMO resistive switching devices will be compared, namely Mo/PCMO and AlMo/PCMO. The structural difference between the two devices has already been pointed out in Sec. 4.2.1. The physics of the switching mechanism is complex and involves redox reactions, electrostatic forces and diffusion mechanisms, as well as interface physics, thus making the choice of material combinations utterly difficult. In Fig. 4.8 the I-V curves of the two different devices are compared.



Figure 4.8 – Comparison of the current-voltage characteristics of Mo- and AlMo/PCMO-based resistive memories. The hysteresis loop in the I-V curve is due to the change in conductance of the device under test after the SET and RESET transitions. Image courtesy of K. Moon and H. Hwang, POSTECH, South Korea.

Figure 4.9 – Using an Al electrode instead of Mo greatly increases the device ON/OFF ratio. When putting the two electrodes together (using the Mo layer as a diffusion barrier), the ON/OFF ratio stays approximately the same, indicating that the reaction still happens at the Al electrode. Image courtesy of K. Moon and H. Hwang, POSTECH, South Korea.

Figure 4.10 – The high-resistance state in Al-only devices is not stable. Adding a Mo diffusion barrier for oxygen atoms prevents the dissolution of the oxide at the interface of the electrode. Image courtesy of K. Moon and H. Hwang, POSTECH, South Korea.



Desirable characteristics of resistive switching devices include a high ON/OFF ratio, meaning that the difference between the resistance of the HRS and that of the LRS needs to be large. The Al electrode fits this requirement perfectly: as the oxygen ions react with the metal, a thin layer of AlOx is formed. While Al is a good conductor, its oxide is a strong insulator, endowing the device with a huge swing in resistance between the two states. However, devices with only the aluminium electrode had problems with data retention: the LRS had the tendency to drift towards higher resistance values. To solve both problems, a Mo diffusion layer was added to the device to improve stability. These problems are summarized in Figs. 4.9 and 4.10.

For both neuromorphic and storage applications, these two characteristics are highly desirable in order to achieve competitive performance.


Chapter 5

Hardware Neural Networks with NVMs as Synaptic Device

A brief theoretical overview of ANNs was given in Chap. 3, where the main concepts relating to neural networks, such as their history, development, working principles, advantages and disadvantages, were pointed out. Some information about NVM devices, with particular focus on PCM and PCMO devices, was gathered in Chap. 4 as a preliminary introduction.

In this chapter, the use of NVMs as synaptic devices in hardware neural networks will be treated. A short selected literature survey will introduce the techniques used in the present work to assess the performance of real resistive switching devices implemented in large-scale simulated neural networks.

5.1 Literature Review

The operation of artificial neural networks was reviewed in Chap. 3. The multiply-accumulate (MAC) function of neurons was one of the focus points of that chapter. In backpropagation, MAC is executed in both the forward and reverse propagation phases, and it currently represents one of the most time- and energy-consuming steps of the algorithm's implementation on CPUs and GPUs, since the matrices and vectors to be multiplied usually have large sizes (thousands to millions of elements each). Accelerating the MAC operation is crucial for an efficient on-chip implementation of machine learning.

In this section it will be shown how crossbar arrays of resistors can naturally perform the MAC operation using Kirchhoff's current law. Further on in the same chapter, a parallel, crossbar-compatible weight update algorithm will be presented, as introduced in [8]. Fig. 5.1 sketches the hardware structure of the chip: synapses are replaced by conductance pairs, while neurons are built from neuromorphic CMOS circuits.



Figure 5.1 – NVM crossbar-compatible implementation of a feedforward multilayer perceptron. Artificial neurons are implemented with CMOS circuitry, while synapses are represented by NVM conductance pairs. From [8].

5.1.1 Analog MAC Operation

Two of the mathematical formulas characterizing forward and reverse evaluation in ANNs are repeated here as Eqs. 5.1 and 5.2:

x_j^B = \tanh\left( \sum_{i=1}^{N} x_i^A \cdot w_{ij} \right)   (5.1)

\delta_k^C = \tanh'(x_k^C) \left( \sum_{j=1}^{N} \delta_j^D \cdot w_{kj} \right)   (5.2)

The products of the weights with the respective upstream neuron values, and then of the weights with the error terms of the downstream neurons, need to be calculated and summed together. Fig. 5.2 shows how, by encoding the neuron values as voltages and the synaptic weights as differences of conductance pairs on the same column, it is possible to obtain a current proportional to the argument of the squashing function in Eq. 5.1. By integrating this current on an empty capacitor, a voltage proportional to the forward read current can be generated, preparing the neuron value to be propagated to the next layer [8]. For the reverse read, the same principle is exploited to obtain the sum along the rows (bit lines) of the crossbar array.
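A software analogue of the forward read makes the arithmetic explicit. The sketch below is not a circuit model: the read voltage and the current-to-voltage gain of the integrating neuron are assumed free parameters, chosen purely for illustration:

```python
import numpy as np

def crossbar_mac(x, G_plus, G_minus, V_read=0.5, gain=1e6):
    """Analog MAC on a crossbar (forward read). Neuron values x_i are
    encoded as row voltages and weights as conductance differences
    w_ij = G+_ij - G-_ij. Kirchhoff's current law sums each column, so
    I_j is proportional to sum_i x_i * w_ij; the CMOS neuron integrates
    I_j on a capacitor and applies the squashing function."""
    I = (V_read * x) @ (G_plus - G_minus)    # summed column currents
    return np.tanh(gain * I / V_read)        # downstream neuron values

# A random 4x3 crossbar, conductances in an arbitrary microsiemens range
rng = np.random.default_rng(1)
Gp = rng.uniform(0.0, 1e-6, (4, 3))
Gm = rng.uniform(0.0, 1e-6, (4, 3))
print(crossbar_mac(rng.uniform(-1.0, 1.0, 4), Gp, Gm))
```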



Figure 5.2 – Scheme of the two-NVM synapse. The weight is encoded as the difference between the conductance values of the two devices, w_ij = G+_ij − G−_ij, in order to represent negative weights. Having two conductances per synapse also helps to mitigate the asymmetric conductance response of many NVMs. Neuron values x_i are presented as bit-line voltages and the current is summed on the word line. From [8].


Figure 5.3 – (a) CS version of the weight update algorithm. The downstream and upstream neurons need to exchange information to determine the correct weight update. (b) Crossbar-compatible weight update scheme. The neurons fire a number of programming pulses proportional to the local neuron value (output or error). The programming is done in parallel for all the synapses, according to the overlap of the downstream and upstream pulses. From [5].

This analog MAC operation has been shown, in some cases, to be up to 10 000x faster than the digital ASIC approach [16].

5.2 Crossbar-compatible Weight Update

The third phase of the backpropagation algorithm also needs to be accelerated when implementing on-chip learning. This section underlines one of the differences between this project and others, such as the IBM TrueNorth chip. The SyNAPSE team, responsible for the development of that chip, used fast and reliable SRAM units to emulate the connections between neurons.




Figure 5.4 – (a) CS version of the weight update algorithm. The value of the weight change is an analog quantity resulting from the product Δw_ij = η · x_i · δ_j. (b) Crossbar-compatible weight update scheme. The change in weight is proportional to the number of pulses that overlap on the NVM. By adopting a determined firing scheme, it is possible to adjust the weight change accordingly. From [8].

Using an architecture similar to the distributed model shown in Chap. 2, they were able to implement very-large-scale neural networks on their hardware in an ultra-low-power and efficient fashion. The team is currently benchmarking all the state-of-the-art classifiers (including large convolutional and recurrent neural networks) for performance evaluation. However, the TrueNorth chip does not allow on-line training (i.e. learning) of neural networks, which still needs to be done off-line on inherently inefficient servers/Von Neumann machines.

Fig. 5.3a shows the conventional weight update performed in the computer-science version of the backpropagation algorithm. Every synapse needs to adapt according to the values stored in the downstream and upstream neurons. Information has to be exchanged between the two layers and every synapse must be processed serially. On the other hand, Fig. 5.3b shows that a similar weight update can be obtained with NVMs if we assume that the change in conductance is proportional to the number of pulses that overlap on the resistive device. In particular, by adopting a certain pulse scheme (shown in Fig. 5.4b on the horizontal and vertical axes) it is possible to ensure that the pulse overlap is proportional to the product of the two neuron values. In this context, the parameter previously called the learning rate η can be changed by tuning the number of pulses fired. A minimal simulation of this pulse-overlap update is sketched below.
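The sketch simulates a stochastic variant of the idea, in which each of P time slots carries an upstream pulse with probability |x_i| and a downstream pulse with probability |δ_j|, so the expected number of coincidences per device is P·|x_i|·|δ_j|. The deterministic pulse trains of [8] achieve the same expectation; sign handling via the G+/G− pair is deferred to the next sketch, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def pulse_overlap_update(G, x, delta, dG=0.005, P=32):
    """Crossbar-compatible weight update (stochastic variant).
    Upstream and downstream neurons fire independent pulse trains; the
    NVM at (i, j) takes one conductance step dG per coincidence. No
    synapse is ever addressed serially: all devices update in parallel."""
    up = rng.random((P, x.size)) < np.abs(x)             # upstream trains
    down = rng.random((P, delta.size)) < np.abs(delta)   # downstream trains
    overlaps = up.astype(float).T @ down.astype(float)   # coincidence counts
    return G + dG * overlaps                             # shape (len(x), len(delta))
```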



Figure 5.5 – The training and testing accuracy of the MLP presented in Chap. 3 on the MNIST database of handwritten digits is unchanged when switching from the computer science version of the weight update algorithm to the crossbar-compatible scheme. From [8].

In [8] it has been shown that, on the MNIST dataset, there is no loss of accuracy when employing the crossbar-compatible weight update algorithm (Fig. 5.5). Two different weight update schemes will now be presented and some of their advantages and disadvantages pointed out [7]. In the so-called Alternate Bidirectional scheme, only one conductance at a time is programmed: the G+ is changed for even examples while the G− is changed for odd examples. In the Fully Bidirectional weight update algorithm, both conductances are changed for every programming event. The difference in circuit implementation will be discussed further in Chap. 6.
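A toy sketch of how the two schemes might map a requested update onto the conductance pair follows. Names are illustrative: dG is an assumed per-pulse conductance change, and np.clip models the fact that real devices saturate at the ends of their conductance range.

```python
import numpy as np

def apply_update(G_plus, G_minus, n_pulses, example_idx, scheme="fully",
                 dG=0.01, G_lo=0.0, G_hi=1.0):
    """Map a signed pulse count (positive = potentiation) onto (G+, G-).

    'alternate': only one device is programmed per example -- G+ on
                 even-numbered examples, G- on odd-numbered ones.
    'fully'    : both conductances move on every programming event.
    """
    dW = n_pulses * dG                     # requested weight change
    if scheme == "alternate":
        if example_idx % 2 == 0:
            G_plus = G_plus + dW           # even example: program G+
        else:
            G_minus = G_minus - dW         # odd example: program G-
    else:                                  # fully bidirectional
        G_plus = G_plus + dW / 2
        G_minus = G_minus - dW / 2
    return np.clip(G_plus, G_lo, G_hi), np.clip(G_minus, G_lo, G_hi)
```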

5.3 The G-diamond Plot

In this section a useful visualization tool will be introduced: the G-diamond plot. Every synaptic state is fully determined by the G+ and G− values of its conductances (Fig. 5.6). A bivariate histogram can be built by sorting the synapses according to their conductance values. Plotting the distribution of synaptic values can help in understanding problems in programming the devices (e.g. all the conductances tending to get stuck in their RESET region). The G-diamond plot can also be used to visualize the behaviour of the system for particular cases of weight change; in Fig. 5.7 the different outcomes of the same weight update using the alternate bidirectional and fully bidirectional schemes are shown.
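Building such a plot amounts to a 2-D histogram over the (G+, G−) pairs of all synapses; a minimal numpy sketch (illustrative function name) follows.

```python
import numpy as np

def g_diamond_histogram(G_plus, G_minus, bins=50, G_lo=0.0, G_hi=1.0):
    """Bivariate histogram of synaptic states: occurrence of each
    (G+, G-) pair across the whole network (minimal sketch)."""
    H, g_plus_edges, g_minus_edges = np.histogram2d(
        G_plus.ravel(), G_minus.ravel(),
        bins=bins, range=[[G_lo, G_hi], [G_lo, G_hi]])
    # e.g. plt.imshow(H) reveals synapses stuck in their RESET region
    return H, g_plus_edges, g_minus_edges
```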

5.4 Jump Table Concept

In the previous sections the crossbar-compatible weight update algorithm was presented and the concept of the G-diamond plot was introduced. The speed and efficiency advantages of the former have been pointed out; however, it is necessary to understand how such an update algorithm will perform together with real resistive switching devices.




Figure 5.6 – (a) Sketch of the G-diamond plot. Every synapse can be plotted using its G+ and G− value. If the axes are rotated obliquely, the weight can be represented on a vertical axis. (b) A heat map, taking into account the occurrence of synapses for every (G+, G−) pair, can be used to plot the distribution of synaptic values for the whole network.

Figure 5.7 – Comparison between the two different update algorithms used. (a) The desired weight change (grey) is compared to the actual weight change (blue/red, purple). (b) Alternate Bidirectional algorithm in the G-diamond. When one of the two conductances saturates (reaches the boundaries of the plot), there is only a 50% chance that at the next update the weight will be stuck at the same value. (c) The Fully Bidirectional update algorithm will always fail to correctly update the weight in case one of the conductances reaches its extreme value. From [7].



Figure 5.8 – Variability in PCM memory devices as modelled by Bichler et al. [4]. For every conductance update, only one curve is chosen to implement the change. With jump tables, a single plot is sufficient to model variability in resistive switching, and the resulting probability distribution is more intuitively visualized.

Figure 5.9 – Conductance response of an idealized resistive switching device. For every SET or RESET programming pulse fired, there is a gradual (small) change in conductance ∆G which has opposite sign but the same magnitude. The performance as a synaptic device is comparable to the computer science version of the network.

Resistive switching is present in different materials, structures and devices, and may be caused by different physical phenomena [27][6]. From an electrical standpoint, considering them as two-terminal passive devices, they show a common behaviour: the resistance between the two terminals varies according to the input current provided. For PCM devices, as mentioned in Sec. 4.1, the different resistive behaviour is due to a change in the crystalline structure. From the perspective of a crossbar VLSI implementation, characterizing resistive switching devices can be a difficult task: the characterization ought to be both accurate and usable for simulations. One common way to characterize NVMs is to measure the G vs. number-of-pulses curve. The device under study is initialized in its high resistance state (full RESET) and gradually switched to its low resistance state through successive conductance jumps caused by the programming pulses.

In Sec. 5.2, it was assumed that every effective programming pulse achieves an equal, smooth change in conductance (equivalent to a conductance response like the one shown in Fig. 5.9). Real devices usually do not show such ideal resistive behaviour. Fig. 5.10 shows the G vs. pulse curve of real, measured AlMo/PCMO devices.




Figure 5.10 – Measured SET and RESET characteristics of AlMo/PCMO devices. The blue line shows the average measured response, while the red segments indicate the slope of the conductance change at ±1σ (standard deviation) from the median.

The average conductance response is highly non-linear (in particular for the first SET pulses and the first RESET pulses) and every jump is subject to variability. To accurately predict the performance of such NVMs used as synaptic devices, it is necessary to build a characterization tool with the following properties:

— it models conductance behaviour of any kind (e.g. linear, exponential);
— it takes into account the variability of every different jump;
— it is straightforward to implement in computer simulations;
— it is usable for both real and modelled devices (i.e. it can either describe a real device or be built bottom-up to reproduce a desired switching behaviour).

Under the assumption that the same type of programming pulse is always used (i.e. duration, amplitude and shape do not change between different programming events), the characterization tool can be a function of the device properties alone.

Jump table plots are able to fully describe the behaviour of resistive switching device arrays. Information such as the average jumpsize, the probability distribution of jumpsizes at every conductance value, and switching anomalies can be extracted by analyzing the source data of the plots. This approach represents an advancement over what has already been presented in the literature for PCMO as a synaptic device, where factors such as variability could not be taken into account [17]. To keep the hardware implementation realistic, the jump table approach assumes that the value of the conductance state cannot be measured before firing a programming pulse.
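A minimal sketch of how such a jump table could be stored and sampled in simulation, assuming the measured data comes as (conductance state, jump) pairs and that all programming pulses are identical; class and method names are illustrative, not from the thesis simulator.

```python
import numpy as np

class JumpTable:
    """For each binned conductance state (in % of max G), store the
    empirical distribution of measured jumps dG and sample from it
    whenever a programming pulse is fired (minimal sketch)."""

    def __init__(self, g_states, jumps, n_bins=100):
        # g_states: conductance (as % of max) just before each pulse
        # jumps:    the dG produced by that pulse
        self.n_bins = n_bins
        self.table = [[] for _ in range(n_bins)]
        for g, dg in zip(g_states, jumps):
            b = min(int(g / 100.0 * n_bins), n_bins - 1)
            self.table[b].append(dg)

    def sample(self, g, rng=np.random):
        """Draw one jump for a device currently at state g (in %)."""
        b = min(int(g / 100.0 * self.n_bins), self.n_bins - 1)
        candidates = self.table[b]
        return rng.choice(candidates) if candidates else 0.0
```

The same container works for a measured device (filled from experimental data) or for a constructed device (filled from a chosen model), satisfying the four requirements listed above.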



Figure 5.11 – Jump tables are plots whose horizontal axis represents the normalized conductance state (G expressed as a percentage between its minimum and maximum possible values) and whose vertical axis represents the size of the conductance jump caused by a single pulse. The colour indicates, for every G value, the probability that the jump performed is lower than the ∆G at that point.

(a) Equal programming pulses. (b) Differentiated programming pulses.

Figure 5.12 – Modelling of the PCMO RRAM conductance response in [17]. By varying the amplitude (i.e. the voltage) of the programming pulses, it is possible to obtain a linear conductance response. Very high machine learning performance can be obtained using such a scheme, but the system does not allow for a crossbar-compatible weight update. Image from [17].




Figure 5.13 – Jump table for AlMo/PCMO devices. 50,000 total SET pulses (4.0 V, 10 ms) and RESET pulses (3.5 V, 10 ms), followed by 1 V read pulses, were used on three identically-sized (200 nm) devices.

It has been shown that using a differentiated pulse scheme, as in [17], can result in a linear conductance response (Fig. 5.12b), which allows high classification accuracies to be achieved even with defective devices (90.55%).

In Fig. 5.13 the measured jump table for the AlMo/PCMO devices introduced in Chap. 4 is reported. Despite their bidirectionality and the relative smoothness of the RESET transition, the conductance response of these devices still presents some of the undesirable characteristics reviewed in Sidler et al. [36]. Some of these will be individually analyzed in Chap. 6 to assess their impact on the classification performance.


Chapter 6

Simulated PCMO Performance Results

In this chapter the simulated performance of PCMO RRAM as a synaptic device will be presented. In the studies reported here, particular aspects of the AlMo/PCMO switching behaviour will be individually taken into account, and their impact on the classification capabilities of the network will be assessed. After performing a full optimization over the known parameters, the results will be compared to the performance of ideal bidirectional switching devices with gradual, smooth and symmetric SET and RESET characteristics.

6.1 Jumpsize disparity in AlMo/PCMO RRAM

One of the undesirable characteristics of the switching behaviour of PCMO RRAM is the difference in jumpsize between the extrema of the jump table and its center. The physical cause of this difference might be traced back to the redox reaction that causes the change in resistance, whose dynamics are highly non-linear at low oxygen concentrations. Fig. 6.1 highlights the jumpsize disparity in the SET and RESET jump tables of AlMo/PCMO devices ((a) and (b)) as well as in their pulsed conductance response (c). Jumpsize disparity is a feature which is impossible to compensate for without measuring the value of the conductance before programming (as stated in Sec. 5.4, having to measure the G value before firing would make the VLSI implementation infeasible).

In order to assess the impact of this phenomenon, isolating it from the other characteristics of AlMo/PCMO devices relevant to machine learning performance, a set of constructed conductance responses was built to reproduce jumpsize disparity. Fig. 6.3 (a) and (b) show the structure of the jump tables used for this simulation. The range of possible conductances on the jump table (the horizontal axis) was divided into two regions, spanning respectively 10% and 90% of the total. The shorter region was endowed with a linear jumpsize (∆Gmax) equal to or bigger than that of the other region (∆Gmin).



Figure 6.1 – Due to the non-linearity of the reaction dynamics, AlMo/PCMO RRAM devices show a very steep response for extreme values of the conductance. This feature can be spotted in both the jump table plots and the pulsed conductance response (green and magenta ellipses).


Figure 6.2 – The PCMO jump table shows large jumps for extreme values of the conductance. This feature causes a steep change in conductance for the first SET pulses (starting from the HRS state) and for the first RESET pulses (starting from the LRS state). (a) Impact of the G vs. pulse characteristic on the distribution of conductances in the G-diamond. (b) The steep region of the G vs. pulse plot. Image courtesy of Severin Sidler.



Figure 6.3 – (a), (b) Constructed jump tables used to assess the impact of jumpsize disparity. The jumpsize value is ∆Gmax for 10% of the conductance range, and ∆Gmin for the remaining 90%. (c) Accuracy of the perceptron for varying ratios of ∆Gmax/∆Gmin.

By keeping the value of ∆Gmin constant, the other jumpsize was varied. The classification accuracy of the MLP is expected to decrease as the value of ∆Gmax increases, because the same programming pulse may cause very different weight changes depending on the initial state of the conductance to which it is applied. The achieved neural network performance is plotted as a function of the ratio between the jumpsizes in Fig. 6.3 (c). Good accuracy is retained up to jumpsize ratios of ≈ 100; for comparison, the measured ∆Gmax/∆Gmin of AlMo/PCMO devices is ≈ 189.
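A sketch of the constructed response used in this kind of study, under the stated two-region assumption; the steep 10% region is placed here at the low-G end, as in the measured SET characteristic, and names are illustrative.

```python
def constructed_jump(g_percent, dG_min, ratio, steep_frac=0.10):
    """Two-region constructed conductance response (sketch).

    The first `steep_frac` of the conductance range jumps by
    dG_max = ratio * dG_min per pulse; the remaining range jumps
    by dG_min.  Varying `ratio` isolates the jumpsize-disparity effect.
    """
    dG_max = ratio * dG_min
    return dG_max if g_percent < 100.0 * steep_frac else dG_min
```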

6.2 Asymmetric Conductance Response

Figure 6.4 – The average jumpsizes for SET and RESET in AlMo/PCMO devices have different values. For correct synaptic behaviour, the two changes should be equal.




Figure 6.5 – (a), (b) Crossbar-array-compatible implementation of asymmetry correction for the fully (a) and alternate (b) bidirectional weight update schemes. Weight increases (decreases) can be implemented either as a SET operation on the G+ (G−) or a RESET operation on the G− (G+). Asymmetry in the partial SET and RESET operations is compensated by applying different learning rates (ηSET, ηRESET), modulating the number of pulses fired by the neurons into the array [11].

One of the advantages of PCMO devices over the phase-change memories mentioned in Chap. 4 is the smooth analog RESET characteristic, which allows for bidirectional conductance change. However, the dynamics of the SET and RESET operations are not exactly the same, causing the average jumpsize for SET to differ from that of RESET. Fig. 6.4 shows a zoom into the AlMo/PCMO jump tables where this asymmetry is evident. This is an undesirable feature, as synapses will tend to drift away from the center of the G-diamond, where it is desirable for them to be. Unlike jumpsize disparity, an asymmetric conductance response can be corrected with modifications of the crossbar-compatible weight update algorithm. In Fig. 6.5 the idea of individual learning rates is shown. When performing the weight update, it is known a priori whether the pulses fired are SET or RESET pulses. It follows that it is possible to intentionally fire more or fewer programming pulses to compensate for the asymmetry. In Fig. 6.6 the classification accuracy is plotted as a function of the correction factor (i.e. the ratio between the individual learning rates) and the effect of the correction on the distribution of synaptic states is visualized with the G-diamond. In Fig. 6.6 (b), (c) and (d) three G-diamond plots relative to part (a) are shown. It can be noticed that by changing the correction factor it is possible to move the center of gravity of the synapses, increasing the available weight range and thus the accuracy. However, the compensation can easily cause imbalance, severely degrading the performance.
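A minimal sketch of the individual-learning-rate idea: since the polarity of each update is known before firing, the neuron can scale the pulse count differently for SET and RESET. Names are illustrative; a correction such as the ηSET = 1.66 · ηRESET optimum of Fig. 6.6 would simply be folded into eta_set.

```python
def pulses_for_update(dW, eta_set, eta_reset):
    """Return the pulse type and count for a requested weight change dW.

    Because the update polarity is known a priori, asymmetry between the
    average SET and RESET jumpsizes is compensated by using a separate
    learning rate (i.e. pulse count scaling) for each polarity.
    """
    if dW >= 0:                                  # potentiation -> SET pulses
        return ("SET", round(eta_set * dW))
    else:                                        # depression -> RESET pulses
        return ("RESET", round(eta_reset * -dW))
```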

In Secs. 6.1 and 6.2, two important aspects of the AlMo/PCMO two-NVM-per-synapse architecture were analyzed and their impact on performance was assessed.



Figure 6.6 – In the bottom left part of the figure, the accuracy is plotted as a function of the individual learning rate ratio (ηSET/ηRESET). The G-diamond is plotted for three different points: (a) ηSET = ηRESET: if no correction is made, synapses tend to concentrate in the left part of the G-diamond, limiting the available weight range (and thus the accuracy); (b) ηSET = 1.66 · ηRESET: by firing slightly more SET pulses for every programming step, it is possible to drive the synapses towards the center of the G-diamond, maximizing the accuracy; (c) ηSET = 10 · ηRESET: if overcompensation occurs, the accuracy decreases and some regions of the G-diamond are not populated at all.



Figure 6.7 – Evolution of the training accuracy for an ideal bidirectional NVM compared to the performance of real AlMo/PCMO RRAM. The fully bidirectional weight update algorithm achieves better results when synapses are close to 'ideal', while the alternate bidirectional one evens out the performance (by decreasing the accuracy in the ideal case).

In the following sections, a new synaptic architecture will be proposed and its advantages and disadvantages will be pointed out.

6.3 Ideal Bidirectional NVM Performance

In this case, the only limitations to perfect training are the crossbar-compatible update algorithm and the bound on weight magnitude (as the devices themselves have a limited conductance range). Fig. 6.7 shows the evolution of the training and testing accuracy with both the alternate and the fully bidirectional weight update, compared to the fully optimized AlMo/PCMO performance. It can be noticed that the fully bidirectional scheme achieves the highest training accuracy (> 99%) while the alternate bidirectional one has lower performance. Fig. 6.8 explains one of the reasons for this discrepancy: in the case of a perfectly symmetrical conductance response, the fully bidirectional weight update manages to keep all the synapses exactly at the center of the G-diamond, where the available weight range is largest. On the other hand, the alternate bidirectional algorithm scatters the synapses across different regions, limiting some of them to very narrow weight ranges.



Figure 6.8 – For a highly non-linear conductance response, the "fully bidirectional" update algorithm can have lower performance because two big jumps can be performed at the same time. However, for an ideal conductance response, the same algorithm is able to keep the synapses at the center of the G-diamond, maximizing the available weight range. (a) and (b) show two different G-diamonds plotted with ideal NVMs as synaptic devices. Image from [7].

6.4 Single Bidirectional NVM

Thanks to the gradual partial SET and RESET switching, it is possible to use PCMO RRAM as a standalone synaptic device. In this case, weight potentiation is performed with gradual SET pulses, while weight depression is implemented with RESET pulses. The advantage of such an implementation is evident, as the density of the system could be doubled while keeping the same number of devices per unit area. A sketch of the circuit implementation is presented in Fig. 6.9. The weight update algorithm retains the characteristics described in Sec. 5.2; the new update scheme is shown in Fig. 6.10. It is still possible to plot useful information concerning the distribution of the weights and of the conductances (e.g. the G-diamond), but some modifications are needed. In Fig. 6.11 a one-to-one correspondence between the value of the conductance and the weight of the synapse is shown. In particular, the weight is equal to the conductance of the device minus a certain reference value, which should be set to the 'average' value of the conductance, so that the conductance range is equally divided between positive and negative weights. There are many ways this reference value could be produced in real hardware: one example (shown in Fig. 6.9) is to keep the same structure and circuitry of the two-NVM-per-synapse architecture, and subtract a 'reference' current from the neuron forward current. The value of the reference current is given by Eq. 6.1, and it can be produced on chip by adding a single column with all the devices set at an intermediate conductance value.



Figure 6.9 – With bidirectional switching devices, it is possible to replace the column of the G− conductances with a reference current given by Eq. 6.1. This implementation could potentially double the density of neuromorphic elements. The reference current must be produced on-chip as it depends on the upstream neuron values.

Figure 6.10 – For single bidirectional NVMs, the weight update has the same sign as the change in conductance: SET pulses implement weight potentiation while RESET pulses implement weight depression.

Iref = ∑i xi · (Gmax + Gmin) / 2        (6.1)
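A numpy sketch of the single-device forward read with the reference subtraction of Eq. 6.1 follows; names are illustrative, and G holds one conductance per synapse.

```python
import numpy as np

def forward_read_single(x, G, G_min, G_max):
    """Single-device synapse forward read (minimal sketch).

    Subtracting the reference current of Eq. 6.1 from each column
    current makes conductances below (above) the mid-point encode
    negative (positive) weights.
    """
    I = x @ G                                     # raw column currents
    I_ref = np.sum(x) * (G_max + G_min) / 2.0     # Eq. 6.1, one ref column
    return I - I_ref
```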

The same concept of individual learning rates applies, and the simulated network performance is plotted in Fig. 6.12.



Figure 6.11 – The distribution of synapses in the single-bidirectional-NVM synaptic architecture can be represented by a histogram. The weights have a one-to-one correspondence with the values of the conductances.

Figure 6.12 – A single bidirectional NVM retains the same peak performance as the two-NVM-per-synapse architecture. However, for non-ideal parameter settings (for example when the weight update is strongly asymmetric), the accuracy strongly decreases.


Chapter 7

Conclusion

AlMo/PCMO devices are a consistent and substantial improvement over previous versions (such as the TiN/PCMO used in [17] or the Mo/PCMO previously studied by the group).

The accuracy is comparable to that obtained with PCM devices using the LG technique (introduced in [7] but still confidential as patent filing is in process). This means that, after this work, PCMO devices have become a feasible alternative to PCMs.

As described in Chap. 4, phase-change memories are unidirectional devices, meaning that the conductance can only be increased in a controlled way. The two-PCM-per-synapse architecture allows the weight to be both gradually potentiated and depressed by incrementally crystallizing the G+ or the G−. However, since each PCM conductance can only increase, both devices soon reach saturation, driving all the weights towards zero. The effect on the network would be the same as 'forgetting' the weights learned during training. Occasional RESET (see [8] for technical details) solves this problem by periodically RESETting both conductances, followed by a partial SET on the G+ or the G− to correctly restore the weight previously learned. There is no efficient way of performing the occasional RESET with the currently implemented circuitry, and implementing the synapses with PCMO devices could solve this problem.

There are still several aspects of the current RRAM devices to improve: the switching time (currently equal to 10 ms and not yet scalable below 1 ms), the programming energy, and the yield. This work will be carried out by Prof. Hwang's group at POSTECH, building on the ideas raised by this work. If further improvements can be made to the devices and more robust algorithms are developed, promising large-scale hardware chips could be produced.


Bibliography

[1] SJ Ahn, YJ Song, CW Jeong, JM Shin, Y Fai, YN Hwang, SH Lee, KC Ryoo, SY Lee, JH Park, et al. Highly manufacturable high density phase change memory of 64Mb and beyond. In Electron Devices Meeting, 2004. IEDM Technical Digest. IEEE International, pages 907–910. IEEE, 2004.

[2] James A Anderson. An Introduction to Neural Networks. MIT Press, 1995.

[3] Bharat Bhushan. Tribology and Mechanics of Magnetic Storage Devices. Springer Science & Business Media, 2012.

[4] Olivier Bichler, Manan Suri, Damien Querlioz, Dominique Vuillaume, Barbara DeSalvo, and Christian Gamrat. Visual pattern extraction using energy-efficient "2-PCM synapse" neuromorphic architecture. IEEE Transactions on Electron Devices, 59(8):2206–2214, 2012.

[5] G. W. Burr, P. Narayanan, R. M. Shelby, S. Sidler, I. Boybat, C. di Nolfo, and Y. Leblebici. Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power). In 2015 IEEE International Electron Devices Meeting (IEDM), pages 4.4.1–4.4.4, December 2015.

[6] Geoffrey W. Burr, Matthew J. Breitwisch, Michele Franceschini, Davide Garetto, Kailash Gopalakrishnan, Bryan Jackson, Bulent Kurdi, Chung Lam, Luis A. Lastras, Alvaro Padilla, Bipin Rajendran, Simone Raoux, and Rohit S. Shenoy. Phase change memory technology. Journal of Vacuum Science and Technology B, 28(2):223–262, 2010.

[7] Geoffrey W. Burr, Robert M. Shelby, Severin Sidler, Carmelo di Nolfo, Junwoo Jang, Irem Boybat, Rohit S. Shenoy, Pritish Narayanan, Kumar Virwani, Emanuele U. Giacometti, Bulent N. Kurdi, and Hyunsang Hwang. Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element. IEEE Transactions on Electron Devices, 62(11):3498–3507, November 2015.

[8] G. W. Burr, R. M. Shelby, Carmelo di Nolfo, J. Jang, R. S. Shenoy, Pritish Narayanan, K. Virwani, Emanuele U. Giacometti, B. N. Kurdi, and H. Hwang. Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element. IEDM Technical Digest, page T29.5, 2014.

[9] Leon Chua. Resistance switching memories are memristors. Applied Physics A, 102(4):765–783, 2011.

[10] Robert H Dennard, Fritz H Gaensslen, V Leo Rideout, Ernest Bassous, and Andre R LeBlanc. Design of ion-implanted MOSFET's with very small physical dimensions. IEEE Journal of Solid-State Circuits, 9(5):256–268, 1974.

[11] Alessandro Fumarola, Pritish Narayanan, Lucas L. Sanches, Severin Sidler, Junwoo Jang, Kibong Moon, Robert M. Shelby, Hyunsang Hwang, and Geoffrey W. Burr. Accelerating machine learning with non-volatile memory: exploring device and circuit tradeoffs. In ICRC 2016, page to appear, 2016.

[12] Kiarash Gharibdoust, Armin Tajalli, and Yusuf Leblebici. A hybrid NRZ/multi-tone I/O with crosstalk and ISI reduction for dense interconnects. IEEE Journal of Solid-State Circuits, 51(4):992–1002, 2016.

[13] Martin T Hagan, Howard B Demuth, Mark H Beale, and Orlando De Jesus. Neural Network Design, volume 20. PWS Publishing Company, Boston, 1996.

[14] Hendrik F Hamann, Martin O'Boyle, Yves C Martin, Michael Rooks, and H Kumar Wickramasinghe. Ultra-high-density phase-change storage and memory. Nature Materials, 5(5):383–387, 2006.

[15] Donald Olding Hebb. The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 2005.

[16] Miao Hu, John Paul Strachan, Zhiyong Li, Emmanuelle Merced Grafals, Noraica Davila, Catherine Graves, Sity Lam, Ning Ge, R Stanley Williams, and Jianhua Yang. Dot-product engine for neuromorphic computing: programming 1T1M crossbar to accelerate matrix-vector multiplication. In Proceedings of DAC, volume 53, 2016.

[17] Jun Woo Jang, Sangsu Park, Geoffrey W. Burr, Hyunsang Hwang, and Yoon Ha Jeong. Optimization of conductance change in Pr1−xCaxMnO3-based synaptic devices for neuromorphic systems. IEEE Electron Device Letters, 36(5):457–459, 2015.

[18] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[19] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

[20] Carver Mead. Neuromorphic electronic systems. Proceedings of the IEEE, 78(10):1629–1636, 1990.

[21] Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, Andrew S. Cassidy, Jun Sawada, Filipp Akopyan, Bryan L. Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, Bernard Brezzo, Ivan Vo, Steven K. Esser, Rathinakumar Appuswamy, Brian Taba, Arnon Amir, Myron D. Flickner, William P. Risk, Rajit Manohar, and Dharmendra S. Modha. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.

[22] Marvin Minsky and Seymour Papert. Perceptrons. MIT Press, 1969.

[23] Ashkan Hosseinzadeh Namin, Karl Leboeuf, Roberto Muscedere, Huapeng Wu, and Majid Ahmadi. Efficient hardware implementation of the hyperbolic tangent sigmoid function. In 2009 IEEE International Symposium on Circuits and Systems, pages 2117–2120. IEEE, 2009.

[24] Pritish Narayanan, Lucas L. Sanches, Alessandro Fumarola, Robert M. Shelby, Junwoo Jang, Hyunsang Hwang, Yusuf Leblebici, and Geoffrey W. Burr. Reducing circuit design complexity for neuromorphic machine learning systems based on nonvolatile memory arrays. In 2016 IEEE International Electron Devices Meeting, page in review. IEEE, 2016.

[25] S Park, H Kim, M Choo, J Noh, A Sheri, S Jung, K Seo, J Park, S Kim, W Lee, et al. RRAM-based synapse for neuromorphic system with pattern recognition function. IEDM Tech. Dig., 10:1–10, 2012.

[26] Sangsu Park, Jinwoo Noh, Myung-lae Choo, Ahmad Muqeem Sheri, Man Chang, Young-Bae Kim, Chang Jung Kim, Moongu Jeon, Byung-Geun Lee, Byoung Hun Lee, et al. Nanoscale RRAM-based synaptic electronics: toward a neuromorphic computing device. Nanotechnology, 24(38):384009, 2013.

[27] S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rettner, Y.-C. Chen, R. M. Shelby, M. Salinga, D. Krebs, S.-H. Chen, H.-L. Lung, and C. H. Lam. Phase-change random access memory: A scalable technology. IBM Journal of Research and Development, 52(4.5):465–479, 2008.

[28] Martin Riedmiller and Heinrich Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In IEEE International Conference on Neural Networks, 1993, pages 586–591. IEEE, 1993.

[29] Raul Rojas. Neural Networks. Springer-Verlag, Berlin, second edition, 1996.

[30] Raul Rojas. Neural Networks: A Systematic Introduction. Springer Science & Business Media, 2013.

[31] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.

[32] Krishna C Saraswat and Farrokh Mohammadi. Effect of scaling of interconnections on the time delay of VLSI circuits. IEEE Journal of Solid-State Circuits, 17(2):275–280, 1982.

[33] Dong-Jun Seong, Musarrat Hassan, Hyejung Choi, Joonmyoung Lee, Jaesik Yoon, Ju-Bong Park, Wootae Lee, Min-Suk Oh, and Hyunsang Hwang. Resistive-switching characteristics of Al/Pr0.7Ca0.3MnO3 for nonvolatile memory applications. IEEE Electron Device Letters, 30(9):919–921, 2009.

[34] Dong-jun Seong, Jubong Park, Nodo Lee, Musarrat Hasan, Seungjae Jung, Hyejung Choi, Joonmyoung Lee, Minseok Jo, Wootae Lee, Sangsu Park, et al. Effect of oxygen migration and interface engineering on resistance switching behavior of reactive metal/polycrystalline Pr0.7Ca0.3MnO3 device for nonvolatile memory applications. In 2009 IEEE International Electron Devices Meeting (IEDM), pages 1–4. IEEE, 2009.

[35] Ahmad Muqeem Sheri, Hyunsang Hwang, Moongu Jeon, and Byung-geun Lee. Neuromorphic character recognition system with two PCMO memristors as a synapse. IEEE Transactions on Industrial Electronics, 61(6):2933–2941, 2014.

[36] S. Sidler, I. Boybat, R. M. Shelby, P. Narayanan, J. Jang, A. Fumarola, K. Moon, Y. Leblebici, H. Hwang, and G. W. Burr. Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: impact of conductance response. In ESSDERC 2016, page to appear, 2016.

[37] Charles Henry Sie. Memory cell using bistable resistivity in amorphous As-Te-Ge film. 1969.

[38] John Von Neumann and Ray Kurzweil. The Computer and the Brain. Yale University Press, 2012.

[39] Fei Wang. Non-volatile memory devices based on chalcogenide materials. In Igor Stievano, editor, Flash Memories, chapter 10. InTech.