
  • 8/06/2019 1

    Hanoi, June 2019

    Truyen Tran, Deakin University

    @truyenoz

    truyentran.github.io

    [email protected]

    letdataspeak.blogspot.com

    goo.gl/3jJ1O0

    Memory Advances in Neural Turing Machines

  • 8/06/2019 2

    [Figure: contrasting data-driven deep learning with knowledge-based systems built by domain experts]

  • 8/06/2019 3

    Can we learn from data a model that is as powerful as a Turing machine?

    In other words, can we learn a (neural) program that learns to program from data?

  • 8/06/2019 4

    Agenda

    Neural Turing Machine

    Variational memory

    Sparse read/write

    Program memory

    Outlook

  • Example: Electronic medical records

    8/06/2019 5

    Modelling three interwoven processes:

    • Disease progression
    • Interventions & care processes
    • Recording rules

    [Figure: a patient timeline of visits/admissions with variable time gaps, an abstraction layer, and a prediction point. Source: medicalbillingcodings.org]

    Memory is needed to handle thousands of events, compute complex healthcare “grammars”, support chains of reasoning, and switch rapidly between tasks.

  • Neural Turing machine (NTM)

    A controller takes input/output and talks to an external memory module. The memory supports read and write operations.

    The main issues are where to write and how to update the memory state. All operations are differentiable.

    https://rylanschaeffer.github.io/content/research/neural_turing_machine/main.html
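
    Below is a minimal sketch of the differentiable read/write mechanics described above, using content-based addressing with a single head. The function names and the NumPy setup are illustrative assumptions, not the paper's exact interface.

```python
# Sketch of NTM-style differentiable memory access (single head, NumPy).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_address(memory, key, beta):
    """Soft attention over memory rows: cosine similarity sharpened by beta."""
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sim)                    # weights over slots, sum to 1

def read(memory, w):
    """Differentiable read: weight-blended combination of memory rows."""
    return w @ memory

def write(memory, w, erase, add):
    """Differentiable write: per-slot soft erase, then additive update."""
    memory = memory * (1 - np.outer(w, erase))    # erase in proportion to w
    return memory + np.outer(w, add)              # then add new content

# Toy usage on a 4-slot, 3-dimensional memory.
M = np.random.randn(4, 3)
w = content_address(M, key=np.array([1.0, 0.0, 0.0]), beta=5.0)
r = read(M, w)                                    # read vector, shape (3,)
M = write(M, w, erase=0.5 * np.ones(3), add=np.array([0.1, 0.2, 0.3]))
```

    Because every step is a smooth function of the weights, gradients flow through addressing, reading, and writing, which is what lets the whole machine be trained end to end.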

  • 8/06/2019 7

    Agenda

    Neural Turing Machine

    Variational memory

    Sparse read/write

    Program memory

    Outlook

  • Motivation: Dialog system

    8/06/2019 8

    A dialog system needs to maintain the chat history (which could span hours) → memory is needed.

    Response generation needs to be flexible, adapting to variations in mood and style. Current techniques are mostly based on LSTMs, leading to “stiff” default responses (e.g., “I see”).

    There are many ways to express the same thought → variational generative methods are needed.

  • Variational memory encoder-decoder (VMED)

    8/06/2019 9

    [Figure: in a Conditional Variational Auto-Encoder, latent variables link the context to the generated response; in VMED, the latent variables are additionally driven by reads from an external memory.]
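
    A minimal sketch of the contrast in the figure, under the assumption (from the diagram) that VMED replaces the CVAE's single context-driven Gaussian prior with a mixture whose modes come from memory reads; all names and stub networks below are illustrative.

```python
# CVAE vs VMED-style latent sampling (NumPy sketch).
import numpy as np

rng = np.random.default_rng(0)

def cvae_prior(context_vec):
    """Single-mode prior: mean/std are functions of the context (stub nets here)."""
    mu, sigma = context_vec, np.ones_like(context_vec)   # stand-ins for MLPs
    return mu + sigma * rng.standard_normal(mu.shape)    # reparameterised z

def vmed_prior(read_vectors, mix_weights):
    """Mixture prior: each memory read proposes a mode; pick one, then sample."""
    k = rng.choice(len(read_vectors), p=mix_weights)
    mu = read_vectors[k]
    return mu + rng.standard_normal(mu.shape)

context = rng.standard_normal(8)
reads = [rng.standard_normal(8) for _ in range(3)]       # e.g. 3 read heads
z_cvae = cvae_prior(context)
z_vmed = vmed_prior(reads, mix_weights=[0.5, 0.3, 0.2])
```

    Sampling from several memory-driven modes, rather than one context-driven Gaussian, is what gives the decoder room to produce varied rather than “stiff” responses.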

  • Sample response

    8/06/2019 10

  • 8/06/2019 11

    Program memory

    Outlook

    Sparse read/write

    Variational memory

    Neural Turing Machine

    Agenda

  • Problems of current NTMs

    Lack of theoretical analysis of optimal memory operations.

    Previous works are based on intuitions: location-based reading/writing, temporal-linkage reading, least-used writing [Santoro et al., Graves et al.], and sparse access over big memory [Rae et al.].

    Very slow due to heavy memory read/write computations.

    12

  • Cached Uniform Writing (CUW)

    13
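
    The slide gives only the name, so here is a heavily hedged sketch of the uniform-writing idea as I read it: given a budget of D writes over a length-T sequence, write at evenly spaced steps and cache the intermediate hidden states in between. The interval choice and the mean-pooled cache summary are assumptions for illustration, not the paper's exact rule.

```python
# Uniform writing with a cache (illustrative NumPy sketch).
import numpy as np

T, D = 100, 10                    # sequence length, write budget
interval = T // D                 # evenly spaced write schedule (assumed form)

memory_writes, cache = [], []
for t in range(T):
    h_t = np.random.randn(16)     # stand-in for the controller's hidden state
    cache.append(h_t)             # cache every state between writes
    if (t + 1) % interval == 0:   # write only on the uniform schedule
        summary = np.mean(cache, axis=0)   # summarise the cached states
        memory_writes.append(summary)
        cache.clear()

print(len(memory_writes))         # D writes instead of T
```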

  • Ablation study: memory-augmented neural networks with/without Uniform Writing

    14

    Task: repeat the input sequence twice

  • Synthetic tasks: memorize all

    15

  • Synthetic tasks: memorize selectively

    16

  • Synthetic sinusoidal generation: memorize featured points

    17

  • Flattened MNIST classification

    18

  • Document classification

    19

  • 8/06/2019 20

    Agenda

    Neural Turing Machine

    Variational memory

    Sparse read/write

    Program memory

    Outlook

  • Computing devices vs neural counterparts

    FSM (1943) ↔ RNN (1982)

    PDA (1954) ↔ Stack RNN (1993)

    TM (1936) ↔ NTM (2014)

    UTM/VNA (1936/1945) ↔ NUTM (ours, 2019)

    The missing piece: a memory to store programs → neural stored-program memory (NSM)

  • NUTM = NTM + NSM
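
    A minimal sketch of the composition named above, assuming (as the slides suggest) that the NSM is a key-value store whose values are "programs", i.e. interface weights, retrieved by the same content-based attention an NTM uses; the shapes and names below are illustrative.

```python
# NTM + NSM: retrieving a stored program by attention (NumPy sketch).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

P, K_DIM, W_DIM = 4, 8, 32                  # programs, key size, program size
program_keys = np.random.randn(P, K_DIM)    # NSM keys
program_vals = np.random.randn(P, W_DIM)    # NSM values: flattened programs

def retrieve_program(query):
    """Soft attention over stored programs, mirroring NTM content addressing."""
    w = softmax(program_keys @ query)
    return w @ program_vals                 # blended interface weights

query = np.random.randn(K_DIM)              # emitted by the controller each step
weights = retrieve_program(query)           # parameterise the NTM heads with these
```

    Because programs are data in a memory rather than fixed parameters, the controller can switch programs per timestep and per task, which is what the stored-program (von Neumann) analogy on the previous slide points at.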

  • Multi-level modelling

    Hierarchical regression: if the input is clustered, clustering before regression helps.

    Proofs may be available for low dimensions; what about higher dimensions?

  • NSM is beneficial to NTM

  • Algorithmic single tasks

  • Sequencing tasks

  • Continual Learning

  • Few-shot learning

  • Question answering (bAbI dataset)

  • 8/06/2019 30

    Agenda

    Neural Turing Machine

    Variational memory

    Sparse read/write

    Program memory

    Outlook

  • Outlook

    Memory for graphs & relational structures

    Turing machines to design machine learning algorithms

    Memory-supported reasoning

    Imaginative memory

    Social memory: collective memory, theory of mind, memory of others

    Full cognitive architectures

    Theoretical analysis

    8/06/2019 31

    https://twitter.com/nvidia/status/1010545517405835264

    Towards AGI: Is the human brain a (super-)Turing machine?
