Using ML for ML to span the gamut of TinyML hardware
TinyML 2020
Jason Knight, Co-founder and CPO at OctoML
Motivation
Cambrian explosion of models, workloads, and use cases: CNN, GAN, RNN, MLP, DQNN.
Rapidly evolving ML software ecosystem.
Silicon scaling limitations (Dennard and Moore): Cambrian explosion of HW backends, heterogeneous HW.
Growing set of requirements: cost, latency, power, security.
Hard enough on server-class hardware…
But now we want to do it on bare-metal devices?
Bare-Metal Devices
FPGA, ARM M class, RISC-V MCUs, etc.
Programming bare-metal devices is cumbersome
ML Model
?
Running ML on bare-metal hardware is hard
ML Model
ML from scratch in C…
Running ML on bare-metal hardware is pretty hard
ML Framework
ML Model
Slow kernels, incomplete coverage
Running ML on bare-metal hardware is really hard
ML Framework
ML Model
Hand-optimize kernels
Not much help here…
Bugs!
Optimizing ML on bare-metal hardware is a long road
ML Framework
Run ML models
Debug kernels
Hand-optimize kernels
μTVM simplifies ML software on bare-metal
ML Framework
Run ML models
Debug kernels
Hand-optimize kernels
Automatically
Our Goal
Minimize the effort to run and optimize ML on bare-metal.
So what’s our approach?
The TVM in µTVM
How do we get from here… to here? TVM.
High-Level Differentiable IR
Tensor Expression IR
AutoTVM (instead of hand-optimization)
LLVM, CUDA, VTA
FPGA, ASIC
Where TVM is being used today
Every “Alexa” wake-up today across all devices uses a model optimized with TVM.
“[TVM enabled] real-time on mobile CPUs for free... We are excited about the performance TVM achieves.” More than 85x speed-up for a speech recognition model.
Bing query understanding: 112 ms (TensorFlow) -> 34 ms (TVM). QnA bot: 73 ms -> 28 ms (CPU), 10.1 ms -> 5.5 ms (GPU).
“TVM is key to ML Access on Hexagon”
Cf. TVM Conference 2019 (Dec): videos and slides available online.
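As a quick check on the latency figures quoted above, the short script below turns each before/after pair into a speedup factor. Only the millisecond values come from the slide; the dictionary layout and names are illustrative.

```python
# Latencies quoted on the slide, in milliseconds: (before, after).
latencies_ms = {
    "Bing query understanding": (112.0, 34.0),
    "QnA bot (CPU)": (73.0, 28.0),
    "QnA bot (GPU)": (10.1, 5.5),
}

# Speedup factor = latency before TVM / latency with TVM.
speedups = {name: before / after for name, (before, after) in latencies_ms.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

That works out to roughly 3.3x, 2.6x, and 1.8x respectively.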
Many hardware targets already enjoy speedups from TVM
Motivation
Except for microcontrollers...
Today’s Talk
High-Level Differentiable IR
Tensor Expression IR, AutoTVM
LLVM, CUDA, µTVM, VTA
FPGA, ASIC
Callouts 1, 2, and 3 on the stack diagram mark the parts covered in this talk.
AutoTVM
TVM + Program → compile program → Hardware Backend → run and measure → return timing feedback → AutoTVM → improve implementation → TVM + Program (repeat)
Automatically adapt to hardware type by learning
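The measure-and-improve loop can be illustrated with a toy tuner. This is a sketch under loud assumptions, not AutoTVM's API: `measure_on_device` is a hypothetical stand-in for compiling a candidate and timing it on the hardware backend, the tile-size search space is invented, and plain hill climbing replaces AutoTVM's learned search.

```python
import random

# Hypothetical search space: tile sizes for a 2-D loop nest.
TILE_CHOICES = [1, 2, 4, 8, 16, 32]


def measure_on_device(config):
    """Stand-in for 'compile program' plus 'run and measure'.

    A real tuner would build the kernel and time it on the hardware
    backend; here a synthetic cost function plays that role.
    """
    tile_x, tile_y = config
    # Synthetic cost with its minimum at (8, 4); purely illustrative.
    return 1.0 + abs(tile_x - 8) * 0.1 + abs(tile_y - 4) * 0.2


def neighbors(config):
    """Configurations that differ from `config` by one step in one knob."""
    result = []
    for axis in range(2):
        idx = TILE_CHOICES.index(config[axis])
        for step in (-1, 1):
            if 0 <= idx + step < len(TILE_CHOICES):
                candidate = list(config)
                candidate[axis] = TILE_CHOICES[idx + step]
                result.append(tuple(candidate))
    return result


def tune(n_rounds=20, seed=0):
    """Measure candidates, keep the best timing, and propose new ones."""
    rng = random.Random(seed)
    best = (rng.choice(TILE_CHOICES), rng.choice(TILE_CHOICES))
    best_time = measure_on_device(best)
    for _ in range(n_rounds):
        improved = False
        for candidate in neighbors(best):
            elapsed = measure_on_device(candidate)  # timing feedback
            if elapsed < best_time:                 # improve implementation
                best, best_time = candidate, elapsed
                improved = True
        if not improved:
            break  # no neighbor is faster: local optimum reached
    return best, best_time


best_config, best_time = tune()
print(best_config, best_time)
```

The structure is the point here: every candidate is judged by a measurement on the target rather than by a hand-written performance model, which is what lets the same loop adapt to different hardware.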
Today’s Talk
µTVM
Using µTVM
µTVM Runtime
Graph Runtime
How easy is it to use μTVM?
• Implement an interface to:
  - Read and write to device memory
  - Start code execution
• Provide a cross-compiler for the device
ResNet Example in Mainline TVM
image = [input picture]
resnet, params = get_resnet()
module = graph_runtime_build(resnet, params)
module.run(image)
> 'cat'
TVM → µTVM
TVM:
resnet, params = get_resnet()
module = graph_runtime_build(resnet, params)
µTVM:
resnet, params = get_resnet()
with micro.Session('openocd') as sess:    # wrap in a session
    module = sess.build(resnet, params)   # different build function
Today’s Talk
µTVM Runtime
C Code Generator
μDevice Interface
µTVM Code Generation
High-Level Differentiable IR
Tensor Expression IR, AutoTVM
LLVM, CUDA, C, VTA
FPGA, ASIC
Why not LLVM? Lack of support… so µTVM emits C.
µDevice Interface
Read, Write, Execute
Sockets? JTAG? Doesn't matter.
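The three operations above can be sketched as an abstract interface with one concrete transport behind it. Everything here is hypothetical and for illustration only: `DeviceInterface`, `InMemoryDevice`, and the addresses are not µTVM's actual classes or memory map, and the fake device runs a Python callable where a real board would run compiled code.

```python
from abc import ABC, abstractmethod


class DeviceInterface(ABC):
    """Hypothetical three-operation interface: read, write, execute."""

    @abstractmethod
    def read(self, addr: int, num_bytes: int) -> bytes: ...

    @abstractmethod
    def write(self, addr: int, data: bytes) -> None: ...

    @abstractmethod
    def execute(self, func_addr: int) -> None: ...


class InMemoryDevice(DeviceInterface):
    """Fake device backed by a bytearray, standing in for JTAG or a socket."""

    def __init__(self, mem_size: int = 1024):
        self.memory = bytearray(mem_size)
        self.functions = {}  # address -> Python callable acting as device code

    def read(self, addr, num_bytes):
        return bytes(self.memory[addr:addr + num_bytes])

    def write(self, addr, data):
        self.memory[addr:addr + len(data)] = data

    def execute(self, func_addr):
        self.functions[func_addr](self.memory)


def add_one(memory):
    """Pretend device kernel: read the byte at 0x10, store byte + 1 at 0x20."""
    memory[0x20] = (memory[0x10] + 1) % 256


device = InMemoryDevice()
device.functions[0x100] = add_one
device.write(0x10, bytes([41]))    # host sends the argument
device.execute(0x100)              # host starts code execution
result = device.read(0x20, 1)[0]   # host reads the result back
print(result)
```

Because the host only ever calls `read`, `write`, and `execute`, swapping the in-memory fake for a JTAG or socket transport would not change any calling code, which is the sense in which the transport "doesn't matter".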
The Runtime
μTVM Runtime: C Code Generator + μDevice Interface
The Runtime in Action
• Set up a RWX (read/write/execute) interface with the board: the communication interface
• Code generator lowers TVM IR to C code (IR → code: func.c)
• Device compiler cross-compiles generated code (gcc: func.c → func.o)
• μTVM remaps generated binary to custom device memory locations for multiple library loading (ld linker: func.o → remapped func)
• Remapped binary is loaded onto device (custom loader)
• Function invocation on host transfers parameters to the device for execution (send parameters, run, result: 'cat')
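To make the remapping step more concrete, here is a minimal sketch of handing out non-overlapping device addresses to several libraries. It only illustrates the address bookkeeping; the real relocation is done by the linker, and the region base, region size, alignment, and object file names below are all assumptions.

```python
from dataclasses import dataclass, field


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment`."""
    return (value + alignment - 1) // alignment * alignment


@dataclass
class SectionAllocator:
    """Bump allocator for one device memory region (e.g. text or data)."""
    base: int
    size: int
    alignment: int = 4
    offset: int = 0
    placed: dict = field(default_factory=dict)  # library name -> start address

    def place(self, name: str, num_bytes: int) -> int:
        """Reserve `num_bytes` for `name` and return its start address."""
        start = align_up(self.base + self.offset, self.alignment)
        end = start + num_bytes
        if end > self.base + self.size:
            raise MemoryError(f"no room for {name} in region")
        self.placed[name] = start
        self.offset = end - self.base
        return start


# Hypothetical memory map: 4 KiB of text starting at 0x2000_0000.
text = SectionAllocator(base=0x2000_0000, size=0x1000)

conv_addr = text.place("conv2d.o", 0x1F2)  # 498 bytes of code
dense_addr = text.place("dense.o", 0x80)   # placed at the next aligned address

print(hex(conv_addr), hex(dense_addr))
```

Each library gets its own aligned start address inside the region, so a second kernel can be loaded without overwriting the first.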
AutoTVM on µTVM
● Same pipeline as usual
● Load kernels into RAM instead of flash
End-to-End CIFAR-10 Evaluation
Replicated an int8-quantized CNN from an ARM Mbed tutorial:
3@32x32 → Conv2D → 32@32x32 → MaxPool2D → 32@16x16 → Conv2D → 32@16x16 → AvgPool2D → 32@8x8 → Conv2D → 64@8x8 → AvgPool2D → 64@4x4 → Dense → 1x10
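The activation shapes listed on this slide can be reproduced with a few lines of shape arithmetic. The slide gives no kernel sizes or strides, so this sketch assumes stride-1 convolutions with 'same' padding and pooling that halves each spatial dimension; the helper names are illustrative.

```python
def conv2d_same(shape, out_channels):
    """Conv2D with stride 1 and 'same' padding: only the channel count changes."""
    _, height, width = shape
    return (out_channels, height, width)


def pool2d(shape, stride=2):
    """Pooling that divides each spatial dimension by `stride` (2 assumed)."""
    channels, height, width = shape
    return (channels, height // stride, width // stride)


shape = (3, 32, 32)       # CIFAR-10 input: 3@32x32
shapes = [shape]
for layer, arg in [
    (conv2d_same, 32),    # Conv2D    -> 32@32x32
    (pool2d, 2),          # MaxPool2D -> 32@16x16
    (conv2d_same, 32),    # Conv2D    -> 32@16x16
    (pool2d, 2),          # AvgPool2D -> 32@8x8
    (conv2d_same, 64),    # Conv2D    -> 64@8x8
    (pool2d, 2),          # AvgPool2D -> 64@4x4
]:
    shape = layer(shape, arg)
    shapes.append(shape)

channels, height, width = shape
dense_inputs = channels * height * width  # flattened features fed to Dense
dense_weights = dense_inputs * 10         # one weight per input per class

print(shapes[-1], dense_inputs, dense_weights)
```

Under these assumptions the Dense layer sees 1,024 inputs and holds 10,240 weights, which at one byte per int8 weight is about 10 KiB before biases.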
Preliminary CIFAR-10 CNN Results
● Ran on ARM Cortex-M7
● Compared against CMSIS-NN
● Vanilla template
● ~5 hours of tuning
● No vectorization
Preliminary Int-8 Conv2D Results
[Chart: compared against arm_convolve_HWC_q7_fast and arm_convolve_HWC_q7_RGB in CMSIS-NN]
![Page 84: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML](https://reader035.vdocuments.us/reader035/viewer/2022070704/5e878c140a65913536294db7/html5/thumbnails/84.jpg)
Coming Soon - Self-hosted models
![Page 85: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML](https://reader035.vdocuments.us/reader035/viewer/2022070704/5e878c140a65913536294db7/html5/thumbnails/85.jpg)
Coming Soon - Ultra low bit-width quantization
Riptide: Fast End-to-End Binarized Neural Networks. MLSys 2020 (March 3rd). Joshua Fromm · Meghan Cowan · Matthai Philipose · Luis Ceze · Shwetak Patel
[Chart: SqueezeNet on Raspberry Pi 3]
• TVM already supports flexible code generation for a variety of data types
• Natural to combine this with µTVM
• See Josh Fromm’s MLSys talk on March 3rd to learn more
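To make the appeal of very low bit-widths concrete, here is a minimal sketch of the standard XOR-and-popcount trick for 1-bit (binarized) dot products. It is a generic textbook illustration in plain Python, not Riptide's or TVM's implementation; the function names `pack_signs` and `binary_dot` are made up for this example.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into one integer, one bit per value.

    Bit i is 1 when values[i] == +1 and 0 when values[i] == -1.
    """
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +/-1 vectors of length n.

    Positions where the bits agree contribute +1 and positions where they
    differ contribute -1, so the result is n - 2 * popcount(a XOR b).
    """
    mask = (1 << n) - 1
    differing = bin((a_bits ^ b_bits) & mask).count("1")
    return n - 2 * differing


if __name__ == "__main__":
    a = [1, -1, 1, 1, -1, -1, 1, -1]
    b = [1, 1, -1, 1, -1, 1, 1, -1]
    reference = sum(x * y for x, y in zip(a, b))
    packed = binary_dot(pack_signs(a), pack_signs(b), len(a))
    assert packed == reference
    print(packed)
```

On real hardware the same idea replaces a run of multiply-accumulates with one XOR and one population count per machine word, which is why code generation for packed data types matters on small devices.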
![Page 86: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML](https://reader035.vdocuments.us/reader035/viewer/2022070704/5e878c140a65913536294db7/html5/thumbnails/86.jpg)
Coming soon - µTVM as a service
• OctoML offering hosted version of AutoTVM for cloud, mobile, and embedded
  • Cloud CPU/GPU
  • ARM A-class CPU/GPU
  • ARM M-class microcontrollers
  • More on request
• Trained model in, optimized binary out
• Currently in private beta
[Diagram: TensorFlow, PyTorch, ONNX serialized models → Octomizer (API and web UI; auto-tuning using OctoML clusters) → optimized deployment artifacts]
![Page 87: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML](https://reader035.vdocuments.us/reader035/viewer/2022070704/5e878c140a65913536294db7/html5/thumbnails/87.jpg)
Status and next steps
• Functional on x86, ARM, and RISC-V
• In-depth writeup early March on our blog with upstream TVM documentation/tutorial
• Follow us on https://octoml.ai or @octoml
• Interop with CMSIS-NN library and TF Lite Micro (better together!)
• First-class SIMD vectorization support for ARM schedules (in addition to current tensorization approach)
![Page 88: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML](https://reader035.vdocuments.us/reader035/viewer/2022070704/5e878c140a65913536294db7/html5/thumbnails/88.jpg)
Questions or interest? Please reach out:
Jason Knight [email protected]
Acknowledgments
Logan Weber - AutoTVM
Joshua Fromm - Riptide
Thanks!
![Page 89: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML](https://reader035.vdocuments.us/reader035/viewer/2022070704/5e878c140a65913536294db7/html5/thumbnails/89.jpg)
![Page 90: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML](https://reader035.vdocuments.us/reader035/viewer/2022070704/5e878c140a65913536294db7/html5/thumbnails/90.jpg)
Copyright Notice
The presentation(s) in this publication comprise the proceedings of tinyML® Summit 2020. The content reflects the opinion of the authors and their respective companies. This version of the presentation may differ from the version that was presented at the tinyML Summit. The inclusion of presentations in this publication does not constitute an endorsement by tinyML Foundation or the sponsors.
There is no copyright protection claimed by this publication. However, each presentation is the work of the authors and their respective companies and may contain copyrighted material. As such, it is strongly encouraged that any use reflect proper acknowledgement to the appropriate source. Any questions regarding the use of any materials presented should be directed to the author(s) or their companies.
tinyML is a registered trademark of the tinyML Foundation.
www.tinyML.org