TensorFlow
Marco Serafini
COMPSCI 590S Lecture 22
Motivations
• DistBelief: previous iteration, parameter server architecture
• Limitations:
  • Monolithic layers, difficult to define new ones
  • Difficult to offload computation with complex dependencies to parameter servers
    • E.g., apply updates based on gradients accumulated over multiple iterations
  • Fixed execution pattern
    • Read data, compute loss function (forward pass), compute gradients for parameters (backward pass), write gradients to parameter server
  • Not optimized for single workstations and GPUs
TensorFlow
• Dataflow graph of operators, but not a DAG
  • Loops and conditionals
• Deferred (lazy) execution
  • Enables optimizations, e.g. pipelining
• Composable, simple basic operators
  • Matrix multiplication, convolution
  • Can be combined into more complex operators
• Stateful operators
  • For shared parameters
• Concept of devices
  • CPUs, GPUs, mobile devices
  (See the sketch below.)
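A minimal sketch of these ideas in the TF 1.x-style graph API (through tf.compat.v1; tensor names and values are illustrative). The graph is built first; nothing executes until a session runs a subgraph:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Build a static dataflow graph; nothing runs yet (deferred execution).
a = tf.placeholder(tf.float32, shape=[2, 2], name="a")
b = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="b")
c = tf.matmul(a, b)                # simple, composable operator

w = tf.Variable(tf.zeros([2, 2]))  # stateful operator: shared parameter
update = w.assign_add(c)           # mutates state when executed

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Execution is deferred until here; only the needed subgraph runs.
    print(sess.run(update, feed_dict={a: [[1.0, 0.0], [0.0, 1.0]]}))
```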
Example
Tensors
• Format
  • n-dimensional arrays
  • Elements have primitive types (including byte arrays)
• Tensors are dense
  • All elements are represented
  • User must find ways to encode sparse data efficiently (see the sketch below)
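Because tensors are dense, sparse data has to be encoded by the user, e.g. as (indices, values, shape) triples. A minimal sketch, assuming the TF 1.x-style API and using the library's tf.SparseTensor wrapper as one possible encoding:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Dense tensor: every element is stored, zeros included.
dense = tf.constant([[0.0, 0.0, 3.0],
                     [4.0, 0.0, 0.0]])

# One user-side encoding for sparse data: (indices, values, shape),
# here via the library's tf.SparseTensor wrapper.
sparse = tf.SparseTensor(indices=[[0, 2], [1, 0]],
                         values=[3.0, 4.0],
                         dense_shape=[2, 3])
densified = tf.sparse.to_dense(sparse)

with tf.Session() as sess:
    print(sess.run(densified))  # same contents as `dense`
```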
Operations
• Inputs and outputs are tensors
• State is kept through stateful operators
• Operations to handle variables (which are also tensors):
  • Variable op: returns a unique reference handle
  • Read op: takes a reference handle, produces the value of the variable
  • Write ops: take a reference handle and a value and update the variable; multiple write operations are possible (sketch below)
• Queues are also stateful operators
  • Get a reference handle, modify through operations
  • Blocking semantics, backpressure, synchronization
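A minimal sketch of variable and queue ops in the TF 1.x-style API (names and values are illustrative):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Variable op: creating a variable yields a reference handle.
v = tf.Variable(0.0, name="v")
write = v.assign(5.0)       # one kind of write op ...
accum = v.assign_add(1.0)   # ... and another
read = v.read_value()       # read op: handle -> current value

# Queues are stateful too, with blocking enqueue/dequeue semantics.
q = tf.FIFOQueue(capacity=2, dtypes=[tf.float32])
enq = q.enqueue(1.0)
deq = q.dequeue()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(write)
    sess.run(accum)
    print(sess.run(read))   # 6.0
    sess.run(enq)
    print(sess.run(deq))    # 1.0
```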
Execution Model
• We have a computation graph
• Step: client executes a subgraph by indicating:
  • Edges to feed the subgraph with input tensors
  • Edges to fetch the output tensors
  • Runtime prunes the subgraph to remove unnecessary operations
• Can invoke multiple concurrent steps
  • Example: concurrent batches for data-parallel training
  (See the sketch below.)
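A minimal sketch of one step with feeds, fetches, and pruning (TF 1.x-style API; names are illustrative):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, name="x")  # edge to feed
y = x * 2.0
z = y + 1.0
unused = tf.sqrt(x)  # not needed to fetch z; pruned from the step

with tf.Session() as sess:
    # One step: feed x, fetch z. Only the subgraph {x, y, z} runs.
    print(sess.run(z, feed_dict={x: 3.0}))  # 7.0
```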
Example
• Data-parallel training looks like this:
[Diagram: stateful queues feed the input; stateful variables hold the shared parameters; concurrent steps provide data parallelism.]
Scheduling: Tasks and Devices
• Tasks: named processes that send messages
  • PS tasks: store parameters, but can also run computations
  • Worker tasks: the rest
  • Note: "informal" categories, not enforced by TensorFlow (see the sketch below)
• Devices: CPU, GPU, TPU, mobile, …
  • CPU is the host device
  • Device executes a kernel for each operation assigned to it
  • Same operation (e.g. matrix multiplication) has different kernels for different devices
• Requirements for a device
  • Must accept kernels for execution
  • Must allocate memory for inputs and outputs
  • Must transfer data to and from host memory
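A minimal sketch of how tasks are named in the TF 1.x distributed runtime; the host names and ports below are hypothetical:

```python
import tensorflow.compat.v1 as tf

# Hypothetical cluster definition; addresses are made up.
cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],        # parameter server task
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],    # worker tasks
})

# Each process runs exactly one named task and binds its own address.
# The "ps"/"worker" names are conventional, not enforced by TensorFlow.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
```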
Placement
• TensorFlow runtime places operations on devices
  • Implicit constraints: a stateful operation goes on the same device as its state
  • Explicit constraints: dictated by the user (see the sketch below)
  • Optimal placement is still an open question
• Obtain per-device subgraphs
  • All operations assigned to the device
  • Send and Receive operations replace edges that cross devices
  • Specialized per-device implementations
    • CPU – GPU: CUDA memory copy
    • Across tasks: TCP or RDMA
• Placement preserved throughout a session
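A minimal sketch of explicit placement constraints with tf.device (TF 1.x-style API; the job/task/device strings are illustrative):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Explicit constraint: keep the (stateful) parameters on a PS task.
with tf.device("/job:ps/task:0/cpu:0"):
    w = tf.Variable(tf.zeros([784, 10]))

# Run the computation on a worker's GPU; the runtime replaces the
# cross-device edge with a Send/Receive pair (TCP/RDMA across tasks).
with tf.device("/job:worker/task:0/gpu:0"):
    x = tf.placeholder(tf.float32, [None, 784])
    logits = tf.matmul(x, w)
```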
Control Flow
• How do we enable dynamic control flow with a static graph?
• Example: recurrent neural network
  • Train the network on sequences of variable length without unrolling
• Conditionals: Switch and Merge (diagram and sketch below)
[Diagram: a Switch op takes a data input and a control input and forwards the data to one of two branches of ops; the untaken branch is marked dead. A Merge op then outputs its one non-dead input.]
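In the user-facing API, Switch and Merge are typically generated by tf.cond. A minimal sketch (TF 1.x-style; names are illustrative):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, name="x")
pred = x > 0.0   # control input

# tf.cond compiles down to Switch/Merge nodes: Switch routes x to one
# branch, Merge returns the non-dead result.
y = tf.cond(pred, lambda: x * 2.0, lambda: x - 1.0)

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 3.0}))   # 6.0 (true branch)
    print(sess.run(y, feed_dict={x: -3.0}))  # -4.0 (false branch)
```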
Loops
• Use three additional operators: Enter, Exit, and NextIteration
[Diagram: the data input passes through an Enter op into the loop-body ops; Exit emits the final value, while NextIteration feeds each iteration's result back into the loop.]
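In the user-facing API these loop operators are generated by tf.while_loop. A minimal sketch (TF 1.x-style):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# tf.while_loop is built from Enter, Exit, and NextIteration nodes
# (plus Switch/Merge for the loop condition).
i = tf.constant(0)
acc = tf.constant(0)

def cond(i, acc):
    return i < 5

def body(i, acc):
    return i + 1, acc + i   # next-iteration values

final_i, final_acc = tf.while_loop(cond, body, [i, acc])

with tf.Session() as sess:
    print(sess.run([final_i, final_acc]))  # [5, 10]
```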
Scaling to Large Models
• Parameter server approach to avoid moving terabytes of parameters at every step
• Gather: reads tensor data from a shard and computes
• Part: partitions the input across shards of parameters
• Stitch: aggregates all partitions (see the sketch below)
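A minimal sketch of a sharded parameter lookup built from these three operations, using tf.dynamic_partition (Part), tf.gather (Gather), and tf.dynamic_stitch (Stitch); the mod-sharding scheme and sizes are illustrative assumptions:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

NUM_SHARDS = 2  # illustrative shard count

# Embedding table split across parameter shards (here: one process).
shards = [tf.Variable(tf.random_normal([50, 8])) for _ in range(NUM_SHARDS)]

ids = tf.placeholder(tf.int64, [None])
shard_of = tf.cast(ids % NUM_SHARDS, tf.int32)  # which shard owns each id
row_in_shard = ids // NUM_SHARDS

# Part: split the lookup ids by owning shard.
parts = tf.dynamic_partition(row_in_shard, shard_of, NUM_SHARDS)
orig_pos = tf.dynamic_partition(tf.range(tf.size(ids)), shard_of, NUM_SHARDS)

# Gather: each shard reads only the rows it owns.
gathered = [tf.gather(shards[s], parts[s]) for s in range(NUM_SHARDS)]

# Stitch: reassemble the results in the original order.
embeddings = tf.dynamic_stitch(orig_pos, gathered)
```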
Fault Tolerance
• Long-running tasks face failures and preemption
  • Sometimes run at night on idle machines
• Small operations, no need to tolerate individual failures
  • Even RDDs are overkill
• User invokes Save for checkpointing (see the sketch below)
  • Each variable in a task is connected to the same Save op for batching
  • Not consistent
• Other use cases: transfer learning
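A minimal checkpointing sketch with the TF 1.x Saver, which adds Save/Restore ops for the variables (the checkpoint path is illustrative):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

w = tf.Variable(tf.zeros([10]))
saver = tf.train.Saver()  # wires the variables to Save/Restore ops

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps ...
    saver.save(sess, "/tmp/model.ckpt")      # user-invoked checkpoint
    saver.restore(sess, "/tmp/model.ckpt")   # after a failure, or for transfer learning
```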
Synchronous Coordination
• Use blocking queues for synchrony
• Redundant tasks for stragglers
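At the library level, the TF 1.x SyncReplicasOptimizer wraps this queue-based synchronous scheme; a sketch with illustrative replica counts, where the extra, redundant replica absorbs stragglers:

```python
import tensorflow.compat.v1 as tf

base_opt = tf.train.GradientDescentOptimizer(0.1)
sync_opt = tf.train.SyncReplicasOptimizer(
    base_opt,
    replicas_to_aggregate=4,   # gradients aggregated per update
    total_num_replicas=5)      # one extra, redundant worker
```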
Implementation
Single-Machine Performance
• Four convolutional models using one GPU
Synchronous Microbenchmarks
• Null training steps
• Sparse performance is close to optimal (scalar)
Scalability
• Scalability bound by access to PS tasks (7)