progress report 2013/11/07. outline further studies about heterogeneous multiprocessing other than...
![Page 1: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/1.jpg)
Progress Report
2013/11/07
![Page 2: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/2.jpg)
Outline
Further studies about heterogeneous multiprocessing other than ARM
Cache miss issue
Discussion on task scheduling
![Page 3: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/3.jpg)
Manufacturers Other than ARM
Qualcomm
◦ aSMP (Asynchronous Symmetrical Multi-Processing)
◦ Krait: per-core DCVS (Dynamic Clock and Voltage Scaling). A core that is not being used can be completely collapsed independently. This reduces the need for hypervisors or more complex software management of disparate cores.
![Page 4: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/4.jpg)
Manufacturers Other than ARM
Nvidia
◦ vSMP (Variable Symmetric Multiprocessing)
◦ Tegra 3: 4 high-performance Cortex-A9 main processors + 1 energy-efficient Cortex-A9 companion processor. The companion processor and the main processors cannot be active simultaneously. The main processors have to run at the same frequency.
![Page 5: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/5.jpg)
HSA Foundation
![Page 6: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/6.jpg)
Cache Miss Issue
“For each switching between big(A15) and A7(LITTLE), the L2 cache is cleaned, thus cause memory access overhead.”
![Page 7: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/7.jpg)
Cache Miss Issue
Unless a whole cluster (all A15 or all A7) is shut down, cleaning the L2 cache on every switch between A15 and A7 seems odd.
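[Figure: four A15 cores, each with its own L1 cache, in front of one L2 cache; four A7 cores, each with its own L1 cache, in front of a second L2 cache.]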
![Page 8: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/8.jpg)
Task Scheduling
Take the loading of each task into consideration.
For a given task, assume its behavior is:
◦ Computation ops: n time units.
◦ Memory ops: 1 time unit.
Different core frequencies cause different loadings.
◦ F = 1, loading = n/(n+1)
◦ F = 2, loading = n/(n+2)
◦ F = 4, loading = n/(n+4)
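The three loading values above follow one pattern, which can be sketched as below. This is a minimal sketch, not the author's code: it assumes that computation time shrinks as 1/F while memory time stays at 1 time unit regardless of F, which is the reading that reproduces the slide's formulas. The function name `loading` and the use of exact fractions are illustrative choices.

```python
from fractions import Fraction


def loading(n, f):
    """Fraction of time the core spends computing at relative frequency f.

    Assumption (not stated on the slide): computation takes n / f time units
    at frequency f, while memory operations always take 1 time unit.
    n and f are expected to be positive integers.
    """
    compute_time = Fraction(n, f)  # computation speeds up with frequency
    memory_time = 1                # memory time is frequency-independent
    return compute_time / (compute_time + memory_time)


if __name__ == "__main__":
    n = 4  # hypothetical task: 4 time units of computation at F = 1
    for f in (1, 2, 4):
        print(f"F = {f}: loading = {loading(n, f)}")
```

Under that assumption the expression simplifies to n/(n+F), so a higher frequency lowers the loading because the core spends a larger share of its time waiting on memory.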
![Page 9: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/9.jpg)
Single Core
For a given set of tasks and their behaviors, find the minimum frequency such that the loading of the core is 100%.
◦ Lower frequency: loading = 100%, but the performance decreases.
◦ Higher frequency: loading < 100%, and more (dynamic) power is consumed.
![Page 10: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/10.jpg)
Scheduling on HMP
Assign processes in the run queue to cores according to each core's capability.
Each core applies DVFS/DCVS individually.
However, this does not apply to big.LITTLE.
◦ Each (pair of) cores is homogeneous.
![Page 11: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/11.jpg)
big.LITTLE Core Scheduling
Assume that we have n pairs of big.LITTLE cores.
◦ Initially, all pairs use the LITTLE core.
Assume we know the following information about a task Tk:
◦ Task deadline.
◦ Estimated execution time on the big core.
◦ Estimated execution time on the LITTLE core.
![Page 12: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/12.jpg)
Heuristic Mentioned Last Time
First, we define “urgency” U to indicate the priority of a task.
For task Tk:
$$U_k = \frac{\text{Remaining time on LITTLE core}}{\text{Time to deadline}}$$
◦ 0 < Uk ≤ 1: task Tk can be finished before its deadline on a LITTLE core.
◦ Uk > 1: task Tk can’t be finished before its deadline on a LITTLE core.
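As a small illustration of the urgency definition, the ratio can be computed as follows. This is only a sketch: the function and parameter names are hypothetical, and treating a deadline that has already passed as infinite urgency is an assumption that the slides do not address.

```python
def urgency(remaining_on_little, time_to_deadline):
    """Urgency U_k = remaining time on LITTLE core / time to deadline.

    Assumption (not from the slides): if the deadline has already been
    reached or passed, the task is treated as infinitely urgent.
    """
    if time_to_deadline <= 0:
        return float("inf")
    return remaining_on_little / time_to_deadline


if __name__ == "__main__":
    # Hypothetical tasks: (remaining time on LITTLE, time to deadline)
    for remaining, slack in [(5, 10), (15, 10)]:
        u = urgency(remaining, slack)
        verdict = "fits on LITTLE" if u <= 1 else "misses deadline on LITTLE"
        print(f"U = {u}: {verdict}")
```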
![Page 13: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/13.jpg)
Core Switching
Switch one LITTLE core to a big core if there exists a task Tk with urgency Uk > 1.
Find all the tasks {Tj with Uj > 0.8} and assign these tasks to big cores.
Switch big cores to LITTLE cores if there is no task with urgency greater than 0.8.
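One possible reading of the three switching rules is sketched below as a single scheduling step. The slides do not say how many pairs switch back at once or what happens to tasks with U > 0.8 when no pair is in big mode. This sketch assumes that all big cores switch back together and that such tasks stay on LITTLE cores until some task exceeds U = 1. The names `switch_step`, `n_big`, and `n_pairs` are hypothetical.

```python
UP_THRESHOLD = 1.0    # from the slide: switch a pair to big when some U_k > 1
DOWN_THRESHOLD = 0.8  # from the slide: tasks with U_j > 0.8 run on big cores


def switch_step(urgencies, n_big, n_pairs):
    """Apply the switching rules once.

    urgencies: dict mapping task name -> urgency U
    n_big:     number of core pairs currently in big mode
    n_pairs:   total number of big.LITTLE pairs
    Returns (new n_big, sorted list of tasks assigned to big cores).
    """
    # Rule 1: switch one LITTLE core to big if any task has U > 1.
    if n_big < n_pairs and any(u > UP_THRESHOLD for u in urgencies.values()):
        n_big += 1

    # Rule 2: collect every task with U > 0.8 as a candidate for big cores.
    hot = sorted(name for name, u in urgencies.items() if u > DOWN_THRESHOLD)

    # Rule 3: with no such task, switch all big cores back to LITTLE
    # (assumption: they all switch back in the same step).
    if not hot:
        return 0, []

    # Assumption: if no pair is in big mode, hot tasks stay on LITTLE cores.
    if n_big == 0:
        return 0, []
    return n_big, hot


if __name__ == "__main__":
    print(switch_step({"a": 1.2, "b": 0.9, "c": 0.3}, n_big=0, n_pairs=4))
    print(switch_step({"a": 0.9, "c": 0.3}, n_big=1, n_pairs=4))
    print(switch_step({"a": 0.5}, n_big=2, n_pairs=4))
```

The gap between the two thresholds (1 to switch up, 0.8 to switch down) acts as hysteresis, so a task hovering near U = 1 does not cause repeated switching.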
![Page 14: Progress Report 2013/11/07. Outline Further studies about heterogeneous multiprocessing other than ARM Cache miss issue Discussion on task scheduling](https://reader036.vdocuments.us/reader036/viewer/2022082713/5697bfc01a28abf838ca3b3f/html5/thumbnails/14.jpg)
Discussion