Low Contention Mapping of RT Tasks onto a TilePro 64 Core Processor

Upload: ananda

Post on 22-Feb-2016



Low Contention Mapping of RT Tasks onto a TilePro 64 Core Processor
1 Background Introduction (why)
2 Goal
3 What
4 How
5 Experimental Result
6 Advantage & Limitation
7 Significance & Improvement

Lei Cui


Related Terms & Concepts
Predictability
TilePro 64-Core Processor
Contention
Static Timing Analysis
NoC
IPC
Full-duplex (communication)
Jitter
Hyper-Period


1 Background Introduction (why)

Predictable task execution is essential in an RT system. Upper bounds on the execution times of RT tasks can be determined via static timing analysis, but this method may produce unsafe underestimates when the underlying communication paths are not determined: when data from multiple sources share parts of a routing path in the NoC, contention occurs.

Contention analysis is therefore a must to guarantee safe and reliable bounds. At the same time, the paper uses a multi-core architecture and maps tasks to cores in such a way that contention is minimized. In addition, the fewer cores there are, the more likely IPC is to incur overhead.

Contention leads to latency, latency leads to unsafe underestimation, and underestimation leads to unpredictability.


1 Background Introduction (con)

Drawback
1) Exhaustive approaches do not scale beyond small NoC mesh sizes, as they can take days to solve mapping layouts.
2) Previous work viewed communication as temporally stateless, which limited the amount of communication that could feasibly be solved.
3) It also resulted in solutions that were overly conservative, in that any potential for common message routes was considered contention.

Improvement
1) Separate temporally disjoint messages when analyzing link contention scenarios, thus increasing communication predictability.

Example: two messages 38 and 42 sent at the same time

Effect: contention arises on link 45, which causes delay, then latency, then missed deadlines, then unbounded time, then unpredictability, and finally non-RT behavior.
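The situation in this example can be sketched in code. The sketch below assumes a 2D mesh with static X-then-Y (dimension-order) routing; the routing order, the coordinates, and the names `xy_route` and `shared_links` are illustrative assumptions and do not come from the slides. It lists the directed links each message traverses and reports the links two messages have in common, which is where contention can arise if both are sent at the same time.

```python
from typing import List, Tuple

Core = Tuple[int, int]          # (x, y) position of a core in the mesh
Link = Tuple[Core, Core]        # directed link between two adjacent cores


def xy_route(src: Core, dst: Core) -> List[Link]:
    """Return the directed links of a static X-then-Y route (assumed routing)."""
    links: List[Link] = []
    x, y = src
    # Travel along the X dimension first.
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    # Then travel along the Y dimension.
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def shared_links(a: Tuple[Core, Core], b: Tuple[Core, Core]) -> List[Link]:
    """Links used by both messages; each is a potential point of contention."""
    route_b = set(xy_route(*b))
    return [link for link in xy_route(*a) if link in route_b]


# Two hypothetical messages (source, destination) whose routes overlap.
msg_a = ((0, 0), (2, 0))
msg_b = ((1, 0), (2, 1))
print(shared_links(msg_a, msg_b))
```

Links are modeled as directed, so two messages crossing the same pair of cores in opposite directions do not count as sharing a link. That matches a full-duplex interconnect, which is itself an assumption of this sketch.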


2 Goal
Increase the predictability of RT tasks on NoC architectures
Models & solutions to lower or minimize contention during communications


3 What (Contributions)
Exhaustive Solver Model: exhaustively maps RT tasks onto cores to minimize contention and improve predictability
SBTF: maps communication traces into time frames to ensure separate analysis of temporally disjoint communication
Heuristic Model (HSolver): rapid discovery of low-contention solutions


4 How – SBTF (Software-Based Temporal Framing)

Temporal Framing 9
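One way to picture temporal framing is the minimal sketch below. It assumes each message has a known send time and a known set of links, and that frames are fixed-length windows. The `Message` fields, the `frame_len` parameter, and the function `contending_pairs` are hypothetical names introduced for illustration. Only messages that fall into the same frame and share a link are counted as contending; temporally disjoint messages are analyzed separately.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Message:
    name: str
    send_time: int              # cycle at which the message is sent (assumed known)
    links: FrozenSet[str]       # identifiers of the links on its route


def contending_pairs(msgs: List[Message], frame_len: int) -> List[Tuple[str, str]]:
    """Pairs of messages in the same time frame that share at least one link."""
    frames: Dict[int, List[Message]] = defaultdict(list)
    for m in msgs:
        frames[m.send_time // frame_len].append(m)   # frame index of the message
    pairs = []
    for frame_msgs in frames.values():
        for a, b in combinations(frame_msgs, 2):
            if a.links & b.links:                    # common link inside one frame
                pairs.append((a.name, b.name))
    return pairs


msgs = [
    Message("m1", send_time=5, links=frozenset({"L1", "L2"})),
    Message("m2", send_time=8, links=frozenset({"L2", "L3"})),
    Message("m3", send_time=120, links=frozenset({"L2"})),
]
print(contending_pairs(msgs, frame_len=100))
```

With a frame length of 100 cycles, `m3` lands in a later frame than `m1` and `m2`, so it is not paired with them even though all three use link `L2`. The sketch ignores messages that span a frame boundary, which a real framing scheme would have to handle.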


4 How – Exhaustive Solver Model
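As a rough illustration of exhaustive mapping (a sketch under stated assumptions, not the paper's solver), the code below tries every assignment of tasks to distinct cores on a small mesh, routes each message with an assumed static X-then-Y route, and keeps the mapping with the lowest cost. The cost function, the mesh size, and all names are hypothetical, and timing is ignored entirely.

```python
from collections import Counter
from itertools import permutations, product
from typing import Dict, List, Tuple

Core = Tuple[int, int]


def route(src: Core, dst: Core) -> List[Tuple[Core, Core]]:
    """Directed links of an assumed static X-then-Y route."""
    links, (x, y) = [], src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def contention(mapping: Dict[int, Core], msgs: List[Tuple[int, int]]) -> int:
    """Hypothetical cost: extra messages on each shared link, summed."""
    use = Counter(l for s, d in msgs for l in route(mapping[s], mapping[d]))
    return sum(n - 1 for n in use.values() if n > 1)


def exhaustive_map(tasks, msgs, width, height):
    """Try every placement of tasks on distinct cores; return the cheapest."""
    cores = list(product(range(width), range(height)))
    best = None
    for placement in permutations(cores, len(tasks)):
        mapping = dict(zip(tasks, placement))
        cost = contention(mapping, msgs)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


# Three senders (tasks 1, 2, 3) and one receiver (task 0) on a 2x2 mesh.
cost, mapping = exhaustive_map([0, 1, 2, 3], [(1, 0), (2, 0), (3, 0)], 2, 2)
print(cost, mapping)
```

The number of placements grows factorially with the number of cores, which is consistent with the drawback noted earlier that exhaustive approaches do not scale beyond small mesh sizes.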


4 How-Exhaustive Solver Model (continue)

For example:


4 How – Heuristic Model (HSolver)


4 How – Heuristic Model (HSolver, cont.)
Example:

Maximum Cross Chat First (TMH)
Degree(8) = 4, Degree(6) = 4 ==> 8, 6 map empty cores (Group 1)
Degree(3) = 3, Degree(4) = 3 ==> 3, 4 map empty cores (Group 2)
Degree(7) = 2, Degree(1) = 2 ==> 7, 1 map empty cores (Group 3)
Degree(5) = 1, Degree(2) = 1 ==> 5, 2 map empty cores (Group 4)
Degree(0) = 0 ==> 0 map empty cores (Group 5)
Task scheduling sequence: 8, 6, (6, 8), 3, 4, (4, 3), 7, 1, (1, 7), 5, 2, (2, 5), 0
Final chosen sequence: 8, 6, 3, 4, 7, 1, 5, 2, 0
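The ordering step of this example can be reproduced with a short sketch. The degree values are taken from the slide; the function name `order_by_degree` and the tie-breaking rule (tasks with equal degree keep the order in which they are listed) are assumptions made for illustration.

```python
from itertools import groupby
from typing import Dict, List


def order_by_degree(degree: Dict[int, int]) -> List[List[int]]:
    """Group tasks by communication degree, highest degree first."""
    # sorted() is stable, so tasks with equal degree keep their input order.
    ranked = sorted(degree, key=lambda t: degree[t], reverse=True)
    return [list(g) for _, g in groupby(ranked, key=lambda t: degree[t])]


# Degrees from the slide, listed in the order the slide names the tasks.
degree = {8: 4, 6: 4, 3: 3, 4: 3, 7: 2, 1: 2, 5: 1, 2: 1, 0: 0}
groups = order_by_degree(degree)
sequence = [task for group in groups for task in group]
print(groups)
print(sequence)
```

Flattening the groups gives the same sequence as the final one on the slide: 8, 6, 3, 4, 7, 1, 5, 2, 0.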

Maximum Cross Chat First (CMH)

Task Core


5 Experimental Result (Ex 1)
The 1st experiment compares the minimum solutions for each of the solvers as the complexity of the systems increases.

This experiment evaluates the minimum aggregate cost across 100 randomly generated task sets in naive, heuristic and exhaustive model mappings as the NoC size increases along with a linear increase in the number of messages.


5 Experimental Result (Ex 2)
The 2nd experiment evaluates the HSolver approach to determine the rate at which each heuristic was used to generate the low-cost solution.

The left figure shows the core selection strategies and the percentage of use of each during heuristic solving; the effectiveness of the core strategies varies significantly. Overall, minimizing the distances between frequently communicating cores is the most beneficial heuristic.

The right figure correlates well with these results: two selection strategies account for 98% of the low-cost solutions. The most effective solution is generally obtained by selecting tasks by Maximum Cross-Chat relative to the currently mapped tasks.

Left figure: Percent Use of Core Selection Strategies. Right figure: Percent Use of Task Selection Strategies.
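To make the distance heuristic concrete, here is a minimal sketch. It assumes a 2D mesh, Manhattan distance as a proxy for route length, and a table of message counts between tasks. The names `pick_core`, `traffic`, and `placed` are hypothetical; the slides do not give the exact selection rule.

```python
from typing import Dict, List, Tuple

Core = Tuple[int, int]


def manhattan(a: Core, b: Core) -> int:
    """Hop distance between two cores on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pick_core(task: int,
              free: List[Core],
              placed: Dict[int, Core],
              traffic: Dict[Tuple[int, int], int]) -> Core:
    """Choose the free core with the least traffic-weighted distance
    to the cores of already placed tasks that `task` communicates with."""
    def cost(core: Core) -> int:
        total = 0
        for other, other_core in placed.items():
            # Count messages in both directions between the two tasks.
            msgs = traffic.get((task, other), 0) + traffic.get((other, task), 0)
            total += msgs * manhattan(core, other_core)
        return total
    return min(free, key=cost)


placed = {8: (0, 0), 6: (3, 3)}        # hypothetical existing placement
traffic = {(3, 8): 10, (3, 6): 1}      # task 3 talks mostly to task 8
free = [(1, 0), (2, 3), (3, 2)]
print(pick_core(3, free, placed, traffic))
```

In this made-up setup, task 3 is placed next to task 8 because most of its messages go there. Swapping the two traffic counts would pull it toward task 6 instead.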


5 Experimental Result (Ex 3)
The 3rd experiment assesses the impact of link contention on communication jitter.

This figure shows that any single contended link can have a significant impact on the standard deviation of transfer latencies.

The X-axis represents the 10 randomly generated task sets, each of which contains 200 messages within its hyper-period; the Y-axis represents the standard deviation in clock cycles for the different task sets under the three mapping approaches.
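The jitter metric described here, the standard deviation of transfer latencies in clock cycles, can be computed as in the sketch below. The latency values are made up for illustration and are not measurements from the experiment; using the population standard deviation is also an assumption.

```python
from statistics import pstdev
from typing import Sequence


def jitter(latencies: Sequence[int]) -> float:
    """Population standard deviation of transfer latencies (clock cycles)."""
    return pstdev(latencies)


# Illustrative values only: an uncontended run versus one delayed transfer.
uncontended = [40, 40, 40, 40]
contended = [40, 40, 40, 80]
print(jitter(uncontended), jitter(contended))
```

Even one delayed transfer out of four moves the standard deviation well away from zero, which is the kind of effect the figure attributes to a single contended link.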

The table shows the timing results for each configuration evaluated in this experiment. All results determined by the heuristic approach converged within a second. Using the exhaustive solver, convergence can take up to 70 minutes for solutions with contention.


5 Experimental Result (Ex 4)
The 4th experiment illustrates the impact of unavoidable contention on real-time predictability.

This experiment shows the worst case experienced over multiple runs and emphasises the significant impact that contention can have on bounding WCET.

The figures depict the cost of sends and receives for one-to-one and two-to-one pairings of senders/receivers.


6 Focus-on & Improvement
NoC architecture with static routing without alternate path routing
Address homogeneous architecture & resource mapping to reduce overhead
Hard RT system and consider communication first rather
Predictability for RT systems instead of power & utilize currently available architectures instead of resorting to simulation
Reduction of contention to increase predictability
Implement on top of an architecture that does not provide contention avoidance at the hardware level
Software model allows for variable frame sizing to avoid impeding performance in systems with little contention

Improvement:
1) the exhaustive solver determines the optimal mapping for solvable NoCs;
2) HSolver generates fast, low-contention solutions for heavily contended NoCs;
3) HSolver can reduce aggregate contention by up to 70% while reducing jitter by up to 40%.


7 Significance
1) the first work to consider IPC for WC time frames to simplify analysis and to measure the impact on actual hardware for NoC-based real-time multi-core systems.
2) the first work to address predictability of NoC communication via framing messages into temporal windows for real-time tasks.


Question
Experiment 3
Experiment 4