Post on 22-Dec-2015
A New Approach for Task Level Computational Resource Bi-Partitioning
Gang Wang, Wenrui Gong, Ryan Kastner Express Lab, Dept. of ECE,
University of California, Santa Barbara
Overview
  Resource Partitioning Problem
  Ant System (AS) Heuristic
  AS for Task Level Resource Partitioning
  Experiment Results
  Future Work
Resource Partitioning Problem (1)
  Heterogeneous architecture is getting more and more popular
  Partitioning problem is a fundamental challenge
    Automatically assign application onto different computation resources
    Optimizing system performance under constraints
  Two-resource case: hardware/software co-design
Resource Partitioning Problem (2)
  NP-hard
  Different heuristic methods have been developed
    Simulated annealing
    Genetic algorithms
    Tabu search
    Expert system
    Kernighan/Lin
Ant System Heuristic (1)
  First introduced for optimization problems by [Dorigo et al. 1996]
  Inspired by ethological studies of the behavior of ants [Goss et al. 1989]
  A meta-heuristic
  A multi-agent cooperative searching method
  A new way of combining global/local heuristics
Key Observations
  Autocatalytic effect
  Indirect communication (stigmergy)
    Ants deposit pheromone on the ground in amounts that differ with the quality of the paths
    Pheromone trails encode a long-term global memory about the search process
  When the ants reach a decision, they are biased by the amount of pheromone (possibly probabilistically)
AS Algorithm for HW/SW Co-Design
  Problem: for a given application, find the optimal resource partition under certain system constraints
    Task-level abstraction
    A task can map to the GPP or the configurable logic
    Prior knowledge about the computational resources
Modeling the Task/Resource Partitioning Problem
  Application is modeled as a task graph (DAG)
  Sequential scheduling (not pipelined)
  [Figure: example task graph with task nodes $t_1$ to $t_8$ and virtual nodes $t_0$ and $t_n$]
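One way to picture this model is the minimal Python sketch below. The edge set is hypothetical: the figure's edges do not survive in this transcript, so the predecessor map is made up for illustration. It stores the task graph as a map from each node to its immediate predecessors and derives a topological order with the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: node -> set of immediate predecessors.
# t0 and tn are the virtual start and end nodes. The edges are invented
# for illustration and are NOT taken from the slide's figure.
PREDECESSORS = {
    "t0": set(),
    "t1": {"t0"},
    "t2": {"t1"},
    "t3": {"t1"},
    "t4": {"t2"},
    "t5": {"t2", "t3"},
    "t6": {"t3"},
    "t7": {"t4", "t5"},
    "t8": {"t6", "t7"},
    "tn": {"t8"},
}


def topological_order(predecessors):
    """Return the nodes so that every node appears after its predecessors."""
    # TopologicalSorter raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(predecessors).static_order())


order = topological_order(PREDECESSORS)
print(order)
```

Storing predecessors rather than successors is a design choice of this sketch; it matches the input format `TopologicalSorter` expects and makes it easy to look up the inbound edges of a node.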
Partitioning as Graph Bi-coloring
  Tasks 1, 2, 7 and 8 are assigned to the GPP
  Tasks 3, 4 and 6 are assigned to the configurable logic
  The inbound edges are colored accordingly
  We don't care about the coloring of the virtual nodes $t_0$ and $t_n$
  We don't care about the coloring of edge $e_{8n}$
  [Figure: the task graph ($t_0$, $t_1$ to $t_8$, $t_n$) with legend: GPP, color $c_1$; configurable logic, color $c_2$]
Partitioning as Graph Bi-coloring
  Each computing resource is assigned a color $c_k$
  Each edge $e_{ij}$ is associated with a set of global heuristics (pheromone trails) $\tau_{ij}(k)$ indicating the favorableness for $t_j$ to be colored with $c_k$
  A coherent coloring is defined as:
    Each task node in the DAG is colored
    All the inbound edges of a task node have the same coloring as that of the corresponding task node
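The two coherence conditions can be checked mechanically. The sketch below is one possible check, assuming a hypothetical representation in which `predecessors` maps each node to its immediate predecessors, `node_color` maps task nodes to a color label, and `edge_color` maps `(source, target)` pairs to a color label. None of these names come from the slides.

```python
def is_coherent(predecessors, node_color, edge_color, virtual=("t0", "tn")):
    """Return True if every task node is colored and all of its inbound
    edges carry the same color as the node itself."""
    for node, preds in predecessors.items():
        if node in virtual:
            continue  # virtual nodes (and their inbound edges) are "don't care"
        if node not in node_color:
            return False  # condition 1: every task node must be colored
        for pred in preds:
            # condition 2: each inbound edge must match the node's color
            if edge_color.get((pred, node)) != node_color[node]:
                return False
    return True


# Tiny hypothetical chain t0 -> t1 -> t2 -> tn.
predecessors = {"t0": set(), "t1": {"t0"}, "t2": {"t1"}, "tn": {"t2"}}
node_color = {"t1": "c1", "t2": "c2"}
good_edges = {("t0", "t1"): "c1", ("t1", "t2"): "c2"}
bad_edges = {("t0", "t1"): "c1", ("t1", "t2"): "c1"}

print(is_coherent(predecessors, node_color, good_edges))  # True
print(is_coherent(predecessors, node_color, bad_edges))   # False
```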
AS algorithm for resource partitioning (1)
  1. Initially, assign each of the edges in the task graph a fixed pheromone $\tau_0$ for both colors $c_1$ and $c_2$, where $c_1$ corresponds to the GPP and $c_2$ to the configurable logic;
  2. Put $m$ ants on $t_0$;
  3. Each ant traverses the task graph to create a feasible bi-coloring solution $s_i$ for the task graph, where $i = 1, \ldots, m$;
  4. Evaluate all $m$ solutions. The quality of a solution $s$ is measured by the overall execution time $\mathrm{time}(s)$. Among all solutions, find the best solution $s_{\mathrm{best}}$, which provides the minimum execution time and satisfies the configurable logic area constraint;
AS algorithm for resource partitioning (2)
  5. Update the pheromone for each color on the edges as follows:
     $$\tau_{ij}(k) \leftarrow (1 - \rho)\,\tau_{ij}(k) + \Delta\tau_{ij}(k) \tag{1}$$
     where $0 < \rho < 1$ is the evaporation ratio (escape from local minima), $k = 1$ or $2$, and
     $$\Delta\tau_{ij}(k) = \begin{cases} Q / \mathrm{time}(s_{\mathrm{best}}) & \text{if } e_{ij} \text{ is colored with } c_k \text{ in } s_{\mathrm{best}} \\ 0 & \text{otherwise} \end{cases}$$
  6. If the ending condition is reached, stop and report the best solution found. Otherwise go to step 2.
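Step 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the dictionary layout, the function name, and the default values of `rho` and `q` are hypothetical, since the slides do not give the evaporation ratio or the constant Q.

```python
def update_pheromone(tau, best_edge_color, best_time, rho=0.1, q=1.0):
    """Apply update (1) in place.

    tau:             dict mapping (edge, color) -> pheromone value
    best_edge_color: dict mapping edge -> color used by the best solution
    best_time:       execution time of the best solution, time(s_best)
    rho, q:          evaporation ratio and deposit constant (assumed values)
    """
    deposit = q / best_time
    for edge, color in tau:
        # Evaporation applies to every trail.
        tau[(edge, color)] *= (1.0 - rho)
        # Reinforcement applies only to trails used by the best solution.
        if best_edge_color.get(edge) == color:
            tau[(edge, color)] += deposit
    return tau


# One edge with two colors, both starting at the same pheromone level.
tau = {(("t0", "t1"), 1): 1.0, (("t0", "t1"), 2): 1.0}
update_pheromone(tau, {("t0", "t1"): 2}, best_time=4.0, rho=0.5, q=2.0)
print(tau)
```

In this example the trail for color 1 only evaporates, while the trail for color 2 evaporates and then receives the deposit Q / time(s_best), so the better-performing coloring becomes more attractive in the next iteration.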
Step 3: How to construct individual coloring
  Each ant traverses the graph in topologically sorted order
    Guarantees that each inbound edge to the current node has already been examined
  At each node, the ant will:
    Make guesses for the coloring of the successor nodes
    Make a decision on the coloring of the current node
Make guesses for the successor task nodes
  At task node $t_i$, the ant guesses the coloring for each of the successor nodes $t_j$:
    $\tau_{ij}(k)$: global heuristic on coloring $t_j$ with $c_k$
    $\eta_j(k)$: local heuristic on coloring $t_j$ with $c_k$
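  $$p_{ij}(k) = \frac{\tau_{ij}^{\alpha}(k)\,\eta_{j}^{\beta}(k)}{\sum_{l=1,2} \tau_{ij}^{\alpha}(l)\,\eta_{j}^{\beta}(l)} \tag{2}$$
  $$\eta_j(k) = \frac{1}{\mathrm{cost}(j,k)} = \frac{1}{w_t\,\mathrm{time}(j,k) + w_a\,\mathrm{area}(j,k)} \tag{3}$$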
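Equations (2) and (3) translate directly into code. The sketch below is a minimal illustration; the function names, the dictionary layout, and the default values of the weights and exponents are assumptions, since the slides do not state the values used.

```python
def local_heuristic(time_jk, area_jk, w_t=1.0, w_a=1.0):
    """Equation (3): inverse of the weighted time/area cost of task j on resource k."""
    return 1.0 / (w_t * time_jk + w_a * area_jk)


def guess_probabilities(tau_ij, eta_j, alpha=1.0, beta=1.0):
    """Equation (2): probability of guessing each color k for successor t_j.

    tau_ij: dict color -> pheromone on edge e_ij for that color
    eta_j:  dict color -> local heuristic for coloring t_j with that color
    """
    weights = {k: (tau_ij[k] ** alpha) * (eta_j[k] ** beta) for k in tau_ij}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical numbers: equal pheromone, task is slower on resource 1
# but costs area on resource 2.
tau_ij = {1: 1.0, 2: 1.0}
eta_j = {1: local_heuristic(4.0, 0.0), 2: local_heuristic(1.0, 1.0)}
probs = guess_probabilities(tau_ij, eta_j)
print(probs)
```

With equal pheromone on both colors the guess is driven entirely by the local heuristic; as pheromone accumulates over iterations, the global term increasingly dominates.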
Make decision on the coloring of the current node
  Upon entering a new task node $t_i$, the ant makes a decision on the coloring of $t_i$:
    probabilistically, based on the guesses made by all the immediate precedents of $t_i$
  Inbound edges are colored correspondingly once this decision is made
  $$p_i(k) = \frac{\text{count of guess } c_k \text{ for } t_i}{\text{count of immediate precedents of } t_i} \tag{4}$$
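Equation (4) amounts to a vote among the guesses left by the immediate precedents. A minimal Python sketch follows, assuming the guesses for a node are collected in a list with one entry per immediate precedent; the function names and the list representation are hypothetical.

```python
import random
from collections import Counter


def decision_probabilities(guesses, colors=(1, 2)):
    """Equation (4): fraction of immediate precedents that guessed each color."""
    if not guesses:
        raise ValueError("a task node needs at least one immediate precedent")
    counts = Counter(guesses)
    return {k: counts[k] / len(guesses) for k in colors}


def decide_color(guesses, rng=random):
    """Draw the color of the current node according to equation (4)."""
    probs = decision_probabilities(guesses)
    colors = list(probs)
    return rng.choices(colors, weights=[probs[k] for k in colors])[0]


# Three precedents: one guessed color 1, two guessed color 2.
print(decision_probabilities([1, 2, 2]))
# When all precedents agree, the decision is deterministic.
print(decide_color([2, 2]))
```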
  [Figure: task graph with nodes $t_0$, $t_1$ to $t_8$ and $t_n$]
Find the best and update the pheromone trails based on the solution's quality
  [Figure: task graph with nodes $t_0$, $t_1$ to $t_8$ and $t_n$]
  Next iteration
Extensibility
  Easy to extend to multi-way partitioning
  Different performance/constraint pair
  Different task level cost model
Experiment System (1)
  Target system contains:
    One GPP (PowerPC 405 RISC)
    One configurable logic (Xilinx Virtex II with 1232 CLBs)
  Sequential scheduling
    Precedence level has to be respected
    Tasks without precedence constraints can run concurrently if the resource partitioning allows
Experiment System (2)
  Testing benchmark:
    DAGs of different sizes are generated randomly with an average branching factor of 5
    Real functions (in C/C++) extracted from the MediaBench suite are mapped onto the task nodes
    Tasks are analyzed using the SUIF and Machine SUIF tools to obtain a detailed CDFG-level description
    Simplified communication interface between tasks
  Goal: find the optimal resource partition that achieves the best worst-case execution time under the FPGA area constraint
Evaluating AS algorithm
  Compare the AS results with:
    Brute force search
      Offers a definitive measurement of the quality
    Theoretical performance of random sampling
      Helps to filter out EASY test cases
    Simulated annealing
      Popularly used
      Allows much bigger problem sizes
Experiment Settings
  Each DAG has 25 task nodes, over 33 million possible assignments!
  50 testing instances are generated originally
  After filtering out the “easy” cases using the brute force search, 25 difficult testing cases are left
  The number of ants is set to 5, which equals the average branching factor of the task graph
  The AS algorithm is forced to stop after 100 iterations in each run
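The "over 33 million" figure follows from each of the 25 task nodes having two possible resources. A short check in Python; the function name is illustrative only.

```python
def assignment_count(num_tasks, num_resources=2):
    """Number of distinct task-to-resource assignments (each task picks one resource)."""
    return num_resources ** num_tasks


print(assignment_count(25))  # 33554432
```

The count grows exponentially with the number of tasks, which is why brute force search is only usable as a reference on small instances like these.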
Result Quality Assessment (I)
  91.7% of the results are within the top 3%
  77% of the results of AS are within the top 2%
  63.5% of the results are within the top 0.1%
Result Quality Assessment (II)
  The absolute performance of the majority of the results found by AS is within 10% of the optimal
Result Quality Assessment (III)
  The ability to find one of the optimal partitions
    460 times for 2,500 instances (18.4%)
    The random sampling approach with the same computation time has a chance of only 8.5E-7
  For a significant portion (>20%) of the tested examples, AS discovers the optimal partition with probability >1/2
Result Quality Assessment (IV): Multi-way & SA
  Extended to the 3-way partitioning problem
  33 difficult testing cases
  $3^{25}$ possible partitions
  SA-50 has a run time comparable to that of AS
  SA-500 and SA-1000 run at 10 and 20 times that run time
Contributions
  For the first time, introduced the AS heuristic for the HW/SW co-design problem
  Constructed a novel AS algorithm that achieved robust results that are qualitatively close to the optimal with minor computational cost for the testing benchmark
  Provided a definitive quality assessment by comparing the proposed algorithm with the theoretical random sampling results
  Experiments show the proposed algorithm surpasses the popularly used SA heuristic
Future work
  Extend to the multi-way resource partitioning problem
  More comprehensive comparison with other heuristic methods (such as GA, Tabu)
  Hybrid approach (e.g. AS followed by SA)
  Apply to more realistic and complex system models, e.g. a more realistic communication model
  Extend AS from static partitioning to the dynamic partitioning problem (truly reconfigurable)
Thanks for your attention. Questions?