Compiler Optimization · web.eecs.umich.edu/~mahlke/courses/583f18/lectures/dec5/... · 2018. 12....

Machine Learning in Compiler Optimization Authors: Zheng Wang and Michael O’Boyle Qi Zhang, Shengpu Tang, Yanqi Wang, Yiting Shen

Page 1: Compiler Optimizationweb.eecs.umich.edu/~mahlke/courses/583f18/lectures/Dec5/... · 2018. 12. 4. · Application What hardware problems can machine learning solve? Optimize sequential

Machine Learning in Compiler Optimization

Authors: Zheng Wang and Michael O’Boyle

Qi Zhang, Shengpu Tang, Yanqi Wang, Yiting Shen

Main Takeaways

● Machine learning (ML) based compilation is a trustworthy and exciting direction for compiler research.
● Data is the fuel of ML-based research.

Why do we need ML?

● Compilers have two jobs: translation and optimization
  ○ Compiler writers develop heuristics in the hope of improving performance
  ○ ML can serve as a predictor of the optima
● Machine-learning compilation
  ○ Automation!
  ○ Evidence-based science

OpenCL Kernel code snippet

Example: Best Thread Coarsening Factor

Thread Coarsening Pros & Cons

Pros:
- Increase instruction-level parallelism
- Reduce the number of memory-access operations
- Eliminate redundant computations

Cons:
- Reduce the total amount of parallelism
- Increase register pressure
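
As a rough illustration of the trade-off above, the following Python sketch simulates a kernel in which each work-item handles `factor` elements instead of one. The vector-add kernel and the function names are assumptions made for this example; they do not come from the slides. The sketch only shows the drop in thread count, not the effect on instruction-level parallelism or register pressure.

```python
def vector_add(a, b, n_items):
    """Baseline: one simulated work-item per output element."""
    out = [0] * n_items
    for gid in range(n_items):          # each iteration stands for one thread
        out[gid] = a[gid] + b[gid]
    return out, n_items                 # result and number of threads launched


def vector_add_coarsened(a, b, n_items, factor):
    """Coarsened: each simulated work-item computes `factor` elements."""
    out = [0] * n_items
    n_threads = (n_items + factor - 1) // factor   # ceiling division
    for tid in range(n_threads):
        for k in range(factor):         # work merged into a single thread
            gid = tid * factor + k
            if gid < n_items:           # guard the tail when n_items % factor != 0
                out[gid] = a[gid] + b[gid]
    return out, n_threads
```

Both versions compute the same result; the coarsened one launches fewer threads, which is the "reduced parallelism" cost listed under Cons.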

Architecture Overview

Stage 1: Feature Engineering

Static Code Features

Extracted from the intermediate representations
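
A minimal sketch of what static feature extraction could look like, assuming a simplified, LLVM-like textual IR. The IR snippet, the opcode categories, and the feature names are all hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter

# Hypothetical, simplified LLVM-like IR; a real IR is far richer.
TOY_IR = """
%1 = load i32, i32* %a
%2 = load i32, i32* %b
%3 = add i32 %1, %2
%4 = mul i32 %3, %3
store i32 %4, i32* %c
br label %exit
"""

MEMORY_OPS = {"load", "store"}
ARITH_OPS = {"add", "sub", "mul", "sdiv"}
BRANCH_OPS = {"br"}


def opcode_of(line):
    """Return the opcode of one IR line (the token after '=' if present)."""
    tokens = line.split()
    if "=" in tokens:
        return tokens[tokens.index("=") + 1]
    return tokens[0]


def static_features(ir_text):
    """Count instructions per category to form a small feature vector."""
    counts = Counter(opcode_of(l) for l in ir_text.splitlines() if l.strip())
    return {
        "n_instructions": sum(counts.values()),
        "n_memory": sum(counts[o] for o in MEMORY_OPS),
        "n_arith": sum(counts[o] for o in ARITH_OPS),
        "n_branch": sum(counts[o] for o in BRANCH_OPS),
    }
```

Because the counts come from the program text alone, no execution of the program is needed to compute them.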

Dynamic Features

Extracted from multiple layers of the runtime environment.

Reaction-Based Features

The program's response (for example, its speedup) when compiled with a set of carefully selected compiler options.

Feature Selection and Dimension Reduction

Problem: Too many features, so many more training examples are needed

Solution: Feature space dimension reduction
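
One common way to reduce the feature space is principal component analysis (PCA). The slides do not say which method is used, so PCA here is an assumption, and the data below is synthetic. The sketch projects 4-dimensional feature vectors onto their top principal components using NumPy's SVD.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    reduced = centered @ vt[:n_components].T
    return reduced, explained[:n_components]


# Synthetic data: 6 programs, 4 features, but only 2 independent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 2))
mixing = np.array([[1.0, 0.5, 2.0, 0.0],
                   [0.0, 1.0, 0.5, 3.0]])
features = latent @ mixing

reduced, explained = pca_reduce(features, n_components=2)
```

Since the synthetic features are built from two latent directions, two components retain essentially all of the variance while halving the number of features.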

Example: Best Thread Coarsening Factor

Stage 2: Learning a Model

Models

Supervised vs. Unsupervised

Possible supervised setup

Supervised Learning: Classification

Goal: Separate different types of programs by a decision boundary

Given a new, unseen program, the model knows what to do
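
To make the decision-boundary idea concrete, here is a minimal nearest-centroid classifier in NumPy. It is one simple classifier among many and is not claimed to be the one used in the slides; the two features and the class labels are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Assign a program to the class whose mean feature vector is closest."""

    def fit(self, x, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([x[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, x):
        # Distance from every sample to every centroid.
        dists = np.linalg.norm(x[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[np.argmin(dists, axis=1)]


# Hypothetical features: [memory-op ratio, branch ratio].
# Hypothetical labels: 0 = "option A is best", 1 = "option B is best".
x_train = np.array([[0.8, 0.1], [0.7, 0.2], [0.2, 0.6], [0.1, 0.7]])
y_train = np.array([0, 0, 1, 1])

model = NearestCentroid().fit(x_train, y_train)
unseen = np.array([[0.75, 0.15], [0.15, 0.65]])
predicted = model.predict(unseen)
```

With two classes, the implied decision boundary is the perpendicular bisector of the line joining the two centroids; an unseen program is labeled by the side it falls on.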

Supervised Learning: Regression

Goal: Learn a function to predict
- Power consumption
- Latency
- Execution time
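
A minimal regression sketch, assuming a linear relationship between two hypothetical features and execution time. The training data is generated from a made-up formula purely for illustration; real measurements would be noisy and the relationship need not be linear.

```python
import numpy as np

# Hypothetical features: [instruction count, memory accesses].
x_train = np.array([[100.0, 10.0], [200.0, 40.0], [300.0, 20.0], [400.0, 80.0]])
# Synthetic targets (seconds) from an assumed formula, for illustration only.
y_train = 0.01 * x_train[:, 0] + 0.05 * x_train[:, 1] + 0.5


def fit_linear(x, y):
    """Ordinary least squares with an intercept term."""
    design = np.column_stack([x, np.ones(len(x))])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted linear function on new feature vectors."""
    return np.column_stack([x, np.ones(len(x))]) @ coeffs


coeffs = fit_linear(x_train, y_train)
estimate = predict(coeffs, np.array([[250.0, 30.0]]))
```

Unlike classification, the output is a continuous value, so the same model can rank candidate optimizations by their predicted cost.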

Unsupervised Learning: Auto-encoder

Learn best feature representation

Embedding
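
The following is a deliberately small sketch of an auto-encoder: a linear encoder and decoder trained by plain gradient descent on synthetic data. Real auto-encoders typically use non-linear layers and a deep-learning framework; the hyperparameters and data here are assumptions for illustration.

```python
import numpy as np


def train_linear_autoencoder(x, code_size, steps=500, lr=0.05, seed=0):
    """Train encoder/decoder weight matrices by plain gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, code_size))
    w_dec = rng.normal(scale=0.1, size=(code_size, d))
    losses = []
    for _ in range(steps):
        code = x @ w_enc                  # embedding of each sample
        err = code @ w_dec - x            # reconstruction error
        losses.append(float(np.mean(err**2)))
        grad_out = 2.0 * err / err.size   # d(loss) / d(reconstruction)
        grad_dec = code.T @ grad_out
        grad_enc = x.T @ (grad_out @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses


# Synthetic data: 50 samples, 4 features generated from 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.5, 0.0, -1.0],
                   [0.0, 1.0, 1.0, 0.5]])
x = latent @ mixing

w_enc, w_dec, losses = train_linear_autoencoder(x, code_size=2)
embedding = x @ w_enc                     # learned 2-dimensional representation
```

No labels are used: the network is trained only to reconstruct its input, and the narrow middle layer yields the embedding.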

Stage 3: Deployment

Deployment

How to utilize the learned model:

1. Extract the features of the input program
2. Feed the extracted feature values to the learned model to make a prediction
3. Apply the predicted results
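
The three steps above can be sketched as a small pipeline. Everything here is hypothetical: the toy features, the hand-written rule standing in for a trained model, and the choice of GCC's `-funroll-loops` flag as the decision being made. The command is only constructed, not executed.

```python
def extract_features(source):
    """Step 1: derive a (toy) feature vector from the program text."""
    return {
        "n_loops": source.count("for (") + source.count("while ("),
        "n_lines": len(source.splitlines()),
    }


def predict_unroll(features):
    """Step 2: stand-in for a learned model (a hand-written rule, NOT trained)."""
    return features["n_loops"] > 0 and features["n_lines"] < 50


def apply_prediction(unroll, source_file):
    """Step 3: turn the prediction into a compiler command line."""
    flags = ["-O2"] + (["-funroll-loops"] if unroll else [])
    return ["gcc", *flags, source_file]


source = "int main() {\n  for (int i = 0; i < 4; i++) {}\n  return 0;\n}\n"
features = extract_features(source)
command = apply_prediction(predict_unroll(features), "prog.c")
```

Keeping the three stages as separate functions mirrors the slide: the model can be retrained or swapped without touching feature extraction or the code that applies the result.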

Application

What hardware problems can machine learning solve?

➢ Optimize sequential programs
- Target a fixed set of compiler options
- Represent the optimization problem as a multi-class classification problem, where each compiler option is a class.

➢ Optimize parallel programs
- Provide the potential for high-performance and energy-efficient computing

Optimize Sequential Programs - Example

Goal: Predict the optimal loop unroll factor

Approach:

● Decision tree based model
  ○ Multi-class classification
  ○ Label: loop unroll factor 𝑙 ∈ [0, 15], 16 classes in total
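
A minimal sketch of a decision-tree classifier for the unroll-factor example, written from scratch with Gini impurity as the split criterion. The split criterion, the two features (trip count and loop-body size), and the training labels are assumptions for illustration; the slides do not specify them.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Find the (feature, threshold) pair with the lowest weighted impurity."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels):
    """Recursively build a tree; leaves hold the majority unroll factor."""
    split = best_split(rows, labels)
    if len(set(labels)) == 1 or split is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical loops: [trip count, body size]; label = best unroll factor.
rows = [[4, 2], [8, 2], [64, 3], [128, 3], [64, 40], [128, 50]]
labels = [4, 4, 8, 8, 1, 1]
tree = build_tree(rows, labels)
```

Each of the 16 possible unroll factors is simply a class label, so the same code works unchanged once more labeled loops are added.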

Optimize Parallel Programs - Example

Goal: Predict the scheduling policy

Approach:

● Regression based neural net
● SVM classifier

Challenges & Limitations

● Training cost
● Garbage in, garbage out
● Unable to invent new program transformations
● Unable to prove the validity
