“What and how can the Open64 community collaborate more closely?” - our experiences and ideas
Post on 15-Jan-2016
Open64 Developers Forum 2010
Embedded Software Consortium
Jenq Kuen Lee
Chairman, MOE Embedded Software Consortium, Taiwan
Professor,
Department of Computer Science,
National Tsing Hua University,
Hsinchu, Taiwan
jklee@cs.nthu.edu.tw
Outline
• Our experience with Open64 for compiler research
  – Programming Language and Compiler Research Lab, Tsing-Hua Univ., Taiwan
  – Major research funding from Taiwan MOEA
• What and how can the Open64 community collaborate more closely?
Workshop on Embedded Systems Education, 2009
• Compiler for VLIW DSP processors with distributed register files
  – Local register allocation
    • Register bank assignment and cluster assignment for distributed register architectures
    • “PALF: Compiler Supports for Irregular Register Files in Clustered VLIW DSP Processors” [Lin, CCPE’07]
  – Global register allocation
    • Global decisions on register bank assignment of multiple basic blocks
    • “LC-GRFA: Global Register File Assignment with Local Consciousness for VLIW DSP Processors with Non-uniform Register Files” [Lu, CCPE’09]
  – Improved register spilling
    • Spilling data to unoccupied register banks rather than to memory
    • “Expression Rematerialization for VLIW DSP Processors with Distributed Register Files” [Wu, CPC’09]
  – SIMD intrinsics
    • Means and essential optimizations for users to write high-performance code for VLIW DSP
    • “SIMD Intrinsic Supports for VLIW DSP Processors with Distributed Register Files” [Kuan, CPC’10]
[Figure: PACDSP compiler flow built on Open64, from Source Code to Assembly Code]
Phases shown: Front End; WHIRL-Level Optimizer (IPA, WOPT, LNO, ...); Lowering / Code Selection / Intrinsic; Hyperblock Formation / If-Conversion; EBO Pre-Process; Control Flow Optimization (shown twice); Loop Optimization; Unroll; SWP; EBO Process; LC-GRFA; Global Scheduling (Before RA); GRA; LRA; SA-Based LRFA; PALF-LRFA; EBO Post-Process; Global Scheduling (After RA); Local Instruction Scheduling; Global Code Motion; Low-Power Optimization; Code Emission; CIO
Legend: New Phases for PACDSP; Specially Tuned for PACDSP; Ported for Target Dependency; Original Phases of Open64
Compiler for VLIW DSP processors with distributed register files
• Probabilistic pointer analysis (PPA)
  – Quantitative
    • Aggressive optimizations can be applied
  – Fast
    • With the aid of SSA form, explicit def-sites can be found in linear time
  – χ in SSA form helps find potential def-sites that cannot be found by symbolic checking
  – Implemented in Opt_ssa.cxx in the WOPT phase
  – Combined with edge profiling to acquire more accurate points-to information
  – Optimizations
    • Points-to information can be used to guide memory locality optimization in the presence of pointers.
    • Speculative execution, transactional memory, code specialization, data layout assignments
    int *p, *q, v, u;
    p = &v;
    q = &u;
    while ( … )      // condition 1
        if ( … )     // condition 2
            p = q;
        else
            q = p;
    *p = …           // what does p point to?
PPA in SSA Form of Open64
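The question in the snippet above ("what does p point to?") can be given a quantitative answer once edge profiles supply branch probabilities. The following is a minimal sketch of that idea only, not the algorithm implemented in Open64 or described in the cited paper. The `PointsTo` type, the function names, and the assumption that the two conditions are statistically independent are all hypothetical and introduced for illustration.

```c
#include <assert.h>

/* Hypothetical representation: probability that a pointer targets v or u. */
typedef struct {
    double to_v;
    double to_u;
} PointsTo;

/* Merge two points-to distributions at a control-flow join.
   prob_a is the profiled probability of the edge carrying `a`;
   the other edge is taken with probability 1 - prob_a. */
PointsTo merge(PointsTo a, double prob_a, PointsTo b) {
    assert(prob_a >= 0.0 && prob_a <= 1.0);
    PointsTo r;
    r.to_v = prob_a * a.to_v + (1.0 - prob_a) * b.to_v;
    r.to_u = prob_a * a.to_u + (1.0 - prob_a) * b.to_u;
    return r;
}

/* Distribution of p at the store `*p = ...` in the slide's example.
   prob_enter: probability that the loop body runs at least once (condition 1).
   prob_then:  probability that condition 2 holds, i.e. `p = q` executes. */
PointsTo p_at_store(double prob_enter, double prob_then) {
    PointsTo p = {1.0, 0.0};   /* p = &v */
    PointsTo q = {0.0, 1.0};   /* q = &u */
    PointsTo p_skip = p;       /* value of p if the loop body never runs */

    /* First iteration: on the then-edge p takes q's targets,
       on the else-edge p keeps its own. After this iteration p and q
       alias on every path, so later iterations cannot change p. */
    PointsTo p_loop = merge(q, prob_then, p);

    /* Join of "loop ran" and "loop skipped" in front of the store. */
    return merge(p_loop, prob_enter, p_skip);
}
```

For instance, with a 0.9 probability of entering the loop and a 0.25 probability for condition 2, `p_at_store(0.9, 0.25)` gives about 0.775 for `v` and 0.225 for `u`. An optimizer could use such numbers to decide whether speculating on `p == &v` is worthwhile.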
[Figure: DSPs with internal memory accessing external memory through a software cache and a software cache API]
Multi-level memory systems:
* internal memory (small & fast)
* external memory (large & slow)
Interprocedural Probabilistic Pointer Analysis, Peng-Sheng Chen, Yuan-Shin Hwang, Dz-Ching Ju, Jenq Kuen Lee, IEEE Transactions on Parallel and Distributed Systems, Volume 15, Issue 10, pp. 893-907, Oct. 2004.
OpenCL Compiler Support Based on Open64 for MPUs+GPUs
• OpenCL is an emerging standard for heterogeneous multicore programming.
• We’ve incorporated the Open64 compiler into the ATI SDK
  – Syntax supports
    • Qualifiers
    • Vector data types
    • Built-in functions
  – Future directions
    • WHIRL/CGIR optimizations
    • Data locality and SIMD optimizations
Our Ongoing Work with OpenCL
[Figure: OpenCL kernel compilation in the ATI SDK, comparing two flows: ATI SDK → LLVM approach, and ATI SDK → Open64 approach]
Elements shown: kernel.cl; clc; prelink.bc; builtin-x86.bc; internal optimizer and linker; llvm-extract/llc; stub/metadata (stub code and metadata are reused); opt.s; OpenCL_kernel.s; lib.c; opencc; as; ld; libatiocl.so
Legend: clc: OpenCL-LLVM front-end; .bc: LLVM IR files
MOE ESW Consortium, Taiwan
[Figure: organization chart relating the ESW consortium to the MOE Advisory Office, the Advisory Committee, the SoC Consortium, other consortiums, partner universities, and the ES design contest]
Collaboration with TEIA
Collaboration with NSC OpenSource/Embedded Program
Develop 25 courses and lab modules on Embedded System Software
Year  Course  Year  Course
2003 Embedded Real-Time Operating System
2006
Embedded Middleware Design
2004
USB Driver Design & Implementation Embedded System Overview
Toolchain for embedded software
2007
Embedded Systems and software engineering (1)
I/O Architecture & Device Drivers Embedded Systems and software engineering (2)
Embedded Software for Networked SoC Systems Hands-on Lab Development based-on Local Platforms
2005
Embedded Compiler Design Embedded Multi-core System and Software
Embedded OS Implementation
2008
Embedded Multimedia Design
Embedded Microcontroller System Java Software for Embedded system
Embedded System Programming Lab modules of Embedded Hw/Sw Co-design and Analysis
2006 Interface Design
2009 Heterogeneous Multi-cores
Embedded System Implementation OpenSparc Lab modules
2010 Innovative Embedded System Curriculum on Android Platforms
2010 CPS and wireless sensor network
Embedded multi-core programming languages
Embedded course development flow of the ESW
Embedded software curricula
Curricula from ACM, IEEE-CS, other universities
Advisory board
Inputs from other task groups, professors, and industry executives
New course or course module
Seek project leaders
Team up course development team
ESW office
Course development
Course trial run
Regular course development meeting
Regular course promotion workshop
Deployment phase
Promote Open64 via Collaboration Curriculums
• Open64 courses and textbooks
• Hands-on labs
• Make it easy to overcome engineering challenges with Open64, so that students can focus on scientific innovations.
Encyclopedia Compilers
Open-64
How to devise Compiler to deliver optimal performance on Open-64
Lecture Notes
Hands-on Labs
Possible Collaboration on Joint Research Projects
• Potential collaboration items with OpenCL on Open-64
OpenCL and CUDA Front-end
OpenCL Optimizations at WHIRL & CGIR
ATI GPU
Optimizations for Embedded or Green
Google Android
IBM & UIUC Blue Waters
Nvidia GPU
Embedded Multicore
Code Generation for New Targets
code size
low-power
Up-to-date C/C++ Front-end
Wish List: Serializable CGIR
As a research compiler
• the ability to save/restore the current state is really important
  – a valuable observation may disappear after other team members change a prior phase’s implementation, and then we have to find the same case again and again in other benchmarks/applications
• providing an interface at the entry point of the CG phase is also important
  – sometimes we want to use a different compiler’s optimizations just before the CG phase
    • in order to compare the optimization capabilities of different compilers’ front-ends & middle-ends
    • in order to take advantage of other implementations
  – for example, to use LLVM to optimize the code and feed the output directly to the CG phase, which then performs the existing optimizations & generates code
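To make the save/restore wish concrete, here is a minimal sketch of round-tripping a list of operations through a byte buffer. It does not reflect Open64's actual CGIR data structures: the `Op` record, its fields, the buffer layout, and the function names `ops_save` / `ops_restore` are all hypothetical and only illustrate the kind of interface being asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for one CGIR operation. */
typedef struct {
    uint32_t opcode;
    uint32_t result;       /* result register number */
    uint32_t operand[2];   /* operand register numbers */
} Op;

enum { OP_BYTES = 16 };    /* four 32-bit fields per operation */

/* Store a 32-bit value in little-endian order, independent of the host. */
static void put_u32(unsigned char *out, uint32_t v) {
    for (int i = 0; i < 4; i++) out[i] = (unsigned char)(v >> (8 * i));
}

static uint32_t get_u32(const unsigned char *in) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v |= (uint32_t)in[i] << (8 * i);
    return v;
}

/* Save n ops into buf: a 4-byte count followed by the ops.
   Returns the number of bytes written, or 0 if buf is too small. */
size_t ops_save(const Op *ops, size_t n, unsigned char *buf, size_t cap) {
    assert(ops != NULL || n == 0);
    size_t need = 4 + n * OP_BYTES;
    if (cap < need) return 0;
    put_u32(buf, (uint32_t)n);
    unsigned char *p = buf + 4;
    for (size_t i = 0; i < n; i++, p += OP_BYTES) {
        put_u32(p,      ops[i].opcode);
        put_u32(p + 4,  ops[i].result);
        put_u32(p + 8,  ops[i].operand[0]);
        put_u32(p + 12, ops[i].operand[1]);
    }
    return need;
}

/* Restore ops from buf. Returns the number of ops read; 0 means either
   an empty list or a truncated/oversized input. */
size_t ops_restore(const unsigned char *buf, size_t len, Op *ops, size_t max_ops) {
    if (len < 4) return 0;
    size_t n = get_u32(buf);
    if (n > max_ops || len < 4 + n * OP_BYTES) return 0;
    const unsigned char *p = buf + 4;
    for (size_t i = 0; i < n; i++, p += OP_BYTES) {
        ops[i].opcode     = get_u32(p);
        ops[i].result     = get_u32(p + 4);
        ops[i].operand[0] = get_u32(p + 8);
        ops[i].operand[1] = get_u32(p + 12);
    }
    return n;
}
```

A phase under study could call `ops_save` on its input once, and later runs could call `ops_restore` to replay exactly that input even after earlier phases have changed. The same entry point would let another compiler's output be fed into the CG phase, provided it can emit the agreed format.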
Wish List: Replace Build System by CMake
• More powerful dependency analysis
  – it enables parallel make easily
    • on an Intel Core 2 Extreme QX9650 3.4GHz (O.C.) machine, building a full PACC32 compiler (based on Open64 4.0) with gmake -j5 takes no more than 5 minutes
  – it’s convenient to release products in binary form
    • rpath can be easily set up with simple CMake commands
    • any required runtime libraries can be included in binary packages automatically
  – system inspection & the build process are faster than with autotools (autoconf/automake/libtool), which is also not used in the Open64 project so far
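As an illustration of the rpath and packaging points, a CMake fragment along the following lines would cover them. It is a sketch under assumptions: the project name, the target names `be` and `opencc_driver`, and the source paths are hypothetical and do not describe the real Open64 or PACC32 tree. The CMake commands and variables themselves are standard ones.

```cmake
cmake_minimum_required(VERSION 3.10)
project(open64_sketch C CXX)

# Must be set before the targets are created: installed binaries then
# look for their shared libraries relative to their own location.
set(CMAKE_INSTALL_RPATH "$ORIGIN/../lib")

# Hypothetical targets and source files, for illustration only.
add_library(be SHARED be/be_main.cxx)
add_executable(opencc_driver driver/main.c)
target_link_libraries(opencc_driver PRIVATE be)

install(TARGETS opencc_driver be
        RUNTIME DESTINATION bin
        LIBRARY DESTINATION lib)

# Adds the compiler-provided runtime libraries to the install set
# where the platform requires it.
include(InstallRequiredSystemLibraries)

# Produce a binary tarball with `cpack` after building.
set(CPACK_GENERATOR "TGZ")
include(CPack)
```

With the Makefile generator, the generated build can be run in parallel in the same way as on the slide, for example with `gmake -j5`.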