CREATING A DECISION FRAMEWORK
FOR OpenCL USAGE
Graham Brown
CTO Corel Corporation
2 | Creating a Decision Framework for OpenCL Usage | June 15, 2011
AGENDA
OpenCL Overview
Corel’s View of Optimization
Sample of Corel’s Decision Framework
Additional Considerations
OpenCL OVERVIEW
QUICK POLL:
WHO IS WORKING WITH OR
INVESTIGATING OpenCL?
OR HAS A COLLEAGUE WHO
IS DOING SO?
OpenCL | KEY POINTS
Acronym for Open Computing Language
– Allows cross-platform parallel programming
– Open and royalty-free standard
– Improves application speed and responsiveness
– Can leverage all computing resources (CPU, GPU, APU)
Source: Khronos.org
OpenCL | GOALS
Near transparent use of computing resources
Use GPU resources for non-graphics processing without impacting power usage or graphics rendering speed
Data or task-based parallel processing
Familiar / compatible with existing programming models
Source: Khronos.org
OpenCL | History – from A to K
Created by Apple
Standardized in 2008, v1.1 released 2010
Wide industry participation
Maintained and evolved by Khronos
Source: Khronos.org
http://www.khronos.org
GPU PROCESSING | IHV Support
AMD
– GPU first supported as ATI Stream SDK
– AMD APP SDK 2.4 released April 2011, fully conformant with OpenCL 1.1, includes CPU support
Intel
– OpenCL support initially added for Sandy Bridge
– Released fully conformant OpenCL 1.1 SDK Beta May 2011
NVIDIA
– GPU first supported on CUDA platform
– OpenCL 1.0 support ships with production drivers
OpenCL | Code Sample
Think C99, Minus
– recursion
– function pointers
– variable-length arrays
But With
+ Math functions (matrix / vector)
+ extensions for
“work item” support
Vector, Image types
+ Image manipulation ops (AMD)
+ H.264 encode (Intel)
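OCL Kernel:
/* Global work size (workspace) = (img_width, img_height): one work-item per pixel. */
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

__kernel void RGBA2GRAY(__read_only image2d_t RGBAImg,
                        __write_only image2d_t GrayImg)
{
    /* Pixel coordinates come from the 2-D global work-item ID. */
    int2 coord;
    coord.x = (int)get_global_id(0);
    coord.y = (int)get_global_id(1);

    uint4 rgba = read_imageui(RGBAImg, sampler, coord);
    float4 frgba = convert_float4(rgba);

    /* Vector literals need an explicit type; without the (float4) cast the
       comma operator reduces the expression to its last value, 0.0f. */
    float4 coef = (float4)(0.299f, 0.587f, 0.114f, 0.0f);
    float res = dot(frgba, coef);
    uint res_int = (uint)round(res);

    /* The gray value is written to all four channels. */
    uint4 res_vector = (uint4)(res_int, res_int, res_int, res_int);
    write_imageui(GrayImg, coord, res_vector);
}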
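The per-pixel arithmetic in the kernel above can be mirrored on the CPU, which is useful as a reference when checking kernel output. What follows is a minimal Python sketch, not code from the presentation: the function names `rgba_to_gray` and `round_half_away` and the list-of-tuples image layout are assumptions made for illustration. It uses the same weights (0.299, 0.587, 0.114) and rounds ties away from zero, as OpenCL C `round()` does. Python computes in double precision while the kernel uses single precision, so values that land very close to a .5 boundary could differ by one.

```python
import math

# Luma weights taken from the kernel above; alpha has weight 0.
COEF = (0.299, 0.587, 0.114, 0.0)

def round_half_away(x):
    """Round to nearest, ties away from zero (the behaviour of OpenCL C round())."""
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

def rgba_to_gray(pixels):
    """Hypothetical CPU reference: pixels is a list of (r, g, b, a) integers."""
    out = []
    for rgba in pixels:
        res = sum(c * w for c, w in zip(rgba, COEF))  # dot(frgba, coef)
        g = round_half_away(res)
        out.append((g, g, g, g))  # the kernel writes gray to all four channels
    return out

print(rgba_to_gray([(255, 0, 0, 255), (0, 255, 0, 255), (0, 0, 255, 255)]))
# prints [(76, 76, 76, 76), (150, 150, 150, 150), (29, 29, 29, 29)]
```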
OpenCL | Drawbacks and Challenges
DRAWBACKS
– Not quite C or C++
– Hardware-specific “Tweaking” required
– (Potentially) requires 4 or more code streams to maximize performance
CHALLENGES
– Initial implementation costs
– Code complexity
– Support costs
– Install package complexity
SO WHY USE GPU / OpenCL?
VideoStudio X4 – Software vs OpenCL Encode
Test clip: The Treasure Hunters, 199 seconds
OpenCL Encode: 137.5 seconds (69% of realtime)
Software Encode: 322.4 seconds (162% of realtime)
Test system:
OS: Windows 7 64-bit Ultimate
CPU: AMD Phenom II X6 1055T 2.8GHz
VGA: AMD Radeon HD 6870
Memory: 4.00GB
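The "% of realtime" figures on this slide are simply encode time divided by clip length. A small Python sketch of that arithmetic follows; the function names are illustrative and only the three timings come from the slide. The speed-up ratio is derived here and is not a number quoted in the presentation.

```python
CLIP_SECONDS = 199.0  # length of the test clip, from the slide

def percent_of_realtime(encode_seconds, clip_seconds=CLIP_SECONDS):
    """Encode time as a percentage of the clip's playback time."""
    return 100.0 * encode_seconds / clip_seconds

def speedup(baseline_seconds, optimized_seconds):
    """How many times faster the optimized path is than the baseline."""
    return baseline_seconds / optimized_seconds

print(f"OpenCL encode:   {percent_of_realtime(137.5):.0f}% of realtime")   # 69%
print(f"Software encode: {percent_of_realtime(322.4):.0f}% of realtime")   # 162%
print(f"Speed-up:        {speedup(322.4, 137.5):.2f}x")                    # 2.34x
```

Anything under 100% means the encode finishes faster than the clip plays back.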
OpenCL | In the future
Transparent data and kernel switching between devices (CPU and GPU)
Operation of kernels on CPU or GPU without re-compilation
Corel’s Wishlist:
– Speedy transition of extensions to core OpenCL spec
– IHV alignment on rapidly moving OpenCL forward
– Continued focus on C99 “Likeness”
– Focus on tools by all IHVs
COREL’S VIEW OF
OPTIMIZATION
COREL | A brief history
Entrepreneur start-up
Products over a 26-year history:
– Laser printers, SCSI cards
– CorelDRAW grew out of an extension to a DTP product
– Desktop Video Conferencing
– Linux OS
– 100s of Windows apps, many from acquisitions
Private / Public / Private / Public / Private(!) ownership
= Adept at change management!
WITH CREDIT TO WINSTON CHURCHILL
CHURCHILL: “Study history, study history. In history lies all the secrets of statecraft.”
COREL: “Study history, study history. In history lies all the secrets of optimization!”
TOP 3 TAKEAWAYS:
TAKE POST-MORTEMS SERIOUSLY, AND IMPLEMENT PROCESS CHANGES
COMMIT TO KEY STAFF
“OPTIMIZATION SHOULD BE ALL ABOUT THE USER”
CASE STUDY | CorelDRAW 7
“IT’S ALL ABOUT THE USER”
Success: Where optimization was focused on our users:
– “Real-Time” editing in CorelDRAW
– Increased battery life in WinDVD
Failure: Where it was not:
– Re-architecture / optimization not focused on explicit user benefit
FOR OPTIMIZATION, IT’S
REALLY
ALL ABOUT THE USER
CASE STUDY | CorelDRAW 7
Test Image
500 objects, 2,000 points
Optimization Result:
– Display speed increased 10x
Engineer’s Test Drawing
CASE STUDY | CorelDRAW 7
Validation Image
9 x 568 objects, 9 x 3,888 points
Optimization Result:
– Display speed increased 10x
CASE STUDY | CorelDRAW 7
Litmus Test:
6,254 objects, 52,378 points
Before Optimization:
– Display time ~2 minutes
After Optimization:
– Display time ~90 sec
CASE STUDY | CorelDRAW 7
What Happened?
Good News: In the end, “The Huntress” achieved the same 10x speed-up, displaying in about 10 seconds
Cause of discrepancy:
– Train creator used nested hierarchy of objects
– Huntress creator used linear object list
KEY TAKEAWAY: Without knowing all user scenarios, you can’t predict bottlenecks
CASE STUDY | …AND CorelDRAW X5
Only 429 Objects, 25,061 Points
– but mesh fills = a new bottleneck!
SO:
“WHY OpenCL?”
TRENDS IN DIGITAL MEDIA – COREL’S VIEWPOINT
Trend | User Impact | Enhanced by OpenCL*
GPU & CPU advances | More processing power |
Mobile Computing | More data on the cloud |
Social Networking | More cloud-based fix & edit |
Touch-Based UIs | Increased demand for instant feedback |
More 3D | More data to process |
Streaming Video | Processing time extends wait |
Enhanced User experience | Rising user expectations |
*Corel’s current optimization plans – does not reflect general OpenCL applicability
WHAT MATTERS TO OUR DIGITAL MEDIA USERS? | Many Variables
Open / Save Speed
Rendering / Encoding Speed
Battery Life
Execution of operations:
– Real-time preview of effects
– Image correction operations
PaintShop Pro – Original Image
PaintShop Pro – 3 Clicks / 30 seconds later
- Straighten
- Smart Photo Fix
- Local Tone Map
GUIDING OPTIMIZATION DECISIONS
Corel’s approach: Employ a framework to guide optimization decisions
Case Study: VideoStudio Encoding Optimization
– Data Flow Block Diagram
– Stated Objectives / desired outcomes (from PM / UED)
– Decision Tree (backed by detailed spreadsheet)
Case Study: Data Flow Block Diagram – Encoding / Rendering in VideoStudio
Original File(s) → Decoding → Video Render → Encoding → Encoded File
Blocks at each stage:
– Decoding: CPU Decoding, or GPU Decoding (DXVA Accel)
– Video Render (effect / composite / scaling / de-interlace / color conversion): CPU Video Render, or GPU/D3D Video Render
– Encoding: CPU Encoding / Mixing, or GPU Encoding (H.264 / MPEG 2)
Legend (transfer types between blocks):
– System Memory Transfer
– System/Video Memory Transfer
– Video Memory Transfer
Case Study: Clearly Stated Objectives
Clarity around optimization objectives from the user’s perspective is a key input
Some examples:
– Decrease time-to-render on target platforms by 50%, and ensure all platforms are at least 10% faster
– Make the XYZ effect real-time for any platform that meets our minimum spec
– Etc.
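An objective such as the first example above can be written down as an automated check against measured render times. The Python sketch below is one possible reading, not Corel's actual tooling: it assumes "at least 10% faster" means a 10% reduction in render time, and the function names, platform names, and timings are all hypothetical.

```python
def time_reduction(before_s, after_s):
    """Fractional reduction in render time (0.5 means the time was halved)."""
    return (before_s - after_s) / before_s

def meets_objective(measurements, target_platforms,
                    target_reduction=0.50, floor_reduction=0.10):
    """measurements maps platform name -> (before_seconds, after_seconds).

    Target platforms must reach target_reduction; every other platform must
    reach floor_reduction. Returns (ok, failures), where failures lists the
    platforms that missed their threshold.
    """
    failures = []
    for platform, (before_s, after_s) in measurements.items():
        required = target_reduction if platform in target_platforms else floor_reduction
        if time_reduction(before_s, after_s) < required:
            failures.append(platform)
    return (not failures, failures)

# Hypothetical measurements, for illustration only.
measurements = {"target-gpu": (100.0, 45.0), "older-laptop": (100.0, 95.0)}
print(meets_objective(measurements, {"target-gpu"}))
# prints (False, ['older-laptop'])
```

Stating the thresholds as parameters keeps the objective visible next to the measurements, so a regression on a non-target platform is reported rather than overlooked.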
Case Study: Encode Path Decision Tree
Start: Request for Performance Improvement
Decision nodes (each with YES / NO branches):
– Design & code currently optimized in C / C++?
– Will optimized design / native code address needs?
– Can the data or operation be parallelized?
– Is the operation relatively easy to implement in “standard” OpenCL?
– Is the required functionality available in 1 or more external proprietary libraries?
– Performance-critical area, or risk / QA effort warrants using extension / library?
– Does performance difference warrant splitting code paths?
– Is the latest version of OpenCL supported on all required HW?
Outcomes:
– Proceed with design / native code optimization
– Focus on CPU-based optimization*
– Proceed with non-OpenCL implementation
– Proceed with OpenCL implementation
– At least one solution using OpenCL
– Provide 2 or more implementations, based upon hardware
– Implement solutions optimized by HW
– Implement on latest version of OpenCL
– Implement on latest common version of OpenCL
Case Study: Encode Path Decision Tree
Decision: Can the data or operation be parallelized? (YES / NO)
Outcome shown: Focus on CPU-based optimization*
Parallelism will usually be a key decision for OpenCL usage; however, exceptions always exist (for example, when battery usage is critical)
Case Study: Encode Path Decision Tree
Decision: Performance-critical area, or dev risk warrants using extension / library? (YES / NO)
Outcomes shown: Provide 2 or more implementations, based upon hardware; Proceed with OpenCL implementation
Maintaining a single code base will always be preferable, but performance will sometimes warrant forking the code
Case Study: Encode Path Decision Tree
Decision: Is the latest version of OpenCL supported on all required HW? (YES / NO)
Outcome shown: Implement on latest version of OpenCL
Developers will usually push to use the latest / greatest version of any technology, but that may not be the correct answer for our users
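The three decisions highlighted on the preceding slides can be sketched as a small function. This is an illustrative Python sketch only: the slides were flattened in extraction, so the wiring of YES / NO branches to outcomes below is an assumed reading, not the confirmed structure of Corel's tree, and the function name `encode_path_decision` is hypothetical. The full tree has further nodes that are not modelled here.

```python
def encode_path_decision(parallelizable,
                         warrants_extension_or_library,
                         latest_opencl_on_all_required_hw):
    """Return outcome labels for the three highlighted decisions.

    Branch wiring is assumed (see the note above), not taken from the slides.
    """
    # Assumed: a NO on parallelism leads to the CPU-focused outcome.
    if not parallelizable:
        return ["Focus on CPU-based optimization"]

    outcomes = []
    # Assumed: YES forks the code by hardware, NO keeps a single OpenCL path.
    if warrants_extension_or_library:
        outcomes.append("Provide 2 or more implementations, based upon hardware")
    else:
        outcomes.append("Proceed with OpenCL implementation")

    # Assumed: NO falls back to the latest version common to all required HW.
    if latest_opencl_on_all_required_hw:
        outcomes.append("Implement on latest version of OpenCL")
    else:
        outcomes.append("Implement on latest common version of OpenCL")
    return outcomes

print(encode_path_decision(True, False, False))
# prints ['Proceed with OpenCL implementation', 'Implement on latest common version of OpenCL']
```

Encoding a tree this way makes each branch explicit and reviewable, which matches the slide's point that the tree is backed by a detailed spreadsheet.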
Case Study: Reminder
This is one of the VideoStudio decision trees
– Helps to focus where efforts should be directed for one class of optimization
Other apps or workflows will have their own decision trees
Presentation materials will be shared online
CLOSING CTO MUSINGS
Post Mortems are good – in-process checklists, prepared in advance, are better!
Objectives Checklist | Check
Optimizing on features that matter to users?
– Battery performance, open/save, rendering, encoding, other
– Positive impact on largest / most important group of users?
Is OpenCL the right tool?
– Options: OpenCL kernel libraries (e.g. AML), optimize existing code, existing design
What’s changed?
– Availability of technology or support for it
Are we keeping a reference copy of code to validate results?
Are we comparison checking results?
Did we validate results with real user artifacts AND real users?
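The "reference copy" and "comparison checking" items lend themselves to a simple automated check: run the reference path and the optimized path on the same input and report where they disagree. Below is a minimal Python sketch under stated assumptions; the function name `compare_results` and the default tolerance of 1 (to absorb rounding differences between implementations) are illustrative choices, not anything specified in the presentation.

```python
def compare_results(reference, candidate, tolerance=1):
    """Return the indices where candidate differs from reference by more
    than tolerance. Both arguments are equal-length sequences of numbers."""
    if len(reference) != len(candidate):
        raise ValueError("result lengths differ")
    return [i for i, (r, c) in enumerate(zip(reference, candidate))
            if abs(r - c) > tolerance]

# Hypothetical outputs from a reference path and an optimized path.
print(compare_results([10, 20, 30], [10, 21, 35]))
# prints [2]
```

An empty list means the optimized output matches the reference within the chosen tolerance; the tolerance itself should be agreed on from the user's perspective, since it defines what counts as a visible difference.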
CLOSING CTO MUSINGS – CONT’D
Post Mortems are good – in-process checklists, prepared in advance, are better…
… and hardware vendors collaborating to progress OpenCL is great!
THANK YOU!
Presentation Back-up Materials
Corel Stop Motion Video: Treasure Hunters
http://vimeo.com/20610210
http://youtu.be/xFDs-_CJHm8
Corel Stop Motion Secrets: Meet the Filmmaker (interview with John Huang)
http://vimeo.com/20066276
http://youtu.be/6zwO8Pp--WQ
Corel Time-Lapse Video: Time in Motion
http://vimeo.com/20068138
http://youtu.be/hHvmiLpIsIY
Additional movies by two WordPerfect Project Leaders.
http://www.truedimensions.com/timesage/movies.htm