FastROCS: What does it mean to be “fast”?
OpenEye Scientific Software Brian Cole
March 26, 2013 © 2013 OpenEye Scientific Software
FastROCS and the “Chasm”
ROCS: Rapid Overlay of Chemical Structures
LeadHopper
And then you wait…
What is FastROCS?
[Figure: shape overlays per second, CPU versus GPU, on a logarithmic axis from 1 to 1,000,000; higher is better]
What is FastROCS?
[Figure: shape overlays per second, CPU versus GPU, on a linear axis from 0 to 600,000; higher is better]
But I want it now!
[Figure: log elapsed time in seconds (axis 1 to 100,000) versus log number of cores/GPUs (axis 1 to 100) for ROCS and FastROCS; lower is better]
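The relationship between throughput and waiting time can be sketched in a few lines. This is only an illustration: the database size and the per-device rates below are hypothetical round numbers, not measurements from these slides, and the function assumes one overlay per database conformer with ideal scaling across devices.

```python
def search_time_seconds(num_conformers, overlays_per_second, num_devices=1):
    """Estimate wall-clock time for one query against a conformer database.

    Assumes one shape overlay per database conformer and ideal scaling
    across devices, which real hardware will not fully achieve.
    """
    if num_conformers < 0:
        raise ValueError("num_conformers must be non-negative")
    if overlays_per_second <= 0 or num_devices < 1:
        raise ValueError("throughput and device count must be positive")
    return num_conformers / (overlays_per_second * num_devices)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
database_size = 10_000_000  # conformers in the database (assumed)
cpu_rate = 1_000            # overlays per second on one CPU core (assumed)
gpu_rate = 500_000          # overlays per second on one GPU (assumed)

cpu_time = search_time_seconds(database_size, cpu_rate)
gpu_time = search_time_seconds(database_size, gpu_rate)
print(f"CPU core: {cpu_time:,.0f} s, GPU: {gpu_time:,.0f} s")
```

Under these assumed rates the same query drops from hours to seconds, which is the gap the log-log plot is drawing attention to.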
Riding Moore’s Law
[Figure: shape overlays per second (axis 0 to 2,000,000) across the GPUs C1060, C2050, C2075, C2090, K10, and K20; higher is better]
ROCS user base
• Every Pharma R&D
• Many BioTechs
• Many Universities
• National Labs and Research Centers
• Other software companies
Licenses by Year
[Figure: licenses by year, 2009 to 2012, for ROCS and FastROCS; higher is better]
Licenses by Year (Linear Scale)
[Figure: licenses by year, 2009 to 2012, for ROCS and FastROCS on a linear scale; annotations: "15%" and "Pharmageddon"]
All ROCS users (linear scale)
[Figure: all ROCS users by year, 2009 to 2012, on a linear scale, with series Academics, ROCS, and FastROCS; annotation: "3%"]
Technology Adoption Lifecycle
[Figure: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; FastROCS is marked on the curve]
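The segment sizes on the lifecycle curve are the conventional rounded shares of a normal distribution cut at whole standard deviations. The sketch below recomputes them; the segment names (innovators through laggards) are the usual labels for this curve and do not appear on the slide, so treat them as an assumption about what the figure shows.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Segment boundaries in standard deviations from the mean adoption time.
boundaries = [-math.inf, -2.0, -1.0, 0.0, 1.0, math.inf]
# Conventional segment names (assumed, not taken from the slide).
names = ["innovators", "early adopters", "early majority",
         "late majority", "laggards"]

shares = []
for lo, hi in zip(boundaries, boundaries[1:]):
    lo_cdf = 0.0 if lo == -math.inf else normal_cdf(lo)
    hi_cdf = 1.0 if hi == math.inf else normal_cdf(hi)
    shares.append(hi_cdf - lo_cdf)

for name, share in zip(names, shares):
    print(f"{name:15s} {100 * share:5.1f}%")
```

The exact values differ slightly from the slide's figures (about 2.3% rather than 2.5% for the first segment), because the slide uses the customary rounded numbers.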
What’s in the “chasm”?
• “ROCS is already fast enough”
• “The results aren’t bitwise comparable”
• “There’s nothing else to run on the GPU”
• “GPUs are different”
GTC!
Some other time…
FastROCS Quick Start
• ctrl-alt-F1 (to switch to a non X-server terminal)
• login as root
• /sbin/init 3 (to turn off the X-server)
• ./NVIDIA-Linux-x86_64-285.05.09.run
• reboot
• ./cuda.sh to give /dev/nvidia* correct permissions
• tar -xzf fastrocs-1.3.1-RHEL5-x64-OpenCL-1.1-CUDA-4.1.tar.gz
• openeye/bin/ShapeDatabaseServer.py database.oeb.gz
• openeye/bin/ShapeDatabaseClient.py localhost:8080 query.sdf out.sdf
ROCS Quick Start
• tar -xzf ROCS-3.1.1-RHEL5-x64.tar.gz
• openeye/bin/rocs query.sdf database.oeb.gz
Still a barrier to entry to work around!
This is even worse!
fastrocs-1.3.1-RHEL5-x64-OpenCL-1.1-CUDA-4.1.tar.gz
NVidia OpenCL binaries are tightly locked to a particular driver version
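One way to guard against a driver mismatch is to compare the installed driver version with the one a binary was built against before launching anything. The following is a minimal sketch, not something FastROCS is known to do: the helper names are hypothetical, and the `installed_report` string is a made-up example of how a driver might report its version.

```python
import re


def parse_driver_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"\b(\d+(?:\.\d+)+)\b", text)
    if match is None:
        raise ValueError("no driver version found in: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def driver_matches(installed_text, required_text):
    """True when both strings carry exactly the same driver version."""
    return parse_driver_version(installed_text) == parse_driver_version(required_text)


# Installer name taken from the Quick Start slide.
installer = "NVIDIA-Linux-x86_64-285.05.09.run"
# Hypothetical version report; the exact format is an assumption.
installed_report = "NVIDIA UNIX x86_64 Kernel Module  285.05.09"

print(parse_driver_version(installer))
print(driver_matches(installed_report, installer))
```

An exact comparison is deliberate here: if binaries are tied to one driver version, a "newer is fine" check would be the wrong test.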
Worthwhile to upgrade
[Figure: conformers per second (axis 0 to 800,000) on a C2050 with the 260 driver versus the 295 driver; annotation: "11%"; higher is better]
Needed for new hardware
[Figure: conformers per second (axis 0 to 1,200,000) for a C2050 versus an M2090, both with the 295 driver; higher is better]
Scalability between drivers (4x C2050)
[Figure: speedup (single-GPU time / multi-GPU time) versus number of GPUs, 1 to 4, with series Ideal, 260 driver, and 295 driver; higher is better]
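The speedup plotted on these scaling charts is the single-GPU time divided by the multi-GPU time, and the "Ideal" line is a speedup equal to the GPU count. A small sketch of that arithmetic follows; the timings are hypothetical values for illustration, not the measurements behind the figure.

```python
def speedup(single_gpu_time, multi_gpu_time):
    """Speedup as defined on the chart: single-GPU time / multi-GPU time."""
    if single_gpu_time <= 0 or multi_gpu_time <= 0:
        raise ValueError("timings must be positive")
    return single_gpu_time / multi_gpu_time


def parallel_efficiency(single_gpu_time, multi_gpu_time, num_gpus):
    """Fraction of ideal scaling achieved (1.0 means on the Ideal line)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup(single_gpu_time, multi_gpu_time) / num_gpus


# Hypothetical elapsed times in seconds, keyed by number of GPUs.
timings = {1: 100.0, 2: 52.0, 3: 36.0, 4: 28.0}

for gpus, elapsed in sorted(timings.items()):
    s = speedup(timings[1], elapsed)
    e = parallel_efficiency(timings[1], elapsed, gpus)
    print(f"{gpus} GPU(s): speedup {s:.2f}, efficiency {e:.0%}")
```

Reporting efficiency alongside speedup makes it easier to compare a 4-GPU run with an 8-GPU run, since both are measured against the same ideal of 1.0.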
Really bad for 8x M2090
[Figure: speedup (single-GPU time / multi-GPU time, axis 0 to 8) versus number of GPUs, 1 to 8; higher is better]
Ways to transfer to device
• CL_MEM_USE_HOST_PTR
  – kernelBuf = clCreateBuffer(CL_MEM_USE_HOST_PTR)
• CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR
  – kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_COPY_HOST_PTR)
• CL_MEM_ALLOC_HOST_PTR
  – kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) - cacheable
  – ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_WRITE)
  – memcpy(ptr, data)
  – clEnqueueUnmapMemObject(ptr)
• clEnqueueMapBuffer
  – kernelBuf = clCreateBuffer() - cacheable
  – ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_WRITE)
  – memcpy(ptr, data)
  – clEnqueueUnmapMemObject(ptr)
• clEnqueueWriteBuffer
  – kernelBuf = clCreateBuffer() - cacheable
  – clEnqueueWriteBuffer(kernelBuf, data)
• oclCopyCompute
  – pinnedBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_READ_WRITE) - cacheable
  – pinnedPtr = clEnqueueMapBuffer(pinnedBuf, CL_MAP_WRITE) - cacheable
  – memcpy(pinnedPtr, data)
  – kernelBuf = clCreateBuffer() - cacheable
  – clEnqueueWriteBuffer(kernelBuf, pinnedPtr)
Ways to transfer from device
• CL_MEM_ALLOC_HOST_PTR
  – kernelBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) - cacheable
  – ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_READ)
  – memcpy(data, ptr)
  – clEnqueueUnmapMemObject(ptr)
• clEnqueueMapBuffer
  – kernelBuf = clCreateBuffer() - cacheable
  – ptr = clEnqueueMapBuffer(kernelBuf, CL_MAP_READ)
  – memcpy(data, ptr)
  – clEnqueueUnmapMemObject(ptr)
• clEnqueueReadBuffer
  – kernelBuf = clCreateBuffer() - cacheable
  – clEnqueueReadBuffer(kernelBuf, data)
• oclCopyCompute
  – pinnedBuf = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR|CL_MEM_READ_WRITE) - cacheable
  – pinnedPtr = clEnqueueMapBuffer(pinnedBuf, CL_MAP_READ) - cacheable
  – kernelBuf = clCreateBuffer() - cacheable
  – clEnqueueReadBuffer(kernelBuf, pinnedPtr)
  – memcpy(data, pinnedPtr)
FastROCS scalability across 8x M2070
[Figure: speedup (time sequential / time parallel, axis 0 to 9) versus number of GPUs utilized, 1 to 8, with five bars per GPU count]
Lessons from the mess
• clEnqueueWriteBuffer > clEnqueueMapBuffer
• clEnqueueMapBuffer >> clEnqueueReadBuffer
• CL_MEM_* constants aren’t worth the effort
CUDA?
• Serious customers will only use NVidia cards
• Pinned memory
• Better support for binaries and compatibility
• CUDA support >> OpenCL support
FastROCS CUDA port
[Figure: conformers per second (axis 0 to 3,000,000) for OpenCL, CUDA, and CUDA-pinned on 2xC2075, 2xC2090, and 2xK20; higher is better]
CUDA Scaling?
[Figure: conformers per second (axis 0 to 8,000,000) versus number of individual K10 GPUs, 1 to 8, with series CUDA, OpenCL, and Ideal; higher is better. Note: each K10 has 2 physical GPUs on the board.]
CUDA vs OpenCL: Ding Ding!
• Portability vs Innovation
• NVidia vs Intel and AMD
• Open vs Proprietary
• Customers don’t care…
ROCS Implementations
• We only care a little…
• Fortran code (1995)
• C code (1999)
• C++ wrapper code (2003)
• OpenCL code (2009)
• CUDA code (2012)
• C++ thread-safe code (2013)
OpenEye Software
• Lots of Software
  – 14 products
  – 13 software libraries
• C++ (no SIMD)
  – 2.5 million lines
• Python
  – 416 thousand lines
• Java
  – 63 thousand lines
• C#
  – 38 thousand lines
The People
[Figure: staff breakdown with values 20, 12, and 10 for the categories Programmers, Hardcore Scripter, and Other stuff]
• GPGPU = ½ of a developer
  – Only 2.5% of development effort
Technology Adoption Lifecycle
[Figure: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; OpenEye GPGPU development is marked on the curve]
LinkedIn skills
[Figure annotation: 2.2%]
Technology Adoption Lifecycle
[Figure: technology adoption lifecycle curve with segments of 2.5%, 13.5%, 34%, 34%, and 16%; GPGPU development is marked on the curve]
I Believe…
• GPGPU computing can become ubiquitous…
• By expressing parallelism everywhere…
• We can make it easy for our customers…
  – Pre-installed in every operating system
  – Integrated seamlessly into every language
  – Then eventually becoming the CPU
Acknowledgements
• Nikolai Sakharnykh (NVidia)
• Dave Mullaly (HP)
• Exxact Computing
Father of “ROCS”
Andrew Grant April 28th 1963 - December 29th 2012
Dude, where’s my color?
[Figure: DUD average AUC (axis 0 to 0.9) for ROCS and FastROCS, shape only versus with color]
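For readers unfamiliar with the metric, an AUC per target can be computed as the fraction of (active, decoy) pairs in which the active scores higher, and the chart then averages over targets. The sketch below assumes that definition, with ties counted as half; the scores are invented for illustration and the slides do not state exactly how the AUC was computed.

```python
def roc_auc(active_scores, decoy_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (active, decoy) pairs where the active ranks
    higher; a tie counts as half a win.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def average_auc(targets):
    """Mean AUC over an iterable of (active_scores, decoy_scores) pairs."""
    aucs = [roc_auc(actives, decoys) for actives, decoys in targets]
    return sum(aucs) / len(aucs)


# Hypothetical similarity scores for one target.
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.5, 0.3, 0.2]
print(f"AUC = {roc_auc(actives, decoys):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active ranked above every decoy.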
ROCS vs FastROCS Histogram
[Figure: histogram of number of targets (axis 0 to 12) by Kendall tau correlation coefficient, binned from 0.10 to 1.00 in steps of 0.05]
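The Kendall tau coefficient on the histogram's x-axis measures how similarly two methods rank the same molecules: +1 for identical order, -1 for reversed order. Here is a minimal sketch using the simple tau-a form, in which tied pairs count as neither concordant nor discordant. Which variant was used for the slide is not stated, and the scores are made up.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two items to compare")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for the same five molecules from two methods.
rocs_scores = [0.91, 0.85, 0.77, 0.60, 0.42]
fastrocs_scores = [0.90, 0.86, 0.61, 0.75, 0.40]
print(f"tau = {kendall_tau(rocs_scores, fastrocs_scores):.2f}")
```

In this example only one of the ten pairs swaps order between the two methods, so the rankings agree closely without being identical, which is the kind of difference the "results aren't bitwise comparable" objection is about.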