Accelerating numerically intensive life science codes for … · 2008-04-08
Copyright ©2008 ClearSpeed Technology Inc. All rights reserved.
www.clearspeed.com
Real Science. Real Numbers. Real Software.
Accelerating numerically intensive life science codes for molecular and quantum mechanics
Simon McIntosh-Smith, VP Customer Applications
simon@clearspeed.com
MRSC08 conference, April 2nd 2008
Agenda
• ClearSpeed introduction
• Accelerator overview
• Software tools description
• Accelerated application success stories
• Future developments
• Summary
ClearSpeed Introduction
• Founded in 2002, we are a UK company based in Bristol, with offices in San Jose, CA
• We are driven by the principle that to deliver HIGH PERFORMANCE you have to deliver HIGH POWER EFFICIENCY, as systems become more constrained by space, power supply and cooling
• Therefore we deliver the world’s most power-efficient, high-performance processors, with supporting subsystems, software development tools, libraries and applications
• We provide solutions for both the High Performance Computing (HPC) and embedded systems markets
• Partnering with HP, IBM, SGI, Sun and other OEMs to deliver our accelerators in their systems
ClearSpeed accelerator overview
ClearSpeed’s accelerator architecture
• Architecture designed for Coarse-Grained Data Parallel Processing:
  – Achieves high performance, low power
  – Multi-threading enables asynchronous, overlapped I/O with compute
  – Scalable array of many Processor Elements (PEs)
  – Includes enterprise-class reliability features necessary for HPC, such as ECC on memories, spare PEs etc.
• Programmed in an extended version of ANSI C called Cn:
  – Rich expressive semantics
  – Single “poly” data type modifier
[Figure: MTAP processor block diagram. Mono controller with instruction cache and data cache; poly controller; control and debug; programmable I/O to DRAM; an array of PEs (PE 0, PE 1, …, PE n-1) on the peripheral network; high-speed buses linking the units]
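A minimal Python model of the mono/poly split may help here. It is not Cn and not a ClearSpeed API: the `Poly` class, its method names and the 96-lane width (the CSX600 PE count quoted in this talk) are illustrative assumptions. A mono value is a single scalar; a poly value holds one copy per PE, and every arithmetic operation on it is applied by all PEs in lockstep.

```python
from dataclasses import dataclass
from typing import Callable, List

NUM_PES = 96  # assumption: one lane per CSX600 Processor Element


@dataclass
class Poly:
    """Hypothetical model of a Cn 'poly' variable: one value per PE."""
    lanes: List[float]

    @staticmethod
    def broadcast(mono_value: float) -> "Poly":
        # A mono (scalar) value used in a poly expression is copied to every PE.
        return Poly([mono_value] * NUM_PES)

    @staticmethod
    def pe_index() -> "Poly":
        # Each PE knows its own index, so lanes can hold different data.
        return Poly([float(i) for i in range(NUM_PES)])

    def _lockstep(self, other: "Poly", op: Callable[[float, float], float]) -> "Poly":
        # One SIMD step: the same operation is applied in every lane.
        return Poly([op(a, b) for a, b in zip(self.lanes, other.lanes)])

    def __add__(self, other: "Poly") -> "Poly":
        return self._lockstep(other, lambda a, b: a + b)

    def __mul__(self, other: "Poly") -> "Poly":
        return self._lockstep(other, lambda a, b: a * b)

    def sum_to_mono(self) -> float:
        # Reduction: collapse the per-PE values back to a single mono value.
        return sum(self.lanes)


# Every PE computes x*x + c, where x is its own index and c is a broadcast mono value.
x = Poly.pe_index()
c = Poly.broadcast(1.0)
y = x * x + c
total = y.sum_to_mono()
```

The reduction in `sum_to_mono` is the step that moves data from the PE array back to the mono side; in this model it is simply Python's `sum`.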
Processing Element (PE) architecture
• Multiple execution units per PE:
  – Floating point adder
  – Floating point multiplier
  – Fixed-point MAC 16x16 → 32+64
  – Integer ALU with shifter
  – Load/store
• High-bandwidth, 5-port register file per PE
• Fast inter-PE communication path (swazzle)
• Closely coupled SRAM for data
  – Keeping data close is key to low power
• Per PE address generators & DMA (PIO)
  – Complete pointer model, including parallel pointer chasing and vectors of addresses
  – Key for gather/scatter and sparse operations
[Figure: datapath of PE n and its neighbours PE n–1 and PE n+1: 32- and 64-bit IEEE 754 FP Mul and FP Add units, MAC, ALU, 128-byte register file, PE SRAM (e.g. 6 KBytes), memory address generator (PIO) and PIO collection & distribution; internal paths labelled 32, 64 and 128 bits wide]
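The pointer model above is what lets each PE fetch from a different address in the same step. As a rough illustration, assumed rather than taken from ClearSpeed’s libraries, this Python sketch runs a CSR sparse matrix-vector product with one matrix row per lane: at each step every active lane gathers `x` at its own column index. `gather`, `scatter` and `csr_spmv` are hypothetical helper names.

```python
from typing import List


def gather(memory: List[float], addresses: List[int]) -> List[float]:
    # Each lane supplies its own address; lane i receives memory[addresses[i]].
    return [memory[a] for a in addresses]


def scatter(memory: List[float], addresses: List[int], values: List[float]) -> None:
    # Each lane writes its own value to its own address (addresses assumed distinct).
    for a, v in zip(addresses, values):
        memory[a] = v


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """y = A x for a CSR matrix, with one row per lane.

    At step k every lane whose row still has a k-th entry gathers x at that
    entry's column index: the 'vectors of addresses' pattern.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    row_len = [row_ptr[r + 1] - row_ptr[r] for r in range(n_rows)]
    for k in range(max(row_len, default=0)):
        active = [r for r in range(n_rows) if k < row_len[r]]  # lanes still working
        cols = [col_idx[row_ptr[r] + k] for r in active]
        xs = gather(x, cols)  # one gather serves all active lanes
        for r, xv in zip(active, xs):
            y[r] += vals[row_ptr[r] + k] * xv
    return y
```

Rows of very different lengths leave lanes idle in later steps, which is one reason sparse codes are harder to keep busy on a SIMD array than dense ones.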
The CSX600 Processor
• Array of 96 Processor Elements
• Brute force approach:
  – Simple to use
  – Power efficient
• 64-bit and 32-bit floating point
  – Also integer processing
• 210 MHz: key to low power and therefore to absolute performance
• Integrated DDR2 memory controller
• ~1 TB/sec internal bandwidth
  – At the register file
• 128 million transistors
• Low power, approx. 10 watts
The ClearSpeed Advance™ e620 accelerator board
• An Enterprise-class HPC accelerator
  – Designed to fit into existing servers, even 1U. Single slot PCI Express x8 card
• Low power consumption, small, light
  – Designed for high stability and reliability (MTBF)
• Board DRAM (1 GByte) is error protected and no moving parts (e.g. fans) are required
• 80.64 Double Precision (D.P.) IEEE 754 GFLOPS peak
  – R∞ ≈ 66 GFLOPS for 64-bit matrix multiply (DGEMM) calls
  – Hardware also supports 32-bit floating point and integer calculations
• Over 1 GByte/s between accelerator and host
• 32/64-bit drivers for Linux (Red Hat and SUSE) and Windows
• 250 grams, 15 cm long, 35 watts for entire card (at socket)
• No extra power connectors, cooling or space (slots) required
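The 80.64 GFLOPS figure can be reproduced from numbers elsewhere in this talk, under two stated assumptions: each PE completes one floating point add and one multiply per cycle (it has a separate adder and multiplier), and each e620 carries two CSX600 processors (24 processors across the 12 boards of a CATS unit). A short Python check:

```python
PES_PER_CSX600 = 96
CLOCK_HZ = 210e6
FLOPS_PER_PE_PER_CYCLE = 2  # assumption: one FP add + one FP multiply each cycle
CSX600_PER_BOARD = 2        # inferred: 24 CSX600s per CATS unit / 12 boards
R_INF_DGEMM_GFLOPS = 66.0   # DGEMM rate quoted on the slide


def peak_gflops(n_processors: int) -> float:
    """Theoretical 64-bit peak for n CSX600 processors, in GFLOPS."""
    return n_processors * PES_PER_CSX600 * CLOCK_HZ * FLOPS_PER_PE_PER_CYCLE / 1e9


per_chip = peak_gflops(1)                           # about 40.32 GFLOPS
per_board = peak_gflops(CSX600_PER_BOARD)           # about 80.64 GFLOPS
dgemm_efficiency = R_INF_DGEMM_GFLOPS / per_board   # about 0.82 of peak
print(f"{per_chip:.2f} {per_board:.2f} {dgemm_efficiency:.3f}")
```

Under these assumptions DGEMM runs at roughly 82% of the theoretical peak.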
Ultra dense systems: 1 TFLOP 64-bit in 1U
Low power means you can build very dense systems to achieve the highest performance:
• 1U standard server
  – Intel 5365 3.0GHz
  – 2-socket, quad core
  – 0.096 Double Precision (D.P.) TFLOPS peak
  – ~600 watts
  – 36 servers → ~3.5 TFLOPS peak in a 25 kW rack
• ClearSpeed Accelerated Terascale System (CATS)
  – 24 CSX600 processors
  – ~1 D.P. TFLOPS peak
  – ~600 watts
  – 36 CATS → ~35 TFLOPS peak in a 25 kW rack
• 10X speedup for the same power, space and cooling
  – Or a 90% reduction in energy used and CO2 emitted for the same compute
[Figure label: 2 PCIe x8 cables]
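The rack-level figures follow from simple multiplication. This Python sketch assumes 4 double precision flops per cycle per Xeon 5365 core and 40.32 GFLOPS per CSX600 (96 PEs at 210 MHz, 2 flops per cycle); both are our assumptions, chosen because they reproduce the 0.096 and ~1 TFLOPS figures above.

```python
UNITS_PER_RACK = 36
RACK_BUDGET_W = 25_000
UNIT_POWER_W = 600  # ~600 W for either a 1U server or a CATS unit


def server_peak_tflops(sockets: int = 2, cores: int = 4, ghz: float = 3.0,
                       dp_flops_per_cycle: int = 4) -> float:
    # Assumption: 4 DP flops/cycle/core (a 2-wide SSE add plus a 2-wide SSE multiply).
    return sockets * cores * ghz * dp_flops_per_cycle / 1000.0


def cats_peak_tflops(processors: int = 24, gflops_each: float = 40.32) -> float:
    return processors * gflops_each / 1000.0


server = server_peak_tflops()                   # 0.096 TFLOPS
cats = cats_peak_tflops()                       # ~0.97 TFLOPS
rack_server = UNITS_PER_RACK * server           # ~3.5 TFLOPS
rack_cats = UNITS_PER_RACK * cats               # ~35 TFLOPS
rack_power_w = UNITS_PER_RACK * UNIT_POWER_W    # 21,600 W, inside the 25 kW budget
speedup = rack_cats / rack_server               # ~10X for the same power and space
energy_saving = 1.0 - 1.0 / speedup             # ~90% less energy for the same work
```

The 90% energy figure is just the reciprocal of the 10X speedup: the same work finishes in a tenth of the time at roughly the same power draw.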
CATS 1 TFLOP in 1U architecture
[Figure: CATS chassis, labelled: twelve Advance e620 boards (two layers of six), power supply, PCI switching, cooling fans, PCIe connectors]
CATS launched at SC07 in Reno
• ~12 TFLOPS 64-bit in 12U (21”)
• 144 GB of memory
• 921 GB/s memory bandwidth
• Applications shown:
  – Molpro (Quantum Mechanics)
  – Amber (Molecular Dynamics)
  – BUDE (Drug Docking)
  – Sire (QM/MM)
  – Monte Carlo (Financial Application)
• Sustained over 2 TFLOPS on Molpro
• Ran off battery power for 5 minutes during a power cut!
ClearSpeed Development Environment
ClearSpeed development environment
• Version 3.0 just released
• Cn optimising compiler
  – C with poly extension for SIMD datatypes
  – Uses ACE CoSy compiler development system
• Assembler, linker, simulators
• Debugger – a port of gdb
  – Runs on ClearSpeed’s hardware at full speed
• Profiling – csprof
  – Heterogeneous, system-wide visualisation of an accelerated application’s performance while running on both a multi-core host and ClearSpeed accelerators. ClearSpeed’s hardware can be profiled in real time
• Libraries (BLAS, RNG, FFT, more…) & high-level APIs
• Preview of an Eclipse IDE
• Documentation, training materials
• Available for Windows and Linux (Red Hat and SLES)
Compiler improvements in the 3.0 release
[Figure: bar chart at optimisation level 4 ("Opt Level 4") comparing compiler releases 2.51 and 3.00 (latest) on a vertical scale of 0 to 2.5, across the benchmarks amber, american_option_pde_mono, american_option_pde_poly, asian_option_mono, asian_option_poly, asian_option_vector, blackscholes_mono, blackscholes_poly, blackscholes_vector, broadieglasserman_mono, broadieglasserman_poly, broadieglasserman_vector, docking_compute, euro_option, euro_option_binomial_mono, euro_option_binomial_poly, euro_option_binomial_vector, logp, mandelbrot_poly, mersenne_twister, sinp, square]
Represents an average of 15-20% performance improvement over the previous release
Eclipse IDE now available in preview form
Integrated development environment supporting command line CSX development tools.
Based on the industry standard Eclipse platform.
Supports development of application code using the ClearSpeed Cn language.
Can be used as a plug-in with other existing Eclipse-based tools to provide an IDE for heterogeneous software development.
ClearSpeed csgdb/ddd debugger
[Screenshot annotations: real-time plot of contents of PE memory; Cn source-level break points, watch points, single step; disassembly, break points, watch points, single step; register contents; on-chip vector contents displayed]
Eclipse IDE – CSX Debug Perspective
Standard Eclipse graphical debug interface for CSX processor debugging.
CSX processor provides full hardware debugging of application code.
Provides seamless view of all 96 processor cores and the associated state.
Allows full symbolic debug of the Cn language.
Enhanced views for CSX-specific information.
ClearSpeed profiler for heterogeneous multi-processor systems
[Figure: host CPU(s) and Advance™ accelerator boards with CSX600 pipelines, annotated with the four profiling views below]
HOST/BOARD INTERACTION: View host/board interactions. Provides performance information for data transfer operations. Trace cluster node/board interaction. See overlap of host compute and board compute.
CSX PIPELINE: View detailed instruction issue information. Visualize overlap of executing instructions. Optimize code at the instruction level. View instruction-level performance bottlenecks. Get accurate instruction timing.
CSX SYSTEM: View system level trace. Visually inspect the overlap of compute and I/O. Visualize cache utilization. View branch trace of code executing. Find and analyse performance bottlenecks. Get accurate event timing.
HOST CODE PROFILING: Visually inspect host code executing. Supports multiple threads and processes. Time specific code sections. See overlap of host threads executing. Platform and processor agnostic trace collection.
Developer support
• Lots of online support and self-training materials:
  – http://developer.clearspeed.com/resources/training/
    • Self-paced training for programmers, includes optimisation tips
  – http://developer.clearspeed.com
    • Online manuals, training materials, forums, support
  – http://support.clearspeed.com
    • All the latest software downloads, including example codes
Porting life science applications
Life science codes are good targets for accelerators
• Why is this?
  – They usually contain massive parallelism that can be exploited
  – Often data parallel
• Scientists in the fields of computational chemistry, biology and materials science want:
  – Simulations of larger systems (more atoms etc.)
  – Longer simulations (e.g. protein folding)
  – More accurate simulations, usually requiring more computationally expensive methods
• ClearSpeed has focused on life science codes requiring floating point operations:
  – Molecular Dynamics codes such as Amber
  – Quantum Chemistry codes such as Molpro
Accelerated applications
• Molpro electronic molecular structure code
  – Collaboration with Dr Fred Manby’s group at Bristol University’s Centre for Computational Chemistry
• Sire QM/MM free energy code (uses Molpro)
  – Collaboration with Dr Christopher Woods
• BUDE molecular dynamics-based drug docking
  – Collaboration with Dr Richard Sessions at Bristol University’s Department of Biochemistry
• Amber 9 implicit (production code)
  – Molecular dynamics simulation with implicit solvent
• Amber 9 explicit (beta)
  – Molecular dynamics simulation with explicit solvent
Experiences porting the Molpro/Sire Quantum Mechanics code
• Focused on the Density Functional Theory (DFT) part of Molpro: http://developer.clearspeed.com/resources/training/
• Most of the run-time is spent in the exchange and correlation kernels. These use quadrature on a grid to evaluate integrals. The computational cost of both scales cubically with the size of the problem:
  – Building the density
  – Building the contribution to the Fock matrix
  – Both can be expressed as large DGEMM + matrix-vector ops
  – Use Blocked DGEMM for maximum performance
• It was also possible to overlap this computation on the accelerator with other Coulomb computation on the host at the same time – heterogeneous computation
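The formulas for these two kernels did not survive extraction. In the standard formulation (our assumption about what the slide showed, not Molpro’s actual code), with basis functions φ_μ sampled at grid points r_g, density matrix P, quadrature weights w_g and exchange-correlation potential v_xc, the kernels are ρ(r_g) = Σ_μν φ_μ(r_g) P_μν φ_ν(r_g) and F_μν = Σ_g w_g v_xc(r_g) φ_μ(r_g) φ_ν(r_g). Storing φ as an n_grid × n_basis matrix Φ turns the first into X = ΦP followed by row-wise dot products of X and Φ, and the second into F = Φᵀ diag(w v_xc) Φ. Grid size and basis size both grow roughly linearly with the molecule, so these matrix products give the cubic cost. The pure-Python sketch below uses a simple blocked DGEMM; all function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def blocked_dgemm(a: Matrix, b: Matrix, block: int = 32) -> Matrix:
    """Return A @ B, computed one (block x block) tile at a time.

    Tiling is what lets an accelerator stage small tiles in fast local
    memory; in this sketch the tiles are just ranges over Python lists.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Multiply one tile of A by one tile of B into one tile of C.
                for i in range(i0, min(i0 + block, n)):
                    ci = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        aip = a[i][p]
                        bp = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            ci[j] += aip * bp[j]
    return c


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def density_on_grid(phi: Matrix, p: Matrix) -> List[float]:
    """rho_g = sum_mu (Phi P)[g][mu] * Phi[g][mu]: one DGEMM plus row-wise dot products."""
    x = blocked_dgemm(phi, p)
    return [sum(xv * fv for xv, fv in zip(x_row, phi_row))
            for x_row, phi_row in zip(x, phi)]


def xc_fock(phi: Matrix, weights: List[float], vxc: List[float]) -> Matrix:
    """F = Phi^T diag(w * vxc) Phi: scale each grid row, then one DGEMM."""
    scaled = [[w * v * f for f in row] for row, w, v in zip(phi, weights, vxc)]
    return blocked_dgemm(transpose(phi), scaled)
```

The block size only changes the order in which tiles are visited, not the result, so it can be tuned to the local memory available.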
Accelerated Molpro/Sire Results
• Modeling new Neuraminidase inhibitors for use against the Influenza virus
• Sustaining >220 double precision GFLOPS on the Molpro application per CATS node
• With 10 CATS nodes → ~2.2 TFLOPS sustained in a single rack at SC07 (DGEMM at ~7 TFLOPS!)
• Executed ~60×10^15 64-bit floating point operations on Molpro in ~17 hours spread over 3 days during SC07
• 7.3X speedup per CATS node across whole app
  – Accelerated portion is ~17X faster
• Enables 1 ligand per day with QM/MM levels of accuracy: 55 QM atoms, ~1600 MM atoms, DFT BLYP VDZ
[Figure: Ligand bound to Neuraminidase active site]
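The gap between the ~17X kernel speedup and the 7.3X whole-application speedup is what Amdahl’s law predicts when part of the run stays on the host. Assuming both figures refer to the same baseline run, this Python sketch inverts the law to estimate the fraction of the original run time that was accelerated:

```python
def overall_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: only the accelerated fraction of the run time gets faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup)


def implied_fraction(overall: float, kernel_speedup: float) -> float:
    """Solve 1/overall = (1 - f) + f/kernel_speedup for f."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / kernel_speedup)


f = implied_fraction(7.3, 17.0)              # roughly 0.92 of the original run time
ceiling = overall_speedup(f, float("inf"))   # limit if the kernel took no time at all
```

Under these assumptions, even an infinitely fast accelerator would cap this run at roughly 12X, so further gains would have to come from moving more of the host code onto the accelerator.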
Experiences porting the BUDE drug docking code
• Uses Molecular Dynamics to perform Monte Carlo-based drug docking
• Research into peptide-based protease inhibitors
• Multiple degrees of massive parallelism to exploit:
  – Multiple potential drug candidates requiring millions of docking operations
  – Different orientations and configurations of flexible ligands mean that the energy of billions of poses must be calculated
• The fitness of a pose is evaluated by a computationally expensive, highly accurate atom-atom empirical free energy force field calculation
• The main kernel was responsible for >99% of the computational requirements but consisted of just 500 lines of FORTRAN source code
• Ported in a few man-days and fully optimised in just 1-2 man-weeks
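Because poses are scored independently, the kernel maps naturally onto an array of PEs. The Python sketch below shows only the shape of the computation: `pair_energy` is a made-up toy term, not BUDE’s empirical free energy force field, and the atom layout, per-atom parameter and 8 Å cutoff are assumptions.

```python
import math
from typing import List, Tuple

# (x, y, z, strength): the per-atom 'strength' parameter is hypothetical.
Atom = Tuple[float, float, float, float]

CUTOFF = 8.0  # assumed interaction cutoff in Å


def pair_energy(a: Atom, b: Atom) -> float:
    """Toy attractive pair term; NOT BUDE's empirical free energy force field."""
    dx, dy, dz = a[0] - b[0], a[1] - b[1], a[2] - b[2]
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r > CUTOFF:
        return 0.0
    return -a[3] * b[3] / (1.0 + r)


def score_pose(ligand: List[Atom], protein: List[Atom]) -> float:
    # Every ligand-protein atom pair: the O(N_ligand * N_protein) inner kernel.
    return sum(pair_energy(l, p) for l in ligand for p in protein)


def score_poses(poses: List[List[Atom]], protein: List[Atom]) -> List[float]:
    # Poses are independent, so each could be given to its own PE;
    # here they are simply evaluated one after another.
    return [score_pose(pose, protein) for pose in poses]


def best_pose(poses: List[List[Atom]], protein: List[Atom]) -> int:
    # Lower energy means a better fit.
    scores = score_poses(poses, protein)
    return min(range(len(scores)), key=scores.__getitem__)
```

The independence of poses is the "multiple degrees of massive parallelism" point above: there is no communication between poses until the final minimum is taken.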
Accelerated BUDE results
• Scales linearly across multiple CATS nodes
  – Ran on 10 CATS nodes simultaneously at SC07
  – → 120 ClearSpeed e620 accelerator boards
  – → 240 CSX600 processors
  – → 23,040 64-bit Processor Elements!
• 13.5x speedup per CATS node across the whole application compared to the latest 3.0 GHz quad core CPUs
• A whole peptide library calculation took 18 hours on ten CATS nodes, compared to the 5 days it would have taken on a quad-core based x86 system of the same size and approximate power consumption
• The first set of real peptides based on simulations run on CATS have now been synthesized and are undergoing tests in the laboratory
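The hardware counts and the library-run timing above can be cross-checked in a few lines of Python. Twelve boards per CATS unit come from the CATS architecture slide, and two CSX600s per board from 24 processors per unit.

```python
CATS_NODES = 10
BOARDS_PER_CATS = 12   # "Twelve Advance e620 boards" per CATS unit
CSX600_PER_BOARD = 2   # 24 CSX600 processors / 12 boards
PES_PER_CSX600 = 96

boards = CATS_NODES * BOARDS_PER_CATS        # 120
processors = boards * CSX600_PER_BOARD       # 240
pes = processors * PES_PER_CSX600            # 23,040

HOURS_ON_CATS = 18
HOURS_ON_X86 = 5 * 24                        # "5 days"
library_speedup = HOURS_ON_X86 / HOURS_ON_CATS  # about 6.7X for the whole library run
```

The whole-run ratio (about 6.7X) is lower than the 13.5X per-node figure; as far as the slide states, the two are measured against different baselines (a size- and power-matched x86 system versus quad-core CPUs), so they are not expected to agree.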
Amber 9 sander application acceleration
• Accelerated Amber 9 sander implicit shipping now (download from ClearSpeed website)
• Amber 9 sander explicit available in beta soon
• On CATS, both explicit and implicit achieve 10 to 20X speedups
  – Saving 90-95% of the simulation time → more simulations
  – Alternatively, enabling larger, more accurate simulations
• While reducing power consumption by 66%
• And increasing server room capacity by 300%
• Full-scale MD applications have been hard to port – complicated codes with lots of cases
• ClearSpeed source code will be included in the Amber 10 release by default
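The 90-95% saving follows directly from the 10 to 20X speedup: a run that is S times faster takes 1/S of the original time, so it saves 1 - 1/S. A short Python check:

```python
def time_saved(speedup: float) -> float:
    """Fraction of wall-clock time saved when a run becomes `speedup` times faster."""
    return 1.0 - 1.0 / speedup


savings = [time_saved(s) for s in (10.0, 20.0)]  # 0.90 and 0.95, up to rounding
```

Going from 10X to 20X only moves the saving from 90% to 95%, which is why the slide also frames the gain as running larger or more accurate simulations in the same time.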
Other acceleratable applications
• Computational Fluid Dynamics, Smoothed Particle Hydrodynamics methods
  – Collaborations with Jamil Appa at BAE Systems and Dr Graham Pullan and Tobias Brandvik at the Whittle Laboratory, Cambridge University
• Large matrix-matrix arithmetic operations
  – E.g. Electromagnetics, radar cross section etc.
• Image processing
• RADAR/SONAR applications (e.g. SAR)
• Star-P from Interactive Supercomputing
• MATLAB and Mathematica when performing large matrix operations, such as solving systems of linear equations
Future
Future developments – Software
• Ease of use improvements, e.g.:
  – ClearStack (near)
    • Makes talking between the host and card much simpler
    • Includes C++ object migration base class (beta, with partner)
• Heterogeneous programming environment (next)
  – “Exploiting Loop-Level Parallelism for SIMD Arrays using OpenMP”, IWOMP 2007 (Beijing)
• In general, programming model becoming more OpenMP-like
  – Cross-platform
  – Heterogeneous
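To give a feel for what mapping an OpenMP-style parallel loop onto a SIMD array involves, here is a Python sketch of one common approach: strip-mining the iteration space into chunks of one iteration per PE and masking off lanes past the end of the loop. It is our illustration, not the scheme from the IWOMP 2007 paper; `simd_parallel_for` and the 96-lane width are assumptions.

```python
from typing import Callable


def simd_parallel_for(n: int, lanes: int, body: Callable[[int], None]) -> int:
    """Run body(i) for i in range(n), `lanes` iterations per SIMD step.

    Returns the number of SIMD steps used.
    """
    steps = 0
    for base in range(0, n, lanes):
        # One SIMD step: conceptually, all lanes execute body() at the same time.
        for lane in range(lanes):
            i = base + lane
            if i < n:  # mask: lanes past the trip count stay idle
                body(i)
        steps += 1
    return steps


N = 200
x = [float(i) for i in range(N)]
y = [0.0] * N


def saxpy_body(i: int) -> None:
    # Iterations are independent, which is what makes the loop safe to parallelise.
    y[i] = 2.0 * x[i] + 1.0


steps = simd_parallel_for(N, lanes=96, body=saxpy_body)
```

With 200 iterations and 96 lanes the final step keeps only 8 lanes busy, so trip counts that are multiples of the PE count make the best use of the array.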
“Exploiting Loop-Level Parallelism for SIMD Arrays using OpenMP”
Future developments – Hardware
• Next generation processor “Callanish” being released this year
• New boards and CATS based on Callanish
• Big increases in performance and performance per watt
• Binary compatible with current products since release 3.0 of the SDK
• Watch this space for more announcements!
Summary
• ClearSpeed accelerators are designed for HPC
• Have focused on Enterprise-class features to enable reliable large-scale systems
• Real applications starting to prove accelerators are credible
  – E.g. real compounds synthesized from BUDE docking sims!
• ClearSpeed enables Petascale systems and beyond, in smaller form factors than previously possible and within existing infrastructure constraints
  – E.g. 100 TFLOPS double precision in 3-4 racks, small enough to fit in a department
Upcoming events
• The next ClearSpeed User Group meeting will be held at the International Supercomputing Conference (ISC08) in Dresden, Germany on Monday June 16th
  – Inviting potential speakers to submit proposals now!