GPU Technology Conference - Financial Computation: Value at Risk
TRANSCRIPT
Ettikan Kandasamy Karuppiah
GPU Team @MIMOS
(S3389) Financial Computation: Value at Risk using Historical Method on GPU
Outline
• Introduction to Portfolio Risk Management using VaR
• Challenges
• GPU Solution
• Performance Comparison
• Conclusion
• Future Work
Performance Comparison: SQL vs GPU
Big data (up to 700 million records), intensive computation. Solution A: SQL; Solution B: GPU.
[Bar chart: execution time in seconds (0–3000) at 3 levels and 11 levels (complex); labelled values include 1630 s, 366 s, and 124 s, with SQL taking > 24 hours at 11 levels.]
[Bar chart: speed-up (0x–14x), CPU vs GPU; next step ~13x.]
Overview
Introduction
Portfolio risk management applications estimate portfolio risk using three categories of approaches:
1. Assume the risk factors are multivariate normally distributed and the portfolio is linear in them.
Model: Normal linear VaR (variance-covariance)
2. Use historical data over as long a period as possible to estimate VaR, with few assumptions about the distribution of risk factors.
Model: Historical simulation VaR
3. Assume some distribution for the risk factors (simplest case: multivariate normal) and simulate the returns of the portfolio.
Model: Monte Carlo VaR
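As a sketch of approach 2, historical-simulation VaR reduces to taking a percentile of the historical P&L distribution. This minimal NumPy version is illustrative (the function name and the toy data are assumptions, not the talk's implementation):

```python
import numpy as np

def historical_var(pnl, confidence=0.99):
    """Historical-simulation VaR: the loss level that historical P&L
    fell below only (1 - confidence) of the time.
    `pnl` is an array of historical profit/loss values (losses negative)."""
    # VaR is reported as a positive loss figure, hence the negation.
    return -np.percentile(pnl, 100 * (1 - confidence))

# Toy example: 1000 simulated daily P&L observations.
rng = np.random.default_rng(0)
pnl = rng.normal(loc=0.0, scale=1.0, size=1000)
var_99 = historical_var(pnl, 0.99)  # close to the theoretical 2.33 for N(0,1) P&L
```

The appeal of this method, as the slide notes, is that it makes almost no distributional assumption: the data itself supplies the tail.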
Challenges
• Time-sensitive results
• Complex configurations require hours or even days to compute
• Compute-intensive logic (details in next slide)
• Large and growing data (details in next slide)
• Requires a cost-effective solution
Challenges – Compute-Intensive Logic
• The client's workflow requires finding records that match certain values, such as stock symbol and business entity.
• Large number of records to search (up to 700 million).
• The number of search keywords grows exponentially with the number of dimensions/levels (up to 11), being the product of the value ranges of the dimensions (between 2 and 1500 values per dimension).
• The number of dimensions/levels is a user-configurable option.
• The number of comparisons is the product of the number of records and the number of search keywords; brute-force comparison counts reach the quadrillion (10^15) to quintillion (10^18) range.
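The counts above follow from simple products. A sketch with illustrative dimension cardinalities (the exact values here are assumptions chosen within the stated 2–1500 range, not the client's configuration):

```python
from math import prod

# Hypothetical per-dimension value counts; the talk cites 2-1500 values
# per dimension and up to 11 dimensions. These five are illustrative.
cardinalities = [2, 10, 50, 100, 1479]

num_records = 700_000_000
num_keywords = prod(cardinalities)            # product over dimensions
num_comparisons = num_records * num_keywords  # brute-force comparisons

# With 11 dimensions averaging ~40 values each, keywords alone exceed
# 40**11 ≈ 4e17, pushing comparisons into the quintillion (10^18) range.
```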
Overview
User Configuration – Scheme Level
Challenges – Large Data
• Data size up to 700 million records, 250 bytes per record
• Limited global memory (6 GB per card)
• Limited shared memory
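A quick sizing sketch shows why the full data set cannot be resident on one card and must be streamed through in chunks. The two-thirds usable-memory headroom factor is an assumption for illustration, not a figure from the talk:

```python
NUM_RECORDS = 700_000_000
RECORD_SIZE = 250                     # bytes per record (from the slide)
GPU_GLOBAL_MEM = 6 * 1024**3          # 6 GB of global memory per card

total_bytes = NUM_RECORDS * RECORD_SIZE   # 175 GB of raw record data
# Leave headroom for keywords and intermediate results; assume only
# two-thirds of global memory holds record data (an assumption).
usable = GPU_GLOBAL_MEM * 2 // 3
num_chunks = -(-total_bytes // usable)    # ceiling division

# The data set must be streamed through the card in dozens of chunks
# rather than being resident all at once.
```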
Processing Logics
(1) Retrieve from DB: # of records up to 1–700 million; average record size is 250 bytes.
(2) Data processing and generation: data replicated up to 700 times.
(3) Search and Summation: up to 1 million search keywords; brute-force search generates quadrillions (10^15) or quintillions (10^18) of comparisons.
(4) Sort: sort each set of summation results.
(5) Percentile: perform a percentile computation on each set.
Large data but limited memory.
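Steps (3)–(5) of the pipeline can be sketched on toy data. The function name, record layout, and grouping used here are illustrative assumptions, not the talk's actual API:

```python
import numpy as np

def search_sum_sort_percentile(records, keywords, groups, percentile=99):
    """Steps (3)-(5) on toy data: for each search keyword, sum the values
    of matching records per group, then sort each result set and take a
    percentile. `records` holds (keyword, group, value) triples."""
    results = {}
    for kw in keywords:
        # (3) Search and Summation: one partial sum per group
        # (e.g. per historical scenario).
        sums = np.array([
            sum(v for k, g, v in records if k == kw and g == grp)
            for grp in groups
        ])
        sums.sort()                                    # (4) Sort
        results[kw] = np.percentile(sums, percentile)  # (5) Percentile
    return results

# Toy data: (keyword, scenario, value) triples.
records = [("AAPL", 0, 1.0), ("AAPL", 1, -2.0), ("MSFT", 0, 0.5),
           ("AAPL", 0, 0.5), ("MSFT", 1, 1.5)]
out = search_sum_sort_percentile(records, ["AAPL", "MSFT"], [0, 1], 50)
```

In the real system each of these stages runs over millions of records and up to a million keywords, which is exactly the part offloaded to the GPU.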
Solution to Compute-Intensive Logic
• Brute-force search generates quadrillions (10^15) or quintillions (10^18) of comparisons.
• After trimming the search list and keywords, the number of comparisons drops from quintillions (10^18) to about 1 trillion (10^12).
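The talk does not spell out its trimming algorithm; one plausible sketch is to prune keywords that cannot match any record before the expensive search pass. This pruning strategy is an illustrative guess at "trimming", not the method from the talk:

```python
def trim_keywords(keywords, records_index):
    """Drop search keywords that appear nowhere in the data, so the
    expensive search pass only runs over keywords that can match.
    `records_index` is a precomputed set of keys present in the records.
    (Illustrative guess at 'trimming', not the talk's algorithm.)"""
    return [kw for kw in keywords if kw in records_index]

keywords = ["AAPL", "GOOG", "MSFT", "IBM"]
records_index = {"AAPL", "MSFT"}        # keys actually present in the data
trimmed = trim_keywords(keywords, records_index)
# The comparison count now scales with len(trimmed), not len(keywords).
```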
Benchmark Database
• The solution comprises a mixture of CPU and GPU.
• The benchmark database contains 1479 records duplicated 700 times, giving 1479 × 700 = 1,035,300 historical records.
• The number of search keywords for the Nth level is the product of N dimensions (e.g. 3rd level = 1 × 2 × 1479 = 2958).
• Total comparisons for level 3: 2958 × 1,035,300 ≈ 3.06 billion text comparisons.
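The slide's sizing arithmetic can be checked directly:

```python
# Reproducing the benchmark sizing arithmetic from the slide.
base_records = 1479
replication = 700
historical_records = base_records * replication   # 1,035,300 records

level3_keywords = 1 * 2 * 1479                    # 2,958 keywords
level3_comparisons = level3_keywords * historical_records
# ≈ 3.06 billion text comparisons at level 3.
```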
GPU Role
• To further improve performance, the GPU is introduced to offload computationally intensive comparisons from the CPU.
• Data is moved from the SQL database into the GPU before launching the Search and Sum kernel.
• To reduce kernel-launch overhead, the Search and Sum kernel performs multiple keyword searches in one launch.
• Percentile results are returned and stored in the SQL DB.
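The batching idea (many keyword searches per launch, so launch overhead is amortized) can be mimicked on the CPU with a single broadcast pass. The names and data layout below are illustrative, not the talk's CUDA kernel:

```python
import numpy as np

def batched_search_and_sum(record_keys, record_values, keywords):
    """One 'launch' handles every keyword at once, mirroring the talk's
    batching of multiple keyword searches per kernel launch."""
    # Broadcast compare: (n_keywords, n_records) boolean match matrix.
    match = record_keys[None, :] == np.asarray(keywords)[:, None]
    # Sum the matching record values per keyword in a single pass.
    return (match * record_values).sum(axis=1)

record_keys = np.array(["AAPL", "MSFT", "AAPL", "IBM"])
record_values = np.array([1.0, 2.0, 3.0, 4.0])
sums = batched_search_and_sum(record_keys, record_values, ["AAPL", "IBM"])
# sums -> array([4., 4.])
```

On a GPU the same shape maps naturally to a 2-D grid of threads, with each thread block handling a (keyword, record-chunk) tile.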
Performance Comparison

Complexity (1–11)    | SQL (seconds)                     | GPU (seconds) | Speed-up (SQL/GPU)
3 Levels             | 1630 sec                          | 40 sec        | ~40x
6 Levels             | N/A                               | 78 sec        | >40x
11 Levels (Complex)  | > 24 hours * (unable to complete) | 135 sec       | first time ever made possible

* – the previous application could not complete the execution on a PC
Future Work
• Multi-GPU solution
• Intelligent search algorithms
• More comparisons between CPU and GPU implementations
MIMOS Library Architecture
[Diagram: layered library stack]
• Applications: App 1 (Financial), App 2 (Web interface), App 3 (SPARQL/SQL/XML)
• User interface APIs / Library interface wrapper
• Application-specific algorithms: VaR – Historical; VaR – Var-Covar; VaR – Monte Carlo; text/string processing; text/string analytics; in-memory DB operations & query parser/optimiser
• Functional algorithms (generic):
  – String/Crypto library: searching, sorting, matching, data cleansing, crypto
  – Numeric library: financial, matrix, scientific, statistics
  – Image & IMDB library: retrieval, transfer, indexing, analytics, VA algos
  – XML/SQL/SPARQL library: unified indexer, query operator, multi-format data manager, resource manager
• Various hardware: multi-core CPU, GPU/CPU (many-core)
Progress Indicator
© 2012 MIMOS Berhad. All Rights Reserved.
Enabling MIMOS Cloud
[Diagram: cloud stack]
• Service layers: User Services; Software as a Service; Platform as a Service; Infrastructure as a Service; Metal as a Service
• Cloud-enabled infrastructure: x86_64 VMs under a VM management layer, running user applications
• Management: (Unified) Software Management Layer; Platform Management Layer; (Unified Platform) Services Marketplace; Svc Broker
• Cross-cutting: Unified Cloud Security; Intelligent Resource Manager
• Network & storage management layer: TCP/IP network, storage, compute (x86_64 & GPU)
• Heterogeneous Computing Access Library / Accelerator Library Access
MIMOS Technology