Julia language: inside the corporation
Julia |> inside the corporation
Andre Pemmelaar @QuantixResearch
About Me
Andre Pemmelaar
• 5 yrs Matsushita Financial System Solutions (Panasonic)
• 12 yrs buy-side finance
  • 7 yrs Japanese Gov’t Bond options market maker (HNL)
  • 5 yrs statistical arbitrage (global equities)
• Low-latency & quantitative algorithms
• Primarily use a mixture of basic statistics and machine learning
• R, Python, Java, F#, … and of course JULIA!
• Prefer a functional programming approach (F#, Scala, Haskell)
My road to using Julia
John Myles White, 3.20.2013 at 9:38 am:
“Hi Andre, In the abstract, I think Julia is the ideal language for doing both prototype modeling and transition to production. But Julia is still very immature as a language, so I would not recommend it being used in production for another year or so. In addition, if you’re looking for an existing toolbox of models, R is the way to go. Even Python has still not caught up with R in this regard.”
• Started reading about it in late 2012 ~ early 2013
• Wrote to John Myles White in spring 2013
• Decided it was too early: kept following it, but didn’t use it
My road to using Julia
• Revisited it around early 2014
• Began trying some simple projects
  • Reinforcement learning using tictactoe.jl: found the code very easy to follow
  • Started using DataFrames.jl: found it to be very stable and close enough to pandas (Python)
• Started writing my first serious attempt at something important in May 2014: an order book simulation framework
• Joined a new company 3 months ago and have used Julia almost exclusively since then on real-world problems in finance
Realized I could…
• Remain mostly functional in my approach to programming (but not 100%)
• Use fast for loops wherever appropriate (used in a lot of time series simulations)
• Easily code linear algebra, matrix calculations for machine learning, etc. (native in Julia)
• Do it all in parallel (note: Julia’s parallel support is not 100% there yet)
• All of the above can be done in Python (scikit-learn, NumPy, etc.), but it is often faster and takes slightly less code in Julia
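As a rough sketch of the loop-heavy time series simulations mentioned above, the snippet below simulates a hypothetical AR(1) process with an explicit loop (each step depends on the previous one, so it does not vectorize naturally) and then summarizes the whole path. It is written in Python only for illustration; the model, its parameters, and the function names are assumptions, not code from the talk. In Python such a loop gets slow for long series, whereas Julia compiles the same kind of loop to native code, which is the point of the bullet.

```python
import random

def simulate_ar1(phi: float, sigma: float, n: int, seed: int = 0) -> list[float]:
    """Simulate x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2) (hypothetical example model)."""
    rng = random.Random(seed)
    x = [0.0] * n
    # Each value depends on the previous one, so a plain for loop is the natural formulation
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.gauss(0.0, sigma)
    return x

def lagged_covariance(x: list[float], lag: int) -> float:
    """Sample autocovariance at the given lag (divides by n, the usual biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n

path = simulate_ar1(phi=0.9, sigma=1.0, n=10_000)
rho1 = lagged_covariance(path, 1) / lagged_covariance(path, 0)
print(f"lag-1 autocorrelation approx {rho1:.2f}")  # should be close to phi = 0.9
```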
My Moment
carefully insert here
Some background on my company
• One of Japan’s largest financial front-office system solution providers
• Started off in derivative valuation and derivative OMS systems
• Now offers an entire suite of products aimed at Japanese mega banks and 2nd-tier financial organizations
• About 600 employees (about 60%~70% are technical)
• The company’s primary production language is Java, with some work done in C++ or C
• Quantitative analysis is done in Java (heavy-duty, large-dataset analysis) or R (smaller datasets), with a few Python users
• Most quants are focused on risk or valuation, but a smaller team (mine) makes use of predictive analytics, statistics, and ML to enhance various algorithms
Nothing sells like success
• It helps to have a successful example to sell Julia internally
• In my case, during my first week I found some R code that was run every night (it had lots of loops, so it was ripe for porting to Julia)
• Rewrote it in Julia:
  • R: about 15:46 (min:sec)
  • Java: roughly 20 s
  • Julia: about 4.3 s
  • Note: a better Java programmer recently beat the Julia version (3.9 s)
Onboarding new users
Making the first experience easier
• Set expectations correctly
  • Documentation is sparse
  • What is out there may not be current
  • Julia is fast, but can lose a lot of speed if coded improperly
Poor Performance
Better Performance
Roadblocks to initial adoption
I asked my Julia colleagues, “What are/were the 3 biggest hurdles?”
#3 Package breaking/incompatibility on update
#2 Lack of current documentation
#1 Lack of documentation
No one said bugs in the base code or the lack of some critical feature. Everyone wants correct examples of “here’s how you do this.”
Roadblocks to initial adoption
Really just two problems
1. Documentation
2. Update chaos
DIY Documentation
• Julia base documentation is good
• The packages’ docs vary greatly
  • The one great example is Gadfly: code, output, & explanation
  • Not-so-great doc example: DataFrames
    • No longer current
    • Many common tasks missing
• Create your own documentation
  • The single most difficult part of learning Julia is the lack of current, correct examples
  • IJulia is fantastic for creating these!
• My advice
  • Initially target early users’ use cases
  • DIY-document anything people are struggling with
Decide on the environment/tools
IJulia
Light Table + Jewel
Decide on the environment & tools
• Julia is still new enough that small upgrades can break critical packages
• As the initial “Julia person” in your organization, you will often be called on to solve various problems
• Solving new users’ problems is much easier if they are using the same tools and packages. Don’t underestimate this!
• At the beginning, sharing exactly the same environment will make things smoother
• Recommend that one person download the installers
• Create a thorough install README file
Our stack:
• Julia 0.3.1
• IJulia
• Light Table
How did we do?
• 6 people set out to learn Julia
  • 4 of them now use it every day
  • 1 uses it occasionally, along with Perl
  • 1 gave up
• Why did that one give up?
  • He has serious Java skills and good R skills
  • Started with Julia Studio (a bad first experience)
  • Didn’t know about Light Table
  • Is physically separated from the rest of us, so he didn’t get the support needed to get through the initial low-productivity period
Julia: Real example: Rejection Order Algorithm
• The model:
  • Determine whether an order to lift a quote (execute against someone else’s quote) in an OTC market will be rejected
  • Background: OTC markets are “over the counter,” and, depending on the rules, the quoter can reject your order if it suits them
• Julia tools used:
  • DataFrames.jl, StatsBase.jl, DecisionTree.jl, SVM.jl
• Classification problem: 0 = not rejected, 1 = rejected
• Still an ongoing project: current best is a Kappa of about 0.54
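The slide reports results as a Kappa. Assuming this is Cohen’s kappa (the usual agreement statistic for predicted vs. actual class labels; the talk does not say which variant), it is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the label frequencies. A minimal sketch, with a hypothetical function name:

```python
import math
from collections import Counter

def cohens_kappa(actual: list[int], predicted: list[int]) -> float:
    """Cohen's kappa between two equal-length label sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    # Observed agreement: fraction of positions where the labels match
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: sum over labels of P(actual = c) * P(predicted = c)
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    labels = set(actual_counts) | set(predicted_counts)
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:
        # Both sequences use one identical label everywhere: kappa is undefined
        return math.nan
    return (p_o - p_e) / (1.0 - p_e)

actual = [0] * 999 + [1]    # 0.1% positives, matching the rejection rate on the next slide
always_zero = [0] * 1000    # trivial model that never predicts a rejection
print(cohens_kappa(actual, always_zero))  # 0.0, despite 99.9% accuracy
```

This is why kappa is a more honest score than accuracy on such unbalanced data: the trivial model is 99.9% accurate but shows no agreement beyond chance.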
Julia: Real example: Rejection order algorithm (cont.)
• Very unbalanced classes (0.1% are rejected)
• Regime shifts mean the model needs to be somewhat adaptive
• Required us to change some of the libraries
  • One of Julia’s great strengths is that you can easily change the libraries to suit your needs
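The slides don’t say how the libraries were changed. Purely as an assumed illustration, one standard remedy for a 0.1% minority class is inverse-frequency class weighting, w_c = n / (k · n_c) for k classes, the same formula scikit-learn uses for class_weight="balanced". With these weights each class carries equal total weight in a loss or split criterion. The function names below are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Inverse-frequency class weights, w_c = n / (k * n_c), for the k observed classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def weighted_error_rate(actual: list[int], predicted: list[int], weights: dict[int, float]) -> float:
    """Misclassification rate in which each sample counts with its class weight."""
    total = sum(weights[a] for a in actual)
    wrong = sum(weights[a] for a, p in zip(actual, predicted) if a != p)
    return wrong / total

labels = [0] * 999 + [1]   # 0.1% rejections, as on the slide
weights = balanced_class_weights(labels)
print(weights)             # the single rejected order weighs 999 times as much as each accepted one
print(weighted_error_rate(labels, [0] * 1000, weights))  # about 0.5: "always 0" now scores like a coin flip
```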
What makes Julia great?
• Speed? Julia is quite good, but Java can be as fast or faster. C++ and C are faster
• Time to get a model out? Largely dependent on your knowledge of the tools you are using
• Parallelization? Not really. Still kinda raw, and memory usage can be a bit of an issue
• Safer code via a functional approach? No. You can code functionally, but Julia doesn’t enforce it
• Easy to write code and to access/read/understand others’ code? Yes
What makes Julia great?
• Clear, concise code that can easily be changed
  ∆ Java (not concise) √ Python ∆ R (only R code, not C or C++)
• When coded well, it is very fast
  √ Java ∆ Python (Cython, etc.) ∆ R (vectorized)
• Great ability to mix loop-based & matrix/vector operations
  ∆ Java (not really) √ Python ∆ R (only vectorized)
Thank You!