![Page 1: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/1.jpg)
Productive Data Tools for Quants
Wes McKinney (@wesmckinn)
Python in Finance 2013, 2013-04-05
![Page 2: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/2.jpg)
Me
• Started pandas project at AQR in 2008
• Other Python projects I’ve been involved with: statsmodels, vbench, gpustats
• http://blog.wesmckinney.com
• Currently: Founder of stealth SF data startup
![Page 3: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/3.jpg)
Book
• In print now!
• IPython
• NumPy
• pandas
• matplotlib
• Case studies
![Page 4: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/4.jpg)
Finance languages
![Page 5: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/5.jpg)
pandas
• Productivity-focused structured data manipulation tools for Python
• Fast, intuitive data structures
• Filling the gap between Python and more domain-specific languages like R
• Huge growth in 2011-2012, continuing in 2013
![Page 6: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/6.jpg)
Productivity, why do we care?
![Page 7: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/7.jpg)
People time = money
![Page 8: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/8.jpg)
Productive is not the same as high performance
![Page 9: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/9.jpg)
Tool bottlenecks impede innovation
![Page 10: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/10.jpg)
Aside: vbench for performance testing
![Page 11: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/11.jpg)
(Some) financial data challenges
• Metadata and data alignment
• “Missing” data
• Group operations
• Time series
![Page 12: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/12.jpg)
Data alignment
• Stock universes
• Timestamps
![Page 13: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/13.jpg)
Let’s talk about...
![Page 14: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/14.jpg)
Let’s talk about...
a - b
(a: Signal 1, b: Signal 2)
![Page 15: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/15.jpg)
Let’s talk about...
sum(a - b) / mean(c)
![Page 16: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/16.jpg)
Data alignment
a - b
• Same length?
• Same metadata?
• Same frequency?
Assumptions can be dangerous
![Page 17: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/17.jpg)
Data alignment
• pandas uses axis indexing to specify default join (“automatic data alignment”) behavior
Left operand: B 1, C 2, D 3, E 4
Right operand: A 0, B 1, C 2, D 3
Left + Right: A NA, B 2, C 4, D 6, E NA
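The alignment diagram above can be reproduced with two small Series. This is a minimal sketch, assuming a reasonably recent pandas; the variable names `left`, `right`, `total`, and `filled` are made up for illustration.

```python
import pandas as pd

# The two operands from the slide: labels overlap only on B, C, and D.
left = pd.Series([1, 2, 3, 4], index=["B", "C", "D", "E"])
right = pd.Series([0, 1, 2, 3], index=["A", "B", "C", "D"])

# Addition aligns on the union of the labels; labels present on only
# one side (A and E) come back as NaN.
total = left + right
print(total)

# If a missing label should count as zero instead, say so explicitly.
filled = left.add(right, fill_value=0)
print(filled)
```

Labels B, C, and D sum to 2, 4, and 6, while A and E are NA because each appears on only one side. The `fill_value` variant is one possible choice, not something the slide prescribes.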
![Page 18: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/18.jpg)
Hierarchical indexes
• Semantics: a tuple at each tick
• Enables easy group selection
• Terminology: “multiple levels”
• Natural part of GroupBy and reshape operations
Example index (outer level: inner level):
A: 1, 2, 3
B: 1, 2, 3, 4
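A hierarchical index with the same A/B shape as the slide might look like the sketch below. The level names `group` and `item` and the values 0 through 6 are assumptions added for illustration; they are not from the talk.

```python
import pandas as pd

# One tuple per tick: (outer label, inner label).
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3), ("B", 4)],
    names=["group", "item"],  # hypothetical level names
)
values = pd.Series(range(7), index=index)  # placeholder data 0..6

# Easy group selection: everything under outer label "A".
group_a = values.loc["A"]

# GroupBy can operate directly on an index level.
level_sums = values.groupby(level="group").sum()

# Reshape: move the inner level into columns (A has no item 4, so NaN).
wide = values.unstack("item")
print(group_a, level_sums, wide, sep="\n\n")
```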
![Page 19: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/19.jpg)
Missing data
• Interpolation (esp. time series)
• Dropping / filtering
• Replacing with value
• Excluding from statistical computations
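The four missing-data strategies listed above map onto pandas calls roughly as follows. This is a sketch on an invented price series, assuming missing values are represented as NaN.

```python
import numpy as np
import pandas as pd

# Invented daily series with two gaps.
dates = pd.date_range("2013-04-01", periods=5, freq="D")
prices = pd.Series([10.0, np.nan, 12.0, np.nan, 14.0], index=dates)

# Interpolation: linear by default, so the gaps become 11.0 and 13.0.
interpolated = prices.interpolate()

# Dropping / filtering: keep only the three observed values.
dropped = prices.dropna()

# Replacing with a value (0.0 is an arbitrary choice here).
replaced = prices.fillna(0.0)

# Excluding from statistics: reductions skip NaN by default.
mean_skipping_na = prices.mean()  # (10 + 12 + 14) / 3 = 12.0
print(interpolated, dropped, replaced, mean_skipping_na, sep="\n\n")
```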
![Page 20: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/20.jpg)
Time series
• Data alignment
• Frequency conversions
• Date arithmetic
• Resampling
• Time zones
• “As of” joins and lookups
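Several of the time series items above can be sketched in a few lines. The data below is invented, and the sketch assumes a current pandas release; in particular `pd.merge_asof` was added to pandas after this talk was given.

```python
import pandas as pd

# Invented daily series starting Monday 2013-04-01.
idx = pd.date_range("2013-04-01", periods=6, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Resampling / frequency conversion: average over two-day bins.
two_day = daily.resample("2D").mean()

# Date arithmetic: one business day after Friday 2013-04-05 is Monday.
next_business_day = pd.Timestamp("2013-04-05") + pd.offsets.BDay(1)

# Time zones: attach UTC, then convert to New York time.
eastern = daily.tz_localize("UTC").tz_convert("America/New_York")

# "As of" lookup: last observed value at or before a timestamp.
last_known = daily.asof(pd.Timestamp("2013-04-03 12:00"))

# "As of" join: match each trade to the most recent earlier quote.
trades = pd.DataFrame({
    "time": pd.to_datetime(["2013-04-01 09:30:02", "2013-04-01 09:30:05"]),
    "qty": [100, 200],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2013-04-01 09:30:00", "2013-04-01 09:30:04"]),
    "bid": [10.0, 10.1],
})
joined = pd.merge_asof(trades, quotes, on="time")
print(two_day, next_business_day, last_known, joined, sep="\n\n")
```

Both frames passed to `merge_asof` must already be sorted by the join key, which is the case here.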
![Page 21: Productive Data Tools for Quants](https://reader034.vdocuments.us/reader034/viewer/2022042723/587adcf11a28ab542b8b59a7/html5/thumbnails/21.jpg)
GroupBy
Split → Apply (sum) → Combine
Input (key, value): A 0, B 5, C 10, A 5, B 10, C 15, A 10, B 15, C 20
Split by key: A → 0, 5, 10; B → 5, 10, 15; C → 10, 15, 20
Apply sum to each group: A → 15, B → 30, C → 45
Combine: A 15, B 30, C 45
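The split-apply-combine figure uses nine rows keyed by A, B, and C. A minimal sketch of the same computation follows; the column names `key` and `value` are assumptions, and the manual loop is only there to make the three stages visible.

```python
import pandas as pd

# The nine (key, value) rows from the figure.
frame = pd.DataFrame({
    "key": list("ABCABCABC"),
    "value": [0, 5, 10, 5, 10, 15, 10, 15, 20],
})

# The usual one-liner: split by key, apply sum, combine.
totals = frame.groupby("key")["value"].sum()
print(totals)

# The same three stages spelled out by hand.
pieces = {key: group["value"] for key, group in frame.groupby("key")}  # split
applied = {key: piece.sum() for key, piece in pieces.items()}          # apply
combined = pd.Series(applied, name="value")                            # combine
print(combined)
```

Both versions give A 15, B 30, C 45, matching the figure.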