quant skillz beyond wall st: deriving value from large, non-financial datasets

Post on 11-Aug-2014

Category:

Data & Analytics

DESCRIPTION

This presentation was prepared for a talk on 2014.08.06 at the NYC Algorithmic Trading meetup (http://www.meetup.com/NYC-Algorithmic-Trading/events/197749772/). Regardless of whether you call it "data science", "business intelligence", "analytics", "statistics" or just plain old "math", we have many tried and true techniques for dealing with uncertainty (particularly in quantitative finance). But ambiguity (what problem do we need to solve in the first place?) is a separate matter and, at least in my experience, is the hardest part of creating value from data. During this talk, I'll discuss how we address ambiguity by giving a guided tour of some of our client projects, such as how to reduce legal e-discovery costs by 99% (hint: supervised binary classification of text documents) or how to assemble project teams on emerging R&D opportunities in a multinational organization (hint: unsupervised classification of employee expertise).
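To make the e-discovery hint concrete, here is a minimal sketch of supervised binary classification of text documents. It is not the method used in the client project: the naive Bayes model, the tiny training set, the `low`/`high` thresholds, and every function name are assumptions chosen for illustration. Documents the model is confident about are sorted automatically, and only the uncertain ones are sent to a human reviewer.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more care)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing for True/False labels."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def prob_relevant(self, doc):
        """Posterior probability that a document belongs to the True class."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (counts[word] + 1) / (total_words + len(self.vocab))
                    )
            log_scores[label] = score
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights[True] / sum(weights.values())


def triage(model, docs, low=0.2, high=0.8):
    """Auto-sort confident documents; send the uncertain middle to human review."""
    buckets = {"relevant": [], "not_relevant": [], "human_review": []}
    for doc in docs:
        p = model.prob_relevant(doc)
        if p >= high:
            buckets["relevant"].append(doc)
        elif p <= low:
            buckets["not_relevant"].append(doc)
        else:
            buckets["human_review"].append(doc)
    return buckets


# Hypothetical labeled examples: True means "about the patent".
train_docs = [
    "patent claims for the algorithm design",
    "algorithm design review for patent filing",
    "patent attorney notes on design",
    "fantasy football draft at lunch",
    "coffee and lunch order",
    "fantasy football league coffee",
]
train_labels = [True, True, True, False, False, False]

model = NaiveBayes().fit(train_docs, train_labels)
buckets = triage(
    model,
    [
        "patent algorithm design",
        "fantasy football lunch",
        "marketing finances",
        "lunch to discuss patent",
    ],
)
```

The cost saving in a setup like this comes from the size of the `human_review` bucket relative to the whole collection; how wide to make the band between `low` and `high` is a judgment about the cost of each kind of mistake, not something the code can decide.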

TRANSCRIPT

@deanmalmgren @DsAtweet
2014 August, NYC Algorithmic Trading
quant skillz beyond wall st: deriving value from large, non-financial datasets

@deanmalmgren | bit.ly/design-data

data scientists thrive with ambiguity (project evolution)
    solve for x
    x = 5 + 2
    A x = b
    optimize f(x)
    optimize A x = b subject to f(x) > 0
    optimize our profitability

origins of ambiguity: many feasible approaches

origins of ambiguity: unclear problems
    identify the best locations to plant new trees
    how many? what kinds of trees? move old trees? replace old trees?
    aesthetically pleasing? maximize growth? increase foliage? offset CO2 emissions?

design process is used everywhere
    generate hypotheses, build prototype, evaluate feedback
    1-4 week iterations
    anticipate failure

    generate hypotheses: personas, scenarios, use cases (human-centered design); business/product requirements (lean startup); story/user cards (agile programming)
    build prototype: build device prototypes (human-centered design); minimum viable product (lean startup); write code (agile programming)
    evaluate feedback: surveys, interviews, focus groups (human-centered design); split testing, A/B testing (lean startup); QA; requirements churn (agile programming)

design and data science challenges in practice
    generate hypotheses, build prototype, evaluate feedback
    1-4 week iterations
    problem lost in translation
    takes a long time to collect data, analyze, and build visualization
    proof is in the pudding

how do projects start?

informal conversation to stated goals
    mostly bad ideas, but a few good ones

    Lorem Ipsum: a narrative about blankets.
    Author: Charlie Brown
    Date: 31 Jan 2012

    Lorem Ipsum is a dummy text used when typesetting or marking up documents. It has a long history starting from the 1500s and is still used in digital millennium for typesetting electronic documents, page designs, etc.

    In itself, the original text of Lorem Ipsum might have been taken from an ancient Latin book that was written about 50 BC. Nevertheless, Lorem Ipsum's words have been changed so they don't read as a proper text.

    Naturally, page designs that are made for text documents must contain some text rather than placeholder dots or something else. However, should they contain proper English words and sentences almost every reader will deliberately try to interpret it eventually, missing the design itself.

    However, a placeholder text must have a natural distribution of letters and punctuation or otherwise the markup will look strange and unnatural. That's what Lorem Ipsum helps to achieve.

    I would like to thank Peppermint Patty for her support on studying Lorem Ipsum as well as the infinite wisdom of Linus van Pelt and his willingness to use his blanket in my experiments.

concept sketch comparisons: qualitative a/b testing
    search engine with relevance metrics
    demographics
    human readable expertise summary

from sketch to blueprint to prototype
    add detail to get feedback (while building)

motorola: data-driven consumer feedback
    new product announcement
    first versions from manufacturer
    available in stores
    next generation to manufacturer
    product defects from consumers
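
One of the concept sketches listed for the expertise project, "search engine with relevance metrics", can be prototyped in a few lines. The sketch below ranks people by TF-IDF cosine similarity between a query and a free-text description of their expertise. It is a plain retrieval technique standing in for whatever the project actually used; the profiles, names, and functions are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def build_index(profiles):
    """Turn {name: expertise text} into smoothed IDF weights and TF-IDF vectors."""
    term_counts = {name: Counter(tokenize(text)) for name, text in profiles.items()}
    n_docs = len(profiles)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}
    vectors = {
        name: {w: tf * idf[w] for w, tf in counts.items()}
        for name, counts in term_counts.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors):
    """Return (name, relevance) pairs with nonzero relevance, best match first."""
    counts = Counter(tokenize(query))
    q_vec = {w: tf * idf[w] for w, tf in counts.items() if w in idf}
    scored = ((cosine(q_vec, vec), name) for name, vec in vectors.items())
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [(name, round(score, 3)) for score, name in ranked]


# Hypothetical expertise descriptions.
profiles = {
    "ana": "battery chemistry lithium cells thermal testing",
    "ben": "antenna design radio frequency testing",
    "chloe": "battery thermal management firmware",
}
idf, vectors = build_index(profiles)
results = search("battery thermal", idf, vectors)
```

Showing the relevance score next to each name is what makes this a useful sketch to compare against the other concepts: people reacting to the prototype can say whether the ranking matches their intuition before any detail is added.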

daegis: data-driven e-discovery
    about patent
    not about patent
    turn over to plaintiff
    don't turn over to plaintiff
    adverse inference
    give away trade secrets

    algorithm design patents
    marketing finances
    fantasy football lunch coffee

    create a document map
    review away shades of grey
    reduce reviews by 90-99%

    awesome!
    who cares?

quant skillz to data science? bit.ly/metis-ds
    generate hypotheses, build prototype, evaluate feedback
    1-4 week iterations

http://bit.ly/design-data
http://bit.ly/metis-ds

@deanmalmgren
dean.malmgren@datascopeanalytics.com
solve ambiguous problems with quantitative, iterative approach