Better Living Through Data Science
Scott Nicholson
@scootrous
snicholson@accretivehealth.com
lnkd.in/scott
Helping people and businesses
make better decisions
Does big data help people make better decisions? No, insights do. Big data is a realization that we can do more with data than we previously thought, just as much as it is about more data being available. Companies in 2000 that didn’t know what to do with their “small” data won’t be any better off with big/huge/fat data today. It’s about insights, and data scientists are well-suited to create them. I’d prefer a brilliant Excel/SQL guru who asks the right questions over a deeply technical ‘big data’ engineer who focuses on elegance and algorithms.
Today
What is data
science?
Project phases
Where do you find
people who can do
it?
/Hila
“Data Scientist” means different
things to different people
Credit: Drew Conway
Credit: Hilary Mason
My definition of a data scientist:
Someone who uses data to solve
problems end-to-end, from asking
the right questions to making
insights actionable.
End-to-end data science: five stages
Ask the right
questions
Choose your
approach
Extract & clean
your data
Build a model
Deploy, learn, iterate
One of the hardest
things to find in a
data scientist
Phase 1
Ask the Right
Questions
Do we always
need to build a
model?
Phase 2
Choose an
Approach
Leverage other
disciplines and
intuition
Is building a
model the first
thing you should
do?
Credit: Sam Shah
The g(l)ory of data
science: most of
the work is here
Phase 3
Extract and
Clean Data
In the trenches, dirty jobs, porta-potty Vs Luxury, rocket science, fast cars
Health Care
EHR is not
designed for
data extraction
On the frontier,
but still difficult
to do agile data
For most
problems, a wheel
has already been
invented…
…just recognize
the wheel!
Example: missing
charges on bill
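A minimal sketch of what "recognizing the wheel" might look like for the missing-charges example. The slide does not say which existing technique is meant, so this assumes a market-basket style co-occurrence approach: learn which charges usually appear together on past bills, then flag a bill that has one without the other. The charge names, thresholds, and function names are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rules(bills, min_confidence=0.8, min_support=2):
    """Return {(a, b): confidence} meaning 'bills with a usually also have b'."""
    item_counts = Counter()
    pair_counts = Counter()
    for bill in bills:
        items = set(bill)
        item_counts.update(items)
        # Count each unordered pair in both directions so rules are directional.
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    rules = {}
    for (a, b), n_ab in pair_counts.items():
        confidence = n_ab / item_counts[a]
        if n_ab >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def suggest_missing(bill, rules):
    """List charges that the rules expect on this bill but that are absent."""
    items = set(bill)
    return sorted({b for (a, b) in rules if a in items and b not in items})


# Hypothetical billing history (illustrative only).
history = [
    ["surgery", "anesthesia", "recovery_room"],
    ["surgery", "anesthesia", "recovery_room"],
    ["surgery", "anesthesia"],
    ["xray", "radiologist_read"],
    ["xray", "radiologist_read"],
]
rules = cooccurrence_rules(history)
suggestions = suggest_missing(["surgery", "recovery_room"], rules)
print(suggestions)
```

A flagged charge is only a candidate for human review, not proof that something was left off the bill.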
Phase 4
Model
Building
Always use
workhorse
models first
Online advertising: logistic regression in production at Yahoo for a long time
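To make "workhorse models first" concrete, here is a small sketch of a logistic regression baseline fit by batch gradient descent in plain Python. It is not the Yahoo production system or any particular library's implementation; the data is synthetic, and the function names, learning rate, and epoch count are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (bias first) by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic toy data (made up): one feature, and the positive class
# becomes more likely as the feature grows.
random.seed(0)
X = [[random.uniform(-3, 3)] for _ in range(200)]
y = [1 if random.random() < sigmoid(2.0 * x[0]) else 0 for x in X]

w = fit_logistic(X, y)
print("weights:", w)
print("P(y=1 | x=2.0):", predict_proba(w, [2.0]))
```

A baseline like this is quick to build and easy to explain, which gives a reference point before reaching for anything more complex.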
Skills universe
Health Care
Networked data
also common
Agile Data
Focus on quick solutions to identify bogeys and get feedback. Think like Eric Ries. Photo of sand trap?
Deployment and
execution of
predictive models
is crucial
Iteration is key,
especially in an
agile analytics
framework
Phase 5
Deploy,
Learn, Iterate
Subscriber churn
prevention emails
Health Care
Population health
management &
quality of care
Build a viewer
app
Picture of viewmaster
Who is good at this stuff?
Well, that’s great, but who is going to do all of that work?
Just as physicists moved to
Wall Street to be quants and
then on to online advertising
and consumer web, there will
be a significant talent
migration into health care in
the next few years.
But huge opportunities
One of the fundamental
problems of our time
18% of GDP! 0.01% is giant
revenue potential
Data availability and
richness only increasing
The right people are
realizing data and data
science are core to the
solution.
Take-aways
Data science is industry-agnostic
There are many challenges, but this is just the
beginning.
EHR data extraction and
updates difficult
Implementation barriers
Nothing scales
Privacy issues
Data aggregation difficult
Not all hospitals are
Stanford, Vanderbilt, etc.
What can we do
about these
challenges?
Daily/hourly decision
support?
Communicate value
of data mining to
patients
SMART, roll-your-own
EHRs
Thank you! (we’re hiring)
Scott
Nicholson
bit.ly/accretive-data-science-job
@scootrous
snicholson@accretivehealth.com
lnkd.in/scott