SDS Podcast Episode 37 with Harpreet Singh · Experfy
Post on 03-Jun-2020
TRANSCRIPT
Kirill: This is episode number 37 with Founder and Co-CEO of
Experfy Harpreet Singh.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill
Eremenko, data science coach and lifestyle entrepreneur.
And each week we bring you inspiring people and ideas to
help you build your successful career in data science.
Thanks for being here today and now let’s make the complex
simple.
(background music plays)
Welcome to the SuperDataScience podcast. Super excited to
have you on board, and today we've got a very interesting
guest. Today we've got the Founder and Co-CEO of Experfy
Harpreet Singh. So what you need to know about Experfy is
this is a huge online marketplace for data science. So
basically, companies come along to Experfy to post their
problems, their challenges that they're facing that can be
solved, or that they think can be solved, with data science.
And then data scientists actually bid for those projects to
participate or to solve those projects. And at Experfy they
have a total of a staggering 30,000 data scientists. And so
how do they have so many data scientists? Well, because it
is a marketplace where anybody can come and apply to be
part of this marketplace. So basically, you could go to
Experfy, submit an application, and become a data scientist
that has the opportunity to bid for these projects, to
participate in these amazing projects that are changing the
world.
So in this podcast, you'll get to know more about Experfy
and how they operate, and also you'll get a good overview of
what other services they offer which are some interesting
ones, such as education, and Harpreet will actually make a
first time public announcement about a new project that
they launched. Plus, in this podcast, I could not resist the
temptation to use this opportunity to actually ask Harpreet
about all these applications of data science, machine
learning, analytics, deep learning, to real world projects. So
in this podcast, we're actually going to go over four real
world case studies of how data science has been applied to
different industries.
We'll talk about industries such as marketing in medicine,
predicting insurance fraud, prognostic analytics, and the
Internet of Things. So this is a podcast you definitely don't
want to miss. Buckle up for a fun ride. We're going to talk
about so many different applications of data science and
you're definitely going to have a lot of takeaways from today.
And without further ado, I bring to you my good friend,
Founder and Co-CEO of Experfy, Harpreet Singh.
(background music plays)
Hello everybody and welcome to the SuperDataScience
podcast. Today I've got a very special guest, a good friend of
mine, Harpreet Singh, calling in from Boston. How are you,
Harpreet, today?
Harpreet: I'm very well, Kirill. How are you doing?
Kirill: I'm doing great as well, especially having you on this show.
Harpreet is the Founder and Co-CEO of Experfy, a huge
online learning platform, and not just learning, it's a huge
data science platform launched through the Harvard
Innovation Lab. So this is going to be a very exciting
podcast, especially for those of you looking to break into the
space of data science or get some education or get some
experience in data science. Super excited about this.
Harpreet, how are you feeling about the podcast?
Harpreet: I'm very excited to be speaking with you.
Kirill: Awesome. Thank you so much. Alright, to get us started,
could you give us a bit of an overview of Experfy? What is
Experfy? What do you guys do?
Harpreet: Yeah, so Experfy is a platform where we have curated a very
large number of data scientists for on-demand consulting
and training. We have 30,000 data scientists, perhaps the
largest platform in the world, where companies can come to
us and seek experts for various use cases that they're
working on. Also companies can leverage the same
practitioners to upskill their own workers, their own
professionals within their firms. So there's a very interesting
dynamic going on, but if you look at the macro trend, there
is a growing scarcity of data science talent. And it's only
going to get worse, and companies are realising that and
they want to equip themselves with their own in-house staff
so that they don't have to rely on outside consultants. So
training is also a very important area for us, that we are
fulfilling a need in a very different way than the traditional
companies out there.
Kirill: Gotcha. That's very interesting. In all of that, I have so many
questions. Probably the first one is—30,000 data scientists.
I’m assuming they don’t all work in the same building. How
did you build up that capability? Where are these people
located? How are they connected and how did this all come
to be?
Harpreet: You know, marketplaces are extremely hard to start because
you have a chicken and egg problem. Unless you have the
demand, you don’t get the supply and unless you have the
supply, you don’t get the demand. So getting that started
was quite hard. We were lucky, however, that we started
three years ago. We were first to market. We got some very
good media coverage in the beginning with TechCrunch,
Forbes, Mashable, Wall Street Journal and the like. That
kind of propelled us into the limelight. And because we were
the only consulting platform, many data scientists decided
to join us. And once the projects started flowing in—you
know, marketplaces are like a machine, they kind of work
themselves—and we’ve been growing since. The supply is
growing very nicely, and the demand is also growing because
there is a real need out there.
Kirill: So, to understand it better, it’s basically a marketplace
where a company can come in and post their data science
problem and then data scientists come in and bid on who is
going to be solving it and then they build a relationship and
that’s how it goes from there. Is that about right?
Harpreet: Yes. However, there is a high-touch aspect to the service we
provide because unlike other disciplines or other
marketplaces, data science is quite complex as a field and
the problems can also be very complex. And every problem is
so unique because the data that a company possesses, the
format that data may be in, and other systems that that
data interacts with or comes out of are also quite unique.
So we provide an account management team that specializes
in data science in various verticals. So, if you are coming
from oil and gas or retail, we have an account manager for
you that understands that industry and then works with
you to articulate that use case and translate that into a
project description.
Once that project description has been articulated, then we
put it on the platform and we have an algorithm that looks
at who are the best matches for this project, and then those
people are invited to come in to provide a proposal. Even
though these are all bids, it’s never the cheapest or the most
cost-effective resource that wins. It’s always the person
that’s most qualified. So, you’ll see rates ranging from $100
all the way to $300-$400 on our platform.
Kirill: Per hour?
Harpreet: Yeah, per hour. U.S. Dollars, yes. But that’s still quite a
bargain because if you’re going to go to a Big Four
professional services firm, or if you go to a larger consulting
firm, I guess the cost is much greater there and could be
running to six or seven figures. Whereas on Experfy, a proof
of concept on average costs $10,000-$20,000.
Kirill: Yeah, I can totally agree with that. I attest to that, having
worked at a Big Four consulting firm. I worked at Deloitte
and the fees, of course, are much greater. On the other
hand, what Experfy charges, or the fees that are available on
Experfy, are very good both for clients and for data
scientists. So somebody working in that space of data
science, being an individual data scientist, having an
opportunity to make $100-$400 an hour, that’s a very, very
good price, especially for a freelance type of work when
you’re not really committed to any consulting firm or
company. With that in mind, can data scientists listening to
this podcast somehow get onto Experfy and become part of
this talent pool of 30,000 that you have currently?
Harpreet: Absolutely. We are always looking to expand our pool of
experts. It’s very simple: you go to experfy.com and you sign
up. There’s an application process you have to go through.
You fill out the application, we pull in your LinkedIn profile
as well so that you don’t have to do a lot of hard work, and
basically then we review the application and see if you are a
good fit for the platform.
Kirill: That’s very interesting. And what determines a good fit so
that people listening to this podcast can be prepared or
maybe start thinking in the right direction? What is deemed
a good fit? Maybe number of years of experience, or a
different variety of toolset? What are the things that you look
out for the most?
Harpreet: Data science is something that you can’t just learn part-
time. It requires years of education, you know, some
quantitative education, not necessarily data science
education. For example, you may be someone who studied
theoretical physics and that kind of person deals with a lot
of data and would make a terrific data scientist. So, we look
for relevant education and we also look for relevant
experience. You know, in the application it’s very good to
talk about the kind of use cases you may have worked on.
So, the tools are not as important as the actual ability to
work with large amounts of data or to think analytically.
Kirill: Okay, gotcha. And speaking of education, you guys have
your own educational platform and I’m proud to say that I
have a course published on Experfy, so that was a very
interesting start to our relationship and I’m very excited
about that. I can see people who are taking this course and
are excited to learn data science. So, with that, tell us a bit
more about your educational platform. How many courses
do you have? Who is it tailored towards and what are the
volumes of students coming through right now?
Harpreet: I want to preface that, that your course is a terrific one and
it’s really something that people are taking quite a bit and
we see a lot of enrolments and people are really benefitting
from that Tableau course on visualization.
Kirill: Thank you.
Harpreet: Maybe I can take a step back and tell you the genesis of this
platform and how it began. You know, we started as a
consulting marketplace, and we’ve been talking about that
briefly, but while we were providing this consulting, we
noticed that a lot of companies were coming to us and
posting projects related to training.
For example, University of California Davis came in and
posted a project that they wanted to launch a data science
program and they were looking for experts. This was two
years ago. And then many Fortune 500s were also struggling
to find subject matter experts. For example, someone came
to us and said, “I need someone who can teach supply chain
optimization” or “I need someone who can teach how to
analyse certain kinds of health care data.” Those kinds of
courses are not available anywhere, not even on the MOOCs.
The MOOCs are a great place to learn for the sake of
learning, to build that foundational knowledge. And they’re
providing a very important function because much of the
education is free and you can really learn the basics of
something.
But as you want to progress into something that is more
industry specific, something that requires understanding of
a domain and the use cases within that, then you really
have to learn from someone who is working in the trenches,
someone who is actually doing that every day. And the
reason for that is that these technologies are changing so
rapidly that an academic cannot help you in understanding
that kind of content.
So we find ourselves in a very good place because we have
access to the best thought leaders in the industry, they’re on
the platform consulting, and we are able to also look at
which use cases are hot, which use cases are actually being
requested in the consulting context. So, we can combine the
thought leadership of our experts and also the project-based
work we’re doing and say, “Okay, these are the projects.” For
example, in the context of media and advertising or retail,
there are use cases like recommender systems that every
retailer wants to have. So every retailer is trying to build a
recommender system that may look like Netflix’s
recommendations or what Amazon is doing.
We’ve executed dozens of such projects so when we think
about creating a course, we are seeing where the trends are
in the retail industry and we are building a retail track for
retail companies so we know which courses are important
even though the retail managers themselves may not know.
Or the Chief Learning Officer at a large retailer is a
generalist, so that Chief Learning Officer isn’t really aware
what kind of courses they should be offering to their
employees. They are thinking in a broad sense of, “I want to
facilitate digital transformation of my company so I should
look at data science, big data,” but they don’t really know
what to offer.
So we can then go into our library of projects we are
performing and make recommendations. And often we see
ourselves co-creating these courses with our industry
partners. That’s what makes us very unique. You know, we
are more focused on the B2B model than B2C, so we are
partnering with companies like Duracell and we’ve done
some text analytics training recently for the Federal Reserve
Bank of San Francisco. We’ve even had some of our experts
fly into India to present a training program for the executives
at Tata Teleservices, which is one of the largest telecom
companies in India.
So if you’re looking for training in emerging technologies,
like Internet of Things, certain types of industry analytics,
then we’re a much better venue than others that exist out
there because we have the courses.
Kirill: Gotcha. That’s interesting that you mentioned it because
that was my next question: how Experfy actually differs from
platforms out there like Udemy and Coursera and so on,
that offer either free or near to free training. That’s a great
answer. Like, those marketplaces have merits, they definitely
have advantages and they teach you the broad spectrum of
data science and the skills that you want to learn. But with
Experfy it sounds like you guys are doing something
completely different, where you’re going into what’s exactly
happening in the industry right now in these specific use
cases, and then from there you’re extracting the right
knowledge, you’re finding the right instructors to create that
content and offer it to your clients so that they can get
upskilled in a very laser specific way in what they need.
With that, you mentioned you mostly deal with B2B clients.
We have about 10% of our listeners who either own their
business or are entrepreneurs, and they should definitely
check out Experfy if they are looking to upskill themselves or
their team in data science. But for the majority of our
listeners, is there still an option for people to take these very
interesting courses if they are just a client, if they’re not a
business?
Harpreet: Yeah, absolutely. We are an online platform and all the
courses are available online. It’s as simple as finding the
course you like, or a learning path for that matter, and just
clicking on the “enrol” button to enrol in that course.
When we think about our go-to-market strategy as
entrepreneurs or as a business, we are selling primarily to
our business clients in a B2B fashion. But there is still a
very large population of students who are enrolling in the
courses who are just consumers.
We have, for example, the University of Alberta in Canada.
They’re having their students enrol in our data science
certification program, so we have a certification program
which is five courses and the first course on probability and
statistics using R is taught by a Harvard professor, Michael
Parzen, and Kaitlin Hagan. Kaitlin is at Harvard Medical
School and Michael Parzen is at Harvard University,
Harvard College. He’s been teaching this content for 30
years, so it’s fantastic for folks to learn from them.
And then there’s a course on data wrangling using R, and
that course is taught by Connie Brett. She was the founder
of Analytics Incubation Center at Cisco. And then there’s an
econometrics course taught by Alan Yang, who is a professor
at Columbia University. And then there are others from the
industry, from Target and other major corporations who are
teaching in that track.
So we are trying to develop these certification tracks or
learning tracks so that you can say, “Okay, I want to become
a fraud and risk analyst, a data scientist who specializes in
fraud and risk or a data scientist who specializes in retail
analytics,” and then we will provide a pathway to take five or
six courses, or perhaps even more, that leads you to that
qualification. So there’s a lot of interest in upskilling
employees among companies. So we are taking this very
specific approach of how do you get someone going from the
basics all the way to a practitioner in a specific use case.
Kirill: Gotcha. That’s very interesting. I just wanted to comment
that it’s very cool how a university outsources their main
function of teaching students. They outsource it to you guys.
Instead of teaching them at the University of Alberta, they
send them to you to upskill them on certain topics. I imagine
that’s just their way of recognizing that certain
skills are so cutting edge that they just can’t keep up with
the university curriculum.
And in terms of your comment on the certification tracks, I
think that’s just fantastic. That’s not something you see
often in many places. For instance, Coursera has
certification tracks, but they’re like just data science. They’re
very general certification tracks, not like a specific skillset for
data science in a certain industry, whether it’s fraud analytics,
or it could be predictive analytics, or certain retail or
industry sector. I think that’s very valuable. And do you
guys provide, upon completion of these certification tracks—
a question that a lot of MOOCs get—do you provide a
certificate of completion that people can show off or show to
their employers and so on?
Harpreet: Yes, absolutely. We do exactly what Coursera and others
may do. You’ll get a certificate of completion that’s generated
by our systems and you can attach it to your LinkedIn
profile, the same way you would attach other certificates.
And we haven’t announced this yet, this is the first time I’m
actually talking about this publicly, that we are launching
an assessment platform as well. This assessment platform
will focus on different types of skillset, so anyone who hasn’t
even taken a course on Experfy could go and take an
assessment and we will then validate this person has certain
skills.
Again, our target here is more of a B2B market where
companies, or the HR departments, are struggling to
understand whether someone is a qualified data scientist so
we are giving them a lot of tools to say, “Okay, you are hiring
someone who understands R and Python in a role where
they’re going to be doing insurance analytics, for example.
So how do you validate that this person knows R and Python
in the context of insurance analytics and also has some of
the other skills that you may desire, like understanding of
Hadoop and Spark and Scala?” So we are focused on
building these test banks that will be incredibly useful to not
only the industry, but also to individuals who can come on
to Experfy and then take these assessments.
Kirill: Fantastic. I just want to preface my answer with, everybody
listening to this, did you hear that? It’s the first time this
information is available publicly! I am so proud that it’s been
announced on this podcast. That’s the first time this has
ever happened, that this podcast is being used as a source
to get information out there into the world, so thank you for
that, Harpreet.
Yeah, assessment platform—I can totally see where you’re
coming from. It is such a needed thing. I get questions all
the time, like, “Hey, I have these skills. I’ve taken these
courses. I’ve done this type of work, but how do I prove to
employers that I have this knowledge, that I’m ready?” And
you get this from passionate people who want to make a
difference in the world, but their main barrier is the fact that
their skills, even though they’re very strong when you
actually speak to them and they know they’re very strong,
other people, employers can’t see that. And I think this
assessment platform—congratulations on that—I think
that’s one of the first, if not the first in the world. So I’m very
excited for you guys. I’ll definitely check it out when it’s
ready. It sounds like a very, very big and exciting thing.
Harpreet: Yeah, thank you.
Kirill: I have so many questions. I could keep going and talking on
about Experfy for much, much longer, just drilling into
what’s going on there and how you guys are doing things,
but I would like to actually also talk about something else,
Harpreet, about some of the very interesting case studies
that you are sharing, about the successes that Experfy is
having. For example, you’ve posted close to a dozen articles
on LinkedIn about different successes of Experfy. I’ve had a
look through them and found them very interesting and
fascinating, the way you apply data science to different
projects and different industries. Are you happy to talk us
through a few of those?
Harpreet: Absolutely. It would be my pleasure.
Kirill: Okay, awesome. How about we start with your most recent
one, the most recent one just published like a week ago, or
two weeks ago? Artificial intelligence for marketing mix
models in the pharmaceutical sector reducing cost and
boosting sales. I’m just going to read out a couple of figures
from here. The pharmaceutical industry is over $30 billion.
Over $30 billion is spent on pharmaceuticals annually. This
is from your article. Basically it’s all about the fact that this
is a huge global industry, and therefore it provides access to
lots of markets for pharmaceutical companies, but at the
same time it’s highly, highly competitive and you need to
have effective marketing there. Otherwise you’ll end up
spending so much money on marketing instead of the actual
product. And this isn’t a high margin product like with
online products. This is a physical product that is tangible,
that needs to be shipped, that needs to go places and that
people actually need. So you can’t afford to spend too much
on marketing. And therefore a lot of responsibility is on data
science to optimize that. What were the challenges,
opportunities, and what solutions did you guys come up
with at Experfy?
Harpreet: Yeah, this is a very interesting use case. As you mentioned,
$30 billion are spent on the marketing of these drugs alone.
There’s additional expense like R&D and others, but we’re
just talking once you’ve got a drug that’s been approved,
how do you get it out the door? So, you have to influence the
physicians, and you have to influence others out there to
prescribe your drug—you know, the patients want to see
them, you see these infomercials on television, so it’s tricky
business.
So the way we’ve thought about this problem is that it’s all
about having access to good data. You know, what we are
after is, what are these pharma companies spending? So,
once a drug is launched, a pharma company may spend over
a billion dollars to market that drug, so if they can be more
judicious in how they’re spending, and if they are able to
track the ROI, what is being effective and what is not, they
can save lots of money, hundreds of millions of dollars. So
it is possible today to track the sales of these
drugs on a zip code level. You know, there are these
providers who are capturing that data and then
extrapolating it to say, “Okay, this is how much this drug
sold in this week.” And then there are other ways.
You know, some drugs are renewed, so you’re looking at
renewals as well, and then you’re looking at fresh
prescriptions as well, and they’re all tracked as individual
line items for each zip code. So if one can isolate the
marketing for each of these regions and say, “Okay, I had a
conference in this region,” or “I actually ran television ads
and radio ads,” or even “I had Google ads or ads on
WebMD,” and all of that can be captured, one can then create
a marketing mix model against the sales. So you can have a
control group where in one adjacent zip code or a different
region altogether, you don’t do certain activities.
For example, in a zip code you may have a sales rep going to
a doctor and doing these lunch conferences where they’re
trying to educate the doctors by doing lunch and learn sort
of activities, and then in a different region altogether, you
don’t do those things, and then you try to compare what
exactly is the difference in terms of sales, in terms of
adoption.
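
Illustrative sketch (not from the episode): the control-group comparison described above can be expressed as a difference in average sales between regions that received an activity and comparable regions that did not. The numbers and the function name `estimated_lift` are hypothetical, and a plain difference in means assumes the two groups of zip codes are otherwise similar.

```python
from statistics import mean

# Hypothetical weekly prescription counts per zip code (synthetic numbers).
treated = [120, 135, 128, 142, 131]  # zip codes with lunch-and-learn sessions
control = [110, 118, 112, 121, 115]  # comparable zip codes without them


def estimated_lift(treated, control):
    """Return the absolute and relative difference in mean sales."""
    diff = mean(treated) - mean(control)
    return diff, diff / mean(control)


abs_lift, rel_lift = estimated_lift(treated, control)
print(f"absolute lift: {abs_lift:.1f}, relative lift: {rel_lift:.1%}")
```

In practice one would also want to check whether a difference of this size could arise by chance before attributing it to the activity.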
By creating these kinds of control groups and by looking at
the data of the sales and the spend, one can then begin to
model the spending. What we’ve done is we’ve been able to
create machine learning models where you can say, “I’m
going to spend this much money on radio, this much on
television, this much on Facebook ads,” and then predict how
much in sales that kind of a mix is going to generate. And
surprisingly, these models become more and more accurate
as you feed more data into them. So there’s a lot of benefit to
the pharma companies as a result.
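
Illustrative sketch (not from the episode): one very simple form of a marketing mix model is a linear regression of sales on spend per channel. The episode does not say which model Experfy used, so ordinary least squares is an assumption here, and the data, channel effects, and function names (`fit_mix_model`, `predict_sales`) are all hypothetical.

```python
import numpy as np

# Synthetic weekly spend per zip code, in thousands of dollars.
# Columns: radio, television, online ads.
rng = np.random.default_rng(0)
spend = rng.uniform(0, 100, size=(200, 3))

# Assumed "true" channel effects, used only to generate toy sales data.
true_effect = np.array([0.5, 1.2, 2.0])
baseline = 50.0
sales = baseline + spend @ true_effect + rng.normal(0, 5, size=200)


def fit_mix_model(spend, sales):
    """Ordinary least squares: sales ~ intercept + sum(coef_i * spend_i)."""
    X = np.column_stack([np.ones(len(spend)), spend])
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return coef  # [intercept, radio, television, online]


def predict_sales(coef, planned_spend):
    """Predict sales for a planned spend on each channel."""
    return coef[0] + np.asarray(planned_spend) @ coef[1:]


coef = fit_mix_model(spend, sales)
forecast = predict_sales(coef, [20, 40, 10])
print("estimated channel effects:", np.round(coef[1:], 2))
print("forecast sales for planned mix:", round(float(forecast), 1))
```

Real marketing mix models usually add effects this sketch ignores, such as diminishing returns and carry-over from earlier weeks.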
Kirill: Fantastic! That’s a very good description, and I like the term
“marketing mix model.” So, guys, it sounds like that term is
going to be picking up in the future, so that was a good
overview of that as well. Okay, thank you for that. And now
I’d like to move on to a case study that is very close to my
heart. It was so cool reading this. I actually shared it around
on LinkedIn last week and a lot of my students actually
responded the same way. It’s called “The Internet of Things
and Prognostic Analytics for Predictive Maintenance in
Control Systems.”
So what this talks about is that you have huge companies—
well, let’s start with the basics. We have sensors everywhere,
right? For instance, an iPhone, you might think it has four
or five, but it actually has close to 30 sensors. And that’s
like sensors about geolocation, about the gyroscope, it’s got
some sensors for audio coming in or light sensors, and so
on, so close to 30 sensors. And that’s just an iPhone.
Everything around us is slowly getting covered with sensors,
and when you connect sensors to other devices all around
the Internet, that becomes the Internet of Things, and by
2020 we’re predicted to have—and this is from another one
of your articles—we’re predicted to have about 50 billion
things connected to the Internet of Things. That’s more than
the number of people that we’re going to have on the planet
at the time.
So this specific case study which you wrote about talks
about using this inter-hyperconnectedness of things to run
prognostic analytics, and that specifically means
maintenance and improving efficiency of control systems in,
for instance, large power plants or airlines or large
machinery. And you quote some interesting numbers.
For instance, just a 1% increase in efficiency of control in
airlines, and therefore prognostic analytics, can lead to a
cost saving of $2 to $3 billion; in utilities, $4 to $5
billion; in oil and gas companies, $5 to $7 billion; $4 to $5
billion in health care, and $1 to $2 billion in the transport
sector. And I’m assuming this is, for instance, if you have an
airplane and you’re running all these analytics, you don’t
have to wait for something, even for your data to show that
there’s a problem. Running prognostic analytics, you can see
that this performance is dropping. It’s still above average, it’s
still good performance, but it’s dropping. You can see the
trend in which it’s going, and therefore you can predict
basically that something is going to happen and it’s going to
need maintenance, and you can account for that
maintenance early on. Can you walk us a bit more through
this case study, please?
Harpreet: Yeah, absolutely. As you mentioned, a lot of the heavy
industry machinery uses control systems. These control
systems generate tons and tons of data. This has been
happening for 10, 20, 30, 40 years. This is not something
recent. The control systems, by definition, they are storing
that data and that data then goes into some black hole and
it’s never used. So there is a huge opportunity here for heavy
manufacturers. For example, Siemens happens to be one of
the manufacturers of control systems. This is a very highly
fragmented market. Siemens probably has 10%-12% of the
market share, so there are many others like that.
So if somehow we can take the data from these control
systems, the data that’s being generated as the machine
works, if we can take that and build some streaming
pipelines into the Cloud, whether they go to AWS or
somewhere else, maybe even a private cloud if people are not
happy with a public cloud, then we can look at this data for
anomalies. We can start analysing this data for preventive
maintenance and for other things.
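
Illustrative sketch (not from the episode): one simple way to look at a stream of sensor readings for anomalies is a rolling z-score, which flags a reading that sits far from the mean of the most recent readings. The window size, threshold, data, and function name `detect_anomalies` are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=20, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        # Only score a reading once a full window of history is available.
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
        recent.append(value)


# Synthetic temperature stream: small oscillation around 70, one spike.
stream = [70 + 0.5 * ((i * 7) % 5 - 2) for i in range(60)]
stream[45] = 85.0

anomalies = list(detect_anomalies(stream))
print(anomalies)
```

Because the function is a generator that keeps only the last `window` readings, it can consume readings one at a time as they arrive rather than needing the full history.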
As you pointed out in these numbers, how much can you
save if you just improved efficiency by just 1%, right? I
mean, these numbers are staggering. And the way to think
about this is, if you are in a power plant and your machine
fails, someone from Siemens has to get on a plane from a
different city, bring that part to your plant, and replace that
part. So that is all cost, someone had to rush over there to
do this job.
But if we start doing prognostic analytics—and I want to
differentiate prognostic analytics from predictive analytics in
a sense that predictive analytics tells us that something is
going to fail, you know, that “I’m going to predict that this
part is going to fail some time in the near future,” whereas
prognostic analytics tells us that something is going to fail in
the next two weeks or in the next ten days. So there is
almost a time dimension to prognostic analytics that isn’t so
accentuated in predictive analytics.
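
Illustrative sketch (not from the episode): the time dimension can be made concrete by fitting a trend to a degrading health metric and extrapolating to the level at which the part is considered failed. A straight-line trend is an assumption, as are the data, the failure level, and the function name `days_until_threshold`.

```python
import numpy as np


def days_until_threshold(days, health, failure_level):
    """Fit a line to a declining health metric and extrapolate to failure_level.

    Returns the estimated number of days after the last observation, or None
    if the fitted trend is flat or improving.
    """
    slope, intercept = np.polyfit(days, health, deg=1)
    if slope >= 0:
        return None  # not degrading, so no failure date is estimated
    crossing_day = (failure_level - intercept) / slope
    return max(0.0, float(crossing_day - days[-1]))


# Synthetic example: efficiency starts at 95 and drops 0.5 points per day.
days = np.arange(10)
health = 95.0 - 0.5 * days

remaining = days_until_threshold(days, health, failure_level=85.0)
print(f"estimated days until maintenance is needed: {remaining:.1f}")
```

A purely predictive model would stop at "this part is likely to fail"; the extrapolation step is what turns that into "within roughly this many days".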
And how many times has it happened where we’re trying to
take a flight and something goes wrong with the aircraft and
then we’re sitting there until someone comes and changes
that part or fixes that issue? So, all of that, again, can be
avoided if we are making use of the data that the aircraft has
been collecting, but no one is actually making use of that
today.
So somehow, if we can start building these streaming
pipelines, and if we can start taking the data and start
building preventive maintenance use cases, it can be a huge
saving to everyone. Obviously, in the airline context,
airlines may pass that on to us as passengers and lower airfares.
So I think there is a value chain here that gets impacted as
we start to do more of this sort of analytics.
Kirill: Thank you for that. That’s a great overview. I was actually
after that definition or distinguishing terminology from you
about prognostic versus predictive, and that’s a very good
description, that prognostic actually has a time dimension to
it. Alright, that was awesome. I hope people are picking up
some value from these.
And we’re moving on to case study number three: using big
data to prevent health insurance fraud. Very interesting
space. And as we learned from one of our earlier podcasts, I
think it was podcast #5 with Dmitry Korneev, fraud is
actually a huge industry. You don’t hear about data science
and analytics in fraud that much, it’s not a huge focus, but
especially in the U.S., where the legal system is such that a
lot of companies are unfortunately in a lot of lawsuits with
other companies, the space of fraud analytics is huge,
specifically here—we’re talking about health care.
Some numbers that you’ve mentioned: the National
Health Care Anti-Fraud Association estimates that fraud
costs the country $68 billion annually. That’s 3% of
the whole health care spending, which is about $2.26
trillion. Some people will be interested to know I was
actually very surprised to know that the health care industry
is so large. $2.26 trillion! That’s 18% of the GDP of the
U.S.A. It’s a huge number. So, please, tell us a bit more
about fraud analytics in the health insurance space.
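As a quick check of the figures quoted above, the 3% share follows directly from the two dollar amounts. The variable names below are only for illustration.

```python
# Figures as quoted in the conversation.
fraud_cost = 68e9          # estimated annual fraud cost, in dollars
health_spending = 2.26e12  # total health care spending, in dollars

fraud_share = fraud_cost / health_spending
print(f"{fraud_share:.1%}")  # 3.0%
```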
Harpreet: Again, this is a very valuable use case, fraud analytics, when
it comes to health insurance fraud. The challenge that most
insurance companies are facing is that the laws of the U.S.
are such that if someone were to submit a medical billing
claim to a health insurer, they have no choice but to pay it
within a certain time duration. You know, it’s like two days
or three days, and if the claim is not paid, then the insurer
is liable and they can be fined.
For that reason, the claims are paid like clockwork. As they
come in, they’re paid. So one has to get to a point where you
can start predicting fraud in real time for this to be valuable.
So, you know, there are a number of ways in which this can
be done, the data that is being gathered. Unfortunately,
today the way a lot of these claims are paid is through
paperwork. It’s a paper intensive activity. So, the first
challenge is how do you—
Kirill: —convert that to digital.
Harpreet: Exactly, so the digitization. A lot of progress has been made
in recent years, and I’m sure we will eventually get there.
And then the second question becomes—once you’ve got
that, then how do you start modelling for fraud and what are
the characteristics of fraud that you’re looking at? And as
you start developing—here, one thing that we’ve learned
through our consulting practice is that the better training
data you have for a specific use case, the better algorithm
you are going to build.
So, because there is such a high volume of fraud, and
because this is such a big market, it is certainly possible to
create these training datasets that are very helpful. And then
you can do feature engineering and you can then start
looking at which features are the most useful. You know, the
features may differ if I’m trying to prevent fraud for dental
insurance versus health insurance. We’re currently working
on a very exciting project to detect fraud in the life insurance
sector, and that’s even more challenging.
But it’s certainly doable because you don’t have to predict
everything 100%. You can say that if I can predict with 70%
confidence that this is fraud, then at least someone can take
a look and say, “Let me take these additional three steps to
find out what happened, or request more information on this
particular claim.” That’s the opportunity here, that we don’t
have to build models that are 100% accurate. We can still
build models that are useful and then there is some human
intervention to get more information before a claim is paid
out.
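The idea of a useful-but-imperfect model with a human in the loop can be sketched as a simple triage step. This is an illustration under assumptions, not Experfy's method: the `Claim` class, the `fraud_score` field (standing in for the output of some fraud model), and the `triage` function are hypothetical, and the 0.70 cut-off simply mirrors the 70% figure mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    claim_id: str
    fraud_score: float  # hypothetical model output between 0 and 1

def triage(claims, review_threshold=0.70):
    """Split claims into those sent to a human reviewer and those paid as usual."""
    review, pay = [], []
    for claim in claims:
        if claim.fraud_score >= review_threshold:
            review.append(claim)  # someone requests more information first
        else:
            pay.append(claim)     # paid on the normal schedule
    return review, pay

claims = [Claim("A", 0.91), Claim("B", 0.12), Claim("C", 0.70)]
review, pay = triage(claims)
print([c.claim_id for c in review])  # ['A', 'C']
print([c.claim_id for c in pay])     # ['B']
```

Where the threshold sits is a business decision: a lower value catches more fraud but sends more legitimate claims to manual review, which matters when payment deadlines are short.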
Kirill: Okay. That is definitely going to be useful. Again, it’s such a
huge industry. It’s just mind-blowing that $68 billion—
whoever solves that problem, that’s a multibillion dollar
analytics company waiting to be created right there. So
thank you again for that overview. And I’m just looking at
the number of different case studies that you have so kindly
shared with everybody. To be honest, I’m getting torn apart.
We’ve done three, and we definitely have time for at least one
more. What I would like to suggest is, if you could, could you
choose the best one? What would you like to talk about?
What is, in your view, one of the most successful
breakthroughs that you guys have had at Experfy, and if you
can share that with us?
Harpreet: Yes, I mean, there are a lot of very exciting things we are
doing in the IoT space and that doesn’t get talked about
enough. We had a very interesting project that we embarked
on with Gulf Oil, which has their gas stations. This was Gulf
Oil out of Mexico, their franchise there, and they had a
wonderful idea of how do you differentiate yourself from
other similar businesses. One way is that, if you are a full-
service gas station, then you have to add more value. How
do you do that? We started with that question.
The way we work on it at Experfy is that generally, when
there’s a big question, we start with a road map of some
kind of a visioning exercise. So someone who’s done this sort
of thing before will sit down with the client and see what
does the road map look like, and what does the ROI look like
once we are done with that road map.
We thought it was a huge customer analytics opportunity
that if you could somehow, using IoT, identify who the
customer is as they drive into the gas station—and there are
a number of ways of doing that—you can use computer
vision or image analysis to look at the license plate of the
car. Or you can install beacons in these gas stations, and in
your mobile app, or the Gulf app, you would have the
identity of the person who’s just driven into the gas station.
And now you can say, “Oh, by the way, the gas price is $3 a
gallon, but because you’re such a loyal customer, because
you’ve been here twice already this week, we’re going to
lower the price for you to $2.75 a gallon.”
And then you could say, “By the way, this person also buys
coffee from the convenience store every time so they can be
given that while they’re in their car,” because you already
have the pattern of spending. Similarly, in economies like
Mexico where this experiment is going on, there is this need
for prepaid cards and things like—if you want to send a
package through courier, often the gas stations end up being
the location where the courier services are also installed. So
a lot of these value added services like prepaid cards and
other things can be added. You know, folks don’t have
printers in their homes, so you could even have a way to
print things and the gas station attendant on their app can
provide these value added services and bill the customer
seamlessly without accepting any cash and it all happens
electronically.
Those are the kinds of things that we’re doing on Experfy,
and they have the potential to really reimagine how work
gets done in these industries that are so boring and they
haven’t changed in a hundred years. And thanks to IoT and
analytics, we are going to start seeing a shift where new
models of doing business emerge. We are very excited to be
an enabler in this space.
Kirill: Wow! That’s fantastic! That’s such an interesting case study
of personalizing services through data science and not just
data science, but machine learning, deep learning, you
mentioned computer vision, image recognition, facial or
number plate recognition. That is the full suite of analytics
at play. So, thank you so much for that. These case studies
are so useful because they broaden people’s horizons on
what can be done with analytics, on how much power
analytics has, and data science has, and machine learning
has, and how it’s becoming more and more embedded into
all of these different industries.
Thank you so much for sharing that. I’ve got a couple of
questions leading towards the end of this podcast. First one
I’d like to ask you is what would you say is the secret sauce
for being a data scientist? I don’t usually ask this question,
but you have seen so many data scientists come in to
Experfy, so many people looking for data science skills, and
you’ve educated so many data scientists. You’ve influenced
so many data scientists. What would you say is the secret to
becoming successful in data science?
Harpreet: I guess the secret is to be someone who is able to ask a lot of
questions, form a lot of hypotheses, not start with one
particular solution or approach. The way I look at it, data
science is really about forming many hypotheses and then
validating or invalidating those hypotheses. And then you
come to some kernel of truth that can then be helpful in that
business. I guess the best data scientists that I know are the
ones that are not married to one approach, that are always
looking for answers to a broad range of questions that apply
to a particular problem.
And the second thing I would say is that domain expertise is
really important. If you’re a data scientist, it’s not a good
idea to be a jack of all trades. It’s much better to embrace
one industry and develop a fair amount of domain expertise
in that industry so that you can have a greater impact in
that industry. I think those are the two things that come to
mind.
Kirill: Fantastic. Thank you so much. That’s very good advice. So,
make sure you’re asking the right questions and you’re
open-minded to all of the things that are coming your way,
and pick an industry and start to specialize to build that
influence so people know you as the best data scientist in
that specific industry or space.
And the other interesting question I had as well, which I’d
really be curious to get your opinion on, is from where you
sit, from all the things that you see going on in the space of
data science, where do you think this field is going? What
should our listeners prepare for to be ready for the data
science of 2020? Or the data science of 2025? What would
you recommend for them?
Harpreet: This field is changing so rapidly that it would be a fool’s
errand to make many predictions. But one thing is for sure.
You know, there is a lot of automation going on, we have a
lot of tools that are being developed, and this is going to be a
very exciting space and it’s going to impact every industry.
And the industries that are going to see the most change are
the ones that have the best data or the richness of data, so
those we will see evolving much faster than the others.
And if you are in such an industry, then I think it’s a very
good idea to embrace analytics. Even if you’re not a data
scientist, even if you’re a manager, understanding how one
can become data driven and how processes can benefit from
different types of analyses is really important. You know,
making sure that the company has some kind of a data
strategy to capture the right data is another important
consideration because companies that are not going to do
that are frankly not going to be very competitive. They
probably won’t even exist in the next 5-10 years. It’s a bold
sort of assumption, but if we look at how many Fortune 500
companies exist from the last century, let’s say 1950s, I
would say at least 30 or 40 have disappeared. I think
companies that do take data science seriously are the ones
that are going to stick around.
Kirill: Yeah, I totally agree with you. That’s some very interesting
advice and overview of what to expect. And you’re totally
right, it’s evolving so quickly. It’s hard to make very
definitive predictions, but it’s very interesting, what you said
about automation and that managers should also look into
data science. And I totally agree with you that there are even
some predictions that out of the Fortune 500 companies,
over half of them will disappear in the next decade just
because of what’s happening in the space of data science, so
it’s a huge disruptor as well as an enabler for companies.
Thank you so much, Harpreet, for coming on the show and
sharing all your insights. How can our listeners follow you or
contact you or get more access to all of these—I don’t have a
better word for it—bombs of knowledge that you’re sharing?
You know, you just write an article and you open up a whole
new world of how data science is being applied. What’s the
best way for our listeners to follow you?
Harpreet: Well, there are over 200 projects that are listed on Experfy
and you can look at them in quite a bit of detail in terms of
the description of these projects. So you can go to
experfy.com and you can find me on Twitter @hsingh and we
can connect there as well.
Kirill: Okay, beautiful. Thank you so much. Guys, definitely check
those out, check out Experfy and connect with Harpreet on
Twitter. And one final question I have for you today: What is
your one favourite book that you can recommend for our
data scientists to become better at what they do?
Harpreet: This is a tough question. I’m a voracious reader and I read a
lot. One book does come to mind, thinking about the
audience. There is a book by Eric Siegel called “Predictive
Analytics: The Power to Predict Who Will Click, Buy, Lie, or
Die.” It’s a funny title, but what Siegel is doing, he is
bringing to life the power of predictive analytics in the
context of marketing. It’s a fascinating read, even if you’re
not a specialist.
Kirill: Okay, beautiful. Thank you. We’ve already had somebody
recommend that book on the podcast previously, so Eric
Siegel, “The Power to Predict Who Will Click, Buy, Lie, or
Die.” Once again, thank you so much, Harpreet. It has been
a pleasure having you on the show to learn all of this
amazing knowledge that you have to share. Thank you
again.
Harpreet: Thank you, Kirill, for having me. Take care.
Kirill: So there you have it. The amount of knowledge and practical
examples of data science application Harpreet shared with
us today is immense. I mean, in just that one hour that we
had today, we’ve covered so many different applications from
marketing and pharmaceuticals to insurance fraud to
Internet of Things to prognostic analytics, which I like so
much. I think it’s a huge space and there’s a lot of
disruption that can happen in prognostic analytics. Sensors
are really dominating the world, but not that many
companies are leveraging them to their full potential, so that
is always going to be a space where you can add value.
And my favourite part of the podcast is perhaps what
Harpreet mentioned about their upcoming assessment
platform linked to Experfy. It’s definitely something that is
needed in the space of data science and it’s very cool to see
that they are pioneering this feature, they’re pioneering this
new addition where you will be able to go to Experfy and just
tell them about your skills, submit your application, perhaps
pass some sort of assessment tests and get your skills
verified by Experfy so then you can take it to employers, you
can take it to different companies to show that you do have
these data science skills. Because a lot of the time we are
learning data science, we are educating ourselves, and that’s
what it’s all about. It’s not about that piece of paper that you
get at university. Sometimes you want to go to university
and get the knowledge and go through the experience. But
sometimes you just want to learn online. And having a way
to verify your knowledge is going to be very, very valuable
and I hope that more and more companies are going to start
doing that and following Experfy’s example.
So, there we go. That was Harpreet Singh from Experfy.
Definitely go check out Experfy, and if you have some free
time, you want to do some freelancing work, or you just
want to try yourself out in the marketplace of data science
and you think you have the skills and you have what it
takes, then submit an application to Experfy and become
one of their data scientists in their marketplace.
Also check out the courses on Experfy, some very valuable
courses. You can also find my Tableau course there and
maybe other ones as well. And also make sure to follow
Harpreet on Twitter so that you can get updates about his
articles as well as updates about what’s going on at Experfy.
And as usual, all of the links, resources and show notes are
available at www.superdatascience.com/37. And one more
thing for today. If you are enjoying these sessions, if you like
this podcast, then we would really appreciate it if you could log
onto iTunes and leave us a rating or review. That would
really help us propel the podcast forward and bring it to
more people. And on that note, thank you so much for being
here, for sharing this time, for taking an hour out of your
day to listen, to talk about data science with Harpreet. I
can’t wait to see you next time. Until then, happy analysing.