SDS PODCAST EPISODE 37 WITH HARPREET SINGH

Post on 03-Jun-2020

Kirill: This is episode number 37 with Founder and Co-CEO of

Experfy, Harpreet Singh.

(background music plays)

Welcome to the SuperDataScience podcast. My name is Kirill

Eremenko, data science coach and lifestyle entrepreneur.

And each week we bring you inspiring people and ideas to

help you build your successful career in data science.

Thanks for being here today and now let’s make the complex

simple.

(background music plays)

Welcome to the SuperDataScience podcast. Super excited to

have you on board, and today we've got a very interesting

guest. Today we've got the Founder and Co-CEO of Experfy

Harpreet Singh. So what you need to know about Experfy is

this is a huge online marketplace for data science. So

basically, companies come along to Experfy to post their

problems, their challenges that they're facing that can be

solved, or that they think can be solved, with data science.

And then data scientists actually bid for those projects to

participate or to solve those projects. And at Experfy they

have a total of a staggering 30,000 data scientists. And so

how do they have so many data scientists? Well, because it

is a marketplace where anybody can come and apply to be

part of this marketplace. So basically, you could go to

Experfy, submit an application, and become a data scientist

that has the opportunity to bid for these projects, to

participate in these amazing projects that are changing the

world.

So in this podcast, you'll get to know more about Experfy

and how they operate, and also you'll get a good overview of

what other services they offer which are some interesting

ones, such as education, and Harpreet will actually make a

first time public announcement about a new project that

they launched. Plus, in this podcast, I could not resist the

temptation to use this opportunity to actually ask Harpreet

about all these applications of data science, machine

learning, analytics, deep learning, to real world projects. So

in this podcast, we're actually going to go over four real

world case studies of how data science has been applied to

different industries.

We'll talk about industries such as marketing in medicine,

predicting insurance fraud, prognostic analytics, and the

Internet of Things. So this is a podcast you definitely don't

want to miss. Buckle up for a fun ride. We're going to talk

about so many different applications of data science and

you're definitely going to have a lot of takeaways from today.

And without further ado, I bring to you my good friend,

Founder and Co-CEO of Experfy, Harpreet Singh.

(background music plays)

Hello everybody and welcome to the SuperDataScience

podcast. Today I've got a very special guest, a good friend of

mine, Harpreet Singh, calling in from Boston. How are you,

Harpreet, today?

Harpreet: I'm very well, Kirill. How are you doing?

Kirill: I'm doing great as well, especially having you on this show.

Harpreet is the Founder and Co-CEO of Experfy, a huge

online learning platform, and not just learning, it's a huge

data science platform launched through the Harvard

Innovation Lab. So this is going to be a very exciting

podcast, especially for those of you looking to break into the

space of data science or get some education or get some

experience in data science. Super excited about this.

Harpreet, how are you feeling about the podcast?

Harpreet: I'm very excited to be speaking with you.

Kirill: Awesome. Thank you so much. Alright, to get us started,

could you give us a bit of an overview of Experfy? What is

Experfy? What do you guys do?

Harpreet: Yeah, so Experfy is a platform where we have curated a very

large number of data scientists for on-demand consulting

and training. We have 30,000 data scientists, perhaps the

largest platform in the world, where companies can come to

us and seek experts for various use cases that they're

working on. Also companies can leverage the same

practitioners to upskill their own workers, their own

professionals within their firms. So there's a very interesting

dynamic going on, but if you look at the macro trend, there

is a growing scarcity of data science talent. And it's only

going to get worse, and companies are realising that and

they want to equip themselves with their own in-house staff

so that they don't have to rely on outside consultants. So

training is also a very important area for us, that we are

fulfilling a need in a very different way than the traditional

companies out there.

Kirill: Gotcha. That's very interesting. In all of that, I have so many

questions. Probably the first one is—30,000 data scientists.

I’m assuming they don’t all work in the same building. How

did you build up that capability? Where are these people

located? How are they connected and how did this all come

to be?

Harpreet: You know, marketplaces are extremely hard to start because

you have a chicken and egg problem. Unless you have the

demand, you don’t get the supply and unless you have the

supply, you don’t get the demand. So getting that started

was quite hard. We were lucky, however, that we started

three years ago. We were first to market. We got some very

good media coverage in the beginning with TechCrunch,

Forbes, Mashable, Wall Street Journal and the like. That

kind of propelled us in the limelight. And because we were

the only consulting platform, many data scientists decided

to join us. And once the projects started flowing in—you

know, marketplaces are like a machine, they kind of work

themselves—and we’ve been growing since. The supply is

growing very nicely, and the demand is also growing because

there is a real need out there.

Kirill: So, to understand it better, it’s basically a marketplace

where a company can come in and post their data science

problem and then data scientists come in and bid on who is

going to be solving it and then they build a relationship and

that’s how it goes from there. Is that about right?

Harpreet: Yes. However, there is a high-touch aspect to the service we

provide because unlike other disciplines or other

marketplaces, data science is quite complex as a field and

the problems can also be very complex. And every problem is

so unique because the data that a company possesses, the

format that data may be in, and other systems that that

data interacts with or comes out of is also quite unique.

So we provide an account management team that specializes

in data science in various verticals. So, if you are coming

from oil and gas or retail, we have an account manager for

you that understands that industry and then works with

you to articulate that use case and translate that into a

project description.

Once that project description has been articulated, then we

put it on the platform and we have an algorithm that looks

at who are the best matches for this project, and then those

people are invited to come in to provide a proposal. Even

though these are all bids, it’s never the cheapest or the most

cost-effective resource that wins. It’s always the person

that’s most qualified. So, you’ll see rates ranging from $100

all the way to $300-$400 on our platform.

Kirill: Per hour?

Harpreet: Yeah, per hour. U.S. Dollars, yes. But that’s still quite a

bargain because if you’re going to go to a Big Four

professional services firm, or if you go to a larger consulting

firm, I guess the cost is much greater there and could be

running to six or seven figures. Whereas on Experfy, a proof

of concept on average costs $10,000-$20,000.

Kirill: Yeah, I can totally agree with that. I attest to that, having

worked at a Big Four consulting firm. I worked at Deloitte

and the fees, of course, are much greater. On the other

hand, what Experfy charges, or the fees that are available on

Experfy, are very good both for clients and for data

scientists. So somebody working in that space of data

science, being an individual data scientist, having an

opportunity to make $100-$400 an hour, that’s a very, very

good price, especially for a freelance type of work when

you’re not really committed to any consulting firm or

company. With that in mind, can data scientists listening to

this podcast somehow get onto Experfy and become part of

this talent pool of 30,000 that you have currently?

Harpreet: Absolutely. We are always looking to expand our pool of

experts. It’s very simple: you go to experfy.com and you sign

up. There’s an application process you have to go through.

You fill out the application, we pull in your LinkedIn profile

as well so that you don’t have to do a lot of hard work, and

basically then we review the application and see if you are a

good fit for the platform.

Kirill: That’s very interesting. And what determines a good fit so

that people listening to this podcast can be prepared or

maybe start thinking in the right direction? What is deemed

a good fit? Maybe number of years of experience, or a

different variety of toolset? What are the things that you look

out for the most?

Harpreet: Data science is something that you can’t just learn

part-time. It requires years of education, you know, some

quantitative education, not necessarily data science

education. For example, you may be someone who studied

theoretical physics and that kind of person deals with a lot

of data and would make a terrific data scientist. So, we look

for relevant education and we also look for relevant

experience. You know, in the application it’s very good to

talk about the kind of use cases you may have worked on.

So, the tools are not as important as the actual ability to

work with large amounts of data or to think analytically.

Kirill: Okay, gotcha. And speaking of education, you guys have

your own educational platform and I’m proud to say that I

have a course published on Experfy, so that was a very

interesting start to our relationship and I’m very excited

about that. I can see people who are taking this course and

are excited to learn data science. So, with that, tell us a bit

more about your educational platform. How many courses

do you have? Who is it tailored towards and what are the

volumes of students coming through right now?

Harpreet: I want to preface that, that your course is a terrific one and

it’s really something that people are taking quite a bit and

we see a lot of enrolments and people are really benefitting

from that Tableau course on visualization.

Kirill: Thank you.

Harpreet: Maybe I can take a step back and tell you the genesis of this

platform and how it began. You know, we started as a

consulting marketplace, and we’ve been talking about that

briefly, but while we were providing this consulting, we

noticed that a lot of companies were coming to us and

posting projects related to training.

For example, University of California Davis came in and

posted a project that they wanted to launch a data science

program and they were looking for experts. This was two

years ago. And then many Fortune 500s were also struggling

to find subject matter experts. For example, someone came

to us and said, “I need someone who can teach supply chain

optimization” or “I need someone who can teach how do you

analyse a certain kind of health care data.” Those kinds of

courses are not available anywhere, not even on the MOOCs.

The MOOCs are a great place to learn for the sake of

learning, to build that foundational knowledge. And they’re

providing a very important function because much of the

education is free and you can really learn the basics of

something.

But as you want to progress into something that is more

industry specific, something that requires understanding of

a domain and the use cases within that, then you really

have to learn from someone who is working in the trenches,

someone who is actually doing that every day. And the

reason for that is that these technologies are changing so

rapidly that an academic cannot help you in understanding

that kind of content.

So we find ourselves in a very good place because we have

access to the best thought leaders in the industry, they’re on

the platform consulting, and we are able to also look at

which use cases are hot, which use cases are actually being

requested in the consulting context. So, we can combine the

thought leadership of our experts and also the project-based

work we’re doing and say, “Okay, these are the projects.” For

example, in the context of media and advertising or retail,

there are use cases like recommender systems that every

retailer wants to have. So every retailer is trying to build a

recommender system that may look like Netflix’s

recommendations or what Amazon is doing.

We’ve executed dozens of such projects so when we think

about creating a course, we are seeing where the trends are

in the retail industry and we are building a retail track for

retail companies so we know which courses are important

even though the retail managers themselves may not know.

Or the Chief Learning Officer at a large retailer is a

generalist, so that Chief Learning Officer isn’t really aware

what kind of courses they should be offering to their

employees. They are thinking in a broad sense of, “I want to

facilitate digital transformation of my company so I should

look at data science, big data,” but they don’t really know

what to offer.

So we can then go into our library of projects we are

performing and make recommendations. And often we see

ourselves co-creating these courses with our industry

partners. That’s what makes us very unique. You know, we

are more focused on the B2B model than B2C, so we are

partnering with companies like Duracell and we’ve done

some text analytics training recently for the Federal Reserve

Bank of San Francisco. We’ve even had some of our experts

fly into India to present a training program for the executives

at Tata Teleservices, which is one of the largest telecom

companies in India.

So if you’re looking for training in emerging technologies,

like Internet of Things, certain types of industry analytics,

then we’re a much better venue than others that exist out

there because we have the courses.

Kirill: Gotcha. That’s interesting that you mentioned it because

that was my next question: How Experfy actually differs from

platforms out there like Udemy and Coursera and so on,

that offer either free or near to free training? That’s a great

answer. Like, those marketplaces have merits, they definitely

have advantages and they teach you the broad spectrum of

data science and the skills that you want to learn. But with

Experfy it sounds like you guys are doing something

completely different, where you’re going into what’s exactly

happening in the industry right now in these specific use

cases, and then from there you’re extracting the right

knowledge, you’re finding the right instructors to create that

content and offer it to your clients so that they can get

upskilled in a very laser specific way in what they need.

With that, you mentioned you mostly deal with B2B clients.

We have about 10% of our listeners who either own their

business or are entrepreneurs, and they should definitely

check out Experfy if they are looking to upskill themselves or

their team in data science. But for the majority of our

listeners, is there still an option for people to take these very

interesting courses if they are just a client, if they’re not a

business?

Harpreet: Yeah, absolutely. We are an online platform and all the

courses are available online. It’s as simple as finding the

course you like, or a learning path for that matter, and just

clicking on the “enrol” button to enrol in that course.

When we think about our go-to-market strategy as

entrepreneurs or as a business, we are selling primarily to

our business clients in a B2B fashion. But there is still a

very large population of students who are enrolling in the

courses who are just consumers.

We have, for example, the University of Alberta in Canada.

They’re having their students enrol in our data science

certification program, so we have a certification program

which is five courses and the first course on probability and

statistics using R is taught by a Harvard professor, Michael

Parzen, and Kaitlin Hagan. Kaitlin is at Harvard Medical

School and Michael Parzen is at Harvard University,

Harvard College. He’s been teaching this content for 30

years, so it’s fantastic for folks to learn from them.

And then there’s a course on data wrangling using R, and

that course is taught by Connie Brett. She was the founder

of the Analytics Incubation Center at Cisco. And then there’s

an econometrics course taught by Alan Yang, who is a professor

at Columbia University. And then there are others from the

industry, from Target and other major corporations who are

teaching in that track.

So we are trying to develop these certification tracks or

learning tracks so that you can say, “Okay, I want to become

a fraud and risk analyst, a data scientist who specializes in

fraud and risk or a data scientist who specializes in retail

analytics,” and then we will provide a pathway to take five or

six courses, or perhaps even more, that leads you to that

qualification. So there’s a lot of interest in upskilling

employees among companies. So we are taking this very

specific approach of how do you get someone going from the

basics all the way to a practitioner in a specific use case.

Kirill: Gotcha. That’s very interesting. I just wanted to comment

that it’s very cool how a university outsources their main

function of teaching students. They outsource it to you guys.

Instead of teaching them at the University of Alberta, they

send them to you to upskill them on certain topics. I imagine

that’s just their way of recognizing that certain

skills are so cutting edge that the university curriculum

just can’t keep up with them.

And in terms of your comment on the certification tracks, I

think that’s just fantastic. That’s not something you see

often in many places. For instance, Coursera has

certification tracks, but they’re like just data science. They’re

very general certification tracks, not like a specific skillset for

data science, a certain industry whether it’s fraud analytics,

or it could be predictive analytics, or certain retail or

industry sector. I think that’s very valuable. And do you

guys provide, upon completion of these certification tracks—

a question that a lot of MOOCs get—do you provide a

certificate of completion that people can show off or show to

their employers and so on?

Harpreet: Yes, absolutely. We do exactly what Coursera and others

may do. You’ll get a certificate of completion that’s generated

by our systems and you can attach it to your LinkedIn

profile, the same way you would attach other certificates.

And we haven’t announced this yet, this is the first time I’m

actually talking about this publicly, that we are launching

an assessment platform as well. This assessment platform

will focus on different types of skillset, so anyone who hasn’t

even taken a course on Experfy could go and take an

assessment and we will then validate this person has certain

skills.

Again, our target here is more of a B2B market where

companies, or the HR departments, are struggling to

understand whether someone is a qualified data scientist so

we are giving them a lot of tools to say, “Okay, you are hiring

someone who understands R and Python in a role where

they’re going to be doing insurance analytics, for example.

So how do you validate that this person knows R and Python

in the context of insurance analytics and also has some of

the other skills that you may desire, like understanding of

Hadoop and Spark and Scala?” So we are focused on

building these test banks that will be incredibly useful to not

only the industry, but also to individuals who can come on

to Experfy and then take these assessments.

Kirill: Fantastic. I just want to preface my answer with, everybody

listening to this, did you hear that? It’s the first time this

information is available publicly! I am so proud that it’s been

announced on this podcast. That’s the first time this has

ever happened, that this podcast is being used as a source

to get information out there into the world, so thank you for

that, Harpreet.

Yeah, assessment platform—I can totally see where you’re

coming from. It is such a needed thing. I get questions all

the time, like, “Hey, I have these skills. I’ve taken these

courses. I’ve done this type of work, but how do I prove to

employers that I have this knowledge, that I’m ready?” And

you get this from passionate people who want to make a

difference in the world, but their main barrier is the fact that

their skills, even though they’re very strong when you

actually speak to them and they know they’re very strong,

other people, employers can’t see that. And I think this

assessment platform—congratulations on that—I think

that’s one of the first, if not the first in the world. So I’m very

excited for you guys. I’ll definitely check it out when it’s

ready. It sounds like a very, very big and exciting thing.

Harpreet: Yeah, thank you.

Kirill: I have so many questions. I could keep going and talking on

about Experfy for much, much longer, just drilling into

what’s going on there and how you guys are doing things,

but I would like to actually also talk about something else,

Harpreet, about some of the very interesting case studies

that you are sharing, about the successes that Experfy is

having. For example, you’ve posted close to a dozen articles

on LinkedIn about different successes of Experfy. I’ve had a

look through them and found them very interesting and

fascinating, the way you apply data science to different

projects and different industries. Are you happy to talk us

through a few of those?

Harpreet: Absolutely. It would be my pleasure.

Kirill: Okay, awesome. How about we start with your most recent

one, the most recent one just published like a week ago, or

two weeks ago? Artificial intelligence for marketing mix

models in the pharmaceutical sector reducing cost and

boosting sales. I’m just going to read out a couple of figures

from here. The pharmaceutical industry is over $30 billion.

Over $30 billion is spent on pharmaceuticals annually. This

is from your article. Basically it’s all about the fact that this

is a huge global industry, and therefore it provides access to

lots of markets for pharmaceutical companies, but at the

same time it’s highly, highly competitive and you need to

have effective marketing there. Otherwise you’ll end up

spending so much money on marketing instead of the actual

product. And this isn’t a high margin product like with

online products. This is a physical product that is tangible,

that needs to be shipped, that needs to go places and that

people actually need. So you can’t afford to spend too much

on marketing. And therefore a lot of responsibility is on data

science to optimize that. What were the challenges,

opportunities, and what solutions did you guys come up

with at Experfy?

Harpreet: Yeah, this is a very interesting use case. As you mentioned,

$30 billion are spent on the marketing of these drugs alone.

There’s additional expense like R&D and others, but we’re

just talking once you’ve got a drug that’s been approved,

how do you get it out the door? So, you have to influence the

physicians, and you have to influence others out there to

prescribe your drug—you know, the patients want to see

them, you see these infomercials on television, so it’s tricky

business.

So the way we’ve thought about this problem is that it’s all

about having access to good data. You know, what we are

after is, what are these pharma companies spending? So,

once a drug is launched, a pharma company may spend over

a billion dollars to market that drug, so if they can be more

judicious, they can save lots of money, hundreds of millions

of dollars, if they are judicious in how they’re spending, and

if they are able to track the ROI, what is being effective and

what is not. So it is possible today to track the sales of these

drugs on a zip code level. You know, there are these

providers who are capturing that data and then

extrapolating it to say, “Okay, this is how much this drug

sold in this week.” And then there are other ways.

You know, some drugs are renewed, so you’re looking at

renewals as well, and then you’re looking at fresh

prescriptions as well, and they’re all tracked as individual

line items for each zip code. So if one can isolate the

marketing for each of these regions and say, “Okay, I had a

conference in this region,” or “I actually ran television ads

and radio ads,” or even “I had Google ads or ads on

WebMD,” all of that can be captured, one can then create a

marketing mix model against the sales. So you can have a

control group where in one adjacent zip code or a different

region altogether, you don’t do certain activities.

For example, in a zip code you may have a sales rep going to

a doctor and doing these lunch conferences where they’re

trying to educate the doctors by doing lunch and learn sort

of activities, and then in a different region altogether, you

don’t do those things, and then you try to compare what

exactly is the difference in terms of sales, in terms of

adoption.
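
The control-group comparison described here can be sketched in a few lines of Python. This is only a minimal illustration, not Experfy's actual method: the region labels and the weekly sales figures are made up, and the function name average_lift is a hypothetical helper. It compares average sales in zip codes where lunch-and-learn sessions were held against zip codes where they were not.

```python
from statistics import mean

# Hypothetical weekly prescription counts per zip code (made-up numbers).
# "treated" regions had lunch-and-learn sessions; "control" regions did not.
treated_sales = {"zip_A": 120, "zip_B": 135, "zip_C": 128}
control_sales = {"zip_D": 101, "zip_E": 97, "zip_F": 108}


def average_lift(treated, control):
    """Return the absolute and relative difference in mean sales."""
    treated_mean = mean(treated.values())
    control_mean = mean(control.values())
    absolute = treated_mean - control_mean
    relative = absolute / control_mean
    return absolute, relative


absolute, relative = average_lift(treated_sales, control_sales)
print(f"Absolute lift: {absolute:.1f} prescriptions per zip code")
print(f"Relative lift: {relative:.1%}")
```

A real comparison would also need to check that the two groups of regions are otherwise similar; a simple difference in means like this one says nothing about that.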

By creating these kinds of control groups and by looking at

the data of the sales and the spend, one can then begin to

model the spending. What we’ve done is we’ve been able to

create machine learning models where you can say, “I’m

going to spend this much money on radio, this much on

television, this much on Facebook ads,” and then predict how

much sales that kind of a mix is going to generate. And

surprisingly, these models become more and more accurate

as you feed more data into them. So there’s a lot of benefit to

the pharma companies as a result.
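
To make the idea of a marketing mix model concrete, here is a minimal Python sketch. It rests on several assumptions that are not from the interview: the data is synthetic, the channel effects and baseline are invented, and plain linear regression (ordinary least squares) stands in for whatever machine learning models were actually used. It fits sales against spend on three channels and then predicts sales for a proposed mix.

```python
import random

random.seed(0)

CHANNELS = ["radio", "tv", "facebook"]
TRUE_EFFECT = [1.5, 3.0, 0.8]   # assumed sales generated per $1,000 spent
BASELINE = 500.0                # assumed sales with zero marketing spend

# Build 200 synthetic region-weeks of spend (in $1,000s) and resulting sales.
rows, sales = [], []
for _ in range(200):
    spend = [random.uniform(0, 100) for _ in CHANNELS]
    noise = random.gauss(0, 5.0)
    rows.append([1.0] + spend)  # leading 1.0 is the intercept term
    sales.append(BASELINE + sum(e * s for e, s in zip(TRUE_EFFECT, spend)) + noise)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_least_squares(x_rows, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


coef = fit_least_squares(rows, sales)


def predict_sales(radio, tv, facebook):
    """Predict sales for a proposed spend mix, each amount in $1,000s."""
    return coef[0] + coef[1] * radio + coef[2] * tv + coef[3] * facebook


print("Estimated baseline:", round(coef[0], 1))
for name, value in zip(CHANNELS, coef[1:]):
    print(f"Estimated effect of {name}: {value:.2f} sales per $1,000")
print("Predicted sales for mix (20, 50, 10):", round(predict_sales(20, 50, 10), 1))
```

Because the synthetic sales are generated from known effects, the fitted coefficients can be checked against them. With real data there is no such ground truth, which is why the control groups mentioned above matter.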

Kirill: Fantastic! That’s a very good description, and I like the term

“marketing mix model.” So, guys, it sounds like that term is

going to be picking up in the future, so that was a good

overview of that as well. Okay, thank you for that. And now

I’d like to move on to a case study that is very close to my

heart. It was so cool reading this. I actually shared it around

on LinkedIn last week and a lot of my students actually

responded the same way. It’s called “The Internet of Things

and Prognostic Analytics for Predictive Maintenance in

Control Systems.”

So what this talks about is that you have huge companies—

well, let’s start with the basics. We have sensors everywhere,

right? For instance, an iPhone, you might think it has four

or five, but it actually has close to 30 sensors. And that’s

like sensors about geolocation, about the gyroscope, it’s got

some sensors for audio coming in or light sensors, and so

on, so close to 30 sensors. And that’s just an iPhone.

Everything around us is slowly getting covered with sensors,

and when you connect sensors to other devices all around

the Internet, that becomes the Internet of Things, and by

2020 we’re predicted to have—and this is from another one

of your articles—we’re predicted to have about 50 billion

things connected to the Internet of Things. That’s more than

the number of people that we’re going to have on the planet

at the time.

So this specific case study which you wrote about talks

about using this inter-hyperconnectedness of things to run

prognostic analytics, and that specifically means

maintenance and improving efficiency of control systems in,

for instance, large power plants or airlines or large

machinery. And you quote some interesting numbers.

For instance, just a 1% increase in efficiency of control in

airlines, and therefore prognostic analytics, can lead to a

cost saving of $2 to $3 billion; in utilities, $4 to $5

billion; in oil and gas companies, $5 to $7 billion; $4 to $5

billion in health care, and $1 to $2 billion in the transport

sector. And I’m assuming this is, for instance, if you have an

airplane and you’re running all these analytics, you don’t

have to wait for something, even for your data to show that

there’s a problem. Running prognostic analytics, you can see

that this performance is dropping. It’s still above average, it’s

still good performance, but it’s dropping. You can see the

trend in which it’s going, and therefore you can predict

basically that something is going to happen and it’s going to

need maintenance, and you can account for that

maintenance early on. Can you walk us a bit more through

this case study, please?

Harpreet: Yeah, absolutely. As you mentioned, a lot of the heavy

industry machinery uses control systems. These control

systems generate tons and tons of data. This has been

happening for 10, 20, 30, 40 years. This is not something

recent. The control systems, by definition, they are storing

that data and that data then goes into some black hole and

it’s never used. So there is a huge opportunity here for heavy

manufacturers. For example, Siemens happens to be one of

the manufacturers of control systems. This is a very highly

fragmented market. Siemens probably has 10%-12% of the

market share, so there are many others like that.

So if somehow we can take the data from these control

systems, the data that’s being generated as the machine

works, if we can take that and build some streaming

pipelines into the Cloud, whether they go to AWS or

somewhere else, maybe even a private cloud if people are not

happy with a public cloud, then we can look at this data for

anomalies. We can start analysing this data for preventive

maintenance and for other things.
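
[Editor's sketch, not from the episode: one simple way to "look at this data for anomalies" is a rolling z-score over a stream of sensor readings. The function name, the window size, and the 3-sigma limit are illustrative assumptions, not anything the speakers describe Experfy or Siemens as using.]

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def flag_anomalies(
    stream: Iterable[float], window: int = 20, z_limit: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for readings far from the recent rolling mean.

    A reading is flagged when it lies more than `z_limit` standard
    deviations from the mean of the previous `window` readings.
    """
    history: deque = deque(maxlen=window)
    for i, value in enumerate(stream):
        # Only judge a reading once a full window of history exists.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) > z_limit * sigma:
                yield i, value
        # Flagged readings still enter the window in this simple version.
        history.append(value)


readings = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 50.0, 10.0]
anomalies = list(flag_anomalies(readings, window=5))
```

In a real pipeline the readings would arrive from a streaming source rather than a list, and a flagged reading would normally be kept out of the baseline window.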

As you pointed out in these numbers, how much can you

save if you just improved efficiency by just 1%, right? I

mean, these numbers are staggering. And the way to think

about this is, if you are in a power plant and your machine

fails, someone from Siemens has to get on a plane from a

different city, bring that part to your plant, and replace that

part. So that is all cost, someone had to rush over there to

do this job.

But if we start doing prognostic analytics—and I want to

differentiate prognostic analytics from predictive analytics in

a sense that predictive analytics tells us that something is

going to fail, you know, that “I’m going to predict that this

part is going to fail some time in the near future,” whereas

prognostic analytics tells us that something is going to fail in

the next two weeks or in the next ten days. So there is

almost a time dimension to prognostic analytics that isn’t so

accentuated in predictive analytics.
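
[Editor's sketch, not from the episode: the "time dimension" can be illustrated by fitting a straight line to a slowly degrading performance reading and estimating when it will cross a maintenance threshold. The linear-trend assumption, the function name, and the sample numbers are all hypothetical; real prognostic models are usually more elaborate.]

```python
from typing import Optional, Sequence


def days_until_threshold(
    readings: Sequence[float], threshold: float
) -> Optional[float]:
    """Estimate days until a declining daily reading reaches `threshold`.

    Fits an ordinary least-squares line to one reading per day and
    extrapolates it. Returns None when the trend is flat or rising.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(readings) / n
    # Least-squares slope and intercept of reading versus day index.
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, readings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    # Days remaining, counted from the most recent reading.
    return max(0.0, crossing_day - (n - 1))


# Performance is still well above the threshold, but falling 2 units a day.
remaining = days_until_threshold([100, 98, 96, 94, 92], threshold=80)
```

Here the output is a number of days rather than a yes/no failure prediction, which is the distinction being drawn between prognostic and predictive analytics.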

And how many times has it happened where we’re trying to

take a flight and something goes wrong with the aircraft and

then we’re sitting there until someone comes and changes

that part or fixes that issue? So, all of that, again, can be

avoided if we are making use of the data that the aircraft has

been collecting, but no one is actually making use of that

today.

So somehow, if we can start building these streaming

pipelines, and if we can start taking the data and start

building preventive maintenance use cases, it can be a huge

saving to everyone. Obviously, as passengers in the airline

context, airlines may pass that onto us and lower airfares.

So I think there is a value chain here that gets impacted as

we start to do more of this sort of analytics.

Kirill: Thank you for that. That’s a great overview. I was actually

after that definition or distinguishing terminology from you

about prognostic versus predictive, and that’s a very good

description, that prognostic actually has a time dimension to

it. Alright, that was awesome. I hope people are picking up

some value from these.

And we’re moving on to case study number three: using big

data to prevent health insurance fraud. Very interesting

space. And as we learned from one of our earlier podcasts, I

think it was podcast #5 with Dmitry Korneev, fraud is

actually a huge industry. You don’t hear about data science

and analytics in fraud that much, it’s not a huge focus, but

especially in the U.S., where the legal system is such that a

lot of companies are unfortunately in a lot of lawsuits with

other companies, the space of fraud analytics is huge,

specifically here—we’re talking about health care.

Some of the numbers you’ve mentioned: the National

Health Care Anti-Fraud Association estimates that the

country has fraud costs of $68 billion annually. That’s 3% of

the whole health care spending, which is about $2.26

trillion. Some people will be interested to know I was

actually very surprised to know that the health care industry

is so large. $2.26 trillion! That’s 18% of the GDP of the

U.S.A. It’s a huge number. So, please, tell us a bit more

about fraud analytics in the health insurance space.

Harpreet: Again, this is a very valuable use case, fraud analytics, when

it comes to health insurance fraud. The challenge that most

insurance companies are facing is that the laws of the U.S.

are such that if someone were to submit a medical billing

claim to a health insurer, they have no choice but to pay it

within a certain time duration. You know, it’s like two days

or three days, and if the claim is not paid, then the insurer

is liable and they can be fined.

For that reason, the claims are paid like clockwork. As they

come in, they’re paid. So one has to get to a point where you

can start predicting fraud in real time for this to be valuable.

So, you know, there are a number of ways in which this can

be done, the data that is being gathered. Unfortunately,

today the way a lot of these claims are paid is through

paperwork. It’s a paper intensive activity. So, the first

challenge is how do you—

Kirill: —convert that to digital.

Harpreet: Exactly, so the digitization. A lot of progress has been made

in recent years, and I’m sure we will eventually get there.

And then the second question becomes—once you’ve got

that, then how do you start modelling for fraud and what are

the characteristics of fraud that you’re looking at? And as

you start developing—here, one thing that we’ve learned

through our consulting practice is that the better training

data you have for a specific use case, the better algorithm

you are going to build.

So, because there is such a high volume of fraud, and

because this is such a big market, it is certainly possible to

create these training datasets that are very helpful. And then

you can do feature engineering and you can then start

looking at which features are the most useful. You know, the

features may differ if I’m trying to prevent fraud for dental

insurance versus health insurance. We’re currently working

on a very exciting project to detect fraud in the life insurance

sector, and that’s even more challenging.

But it’s certainly doable because you don’t have to predict

everything 100%. You can say that if I can predict with 70%

confidence that this is fraud, then at least someone can take

a look and say, “Let me take these additional three steps to

find out what happened, or request more information on this

particular claim.” That’s the opportunity here, that we don’t

have to build models that are 100% accurate. We can still

build models that are useful and then there is some human

intervention to get more information before a claim is paid

out.
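
[Editor's sketch, not from the episode: the idea that a model need not be 100% accurate can be shown as a triage step, where claims scoring at or above a threshold go to a human reviewer before payment and the rest are paid as usual. The `Claim` class, the `fraud_score` field, and the sample scores are hypothetical; only the 70% figure comes from the conversation.]

```python
from dataclasses import dataclass
from typing import List, Tuple

REVIEW_THRESHOLD = 0.70  # the 70% confidence figure mentioned above


@dataclass
class Claim:
    claim_id: str
    fraud_score: float  # hypothetical model output between 0 and 1


def triage(
    claims: List[Claim], threshold: float = REVIEW_THRESHOLD
) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into (needs human review, pay as usual)."""
    to_review = [c for c in claims if c.fraud_score >= threshold]
    to_pay = [c for c in claims if c.fraud_score < threshold]
    return to_review, to_pay


claims = [Claim("A", 0.10), Claim("B", 0.70), Claim("C", 0.95)]
to_review, to_pay = triage(claims)
```

Lowering the threshold sends more claims to reviewers and catches more fraud at the cost of more manual work, so the cut-off is a business decision as much as a modelling one.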

Kirill: Okay. That is definitely going to be useful. Again, it’s such a

huge industry. It’s just mind-blowing that $68 billion—

whoever solves that problem, that’s a multibillion dollar

analytics company waiting to be created right there. So

thank you again for that overview. And I’m just looking at

the number of different case studies that you have so kindly

shared with everybody. To be honest, I’m getting torn apart.

We’ve done three, and we definitely have time for at least one

more. What I would like to suggest is, if you could, could you

choose the best one? What would you like to talk about?

What is, in your view, one of the most successful

breakthroughs that you guys have had at Experfy, and if you

can share that with us?

Harpreet: Yes, I mean, there are a lot of very exciting things we are

doing in the IoT space and that doesn’t get talked about

enough. We had a very interesting project that we embarked

on with Gulf Oil, which has their gas stations. This was Gulf

Oil out of Mexico, their franchise there, and they had a

wonderful idea of how do you differentiate yourself from

other similar businesses. One way is that, if you are a full-

service gas station, then you have to add more value. How

do you do that? We started with that question.

The way we work on it at Experfy is that generally, when

there’s a big question, we start with a road map of some

kind of a visioning exercise. So someone who’s done this sort

of thing before will sit down with the client and see what

does the road map look like, and what does the ROI look like

once we are done with that road map.

We thought it was a huge customer analytics opportunity

that if you could somehow, using IoT, identify who the

customer is as they drive into the gas station—and there are

a number of ways of doing that—you can use computer

vision or image analysis to look at the license plate of the

car. Or you can install beacons in these gas stations, and in

your mobile app, or the Gulf app, you would have the

identity of the person who’s just driven into the gas station.

And now you can say, “Oh, by the way, the gas price is $3 a

gallon, but because you’re such a loyal customer, because

you’ve been here twice already this week, we’re going to

lower the price for you to $2.75 a gallon.”

And then you could say, “By the way, this person also buys

coffee from the convenience store every time so they can be

given that while they’re in their car,” because you already

have the pattern of spending. Similarly, in economies like

Mexico where this experiment is going on, there is this need

for prepaid cards and things like—if you want to send a

package through courier, often the gas stations end up being

the location where the courier services are also installed. So

a lot of these value added services like prepaid cards and

other things can be added. You know, folks don’t have

printers in their homes, so you could even have a way to

print things and the gas station attendant on their app can

provide these value added services and bill the customer

seamlessly without accepting any cash and it all happens

electronically.

Those are the kinds of things that we’re doing on Experfy,

and they have the potential to really reimagine how work

gets done in these industries that are so boring and they

haven’t changed in a hundred years. And thanks to IoT and

analytics, we are going to start seeing a shift where new

models of doing business emerge. We are very excited to be

an enabler in this space.

Kirill: Wow! That’s fantastic! That’s such an interesting case study

of personalizing services through data science and not just

data science, but machine learning, deep learning, you

mentioned computer vision, image recognition, facial or

number plate recognition. That is the full suite of analytics

at play. So, thank you so much for that. These case studies

are so useful because they broaden people’s horizons on

what can be done with analytics, on how much power

analytics has, and data science has, and machine learning

has, and how it’s becoming more and more embedded into

all of these different industries.

Thank you so much for sharing that. I’ve got a couple of

questions leading towards the end of this podcast. First one

I’d like to ask you is what would you say is the secret sauce

for being a data scientist? I don’t usually ask this question,

but you have seen so many data scientists come in to

Experfy, so many people looking for data science skills, and

you’ve educated so many data scientists. You’ve influenced

so many data scientists. What would you say is the secret to

becoming successful in data science?

Harpreet: I guess the secret is to be someone who is able to ask a lot of

questions, form a lot of hypotheses, not start with one

particular solution or approach. The way I look at it, data

science is really about asking many hypotheses and then

validating or invalidating those hypotheses. And then you

come to some kernel of truth that can then be helpful in that

business. I guess the best data scientists that I know are the

ones that are not married to one approach, that are always

looking for answers to a broad range of questions that apply

to a particular problem.

And the second thing I would say is that domain expertise is

really important. If you’re a data scientist, it’s not a good

idea to be a jack of all trades. It’s much better to embrace

one industry and develop a fair amount of domain expertise

in that industry so that you can have a greater impact in

that industry. I think those are the two things that come to

mind.

Kirill: Fantastic. Thank you so much. That’s very good advice. So,

make sure you’re asking the right questions and you’re

open-minded to all of the things that are coming your way,

and pick an industry and start to specialize to build that

influence so people know you as the best data scientist in

that specific industry or space.

And the other interesting question I had as well, which I’d

really be curious to get your opinion on, is from where you

sit, from all the things that you see going on in the space of

data science, where do you think this field is going? What

should our listeners prepare for to be ready for the data

science of 2020? Or the data science of 2025? What would

you recommend for them?

Harpreet: This field is changing so rapidly that it would be a fool’s

errand to make many predictions. But one thing is for sure.

You know, there is a lot of automation going on, we have a

lot of tools that are being developed, and this is going to be a

very exciting space and it’s going to impact every industry.

And the industries that are going to see the most change are

the ones that have the best data or the richness of data, so

those we will see evolving much faster than the others.

And if you are in such an industry, then I think it’s a very

good idea to embrace analytics. Even if you’re not a data

scientist, even if you’re a manager, understanding how one

can become data driven and how processes can benefit from

different types of analyses is really important. You know,

making sure that the company has some kind of a data

strategy to capture the right data is another important

consideration because companies that are not going to do

that are frankly not going to be very competitive. They

probably won’t even exist in the next 5-10 years. It’s a bold

sort of assumption, but if we look at how many Fortune 500

companies exist from the last century, let’s say 1950s, I

would say at least 30 or 40 have disappeared. I think

companies that do take data science seriously are the ones

that are going to stick around.

Kirill: Yeah, I totally agree with you. That’s some very interesting

advice and overview of what to expect. And you’re totally

right, it’s evolving so quickly. It’s hard to make very

definitive predictions, but it’s very interesting, what you said

about automation and that managers should also look into

data science. And I totally agree with you that there are even

some predictions that out of the Fortune 500 companies,

over half of them will disappear in the next decade just

because of what’s happening in the space of data science, so

it’s a huge disruptor as well as an enabler for companies.

Thank you so much, Harpreet, for coming on the show and

sharing all your insights. How can our listeners follow you or

contact you or get more access to all of these—I don’t have a

better word for it—bombs of knowledge that you’re sharing?

You know, you just write an article and you open up a whole

new world of how data science is being applied. What’s the

best way for our listeners to follow you?

Harpreet: Well, there are over 200 projects that are listed on Experfy

and you can look at them in quite a bit of detail in terms of

the description of these projects. So you can go to

experfy.com and you can find me on Twitter @hsingh and we

can connect there as well.

Kirill: Okay, beautiful. Thank you so much. Guys, definitely check

those out, check out Experfy and connect with Harpreet on

Twitter. And one final question I have for you today: What is

your one favourite book that you can recommend for our

data scientists to become better at what they do?

Harpreet: This is a tough question. I’m a voracious reader and I read a

lot. One book does come to mind, thinking about the

audience. There is a book by Eric Siegel called “Predictive

Analytics: The Power to Predict Who Will Click, Buy, Lie, or

Die.” It’s a funny title, but what Siegel is doing, he is

bringing to life the power of predictive analytics in the

context of marketing. It’s a fascinating read, even if you’re

not a specialist.

Kirill: Okay, beautiful. Thank you. We’ve already had somebody

recommend that book on the podcast previously, so Eric

Siegel, “The Power to Predict Who Will Click, Buy, Lie, or

Die.” Once again, thank you so much, Harpreet. It has been

a pleasure having you on the show to learn all of this

amazing knowledge that you have to share. Thank you

again.

Harpreet: Thank you, Kirill, for having me. Take care.

Kirill: So there you have it. The amount of knowledge and practical

examples of data science application Harpreet shared with

us today is immense. I mean, in just that one hour that we

had today, we’ve covered so many different applications from

marketing and pharmaceuticals to insurance fraud to

Internet of Things to prognostic analytics, which I like so

much. I think it’s a huge space and there’s a lot of

disruption that can happen in prognostic analytics. Sensors

are really dominating the world, but not that many

companies are leveraging them to their full potential, so that

is always going to be a space where you can add value.

And my favourite part of the podcast is perhaps what

Harpreet mentioned about their upcoming assessment

platform linked to Experfy. It’s definitely something that is

needed in the space of data science and it’s very cool to see

that they are pioneering this feature, they’re pioneering this

new addition where you will be able to go to Experfy and just

tell them about your skills, submit your application, perhaps

pass some sort of assessment tests and get your skills

verified by Experfy so then you can take it to employers, you

can take it to different companies to show that you do have

these data science skills. Because a lot of the time we are

learning data science, we are educating ourselves, and that’s

what it’s all about. It’s not about that piece of paper that you

get at university. Sometimes you want to go to university

and get the knowledge and go through the experience. But

sometimes you just want to learn online. And having a way

to verify your knowledge is going to be very, very valuable

and I hope that more and more companies are going to start

doing that and following Experfy’s example.

So, there we go. That was Harpreet Singh from Experfy.

Definitely go check out Experfy, and if you have some free

time, you want to do some freelancing work, or you just

want to try yourself out in the marketplace of data science

and you think you have the skills and you have what it

takes, then submit an application to Experfy and become

one of their data scientists in their marketplace.

Also check out the courses on Experfy, some very valuable

courses. You can also find my Tableau course there and

maybe other ones as well. And also make sure to follow

Harpreet on Twitter so that you can get updates about his

articles as well as updates about what’s going on at Experfy.

And as usual, all of the links, resources and show notes are

available at www.superdatascience.com/37. And one more

thing for today. If you are enjoying these sessions, if you like

this podcast, then we would really appreciate if you could log

onto iTunes and leave us a rating or review. That would

really help us propel the podcast forward and bring it to

more people. And on that note, thank you so much for being

here, for sharing this time, for taking an hour out of your

day to listen, to talk about data science with Harpreet. I

can’t wait to see you next time. Until then, happy analysing.