sds podcast episode 427: impacting through …

Show Notes: http://www.superdatascience.com/427 1 SDS PODCAST EPISODE 427: IMPACTING THROUGH TECHNOLOGY


Post on 07-Nov-2021


SDS PODCAST EPISODE 427: IMPACTING THROUGH TECHNOLOGY

Kirill Eremenko: 00:00:00 This is episode number 427 with VP of Data Science at

Gojek, Syafri Bahar.

Kirill Eremenko: 00:00:12 Welcome to the SuperDataScience Podcast. My name is

Kirill Eremenko, Data Science Coach and Lifestyle

Entrepreneur, and each week we bring you inspiring

people and ideas to help you build your successful career

in Data Science. Thanks for being here today, and now

let's make the complex simple.

Kirill Eremenko: 00:00:44 Welcome back to the SuperDataScience Podcast

everybody, super excited to have you back here on the

show. This episode is incredibly fun and cool. Today we

had the VP of data science from Gojek join us on the

episode. If you're from Southeast Asia, you have probably

heard of Gojek and actually very likely used it.

Kirill Eremenko: 00:01:10 But in case you are not from Southeast Asia or you

haven't heard about Gojek, this is a huge company. It is

valued at $10 billion as of today. It's had extremely rapid

growth and it is a super app. It is one app inside which

you can get 20 different services from ride-sharing, to

shopping, to food delivery, to insurance, to cleaning, to

even hair styling. How cool is that?

Kirill Eremenko: 00:01:42 The app serves millions of people across Indonesia,

Vietnam, Singapore, and Thailand. And they're growing

extremely fast. They have been growing extremely fast,

they continue to grow extremely fast.

Kirill Eremenko: 00:01:57 And today we had the pleasure of speaking with the VP of

Data Science from there, Syafri Bahar. And before I

continue onto what this episode is all about and what we

talked about, I wanted to say why I keep saying, "we," we

spoke with Syafri. Because today we have a second host,

Jon Krohn joined me as a co-host on this episode. You

may remember Jon from episode 365 in May this year.

Kirill Eremenko: 00:02:25 And the reason why Jon is joining, there's something

super exciting coming up in 2021 as an exciting change.

Jon is actually going to be... I'll give you a heads up now

without going into too much detail. We'll talk about it. I'll

announce it more in the coming episodes, but Jon will be

taking over as host of this show.

Kirill Eremenko: 00:02:47 I know that might come as a surprise. It's the first time

I've mentioned this publicly, but it's going to be super

fun, it's going to be an amazing time. And we won't talk

about this too much right now and not detract from the

episode, we'll get into that in a future episode. But in this

episode we decided to co-host and talk with Syafri

together, and it turned out really fun. We had a lot of

laughs and I'm sure you will join us with them, with those

laughs.

Kirill Eremenko: 00:03:16 And so what did we speak about today with Syafri? Well,

we talked about Gojek and the impact it's having. We

talked about decision science versus data science. They

actually have three divisions under Gojek, decision

science, data science and business intelligence, and we

specifically discussed the difference between decision

science and data science.

Kirill Eremenko: 00:03:34 We talked about CartoBERT and Turing, so some more

technical things and some use cases around this. Some

very interesting use cases. We talked about what it's like

to be a VP or vice president of data science, and what that

role entails at a rapidly-growing company like Gojek.

Kirill Eremenko: 00:03:55 We talked about what it takes for a data science team to be a

high performance data science team. We talked about

mathematics in data science quite extensively. Both Jon

and Syafri are experts on mathematics and data science.

It was very interesting to have that conversation. And

finally, we talked about what it takes to thrive as a data

scientist in a company like Gojek.

Kirill Eremenko: 00:04:18 So lots of very cool insights coming up. Can't wait for you

to check out this episode. Without further ado, I bring to

you Syafri Bahar, VP of data science at Gojek.

Kirill Eremenko: 00:04:34 Welcome back to the SuperDataScience Podcast,

everybody. Super excited to have you back here on the

show. Today we've got a very exciting episode. We've got

two hosts and one guest. Our guest for today is Syafri

Bahar calling in from Indonesia, from Bali. And we've also

got Jon Krohn as our co-host calling in from New York.

Hi, guys. How are you doing?

Syafri Bahar: 00:04:55 Hi, Kirill. Doing good, thanks.

Jon Krohn: 00:04:58 Hey, very well, Kirill. Yeah. Delighted to be here.

Kirill Eremenko: 00:05:01 Awesome. What's the time for you, Syafri?

Syafri Bahar: 00:05:05 Now it's 9:00 actually, so I'm calling from Bali.

Kirill Eremenko: 00:05:08 9:00 AM, right?

Syafri Bahar: 00:05:10 9:00 AM, yes.

Kirill Eremenko: 00:05:11 Awesome, awesome. And Jon, you?

Jon Krohn: 00:05:14 Yeah. 8:00 PM. Getting there.

Kirill Eremenko: 00:05:17 Yeah. Crazy. Across all the time zones.

Jon Krohn: 00:05:20 And how about you, Kirill?

Kirill Eremenko: 00:05:22 For me? It's about 6:30 AM. About 6:00 AM.

Jon Krohn: 00:05:28 Oh, man.

Syafri Bahar: 00:05:29 Oh, wow. That's very early.

Kirill Eremenko: 00:05:31 Yeah. That's okay.

Jon Krohn: 00:05:32 Do you always get up that early?

Kirill Eremenko: 00:05:34 I do, my girlfriend doesn't. She was so dazed, I had to go

to another room to go and sleep there because this is the

only room where I can record. Took her blanket and

pillow and just went away.

Jon Krohn: 00:05:52 Our apologies.

Kirill Eremenko: 00:05:54 No, it's okay. It's okay. I'm glad we're all here. We've I

think met Jon. Our listeners have met Jon before from

other podcasts, but just quickly, Jon, if you could give us

a quick intro about your background.

Jon Krohn: 00:06:09 Sure. I'm the chief data scientist at a machine learning

startup in New York. That's my day job, but on the side I

do lots of data science education. I have a book, Deep

Learning Illustrated that was a number one best seller.

Not been translated into Indonesian yet, but we do have a

lot of translations around the world.

Jon Krohn: 00:06:34 And I've also been doing some work with

SuperDataScience. We've got a Machine Learning

Foundations course that just launched in the Udemy

platform together. And Kirill and I met through the

SuperDataScience podcast. I was a guest on the podcast

early in 2019, and I asked Kirill if he would like to be a

guest on my podcast, which I had just launched. At that

point I'd only had two episodes, and we hit it off. We had

a really great conversation and if you don't mind me

breaking it to your audience right now, Kirill.

Kirill Eremenko: 00:07:11 Yeah, sure.

Jon Krohn: 00:07:15 A couple of months ago, Kirill approached me to begin

hosting the SuperDataScience podcast, so I'm absolutely

blown away. I couldn't believe that he asked me to do

that. Now we're getting me warmed up by co-hosting

today, and I couldn't be more excited.

Kirill Eremenko: 00:07:34 Me too. Super fun, super fun. It's going to be an exciting

time I think. I feel you're the right person to carry the

SDS podcast forward. Thanks for being here today, Jon.

Jon Krohn: 00:07:45 Yeah. An honor.

Kirill Eremenko: 00:07:46 Awesome. All right. Oh, and by the way, congrats on the

Machine Learning Fundamentals or Foundations. 90,000

students, right? Last I checked.

Syafri Bahar: 00:07:56 Wow.

Jon Krohn: 00:07:57 Yeah. I think it's 80,000, but that's about the same in

terms of the impact. And yeah, 80,000 students. It's only

been live for five or six weeks. And that's the kind of thing

that I couldn't have possibly ever dreamed of that kind of

thing. It's by association with you guys, with the

SuperDataScience podcast, and so thank you very much

for that.

Jon Krohn: 00:08:22 And we're only just getting started. There's three and a half

hours live for the course right now, and I expect when the

podcast is released it'll still be about that three and a half

hour mark. But by the end of 2020 it'll be about six

hours. We'll have finished the first quarter or so of all the

content. In 2021, there'll be 25 hours of content in there,

covering linear algebra, calculus, probability, statistics,

computer science. Everything you need to know to be a

great machine learning practitioner, or data scientist.

Kirill Eremenko: 00:08:58 Fantastic. That's very cool. And that's a very good segue

to Syafri, because Syafri, you love mathematics, right?

Syafri Bahar: 00:09:04 Oh, yeah.

Kirill Eremenko: 00:09:04 Your whole story is mathematics.

Syafri Bahar: 00:09:08 Sure, yeah. Exactly.

Kirill Eremenko: 00:09:09 Please tell us a bit about that.

Syafri Bahar: 00:09:13 Yeah. I've actually been into mathematics since I was a

child, actually. My father is actually a math teacher, so

when I was a child-

Jon Krohn: 00:09:20 There you go.

Syafri Bahar: 00:09:20 Yeah. I remember a day where I was I think in elementary

school and I start asking about this sequence problem. I

just make a sequence problem with the three differential

layer of arithmetic sequence to my teacher. And then I

actually asked the problem to my father, but he just

tossed me a book.

Syafri Bahar: 00:09:45 But later I found out that it's actually in a university

book. I was kind of crunching in order to find the

solution of the problem, and since then I've actually

grown my interest to math. In fact, I'm also lucky enough

to represent Indonesia actually to a couple of math

Olympiad competitions, so that's a very nice experience.

Jon Krohn: 00:10:06 That's huge because Indonesia is the fourth most

populous country on the planet, so you're representing a

big population there.

Syafri Bahar: 00:10:15 Yeah. It's quite surreal also for me back then because I

was kind of from, how do you call it, the underdog regions

of Indonesia, so to say. A lot of the representatives

[inaudible 00:10:28] always come from the Jakarta area,

and I was probably the first representative from that

region, from that province actually, after let's say eight to

10 years. It was quite a euphoria for me as well.

Kirill Eremenko: 00:10:42 What province?

Syafri Bahar: 00:10:42 Sorry?

Kirill Eremenko: 00:10:42 What province is that?

Syafri Bahar: 00:10:44 Sulawesi province. Sulawesi.

Kirill Eremenko: 00:10:45 Sulawesi.

Syafri Bahar: 00:10:46 Yeah.

Kirill Eremenko: 00:10:49 I know there was a few active volcanoes. I was doing a

data science analysis of the active volcanoes of the past, I

don't know, centuries. And there's quite a few in

Sulawesi. I think four or five [inaudible 00:11:02]

hundreds.

Syafri Bahar: 00:11:10 Yeah. It looks like a K actually on the map. It's easily

recognizable. Since then I've grown my interest and I'm

actually still actively reading and learning from math books. I

think I consider it as a hobby actually, because I find it

beautiful as a discipline. So yeah, you're right about it.

I'm a big fan of math.

Kirill Eremenko: 00:11:26 That's awesome. And when you don't do math, what is it

that you do? Because it sounds like you're so into

mathematics, sounds like your full time job, but you have

a different full time job. Tell us a bit about it.

Syafri Bahar: 00:11:39 Oh, yes. Yes. Actually it's my day job. I am a VP of data

science for Gojek. Gojek is actually an on-demand

super app platform. We have around 20 products. I think

we basically start from ride hailing, we have food delivery, we

have entertainment kind of like Netflix streaming services.

We also have insurance, for example. It's a super app.

Syafri Bahar: 00:12:09 We used to actually have even a service where you can

actually order a masseuse coming to your house within

15 minutes actually, just with a click of a thumb. But

unfortunately, we can't get the service to sell. But yeah, it

is quite a hyper growth product. We became the first

unicorn of Indonesia and then two, three years after we

became the first decacorn of Indonesia, which is surreal

in terms of growth I would say.

Kirill Eremenko: 00:12:41 What's a decacorn?

Syafri Bahar: 00:12:43 A decacorn is with a 10 billion valuation basically.

Kirill Eremenko: 00:12:45 10 billion valuation. Oh, my gosh. In 10 years, right you

said?

Syafri Bahar: 00:12:49 Well, it's eight years actually to be precise.

Kirill Eremenko: 00:12:52 Yeah. Wow. Wow. Very cool. Are you subscribed to the

Data Science Insider? Personally, I love the Data Science

Insider. It is something that we created, so I'm biased.

But I do get a lot of value out of it. Data Science Insider if

you don't know is a free, absolutely free newsletter which

we send out into your inbox every Friday. Very easy to

subscribe to. Go to SuperDataScience.com/DSI.

Kirill Eremenko: 00:13:20 And what do we put together there? Well, our team goes

through the most important updates over the past week

or maybe several weeks and finds the news related to

data science and artificial intelligence. You can get

swamped with all the news, even if you filter it down to

just AI and data science, and that's why our team does

this work for you.

Kirill Eremenko: 00:13:39 Our team goes through all this news and finds the top

five, simply five articles that you will find interesting for

your personal and professional growth. They are then

summarized, put into one email, and at a click of a

button you can access them, look through the

summaries. You don't even have to go and read the whole

article, you can just read the summary and be up to

speed of what's going on in the world.

Kirill Eremenko: 00:14:01 And if you're interested in what exactly is happening in

detail, then you can click the link and read the original

article itself. I do that almost every week myself. I go

through the articles and sometimes I find something

interesting, I dig into it. So if you'd like to get the updates

of the week in your inbox, subscribe to the Data Science

Insider absolutely free at SuperDataScience.com/DSI.

That's SuperDataScience.com/DSI. And now, let's get

back to this amazing episode.

Jon Krohn: 00:14:32 And you're financed by some of the biggest possible

financiers around. Sequoia Capital, Tencent, Google,

Facebook. So it's interesting that you would think that a

lot of those companies would actually be competing

companies, and so it's interesting. I guess they see a lot of

potential in Indonesia.

Jon Krohn: 00:14:50 Something that really interests me and may interest a lot

of our listeners is what is a super app? In the West, I

don't think we have anything like that. It seems almost

like in the West they deliberately fragment apps. So

Facebook fragmented into Messenger, and as many

different pieces as possible.

Jon Krohn: 00:15:11 When you have a super app, when you look on your

phone it's just one app that you click on, and then when

you're inside you navigate to all these? You get your

massage and your insurance once you're inside?

Syafri Bahar: 00:15:24 Exactly. No, exactly, Jon. Yeah. It is very interesting

indeed, because if you think about it there's not really a

comparable I would say platform out there. But just the

idea is we built the whole ecosystem within one app. And

I think [inaudible 00:15:39] actually managed to create

this network, and then you actually start to reap the

benefits of having. Because anything that you put in that

ecosystem scales very fast actually.

Syafri Bahar: 00:15:48 So we became, for example for the food delivery, the

biggest in Asia excluding China. Logistics for example

also became the biggest in Indonesia just from leveraging

of this network effect actually that we have within the

app. But you're right, if you think about it the

opportunities to implement data science, machine

learning just meshed in terms of personalization.

Syafri Bahar: 00:16:17 It's just amazing. For example, being able to know

[inaudible 00:16:21] of food orders or massage

appointments allows us to recommend what is the best

service for that. You might think of music actually.

[crosstalk 00:16:34]. There's so many kind of information

within the network which can actually be leveraged to

build a very powerful personalization. It's quite an

exciting environment. It's like having 20 companies within

one umbrella, pretty much.

Jon Krohn: 00:16:52 Yeah. The data science perspective of it sounds absolutely

amazing. And I guess we'll spend most of today's program

talking about that, so it's great. I love this idea of how you

can be like, "Oh, yeah. If you like a deep tissue massage,

then you'll probably be interested in our athlete

insurance."

Syafri Bahar: 00:17:08 Exactly.

Kirill Eremenko: 00:17:12 It's like recommender systems on Netflix or Amazon but

on steroids. You get the network effect of the

recommender systems. It's exponential on exponential. No

wonder it grows so fast.

Syafri Bahar: 00:17:25 Exactly, exactly.

Kirill Eremenko: 00:17:25 That's so cool. As I understand, you're operating in

Thailand, Vietnam, Singapore, and Indonesia. Is that

correct?

Syafri Bahar: 00:17:32 Correct. Yes.

Kirill Eremenko: 00:17:34 And how many people, just for those... Of course, for

those people who are from those countries will know you

well, but for those from the West who maybe haven't

heard of Gojek, how many users do you have on your

platform? How many people do you work with on your

platform?

Syafri Bahar: 00:17:51 Sure, sure. Maybe just to give the idea of the scale. The

app itself has been downloaded 170 million times,

actually. And I think one in every four Indonesians has the

app installed. They have actually [inaudible 00:18:08].

And then we have already around-

Kirill Eremenko: 00:18:10 I have the app installed, too.

Syafri Bahar: 00:18:11 Oh, really?

Kirill Eremenko: 00:18:13 I've had at least.

Syafri Bahar: 00:18:14 [crosstalk 00:18:14].

Kirill Eremenko: 00:18:15 Yeah. When I was in Bali I asked for a ride. You get on the

scooter behind this driver and you hold on for your life.

It's a really cool experience.

Syafri Bahar: 00:18:26 Yeah. And just to give you the scale because that's very

interesting, because it has around total of drivers and

then also service providers, we have around two, two and

a half million. So that's almost 1% of the population of

Indonesia, so it's quite crazy. Basically that means that a

lot of people's lives actually depend on us.

Syafri Bahar: 00:18:49 So it's also quite a privilege I feel, because we need to do

our jobs really well in order to be able to survive and

really provide these people with the day-to-day livings as

well. Maybe a couple of other things [inaudible 00:19:05].

In terms of the economy also we've contributed

immensely in Indonesia. I think if we total everything for

all the incomes coming from the platform itself, it

actually contributes to 1% of Indonesia's GDP. So it's

pretty big.

Jon Krohn: 00:19:22 That's incredible. Yeah.

Syafri Bahar: 00:19:24 Yeah. And actually, we also hit our two billion orders

milestone last year, if I'm not mistaken. It's actually quite

a milestone also for us.

Kirill Eremenko: 00:19:33 Congrats. That's really cool. I'm sure data science played

a huge role in that.

Syafri Bahar: 00:19:40 Yes, yes.

Jon Krohn: 00:19:42 I had a question along a similar vein. I've been queued up

for it perfectly, which is how big is the core company at

Gojek? For example, how many data science people are

there?

Syafri Bahar: 00:19:57 Yeah. Data science, there are around 60 to 80 people I

think now. In total within data [inaudible 00:20:05] we

have around 150-180 people in total. Now we actually

have three different I would call it analytic professional

within the company. We have data scientists, we have BI,

business intelligence, we also have decision scientists.

Syafri Bahar: 00:20:22 Recently we introduced this basically to kind of help us

making the right decisions for million-dollar decisions

that we need to take. We need a really specialized

knowledge to, how do you call it, to clear out all the

ambiguities in terms of asking questions and being able

to systematically taking decisions in a more rigorous way,

basically. That's about the size of the data team.

Jon Krohn: 00:20:49 Nice. The decision sciences team sounds like the holy

grail in business. That's what everybody wants to be

doing, and maybe because you guys do it that's why

you're having this incredible hyper growth, and you've

become a decacorn. It could be a big part of it.

Syafri Bahar: 00:21:03 Yeah. We actually just recently started. We'll see, but I

think we're probably one of the first that's introduced the

job ladder, the job family in Indonesia. I am really looking

forward to what kind of impact actually it can. But if you

look at already the use cases, there's quite a lot of use

cases already where we need to take, for example

decisions about expansion, decisions about releasing

certain features. Decisions about for example distributing

[inaudible 00:21:30]. I think those are the typical

questions that these job I'll say architects will focus in on

for that.

Kirill Eremenko: 00:21:40 What's the difference in skill sets for a decision scientist

versus a data scientist?

Syafri Bahar: 00:21:48 Yeah. Our definition, because again within the market

especially Indonesia, every company has their own ways

to define data science. I think our definition of data

science versus decision science, if you look at the core

skill set, data scientists within our companies are very

strong in software engineering as well. So they're trained

to build scalable machine learning system.

Syafri Bahar: 00:22:14 A little bit more like the applied machine learning

engineers actually. Very close to that, while our decision

scientists they need to be very strong with the statistical

analysis, like causal inference for example. Being able to

do hypothesis test and they need to be good with

experimentation. The focus are a little bit different.

Syafri Bahar: 00:22:34 Data scientists, they really build data products, scalable

data products. And our decision scientists really help

with decision making actually, by running some certain

analysis. Statistical analysis that can help us making

better decisions.
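
[Illustrative sketch, not part of the conversation: the hypothesis testing and experimentation described above can be pictured with a two-proportion z-test on an A/B experiment. The function name, the scenario, and every number below are assumptions made up for this example; none of it is Gojek's actual tooling or data.]

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Group A is the control, group B the variant.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    p_pool = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value of a standard normal, via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 1,000 of 10,000 users convert without a new
    # feature, 1,100 of 10,000 convert with it.
    z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts, z comes out at roughly 2.3 and the two-sided p-value at roughly 0.02, so at a conventional 5% threshold the difference would be treated as unlikely to be noise. A real decision would also weigh effect size, cost, and how the experiment was designed.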

Kirill Eremenko: 00:22:48 Wow. That's very cool. I guess in smaller companies or

companies that are not as advanced in terms of data

science, that is all combined in the analyst or the data

scientist, those hypothesis testing and so on. But as you

scale, I guess you made the call to separate those two and

really specialize people. "All right. You are in hypothesis

testing and you can run all these experiments, whereas

you're in machine learning and engineering of features,"

and things like that. So people can actually focus and get

really good at, not one thing but that group of things that

are relevant to that profession.

Syafri Bahar: 00:23:31 Yeah. Indeed, indeed.

Kirill Eremenko: 00:23:32 That's very cool.

Jon Krohn: 00:23:34 It seems like those data scientists I've been reading about

Gojek's machine learning platform, there's a series of

articles on Medium. And some very cool specialized tools

like CartoBERT, so using the BERT system, the

transformers in natural language processing. So

leveraging particular deep learning techniques to allow

you in the ride hailing product to be able to create names

for pickup points, right?

Syafri Bahar: 00:24:04 Correct. Indeed.

Jon Krohn: 00:24:06 And then I read about Turing, which is named after the

great British computer scientist Alan Turing. And it's a

tool for evaluating machine learning models I guess before

they go into production, or maybe after they're also in

production to make sure that they're still performing as

you'd expect?

Syafri Bahar: 00:24:24 Yeah. I'm actually very, very happy that you've basically

spent some time in visiting our Medium blog, and there

are like great articles over there. But to write about

CartoBERT, I think the idea is one of the things that we

would like to [inaudible 00:24:39]. This is also very

interesting in terms of how we really bring data end to

end. Just particularly if you don't mind, I'll tell a little bit

stories about CartoBERT.

Jon Krohn: 00:24:48 Please.

Syafri Bahar: 00:24:50 Yeah. It used to be that we learned from the data

[inaudible 00:24:54] people who have actually been picked up

from a very crowded location, like a shopping mall, et

cetera, et cetera. We basically look at the percentage of

people who call the drivers. It's actually two X compared

to the other place.

Syafri Bahar: 00:25:09 Basically we have concluded that people [inaudible

00:25:12] around these areas. So what we did is that we

run some clustering, we [inaudible 00:25:18] basically,

and we found out that among these pickup points

apparently we can actually find the center of those

clusters, where people ask to be picked up.

Syafri Bahar: 00:25:30 And then what we do is that we also have the chat history

of drivers and then our customers. So what we did is that

with a clustering system we picked the center point, and

we need to basically attach a label into that. And that's

where CartoBERT actually plays into the role, because it

allows us to crunch millions of chat logs, and then

summarize it into a pickup point.

Syafri Bahar: 00:25:54 Especially given the size of Indonesia, it's just not

possible to do it manually. So what we did is that we ran

100,000 pickup points in Indonesia for the shopping

mall, and then it translates into product features that

people love. And then we all see quite a significant reduction

in the number of calls between drivers and customers.

Just to illustrate how we really use data to improve the

experience of our users.
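
[Illustrative sketch, not part of the conversation: the pipeline Syafri describes (cluster pickup locations, take each cluster's center, attach a name mined from driver-customer chats) might be pictured roughly as below. This is a deliberately simplified stand-in. It groups points by a fixed grid rather than a real clustering algorithm, and it names each cluster by its most frequent chat phrase rather than using a BERT model as CartoBERT does. The function name, the input format, and the sample data are all hypothetical.]

```python
from collections import Counter, defaultdict


def label_pickup_clusters(pickups, cell_size=0.001):
    """Group pickups into grid cells, then name each cell's center point.

    pickups: iterable of (lat, lon, chat_text) tuples (hypothetical format).
    Returns a list of (center_lat, center_lon, label, count) tuples.
    """
    cells = defaultdict(list)
    for lat, lon, text in pickups:
        # Snap each coordinate to a grid cell; a stand-in for real clustering.
        key = (round(lat / cell_size), round(lon / cell_size))
        cells[key].append((lat, lon, text))

    results = []
    for members in cells.values():
        center_lat = sum(m[0] for m in members) / len(members)
        center_lon = sum(m[1] for m in members) / len(members)
        # Stand-in for the language model: the most frequent chat phrase wins.
        phrases = Counter(m[2].strip().lower() for m in members)
        label = phrases.most_common(1)[0][0]
        results.append((center_lat, center_lon, label, len(members)))
    return results


if __name__ == "__main__":
    # Made-up pickups around a mall, plus one unrelated pickup elsewhere.
    sample = [
        (-6.2000, 106.8000, "Lobby Utara"),
        (-6.2001, 106.8001, "lobby utara"),
        (-6.2001, 106.8000, "pintu timur"),
        (-6.3000, 106.9000, "Gate 2"),
    ]
    for center_lat, center_lon, label, count in label_pickup_clusters(sample):
        print(f"{label!r}: {count} pickups near ({center_lat:.4f}, {center_lon:.4f})")
```

The point of the sketch is only the shape of the idea: many noisy pickups collapse into one named point that can be shown in the app, which is what would cut down on drivers and customers calling each other to find one another.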

Kirill Eremenko: 00:26:20 That's a cool one.

Syafri Bahar: 00:26:25 In addition to that, actually a couple of months ago we

released also together with Hong Kong University of

Technology, we worked together and we released probably

one of the bigger BERT... One of the biggest pre-trained BERT

NLP models for the Indonesian language, actually.

And we have open sourced it. People here if you happen

to be interested in Indonesian language, NLP for

Indonesian language, you can actually go to

www.IndoNLU.com. You actually can download the

pre-trained model for Indonesian language.

Kirill Eremenko: 00:27:07 Beautiful. That's awesome.

Jon Krohn: 00:27:07 Yeah. It's great to be sharing your expertise with the

world. Really wonderful. Seems like you guys are doing

great things on your team. Maybe Kirill already knows

this, I don't know how much you know about each other's

backgrounds, but in your role are you... Who reports in to

you? How big are the teams? What does being a VP of

data science mean at Gojek?

Kirill Eremenko: 00:27:38 Yeah. I'm also very curious. That's a great question.

Syafri Bahar: 00:27:41 Yeah. Thanks a lot, actually. Especially in the hyper

growth startup, I've probably changed my role three to

four times already within two years in terms of scope. I

was originally hired to develop the data science

capabilities [inaudible 00:27:56] originally, and then

became the head accountant for data science basically.

Syafri Bahar: 00:28:00 At that time the teams were still around 40-50 people I think

within the machine learning engineers. [inaudible

00:28:07] platform. And then recently, the portfolio has

grown a little bit. Not only that, I actually have two other

peers within Gojek, so we both report to the chief data

officer of Gojek.

Syafri Bahar: 00:28:23 Together with my peers we basically split the portfolio, so

I currently oversee around nine verticals. Our

entertainment, third party platform, groceries, marketing,

for example. It's not all logistics. There are a couple of

verticals and we oversee both the analytic and science

part of the portfolio.

Syafri Bahar: 00:28:46 What I refer to as analytics is the BI and analysts, data

analysts. And then the science part is decision scientists

and data scientists. Probably there are around 50-70

people, 60 people I think eventually reporting to me

currently.

Syafri Bahar: 00:29:07 In terms of scope of work, it basically encompasses

almost all spectrums. If I were to decide the cluster

[inaudible 00:29:15] activities, it's starting from the

people itself. We're taking care of the technology, what

impact. Which technology that we need to for example

invest in next year.

Syafri Bahar: 00:29:25 We also deal with building organizations. How do we

organize ourselves actually to prepare us to tackle the

company strategic team, position ourselves. Basically all of

these aspects from hiring and everything. Even the dirty

one, like the financing, cleaning up the systems and stuff

like that.

Syafri Bahar: 00:29:47 It encompasses almost everything, basically. I actually do

see myself as a problem solver in a way that whatever, I

try to fill the sack in terms of, "Hey, I don't think that

there is a, for example clear career path for some of our

people," so I'm going to immediately jump talking to HR

and ensuring that for example that we've managed to

create a good system that allows people to basically follow

their aspirations.

Syafri Bahar: 00:30:15 But sometimes I'm also put in a very project-specific

activity, like for example really understanding our

customers. Creating a framework in order to be able to

actively manage our customer portfolio for example, by

properly [inaudible 00:30:31] customer lifetime, for

example. I think those are different spectrums, just to get

some flavor of what I'm doing on a day-to-day basis. I hope

that answers your question.

Jon Krohn: 00:30:38 Yeah. That was an amazing answer, and it sounds like a

really interesting role. Wow.

Kirill Eremenko: 00:30:46 Do you still do much technical work?

Syafri Bahar: 00:30:50 Yes. I try to do so because I think, and especially in this

field, things just evolve very rapidly so I try to spend a couple

of hours still coding basically, and really pushing code

as well to the repository. Being involved also in the

technical discussion in the modeling, so I still try to do

that.

Kirill Eremenko: 00:31:10 Yeah. That's impressive.

Syafri Bahar: 00:31:12 [crosstalk 00:31:12]. Yeah, exactly.

Kirill Eremenko: 00:31:15 Absolutely. That's very, very good to hear. I like what you

said in one of your interviews about a high performing

data science team that requires three main components.

Do you mind telling us a bit about that? As a VP of data

science, you have a unique position that not only you

need to deliver the work, but you also need to evaluate

the performance of your team.

Kirill Eremenko: 00:31:41 And report to higher up executives on, "We are delivering

value. This is a very useful team to the company." You

have accountability and you have a responsibility to your

team to do that, otherwise there's stories of whole teams

getting disbanded because executives didn't see value.

And I found your philosophy about what is a high

performing data science team very structured.

Kirill Eremenko: 00:32:10 And I think not only managers listening to this podcast

will find it valuable, but also individual contributor data

scientists will find it valuable to understand. To evaluate

for themselves if they're part of such a team, and what

they can do in order to be part of such a team. If you

could jump into that, that'd be great.

Syafri Bahar: 00:32:30 Sure. Yeah. Thanks a lot for that. Indeed, I think what I

found actually to be very challenging is really to establish

values. Especially data science itself as a discipline is a

vague thing. It takes a lot of faith I would say from

executives to kind of even invest in the team, because

typically the investment will probably take, especially for

our largest machine learning systems that we have that

really move the needle, it probably took a year in the

making.

Syafri Bahar: 00:33:01 Involves a lot of iterations, trials and error. So I think

what I also highlighted in my interview that basically

what we need to show to the company is that, first of all I

think we need to measure everything. And that's also the

reason why we're actually integrating all of our machine

learning system. We integrate also the measurement

system inside, just to ensure that we are actually able to

quantify the impact even on a team level.

Syafri Bahar: 00:33:26 I'm able to know for example what is the dollar impact

that a team of three people basically deliver for a

particular project. That's how rigorous we are in terms of

measuring impact. And really, I think the case that we try

to make is that we want to make a case that we're not

[inaudible 00:33:43], because there are very tangible

dollar savings or dollar generating, actually that we do for

the company.

Syafri Bahar: 00:33:50 And we are able to achieve that by really putting a very

strong measurement in place, even before we engage in

any of the machine learning projects actually. I think the

first thing that we ask of our product engineering counterparts is

to have a measurement system in place. We have an

experimentation system in place actually, just to

understand whether basically the things that we will build

for them will actually lead to any impact.
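
The measurement-first approach described here can be illustrated with a small experiment readout. This is only a sketch under stated assumptions: the revenue figures are invented, the helper name `estimate_uplift` is hypothetical, and the interval uses a plain normal approximation, which is crude for samples this small. It is not a description of Gojek's actual experimentation system.

```python
import math
import statistics


def estimate_uplift(control, treatment, z=1.96):
    """Return the difference in mean revenue per user and an approximate 95% interval.

    Uses a normal approximation with unpooled variances (Welch-style standard error).
    """
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(treatment) / len(treatment)
    )
    return diff, (diff - z * se, diff + z * se)


# Hypothetical revenue per user (in dollars) from a small experiment.
control = [10.0, 12.0, 11.0, 9.0, 13.0]
treatment = [12.0, 14.0, 13.0, 11.0, 15.0]

uplift, (low, high) = estimate_uplift(control, treatment)
print(f"uplift per user: {uplift:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

Multiplying the per-user uplift by the number of users exposed gives a rough dollar figure for a project, which is the kind of number a team could report upward.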

Syafri Bahar: 00:34:15 And the fact about being in the hyper growth startup,

there's hundreds of things that we can actually do for

next year. So we need to have a very ruthless [inaudible

00:34:24]. Having a proper way to measure the impact or

potential impact is very essential in order to establish a

case for the company.

Syafri Bahar: 00:34:35 I think one of the characteristics of a high-performing

team will be that they deliver impact, and how do they

know whether they deliver any impact? It's by really

putting this measurement in place. And then by

educating as well. I think what I seem to learn as well

during my experience within Gojek is that a lot of these

end to end, a lot of these projects have actually managed

to deliver big impact.

Syafri Bahar: 00:35:01 A lot of the challenges, of course there are [inaudible

00:35:03] challenges as well, but I think not to be

underestimated as well is the non-technical challenge of

really ensuring that we have created a good structure for

our data scientists and product engineers, so that they are

able to work at a different pace but still

integrate their solutions.

Syafri Bahar: 00:35:27 This is one, and the second thing is also about constant

education to stakeholders to try to convince them why it

is okay actually for millions of dollars of their

money to be managed by a black box. I think

that also requires a lot of convincing, I would say. So

really establishing a good operation model is very

essential I think for the high performing team, because by

having a good operational model.

Syafri Bahar: 00:35:53 Just to give a little bit more flavor to that one. For

example, we have recently basically declared that all of

our solutions need to basically communicate with

product engineering systems through APIs. Because that

allows people basically to move at a different pace, and

then meet up again like a couple of weeks later to

integrate their solutions.

Syafri Bahar: 00:36:14 But as long as before the start of a project, the teams are

very clear in terms of what they will deliver. And we only

can achieve that by having a proper API contract. It

allows teams really to iterate. And the thing about data

science, I think what I found to be very interesting is that

their iteration, the sprint cycles are very different from

product engineering teams actually.

Syafri Bahar: 00:36:38 There are so many data dependencies when we look at it

from a data science perspective, so we can't really treat it as

an engineering sprint. So they need to be able to have the

flexibility to move at a different pace. But then eventually

their solutions that they built need to match actually the

API.
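
The idea of agreeing on an API contract up front, so that data science and product engineering can iterate at different speeds, can be sketched in a few lines. Everything in the example is hypothetical: the ETA use case, the field names, and the class names are assumptions chosen for illustration and do not come from the episode.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EtaRequest:
    # Hypothetical request fields agreed on before the project starts.
    distance_km: float
    is_peak_hour: bool


@dataclass(frozen=True)
class EtaResponse:
    minutes: float
    model_version: str


class EtaService(Protocol):
    """The contract: anything with this method can be plugged in."""

    def predict(self, request: EtaRequest) -> EtaResponse: ...


class StubEtaService:
    """Placeholder the product team can integrate against on day one."""

    def predict(self, request: EtaRequest) -> EtaResponse:
        return EtaResponse(minutes=15.0, model_version="stub")


class RuleBasedEtaService:
    """A later iteration from the data science team; same interface."""

    def predict(self, request: EtaRequest) -> EtaResponse:
        minutes = request.distance_km * (4.0 if request.is_peak_hour else 2.5)
        return EtaResponse(minutes=minutes, model_version="rules-v1")


def show_eta(service: EtaService, request: EtaRequest) -> str:
    # Product-side code depends only on the contract, not on the model behind it.
    response = service.predict(request)
    return f"{response.minutes:.0f} min ({response.model_version})"


request = EtaRequest(distance_km=4.0, is_peak_hour=True)
print(show_eta(StubEtaService(), request))
print(show_eta(RuleBasedEtaService(), request))
```

Because `show_eta` only relies on the agreed request and response shapes, the model behind the service can be replaced many times without the product-side code changing.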

Syafri Bahar: 00:36:56 And what I also find to be very important is to ensure that

the team is empowered to make decisions, by having a

proper experimentation system, having a robust

methodology to decide whether the team needs to go left

or right. And empowering them to make decentralized

decision making actually. I think that I found to be very

important to ensure that the team can move very fast.

Syafri Bahar: 00:37:20 So we need to trust them with decision making in

decentralized manner, as long as the methodology and

the system that they've actually created to make those

decisions are robust. I hope that answers your question.

Kirill Eremenko: 00:37:31 Yeah. And empower them to fail as well, right? You said

that in one of your other interviews.

Syafri Bahar: 00:37:36 Yeah.

Kirill Eremenko: 00:37:37 Decisions sometimes will be wrong, and they should know

it's okay.

Syafri Bahar: 00:37:42 Exactly. And I'm actually very glad that we [inaudible

00:37:44] or CEOs or co-CEOs [inaudible 00:37:49] are

very supportive of that with the cultures of it's okay if

they actually fail. It's better to fail fast and learn from it,

rather than moving very slowly, because especially the

competition is very fierce, the market also moves very

fast. So agility is definitely something that we value very

highly within our company.

Kirill Eremenko: 00:38:14 Fantastic. Thank you. Thank you for that answer.

Jon Krohn: 00:38:17 It sounds like you guys are doing everything right. Yeah.

If I was in Indonesia and listening, I'd be like, "Man, how

can I get involved with this company?" Really amazing.

You're saying all of the things that I think are spot on

from a quantitative data management perspective. How

you are treating your data scientists and relating that into

the broader operations of the organization and evaluating

it. Brilliant.

Syafri Bahar: 00:38:52 Yeah. I feel also very privileged actually to work with

these amazing people, and I think I learned a lot from my

team. And the ability just to work with amazing people

who are actually distributed. Our teams are actually well

distributed, even our CEO is actually working from the US.

We have 31 nationalities working for the company, so

we're really chasing talent also globally. We have people

across different continents actually working for us. Just

as additional information.

Jon Krohn: 00:39:27 There you go. How did you find yourself here? What was

your journey? I mean, I know that you've worked across

the world, you studied in the Netherlands and then you

worked there for a while at banks, asset management

company. And so what was your journey from that world,

so from a different continent?

Jon Krohn: 00:39:49 I expected when I was talking to you that you would have

been involved in a lot of the finance applications at Gojek.

I thought that that would be what you were working on.

But it sounds like it's much broader than that, so how

did you end up making that journey from financial

companies, really traditional financial companies? Big

banks in the Netherlands to a hyper growth decacorn in

Indonesia?

Syafri Bahar: 00:40:18 Yeah. Thanks a lot actually for asking the question. I

think because especially the journey has been very

intimate to me, and I think the reason for that is because I've

always had doubt whether a pure mathematician like me

is actually able to make an impact for the society. I

always see it as very remote.

Syafri Bahar: 00:40:40 I remember there was some certain time in my life that I

say [inaudible 00:40:44]. Because my background is

actually in pure mathematics, so my thesis back then is

about topological structure basically, so I haven't really

seen data. I worked a lot with writing formulas [inaudible

00:40:58] formulas, and I had my bachelor education.

Syafri Bahar: 00:41:04 It just felt very remote and not [inaudible 00:41:07], and

I actually switched a little bit to the applied mathematics.

I was actually taking education to be a quant, and that

there actually I got myself into a lot of high performance

computing. A little stochastic courses, and then being

able to actually see data. How do you say? It's quite a

spotty journey actually to, how do you call it, to come

from pure mathematics-

Jon Krohn: 00:41:42 [inaudible 00:41:42]. Yeah.

Syafri Bahar: 00:41:42 ... to applied. Exactly, right.

Jon Krohn: 00:41:47 You might not even have had numbers for many years.

Syafri Bahar: 00:41:50 No. No, no, no.

Jon Krohn: 00:41:50 It was just variables, right?

Syafri Bahar: 00:41:54 It's just variables. Indeed, indeed.

Jon Krohn: 00:41:54 That's so interesting.

Syafri Bahar: 00:41:55 Yeah, exactly. And then when I came to bank and I

started actually my education, I was trained I would say

in a very classical environment. I remember one of my

mentors back then, I was requested to do analysis with

only five basic statistics, mean, median, percentile, max,

and min. And I really need to kind of-

Jon Krohn: 00:42:14 Oh, no.

Syafri Bahar: 00:42:15 Yeah, exactly. But I got to really rely a lot on my problem

solving skills, and getting to know what these

measurements are actually doing. Because actually, I was

surprised that a lot of things can be done with these very

basic statistics actually. A lot of insight can be uncovered

by just playing around with the weighted average for

example, and then being able to compare these different

statistics.

Syafri Bahar: 00:42:38 And really make an educated guess in terms of what is

the [inaudible 00:42:41] distribution, is there an anomaly

or not. Actually with some basics, as long as one knows

very well [inaudible 00:42:49] actually a lot of things

can be done. So I was actually trained in that

environment and I was also lucky enough to work with

different types of risk.
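
The five-statistic exercise mentioned above (mean, median, percentile, max, and min) is easy to try with Python's standard library. The sketch below is illustrative only: the transaction amounts are invented, the helper name `five_stats` is hypothetical, and the "mean more than twice the median" rule is an arbitrary assumption used to show how comparing basic statistics can hint at an anomaly.

```python
import statistics


def five_stats(values, pct=90):
    """Mean, median, one percentile, max and min of a list of numbers."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        f"p{pct}": cuts[pct - 1],
        "max": max(values),
        "min": min(values),
    }


# Hypothetical transaction amounts; the last value is an obvious outlier.
amounts = [10, 12, 11, 13, 12, 11, 10, 12, 11, 200]
summary = five_stats(amounts)
print(summary)

# A mean far above the median hints at a skewed distribution or an anomaly.
if summary["mean"] > 2 * summary["median"]:
    print("mean is far above median: check the largest values")
```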

Syafri Bahar: 00:42:59 And actually, for the audience who's not very familiar

with risk management, there are actually different types of

risk. And what's very unique is that each different

type of risk actually deploys a different type of

mathematical tool. Just as an example for credit risk, I

used a lot of predictive funnels with [inaudible 00:43:19]

risk.

Syafri Bahar: 00:43:19 For example, my last type of domain that I've worked with

before I moved to Indonesia, I actually needed to do a lot

of simulation kind of things. I actually maintained Monte

Carlo engine for the bank itself. So basically, what we

need to do, we have a couple of hundred thousand

trades and we need to do simulations of thousands of risk

factors.

Syafri Bahar: 00:43:44 And not only we need to simulate it for one day or two

days later, but really 30 years ahead. So basically, I used

a lot of the parametric simulation techniques in order to

be able to do that.
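
A toy version of this kind of long-horizon simulation can be written with the standard library alone. The sketch assumes a single risk factor that follows geometric Brownian motion with yearly steps; the model choice, the parameter values, and the function name `simulate_paths` are all assumptions for illustration, and a bank's production engine would differ in scale and in modeling detail.

```python
import math
import random


def simulate_paths(start, drift, vol, years, n_paths, seed=0):
    """Simulate yearly values of one risk factor under geometric Brownian motion."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        value = start
        path = [value]
        for _ in range(years):
            shock = rng.gauss(0.0, 1.0)
            # Exact one-year GBM step: the log-return is normal with
            # mean (drift - vol^2 / 2) and standard deviation vol.
            value *= math.exp(drift - 0.5 * vol**2 + vol * shock)
            path.append(value)
        paths.append(path)
    return paths


# Hypothetical parameters for a single risk factor, simulated 30 years ahead.
paths = simulate_paths(start=100.0, drift=0.02, vol=0.15, years=30, n_paths=2000)
final_values = sorted(path[-1] for path in paths)

mean_final = sum(final_values) / len(final_values)
p05 = final_values[int(0.05 * len(final_values))]
print(f"mean after 30 years: {mean_final:.1f}, 5th percentile: {p05:.1f}")
```

Scaling this up to thousands of correlated risk factors and hundreds of thousands of trades is where the high performance computing mentioned earlier becomes necessary.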

Syafri Bahar: 00:43:59 But basically, what I wanted to say is that I really built

the required skill set in a very classical environment,

really bit by bit. And then what I found to be very

beautiful about mathematics is that it's very, how do

you call it, transferrable to other types of domains.

Because the language is the same, especially the

language of linear algebra I think is very useful in order

for me to grasp the new concept as well.

Syafri Bahar: 00:44:29 When I came back to Indonesia, I started at a fintech

company. And then by coincidence I gave a talk at Gojek

actually, and then I got approached by what now becomes

the co-CEOs of Gojek itself and I got hired from a coffee.

Kirill Eremenko: 00:44:46 Nice.

Syafri Bahar: 00:44:47 I'm actually very glad that he took a bet on me until I

basically managed to be where I am now.

Kirill Eremenko: 00:44:56 That's interesting.

Syafri Bahar: 00:44:56 It was quite a series of coincidences actually.

Kirill Eremenko: 00:45:00 That's very interesting. I've got a question, kind of I guess

a question that will challenge me more, and I'd like to get

your opinion on this. The way I teach data science in the

courses is very different to the way Jon teaches, and the

way I guess that you apply data science. I studied also

mathematics, studied mathematics and physics in my

bachelor, but it was a long time ago and I liked it a lot.

Kirill Eremenko: 00:45:30 But the way I applied data science when I was at Deloitte

in an industry, it required very little mathematics. And

that's how I teach it as well. I teach more as like a plug

and play type of instrument that, "All right, machine

learning. Here's an algorithm. I don't know, Naïve Bayes

clustering. This is intuitively how it works, this is what's

in the background. This is what's going on and this is

how you apply it."

Kirill Eremenko: 00:45:57 And I avoid teaching the mathematics. For instance, the

analogy I give is driving a car. To drive a car, you need to

know where to put the petrol, how to steer, where to press

the gas, where to press the brakes. And you need a lot of

practice. That's how you pass your driving test. You never

need to know what a crank shaft is, how it's different to a

cam shaft, what's under the hood.

Kirill Eremenko: 00:46:20 I don't even sometimes know how to put the oil in the car,

for crying out loud. So my question to you is, is there a

right or wrong? Or if you think it's important for people to

learn mathematics in order to be data scientists, then

why?

Syafri Bahar: 00:46:42 Understand. Okay. Yeah. I think it all very much depends on

the type of domain that they will work on in the future,

and what they're interested in. I would say because we

kind of look at the spectrum of applications of

mathematics within data science, I think we can define it

in a couple of clusters actually.

Syafri Bahar: 00:47:01 And particularly in Gojek why it is important for the

people to understand the basics, because we dealt a lot

with what I call green field projects. These are the type of

projects which we can't just Google and get the answer.

We really need to exercise the first principles in order to

understand what kind of mathematical apparatus that we

basically need to deploy to solve the problem.

Syafri Bahar: 00:47:22 [inaudible 00:47:22] can just come to us and say, "Hey,

we have this amount of budget. I want you to be able to

distribute it in an optimal way." Very, very vague and

ambiguous, so one is really required to ask more, ask direct

questions first of all to understand the real problem.

Syafri Bahar: 00:47:38 How do you define optimal, what are the different levers

that we basically can use to distribute those things, and

how can we basically use the right apparatus to model

the problem itself? What I want to say, basically that

that's also the reason why we emphasize this a lot, the

context.

Syafri Bahar: 00:47:56 Even for example [inaudible 00:47:58] linear regression

during an interview. I think what we sometimes do, we try

to tweak it a little bit, the problems. "Hey, what if we take

this L1 penalty, what if we take L2 penalty? What if for

example we shift the distribution of the target variables to

become very highly imbalanced?"

Syafri Bahar: 00:48:15 Just to test the ability of the candidate to adapt to

different reality that they might encounter while working

on the problems within Gojek. And the reason why we do

it is because we think that's a relevant skill set to have.
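
The L1 versus L2 interview question has a compact answer in the one-feature case, where both penalties have a closed form. The sketch below assumes a regression through the origin with a single feature and a specific scaling of the objective (stated in the docstring); the function name `fit_slope` and the data are hypothetical.

```python
def fit_slope(xs, ys, l1=0.0, l2=0.0):
    """Slope of a one-feature, no-intercept regression with optional penalties.

    Minimises 0.5 * sum((y - w * x)^2) + l1 * |w| + 0.5 * l2 * w^2.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # L1 soft-thresholds the correlation term; it can push the slope to exactly zero.
    shrunk = max(abs(sxy) - l1, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    # L2 inflates the denominator; it shrinks the slope but never zeroes it.
    return sign * shrunk / (sxx + l2)


xs = [1, 2, 3]
ys = [2, 4, 6]  # exactly y = 2x

print(fit_slope(xs, ys))           # no penalty
print(fit_slope(xs, ys, l2=14.0))  # ridge-style shrinkage
print(fit_slope(xs, ys, l1=14.0))  # lasso-style shrinkage
print(fit_slope(xs, ys, l1=28.0))  # a large enough L1 penalty gives a zero slope
```

The contrast between the last two calls is the usual point of the question: an L1 penalty can remove a feature entirely, while an L2 penalty only shrinks its coefficient.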

Syafri Bahar: 00:48:30 I can imagine for example when one will focus a lot on

building the data science platform or engineering

platform. [inaudible 00:48:41] to know the kind of two,

three layer [inaudible 00:48:44]. Like what you said, more

like a plug and play, but I think the emphasis will be how

to design the right architecture that can be very scalable.

And how do we use the mathematical concept to cut some

of the computational resources that basically go into

that?

Syafri Bahar: 00:49:00 And maybe in that case, it will be less obligatory to know

the two, three-layer depth. So maybe I apologize because

there is no straight answer, but I think it all depends.

And I think for the Gojek context it's very important to

understand those basics, because then the choice of

apparatus to deal with the problem is just quite immense.

Syafri Bahar: 00:49:22 We employ the economic technique, we employ also the

operation research technique for example in our

problems. If we play around with logistics, sometimes also

predictive models, supervised and unsupervised. And

even to some certain extent also some [inaudible

00:49:38] type of algorithm. There's just quite a lot of

possibilities over there, so it's really important to know

the at least two, three layers deep from the mathematical

perspective.

Syafri Bahar: 00:49:51 But I think for my personal opinion, I think it is also very

important to basically, how do you call it, like a painter.

Sometimes we need to be able to really bring people to

the, how do you call it, to appreciate the painting itself.

And I think sometimes the best way to do it is that by not

starting with [inaudible 00:50:16] differential calculus.

Syafri Bahar: 00:50:17 But really starting with the stories and then, "Hey, why

this is important. Why [inaudible 00:50:22] is important.

Because hey, we can actually translate that fraction of

this problem by bringing it to the [inaudible 00:50:28] for

example." Then they're able to imagine the solutions of

the problem.

Syafri Bahar: 00:50:33 I think it depends. And I think my personal preference is

always to start very simple and then try to peel the layers

one by one, bringing them to a bit more, how do you call

it, depth. The required depth actually necessary. That's

just [inaudible 00:50:49] a lot of it in learning as well.

[inaudible 00:50:52] language or the way you present

your teaching.

Jon Krohn: 00:50:59 I love that answer, and I don't think I have too much

extra to add. I think that to kind of summarize the value

of understanding the underlying mathematics is that I

love the car driving analogy. But the beautiful thing about

machine learning is it isn't necessarily actually that

complicated, what's going on under the hood.

Jon Krohn: 00:51:29 And so I actually started teaching exactly the same kind

of way that you described teaching, Kirill. And it's only

relatively recently that I was like, "Maybe it is worth

getting into the partial derivative calculus, the linear

algebra that's happening under here." And I was inspired

to think that by colleagues of mine, people who work for

me.

Jon Krohn: 00:51:54 I would see them doing matrix algebra or I would see

them thinking about, "What's the right data structure for

this particular type of data in this model because of how

we're going to be scaling it, so that we can minimize

computational resources." I was seeing people use these

underlying understandings to make on the science side,

huge intuitive breakthroughs that by only understanding

the [inaudible 00:52:29] API, there's no way you could

have had that breakthrough.

Jon Krohn: 00:52:33 And then on the engineering side, being able to think

about, "Okay. What is the time complexity or the memory

complexity of what I'm doing here? And then how can I

maybe make adjustments there, trade-offs between

computational complexity versus memory complexity, so

that I can use fewer resources, or maybe have a faster

experience? Realtime experience for my users."

Jon Krohn: 00:53:02 It's a really recent thing for me that it seems so valuable,

but the more and more I dig into it, the more and more I

appreciate that, "Wow. There's so many possibilities." And

there's still absolutely a time and a place for using the

high-level APIs.

Jon Krohn: 00:53:19 I mean, maybe more often than not. But to be making

really cutting-edge algorithms, or to even be

understanding and trying to deploy some of the latest

things that you read that might only occur in papers or

graduate-level textbooks. There might not be a high-level

API for you to use yet. So if you wanted to make

CartoBERT, you can't just be able to use BERT. You have

to understand what's happening in BERT.

Syafri Bahar: 00:53:51 Yeah. Fascinating. If I can add a couple more sentences

to that, I personally think that I'm on a personal mission to

really spark interest from people. Especially in Indonesia,

to really find this discipline to be fascinating. I really

want the people in Indonesia in their job for example are

being asked, "What do you want to do in the future?"

Syafri Bahar: 00:54:17 Instead of saying astronaut or doctor for example, they

say, "I want to be a data scientist." I think what I wanted

to kind of emphasize, I think there's so many beautiful

things which you can actually put in more intuition, in

order to just make the first bridge for people to cross that

bridge to find it interesting.

Syafri Bahar: 00:54:34 And I found communications via wrapping up things in

terms of intuitions, like what Kirill just mentioned. I think

it's very helpful to spark their interest and really for

people to get interested and really to get motivated, and

they will give energy in order to go even deeper, to a

deeper level.

Syafri Bahar: 00:54:53 But I think I'm still learning. I try to also learn how can I

actually present all these different complex concepts to

actually make it very simple, intuitive, and exciting as

well. I think that's kind of my personal mission. I'm still

learning of course, but I think they're just so beautiful in

terms of discipline. And I think a lot more people actually

can benefit from that, and especially the society.

Kirill Eremenko: 00:55:22 Thanks. Thanks, guys. I asked to get challenged and I feel

challenged. Yeah. I think it's a good perspective that

there's room for both to get started, go down the intuition

path, but then always keep in mind that you can go

deeper and it'll give you more superpowers with the

mathematics.

Kirill Eremenko: 00:55:42 Syafri, you mentioned that Gojek is hiring, so where can

people apply? And then I wanted to ask you a second

question. What does it take to thrive as a data scientist at

Gojek?

Syafri Bahar: 00:56:00 Sure. Thanks a lot for asking this question specifically. It

helps us a lot. And I think that we can always find the

open positions actually within the recruitment. And I

think if people just type Gojek recruitment, they'll pop up

basically the website where they can see what are the

available positions at Gojek.

Syafri Bahar: 00:56:26 And I think the second question is very interesting. I

think the fact that the company itself is, how do you call

it, we're going to the next phase now. It used to be that

we were in this very high growth phase, so to say where

things are a little bit ambiguous, I would say sometimes.

So people who can navigate in an ambiguous

environment will thrive within Gojek. People who can

actually systematically approach problems in general,

they will thrive. And I think it also takes a lot of

determination and grit to push things as well as a data

scientist.

Syafri Bahar: 00:57:03 And I think this is also the type of data scientist who

actually does not just stick with the conventional approach to

things, but data scientists are required also to be able to

exercise first principles. And I think those are the type of

data scientists who can actually thrive within the Gojek

environment.

Syafri Bahar: 00:57:23 So they need to be diverse enough to know

what are the different apparatus available to solve

problem, and be very skillful enough to. And I think also

it's being intellectually humble, to really acknowledge that

we don't know what we don't know. Because sometimes

it's just really like asking.

Syafri Bahar: 00:57:43 Sometimes actually there are a lot of things that are actually

hidden behind all of these numbers and digits that we're

seeing on our screen, as a data scientist. So I think I

often also ask my data scientists just to go to the field.

Really talk to our drivers, really understanding their pain

points, and then that way it actually allows them to

understand and to basically rationalize what they see

under there.

Syafri Bahar: 00:58:11 How do you say, [inaudible 00:58:11] in terms of all the

figures and numbers. And then from those intellectual

curiosities, they are able to frame the problems correctly,

and then frame it as a data science problem. And then

again, the next level will be to find the right apparatus.

Syafri Bahar: 00:58:28 And I think another quality that I think also will help a lot

will be to be very practical. If you look at the overall in the

company, there are a lot of problems that require simple

solutions. Because there are a lot of low hanging fruits in

the company, these are the type of problems that we need

basically effort 80%. We can achieve the standard

solution.

Syafri Bahar: 00:58:51 But there are also rather mature problems where in order

to go from 95% to 97%, then we will need the

fundamental research. And I think what I always told to

my team is that we should be fine using a hammer to kind

of hammer the problem, but we should not shy away from

using a scalpel as well in really formalizing the solutions. I

think this type of mindset will basically help people to thrive

within the Gojek environment.

Kirill Eremenko: 00:59:22 Wow. Fantastic. Thank you. Thank you for sharing that.

Jon Krohn: 00:59:25 Yeah. I think something that if people get a chance to

check out the video version of this podcast, you can see

Syafri is so happy this whole time talking about modeling.

And maybe that even comes through in the sound of his

voice, but there's so many points where he throws his

head back with a big smile, because you're so enlivened

by these questions and these ideas. It's wonderful to see.

Syafri Bahar: 01:00:01 Yeah. Thanks a lot, Jon and Kirill.

Kirill Eremenko: 01:00:01 Yeah. In one of the videos, you mentioned in one of your

interviews that when you were working as a quant back

in Europe I believe, you realized that the impact you

make cannot extend much further beyond the company

you work for. And that you just want to do more, you

want to work... Your quote in quotation marks, "I just

want to do more. I want to do work to benefit a lot of

other people." Do you feel that you're doing that at Gojek

now?

Syafri Bahar: 01:00:32 Yeah. As a matter of fact, I do. And I actually feel lucky

myself, because when I wake up in the morning I still feel

that day is my first day, to be honest. Because I'm really

still very motivated to solve different problems. And the

thing about Indonesia is that there are just so many

structural inefficiencies within the country, that I believe

people like me and other...

Syafri Bahar: 01:00:58 I think there's also an interview where I specifically call

all the expats over there, like Indonesian people who live

abroad, to just come and really contribute to the country.

Because there's just so many structural issues that we

need to fix, and I think exploitations of natural resources

is one way to extract values.

Syafri Bahar: 01:01:15 But I think solving structural inefficiency is also one way

to create value for the system actually. And I feel actually

blessed and privileged also to have the opportunity to

really be able to serve the community. Because these are

products that I can really relate to. My family will say,

"Hey, I feel that this app actually has helped me to

remove the daily frictions."

Syafri Bahar: 01:01:43 Even for example, there is something bad happening I'll

get immediate feedback. And even because I also

sometimes, before pandemic of course, I go ride to office. I

talk to the drivers as well, and then he mentioned how his

life actually has changed since he became one of our

partners. He was able to, for example, adopt a couple of

children because of the fact that he works as a partner, a

driver partner within our platform.

Syafri Bahar: 01:02:14 I think those are all the stories that really keep me going

through the day, and I feel blessed to be honest, to be

able to have the opportunity to do that. Especially with

my remote discipline, what's considered to be very

remote. Mathematics, computer science, and social

impact.

Kirill Eremenko: 01:02:36 Wow. Thank you. That's very inspiring to hear. I wish for

as many people listening as possible to feel the same way

at work. It's clearly a very fulfilling place to be in.

Syafri Bahar: 01:02:57 Thanks.

Kirill Eremenko: 01:02:59 That's awesome. Jon, do you have any questions to finish

off?

Jon Krohn: 01:03:04 No. We've covered all of my questions and I love the ones

that you asked as well, Kirill. I've learned so much today.

I can't help but notice that it seems like Gojek's mission

is to impact at scale through technology. And so it

sounds like you're really living that as a data scientist at

the firm, Syafri.

Jon Krohn: 01:03:28 I don't have any other questions. I just felt like saying

that one more time, kind of reinforcing this idea of with

probably the vast majority of people listening to this

podcast are data professionals, or aspiring data

professionals. And to hear a story like this today, it made

me feel inspired and so I hope you feel inspired, too, to be

identifying places that you can be making a big positive

socioeconomic impact with your skills. Even if you started

with a pure math topology background, you too can make

a difference.

Syafri Bahar: 01:04:08 Oh, that's a nice [inaudible 01:04:09] over there, Jon. And

thanks a lot. I really enjoyed the conversation actually.

You have done a fantastic job in really controlling the

flow, and just really participating as well. Genuinely ask

questions I think. And I think to a lot of data

professionals out there, I still fundamentally believe in the

futures of our professions actually.

Syafri Bahar: 01:04:32 I think we can do a lot of things for the community, even

for the world in general. I think we just scratch the

surface of what data actually can do and bring to lives of

millions of fellow people out there. I would really

encourage people who are in their learning journey to

keep going, find their energy and their motivation to keep

going. Because there's a beautiful thing and it's worth to

really put investment in really enhance the professional

and also the knowledge on the industry itself.

Syafri Bahar: 01:05:07 Thanks a lot. And I think both of you also have inspired

people with the podcast, and also especially for the

aspiring data scientists and data professionals out there.

Thanks a lot for that contributing back to the community.

Kirill Eremenko: 01:05:25 Thank you, Syafri. It's been a really cool podcast. And for

those of our listeners who want to or would like to

connect with you or maybe just follow how your career

progresses, where are some of the best places to get in

touch?

Syafri Bahar: 01:05:37 Yeah. I think the best to get in touch with me on my

LinkedIn actually. I don't have a social media, like

Instagram or Twitter, intentionally. But I think the best

place to connect with me will be on my LinkedIn actually.

Kirill Eremenko: 01:05:53 Thank you. We'll share.

Syafri Bahar: 01:05:54 And actually, I do [crosstalk 01:05:55]- yeah. Sorry.

Kirill Eremenko: 01:05:56 Sorry. You go ahead.

Jon Krohn: 01:05:57 Go ahead. Sorry. I didn't realize who was Kirill speaking

to. Syafri, you go ahead. You go ahead.

Syafri Bahar: 01:06:15 Thank you. Thank you. Thank you. I was actually

thinking also sharing more materials and sharing some of

more thoughts as well actually. I felt that I could have

done it a bit better because, especially for a lot of aspiring

data professionals in Indonesia. I think one of the things

that I personally commit at least to 2021, so hopefully

more content that I can share to the community as well in

the future.

Kirill Eremenko: 01:06:29 Nice. Jon?

Jon Krohn: 01:06:33 I was just going to say that on the LinkedIn point that I

don't think, Syafri you shouldn't feel ashamed that

LinkedIn is your go to social medium, because I think

Kirill and I feel exactly the same way.

Kirill Eremenko: 01:06:43 Yeah. Absolutely.

Syafri Bahar: 01:06:43 Okay. It makes me feel better at least that I'm not the

only one there.

Kirill Eremenko: 01:06:50 Yeah. That's the only one that I really use. I don't think I

use any other ones.

Jon Krohn: 01:06:57 Same.

Kirill Eremenko: 01:06:58 Yeah. Syafri, one final question for you. What's a book

that you would like to recommend to our listeners?

Syafri Bahar: 01:07:07 Yeah. In terms of books there's actually quite a lot that I

have in mind. But maybe just to select few of them,

definitely Elements of Statistical Learning is a good start.

I recently also get myself into more causal learning,

basically because it happens to be that we're in the space

where we will need it a lot actually.

Jon Krohn: 01:07:26 Judea Pearl?

Syafri Bahar: 01:07:27 Yes, yes. That is for the mathematics. There's also the

title is What If? I forget the author again, but what I think

is also a good mix of combinations of theory and practice

as well. And Judea Pearl is definitely, if you're into math

itself, I think you will enjoy reading Judea Pearl's book on

that. And I also like-

Kirill Eremenko: 01:07:56 What's it called? What is it called? The book.

Syafri Bahar: 01:08:00 The book of-

Kirill Eremenko: 01:08:01 Judea Pearl.

Jon Krohn: 01:08:01 Causality.

Kirill Eremenko: 01:08:04 Causality, okay.

Syafri Bahar: 01:08:05 Causality, yeah. Exactly. Yeah. And Elements of

Statistical Learning is also a good book, as I mentioned

earlier. And there's also this 100-page machine learning

book that I just from time to time like to read as well as a

refresher, because it condensed everything within one

book.

Jon Krohn: 01:08:25 Nice.

Syafri Bahar: 01:08:25 Do you happen to recall again the name of the author,

Jon? The 100.

Jon Krohn: 01:08:30 It's Andriy. It's so embarrassing, I can't remember.

Syafri Bahar: 01:08:37 Burkov?

Jon Krohn: 01:08:38 Yeah, that's right. Andriy Burkov. Exactly.

Kirill Eremenko: 01:08:42 Oh, yeah. Andriy Burkov. Jon, you might have him on the

podcast sometime. We've been talking with him.

Jon Krohn: 01:08:47 Well, he's been making quite a splash. I would love to

have him on the podcast.

Kirill Eremenko: 01:08:52 He's from Canada, right?

Jon Krohn: 01:08:53 I think he's in Montreal.

Kirill Eremenko: 01:08:55 He's Russian, ex-Russian but in Canada.

Jon Krohn: 01:08:59 Yeah.

Kirill Eremenko: 01:09:00 Awesome. Okay. Well Syafri, thank you so much. Jon,

thank you a ton. It's been a huge pleasure being part of

this podcast. Been great.

Jon Krohn: 01:09:11 Same.

Syafri Bahar: 01:09:12 Sure. Yeah. Thanks a lot. Thanks, Kirill. Thanks, Jon.

Kirill Eremenko: 01:09:20 There you have it, everybody. Hope you enjoyed this

episode and enjoyed the conversation we had with Syafri

and Jon. I definitely had some great laughs. My favorite

part of this episode, there's lots of really cool insights that

we shared.

Kirill Eremenko: 01:09:36 My favorite part was the use case that Syafri shared

around CartoBERT and how they modify BERT, and how

they used it to analyze all those interactions between

customers and drivers, to figure out how best to

optimize their logistics for pickups, and also how as a result

it helped reduce the number of calls, and basically

improve efficiency.

Kirill Eremenko: 01:10:06 Also, I really enjoyed hearing about what Syafri

mentioned about meaning and purpose, that he is very

excited to be helping people to be contributing to

improving people's lives, in that example that he shared

of a driver that was able to adopt children. I think that's

very noble and I wish for all data science to ultimately

result in great things for communities and people across

the world.

Kirill Eremenko: 01:10:37 That would be very good, and if we all look out for that

and try and strive to find jobs, and make our jobs about

impact, I think that will help serve the world and also

create more happiness around the world.

Kirill Eremenko: 01:10:55 As usual, you can find the show notes at

SuperDataScience.com/427. That's

SuperDataScience.com/427. There you'll find any

materials that are mentioned on the show, all of the

books that Syafri mentioned and Jon mentioned as well.

Plus the URL to Syafri's LinkedIn. We'll also include the

URL to where you can apply for a job at Gojek as a data

scientist, if you would like to explore that further.

Kirill Eremenko: 01:11:24 Make sure to connect with Syafri, make sure to connect

with Jon. They're both open to connecting on LinkedIn.

And yeah, you'll hear more from Jon in the coming weeks

as mentioned in the beginning. There'll be this transition.

I'll talk more about that in the coming episodes.

Kirill Eremenko: 01:11:40 And yeah, on that note if you enjoyed today's episode,

make sure to share it with somebody. It's very easy to

share. Send them the link, SuperDataScience.com/427.

And I look forward to seeing you back here next time.

Until then, happy analyzing.