Post on 30-Aug-2019

Show Notes: http://www.superdatascience.com/170

SDS PODCAST

EPISODE 171

WITH

NATHAN STEPHENS

Kirill Eremenko: This is episode number 171 with director of solutions

engineering at R Studio, Nathan Stephens.

Welcome to the Super Data Science Podcast. My name

is Kirill Eremenko, data science coach and lifestyle

entrepreneur. And each week, we bring you inspiring

people and ideas to help you build your successful

career in data science. Thanks for being here today,

and now let's make the complex simple.

[Music 00:00:35]

Welcome back to the Super Data Science Podcast,

ladies and gentlemen; super excited to have you on

this show. And today, all the way from R Studio, we

have Nathan Stephens joining us. So, a lot of you

already use R programming in your data science

careers, or in your data science education. If you don't

use R, then you probably have heard of it in one way

or another. R programming is one of the two titans. R

programming language is one of the two titans of data

science, alongside Python. It's one of the two

languages that we use, predominantly use to create

models, do machine learning, do data science, build

deep learning models, even create artificial

intelligence.

And today, we have Nathan Stephens joining us. And

he's a director at R Studio. And R Studio is, by far, the

most popular program through which you program, or

through which you code in R. And in this podcast, we

had a great time. We had a blast. So, some of the

things that we chatted about are Nathan's

background. I deliberately went through all of his

background, because he's got such an interesting

story. Even before we got to R Studio, there was so

many fun and exciting things that we talked about.

And one of them being what an analytic admin does.

Because Nathan is now in solutions engineering, he

knows a lot about what goes into building the

environment, building the infrastructure for a data

scientist, or a data science team, or a data-driven

company, so that is a very valuable part of our

conversation.

If you're not familiar with things like data engineers,

data architects, data analytics admins, servers, and all

these other things that are components of a data

science environment, highly recommend checking it

out, listening to the podcast, because you will learn a

lot about that. And after we meticulously went through

Nathan's career, we finally got to R. So, you'll learn a

lot about what the R language is all about, where R

Studio is headed, what the recent updates are, who

they just hired, how they compare to Python, and all

these other cool and exciting topics.

So all in all, very exciting podcast; can't wait to get

started. Let's dive straight into it. Without further ado,

I bring to you Nathan Stephens, director of solutions

engineering at R Studio.

[Music 00:03:20]

Welcome, ladies and gentlemen, to the Super Data

Science Podcast; super excited to have you back. And

today, we've got a very exciting guest, Nathan

Stephens, director of solutions engineering at R

Studio. Nathan, welcome to the show. How are you

doing today?

Nathan Stephens: I'm doing great. Thanks for having me on, Kirill.

Kirill Eremenko: Thank you for coming on. And where are you calling in

from?

Nathan Stephens: The Baltimore-Washington area.

Kirill Eremenko: Baltimore, we were just talking about it. You're like, in

between Baltimore and Washington, like can't decide

which one it is?

Nathan Stephens: Yeah. I go back and forth. I'm a little closer to

Baltimore.

Kirill Eremenko: And where is work? So home is in between, where's

work?

Nathan Stephens: Well, the company is, you know, technically based in

Boston, Massachusetts, but we all work from our

homes. So, I work from my home office.

Kirill Eremenko: Okay, wow. Well, that's so cool. We'll get to that in a

second. So, how's the weather in Baltimore?

Nathan Stephens: It's been very, very wet and cold, which has been great

for my lawn. [crosstalk 00:04:26] The yard's doing

great.

Kirill Eremenko: Wow. And we're in June. Why is it so wet? Like, does it

get hot in summer?

Nathan Stephens: I don't know. Yeah, no, it's usually a lot warmer than

this, but I haven't been to the pool yet. It's just been

an especially cold June, but the kids are eager to get

into the pool.

Kirill Eremenko: Wow, crazy. How many kids do you have?

Nathan Stephens: I've got two; two young boys.

Kirill Eremenko: Nice, very nice. It's pretty insane, what's happening

with the weather, right? Like in California, you have

these fires all the time. And then, you have the

hurricanes down south. And then now, it's like wet

and cold in summer in Maryland. Don't know what to

expect.

Nathan Stephens: Yep. So, I'm actually from California, so I'm used to the

earthquakes and fires. And then, I lived in Kansas,

and I got used to the tornadoes. And now I'm out in

the east, and we do hurricanes. So, pretty familiar

with all of those things.

Kirill Eremenko: Interesting. And so, out of all those places, you found

your home in Baltimore? You recommend that as like

the nicest place to settle down?

Nathan Stephens: Well, yeah. I think I came out for a job, and the jobs

out here are plentiful. And it's a great place to build a

career. I think Washington DC attracts people from all

over the world, especially in the United States. It

brings a lot of people in. So you know, it's just a

crossroads for a lot of people. And I find that really

exciting, a lot of fun. [crosstalk 00:05:47] So, it's been

good to build a career out here. It's a good place to

work.

Kirill Eremenko: Gotcha. Okay, all right. Well, being the director of

solutions engineering at R Studio, you warned me just

before the podcast, before we started recording, that

the podcast is going to be R-focused. And I wanted to

pass on that message to our listeners, that this

podcast is R-focused. And we're going to learn all

about the lovely language of R, and what it's been up

to in these past years, and where it is currently.

But before we jump into that, Nathan, could you give

us a quick overview of your background? Like, coming

from California, where did you study? And what took

you on this journey into data science, because

ultimately, our listeners are all very interested in

following this journey from the start, how you went

about getting into data science.

Nathan Stephens: Yeah, I'll do my best to keep my answers brief. I

actually learned R and SAS at the exact same time

when I was an undergrad in college, and that would

have been 1999. So, I'm a very old R user, and a

somewhat young SAS user. And I learned both of those

through the statistics department at my university,

and that was a really great experience. Statistics

taught me how to think scientifically. You study

hypothesis testing. You study science as a statistician.

And then, there's this notion of making, doing

empirical work by studying data and applying your

knowledge to actual problems that I found very

interesting in statistics.

So, I actually got off to a great start. I was very

fortunate, very young in my studies to get some great

programming languages, some great scientific

thinking, and then exposure to applied science with

data backing up those conclusions.

Kirill Eremenko: Gotcha.

Nathan Stephens: That set the foundation for everything that would come

later.

Kirill Eremenko: And so that was in your undergraduate?

Nathan Stephens: That was my undergraduate, yeah, when I was in

university.

Kirill Eremenko: Gotcha. And where did you go after that?

Nathan Stephens: After I graduated?

Kirill Eremenko: Yeah.

Nathan Stephens: Yeah. So, I made this interesting detour over into

actuarial science, and that's a whole nother discussion

entirely. That didn't last very long. I went back to grad

school after I tried my hand at actuarial science. I

didn't find that to be particularly satisfying. It didn't

suit my interests. So I went back to graduate school,

and I got a master's degree in statistics.

Kirill Eremenko: Gotcha. And just for maybe like our non-English

speaking listeners, or for whom English is not their

first language, actuarial science, because it took me a

while when I [inaudible 00:08:47] to wrap my head

around what that means. It's like statistics applied to

population and demographics. Is that correct?

Nathan Stephens: Yeah, yeah. It is a broad field. Statistics is, actuarial

science is actually a regulated practice in the United

States. It's like being a lawyer in the United States.

You have to actually have some sort of license to

practice actuarial science. And so if you want to be an

actuary, you have to go through this series of exams,

and you have to comply with certain regulations in

order to practice it.

Kirill Eremenko: Okay, gotcha. All right. And so, then you did a

master's in statistics. And where did that take you,

after that?

Nathan Stephens: So after leaving my master's program, I worked for a

manufacturer of greeting cards in the Midwest. And I

worked in their research department, and that was a

really, really good experience. I got to cut my teeth on

a lot of very interesting problems there. I also got to do

more R there as well.

Kirill Eremenko: Okay. And-

Nathan Stephens: So just to characterize that, you have to keep in mind,

this is back in like, 2005. So you know, Hadoop hasn't

even really caught on yet, right? Big data's kind of on

the ramp-up. Data science hasn't been coined as a

term. There's no such thing as data science. It hasn't

been, that term hasn't been invented at this point. And

most analytic jobs are sprinkled throughout the United

States. So as a statistician in 2005, when you're

looking for a job, you're actually ... Actually, I got my

job in 2004, so let's say 2004 ... you're actually looking

for little pockets of analysts here or there. They didn't

really clump together in large amounts, by and large.

And so, you're actually fighting for those jobs. We've

come a long way, right? So it's like, back in 2004,

you're actually fighting for a job where a statistician

can work.

Kirill Eremenko: Yeah. And it's such a different world, right? Like back

then, data science was pretty much statistics, right? It

was called statistics. And I had a guest on the podcast

like a few months ago, who put it very aptly; that the

difference between statistics and data science is that

in statistics, you still have to think through a lot of the

mathematical components, come up with eloquent

equations, and so on, and solutions; whereas in data

science, a lot of the time you can just brute force your

way through things, facilitated through different

machine learning algorithms.

Nathan Stephens: Yeah, I think that's fair. I think data science is a term

that's really grown on me over time, because I think

statistician is a little too narrow to define what the

world really needs. And the term data science is such

a broad umbrella, you know, almost nebulous term;

that it does a pretty good, that that's the strength of

that term, that it actually just, it's all-inclusive of this

idea that we're going to use data, we're going to be

data-driven, we're going to be scientifically minded, and

we're going to apply that information to problems.

So, I really like the idea that that's a general, nebulous

term. I think that's the strength of the term.

Kirill Eremenko: Yeah. And also, that allows people from different

backgrounds to come into data science, right? Like, it's

not just statisticians or mathematicians. I know people

who were in something very creative, like acting, and

they leveraged their skills in data science through the

component of communication of their results.

Nathan Stephens: Exactly, exactly. It's very inclusive. If you want to be in

data science, we welcome you in. Please, be a data

scientist. We need more data scientists. We want

people to, yeah, think scientifically in their view of the

world.

Kirill Eremenko: Yeah, gotcha. True. Okay, and so you worked with

Hallmark Cards for a couple years.

Nathan Stephens: Yep.

Kirill Eremenko: And where'd you move on, after that?

Nathan Stephens: So after Hallmark Cards, I worked for an ad network.

And at the ad network, I got to build ... This is where I

start, well, my background's always been in big data.

So even at Hallmark, I was working with massive

datasets, mostly on Teradata. At the ad network, I got

to work with large amounts of data on data sources

like Netezza, Greenplum, and that's where I started

learning Hadoop. We were early adopters of the

Hadoop platform. And this is also at the same time

when AWS was coming online. So, AWS was spinning

up and doing all sorts of interesting things. And we got

to jump on that platform.

Kirill Eremenko: So, is that around 2012?

Nathan Stephens: No, no. This is around like, 2008.

Kirill Eremenko: Ah, okay.

Nathan Stephens: Yeah, so we were early-on adopters of Hadoop at that

point.

Kirill Eremenko: Okay. I mean, AWS, was it coming online around 2008

as well?

Nathan Stephens: Yeah, yeah.

Kirill Eremenko: Okay, cool.

Nathan Stephens: Yeah. So, part of my good fortune has been to work

with really interesting, you know, managers and

leaders in my career, so that's been a real fortunate

thing. And I always encourage people to, you know,

when they go to select their jobs, put a lot of emphasis

and weight on the person that you're going to be

reporting to, because that person's going to dictate a

lot of things about the quality of life of the job, and

also future opportunities that you'll have. And at the

ad network, I had just a real great visionary, who was

very passionate about cloud technology and

distributed computing. And so, yeah, we went down

that route. It was a very exciting time, actually.

Kirill Eremenko: That's really cool, because on the podcast, sometimes I

mention that it's important to, during the interview,

when you're applying for a job, important to

understand what the job itself will be and will entail,

and the company itself, because that shapes your

future. But you're right, you have to also understand

the person who you're going to be working for, who's

your direct manager. What are they like?

Nathan Stephens: Yeah.

Kirill Eremenko: Yeah. Like you, I've been fortunate to have some very

impactful direct managers in my life. What would you

say was your one biggest takeaway that pops to mind

from that person at the ad network?

Nathan Stephens: Oh, with that manager?

Kirill Eremenko: Yes.

Nathan Stephens: I think there's this notion of, you know, rejecting the

status quo, right; thinking differently, accepting new

ideas. I think there's also this, with him ... I'm

struggling to explain it, but he was very interested in

philosophy. And he was a much broader thinker,

right? So, it's nice to work with somebody who has a

broad world view, and can kind of articulate how the

work that we do in technology fits into that world. So, I

found that really interesting as well.

The other thing that was interesting about him, and a

lot of my managers, is that I've had very few statistical

managers, people that really, actually can do what I

do, which has been a real blessing, because it allows

me to differentiate myself and bring something

valuable to the table; but it also allows me to pick up a

lot of the skills that I hadn't acquired through my

normal channels. For example, like, you know, the

consultative work, and being successful, navigating up

the political landscape of a corporation, right? But

also, a lot of the engineering work, a lot of the ETL

pipelines, a lot of these things, you know, my

managers and other people that I've worked with have

brought to me.

So, I think it's great to work with a manager who

complements your skills as well. Or at least, that's one

thing that has been really nice in my experiences. You

know, learn from your manager [inaudible 00:17:04]

doesn't have the exact same background you do.

Kirill Eremenko: Yeah, gotcha. And that's actually a sign of a good

leader, when a person can hire somebody that's better

than them at something. Because like, sometimes

managers can be a bit intimidated if their reports are

like, better than them at something. And therefore,

that team won't work out; but like in your example,

that worked perfectly fine. And that usually, for me,

shows that the leader knows what they're doing, and is

confident enough to lead a team of experts in different

fields. They don't have to be an expert themselves in

those same areas.

Nathan Stephens: Yeah. I think it was funny. I remember this time when

this manager in particular, he did a statistical analysis

and presented it to me and a few other people on the

team. And we kind of shot it down. [inaudible

00:17:57] that he didn't do this right. He was so

gracious about it. He's like, "Oh, okay. Okay, I see." It's

like, "I'll leave it to you guys." That's good. It was all

done in good humor, but we're like, "Yeah, yeah. That

wasn't right." [inaudible 00:18:10]

Yeah, but you know, I think diversity is good, right?

Diversity, I'm a big proponent of diversity and building

diverse teams. And that's another thing that this guy

did. He built a team. And it was kind of funny that we

called it the data analytics team at the time, because

data science, again, wasn't a term; but we had data

experts, data engineers. We had machine learning

engineers, system integrators, DevOps people, and

statisticians, as well as domain experts. So, we had

this nice crosscut of everything that you would need to

build a singular data science team that can pretty

much lay waste and devastation to the world. Like, we

had all the capabilities that we needed in that team,

because it was a cross-functional team. And that was

great. That was just a wonderful experience.

Kirill Eremenko: Gotcha. And in terms of the work and tools that you

used at the time, and techniques, would you say that

advertising, data science in advertising now is

different, is radically different to what it was back

then, in the 2008 to 2011 period?

Nathan Stephens: Well, certainly the complexity has risen. I think the

main objectives are pretty much similar when it comes

to targeting and promotion. Advertising is still

advertising. I think one thing that I found fascinating

about going from a manufacturer of greeting cards to

an ad network as a statistician was, I used all of the

same skills in both places. So, when I went into my

next gig, the skills carried over. So, I was still doing

predictive models, segmentation, clustering,

supervised and unsupervised learning techniques. I

had to still scrub data. I had to understand the data.

So, the principles of doing the data didn't seem to

matter so much with the application. I was still using

those exact same principles, despite the fact that I was

going from one domain to another domain.

Kirill Eremenko: Okay, interesting. And before we move on to your next

role in your career journey, just a quick question on

working with ads, because even today, or especially

today, advertising is one of the biggest applications of

data science. What would you say to people who are

studying data science, and are considering a role in

advertising, but have never had any exposure to using

data science for advertising? I guess the core of the

question is, is it a fulfilling experience? Is it something

that you can build a career around, and at the same

time, not feel like sometimes we see in the movies,

where people just feel like all they do is sell, sell, sell

all the time, and they have no meaning to their lives?

Nathan Stephens: Yeah. I actually have a lot of, I've actually had that

same question in my own experience. And I think it's

an existential question, right, to say like, what is

fulfilling to you, and what is meaningful to you in your

life. I mean, as a statistician, you aren't rushing into

burning buildings and saving children from fire, right?

And you're not saving people from cancer. You're not

fighting world ... Well, you might fight world hunger as

a data scientist. I mean, and you can work on these

areas [crosstalk 00:21:36]. Yeah, you can do that.

And so, I think finding what's fulfilling to you is an

individual question. What I will tell you about my

experience in the ad world is that the technologies are

amazing. And the sophistication is bottomless. And the

complexities are high. It's also extremely challenging,

so it's intellectually challenging. So if you're a person

that, you know, really enjoys a challenge, that's good

as well. I think if you're the type of person that says

like, you know, "How do I do the most good with the

skills and talents that I have in my life," I think that's

a very thoughtful question. And I think there are

probably more noble things that we can do than, you

know, targeted advertisements, right?

And so, I always encourage people to follow those

aspirations. And I think that's actually one reason I

actually moved on to the next area of my life, which

was to do client services. I wanted to learn a little bit

more about that. And then, on to R Studio as well,

because I, myself, have been trying to figure out what

satisfies me in my life, and what things can I, what

types of impact can I have to the world.

Kirill Eremenko: Gotcha.

Nathan Stephens: But for me, doing targeted advertising, it was one step

in the journey. I made a lot of connections. I got to

learn a lot of technology. I got to challenge myself. It

was a time of intense, analytic effort. And I think all

those things made me better, but it was just one step

in that journey.

Kirill Eremenko: Mm-hmm (affirmative). All right, gotcha. And thank

you, thank you for that overview. I'm sure that will be

helpful for some of our listeners.

So, let's talk about your next role. You mentioned you

went on to work in customer service. And from your

LinkedIn, I see that that was quite a lengthy role that

you had there.

Nathan Stephens: Yeah, yeah. So, I wanted to learn how to build a

business around analytics. That's one reason I went to

the client services company, because I worked for a

data-driven organization, a company that was actually

selling analytics as part of its solutions. And I was

really impressed with the quality and the caliber of the

people at that organization. And that was the next set

of skills that I wanted to learn about.

So, yeah. I went over to client services, and that was

another ... Anyway, I could talk forever about like,

what I learned in client services. That was an amazing

adventure, to be honest. Yeah, I don't know if you've

ever worked in that background, but that's quite the

field, working for clients.

Kirill Eremenko: No, I actually, like, I worked in consulting; you know,

selling consulting solutions to clients, but I'm not sure

if it's exactly the same as what you're describing.

Maybe let's go through your experience a bit, and I'll

pitch in a little bit if I can add value to the

conversation.

Nathan Stephens: Yeah. We can call it consulting. It's a very human-

driven endeavor, right? You're trying to help other

people be successful with their work and their

challenges. And some of those challenges are going to

be technical. And then, a lot of those challenges are

not going to be technical. And I think that's what I

found interesting, was that balance of the technical

and nontechnical requirements.

Kirill Eremenko: Yeah, no. That's definitely true, especially in

consulting. It's like what we found is data science is

more of a bottom-up approach, whereas consulting

itself, at its core, is a top-down approach. And you

start from the executive team, you define the strategy,

and then it trickles down. And when you combine the

two, you have both the technical and nontechnical

aspects, and it's interesting to see where and how they

meet; because data will be telling you the truth, from

the point of view of data, but consulting or people will

be telling you the truth from the point of view of their

experience. And it's always interesting to see when

there's conflict in that, and how to resolve that.

Nathan Stephens: Yeah, I think that's really insightful. I totally agree

with that. I think what you see in consulting is, you

see what is required to take action on the insights and

the understanding that you glean from your data. So,

just learning about the data, that bottom-up

approach, you know, that's not necessarily enough to

actually take action on those insights. There's a lot of

other pieces in that chain, and you see that in the

consulting, when you go to the top-down. You see,

"Oh, I see how that information is combined with other

pieces of information to lead to actions."

Kirill Eremenko: Yeah, yeah, definitely very interesting. Okay. So, what

is your biggest takeaway from your time in client-

facing data science?

Nathan Stephens: Yeah. My biggest takeaway, well, I'll circle back with

what I said about monetizing analytics. That's why I

wanted to go there, and I got a good idea of building a

business with analytics. The answer that I came to

was that analytics is one piece of a much larger pie for

monetization. So, you don't build a predictive model

and then make money on that predictive model. Even

in the ad network that I worked for before, where we

were putting models into production, that wasn't the

whole story. The entire story is, how do you set that

strategy? How do you influence the key players? How

do you line up against the market? You know, yeah, so

those, that broader ... So, what I learned was that the

analytic piece is actually a part of an overall bundle of

goods that ends up getting sold.

Sometimes, I kind of compare it to like, you know,

maybe like your Siri on your phone, or you know,

Google ... What's the Google Answer, Google Now, the

Google Assistant ... Like, you don't usually buy, I don't

know many people who buy their phone for Siri, or

buy their phone for Google Assistant; but it is part of

the overall value of that platform, right? And that's

what I've seen with a lot of analytic work as well. It's

like, you know, I have a great predictive model. Okay,

that's great that you have a great predictive model, but

that's one piece of an overall solution that you're trying

to come up with.

Kirill Eremenko: Mm-hmm (affirmative), yeah. Okay, very, very

interesting takeaway and recommendation, I guess, for

the people listening, for the future, that it's not just

about the analytic solution. That is often just a

component.

Okay, all right. And before we jump to your current

role, which was the next step in your career, I know

people are dying to hear about R Studio and what

you're doing there, I just have one more question. So,

you've moved through different roles. So, you were in a

company that creates cards for about three and a half

years or so. Then, three years in the ad network. And

then, four years in the company that does the

consulting services in data science. My question to you

would be, what was always, was there a common

trigger that prompted you to move on to the next role?

So as we can see, the industries are quite varying, and

it doesn't seem like a natural progression from one to

the other, except for this last one, where you actually

intended to find out how to build a business around

data science.

So, what would you say, is there a trigger that, or like

a point of saturation, why did you choose to move on

and leave, not just the company, but the industry as a

whole, to move on to the next thing?

Nathan Stephens: Yeah, I'm actually glad you brought that up. My

personal experience is that jobs really change, and

jobs definitely changed for me. So, I'll have a job where

things are really great. And then something will

change, and it will change the dynamic of that job.

And in that situation, you can decide to stick it out

and keep going, which is one option; and then, the

other one is to tack and go a different direction, which

has been the strategy I've taken.

So, what was that key change for me? In all three of

those cases, it was a change in manager. Like, I moved

from a manager that I really enjoyed working for, to a

manager that was out of alignment with what I wanted

to accomplish. And that's not going to be a trigger for

everybody. I think you can be really successful in a lot

of careers by staying around, and you know, working

through a change of manager. But I think what is

important is to know that jobs are highly in flux, and

you can go from a great job to a not-great job in a day;

because either the company acquires another

company, or gets sold to another company, or you get

a reorganization of leadership, or your manager leaves,

and another manager comes in, which is kind of what

I'm talking about. But those things actually do have

big impacts on your day-to-day quality of life, and

wellbeing, and your potential future.

So I think for me, personally, I think if anything, I

spent too long trying to make a difficult situation work.

I think, looking back on it, one of the lessons I've

learned is like, you know, when things change, when

life changes on you, make the change quickly. Like,

say, "Okay. You know, this isn't what I used to have.

Maybe I'll go do something else. That's going to change

now." Or like, "I didn't really want this reorganization. I

didn't want my company to be sold to some other

company, but it is what it is. And so, what am I going

to do about it," you know? And I think if I had actually

moved faster in those switches, I probably would have

been a lot happier. But you know, it worked out pretty

well for me. I'm pretty happy with the journey. I've

been really fortunate to have had good opportunities

along the way.

Kirill Eremenko: Yeah. It's all a learning experience at the end of the

day. It's not about the end destination. It's about the

people we become on the journey, taking us to that

end destination.

Nathan Stephens: Absolutely. I've learned a lot about data science in my

life, but my career and experiences with other people

have also taught me a lot about who I am and what

I'm interested in.

Kirill Eremenko: Yeah. Very interesting you mention that, because I

never thought of it in that way; but like, looking back

now, the reason I left Deloitte was exactly the same,

that the partner that was managing our division, he

moved onto a more senior role, a more national-

focused role, and a new partner came in. And while he

was very talented, definitely, it didn't align. I didn't feel

in the right place. I didn't see that I could learn as

much as I could from the first one. And so, like after a

few months, I handed in my resignation.

Nathan Stephens: I wish somebody would just like, have put their arm

around me, and told me much younger that it's like,

"Look, things are going to happen to you in your

career, and they're not fair, and you're not going to like

it. But that's okay. That's just the way it goes, and

you're going to be okay."

Kirill Eremenko: Yeah, yeah. Well, there you go. You're passing on this

message to all of our listeners now. And if anybody's

feeling the same, then don't worry. Nathan is putting

his hand around you right now and saying,

"Everything will be okay."

Nathan Stephens: I am extremely empathetic to people who are under a

lot of stress in their jobs. I understand that that

happens. And yeah, I am saying it's going to be okay.

Kirill Eremenko: Yeah, awesome. Okay, well that nicely brings us to

your current role at R Studio, where you're the director

of solutions engineering. So to start off, maybe give us

a quick overview of R Studio, because we will have

some listeners on the podcast who haven't used R or R

Studio before. Can you give us a quick overview of

what R programming is all about, and what is R

Studio?

Nathan Stephens: Okay. So, and those are two questions, so I'll answer

them separately. So the R programming language is an

open-source programming language, like Python, or C,

or Java, or any other programming language that you

might use to do data analytics. And it's been around

for a long time, and it's run by a core group that's

totally unrelated to R Studio. And it's primarily

designed for statistical computing and visualization.

And it turns out that it has some other really nice

strengths that we can talk about, too.

R Studio is a company, right? So, R Studio was

founded by JJ Allaire, along with Joe Cheng, who was

one of the early employees, and Hadley Wickham, that

you probably know about if you're in the R space, who

works at R Studio as well. And the mission of R Studio

is to improve computational and scientific reasoning

through data, using programming. And we don't even

necessarily limit ourselves to R, but we're very R-

centric, right? We believe in APIs, that you know, you

should be doing, connecting with other systems. And

we also believe in reproducible research, that all of

your work should be scripted out and programmed, so

that you can communicate with other people, and

collaborate with other people on your research.

So what R Studio does is, it builds tools that sit on top

of the R programming language, that really take full

advantage of the R programming language.

Kirill Eremenko: Okay, gotcha.

Nathan Stephens: Our most popular product, by far, is the R Studio IDE,

and if you've used R, you've probably used the R

Studio IDE. It's free, open-source software that you can

download and use to interact with R.

Kirill Eremenko: Yep, and IDE stands for integrated development

environment. That's like the window in which you

program things.

Nathan Stephens: It's, yeah, the data scientist's lightsaber, right? That's

the tool they're going to use to do their work.

Kirill Eremenko: Yeah. I tried programming, when I was learning R

myself, I tried programming a little bit. And you know,

you can program R in a text editor, and then just

apply, it's like R is a compiled ... R is an interpreted

language, not a compiled one. So, you apply the

interpreter to the text editor. And you can still get the

results, but it's so much easier and more efficient in

an IDE.

Nathan Stephens: Mm-hmm (affirmative), exactly.

Kirill Eremenko: Gotcha, all right. So, that's a great overview of R

Studio and R. And what about your role? What's your

role in R Studio? Or in fact, you started at R Studio three

years ago. Has your role evolved over time?

Nathan Stephens: Yeah, my job changes every six months. You know, I'm

doing something new every six months, because it's a

small company, and it's a growing company. And

that's what happens at small, growing companies, is

your roles change. So, I'm a solutions engineer now.

And what we do in the solutions engineering group is,

we help customers integrate our products into their

systems. So if you buy our products, and you want to

work with them, with databases, or with Hadoop, or

Spark, or Kerberos authentication, or on the cloud, any

of those types of problems, we get involved with those

problems.

So, we're really there to help build enterprise systems,

and help the architects and the IT groups manage

these workflows.

Kirill Eremenko: Mm-hmm (affirmative). Okay, gotcha. And just before

the podcast, you mentioned that, or in the email

correspondence, you mentioned that you have moved

on from being a data science practitioner, more to the

role of a data science tool builder. And that gives you a

unique perspective on career opportunities for data

scientists. Could you tell us a bit more about what is,

what does a role of a data science tool builder entail?

And how does it compare to just a data science

practitioner, a standard role? And what are those

unique career opportunities that you mentioned?

Nathan Stephens: Yeah. So, let me be clear on like, what the shift is. So,

I no longer analyze live data. So, data scientists are

largely there to, a chief component of the data

scientist's job is to get insights and understanding

from their data, to influence decisions, actions, and

results. I no longer do that. I don't have live data. I

don't analyze live data, and I don't take any data

insights to influence actions and results, not from ...

By live data, what I mean is, you know, living data

that's coming in through other data sources that I can

analyze.

So, let me explain how I got here. So when I was at the

client services, I always was very interested in this idea

of systems, and architecture, and building data

products. That's what I got to do at the ad network.

And when I went into client services, I actually got a

part of my time reserved to building analytic

infrastructure, in addition to all the client services

work I did. And as time went on, I found that I got

more and more interested in that analytic

infrastructure role, to the point where I was helping

my other clients learn how they would implement their

analytic infrastructure as well.

So, I was working heavily with IT at this point, and

other architects. And I was like, you know, working

with the CTO to expand out the use of R. And that's

why R Studio got interested in me, was because that

particular skillset was what they needed over at R

Studio. What was interesting about that was like, that

wasn't the primary core of the job. My job was actually

to work with the clients, you know, as a data scientist;

but it kind of morphed into this other interest of me

doing analytics infrastructure.

Kirill Eremenko: Yeah, interesting how you can discover new things on

the job, and find out new interests that you have, and

passions.

Nathan Stephens: Well, it was actually a real struggle, to be honest,

because you know, if you worked for Deloitte, right,

you work on billable hours, right?

Kirill Eremenko: Yep.

Nathan Stephens: And so, you're under a lot of pressure to bill a lot of

hours. And you had hourly targets, yearly targets, that

you're supposed to hold up. And all the while, I'm

doing this other thing that isn't tied to billable hours,

and isn't necessarily aligned with the corporate

strategy, but something I feel really passionate and

really curious about, right?

So, there was a real tension there about like, how to

spend my time.

Kirill Eremenko: Yeah. That's when you start working like, evenings,

and weekends, and you lose any kind of personal life,

or sports, and health. Everything goes down the drain.

Nathan Stephens: Yeah. That's kind of client services in a nutshell,

actually.

Kirill Eremenko: Exactly. All right. Well, that is very interesting. Tell us

a bit more about, before we move onto the other

components about the career opportunities, tell us a

bit more about the analytic infrastructure. So, I

encountered that, like when I was at Deloitte, it was

quite closed off to me. I was just doing the consulting

work, just doing the data science side of things. But

then, when I moved onto the superannuation fund, or

the pension fund, in Australia, I was heavily involved

with infrastructure, and data architects, solutions

engineers, and all these other different roles that I

didn't even know existed. And I found that to be a

fascinating role. Could you give us like a short

excursion to the world of analytic infrastructure? What

is it all about?

Nathan Stephens: Yeah, so that is a great question. That is a fantastic

question. So, analytic infrastructure has two, the way I

view analytic infrastructure right now is kind of in two

components. You have this notion of a data lab, right?

You have this idea that you have a sandbox to play in,

where analysts can work with their data, and learn,

and discover, and create. And most analysts I know

love that part of the job. They want to go create. They

want to build applications. They want to generate

reports. They want to try new technology. They want to

blow things up, right? I always say like, [crosstalk

00:41:47] not a big difference between a data scientist

and a mad scientist, right; just a few letters.

So, I think creating a data lab for people, a playground

for people to play in, is really important. And then,

there's this other notion of running analytics in a

production environment. And the difference between

those two is that, in the data lab, the data scientists

are in charge; and in a production environment, the IT

group, or the IT operations are in charge.

Kirill Eremenko: Yeah.

Nathan Stephens: And that handoff becomes ... Well, we could talk about

the handoff, but spanning those two worlds is the part

that I find very fascinating, so that's that fuzzy area

where I've lived. It's like, how do you connect this data

lab to this production world?

Kirill Eremenko: Gotcha. I totally agree. Like when I went to this

company, you would always get slapped on your

hands for trying to like, run a query without asking in

advance. Like, data scientists didn't even have access

to SQL before I came in. Then I requested the access,

and finally, after certain hurdles, we got it. And like,

every time you run a query, they're like, "Oh, you

could have hung the whole server, and you know, the

production environment." And then, they have ... Oh,

what's it called? They have these time slots, like in the

night, when all the queries are supposed to run. I

forgot what the exact, technical term for it is, but like,

they have allocated time slots for certain queries

because they know how much time it's going to run

and so on. And they need to get a certain amount done

in 24 hours.

And one of the first things that we did after a couple of

those incidents, where data scientists were like,

slapped on the hand, what we did is, we set up this

data lab. It was like, I think it was called the sandbox.

Some people called it the data lab, playground-

Nathan Stephens: Yeah, sandbox.

Kirill Eremenko: ... yeah, or called it a sandbox. And that really solved

the whole issue, because you can just experiment as

much as you want. There is still that issue of

handover, which you briefly touched on, but at least

it's not as bad. Like, people are not constantly chasing

you up about things that you're allegedly doing wrong,

and you get the freedom to experiment at the same

time.

Nathan Stephens: Yeah. That's so fun to hear you share that, because

that's been my exact experience as well. I had two guys

from IT come over to my desk when I was at the ad

network, and they were not happy at all. They towered

over my desk with very unhappy faces, and wanted me

to account for myself, you know?

Kirill Eremenko: Yeah.

Nathan Stephens: And that didn't happen once, but it happened twice.

And then I got a sandbox, also.

Kirill Eremenko: Yeah, yeah. True. It's interesting how they have these

systems in place to track down who exactly is the

culprit. They find you very quickly.

Nathan Stephens: Yeah, yeah. They will find you.

Kirill Eremenko: Yeah, okay. And so, in the case of analytic

infrastructure, so it's not ... Like, one of the steps is

setting up the sandbox or the playground, and then

dealing with different servers in the production

environment, and things like that. What else is part of

a role, the role of somebody who is in analytic

infrastructure? What does the day-to-day look like

there?

Nathan Stephens: Okay, yeah. So, I'm trying to parse that question,

because that also feels like two questions. So, let me

do the role, and then the day-to-day. So, the role of

like, let's call it an analytic administrator. I actually

wrote an article on R Views, which is one of our

corporate blogs about R, about an analytic

administrator.

So those analytic administrators, they have to be

pretty awesome, to be honest, right? Like, they have to

be connected to their data scientists, to understand

what the data scientist needs. They have to be aligned

with the executive audience to know like, what matters

to the company; like, where the value is going to be,

like what types of solutions are going to produce

business value. They have to get along well with IT. So,

they have to bring them doughnuts, and make sure

that their voice is being heard, and that they're

complying with all of those rules. And they have to be

really good evangelists in general, about promoting the

need for data science in the organization. If you're

fortunate enough to work for an organization where

data-driven decisions are happening, then that will be

easier. If you're working for an organization that's

maybe still more like, politically oriented, or making

decisions from their gut, then you're going to have a

little bit more work to persuade them that data science

is meaningful in your organization; but being a proper

evangelist is a really important part of the role.

What does that mean, day-to-day? Well, part of the

day-to-day is going to be managing that data science

lab that we talked about, right? Like, somebody's got

to be overseeing that architecture, making sure that

that thing is running. And you can either have, in

some cases, IT will manage that; but what I've seen,

usually, is more effective is if the analytics admin has

like, some nice levers that they can kind of pull to

control those things. They're also, you know, teaching

best practices. So, they're educating data scientists on

how to do things properly. I sometimes call it like,

shared infrastructures. Like airports, you can't have

all of the planes landing on the runway at the same

time. The data scientists have to know who else is

flying around them in the space, and who's coming in

for a landing. So, you have to be aware of those

resources.
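
The airport analogy above can be sketched in code. This is a minimal illustration in Python, not anything described in the interview: the job names, the capacity of two, and the `run_jobs` helper are all hypothetical. It uses a semaphore so that only a fixed number of jobs "land" on a shared server at the same time, while the rest wait their turn.

```python
import threading
import time

MAX_CONCURRENT_JOBS = 2  # hypothetical capacity of the shared server


def run_jobs(job_names, max_concurrent=MAX_CONCURRENT_JOBS):
    """Run pretend jobs in threads; return the peak number running at once."""
    runway = threading.BoundedSemaphore(max_concurrent)
    stats_lock = threading.Lock()
    stats = {"running": 0, "peak": 0}

    def run_job(name):
        with runway:  # wait here until a slot on the "runway" is free
            with stats_lock:
                stats["running"] += 1
                stats["peak"] = max(stats["peak"], stats["running"])
            time.sleep(0.05)  # stand-in for a real query or model fit
            with stats_lock:
                stats["running"] -= 1

    threads = [threading.Thread(target=run_job, args=(n,)) for n in job_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"]


# Five hypothetical jobs compete for two slots.
peak = run_jobs(["etl", "report", "model_fit", "dashboard", "ad_hoc_query"])
print(f"peak concurrent jobs: {peak}")
```

In a real shared environment this kind of limit would more likely live in a scheduler or resource manager than in each analyst's script; the sketch only shows the idea of queuing for a shared resource.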

And they don't know. Like, here's the thing with data

scientists. Data scientists, they're just not trained to

do this. Like, you don't learn it in school. So,

somebody has to teach them, and it's going to be the

analytic admin that's going to teach them, right? Like,

they're just going to do things, like you and I were just

saying, like we're going to blow up stuff, right?

Kirill Eremenko: Mm-hmm (affirmative).

Nathan Stephens: They're just going to do it. They're not going to know.

And that's okay, because nobody taught them. So,

there's an opportunity there.

Kirill Eremenko: Yeah.

Nathan Stephens: Other day-to-day would be, you know, making sure

that you're getting your architectural review board

presentation ready to go, to make R, or whatever

language you're using, an analytic standard, to make

sure that you have resources dedicated to that; like,

that people are actually funneling human and

financial resources into that work. And then of course,

the production work is a whole nother ball of wax, but

you know, if you're in the production side, you can

actually make even greater impacts.

So, it's like a big job. And I tell people like, analytic

admin, you're not going to see that in Indeed, or on

your job searches. Like, people aren't advertising for it,

but it's an actual need in organizations. And I know

that because I talk to a lot of organizations in my role.

Like, I'm on the phone every day with customers and

potential customers, and almost all of them have this

need. So, I like to tell people that if this is something

that they're interested in doing, I would definitely go

for it, because the need is definitely there; even though

the job description might not be written for it yet.

Kirill Eremenko: Gotcha, okay. I just have one burning question from

that. That was a great description of this whole role,

and I think I learned quite a bit of new things for

myself, just now. My question is, could you let me

know why does it sometimes ... And I'm sure other

data scientists will have exactly the same question ...

Why does it sometimes take so long to implement a

tool in an organization, especially like, for instance, I'm

in an organization, and I want an opensource tool,

such as R; like, I can download it on my computer,

and run it within 30 minutes. Why does it take several

weeks for an organization to roll that out to me, and to

allow me to use it for analyzing their data?

Nathan Stephens: Or months, or years, right?

Kirill Eremenko: Yeah.

Nathan Stephens: Yeah. So, there are barriers here. And I don't want to

be like a downer, right, but when we talk about large

corporations, it's important to know that there's this

long journey, decades-long journey, on how they get

here. And a lot of them aren't really geared towards

data-driven decisions. And a lot of companies don't

really know what to do with data scientists, is the

problem. Like, there's this notion of like, yes, it's

important. We need really smart guys. Let's go get

some really smart guys, and boom, we'll have a bunch

of financial success.

And that's not really the way it works. Like, you really

have to be thoughtful about how you're going to align a

data science team with the overall corporate strategy.

And the reality is that most companies struggle with

that. So when you're in an organization, and you say,

"Hey, I need a data science lab," a lot of organizations

are not even going to know what that means. Or if they

do know what that means, they're not going to be

geared to a way to fulfill that request.

So, it's an evolution. And I think a lot of the younger

companies that are coming up, like if you work for a

startup, that's not going to be as big of an issue. Like,

they're just going to know like, we run on Amazon.

We're going to [inaudible 00:51:12], you know, a VPN

... Or, I'm sorry, yeah, basically a new server

infrastructure, right, or an existing server

infrastructure inside of Amazon, and will serve your

needs. But like, a larger organization is going to

struggle with that.

Kirill Eremenko: Mm-hmm (affirmative), yeah. And do you think that's

going to be the cause, why all large organizations are

going to end very soon? Or [crosstalk 00:51:45].

Nathan Stephens: No, I'm not saying that that's going to happen. I think

there is a tension between like, the large corporations

and the smaller companies; but I think, I'm actually

very optimistic. And there's, I've met very talented

people in all sorts of groups. You know like, I work

with large financial groups, insurance groups,

consumer packaged goods groups. And I'm always

impressed with the quality of talent that these different

organizations can attract.

So, I'm actually very, very optimistic about the future

of data science, and the direction. What I get more

concerned about, frankly, is that the data scientists

themselves don't always really understand what they

bring to the table. So, I'll be more specific. Data

scientists are responsible for understanding their data.

And nobody else in the organization has that

responsibility. And so if you're a data scientist, and

you're spending 80% of your job like, scrubbing data,

that's because that is, you're in the role that does that.

Like, nobody else is doing that. And the power of that

is that when you speak about something, you can

speak authoritatively about that. You have

ammunition to say, "I know this is true because I

actually have been in the data, and I've seen it." And I

think that's one of the under-leveraged skills that I see

with data scientists, is that they take ... Not everyone,

but some data scientists will take that for granted.

Like, "Oh, yeah." It's like, "I just happen to know all

this stuff."

It's like, no, you know all that stuff. Like, take

advantage of that. Make sure other people know about

that. Like, broadcast that information. Make sure you

communicate what it is you're learning, because I

guarantee you, your boss, and your boss's boss,

they're not looking at that data. They don't know

unless you tell them. So, getting that information out

is extremely critical for the success of the data

scientist, and for their overall happiness in the job.

Kirill Eremenko: Yeah, and for the success of the business as well.

Nathan Stephens: Yeah, great point. I left that one out, but that's

probably the most important one.

Kirill Eremenko: Yeah, all right. Wow, fantastic. That's such a good

excursion to that world. Thank you so much. How

about we shift gears a little bit and jump into R? Let's

talk about R, and what's going on in R these days. And

you know, like some great things, I'm sure you have so

many great things to say about R.

Nathan Stephens: I do. I think R is fantastic. We were talking, before the

call, about R and Python. Could we just jump into that

one? Why don't we just hit the elephant in the room?

Kirill Eremenko: Yes.

Nathan Stephens: Okay. So, I don't think there's a war between R and

Python. I think the analytic space is plenty big to

accommodate two programming languages. And it

reminds me a little bit of the conversation back in the

'90s, when people were like, "Oh, it's got to be Apple or

Microsoft." Well, guess what; computation is big

enough to handle two large companies, right? We still

have both of these.

So, I don't think there's a war between R and Python. I

think that what needs to happen is, you know, you

can ... Well, what needs to happen is that those two

things need to work really well together. And in case, I

just want to mention that we recently made some

progress in that area, if you missed the

announcement. We actually brought Wes McKinney

on-staff at R Studio, and he's one of the well known

developers in the Python world. He's the father behind

Pandas, and he's now in charge of working in this

thing called Ursa Labs. And you can query that, if you

haven't seen Ursa Labs. It's named after the bear,

right; Ursa Major, Ursa Minor; the Big Dipper and the

Little Dipper.

And the job that he is leading up is really around

interoperability between datasets and programming

languages. So, what do I mean by that? If you're

familiar with, Apache Arrow is the project that's

building datasets that can be loaded into memory,

both in Python, and into R, and into other

programming languages. And if you can load, if you

can share data across programming languages, you

can easily jump in between the programming

languages. Like, you could say, "Okay, I've got this R

data frame. I want to like, use some Python magic on

this." I'd boot up my Python instance, and I suck that

data over into Python. Right now, transferring data is

an extremely painful process. And you know, Wes is

trying to make that a much easier process. And it's a

very foundational piece in the toolchain that I'm really

excited about.
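
The idea of sharing one in-memory dataset instead of copying it between tools can be illustrated with a small sketch. This is not Apache Arrow's API and it does not cross a language boundary; it is a standard-library Python analogy, assuming a single column of 64-bit floats, that contrasts two consumers reading the same buffer with a serialize-and-reload round trip that produces a separate copy.

```python
from array import array

# One column of a table, stored as a contiguous block of 64-bit floats.
prices = array("d", [9.5, 12.0, 7.25])

# Two independent "consumers" look at the same memory through views.
view_a = memoryview(prices)
view_b = memoryview(prices)

# A write through one view is visible through the other: nothing was copied.
view_a[0] = 10.0
shared = view_b[0] == 10.0 and prices[0] == 10.0

# A serialize-and-reload round trip (the painful path) makes a separate copy.
copied = array("d")
copied.frombytes(prices.tobytes())
copied[1] = 0.0
independent = prices[1] == 12.0  # the original column is unchanged

print(f"views share memory: {shared}; round-trip copy is independent: {independent}")
```

A shared columnar format aims to give different languages the first behaviour, reading the same bytes in place, rather than the second.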

So basically, my point is that we brought on one of the

key Python developers, who works for R Studio now.

We've made R Studio much more Python-friendly.

We're still R-centric, right? Like, we are still saying,

"We like R." But if you're an R developer, it's getting

easier and easier to work with the Python tools; to call

Python functions, and modules, and interoperate

between the two languages. And I think that's a huge

advantage for data scientists. The next generation of

data science development is to be multilingual, and to

take advantage of the things that Python and R both

offer; and Julia, and you know, whatever other

languages you might be working with as well.

Kirill Eremenko: Yeah, wow. I didn't know that. That's a very ... That's a

huge stride forward with getting the languages closer,

and hiring-

Nathan Stephens: I think it'll take a couple of ... Yeah, it'll take a little

while to play it all out, right? Like, it's definitely part of

the long game, but if I look down the road, I see a

future where you've got people who know R really well,

that are also very comfortable, you know, taking

advantage of Python. So, Python opens the door to

TensorFlow, Spark; and those are things that we've

already incorporated in the R stack, is good connectors

to Python, and to Spark, and to TensorFlow, via

Python. And I think there'll be more things like that

coming in the future.

Kirill Eremenko: Yeah. And I like your comment about multilinguality.

That's very important; or, it's a great selling point for

any data scientist to have on their résumé, that I know

both R and Python. I have experience with both. That's

where the world's going, right? [inaudible 00:58:34]

Nathan Stephens: Yeah, right. If you're a hiring manager, and you've got

one person who knows Python, and another person

who knows R and Python, yeah.

Kirill Eremenko: Gotcha.

Nathan Stephens: Yeah, it's an easy call.

Kirill Eremenko: It's a no-brainer. And so, just to clarify, is your vision

that in a couple years, we're going to have one,

combined language, R-Python? I'm assuming not. I'm

assuming we're still going to have separate, R and

Python, but the interlink between them is going to be

very efficient and very high. In that case, what would

you say that R and Python are good for, separately?

Like, which one would you use for certain things, and

the other one for other things?

Nathan Stephens: Yeah, yeah. I think you could answer that in a lot of

ways. I've asked a lot of people, "Why did you choose

R? Or why did you choose Python?" And I get a lot of

different answers from that, but one thing I hear

frequently, one thing that doesn't surprise me is that it

seems like, it's like it's not even a question in their

mind. They just kind of went to the language that

actually resonated with them. They're like, you know,

and R users are very much this way; it's like, "I just

love R." You know it's like, you talk to people, it's like,

"I just love that experience. I love what it does, and it's

just part of like, who I am, even," like the people that

really, really love it. Or maybe you want to build Shiny

applications, right? There are things that R does, that

Python won't do.

You know Python, I've talked to a lot of people that use

Python. And sometimes, the answer is back, like,

"What is R? I don't even know what it is." So, it's like

maybe they don't even know what it is. I think if it's an

individual choice, I think that's fine. Like, I think

that's great. If you're a Java guy, and you love Java,

that's fantastic. Just use the language that you want.

But what's interesting about the R language is that R

is so, I guess, forgiving, or just inclusive of other

languages. R is a little, there's some humility in the

language. And it kind of gives up a lot of its control

and power to other languages. So when you run a

model in R, you don't actually run it in R. You call a C,

or a C++, or a Fortran library to run it, right? When

you run a Spark job, you don't run a Spark job in R.

You're calling into the Scala API, right?

So like, and that's totally fine with what R is about. R

doesn't really want to do that, anyway. R's just like,

"Let me just introduce you to these other things." And

that's ... So anyway, not a lot of people look at R that

way, but that's the way I see R, as more of a way to

orchestrate, you know, a lot of power and goodness to

work with other systems.

Kirill Eremenko: Yeah, gotcha. And it makes the best of many worlds,

rather than just trying to introduce everything on its

own. That's pretty good.

Nathan Stephens: Yeah. R's pretty slow, right? Like, if you run things

inside of R, it's pretty slow.

Kirill Eremenko: Well, not with everything. Some things, like specific ...

What's it called ... like vector operations. There, I

think, R outperforms Python in some of those cases.

Nathan Stephens: Right, right. Yeah, yeah.

Kirill Eremenko: But like, loops and stuff [crosstalk 01:02:04], totally

agree with you. Like, R-

Nathan Stephens: Loops are pretty slow. Yeah, yeah.
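
The point about loops versus vector operations applies to interpreted languages generally. The sketch below uses Python rather than R as a stand-in, and the function names are hypothetical: it computes the same total once with an explicit interpreted loop and once with the built-in `sum`, which in the standard CPython interpreter does its iteration in compiled C code. Timings vary by machine, so none are quoted here.

```python
import timeit


def total_with_loop(values):
    """Add the numbers one at a time in interpreted code."""
    total = 0
    for v in values:
        total += v
    return total


def total_with_builtin(values):
    """Hand the whole sequence to the built-in, which loops in compiled code."""
    return sum(values)


values = list(range(100_000))

# Both approaches must agree on the answer before speed is worth comparing.
loop_result = total_with_loop(values)
builtin_result = total_with_builtin(values)

loop_seconds = timeit.timeit(lambda: total_with_loop(values), number=20)
builtin_seconds = timeit.timeit(lambda: total_with_builtin(values), number=20)
print(f"loop: {loop_seconds:.4f}s, built-in: {builtin_seconds:.4f}s")
```

The same pattern, handing a whole vector to a routine implemented in a compiled language, is what vectorized operations in R rely on.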

Kirill Eremenko: [crosstalk 01:02:10] Yeah, all right. And what would

you say about R and deep learning? Like, with the

recent developments in using Keras with R and things

like that, those are pretty exciting.

Nathan Stephens: Yeah, yeah. So, just piggybacking on that, that R is

slow, it's like the solution to "R is slow" is to push that

information somewhere else. Like, don't do it in R. Do

it somewhere else. So with Python, with Keras, and

deep learning, all of those routines are also, that's a

Python world, right? Like, those are all written in

Python. And what JJ has done, JJ's our founder, and

has done a lot of the engineering around Python and

TensorFlow, JJ has written a nice library of connectors

that allows somebody who knows R to take advantage

of all of the work that's being done in TensorFlow; and

not only take advantage of it, but actually give them a

really nice experience.

So, we put things into the IDE to help you debug your

models. JJ's very good at documentation as well, so

there's a really nice set of ... There's a book that you

can read. There's a library. There's a website with

examples to learn about this. So basically, that

technology is like, there and available today. Like, that

landed a few months ago. And we're trying to invite as

many people as are interested, to come experience it,

try it out, and learn from it. It's really cool stuff. I have

to say, it opens up a whole new dimension into

problems that we previously didn't have tools for.

Kirill Eremenko: Mm-hmm (affirmative), yeah. Definitely exciting, and

very, very exciting, especially for those who are used to

R, and are now interested in deep learning and AI. And

this is finally going to be available.

Nathan Stephens: Right.

Kirill Eremenko: Yeah, all right. Well, we're kind of like, coming close to

the end of this session. And time has flown by, and I

still have so many questions that I would love to ask

you, but I guess I'll hand it over to you. Like, is there

anything you would like to share with our listeners, or

with aspiring and professional data scientists who

want to grow their careers?

Nathan Stephens: Yeah. I think I've shared a lot with the career advice.

Can I just make a shameless plug for what we do at R

Studio?

Kirill Eremenko: Sure, of course. Go for it.

Nathan Stephens: All right, because a lot of people don't realize that we

actually do sell professional grade products for the

enterprise. And those are designed to work with all of

our opensource packages and tools. So if you're in the

enterprise world, you're typically looking at like,

security, authentication. You're trying to figure out

high-availability scaling. You have like, mission-critical

applications and whatnot in there. And we sell

products to bring R into the enterprise, and make it an

analytic standard in there.

So if you, today, if you are using R on your desktop at

your job, and you're downloading data from your SQL

server database, onto your laptop, and then taking it

home, you know, and leaving it at a café or something,

I would encourage you to think about going to the

website, seeing what we have to offer, because we

actually have a really nice platform for scaling out R in

the enterprise; a really nice toolchain for doing that.

And it'll make your life better, and increase the

capabilities of your tools. And not a lot of people know

that like, that's all available. So, yeah. I just wanted to

point that out. Thank you for letting me make a

shameless plug.

Kirill Eremenko: That's all right [crosstalk 01:06:00]. I just, I will

reiterate that. Like, there's a lot of organizations, like

we have executives, and directors, and entrepreneurs

listening to this. And just for their purpose, for their

sake, there's a lot of organizations that still use large,

corporate tools, such as SAS, and other tools that are

just there, archaically. And it's time to change. And I'm

not saying anything against SAS, but the world is

going opensource. The power of opensource is

incredible, and the communities behind opensource

tools are really empowering very fast changes, very fast

developments in the algorithms, in the speed, and in

everything that the tool requires.

And so, if it's time for change for your organization,

then R Studio is there to help. And also, if you are

starting a new business, an enterprise, or taking an

idea to execution, to actually building a company

around an idea, then don't go, it's probably not the

best idea to go for some enterprise-specific tool that is

not opensource. Why not go for an opensource tool,

and get in touch with Nathan? He'll set everything up

for you.

Nathan Stephens: I think that's fantastic. Can I add one thing on that?

Kirill Eremenko: Yeah, sure. Of course.

Nathan Stephens: Because I 100% agree with everything you just said.

Things are changing rapidly, and when I talk to people

who are in the hiring position, who are trying to build

out their platforms, you know, and bring in the best,

you know, to adapt to this new world, there's this idea

of bringing in the best talent, as well. You're trying to

capture the data scientists, and they're in high

demand. They can be expensive, right? And it's a big

investment.

And by and large, that new demand that's coming in

from colleges, they're going to know R, and they're

going to demand that there's R tools available to them

in their job. And so, making an investment in R, I feel

very, very strongly, obviously, because I work for R

Studio, but feel very strongly that an investment in R

is a good move in bringing in the best talent out there.

Kirill Eremenko: Gotcha, couldn't agree more. All right, Nathan, so

thank you so much for sharing all the insights, and

your wisdom, and your career journey. Where could

our listeners get in touch with you and contact you, if

they'd like to learn more, or maybe explore the

opportunities with R Studio?

Nathan Stephens: Yeah, you're welcome to reach out. My email at R

Studio is [email protected]. My Twitter handle is

NWStephens; and also, everywhere else on the

internet, it's going to be NWStephens.

Kirill Eremenko: Yep. And LinkedIn is a good place to get in touch with

you?

Nathan Stephens: NWStephens, yeah.

Kirill Eremenko: Yeah, awesome.

Nathan Stephens: Yeah, LinkedIn is great.

Kirill Eremenko: Awesome, all right. We'll include those links in the

show notes, and we'll try to find that article that you

mentioned, that you wrote about the analytics admin.

That was really interesting. I have one more question

for you today. What is a book that you can recommend

to our listeners, to empower their careers even more?

Nathan Stephens: Well, I'm going to make another shameless plug for

Hadley Wickham's book, called R for Data Science.

It is about R, but it also has some great foundational

material, just about how to think about and approach

data science. And so, that's why I recommend it.

Kirill Eremenko: Yeah. Does Hadley have a few books, because I'm sure

I've read one of them, and I think it's this one.

Nathan Stephens: Hadley has, yeah, Hadley is amazing with the amount

of content he pumps out. And yeah, he's got a few

books. I neglected to mention that it's co-authored

with Garrett Grolemund, as well, who also works at R

Studio.

Kirill Eremenko: Okay, gotcha. It seems like you've got all the top

analytics talents working for R Studio, and now you're

poaching from Python as well.

Nathan Stephens: I have the great ... When I go to a meeting, I assure

you, I'm the dumbest one in the meeting. It's really

nice to work with such amazing people.

Kirill Eremenko: Exactly. Like, that's my, I always appreciate when I'm

the dumbest person in the room. That means there's

places I can grow, right? Like if you're the smartest

person in the room, you should be in a different room.

Nathan Stephens: Yeah, yeah. I know, and I don't say that, yeah, just in

false sincerity. I really mean it. I'm the dumbest one in

the room. It's really a great experience, actually

working with so many wonderful people. And they're

not just smart at their jobs, but they're wonderful

people to get to know as well. I'm just really impressed

with the character of these people that I get to work

with.

Kirill Eremenko: Yeah. Well, the character of this podcast has been

amazing. Thank you so much, Nathan, for coming onto

the show and sharing all these wonderful insights.

Nathan Stephens: Thank you so much for the opportunity. I really

enjoyed it. I learned a lot.

Kirill Eremenko: All right. Talk to you soon. Bye.

Nathan Stephens: Bye.

Kirill Eremenko: So there you have it. That was Nathan Stephens from

R Studio, sharing his career journey, and all the recent

and greatest updates from R Studio; directly, you hear

it from, directly, the person who works there as a

director. And what was your favorite part of this

podcast? Mine, by far, was the analytic admin concept

and description. Nathan obviously has a lot of

experience in this space, and he described the idea

behind what an analytic admin does, or what that role

entails, very aptly, and it makes a lot of sense that

companies should have a person like that onboard if

they are looking to build a lasting analytics culture, a

sustainable approach to data science, where

everybody's happy. The IT team is happy, and the data

scientists are happy as well.

So there we go. That was Nathan Stephens. All of the

show notes, and links, and all the things mentioned in

this episode are available at

www.superdatascience.com/171.

There, you will also find a transcript for this episode,

and the URL to Nathan's LinkedIn. Make sure to

connect with him, hit him up, and stay in touch. If you

are looking to implement R Studio at an enterprise

level, or a corporate level in your company, then make

sure to get in touch with Nathan. He'll guide you

through the process, and at least give you some tips.

And finally, if you know somebody who uses R

programming in their language, who is a big fan of R,

or who loves R Studio, why not send them this

podcast? There's a lot of valuable information, a lot of

updates on what's going on in the R space, and I think

there's a lot to learn here. So, make sure to forward it

on, and you might help somebody out; your friend,

your colleague, your relative. Help them out in their

career in data science.

And on that note, thank you so much for being here.

Can't wait to see you next time. And until then, happy

analyzing.

[Music 01:13:12]