SDS Podcast Episode 171 with Nathan Stephens
TRANSCRIPT
Show Notes: http://www.superdatascience.com/170 1
SDS PODCAST
EPISODE 171
WITH
NATHAN STEPHENS
Kirill Eremenko: This is episode number 171 with director of solutions
engineering at R Studio, Nathan Stephens.
Welcome to the Super Data Science Podcast. My name
is Kirill Eremenko, data science coach and lifestyle
entrepreneur. And each week, we bring you inspiring
people and ideas to help you build your successful
career in data science. Thanks for being here today,
and now let's make the complex simple.
[Music 00:00:35]
Welcome back to the Super Data Science Podcast,
ladies and gentlemen; super excited to have you on
this show. And today, all the way from R Studio, we
have Nathan Stephens joining us. So, a lot of you
already use R programming in your data science
careers, or in your data science education. If you don't
use R, then you probably have heard of it in one way
or another. The R programming language is one of the
two titans of data science, alongside Python. It's one
of the two languages that we predominantly use to create
models, do machine learning, do data science, build
deep learning models, even create artificial
intelligence.
And today, we have Nathan Stephens joining us. And
he's a director at R Studio. And R Studio is, by far, the
most popular program through which you program, or
through which you code in R. And in this podcast, we
had a great time. We had a blast. So, some of the
things that we chatted about are Nathan's
background. I deliberately went through all of his
background, because he's got such an interesting
story. Even before we got to R Studio, there were so
many fun and exciting things that we talked about.
One of them was what an analytics admin does.
Because Nathan is now in solutions engineering, he
knows a lot about what goes into building the
environment, building the infrastructure for a data
scientist, or a data science team, or a data-driven
company, so that is a very valuable part of our
conversation.
If you're not familiar with things like data engineers,
data architects, data analytics admins, servers, and all
these other things that are components of a data
science environment, highly recommend checking it
out, listening to the podcast, because you will learn a
lot about that. And after we meticulously went through
Nathan's career, we finally got to R. So, you'll learn a
lot about what the R language is all about, where R
Studio is headed, what the recent updates are, who
they just hired, how they compare to Python, and all
these other cool and exciting topics.
So all in all, very exciting podcast; can't wait to get
started. Let's dive straight into it. Without further ado,
I bring to you Nathan Stephens, director of solutions
engineering at R Studio.
[Music 00:03:20]
Welcome, ladies and gentlemen, to the Super Data
Science Podcast; super excited to have you back. And
today, we've got a very exciting guest, Nathan
Stephens, director of solutions engineering at R
Studio. Nathan, welcome to the show. How are you
doing today?
Nathan Stephens: I'm doing great. Thanks for having me on, Kirill.
Kirill Eremenko: Thank you for coming on. And where are you calling in
from?
Nathan Stephens: The Baltimore-Washington area.
Kirill Eremenko: Baltimore, we were just talking about it. You're like, in
between Baltimore and Washington, like can't decide
which one it is?
Nathan Stephens: Yeah. I go back and forth. I'm a little closer to
Baltimore.
Kirill Eremenko: And where is work? So home is in between, where's
work?
Nathan Stephens: Well, the company is, you know, technically based in
Boston, Massachusetts, but we all work from our
homes. So, I work from my home office.
Kirill Eremenko: Okay, wow. Well, that's so cool. We'll get to that in a
second. So, how's the weather in Baltimore?
Nathan Stephens: It's been very, very wet and cold, which has been great
for my lawn. [crosstalk 00:04:26] The yard's doing
great.
Kirill Eremenko: Wow. And we're in June. Why is it so wet? Like, does it
get hot in summer?
Nathan Stephens: I don't know. Yeah, no, it's usually a lot warmer than
this, but I haven't been to the pool yet. It's just been
an especially cold June, but the kids are eager to get
into the pool.
Kirill Eremenko: Wow, crazy. How many kids do you have?
Nathan Stephens: I've got two; two young boys.
Kirill Eremenko: Nice, very nice. It's pretty insane, what's happening
with the weather, right? Like in California, you have
these fires all the time. And then, you have the
hurricanes down south. And then now, it's like wet
and cold in summer in Maryland. Don't know what to
expect.
Nathan Stephens: Yep. So, I'm actually from California, so I'm used to the
earthquakes and fires. And then, I lived in Kansas,
and I got used to the tornadoes. And now I'm out in
the east, and we do hurricanes. So, pretty familiar
with all of those things.
Kirill Eremenko: Interesting. And so, out of all those places, you found
your home in Baltimore? You recommend that as like
the nicest place to settle down?
Nathan Stephens: Well, yeah. I think I came out for a job, and the jobs
out here are plentiful. And it's a great place to build a
career. I think Washington DC attracts people from all
over the world, especially in the United States. It
brings a lot of people in. So you know, it's just a
crossroads for a lot of people. And I find that really
exciting, a lot of fun. [crosstalk 00:05:47] So, it's been
good to build a career out here. It's a good place to
work.
Kirill Eremenko: Gotcha. Okay, all right. Well, being the director of
solutions engineering at R Studio, you warned me just
before the podcast, before we started recording, that
the podcast is going to be R-focused. And I wanted to
pass on that message to our listeners, that this
podcast is R-focused. And we're going to learn all
about the lovely language of R, and what it's been up
to in these past years, and where it is currently.
But before we jump into that, Nathan, could you give
us a quick overview of your background? Like, coming
from California, where did you study? And what took
you on this journey into data science, because
ultimately, our listeners are all very interested in
following this journey from the start, how you went
about getting into data science.
Nathan Stephens: Yeah, I'll do my best to keep my answers brief. I
actually learned R and SAS at the exact same time
when I was an undergrad in college, and that would
have been 1999. So, I'm a very old R user, and a
somewhat young SAS user. And I learned both of those
through the statistics department at my university,
and that was a really great experience. Statistics
taught me how to think scientifically. You study
hypothesis testing. You study science as a statistician.
And then, there's this notion of making, doing
empirical work by studying data and applying your
knowledge to actual problems that I found very
interesting in statistics.
So, I actually got off to a great start. I was very
fortunate, very young in my studies to get some great
programming languages, some great scientific
thinking, and then exposure to applied science with
data backing up those conclusions.
Kirill Eremenko: Gotcha.
Nathan Stephens: That set the foundation for everything that would come
later.
Kirill Eremenko: And so that was in your undergraduate?
Nathan Stephens: That was my undergraduate, yeah, when I was in
university.
Kirill Eremenko: Gotcha. And where did you go after that?
Nathan Stephens: After I graduated?
Kirill Eremenko: Yeah.
Nathan Stephens: Yeah. So, I made this interesting detour over into
actuarial science, and that's a whole other discussion
entirely. That didn't last very long. I went back to grad
school after I tried my hand at actuarial science. I
didn't find that to be particularly satisfying. It didn't
suit my interests. So I went back to graduate school,
and I got a master's degree in statistics.
Kirill Eremenko: Gotcha. And just for maybe like our non-English
speaking listeners, or for whom English is not their
first language, actuarial science, because it took me a
while when I [inaudible 00:08:47] to wrap my head
around what that means. It's like statistics applied to
population and demographics. Is that correct?
Nathan Stephens: Yeah, yeah. It is a broad field. Statistics is, actuarial
science is actually a regulated practice in the United
States. It's like being a lawyer in the United States.
You have to actually have some sort of license to
practice actuarial science. And so if you want to be an
actuary, you have to go through this series of exams,
and you have to comply with certain regulations in
order to practice it.
Kirill Eremenko: Okay, gotcha. All right. And so, then you did a
master's in statistics. And where did that take you,
after that?
Nathan Stephens: So after leaving my master's program, I worked for a
manufacturer of greeting cards in the Midwest. And I
worked in their research department, and that was a
really, really good experience. I got to cut my teeth on
a lot of very interesting problems there. I also got to do
more R there as well.
Kirill Eremenko: Okay. And-
Nathan Stephens: So just to characterize that, you have to keep in mind,
this is back in like, 2005. So you know, Hadoop hasn't
even really caught on yet, right? Big data's kind of on
the ramp-up. Data science hasn't been coined as a
term. There's no such thing as data science. It hasn't
been, that term hasn't been invented at this point. And
most analytic jobs are sprinkled throughout the United
States. So as a statistician in 2005, when you're
looking for a job, you're actually ... Actually, I got my
job in 2004, so let's say 2004 ... you're actually looking
for little pockets of analysts here or there. They didn't
really clump together in large amounts, by and large.
And so, you're actually fighting for those jobs. We've
come a long way, right? So it's like, back in 2004,
you're actually fighting for a job where a statistician
can work.
Kirill Eremenko: Yeah. And it's such a different world, right? Like back
then, data science was pretty much statistics, right? It
was called statistics. And I had a guest on the podcast
like a few months ago, who put it very aptly; that the
difference between statistics and data science is that
in statistics, you still have to think through a lot of the
mathematical components, come up with elegant
equations, and so on, and solutions; whereas in data
science, a lot of the time you can just brute force your
way through things, facilitated through different
machine learning algorithms.
Nathan Stephens: Yeah, I think that's fair. I think data science is a term
that's really grown on me over time, because I think
statistician is a little too narrow to define what the
world really needs. And the term data science is such
a broad umbrella, you know, almost nebulous term;
that it does a pretty good, that that's the strength of
that term, that it actually just, it's all-inclusive of this
idea that we're going to use data, we're going to be
data-driven, we're going to be scientifically minded, and
we're going to apply that information to problems.
So, I really like the idea that that's a general, nebulous
term. I think that's the strength of the term.
Kirill Eremenko: Yeah. And also, that allows people from different
backgrounds to come into data science, right? Like, it's
not just statisticians or mathematicians. I know people
who were in something very creative, like acting, and
they leveraged their skills in data science through the
component of communication of their results.
Nathan Stephens: Exactly, exactly. It's very inclusive. If you want to be in
data science, we welcome you in. Please, be a data
scientist. We need more data scientists. We want
people to, yeah, think scientifically in their view of the
world.
Kirill Eremenko: Yeah, gotcha. True. Okay, and so you worked with
Hallmark Cards for a couple years.
Nathan Stephens: Yep.
Kirill Eremenko: And where'd you move on, after that?
Nathan Stephens: So after Hallmark Cards, I worked for an ad network.
And at the ad network, I got to build ... This is where I
start, well, my background's always been in big data.
So even at Hallmark, I was working with massive
datasets, mostly on Teradata. At the ad network, I got
to work with large amounts of data on data sources
like Netezza, Greenplum, and that's where I started
learning Hadoop. We were early adopters of the
Hadoop platform. And this is also at the same time
when AWS was coming online. So, AWS was spinning
up and doing all sorts of interesting things. And we got
to jump on that platform.
Kirill Eremenko: So, is that around 2012?
Nathan Stephens: No, no. This is around like, 2008.
Kirill Eremenko: Ah, okay.
Nathan Stephens: Yeah, so we were early-on adopters of Hadoop at that
point.
Kirill Eremenko: Okay. I mean, AWS, was it coming online around 2008
as well?
Nathan Stephens: Yeah, yeah.
Kirill Eremenko: Okay, cool.
Nathan Stephens: Yeah. So, part of my good fortune has been to work
with really interesting, you know, managers and
leaders in my career, so that's been a real fortunate
thing. And I always encourage people to, you know,
when they go to select their jobs, put a lot of emphasis
and weight on the person that you're going to be
reporting to, because that person's going to dictate a
lot of things about the quality of life of the job, and
also future opportunities that you'll have. And at the
ad network, I had just a real great visionary, who was
very passionate about cloud technology and
distributed computing. And so, yeah, we went down
that route. It was a very exciting time, actually.
Kirill Eremenko: That's really cool, because on the podcast, sometimes I
mention that it's important to, during the interview,
when you're applying for a job, important to
understand what the job itself will be and will entail,
and the company itself, because that shapes your
future. But you're right, you have to also understand
the person who you're going to be working for, who's
your direct manager. What are they like?
Nathan Stephens: Yeah.
Kirill Eremenko: Yeah. Like you, I've been fortunate to have some very
impactful direct managers in my life. What would you
say was your one biggest takeaway that pops to mind
from that person at the ad network?
Nathan Stephens: Oh, with that manager?
Kirill Eremenko: Yes.
Nathan Stephens: I think there's this notion of, you know, rejecting the
status quo, right; thinking differently, accepting new
ideas. I think there's also this, with him ... I'm
struggling to explain it, but he was very interested in
philosophy. And he was a much broader thinker,
right? So, it's nice to work with somebody who has a
broad world view, and can kind of articulate how the
work that we do in technology fits into that world. So, I
found that really interesting as well.
The other thing that was interesting about him, and a
lot of my managers, is that I've had very few statistical
managers, people that really, actually can do what I
do, which has been a real blessing, because it allows
me to differentiate myself and bring something
valuable to the table; but it also allows me to pick up a
lot of the skills that I hadn't acquired through my
normal channels. For example, like, you know, the
consultative work, and being successful, navigating up
the political landscape of a corporation, right? But
also, a lot of the engineering work, a lot of the ETL
pipelines, a lot of these things, you know, my
managers and other people that I've worked with have
brought to me.
So, I think it's great to work with a manager who
complements your skills as well. Or at least, that's one
thing that has been really nice in my experiences. You
know, learn from your manager [inaudible 00:17:04]
doesn't have the exact same background you do.
Kirill Eremenko: Yeah, gotcha. And that's actually a sign of a good
leader, when a person can hire somebody that's better
than them at something. Because like, sometimes
managers can be a bit intimidated if their reports are
like, better than them at something. And therefore,
that team won't work out; but like in your example,
that worked perfectly fine. And that usually, for me,
shows that the leader knows what they're doing, and is
confident enough to lead a team of experts in different
fields. They don't have to be an expert themselves in
those same areas.
Nathan Stephens: Yeah. I think it was funny. I remember this time when
this manager in particular, he did a statistical analysis
and presented it to me and a few other people on the
team. And we kind of shot it down. [inaudible
00:17:57] that he didn't do this right. He was so
gracious about it. He's like, "Oh, okay. Okay, I see." It's
like, "I'll leave it to you guys." That's good. It was all
done in good humor, but we're like, "Yeah, yeah. That
wasn't right." [inaudible 00:18:10]
Yeah, but you know, I think diversity is good, right?
Diversity, I'm a big proponent of diversity and building
diverse teams. And that's another thing that this guy
did. He built a team. And it was kind of funny that we
called it the data analytics team at the time, because
data science, again, wasn't a term; but we had data
experts, data engineers. We had machine learning
engineers, system integrators, DevOps people, and
statisticians, as well as domain experts. So, we had
this nice crosscut of everything that you would need to
build a singular data science team that can pretty
much lay waste and devastation to the world. Like, we
had all the capabilities that we needed in that team,
because it was a cross-functional team. And that was
great. That was just a wonderful experience.
Kirill Eremenko: Gotcha. And in terms of the work and tools that you
used at the time, and techniques, would you say that
advertising, data science and advertising now is
different, is radically different to what it was back
then, in the 2008 to 2011 period?
Nathan Stephens: Well, certainly the complexity has risen. I think the
main objectives are pretty much similar when it comes
to targeting and promotion. Advertising is still
advertising. I think one thing that I found fascinating
about going from a manufacturer of greeting cards to
an ad network as a statistician was, I used all of the
same skills in both places. So, when I went into my
next gig, the skills carried over. So, I was still doing
predictive models, segmentation, clustering,
supervised and unsupervised learning techniques. I
had to still scrub data. I had to understand the data.
So, the principles of doing the data didn't seem to
matter so much with the application. I was still using
those exact same principles, despite the fact that I was
going from one domain to another domain.
Kirill Eremenko: Okay, interesting. And before we move on to your next
role in your career journey, just a quick question on
working with ads, because even today, or especially
today, advertising is one of the biggest applications of
data science. What would you say to people who are
studying data science, and are considering a role in
advertising, but have never had any exposure to using
data science for advertising? I guess the core of the
question is, is it a fulfilling experience? Is it something
that you can build a career around, and at the same
time, not feel like sometimes we see in the movies,
where people just feel like all they do is sell, sell, sell
all the time, and they have no meaning to their lives?
Nathan Stephens: Yeah. I actually have a lot of, I've actually had that
same question in my own experience. And I think it's
an existential question, right, to say like, what is
fulfilling to you, and what is meaningful to you in your
life. I mean, as a statistician, you aren't rushing into
burning buildings and saving children from fire, right?
And you're not saving people from cancer. You're not
fighting world ... Well, you might fight world hunger as
a data scientist. I mean, and you can work on these
areas [crosstalk 00:21:36]. Yeah, you can do that.
And so, I think finding what's fulfilling to you is an
individual question. What I will tell you about my
experience in the ad world is that the technologies are
amazing. And the sophistication is bottomless. And the
complexities are high. It's also extremely challenging,
so it's intellectually challenging. So if you're a person
that, you know, really enjoys a challenge, that's good
as well. I think if you're the type of person that says
like, you know, "How do I do the most good with the
skills and talents that I have in my life," I think that's
a very thoughtful question. And I think there are
probably more noble things that we can do than, you
know, targeted advertisements, right?
And so, I always encourage people to follow those
aspirations. And I think that's actually one reason I
actually moved on to the next area of my life, which
was to do client services. I wanted to learn a little bit
more about that. And then, onto R Studio as well,
because I, myself, have been trying to figure out what
satisfies me in my life, and what things can I, what
types of impact can I have to the world.
Kirill Eremenko: Gotcha.
Nathan Stephens: But for me, doing targeted advertising, it was one step
in the journey. I made a lot of connections. I got to
learn a lot of technology. I got to challenge myself. It
was a time of intense, analytic effort. And I think all
those things made me better, but it was just one step
in that journey.
Kirill Eremenko: Mm-hmm (affirmative). All right, gotcha. And thank
you, thank you for that overview. I'm sure that will be
helpful for some of our listeners.
So, let's talk about your next role. You mentioned you
went on to work in customer service. And from your
LinkedIn, I see that that was quite a lengthy role that
you had there.
Nathan Stephens: Yeah, yeah. So, I wanted to learn how to build a
business around analytics. That's one reason I went to
the client services company, because I worked for a
data-driven organization, a company that was actually
selling analytics as part of its solutions. And I was
really impressed with the quality and the caliber of the
people at that organization. And that was the next set
of skills that I wanted to learn about.
So, yeah. I went over to client services, and that was
another ... Anyway, I could talk forever about like,
what I learned in client services. That was an amazing
adventure, to be honest. Yeah, I don't know if you've
ever worked in that background, but that's quite the
field, working for clients.
Kirill Eremenko: No, I actually, like, I worked in consulting; you know,
selling consulting solutions to clients, but I'm not sure
if it's exactly the same as what you're describing.
Maybe let's go through your experience a bit, and I'll
pitch in a little bit if I can add value to the
conversation.
Nathan Stephens: Yeah. We can call it consulting. It's a very human-
driven endeavor, right? You're trying to help other
people be successful with their work and their
challenges. And some of those challenges are going to
be technical. And then, a lot of those challenges are
not going to be technical. And I think that's what I
found interesting, was that balance of the technical
and nontechnical requirements.
Kirill Eremenko: Yeah, no. That's definitely true, especially in
consulting. It's like what we found is data science is
more of a bottom-up approach, whereas consulting
itself, at its core, is a top-down approach. And you
start from the executive team, you define the strategy,
and then it trickles down. And when you combine the
two, you have both the technical and nontechnical
aspects, and it's interesting to see where and how they
meet; because data will be telling you the truth, from
the point of view of data, but consulting or people will
be telling you the truth from the point of view of their
experience. And it's always interesting to see when
there's conflict in that, and how to resolve that.
Nathan Stephens: Yeah, I think that's really insightful. I totally agree
with that. I think what you see in consulting is, you
see what is required to take action on the insights and
the understanding that you glean from your data. So,
just learning about the data, that bottom-up
approach, you know, that's not necessarily enough to
actually take action on those insights. There's a lot of
other pieces in that chain, and you see that in the
consulting, when you go to the top-down. You see,
"Oh, I see how that information is combined with other
pieces of information to lead to actions."
Kirill Eremenko: Yeah, yeah, definitely very interesting. Okay. So, what
is your biggest takeaway from your time in client-
facing data science?
Nathan Stephens: Yeah. My biggest takeaway, well, I'll circle back with
what I said about monetizing analytics. That's why I
wanted to go there, and I got a good idea of building a
business with analytics. The answer that I came to
was that analytics is one piece of a much larger pie for
monetization. So, you don't build a predictive model
and then make money on that predictive model. Even
in the ad network that I worked for before, where we
were putting models into production, that wasn't the
whole story. The entire story is, how do you set that
strategy? How do you influence the key players? How
do you line up against the market? You know, yeah, so
those, that broader ... So, what I learned was that the
analytic piece is actually a part of an overall bundle of
goods that ends up getting sold.
Sometimes, I kind of compare it to like, you know,
maybe like your Siri on your phone, or you know,
Google ... What's the Google Answer, Google Now, the
Google Assistant ... Like, you don't usually buy, I don't
know many people who buy their phone for Siri, or
buy their phone for Google Assistant; but it is part of
the overall value of that platform, right? And that's
what I've seen with a lot of analytic work as well. It's
like, you know, I have a great predictive model. Okay,
that's great that you have a great predictive model, but
that's one piece of an overall solution that you're trying
to come up with.
Kirill Eremenko: Mm-hmm (affirmative), yeah. Okay, very, very
interesting takeaway and recommendation, I guess, for
the people listening, for the future, that it's not just
about analytic solution. That is often just a
component.
Okay, all right. And before we jump to your current
role, which was the next step in your career, I know
people are dying to hear about R Studio and what
you're doing there, I just have one more question. So,
you've moved through different roles. So, you were in a
company that creates cards for about three and a half
years or so. Then, three years in the ad network. And
then, four years in the company that does the
consulting services in data science. My question to you
would be, what was always, was there a common
trigger that prompted you to move on to the next role?
So as we can see, the industries are quite varying, and
it doesn't seem like a natural progression from one to
the other, except for this last one, where you actually
intended to find out how to build a business around
data science.
So, what would you say, is there a trigger that, or like
a point of saturation, why did you choose to move on
and leave, not just the company, but the industry as a
whole, to move onto the next thing?
Nathan Stephens: Yeah, I'm actually glad you brought that up. My
personal experience is that jobs really change, and
jobs definitely changed for me. So, I'll have a job where
things are really great. And then something will
change, and it will change the dynamic of that job.
And in that situation, you can decide to stick it out
and keep going, which is one option; and then, the
other one is to tack and go a different direction, which
has been the strategy I've taken.
So, what was that key change for me? In all three of
those cases, it was a change in manager. Like, I moved
from a manager that I really enjoyed working for, to a
manager that was out of alignment with what I wanted
to accomplish. And that's not going to be a trigger for
everybody. I think you can be really successful in a lot
of careers by staying around, and you know, working
through a change of manager. But I think what is
important is to know that jobs are highly in flux, and
you can go from a great job to a not-great job in a day;
because either the company acquires another
company, or gets sold to another company, or you get
a reorganization of leadership, or your manager leaves,
and another manager comes in, which is kind of what
I'm talking about. But those things actually do have
big impacts on your day-to-day quality of life, and
wellbeing, and your potential future.
So I think for me, personally, I think if anything, I
spent too long trying to make a difficult situation work.
I think, looking back on it, one of the lessons I've
learned is like, you know, when things change, when
life changes on you, make the change quickly. Like,
say, "Okay. You know, this isn't what I used to have.
Maybe I'll go do something else. That's going to change
now." Or like, "I didn't really want this reorganization. I
didn't want my company to be sold to some other
company, but it is what it is. And so, what am I going
to do about it," you know? And I think if I had actually
moved faster in those switches, I probably would have
been a lot happier. But you know, it worked out pretty
well for me. I'm pretty happy with the journey. I've
been really fortunate to have had good opportunities
along the way.
Kirill Eremenko: Yeah. It's all a learning experience at the end of the
day. It's not about the end destination. It's about the
people we become on the journey, taking us to that
end destination.
Nathan Stephens: Absolutely. I've learned a lot about data science in my
life, but my career and experiences with other people
have also taught me a lot about who I am and what
I'm interested in.
Kirill Eremenko: Yeah. Very interesting you mention that, because I
never thought of it in that way; but like, looking back
now, the reason I left Deloitte was exactly the same,
that the partner that was managing our division, he
moved on to a more senior role, a more national-focused
role, and a new partner came in. And while he
was very talented, definitely, it didn't align. I didn't feel
in the right place. I didn't see that I could learn as
much as I could from the first one. And so, like after a
few months, I handed in my resignation.
Nathan Stephens: I wish somebody would just like, have put their arm
around me, and told me much younger that it's like,
"Look, things are going to happen to you in your
career, and they're not fair, and you're not going to like
it. But that's okay. That's just the way it goes, and
you're going to be okay."
Kirill Eremenko: Yeah, yeah. Well, there you go. You're passing on this
message to all of our listeners now. And if anybody's
feeling the same, then don't worry. Nathan is putting
his hand around you right now and saying,
"Everything will be okay."
Nathan Stephens: I am extremely empathetic to people who are under a
lot of stress in their jobs. I understand that that
happens. And yeah, I am saying it's going to be okay.
Kirill Eremenko: Yeah, awesome. Okay, well that nicely brings us to
your current role at R Studio, where you're the director
of solutions engineering. So to start off, maybe give us
a quick overview of R Studio, because we will have
some listeners on the podcast who haven't used R or R
Studio before. Can you give us a quick overview of
what R programming is all about, and what is R
Studio?
Nathan Stephens: Okay. So, and those are two questions, so I'll answer
them separately. So the R programming language is an
opensource programming language, like Python, or C,
or Java, or any other programming language that you
might use to do data analytics. And it's been around
for a long time, and it's run by a core group that's
totally unrelated to R Studio. And it's primarily
designed for statistical computing and visualization.
And it turns out that it has some other really nice
strengths that we can talk about, too.
R Studio is a company, right? So, R Studio was
founded by JJ Allaire, along with Joe Cheng, who was
one of the early employees, and Hadley Wickham, that
you probably know about if you're in the R space, who
works at R Studio as well. And the mission of R Studio
is to improve computational and scientific reasoning
through data, using programming. And we don't even
necessarily limit ourselves to R, but we're very R-
centric, right? We believe in APIs, that you know, you
should be doing, connecting with other systems. And
we also believe in reproducible research, that all of
your work should be scripted out and programmed, so
that you can communicate with other people, and
collaborate with other people on your research.
So what R Studio does is, it builds tools that sit on top
of the R programming language, that really take full
advantage of the R programming language.
Kirill Eremenko: Okay, gotcha.
Nathan Stephens: Our most popular product, by far, is the R Studio IDE,
and if you've used R, you've probably used the R
Studio IDE. It's free, opensource software that you can
download and use to interact with R.
Kirill Eremenko: Yep, and IDE stands for integrated development
environment. That's like the window in which you
program things.
Nathan Stephens: It's, yeah, the data scientist's lightsaber, right? That's
the tool they're going to use to do their work.
Kirill Eremenko: Yeah. I tried programming, when I was learning R
myself, I tried programming a little bit. And you know,
you can program R in a text editor, and then just
apply, it's like R is a compiled ... R is an interpreted
language, not a compiled one. So, you apply the
interpreter to the script from the text editor. And you can still get the
results, but it's so much easier and more efficient in
an IDE.
Nathan Stephens: Mm-hmm (affirmative), exactly.
Kirill Eremenko: Gotcha, all right. So, that's a great overview of R
Studio and R. And what about your role? What's your
role in R Studio? Or in fact, you started at R Studio three
years ago. Has your role evolved over time?
Nathan Stephens: Yeah, my job changes every six months. You know, I'm
doing something new every six months, because it's a
small company, and it's a growing company. And
that's what happens at small, growing companies, is
your roles change. So, I'm a solutions engineer now.
And what we do in the solutions engineering group is,
we help customers integrate our products into their
systems. So if you buy our products, and you want to
work with them, with databases, or with Hadoop, or
Spark, or crypto-authentication, or on the cloud, any
of those types of problems, we get involved with those
problems.
So, we're really there to help build enterprise systems,
and help the architects and the IT groups manage
these workflows.
Kirill Eremenko: Mm-hmm (affirmative). Okay, gotcha. And just before
the podcast, you mentioned that, or in the email
correspondence, you mentioned that you have moved
on from being a data science practitioner, more to the
role of a data science tool builder. And that gives you a
unique perspective on career opportunities for data
scientists. Could you tell us a bit more about what is,
what does a role of a data science tool builder entail?
And how does it compare to just a data science
practitioner, a standard role? And what are those
unique career opportunities that you mentioned?
Nathan Stephens: Yeah. So, let me be clear on like, what the shift is. So,
I no longer analyze live data. So, data scientists are
largely there to, a chief component of the data
scientist's job is to get insights and understanding
from their data, to influence decisions, actions, and
results. I no longer do that. I don't have live data. I
don't analyze live data, and I don't take any data
insights to influence actions and results, not from ...
By live data, what I mean is, you know, living data
that's coming in through other data sources that I can
analyze.
So, let me explain how I got here. So when I was at the
client services, I always was very interested in this idea
of systems, and architecture, and building data
products. That's what I got to do at the ad network.
And when I went into client services, I actually got a
part of my time reserved for building analytic
infrastructure, in addition to all the client services
work I did. And as time went on, I found that I got
more and more interested in that analytic
infrastructure role, to the point where I was helping
my other clients learn how they would implement their
analytic infrastructure as well.
So, I was working heavily with IT at this point, and
other architects. And I was like, you know, working
with the CTO to expand out the use of R. And that's
why R Studio got interested in me, was because that
particular skillset was what they needed over at R
Studio. What was interesting about that was like, that
wasn't the primary core of the job. My job was actually
to work with the clients, you know, as a data scientist;
but it kind of morphed into this other interest of mine,
doing analytics infrastructure.
Kirill Eremenko: Yeah, interesting how you can discover new things on
the job, and find out new interests that you have, and
passions.
Nathan Stephens: Well, it was actually a real struggle, to be honest,
because you know, if you worked for Deloitte, right,
you work on billable hours, right?
Kirill Eremenko: Yep.
Nathan Stephens: And so, you're under a lot of pressure to bill a lot of
hours. And you had hourly targets, yearly targets, that
you're supposed to hold up. And all the while, I'm
doing this other thing that isn't tied to billable hours,
and isn't necessarily aligned with the corporate
strategy, but something I feel really passionate and
really curious about, right?
So, there was a real tension there about like, how to
spend my time.
Kirill Eremenko: Yeah. That's when you start working like, evenings,
and weekends, and you lose any kind of personal life,
or sports, and health. Everything goes down the drain.
Nathan Stephens: Yeah. That's kind of client services in a nutshell,
actually.
Kirill Eremenko: Exactly. All right. Well, that is very interesting. Tell us
a bit more about, before we move onto the other
components about the career opportunities, tell us a
bit more about the analytic infrastructure. So, I
encountered that, like when I was at Deloitte, it was
quite closed off to me. I was just doing the consulting
work, just doing the data science side of things. But
then, when I moved onto the superannuation fund, or
the pension fund, in Australia, I was heavily involved
with infrastructure, and data architects, solutions
engineers, and all these other different roles that I
didn't even know existed. And I found that to be a
fascinating role. Could you give us like a short
excursion to the world of analytic infrastructure? What
is it all about?
Nathan Stephens: Yeah, so that is a great question. That is a fantastic
question. So, analytic infrastructure has two, the way I
view analytic infrastructure right now is kind of in two
complements. You have this notion of a data lab, right?
You have this idea that you have a sandbox to play in,
where analysts can work with their data, and learn,
and discover, and create. And most analysts I know
love that part of the job. They want to go create. They
want to build applications. They want to generate
reports. They want to try new technology. They want to
blow things up, right? I always say like, [crosstalk
00:41:47] not a big difference between a data scientist
and a mad scientist, right; just a few letters.
So, I think creating a data lab for people, a playground
for people to play in, is really important. And then,
there's this other notion of running analytics in a
production environment. And the difference between
those two is that, in the data lab, the data scientists
are in charge; and in a production environment, the IT
group, or the IT operations are in charge.
Kirill Eremenko: Yeah.
Nathan Stephens: And that handoff becomes ... Well, we could talk about
the handoff, but spanning those two worlds is the part
that I find very fascinating, so that's that fuzzy area
where I've lived. It's like, how do you connect this data
lab to this production world?
Kirill Eremenko: Gotcha. I totally agree. Like when I went to this
company, you would always get slapped on your
hands for trying to like, run a query without asking in
advance. Like, data scientists didn't even have access
to SQL before I came in. Then I requested the access,
and finally, after certain hurdles, we got it. And like,
every time you run a query, they're like, "Oh, you
could have hung the whole server, and you know, the
production environment." And then, they have ... Oh,
what's it called? They have these time slots, like in the
night, when all the queries are supposed to run. I
forgot what the exact, technical term for it is, but like,
they have allocated time slots for certain queries
because they know how much time it's going to run
and so on. And they need to get a certain amount done
in 24 hours.
And one of the first things that we did after a couple of
those incidents, where data scientists were like,
slapped on the hand, what we did is, we set up this
data lab. It was like, I think it was called the sandbox.
Some people called it the data lab, playground-
Nathan Stephens: Yeah, sandbox.
Kirill Eremenko: ... yeah, or called it a sandbox. And that really solved
the whole issue, because you can just experiment as
much as you want. There is still that issue of
handover, which you briefly touched on, but at least
it's not as bad. Like, people are not constantly chasing
you up about things that you're allegedly doing wrong,
and you get the freedom to experiment at the same
time.
Nathan Stephens: Yeah. That's so fun to hear you share that, because
that's been my exact experience as well. I had two guys
from IT come over to my desk when I was at the ad
network, and they were not happy at all. They towered
over my desk with very unhappy faces, and wanted me
to account for myself, you know?
Kirill Eremenko: Yeah.
Nathan Stephens: And that didn't happen once, but it happened twice.
And then I got a sandbox, also.
Kirill Eremenko: Yeah, yeah. True. It's interesting how they have these
systems in place to track down who exactly is the
culprit. They find you very quickly.
Nathan Stephens: Yeah, yeah. They will find you.
Kirill Eremenko: Yeah, okay. And so, in the case of analytic
infrastructure, so it's not ... Like, one of the steps is
setting up the sandbox or the playground, and then
dealing with different servers in the production
environment, and things like that. What else is part of
a role, the role of somebody who is in analytic
infrastructure? What does the day-to-day look like
there?
Nathan Stephens: Okay, yeah. So, I'm trying to parse that question,
because that also feels like two questions. So, let me
do the role, and then the day-to-day. So, the role of
like, let's call it an analytic administrator. I actually
wrote an article on R Views, which is one of our
corporate blogs about R, about an analytic
administrator.
So those analytic administrators, they have to be
pretty awesome, to be honest, right? Like, they have to
be connected to their data scientists, to understand
what the data scientist needs. They have to be aligned
with the executive audience to know like, what matters
to the company; like, where the value is going to be,
like what types of solutions are going to produce
business value. They have to get along well with IT. So,
they have to bring them doughnuts, and make sure
that their voice is being heard, and that they're
complying with all of those rules. And they have to be
really good evangelists in general, about promoting the
need for data science in the organization. If you're
fortunate enough to work for an organization where
data-driven decisions are happening, then that will be
easier. If you're working for an organization that's
maybe still more like, politically oriented, or making
decisions from their gut, then you're going to have a
little bit more work to persuade them that data science
is meaningful in your organization; but being a proper
evangelist is a really important part of the role.
What does that mean, day-to-day? Well, part of the
day-to-day is going to be managing that data science
lab that we talked about, right? Like, somebody's got
to be overseeing that architecture, making sure that
that thing is running. And you can either have, in
some cases, IT will manage that; but what I've seen,
usually, is more effective is if the analytics admin has
like, some nice levers that they can kind of pull to
control those things. They're also, you know, teaching
best practices. So, they're educating data scientists on
how to do things properly. I sometimes call it like,
shared infrastructures. Like airports, you can't have
all of the planes landing on the runway at the same
time. The data scientists have to know who else is
flying around them in the space, and who's coming in
for a landing. So, you have to be aware of those
resources.
And they don't know. Like, here's the thing with data
scientists. Data scientists, they're just not trained to
do this. Like, you don't learn it in school. So,
somebody has to teach them, and it's going to be the
analytic admin that's going to teach them, right? Like,
they're just going to do things, like you and I were just
saying, like we're going to blow up stuff, right?
Kirill Eremenko: Mm-hmm (affirmative).
Nathan Stephens: They're just going to do it. They're not going to know.
And that's okay, because nobody taught them. So,
there's an opportunity there.
Kirill Eremenko: Yeah.
Nathan Stephens: Other day-to-day would be, you know, making sure
that you're getting your architectural review board
presentation ready to go, to make R, or whatever
language you're using, an analytic standard, to make
sure that you have resources dedicated to that; like,
that people are actually funneling human and
financial resources into that work. And then of course,
the production work is a whole nother ball of wax, but
you know, if you're in the production side, you can
actually make even greater impacts.
So, it's like a big job. And I tell people like, analytic
admin, you're not going to see that in Indeed, or on
your job searches. Like, people aren't advertising for it,
but it's an actual need in organizations. And I know
that because I talk to a lot of organizations in my role.
Like, I'm on the phone every day with customers and
potential customers, and almost all of them have this
need. So, I like to tell people that if this is something
that they're interested in doing, I would definitely go
for it, because the need is definitely there; even though
the job description might not be written for it yet.
Kirill Eremenko: Gotcha, okay. I just have one burning question from
that. That was a great description of this whole role,
and I think I learned quite a bit of new things for
myself, just now. My question is, could you let me
know why does it sometimes ... And I'm sure other
data scientists will have exactly the same question ...
Why does it sometimes take so long to implement a
tool in an organization, especially like, for instance, I'm
in an organization, and I want an opensource tool,
such as R; like, I can download it on my computer,
and run it within 30 minutes. Why does it take several
weeks for an organization to roll that out to me, and to
allow me to use it for analyzing their data?
Nathan Stephens: Or months, or years, right?
Kirill Eremenko: Yeah.
Nathan Stephens: Yeah. So, there are barriers here. And I don't want to
be like a downer, right, but when we talk about large
corporations, it's important to know that there's this
long journey, decades-long journey, on how they get
here. And a lot of them aren't really geared towards
data-driven decisions. And a lot of companies don't
really know what to do with data scientists, is the
problem. Like, there's this notion of like, yes, it's
important. We need really smart guys. Let's go get
some really smart guys, and boom, we'll have a bunch
of financial success.
And that's not really the way it works. Like, you really
have to be thoughtful about how you're going to align a
data science team with the overall corporate strategy.
And the reality is that most companies struggle with
that. So when you're in an organization, and you say,
"Hey, I need a data science lab," a lot of organizations
are not even going to know what that means. Or if they
do know what that means, they're not going to be
geared to a way to fulfill that request.
So, it's an evolution. And I think a lot of the younger
companies that are coming up, like if you work for a
startup, that's not going to be as big of an issue. Like,
they're just going to know like, we run on Amazon.
We're going to [inaudible 00:51:12], you know, a VPN
... Or, I'm sorry, yeah, basically a new server
infrastructure, right, or an existing server
infrastructure inside of Amazon, and will serve your
needs. But like, a larger organization is going to
struggle with that.
Kirill Eremenko: Mm-hmm (affirmative), yeah. And do you think that's
going to be the cause, why all large organizations are
going to end very soon? Or [crosstalk 00:51:45].
Nathan Stephens: No, I'm not saying that that's going to happen. I think
there is a tension between like, the large corporations
and the smaller companies; but I think, I'm actually
very optimistic. And there's, I've met very talented
people in all sorts of groups. You know like, I work
with large financial groups, insurance groups,
consumer packaged goods groups. And I'm always
impressed with the quality of talent that these different
organizations can attract.
So, I'm actually very, very optimistic about the future
of data science, and the direction. What I get more
concerned about, frankly, is that the data scientists
themselves don't always really understand what they
bring to the table. So, I'll be more specific. Data
scientists are responsible for understanding their data.
And nobody else in the organization has that
responsibility. And so if you're a data scientist, and
you're spending 80% of your job like, scrubbing data,
that's because that is, you're in the role that does that.
Like, nobody else is doing that. And the power of that
is that when you speak about something, you can
speak authoritatively about that. You have
ammunition to say, "I know this is true because I
actually have been in the data, and I've seen it." And I
think that's one of the under-leveraged skills that I see
with data scientists, is that they take ... Not everyone,
but some data scientists will take that for granted.
Like, "Oh, yeah." It's like, "I just happen to know all
this stuff."
It's like, no, you know all that stuff. Like, take
advantage of that. Make sure other people know about
that. Like, broadcast that information. Make sure you
communicate what it is you're learning, because I
guarantee you, your boss, and your boss's boss,
they're not looking at that data. They don't know
unless you tell them. So, getting that information out
is extremely critical for the success of the data
scientist, and for their overall happiness in the job.
Kirill Eremenko: Yeah, and for the success of the business as well.
Nathan Stephens: Yeah, great point. I left that one out, but that's
probably the most important one.
Kirill Eremenko: Yeah, all right. Wow, fantastic. That's such a good
excursion to that world. Thank you so much. How
about we shift gears a little bit and jump into R? Let's
talk about R, and what's going on in R these days. And
you know, like some great things, I'm sure you have so
many great things to say about R.
Nathan Stephens: I do. I think R is fantastic. We were talking, before the
call, about R and Python. Could we just jump into that
one? Why don't we just hit the elephant in the room?
Kirill Eremenko: Yes.
Nathan Stephens: Okay. So, I don't think there's a war between R and
Python. I think the analytic space is plenty big to
accommodate two programming languages. And it
reminds me a little bit of the conversation back in the
'90s, when people were like, "Oh, it's got to be Apple or
Microsoft." Well, guess what; computation is big
enough to handle two large companies, right? We still
have both of these.
So, I don't think there's a war between R and Python. I
think that what needs to happen is, you know, you
can ... Well, what needs to happen is that those two
things need to work really well together. And in case, I
just want to mention that we recently made some
progress in that area, if you missed the
announcement. We actually brought Wes McKinney
on-staff at R Studio, and he's one of the well known
developers in the Python world. He's the father behind
Pandas, and he's now in charge of working in this
thing called Ursa Labs. And you can query that, if you
haven't seen Ursa Labs. It's named after the bear,
right; Ursa Major, Ursa Minor; the Big Dipper and the
Little Dipper.
And the job that he is leading up is really around
interoperability between datasets and programming
languages. So, what do I mean by that? If you're
familiar with it, Apache Arrow is the project that's
building datasets that can be loaded into memory,
both in Python, and into R, and into other
programming languages. And if you can load, if you
can share data across programming languages, you
can easily jump in between the programming
languages. Like, you could say, "Okay, I've got this R
data frame. I want to like, use some Python magic on
this." I'd boot up my Python instance, and I suck that
data over into Python. Right now, transferring data is
an extremely painful process. And you know, Wes is
trying to make that a much easier process. And it's a
very foundational piece in the toolchain that I'm really
excited about.
So basically, my point is that we brought on one of the
key Python developers, who works for R Studio now.
We've made R Studio much more Python-friendly.
We're still R-centric, right? Like, we are still saying,
"We like R." But if you're an R developer, it's getting
easier and easier to work with the Python tools; to call
Python functions, and modules, and interoperate
between the two languages. And I think that's a huge
advantage for data scientists. The next generation of
data science development is to be multilingual, and to
take advantage of the things that Python and R both
offer; and Julia, and you know, whatever other
languages you might be working with as well.
Kirill Eremenko: Yeah, wow. I didn't know that. That's a very ... That's a
huge stride forward with getting the languages closer,
and hiring-
Nathan Stephens: I think it'll take a couple of ... Yeah, it'll take a little
while to play it all out, right? Like, it's definitely part of
the long game, but if I look down the road, I see a
future where you've got people who know R really well,
that are also very comfortable, you know, taking
advantage of Python. So, Python opens the door to
TensorFlow, Spark; and those are things that we've
already incorporated in the R stack, is good connectors
to Python, and to Spark, and to TensorFlow, via
Python. And I think there'll be more things like that
coming in the future.
Kirill Eremenko: Yeah. And I like your comment about multilinguality.
That's very important; or, it's a great selling point for
any data scientist to have on their résumé, that I know
both R and Python. I have experience with both. That's
where the world's going, right? [inaudible 00:58:34]
Nathan Stephens: Yeah, right. If you're a hiring manager, and you've got
one person who knows Python, and another person
who knows R and Python, yeah.
Kirill Eremenko: Gotcha.
Nathan Stephens: Yeah, it's an easy call.
Kirill Eremenko: It's a no-brainer. And so, just to clarify, is your vision
that in a couple years, we're going to have one,
combined language, R-Python? I'm assuming not. I'm
assuming we're still going to have separate, R and
Python, but the interlink between them is going to be
very efficient and very high. In that case, what would
you say that R and Python are good for, separately?
Like, which one would you use for certain things, and
the other one for other things?
Nathan Stephens: Yeah, yeah. I think you could answer that in a lot of
ways. I've asked a lot of people, "Why did you choose
R? Or why did you choose Python?" And I get a lot of
different answers from that, but one thing I hear
frequently, one thing that doesn't surprise me is that it
seems like, it's like it's not even a question in their
mind. They just kind of went to the language that
actually resonated with them. They're like, you know,
and R users are very much this way; it's like, "I just
love R." You know it's like, you talk to people, it's like,
"I just love that experience. I love what it does, and it's
just part of like, who I am, even," like the people that
really, really love it. Or maybe you want to build Shiny
applications, right? There are things that R does, that
Python won't do.
You know Python, I've talked to a lot of people that use
Python. And sometimes, the answer is back, like,
"What is R? I don't even know what it is." So, it's like
maybe they don't even know what it is. I think if it's an
individual choice, I think that's fine. Like, I think
that's great. If you're a Java guy, and you love Java,
that's fantastic. Just use the language that you want.
But what's interesting about the R language is that R
is so, I guess, forgiving, or just inclusive of other
languages. R is a little, there's some humility in the
language. And it kind of gives up a lot of its control
and power to other languages. So when you run a
model in R, you don't actually run it in R. You call a C,
or a C++, or a Fortran library to run it, right? When
you run a Spark job, you don't run a Spark job in R.
You're calling into the Scala API, right?
So like, and that's totally fine with what R is about. R
doesn't really want to do that, anyway. R's just like,
"Let me just introduce you to these other things." And
that's ... So anyway, not a lot of people look at R that
way, but that's the way I see R, as more of a way to
orchestrate, you know, a lot of power and goodness to
work with other systems.
Kirill Eremenko: Yeah, gotcha. And it makes the best of many worlds,
rather than just trying to introduce everything on its
own. That's pretty good.
Nathan Stephens: Yeah. R's pretty slow, right? Like, if you run things
inside of R, it's pretty slow.
Kirill Eremenko: Well, not with everything. Some things, like specific ...
What's it called ... like vector operations. There, I
think, R outperforms Python in some of those cases.
Nathan Stephens: Right, right. Yeah, yeah.
Kirill Eremenko: But like, loops and stuff [crosstalk 01:02:04], totally
agree with you. Like, R-
Nathan Stephens: Loops are pretty slow. Yeah, yeah.
Kirill Eremenko: [crosstalk 01:02:10] Yeah, all right. And what would
you say about R and deep learning? Like, with the
recent developments in using Keras with R and things
like that, those are pretty exciting.
Nathan Stephens: Yeah, yeah. So, just piggybacking on that, that R is
slow, it's like the solution to R is slow is to push that
information somewhere else. Like, don't do it in R. Do
it somewhere else. So with Python, with Keras, and
deep learning, all of those routines are also, that's a
Python world, right? Like, those are all written in
Python. And what JJ has done, JJ's our founder, and
done a lot of the engineering around Python and
TensorFlow, JJ has written a nice library of connectors
that allows somebody who knows R to take advantage
of all of the work that's being done in TensorFlow; and
not only take advantage of it, but actually give them a
really nice experience.
So, we put things into the IDE to help you debug your
models. JJ's very good at documentation as well, so
there's a really nice set of ... There's a book that you
can read. There's a library. There's a website with
examples to learn about this. So basically, that
technology is like, there and available today. Like, that
landed a few months ago. And we're trying to invite as
many people as are interested, to come experience it,
try it out, and learn from it. It's really cool stuff. I have
to say, it opens up a whole new dimension into
problems that we previously didn't have tools for.
Kirill Eremenko: Mm-hmm (affirmative), yeah. Definitely exciting, and
very, very exciting, especially for those who are used to
R, and are now interested in deep learning and AI. And
this is finally going to be available.
Nathan Stephens: Right.
Kirill Eremenko: Yeah, all right. Well, we're kind of like, coming close to
the end of this session. And time has flown by, and I
still have so many questions that I would love to ask
you, but I guess I'll hand it over to you. Like, is there
anything you would like to share with our listeners, or
with aspiring and professional data scientists who
want to grow their careers?
Nathan Stephens: Yeah. I think I've shared a lot with the career advice.
Can I just make a shameless plug for what we do at R
Studio?
Kirill Eremenko: Sure, of course. Go for it.
Nathan Stephens: All right, because a lot of people don't realize that we
actually do sell professional grade products for the
enterprise. And those are designed to work with all of
our opensource packages and tools. So if you're in the
enterprise world, you're typically looking at like,
security, authentication. You're trying to figure out
high-availability scaling. You have like, mission-critical
applications and whatnot in there. And we sell
products to bring R into the enterprise, and make it an
analytic standard in there.
So if you, today, if you are using R on your desktop at
your job, and you're downloading data from your SQL
server database, onto your laptop, and then taking it
home, you know, and leaving it at a café or something,
I would encourage you to think about going to the
website, seeing what we have to offer, because we
actually have a really nice platform for scaling out R in
the enterprise; a really nice toolchain for doing that.
And it'll make your life better, and increase the
capabilities of your tools. And not a lot of people know
that like, that's all available. So, yeah. I just wanted to
point that out. Thank you for letting me make a
shameless plug.
Kirill Eremenko: That's all right [crosstalk 01:06:00]. I just, I will
reiterate that. Like, there's a lot of organizations, like
we have executives, and directors, and entrepreneurs
listening to this. And just for their purpose, for their
sake, there's a lot of organizations that still use large,
corporate tools, such as SAS, and other tools that are
just there, archaically. And it's time to change. And I'm
not saying anything against SAS, but the world is
going open source. The power of open source is
incredible, and the communities behind open-source
tools are really empowering very fast changes, very fast
developments in the algorithms, in the speed, and in
everything that the tool requires.
And so, if it's time for change for your organization,
then R Studio is there to help. And also, if you are
starting a new business, an enterprise, or taking an
idea to execution, to actually building a company
around an idea, then don't go, it's probably not the
best idea to go for some enterprise-specific tool that is
not open source. Why not go for an open-source tool,
and get in touch with Nathan? He'll set everything up
for you.
Nathan Stephens: I think that's fantastic. Can I add one thing on that?
Kirill Eremenko: Yeah, sure. Of course.
Nathan Stephens: Because I 100% agree with everything you just said.
Things are changing rapidly, and when I talk to people
who are in the hiring position, who are trying to build
out their platforms, you know, and bring in the best,
you know, to adapt to this new world, there's this idea
of bringing in the best talent, as well. You're trying to
capture the data scientists, and they're in high
demand. They can be expensive, right? And it's a big
investment.
And by and large, that new demand that's coming in
from colleges, they're going to know R, and they're
going to demand that there are R tools available to them
in their job. And so, making an investment in R, I feel
very, very strongly, obviously, because I work for R
Studio, but feel very strongly that an investment in R
is a good move in bringing in the best talent out there.
Kirill Eremenko: Gotcha, couldn't agree more. All right, Nathan, so
thank you so much for sharing all the insights, and
your wisdom, and your career journey. Where could
our listeners get in touch with you and contact you, if
they'd like to learn more, or maybe explore the
opportunities with R Studio?
Nathan Stephens: Yeah, you're welcome to reach out. My email at R
Studio is [email protected]. My Twitter handle is
NWStephens; and also, everywhere else on the
internet, it's going to be NWStephens.
Kirill Eremenko: Yep. And LinkedIn is a good place to get in touch with
you?
Nathan Stephens: NWStephens, yeah.
Kirill Eremenko: Yeah, awesome.
Nathan Stephens: Yeah, LinkedIn is great.
Kirill Eremenko: Awesome, all right. We'll include those links in the
show notes, and we'll try to find that article that you
mentioned, that you wrote about the analytics admin.
That was really interesting. I have one more question
for you today. What is a book that you can recommend
to our listeners, to empower their careers even more?
Nathan Stephens: Well, I'm going to make another shameless plug for
Hadley Wickham's book, called R for Data Science.
It is about R, but it also has some great foundational
material, just about how to think about and approach
data science. And so, that's why I recommend it.
Kirill Eremenko: Yeah. Does Hadley have a few books, because I'm sure
I've read one of them, and I think it's this one.
Nathan Stephens: Hadley has, yeah, Hadley is amazing with the amount
of content he pumps out. And yeah, he's got a few
books. I neglected to mention that it's co-authored
with Garrett Grolemund, as well, who also works at R
Studio.
Kirill Eremenko: Okay, gotcha. It seems like you've got all the top
analytics talents working for R Studio, and now you're
poaching from Python as well.
Nathan Stephens: I have the great ... When I go to a meeting, I assure
you, I'm the dumbest one in the meeting. It's really
nice to work with such amazing people.
Kirill Eremenko: Exactly. Like, that's my, I always appreciate when I'm
the dumbest person in the room. That means there's
places I can grow, right? Like if you're the smartest
person in the room, you should be in a different room.
Nathan Stephens: Yeah, yeah. I know, and I don't say that, yeah, just in
false sincerity. I really mean it. I'm the dumbest one in
the room. It's really a great experience, actually
working with so many wonderful people. And they're
not just smart at their jobs, but they're wonderful
people to get to know as well. I'm just really impressed
with the character of these people that I get to work
with.
Kirill Eremenko: Yeah. Well, the character of this podcast has been
amazing. Thank you so much, Nathan, for coming onto
the show and sharing all these wonderful insights.
Nathan Stephens: Thank you so much for the opportunity. I really
enjoyed it. I learned a lot.
Kirill Eremenko: All right. Talk to you soon. Bye.
Nathan Stephens: Bye.
Kirill Eremenko: So there you have it. That was Nathan Stephens from
R Studio, sharing his career journey and all the latest
and greatest updates from R Studio, straight from a
director who works there. What was your favorite part of this
podcast? Mine, by far, was the analytic admin concept
and description. Nathan obviously has a lot of
experience in this space, and he described the idea
behind what an analytic admin does, or what that role
entails, very aptly, and it makes a lot of sense that
companies should have a person like that on board if
they are looking to build a lasting analytics culture, a
sustainable approach to data science, where
everybody's happy. The IT team is happy, and the data
scientists are happy as well.
So there we go. That was Nathan Stephens. All of the
show notes, and links, and all the things mentioned in
this episode are available at
www.superdatascience.com/171.
There, you will also find a transcript for this episode,
and the URL to Nathan's LinkedIn. Make sure to
connect with him, hit him up, and stay in touch. If you
are looking to implement R Studio at an enterprise
level, or a corporate level in your company, then make
sure to get in touch with Nathan. He'll guide you
through the process, and at least give you some tips.
And finally, if you know somebody who uses the R
programming language, who is a big fan of R,
or who loves R Studio, why not send them this
podcast? There's a lot of valuable information, a lot of
updates on what's going on in the R space, and I think
there's a lot to learn here. So, make sure to forward it
on, and you might help somebody out; your friend,
your colleague, your relative. Help them out in their
career in data science.
And on that note, thank you so much for being here.
Can't wait to see you next time. And until then, happy
analyzing.
[Music 01:13:12]