SDS PODCAST EPISODE 513: TRANSFORMERS FOR NATURAL LANGUAGE PROCESSING

TRANSCRIPT

Show Notes: http://www.superdatascience.com/513
Jon Krohn: 00:00 This is lucky episode number 513 with Denis Rothman,
an award-winning author on artificial intelligence.
Jon Krohn: 00:08 Welcome to the SuperDataScience Podcast. My name is
Jon Krohn, chief data scientist and bestselling author on
Deep Learning. Each week we bring you inspiring people
and ideas to help you build a successful career in data
science. Thanks for being here today, and now let's make
the complex simple.
Jon Krohn: 00:42 Welcome back to the SuperDataScience Podcast. Today's
guest is the colorful and ethically industrious Denis
Rothman. Denis is the author of three technical books on
artificial intelligence, all of which have come out in the
past two years. These books are on AI in general with
particular focuses on Explainable AI and the giant
transformer models that have revolutionized Natural
Language Processing, or NLP for short. His most recent book, Transformers for NLP, led him to win this year's Data Community Content Creator Award for technical book author. Prior to becoming a full-time
author, speaker, and consultant, Denis spent 25 years as
the co-founder of a French AI company called Planilog,
which was acquired three years ago. All told, Denis has
been working on AI for 43 years since 1978 and has been
patenting AI algorithms such as those for chatbots since
1982.
Jon Krohn: 01:50 In today's episode, Denis leverages vivid analogies to fill
us in on what natural language processing is, what
transformer architectures are, and how they've
revolutionized NLP in the past few years. He also talks
about tools we can use to get an understanding of why complex AI algorithms provide a particular output when provided a given input. This episode should be well-suited to anyone who'd like to keep on top of what's possible in AI today, regardless of your background. Practicing data scientists in particular will also appreciate Denis's mentions of particular modeling
approaches and software tools. All right, you ready? Let's
do it.
Jon Krohn: 02:39 Denis, welcome to the SuperDataScience Podcast. Where
in the world are you calling in from?
Denis Rothman: 02:45 Okay, thank you, and thank you for inviting me. Right
now I'm 150 kilometers from Paris. I'm out in the country in the Champagne region, and then you have Burgundy and all
that. I'm around that place.
Jon Krohn: 03:02 Wow. That does not sound unpleasant.
Denis Rothman: 03:05 It's very pleasant.
Jon Krohn: 03:07 It sounds amazing. Is that like a COVID thing or you're
out there all the time?
Denis Rothman: 03:12 No, no, no. I like Paris and like to be out of Paris. It's like
being in Manhattan. And then, you go out a bit to the
northwest, just have to go 20 miles and you're in the
woods. You're in the forests in New York State. So,
around Goshen or places like that.
Jon Krohn: 03:30 I've heard. Yeah, I hope to someday spend time outdoors just like that. So, as we discussed before the episode
started, I'm Canadian. And so, people often have this idea
of you being outdoors. But I grew up in Downtown
Toronto and now I live in Downtown Manhattan. And I
haven't experienced much outdoors at all. But I've heard
it's wonderful. And someday I'll experience that, too.
Denis Rothman: 03:56 Toronto is a nice place, too.
Jon Krohn: 03:58 Toronto is nice. It doesn't have a Champagne or
Burgundy region around it. We got the Niagara region,
which is our best imitation.
Denis Rothman: 04:06 That's why I chose France, in fact, because I could live anywhere, but I found the quality of life... like, you have medieval culture that you can't find in North America, universities that go back to the 13th century. I like that part. And then, you go to modern
Paris. I like that. But I like to travel, so it's not really a
problem.
Jon Krohn: 04:30 I've noticed from videos that I've seen of yours in the past,
you have very interesting art in the background. I think
you studied history at points in your career.
Denis Rothman: 04:42 Yeah. I paint. I play the piano. I was born in Berlin in
fact, and my father was a military lawyer for NATO so I
traveled all around all the time. But my dream was to go
to Sorbonne University. That was my thing. Because in those days, the president of the university would say, "You came here because you're really interested in history, geography, archeology, mathematics, linguistics." So, you can major in something. But in this university, you can go to any class and you can get credits for any of them. So, I would go for this cross-disciplinary education, which was very fascinating. That's why I spent so much time there. I went to three Sorbonne universities in fact. I just couldn't stop learning there. So, yeah.
Jon Krohn: 05:37 Wow.
Denis Rothman: 05:38 Yeah. So, I studied a lot of everything.
Jon Krohn: 05:42 That sounds amazing. That's like my dream retirement. I
wonder if they'll accept me then.
Denis Rothman: 05:46 And I wanted to start my life like that, like thinking like
that. Because at one point, I was working a lot in the
states for student money, college money. And I was
driving cars, this driveaway thing where they give you a
car, and then you can take it anywhere. So, I crossed around the States. And one day, I was sitting in Florida,
and I say, "Do I want to live here? What do I want to do?"
Okay, I really want to go to Sorbonne University, because
I could have stayed down in Palm Beach and had a nice
life, study there. But no, I wanted to come back to Paris
and live this educational thing. And there are so many cultures right next door: Germany, Spain, Italy, Portugal, UK, Belgium. It's incredible. I'm forgetting countries. I
don't want to leave the viewers out. Like Netherlands,
Luxembourg, Denmark. You just sit there and you have
all these people there. You're living in the world.
Jon Krohn: 06:45 Yeah, it's rich in culture. I am jealous. It sounds like
you're in the right spot to be.
Denis Rothman: 06:50 No, Manhattan is great.
Jon Krohn: 06:53 Yeah. It's a very concentrated piece of culture. And then,
as you say, you go 20 miles out and you're just in the
woods.
Denis Rothman: 07:03 That's right. People don't realize that. But you're only 30
minutes from beautiful nature, just right northwest, just go over the George Washington Bridge out there and that's it.
Jon Krohn: 07:13 Yeah. So, amongst all of the learning that you've been
doing in recent years, there's been a fair bit of learning
and teaching of mathematics and artificial intelligence,
machine learning to the extent that you've published
books at an incredible rate. So, this year, you published
Transformers for Natural Language Processing. Last year,
you published two books, Hands-on Explainable AI with Python, and just a few months before that, AI By Example. So, I'd like to dig into each of these books. I've
got a copy of Explainable right here but I want to go
backward chronologically. So, let's start with
Transformers for NLP. What is this book about? So, I can
give a little bit of color maybe for the audience but you
could do a much better job. So, natural language
processing is the application of data science or machine
learning to make use of natural language in some way
like written language or spoken language and yes, to
maybe automate things, and transformers are
particularly interesting in recent years because they've
been shown to have unprecedented accuracy at a lot of
natural language tasks.
Denis Rothman: 08:34 So, yeah, well, let's take this back a second. When you're
talking about NLP, you're talking about linguistics. If
you're talking about linguistics and machines, it's
computer linguistics. Okay. So, we're going back to
theory. And there's one little thing we have to understand, which is that we're getting inputs with data. You get a lot of data. So, that's the input. You have all this raw data, billions and billions of pieces of data. And on the other side,
you have to do some representation of reality so it doesn't
look like murky results, right? So, up to now, you had all
that input and then you had to get good representations.
But there were several models like you would do k-means
clustering, then you do parsers, then you do recurrent
neural networks, and then you can do CNNs and all. So,
it was a bit like having a lot of tools to do everything. So, every time you had a task, you had to find another tool, like an SVM. So, for 35 years, because I started very early in artificial intelligence, I saw no change, and I said, "Where is this going?" These people are writing a lot of
algorithms. And I wrote one algorithm 25 years ago that's
running all over the world while we're speaking. So, I say,
"Why do they write all these algorithms when you can get
one universal algorithm to do the job?" Of course, I wrote
it for supply chain management and not NLP.
Denis Rothman: 10:17 So then, all of a sudden, Google around 2017 has this
problem. We have 5 billion searches per day. We're having problems with the US Senate because they keep asking us questions. I'm speaking of big tech, like when Mark Zuckerberg is called to the Senate. This is the reason transformers exist. He's called to the Senate and they say,
"You know there's that post." And he's thinking, "What
post? There are 2 billion posts a day." Yeah. But that
post. He's thinking, "What are you talking about? I'm a
multi-billionaire. I'm surfing all the time. And they're asking me about post number 1,500,000,000. I don't even know
what's in that post. I'm trying to do my best here." And he
says, "I don't have the tools." So, he's thinking, "Go see
my team." And the team says, "Well, we can't. We just
can't. We have 100 algorithms in there. We're not making
it. We're not making progress." Twitter has the same
problem, Amazon, Microsoft. So, at one point, Google says
we have to stop all that. We need something industrial.
Denis Rothman: 11:28 So, instead of having a convolutional neural network, we
have layers but none of these layers are the same size,
none of these layers do the same thing. That's like a 1930
car. No, what we want is a V8. A V8 engine looks beautiful inside, like eight cylinders here, right, a V8. So, they
come up and say, "Let's forget about this recurrent neural
network stuff. We want a V8." So, let's start with eight
heads, which are like a V8 engine. Let's start with eight
heads. Forget about recurrent stuff and all these layers.
And we want to write a layer and we're going to put the
layers, and every layer is the same size. Let's make every
layer the same size that way we have an industrial model.
It's like a rectangle. And we just stack these same layers,
same size. They come up and say, "Well, that's not
enough. We're not going to go fast enough with that. Wait,
let's take one of these layers and split it into a V8. Wow.
And now, we're going to run those eight layers, those
eight parts of a layer on eight heads on eight GPUs on
eight processors at the same time." Wow. They're going to
run there. All these words are just going to analyze other
words. We just want to say, Denis and Jon. Jon has a guitar behind him. Does he play the guitar? Let's put all that
together and see where that word fits into context.
Denis Rothman: 12:56 And once that layer is over, let's not mix it all up. Just
add it and send it to the next layer that will do the same
thing, building on what it learned in the first layer but it's
always the same size. So, at one point they reached this model, and no one knew what it was going to do, because it was trained on raw data, not really labeled data. They used labeled data just to show people. It got excellent results. And then, all of a sudden OpenAI comes along
and says, "You know what? That's a good idea. Why don't
I create a stack not with 16 layers but 96 layers? Instead
of using 5 billion params, why don't I do 175 billion
parameters? And why don't I ask Microsoft for a
supercomputer, a $10 million supercomputer with 10,000
GPUs and tens of thousands of CPUs?" And now you have a
factory. So, now you have this industrial model V8, and
it's just there and it's going fast, fast, fast.
Denis Rothman: 14:03 And then, all of a sudden they wake up and they say,
"Uh-oh, what does it do? How is it possible to do all these
tasks?" And in fact, they discovered because it's called
emergence. Emergence is when you don't know what's
going to happen but it just emerges out of all that training
that in fact, the system, a GPT-3 transformer or a BERT, just learns the language. And once it learns the language, it's all based on what you ask it, the prompt.
So, if you type nice prompts, it will analyze it as a
sequence and it will try to find out what follows. So, in
the end, you end up with the GPT-3 model trained on a
supercomputer and you can ask it anything you want.
Give me the synonyms of headphones and stuff. You can
invent your own tasks or give me a grammatical
breakdown of the sentence, or recently, why don't you
just take my... when I'm writing, translate it into Python
instead of translating it into French or translating to
JavaScript. And just to finish the little story, it bounces back to Google, which says, "Why don't we create a trillion, a
trillion parameter model?" And that thing is going to be so
big that you know it's going to exceed human capacity.
And people say, "Gee, where are you going to get the computer, when the other one was $10 million and one of the top 10?" Google says, "Yeah, but why are we bothering ourselves with all those floating points? We don't need all that. So, let's build our own TPUs and just cut all that floating-point stuff out of there so we have a domain-specific machine."
Denis Rothman: 15:47 And now, they've created supercomputers that we can rent for just a few hundred dollars an hour, which is not much for a corporation. That's even more powerful than what OpenAI has.
And then, you can train what you want. And then, the
beautiful thing is it bounced back into Google, which has BERT, and Google Search now is based on BERT.
Everything is BERT in Google Search. So, you see how we
went in... a few years, we went from prehistoric artificial
intelligence to super-industrial and industrialized society.
And big tech did that miracle. I mean, you can say anything you want about them. But what people don't understand is that these are people just like you and me, working in small teams of maybe 10 people. They're in their corner. They're trying to find something. They're not the billionaires. They're the guys like us just trying to do stuff. And they come up with incredible things. So, we do have to admire big tech in that respect. You can say anything you want, but what they've just done, it's industrial. So,
that's transformers.
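For readers who want to see the V8 analogy in code, here is a minimal NumPy sketch of multi-head self-attention, the mechanism Denis is describing. The dimensions, weights, and token count are illustrative assumptions, not the values Google used; the original Transformer paper split a 512-dimensional layer across 8 heads of 64 dimensions each.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads=8):
        # x: (seq_len, d_model). Every layer keeps this same shape,
        # which is what makes the stack "industrial": same size throughout.
        seq_len, d_model = x.shape
        d_head = d_model // n_heads  # each head gets one slice of the layer
        q = (x @ Wq).reshape(seq_len, n_heads, d_head)
        k = (x @ Wk).reshape(seq_len, n_heads, d_head)
        v = (x @ Wv).reshape(seq_len, n_heads, d_head)
        heads = []
        for h in range(n_heads):  # in practice the heads run in parallel
            # every word scores every other word for relevance
            scores = q[:, h] @ k[:, h].T / np.sqrt(d_head)
            heads.append(softmax(scores) @ v[:, h])
        concat = np.concatenate(heads, axis=-1)  # stitch the heads back together
        # "Don't mix it all up, just add it": the residual connection
        return x + concat @ Wo

    rng = np.random.default_rng(0)
    d_model = 512
    x = rng.normal(size=(10, d_model))  # ten tokens, e.g. "Jon has a guitar ..."
    Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.05 for _ in range(4))
    print(multi_head_self_attention(x, Wq, Wk, Wv, Wo).shape)  # (10, 512)

Same shape in, same shape out is what lets these identical layers stack, whether 16 of them or, as in GPT-3, 96.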
Jon Krohn: 16:58 Yeah. You may already have heard of DataScienceGO,
which is the conference run in California by
SuperDataScience. And you may also have heard of
DataScienceGO Virtual, the online conference we run
several times per year, in order to help the
SuperDataScience community stay connected throughout
the year from wherever you happen to be on this wacky
giant rock called planet Earth. We've now started running
these virtual events every single month. You can find
them at datasciencego.com/connect. They're absolutely
free. You can sign up at any time. And then, once a
month, we run an event where you will get to hear from a speaker, engage in a panel discussion or an industry expert Q&A session. And critically, there are also speed networking sessions, where you can meet like-minded data scientists from around the globe. This is a great way
to stay up to date with industry trends, hear the latest
from amazing speakers, meet peers, exchange details, and
stay in touch with the community. So, once again, these
events run monthly. You can sign up at
datasciencego.com/connect. I'd love to connect with you
there.
Jon Krohn: 18:11 So, these transformers, like OpenAI's GPT-3 and BERT that you mentioned, and applications like Google Search that you talked about. What applications do you teach?
Denis Rothman: 18:25 You can do question answering, for example. One of my favorites is
summarization for second-grade students. So, you're
going to say, "Denis, this guy, I'm in an interview with
this guy that's supposed to be super intelligent, and he's
interested in second grade summarizing. Maybe I will re-
edit this and cut that part out."
Jon Krohn: 18:50 No, no, I know that that's hugely valuable.
Denis Rothman: 18:53 So, my second-grade summarizing thing, and I'll give you many others, just a list. It's one of the most interesting ones. Because, in fact, when we're
talking here, we look smart. You're talking-
Jon Krohn: 19:08 You do.
Denis Rothman: 19:08 ... on artificial intelligence. Whoa, do I look smart? Ask
me about plants, ask me about the names of flowers, ask
me why these insects live with these flowers in that forest,
and they don't live in another. What? I'm not a second
grader. I'm a baby. So, what I like to do now is I'll take an
article that's new for me, where I'm a baby, not even a
second grader. I'm not even a first grader. I'm nothing.
And I'll feed it to GPT-3, and I get this nice explanation
where I understand everything. I say, "Wow. I just really
liked that feature." So, it got me thinking," Why don't I go
to the question-answer thing?" So then, from there, I'm
going to go ask the questions but it's prompt engineering.
You can see what I'm getting at. It's the way you ask it. If
you ask it to explain like a college student, you'll get
something you won't understand, and you'll feel like a second grader.
Denis Rothman: 20:08 You're inventing the usage, in fact. So, you can go, "Now,
I go to question-answers." And I say, "Well, can you explain black holes to me like you would to a child?" And it does. Now, I understand. Can you explain
like you would explain to a high school student? Now I
can understand better. Can you explain the same black
hole like a college student? Wow, great. Could you explain
some math with it? Okay. Could you give me some
equations? Okay. Now, can you explain quantum computing? Right. Can you give me the Heisenberg equation? Okay. Can
you break it down for me? Could you write some code
now for me, where it's an HTML page? I see the equation.
I just want a little graph to show the waves.
Jon Krohn: 20:58 Wow.
Denis Rothman: 20:59 Yeah. So, now I have my HTML page. How am I going to
deploy it? Okay. He'll explain. Oh, I have a problem. I'd
like to put OpenAI in a Jupyter notebook. But what's the
code? Can you give me the code so I can just copy and
paste it? Okay, great. Okay.
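As a concrete illustration of the prompt engineering Denis is describing, here is a minimal sketch using the openai Python package as it existed around the time of this episode; the engine name, passage, and parameters are illustrative assumptions, and calling the API required an approved key.

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder; GPT-3 access required approval

    passage = ("A black hole is a region of spacetime where gravity is so strong "
               "that nothing, not even light, can escape it.")

    # The same passage, explained at two reading levels: the prompt is the task.
    for audience in ["a second-grade student", "a college student"]:
        prompt = f"Explain the following to {audience}:\n\n{passage}\n\nExplanation:"
        response = openai.Completion.create(
            engine="davinci",  # a GPT-3 base engine of that era
            prompt=prompt,
            max_tokens=80,
            temperature=0.3,   # keep the explanation close to the source text
        )
        print(audience, "->", response.choices[0].text.strip())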
Jon Krohn: 21:19 So, these examples, you're doing this on a daily basis.
Denis Rothman: 21:21 Yes.
Jon Krohn: 21:21 You're constantly querying GPT-3.
Denis Rothman: 21:23 Yes, I'm doing it right here.
Jon Krohn: 21:27 Right now. Actually, every-
Denis Rothman: 21:29 I'm doing it like from a TV. It's like people playing video
games. Like I'm here all the time playing around with that
stuff. It's insane. It's like I don't know where you're going.
It's an adventure.
Jon Krohn: 21:42 Yeah. If you're not watching the video version of this,
Denis is pointing at his phone and tapping away at it. But
not everyone has access to GPT-3, right? Don't you need
to be approved? I had to wait months. I just submitted an
application to get access to the API, and then finally got
it. So, it seems like not everyone today could just access
GPT-3 unless you have a workaround.
Denis Rothman: 22:08 And that's the trick. Let's go back to linguistics. Okay.
You go back to linguistics. What are we talking about?
We're talking about we have a lot of raw input. We have a
model. And it's the way we ask things that we get things.
And what's interesting is to play around, like I just said,
right? That's the interesting part. But if you go back to
my book, you can get GPT-2. And you can take GPT-2
and you can train it. Because what I did, for example, in
chapter three, I took a BERT model. And I took the works
of Immanuel Kant, the German philosopher. And I fed all
those books into it just to have fun. Then I began to ask
him, Kant, some questions, "Where does human logic go? How does human thought work?" The goal here is to play
around with it. You have to have a lot of fun otherwise
you'll never understand transformers. And you've got to get to talk to them, to explain. Now, what did I just say
before? Google BERT drives Google Search. What I did is I take prompts, like the second-grader stuff, and I just copy them into Google Search. And I'm diverting the use of Google Search by giving it long sentences, not just keywords.
Could you explain to me the solar system through the
eyes of a second-grade student? Please don't show me any
videos. I just want some text. Skip all that stuff. And I
just deal with these two things.
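In the same spirit as the Kant experiment Denis describes from chapter three of his book, here is a minimal sketch of playing with GPT-2 through the Hugging Face transformers library; the prompt is our own illustration, and the stock gpt2 checkpoint stands in for a model fine-tuned on Kant's works.

    from transformers import pipeline, set_seed

    set_seed(42)  # make the sampling reproducible
    generator = pipeline("text-generation", model="gpt2")

    # Give the model a Kant-flavored opening and see what it continues with.
    prompt = "Human reason, in one sphere of its knowledge, is called upon to"
    outputs = generator(prompt, max_length=60, num_return_sequences=2,
                        do_sample=True)
    for out in outputs:
        print(out["generated_text"])
        print("---")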
Jon Krohn: 23:48 Wow.
Denis Rothman: 23:51 Yeah, it's a transformer so it can absorb all of that. That's
what's new. So, in fact, you can train yourself, having fun with transformers, with Google Search. You can ask it
questions. Could you tell me this? Could you tell me that?
And then, you go on as it gives you answers in the
system. You can ask it more difficult questions. Oh,
yeah, I got that. I got the Heisenberg equation. I
understand. And we keep [inaudible 00:24:14]. But now,
could you tell me more? You can talk to it, but people
don't know it's a transformer.
Jon Krohn: 24:20 Right. So, we're filming today on September 20th. And I
had just happened to be on your YouTube channel before
we started filming. And it was today, September 20th that
you published a how-to video with more detail on exactly
what you just described, on how to use BERT behind Google queries to get lots of interesting information. So,
that's something that listeners can check out. So, Denis,
so Transformers for NLP, that was a beautiful introduction to what natural language processing is and the history of transformers. You've got a
lot of great analogies in there particularly like the V8
analogies. But that was just your book this year,
Transformers for NLP. Tell us a bit about Hands-on XAI
with Python which came out in the summer of 2020. So, I
could do my little spiel. Explainable AI is where we apply algorithms to very complex models, I guess like BERT or GPT-3, so that we can try to understand, to get an explanation for, how a particular output was reached. Is that right?
Denis Rothman: 25:35 Well, yeah, so let's go back to linguistics again. Okay. So,
basically what we're saying, "What is an algorithm?" You
have an input and you have an output, and you have this
thing in the middle called the algorithm. So, one problem here is there's a confusion in many people's minds, which is that Explainable AI is explaining the algorithm. Okay. So,
that's an area you can explore. But no, that's not
Explainable AI. Explainable AI is model-agnostic. I don't
even care about the algorithm. What do I care? Google
Search. Let's go back to Google Search. I'm on Google
Search. And I type, explain the Heisenberg equation.
Okay, what do I get? I can see the result. I don't need
Explainable AI. I know that I won't like the result because
I don't understand anything on that page. Okay. So, now
I'll do something called Shapley. It's the theory of games,
okay? It's like a basketball player. You have a team, and
you just take one player out. You're not scoring anymore.
You put that player back again, you're scoring. That's
Shapley. That's as simple as that algorithm is. Just pull something out, see what happens, put it back again, and calculate the impact. So, I'm saying, "Explain the
Heisenberg equation", which is, in fact, an interesting one
because it shows that you can't find the position and the
speed of a particle at the same time.
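Here is a minimal sketch of that "take one player out" idea using the shap library on a small scikit-learn model; the data set and model are illustrative stand-ins, not the ones Denis works with.

    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    # Any model with inputs and outputs will do; the explainer is the point.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Shapley values credit each feature by measuring what happens when it is
    # left out of every possible "team" of features, then put back in.
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X.iloc[:50])

    shap.summary_plot(shap_values, X.iloc[:50])  # which "players" score the most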
Jon Krohn: 27:09 Right, yes.
Denis Rothman: 27:10 If you're looking at the speed, you won't find the position.
If you look at the position, you won't find the speed.
Jon Krohn: 27:15 I've known that one since I was a kid because in Star
Trek, The Next Generation, I learned-
Denis Rothman: 27:19 That's right.
Jon Krohn: 27:19 Right? So, it's where you [crosstalk 00:27:22]. Yeah,
exactly. In order to be able to teleport, you'd have to know
this information, you'd have to know the speed and the
direction of all of the electrons and everything and pass
that information over to somewhere else, beam it over.
And so, often, I think they have issues with that, right? It
happens all the time.
Denis Rothman: 27:42 That's right, that's right.
Jon Krohn: 27:42 They're like, "The Heisenberg's uncertainty principle on
Dewar is broken." And now, you ended up with a nose on
your ear or whatever.
Denis Rothman: 27:53 Yeah, that's right. So, when you go back to Star Trek, the
Star Trek thing is you just take the input like of Google
Search and you see you don't like it. So, you say now,
"Could you explain the Heisenberg equation in Star
Trek?" So, now you'll get this nice explanation that you
just gave. And you say, "Well, maybe I can't put it, I can't write about it that way." Well, can you explain the Heisenberg equation like for second graders? So, you can see that when you add things and you subtract things, you get different results. So, that's Shapley. That's also LIME. That's also Anchors. All of those algorithms are in that book. And it's model-agnostic. People keep trying to look into layers. I would encourage someone to try to look into a GPT-3 96-layer model with 175 billion parameters and tell me which parameter influenced the output for the record that was in position 2,100,000,000. It's impossible.
Jon Krohn: 28:56 Yeah, it's meaningless.
Denis Rothman: 28:58 You can do it with small parts. People from Facebook do that. They just plug it in to see some things. But in fact, the funnier thing is [inaudible 00:29:10], who was around in France in the same days I was at the Sorbonne, and there were big fights between people on artificial intelligence. He wrote about interpretable AI and he says, "We're going to peek into a transformer to see what it means and we're going to use LIME." But LIME is model-agnostic. So, what I'm saying is it's model-
Jon Krohn: 29:41 Right, right, right, right, right.
Denis Rothman: 29:42 We don't care. If I go to a store and I buy a phone, and I go home and it doesn't work, I don't care what's in that phone; it doesn't work. Or if I buy a phone and the ringtone is always wrong, I don't care. So, it's
model-agnostic. So, you take the input, you look at the
output, and you play around with the input again to see
how it influences the output. And you see which word or which image matters. That's Explainable AI in a nutshell. And you can do a lot of things. One of the fun things I did, a very funny one, was with the US Census data. I had a lot of fun with that one.
And I had this program that was, in fact, given by Google.
But they're always very careful about this now. I was
explaining how you can figure out why someone's going to
earn more than $50,000 or less than $50,000 based on
the US Census data. And I was looking at the fields in the
data set. It's in my book somewhere, in chapter four or five.
Denis Rothman: 30:51 And I say, "Gee whiz." Eighty percent of what's in there is
forbidden in Europe. You had race. It's strictly banned in
Europe. Because there's a legal problem with Explainable
AI. In Europe, you have to explain why your algorithm did
that. And if you have race in there, you can get a fine up
to 20% of your sales. You're talking millions and for big
tech, billions. So, you want to be careful. So, I said, "Gee,
how can they do that?" You can only do that in the
States. Right? But what does race have to do with revenue? Wow. So, let me take that out. I just pulled that out of there. And I reran the algorithm, tweaked it a bit. And I
said, "Yeah, I'm getting good results as good as theirs."
And in fact, you had Jamaican as a race. I mean that was
[crosstalk 00:31:45]. So, I just took the whole field out
and say, "Get this out of here." Then I take another field,
so we're back to Shapley again, right? I'm taking another
field out and I'm saying, sex, female, male. Does it really
matter in 2021 in the States if you have a college degree?
Is a female doctor with a PhD going to earn less than a male doctor? I don't think so. So, let me take all that out
too. Forget it. Take all that out. Because that's
discriminating and it's bad.
Denis Rothman: 32:24 And today we don't really want that because you have
transgender people that don't... or people that are transgender, or people that don't want to be considered as male or female. We're in the new world, the new era. What am I going to do? Put "other"? So, we're going to have "other" on the stat... I just pulled that field, I feel that's useless. So then, I
go to another field and I'll stop on that one. I'm saying,
"Now they're saying marital status." Is the person
married, divorced? So, let's sum it up. I took every field
out of there and I just left two fields in there, age and
years of education. And I'm saying if someone has 15 years of education, starting from elementary school all the way to college, that person has a higher probability of earning more than a person that has no education at all, who just drops out in 10th grade. So then, I go back and say, "But age is a
factor." Because if I'm five-years-old, I'm not going to earn
as much as when I'm 20, 25, or 30. So, I just found out in
the 25 to 45 or 30 to 50-year bracket, you earn a lot
more. And then, when you're older, your brain is not so
fast. So, it goes back to baby [inaudible 00:33:36]. And
with just these two fields in Explainable AI, look at all the
noise in your data. You could just kick all that stuff out.
So, it's explaining in a model-agnostic way. I didn't speak about a model here, just data input and output. And it's trying to be ethical at the same time. I say,
"Get all that data out of there and get the bias out of
there. You don't need it." Because there's nothing to talk
about. Age and revenue, that's it... age and education.
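Here is a minimal sketch of the ablation Denis describes on the US Census income data, assuming the "adult" data set as hosted on OpenML; the column names, model, and split are illustrative. Drop the sensitive fields, retrain, and see whether the results really suffer.

    import pandas as pd
    from sklearn.datasets import fetch_openml
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # The "adult" census data set: predict whether income exceeds $50,000.
    adult = fetch_openml("adult", version=2, as_frame=True)
    X, y = adult.data, adult.target

    def score(columns):
        # One-hot encode the chosen fields, train, and return held-out accuracy.
        Xc = pd.get_dummies(X[columns])
        Xtr, Xte, ytr, yte = train_test_split(Xc, y, random_state=0)
        model = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)
        return accuracy_score(yte, model.predict(Xte))

    print("all fields:", score(list(X.columns)))
    print("age + education only:", score(["age", "education-num"]))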
Jon Krohn: 34:12 Makes a lot of sense to me. So, in my day job, we build a
lot of algorithms for predicting who's a great fit for a job
and it's the same thing. Things like gender or race, those
cannot be in your model.
Denis Rothman: 34:26 Of course, they're useless. It's not even a question. It's
ethical. And it's useless, because it has nothing to do with
it. The person can come in, and if that person has either the education or the experience that can compensate for a lack of education... we're looking for competence, for abilities. We're not looking for insane stuff like where they came from. I don't care. I've hired tons of people in my
life. I don't even care where they came from. I don't even
look at their resume. Half of the time, I don't even care.
They come in. I don't want to see. Are we in sync or are
we smiling together? How do we feel together? And you
understand my questions about the job. You understand
what you're going to do. Okay, you have some college
degrees, that's fine. Okay, well, let's get to work and see
what happens. And if you like it, you'll stay, and if you
don't like it, you'll go. So, you can spend a few months
here. We'll see what happens. That's the best way to hire
people. Because they really love you. And then, you get
into this thing. You say, "Gee, he hired me. He didn't ask
me any stupid questions. I want to work hard. I want to
stay there."
Jon Krohn: 35:35 Right. That makes a lot of sense. All right, so we've talked
about Transformers for NLP, the book that came out this
year. We've now talked about just recently Hands-on XAI
that came out last year. So, that Hands-on XAI book
came out in July of 2020. Just a few months before that
in February 2020, you had another book called AI By
Example. So, maybe just quickly, what is that book all about?
Denis Rothman: 36:00 Oh, generally, I write a book in between two and a half and three months.
Jon Krohn: 36:05 Wow.
Denis Rothman: 36:09 That's how I work. Now, why do I write so fast? You have
the whole-
Jon Krohn: 36:16 Because you get GPT-3 to do it.
Denis Rothman: 36:18 That's right. I didn't even write the book.
Jon Krohn: 36:21 Alright.
Denis Rothman: 36:22 I have [crosstalk 00:36:22]
Jon Krohn: 36:25 You can just do it in Google with BERT. You just say
BERT-
Denis Rothman: 36:27 That's right, write the book. So, what I'm saying here, you go back to what I was talking about, Sorbonne University and education. I have a cross-disciplinary education. And my first patent, a word-to-vector patent, word pieces, was in 1982. I registered another patent in 1986 for expert
system chatbots. In 1986, I got my first artificial
intelligence contract in aerospace, with the company now called Airbus. And at the same time, I entered [Luxury
00:37:06]. So, I have so much practical experience in
corporations. I never went through the AI winter. I didn't
even notice there was an AI winter. If you told me there
was an AI winter then I'll say, "Well, where is it? Because
it's pretty hot out here."
Jon Krohn: 37:28 Yeah, not in Burgundy. There was no AI winter in Burgundy.
Denis Rothman: 37:33 No, no. So, Artificial Intelligence By Example is a very
simple story. Tushar Gupta from Packt noticed my
LinkedIn profile. He says, "You have a lot of experience.
Why don't you share it?" And I say, "Well, I don't really
need the money." Because I just sold my company.
Because in 2016, AI became fashionable and everyone was talking to my company. All of a sudden, I said, "Yeah." I
told my wife, "Let's sell it."
Jon Krohn: 38:05 That was Planilog.
Denis Rothman: 38:08 Yeah, that's right. I sold it. We sold in 2016. And then, I
trained people for two or three years. And then, in the meantime, Tushar says, "Why don't you share all that experience, these patents and stuff you wrote, with people? It would make... a nice book where people get case studies and all that." And I say, "Why would I write a
book? I don't need the money. I just sold my company. I
don't want to do anything. I want to stay home and play
video games." And he's saying, Yeah-
Jon Krohn: 38:37 He said you don't write a book for the money.
Denis Rothman: 38:42 No. Well, you don't write books for money.
Jon Krohn: 38:43 No.
Denis Rothman: 38:44 Any author will tell you that. You don't earn a lot of
money writing books, technical books. You earn money
writing like Stephen King, but not writing technical
books. I don't see how you can earn money with that. But
I was thinking maybe he's right, maybe I should share
this with my family and friends. Because I never had time
to explain my job. And then, there are a lot of people on
LinkedIn asking me these questions all the time. Maybe it
could help them. And maybe I'll meet a lot of people that
way because I have people from 100 countries on
LinkedIn. And maybe I'll learn stuff from them because I
like culture. I like every country. I like China. I like the
United States. I like Iran. I like Israel. I like Germany, any
country. You give me any country, you always find nice
people. Because people are always thinking about governments. So, they don't like the government, which means the whole population is sentenced to death. No, you have nice people everywhere.
Jon Krohn: 39:47 For sure.
Denis Rothman: 39:48 Yeah, right. So, he got me into that. I wrote the book in
three months. So, it wasn't that much of a big deal. What
happens is the book is written in my mind. Like I'm
watching TV. And the book is just up there. And it's like a
woman carrying a baby, and all of a sudden, it's just a pain. I have to get it out of my system. So, I'll be writing at
full speed. You just can't stop me. It won't stop me. It's
the wham, wham, wham, wham, wham. Sometimes, I
even get a chapter done in a day. So, I was writing and
writing and writing, and I can't stop. I just can't stop it
until I get to the last page and I say," Okay, now I'm
okay." So, it's a compulsion. It's not something... and I've
been thinking about the stuff for decades. Every day, I spend at least five hours thinking. Even when I was working in my company, I was always spending two... I
stopped working generally around 4:00 pm for operational
stuff and I would think until 9:00 in the evening. I read
books, philosophy, sociology, linguistics, computer
science, math. So, I was constantly building up my
theories and I have a theory of artificial intelligence in my
mind. So, I just have to organize it for the book. I know
where it's going. I know the next step after transformers.
I'm just waiting for things to happen.
Jon Krohn: 41:22 Alright. So, Denis, you've given us amazing context on
your books. So, you had AI By Example, which you explained last there. Based on your 35 years of consulting experience, you were able to provide that to the audience very quickly. That's how you got these books written so quickly. In just a few months, you were able to distill your 35 years of consulting experience with
artificial intelligence and surely the readers benefit greatly
from all that experience. We also talked about Hands-on
XAI and Transformers for NLP. So, now let's jump to some
audience questions. We had tons of great ones on
LinkedIn. Your audience is so engaged because you do
answer all their questions online. And so, today, we're not
going to have time to go through every single question
that's come up. There are so many. But I think Denis is
probably going to end up... you're going to end up going
over these.
Denis Rothman: 42:21 Yeah, I think I might even make a post maybe at the end
of the week, where I mention our podcast. And I'll take all the questions from the comments in your post and make
sure that all of them are answered.
Jon Krohn: 42:37 Nice. Well, that sounds really great.
Denis Rothman: 42:38 And then, just tag all the people that ask the questions.
Jon Krohn: 42:42 Perfect, they will greatly appreciate that. So, the first
question here is from Serg Masis, who is also an author of
a book on Explainable AI. And so, he was curious what
XAI methods or libraries you use most for transformer
models.
Denis Rothman: 42:58 Okay, so let's say... let's go back to Explainable AI. The
best Explainable AI is model-agnostic unless you're a
developer and you want to see what's going on inside. But
you might have problems with a 96-layer GPT-3 model and 175 billion parameters. So, you can do it. So, you just
need the input, the output. And it's like in a soccer team
to see, if I take this player out, what happens. So, Explainable AI is model-agnostic, like LIME is model-agnostic. Shapley is model-agnostic. So, you just want to
take the input, look at the output, and then tweak, play
around with the input and see what happens to the
output until you find the trigger. And so, it's model-
agnostic. So, you can use any model-agnostic Explainable
AI on any algorithm and it doesn't even need to be
transformers or artificial intelligence, because Shapley
existed before. So, it applies to anything. Think of it like a
recipe for a nice cake you like and the person says, "I like
your cake, but I'm not really a specialist. I can't tell you
what I like in your cake." So, the person can say, "I like
you so much that week by week when I bake that cake,
I'll take some ingredients out until we find which one is
missing." And then, at one point, guy said, "Yes
cinnamon, it's the cinnamon I like in your cake. It's
cinnamon."
Jon Krohn: 44:37 It's a great analogy.
Denis Rothman: 44:39 We have the input, the output. That's it, the ingredients
and the result.
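To make the model-agnostic point concrete, here is a minimal sketch using the lime library on tabular data; the data set and model are illustrative. The explainer never opens the model, it only tastes the cake.

    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    data = load_iris()
    model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

    explainer = LimeTabularExplainer(
        data.data,
        feature_names=data.feature_names,
        class_names=list(data.target_names),
        discretize_continuous=True,
    )

    # LIME perturbs the input (takes ingredients out of the cake) and fits a
    # simple local surrogate to see which features drive this one prediction.
    exp = explainer.explain_instance(data.data[0], model.predict_proba,
                                     num_features=4)
    print(exp.as_list())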
Jon Krohn: 44:47 Beautifully said, I love that analogy. There's another
question here that is something that we've already talked
about, you and me, Denis. So, I'm going to give a
summary answer. So, there's a question here from Jean-
charles Arald if you pronounce it the French way. And it's
this point about how these transformer models are getting so big, trillions of parameters. Do we really need this many for human language, given that we have a limited vocabulary, maybe only a couple of thousand words for most
people? And he makes the point about bees; he seems to suggest they only have dozens of neurons and that's sufficient for them. So, what are we missing in our models?
So, I did a neuroscience PhD so I'm going to quickly give
some summary thoughts here. And then, I'm going to
open up the floor to you, Denis.
Jon Krohn: 45:36 So, bees don't just have dozens of neurons unless I'm
misreading something here. They definitely have at least
hundreds of millions, maybe billions of neurons. A
human brain has 90 billion neurons, but the key thing
here is that we don't conflate neurons with parameters.
So, the question, it says, "Why do we need trillions of
parameters?" Well, even a human brain with its 90 billion
neurons, the connections, which are equivalent to the parameters in a model; there are more connections than there are stars in the universe. It's an obscenely large number. And so, I think that's the answer. But yeah, our transformer models today couldn't even approximate that. They're still not as good, although you may disagree.
Denis Rothman: 46:21 Yeah, no, they're not as good. They can't be. So, if we go
back to neuroscience: machines are not brains. So, that's one thing you need to know, too. A Texas Instruments calculator is not built like our brain. So, that's like a projection we have to take out. It's like children looking at a puppet show and
thinking that that's a real person. No, the machine is not
a real person. It has nothing to do with the brain, in fact,
but let's stay on that topic and not try to avoid it. So, we say, of course, you have a lot of neurons, and you can't mix them up with the connections, and the connections are the parameters in the transformers. So, trillions is nothing. But there's another problem, much deeper for both systems, for both machines and humans, which you know as a neuroscientist: when we build a representation, we don't build it in one part of the brain. Like if I want an apple, okay, apple, no. I have the
language part that's going to do something then I have
the color part.
Denis Rothman: 47:31 There's so many things are lining up in there. And it's not
exactly common to every human being. It's about that
because I can have an apple associated to someone who
threw me an apple when I was six years old and hit me.
So, I have another part of my brain saying an apple, no.
And then, other person says an apple a day keeps the
doctor away. And then, you have this dopamine part. So,
it's extremely complex to see what is going to fire up in a
brain with a word. And it's equally difficult with a transformer, because of the billions of opinions you have
on the web. So, one person can say, "I like gun control."
The other says, "No, I don't want gun control." I want to
take a vaccine. I don't. So, it adds up to different
representations. So, the model has to take all that into
account. And they have to feed some ethics into it. So, it's
big. So, trillions is nothing in fact. It's just the
connections. And there are very few neurons, in fact, in models like that. And then, there's the question about whether we can find easier models.
Jon Krohn: 48:42 Yeah, exactly. So, there's a question from Dr. Chiara
Baston, who's in Italy. And yeah, she asks can we do
better with simpler models?
Denis Rothman: 48:49 Well, you can't find simpler than Shapley. The thing is,
probably when you're looking at posts or books, people
show you all these diagrams. And no, it just adds up. I'm
in my kitchen, and I forgot to put enough sugar in my
cake. So, when it comes out, my children say it's not
enough sugar in there. So, that's bias, right? They're
saying, I want more sugar. Or if there's too much sugar, like in the United States they can pour a lot of sugar into a pastry, and we have less in Europe. So, I'm going to say that's
bias. Why did you pour all that sugar in there? So, you
have the ingredients, you have the recipe. But think of that: how many people can go into a restaurant, I would say even a McDonald's restaurant, because people are always making fun of burgers, and say how the bread is made? Tell me the ingredients in the bread. No one can do that.
Denis Rothman: 49:47 We're in a complex world. It's not easy. Even if you have Shapley, which is very simple, or LIME, it just takes work. And even talking about bees, that's a problem too, because we're forgetting something: their memory, and the patterns they're using with their body. The bees
go around in certain ways to signal things to other bees.
And they're using a language that we don't... we're trying
to understand. So, nature is extremely complex as well.
Like an ant. An ant has a few neurons. Yeah, well, what
about a whole group of ants? Wow, lots of brain. And no
one can understand it.
Jon Krohn: 50:35 Yeah, that's a really great point as well. So, as I
mentioned, we did not have time in the episode
unfortunately to get a response on air to every question,
but it sounds like Denis is going to make a-
Denis Rothman: 50:47 Yeah, yeah, I'll answer all the questions.
Jon Krohn: 50:48 He'll answer all these questions.
Denis Rothman: 50:48 Yeah, sure.
Jon Krohn: 50:48 And so, it's-
Denis Rothman: 50:51 Somewhere at the end of the week.
Jon Krohn: 50:54 Nice. Well, so, that'll be up before the episode airs. So,
Denis, obviously, people should follow you on LinkedIn.
That's a great place to get in touch and ask questions. Are
there any other places that people should follow you?
Denis Rothman: 51:06 No, I just work on LinkedIn.
Jon Krohn: 51:08 Perfect.
Denis Rothman: 51:09 Because I work on LinkedIn. And when I'm finished, I go
see my family and friends.
Jon Krohn: 51:15 That sounds great. When you're in the region that you're
in, it must be very nice visiting-
Denis Rothman: 51:20 Yes, yes. Yes. Because we get along with our neighbors.
We can go downtown and eat right in front of a medieval
cathedral, in Paris and the places I live and go around. And we have monuments. There's even a castle across the way from where I stay. So,
yeah.
Jon Krohn: 51:40 It's beautiful. I can see why. Yeah, one social medium is enough.
Denis Rothman: 51:43 That's right.
Jon Krohn: 51:44 All right. Denis, thank you so much for being on the
program. And we'll have to have you on again sometime.
Thank you so much for your time.
Denis Rothman: 51:49 Sure, when you want. Okay, bye-bye.
Jon Krohn: 51:57 What a character Denis is. I had an absolute hoot filming
this episode. Today, he filled us in on the history of
transformer architectures, particularly highlighting OpenAI's GPT-3 model and Google's BERT model. He talked about how, with his Transformers for NLP book, you can learn how to fine-tune the GPT-3 precursor algorithm, GPT-2, to perform state-of-the-art natural language processing tasks like question answering and text summarization. And he talked about how SHAP and LIME can be used to explain how an AI algorithm is arriving at its output, no matter whether it's a simple algorithm or a billion-parameter transformer model.
Jon Krohn: 52:40 As always, you can get the show notes including the
transcript for this episode, the video recording, any
materials mentioned on the show, the URLs for Denis's
LinkedIn profile as well as my own social media profiles at
superdatascience.com/513. That's
superdatascience.com/513. If you enjoyed this episode,
I'd greatly appreciate it if you left a review on your favorite
podcasting app or on the SuperDataScience YouTube
channel. I also encourage you to let me know your
thoughts on this episode directly by adding me on
LinkedIn or Twitter and then tagging me in a post about
it. To support the SuperDataScience company that kindly
funds the management, editing and production of this
podcast without any annoying third-party ads, you could
consider creating a free login to their learning platform.
It's superdatascience.com. You can check out the 99 days
to your first data science job challenge at
superdatascience.com/challenge. Or, you could consider
buying a usually pretty darn cheap Udemy course
published by Ligency, a SuperDataScience affiliate, such
as my own, Mathematical Foundations of Machine
Learning course.
Jon Krohn: 53:47 Thanks to Ivana, Jaime, Mario, and JP on the
SuperDataScience team for managing and producing
another terrific episode today. Keep on rocking it out
there, folks, and I'm looking forward to enjoying another
round of the SuperDataScience Podcast with you very
soon.