SDS PODCAST EPISODE 513: TRANSFORMERS FOR NATURAL LANGUAGE PROCESSING

Show Notes: http://www.superdatascience.com/513

TRANSCRIPT

Jon Krohn: 00:00 This is lucky episode number 513 with Denis Rothman,

an award-winning author on artificial intelligence.

Jon Krohn: 00:08 Welcome to the SuperDataScience Podcast. My name is

Jon Krohn, chief data scientist and bestselling author on

Deep Learning. Each week we bring you inspiring people

and ideas to help you build a successful career in data

science. Thanks for being here today, and now let's make

the complex simple.

Jon Krohn: 00:42 Welcome back to the SuperDataScience Podcast. Today's

guest is the colorful and ethically industrious Denis

Rothman. Denis is the author of three technical books on

artificial intelligence all of which have come out in the

past two years. These books are on AI in general with

particular focuses on Explainable AI and the giant

transformer models that have revolutionized Natural

Language Processing or NLP for short. His most recent

book called Transformers for NLP led him to win this

year's data community content creator award for

technical book author. Prior to becoming a full-time

author, speaker, and consultant, Denis spent 25 years as

the co-founder of a French AI company called Planilog,

which was acquired three years ago. All told, Denis has

been working on AI for 43 years since 1978 and has been

patenting AI algorithms such as those for chatbots since

1982.

Jon Krohn: 01:50 In today's episode, Denis leverages vivid analogies to fill

us in on what natural language processing is, what

transformer architectures are, and how they've

revolutionized NLP in the past few years. He also talks about tools we can use to get an understanding of why complex AI algorithms provide a particular output when provided a given input. This episode should be well-suited to anyone who'd like to keep on top of what's possible in AI today regardless of your background. Practicing data scientists in particular will also appreciate Denis's mentions of particular modeling approaches and software tools. All right, you ready? Let's

do it.

Jon Krohn: 02:39 Denis, welcome to the SuperDataScience Podcast. Where

in the world are you calling in from?

Denis Rothman: 02:45 Okay, thank you, and thank you for inviting me. Right

now I'm 150 kilometers from Paris. I'm out in the country

in the Champagne region then you have Burgundy and all

that. I'm around that place.

Jon Krohn: 03:02 Wow. That does not sound unpleasant.

Denis Rothman: 03:05 It's very pleasant.

Jon Krohn: 03:07 It sounds amazing. Is that like a COVID thing or you're

out there all the time?

Denis Rothman: 03:12 No, no, no. I like Paris and like to be out of Paris. It's like

being in Manhattan. And then, you go out a bit to the

northwest, just have to go 20 miles and you're in the

woods. You're in the forests in New York State. So,

around Goshen or places like that.

Jon Krohn: 03:30 I've heard. Yeah, I hope to someday spend time outdoors

just like this thing. So, as we discussed before the episode

started, I'm Canadian. And so, people often have this idea

of you being outdoors. But I grew up in Downtown

Toronto and now I live in Downtown Manhattan. And I

haven't experienced much outdoors at all. But I've heard

it's wonderful. And someday I'll experience that, too.

Denis Rothman: 03:56 Toronto is a nice place, too.

Jon Krohn: 03:58 Toronto is nice. It doesn't have a Champagne or

Burgundy region around it. We got the Niagara region,

which is our best imitation.

Denis Rothman: 04:06 That's why I chose France, in fact, because I could live anywhere, but I found the quality of life, like the medieval culture that you can't find in North America, universities that go back to the 13th century. I like that part. And then, you go to modern

Paris. I like that. But I like to travel, so it's not really a

problem.

Jon Krohn: 04:30 I've noticed from videos that I've seen of yours in the past,

you have very interesting art in the background. I think

you studied history at points in your career.

Denis Rothman: 04:42 Yeah. I paint. I play the piano. I was born in Berlin in

fact, and my father was a military lawyer for NATO so I

traveled all around all the time. But my dream was to go

to Sorbonne University. That was my thing. Because in

those days, the president of the university would say, "If you came here, it's because you're really interested in history, geography, archeology, mathematics, linguistics." So, you can major in something. But in this university, you can go to any class and you can get credits for any of them. So, I would go into this cross-disciplinary education, which was very fascinating. That's why I spent so much time there. I went to three Sorbonne universities in fact. I just couldn't stop learning there. So, yeah.

Jon Krohn: 05:37 Wow.

Denis Rothman: 05:38 Yeah. So, I studied a lot of everything.

Jon Krohn: 05:42 That sounds amazing. That's like my dream retirement. I

wonder if they'll accept me then.

Denis Rothman: 05:46 And I wanted to start my life like that, like thinking like

that. Because at one point, I was working a lot in the

states for student money, college money. And I was

driving cars, this driveaway thing where they give you a

car, and then you can take it anywhere. So, I crossed around the States. And one day, I was sitting in Florida, and I say, "Do I want to live here? What do I want to do?" Okay, I really wanted to go to Sorbonne University, because I could have stayed down in Palm Beach and had a nice life, studying there. But no, I wanted to come back to Paris

and live this educational thing. And there are so many cultures right next to each other, Germany, Spain, Italy, Portugal,

UK, Belgium. It's incredible. I'm forgetting countries. I

don't want to leave the viewers out. Like Netherlands,

Luxembourg, Denmark. You just sit there and you have

all these people there. You're living in the world.

Jon Krohn: 06:45 Yeah, it's rich in culture. I am jealous. It sounds like

you're in the right spot to be.

Denis Rothman: 06:50 No, Manhattan is great.

Jon Krohn: 06:53 Yeah. It's a very concentrated piece of culture. And then,

as you say, you go 20 miles out and you're just in the

woods.

Denis Rothman: 07:03 That's right. People don't realize that. But you're only 30

minutes from beautiful nature, just right northwest, just go over the George Washington Bridge out there and that's it.

Jon Krohn: 07:13 Yeah. So, amongst all of the learning that you've been

doing in recent years, there's been a fair bit of learning

and teaching of mathematics and artificial intelligence,

machine learning to the extent that you've published

books at an incredible rate. So, this year, you published

Transformers for Natural Language Processing. Last year,

you published two books, Hands-on Explainable AI with

Python, and just a few months before that, AI By

Example. So, I'd like to dig into each of these books. I've

got a copy of Explainable right here but I want to go

backward chronologically. So, let's start with

Transformers for NLP. What is this book about? So, I can

give a little bit of color maybe for the audience but you

could do a much better job. So, natural language

processing is the application of data science or machine

learning to make use of natural language in some way

like written language or spoken language and yes, to

maybe automate things, and transformers are

particularly interesting in recent years because they've

been shown to have unprecedented accuracy at a lot of

natural language tasks.

Denis Rothman: 08:34 So, yeah, well, let's take this back a second. When you're

talking about NLP, you're talking about linguistics. If

you're talking about linguistics and machines, it's

computer linguistics. Okay. So, we're going back to

theory. And there's one little thing we have to understand: we're getting inputs with data. You get a lot of data. So, that's the input. You have all this raw data, billions and billions of pieces of data. And on the other side,

you have to do some representation of reality so it doesn't

look like murky results, right? So, up to now, you had all

that input and then you had to get good representations.

But there were several models like you would do k-means

clustering, then you do parsers, then you do recurrent

neural networks, and then you can do CNNs and all. So,

it was a bit like a lot of tools to do it all. So, every time you

had to do a task, you had to find out another tool like an

SVM. So, for 35 years because I started very early in

artificial intelligence, okay, so I saw no change, and I say,

"Where is this going?" These people are writing a lot of

algorithms. And I wrote one algorithm 25 years ago that's

running all over the world while we're speaking. So, I say,

"Why do they write all these algorithms when you can get

one universal algorithm to do the job?" Of course, I wrote

it for supply chain management and not NLP.

Denis Rothman: 10:17 So then, all of a sudden, Google around 2017 has this

problem. We have 5 billion searches per day. We're having problems with the US Senate because they keep asking us

questions like... I'm speaking like big tech like Mark

Zuckerberg is called to the Senate. This is the reason

transformers exist. He's called to the Senate and they say,

"You know there's that post." And he's thinking, "What

post? There are 2 billion posts a day." Yeah. But that

post. He's thinking, "What are you talking about? I'm a

multi-billionaire. I'm surfing all the time. And they're

asking me about post number 1,500,000,000. I don't even know

what's in that post. I'm trying to do my best here." And he

says, "I don't have the tools." So, he's thinking, "Go see

my team." And the team says, "Well, we can't. We just

can't. We have 100 algorithms in there. We're not making

it. We're not making progress." Twitter has the same

problem, Amazon, Microsoft. So, one point, Google says

we have to stop all that. We need something industrial.

Denis Rothman: 11:28 So, instead of having a convolutional neural network, we

have layers but none of these layers are the same size,

none of these layers do the same thing. That's like a 1930

car. No, what we want is a V8. A V8 engine looks

beautiful inside, like eight cylinders here, right, V8. So, they

come up and say, "Let's forget about this recurrent neural

network stuff. We want a V8." So, let's start with eight

heads, which are like a V8 engine. Let's start with eight

heads. Forget about recurrent stuff and all these layers.

And we want to write a layer and we're going to put the

layers, and every layer is the same size. Let's make every

layer the same size that way we have an industrial model.

It's like a rectangle. And we just stack these same layers,

same size. They come up and say, "Well, that's not

enough. We're not going to go fast enough with that. Wait,

let's take one of these layers and split it into a V8. Wow.

And now, we're going to run those eight layers, those

eight parts of a layer on eight heads on eight GPUs on

eight processors at the same time." Wow. They're going to

run there. All these words are just going to analyze other

words. We just want to say, Denis and Jon. Jon has a guitar behind him. Does he play the guitar? Let's put all that

together and see where that word fits into context.

Denis Rothman: 12:56 And once that layer is over, let's not mix it all up. Just

add it and send it to the next layer that will do the same

thing, building on what it learned in the first layer but it's

always the same size. So, at one point they reached this model and no one knew what it was going to do because it was training on raw data and it wasn't really labeled data. They used labeled data just to show people. It got excellent

results. And then, all of a sudden OpenAI comes along

and says, "You know what? That's a good idea. Why don't

I create a stack not with 16 layers but 96 layers? Instead

of using 5 billion params, why don't I do 175 billion

parameters? And why don't I ask Microsoft for a

supercomputer, a $10 million supercomputer with 10,000

GPUs and tens of thousands of CPUs?" And now you have a

factory. So, now you have this industrial model V8, and

it's just there and it's going fast, fast, fast.
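[Code sketch: a minimal PyTorch illustration of the "V8" idea described above, eight identical attention heads running in parallel, a residual "just add it" connection, and every layer the same size so the blocks stack. This is an illustration under those assumptions, not code from Denis's book.

    import torch
    import torch.nn as nn

    d_model, n_heads = 512, 8  # same size in, same size out; eight heads, like a V8

    attention = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads,
                                      batch_first=True)
    norm = nn.LayerNorm(d_model)

    x = torch.randn(1, 10, d_model)  # ten token embeddings

    # Self-attention: every word analyzes every other word in the sequence.
    attn_out, _ = attention(x, x, x)

    # "Let's not mix it all up. Just add it and send it to the next layer."
    x = norm(x + attn_out)  # the residual add keeps the layer size constant
    print(x.shape)  # torch.Size([1, 10, 512]); unchanged, so layers stack cleanly]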

Denis Rothman: 14:03 And then, all of a sudden they wake up and they say,

"Uh-oh, what does it do? How is it possible to do all these

tasks?" And in fact, they discovered because it's called

emergence. Emergence is when you don't know what's

going to happen but it just emerges out of all that training

that in fact, the system, a GPT-3 transformer or a BERT.

They just learn the language. And once they learn the

language, it's based on what you ask them, the prompt.

So, if you type nice prompts, it will analyze it as a

sequence and it will try to find out what follows. So, in

the end, you end up with the GPT-3 model trained on a

supercomputer and you can ask it anything you want.

Give me the synonyms of headphones and stuff. You can

invent your own tasks or give me a grammatical

breakdown of the sentence, or recently, why don't you

just take my... when I'm writing, translate it into Python

instead of translating it into French or translating to

JavaScript. And just to finish the little story, you bounce

back to Google and they say, "Why don't we create a trillion, a

trillion parameter model?" And that thing is going to be so

big that you know it's going to exceed human capacity.

And people are saying, "Gee, where are you going to get the computer, when the other one was $10 million and one of the top 10?" Google says, "Yeah, but why are we

bothering ourselves with all those floating points?" We don't need all that floating point. So, let's build our own

TPUs and just cut all that floating-point stuff out of there

so we have a domain-specific machine.

Denis Rothman: 15:47 And now, they've created supercomputers that we can

rent for just a few hundred dollars an hour, which is not much for a corporation. That's even more powerful than what OpenAI has. And then, you can train what you want. And then, the beautiful thing is it bounced back into Google, which has

BERT, and Google Search now is based on BERT.

Everything is BERT in Google Search. So, you see how we

went in... a few years, we went from prehistoric artificial

intelligence to super-industrial and industrialized society.

And big tech did that miracle. I mean you can say

anything you want about them. But what people don't understand is that these are people just like you and me that are working. These are small teams of maybe 10 people.

They're in their corner. They're trying to find something.

They're not the billionaires. They're the guys like us just

trying to do stuff. And they come up with incredible

things. So, we do have to admire a big tech in that

respect. You can say anything you want but no one's

going to do as... what they've just done, it's industrial. So,

that's transformers.

Jon Krohn: 16:58 Yeah. You may already have heard of DataScienceGO,

which is the conference run in California by

SuperDataScience. And you may also have heard of

DataScienceGO Virtual, the online conference we run

several times per year, in order to help the

SuperDataScience community stay connected throughout

the year from wherever you happen to be on this wacky

giant rock called planet Earth. We've now started running

these virtual events every single month. You can find

them at datasciencego.com/connect. They're absolutely

free. You can sign up at any time. And then, once a

month, we run an event where you will get to hear from a speaker, engage in a panel discussion, or join an industry

expert Q&A session. And critically, there are also speed

networking sessions, where you can meet like-minded data scientists from around the globe. This is a great way

to stay up to date with industry trends, hear the latest

from amazing speakers, meet peers, exchange details, and

stay in touch with the community. So, once again, these

events run monthly. You can sign up at

datasciencego.com/connect. I'd love to connect with you

there.

Jon Krohn: 18:11 So, these transformers, like OpenAI's GPT-3 that you mentioned, like BERT that you mentioned. What applications, beyond Google Search that you talked about, do you teach?

Denis Rothman: 18:25 You can do question answering. One of my favorites is

summarization for second-grade students. So, you're

going to say, "Denis, this guy, I'm in an interview with

this guy that's supposed to be super intelligent, and he's

interested in second grade summarizing. Maybe I will re-

edit this and cut that part out."

Jon Krohn: 18:50 No, no, I know that that's hugely valuable.

Denis Rothman: 18:53 So, my second grade summarizing thing, and I can give

you like, I'll give you many others in just the list. It's one

of the most interesting ones. Because, in fact, when we're

talking here, we look smart. You're talking-

Jon Krohn: 19:08 You do.

Denis Rothman: 19:08 ... on artificial intelligence. Whoa, do I look smart? Ask

me about plants, ask me about the names of flowers, ask

me why these insects live with these flowers in that forest,

and they don't live in another. What? I'm not a second

grader. I'm a baby. So, what I like to do now is I'll take an

article that's new for me, where I'm a baby, not even a

second grader. I'm not even a first grader. I'm nothing.

And I'll feed it to GPT-3, and I get this nice explanation

where I understand everything. I say, "Wow. I just really

liked that feature." So, it got me thinking," Why don't I go

to the question-answer thing?" So then, from there, I'm

going to go ask the questions but it's prompt engineering.

You can see what I'm getting at. It's the way you ask it. If

you ask it to explain like a college student, you'll get something you won't understand and you'll feel like a second grader.

Denis Rothman: 20:08 You're inventing the usage, in fact. So, you can go, "Now,

I go to question-answers." And I say, "Well, can you

explain like dark holes to me like you would to a child?"

And then, he doesn't. Now, I understand. Can you explain

like you would explain to a high school student? Now I

can understand better. Can you explain the same black

hole like a college student? Wow, great. Could you explain

some math with it? Okay. Could you give me some

equations? Okay. Now, can you explain quantum computing? Right. Can you give me the Heisenberg equation? Okay. Can

you break it down for me? Could you write some code

now for me, where it's an HTML page? I see the equation.

I just want a little graph to show the waves.

Jon Krohn: 20:58 Wow.

Denis Rothman: 20:59 Yeah. So, now I have my HTML page. How am I going to

deploy it? Okay. He'll explain. Oh, I have a problem. I'd

like to put OpenAI in a Jupyter notebook. But what's the

code? Can you give me the code so I can just copy and

paste it? Okay, great. Okay.
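[Code sketch: the kind of GPT-3 call Denis describes pasting into a Jupyter notebook, a minimal sketch assuming the openai Python package's Completion endpoint as it existed around this episode; the engine name and prompts are illustrative.

    import openai

    openai.api_key = "YOUR_API_KEY"  # requires approved GPT-3 API access

    # Prompt engineering: the same model, steered only by how you ask.
    prompt = ("Explain black holes the way you would explain them "
              "to a second grader:")

    response = openai.Completion.create(
        engine="davinci",    # the largest GPT-3 engine of that era
        prompt=prompt,
        max_tokens=150,
        temperature=0.7,
    )
    print(response.choices[0].text)]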

Jon Krohn: 21:19 So, these examples, you're doing this on a daily basis.

Denis Rothman: 21:21 Yes.

Jon Krohn: 21:21 You're constantly querying GPT-3.

Denis Rothman: 21:23 Yes, I'm doing it right here.

Jon Krohn: 21:27 Right now. Actually, every-

Denis Rothman: 21:29 I'm doing it like from a TV. It's like people playing video

games. Like I'm here all the time playing around with that

stuff. It's insane. It's like I don't know where it's going.

It's an adventure.

Jon Krohn: 21:42 Yeah. If you're not watching the video version of this,

Denis is pointing at his phone and tapping away at it. But

not everyone has access to GPT-3, right? Don't you need

to be approved? I had to wait months. I just submitted an

application to get access to the API, and then finally got

it. So, it seems like not everyone today could just access

GPT-3 unless you have a workaround.

Denis Rothman: 22:08 And that's the trick. Let's go back to linguistics. Okay.

You go back to linguistics. What are we talking about?

We're talking about we have a lot of raw input. We have a

model. And it's the way we ask things that we get things.

And what's interesting is to play around, like I just said,

right? That's the interesting part. But if you go back to

my book, you can get GPT-2. And you can take GPT-2

and you can train it. Because what I did, for example, in

chapter three, I took a BERT model. And I took the works

of Immanuel Kant, the German philosopher. And I fed all

those books into it just to have fun. Then I began to ask

him, Kant, some questions, "Where does human logic go? How does human thought work?" The goal here is to play

around with it. You have to have a lot of fun otherwise

you'll never understand transformers. And you've got to talk to them to explain. Now, what did I just say

before? Google BERT drives Google Search. What I did is I take prompts like the second grader stuff and I just copy them into Google Search. And I'm deviating the use of Google Search by giving it long sentences, not just keywords.

Could you explain to me the solar system through the

eyes of a second-grade student? Please don't show me any

videos. I just want some text. Skip all that stuff. And I

just deal with these two things.
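[Code sketch: asking questions of a text with a pretrained transformer, in the spirit of the Kant experiment described above. A minimal sketch assuming the Hugging Face transformers pipeline and its default extractive question-answering model, not Denis's exact setup.

    from transformers import pipeline

    qa = pipeline("question-answering")  # downloads a default pretrained QA model

    context = ("All our knowledge begins with the senses, proceeds then to "
               "the understanding, and ends with reason. There is nothing "
               "higher than reason.")  # Immanuel Kant

    result = qa(question="Where does our knowledge begin?", context=context)
    print(result["answer"], result["score"])]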

Jon Krohn: 23:48 Wow.

Denis Rothman: 23:51 Yeah, it's a transformer so it can absorb all of that. That's

what's new. So, in fact, you can train having fun with

transformers with Google Search. You can ask it

questions. Could you tell me this? Could you tell me that?

And then, you go on as it gives you answers in the

system. You can ask it for more difficult questions. Oh,

yeah, I got that. I got the Heisenberg equation. I

understand. And we keep [inaudible 00:24:14]. But now,

could you tell me more? You can talk to it, but people

don't know it's a transformer.

Jon Krohn: 24:20 Right. So, we're filming today on September 20th. And I

had just happened to be on your YouTube channel before

we started filming. And it was today, September 20th that

you published a how-to video with more detail on exactly

what you just described on how to use BERT behind

Google queries to get lots of interesting information. So,

that's something that listeners can check out. So, Denis, Transformers for NLP, that was a beautiful introduction to what natural language processing is and the history of transformers. You got a

lot of great analogies in there particularly like the V8

analogies. But that was just your book this year,

Transformers for NLP. Tell us a bit about Hands-on XAI

with Python which came out in the summer of 2020. So, I

could do my little spiel. Explainable AI is, it's where we apply algorithms to very complex models, I guess like BERT or GPT-3, so that we can try to get an explanation for how a particular output was reached. Is that right?

Denis Rothman: 25:35 Well, yeah, so let's go back to linguistics again. Okay. So,

basically what we're saying, "What is an algorithm?" You

have an input and you have an output, and you have this

thing in the middle called the algorithm. So, one problem here is there's a confusion in many people's minds that Explainable AI is explaining the algorithm. Okay. So,

that's an area you can explore. But no, that's not

Explainable AI. Explainable AI is model-agnostic. I don't

even care about the algorithm. What do I care? Google

Search. Let's go back to Google Search. I'm on Google

Search. And I type, explain the Heisenberg equation.

Okay, what do I get? I can see the result. I don't need

Explainable AI. I know that I won't like the result because

I don't understand anything on that page. Okay. So, now

I'll do something called Shapley. It's the theory of games,

okay? It's like a basketball player. You have a team, and

you just take one player out. You're not scoring anymore.

You put that player back again, you're scoring. That's

Shapley. That's as simple as that algorithm is. Just pull something out, see what happens, put it back again, and calculate the impact. So, I'm saying, "Explain the

Heisenberg equation", which is, in fact, an interesting one

because it shows that you can't find the position and the

speed of a particle at the same time.

Jon Krohn: 27:09 Right, yes.

Denis Rothman: 27:10 If you're looking at the speed, you won't find the position.

If you look at the position, you won't find the speed.

Jon Krohn: 27:15 I've known that one since I was a kid because in Star

Trek, The Next Generation, I learned-

Denis Rothman: 27:19 That's right.

Jon Krohn: 27:19 Right? So, it's where you [crosstalk 00:27:22]. Yeah,

exactly. In order to be able to teleport, you'd have to know

this information, you'd have to know the speed and the

direction of all of the electrons and everything and pass

that information over to somewhere else, beam it over.

And so, often, I think they have issues with that, right? It

happens all the time.

Denis Rothman: 27:42 That's right, that's right.

Jon Krohn: 27:42 They're like, "The Heisenberg's uncertainty principle on

Dewar is broken." And now, you ended up with a nose on

your ear or whatever.

Denis Rothman: 27:53 Yeah, that's right. So, when you go back to Star Trek, the

Star Trek thing is you just take the input like of Google

Search and you see you don't like it. So, you say now,

"Could you explain the Heisenberg equation in Star

Trek?" So, now you'll get this nice explanation that you

just gave. And you say, "Well, maybe I can't tell it, I can't

write about that that way." Well, Can you explain the

Heisenberg equation like for second graders? So, you can

see that when you add things and you subtract things, you get different results. So, that's Shapley. That's also LIME.

That's also Anchors. All of those algorithms are in that book. And it's model-agnostic. People keep trying to

look into layers. I would encourage someone to try to look into a 96-layer GPT-3 model with 175 billion parameters and tell me which parameter influenced the output for the record that was in position 2,100,000,000. It's

impossible.
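[Code sketch: the player-in, player-out idea with the shap library, a minimal sketch that treats the model as a black box and needs only its inputs and outputs; the dataset and model here are placeholders, not the ones discussed.

    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    model = RandomForestClassifier(n_estimators=100).fit(X, y)

    # Model-agnostic: pass only the prediction function and background data.
    explainer = shap.Explainer(model.predict, X)
    shap_values = explainer(X.iloc[:50])

    shap.plots.bar(shap_values)  # which "players" matter most on average]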

Jon Krohn: 28:56 Yeah, it's meaningless.

Denis Rothman: 28:58 You can do it with small parts. People from Facebook do that. They just plug it in to see some things. But in fact,

the funnier thing is the [inaudible 00:29:10], which was

around in France in the same days I was at the Sorbonne,

and there were big fights between people on artificial

intelligence. He wrote interpretable AI and he says, "We're

going to peek into a transformer to see what it means and

we're going to use LIME." But LIME is a model-agnostic.

So, what I'm saying is it's model-

Jon Krohn: 29:41 Right, right, right, right, right.

Denis Rothman: 29:42 We don't care. If I go to a store and I buy a phone, and I go home and it doesn't work, I don't care what's in that phone; it doesn't work. Or if I buy a phone and the ringtone is always wrong, I don't care. So, it's

model-agnostic. So, you take the input, you look at the

output, and you play around with the input again to see

how it influences the output. And you see which word or which image mattered. That's Explainable AI in a nutshell. And

you can do a lot of things. One of the fun things I did, which is a very funny one, is I took the US Census data. I had a lot of fun with that one.

And I had this program that was, in fact, given by Google.

But they're always very careful about this now. I was

explaining how you can figure out why someone's going to

earn more than $50,000 or less than $50,000 based on

the US Census data. And I was looking at the fields in the

data set. It's in my book somewhere, in chapter four or five.

Denis Rothman: 30:51 And I say, "Gee whiz." Eighty percent of what's in there is

forbidden in Europe. You had race. It's strictly banned in

Europe. Because there's a legal problem with Explainable

AI. In Europe, you have to explain why your algorithm did

that. And if you have race in there, you can get a fine up

to 20% of your sales. You're talking millions and for big

tech, billions. So, you want to be careful. So, I said, "Gee,

how can they do that?" You can only do that in the

States. Right? But what does race have to do with revenue?

Wow. So, let me take that out. I just pulled that out of

there. And I reran the algorithm, tweaked it a bit. And I said, "Yeah, I'm getting results as good as theirs."

And in fact, you had Jamaican as a race. I mean that was

[crosstalk 00:31:45]. So, I just took the whole field out

and say, "Get this out of here." Then I take another field,

so we're back to Shapley again, right? I'm taking another

field out and I'm saying, sex, female, male. Does it really

matter in 2021 in the States if you have a college degree?

Is a woman with a PhD, a female doctor, going to earn less than a male doctor? I don't think so. So, let me take all that out

too. Forget it. Take all that out. Because that's

discriminating and it's bad.

Denis Rothman: 32:24 And today we don't really want that because you have

transgender people that don't... or people that are transgender, or people that don't want to be considered as male or female. We're in the new world, the new era. What am I

going to do? Put other. So, we're going to have other on

the stat... I just pulled that field out, it's useless. So then, I

go to another field and I'll stop on that one. I'm saying,

"Now they're saying marital status." Is the person

married, divorced? So, let's sum it up. I took every field

out of there and I just left two fields in there, age and

years of education. And I'm saying if someone has 15

years of education starting from elementary school all the way to college, that person has a higher probability of earning more than a person that has no education at all, who just drops out in 10th grade. So then, I go back and say, "But age is a

factor." Because if I'm five-years-old, I'm not going to earn

as much as when I'm 20, 25, or 30. So, I just found out in

the 25 to 45 or 30 to 50-year bracket, you earn a lot

more. And then, when you're older, your brain is not so

fast. So, it goes back to baby [inaudible 00:33:36]. And

with just these two fields in Explainable AI, look at all the

noise in your data. You could just kick all that stuff out.

So, it's explaining in a model-agnostic way. I didn't speak about a model here, just data input and output. And it's being ethical at the same time. I say,

"Get all that data out of there and get the bias out of

there. You don't need it." Because there's nothing to talk

about. Age and revenue, that's it... age and education.
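[Code sketch: the census experiment described above, dropping the sensitive fields and keeping only age and education to see whether the income prediction holds up. A minimal sketch assuming the public Adult census dataset on OpenML; column names follow that dataset, not necessarily the one in the book.

    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)

    # Shapley thinking applied to whole fields: take race, sex, marital
    # status and the rest out, keeping only the two features he ends up with.
    X_small = X[["age", "education-num"]]

    X_tr, X_te, y_tr, y_te = train_test_split(X_small, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("accuracy with age + education only:", model.score(X_te, y_te))]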

Jon Krohn: 34:12 Makes a lot of sense to me. So, in my day job, we build a

lot of algorithms for predicting who's a great fit for a job

and it's the same thing. Things like gender or race, those

cannot be in your model.

Denis Rothman: 34:26 Of course, they're useless. It's not even a question. It's

ethical. And it's useless, because it has nothing to do with the subject. You can come in. And if that person has

either education or the experience that can compensate

for a lack of education, we're looking for competence, for

abilities. We're not looking for insane stuff like where they came from. I don't care. I've hired tons of people in my

life. I don't even care where they came from. I don't even

look at their resume. Half of the time, I don't even care.

They come in. I don't want to see. Are we in sync or are

we smiling together? How do we feel together? And you

understand my questions about the job. You understand

what you're going to do. Okay, you have some college

degrees, that's fine. Okay, well, let's get to work and see

what happens. And if you like it, you'll stay, and if you

don't like it, you'll go. So, you can spend a few months

here. We'll see what happens. That's the best way to hire

people. Because they really love you. And then, you get

into this thing. You say, "Gee, he hired me. He didn't ask

me any stupid questions. I want to work hard. I want to

stay there."

Jon Krohn: 35:35 Right. That makes a lot of sense. All right, so we've talked

about Transformers for NLP, the book that came out this

year. We've now talked about just recently Hands-on XAI

that came out last year. So, that Hands-on XAI book

came out in July of 2020. Just a few months before that

in February 2020, you had another book called AI By

Example. So, maybe just quickly, what is that book all about?

Denis Rothman: 36:00 Oh, generally, I write a book in between two and a half and three months.

Jon Krohn: 36:05 Wow.

Denis Rothman: 36:09 That's how I work. Now, why do I write so fast? You have

the whole-

Jon Krohn: 36:16 Because you get GPT-3 to do it.

Denis Rothman: 36:18 That's right. I didn't even write the book.

Jon Krohn: 36:21 Alright.

Denis Rothman: 36:22 I have [crosstalk 00:36:22]

Jon Krohn: 36:25 You can just do it in Google with BERT. You just say

BERT-

Denis Rothman: 36:27 That's right, write the book. So, what I'm saying here, you

go back to what I was talking about Sorbonne University

and education. I have cross-disciplinary education. And

my first patent, a word-to-vector, word-piece patent, was in 1982. I registered another patent in 1986 for expert

system chatbots. In 1986, I got my first artificial

intelligence contract in aerospace with the company now called Airbus. And at the same time, I entered [Luxury

00:37:06]. So, I have so much practical experience in

corporations. I never went through the AI winter. I didn't

even notice there was an AI winter. If you told me there

was an AI winter, then I'd say, "Well, where is it? Because

it's pretty hot out here."

Jon Krohn: 37:28 Yeah, not in Burgundy. There was no way a winter in

Burgundy.

Denis Rothman: 37:33 No, no. So, Artificial Intelligence By Example is a very

simple story. Tushar Gupta from Packt noticed my

LinkedIn profile. He says, "You have a lot of experience.

Why don't you share it?" And I say, "Well, I don't really

need the money." Because I just sold my company.

Because in 2016, AI became fashionable and everyone was talking to my company. All of a sudden, I said, "Yeah." I

told my wife, "Let's sell it."

Jon Krohn: 38:05 That was Planilog.

Denis Rothman: 38:08 Yeah, that's right. I sold it. We sold in 2016. And then, I

trained people for two or three years. And then, in the meantime, Tushar says, "Why don't you share all that experience, these patents and stuff you wrote, with people? It would give... a nice book where people get case studies and all that." And I say, "Why would I write a

book? I don't need the money. I just sold my company. I

don't want to do anything. I want to stay home and play

video games." And he's saying, Yeah-

Jon Krohn: 38:37 He said you don't write a book for the money.

Denis Rothman: 38:42 No. Well, you don't write books for money.

Jon Krohn: 38:43 No.

Denis Rothman: 38:44 Any author will tell you that. You don't earn a lot of

money writing books, technical books. You earn money

writing like Stephen King, but not writing technical

books. I don't see how you can earn money with that. But

I was thinking maybe he's right, maybe I should share

this with my family and friends. Because I never had time

to explain my job. And then, there are a lot of people on

LinkedIn asking me these questions all the time. Maybe it

could help them. And maybe I'll meet a lot of people that

way because I have people from 100 countries on

LinkedIn. And maybe I'll learn stuff from them because I

like culture. I like every country. I like China. I like the

United States. I like Iran. I like Israel. I like Germany, any

country. You give me any country, you always find nice

people. Because people are always thinking about governments. So, they don't like the government, and that means the whole population is sentenced to death? No, you have nice people everywhere.

Jon Krohn: 39:47 For sure.

Denis Rothman: 39:48 Yeah, right. So, he got me into that. I wrote the book in

three months. So, it wasn't that much of a big deal. What

happens is the book is written in my mind. Like I'm

watching TV. And the book is just up there. And it's like a

woman carrying a baby, and all of a sudden, it's just a

pain. I have to get it out of my system. So, I'll be writing at

full speed. You just can't stop me. It won't stop me. It's

the wham, wham, wham, wham, wham. Sometimes, I

even get a chapter done in a day. So, I was writing and

writing and writing, and I can't stop. I just can't stop it

until I get to the last page and I say," Okay, now I'm

okay." So, it's a compulsion. It's not something... and I've

been thinking about the stuff for decades. Every day, I spend at least five hours thinking. Even when I was

working in my company, I was always spending two... I

stopped working generally around 4:00 pm for operational

stuff and I would think until 9:00 in the evening. I read

books, philosophy, sociology, linguistics, computer

science, math. So, I was constantly building up my

theories and I have a theory of artificial intelligence in my

mind. So, I just have to organize it for the book. I know

where it's going. I know the next step after transformers.

I'm just waiting for things to happen.

Jon Krohn: 41:22 Alright. So, Denis, you've given us amazing context on

your books. So, you had AI By Example, which you explained just there, which is based on your 35 years of consulting experience that you were able to provide to the audience very quickly. That's how you got these books

written so quickly. In just a few months, you were able to

distill your 35 years of consulting experience with

artificial intelligence and surely the readers benefit greatly

from all that experience. We also talked about Hands-on

XAI and Transformers for NLP. So, now let's jump to some

audience questions. We had tons of great ones on

LinkedIn. Your audience is so engaged because you do

answer all their questions online. And so, today, we're not

going to have time to go through every single question

that's come up. There are so many. But I think Denis is

probably going to end up... you're going to end up going

over these.

Denis Rothman: 42:21 Yeah, I think I might even make a post maybe at the end of the week, where I mention our podcast. And I'll take all the questions from the comments in your post and make sure that all of them are answered.

Jon Krohn: 42:37 Nice. Well, that sounds really great.

Denis Rothman: 42:38 And then, just tag all the people that ask the questions.

Jon Krohn: 42:42 Perfect, they will greatly appreciate that. So, the first

question here is from Serg Masis, who is also an author of

a book on Explainable AI. And so, he was curious what

XAI methods or libraries you use most for transformer

models.

Denis Rothman: 42:58 Okay, so let's say... let's go back to Explainable AI. The

best Explainable AI is model-agnostic unless you're a

developer and you want to see what's going on inside. But

you might have problems with 96 layer GPT-3 model and

170 billion parameters. So, you can do it. So, you just

need the input, the output. And it's like in a soccer team

to see if I take this player out, what happens? So, that's

Explainable AI is model-agnostic, like LIME is model-

agnostic. Shapley is model-agnostic. So, you just want to

take the input, look at the output, and then tweak, play

around with the input and see what happens to the

output until you find the trigger. And so, it's model-

agnostic. So, you can use any model-agnostic Explainable

AI on any algorithm and it doesn't even need to be

transformers or artificial intelligence, because Shapley

existed before. So, it applies to anything. Think of it like a

recipe for a nice cake you like and the person says, "I like

your cake, but I'm not really a specialist. I can't tell you

what I like in your cake." So, the person can say, "I like

you so much that week by week when I bake that cake,

I'll take some ingredients out until we find which one is

missing." And then, at one point, guy said, "Yes

cinnamon, it's the cinnamon I like in your cake. It's

cinnamon."

Jon Krohn: 44:37 It's a great analogy.

Denis Rothman: 44:39 We have the input, the output. That's it, the ingredients

and the result.
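[Code sketch: the cake idea in a few lines of pure Python, removing one "ingredient" (word) at a time, re-running the model, and watching which removal changes the output most. The model_score function is a hypothetical stand-in for any black box.

    def model_score(text):
        # Hypothetical black box: pretend the model keys on "cinnamon".
        return 1.0 if "cinnamon" in text else 0.2

    sentence = "the cake has flour sugar butter and cinnamon"
    words = sentence.split()
    baseline = model_score(sentence)

    for i, word in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])  # take one ingredient out
        impact = baseline - model_score(ablated)
        print(f"{word:10s} impact: {impact:+.2f}")  # cinnamon stands out]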

Jon Krohn: 44:47 Beautifully said, I love that analogy. There's another

question here that is something that we've already talked

about, you and me, Denis. So, I'm going to give a

summary answer. So, there's a question here from Jean-

charles Arald if you pronounce it the French way. And it's

this point about how these transformer models are getting

so big. So, trillions of parameters. Do we really need this

many for human language given that we have a limited

vocabulary, maybe only a couple of thousand words for most

people? And he makes the point that bees, he seems to

suggest they only have dozens of neurons and that's

efficient for them. So, what are we missing in our models?

So, I did a neuroscience PhD so I'm going to quickly give

some summary thoughts here. And then, I'm going to

open up the floor to you, Denis.

Jon Krohn: 45:36 So, bees don't just have dozens of neurons unless I'm

misreading something here. They definitely have at least

hundreds of millions, maybe billions of neurons. A

human brain has 90 billion neurons, but the key thing

here is that we don't conflate neurons with parameters.

So, the question, it says, "Why do we need trillions of

parameters?" Well, even a human brain with its 90 billion

neurons, the connections, which are equivalent to the

parameters in a model, there's more connections than

there are stars in the universe. It's an obscenely large

number. And so, I think that's the answer, but yeah, our transformer models today couldn't even approximate that. They're still not as good, although you may disagree.

Denis Rothman: 46:21 Yeah, no, they're not as good. They can't be. So, if we go

back to neuroscience, and because machines are not

brains. So, that's one thing you need to know, too. Like a calculator: a Texas Instruments calculator is not built like our brain. So, that's a projection we have to take out. It's like children looking at a puppet show and

thinking that that's a real person. No, the machine is not

a real person. It has nothing to do with the brain, in fact,

but let's keep on that topic and not try to avoid it. So, we

say, of course, you have a lot of neurons, and you can't

mix them up with the connections and the connections

are the parameters in the transformers. So, trillions is

nothing. But there's another problem, much deeper for

both systems, for both machines and humans, which you

know as a neuroscientist, is that when we build a

representation, we don't build it in one part of the brain.

Like if I want an apple, okay, apple, no. I have the

language part that's going to do something then I have

the color part.

Denis Rothman: 47:31 There are so many things lining up in there. And it's not exactly common to every human being, because I can have an apple associated with someone who threw an apple at me when I was six years old and hit me.

So, I have another part of my brain saying an apple, no.

And then, another person says an apple a day keeps the

doctor away. And then, you have this dopamine part. So,

it's extremely complex to see what is going to fire up in a

brain with a word. And it's equivalently difficult with a

transformer because of the billions of opinions you have

on the web. So, one person can say, "I like gun control." The other says, "No, I don't want gun control." I want to

take a vaccine. I don't. So, it adds up to different

representations. So, the model has to take all that into

account. And they have to feed some ethics into it. So, it's

big. So, trillions is nothing, in fact. It's just the connections. And there are very few neurons, in fact, in models like that. And then, the question about

can we find easier models?

Jon Krohn: 48:42 Yeah, exactly. So, there's a question from Dr. Chiara

Baston, who's in Italy. And yeah, she asks can we do

better with simpler models?

Denis Rothman: 48:49 Well, you can't find simpler than Shapley. The thing is,

probably when you're looking at posts or books, people

show you all these diagrams. And no, it just adds up. I'm in my kitchen, and I forgot to put enough sugar in my

cake. So, when it comes out, my children say it's not

enough sugar in there. So, that's bias, right? They're

saying, I want more sugar. Or if there's too much sugar,

like the United States can put a lot of sugar into a pastry, more than we have in Europe. So, I'm going to say that's

bias. Why did you pour all that sugar in there? So, you

have the ingredients, you have the recipe. But think of

that: how many people can go to a restaurant, I would say even a McDonald's restaurant because people are always making fun of burgers, and say, well, how

is the bread made? Tell me the ingredients in the bread.

No one can do that.

Denis Rothman: 49:47 We're in a complex world. It's not easy. Even if you have

Shapley, which is very simple, or LIME, it just takes work. And even talking about bees, that's a problem too, because we're forgetting something: their memory, the patterns they're using with their bodies. The bees

go around in certain ways to signal things to other bees.

And they're using a language that we don't... we're trying

to understand. So, nature is extremely complex as well.

Like an ant. An ant has a few neurons. Yeah, well, what

about a whole group of ants? Wow, lots of brain. And no

one can understand it.

Jon Krohn: 50:35 Yeah, that's a really great point as well. So, as I

mentioned, we did not have time in the episode

unfortunately to get a response on air to every question,

but it sounds like Denis is going to make a-

Denis Rothman: 50:47 Yeah, yeah, I'll answer all the questions.

Jon Krohn: 50:48 He'll answer all these questions.

Denis Rothman: 50:48 Yeah, sure.

Jon Krohn: 50:48 And so, it's-

Denis Rothman: 50:51 Somewhere at the end of the week.

Jon Krohn: 50:54 Nice. Well, so, that'll be up before the episode airs. So,

Denis, obviously, people should follow you on LinkedIn.

That's a great place to get in touch and ask questions. Are

there any other places that people should follow you?

Denis Rothman: 51:06 No, I just work on LinkedIn.

Jon Krohn: 51:08 Perfect.

Denis Rothman: 51:09 Because I work on LinkedIn. And when I'm finished, I go

see my family and friends.

Jon Krohn: 51:15 That sounds great. When you're in the region that you're

in, it must be very nice visiting-

Denis Rothman: 51:20 Yes, yes. Yes. Because we get along with our neighbors.

We can go downtown and eat right in front of a medieval

cathedral, in Paris and the places I live and go around. And we have monuments. There's even a castle across the way from where I stay. So,

yeah.

Jon Krohn: 51:40 It's beautiful. I can see you, yeah, one social medium is

enough.

Denis Rothman: 51:43 That's right.

Jon Krohn: 51:44 All right. Denis, thank you so much for being on the

program. And we'll have to have you again sometime.

Thank you so much for your time.

Denis Rothman: 51:49 Sure, whenever you want. Okay, bye-bye.

Jon Krohn: 51:57 What a character Denis is. I had an absolute hoot filming

this episode. Today, he filled us in on the history of

transformer architectures, particularly highlighting OpenAI's GPT-3 model and Google's BERT model. He

talked about how with his Transformers for NLP book you

can learn how to fine-tune the GPT-3 precursor algorithm GPT-2 to perform state-of-the-art natural language processing tasks like question answering

and text summarization. And he talked about how SHAP

and LIME can be used to explain how an AI algorithm is

arriving at its output no matter whether it's a simple

algorithm or a billion parameter transformer model.

Jon Krohn: 52:40 As always, you can get the show notes including the

transcript for this episode, the video recording, any

materials mentioned on the show, the URLs for Denis's

LinkedIn profile as well as my own social media profiles at

superdatascience.com/513. That's

superdatascience.com/513. If you enjoyed this episode,

I'd greatly appreciate it if you left a review on your favorite

podcasting app or on the SuperDataScience YouTube

channel. I also encourage you to let me know your

thoughts on this episode directly by adding me on

LinkedIn or Twitter and then tagging me in a post about

it. To support the SuperDataScience company that kindly

funds the management, editing and production of this

podcast without any annoying third-party ads, you could

consider creating a free login to their learning platform.

It's superdatascience.com. You can check out the 99 days

to your first data science job challenge at

superdatascience.com/challenge. Or, you could consider

buying a usually pretty darn cheap Udemy course

published by Ligency, a SuperDataScience affiliate, such

as my own, Mathematical Foundations of Machine

Learning course.

Jon Krohn: 53:47 Thanks to Ivana, Jaime, Mario, and JP on the

SuperDataScience team for managing and producing

another terrific episode today. Keep on rocking it out

there, folks, and I'm looking forward to enjoying another

round of the SuperDataScience Podcast with you very

soon.