on “geek” versus “nerd”


Upload: hebrews13v8

Post on 25-Dec-2015


To many people, “geek” and “nerd” are synonyms, but in fact they are a little different. Consider the phrase “sports

geek” — an occasional substitute for “jock” and perhaps the arch-rival of a “nerd” in high-school folklore. If “geek”

and “nerd” are synonyms, then “sports geek” might be an oxymoron. (Furthermore, “sports nerd” either doesn’t

compute or means something else.)

In my mind, “geek” and “nerd” are related, but capture different dimensions of an intense dedication to a subject:

geek - An enthusiast of a particular topic or field. Geeks are “collection” oriented, gathering facts and

mementos related to their subject of interest. They are obsessed with the newest, coolest, trendiest things that

their subject has to offer.

nerd - A studious intellectual, although again of a particular topic or field. Nerds are “achievement” oriented,

and focus their efforts on acquiring knowledge and skill over trivia and memorabilia.

Or, to put it pictorially à la The Simpsons:

Both are dedicated to their subjects, and sometimes socially awkward. The distinction is that geeks are fans of their

subjects, and nerds are practitioners of them. A computer geek might read Wired and tap the Silicon Valley rumor-mill

for leads on the next hot-new-thing, while a computer nerd might read CLRS and keep an eye out for clever

new ways of applying Dijkstra’s algorithm. Note that, while not synonyms, they are not necessarily distinct

either: many geeks are also nerds (and vice versa).


AN EXPERIMENT

Do I have any evidence for this contrast? (By the way, this viewpoint dates back to a grad-school conversation with

fellow geek/nerd Bryan Barnes, now a physicist at NIST.) The Wiktionary entries for “geek” and “nerd” lend some

credence to my position, but I’d like something a bit more empirical…

“You shall know a word by the company it keeps” ~ J.R. Firth (1957)

To characterize the similarities and differences between “geek” and “nerd,” maybe we can find the other words that

tend to keep them company, and see if these linguistic companions support my point of view?

Data and Method

(Note: If you’re neither a geek nor a nerd, don’t be scared by the math. It’s not too bad… or you can probably just

skip to the “Results” subsection below…)

I analyzed two sources of Twitter data, since it’s readily available and pretty geeky/nerdy to boot. This includes

a background corpus of 2.6 million tweets via the streaming API, collected between December 6, 2012, and January 3, 2013.

I also sampled tweets via the search API matching the query terms “geek” and “nerd” during the same time period

(38.8k and 30.6k total, respectively). Yes, yes, yes… I collected all the data six months ago but just now got around

to crunching the numbers. It’s been a busy year!

A great little statistic for measuring how much company two words tend to keep is pointwise mutual

information (PMI). It’s commonly used in the information retrieval literature to measure the co-occurrence of words

and phrases in text, and it also turns out to be a good predictor of how humans evaluate semantic

word similarity (Recchia & Jones, 2009) and topic model quality (Newman et al., 2010).

For two words w and v, the PMI is given by:

$$\mathrm{PMI}(w, v) = \log \frac{p(w, v)}{p(w)\,p(v)} = \log p(w \mid v) - \log p(w),$$

where p(·) in this case is the probability of the word(s) in question appearing in a random tweet, as estimated from

the data. For instance, if we let v = “geek,” we compute the log-probability of a word w in the “geek” search corpus,

and subtract the log-probability of w in the background corpus.
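For the curious, here is a minimal Python sketch of that computation. It is an illustration under assumptions, not the actual script behind this post: the tokenizer, the function names (`tweet_fractions`, `pmi_scores`), the natural-log base, and the toy tweets are all invented for the example, and words missing from the background corpus are simply skipped rather than smoothed.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: lowercase words, with #hashtags kept distinct from plain words.
TOKEN_RE = re.compile(r"#?\w+")


def tweet_fractions(tweets):
    """Map each token to the fraction of tweets that contain it at least once."""
    counts = Counter()
    for text in tweets:
        counts.update(set(TOKEN_RE.findall(text.lower())))
    return {tok: n / len(tweets) for tok, n in counts.items()}


def pmi_scores(search_tweets, background_tweets):
    """PMI(w, v) = log p(w | v) - log p(w) for every token seen in both corpora."""
    p_given_v = tweet_fractions(search_tweets)        # estimated from the search corpus for v
    p_background = tweet_fractions(background_tweets)  # estimated from the background corpus
    return {
        w: math.log(p_given_v[w]) - math.log(p_background[w])
        for w in p_given_v
        if w in p_background  # no smoothing: skip words unseen in the background
    }


# Toy data, invented purely for illustration.
geek_tweets = ["geek #gadget collection", "new gadget geek", "geek chores"]
background = ["doing chores", "more chores today", "a #gadget review", "nice weather"]
scores = pmi_scores(geek_tweets, background)
```

In this toy data “#gadget” shows up relatively more often in the “geek” tweets than in the background, so its score comes out positive, while “chores” comes out negative. A real run would need some care with rare words, since tiny counts make the estimates very noisy.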

Results

The PMI statistic measures a kind of correlation: a positive PMI score for two words means they “keep great

company,” a negative score means they tend to keep their distance, and a score close to zero means they bump into

each other more or less at random.
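To make the sign concrete, here is a small worked example with made-up numbers (not taken from the data), assuming natural logarithms. Suppose a word w appears in 2% of the “geek” tweets but only 0.5% of the background tweets, while another word w' appears in 0.1% of the “geek” tweets and 0.5% of the background tweets:

$$\mathrm{PMI}(w, \text{geek}) = \log 0.02 - \log 0.005 = \log 4 \approx 1.39$$

$$\mathrm{PMI}(w', \text{geek}) = \log 0.001 - \log 0.005 = \log 0.2 \approx -1.61$$

The first word is four times more likely than chance to turn up alongside “geek,” so its score is positive; the second is five times less likely, so its score is negative. A word equally common in both corpora would score log 1 = 0.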


With that in mind, here is a scatterplot of various words according to their PMI scores for both “geek” and “nerd”

on different axes (ignoring words with negative PMI, and treating #hashtags as distinct):

Many people have asked for a high-res PDF of this plot, so here you go.

Moving up the vertical axis, words become more geeky (“#music” → “#gadget” → “#cosplay”), and moving left

to right they become more nerdy (“education” → “grammar” → “neuroscience”). Words along the diagonal are

similarly geeky and nerdy, including social (“#awkward”, “weirdo”), mainstream tech (“#computers”,

“#microsoft”), and sci-fi/fantasy terms (“doctorwho,” “#thehobbit”). Words in the lower-left (“chores,”

“vegetables,” “boobies”) aren’t really associated with either, while those in the upper-right (“#avengers”, “#gamer”,

“#glasses”) are strongly tied to both. Orange words are more geeky than nerdy, and blue words are the

opposite. Some observations:

Collections are geeky. All derivatives of the word “collect” (“collection,” “collectables”, etc.) are orange. As

are “boxset” and “#original,” which imply a taste for completeness and authenticity.


Academic fields are nerdy: “math”, “#history,” “physics,” “biology,” “neuroscience,” “biochemistry,” etc.

Other academic words (“thesis”, “#studymode”) and institutions (“harvard”, “oxford”) are also blue.

The science & technology words differ. General terms (“#computers,” “#bigdata”) are on the diagonal —

similarly geeky and nerdy. As you splay up toward more geeky, though, you see products, startups, brands,

and more cultish technologies (“#apple”, “#linux”). As you splay down toward more nerdy you see more

methodologies (“calculus”).

Hashtags: (I take this one back. The average PMI score for all hashtags is 0.74 with “geek” but 0.73 with “nerd.” The

difference isn’t statistically significant using a paired t-test or Wilcoxon test, or practically significant using

a common-sense test.)

Hobbies: compare the more geeky pastimes (“#toys,” “#manga”) with the more nerdy ones (“chess,”

“sudoku”).

Brains: the word “intelligence” may be geeky, but “education,” “intellectual,” and “#smartypants” are nerdy.

Reading: “#books” are nerdy, but “ebooks” and “ibooks” are geeky.

Pop culture vs. high culture: “#shiny” and “#trendy” are super-geeky, but (curiously) “cellist” is

the nerdiest…

The list goes on. If you want to poke around yourself, download the raw PMI scores (4.2mb) and let me know in

the comments what you find. Since many people have asked: I computed PMI for all words appearing in the search

tweets with “geek” and “nerd” (millions) and then manually scanned roughly 7,500 words with positive PMI scores

for both. The scatterplot contains about 300 words that I hand-picked because they made sense.
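The automatic part of that filtering step can be sketched roughly as follows. This is a hedged illustration, not the original code: the function name `shared_positive`, the label strings, and the PMI values are all hypothetical, and the final hand-picking of about 300 words is a manual step that no snippet reproduces.

```python
def shared_positive(pmi_geek, pmi_nerd):
    """Keep words with positive PMI for both terms and label the stronger association."""
    labeled = {}
    for word in pmi_geek.keys() & pmi_nerd.keys():
        g, n = pmi_geek[word], pmi_nerd[word]
        if g > 0 and n > 0:
            if g > n:
                side = "geeky"     # above the diagonal (orange in the plot)
            elif n > g:
                side = "nerdy"     # below the diagonal (blue in the plot)
            else:
                side = "diagonal"  # equally geeky and nerdy
            labeled[word] = (g, n, side)
    return labeled


# Invented scores for illustration only; these are not values from the real data.
toy_geek = {"#cosplay": 2.1, "thesis": 0.4, "chores": -0.3, "#computers": 1.0}
toy_nerd = {"#cosplay": 0.6, "thesis": 1.8, "chores": -0.2, "#computers": 1.0, "oxford": 1.2}
labeled = shared_positive(toy_geek, toy_nerd)
```

Here “chores” drops out because its scores are negative, and “oxford” drops out because it has no “geek” score at all. What remains is the pool of candidates that would then be scanned by eye.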

(Update: I learned that Olivia Culpo — a self-described “cellist nerd” — was crowned Miss Universe on December

20, 2012. The event was heavily tweeted smack in the middle of my data collection, so that probably explains the

correlation between “cellist” and “nerd” here. It also underscores the limitations of time-sensitive data.)

Conclusion

In broad strokes, it seems to me that geeky words are more about stuff (e.g., “#stuff”), while nerdy words are more

about ideas (e.g., “hypothesis”). Geeks are fans, and fans collect stuff; nerds are practitioners, and practitioners play

with ideas. Of course, geeks can collect ideas and nerds play with stuff, too. Plus, they aren’t two

distinct personalities as much as different aspects of personality. Generally, the data seem to affirm my thinking.

I wonder how similar the results would be if you applied this method to the Google Books Ngrams corpus, or

something more general instead of a niche medium like Twitter. I also wonder what other questions might be answered

with this kind of analysis (for example, my wife and I have a perennial disagreement over which word is wetter:

“moist” vs. “damp”).

Finally, when I mentioned to a friend that I was going to write up this post, she said “Well, I guess we know which

one you are.” But do we really? I may be a science nerd, but I’m probably a music geek…