Telling stories with public data: ARM
Felipe Hoffa
Google Cloud Developer Advocate
@felipehoffa
DATA
Let's answer some questions
- 400k GitHub repos, 1B files, 14 TB of code
- What would you ask?
SPACES vs TABS
Spaces vs Tabs - GitHub on BigQuery edition
• Data source: GitHub repos in BigQuery
• Stars matter: Top 400,000 repositories (by stars)
• No small files: >10 lines indented with spaces or tabs
• No duplicates: One vote per unique file
• One vote per file: In mixed files, whichever indentation most lines use wins
• Top languages (by file extension): .java, .h, .js, .c, .php, .html, .cs, .json, .py, .cpp, .xml, .rb, .cc, .go
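The voting rules above can be sketched in Python for a batch of file contents. This is a minimal illustration, not the original BigQuery query: the names `indent_vote` and `tally` are hypothetical, a line counts as "spaces" or "tabs" by its first character only, exact ties cast no vote (an assumption), and duplicates are detected by hashing file contents.

```python
import hashlib
from collections import Counter
from typing import Iterable, Optional

MIN_INDENTED_LINES = 10  # "No small files": a file needs >10 indented lines


def indent_vote(content: str) -> Optional[str]:
    """Return 'spaces', 'tabs', or None if the file doesn't qualify to vote."""
    spaces = tabs = 0
    for line in content.splitlines():
        if line.startswith(" "):
            spaces += 1
        elif line.startswith("\t"):
            tabs += 1
    if spaces + tabs <= MIN_INDENTED_LINES or spaces == tabs:
        return None  # too small, or an exact tie (assumed: no vote)
    # Mixed file: the majority of indented lines decides its single vote.
    return "spaces" if spaces > tabs else "tabs"


def tally(files: Iterable[str]) -> Counter:
    """Count one vote per unique file, deduplicated by a SHA-1 of its content."""
    seen = set()
    votes: Counter = Counter()
    for content in files:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # "No duplicates": identical files only vote once
        seen.add(digest)
        vote = indent_vote(content)
        if vote is not None:
            votes[vote] += 1
    return votes
```

In the talk the same filtering runs as SQL over the GitHub tables in BigQuery; the sketch only shows the per-file logic.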
Spaces vs Tabs - Extract
Spaces vs Tabs - Apply the rules
Spaces vs Tabs - Results
● Most active GitHub communities?
● Top companies contributing to open source?
● Top countries pip installing grimoire?
● Coders prefer cold countries?
● Should I post to Hacker News?
● Tabs or spaces? Leading or trailing commas?
● API changes: Will I break their heart? Their code?
● Stack Overflow - minutes to first reply, per tag?
● Do people really copy code out of Stack Overflow?
GitHub: Top countries
GitHub users per country
Countries by # of pushes
Countries by # of unique ids pushing / population
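Ranking countries by unique pushing ids per capita can be sketched as follows. This assumes push events have already been mapped to a country (the `(user_id, country)` input shape and the `per_capita_ranking` name are hypothetical) and normalizes to ids per million inhabitants, which is a presentation choice, not necessarily the one used for the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def per_capita_ranking(
    pushes: Iterable[Tuple[str, str]],  # hypothetical (user_id, country) pairs
    population: Dict[str, int],         # country -> population
) -> List[Tuple[str, float]]:
    """Rank countries by unique pushing ids per million inhabitants."""
    users_by_country: Dict[str, Set[str]] = defaultdict(set)
    for user_id, country in pushes:
        users_by_country[country].add(user_id)  # count ids, not pushes
    rates = [
        (country, len(users) / population[country] * 1_000_000)
        for country, users in users_by_country.items()
        if population.get(country)  # skip countries with unknown population
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)
```

Counting distinct ids rather than raw pushes keeps a handful of very active accounts from dominating a small country's score.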
Open source developers per capita ranking, 2017
Where would a coder go?
Top companies contributing to open source - on GitHub?
Beyond GitHub
- Stack Overflow
- Hacker News
- Wikipedia pageviews, navigation
- PyPI installs
Tell me more about ARM...
Wikipedia trends
Where people click next
Where people click from
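The "click next" and "click from" views can be sketched over clickstream-style rows. This assumes each row is a `(prev, curr, n)` tuple, loosely mirroring the referrer, page, and count columns of the public Wikipedia clickstream data; the function names are hypothetical, and counts for the same page pair are summed across rows.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int]  # assumed shape: (prev, curr, n)


def click_next(rows: Iterable[Row], page: str, k: int = 5) -> List[Tuple[str, int]]:
    """Pages most often opened right after `page`."""
    counts: Counter = Counter()
    for prev, curr, n in rows:
        if prev == page:
            counts[curr] += n  # sum repeated (prev, curr) rows
    return counts.most_common(k)


def click_from(rows: Iterable[Row], page: str, k: int = 5) -> List[Tuple[str, int]]:
    """Pages (or referrer sources) people were on just before `page`."""
    counts: Counter = Counter()
    for prev, curr, n in rows:
        if curr == page:
            counts[prev] += n
    return counts.most_common(k)
```

Both views read the same table, just filtered on opposite ends of the link.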
2006: Hacker News is born
The first Hacker News stories
The first Hacker News ARM stories
The first Hacker News ARM stories, with context
2008: Stack Overflow is born
The first Stack Overflow questions for [arm]
The first Stack Overflow questions for [arm], with views
The top Stack Overflow questions for [arm], ever
The top Stack Overflow questions for [arm], now
PyPI installs
GitHub
GitHub: Projects with ARM issues w/o templates
GitHub: Projects with Linaro issues
Projects with high % stars coming from people that starred top ARM projects
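The "% of stars from ARM stargazers" metric can be sketched like this. The input shape (`(user, repo)` star events), the set of "top ARM projects", the `min_stars` cutoff, and the `arm_affinity` name are all assumptions for illustration; the slide does not say how those were chosen.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def arm_affinity(
    stars: Iterable[Tuple[str, str]],  # hypothetical (user, repo) star events
    arm_repos: Set[str],               # hypothetical set of "top ARM projects"
    min_stars: int = 20,               # skip repos with too few stars to judge
) -> List[Tuple[str, float]]:
    """Share of each repo's stargazers who also starred a top ARM project."""
    stargazers: Dict[str, Set[str]] = defaultdict(set)
    for user, repo in stars:
        stargazers[repo].add(user)
    # Everyone who starred at least one of the top ARM projects.
    arm_fans: Set[str] = set().union(
        *(stargazers[r] for r in arm_repos if r in stargazers)
    )
    shares = []
    for repo, users in stargazers.items():
        if repo in arm_repos or len(users) < min_stars:
            continue  # exclude the seed projects and tiny repos
        shares.append((repo, len(users & arm_fans) / len(users)))
    return sorted(shares, key=lambda item: item[1], reverse=True)
```

Using a share rather than a raw count favors niche projects whose audience overlaps heavily with ARM users over very popular projects that everyone stars.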
2018 kernel commits from @linaro.org
Hacker News and Linaro
Hacker News and Linaro in conversation
Hacker News Linaro mentions 2012-2013
Hacker News Linaro mentions 2018-
DATA
Datasets ready to play with in BigQuery (1 TB of free queries every month)
- GitHub activity
- GitHub files
- Stack Overflow
- Hacker News
- Wikipedia pageviews
- Wikipedia navigation
- PyPI installs
https://medium.com/@hoffa/bigquery-without-a-credit-card-discover-learn-and-share-199e08d4a064
Questions?
More: https://github.com/fhoffa/analyzing_github
News: reddit.com/r/bigquery
Ask: stackoverflow.com
Felipe Hoffa @felipehoffa
Rate me?
bit.ly/bqfeedback