Commerce Data Academy: Intro to Github and Git
TRANSCRIPT
Working with Teams: Git and Github
Rebecca Bilbro, Sasan Bahadaran, Pri Oberoi
3/21/2016
Intro to Github and Git, Sasan Bahadaran
May 9, 2017
Commerce Data Academy: A data education initiative of the Commerce Data Service. Launched by CDS to offer data science, data engineering, and web development training to employees of the US Department of Commerce.
Course schedule and materials (e.g., slides, code, papers) produced for the Commerce Data Academy on Github.
Questions? Feel free to write us at Data Academy (dataacademy@doc.gov).
Goals: Our goals for the class
• Explain and make the case for version control
• Collaboration in coding/software engineering
• Illustrate what Git software is and what it can do
• Differentiate Git (the software) and Github (the website)
• Describe how we integrate Git and Github into our project workflows
Goals: Your goals for the class
• Understand what version control is and why you should use it for your projects
• Start using Git on the command line
• Experiment with pushing repos to Github
• Practice working with a team using Waffle.io
Prerequisites
1. Create your own Github account
2. Create your own Waffle.io account
3. Download/install Git
4. Download/install Anaconda’s Python distribution
5. Verify your access to Terminal (Mac) or PowerShell (Windows)
Any challenges? Questions?
Open Source Installations
• We use open source and free software, so the tools should have a minimal impact on your IT department.
• DOC has provided guidance that states that Github and all the tools that we are teaching are permissible under policy.
• However, it is up to the CIO of each bureau to accept this guidance policy or not.
• DOC has a formalized Github policy: https://github.com/CommerceGov/Policies-and-Guidance/blob/master/GithubGuidanceforDepartmentofCommerce.md
Review
What is data science?
“Data science is the practice of transforming raw data into insights, products, and applications to empower data-driven decision making. It combines proven, time-tested methods from fields including statistics, natural sciences, computer science, operations research, and design in ways that are particularly well-suited to the data age. These methods, which range from data mining and visualization to predictive modeling, can scale from small to large datasets and can handle structured data as well as unstructured data like text and images.”
Jeff Chen, Chief Data Scientist, US Department of Commerce
How is data science different from data analytics?
What is hypothesis-driven development?
COMMERCE DATA SERVICE
Hypothesis-Driven Development (ThoughtWorks)
We Believe That <this capability>
Will Result In <this outcome>
We Will Know We Have Succeeded When <we see a measurable signal>
What tools do data scientists use?
What is the data science pipeline?
• Data Ingestion
• Data Munging and Wrangling
• Computation and Analyses
• Modeling and Application
• Reporting and Visualization
What is a data product?
How are data products different from analytical insights?
“Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.”
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: Waffle.io board “serenity” (waffle.io/serenity) with four columns of issue cards.]
• Backlog: check ship for survivors; secure identification keycards and uniforms; lower onto train and secure cargo; repair ambulance shuttle; capture an Alliance anti-aircraft gun; collect package from post master
• Ready: disable explosive set by trap; recover hidden loot at Canton; retrieve cargo from train; join Mal in boarding train; collect remaining funds to pay for shipmate’s release
• In Progress: alert others of distress call; fix ship’s engine problem; unload and pen cattle; get cargo from abandoned carrier
• Done: find a brand new compression coil for the steamer; find a captain for the ship; find a mechanic for the ship; buy a solid ship
Card labels on the board include train job, hospital job, financial, enhancement, startup, and wontfix.
Version Control
Examples
• Google Drive
• SharePoint
• Dropbox
• TortoiseSVN
• Bitbucket
What is version control? Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”
Wikipedia knows everything
Tell us about a time when you could have used some version control.
Local Version Control Systems
Version Control: A Visualization
[Diagram: a Local Computer where a File is checked out from a local Version Database holding Version 3, Version 2, and Version 1.]
[Figure: Branches and revisions through time - example scenario]
[Figure: Branches and revisions through time - actual workflow]
Distributed vs. Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We’ll learn more about those things a little later, but suffice it to say they’re things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
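Once an installer has finished, a quick check from the command line confirms that Git is reachable. The sketch below is one way to do that in a Mac or Linux shell; the variable name `git_status` is only an illustration, and in PowerShell running `git --version` by itself serves the same purpose.

```shell
# Check whether the git command is on the PATH, and print its version if so.
if command -v git >/dev/null 2>&1; then
    git_status="installed"
    git --version
else
    git_status="missing"
    echo "Git was not found; revisit the installer links above."
fi
echo "Git is $git_status"
```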
Git - History Lesson
• Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
• Distributed Version Control
• Open Source
• Initial release: 7 April 2005
• All metadata is stored in the .git directory
Git - Advantages
• Speed
• Simple design
• Strong support for non-linear development (thousands of parallel branches)
• Fully distributed
• Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - “Places”
• Object Database: where git stores each commit’s content and metadata
• Index / Staging Area: file snapshots to be included in the next commit
• Working Directory: the “physical” files on a computer
Git - “Stages”
• Committed: data is safely stored in your local object database
• Staged: marked such that the current state of the modified file will be included in the next commit
• Modified: changed but not staged or committed
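The three stages above can be watched as a single file moves through them. This is a minimal sketch, assuming Git is installed; the file name `notes.txt` and the user name and email are placeholders, and `git status --porcelain` (a real but not-yet-introduced command) is used only because it prints a compact two-letter state.

```shell
set -e

# Throwaway repository so the experiment does not touch real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "second draft" >> notes.txt           # MODIFIED: changed, not staged
modified_state="$(git status --porcelain notes.txt)"

git add notes.txt                          # STAGED: will go into the next commit
staged_state="$(git status --porcelain notes.txt)"

git commit -q -m "revise notes"            # COMMITTED: stored in the object database
committed_state="$(git status --porcelain notes.txt)"

printf 'modified:  [%s]\n' "$modified_state"
printf 'staged:    [%s]\n' "$staged_state"
printf 'committed: [%s]\n' "$committed_state"
```

In the porcelain output the first column describes the staging area and the second describes the working directory, so the `M` shifts from the second column to the first when the file is staged, and the line disappears once the change is committed.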
Git - Areas/places
[Diagram: the Working Directory, the Staging Area, and the .git directory (Repository).]
Git Commands
Git - Basic Commands
• git init: creates a new git repository to manage the current folder
• git clone <repository address>: downloads an existing git repository for the first time
• git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
• git commit -m <message>: takes metadata/changes from staging and adds them to the object database
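The four commands above fit together roughly as follows. This is a sketch rather than a prescribed workflow: the folder names `project` and `project-copy`, the README text, and the identity values are all made up for illustration, and the clone uses a local path only so the example runs without a network connection.

```shell
set -e

workdir="$(mktemp -d)"
mkdir "$workdir/project"
cd "$workdir/project"

git init -q                                # start tracking the current folder
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "# My project" > README.md
git add README.md                          # stage the new file
git commit -q -m "add README"              # record the staged snapshot

commit_count="$(git rev-list --count HEAD)"
echo "commits so far: $commit_count"

# Cloning copies an existing repository, history included, into a new folder.
git clone -q "$workdir/project" "$workdir/project-copy"
ls "$workdir/project-copy"
```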
Git - Basic Commands
• git fetch <server> <branch>: updates your object database but does not change the working directory
• git merge <source branch>: applies the commits from the source branch to the current working directory (which is the manifestation of another branch)
• git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
• git push <server> <branch>: sends your latest branch commits to the remote server
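To see how fetch, merge, pull, and push differ, it can help to play both collaborators on one machine. The sketch below assumes Git is installed and stands in a local bare repository (`server.git`) for a real server; the names `alice`, `bob`, `seed`, and `data.txt` are hypothetical, and `git clone --bare`, `git commit -a`, and `git -C` are real conveniences that the slides do not cover.

```shell
set -e

sandbox="$(mktemp -d)"
cd "$sandbox"

# Build a stand-in "server" repository that already holds one commit.
git init -q seed
cd seed
git config user.name "Example User"
git config user.email "user@example.com"
echo "version 1" > data.txt
git add data.txt
git commit -q -m "add data file"
branch="$(git symbolic-ref --short HEAD)"   # master or main, depending on setup
cd "$sandbox"
git clone -q --bare seed server.git

# Two collaborators clone the same server.
git clone -q server.git alice
git clone -q server.git bob
for person in alice bob; do
    git -C "$person" config user.name "$person"
    git -C "$person" config user.email "$person@example.com"
done

# Alice commits a change and pushes it to the server.
cd "$sandbox/alice"
echo "version 2" > data.txt
git commit -q -a -m "update data file to version 2"
git push -q origin "$branch"

# Bob fetches: his object database is updated, his files are not.
cd "$sandbox/bob"
git fetch -q origin "$branch"
after_fetch="$(cat data.txt)"

# Bob merges what he fetched: now his working directory changes.
git merge -q "origin/$branch"
after_merge="$(cat data.txt)"

# Alice pushes again; this time Bob pulls (fetch and merge in one step).
cd "$sandbox/alice"
echo "version 3" > data.txt
git commit -q -a -m "update data file to version 3"
git push -q origin "$branch"
cd "$sandbox/bob"
git pull -q origin "$branch"
after_pull="$(cat data.txt)"

echo "after fetch: $after_fetch"
echo "after merge: $after_merge"
echo "after pull:  $after_pull"
```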
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
• A remote git repository
• A website
• provides secure access
• provides repository metadata & reports
• provides tools for development teams
• Launched April 10, 2008
• ~10 million users in 2015
Git - “Places”
• Non-local git repositories are called “remotes”
• Object Database: where git stores each commit’s content and metadata
• Index / Staging Area: file snapshots to be included in the next commit
• Working Directory: the “physical” files on a computer
Github: A Distributed Version Control example
[Diagram: a Server Computer and two collaborators, Computer A and Computer B. Each of the three holds its own complete Version Database containing Version 3, Version 2, and Version 1.]
Git - “Origin”
• The “origin” remote is automatically created when you clone
• It is the default remote to use for pushing and pulling
• There is nothing special about “origin”; it is just a default name
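The points about “origin” can be checked directly. In this sketch, which assumes Git is installed, a local folder named `source` plays the part of the repository being cloned; that name, `local-copy`, and the new remote name `upstream` are arbitrary choices for illustration. `git remote` and `git remote rename` are standard Git subcommands.

```shell
set -e

sandbox="$(mktemp -d)"
cd "$sandbox"

# A small repository to clone from.
git init -q source
cd source
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "add hello file"
cd "$sandbox"

# Cloning creates a remote named "origin" without being asked.
git clone -q source local-copy
cd local-copy
remotes_after_clone="$(git remote)"
git remote -v                              # shows where "origin" points

# The name is only a default: renaming it works like any other remote.
git remote rename origin upstream
remotes_after_rename="$(git remote)"

echo "after clone:  $remotes_after_clone"
echo "after rename: $remotes_after_rename"
```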
User Account
[Screenshot: the Github profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, with 17 followers, 11 starred, and 39 following. The page lists popular repositories, repositories contributed to, and a contributions calendar summarizing pull requests, issues opened, and commits.]
Repo
[Screenshot: the Github repository page for rebeccabilbro / orlo, “A tour of ROC curves”: 19 commits, 1 branch, 0 releases, 1 contributor, viewing branch master. Latest commit 382b9ca, 4 days ago.]
The file list shows the data and figures folders plus .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, and roc.py, each with its most recent commit message, for example “added method to guess the label column”, “added precision recall image”, and “basic implementation of ROC curve plotter”.
Command Line
Shifting to the command line
Windows: On Windows we’re going to use PowerShell. People used to work with a program called cmd.exe, but it’s not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In “Search programs and files” type: powershell
• Hit Enter.
Mac OSX: For Mac OSX you’ll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue “search bar” will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Now you have your Terminal open and it’s in your Dock so you can get to it.
Where am I?
(shown in Mac OSX Terminal and Windows PowerShell)
What’s my name?
(shown in Mac OSX Terminal and Windows PowerShell)
Make a directory
Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Change between directories
Windows PowerShell:
> cd temp
> pwd
Mac OSX Terminal:
$ cd temp
$ pwd
List files and directories
Windows PowerShell:
> dir
Mac OSX Terminal:
$ ls
Make an empty file
Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw’s book
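The Mac OSX Terminal steps above can be strung together into one short session. This is only a sketch: it runs inside a throwaway folder created with `mktemp -d` (an addition, so nothing is left behind in your home directory), and it uses `mkdir -p`, a standard option that builds the whole nested path at once instead of one level per command.

```shell
set -e

# Work in a throwaway folder so nothing lands in your home directory.
scratch="$(mktemp -d)"
cd "$scratch"

pwd                      # "Where am I?": print the current directory

# -p creates every missing folder along the path in a single step.
mkdir -p temp/stuff/things/frank/joe/alex/john

cd temp                  # change into the new directory
pwd                      # confirm the move

touch iamcool.txt        # make an empty file
ls                       # list files and directories
```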
Let’s use what we’ve learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
[Diagram: how changes move between the working directory, the index, the local repo, and the Github repo, with arrows labeled revert, diff --cached, fetch, checkout HEAD, and compare.]
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github: “A startup within DOC focused on building data products with and for the bureaus.” Washington DC; 20 people, 4 teams. Repositories shown include DataService_WebSite (the website for the Commerce Data Service), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy).]
Waffle
[Screenshot: Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Issues closed in the last week are shown in the Done column, and dragging an issue there closes it.]
Cards on the board include Better Licensing, username check, Dataset Searching, Dataset Overwrite, 500 error on upload w/ missing col/row values, AJAXify the uploader, 3D tours, Sampling technique for bigger datasets, Feature nomination tool for visualization, Data file uploading, Research Auto-analysis Feature, Implement beta auto analysis, Dataset Edit Form, Async Upload with Celery, Dimension Histograms and Ranking, Large files hang uploader, and Upload Error: line contains NULL byte.
Cards carry labels for priority (e.g., medium), type (feature, bug, technical debt, question, task), and milestone (Version 0.3).
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
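One way to make a commit message helpful is to pair a short subject with a sentence of explanation. The sketch below assumes Git is installed and relies on the fact that repeating `-m` adds a separate paragraph to the message; the file `exports.csv`, its contents, and the wording of the message are invented for illustration.

```shell
set -e

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "state,exports" > exports.csv
git add exports.csv

# First -m is the subject line; second -m becomes the body paragraph.
git commit -q \
    -m "Add header row to exports.csv" \
    -m "The ingest step expects named columns, so the file now starts with state,exports."

subject="$(git log -1 --format=%s)"
body="$(git log -1 --format=%b)"
echo "subject: $subject"
echo "body:    $body"
```

A teammate scanning the history sees what changed from the subject alone, and the body records why, which is the part future you is most likely to have forgotten.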
Why?
Why do data scientists need version control?
• Data Ingestion
• Data Munging and Wrangling
• Computation and Analyses
• Modeling and Application
• Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
• README.md
• .gitignore
• fixtures
• requirements.txt
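A project skeleton following these conventions can be created in a few commands. This is a sketch under assumptions: the project name `my-project`, the README text, and the two `.gitignore` entries are illustrative only, and `fixtures` is treated here as a folder for sample data files, which is a common reading of the name rather than something the slides spell out.

```shell
set -e

project="$(mktemp -d)/my-project"
mkdir -p "$project/fixtures"               # assumed: a folder for sample data
cd "$project"

echo "# my-project" > README.md            # what the project is and how to run it

# Patterns Git should leave untracked (illustrative entries).
printf '%s\n' '.DS_Store' '__pycache__/' > .gitignore

touch requirements.txt                     # Python dependencies go here, one per line

ls -A                                      # -A also lists dotfiles such as .gitignore
```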
Where to go from here
Additional Tutorials
• http://pcottle.github.io/learnGitBranching
• http://rogerdudler.github.io/git-guide
• http://www.tutorialspoint.com/git
Resources
• Git Desktop: https://desktop.github.com
• TortoiseGit: https://tortoisegit.org
• Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
• Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
• Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
• Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
• Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
• Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Commerce Data Academy A data education initiative of the Commerce Data Service Launched by CDS to offer data science data engineering and
web development training to employees of the US Department of Commerce
Course schedule and materials (eg slides code papers) produced for the Commerce Data Academy on Github
Questions Feel free to write us at Data Academy (dataacademydocgov)
Goals Our goals for the class Explain and make the case for version control Collaboration in codingsoftware engineering Illustrate what Git software is and what it can do Differentiate Git (the software) and Github (the website) Describe how we integrate Git and Github into our project
workflows
Goals Your goals for the class Understand what version control is and why should you use it
for your projects Start using Git on the command line Experiment with pushing repos to Github Practice working with a team using Waffleio
Prerequisites 1 Create your own Github account
2 Create your own Waffleio account
3 Downloadinstall Git
4 Downloadinstall Anacondas Python distribution
5 Verify your access to Terminal (Mac) or Powershell (Windows)
Any challenges Questions
Open Sources Installations We use open source and free software so they should have a minimal impact on
your IT department
DOC has provided guidance that states that states that Github and all the tools that we are teaching are permissible under policy
However it is up to the CIO of each bureau to accept this guidance policy or not
DOC has a formalized Github policy httpsgithubcomCommerceGovPolicies-and-GuidanceblobmasterGithubGuidanceforDepartmentofCommercemd
Review
What is data science
ldquoData science is the practice of transforming raw data into insights products
and applications to empower data-driven decision making It combines
proven time-tested methods from fields including statistics natural sciences
computer science operations research and design in ways that are
particularly well-suited to the data age These methods which range from
data mining and visualization to predictive modeling can scale from small to
large datasets and can handle structured data as well as unstructured data
like text and imagesrdquo
Jeff Chen Chief Data Scientist US Department of Commerce
How is data science different fromdata analytics
What is hypothesis-driven development
COMMERCE DATA SERVICE
rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot
We Believe That ~ fht~ C--Jf Jbt jf1gt
Will Result In ~ fh~ OfJfCAJfYle-gt
We Will Know We Have Succeeded When
lt we- ie-e- a rne-atwabe- tigtialgt
What tools do data scientists use
What is the data science pipeline
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product
How are data products different fromanalytical insights
Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data
Benjamin Bengfort
What is software engineering
What does collaboration look like in a data group
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
Backlog
waffleioserenity
24
uniforms
31
I trainjob I
22
Ihospi taljob IQ FiiNF
32
W f$flampF
check ship for survivors
II
secure identification keycards and
0 shy
lower onto train and secure cargo
repair ambulance shuttle
capture an Alliance anti-aircraft gun
1--1-lo
collect package from post master
Ready
20
disable explosive set by trap
II 18
recover hidden loot at Canton
financial
4
retrieve cargo from train
I ttain job I enhancement
30
join Mal in boarding train
[ trainjob I
21
collect remaining funds to pay for
shipmates release
financial I - 1- lo
In Progress
alert others of distress call
fix ships engine problem
mmm bull 1-L II 13
unload and pen cattle
M MMUi L II
get cargo from abandoned carrier
Done
29
find a brand new compression coil for the
steamer
wontfix
find a captain for t h
Istartup I e ship
II 27
find a mechanic for the ship
Istartup I II 16
buy a solid ship
Istartup I () II
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
COMMERCE DATA SERVICE
0 This organization Search Pull requests Issues Gist
Commerce Data Service A startup within DOC focused on building data products with and for the bureaus
Washington DC httpJlwwwcommercegov datadocgov
IQ Repositories People 20 l Teams 4
Fiiters ~ Q Find a repository +New repository People 20 gt
DataService_ WebSite JavaScript 1 V4
IV forked from timwoodDataCorps_ WebSite
The website for the Commerce Data Service - A startup within the Department of
Commerce
Updated 19 hours ago
ITA_Principal_ Travel css o Vamp Updated a day ago
Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy
Updated a day ago
Waffle
[Screenshot: the Waffle board for DistrictDataLabs/trinket, with columns Backlog, Ready, In Progress, and Done. The Done column notes: "Issues closed in the last week are shown in this column. Drag issues here to close them."]
[Cards on the board include: Better Licensing (type: feature); username check (priority: medium, type: bug); Dataset Searching (priority: medium, type: feature); Dataset Overwrite (type: technical debt); 500 error on upload w/ missing col/row values; AJAXify the uploader (priority: medium, type: feature); 3D tours; Sampling technique for bigger datasets; Feature nomination tool for visualization; Data file uploading; Research Auto-analysis Feature; Implement beta auto analysis; Dropdown Dataset Edit Form; Async Upload with Celery; Dimension Histograms and Ranking; Large files hang uploader (type: bug); Upload Error: line contains NULL byte (type: bug). Several cards are tagged Version 0.3.]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
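One common way to be helpful is a short subject line that says what changed, followed by a sentence on why. A sketch, assuming Git is installed; the file, the messages, and the identity are invented for the example:

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

echo "id,name" > data.csv
git add data.csv

# The first -m is the subject line; a second -m becomes a separate paragraph in the message body.
git commit -q \
  -m "Add header row to data.csv" \
  -m "The loading script expects column names on the first line."

# Show the full message of the latest commit.
git --no-pager log -1 --format=%B
```

Future you, reading the log in six months, sees the "what" in the one-line history and the "why" in the full message.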
Why?
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on GitHub
README.md
.gitignore
fixtures
requirements.txt
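A skeleton with these four pieces can be laid out from the command line. This is a sketch under assumptions: the project name, the README text, the ignore patterns, and the .gitkeep placeholder file are all made up for illustration and are not taken from the slides:

```shell
set -e
project="$(mktemp -d)/myproject"    # hypothetical project name
mkdir -p "$project/fixtures"
cd "$project"

# README.md: the first thing visitors see on the repository page.
printf '# myproject\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files Git should never track (example patterns only).
printf '__pycache__/\n*.pyc\n.DS_Store\n' > .gitignore

# requirements.txt: Python dependencies, one per line (empty for now).
: > requirements.txt

# Git does not track empty folders, so a placeholder file keeps fixtures/ in the repo.
# The name .gitkeep is only a convention, not a Git feature.
: > fixtures/.gitkeep

ls -a
```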
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Review
What is data science?
"Data science is the practice of transforming raw data into insights, products, and applications to empower data-driven decision making. It combines proven, time-tested methods from fields including statistics, natural sciences, computer science, operations research, and design in ways that are particularly well-suited to the data age. These methods, which range from data mining and visualization to predictive modeling, can scale from small to large datasets and can handle structured data as well as unstructured data like text and images."
Jeff Chen, Chief Data Scientist, US Department of Commerce
How is data science different from data analytics?
What is hypothesis-driven development?
[Slide image: Hypothesis-Driven Development (ThoughtWorks)]
We Believe That <this capability>
Will Result In <this outcome>
We Will Know We Have Succeeded When <we see a measurable signal>
What tools do data scientists use?
What is the data science pipeline?
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product?
How are data products different from analytical insights?
Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: the Waffle board waffle.io/serenity, with cards sorted into four columns. Cards carry labels such as train job, hospital job, financial, enhancement, startup, and wontfix.]
Backlog: check ship for survivors; secure identification keycards and uniforms; lower onto train and secure cargo; repair ambulance shuttle; capture an Alliance anti-aircraft gun; collect package from post master
Ready: disable explosive set by trap; recover hidden loot at Canton; retrieve cargo from train; join Mal in boarding train; collect remaining funds to pay for shipmates' release
In Progress: alert others of distress call; fix ship's engine problem; unload and pen cattle; get cargo from abandoned carrier
Done: find a brand new compression coil for the steamer; find a captain for the ship; find a mechanic for the ship; buy a solid ship
Version Control
Examples
[Logos: Google Drive, SharePoint, Dropbox, TortoiseSVN, Bitbucket]
What is version control? Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Diagram: a local computer with a checked-out file and a version database holding Version 1, Version 2, and Version 3]
[Diagram: revisions 1 through 6 with branches A, B, and C]
Branches and revisions through time - example scenario
[Screenshot: a branch graph over several dates in August]
Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
[Screenshot: the Git website, with the tagline "git --distributed-is-the-new-centralized"]
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Learn Git in your browser for free with Try Git.
About: The advantages of Git compared to other source control systems.
Documentation: Command reference pages, Pro Git book content, videos and other material.
Downloads: GUI clients and binary releases for all major platforms.
Community: Get involved! Bug reporting, mailing list, chat, development and more.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
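To see the .git directory for yourself, create an empty repository and look inside. A small sketch, assuming Git is installed; the scratch folder is an addition for the example:

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"

git init -q

ls -a       # the only new entry is the hidden .git folder
ls .git     # HEAD, config, objects, refs, ... (exact contents vary by Git version)
```

Everything Git knows about the project lives in that one folder; the files next to it are just your working copy.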
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores committed snapshots and the metadata about each commit
Index (Staging Area): file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
[Diagram: Working Directory, Staging Area, and .git directory (Repository)]
Git - Areas/places
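The three stages can be watched with git status --short, which prints a two-letter code in front of each changed file. A sketch, assuming Git is installed; the file name, its contents, and the identity are placeholders:

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

echo "draft" > report.txt
git add report.txt
git commit -q -m "add report"

# Modified: changed in the working directory, not staged.
echo "final version" > report.txt
modified_state="$(git status --short)"
echo "$modified_state"                        # " M report.txt"

# Staged: the change is now in the index, ready for the next commit.
git add report.txt
staged_state="$(git status --short)"
echo "$staged_state"                          # "M  report.txt"

# Committed: stored in the object database, so status has nothing to report.
git commit -q -m "finish report"
git status --short
```

The first column of the code describes the staging area and the second column describes the working directory, which is why the M moves from right to left after git add.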
Git Commands
git init: create a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for next commit
git commit -m <message>: takes metadata/changes from staging and adds to the object database
Git - Basic Commands
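The four commands fit together roughly as follows. A sketch, assuming Git is installed; the folder names, the file, and the identity are made up, and a local folder stands in for the repository address that would normally be a GitHub URL:

```shell
set -e
base="$(mktemp -d)"
mkdir "$base/original"
cd "$base/original"

git init -q                                   # start managing this folder
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

echo "hello" > hello.txt
git add hello.txt                             # stage the new file
git commit -q -m "add hello.txt"              # record the staged snapshot

# A repository address can be a URL or, as here, a path on the same machine.
cd "$base"
git clone -q original copy

# The copy arrives with the full history, not just the latest files.
git -C copy --no-pager log --oneline
```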
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to your current branch and updates the working directory to match
git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
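The difference between fetch, merge, pull, and push is easiest to see with two collaborators. In this sketch a bare repository on the local disk stands in for the remote server; the names Alice and Bob, the file, and the messages are all invented for the example:

```shell
set -e
base="$(mktemp -d)"
cd "$base"

# A bare repository plays the role of the server (on GitHub this would be a URL).
git init -q --bare server.git

# Alice clones, commits, and pushes the first version.
git clone -q server.git alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "first entry" > log.txt
git add log.txt
git commit -q -m "start log"
branch="$(git symbolic-ref --short HEAD)"   # master or main, depending on your Git setup
git push -q origin "$branch"

# Bob clones the same remote.
cd "$base"
git clone -q server.git bob
git -C bob config user.name "Bob Example"
git -C bob config user.email "bob@example.com"

# Alice pushes a second commit.
cd "$base/alice"
echo "second entry" >> log.txt
git commit -q -am "add second entry"
git push -q origin "$branch"

# Bob fetches: his object database is updated, his working directory is not.
cd "$base/bob"
git fetch -q origin "$branch"
lines_after_fetch="$(grep -c '' log.txt)"

# Bob merges the fetched commits into his current branch.
git merge -q "origin/$branch"
lines_after_merge="$(grep -c '' log.txt)"

# Bob adds a third entry and pushes; Alice pulls (fetch and merge in one step).
echo "third entry" >> log.txt
git commit -q -am "add third entry"
git push -q origin "$branch"
cd "$base/alice"
git pull -q origin "$branch"
lines_after_pull="$(grep -c '' log.txt)"

# Number of lines in log.txt after each step.
echo "$lines_after_fetch $lines_after_merge $lines_after_pull"   # prints: 1 2 3
```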
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database: where git stores committed snapshots and the metadata about each commit
Index (Staging Area): file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
[Diagram: a server computer and two client computers, Computer A and Computer B, each holding its own full version database with Version 1, Version 2, and Version 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
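A quick way to check that "origin" is only a label is to clone something and then rename the remote. A sketch, assuming Git is installed; the repository names, the identity, and the new name "upstream" are arbitrary choices for the example:

```shell
set -e
base="$(mktemp -d)"
cd "$base"

# A small repository to clone from (a stand-in for a GitHub URL).
git init -q source
git -C source config user.name "Example Student"        # placeholder identity
git -C source config user.email "student@example.com"   # placeholder identity
echo "hello" > source/hello.txt
git -C source add hello.txt
git -C source commit -q -m "add hello.txt"

git clone -q source work
cd work

# Cloning created one remote, named origin.
remotes_before="$(git remote)"
echo "$remotes_before"
git remote -v          # shows the fetch and push addresses behind the name

# The name is only a label, so it can be changed.
git remote rename origin upstream
remotes_after="$(git remote)"
echo "$remotes_after"
```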
User Account
[Screenshot: the GitHub profile of Rebecca Bilbro (rebeccabilbro), Washington, DC, joined on Sep 13, 2014, with 17 followers, 11 starred, and 39 following. Popular repositories: xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards. Repositories contributed to: DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket, and a sql-tutorial repository. A contributions calendar runs from March through February.]
Repo
[Screenshot: the rebeccabilbro/orlo repository on GitHub, "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor, on branch master. Latest commit 382b9ca, 4 days ago: "added method to guess the label column". Contents: data, figures, .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, roc.py.]
Command Line
Shifting to the command line
Windows
On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files" type: powershell
• Hit Enter.
Mac OSX
For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue "search bar" will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Goals Your goals for the class Understand what version control is and why should you use it
for your projects Start using Git on the command line Experiment with pushing repos to Github Practice working with a team using Waffleio
Prerequisites 1 Create your own Github account
2 Create your own Waffleio account
3 Downloadinstall Git
4 Downloadinstall Anacondas Python distribution
5 Verify your access to Terminal (Mac) or Powershell (Windows)
Any challenges Questions
Open Sources Installations We use open source and free software so they should have a minimal impact on
your IT department
DOC has provided guidance that states that states that Github and all the tools that we are teaching are permissible under policy
However it is up to the CIO of each bureau to accept this guidance policy or not
DOC has a formalized Github policy httpsgithubcomCommerceGovPolicies-and-GuidanceblobmasterGithubGuidanceforDepartmentofCommercemd
Review
What is data science
ldquoData science is the practice of transforming raw data into insights products
and applications to empower data-driven decision making It combines
proven time-tested methods from fields including statistics natural sciences
computer science operations research and design in ways that are
particularly well-suited to the data age These methods which range from
data mining and visualization to predictive modeling can scale from small to
large datasets and can handle structured data as well as unstructured data
like text and imagesrdquo
Jeff Chen Chief Data Scientist US Department of Commerce
How is data science different fromdata analytics
What is hypothesis-driven development
COMMERCE DATA SERVICE
rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot
We Believe That ~ fht~ C--Jf Jbt jf1gt
Will Result In ~ fh~ OfJfCAJfYle-gt
We Will Know We Have Succeeded When
lt we- ie-e- a rne-atwabe- tigtialgt
What tools do data scientists use
What is the data science pipeline
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product
How are data products different fromanalytical insights
Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data
Benjamin Bengfort
What is software engineering
What does collaboration look like in a data group
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
Backlog
waffleioserenity
24
uniforms
31
I trainjob I
22
Ihospi taljob IQ FiiNF
32
W f$flampF
check ship for survivors
II
secure identification keycards and
0 shy
lower onto train and secure cargo
repair ambulance shuttle
capture an Alliance anti-aircraft gun
1--1-lo
collect package from post master
Ready
20
disable explosive set by trap
II 18
recover hidden loot at Canton
financial
4
retrieve cargo from train
I ttain job I enhancement
30
join Mal in boarding train
[ trainjob I
21
collect remaining funds to pay for
shipmates release
financial I - 1- lo
In Progress
alert others of distress call
fix ships engine problem
mmm bull 1-L II 13
unload and pen cattle
M MMUi L II
get cargo from abandoned carrier
Done
29
find a brand new compression coil for the
steamer
wontfix
find a captain for t h
Istartup I e ship
II 27
find a mechanic for the ship
Istartup I II 16
buy a solid ship
Istartup I () II
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github (20 people, 4 teams), described as "A startup within DOC focused on building data products with and for the bureaus", Washington DC. Repositories listed: DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite, "The website for the Commerce Data Service - A startup within the Department of Commerce", updated 19 hours ago); ITA_Principal_Travel (CSS, updated a day ago); Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy, updated a day ago).]
Waffle
[Screenshot: Waffle board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Cards include: Better Licensing (type: feature); username check (priority: medium, type: bug); Dataset Searching (priority: medium, type: feature); Dataset Overwrite (type: technical debt); 500 error on upload w/ missing col/row values; AJAXify the uploader (priority: medium, type: feature); 3D tours; Sampling technique for bigger datasets; Feature nomination tool for visualization; Data file uploading; Research Auto-analysis Feature; Implement beta auto analysis; Dropdown Dataset Edit Form; Async Upload with Celery; Dimension Histograms and Ranking; Large files hang uploader (type: bug); Upload Error: line contains NULL byte (type: bug). The Done column notes that issues closed in the last week are shown there and that dragging an issue into the column closes it.]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
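One way to make a commit message helpful is to give it a short summary line plus a sentence of context. The sketch below is illustrative only: the repository, the file, the identity, and the message text are all made up for the example. It relies on the fact that git commit accepts more than one -m option and joins them as separate paragraphs.

```shell
# Scratch repository so the example does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Hypothetical identity, set for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical change: a requirements file listing one dependency.
echo "pandas" > requirements.txt
git add requirements.txt

# The first -m is the short summary line; the second -m becomes the body.
git commit -q -m "Add requirements.txt listing pandas" \
              -m "The ingest step needs pandas; record it so teammates can install it."

# Show the summary line of the latest commit.
git log -1 --format=%s
```

A summary line that says what changed, and a body that says why, is usually enough for your team and for future you.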
Why?
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
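To make the folder conventions concrete, here is a rough sketch that lays out those four items for a new project. The project name (myproject), the file contents, and the .gitkeep placeholder are assumptions made for illustration. The sketch also assumes fixtures is a folder for small sample data, which the slide does not spell out.

```shell
# Scratch location; "myproject" is a hypothetical project name.
base="$(mktemp -d)"
mkdir -p "$base/myproject/fixtures"
cd "$base/myproject"

# README.md: the first thing visitors see on the repository page.
printf '# myproject\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: patterns for files git should not track (examples only).
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: Python packages the project depends on (example only).
printf 'pandas\n' > requirements.txt

# git does not track empty directories, so drop a placeholder file in
# fixtures/ (the name .gitkeep is only a common convention).
touch fixtures/.gitkeep

ls -a
```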
Where to go from here?
Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources:
Github Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Review
What is data science?
"Data science is the practice of transforming raw data into insights, products, and applications to empower data-driven decision making. It combines proven, time-tested methods from fields including statistics, natural sciences, computer science, operations research, and design in ways that are particularly well-suited to the data age. These methods, which range from data mining and visualization to predictive modeling, can scale from small to large datasets and can handle structured data as well as unstructured data like text and images."
Jeff Chen, Chief Data Scientist, US Department of Commerce
How is data science different from data analytics?
What is hypothesis-driven development?
Hypothesis Driven Development (ThoughtWorks)
We Believe That <this capability>
Will Result In <this outcome>
We Will Know We Have Succeeded When <we see a measurable signal>
What tools do data scientists use?
What is the data science pipeline?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product?
How are data products different from analytical insights?
"Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data."
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: Waffle board for the "serenity" project with four columns.
Backlog: check ship for survivors; secure identification keycards and uniforms; lower onto train and secure cargo; repair ambulance shuttle; capture an Alliance anti-aircraft gun; collect package from post master.
Ready: disable explosive set by trap; recover hidden loot at Canton; retrieve cargo from train; join Mal in boarding train; collect remaining funds to pay for shipmates' release.
In Progress: alert others of distress call; fix ship's engine problem; unload and pen cattle; get cargo from abandoned carrier.
Done: find a brand new compression coil for the steamer; find a captain for the ship; find a mechanic for the ship; buy a solid ship.
Labels on the cards include train job, hospital job, financial, startup, enhancement, and wontfix.]
Version Control
Examples
Google Drive
SharePoint
Dropbox
Tortoise SVN
Bitbucket
What is version control? Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Diagram: a local computer where a file is checked out from a version database holding Version 3, Version 2, and Version 1]
[Diagram: revisions 1 through 6 with branches A, B, and C]
Branches and revisions through time - example scenario
Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Learn Git in your browser for free with Try Git.
[Screenshot: the Git website home page, with links to About, Documentation, Downloads, and Community]
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores the content and metadata of each commit
Index / Staging Area: file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
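To see the three stages for yourself, the sketch below walks one file through modified, staged, and committed, checking git status --short at each step. The scratch repository, the identity, and the file name notes.txt are assumptions made for the example.

```shell
# Scratch repository with a hypothetical identity.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Start with one committed file.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Modified: changed in the working directory, not yet staged.
echo "second draft" >> notes.txt
modified="$(git status --short)"
echo "modified:  $modified"

# Staged: marked for inclusion in the next commit.
git add notes.txt
staged="$(git status --short)"
echo "staged:    $staged"

# Committed: stored in the local object database, nothing left to report.
git commit -q -m "Revise notes.txt"
committed="$(git status --short)"
echo "committed: ${committed:-clean}"
```

In the short status format the first column describes the staging area and the second describes the working directory, which is why the M moves from the second column to the first after git add.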
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
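Here is a small end-to-end sketch of those four commands. Everything named in it (the project folder, hello.txt, the identity, the commit message) is made up for illustration, and the "repository address" given to git clone is a local path standing in for a Github URL.

```shell
# Scratch area; "project" is a hypothetical folder name.
base="$(mktemp -d)"
mkdir "$base/project"
cd "$base/project"

# git init: create a new repository to manage the current folder.
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"

# git add: put a file in the index / staging area.
echo "hello" > hello.txt
git add hello.txt

# git commit: record the staged snapshot in the object database.
git commit -q -m "Add hello.txt"

# git clone: download an existing repository for the first time.
# Here the address is a local path instead of a Github URL.
cd "$base"
git clone -q project project-copy
ls project-copy
```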
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
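The difference between fetch, merge, pull, and push is easiest to see with two people sharing one server. The sketch below fakes that setup on a single machine: a bare repository (server.git) stands in for the remote server, and two clones named alice and bob stand in for teammates. All of those names, the file data.txt, and the commit messages are assumptions for the example.

```shell
base="$(mktemp -d)"
cd "$base"

# Seed a repository with one commit, then make a bare copy that
# stands in for the remote server.
git init -q seed
git -C seed config user.name "Example User"
git -C seed config user.email "user@example.com"
echo "v1" > seed/data.txt
git -C seed add data.txt
git -C seed commit -q -m "Add data.txt"
git clone -q --bare seed server.git

# Two clones play two teammates (hypothetical names and identities).
for who in alice bob; do
  git clone -q server.git "$who"
  git -C "$who" config user.name "$who"
  git -C "$who" config user.email "$who@example.com"
done
# The branch name is master or main, depending on your git setup.
branch="$(git -C alice symbolic-ref --short HEAD)"

# push: alice sends her latest commit to the server.
echo "v2" >> alice/data.txt
git -C alice commit -q -a -m "Update data.txt to v2"
git -C alice push -q origin "$branch"

# fetch: bob's object database gets the commit; his file is unchanged.
git -C bob fetch -q origin "$branch"
after_fetch="$(cat bob/data.txt)"

# merge: apply the fetched commits to bob's branch and working directory.
git -C bob merge -q "origin/$branch"
after_merge="$(tail -n 1 bob/data.txt)"

# pull: fetch and merge in one step, after bob pushes another change.
echo "v3" >> bob/data.txt
git -C bob commit -q -a -m "Update data.txt to v3"
git -C bob push -q origin "$branch"
git -C alice pull -q origin "$branch"
after_pull="$(tail -n 1 alice/data.txt)"

echo "after fetch: $after_fetch, after merge: $after_merge, after pull: $after_pull"
```

With Github, the only change is that origin points at a Github URL instead of a local folder.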
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
[Diagram: a server computer and two client computers, Computer A and Computer B, each holding a full version database with Version 3, Version 2, and Version 1]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
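A quick way to confirm that "origin" is just a name is to clone something and then rename the remote. In this sketch a local repository (upstream-repo, a made-up name) stands in for a Github repo, and "github" is an arbitrary replacement name chosen for the example.

```shell
base="$(mktemp -d)"
cd "$base"

# A local repository with one commit stands in for a Github repo.
git init -q upstream-repo
git -C upstream-repo config user.name "Example User"   # hypothetical identity
git -C upstream-repo config user.email "user@example.com"
echo "hello" > upstream-repo/README.md
git -C upstream-repo add README.md
git -C upstream-repo commit -q -m "Add README"

# Cloning creates a remote named "origin" that points at the source.
git clone -q upstream-repo my-clone
cd my-clone
default_remote="$(git remote)"
git remote -v

# "origin" is only a default name; it can be renamed like any remote.
git remote rename origin github
renamed_remote="$(git remote)"
echo "was: $default_remote, now: $renamed_remote"
```

After the rename you would push and pull with the new name, for example git pull github followed by the branch name.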
User Account
[Screenshot: Github profile for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014; 17 followers, 11 starred, 39 following. Popular repositories: xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards. Repositories contributed to: DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket, and sql-tutorial. A contributions calendar runs from March through February and summarizes pull requests, issues opened, and commits.]
Repo
[Screenshot: a Github repository page owned by rebeccabilbro, described as "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor, on branch master. Latest commit 382b9ca, "added method to guess the label column", 4 days ago. Files and folders, each shown with its most recent commit message: data, figures, .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, roc.py.]
Command Line
Shifting to the command line
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
Review
What is data science?
“Data science is the practice of transforming raw data into insights, products,
and applications to empower data-driven decision making. It combines
proven, time-tested methods from fields including statistics, natural sciences,
computer science, operations research, and design in ways that are
particularly well-suited to the data age. These methods, which range from
data mining and visualization to predictive modeling, can scale from small to
large datasets and can handle structured data as well as unstructured data
like text and images.”
Jeff Chen, Chief Data Scientist, US Department of Commerce
How is data science different from data analytics?
What is hypothesis-driven development?
Hypothesis-Driven Development (ThoughtWorks)
We Believe That <this capability>
Will Result In <this outcome>
We Will Know We Have Succeeded When <we see a measurable signal>
What tools do data scientists use?
What is the data science pipeline?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product?
How are data products different from analytical insights?
Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: a Waffle.io board labeled "serenity" with Backlog, Ready, In Progress, and Done columns of issue cards, for example "check ship for survivors", "retrieve cargo from train", "fix ship's engine problem", and "buy a solid ship"]
Version Control
Examples
Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket
What is version control?
Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”
Wikipedia knows everything
Tell us about a time when you could have used some version control.
Local Version Control Systems
Version Control: A Visualization
[Figure: a file on the local computer is checked out from a version database holding Version 1, Version 2, and Version 3]
[Figure: Branches and revisions through time - example scenario]
[Figure: Branches and revisions through time - actual workflow]
Distributed vs. Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
[Screenshot: the git-scm.com home page, "git --distributed-is-the-new-centralized"]
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on
the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note
that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go
to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a
command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid
credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it
to say they're things you want. You can download this from the GitHub for Windows website, at
http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
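Once an installer from one of the links above has finished, one quick way to confirm that Git is available is to ask it for its version from Terminal or PowerShell. This is only a sanity check, and the version number printed will differ from machine to machine:

```shell
# Print the installed Git version, for example "git version 2.x.y".
git --version
```

If the command is not found, the usual first step is to close and reopen the terminal so it picks up the new installation.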
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the “physical” files on a computer
Git - “Places”
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - “Stages”
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
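To watch a file move through the modified, staged, and committed states, a small sketch like the one below can help. The file name, the identity, and the commit messages are made up for illustration, and everything happens in a throwaway directory:

```shell
# Sketch only: names, identity, and messages are hypothetical.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-stages.XXXXXX")"
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Modified: changed in the working directory, not yet staged.
echo "second line" >> notes.txt
modified="$(git status --short)"
echo "modified:  [$modified]"

# Staged: the change is now in the index, ready for the next commit.
git add notes.txt
staged="$(git status --short)"
echo "staged:    [$staged]"

# Committed: stored in the object database, so there is nothing left to report.
git commit -q -m "Add second line to notes"
committed="$(git status --short)"
echo "committed: [$committed]"
```

In the short status format, the first column describes the staging area and the second column describes the working directory, which is why the M shifts from the second column to the first after git add.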
Git Commands
git init: create a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for next commit
git commit -m <message>: takes metadata/changes from staging and adds to the object database
Git - Basic Commands
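The four commands above can be tried end to end without a network connection. The following is a minimal sketch, assuming a throwaway directory and hypothetical names (project, project-copy, Example User); the clone step uses a local folder as the repository address where a GitHub URL would normally go:

```shell
# Sketch only: directory names and identity are hypothetical.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-basics.XXXXXX")"
cd "$work"

# git init: start managing a new folder with Git.
git init -q project
cd project
git config user.name "Example User"
git config user.email "user@example.com"

# git add + git commit: stage a file, then record it in the object database.
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"

# git clone: download an existing repository for the first time.
cd "$work"
git clone -q project project-copy

# The copy carries the full history, not just the latest files.
git -C project-copy log --oneline
```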
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
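The difference between fetch and merge is easiest to see with two working copies that share a server. In this sketch a bare repository on the local disk stands in for the server, and the two users (alice, bob) and the file data.txt are invented for the example:

```shell
# Sketch only: server.git, alice, bob, and data.txt are hypothetical.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-remote.XXXXXX")"
cd "$work"

# A bare repository plays the role of the remote server.
git init -q --bare server.git

# Alice clones the (still empty) server repo, commits, and pushes.
git clone -q server.git alice 2>/dev/null
cd "$work/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data file"
branch="$(git symbolic-ref --short HEAD)"   # master or main, depending on setup
git push -q origin "$branch"

# Bob clones after the first push, so he starts with v1.
cd "$work"
git clone -q server.git bob

# Alice pushes a second commit.
cd "$work/alice"
echo "v2" > data.txt
git commit -q -am "Update data to v2"
git push -q origin "$branch"

# Bob fetches: his object database is updated, but his files are not.
cd "$work/bob"
git fetch -q origin
before_merge="$(cat data.txt)"

# Bob merges: the fetched commits are applied to his working directory.
git merge -q "origin/$branch"
after_merge="$(cat data.txt)"
echo "before merge: $before_merge, after merge: $after_merge"
```

Running git pull origin with the branch name in Bob's copy would have performed the fetch and the merge in one step.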
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called “remotes”
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the “physical” files on a computer
Git - “Places”
[Figure: a server computer and two client computers (Computer A and Computer B), each holding a full version database with Version 1, Version 2, and Version 3]
Github: A Distributed Version Control example
The “origin” remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about “origin”; it is just a default name
Git - “Origin”
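A short sketch can confirm both points: cloning creates a remote called origin, and that name can be changed like any other. The bare repository server.git and the new name upstream are assumptions made for the example:

```shell
# Sketch only: server.git, project, and the name "upstream" are hypothetical.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-origin.XXXXXX")"
cd "$work"
git init -q --bare server.git
git clone -q server.git project 2>/dev/null
cd project

# Cloning created a remote named "origin" that points back at the source.
default_remote="$(git remote)"
git remote -v

# "origin" is only a default name, so it can be renamed.
git remote rename origin upstream
renamed_remote="$(git remote)"
echo "was: $default_remote, now: $renamed_remote"
```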
User Account
[Screenshot: the GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined Sep 13, 2014, showing popular repositories, repositories contributed to, and the contributions calendar]
Repo
[Screenshot: a GitHub repository page under the rebeccabilbro account, described as "A tour of ROC curves", showing 19 commits, 1 branch, 0 releases, 1 contributor, and a file list (data, figures, .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, roc.py) with the latest commit message for each entry]
Command Line
Shifting to the command line
Windows
On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files" type: powershell
• Hit Enter.
Mac OSX
For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue search bar will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Now you have your Terminal open and it's in your Dock so you can get to it.
Mac OSX Terminal
Windows Powershell
Where am I?
Mac OSX Terminal
Windows Powershell
What’s my name?
Mac OSX Terminal
Windows Powershell
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Mac OSX Terminal
Windows Powershell
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
Mac OSX Terminal
Windows Powershell
List files and directories
> dir
$ ls
Mac OSX Terminal
Windows Powershell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
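The Mac OSX Terminal commands above can be strung together into one short session. This sketch runs in a throwaway directory so nothing in your home folder is touched; the -p flag on mkdir is an addition not shown on the slides, used here so that missing parent directories are created in one step:

```shell
# Work in a temporary directory (the location is arbitrary).
work="$(mktemp -d "${TMPDIR:-/tmp}/shell-demo.XXXXXX")"
cd "$work"

mkdir temp                   # make a directory
mkdir -p temp/stuff/things   # -p also creates any missing parent directories
cd temp                      # change into the new directory
pwd                          # print where you are now
touch iamcool.txt            # make an empty file
ls                           # list files and directories
```

In PowerShell the same session would use dir in place of ls and New-Item in place of touch, as shown on the slides.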
Zed Shaw’s book
Let’s use what we’ve learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
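For anyone who wants to see a merge conflict before or after the workshop, the sketch below manufactures one on purpose and then resolves it. It is not the workshop exercise itself; the branch name feature, the file greeting.txt, and the identity are all invented for illustration:

```shell
# Sketch only: branch, file, and identity are hypothetical.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-conflict.XXXXXX")"
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
base="$(git symbolic-ref --short HEAD)"   # master or main, depending on setup

# Change the line on a feature branch...
git checkout -q -b feature
echo "hello from the feature branch" > greeting.txt
git commit -q -am "Reword greeting on feature branch"

# ...and change the same line differently on the base branch.
git checkout -q "$base"
echo "hello from the base branch" > greeting.txt
git commit -q -am "Reword greeting on base branch"

# Both branches edited the same line, so Git cannot merge automatically.
git merge feature || echo "Merge stopped: fix the conflict, then add and commit."

# The file now contains conflict markers (<<<<<<<, =======, >>>>>>>).
cat greeting.txt

# Resolve by writing the content you actually want, then stage and commit.
echo "hello from both branches" > greeting.txt
git add greeting.txt
git commit -q -m "Merge feature branch and resolve greeting conflict"
```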
[Figure: the working directory, index, local repo, and Github repo, connected by the commands revert, diff --cached, fetch, checkout HEAD, and compare]
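Two of the commands named on this slide, diff --cached and checkout HEAD, can be tried in a few lines. This is a sketch under assumed names (report.txt, Example User), showing one way to compare the staging area with the last commit and then throw the change away:

```shell
# Sketch only: file name and identity are hypothetical.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-compare.XXXXXX")"
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "draft one" > report.txt
git add report.txt
git commit -q -m "Add first draft of report"

# Edit the file: plain "git diff" compares the working directory to the index.
echo "draft two" > report.txt
git diff

# Stage the edit: "git diff --cached" compares the index to the last commit.
git add report.txt
git diff --cached

# Discard the edit from both the index and the working directory.
git checkout HEAD -- report.txt
restored="$(cat report.txt)"
echo "restored content: $restored"
```

Note that checkout HEAD -- <file> overwrites local changes to that file without asking, so it is best used only when the edit really is unwanted.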
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus"), Washington DC, with 20 people and 4 teams, listing repositories such as DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy")]
Waffle
[Screenshot: the Waffle.io board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress, and Done columns; cards include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "Async Upload with Celery", and "Large files hang uploader", labeled by priority, type (bug, feature, technical debt), and version]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
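One way to make a commit message more helpful is to give it a short subject line plus a body that explains why the change was made. The sketch below uses two -m flags for that; the file, the wording of the message, and the identity are invented for the example:

```shell
# Sketch only: file, message text, and identity are hypothetical.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-message.XXXXXX")"
cd "$work"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "id,value" > data.csv
git add data.csv

# The first -m is the subject line; the second -m becomes the body paragraph.
git commit -q \
  -m "Add empty data file with header row" \
  -m "The ingest step expects the id and value columns, so the header is committed before any data arrives."

git log -1 --format=%s   # prints the subject line
git log -1 --format=%b   # prints the body
```

A subject that says what changed and a body that says why tends to be the most useful combination when someone reads the log months later.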
Why?
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
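A project skeleton following this convention can be laid out in a few commands. The contents written into each file below are placeholders chosen for illustration (the ignore patterns, the sample CSV, and the single dependency are assumptions, not taken from the slides):

```shell
# Sketch only: all file contents here are hypothetical placeholders.
work="$(mktemp -d "${TMPDIR:-/tmp}/git-layout.XXXXXX")"
cd "$work"
git init -q

printf '%s\n' '# My Project' > README.md            # what the project is
printf '%s\n' '.DS_Store' '*.pyc' > .gitignore      # files Git should not track
mkdir fixtures                                      # small sample data for tests
printf '%s\n' 'id,value' '1,42' > fixtures/sample.csv
printf '%s\n' 'pandas' > requirements.txt           # Python dependencies

# Create a file that matches an ignore pattern, then look at the status.
touch .DS_Store
git status --short
```

The status output lists the new files and the fixtures folder as untracked, while .DS_Store is left out because .gitignore matches it.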
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching/
http://rogerdudler.github.io/git-guide/
http://www.tutorialspoint.com/git/
Resources
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
“Data science is the practice of transforming raw data into insights, products,
and applications to empower data-driven decision making. It combines
proven, time-tested methods from fields including statistics, natural sciences,
computer science, operations research, and design in ways that are
particularly well-suited to the data age. These methods, which range from
data mining and visualization to predictive modeling, can scale from small to
large datasets and can handle structured data as well as unstructured data
like text and images.”
Jeff Chen, Chief Data Scientist, US Department of Commerce
How is data science different from data analytics?
What is hypothesis-driven development?
Hypothesis-Driven Development (ThoughtWorks)
We Believe That <this capability>
Will Result In <this outcome>
We Will Know We Have Succeeded When <we see a measurable signal>
What tools do data scientists use?
What is the data science pipeline?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product?
How are data products different from analytical insights?
Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: example Waffle board, waffle.io/serenity, with cards labeled train job, hospital job, financial, enhancement, wontfix, and startup]
Backlog
• check ship for survivors
• secure identification keycards and uniforms
• lower onto train and secure cargo
• repair ambulance shuttle
• capture an Alliance anti-aircraft gun
• collect package from post master
Ready
• disable explosive set by trap
• recover hidden loot at Canton
• retrieve cargo from train
• join Mal in boarding train
• collect remaining funds to pay for shipmates' release
In Progress
• alert others of distress call
• fix ship's engine problem
• unload and pen cattle
• get cargo from abandoned carrier
Done
• find a brand new compression coil for the steamer
• find a captain for the ship
• find a mechanic for the ship
• buy a solid ship
Version Control
Examples
• Google Drive
• SharePoint
• Dropbox
• TortoiseSVN
• Bitbucket
What is version control?
Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”
Wikipedia knows everything
Tell us about a time when you could have used some version control.
Local Version Control Systems
Version Control: A Visualization
[Diagram: on the local computer, a file is checked out from a version database that holds Version 1, Version 2, and Version 3]
[Diagram: revisions 1 through 6, with branches A, B, and C]
Branches and revisions through time - example scenario
[Figure: branch timeline with dates in late August]
Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
[Screenshot: the Git website, git-scm.com, with the tagline "git --distributed-is-the-new-centralized" and the sections About, Documentation, Downloads, and Community]
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Learn Git in your browser for free with Try Git.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on
the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note
that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go
to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a
command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid
credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it
to say they're things you want. You can download this from the GitHub for Windows website, at
http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the “physical” files on a computer
Git - “Places”
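One way to see the three places for yourself is the sketch below. It is an illustration, not part of the original slides: the scratch folder, the file name greeting.txt, and the placeholder identity are all assumptions made so the commands run on their own.

```shell
repo=$(mktemp -d)                      # scratch folder, safe to delete afterwards
cd "$repo"
git init -q
git config user.name "Example User"    # placeholder identity for this repository only
git config user.email "user@example.com"

echo "hello" > greeting.txt            # working directory: an ordinary file on disk
git add greeting.txt                   # index: snapshot queued for the next commit
in_index=$(git ls-files --stage | awk '{print $4}')

git commit -q -m "Add greeting"        # object database: commit stored under .git
object_type=$(git cat-file -t HEAD)
git_dir=$(git rev-parse --git-dir)

echo "index holds $in_index; HEAD is a $object_type; metadata lives in $git_dir"
```

`git ls-files --stage` lists what the index currently holds, and `git cat-file -t` reports the type of an object in the object database.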
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - “Stages”
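The three stages can be watched with `git status`. The following is a minimal sketch under assumed names (a scratch repository, a file called notes.txt, a placeholder identity); it only shows how one file moves from modified to staged to committed.

```shell
repo=$(mktemp -d)                       # scratch repository, nothing here touches real work
cd "$repo"
git init -q
git config user.name "Example User"     # placeholder identity
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "second draft" >> notes.txt
modified=$(git status --porcelain)      # " M notes.txt": changed, not staged

git add notes.txt
staged=$(git status --porcelain)        # "M  notes.txt": staged for the next commit

git commit -q -m "Revise notes"
committed=$(git status --porcelain)     # empty: nothing left to stage or commit

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```

In the short `--porcelain` format the first column describes the index and the second the working directory, which is why the `M` moves from the right column to the left after `git add`.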
[Diagram: Working Directory, Staging Area, and .git directory (Repository)]
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
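The four commands above can be tried end to end as follows. This is a sketch with assumed names: a local folder path stands in for the repository address, and the folder names project and copy are hypothetical.

```shell
work=$(mktemp -d)
mkdir "$work/project"
cd "$work/project"
git init -q                                 # create a repository for this folder
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "# Example project" > README.md
git add README.md                           # stage the new file
git commit -q -m "Add README"               # record the staged snapshot

# Clone: a local path plays the role of the repository address here.
git clone -q "$work/project" "$work/copy"
commits=$(git -C "$work/copy" rev-list --count HEAD)
echo "commits in the clone: $commits"
```

With a real project the clone address would usually be an HTTPS or SSH URL copied from the repository page.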
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
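To see the difference between fetch and merge without a real server, the sketch below uses a bare repository on the local disk as the "server" and two clones named alice and bob. All of these names are assumptions for illustration.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"                       # plays the role of the remote server
git clone -q "$work/server.git" "$work/alice" 2>/dev/null   # hides the "empty repository" warning

cd "$work/alice"
git config user.name "Alice Example"                        # placeholder identity
git config user.email "alice@example.com"
branch=$(git symbolic-ref --short HEAD)                     # master or main, depending on setup
echo "one" > data.txt
git add data.txt
git commit -q -m "Add data file"
git push -q origin "$branch"                                # send commits to the server

git clone -q "$work/server.git" "$work/bob"                 # Bob joins later

echo "two" >> data.txt                                      # Alice keeps working
git commit -q -am "Extend data file"
git push -q origin "$branch"

cd "$work/bob"
git fetch -q origin "$branch"                               # object database updated...
after_fetch=$(tail -n 1 data.txt)                           # ...working directory unchanged
git merge -q "origin/$branch"                               # now the files change
after_merge=$(tail -n 1 data.txt)
echo "last line after fetch: $after_fetch; after merge: $after_merge"
```

Running `git pull origin "$branch"` in Bob's clone would have performed the fetch and the merge in a single step.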
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
• provides secure access
• provides repository metadata & reports
• provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called “remotes”
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the “physical” files on a computer
Git - “Places”
[Diagram: a server computer and two client computers, Computer A and Computer B, each holding a full version database with Version 1, Version 2, and Version 3]
Github: A Distributed Version Control example
The “origin” remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about “origin”; it is just a default name
Git - “Origin”
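That “origin” is only a name can be checked directly. In this sketch the folders upstream and clone and the new remote name team are hypothetical; a local folder again stands in for a hosted repository.

```shell
work=$(mktemp -d)
git init -q "$work/upstream"                               # any repository to clone from
git clone -q "$work/upstream" "$work/clone" 2>/dev/null    # hides the "empty repository" warning
cd "$work/clone"

default_remote=$(git remote)                  # "origin", created by the clone
git remote rename origin team 2>/dev/null     # hypothetical new name
renamed_remote=$(git remote)
remote_url=$(git config --get remote.team.url)
echo "$default_remote -> $renamed_remote ($remote_url)"
```

After the rename, pushes and pulls would name the remote explicitly, for example `git push team <branch>`.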
User Account
[Screenshot: Github profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, with 17 followers, 11 starred, and 39 following. The page lists popular repositories (xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards), repositories contributed to (DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket, and a sql-tutorial repository), and a calendar of contributions from March through February.]
Repo
[Screenshot: Github repository page for rebeccabilbro / orlo, "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor. Latest commit 382b9ca, 4 days ago: "added method to guess the label column". Files, latest commit message, and age:]
• data: starting to flesh out bulk ingest method for UGI data (16 days ago)
• figures: added precision recall image (19 days ago)
• .DS_Store: basic implementation of roc curve plotter (9 days ago)
• .gitignore: basic implementation of roc curve plotter (9 days ago)
• LICENSE: Initial commit (19 days ago)
• README.md: added plotting template to readme (9 days ago)
• classi.py: added method to guess the label column (4 days ago)
• ingest.py: added randomizer to ingest (9 days ago)
• roc.py: basic implementation of roc curve plotter (9 days ago)
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as
PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files" type: powershell
• Hit Enter.
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue search bar will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Now you have your Terminal open and it's in your Dock so you can get to it.
Mac OSX Terminal
Windows PowerShell
Where am I?
Mac OSX Terminal
Windows PowerShell
What’s my name?
Mac OSX Terminal
Windows PowerShell
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Mac OSX Terminal
Windows PowerShell
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
Mac OSX Terminal
Windows PowerShell
List files and directories
> dir
$ ls
Mac OSX Terminal
Windows PowerShell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw’s book
Let’s use what we’ve learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
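For anyone who wants to rehearse before the workshop, a merge conflict can be produced and resolved locally. The sketch below is not the workshop exercise itself: the file settings.txt, the branch name feature, and the placeholder identity are assumptions.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
base=$(git symbolic-ref --short HEAD)       # master or main, depending on setup

echo "color: blue" > settings.txt
git add settings.txt
git commit -q -m "Add settings"

git checkout -q -b feature                  # hypothetical branch name
echo "color: green" > settings.txt
git commit -q -am "Switch color to green"

git checkout -q "$base"
echo "color: red" > settings.txt
git commit -q -am "Switch color to red"

# Both branches changed the same line, so the merge stops with a conflict.
git merge feature > /dev/null 2>&1 || echo "merge stopped: conflict in settings.txt"
conflicted=$(git diff --name-only --diff-filter=U)
markers=$(grep -c '^<<<<<<<' settings.txt)

# Resolve by writing the agreed content, then stage and commit to finish the merge.
echo "color: green" > settings.txt
git add settings.txt
git commit -q -m "Merge feature, keeping green"
parents=$(git log -1 --pretty=%p | wc -w | tr -d ' ')
echo "conflicted file: $conflicted; merge commit parents: $parents"
```

In practice the resolution step means opening the file, choosing between the text around the `<<<<<<<`, `=======`, and `>>>>>>>` markers, and deleting the markers before staging.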
[Diagram: the working directory, index, local repo, and Github repo, with arrows labeled revert, diff --cached, fetch, and checkout HEAD]
Compare
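The comparisons named in the diagram can be tried on a single file. This is a sketch under assumed names (report.txt, a scratch repository); it keeps three different versions in the last commit, the index, and the working directory so each command has something to show.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "v1" > report.txt
git add report.txt
git commit -q -m "Add report"               # the last commit (HEAD) holds v1

echo "v2" > report.txt
git add report.txt                          # the index holds v2
echo "v3" > report.txt                      # the working directory holds v3

unstaged=$(git diff | grep '^[-+]v')            # working directory vs index
staged=$(git diff --cached | grep '^[-+]v')     # index vs last commit

git checkout HEAD -- report.txt             # restore the file from the last commit
restored=$(cat report.txt)
printf 'unstaged:\n%s\nstaged:\n%s\nrestored: %s\n' "$unstaged" "$staged" "$restored"
```

Note that `git checkout HEAD -- <file>` discards both the staged and the unstaged edits to that file, so it is worth running the two diffs first.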
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github: "A startup within DOC focused on building data products with and for the bureaus." Washington DC, http://www.commerce.gov, data@doc.gov, 20 people, 4 teams. Repositories shown:]
• DataService_WebSite (JavaScript), forked from timwood/DataCorps_WebSite: the website for the Commerce Data Service, a startup within the Department of Commerce. Updated 19 hours ago.
• ITA_Principal_Travel (CSS). Updated a day ago.
• Commerce_Data_Academy_Courses: course materials offered by the Commerce Data Academy. Updated a day ago.
Waffle
[Screenshot: Waffle board for DistrictDataLabs/trinket, with columns Backlog, Ready, In Progress, and Done ("Issues closed in the last week are shown in this column. Drag issues here to close them."). Cards carry version, priority, and type labels such as feature, bug, technical debt, question, and task.]
Backlog
• Better Licensing
• username check
• Dataset Searching
• Dataset Overwrite
• 500 error on upload w/ missing col/row values
• AJAXify the uploader
• 3D tours
• Sampling technique for bigger datasets
• Feature nomination tool for visualization
Ready, In Progress, and Done
• Data file uploading
• Research Auto-analysis Feature
• Implement beta auto analysis
• Dropdown Dataset Edit Form
• Async Upload with Celery
• Dimension Histograms and Ranking
• Large files hang uploader
• Upload Error: line contains NULL byte
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
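One common way to be helpful is a short subject line followed by a body that explains why. The sketch below shows this with two `-m` flags; the file name and the message text are invented for illustration, loosely based on the upload bug card on the board above.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "a,b" > upload.csv
git add upload.csv

# The first -m becomes the subject line; the second -m becomes the body.
git commit -q \
    -m "Reject uploads that contain NULL bytes" \
    -m "The parser crashed on such files. Validate each line first and report the line number to the user."

subject=$(git log -1 --pretty=%s)
body=$(git log -1 --pretty=%b)
git log --oneline                           # teammates mostly see the subject line
```

The subject is what shows up in `git log --oneline` and on the Github commit list, so it is the part that future you will skim first.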
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
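A skeleton with these four pieces can be set up in a few commands. This is a sketch, not a required layout: the ignore patterns and the sample fixture file are assumptions chosen for a small Python project.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir fixtures
echo "# Project title" > README.md                           # what the project is and how to run it
echo "id,value" > fixtures/sample.csv                        # small sample data for examples and tests
echo "# list Python dependencies here" > requirements.txt
printf '%s\n' '.DS_Store' '__pycache__/' '*.pyc' > .gitignore

touch .DS_Store cache.pyc                   # clutter that should stay out of the repository
git add .
tracked=$(git ls-files)
ignored=$(git check-ignore .DS_Store cache.pyc)
printf 'tracked:\n%s\nignored:\n%s\n' "$tracked" "$ignored"
```

`git check-ignore` prints the paths that match a pattern in .gitignore, which is a quick way to confirm that a pattern does what you expect.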
Where to go from here
Additional Tutorials: http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
How is data science different fromdata analytics
What is hypothesis-driven development
COMMERCE DATA SERVICE
rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot
We Believe That ~ fht~ C--Jf Jbt jf1gt
Will Result In ~ fh~ OfJfCAJfYle-gt
We Will Know We Have Succeeded When
lt we- ie-e- a rne-atwabe- tigtialgt
What tools do data scientists use
What is the data science pipeline
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product
How are data products different fromanalytical insights
Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data
Benjamin Bengfort
What is software engineering
What does collaboration look like in a data group
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
Backlog
waffleioserenity
24
uniforms
31
I trainjob I
22
Ihospi taljob IQ FiiNF
32
W f$flampF
check ship for survivors
II
secure identification keycards and
0 shy
lower onto train and secure cargo
repair ambulance shuttle
capture an Alliance anti-aircraft gun
1--1-lo
collect package from post master
Ready
20
disable explosive set by trap
II 18
recover hidden loot at Canton
financial
4
retrieve cargo from train
I ttain job I enhancement
30
join Mal in boarding train
[ trainjob I
21
collect remaining funds to pay for
shipmates release
financial I - 1- lo
In Progress
alert others of distress call
fix ships engine problem
mmm bull 1-L II 13
unload and pen cattle
M MMUi L II
get cargo from abandoned carrier
Done
29
find a brand new compression coil for the
steamer
wontfix
find a captain for t h
Istartup I e ship
II 27
find a mechanic for the ship
Istartup I II 16
buy a solid ship
Istartup I () II
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
COMMERCE DATA SERVICE
0 This organization Search Pull requests Issues Gist
Commerce Data Service A startup within DOC focused on building data products with and for the bureaus
Washington DC httpJlwwwcommercegov datadocgov
IQ Repositories People 20 l Teams 4
Fiiters ~ Q Find a repository +New repository People 20 gt
DataService_ WebSite JavaScript 1 V4
IV forked from timwoodDataCorps_ WebSite
The website for the Commerce Data Service - A startup within the Department of
Commerce
Updated 19 hours ago
ITA_Principal_ Travel css o Vamp Updated a day ago
Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy
Updated a day ago
Waffle
COMMERCE DATA SERVICE
DistricDatalabstrinket
Backlog
S6
Better Licensing
- typefeature
SS
username check
priority medium type bug
so Dataset Searching
priority medium type feature
Dataset Overwrite
- type technicaldebt
4S
500 error on upload w missing col row values
AJAXify the uptoader
priority medium type featurC
middotmiddotOM
0
0
0 ~
0 ~ () ~
() ~ bull38
3Dtours
0 ~
37
Sampling technique for bigger datasets
0 ~
Feature nomination tool for visualization
Ready In Progress Done bull S4 14
Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10
column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form
lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature
~ Async Upload with Celery
IVersion 03 I priority medium type feature middotbull 13
Dimension Histograms and Ranking 10
IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader
type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte
IVersiono3 IBlll type bug () ~ 36
Pair programmingMake your own waffle
CommunicationCommit Messages
git commit -m ldquotry to be as helpful as possiblerdquo
(To your team and to future you)
Why
Why do data scientists need version control
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into thedata science pipeline
Folder structure conventions on Github
READMEmd
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
What is hypothesis-driven development
COMMERCE DATA SERVICE
rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot
We Believe That ~ fht~ C--Jf Jbt jf1gt
Will Result In ~ fh~ OfJfCAJfYle-gt
We Will Know We Have Succeeded When
lt we- ie-e- a rne-atwabe- tigtialgt
What tools do data scientists use
What is the data science pipeline
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product
How are data products different fromanalytical insights
Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data
Benjamin Bengfort
What is software engineering
What does collaboration look like in a data group
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
[Screenshot: Waffle.io board for the "serenity" project (waffle.io/serenity) with four columns of issue cards. Backlog: check ship for survivors, secure identification keycards and uniforms, lower onto train and secure cargo, repair ambulance shuttle, capture an Alliance anti-aircraft gun, collect package from post master. Ready: disable explosive set by trap, recover hidden loot at Canton, retrieve cargo from train, join Mal in boarding train, collect remaining funds to pay for shipmates release. In Progress: alert others of distress call, fix ships engine problem, unload and pen cattle, get cargo from abandoned carrier. Done: find a brand new compression coil for the steamer, find a captain for the ship, find a mechanic for the ship, buy a solid ship. Cards carry labels such as train job, hospital job, financial, enhancement, startup, and wontfix.]
Version Control
Examples
Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket
What is version control? Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Diagram: a local computer checks out a file from a version database that holds Version 1, Version 2, and Version 3]
[Diagram: revisions numbered 1 through 6 on a main line, with branches labeled A, B, and C]
Branches and revisions through time - example scenario
[Figure: branch graph plotted against a date axis]
Branches and revisions through time - actual workflow
Distributed vs. Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Learn Git in your browser for free with Try Git.
[Screenshot: git-scm.com home page with links to About, Documentation, Downloads, and Community, plus a site search box]
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived and created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
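As a rough illustration of the "non-linear development" point, the sketch below creates a throwaway repository and two local branches. The folder name demo, the file notes.txt, the branch names, and the demo identity are all made up for this example; nothing here comes from the original slides.

```shell
set -eu
# Work in a temporary directory so nothing on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo
cd demo
# Identity used only inside this demo repository (hypothetical values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Add notes file"

# Creating a branch only records a new pointer to the current commit,
# which is why branches are cheap to make.
git branch experiment-a
git branch experiment-b

# List local branches; the current one is marked with an asterisk.
git --no-pager branch
```

Both new branches point at the same commit as the branch you started on until someone commits to them.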
Object Database: where Git stores the data and metadata for each commit
Index / Staging Area: file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
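One way to watch a file move between these places is to run git status after each step. This is a minimal sketch, assuming a throwaway repository; the file name report.txt and the demo identity are hypothetical.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "hello" > report.txt
git add report.txt
git commit -q -m "Add report"

# Modified: the working directory differs from the last commit.
echo "more text" >> report.txt
git status --short    # M in the second column: changed, not staged

# Staged: the current content is recorded in the index.
git add report.txt
git status --short    # M in the first column: staged for the next commit

# Committed: the snapshot is stored in the object database inside .git.
git commit -q -m "Extend report"
git status --short    # prints nothing: the working directory is clean
```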
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m "<message>": takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
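The four commands can be tried end to end without a network connection. In this sketch a local folder path stands in for the repository address that git clone would normally receive; the names project, project-copy, and the demo identity are assumptions for illustration.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"

# git init: start managing a new folder.
mkdir project
cd project
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

# git add: place a file snapshot in the index (staging area).
echo "# My project" > README.md
git add README.md

# git commit: store the staged snapshot in the object database.
git commit -q -m "Add README"
git --no-pager log --oneline

# git clone: copy an existing repository. A local path is used here
# in place of a remote address.
cd "$workdir"
git clone -q project project-copy
ls project-copy
```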
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current branch (the branch checked out in your working directory)
git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
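To see how fetch, merge, pull, and push relate, the following sketch simulates a server and two teammates on one machine. It assumes a bare repository in a temporary folder can stand in for the remote server; the names server.git, alice, bob, and data.txt are invented for the example.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"

# A bare repository plays the role of the remote server (hypothetical setup).
git init -q --bare server.git

# Alice clones the empty server repository, commits, and pushes.
git clone -q server.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
branch="$(git symbolic-ref --short HEAD)"   # main or master, depending on your Git
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data file"
git push -q origin "$branch"

# Bob clones after the first push, so he starts with v1.
cd "$workdir"
git clone -q server.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"

# Alice publishes a second commit.
cd "$workdir/alice"
echo "v2" > data.txt
git commit -q -am "Update data to v2"
git push -q origin "$branch"

# git fetch: Bob's object database is updated, his files are not.
cd "$workdir/bob"
git fetch -q origin
cat data.txt      # still v1
# git merge: apply the fetched commits to Bob's current branch.
git merge -q "origin/$branch"
cat data.txt      # now v2

# Alice publishes a third commit; git pull does the fetch and merge in one step.
cd "$workdir/alice"
echo "v3" > data.txt
git commit -q -am "Update data to v3"
git push -q origin "$branch"
cd "$workdir/bob"
git pull -q origin "$branch"
cat data.txt      # now v3
```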
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database: where Git stores the data and metadata for each commit
Index / Staging Area: file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer
Git - "Places"
[Diagram: a server computer and two client computers, Computer A and Computer B, each holding a full version database with Version 1, Version 2, and Version 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
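A quick way to check this is to clone something and list its remotes. The sketch below assumes a local bare repository as the thing being cloned; server.git, work, and the extra remote name backup are hypothetical.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
git init -q --bare server.git
git clone -q server.git work
cd work

# Cloning created a remote named "origin" that points at the source.
git remote -v
git remote get-url origin

# The name is only a convention: another remote can use any name you like.
git remote add backup "$workdir/server.git"
git remote
```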
User Account
[Screenshot: GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, showing popular repositories, repositories contributed to, and a contributions calendar covering March through February]
Repo
[Screenshot: GitHub repository page owned by rebeccabilbro, described as "A tour of ROC curves", with 19 commits, 1 branch, 0 releases, and 1 contributor. The file list includes data, figures, .gitignore, LICENSE, and README.md, each shown with its most recent commit message and age.]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I?
Mac OSX Terminal
Windows Powershell
What's my name?
Mac OSX Terminal
Windows Powershell
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Mac OSX Terminal
Windows Powershell
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
Mac OSX Terminal
Windows Powershell
List files and directories
> dir
$ ls
Mac OSX Terminal
Windows Powershell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
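For anyone reading without access to the workshop materials, here is a small self-contained sketch of how a merge conflict arises and one way to resolve it. It is not the workshop exercise itself; the file settings.txt, the branch name teammate, and the demo identity are assumptions made for illustration.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "color: blue" > settings.txt
git add settings.txt
git commit -q -m "Add settings"
base="$(git symbolic-ref --short HEAD)"   # main or master, depending on your Git

# A teammate's branch changes the line one way...
git checkout -q -b teammate
echo "color: green" > settings.txt
git commit -q -am "Use green"

# ...while the original branch changes the same line another way.
git checkout -q "$base"
echo "color: red" > settings.txt
git commit -q -am "Use red"

# The merge stops with a conflict; "|| true" keeps this script running.
git merge teammate || true
cat settings.txt          # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by writing the content you want, then add and commit.
echo "color: green" > settings.txt
git add settings.txt
git commit -q -m "Merge teammate branch, keep green"
git --no-pager log --oneline --graph
```

In real work you would open the conflicted file in an editor and decide line by line rather than overwrite it wholesale.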
[Diagram: how changes move between the working directory, the index, the local repo, and the Github repo, labeled with operations including revert, diff --cached, fetch, checkout HEAD, and compare]
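Some of the operations named in the diagram can be tried in a scratch repository. This sketch covers only comparing and discarding changes, and it assumes the diagram's labels refer to the standard git diff, git diff --cached, and git checkout HEAD commands; the file notes.txt and the demo identity are hypothetical.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Compare the working directory with the index.
echo "two" >> notes.txt
git --no-pager diff

# Stage the change, then compare the index with the last commit.
git add notes.txt
git --no-pager diff --cached

# Discard the change: restore the file in both the index and the
# working directory from the last commit (HEAD).
git checkout HEAD -- notes.txt
git status --short        # prints nothing
```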
Teamwork (makes the dream work)
Organization
[Screenshot: GitHub organization page for Commerce Data Service, "A startup within DOC focused on building data products with and for the bureaus", Washington DC, with 20 people and 4 teams. Listed repositories include DataService_WebSite (the website for the Commerce Data Service), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy).]
Waffle
[Screenshot: Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Cards carry version, priority, and type labels (feature, bug, technical debt, question, task) and include Better Licensing, username check, Dataset Searching, Dataset Overwrite, AJAXify the uploader, Sampling technique for bigger datasets, Feature nomination tool for visualization, Async Upload with Celery, Dimension Histograms and Ranking, and Large files hang uploader. The Done column notes that issues closed in the last week are shown there and that dragging an issue into it closes the issue.]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
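What counts as helpful is a judgment call, but one common pattern is a short summary line followed by a sentence on why the change was made. The sketch below shows that pattern using two -m flags, which Git stores as separate paragraphs; the file stations.csv and the message text are invented for the example.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo
cd demo
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo "id,name" > stations.csv
git add stations.csv

# Less helpful: says nothing about what changed or why.
#   git commit -m "stuff"
# More helpful: a short summary line, then a body explaining the reason.
git commit -q -m "Add header row to stations.csv" \
              -m "Downstream loaders expect column names on the first line."

# Show the full message of the latest commit.
git --no-pager log -1 --format=%B
```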
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
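A new project with this layout can be scaffolded in a few commands. The slide lists only the four names, so the file contents and the comments on what each item is for are assumptions based on common Python project practice, as are the ignore patterns and the sample fixture file.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"
mkdir project
cd project
git init -q

# README.md: what the project is and how to use it.
echo "# Project title" > README.md
# .gitignore: files Git should never track (example patterns only).
printf '%s\n' ".DS_Store" "__pycache__/" "*.pyc" > .gitignore
# fixtures/: small sample data files (assumed purpose).
mkdir fixtures
echo "id,value" > fixtures/sample.csv
# requirements.txt: Python packages the project depends on (example entry).
echo "pandas" > requirements.txt

# Files matching .gitignore do not show up as untracked.
touch .DS_Store
git status --short
```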
Where to go from here
Additional Tutorials: http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
What tools do data scientists use
What is the data science pipeline
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product
How are data products different fromanalytical insights
Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data
Benjamin Bengfort
What is software engineering
What does collaboration look like in a data group
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
Backlog
waffleioserenity
24
uniforms
31
I trainjob I
22
Ihospi taljob IQ FiiNF
32
W f$flampF
check ship for survivors
II
secure identification keycards and
0 shy
lower onto train and secure cargo
repair ambulance shuttle
capture an Alliance anti-aircraft gun
1--1-lo
collect package from post master
Ready
20
disable explosive set by trap
II 18
recover hidden loot at Canton
financial
4
retrieve cargo from train
I ttain job I enhancement
30
join Mal in boarding train
[ trainjob I
21
collect remaining funds to pay for
shipmates release
financial I - 1- lo
In Progress
alert others of distress call
fix ships engine problem
mmm bull 1-L II 13
unload and pen cattle
M MMUi L II
get cargo from abandoned carrier
Done
29
find a brand new compression coil for the
steamer
wontfix
find a captain for t h
Istartup I e ship
II 27
find a mechanic for the ship
Istartup I II 16
buy a solid ship
Istartup I () II
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Git - History Lesson
Originally conceived and created by Linus Torvalds (after a fight with BitKeeper)
Distributed version control
Open source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - Advantages
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - "Places"
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer
Git - "Stages"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
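To see the three stages for yourself, here is a small sketch you can paste into Terminal (or Git Bash on Windows). The folder, the file name notes.txt, and the demo identity are made up for illustration. It uses git status --porcelain, which prints one short code per changed file.

```shell
# Throwaway identity so commits work even if Git is not configured yet (made-up values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q .
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Committed: nothing has changed since the last commit, so the status is empty.
committed=$(git status --porcelain)

# Modified: the file changed in the working directory only.
echo "second draft" >> notes.txt
modified=$(git status --porcelain)

# Staged: the change now sits in the index, waiting for the next commit.
git add notes.txt
staged=$(git status --porcelain)

printf 'committed: [%s]\nmodified:  [%s]\nstaged:    [%s]\n' "$committed" "$modified" "$staged"
```

The status code has two columns: the left one describes the staging area and the right one describes the working directory. That is why the M moves from the right column (modified) to the left column (staged) after git add.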
Git - Areas/places
Working Directory
Staging Area
.git directory (Repository)
Git Commands
Git - Basic Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes the metadata/changes from staging and adds them to the object database
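The four commands above can be tried together in a scratch folder. This is only a sketch: the folder names project and project-copy and the demo identity are made up, and a local path stands in for the repository address that you would normally copy from Github.

```shell
# Throwaway identity so commits work even if Git is not configured yet (made-up values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$work"

# git init: start managing a new folder with git.
mkdir project
cd project
git init -q .

# git add + git commit: stage one file, then record it in the object database.
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

# git clone: download an existing repository, history included.
# Here the "repository address" is a local path; a Github URL is used the same way.
cd "$work"
git clone -q project project-copy
git -C project-copy log --oneline
```

The last command lists the history of the copy, which should show the single "Add README" commit made in the original folder.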
Git - Basic Commands
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
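The difference between fetch and merge is easier to see with two collaborators. In this sketch everything lives in one scratch folder: alice and bob are two made-up working copies, and a bare repository named server.git plays the role of Github. None of these names come from the course materials.

```shell
# Throwaway identity so commits work even if Git is not configured yet (made-up values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# Alice starts the project and makes the first commit.
git init -q alice
cd alice
echo "line one" > data.txt
git add data.txt
git commit -q -m "Add data file"
branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on your Git setup

# A bare copy stands in for the remote server; Alice registers it as "origin".
cd "$work"
git clone -q --bare alice server.git
git -C alice remote add origin "$work/server.git"

# Bob clones from the server, commits a change, and pushes it back.
git clone -q server.git bob
cd bob
echo "line two" >> data.txt
git commit -q -am "Add second line"
git push -q origin "$branch"

# Alice fetches: her object database now has Bob's commit, but her file is untouched.
cd "$work/alice"
git fetch -q origin "$branch"
before=$(grep -c "" data.txt)     # number of lines in Alice's file after the fetch

# Alice merges what she fetched: now her working directory catches up.
git merge -q FETCH_HEAD
after=$(grep -c "" data.txt)

echo "lines after fetch: $before, lines after merge: $after"
```

Running git pull origin "$branch" in Alice's folder would have done the fetch and the merge in one step.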
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Git - "Places"
Non-local git repositories are called "remotes"
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer
Github: A Distributed Version Control example
[Figure: a server computer holds a version database (Versions 1, 2, and 3); Computer A and Computer B each hold their own full copy of the same version database.]
Git - "Origin"
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
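A quick way to convince yourself that "origin" is only a name: clone something, list the remotes, then rename the remote. The folder names upstream and mycopy, the new remote name team-server, and the demo identity are all made up for this sketch.

```shell
# Throwaway identity so commits work even if Git is not configured yet (made-up values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
cd "$work"

# A small repository to clone from.
git init -q upstream
cd upstream
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello"

# Cloning creates a remote called "origin" that points back at the source.
cd "$work"
git clone -q upstream mycopy
cd mycopy
first_name=$(git remote)
git remote -v                     # shows the fetch and push addresses for origin

# Renaming the remote changes the label and nothing else.
git remote rename origin team-server
second_name=$(git remote)

echo "remote was called: $first_name, now called: $second_name"
```

After the rename you would push and pull with the new name, for example git pull team-server followed by the branch name.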
User Account
[Screenshot: Github profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, showing popular repositories, repositories contributed to, and the contributions calendar.]
Repo
[Screenshot: Github repository page "A tour of ROC curves" by rebeccabilbro: 19 commits, 1 branch, 0 releases, 1 contributor. The file list shows each file or folder (data, figures, .gitignore, LICENSE, README.md, and the Python source files) next to its most recent commit message and age.]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I?
Mac OSX Terminal
Windows Powershell
What's my name?
Mac OSX Terminal
Windows Powershell
Make a directory
Windows Powershell
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
>
Change between directories
Windows Powershell
> cd temp
> pwd
>
Mac OSX Terminal
$ cd temp
$ pwd
$
List files and directories
Windows Powershell
> dir
>
Mac OSX Terminal
$ ls
$
Make an empty file
Windows Powershell
> cd temp
> New-Item iamcool.txt -type file
> dir
>
Mac OSX Terminal
$ cd temp
$ touch iamcool.txt
$ ls
$
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes) http://bit.ly/xbus501-workshop-git
[Figure: how changes move between the working directory, the index, the local repo, and the Github repo; labels on the diagram include revert, diff --cached, fetch, checkout HEAD, and compare.]
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus", Washington DC), with 20 people and 4 teams. Repositories listed include DataService_WebSite (the website for the Commerce Data Service), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy).]
Waffle
[Screenshot: Waffle.io board for the DistrictDataLabs trinket project with Backlog, Ready, In Progress, and Done columns. Cards carry priority and type labels (feature, bug, technical debt) and include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "AJAXify the uploader", "Sampling technique for bigger datasets", "Async Upload with Celery", and "Large files hang uploader". Issues closed in the last week are shown in the Done column; drag issues there to close them.]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
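One way to make a commit message more helpful is to give it a short summary line plus a sentence on why the change was made. Passing -m twice does this: the first -m becomes the summary and the second becomes a separate paragraph. The file name sample.csv, the wording of the message, and the demo identity are made up for this sketch.

```shell
# Throwaway identity so commits work even if Git is not configured yet (made-up values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"
git init -q .
echo "id,value" > sample.csv
git add sample.csv

# First -m: what changed (short). Second -m: why it changed (a full sentence).
git commit -q \
  -m "Add header row to sample.csv" \
  -m "Downstream scripts read columns by name, so the file needs a header before any data is loaded."

subject=$(git log -1 --format=%s)   # the summary line only
body=$(git log -1 --format=%b)      # the explanatory paragraph only
printf 'summary: %s\nbody:    %s\n' "$subject" "$body"
```

The summary line is what teammates see in git log --oneline and in the Github file list, so it is worth keeping it short and specific.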
Why?
Why do data scientists need version control?
Where does version control fit into the data science pipeline?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
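A sketch of setting up that folder structure in a new repository. The contents of each file are made-up examples, not the course's own files: here fixtures is assumed to hold small sample data, requirements.txt lists Python packages one per line, and .gitignore lists files git should leave out. The git check-ignore command reports whether a path matches an ignore rule.

```shell
demo=$(mktemp -d)                 # scratch folder, safe to delete afterwards
cd "$demo"
git init -q .

# The conventional top-level files and folders (contents are illustrative only).
mkdir fixtures
echo "# Project title" > README.md
echo "pandas" > requirements.txt
printf '%s\n' '.DS_Store' '__pycache__/' '*.pyc' > .gitignore

# Two files to test the ignore rules against.
touch .DS_Store fixtures/sample.csv

# check-ignore exits with success when the path matches a rule in .gitignore.
if git check-ignore -q .DS_Store; then ds_store="ignored"; else ds_store="not ignored"; fi
if git check-ignore -q fixtures/sample.csv; then sample="ignored"; else sample="not ignored"; fi

echo ".DS_Store: $ds_store"
echo "fixtures/sample.csv: $sample"
```

Adding .DS_Store to .gitignore before the first commit keeps Mac Finder's housekeeping file out of the repository history.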
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
What is the data science pipeline?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
What is a data product?
How are data products different from analytical insights?
Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: Waffle.io board "serenity" with Backlog, Ready, In Progress, and Done columns. Cards are tagged with labels such as train job, hospital job, financial, startup, enhancement, and wontfix; examples include "check ship for survivors", "repair ambulance shuttle", "retrieve cargo from train", "join Mal in boarding train", "fix ship's engine problem", "unload and pen cattle", "find a mechanic for the ship", and "buy a solid ship".]
Version Control
Examples
Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket
What is version control? Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents, and in particular computer programs.
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Figure: on a local computer, a file is checked out from a version database that holds Version 1, Version 2, and Version 3.]
[Figure: Branches and revisions through time - example scenario]
[Figure: Branches and revisions through time - actual workflow]
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
[Screenshot: Github user profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13 2014, showing popular repositories, repositories contributed to, and the contributions calendar]
Repo
[Screenshot: Github repository page "A tour of ROC curves" (19 commits, 1 branch, 0 releases, 1 contributor), listing the data and figures folders, .gitignore, LICENSE, README.md, and Python source files, each with its latest commit message]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I
[Screenshots: Mac OSX Terminal, Windows Powershell]
What's my name
[Screenshots: Mac OSX Terminal, Windows Powershell]
Make a directory
Windows Powershell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Change between directories
Windows Powershell:
> cd temp
> pwd
Mac OSX Terminal:
$ cd temp
$ pwd
List files and directories
Windows Powershell:
> dir
Mac OSX Terminal:
$ ls
Make an empty file
Windows Powershell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
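The Mac OSX Terminal steps above can be strung together as one short session. This is a sketch under a few assumptions: it runs in a scratch folder created with mktemp, and it uses mkdir -p so the nested folders are created in one step. The screenshots for "Where am I" and "What's my name" did not survive, so only pwd is shown here; whoami (user name) or hostname (computer name) would be the usual candidates for the second one.

```shell
# Start in a scratch folder (an assumption, so the sketch leaves no clutter behind).
start_dir="$(mktemp -d)"
cd "$start_dir"

# Where am I? Print the current working directory.
pwd

# Make a directory; -p also creates any missing parent folders.
mkdir temp
mkdir -p temp/stuff/things/frank/joe/alex/john

# Change into the new directory and confirm the move.
cd temp
pwd

# Make an empty file, then list files and directories.
touch iamcool.txt
ls
```

In Windows Powershell the equivalents shown on the slides are dir instead of ls and New-Item iamcool.txt -type file instead of touch.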
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
[Diagram: how changes move between the working directory, the index, the local repo, and the Github repo. Labels on the diagram: Revert, diff --cached, fetch, checkout HEAD, Compare]
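Two of the commands named on the diagram can be tried safely in a scratch repository. The sketch below assumes a throwaway folder, placeholder identity values, and a made-up file name (report.txt); it compares the index with the last commit and then restores the file from HEAD.

```shell
# Scratch repository with one commit (all names here are placeholders).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Data Academy Student"
git config user.email "student@example.com"
echo "original" > report.txt
git add report.txt
git commit --quiet -m "Add report"

# Stage a change, then compare the index (staging area) with the last commit.
echo "staged edit" >> report.txt
git add report.txt
git diff --cached --stat
staged_files="$(git diff --cached --name-only)"

# Discard the change in both the index and the working directory by
# checking the file out from HEAD (the last commit on the current branch).
git checkout HEAD -- report.txt
restored="$(cat report.txt)"
echo "report.txt now contains: $restored"
```

Note that checkout HEAD throws the edit away for good, which is why it is worth practicing in a scratch folder first.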
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus", Washington DC; 20 people, 4 teams), listing repositories such as DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy)]
Waffle
[Screenshot: Waffle board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Cards carry priority and type labels (bug, feature, technical debt), for example "Better Licensing", "Dataset Searching", "Async Upload with Celery", and "Large files hang uploader". Issues closed in the last week are shown in the Done column; drag issues there to close them]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
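As one illustration of a helpful message, the sketch below writes a short summary line plus a sentence explaining why the change was made. The file name, the message text, and the identity values are all made up for the example; passing -m twice makes git store the second text as a separate paragraph.

```shell
# Scratch repository (placeholder names throughout).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Data Academy Student"
git config user.email "student@example.com"

echo "station,temp_c" > readings.csv
git add readings.csv

# First -m: a short summary of what changed.
# Second -m: why it changed, for your team and for future you.
git commit --quiet \
  -m "Add header row to readings.csv" \
  -m "Downstream parsing expects named columns, so the file now starts with station,temp_c."

# The summary line is what shows up in one-line views of the history.
subject="$(git log -1 --format=%s)"
echo "$subject"
```

Compare this with a message like "stuff" or "fixes": the history still records the change, but it tells nobody what happened or why.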
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
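A skeleton following these conventions might be created as below. This is only a sketch: the project name (my-project) and the .gitignore entries are assumptions chosen for a small Python project, not something the slides prescribe.

```shell
# Hypothetical project folder inside a scratch directory.
project_dir="$(mktemp -d)/my-project"
mkdir -p "$project_dir/fixtures"      # fixtures: sample or test data
cd "$project_dir"

# README.md: the first thing visitors see on the Github repository page.
echo "# my-project" > README.md

# .gitignore: files git should never track (example entries only).
printf '%s\n' '.DS_Store' '__pycache__/' '*.pyc' > .gitignore

# requirements.txt: the Python packages the project depends on (empty for now).
touch requirements.txt

# -A also lists hidden files such as .gitignore.
ls -A
```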
Where to go from here
Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources:
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
What is a data product?
How are data products different from analytical insights?
"Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data."
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: example Waffle board (waffle.io, "serenity") with Backlog, Ready, In Progress, and Done columns. Cards include "check ship for survivors", "repair ambulance shuttle", "retrieve cargo from train", "fix ship's engine problem", "find a mechanic for the ship", and "buy a solid ship", with labels such as train job, hospital job, financial, startup, enhancement, and wontfix]
Version Control
Examples
Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket
What is version control?
Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Diagram: a file is checked out on the local computer from a version database holding Versions 1, 2, and 3]
[Diagram: revisions 1 to 6 with branches A, B, and C] Branches and revisions through time - example scenario
[Screenshot: commit graph of a real repository] Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
[Screenshot: the Git website, with the tagline "git --distributed-is-the-new-centralized"]
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
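Once the installer has finished, a quick check from Terminal confirms that git is available. This is a small sketch, not part of the original slides; the git --version command should behave the same way in Powershell, while command -v is specific to Mac and Linux shells.

```shell
# Show where the git program was installed (fails if git is not on the PATH).
git_path="$(command -v git)"
echo "git found at: $git_path"

# Print the installed version.
version_line="$(git --version)"
echo "$version_line"
```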
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
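The three stages can be watched as a file moves through them. The sketch below assumes a scratch repository, placeholder identity values, and a made-up file name (notes.txt); it uses git status --porcelain, which prints a two-letter code per file: the first column describes the index, the second the working directory.

```shell
# Scratch repository with one committed file (placeholder names).
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
git config user.name "Data Academy Student"
git config user.email "student@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Modified: changed in the working directory, not yet staged.
# The M appears in the second column (working directory).
echo "second draft" >> notes.txt
modified_state="$(git status --porcelain)"

# Staged: the current state of the file is marked for the next commit.
# The M moves to the first column (index).
git add notes.txt
staged_state="$(git status --porcelain)"

# Committed: safely stored in the local object database, nothing left to report.
git commit --quiet -m "Revise notes"
committed_state="$(git status --porcelain)"

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' \
  "$modified_state" "$staged_state" "$committed_state"
```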
[Diagram: Working Directory, Staging Area, and .git directory (Repository)]
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
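A first pass through init, add, and commit could look like the sketch below. The scratch folder, the identity values, and the file contents are assumptions made for the example; git config without --global applies to this one repository only.

```shell
# Work in a throwaway folder so nothing else on your machine is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# git init: create a new repository that manages the current folder.
git init --quiet

# Tell git who is committing, for this repository only (placeholder values).
git config user.name "Data Academy Student"
git config user.email "student@example.com"

# git add: mark a new file to be included in the next commit.
echo "# My first repo" > README.md
git add README.md

# git commit: record the staged snapshot in the object database.
git commit --quiet -m "Add README with project title"

# One line per commit in the history so far.
git log --oneline
commit_count="$(git rev-list --count HEAD)"
echo "commits so far: $commit_count"
```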
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
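The difference between fetch and merge is easiest to see with two clones of the same remote. This sketch makes several assumptions: a local bare repository (server.git) stands in for Github, and the collaborator folders (alice, bob), identity values, and file name are invented. It also reads the branch name from git rather than hard-coding it, since the default name differs between installations.

```shell
sandbox="$(mktemp -d)"
cd "$sandbox"

# The "server": a bare repository (on Github this would be a URL).
git init --quiet --bare server.git

# Alice clones (git warns the repository is empty), commits, and pushes.
git clone --quiet server.git alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "line one" > data.txt
git add data.txt
git commit --quiet -m "Add data file"
branch="$(git rev-parse --abbrev-ref HEAD)"
git push --quiet origin "$branch"
cd ..

# Bob clones and receives Alice's first commit.
git clone --quiet server.git bob

# Alice pushes a second commit.
cd alice
echo "line two" >> data.txt
git commit --quiet -am "Add second line"
git push --quiet origin "$branch"
cd ../bob

# git fetch: Bob's object database is updated, his files are not.
git fetch --quiet origin "$branch"
lines_after_fetch="$(wc -l < data.txt | tr -d ' ')"

# git merge: the fetched commits are applied to Bob's working directory.
git merge --quiet "origin/$branch"
lines_after_merge="$(wc -l < data.txt | tr -d ' ')"

echo "lines after fetch: $lines_after_fetch, after merge: $lines_after_merge"
```

Running git pull origin "$branch" in Bob's folder would have done the fetch and the merge in one step.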
How are data products different fromanalytical insights
Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data
Benjamin Bengfort
What is software engineering
What does collaboration look like in a data group
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
Backlog
waffleioserenity
24
uniforms
31
I trainjob I
22
Ihospi taljob IQ FiiNF
32
W f$flampF
check ship for survivors
II
secure identification keycards and
0 shy
lower onto train and secure cargo
repair ambulance shuttle
capture an Alliance anti-aircraft gun
1--1-lo
collect package from post master
Ready
20
disable explosive set by trap
II 18
recover hidden loot at Canton
financial
4
retrieve cargo from train
I ttain job I enhancement
30
join Mal in boarding train
[ trainjob I
21
collect remaining funds to pay for
shipmates release
financial I - 1- lo
In Progress
alert others of distress call
fix ships engine problem
mmm bull 1-L II 13
unload and pen cattle
M MMUi L II
get cargo from abandoned carrier
Done
29
find a brand new compression coil for the
steamer
wontfix
find a captain for t h
Istartup I e ship
II 27
find a mechanic for the ship
Istartup I II 16
buy a solid ship
Istartup I () II
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus", Washington DC), with 20 people and 4 teams; the listed repositories include DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy")]
Waffle
[Screenshot: Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns; cards carry priority and type labels (feature, bug, technical debt) and include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "AJAXify the uploader", "Async Upload with Celery", and "Large files hang uploader"]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
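One common way to make a commit message helpful is a short summary line followed by a sentence explaining why the change was made. The sketch below is an illustration only, assuming Git is installed; the repository, the file `data.csv`, and the message text are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.com"

echo "id,value" > data.csv
git add data.csv

# Each -m adds a paragraph: first the summary line, then the explanation.
git commit -q \
  -m "Add header row to data.csv" \
  -m "Downstream loaders expect named columns, so record them in the file itself."

# Read the message back the way a teammate (or future you) would.
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "$subject"
echo "$body"
```

The summary line is what shows up in one-line logs and on GitHub, so it should stand on its own; the body can carry the reasoning.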
Why?
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
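The four names on this slide can be laid out in a new project folder as sketched below. This is an illustration under assumptions, not a required layout: the project name `my-project`, the ignore patterns, and the sample fixture file are all made up.

```shell
project=$(mktemp -d)/my-project    # hypothetical project name
mkdir -p "$project/fixtures"
cd "$project"

# README.md: what the project is and how to run it.
printf '# my-project\n\nShort description of the project.\n' > README.md

# .gitignore: files Git should never track (example patterns only).
printf '%s\n' '.DS_Store' '__pycache__/' '*.pyc' > .gitignore

# requirements.txt: Python dependencies, one per line (left empty here).
: > requirements.txt

# fixtures/: small sample data used for tests and demos.
printf 'id,value\n1,42\n' > fixtures/sample.csv

ls -A
```

Keeping these files at the top level means a newcomer browsing the repository on GitHub sees the description, the dependencies, and the sample data without digging.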
Where to go from here
Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources:
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
"Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data."
Benjamin Bengfort
What is software engineering?
What does collaboration look like in a data group?
[Screenshot: Waffle.io board "serenity" with Backlog, Ready, In Progress, and Done columns; cards carry labels such as "train job", "hospital job", "financial", and "startup", and include "check ship for survivors", "repair ambulance shuttle", "retrieve cargo from train", "fix ship's engine problem", "unload and pen cattle", "find a mechanic for the ship", and "buy a solid ship"]
Version Control
Examples
Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket
What is version control?
Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Diagram: on a local computer, a file is checked out from a version database holding Version 1, Version 2, and Version 3]
[Figure: Branches and revisions through time - example scenario]
[Figure: Branches and revisions through time - actual workflow]
Distributed vs Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
[Screenshot: the Git website home page, with links to About, Documentation, Downloads, and Community]
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
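The "non-linear development" point is easiest to see by creating a branch, committing on it, and merging it back. The sketch below assumes Git is installed; the branch name `experiment`, the file `model.txt`, and the demo identity are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.com"

echo "baseline" > model.txt
git add model.txt
git commit -q -m "Add baseline model notes"
main_branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Branch off, experiment, and commit without touching the original line of work.
git checkout -q -b experiment
echo "try a new feature" >> model.txt
git commit -q -am "Experiment with a new feature"

# Switch back: the working directory returns to the baseline.
git checkout -q "$main_branch"
before_merge=$(cat model.txt)

# Merge the experiment once it has proven useful.
git merge -q experiment
after_merge=$(cat model.txt)
echo "before merge: $before_merge"
echo "after merge:"
echo "$after_merge"
```

Creating the branch costs almost nothing and happens entirely on the local machine, which is what makes many parallel lines of work practical.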
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
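The places and stages above can be watched as a file moves through them. This is a small sketch, assuming Git is installed; the file `report.txt` and the demo identity are made up. `git status --porcelain` prints a two-letter code per file that reflects where the file currently stands.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.com"

echo "hello" > report.txt
untracked=$(git status --porcelain)   # "?? report.txt": exists only in the working directory

git add report.txt
staged=$(git status --porcelain)      # "A  report.txt": snapshot recorded in the staging area

git commit -q -m "Add report"
committed=$(git status --porcelain)   # empty: safely stored in the .git directory

echo "hello again" >> report.txt
modified=$(git status --porcelain)    # " M report.txt": modified but not staged

printf '%s\n' "$untracked" "$staged" "$committed" "$modified"
```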
Git Commands
git init: create a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for next commit
git commit -m <message>: takes metadata/changes from staging and adds to the object database
Git - Basic Commands
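The four commands on this slide can be run end to end as follows. This is a sketch only, assuming Git is installed; the folder names, the file `hello.py`, and the demo identity are hypothetical, and a local path stands in for the repository address that would normally be a URL.

```shell
demo=$(mktemp -d)

# git init: start managing a folder with Git.
mkdir "$demo/analysis"
cd "$demo/analysis"
git init -q
git config user.name "Demo User"           # placeholder identity (assumption)
git config user.email "demo@example.com"

# git add: mark a file for the next commit.
echo "print('hello')" > hello.py
git add hello.py

# git commit: record the staged snapshot in the object database.
git commit -q -m "Add hello script"

# git clone: copy an existing repository, history included.
git clone -q "$demo/analysis" "$demo/analysis-copy"
commits=$(git -C "$demo/analysis-copy" rev-list --count HEAD)
echo "commits in clone: $commits"
```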
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
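The difference between fetch and merge is easier to see with two working copies sharing one remote. In the sketch below a local bare repository stands in for a server such as GitHub; it assumes Git is installed, and the collaborator names, the file `data.txt`, and all paths are made up for illustration.

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"          # stands in for the remote server

# First collaborator clones, commits, and pushes.
git clone -q "$demo/remote.git" "$demo/alice"
cd "$demo/alice"
git config user.name "Alice"                   # hypothetical collaborator
git config user.email "alice@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data file"
branch=$(git symbolic-ref --short HEAD)        # whatever the default branch is called
git push -q origin "$branch"                   # push: send commits to the remote

# Second collaborator clones the shared repository.
git clone -q "$demo/remote.git" "$demo/bob"

# First collaborator publishes an update.
cd "$demo/alice"
echo "v2" > data.txt
git commit -q -am "Update data file"
git push -q origin "$branch"

# Second collaborator picks it up in two steps.
cd "$demo/bob"
git fetch -q origin                            # fetch: object database updated...
after_fetch=$(cat data.txt)                    # ...working directory still holds v1
git merge -q "origin/$branch"                  # merge: apply the fetched commits
after_merge=$(cat data.txt)                    # working directory now holds v2
echo "after fetch: $after_fetch, after merge: $after_merge"

# git pull origin "$branch" would perform the fetch and the merge in one step.
```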
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
[Diagram: a server computer and two client computers (Computer A and Computer B), each holding a full version database with Version 1, Version 2, and Version 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
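To see that "origin" is only a name, list the remotes of a fresh clone and then add a second remote under a different name. This is a sketch under assumptions: Git is installed, a local bare repository stands in for GitHub, and the remote name `backup` and all paths are hypothetical.

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/shared.git"      # stands in for a GitHub repository
git clone -q "$demo/shared.git" "$demo/work"
cd "$demo/work"

# Cloning created a remote named "origin" pointing at the source repository.
default_remote=$(git remote)
echo "remote created by clone: $default_remote"

# Any other name works the same way; here the same repository is added as "backup".
git remote add backup "$demo/shared.git"
git remote -v
```

Both entries point at the same repository; Git treats them identically apart from "origin" being the one that clone sets up for you.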
User Account
[Screenshot: the GitHub profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, showing popular repositories, repositories contributed to, and the contributions calendar summarizing pull requests, issues opened, and commits]
Repo
[Screenshot: a GitHub repository page by rebeccabilbro ("A tour of ROC curves") with 19 commits, 1 branch, 0 releases, and 1 contributor; the file list shows the data and figures folders, .gitignore, LICENSE, README.md, and several Python files, each with its latest commit message and age]
What does collaboration look like in a data group
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
Backlog
waffleioserenity
24
uniforms
31
I trainjob I
22
Ihospi taljob IQ FiiNF
32
W f$flampF
check ship for survivors
II
secure identification keycards and
0 shy
lower onto train and secure cargo
repair ambulance shuttle
capture an Alliance anti-aircraft gun
1--1-lo
collect package from post master
Ready
20
disable explosive set by trap
II 18
recover hidden loot at Canton
financial
4
retrieve cargo from train
I ttain job I enhancement
30
join Mal in boarding train
[ trainjob I
21
collect remaining funds to pay for
shipmates release
financial I - 1- lo
In Progress
alert others of distress call
fix ships engine problem
mmm bull 1-L II 13
unload and pen cattle
M MMUi L II
get cargo from abandoned carrier
Done
29
find a brand new compression coil for the
steamer
wontfix
find a captain for t h
Istartup I e ship
II 27
find a mechanic for the ship
Istartup I II 16
buy a solid ship
Istartup I () II
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version Control: A Visualization
[Figure: local version control. On the local computer, a file is checked out from a version database that holds Version 1, Version 2, and Version 3.]
[Figure: Branches and revisions through time - example scenario (revisions 1 to 6, with branches A, B, and C)]
[Figure: Branches and revisions through time - actual workflow]
Distributed vs. Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
[Screenshot of the Git website home page]
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the “physical” files on a computer
Git - “Places”
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - “Stages”
[Figure: the three areas of a Git project: Working Directory, Staging Area, and .git directory (Repository)]
Git - Areas/places
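The three stages above can be watched on a single file. The following is a minimal sketch, assuming Git is installed and a POSIX shell is available; the folder is a throwaway temp directory, and the file name notes.txt and the "Demo User" identity are made up for illustration.

```shell
set -eu
# Work in a throwaway folder so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Modified: the working directory differs from the last commit.
echo "second draft" > notes.txt
modified_state="$(git status --porcelain)"

# Staged: the change now sits in the index, ready for the next commit.
git add notes.txt
staged_state="$(git status --porcelain)"

# Committed: the snapshot is stored in the object database; nothing is pending.
git commit -q -m "Revise notes"
committed_state="$(git status --porcelain)"

printf 'modified:  [%s]\n' "$modified_state"
printf 'staged:    [%s]\n' "$staged_state"
printf 'committed: [%s]\n' "$committed_state"
```

In the short status format the first column describes the index and the second the working directory, so the M moves from the second column to the first when the file is staged, and the output is empty once everything is committed.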
Git Commands
git init: create a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for next commit
git commit -m <message>: takes metadata/changes from staging and adds to the object database
Git - Basic Commands
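As a rough illustration of these four commands together, here is a small sketch that can be pasted into a terminal. It assumes Git is installed; the folder names project and project-copy are hypothetical, and a local path stands in for the repository address you would normally copy from Github.

```shell
set -eu
base="$(mktemp -d)"   # throwaway parent folder

# git init: start managing a new folder with git.
mkdir "$base/project"
cd "$base/project"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# git add: stage a file. git commit: record the staged snapshot.
echo "# Demo project" > README.md
git add README.md
git commit -q -m "Add README"

# git clone: download an existing repository for the first time.
# A local path is used here in place of a remote address.
git clone -q "$base/project" "$base/project-copy"

commit_count="$(git -C "$base/project-copy" rev-list --count HEAD)"
echo "commits in the clone: $commit_count"
```

The clone carries the full history, not just the latest files, which is why the commit count can be read from the copy.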
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
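The difference between fetch, merge, pull, and push is easiest to see with two people sharing one server. The sketch below fakes that on one machine, assuming Git is installed: a bare repository plays the server, and "alice" and "bob" are hypothetical collaborators with their own clones. The file name log.txt and the identity values are placeholders.

```shell
set -eu
base="$(mktemp -d)"
# Placeholder identity used for every demo commit.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# A bare repository stands in for the remote server.
git init -q --bare "$base/server.git"

# Alice clones the (still empty) server, commits, and pushes.
git clone -q "$base/server.git" "$base/alice" 2>/dev/null
cd "$base/alice"
branch="$(git symbolic-ref --short HEAD)"   # whatever the default branch is called
echo "line 1" > log.txt
git add log.txt
git commit -q -m "Add log"
git push -q origin "$branch"

# Bob clones and receives the first commit.
git clone -q "$base/server.git" "$base/bob"

# Alice pushes a second commit.
echo "line 2" >> log.txt
git commit -q -am "Second entry"
git push -q origin "$branch"

# Bob fetches: his object database is updated, his branch and files are not.
cd "$base/bob"
git fetch -q origin "$branch"
after_fetch="$(git rev-list --count HEAD)"

# Bob merges the fetched commits into his current branch.
git merge -q "origin/$branch"
after_merge="$(git rev-list --count HEAD)"

# Alice pushes a third commit; Bob pulls (a fetch followed by a merge).
cd "$base/alice"
echo "line 3" >> log.txt
git commit -q -am "Third entry"
git push -q origin "$branch"
cd "$base/bob"
git pull -q origin "$branch"
after_pull="$(git rev-list --count HEAD)"

echo "Bob's commits after fetch/merge/pull: $after_fetch/$after_merge/$after_pull"
```

Bob's branch only moves forward on the merge and on the pull; the fetch alone leaves his working directory exactly as it was.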
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called “remotes”
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the “physical” files on a computer
Git - “Places”
[Figure: distributed version control. A server computer and two clients, Computer A and Computer B, each hold a complete version database containing Version 1, Version 2, and Version 3.]
Github: A Distributed Version Control example
The “origin” remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about “origin”; it is just a default name
Git - “Origin”
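To see that “origin” really is just a name, the following sketch clones a small repository, inspects the remote that the clone created, and then renames it. It assumes Git is installed; the source and work folders are hypothetical, and a local path stands in for a Github address.

```shell
set -eu
base="$(mktemp -d)"

# A small source repository to clone from (stands in for a Github address).
git init -q "$base/source"
cd "$base/source"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# Cloning creates a remote named "origin" automatically.
git clone -q "$base/source" "$base/work"
cd "$base/work"
remote_name="$(git remote)"
remote_url="$(git config --get remote.origin.url)"
echo "$remote_name -> $remote_url"

# "origin" is only a default name, so the remote can be renamed.
git remote rename origin upstream 2>/dev/null
renamed="$(git remote)"
echo "after rename: $renamed"
```

After the rename, pushes and pulls would name upstream instead of origin; nothing else about the repository changes.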
User Account
[Screenshot: Github profile of Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014; 17 followers, 11 starred, 39 following. The page lists popular repositories (xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards), repositories contributed to (DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket, and an sql-tutorial repository whose owner name is truncated), and a contributions calendar running from Mar through Feb.]
Repo
[Screenshot: a Github repository page owned by rebeccabilbro, described as "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor; branch master; latest commit 382b9ca, 4 days ago. Files, last commit message, and age:]
data: starting to flesh out bulk ingest method for UCI data (16 days ago)
figures: added precision recall image (19 days ago)
.DS_Store: basic implementation of roc curve plotter (9 days ago)
.gitignore: basic implementation of roc curve plotter (9 days ago)
LICENSE: Initial commit (19 days ago)
README.md: added plotting template to readme (9 days ago)
classi.py: added method to guess the label column (4 days ago)
ingest.py: added randomizer to ingest (9 days ago)
roc.py: basic implementation of roc curve plotter (9 days ago)
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files" type: powershell
• Hit Enter.
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue search bar will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I?
(Mac OSX Terminal and Windows Powershell)
What’s my name?
(Mac OSX Terminal and Windows Powershell)
Make a directory
Windows Powershell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Change between directories
Windows Powershell:
> cd temp
> pwd
Mac OSX Terminal:
$ cd temp
$ pwd
List files and directories
Windows Powershell:
> dir
Mac OSX Terminal:
$ ls
Make an empty file
Windows Powershell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw’s book
Let’s use what we’ve learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
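If you want to rehearse before the workshop, a merge conflict can be produced and resolved locally. This is only a sketch under stated assumptions, not the workshop's own exercise: Git is installed, and the file settings.txt, the branch name feature, and the identity values are invented for the example.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "color: blue" > settings.txt
git add settings.txt
git commit -q -m "Add settings"
main_branch="$(git symbolic-ref --short HEAD)"

# Two branches change the same line in different ways.
git checkout -q -b feature
echo "color: green" > settings.txt
git commit -q -am "Use green"

git checkout -q "$main_branch"
echo "color: red" > settings.txt
git commit -q -am "Use red"

# The merge stops and git writes conflict markers into the file.
if git merge feature >/dev/null 2>&1; then
  merge_result="clean"
else
  merge_result="conflict"
fi
marker_count="$(grep -c '^<<<<<<<' settings.txt)"

# Resolve: edit the file to the agreed content, then stage and commit.
echo "color: green" > settings.txt
git add settings.txt
git commit -q -m "Merge feature: settle on green"
merge_commits="$(git rev-list --count --merges HEAD)"

echo "merge result: $merge_result, merge commits: $merge_commits"
```

In real work the resolution step is a conversation with your teammate about which change should win; git only needs the file without markers, staged and committed.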
[Figure: four places (working directory, Index, local repo, Github repo) with arrows labeled Compare, diff --cached, Revert, checkout HEAD, and fetch]
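The compare and revert operations named in the diagram can be tried on one file. The sketch below assumes Git is installed and uses a hypothetical data.txt in a temp folder: git diff compares the working directory with the index, git diff --cached compares the index with the last commit, and git checkout HEAD -- <file> restores the file from the last commit.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "one" > data.txt
git add data.txt
git commit -q -m "Add data"

# Compare working directory vs. index: the edit is not staged yet.
echo "two" >> data.txt
unstaged_before="$(git diff --name-only)"

# After staging, the same change shows up under --cached instead.
git add data.txt
unstaged_after="$(git diff --name-only)"
staged="$(git diff --cached --name-only)"

# Revert: restore the file in both the index and the working
# directory from the last commit (HEAD).
git checkout HEAD -- data.txt
restored="$(cat data.txt)"
pending="$(git status --porcelain)"

echo "restored content: $restored"
```

Note that the revert discards the edit for good, since it was never committed.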
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github: "A startup within DOC focused on building data products with and for the bureaus"; Washington DC; 20 people, 4 teams. Repositories shown include DataService_WebSite (JavaScript; the website for the Commerce Data Service, updated 19 hours ago), ITA_Principal_Travel (CSS, updated a day ago), and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy, updated a day ago).]
Waffle
[Screenshot: Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Legible cards include Better Licensing (type: feature), username check (priority: medium, type: bug), Dataset Searching (priority: medium, type: feature), Dataset Overwrite (type: technical debt), 500 error on upload w/ missing col/row values, AJAXify the uploader (priority: medium, type: feature), 3D tours, Sampling technique for bigger datasets, Feature nomination tool for visualization, Data file uploading, Research Auto-analysis Feature, Implement beta auto analysis, Dropdown Dataset Edit Form, Async Upload with Celery, Dimension Histograms and Ranking, Large files hang uploader, and Upload Error: line contains NULL byte. Several cards are tagged Version 0.3. The Done column notes that issues closed in the last week are shown there and that dragging an issue into the column closes it.]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
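One way to be helpful is to give the commit a short subject and a longer explanation. The sketch below assumes Git is installed and uses an invented file and invented wording; it relies on the fact that passing -m twice makes the first message the subject line and the second a separate body paragraph.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "id,value" > results.csv
git add results.csv

# First -m: short subject line. Second -m: body explaining why.
git commit -q \
  -m "Add empty results table" \
  -m "Creates results.csv with only a header row so later analysis steps have a stable file to append to."

subject="$(git log -1 --format=%s)"
body="$(git log -1 --format=%b)"
echo "subject: $subject"
echo "body:    $body"
```

The subject is what teammates see in one-line views such as the Github commit list, so it should make sense on its own.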
Why?
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
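A new project with this layout can be set up in a few commands. The sketch below assumes Git is installed; the ignore patterns, the sample fixture file, and the placeholder contents are illustrative choices, not part of the course materials. It also shows the effect of .gitignore: a file matching a pattern is left out of the commit.

```shell
set -eu
project="$(mktemp -d)"
cd "$project"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# README.md: what the project is and how to run it.
echo "# Project title" > README.md
# .gitignore: patterns for files git should not track (example patterns).
printf '%s\n' '*.pyc' '.DS_Store' > .gitignore
# requirements.txt: Python dependencies, one per line (left empty here).
printf '# one dependency per line\n' > requirements.txt
# fixtures: assumed here to hold small sample data files.
mkdir fixtures
printf 'id,value\n1,10\n' > fixtures/sample.csv

# A stray file that matches an ignore pattern.
touch .DS_Store

git add .
git commit -q -m "Add project skeleton"

tracked_count="$(git ls-files | grep -c .)"
ignored_tracked="$(git ls-files .DS_Store)"
git ls-files
```

Git does not track empty folders, which is why the sketch puts a small file inside fixtures before committing.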
Where to go from here
Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources:
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
COMMERCE DATA SERVICE
COMMERCE DATA SERVICE
Backlog
waffleioserenity
24
uniforms
31
I trainjob I
22
Ihospi taljob IQ FiiNF
32
W f$flampF
check ship for survivors
II
secure identification keycards and
0 shy
lower onto train and secure cargo
repair ambulance shuttle
capture an Alliance anti-aircraft gun
1--1-lo
collect package from post master
Ready
20
disable explosive set by trap
II 18
recover hidden loot at Canton
financial
4
retrieve cargo from train
I ttain job I enhancement
30
join Mal in boarding train
[ trainjob I
21
collect remaining funds to pay for
shipmates release
financial I - 1- lo
In Progress
alert others of distress call
fix ships engine problem
mmm bull 1-L II 13
unload and pen cattle
M MMUi L II
get cargo from abandoned carrier
Done
29
find a brand new compression coil for the
steamer
wontfix
find a captain for t h
Istartup I e ship
II 27
find a mechanic for the ship
Istartup I II 16
buy a solid ship
Istartup I () II
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
Non-local Git repositories are called "remotes"
Object Database: where Git stores committed file snapshots and the metadata about each commit
Index (Staging Area): file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer
Git - "Places"
[Figure: a server computer and two client computers (Computer A and Computer B), each holding a full version database with Versions 1, 2, and 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
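A short sketch can confirm that "origin" is only a name. The folders source and copy and the remote name upstream are hypothetical; the commands themselves (git remote and git remote rename) are standard Git.

```shell
work=$(mktemp -d)
cd "$work"

# Build a small repository to clone from (a stand-in for a Github repo).
git init -q source
cd source
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello"

# Cloning creates a remote called "origin" automatically.
cd "$work"
git clone -q source copy
cd copy
default_remote=$(git remote)

# The name can be changed like any other label.
git remote rename origin upstream
renamed_remote=$(git remote)

echo "$default_remote -> $renamed_remote"
```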
User Account
[Screenshot: Rebecca Bilbro's Github profile page (rebeccabilbro, Washington DC, joined Sep 13, 2014), showing popular repositories, repositories contributed to, and the contributions calendar]
Repo
[Screenshot: a Github repository page owned by rebeccabilbro, described as "A tour of ROC curves", with 19 commits, 1 branch, 0 releases, and 1 contributor; the file list (data, figures, .DS_Store, .gitignore, LICENSE, README.md, and three Python files) shows each entry's latest commit message and age]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I?
Mac OSX Terminal
Windows Powershell
What's my name?
Mac OSX Terminal
Windows Powershell
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Mac OSX Terminal
Windows Powershell
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
Mac OSX Terminal
Windows Powershell
List files and directories
> dir
$ ls
Mac OSX Terminal
Windows Powershell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes)
http://bit.ly/xbus501-workshop-git
[Figure: how changes move between the working directory, the index, the local repo, and the Github repo; labeled operations include fetch, checkout HEAD (revert), and diff --cached (compare)]
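Two of the operations named in the diagram, comparing with git diff --cached and reverting with git checkout HEAD, can be sketched as follows. The file data.txt, its contents, and the identity are invented for the example.

```shell
cd "$(mktemp -d)"
git init -q .
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"
echo "original" > data.txt
git add data.txt
git commit -q -m "Add data"

# Change the file, then compare the working directory with the index.
echo "experiment" > data.txt
unstaged=$(git diff --name-only)

# Stage the change, then compare the index with the last commit.
git add data.txt
staged=$(git diff --cached --name-only)

# Revert: restore the file in both the index and the working directory
# to the version stored in the last commit (HEAD).
git checkout -q HEAD -- data.txt
restored=$(cat data.txt)

echo "unstaged=$unstaged staged=$staged restored=$restored"
```

Note that git checkout HEAD -- data.txt discards the uncommitted edit for good, so it is worth running git diff first to see what would be lost.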
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus"; Washington DC; 20 people, 4 teams), listing repositories including DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy)]
Waffle
[Screenshot: the Waffle board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress, and Done columns; each card is a Github issue labeled by priority, type (feature, bug, technical debt), and version, for example "username check", "Dataset Searching", "Async Upload with Celery", and "Large files hang uploader"]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
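One way to be helpful is to pair a short summary line with a sentence explaining why the change was made. The sketch below assumes a hypothetical file classify.py and an invented message; passing -m twice is standard Git and stores the second string as a separate paragraph in the commit body.

```shell
cd "$(mktemp -d)"
git init -q .
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"

# A made-up one-line Python file, just so there is something to commit.
echo "def guess_label(columns): return columns[-1]" > classify.py
git add classify.py

# First -m: a short summary. Second -m: the reasoning, for teammates and future you.
git commit -q \
    -m "Add method to guess the label column" \
    -m "Assume the last column holds the label when none is given, so callers no longer have to pass it."

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "$subject"
echo "$body"
```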
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
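As a rough illustration of this layout, the sketch below scaffolds a new project with those four entries. The file contents are assumptions made for the example: the .gitignore patterns, the numpy requirement, and the idea that fixtures holds small sample data files are not taken from the course materials.

```shell
project=$(mktemp -d)
cd "$project"
git init -q .

# README.md: what the project is and how to run it.
printf '# My Project\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files Git should never track (operating system and Python clutter).
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# fixtures: assumed here to hold small sample data files.
mkdir fixtures
printf 'id,label\n1,a\n' > fixtures/sample.csv

# requirements.txt: the Python packages the project depends on (illustrative).
printf 'numpy\n' > requirements.txt

# Simulate the clutter file that macOS creates, then stage everything.
touch .DS_Store
git add .
tracked=$(git ls-files)

if git check-ignore -q .DS_Store; then ignored=yes; else ignored=no; fi
echo "$tracked"
echo ".DS_Store ignored: $ignored"
```

Because .DS_Store matches a pattern in .gitignore, git add . skips it and it never appears in the repository's file list.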
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching/
http://rogerdudler.github.io/git-guide/
http://www.tutorialspoint.com/git/
Resources
GitHub Desktop: https://desktop.github.com/
TortoiseGit: https://tortoisegit.org/
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
[Screenshot: an example Waffle board (waffle.io, "serenity") with Backlog, Ready, In Progress, and Done columns; cards include "check ship for survivors", "repair ambulance shuttle", "retrieve cargo from train", "fix ship's engine problem", and "find a mechanic for the ship", with labels such as train job, hospital job, financial, startup, enhancement, and wontfix]
Version Control
Examples
Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket
What is version control?
Other names
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code"
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Figure: a local version control system, in which a file is checked out on the local computer from a version database holding Versions 1, 2, and 3]
[Figure: numbered revisions 1 through 6 with branches A, B, and C]
Branches and revisions through time - example scenario
[Figure: commit history graph from a real project]
Branches and revisions through time - actual workflow
Distributed vs. Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
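The support for non-linear development shows up in how cheap a local branch is. The following sketch uses an invented file report.txt and a hypothetical branch called experiment: work happens on the branch, the original branch stays untouched, and a merge brings the change back.

```shell
cd "$(mktemp -d)"
git init -q .
git config user.name "Example Student"      # placeholder identity
git config user.email "student@example.com"
echo "v1" > report.txt
git add report.txt
git commit -q -m "Start report"
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on Git's configuration

# Create a branch and commit on it; nothing is copied or sent to a server.
git checkout -q -b experiment
echo "v2" >> report.txt
git commit -q -am "Try a new section"

# Back on the original branch, the file is as it was before the experiment.
git checkout -q "$main_branch"
lines_before_merge=$(wc -l < report.txt | tr -d ' ')

# Merging applies the experiment's commit to the current branch.
git merge -q experiment
lines_after_merge=$(wc -l < report.txt | tr -d ' ')

echo "$lines_before_merge $lines_after_merge"
```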
Version Control
Examples
COMMERCE DATA SERVICE
Google Drive
I~ SharePoint
rop ox
Tortoise SVN8Bitbucket
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
COMMERCE DATA SERVICE
0 This organization Search Pull requests Issues Gist
Commerce Data Service A startup within DOC focused on building data products with and for the bureaus
Washington DC httpJlwwwcommercegov datadocgov
IQ Repositories People 20 l Teams 4
Fiiters ~ Q Find a repository +New repository People 20 gt
DataService_ WebSite JavaScript 1 V4
IV forked from timwoodDataCorps_ WebSite
The website for the Commerce Data Service - A startup within the Department of
Commerce
Updated 19 hours ago
ITA_Principal_ Travel css o Vamp Updated a day ago
Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy
Updated a day ago
Waffle
Screenshot: a Waffle board for the DistrictDataLabs/trinket repository. Issue cards sit in Backlog, Ready, In Progress, and Done columns and carry labels for type (feature, bug, technical debt), priority, and version, e.g. "Large files hang uploader" and "Async Upload with Celery".
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
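One way to be helpful is to give a commit a short subject line plus a longer explanation. The sketch below assumes a throwaway repository; the identity and the message text are made up for illustration. Passing -m twice makes git store the second message as a separate paragraph under the subject.

```shell
# Sketch only: a throwaway repository with a made-up identity and message.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# --allow-empty lets this sketch commit without any files;
# a real commit would have changes staged with git add first.
git commit -q --allow-empty \
  -m "Fix upload error when a line contains a NULL byte" \
  -m "Hypothetical body: say what changed and why, so your team and future you can follow it."

# Show the subject line, a blank line, then the body.
git log -1 --format='%s%n%n%b'
subject=$(git log -1 --format=%s)
```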
Why?
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
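A minimal sketch of that layout, built in a throwaway directory. The project name myproject, the file contents, and the ignore patterns are hypothetical, and the sketch assumes fixtures is a folder for small sample data files.

```shell
# Sketch of the folder conventions listed above.
# "myproject", the file contents, and the ignore patterns are made up.
project="$(mktemp -d)/myproject"
mkdir -p "$project/fixtures"
cd "$project"

# README.md: what the project is and how to run it.
printf '# myproject\n\nOne-paragraph description goes here.\n' > README.md

# .gitignore: files git should never track (common example patterns).
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: Python dependencies, one per line (left empty here).
touch requirements.txt

# fixtures/: sample data; .gitkeep is a common convention for
# keeping an otherwise empty folder in the repository.
touch fixtures/.gitkeep

# List everything, including the hidden .gitignore file.
ls -A
```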
Where to go from here
Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources:
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Examples
Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket
What is version control?
Other names?
What problems does this solve?
What are the benefits?
What are some common features?
Definition: The management of changes to electronic documents and, in particular, computer programs.
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
Diagram: a local computer checks out a file from a version database that holds Version 1, Version 2, and Version 3.
Diagram: Branches and revisions through time - example scenario
Diagram: Branches and revisions through time - actual workflow
Distributed vs. Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
Screenshot of the Git website:
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
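The three stages can be watched from the command line. This is a sketch in a throwaway repository; the file name report.txt and the identity are made up. It uses git status --porcelain, the script-friendly short status, which prints a two-letter code in front of each changed file.

```shell
# Sketch only: a throwaway repository with a made-up file and identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "draft 1" > report.txt
git add report.txt
git commit -q -m "add first draft of report"

# Modified: changed in the working directory, not yet staged.
echo "draft 2" >> report.txt
modified=$(git status --porcelain)

# Staged: marked for inclusion in the next commit.
git add report.txt
staged=$(git status --porcelain)

# Committed: stored in the local object database; nothing left to report.
git commit -q -m "revise report with second draft"
committed=$(git status --porcelain)

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' \
  "$modified" "$staged" "$committed"
```

The status code moves from the second column (modified) to the first column (staged), and the list is empty once everything is committed.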
Diagram of the three areas:
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
Git Commands
git init: create a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for next commit
git commit -m <message>: takes metadata/changes from staging and adds to the object database
Git - Basic Commands
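A minimal sketch of init, add, and commit in sequence. The throwaway folder, the file notes.txt, and the identity are assumptions made for illustration; git clone is left out because it needs an existing repository address.

```shell
# Sketch only: the folder, file name, and identity are made up.
repo=$(mktemp -d)
cd "$repo"

# git init: create a new repository to manage the current folder.
git init -q

# Identity for this throwaway repository
# (normally set once with git config --global).
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file, then git add it to the index / staging area.
echo "hello" > notes.txt
git add notes.txt

# git commit: record the staged snapshot in the object database.
git commit -q -m "add notes file with a first greeting"

# Show the history, one line per commit.
git log --oneline
```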
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
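The difference between fetch, merge, and pull is easier to see with two copies of one repository. In this sketch a bare repository on local disk stands in for the remote server, and two clones named alice and bob stand in for two teammates; all of these names are made up. The branch name is read from git rather than assumed, since it may be master or main depending on configuration.

```shell
# Sketch only: server.git, alice, and bob are hypothetical names.
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git

# Alice clones the (empty) server repo, commits, and pushes.
git clone -q server.git alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "add notes file"
branch=$(git rev-parse --abbrev-ref HEAD)
git push -q origin "$branch"

# Bob clones and receives Alice's first commit.
cd "$work"
git clone -q server.git bob

# Alice pushes a second commit.
cd "$work/alice"
echo "second line" >> notes.txt
git commit -q -am "add a second line to notes"
git push -q origin "$branch"

# Bob fetches: his object database updates, his files do not.
cd "$work/bob"
git fetch -q origin "$branch"
lines_after_fetch=$(wc -l < notes.txt | tr -d ' ')

# Bob merges: the fetched commit now reaches his working directory.
git merge -q "origin/$branch"
lines_after_merge=$(wc -l < notes.txt | tr -d ' ')

# Alice pushes a third commit; Bob pulls (fetch and merge in one step).
cd "$work/alice"
echo "third line" >> notes.txt
git commit -q -am "add a third line to notes"
git push -q origin "$branch"
cd "$work/bob"
git pull -q origin "$branch"
lines_after_pull=$(wc -l < notes.txt | tr -d ' ')

echo "lines in Bob's notes.txt: $lines_after_fetch after fetch," \
  "$lines_after_merge after merge, $lines_after_pull after pull"
```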
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
Diagram: a server computer and two other computers (Computer A and Computer B) each hold a complete version database with Version 1, Version 2, and Version 3.
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
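A short sketch of where the name comes from and how it can be changed. A bare repository on local disk stands in for Github here; the names server.git, mycopy, and github are made up for illustration.

```shell
# Sketch only: a local bare repository stands in for the remote.
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git

# Cloning creates a remote named "origin" automatically.
git clone -q server.git mycopy
cd mycopy
git remote -v
default_name=$(git remote)

# "origin" is only a default name, so it can be renamed.
git remote rename origin github
renamed=$(git remote)

echo "remote was called '$default_name', now '$renamed'"
```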
User Account
Screenshot: the Github profile page for Rebecca Bilbro (rebeccabilbro), showing popular repositories, repositories contributed to, and the contributions calendar.
Repo
Screenshot: a Github repository page ("A tour of ROC curves") with 19 commits, 1 branch, and 1 contributor. The file list (data, figures, .gitignore, LICENSE, README.md, and Python scripts) shows the most recent commit message next to each entry.
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files" type: powershell
• Hit Enter.
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue search bar will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Now you have your Terminal open and it's in your Dock so you can get to it.
What is version controlOther names
What problems does this solve
What are the benefits
What are some common features
Definition The management of changes to electronic documents and in particular computer programs
ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo
Wikipedia knows everything
Tell us about a time when you could have used someversion control
Local Version Control Systems
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the branch currently checked out in the working directory
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
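The difference between fetch, merge, and pull is easiest to see by counting lines in a file before and after each step. In this sketch a bare repository on the local disk plays the server, and two clones named `alice` and `bob` play two teammates; all of these names, and the file `data.txt`, are hypothetical.

```shell
set -eu
work=$(mktemp -d)

# A bare repository plays the role of the remote server.
git init -q --bare "$work/server.git"

# Alice clones the (still empty) server, commits, and pushes.
git clone -q "$work/server.git" "$work/alice"
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data.txt"
branch=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called
git push -q origin "$branch"

# Bob clones after the first push.
git clone -q "$work/server.git" "$work/bob"

# Alice pushes a second commit.
echo "v2" >> data.txt
git commit -q -am "Add v2"
git push -q origin "$branch"

# git fetch: Bob's object database is updated, his files are not.
cd "$work/bob"
git fetch -q origin "$branch"
after_fetch=$(wc -l < data.txt | tr -d ' ')

# git merge: the fetched commits are applied to Bob's current branch.
git merge -q "origin/$branch"
after_merge=$(wc -l < data.txt | tr -d ' ')

# Alice pushes a third commit; git pull does fetch and merge in one step.
cd "$work/alice"
echo "v3" >> data.txt
git commit -q -am "Add v3"
git push -q origin "$branch"
cd "$work/bob"
git pull -q origin "$branch"
after_pull=$(wc -l < data.txt | tr -d ' ')

echo "lines after fetch: $after_fetch, after merge: $after_merge, after pull: $after_pull"
```

Bob's file still has one line after the fetch because fetch only updates the object database; the second line appears only once the merge is applied.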
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
[Figure: a server computer and two client computers (Computer A and Computer B), each holding a version database with Versions 1, 2, and 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
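A short sketch of the point about "origin" being only a default name: clone a repository, list its remotes, then rename the remote. The bare repository standing in for a server and the replacement name `teamserver` are made up for this example.

```shell
set -eu
work=$(mktemp -d)

# A bare repository on disk stands in for a remote server.
git init -q --bare "$work/server.git"
git clone -q "$work/server.git" "$work/clone"
cd "$work/clone"

# Cloning created a remote named "origin" that points at the source.
default_remote=$(git remote)
git remote -v

# The name is only a convention: rename it and the remote keeps working.
git remote rename origin teamserver
renamed_remote=$(git remote)

echo "before: $default_remote, after: $renamed_remote"
```

After the rename, commands such as push and pull would take `teamserver` wherever they previously took `origin`.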
User Account
[Screenshot: GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, showing popular repositories, repositories contributed to, and the contributions calendar]
Repo
[Screenshot: GitHub repository page for rebeccabilbro's "A tour of ROC curves" (19 commits, 1 branch, 0 releases, 1 contributor), listing each file and folder with its latest commit message and age]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Mac OSX Terminal
Windows Powershell
Where am I?
Mac OSX Terminal
Windows Powershell
What's my name?
Make a directory
Windows Powershell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Change between directories
Windows Powershell:
> cd temp
> pwd
Mac OSX Terminal:
$ cd temp
$ pwd
List files and directories
Windows Powershell:
> dir
Mac OSX Terminal:
$ ls
Make an empty file
Windows Powershell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
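The Terminal commands from these slides can be strung together into one short session. This is a sketch for Mac OSX Terminal (or any Unix-style shell). It runs inside a throwaway folder created with `mktemp -d`, which is an addition for safety and not part of the slides, and it uses `mkdir -p` because a Unix `mkdir` needs that flag to create a whole chain of nested folders in one step.

```shell
set -eu
# Work inside a throwaway folder so nothing in your real directories changes.
cd "$(mktemp -d)"

# Make a directory: -p creates every missing folder along the path.
mkdir -p temp/stuff/things/frank/joe/alex/john

# Change between directories, then ask where we are.
cd temp
pwd

# Make an empty file.
touch iamcool.txt

# List files and directories: shows iamcool.txt and stuff.
ls
```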
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
[Figure: diagram relating the working directory, index, local repo, and Github repo, with arrows labeled Revert, Compare, diff --cached, fetch, and checkout HEAD]
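The labels in this diagram suggest two everyday operations: comparing what sits in each place, and reverting a file to the last commit. The sketch below is one reading of those labels, not a reproduction of the slide. The file name `report.txt` and its contents are invented.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "alpha" > report.txt
git add report.txt
git commit -q -m "Add report.txt"

# Stage one change, then make a second change without staging it.
echo "beta" >> report.txt
git add report.txt
echo "gamma" >> report.txt

# Compare: index vs. last commit (what the next commit would record).
staged_diff=$(git diff --cached --name-only)
# Compare: working directory vs. index (what is not yet staged).
unstaged_diff=$(git diff --name-only)

# Revert: restore the file in both the index and the working directory
# to the version stored in the last commit (HEAD).
git checkout HEAD -- report.txt
restored=$(cat report.txt)

echo "staged: $staged_diff, unstaged: $unstaged_diff, restored contents: $restored"
```

Note that `git checkout HEAD -- <file>` discards both the staged and the unstaged edits to that file, so it is worth running `git diff` and `git diff --cached` first to see what would be lost.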
Teamwork (makes the dream work)
Organization
[Screenshot: GitHub organization page for Commerce Data Service ("A startup within DOC focused on building data products with and for the bureaus"), Washington DC, with 20 people and 4 teams, listing repositories including DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy")]
Waffle
[Screenshot: Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns; issue cards such as "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "Async Upload with Celery", and "Large files hang uploader" carry priority, type, and version labels]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
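One way to be helpful in a commit message is to give a short summary line followed by a sentence explaining why the change was made. The sketch below does this by passing `-m` twice, which git joins as separate paragraphs. The file `ingest.csv` and the wording of the message are invented for illustration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "id,value" > ingest.csv
git add ingest.csv

# Each -m adds a paragraph: a short summary line, then the reasoning.
git commit -q \
  -m "Add header row to ingest.csv" \
  -m "Downstream loaders expect named columns, so the header is stored with the data."

# The summary line is what teammates see in one-line history views.
summary=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
git --no-pager log --oneline
```

The summary answers "what changed" at a glance, while the body answers "why" for whoever reads the history later, including future you.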
Why?
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
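As a rough sketch of how these four items might sit together in a new repository, the script below creates each one and then checks which files git is willing to track. The file contents, the ignore patterns (`.DS_Store`, `*.pyc`), and the reading of `fixtures` as a folder for sample data are assumptions made for the example, not conventions stated on the slide.

```shell
set -eu
cd "$(mktemp -d)"
git init -q

# README.md: the project's front page on Github.
echo "# Example project" > README.md
# requirements.txt: the Python packages the project depends on
# (left empty here; real projects list one package per line).
touch requirements.txt
# fixtures: assumed here to hold sample data kept alongside the code.
mkdir fixtures
echo "id,value" > fixtures/sample.csv
# .gitignore: patterns for files git should not track (patterns are examples).
printf '%s\n' ".DS_Store" "*.pyc" > .gitignore

# Create two files that match the ignore patterns.
touch .DS_Store cache.pyc

# Stage everything; ignored files are skipped automatically.
git add .
tracked_count=$(git ls-files | wc -l | tr -d ' ')
ignored=$(git check-ignore .DS_Store cache.pyc)

echo "tracked files: $tracked_count"
echo "ignored files:"
echo "$ignored"
```

`git check-ignore` is a handy way to confirm that a pattern in `.gitignore` matches the files you meant it to match.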
Where to go from here
Additional Tutorials: http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources: Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Definition: The management of changes to electronic documents and, in particular, computer programs
"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."
Wikipedia knows everything
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Figure: a local computer checks out a file from a version database holding Versions 1, 2, and 3]
[Figure: revisions numbered 1 through 6 with branches labeled A, B, and C]
Branches and revisions through time - example scenario
[Figure: commit graph from a project's history]
Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on
the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note
that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go
to https://git-for-windows.github.io/
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a
command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid
credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it
to say they're things you want. You can download this from the GitHub for Windows website at
http://windows.github.com
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
COMMERCE DATA SERVICE
0 This organization Search Pull requests Issues Gist
Commerce Data Service A startup within DOC focused on building data products with and for the bureaus
Washington DC httpJlwwwcommercegov datadocgov
IQ Repositories People 20 l Teams 4
Fiiters ~ Q Find a repository +New repository People 20 gt
DataService_ WebSite JavaScript 1 V4
IV forked from timwoodDataCorps_ WebSite
The website for the Commerce Data Service - A startup within the Department of
Commerce
Updated 19 hours ago
ITA_Principal_ Travel css o Vamp Updated a day ago
Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy
Updated a day ago
Waffle
[Screenshot: Waffle board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Cards carry version, priority, and type labels (feature, bug, technical debt, question, task). Example cards: Better Licensing, username check, Dataset Searching, Dataset Overwrite, 500 error on upload with missing col/row values, AJAXify the uploader, 3D tours, Sampling technique for bigger datasets, Feature nomination tool for visualization, Data file uploading, Implement beta auto analysis, Async Upload with Celery, Dimension Histograms and Ranking, Large files hang uploader, Upload Error: line contains NULL byte. The Done column notes that issues closed in the last week are shown there and that dragging an issue into it closes the issue]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
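A short sketch of what a helpful commit message might look like, assuming Git is installed and a bash-style terminal. The file name, message wording, and identity are invented for illustration.

```shell
# Throwaway repository for the demonstration.
cd "$(mktemp -d)"
git init -q message-demo
cd message-demo
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "id,name" > data.csv
git add data.csv

# A vague message such as "stuff" or "fix" tells a teammate nothing:
#   git commit -m "stuff"
# A more helpful message says what changed and why.
git commit -q -m "Add header row to data.csv so the loader can name columns"

# The message is what teammates (and future you) see in the history.
git log --oneline
```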
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
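One way to lay out a new project with the files listed above, sketched for a bash-style terminal. The project name, the ignore patterns, and the pandas requirement are placeholders, and the .gitkeep file is only a common convention (Git does not track empty directories), not a Git feature.

```shell
# Build the skeleton in a throwaway location.
cd "$(mktemp -d)"
mkdir my-project
cd my-project

echo "# my-project" > README.md                            # what the project is and how to run it
printf '%s\n' '.DS_Store' '__pycache__/' '*.pyc' > .gitignore   # files Git should not track
mkdir fixtures                                             # small sample data used by the code
touch fixtures/.gitkeep                                    # lets the empty folder be committed
echo "pandas" > requirements.txt                           # Python packages the project needs

ls -A                                                      # list everything, including dotfiles
```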
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Tell us about a time when you could have used some version control
Local Version Control Systems
Version Control: A Visualization
[Figure: local version control. A file is checked out on the local computer from a version database holding Versions 1, 2, and 3]
[Figure: timeline of numbered revisions 1 to 6 and lettered revisions A to C]
Branches and revisions through time - example scenario
Branches and revisions through time - actual workflow
Distributed vs. Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
[Screenshot: git-scm.com home page, tagline "git --distributed-is-the-new-centralized"]
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
Learn Git in your browser for free with Try Git.
Site sections: About (the advantages of Git compared to other source control systems), Documentation (command reference pages, Pro Git book content, videos and other material), Downloads (GUI clients and binary releases for all major platforms), Community (get involved: bug reporting, mailing list, chat, development and more)
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
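After installing, it can help to confirm that the terminal can find Git. The check below is a small sketch for a bash-style terminal such as the Mac Terminal; in PowerShell, running git --version on its own serves the same purpose.

```shell
# Report the installed version, or explain what to try if git is missing.
if command -v git >/dev/null 2>&1; then
    git --version
else
    echo "git not found: install it, or close and reopen the terminal"
fi
```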
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
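The three stages can be watched with git status. This is a minimal sketch assuming Git is installed and a bash-style terminal; the repository name, file name, and identity are placeholders.

```shell
# Throwaway repository for the demonstration.
cd "$(mktemp -d)"
git init -q stages-demo
cd stages-demo
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"                 # committed: stored in the object database under .git

echo "second draft" >> notes.txt
git status --short                           # " M notes.txt": modified, not yet staged

git add notes.txt
git status --short                           # "M  notes.txt": staged for the next commit

git commit -q -m "Revise notes"
git status --short                           # prints nothing: everything is committed
```

The position of the M matters in the short format: the first column describes the staging area and the second column describes the working directory.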
[Figure: the three areas of a Git project: Working Directory, Staging Area, and .git directory (Repository)]
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
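The four commands above can be tried end to end without a network connection, because git clone also accepts a local folder as the repository address. A minimal sketch, assuming Git is installed and a bash-style terminal; folder names, the file, and the identity are placeholders.

```shell
# Throwaway location for both repositories.
cd "$(mktemp -d)"
mkdir project
cd project

git init -q                                  # start managing the current folder
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > hello.txt
git add hello.txt                            # stage the new file
git commit -q -m "Add hello.txt"             # record the staged snapshot

cd ..
git clone -q project project-copy            # a local path stands in for a repository URL
git -C project-copy log --oneline            # the copy carries the full history
```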
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current branch (the one checked out in your working directory)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
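The difference between fetch, merge, pull, and push is easier to see with two collaborators. In this sketch a local bare repository plays the role of the server, and the names alice, bob, server.git, and report.txt are all invented. It assumes a reasonably recent Git and a bash-style terminal.

```shell
cd "$(mktemp -d)"
git init -q --bare server.git                # stand-in for a remote server such as GitHub

git clone -q server.git alice                # cloning an empty repository prints a warning
cd alice
git config user.name "Alice Example"         # placeholder identity
git config user.email "alice@example.com"
echo "v1" > report.txt
git add report.txt
git commit -q -m "Add report"
branch=$(git symbolic-ref --short HEAD)      # master or main, depending on setup
git push -q origin "$branch"                 # push: send commits to the server

cd ..
git clone -q server.git bob                  # bob starts with v1

cd alice
echo "v2" > report.txt
git commit -q -am "Update report to v2"
git push -q origin "$branch"

cd ../bob
git fetch -q origin "$branch"                # fetch: object database updated, files untouched
cat report.txt                               # still v1
git merge -q "origin/$branch"                # merge: bring the fetched commits into the branch
cat report.txt                               # now v2

cd ../alice
echo "v3" > report.txt
git commit -q -am "Update report to v3"
git push -q origin "$branch"

cd ../bob
git pull -q origin "$branch"                 # pull: fetch and merge in one step
cat report.txt                               # now v3
```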
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
[Figure: distributed version control. A server computer and two client computers (Computer A and Computer B) each hold a full version database with Versions 1, 2, and 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
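A quick way to see that "origin" is only a label, sketched with a local bare repository standing in for the server. The names upstream.git, work, and backup are invented; Git and a bash-style terminal are assumed.

```shell
cd "$(mktemp -d)"
git init -q --bare upstream.git              # stand-in for a repository hosted on a server
git clone -q upstream.git work               # cloning an empty repository prints a warning
cd work

git remote -v                                # "origin" was created by the clone, for fetch and push
git remote rename origin backup              # the name can be changed like any other label
git remote                                   # now lists "backup"
```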
User Account
[Screenshot: GitHub user profile for Rebecca Bilbro (rebeccabilbro), Washington, DC, joined on Sep 13, 2014. The page shows followers, starred, and following counts, organizations, popular repositories (for example xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards), repositories contributed to (for example DistrictDataLabs/trinket), and the contributions calendar from March through February]
Repo
[Screenshot: GitHub repository page owned by rebeccabilbro, described as "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor, latest commit 382b9ca 4 days ago. The file list (data, figures, .DS_Store, .gitignore, LICENSE, README.md, and several Python files) shows the most recent commit message next to each entry, for example "added method to guess the label column", "added precision recall image", "basic implementation of roc curve plotter", and "Initial commit"]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Mac OSX Terminal
Windows Powershell
Where am I?
Mac OSX Terminal
Windows Powershell
What's my name?
Mac OSX Terminal
Windows Powershell
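The commands for these two slides did not survive in the transcript. The lines below are the standard commands for answering those questions and are offered as an assumption about what was shown. They are written for the Mac Terminal; pwd, whoami, and hostname are also available in PowerShell.

```shell
pwd        # "Where am I?": print the current working directory
whoami     # print the name of the logged-in user
hostname   # print the computer's network name
```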
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Mac OSX Terminal
Windows Powershell
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
Mac OSX Terminal
Windows Powershell
List files and directories
> dir
$ ls
Mac OSX Terminal
Windows Powershell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw's book
Let's use what we've learned
Version ControlA Visualization
COMMERCE DATA SERVICE
Checkout
File
Local Computer
Version Database
Version 3
Version 2
Version 1
1 2
A
3
B
4
C
5 6
Branches and revisions through time - example scenario
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
[Screenshot: GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined Sep 13 2014, showing popular repositories, repositories contributed to, and the contributions calendar]
Repo
[Screenshot: a GitHub repository page, "A tour of ROC curves", on branch master with 19 commits, 1 branch, 0 releases and 1 contributor; the file list (data, figures, .gitignore, LICENSE, README.md and Python source files) shows the most recent commit message and age for each entry]
Command Line
Shifting to the command line
Windows
On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX
For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I?
[Screenshots: Mac OSX Terminal and Windows PowerShell]
What's my name?
[Screenshots: Mac OSX Terminal and Windows PowerShell]
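The commands from these two slides are not in the transcript. Assuming they showed the usual answers to these questions, pwd prints the directory you are in and whoami prints your user name. Both should be available in Terminal and in PowerShell, but treat this as a plausible sketch and not a record of the original slides.

```shell
# "Where am I?": print the current working directory.
pwd

# "What's my name?": print the name of the logged-in user.
whoami
```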
Make a directory
Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Change between directories
Windows PowerShell:
> cd temp
> pwd
Mac OSX Terminal:
$ cd temp
$ pwd
List files and directories
Windows PowerShell:
> dir
Mac OSX Terminal:
$ ls
Make an empty file
Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
[Figure: commands that move or compare data between the working directory, the index, the local repo and the Github repo: revert, diff --cached, fetch, checkout HEAD, compare]
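The compare and revert arrows in a diagram like this can be tried out in a scratch repository. The following is a minimal sketch, assuming a made-up file called notes.txt: git diff compares the working directory with the index, git diff --cached compares the index with the last commit, and git checkout HEAD -- <file> restores a file from the last commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Change the file, then compare each pair of places.
echo "second line" >> notes.txt
unstaged=$(git diff --name-only)            # working directory vs index
git add notes.txt
staged=$(git diff --cached --name-only)     # index vs last commit

# Revert: restore both the index and the working copy from HEAD.
git checkout --quiet HEAD -- notes.txt
after_revert=$(git status --porcelain)      # empty when nothing differs
echo "unstaged: $unstaged, staged: $staged, after revert: '$after_revert'"
```

Note that the checkout step throws away the uncommitted edit, so it is worth running git diff first to see what would be lost.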
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus"), with 20 people and 4 teams, listing repositories such as DataService_WebSite, ITA_Principal_Travel and Commerce_Data_Academy_Courses]
Waffle
[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with Backlog, Ready, In Progress and Done columns holding GitHub issues labeled by type (feature, bug, technical debt), priority and version]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
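One way to make a message helpful is to give it a short subject line plus a sentence of reasoning. The sketch below is an illustration only; the file name and message text are invented. It relies on the fact that git commit accepts several -m options and turns each one into its own paragraph.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "threshold = 0.5" > settings.txt
git add settings.txt

# First -m is the subject line, second -m becomes the body.
git commit --quiet \
    -m "Add default classification threshold" \
    -m "Store the cutoff in settings.txt so teammates can change it without editing code."

git log -1 --format=%s     # subject line only
git log -1 --format=%b     # body only
```

A subject line that says what changed, and a body that says why, is usually enough for a teammate (or future you) scanning the log.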
Why
Why do data scientists need version control?
[Figure: the data science pipeline: Data Ingestion, Data Munging and Wrangling, Computation and Analyses, Modeling and Application, Reporting and Visualization]
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
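A project skeleton following these conventions can be created in a few commands. This is a sketch under assumptions: the project name myproject, the ignore patterns and the single dependency are placeholders, and fixtures is taken here to mean a folder for small sample data files.

```shell
project=$(mktemp -d)/myproject              # hypothetical project name
mkdir -p "$project/fixtures"
cd "$project"

# README.md: the first thing visitors see on the repository page.
printf '# myproject\n\nOne-paragraph description of the project.\n' > README.md

# .gitignore: files git should never track (example patterns only).
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: Python packages the project needs (placeholder entry).
printf 'pandas\n' > requirements.txt

# Git does not track empty folders, so a placeholder file keeps fixtures/ in the repo.
touch fixtures/.gitkeep

ls -A
```

The .gitkeep file is a common naming habit and not a git feature; any file inside the folder would do.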
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
[Figure: local version control: a file is checked out on the local computer from a version database holding Versions 1 to 3]
[Figure: a revision graph with nodes 1 to 6 and A to C]
Branches and revisions through time - example scenario
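A small version of such a scenario can be reproduced locally. The sketch below is illustrative only (the branch name experiment and the file report.txt are made up): work happens on a side branch, the main line is untouched until the merge, and git log --graph draws the resulting history.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "baseline" > report.txt
git add report.txt
git commit --quiet -m "Add baseline report"
trunk=$(git symbolic-ref --short HEAD)      # master or main, depending on configuration

# Do new work on a side branch.
git checkout --quiet -b experiment
echo "new section" >> report.txt
git commit --quiet -am "Draft new section"

# Back on the main line the file is unchanged until we merge.
git checkout --quiet "$trunk"
lines_on_trunk=$(grep -c '' report.txt)
git merge --quiet experiment
lines_after_merge=$(grep -c '' report.txt)

git log --oneline --graph --all
```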
[Figure: branch and commit timeline from a real repository]
Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits?
What are the weaknesses?
Decentralized
What are the benefits?
What are the weaknesses?
Git
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
[Screenshot: the git-scm.com home page, with About, Documentation, Downloads and Community sections]
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores each commit (file snapshots plus metadata)
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
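The three stages can be watched with git status --short, which prints a two-letter code in front of each file name. The sketch below uses a scratch repository and an invented file, data.txt, and walks one change through modified, staged and committed.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > data.txt
git add data.txt
git commit --quiet -m "Add data file"

# Modified: changed in the working directory, not yet staged.
echo "v2 with a correction" > data.txt
modified=$(git status --short)

# Staged: the current state is marked for the next commit.
git add data.txt
staged=$(git status --short)

# Committed: stored in the local object database, nothing left to report.
git commit --quiet -m "Correct data file"
committed=$(git status --short)

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```

In the short format the first column describes the staging area and the second the working directory, which is why the M moves from the second position to the first after git add.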
[Figure: Working Directory, Staging Area, .git directory (Repository)]
Git - Areas/places
Git Commands
git init: create a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for next commit
git commit -m <message>: takes metadata/changes from staging and adds to the object database
Git - Basic Commands
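These four commands can be tried end to end without a network connection. The following is a minimal sketch with invented folder and file names (analysis, hello.py); it clones from a local path, which git accepts in place of a web address.

```shell
work=$(mktemp -d)
mkdir "$work/analysis"
cd "$work/analysis"

git init --quiet                            # creates the .git directory in this folder
git config user.name "Example User"         # placeholder identity for the demo
git config user.email "user@example.com"

echo "print('hello')" > hello.py
git add hello.py                            # stage the new file
git commit --quiet -m "Add hello script"    # record the staged snapshot

# A repository address can be a URL or, as here, a local path.
git clone --quiet "$work/analysis" "$work/analysis-copy"
commit_count=$(git -C "$work/analysis-copy" rev-list --count HEAD)
echo "commits in the clone: $commit_count"
```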
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current branch (the one whose files you see in the working directory)
git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
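The difference between fetch and merge is easiest to see with two collaborators. The sketch below is an illustration under assumptions: a local bare repository stands in for the server, and the names alice, bob and notes.txt are invented. Bob's file only changes at the merge step, not at the fetch step.

```shell
work=$(mktemp -d)
git init --quiet --bare "$work/server.git"              # stands in for the remote server

# Alice clones, commits and pushes the first version.
git clone --quiet "$work/server.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
branch=$(git symbolic-ref --short HEAD)                 # master or main, depending on configuration
echo "line one" > notes.txt
git add notes.txt
git commit --quiet -m "Start shared notes"
git push --quiet origin "$branch"

# Bob clones and receives that first version.
git clone --quiet "$work/server.git" "$work/bob"

# Alice publishes a second commit.
echo "line two" >> notes.txt
git commit --quiet -am "Add second line"
git push --quiet origin "$branch"

# Bob fetches: his object database is updated, his files are not.
cd "$work/bob"
git fetch --quiet origin
lines_after_fetch=$(grep -c '' notes.txt)

# Bob merges: now the working directory catches up.
git merge --quiet "origin/$branch"
lines_after_merge=$(grep -c '' notes.txt)
echo "after fetch: $lines_after_fetch line(s), after merge: $lines_after_merge line(s)"
```

Running git pull origin "$branch" in Bob's folder would have performed the fetch and the merge in one step.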
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
COMMERCE DATA SERVICE
Aug
27 28 5
J
l Branches and revisions through time - actual workflow
Distributed vs Centralized
Centralized
What are the benefits
What are the weaknesses
Decentralized
What are the benefits
What are the weaknesses
Git
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
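The three stages can be watched on a single file. The sketch below is only an illustration, not part of the original slides: the scratch directory, the file name notes.txt, and the user name and email are placeholders chosen for the example.

```shell
# Placeholder scratch repository and identity; nothing here touches your real projects.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Student"
git config user.email "student@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"      # notes.txt is now committed

echo "second draft" >> notes.txt
git status --short                     # " M notes.txt": modified, not staged

git add notes.txt
git status --short                     # "M  notes.txt": staged for the next commit

git commit --quiet -m "Revise notes"
git status --short                     # prints nothing: the change is committed
```

In the short status format the left column describes the staging area and the right column the working directory, which is why the M moves from right to left after git add.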
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for next commit
git commit -m <message>: takes metadata/changes from staging and adds to the object database
Git - Basic Commands
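A minimal sketch of these four commands in sequence, assuming a throwaway directory created with mktemp. The project name, the README.md contents, and the identity are hypothetical; clone is pointed at a local path here only so that the example runs without a network connection.

```shell
work="$(mktemp -d)"
mkdir "$work/project"
cd "$work/project"

git init --quiet                           # creates the .git directory in this folder
git config user.name "Example Student"     # placeholder identity for the example
git config user.email "student@example.com"

echo "# My project" > README.md
git add README.md                          # README.md goes into the index/staging area
git commit --quiet -m "Add README"         # the staged snapshot goes into the object database
git log --oneline                          # one line per commit

# clone accepts a local path as well as a URL
git clone --quiet "$work/project" "$work/project-copy"
ls "$work/project-copy"                    # README.md
```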
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
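The difference between fetch, merge, pull, and push is easier to see with two collaborators. The sketch below is an assumption-laden illustration: a bare repository in a temporary directory stands in for the remote server, and "Alice", "Bob", and report.txt are made-up names. Git may warn that the first clone is of an empty repository; that is expected here.

```shell
base="$(mktemp -d)"
git init --quiet --bare "$base/server.git"      # stands in for the remote server

# Alice publishes the first version.
git clone --quiet "$base/server.git" "$base/alice"
cd "$base/alice"
git config user.name "Alice Example"            # placeholder identity
git config user.email "alice@example.com"
echo "version 1" > report.txt
git add report.txt
git commit --quiet -m "Add report"
branch="$(git rev-parse --abbrev-ref HEAD)"     # "master" or "main", depending on your git setup
git push --quiet origin "$branch"

# Bob clones, then Alice pushes an update.
git clone --quiet "$base/server.git" "$base/bob"
echo "version 2" > report.txt
git commit --quiet -am "Update report to version 2"
git push --quiet origin "$branch"

# Bob fetches: his object database is updated, his files are not.
cd "$base/bob"
git fetch --quiet origin
cat report.txt                                  # version 1
git merge --quiet "origin/$branch"              # now the working directory changes
cat report.txt                                  # version 2

# Alice pushes again; Bob pulls (fetch and merge in one step).
cd "$base/alice"
echo "version 3" > report.txt
git commit --quiet -am "Update report to version 3"
git push --quiet origin "$branch"
cd "$base/bob"
git pull --quiet origin "$branch"
cat report.txt                                  # version 3
```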
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
Github: A Distributed Version Control example
[Diagram: a Server Computer, Computer A, and Computer B each hold a complete Version Database containing Version 1, Version 2, and Version 3]
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
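To see that "origin" is only a name, the sketch below clones a scratch repository and renames the remote. The bare repository and the new name "upstream" are assumptions made for the example; Git may warn that the clone is of an empty repository.

```shell
base="$(mktemp -d)"
git init --quiet --bare "$base/server.git"      # placeholder for a repository on a server
git clone --quiet "$base/server.git" "$base/copy"
cd "$base/copy"

git remote -v                                   # two lines named origin: one for fetch, one for push
git remote rename origin upstream               # any name works
git remote                                      # upstream
```

After the rename, commands that name the remote use the new name, for example git push upstream <branch>.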
User Account
[Screenshot: GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13 2014, showing popular repositories, repositories contributed to, and the contributions calendar]
Repo
[Screenshot: a GitHub repository page owned by rebeccabilbro, described as "A tour of ROC curves", with 19 commits, 1 branch, 0 releases, and 1 contributor; the file list (data, figures, .gitignore, LICENSE, README.md, and Python scripts) shows the latest commit message and age for each entry]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as
PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files" type: powershell
• Hit Enter.
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue "search bar" will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Now you have your Terminal open and it's in your Dock so you can get to it.
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
What's my name
Mac OSX Terminal
Windows Powershell
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Mac OSX Terminal
Windows Powershell
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
Mac OSX Terminal
Windows Powershell
List files and directories
> dir
$ ls
Mac OSX Terminal
Windows Powershell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
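The Terminal side of the slides above can be strung together into one short session. This is a sketch for Mac OSX or Linux only, run in a temporary directory so it leaves nothing behind in your home folder; pwd and hostname are assumed here to be the commands behind "Where am I" and "What's my name", since those screenshots did not survive in the transcript.

```shell
cd "$(mktemp -d)"              # start somewhere disposable
pwd                            # where am I: prints the current directory
hostname                       # what's my name: prints this computer's name

mkdir -p temp/stuff/things     # -p creates the missing parent directories in one step
cd temp
ls                             # stuff

touch iamcool.txt              # make an empty file
ls                             # iamcool.txt and stuff
```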
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes) http://bit.ly/xbus501-workshop-git
[Diagram: working directory, index, local repo, and Github repo, with the operations that move or compare changes between them: revert, diff --cached, fetch, checkout HEAD, compare]
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus", Washington DC), with 20 people and 4 teams, listing repositories including DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy)]
Waffle
[Screenshot: a Waffle.io board for the DistrictDataLabs trinket project with Backlog, Ready, In Progress, and Done columns; each issue card (for example "Dataset Searching", "Async Upload with Celery", "Large files hang uploader") carries priority, type (feature, bug, technical debt), and version labels]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
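One way to make a message more helpful is a short subject line plus a body explaining why. The sketch below assumes a scratch repository; the file upload.csv and the message text are invented for illustration. Passing -m twice is standard Git behaviour: each -m becomes its own paragraph.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Student"          # placeholder identity
git config user.email "student@example.com"

echo "col_a,col_b" > upload.csv                 # hypothetical file
git add upload.csv
git commit --quiet \
  -m "Add sample upload file with header row" \
  -m "The uploader tests need a small CSV to read. This one has a header and no data rows."

git log -1 --format=%s                          # Add sample upload file with header row
git log -1 --format=%b                          # the explanatory paragraph
```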
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
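A rough sketch of scaffolding this layout, assuming fixtures holds sample data and requirements.txt lists Python packages. The file contents, the package name, and the ignore patterns are illustrative choices, not taken from the slides.

```shell
project="$(mktemp -d)"
cd "$project"
git init --quiet

mkdir fixtures
printf '# Project title\n' > README.md
printf 'pandas\n' > requirements.txt                 # hypothetical dependency
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

touch .DS_Store fixtures/sample.csv                  # .DS_Store is the clutter Mac OSX leaves behind
git add .
git status --short                                   # lists the new files; .DS_Store is not among them
```

Because .gitignore lists .DS_Store, git add . skips it, so the file never reaches the repository.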
Where to go from here
Additional Tutorials http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources Git Desktop https://desktop.github.com
TortoiseGit https://tortoisegit.org
Git Cheat Sheet https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Decentralized
What are the benefits?
What are the weaknesses?
Git
COMMERCE DATA SERVICE
git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning-fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
[Screenshot: the Git website home page, with links to About, Documentation, Downloads, and Community]
Installing Git
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
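The three stages can be watched on a single file. This is a rough sketch rather than part of the original slides: the temporary folder, the file name notes.txt, and the "Example User" identity are all placeholders chosen for illustration.

```shell
# Sketch: follow one file through modified -> staged -> committed.
# The folder, file name, and identity below are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes file"      # notes.txt is now committed

echo "second draft" >> notes.txt            # modified: changed, not yet staged
modified=$(git status --short)

git add notes.txt                           # staged: will go into the next commit
staged=$(git status --short)

git commit --quiet -m "Expand notes"        # committed: stored in the object database
committed=$(git status --short)

echo "modified:  '$modified'"
echo "staged:    '$staged'"
echo "committed: '$committed'"
```

In the short status format the first column describes the index and the second the working directory, so the "M" moves from the second column to the first when the file is staged, and the listing is empty once everything is committed.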
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
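The four commands above can be tried end to end without a network connection. The following is a minimal sketch under stated assumptions: the folder names project and project-copy and the identity are invented for the example, and the clone source is a local path standing in for a repository address.

```shell
# Sketch: init, add, commit, then clone the result.
# "project", "project-copy", and the identity are hypothetical names.
set -e
work=$(mktemp -d)
cd "$work"

mkdir project
cd project
git init --quiet                            # start managing this folder
git config user.name "Example User"
git config user.email "user@example.com"
echo "# project" > README.md
git add README.md                           # put the new file in the staging area
git commit --quiet -m "Add README"          # record the staged snapshot

cd "$work"
git clone --quiet project project-copy      # a local path works as a repository address
commits=$(git -C project-copy rev-list --count HEAD)
echo "commits in the clone: $commits"
```

The clone carries the full history, not just the latest files, which is why the commit count can be read from the copy.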
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
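To see how fetch, merge, pull, and push differ, it can help to play both teammates on one machine. This sketch assumes a bare repository in a temporary folder as the "server"; the names server.git, alice, and bob are made up for the illustration.

```shell
# Sketch only: server.git, alice, and bob are made-up names, and the "server"
# is just a bare repository in a temporary folder on this machine.
set -e
base=$(mktemp -d)
cd "$base"
git init --quiet --bare server.git

git clone --quiet server.git alice
git -C alice config user.name "Alice Example"
git -C alice config user.email "alice@example.com"
echo "one" > alice/data.txt
git -C alice add data.txt
git -C alice commit --quiet -m "Add first row of data"
branch=$(git -C alice rev-parse --abbrev-ref HEAD)   # whatever the default branch is called
git -C alice push --quiet origin "$branch"

git clone --quiet server.git bob                     # bob starts with one row

echo "two" >> alice/data.txt
git -C alice commit --quiet -am "Add second row of data"
git -C alice push --quiet origin "$branch"

git -C bob fetch --quiet origin "$branch"            # object database updated, files untouched
rows_after_fetch=$(grep -c . bob/data.txt)
git -C bob merge --quiet "origin/$branch"            # now the working directory changes
rows_after_merge=$(grep -c . bob/data.txt)

echo "three" >> alice/data.txt
git -C alice commit --quiet -am "Add third row of data"
git -C alice push --quiet origin "$branch"
git -C bob pull --quiet origin "$branch"             # fetch and merge in one step
rows_after_pull=$(grep -c . bob/data.txt)

echo "rows seen by bob: $rows_after_fetch $rows_after_merge $rows_after_pull"
```

Bob's file still has one row after the fetch, gains the second row only after the merge, and picks up the third row in a single step with pull.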
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database
where git stores metadata about each commit
Index / Staging Area
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
[Diagram: a server computer and two client computers (Computer A and Computer B), each holding a complete version database containing Version 1, Version 2, and Version 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
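A quick way to confirm that "origin" is only a default name is to clone a repository and rename the remote. The sketch below assumes a bare repository called shared.git in a temporary folder as a stand-in for a GitHub repository; both that name and the new remote name upstream are arbitrary choices.

```shell
# Sketch: "origin" is created by clone and can be renamed freely.
# shared.git and "upstream" are hypothetical names for illustration.
set -e
base=$(mktemp -d)
cd "$base"
git init --quiet --bare shared.git           # stand-in for a repository hosted on GitHub
git clone --quiet shared.git local           # git warns that the repository is empty; that is fine here

default_remote=$(git -C local remote)        # the remote that clone set up
git -C local remote rename origin upstream   # nothing breaks when the name changes
renamed_remote=$(git -C local remote)

echo "after clone:  $default_remote"
echo "after rename: $renamed_remote"
```

After the rename, pushes and pulls would name upstream instead of origin; the repository it points to is unchanged.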
User Account
[Screenshot: GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, showing popular repositories, repositories contributed to (including DistrictDataLabs and CommerceData projects), and the contributions calendar]
Repo
[Screenshot: a GitHub repository page owned by rebeccabilbro, described as "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor, latest commit 382b9ca. The file list shows data, figures, .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, and roc.py, each with its most recent commit message and age]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Mac OSX Terminal
Windows PowerShell
Where am I?
Mac OSX Terminal
Windows PowerShell
What's my name?
Mac OSX Terminal
Windows PowerShell
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
>
Mac OSX Terminal
Windows PowerShell
Change between directories
> cd temp
> pwd
>
$ cd temp
$ pwd
$
Mac OSX Terminal
Windows PowerShell
List files and directories
> dir
>
$ ls
$
Mac OSX Terminal
Windows PowerShell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
>
$ cd temp
$ touch iamcool.txt
$ ls
$
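The directory and file commands above can be strung together in one short session on a Mac or Linux terminal. This is only a sketch: it works in a throwaway folder created with mktemp (an addition not on the slides) so nothing in your home directory is touched, and it uses mkdir -p, which creates any missing parent folders in one step.

```shell
# Sketch for a Mac/Linux terminal; the scratch folder is an addition for safety.
set -e
cd "$(mktemp -d)"                                # throwaway location

mkdir -p temp/stuff/things/frank/joe/alex/john   # -p creates the missing parents too
cd temp                                          # change into the new directory
pwd                                              # where am I?
touch iamcool.txt                                # make an empty file
ls                                               # list files and directories
```

The final ls should list both the stuff folder and the new iamcool.txt file.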
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
[Diagram: working directory, index, local repo, and Github repo, connected by arrows labeled revert, diff --cached, fetch, checkout HEAD, and compare]
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus", Washington DC), with 20 people and 4 teams. Repositories shown include DataService_WebSite (the website for the Commerce Data Service), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy)]
Waffle
[Screenshot: Waffle board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Cards include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "500 error on upload w/ missing col/row values", "AJAXify the uploader", "Sampling technique for bigger datasets", "Feature nomination tool for visualization", "Implement beta auto analysis", "Async Upload with Celery", "Dimension Histograms and Ranking", and "Large files hang uploader", each labeled by type (feature, bug, technical debt), priority, and version]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
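One way to make a message helpful is to pair a short subject line with a sentence of context. The sketch below is an illustration, not a required format: the file sample.csv, the wording of the message, and the identity are all invented. Passing -m twice gives git a subject and a separate body paragraph.

```shell
# Sketch: a commit with a short subject and an explanatory body.
# sample.csv, the message text, and the identity are hypothetical.
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

printf 'id,value\n1,\n' > sample.csv
git add sample.csv
git commit --quiet \
    -m "Add sample CSV with a missing value" \
    -m "Gives teammates a small file for checking how uploads handle empty cells."

subject=$(git log -1 --format=%s)            # first -m becomes the subject line
body=$(git log -1 --format=%b)               # second -m becomes the body
echo "subject: $subject"
echo "body:    $body"
```

The subject is what shows up in one-line history views, so it is worth making it specific enough to stand alone.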
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
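A project skeleton following these conventions takes only a few commands to set up. This is a hedged sketch: the project name my-analysis, the file contents, and the pandas dependency are placeholders, and the .gitkeep file is a common community convention (not a Git feature) for keeping an otherwise empty folder in the repository.

```shell
# Sketch: create the conventional files for a new project.
# my-analysis, the file contents, and the dependency are placeholders.
set -e
project="$(mktemp -d)/my-analysis"
mkdir -p "$project/fixtures"                 # fixtures holds sample or test data
cd "$project"

printf '# my-analysis\n\nWhat this project does and how to run it.\n' > README.md
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore    # files git should not track
printf 'pandas\n' > requirements.txt                      # placeholder Python dependency
touch fixtures/.gitkeep                      # git does not track empty directories

ls -A                                        # -A also shows dotfiles such as .gitignore
```

From here, git init followed by git add and git commit would record the skeleton as the first commit.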
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
COMMERCE DATA SERVICE
bull git --distributed-is-the-new-centralized
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency
Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows
middot middot Learn Git in your browser for free with Try Git
00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material
~ Downloads
p~ Community
GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development
and more
Q Search entire site
Installing Git
COMMERCE DATA SERVICE
Installing on Windows
There are also a few ways to install Git on Windows The most official build is available for download on
the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note
that this is a project called Git for Windows which is separate from Git itself for more information on it go
to httpsllgit-for-windowsgithubiol
Another easy way to get Git installed is by installing GitHub for Windows The installer includes a
command line version of Git as well as the GUI It also works well with Powershell and sets up solid
credential caching and sane CALF settings Well learn more about those things a little later but suffice it
to say theyre things you want You can download this from the GitHub for Windows website at
httpwindowsgithubcom
Installing Git
httpgit-for-windowsgithubio
Installing Git
httpgit-scmcomdownloadmac
Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release 7 April 2005
All metadata is stored in the git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
Committed data is safely stored in your local object database
Staged marked such that the current state of the modified file will be included in the next commit
Modified changed but not staged or committed
Git - ldquoStagesrdquo
COMMERCE DATA SERVICE
Working Directory
Staging Area
git directory (Repository)
Git - Areasplaces
Git Commands
git init create a new git repository to manage the current folder
git clone ltrepository addressgt downloads an existing git repository for the first time
git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit
git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database
Git - Basic Commands
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
[Screenshot: GitHub repository page for a rebeccabilbro repository, "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor, latest commit 382b9ca. Listed items include data, figures, .gitignore, LICENSE, and README.md, each shown with its most recent commit message.]
Command Line
Shifting to the command line
Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files", type powershell.
• Hit Enter.
Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right, the blue search bar will pop up.
• Type terminal.
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock.
Now you have your Terminal open and it's in your Dock so you can get to it.
Mac OSX Terminal
Windows Powershell
Where am I?
Mac OSX Terminal
Windows Powershell
What's my name?
Mac OSX Terminal
Windows Powershell
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Mac OSX Terminal
Windows Powershell
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
Mac OSX Terminal
Windows Powershell
List files and directories
> dir
$ ls
Mac OSX Terminal
Windows Powershell
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
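The Terminal commands from the last few slides can be strung together into one short session. This is a minimal sketch for bash, assuming a scratch directory made with mktemp so nothing in your own folders is touched; the directory and file names are the ones used on the slides.

```shell
# Work in a throwaway directory so the example leaves your own files alone.
workdir="$(mktemp -d)"
cd "$workdir"

# Make nested directories; -p creates any missing parent directories.
mkdir -p temp/stuff/things

# Change into temp and print where we are.
cd temp
pwd

# Make an empty file, then list the contents of the directory.
touch iamcool.txt
ls
```

The listing should show both the new file and the stuff directory created earlier.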
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
[Diagram: the working directory, index, local repo, and GitHub repo, with arrows labeled revert, diff --cached, fetch, checkout HEAD, and compare]
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on GitHub. "A startup within DOC focused on building data products with and for the bureaus." Washington DC; 20 people, 4 teams. Repositories shown: DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy).]
Waffle
[Screenshot: Waffle board for the DistrictDataLabs trinket repository, with Backlog, Ready, In Progress, and Done columns. Cards carry priority and type labels (feature, bug, technical debt) and include Better Licensing, username check, Dataset Searching, Dataset Overwrite, AJAXify the uploader, Async Upload with Celery, and Large files hang uploader.]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
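To make "as helpful as possible" concrete, here is a small sketch of a commit with a short summary line and a body that explains why the change was made. The repository, file name, and message text are invented for illustration; passing -m twice makes git store the second string as a separate paragraph of the message.

```shell
# Throwaway repository so the example does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

# A hypothetical data file to commit.
echo "fpr,tpr" > roc_points.csv
git add roc_points.csv

# First -m: a short summary of what changed.
# Second -m: a body explaining why, for your team and for future you.
git commit --quiet \
  -m "Add header row to ROC points file" \
  -m "The plotting code reads columns by name, so the file needs a header."

# Show the full message of the latest commit.
git log -1 --format=%B
```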
Why
Why do data scientists need version control?
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
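One way to start a project that follows these conventions is to create the four items before writing any code. The sketch below assumes a hypothetical project called myproject; the contents written into each file are placeholders, not a recommended standard.

```shell
# Create the project inside a scratch directory (hypothetical project name).
project="$(mktemp -d)/myproject"
mkdir -p "$project/fixtures"      # fixtures holds small sample or test data
cd "$project"

# README.md: the first thing visitors see on the repository page.
printf '# myproject\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files git should never track (placeholder patterns).
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: Python packages the project depends on (left empty here).
touch requirements.txt

# List everything, including the hidden .gitignore file.
ls -a
```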
Where to go from here
Additional Tutorials: http://pcottle.github.io/learnGitBranching/
http://rogerdudler.github.io/git-guide/
http://www.tutorialspoint.com/git/
Resources: Git Desktop https://desktop.github.com/
TortoiseGit https://tortoisegit.org/
Git Cheat Sheet https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Installing on Windows
There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.
Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website at http://windows.github.com.
Installing Git
http://git-for-windows.github.io
Installing Git
http://git-scm.com/download/mac
Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory
Git - History Lesson
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - "Stages"
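The three stages can be watched directly with git status. The sketch below follows one hypothetical file, notes.txt, through modified, staged, and committed in a throwaway repository; the --short flag prints a two-column code in front of each file name.

```shell
# Throwaway repository for the demonstration.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

# A brand-new file is untracked: status shows "?? notes.txt".
echo "first draft" > notes.txt
git status --short

# Staged for the first time: status shows "A  notes.txt".
git add notes.txt
git status --short

# Committed: the working directory is clean, so status prints nothing.
git commit --quiet -m "Add notes"
git status --short

# Modified but not staged: status shows " M notes.txt".
echo "second draft" >> notes.txt
git status --short

# Staged again: status shows "M  notes.txt".
git add notes.txt
git status --short

# Committed again: clean once more.
git commit --quiet -m "Revise notes"
git status --short
```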
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
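The four commands above can be tried end to end without a network connection, because git clone also accepts a path to a repository on the same machine. This is a sketch under that assumption; the folder names original and copy are made up for the example.

```shell
# Scratch area holding both the original repository and its clone.
base="$(mktemp -d)"
mkdir "$base/original"
cd "$base/original"

# git init: start managing the current folder with git.
git init --quiet
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

# git add: stage a new file for the next commit.
echo "hello" > README.md
git add README.md

# git commit: record the staged snapshot in the object database.
git commit --quiet -m "Initial commit"

# git clone: download the whole repository, history included, into "copy".
cd "$base"
git clone --quiet original copy

# The clone carries the same commit history as the original.
git -C copy log --oneline
```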
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
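The difference between fetch, merge, pull, and push is easiest to see with two people sharing one server. The sketch below fakes that setup on a single machine: a bare repository named server.git stands in for the server, and two clones named alice and bob stand in for teammates. All of these names are assumptions made for the example.

```shell
base="$(mktemp -d)"
cd "$base"

# Seed repository with one commit, then a bare copy that plays the server.
git init --quiet seed
cd seed
git config user.name "Seed User"
git config user.email "seed@example.com"
echo "v1" > data.txt
git add data.txt
git commit --quiet -m "Add data file"
branch="$(git symbolic-ref --short HEAD)"   # default branch name on this machine
cd "$base"
git clone --quiet --bare seed server.git

# Two teammates clone from the server.
git clone --quiet server.git alice
git clone --quiet server.git bob

# Alice commits a change and pushes it to the server.
cd "$base/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v2" > data.txt
git commit --quiet -am "Update data to v2"
git push --quiet origin "$branch"

# Bob fetches: his object database is updated, his files are not.
cd "$base/bob"
git fetch --quiet origin "$branch"
cat data.txt                          # still prints v1

# Bob merges the fetched commits into his working directory.
git merge --quiet "origin/$branch"
cat data.txt                          # now prints v2

# Alice pushes again; this time Bob uses pull (fetch and merge in one step).
cd "$base/alice"
echo "v3" > data.txt
git commit --quiet -am "Update data to v3"
git push --quiet origin "$branch"
cd "$base/bob"
git pull --quiet origin "$branch"
cat data.txt                          # prints v3
```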
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
[Diagram: a server computer and two client computers (Computer A and Computer B), each holding a full version database containing Version 1, Version 2, and Version 3]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
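A quick way to confirm that "origin" is only a default name is to clone a repository, list its remotes, and rename the remote. The sketch below uses a local repository as the thing being cloned; the folder names source and work, and the new remote name upstream, are arbitrary choices for the example.

```shell
base="$(mktemp -d)"
cd "$base"

# A small repository to clone from.
git init --quiet source
cd source
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity
echo "hello" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Cloning creates the "origin" remote automatically.
cd "$base"
git clone --quiet source work
cd work
git remote -v          # lists origin with its fetch and push addresses

# Renaming it works like renaming any other remote.
git remote rename origin upstream
git remote             # now lists upstream
```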
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
[Screenshots: Mac OSX Terminal / Windows PowerShell]
Where am I?
[Screenshots: Mac OSX Terminal / Windows PowerShell]
What’s my name?
[Screenshots: Mac OSX Terminal / Windows PowerShell]
Make a directory
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
[Screenshots: Mac OSX Terminal / Windows PowerShell]
Change between directories
> cd temp
> pwd
$ cd temp
$ pwd
[Screenshots: Mac OSX Terminal / Windows PowerShell]
List files and directories
> dir
$ ls
[Screenshots: Mac OSX Terminal / Windows PowerShell]
Make an empty file
> cd temp
> New-Item iamcool.txt -type file
> dir
$ cd temp
$ touch iamcool.txt
$ ls
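The Terminal commands above can be strung together into one short practice session. This is only a sketch for the Mac OSX (or Linux) shell: the throwaway sandbox directory and the -p flag on mkdir are additions for safety and brevity, not part of the slides.

```shell
# Work in a throwaway directory so nothing in your real folders is touched.
# (The sandbox is an assumption added for this sketch.)
sandbox="$(mktemp -d)"
cd "$sandbox"

pwd    # where am I?

# -p creates any missing parent directories in a single step
mkdir -p temp/stuff/things/frank/joe/alex/john

cd temp            # change into the new directory
pwd                # confirm the move
touch iamcool.txt  # make an empty file
ls                 # list files and directories in temp
```

On Windows PowerShell the equivalents shown on the slides are dir in place of ls and New-Item in place of touch.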
Zed Shaw’s book
Let’s use what we’ve learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
[Diagram: how changes move between the working directory, the index, the local repo, and the Github repo. Labeled operations include revert, diff --cached, fetch, checkout HEAD, and compare.]
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github. Description: "A startup within DOC focused on building data products with and for the bureaus". Washington DC. 20 people, 4 teams.]
Repositories shown:
• DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite): The website for the Commerce Data Service - A startup within the Department of Commerce. Updated 19 hours ago.
• ITA_Principal_Travel (CSS). Updated a day ago.
• Commerce_Data_Academy_Courses: Course materials offered by the Commerce Data Academy. Updated a day ago.
Waffle
[Screenshot: Waffle.io board for DistrictDataLabs/trinket with the columns Backlog, Ready, In Progress, and Done. Cards carry labels such as "Version 0.3", "priority: medium", "type: feature", "type: bug", and "type: technical debt". The Done column notes that issues closed in the last week are shown there and that dragging an issue into it closes the issue.]
Cards visible on the board:
• Better Licensing
• username check
• Dataset Searching
• Dataset Overwrite
• 500 error on upload w/ missing col/row values
• AJAXify the uploader
• 3D tours
• Sampling technique for bigger datasets
• Feature nomination tool for visualization
• Data file uploading
• Research Auto-analysis Feature
• Implement beta auto analysis
• Dropdown Dataset Edit Form
• Async Upload with Celery
• Dimension Histograms and Ranking
• Large files hang uploader
• Upload Error: line contains NULL byte
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
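As a rough illustration of what a helpful message can look like, the sketch below makes one commit in a throwaway repository. The file name, the identity, and the message wording are all hypothetical, chosen only for this example.

```shell
# Throwaway repository so the example does not touch any real project
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"       # hypothetical identity
git config user.email "student@example.com"

echo "draft notes" > notes.txt
git add notes.txt

# A vague message such as "update" tells your team nothing.
# Say what changed (first -m) and why (second -m, stored as a separate paragraph).
git commit -q -m "Add draft notes for the ingest walkthrough" \
              -m "Gives reviewers a starting outline before the code lands."

git log --oneline   # one line per commit: short hash plus the subject line
```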
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
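One way to lay out a new project along these conventions is sketched below. The project name, the file contents, and the idea that fixtures holds small sample data are assumptions made for illustration; the slides only name the four entries.

```shell
# Hypothetical project created under a throwaway directory
project="$(mktemp -d)/myproject"
mkdir -p "$project/fixtures"   # assumed here to hold small sample data files
cd "$project"

# README.md: the first thing visitors see on the repository page
printf '# My Project\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: patterns for files git should never track
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: one Python package per line (lines starting with # are comments)
printf '# list the packages this project needs, one per line\n' > requirements.txt

ls -a   # README.md, .gitignore, fixtures and requirements.txt should all be listed
```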
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching/
http://rogerdudler.github.io/git-guide/
http://www.tutorialspoint.com/git/
Resources
Git Desktop: https://desktop.github.com/
TortoiseGit: https://tortoisegit.org/
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)
Git - Advantages
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in the next commit
Working Directory: the “physical” files on a computer
Git - “Places”
Committed: data is safely stored in your local object database
Staged: marked such that the current state of the modified file will be included in the next commit
Modified: changed but not staged or committed
Git - “Stages”
Working Directory
Staging Area
.git directory (Repository)
Git - Areas/places
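To watch a single file pass through the modified, staged, and committed stages, a small sketch like the following can help. It runs in a throwaway repository; the file name and identity are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"       # hypothetical identity
git config user.email "student@example.com"

# Start with one committed file
echo "first line" > data.txt
git add data.txt
git commit -q -m "Add data.txt"

# Modified: the change exists only in the working directory
echo "second line" >> data.txt
git status --short   # prints " M data.txt"

# Staged: a snapshot of the change now sits in the index
git add data.txt
git status --short   # prints "M  data.txt"

# Committed: the snapshot is stored in the local object database
git commit -q -m "Append second line to data.txt"
git status --short   # prints nothing because the working directory is clean
```

In the short status format the M moves from the second column (working directory) to the first column (index) once the file is staged.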
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
git commit -m <message>: takes metadata/changes from staging and adds them to the object database
Git - Basic Commands
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the current working directory (which is the manifestation of another branch)
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called “remotes”
Github: A Distributed Version Control example
[Diagram: a server computer holds a version database containing Versions 1, 2, and 3. Computer A and Computer B each hold their own full copy of that version database, also containing Versions 1, 2, and 3.]
The “origin” remote is automatically created when you clone.
It is the default remote to use for pushing and pulling.
There is nothing special about “origin”; it is just a default name.
Git - “Origin”
User Account
[Screenshot: Github profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014. 17 followers, 11 starred, 39 following. Tabs: Contributions, Repositories, Public activity. The contributions calendar runs from Mar through Feb.]
Popular repositories:
• xbus-503-ipython-demos: Demonstration code for XBUS-503 Data Wran...
• calendar: Building a simple Python application - Calenda...
• capstone: Capstone project as part of Data Analysis certi...
• Colonials: GT Colonials
• dashboards: Responsive dashboard templates for Bootstrap
Repositories contributed to:
• DistrictDataLabs/Blogs: Data Science related blogs for DDL
• CommerceData/recordtagger: NOAA metadata record tagger that implement...
• CommerceData/newexporters: building a predictive model for new exporters
• DistrictDataLabs/trinket: Multidimensional data explorer and visualizatio...
• georgetown-an... sql-tutorial: A brief tutorial on SQL with Python (using SQL...
Working Directory
Staging Area
git directory (Repository)
Git - Areas/places
Git Commands
git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository for the first time
git add <file path>: marks individual or modified files to be added to the index (staging area) for the next commit
git commit -m <message>: takes the changes and metadata from staging and adds them to the object database
Git - Basic Commands
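The four commands above can be tried end to end in a scratch folder. This is a sketch under assumptions: the folder names, file name, and identity are invented, and the "existing repository" being cloned is just the local one created a few lines earlier rather than one on Github.

```shell
work=$(mktemp -d)
cd "$work"
mkdir project
cd project

# git init: start managing this folder.
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# A new file exists only in the working directory at first.
echo "hello" > notes.txt
untracked=$(git status --short)   # "?? notes.txt"

# git add: copy a snapshot of the file into the staging area.
git add notes.txt
staged=$(git status --short)      # "A  notes.txt"

# git commit: record the staged snapshot in the repository.
git commit -q -m "Add notes.txt"
clean=$(git status --short)       # empty: nothing left to commit

# git clone: make a full copy of an existing repository.
cd ..
git clone -q project project-copy
ls project-copy
```

Running git status between steps is a quick way to see which of the three areas a file currently sits in.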
git fetch <server> <branch>: updates your object database but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the branch currently checked out in your working directory
git pull <server> <branch>: performs a fetch and then merges those changes into your working directory
git push <server> <branch>: sends your latest branch commits to the remote server
Git - Basic Commands
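A sketch of how these four commands relate, using a local bare repository as a stand-in for the remote server (an assumption made so the example runs without a Github account). The names alice and bob, the file report.txt, and the commit messages are hypothetical.

```shell
base=$(mktemp -d)
cd "$base"

# The "server": a bare repository with no working directory of its own.
git init -q --bare server.git

# Alice clones it (the warning about an empty repository is hidden),
# commits a first version, and pushes it.
git clone -q server.git alice 2>/dev/null
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "version 1" > report.txt
git add report.txt
git commit -q -m "Add first version of report"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
git push -q origin "$branch"
cd ..

# Bob clones after the first push, so he starts with version 1.
git clone -q server.git bob

# Alice publishes a second version.
cd alice
echo "version 2" > report.txt
git commit -q -am "Update report to version 2"
git push -q origin "$branch"
cd ../bob

# fetch downloads the new commit but leaves Bob's files alone.
git fetch -q origin "$branch"
after_fetch=$(cat report.txt)
# merge applies the fetched commits to Bob's current branch.
git merge -q "origin/$branch"
after_merge=$(cat report.txt)

# Alice publishes a third version; Bob uses pull, which is fetch plus merge.
cd ../alice
echo "version 3" > report.txt
git commit -q -am "Update report to version 3"
git push -q origin "$branch"
cd ../bob
git pull -q origin "$branch"
after_pull=$(cat report.txt)

echo "$after_fetch / $after_merge / $after_pull"
```

Fetching first and merging second gives you a chance to look at incoming changes before they touch your files; pull is the shortcut when you do not need that pause.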
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
A remote git repository
A website
provides secure access
provides repository metadata & reports
provides tools for development teams
Launched April 10, 2008
~10 million users in 2015
Non-local git repositories are called "remotes"
Object Database: where git stores metadata about each commit
Index / Staging Area: file snapshots to be included in next commit
Working Directory: the "physical" files on a computer
Git - "Places"
Github: A Distributed Version Control example
[Diagram: a Server Computer and two clients, Computer A and Computer B, each holding a complete Version Database containing Version 1, Version 2, and Version 3.]
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
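A small sketch showing that "origin" is only a name. It assumes a local bare repository standing in for Github; the names upstream.git, copy, and github are arbitrary choices for this example.

```shell
base=$(mktemp -d)
cd "$base"

# A bare repository plays the role of the remote.
git init -q --bare upstream.git

# Cloning creates a remote called "origin" automatically
# (the warning about cloning an empty repository is hidden).
git clone -q upstream.git copy 2>/dev/null
cd copy
remotes=$(git remote)
echo "$remotes"

# The name can be changed like any other label.
git remote rename origin github
renamed=$(git remote)
echo "$renamed"
```

After the rename, pushing and pulling work the same way, just with the new name in place of origin.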
User Account
[Screenshot: Github profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, with 17 followers, 11 starred, and 39 following. Tabs: Contributions, Repositories, Public activity. Popular repositories: xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards. Repositories contributed to: DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket, and sql-tutorial. Below is the contributions calendar (Mar through Feb), a summary of pull requests, issues opened, and commits.]
Repo
[Screenshot: a rebeccabilbro repository on Github described as "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor; branch master; latest commit 382b9ca, 4 days ago. Files and folders with their most recent commit messages:]
data: starting to flesh out bulk ingest method for UGI data (16 days ago)
figures: added precision recall image (19 days ago)
.DS_Store: basic implementation of roc curve plotter (9 days ago)
.gitignore: basic implementation of roc curve plotter (9 days ago)
LICENSE: Initial commit (19 days ago)
README.md: added plotting template to readme (9 days ago)
classi.py: added method to guess the label column (4 days ago)
ingest.py: added randomizer to ingest (9 days ago)
roc.py: basic implementation of roc curve plotter (9 days ago)
Command Line
Shifting to the command line
Windows
On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX
For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue "search bar" will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I?
[Screenshots: Mac OSX Terminal and Windows Powershell]
What's my name?
[Screenshots: Mac OSX Terminal and Windows Powershell]
Make a directory
Windows Powershell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Change between directories
Windows Powershell:
> cd temp
> pwd
Mac OSX Terminal:
$ cd temp
$ pwd
List files and directories
Windows Powershell:
> dir
Mac OSX Terminal:
$ ls
Make an empty file
Windows Powershell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
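For anyone who wants to see a conflict before the workshop, here is a minimal sketch that produces one on purpose and then resolves it. It is not the workshop exercise itself; the branch name teammate, the file settings.txt, and its contents are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Shared starting point.
echo "color: blue" > settings.txt
git add settings.txt
git commit -q -m "Add settings"
main=$(git symbolic-ref --short HEAD)

# A teammate changes the line on their own branch.
git checkout -q -b teammate
echo "color: green" > settings.txt
git commit -q -am "Use green"

# Meanwhile the same line changes differently on the main branch.
git checkout -q "$main"
echo "color: red" > settings.txt
git commit -q -am "Use red"

# Git cannot choose between the two edits, so the merge stops.
git merge teammate >/dev/null 2>&1 || echo "conflict detected"
markers=$(grep -c '^<<<<<<<' settings.txt)   # conflict markers now in the file

# Resolve by editing the file to the version you want, then add and commit.
echo "color: red" > settings.txt
git add settings.txt
git commit -q -m "Merge teammate branch, keep red"
commits=$(git rev-list --count HEAD)
echo "$commits commits after the merge"
```

The markers that git writes into the file show both versions side by side; resolving a conflict means replacing that marked section with the text you actually want.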
[Diagram: four places shown side by side (working directory, index, local repo, Github repo) with arrows between them labeled revert, diff --cached, fetch, checkout HEAD, and compare.]
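Two of the arrows in the diagram can be tried in a scratch repository. This is a sketch under assumptions (invented file name and identity): diff --cached compares the index with the last commit, and checkout HEAD restores a file from the last commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# Change the file and stage the change.
echo "two" >> file.txt
git add file.txt

# diff --cached: what is staged compared with the last commit.
staged=$(git diff --cached --name-only)
# Plain diff: what is in the working directory but not yet staged.
unstaged=$(git diff --name-only)

# checkout HEAD -- <file>: restore the file in both the index and
# the working directory from the last commit, discarding the change.
git checkout -q HEAD -- file.txt
restored=$(cat file.txt)
echo "$restored"
```

Note that checkout HEAD on a file throws the uncommitted change away, so it is worth running a diff first to see what would be lost.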
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus"), Washington DC, http://www.commerce.gov, with 20 people and 4 teams. Repositories shown: DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite; "The website for the Commerce Data Service - A startup within the Department of Commerce", updated 19 hours ago), ITA_Principal_Travel (CSS, updated a day ago), and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy", updated a day ago).]
Waffle
COMMERCE DATA SERVICE
DistricDatalabstrinket
Backlog
S6
Better Licensing
- typefeature
SS
username check
priority medium type bug
so Dataset Searching
priority medium type feature
Dataset Overwrite
- type technicaldebt
4S
500 error on upload w missing col row values
AJAXify the uptoader
priority medium type featurC
middotmiddotOM
0
0
0 ~
0 ~ () ~
() ~ bull38
3Dtours
0 ~
37
Sampling technique for bigger datasets
0 ~
Feature nomination tool for visualization
Ready In Progress Done bull S4 14
Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10
column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form
lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature
~ Async Upload with Celery
IVersion 03 I priority medium type feature middotbull 13
Dimension Histograms and Ranking 10
IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader
type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte
IVersiono3 IBlll type bug () ~ 36
Pair programmingMake your own waffle
CommunicationCommit Messages
git commit -m ldquotry to be as helpful as possiblerdquo
(To your team and to future you)
Why
Why do data scientists need version control
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into thedata science pipeline
Folder structure conventions on Github
READMEmd
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
git fetch ltservergt ltbranchgt updates your object database but does not change the working directory
git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)
git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory
git push ltservergt ltbranchgt sends your latest branch commits to the remote server
Git - Basic Commands
Git Challenge (20 minutes)
httpstrygithubiolevels1challenges1
Github
COMMERCE DATA SERVICE
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
Windows
On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files", type powershell.
• Hit Enter.
Mac OSX
For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right, the blue search bar will pop up.
• Type terminal.
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I?
Screenshots: Mac OSX Terminal and Windows PowerShell.
What's my name?
Screenshots: Mac OSX Terminal and Windows PowerShell.
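The commands for these two slides live in the screenshots. A minimal sketch, assuming they are the standard pwd and hostname commands (both run in the Mac OSX Terminal, and PowerShell accepts them too); the variable names are only for illustration:

```shell
# Where am I? Print the directory the shell is currently in.
pwd

# What's my name? Print the name of this computer.
hostname

# Keep both answers in (hypothetical) variables so they can be reused.
current_dir=$(pwd)
machine_name=$(hostname)
echo "I am in $current_dir on $machine_name"
```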
Make a directory
Screenshots: Mac OSX Terminal and Windows PowerShell.
Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
>
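On the Mac OSX Terminal the same nested folders can be made in one step. A minimal sketch, assuming a Unix-style shell where mkdir accepts the -p flag (create any missing parent directories); the scratch directory from mktemp is an assumption added so the example does not clutter your home folder:

```shell
# Work in a throwaway directory (assumption: mktemp is available, as on Mac OSX and Linux).
scratch=$(mktemp -d)
cd "$scratch"

# -p creates temp, temp/stuff, temp/stuff/things, and so on in a single command.
mkdir -p temp/stuff/things/frank/joe/alex/john

# Show every directory that was created, one per line.
find temp -type d
```

Without -p, mkdir refuses to create a folder whose parent does not exist yet, which is why the slide builds the path one level at a time.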
Change between directories
Windows PowerShell:
> cd temp
> pwd
>
Mac OSX Terminal:
$ cd temp
$ pwd
$
List files and directories
Windows PowerShell:
> dir
>
Mac OSX Terminal:
$ ls
$
Make an empty file
Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir
>
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
$
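Putting the last few slides together on the Mac OSX Terminal: a minimal sketch that makes a directory, moves into it, creates the empty file, and lists it. The scratch directory and the variable names are assumptions for illustration.

```shell
# Start from a throwaway directory so nothing in your real folders is touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir temp          # make a directory
cd temp             # change into it
pwd                 # confirm where we are
touch iamcool.txt   # make an empty file
listing=$(ls)       # list files and keep the result
echo "$listing"
```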
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
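Diagram: Git commands that move or compare changes between the working directory, the index, the local repo, and the Github repo. Labels shown: revert, diff --cached, fetch, checkout HEAD, compare.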
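Two of the commands named in the diagram can be tried in a scratch repository. A minimal sketch, assuming Git is installed; the file name notes.txt, the commit message, and the name and email passed with -c are all made up for illustration:

```shell
# Create a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Working directory -> index: stage a new file.
echo "first line" > notes.txt
git add notes.txt

# diff --cached compares the index with the last commit (here: nothing yet).
staged=$(git diff --cached --name-only)

# Index -> local repo: record the staged snapshot.
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "Add notes file"

# Make an unwanted edit, then restore the file from the last commit.
echo "scratch edit" >> notes.txt
git checkout HEAD -- notes.txt
restored=$(cat notes.txt)
echo "staged: $staged / restored content: $restored"
```

fetch and revert are left out because fetch needs a remote to talk to and revert needs an earlier commit to undo.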
Teamwork (makes the dream work)
Organization
Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus"; Washington, DC; http://www.commerce.gov; data@doc.gov; 20 people, 4 teams). Repositories shown:
• DataService_WebSite (JavaScript), forked from timwood/DataCorps_WebSite: the website for the Commerce Data Service, a startup within the Department of Commerce. Updated 19 hours ago.
• ITA_Principal_Travel (CSS). Updated a day ago.
• Commerce_Data_Academy_Courses: course materials offered by the Commerce Data Academy. Updated a day ago.
Waffle
Screenshot: the Waffle.io board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress, and Done columns. Issues closed in the last week are shown in the Done column; drag issues there to close them. Cards visible on the board:
• Better Licensing (type: feature)
• username check (priority: medium, type: bug)
• Dataset Searching (priority: medium, type: feature)
• Dataset Overwrite (type: technical debt)
• 500 error on upload w/ missing col/row values
• AJAXify the uploader (priority: medium, type: feature)
• 3D tours
• Sampling technique for bigger datasets
• Feature nomination tool for visualization
• Data file uploading (Version 0.3, type: feature)
• Research Auto-analysis Feature (Version 0.3, priority: medium, question, task)
• Implement beta auto analysis (Version 0.3, type: feature)
• Dropdown Dataset Edit Form (Version 0.3, priority: medium, type: feature)
• Async Upload with Celery (Version 0.3, priority: medium, type: feature)
• Dimension Histograms and Ranking (Version 0.3, priority: medium, type: feature)
• Large files hang uploader (Version 0.3, type: bug)
• Upload Error: line contains NULL byte (Version 0.3, type: bug)
Pair programming
Make your own waffle
Communication
Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
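One way to make a commit message helpful is a short subject line plus a sentence on why the change was made. A minimal sketch, assuming Git is installed; the file, the wording of the message, and the name and email passed with -c are hypothetical:

```shell
# Throwaway repository with one staged change.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "threshold = 0.5" > settings.txt
git add settings.txt

# The first -m is the subject line; the second -m becomes the body paragraph.
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q \
    -m "Add default classification threshold" \
    -m "Stores the cutoff in settings.txt so teammates can tune it without editing code."

# Read the message back the way a teammate (or future you) would.
subject=$(git log -1 --format=%s)
git log -1 --format=%B
```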
Why
Why do data scientists need version control?
Where does version control fit into the data science pipeline?
• Data Ingestion
• Data Munging and Wrangling
• Computation and Analyses
• Modeling and Application
• Reporting and Visualization
Folder structure conventions on Github
• README.md
• .gitignore
• fixtures
• requirements.txt
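The four names above can be laid down in a new project folder from the command line. A minimal sketch, assuming a Python project; the project name myproject and everything written into the files are placeholders, not recommendations from the slides:

```shell
# Create the project folder (inside a throwaway directory) with a fixtures subfolder.
project="$(mktemp -d)/myproject"
mkdir -p "$project/fixtures"
cd "$project"

# README.md: what the project is and how to run it.
printf '# My Project\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files Git should never track.
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: the Python packages the project depends on.
printf 'numpy\npandas\n' > requirements.txt

# -A also lists names that start with a dot, such as .gitignore.
ls -A
```

Git only tracks files, so fixtures will show up in a repo once it holds at least one file.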
Where to go from here
Additional Tutorials
• http://pcottle.github.io/learnGitBranching
• http://rogerdudler.github.io/git-guide
• http://www.tutorialspoint.com/git
Resources
• GitHub Desktop: https://desktop.github.com
• TortoiseGit: https://tortoisegit.org
• Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
• Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
• Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
• Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
• Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
• Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Git Challenge (20 minutes)
https://try.github.io/levels/1/challenges/1
Github
• A remote git repository
• A website:
  • provides secure access
  • provides repository metadata & reports
  • provides tools for development teams
• Launched April 10, 2008
• ~10 million users in 2015
Git - "Places"
• Non-local git repositories are called "remotes"
• Object Database: where git stores metadata about each commit
• Index / Staging Area: file snapshots to be included in next commit
• Working Directory: the "physical" files on a computer
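The same file can be watched as it moves through these places. A minimal sketch, assuming Git is installed; readme.txt, the commit message, and the name and email passed with -c are made up for illustration:

```shell
# Throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The file exists only in the working directory: Git reports it as untracked (??).
echo "hello" > readme.txt
before=$(git status --short)

# After git add, a snapshot of the file sits in the index (A = added).
git add readme.txt
after=$(git status --short)

# After git commit, the snapshot is recorded in the object database.
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "Add readme"
object_type=$(git cat-file -t HEAD)

echo "$before"
echo "$after"
echo "$object_type"
```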
Github: A Distributed Version Control example
Diagram: the server computer holds a version database (Version 1, Version 2, Version 3), and Computer A and Computer B each hold their own full copy of that version database (Version 1, Version 2, Version 3).
Git - "Origin"
• The "origin" remote is automatically created when you clone.
• It is the default remote to use for pushing and pulling.
• There is nothing special about "origin"; it is just a default name.
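This can be checked without a Github account by cloning from a local folder. A minimal sketch, assuming Git 2.7 or later is installed; the bare repository shared.git stands in for a repository hosted on Github, and the names shared.git, work, and upstream are hypothetical:

```shell
# A bare repository plays the role of the remote server.
base=$(mktemp -d)
git init -q --bare "$base/shared.git"

# Cloning creates the "origin" remote automatically
# (stderr is silenced because Git warns that the clone is empty).
git clone -q "$base/shared.git" "$base/work" 2>/dev/null
cd "$base/work"
remotes=$(git remote)
origin_url=$(git remote get-url origin)
echo "$remotes -> $origin_url"

# "origin" is only a default name, so it can be renamed like any other remote.
git remote rename origin upstream
renamed=$(git remote)
echo "$renamed"
```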
User Account
Screenshot: the GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington, DC, joined on Sep 13, 2014 (17 followers, 11 starred, 39 following), with a contributions calendar running from March to February that summarizes pull requests, issues opened, and commits.
Popular repositories:
• xbus-503-ipython-demos: demonstration code for XBUS-503
• calendar: building a simple Python application
• capstone: capstone project as part of a data analysis certificate
• Colonials
• dashboards: responsive dashboard templates for Bootstrap
Repositories contributed to:
• DistrictDataLabs/Blogs: data science related blogs for DDL
• CommerceData/recordtagger: NOAA metadata record tagger
• CommerceData/newexporters: building a predictive model for new exporters
• DistrictDataLabs/trinket: multidimensional data explorer and visualization
• sql-tutorial (owner name cut off after "georgetown-an"): a brief tutorial on SQL with Python
Github
A remote git repository
A website
provides secure access
provides repository metadata amp reports
provides tools for development teams
Launched April 10 2008
~10 million users in 2015
COMMERCE DATA SERVICE
0 bull
0 0 bull bull
Non-local git repositories are called ldquoremotesrdquo
Object Database
where git stores metadata about each commit
Index Staging Area
file snapshots to be included in next commit
Working Directory
the ldquophysicalrdquo files on a computer
Git - ldquoPlacesrdquo
COMMERCE DATA SERVICE
Server Computer
Version Database
Version 3
Version 2
Version 1
Computer A Computer B
Version Database Version Database
Version 3 Version 3
Version 2 Version 2
Version 1 Version 1
Github A Distributed Version Control example
The ldquooriginrdquo remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about ldquooriginrdquo it is just a default name
Git - ldquoOriginrdquo
User Account
bull bullbull bull bullbull bull bull bull
bull bull bull bull bull bull bull bull bull
bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull
bull bull
COMMERCE DATA SERVICE
0 Search GitHub
Rebecca Bilbro rebeccabilbro
) Washington DC
C9 Joined on Sep 13 2014
17 11 39 Followers Starred Following
Organizations
MObullu O
Pull requests Issues Gist
Edit profile[plusmn]Contributions Q Repositories 3 Public activity
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
Windows
On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX
For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I?
[Screenshots: Mac OSX Terminal and Windows PowerShell]
What's my name?
[Screenshots: Mac OSX Terminal and Windows PowerShell]
Make a directory
Windows PowerShell
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
Change between directories
Windows PowerShell
> cd temp
> pwd
Mac OSX Terminal
$ cd temp
$ pwd
List files and directories
Windows PowerShell
> dir
Mac OSX Terminal
$ ls
Make an empty file
Windows PowerShell
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal
$ cd temp
$ touch iamcool.txt
$ ls
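The Terminal commands from the last few slides can be run together as one short session. This is a sketch for the Mac OSX Terminal (bash) only; the scratch folder made with mktemp and the variable name workdir are our additions, so that nothing in your home folder is touched.

```shell
# Work in a throwaway folder (workdir is a name chosen for this sketch).
workdir="$(mktemp -d)"
cd "$workdir"

pwd                            # Where am I?

mkdir temp                     # make a directory
mkdir -p temp/stuff/things     # -p also creates any missing parent directories

cd temp                        # change into the new directory
touch iamcool.txt              # make an empty file
ls                             # list files and directories
```

When you are done, the whole scratch folder can simply be deleted.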
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes) http://bit.ly/xbus501-workshop-git
[Diagram: four places (working directory, index, local repo, GitHub repo) connected by arrows labeled revert, diff --cached, fetch, checkout HEAD, and compare]
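Two of the comparisons in the diagram can be tried in a scratch repository. This is a minimal sketch, not part of the original workshop: the file name data.txt, the variable cmp_repo, and the placeholder identity are all made up for illustration. fetch is left out because it needs a remote.

```shell
# Scratch repository (cmp_repo is a name chosen for this sketch).
cmp_repo="$(mktemp -d)"
cd "$cmp_repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "one" > data.txt
git add data.txt
git commit -q -m "Add data file with first line"

echo "two" >> data.txt
git add data.txt             # "two" is now staged in the index
echo "three" >> data.txt     # "three" exists only in the working directory

git diff --cached            # index vs. last commit: the added line "two"
git diff                     # working directory vs. index: the added line "three"

git checkout HEAD -- data.txt    # restore the committed version in both places
cat data.txt
```

Note that the last command throws away both the staged and the unstaged change, so use it with care on real work.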
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus"; Washington DC; 20 people, 4 teams), listing the repositories DataService_WebSite (the website for the Commerce Data Service), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy)]
Waffle
[Screenshot: Waffle board for the DistrictDataLabs trinket project with Backlog, Ready, In Progress, and Done columns. Cards include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "500 error on upload w missing col row values", "AJAXify the uploader", "Sampling technique for bigger datasets", "Feature nomination tool for visualization", "Async Upload with Celery", "Dimension Histograms and Ranking", and "Large files hang uploader", each tagged with priority, type (feature, bug, technical debt), and version labels]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
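To make the point concrete, here is a small sketch contrasting a vague message with a helpful one. The scratch repository, the variable msg_repo, the README text, and the identity are placeholders invented for this example.

```shell
# Scratch repository (msg_repo is a name chosen for this sketch).
msg_repo="$(mktemp -d)"
cd "$msg_repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "Notes on plotting ROC curves" > README.md
git add README.md

# Vague: tells your team (and future you) nothing.
#   git commit -m "stuff"
# Helpful: says what changed, in words someone else can search for.
git commit -q -m "Add README describing the ROC plotting demo"

git log --oneline            # the message is what shows up in the history
```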
Why
Why do data scientists need version control?
Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into the data science pipeline?
Folder structure conventions on Github
README.md
.gitignore
fixtures
requirements.txt
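One way to lay out such a project skeleton from the command line is sketched below. The project name myproject, the file contents, and the idea that fixtures holds sample or test data are assumptions made for illustration; the slide only lists the four names.

```shell
# Scratch location (project is a name chosen for this sketch).
project="$(mktemp -d)/myproject"
mkdir -p "$project/fixtures"      # assumed here to hold sample or test data
cd "$project"

# README.md: the first thing visitors see on the repository page.
printf '# myproject\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files git should never track (these patterns are examples).
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: Python packages the project depends on, one per line.
touch requirements.txt

ls -A                             # -A also shows dotfiles such as .gitignore
```

Ignoring .DS_Store from the start keeps Mac OSX Finder metadata out of the repository.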
Where to go from here?
Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
Git Desktop https://desktop.github.com
TortoiseGit https://tortoisegit.org
Git Cheat Sheet https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Non-local git repositories are called "remotes"
Object Database
where git stores metadata about each commit
Index (Staging Area)
file snapshots to be included in next commit
Working Directory
the "physical" files on a computer
Git - "Places"
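The three places can be watched in action by following one file through them. This is a minimal sketch in a scratch repository; the file name notes.txt, the variable places_repo, and the identity are placeholders invented for the example.

```shell
# Scratch repository (places_repo is a name chosen for this sketch).
places_repo="$(mktemp -d)"
cd "$places_repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first draft" > notes.txt    # the file exists only in the working directory
git status --short                # ?? notes.txt   (untracked)

git add notes.txt                 # a snapshot is copied into the index
git status --short                # A  notes.txt   (staged for the next commit)

git commit -q -m "Add first draft of notes"   # recorded in the object database
git status --short                # no output: all three places agree
```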
[Diagram: a server computer holding a version database (Versions 1, 2, and 3), with Computer A and Computer B each holding its own full copy of the version database (Versions 1, 2, and 3)]
Github: A Distributed Version Control example
The "origin" remote is automatically created when you clone
It is the default remote to use for pushing and pulling
There is nothing special about "origin"; it is just a default name
Git - "Origin"
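You can see "origin" appear without touching GitHub at all. In this sketch a local bare repository stands in for the GitHub repo; the paths, the names shared.git and clone, and the variable base are made up for the example. Cloning an empty repository prints a harmless warning.

```shell
# Scratch location (base is a name chosen for this sketch).
base="$(mktemp -d)"

# A bare repository plays the role of the repo hosted on GitHub.
git init -q --bare "$base/shared.git"

# Cloning creates a local repo and records where it came from.
git clone -q "$base/shared.git" "$base/clone"
cd "$base/clone"

git remote        # lists remote names: the one created by clone is "origin"
git remote -v     # shows the fetch and push URLs behind that name
```

Because "origin" is only a default name, `git remote rename` can give it any other name you prefer.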
User Account
[Screenshot: GitHub profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014; 17 followers, 11 starred, 39 following; tabs for Contributions, Repositories, and Public activity]
v Popular repositories Repositories contributed to
xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o
v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL
calendar Q CommerceData recordtagger o o
Building a simple Python application - Calenda NOAA metadata record tagger that implement
v capstone 8 CommerceData newexporters o o
v Capstone project as part of Data Analysis certi building a predictive model for new exporters
Colonials Q DistrictDataLabsltrinket o 3
v GT Colonials Multidimensional data explorer and visualizatio
dashboards Q georgetown-an sql-tutorial o 1
Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL
Contributions
Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb
M bull bull bull bull bullbull bull
bullbullbullbullbullbullw bull bull bullbullbull bull bull
Summary of pull requests issues opened and commits Learn how we count contributions Less bull More
Repo
COMMERCE DATA SERVICE
0 This repository Search Pull requests Issues Gist
iJ rebeccabilbro I orlo 0Unwatchbull Star VFork
ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings
A tour of ROC curves - Edit
iLl 19 commits ii 1 branch V O releases 1 contributor
Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP
bull rebeccabilbro added method to guess the label column
ii data starting to flesh out bulk ingest method for UGI data
ii figures added precision recall image
~ DS_Store basic implementation of roe curve plotter
~ gitignore basic implementation of roe curve plotter
~ LICENSE Initial commit
~ READMEmd added plotting template to readme
~ classipy added method to guess the label column
~ ingestpy added randomizer to ingest
~ rocpy basic implementation of roe curve plotter
Latest commit 382b9ca 4 days ago
16 days ago
19 days ago
9 days ago
9 days ago
19 days ago
9 days ago
4 days ago
9 days ago
9 days ago
lillJ READMEmd
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
[Diagram: how changes move between the working directory, the index, the local repo, and the Github repo. Groupings shown: Revert, Compare. Commands shown: diff --cached, fetch, checkout HEAD]
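The commands named in the diagram can be tried in a scratch repository. The following is a rough sketch under stated assumptions: the file name `notes.txt`, the placeholder identity, and the temporary directory are all invented for illustration, and `git fetch` is left out because it needs a remote such as a Github repo.

```shell
# Sketch only: a throwaway repository with placeholder names.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder identity so the commit works on a fresh machine.
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt                             # working directory -> index
git diff --cached --stat                      # compare: what is staged for the next commit
git commit -q -m "add first draft of notes"   # index -> local repo

echo "oops" >> notes.txt                      # an unwanted edit in the working directory
git checkout HEAD -- notes.txt                # revert: restore the last committed version
cat notes.txt                                 # prints the committed content again
```

After the `checkout HEAD` step the unwanted edit is gone, because the file is restored from the most recent commit in the local repo.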
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github]
Commerce Data Service: A startup within DOC focused on building data products with and for the bureaus
Washington DC, http://www.commerce.gov, data@doc.gov
People: 20. Teams: 4.
Repositories shown:
• DataService_WebSite (JavaScript), forked from timwood/DataCorps_WebSite. The website for the Commerce Data Service, a startup within the Department of Commerce. Updated 19 hours ago.
• ITA_Principal_Travel (CSS). Updated a day ago.
• Commerce_Data_Academy_Courses. Course materials offered by the Commerce Data Academy. Updated a day ago.
Waffle
[Screenshot: Waffle board for DistrictDataLabs/trinket, with columns Backlog, Ready, In Progress, and Done. Cards carry labels such as priority: medium, type: bug, type: feature, type: technical debt, and Version 0.3]
Backlog:
• Better Licensing
• username check
• Dataset Searching
• Dataset Overwrite
• 500 error on upload w/ missing col/row values
• AJAXify the uploader
• 3D tours
• Sampling technique for bigger datasets
• Feature nomination tool for visualization
Ready, In Progress, and Done (the screenshot text does not preserve which card sits in which column):
• Data file uploading
• Research Auto-analysis Feature
• Implement beta auto analysis
• Dropdown
• Dataset Edit Form
• Async Upload with Celery
• Dimension Histograms and Ranking
• Large files hang uploader
• Upload Error: line contains NULL byte
Done: Issues closed in the last week are shown in this column. Drag issues here to close them.
Pair programming
Make your own waffle
Communication
Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
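As one illustration of a helpful message, `git commit` accepts more than one `-m` flag: the first becomes the short subject line and each later one becomes a separate paragraph of the body. The sketch below is hypothetical; the repository, the file, and the wording of the message are made up for the example.

```shell
# Sketch only: a throwaway repository with placeholder names.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "pandas" > requirements.txt
git add requirements.txt

# Subject line says what changed; the body says why it matters to teammates.
git commit -q \
  -m "add requirements file listing pandas" \
  -m "The ingest script imports pandas, so teammates need it installed before running it."

git log -1 --format=%B    # print the full message of the latest commit
```

A subject that states what changed, plus a sentence on why, is usually enough for a teammate (or future you) to follow the history without opening the diff.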
Why
Why do data scientists need version control?
Where does version control fit into the data science pipeline?
• Data Ingestion
• Data Munging and Wrangling
• Computation and Analyses
• Modeling and Application
• Reporting and Visualization
Folder structure conventions on Github
• README.md
• .gitignore
• fixtures
• requirements.txt
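A new project following these conventions could be laid out as below. This is only a sketch: the project name `myproject`, the file contents, and the package names are placeholders, `fixtures` is assumed here to hold small sample data files, and `.gitkeep` is a common community convention (not a Git feature) for keeping an otherwise empty folder in the repo.

```shell
# Sketch only: creates a hypothetical project skeleton in a temporary location.
project="$(mktemp -d)/myproject"
mkdir -p "$project/fixtures"
cd "$project"

# README.md: the first thing visitors see on the Github repo page.
printf '# myproject\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files Git should never track (examples only).
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# requirements.txt: Python packages the project depends on (placeholders).
printf 'pandas\nmatplotlib\n' > requirements.txt

# Git does not track empty directories, so add a placeholder file.
touch fixtures/.gitkeep

ls -A    # list everything, including the dotfiles
```

Listing `.DS_Store` in `.gitignore` keeps the macOS folder metadata file out of the repository.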
Where to go from here
Additional Tutorials:
• http://pcottle.github.io/learnGitBranching
• http://rogerdudler.github.io/git-guide
• http://www.tutorialspoint.com/git
Resources:
• Git Desktop: https://desktop.github.com
• TortoiseGit: https://tortoisegit.org
• Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
• Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
• Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
• Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
• Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
• Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
User Account
[Screenshot: a Github user profile page with its contributions calendar]
Rebecca Bilbro (rebeccabilbro)
Washington DC
Joined on Sep 13, 2014
17 followers, 11 starred, 39 following
Popular repositories:
• xbus-503-ipython-demos: Demonstration code for XBUS-503 Data Wran...
• calendar: Building a simple Python application - Calenda...
• capstone: Capstone project as part of Data Analysis certi...
• Colonials: GT Colonials
• dashboards: Responsive dashboard templates for Bootstrap
Repositories contributed to:
• DistrictDataLabs/Blogs: Data Science related blogs for DDL
• CommerceData/recordtagger: NOAA metadata record tagger that implement...
• CommerceData/newexporters: building a predictive model for new exporters
• DistrictDataLabs/trinket: Multidimensional data explorer and visualizatio...
• georgetown-an.../sql-tutorial: A brief tutorial on SQL with Python (using SQL...
Contributions: summary of pull requests, issues opened, and commits, shown by month from Mar through Feb
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
COMMERCE DATA SERVICE
0 This organization Search Pull requests Issues Gist
Commerce Data Service A startup within DOC focused on building data products with and for the bureaus
Washington DC httpJlwwwcommercegov datadocgov
IQ Repositories People 20 l Teams 4
Fiiters ~ Q Find a repository +New repository People 20 gt
DataService_ WebSite JavaScript 1 V4
IV forked from timwoodDataCorps_ WebSite
The website for the Commerce Data Service - A startup within the Department of
Commerce
Updated 19 hours ago
ITA_Principal_ Travel css o Vamp Updated a day ago
Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy
Updated a day ago
Waffle
COMMERCE DATA SERVICE
DistricDatalabstrinket
Backlog
S6
Better Licensing
- typefeature
SS
username check
priority medium type bug
so Dataset Searching
priority medium type feature
Dataset Overwrite
- type technicaldebt
4S
500 error on upload w missing col row values
AJAXify the uptoader
priority medium type featurC
middotmiddotOM
0
0
0 ~
0 ~ () ~
() ~ bull38
3Dtours
0 ~
37
Sampling technique for bigger datasets
0 ~
Feature nomination tool for visualization
Ready In Progress Done bull S4 14
Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10
column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form
lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature
~ Async Upload with Celery
IVersion 03 I priority medium type feature middotbull 13
Dimension Histograms and Ranking 10
IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader
type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte
IVersiono3 IBlll type bug () ~ 36
Pair programmingMake your own waffle
CommunicationCommit Messages
git commit -m ldquotry to be as helpful as possiblerdquo
(To your team and to future you)
Why
Why do data scientists need version control
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into thedata science pipeline
Folder structure conventions on Github
READMEmd
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
Command Line
Shifting to the command line
COMMERCE DATA SERVICE
Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as
PowerShell If you have Windows 7 or later do this
bull Click Start
bull In Search programs and files type powershell
bull Hit Enter
Mac OSX For Mac OSX youll need to do this
bull Hold down COMMAND and hit the spacebar
bull In the top right the blue search bar will pop up
bull Type terminal
bull Click on the Terminal application that looks kind of like a black box
bull This will open Terminal
bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock
Now you have your Terminal open and its in your Dock so you can get to it
Mac OSX Terminal
Windows Powershell
Where am I
Mac OSX Terminal
Windows Powershell
Whatrsquos my name
Mac OSX Terminal
Windows Powershell
Make a directory
gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt
Mac OSX Terminal
Windows Powershell
Change between directories
gt cd temp gt pwd gt
$ cd temp $ pwd $
Mac OSX Terminal
Windows Powershell
List files and directories
gt dir gt
$ ls $
Mac OSX Terminal
Windows Powershell
Make an empty file
gt cd temp gt New-Item iamcooltxt -type file gt dir gt
$ cd temp $ touch iamcooltxt $ ls $
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
COMMERCE DATA SERVICE
0 This organization Search Pull requests Issues Gist
Commerce Data Service A startup within DOC focused on building data products with and for the bureaus
Washington DC httpJlwwwcommercegov datadocgov
IQ Repositories People 20 l Teams 4
Fiiters ~ Q Find a repository +New repository People 20 gt
DataService_ WebSite JavaScript 1 V4
IV forked from timwoodDataCorps_ WebSite
The website for the Commerce Data Service - A startup within the Department of
Commerce
Updated 19 hours ago
ITA_Principal_ Travel css o Vamp Updated a day ago
Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy
Updated a day ago
Waffle
[Screenshot: Waffle board for DistrictDataLabs/trinket with the columns Backlog, Ready, In Progress, and Done ("Issues closed in the last week are shown in this column. Drag issues here to close them."). Cards carry labels such as "type: feature", "type: bug", "type: technical debt", "priority: medium", and "Version 0.3". Card titles visible: Better Licensing; username check; Dataset Searching; Dataset Overwrite; 500 error on upload w/ missing col/row values; AJAXify the uploader; 3D tours; Sampling technique for bigger datasets; Feature nomination tool for visualization; Data file uploading; Research Auto-analysis Feature; Implement beta auto analysis; Dropdown Dataset Edit Form; Async Upload with Celery; Dimension Histograms and Ranking; Large files hang uploader; Upload Error: line contains NULL byte.]
Pair programming: Make your own waffle
Communication: Commit Messages
git commit -m "try to be as helpful as possible"
(To your team and to future you)
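As one possible illustration of a helpful message, the sketch below commits a file with a short subject line plus a second -m that explains why the change was made. The repository name message-demo, the file contents, the message wording, and the throwaway identity are all hypothetical.

```shell
# Hypothetical example repository and file.
git init -q message-demo
cd message-demo
echo "pandas" > requirements.txt
git add requirements.txt

# The first -m is the subject line; the second -m becomes the body paragraph.
git -c user.name="Example User" -c user.email="user@example.com" commit -q \
    -m "Add requirements.txt listing pandas" \
    -m "Teammates need to know which packages to install before running the analysis."

# Show the full message of the most recent commit
git --no-pager log -1 --format=%B

# Return to the starting directory
cd ..
```

A subject that says what changed and a body that says why gives teammates (and future you) enough context without opening the diff.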
Why?
Why do data scientists need version control?
[Figure: the data science pipeline. Data Ingestion -> Data Munging and Wrangling -> Computation and Analyses -> Modeling and Application -> Reporting and Visualization]
Where does version control fit into the data science pipeline?
Folder structure conventions on GitHub
README.md
.gitignore
fixtures/
requirements.txt
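A minimal sketch of setting up that layout from the Mac OSX Terminal, assuming a hypothetical project folder named my-project (the name is not from the slides):

```shell
# Create the project folder and the fixtures subfolder in one step
mkdir -p my-project/fixtures

# Create empty placeholder files for the conventional top-level items
touch my-project/README.md my-project/.gitignore my-project/requirements.txt

# -a also lists names that start with a dot, such as .gitignore
ls -a my-project
```

Git does not track empty directories, so fixtures will not show up on GitHub until it contains at least one file.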
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git
Resources
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
GitHub Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Shifting to the command line
Windows
On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files" type: powershell
• Hit Enter
Mac OSX
For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock
Now you have your Terminal open and it's in your Dock so you can get to it.
Where am I?
[Screenshots: Mac OSX Terminal and Windows PowerShell]
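A minimal sketch for this slide, assuming the screenshots showed the print-working-directory command (the transcript does not preserve them):

```shell
# Print the full path of the directory you are currently in
pwd
```

The same command also works in PowerShell, where pwd is an alias for Get-Location.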
What's my name?
[Screenshots: Mac OSX Terminal and Windows PowerShell]
Make a directory
Mac OSX Terminal: [screenshot]
Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
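The Mac OSX Terminal version of these commands is not preserved in the transcript. A sketch of the likely equivalent, assuming the path separators were lost in transcription and the folders are meant to nest (temp/stuff/things/...):

```shell
# Build the nested folders one level at a time
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things

# -p creates any missing parent folders (frank, joe, alex) along the way
mkdir -p temp/stuff/things/frank/joe/alex/john
```

Without -p, mkdir in Terminal stops with an error when a parent folder such as frank does not exist yet.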
Change between directories
Windows PowerShell:
> cd temp
> pwd
Mac OSX Terminal:
$ cd temp
$ pwd
List files and directories
Windows PowerShell:
> dir
Mac OSX Terminal:
$ ls
Make an empty file
Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
Zed Shawrsquos book
Letrsquos use what wersquove learned
Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git
COMMERCE DATA SERVICE
working local GlthubIndexdirectory repo repo
Revert
diff-cached
fetch
checkout HEAD
Compare
Teamwork(makes the dream work)
Organization
COMMERCE DATA SERVICE
0 This organization Search Pull requests Issues Gist
Commerce Data Service A startup within DOC focused on building data products with and for the bureaus
Washington DC httpJlwwwcommercegov datadocgov
IQ Repositories People 20 l Teams 4
Fiiters ~ Q Find a repository +New repository People 20 gt
DataService_ WebSite JavaScript 1 V4
IV forked from timwoodDataCorps_ WebSite
The website for the Commerce Data Service - A startup within the Department of
Commerce
Updated 19 hours ago
ITA_Principal_ Travel css o Vamp Updated a day ago
Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy
Updated a day ago
Waffle
COMMERCE DATA SERVICE
DistricDatalabstrinket
Backlog
S6
Better Licensing
- typefeature
SS
username check
priority medium type bug
so Dataset Searching
priority medium type feature
Dataset Overwrite
- type technicaldebt
4S
500 error on upload w missing col row values
AJAXify the uptoader
priority medium type featurC
middotmiddotOM
0
0
0 ~
0 ~ () ~
() ~ bull38
3Dtours
0 ~
37
Sampling technique for bigger datasets
0 ~
Feature nomination tool for visualization
Ready In Progress Done bull S4 14
Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10
column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form
lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature
~ Async Upload with Celery
IVersion 03 I priority medium type feature middotbull 13
Dimension Histograms and Ranking 10
IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader
type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte
IVersiono3 IBlll type bug () ~ 36
Pair programmingMake your own waffle
CommunicationCommit Messages
git commit -m ldquotry to be as helpful as possiblerdquo
(To your team and to future you)
Why
Why do data scientists need version control
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into thedata science pipeline
Folder structure conventions on Github
READMEmd
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials
http://pcottle.github.io/learnGitBranching/
http://rogerdudler.github.io/git-guide/
http://www.tutorialspoint.com/git/
Resources
GitHub Desktop: https://desktop.github.com/
TortoiseGit: https://tortoisegit.org/
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
Zed Shaw's book
Let's use what we've learned
Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
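[Diagram: four places where a change can live (working directory, index, local repo, Github repo), with the Git commands that move or compare changes between them. Labels recovered from the figure: Revert, diff --cached, fetch, checkout HEAD, Compare.]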
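Two of the commands named in the diagram, diff --cached and checkout HEAD, can be tried safely in a throwaway repository. The sketch below assumes Git is installed and on your PATH. The helper names git and walk_through_areas, the file notes.txt, and the placeholder identity are hypothetical. fetch is left out because it needs a remote repository.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside repo and return its standard output."""
    command = [
        "git",
        "-c", "user.name=Example Student",       # placeholder identity for the demo
        "-c", "user.email=student@example.com",
        "-c", "commit.gpgsign=false",            # avoid signing prompts in the demo
        *args,
    ]
    result = subprocess.run(command, cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def walk_through_areas(repo: Path) -> dict:
    """Move one file through the working directory, the index, and the local repo."""
    observed = {}
    git(repo, "init")
    notes = repo / "notes.txt"

    notes.write_text("first draft\n", encoding="utf-8")  # exists only in the working directory
    git(repo, "add", "notes.txt")                         # now staged in the index
    # Compare: what is staged in the index but not yet committed?
    observed["staged"] = git(repo, "diff", "--cached", "--name-only")

    git(repo, "commit", "-m", "Add first draft of notes")  # recorded in the local repo
    observed["staged_after_commit"] = git(repo, "diff", "--cached", "--name-only")

    notes.write_text("an edit I regret\n", encoding="utf-8")
    # Revert: restore the file to the version stored in the last commit.
    git(repo, "checkout", "HEAD", "--", "notes.txt")
    observed["restored"] = notes.read_text(encoding="utf-8").strip()
    return observed


if __name__ == "__main__":
    if shutil.which("git"):
        with tempfile.TemporaryDirectory() as tmp:
            print(walk_through_areas(Path(tmp)))
    else:
        print("Git was not found on this machine.")
```

Everything happens inside a temporary folder, so the experiment cannot touch a real project.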
Teamwork (makes the dream work)
Organization
[Screenshot: the Commerce Data Service organization page on Github. Description: "A startup within DOC focused on building data products with and for the bureaus." Washington DC, http://www.commerce.gov, data@doc.gov. People: 20. Teams: 4.]
Repositories shown:
DataService_WebSite (JavaScript), forked from timwood/DataCorps_WebSite: The website for the Commerce Data Service - A startup within the Department of Commerce. Updated 19 hours ago
ITA_Principal_Travel (CSS). Updated a day ago
Commerce_Data_Academy_Courses: Course materials offered by the Commerce Data Academy. Updated a day ago
Data Ingestion Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization
Where does version control fit into thedata science pipeline
Folder structure conventions on Github
READMEmd
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
Folder structure conventions on Github
READMEmd
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
READMEmd
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
gitignore
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
fixtures
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
requirementstxt
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
Where to go from here
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
Additional Tutorials httppcottlegithubiolearnGitBranching
httprogerdudlergithubiogit-guide
httpwwwtutorialspointcomgit
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config
Resources Git Desktop httpsdesktopgithubcom
TortoiseGit httpstortoisegitorg
Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf
Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control
Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository
Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell
Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration
Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection
Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config