Intro to Github and Git Sasan Bahadaran May 9, 2017


Working with Teams: Git and GitHub

Rebecca Bilbro, Sasan Bahadaran, Pri Oberoi

3/21/2016


Commerce Data Academy: a data education initiative of the Commerce Data Service. Launched by CDS to offer data science, data engineering, and web development training to employees of the US Department of Commerce.

Course schedule and materials (e.g., slides, code, papers) produced for the Commerce Data Academy are on GitHub.

Questions? Feel free to write us at Data Academy (dataacademy@doc.gov).

Goals: our goals for the class
• Explain and make the case for version control
• Collaboration in coding/software engineering
• Illustrate what the Git software is and what it can do
• Differentiate Git (the software) from GitHub (the website)
• Describe how we integrate Git and GitHub into our project workflows

Goals: your goals for the class
• Understand what version control is and why you should use it for your projects
• Start using Git on the command line
• Experiment with pushing repos to GitHub
• Practice working with a team using Waffle.io

Prerequisites
1. Create your own GitHub account
2. Create your own Waffle.io account
3. Download/install Git
4. Download/install Anaconda's Python distribution
5. Verify your access to Terminal (Mac) or PowerShell (Windows)

Any challenges? Questions?

Open Source Installations

We use open source and free software, so it should have minimal impact on your IT department.

DOC has provided guidance stating that GitHub and all the tools we are teaching are permissible under policy.

However, it is up to the CIO of each bureau to accept this guidance or not.

DOC has a formalized GitHub policy: https://github.com/CommerceGov/Policies-and-Guidance/blob/master/GithubGuidanceforDepartmentofCommerce.md

Review

What is data science?

"Data science is the practice of transforming raw data into insights, products, and applications to empower data-driven decision making. It combines proven, time-tested methods from fields including statistics, natural sciences, computer science, operations research, and design in ways that are particularly well-suited to the data age. These methods, which range from data mining and visualization to predictive modeling, can scale from small to large datasets and can handle structured data as well as unstructured data like text and images."

Jeff Chen, Chief Data Scientist, US Department of Commerce

How is data science different from data analytics?

What is hypothesis-driven development?

COMMERCE DATA SERVICE

Hypothesis-Driven Development (ThoughtWorks)

We believe that <this capability>
Will result in <this outcome>
We will know we have succeeded when <we see a measurable signal>

What tools do data scientists use?

What is the data science pipeline?
• Data Ingestion
• Data Munging and Wrangling
• Computation and Analyses
• Modeling and Application
• Reporting and Visualization

What is a data product?

How are data products different from analytical insights?

"Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data."

Benjamin Bengfort

What is software engineering?

What does collaboration look like in a data group?

[Figure: an example Waffle.io board ("serenity") with issue cards such as "check ship for survivors," "fix ship's engine problem," and "buy a solid ship," sorted into Backlog, Ready, In Progress, and Done columns]

Version Control

Examples
• Google Drive
• SharePoint
• Dropbox
• TortoiseSVN
• Bitbucket

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: the management of changes to electronic documents, and in particular computer programs.

"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

[Figure: Version Control: A Visualization. A file on the local computer is checked out from a local version database holding Version 1, Version 2, and Version 3.]

[Figure: Branches and revisions through time, example scenario]

[Figure: Branches and revisions through time, actual workflow]
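The "checkout from a version database" idea in the first diagram can be tried directly with Git. This is a minimal sketch, assuming a POSIX shell (Mac Terminal, or Git Bash on Windows); the file name report.txt and the user name and email are hypothetical placeholders, and everything runs in a throwaway temporary folder:

```shell
#!/bin/sh
set -e
# Work in a throwaway folder so nothing real is touched.
cd "$(mktemp -d)"
git init -q
# Placeholder identity, stored only in this repository's config.
git config user.name "Example User"
git config user.email "user@example.com"

# Record three versions of the same file in the version database.
for v in 1 2 3; do
  echo "Version $v" > report.txt
  git add report.txt
  git commit -q -m "Version $v"
done
git log --oneline

# Check out the file as it was two commits before HEAD (Version 1).
git checkout HEAD~2 -- report.txt
cat report.txt
```

Running `git checkout HEAD -- report.txt` afterwards brings the latest version back into the working folder.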

Distributed vs. Centralized

Centralized
• What are the benefits?
• What are the weaknesses?

Decentralized
• What are the benefits?
• What are the weaknesses?

Git

[Screenshot: the git-scm.com home page, with the tagline "git --distributed-is-the-new-centralized"]

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git (Windows): http://git-for-windows.github.io

Installing Git (Mac): http://git-scm.com/download/mac
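After installing, a quick check confirms that Git is reachable from the command line. A minimal sketch for Mac Terminal or Git Bash; the exact path and version number printed depend on your installation. In PowerShell, `git --version` works the same way, while `Get-Command git` plays the role of `command -v git`. The identity lines are left commented out because they change your global settings, and the name and email shown are placeholders:

```shell
#!/bin/sh
# Confirm that the git program is on your PATH (prints its location).
command -v git

# Print the installed version; your version number will differ.
git --version

# One-time setup so your commits carry your name and email.
# Uncomment and replace the placeholders with your own details:
# git config --global user.name "Your Name"
# git config --global user.email "you@example.com"
```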

Git - History Lesson
• Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
• Distributed version control
• Open source
• Initial release: 7 April 2005
• All metadata is stored in the .git directory
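To see where that metadata lives, here is a minimal sketch (POSIX shell, run inside a throwaway temporary folder) that creates a repository and lists its hidden .git directory:

```shell
#!/bin/sh
set -e
# Create a throwaway folder and turn it into a Git repository.
cd "$(mktemp -d)"
git init -q

# All of Git's metadata for this project lives in the hidden .git directory
# (for example HEAD, config, objects/, and refs/).
ls -a .git
```

Deleting .git turns the folder back into an ordinary folder and throws away its history, so treat it with care.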

Git - Advantages
• Speed
• Simple design
• Strong support for non-linear development (thousands of parallel branches)
• Fully distributed
• Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - "Places"
• Object database: where Git stores each commit (snapshots of file contents plus metadata such as author, date, and message)
• Index / staging area: file snapshots to be included in the next commit
• Working directory: the "physical" files on a computer
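The three places can be inspected one at a time in a real repository. A minimal sketch, assuming a POSIX shell; hello.txt and the user name and email are hypothetical placeholders, and the repository is a throwaway temporary folder:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity for this throwaway repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# Working directory: an ordinary file on disk.
echo "hello" > hello.txt
ls

# Index / staging area: list what is queued for the next commit
# (file mode, object id, stage number, path).
git add hello.txt
git ls-files --stage

# Object database: the commit object, then the file content it points to.
git commit -q -m "Add hello.txt"
git cat-file -p HEAD
git cat-file -p HEAD:hello.txt
```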

Git - "Stages"
• Committed: data is safely stored in your local object database
• Staged: marked so that the current state of the modified file will be included in the next commit
• Modified: changed, but not yet staged or committed
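`git status --short` shows which stage a file is in. This minimal sketch (POSIX shell; notes.txt and the identity are hypothetical placeholders) walks one file through untracked, staged, committed, and modified, with the status code Git prints at each step shown in the comments:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity for this throwaway repository only.
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt
git status --short          # ?? notes.txt  (new, untracked)

git add notes.txt
git status --short          # A  notes.txt  (staged)

git commit -q -m "Add notes"
git status --short          # (no output: committed, nothing pending)

echo "second draft" > notes.txt
git status --short          #  M notes.txt  (modified, not staged)
```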

Git - Areas/Places

[Figure: the Working Directory, the Staging Area, and the .git directory (Repository)]

Git Commands

Git - Basic Commands
• git init: create a new Git repository to manage the current folder
• git clone <repository address>: download an existing Git repository for the first time
• git add <file path>: mark individual new or modified files to be added to the index (staging area) for the next commit
• git commit -m "<message>": take the staged changes and record them in the object database as a new commit
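The four commands above, in the order you would typically first use them. A minimal sketch, assuming a POSIX shell; the folder names, README text, and identity are hypothetical, and a local folder path stands in for a GitHub address when cloning so the example needs no network:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# git init: turn a new folder into a repository.
mkdir project
cd project
git init -q
# Placeholder identity for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# git add: stage a new file for the next commit.
echo "# My project" > README.md
git add README.md

# git commit -m: record the staged snapshot with a message.
git commit -q -m "Add README"
git log --oneline

# git clone: copy an existing repository. A local path stands in here
# for a GitHub address such as https://github.com/<user>/<repo>.git.
cd ..
git clone -q project project-copy
ls project-copy
```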

Git - Basic Commands (continued)
• git fetch <remote> <branch>: download new commits from the remote into your object database, without changing the working directory
• git merge <source branch>: apply the commits from the source branch to the branch you currently have checked out (the one your working directory reflects)
• git pull <remote> <branch>: perform a fetch and then merge those changes into your current branch and working directory
• git push <remote> <branch>: send your latest commits on the branch to the remote server
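To watch fetch, merge, and push interact without a GitHub account, this minimal sketch uses a local "bare" repository as a stand-in for the server (an assumption for illustration; with GitHub the remote would be an https address). Alice and Bob, their emails, and data.csv are hypothetical:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# A local bare repository stands in for GitHub, so no network is needed.
git init -q --bare server.git

# Alice clones the (still empty) server repository, commits, and pushes.
git clone -q server.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > data.csv
git add data.csv
git commit -q -m "Add data.csv"
branch=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on your Git setup
git push -q origin "$branch"
cd ..

# Bob clones the server copy, which now includes Alice's first commit.
git clone -q server.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

# Alice adds a second commit and pushes it.
cd alice
echo "v2" >> data.csv
git commit -q -am "Append v2"
git push -q origin "$branch"

# git fetch: Bob's object database gets the new commit as origin/<branch>,
# but his working copy of data.csv still has only "v1".
cd ../bob
git fetch -q origin
cat data.csv

# git merge: bring the fetched commit into Bob's current branch.
# (git pull origin "$branch" would do the fetch and the merge in one step.)
git merge -q "origin/$branch"
cat data.csv
```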

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

GitHub
• A remote Git repository
• A website that:
  • provides secure access
  • provides repository metadata & reports
  • provides tools for development teams
• Launched April 10, 2008
• ~10 million users in 2015
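Connecting a local repository to a repository you have created on github.com takes one command. A minimal sketch, assuming a POSIX shell; USERNAME and REPO are placeholders for your own account and repository, and the final push is left commented out because it needs network access and GitHub credentials:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q

# Point this local repository at a GitHub repository.
# USERNAME and REPO are placeholders: substitute your own.
git remote add origin https://github.com/USERNAME/REPO.git
git remote -v

# Once you have made at least one commit, publish your branch and record
# origin as its upstream (needs network access and GitHub credentials):
# git push -u origin "$(git symbolic-ref --short HEAD)"
```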

Non-local Git repositories are called "remotes."

[Figure: the Git "Places" (object database, index/staging area, working directory) shown again alongside remotes]

[Figure: GitHub, a distributed version control example. A server computer, Computer A, and Computer B each hold a full version database (Version 1, Version 2, Version 3).]

Git - "Origin"
• The "origin" remote is automatically created when you clone
• It is the default remote used for pushing and pulling
• There is nothing special about "origin"; it is just a default name
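Both points about "origin" can be checked directly. A minimal sketch, assuming a POSIX shell; a local bare repository stands in for one hosted on GitHub, and the new remote name "github" is an arbitrary example:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# A local bare repository stands in for one hosted on GitHub.
git init -q --bare upstream.git

# Cloning automatically creates a remote called "origin" that points back at the source.
git clone -q upstream.git work
cd work
git remote -v

# "origin" is only a default name: it can be renamed like any other remote.
git remote rename origin github
git remote
```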

User Account

[Screenshot: the GitHub profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, joined Sep 13, 2014, listing popular repositories, repositories contributed to (for example CommerceData/recordtagger and CommerceData/newexporters), and a contribution activity graph]

Repo

[Screenshot: a GitHub repository page owned by rebeccabilbro ("A tour of ROC curves"), showing 19 commits, 1 branch, 0 releases, 1 contributor, latest commit 382b9ca, and files including data/, figures/, .gitignore, LICENSE, README.md, and several Python modules]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start
• In "Search programs and files," type: powershell
• Hit Enter

Mac OS X: For Mac OS X you'll need to do this:
• Hold down COMMAND and hit the spacebar
• In the top right, the blue search bar will pop up
• Type: terminal
• Click on the Terminal application that looks kind of like a black box
• This will open Terminal
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock

Now you have your Terminal open, and it's in your Dock so you can get to it.

Where am I? (Mac OS X Terminal / Windows PowerShell)

What's my name? (Mac OS X Terminal / Windows PowerShell)
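The commands for these two questions did not survive in this transcript; `pwd` and `whoami` are the usual answers, offered here as an assumption rather than as the original slides. In PowerShell, `pwd` is a built-in alias for Get-Location, and `whoami` runs Windows' whoami.exe, which typically prints the name in DOMAIN\user form. In Mac Terminal:

```shell
#!/bin/sh
# Where am I? Print the full path of the current working directory.
pwd

# What's my name? Print the user name you are logged in as.
whoami
```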

Make a directory (Mac OS X Terminal / Windows PowerShell)

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
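In PowerShell, mkdir creates any missing parent folders on its own; in Mac Terminal (bash or zsh), plain mkdir does not, so the deeply nested step needs the -p flag. A minimal sketch of the Terminal version, run inside a throwaway temporary folder:

```shell
#!/bin/sh
set -e
# Run the example inside a throwaway folder.
cd "$(mktemp -d)"

# Same steps as the PowerShell example, one level at a time.
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things

# POSIX mkdir will not create missing parent folders unless given -p,
# so the last (deeply nested) step needs it in Terminal.
mkdir -p temp/stuff/things/frank/joe/alex/john
```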

Change between directories (Mac OS X Terminal / Windows PowerShell)

Windows PowerShell:
> cd temp
> pwd

Mac OS X Terminal:
$ cd temp
$ pwd

List files and directories (Mac OS X Terminal / Windows PowerShell)

Windows PowerShell:
> dir

Mac OS X Terminal:
$ ls

Make an empty file (Mac OS X Terminal / Windows PowerShell)

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OS X Terminal:
$ cd temp
$ touch iamcool.txt
$ ls

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
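Before the workshop, it can help to produce and resolve a conflict on your own machine. A minimal sketch, assuming a POSIX shell; the file config.txt, the branch name tweak, the threshold values, and the identity are all hypothetical:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity for this throwaway repository only.
git config user.name "Example User"
git config user.email "user@example.com"

echo "threshold = 0.5" > config.txt
git add config.txt
git commit -q -m "Set threshold"
main=$(git symbolic-ref --short HEAD)   # "master" or "main"

# A teammate's branch changes the line one way...
git checkout -q -b tweak
echo "threshold = 0.7" > config.txt
git commit -q -am "Raise threshold"

# ...while the main branch changes the same line another way.
git checkout -q "$main"
echo "threshold = 0.3" > config.txt
git commit -q -am "Lower threshold"

# The merge stops with a conflict; "|| true" keeps set -e from ending the script.
git merge tweak || true
git status --short        # UU config.txt  (conflicted)
cat config.txt            # shows the <<<<<<<, =======, >>>>>>> conflict markers

# Resolve: write the agreed content, stage it, and commit to finish the merge.
echo "threshold = 0.6" > config.txt
git add config.txt
git commit -q -m "Merge branch 'tweak': settle on threshold 0.6"
```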

Compare

[Figure: comparing the working directory, the index, the local repo, and the GitHub repo, with arrows labeled revert, diff --cached, fetch, and checkout HEAD]
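Each pair of places in the comparison diagram has a matching command. A minimal sketch, assuming a POSIX shell; f.txt and the identity are hypothetical, and the comparison against GitHub is left as comments because it needs a configured remote:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity for this throwaway repository only.
git config user.name "Example User"
git config user.email "user@example.com"
echo "a" > f.txt
git add f.txt
git commit -q -m "Add f.txt"

echo "b" >> f.txt
git diff              # working directory vs. index
git add f.txt
git diff --cached     # index vs. last commit (HEAD)
git diff HEAD         # working directory vs. last commit

# Throw the change away: restore both the index and the working copy from HEAD.
git checkout HEAD -- f.txt

# With a GitHub remote named origin you could also compare against it
# after downloading its commits (not run here; needs a remote):
#   git fetch origin
#   git diff HEAD origin/<branch>
```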

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus," Washington DC), with 20 people, 4 teams, and repositories including DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses]

Waffle

[Screenshot: the Waffle.io board for DistrictDataLabs/trinket, with issues such as "Better Licensing," "Dataset Searching," "Async Upload with Celery," and "Large files hang uploader" sorted into Backlog, Ready, In Progress, and Done columns and tagged with labels like priority: medium, type: feature, type: bug, and Version 0.3]

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
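One way to make a commit message helpful is a short summary line followed by a paragraph explaining why; passing -m twice gives exactly that, because Git stores each -m as its own paragraph. The summary-plus-body layout is a common convention, not something Git enforces. A minimal sketch with a hypothetical file, message, and identity:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity for this throwaway repository only.
git config user.name "Example User"
git config user.email "user@example.com"

echo "region,sales" > sales.csv
git add sales.csv

# Each -m becomes its own paragraph: a short summary line, then the "why".
git commit -q -m "Add regional sales extract" \
  -m "Raw export used by the cleaning notebook; columns are region and sales."

git log -1 --format=%s    # prints the summary line only
git log -1 --format=%b    # prints the body paragraph
```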

Why?

Why do data scientists need version control?
• Data Ingestion
• Data Munging and Wrangling
• Computation and Analyses
• Modeling and Application
• Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on GitHub
• README.md
• .gitignore
• fixtures
• requirements.txt
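A quick way to start a repository with this layout is sketched below, assuming a POSIX shell. The project name, the package list in requirements.txt, and the .gitignore entries are hypothetical examples for a Python project. Because Git does not track empty folders, an empty placeholder file keeps fixtures in the repository; the .gitkeep name is only a convention:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir my-project
cd my-project
git init -q

# README.md: what the project is and how to run it.
echo "# my-project" > README.md

# requirements.txt: Python packages to install with pip (hypothetical list).
printf 'pandas\nrequests\n' > requirements.txt

# fixtures: small sample data for tests; the placeholder file keeps the folder tracked.
mkdir fixtures
touch fixtures/.gitkeep

# .gitignore: files Git should never pick up (bytecode, notebook checkpoints, macOS metadata).
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
.ipynb_checkpoints/
.DS_Store
EOF

git add .
git status --short
```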

Where to go from here

Additional Tutorials
• http://pcottle.github.io/learnGitBranching
• http://rogerdudler.github.io/git-guide
• http://www.tutorialspoint.com/git

Resources
• GitHub Desktop: https://desktop.github.com
• TortoiseGit: https://tortoisegit.org
• Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
• Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
• Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
• GitHub Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
• Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
• Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

Page 2: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Commerce Data Academy A data education initiative of the Commerce Data Service Launched by CDS to offer data science data engineering and

web development training to employees of the US Department of Commerce

Course schedule and materials (eg slides code papers) produced for the Commerce Data Academy on Github

Questions Feel free to write us at Data Academy (dataacademydocgov)

Goals Our goals for the class Explain and make the case for version control Collaboration in codingsoftware engineering Illustrate what Git software is and what it can do Differentiate Git (the software) and Github (the website) Describe how we integrate Git and Github into our project

workflows

Goals Your goals for the class Understand what version control is and why should you use it

for your projects Start using Git on the command line Experiment with pushing repos to Github Practice working with a team using Waffleio

Prerequisites 1 Create your own Github account

2 Create your own Waffleio account

3 Downloadinstall Git

4 Downloadinstall Anacondas Python distribution

5 Verify your access to Terminal (Mac) or Powershell (Windows)

Any challenges Questions

Open Sources Installations We use open source and free software so they should have a minimal impact on

your IT department

DOC has provided guidance that states that states that Github and all the tools that we are teaching are permissible under policy

However it is up to the CIO of each bureau to accept this guidance policy or not

DOC has a formalized Github policy httpsgithubcomCommerceGovPolicies-and-GuidanceblobmasterGithubGuidanceforDepartmentofCommercemd

Review

What is data science

ldquoData science is the practice of transforming raw data into insights products

and applications to empower data-driven decision making It combines

proven time-tested methods from fields including statistics natural sciences

computer science operations research and design in ways that are

particularly well-suited to the data age These methods which range from

data mining and visualization to predictive modeling can scale from small to

large datasets and can handle structured data as well as unstructured data

like text and imagesrdquo

Jeff Chen Chief Data Scientist US Department of Commerce

How is data science different fromdata analytics

What is hypothesis-driven development

COMMERCE DATA SERVICE

rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot

We Believe That ~ fht~ C--Jf Jbt jf1gt

Will Result In ~ fh~ OfJfCAJfYle-gt

We Will Know We Have Succeeded When

lt we- ie-e- a rne-atwabe- tigtialgt

What tools do data scientists use

What is the data science pipeline

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
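Before the workshop, it may help to see a merge conflict happen on purpose. The bash sketch below assumes git is installed; it creates a throwaway repository, changes the same line of a file on two branches, and merges them. The file `notes.txt`, the branch `feature`, and the identity `Demo User` are hypothetical names chosen for illustration.

```bash
#!/usr/bin/env bash
# Provoke and resolve a merge conflict (assumes git is installed).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"                # hypothetical local identity
git config user.email "demo@example.com"

echo "The meeting is on Monday" > notes.txt     # hypothetical file
git add notes.txt
git commit -q -m "Add meeting note"
base=$(git rev-parse --abbrev-ref HEAD)         # "master" or "main", depending on your git configuration

git checkout -q -b feature                      # change the line one way on a branch
echo "The meeting is on Tuesday" > notes.txt
git commit -q -am "Move meeting to Tuesday"

git checkout -q "$base"                         # change the same line another way
echo "The meeting is on Wednesday" > notes.txt
git commit -q -am "Move meeting to Wednesday"

# Both branches edited the same line, so git stops and leaves conflict markers.
git merge feature || echo "Conflict: edit notes.txt, then git add and git commit"
cat notes.txt        # shows the <<<<<<<, =======, and >>>>>>> markers

# Resolve by writing the version the team agrees on, then record the merge.
echo "The meeting is on Tuesday" > notes.txt
git add notes.txt
git commit -q -m "Merge feature: keep Tuesday"
```

The key habit is the last three lines: a conflict is resolved by editing the file into its final form, staging it, and committing, which records a merge commit with both branches as parents.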

Compare

[Diagram: how data moves between the working directory, the index, the local repo, and the GitHub repo, with arrows labeled revert, diff --cached, fetch, and checkout HEAD]
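The arrows in that diagram correspond to everyday git commands. In this hedged bash sketch (git assumed installed; `data.csv` and its rows are hypothetical), one edit is staged and a second is left unstaged, so `git diff --cached` (index vs. last commit) and `git diff` (working directory vs. index) show different changes. It then uses `git checkout HEAD -- <file>` to throw both edits away and `git revert` to undo a commit by adding a new one.

```bash
#!/usr/bin/env bash
# Compare the working directory, the index, and the last commit (assumes git is installed).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"                   # hypothetical local identity
git config user.email "demo@example.com"

printf 'id,value\n1,10\n' > data.csv               # hypothetical data file
git add data.csv
git commit -q -m "Add data"

printf 'id,value\n1,10\n2,20\n' > data.csv         # first edit: staged
git add data.csv
printf 'id,value\n1,10\n2,20\n3,30\n' > data.csv   # second edit: left unstaged

git diff --cached    # index vs. last commit: shows the added "2,20" row
git diff             # working directory vs. index: shows only the "3,30" row

git checkout HEAD -- data.csv   # restore both the file and its index entry from the last commit
cat data.csv                    # back to the header and "1,10"

printf 'id,value\n1,10\n99,bad\n' > data.csv
git commit -q -am "Add a bad row"
git revert --no-edit HEAD       # new commit that undoes "Add a bad row"; history keeps both
```

`git revert` is the safer undo for commits that teammates may already have pulled, because it adds history instead of rewriting it.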

Teamwork (makes the dream work)

Organization

Commerce Data Service: A startup within DOC focused on building data products with and for the bureaus

Washington, DC; http://www.commerce.gov; data@doc.gov

[Screenshot: the organization's GitHub page (People: 20, Teams: 4) listing repositories such as DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite: "The website for the Commerce Data Service - A startup within the Department of Commerce", updated 19 hours ago), ITA_Principal_Travel (CSS, updated a day ago), and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy", updated a day ago)]
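Teamwork in an organization usually means several people cloning one shared repository, pushing their commits to it, and pulling each other's work. The bash sketch below imitates that offline: a local bare repository, `shared.git`, stands in for a GitHub-hosted one (an assumption so the example needs no account or network), and the clones of two hypothetical teammates, `alice` and `bob`, exchange commits through it.

```bash
#!/usr/bin/env bash
# Simulate two teammates sharing one remote (assumes git is installed).
team=$(mktemp -d)
cd "$team" || exit 1
git init -q --bare shared.git        # stand-in for the organization's GitHub repo

# Alice clones, commits, and pushes. Cloning names the remote "origin" automatically.
git clone -q shared.git alice
cd alice || exit 1
git config user.name "Alice"         # hypothetical identities, local to each clone
git config user.email "alice@example.com"
echo "# Team project" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin HEAD           # publish this branch and track it as upstream
cd ..

# Bob clones the shared repo and receives Alice's full history.
git clone -q shared.git bob
cd bob || exit 1
git config user.name "Bob"
git config user.email "bob@example.com"
echo "Install dependencies with: pip install -r requirements.txt" >> README.md
git commit -q -am "Add setup notes to README"
git push -q origin HEAD
cd ..

# Alice pulls Bob's commit (a fast-forward, since she has not committed since).
cd alice || exit 1
git pull -q --ff-only
cat README.md                        # both lines: Alice's heading and Bob's note
```

With a real GitHub repository, the only difference is the address passed to `git clone`; the push and pull steps stay the same.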

Waffle

[Screenshot: Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns; the Done column notes "Issues closed in the last week are shown in this column. Drag issues here to close them." Cards include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "AJAXify the uploader", "Async Upload with Celery", "Large files hang uploader", and "Upload Error: line contains NULL byte", tagged with labels such as priority: medium, type: feature, type: bug, type: technical debt, and Version 0.3.]

Pair programming: Make your own waffle

Communication: Commit Messages

    git commit -m "try to be as helpful as possible"

(To your team and to future you)
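One common convention for helpful messages, sketched below in bash with a hypothetical repository, file, and message, is a short summary line followed by a body that explains why the change was made. Each `-m` flag passed to `git commit` becomes its own paragraph, so two flags give a summary, a blank line, and a body. Keeping the summary short (around 50 characters) and in the imperative mood is a widely used habit, not a rule git enforces.

```bash
#!/usr/bin/env bash
# Write a two-part commit message (assumes git is installed).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo User"              # hypothetical local identity
git config user.email "demo@example.com"

echo "# helper that guesses the label column" > classi.py   # placeholder file
git add classi.py

# First -m: short summary. Second -m: the "why", stored after a blank line.
git commit -q \
  -m "Add method to guess the label column" \
  -m "Datasets name their target column differently, so check common names before asking the user."

git log -1 --format=%s    # summary line only
git log -1 --format=%b    # body only
```

Tools such as `git log --oneline` and GitHub's commit lists show only the summary line, which is why it should make sense on its own.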

Why

Why do data scientists need version control?

[Figure: the data science pipeline: Data Ingestion, Data Munging and Wrangling, Computation and Analyses, Modeling and Application, Reporting and Visualization]

Where does version control fit into the data science pipeline?

Folder structure conventions on GitHub

README.md

.gitignore

fixtures

requirements.txt
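Setting up that layout for a new project takes only a few commands. The bash sketch below assumes git is installed; the project name `my-project`, the `.gitignore` patterns, and the single `pandas` line in `requirements.txt` are illustrative assumptions rather than a prescribed list. Ignoring `.DS_Store` keeps macOS folder metadata, like the file visible in the repository listing above, out of commits.

```bash
#!/usr/bin/env bash
# Scaffold the conventional top-level files (assumes git is installed).
project="$(mktemp -d)/my-project"           # hypothetical project name
mkdir -p "$project/fixtures"                # small sample data files for tests and demos
cd "$project" || exit 1
git init -q

# README.md: what the project is, how to install it, how to run it
cat > README.md <<'EOF'
# my-project
What this project does, how to install it, and how to run it.
EOF

# .gitignore: files git should never track
cat > .gitignore <<'EOF'
# macOS folder metadata
.DS_Store
# Python bytecode caches
__pycache__/
*.pyc
# Jupyter notebook checkpoints
.ipynb_checkpoints/
EOF

# requirements.txt: Python dependencies, installed with pip install -r requirements.txt
echo "pandas" > requirements.txt

touch fixtures/.gitkeep    # placeholder file (a common convention): git does not track empty folders
touch .DS_Store            # simulate the file macOS creates
git status --short         # lists the new files but not .DS_Store, which is ignored
```

`git check-ignore -v <path>` is a handy way to ask which `.gitignore` line, if any, excludes a given file.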

Where to go from here

Additional Tutorials

http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources

GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

GitHub Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

Page 4: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Goals Your goals for the class Understand what version control is and why should you use it

for your projects Start using Git on the command line Experiment with pushing repos to Github Practice working with a team using Waffleio

Prerequisites 1 Create your own Github account

2 Create your own Waffleio account

3 Downloadinstall Git

4 Downloadinstall Anacondas Python distribution

5 Verify your access to Terminal (Mac) or Powershell (Windows)

Any challenges Questions

Open Sources Installations We use open source and free software so they should have a minimal impact on

your IT department

DOC has provided guidance that states that states that Github and all the tools that we are teaching are permissible under policy

However it is up to the CIO of each bureau to accept this guidance policy or not

DOC has a formalized Github policy httpsgithubcomCommerceGovPolicies-and-GuidanceblobmasterGithubGuidanceforDepartmentofCommercemd

Review

What is data science

ldquoData science is the practice of transforming raw data into insights products

and applications to empower data-driven decision making It combines

proven time-tested methods from fields including statistics natural sciences

computer science operations research and design in ways that are

particularly well-suited to the data age These methods which range from

data mining and visualization to predictive modeling can scale from small to

large datasets and can handle structured data as well as unstructured data

like text and imagesrdquo

Jeff Chen Chief Data Scientist US Department of Commerce

How is data science different fromdata analytics

What is hypothesis-driven development

COMMERCE DATA SERVICE

rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot

We Believe That ~ fht~ C--Jf Jbt jf1gt

Will Result In ~ fh~ OfJfCAJfYle-gt

We Will Know We Have Succeeded When

lt we- ie-e- a rne-atwabe- tigtialgt

What tools do data scientists use

What is the data science pipeline

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

Screenshot: a GitHub repository page for “A tour of ROC curves” (19 commits, 1 branch, 0 releases, 1 contributor; latest commit 382b9ca), listing the data and figures folders and files such as .DS_Store, .gitignore, LICENSE, README.md, ingest.py, and roc.py, each with the message of its most recent commit.

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type powershell.

• Hit Enter.

Mac OS X: For Mac OS X, you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type terminal.

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Mac OSX Terminal

Windows Powershell

Where am I?

Mac OSX Terminal

Windows Powershell

What's my name?

Mac OSX Terminal

Windows Powershell

Make a directory

> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Mac OSX Terminal

Windows Powershell

Change between directories

> cd temp
> pwd

$ cd temp
$ pwd

Mac OSX Terminal

Windows Powershell

List files and directories

> dir

$ ls

Mac OSX Terminal

Windows Powershell

Make an empty file

> cd temp
> New-Item iamcool.txt -type file
> dir

$ cd temp
$ touch iamcool.txt
$ ls
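The Mac OS X Terminal steps above can be run as one short bash session. This is a sketch that assumes a bash-compatible shell; the slide images for "Where am I?" and "What's my name?" did not survive, so pwd and whoami are our assumption of what they showed, and working inside a fresh mktemp directory is an addition so nothing in your home folder changes.

```bash
#!/usr/bin/env bash
# Bash version of the command-line warm-up, run in a throwaway directory
# (the mktemp step is an assumption added for this sketch).
cd "$(mktemp -d)"

pwd                    # Where am I? Prints the current directory
whoami                 # What's my name? Prints your user name

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
# Bash's mkdir only creates the last folder in a path; -p also creates
# any missing parents (frank, joe, alex) on the way to john.
mkdir -p temp/stuff/things/frank/joe/alex/john

cd temp                # change into a directory
pwd                    # confirm where we are now
ls                     # list files and directories: stuff
touch iamcool.txt      # make an empty file
ls                     # now shows iamcool.txt and stuff
```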

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
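Before the workshop, it can help to watch a merge conflict happen in miniature. The sketch below assumes only bash and Git; the scratch repository, the branch name feature, the file plan.txt, and the identity values are hypothetical, chosen for illustration.

```bash
#!/usr/bin/env bash
# Reproduce a merge conflict in a scratch repository, then resolve it.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"        # hypothetical identity for this demo
git config user.email "student@example.com"

echo "The meeting is on Monday." > plan.txt
git add plan.txt
git commit -q -m "Add plan"

git checkout -q -b feature                     # branch off and change the same line
echo "The meeting is on Tuesday." > plan.txt
git commit -q -am "Move meeting to Tuesday"

git checkout -q -                              # back to the original branch
echo "The meeting is on Wednesday." > plan.txt
git commit -q -am "Move meeting to Wednesday"

# Both branches edited the same line, so the merge stops for a human decision.
if ! git merge feature; then
  git diff --name-only --diff-filter=U         # files with unresolved conflicts
  cat plan.txt                                 # shows <<<<<<< / ======= / >>>>>>> markers
fi

# Resolve by writing the version you want, then stage and commit it.
echo "The meeting is on Thursday." > plan.txt
git add plan.txt
git commit -q -m "Resolve meeting-day conflict"
```

The final commit is a merge commit with two parents, one from each branch.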

Diagram: moving changes between the working directory, the index, the local repo, and the GitHub repo (revert, diff --cached, fetch, checkout HEAD, compare).
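Several of the operations named in this diagram map onto everyday commands. In this hedged sketch (bash, a scratch repository, a hypothetical report.txt), git diff compares the working directory with the index, git diff --cached compares the index with the last commit, git checkout HEAD -- <file> throws away local edits, and git revert undoes a commit by adding a new one.

```bash
#!/usr/bin/env bash
# Compare and undo changes across the working directory, index, and local repo.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

echo "v1" > report.txt
git add report.txt
git commit -q -m "Add report v1"

echo "v2" > report.txt
git diff --stat              # working directory vs. index: report.txt changed
git add report.txt
git diff --cached --stat     # index vs. last commit (HEAD): the staged change
git commit -q -m "Update report to v2"

echo "oops" > report.txt
git checkout HEAD -- report.txt   # discard the working-directory edit; back to v2

git revert --no-edit HEAD    # new commit that undoes "Update report to v2"
cat report.txt               # v1 again
```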

Teamwork (makes the dream work)

Organization

Screenshot: the Commerce Data Service organization on GitHub (“A startup within DOC focused on building data products with and for the bureaus”; Washington DC; 20 people, 4 teams), listing repositories such as DataService_WebSite (forked from timwood/DataCorps_WebSite), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (course materials offered by the Commerce Data Academy).

Waffle

Screenshot: a Waffle.io board for DistrictDataLabs/trinket, with GitHub issues such as “Better Licensing”, “Dataset Searching”, “Async Upload with Celery”, and “Large files hang uploader” arranged in Backlog, Ready, In Progress, and Done columns and tagged with priority, type, and version labels.

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
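One way to act on this advice is to give each commit a short summary line plus a body that explains why. Git treats each -m flag as a separate paragraph, so the sketch below (bash, a scratch repository, a hypothetical file and made-up message text) produces a subject and a body.

```bash
#!/usr/bin/env bash
# Write a commit message with a summary line and an explanatory body.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

printf 'label\n' > columns.txt                 # hypothetical file
git add columns.txt
# First -m: a short summary (around 50 characters is a common convention).
# Second -m: becomes the body, separated from the summary by a blank line.
git commit -q -m "Guess the label column when none is given" \
              -m "Users had to name the label column by hand; fall back to the last column instead."

git log -1 --format='%s'     # the summary line
git log -1 --format='%b'     # the body
```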

Why

Why do data scientists need version control?

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
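A minimal sketch of that layout, assuming bash; the project name, the README text, the pandas entry, and the .gitignore patterns are illustrative choices, not taken from the slides. git check-ignore then confirms which paths Git will leave out, for example the macOS .DS_Store file.

```bash
#!/usr/bin/env bash
# Scaffold a project with the conventional top-level files.
set -e
cd "$(mktemp -d)"
mkdir my-project && cd my-project      # hypothetical project name
git init -q

printf '# my-project\n\nWhat it does and how to run it.\n' > README.md
printf 'pandas\n' > requirements.txt   # Python dependencies, one per line (example entry)
mkdir fixtures                         # small sample data for tests and demos
touch fixtures/sample.csv

# Keep OS debris, caches, and raw data out of the repository (example patterns).
cat > .gitignore <<'EOF'
.DS_Store
__pycache__/
*.pyc
data/raw/
EOF

touch .DS_Store
git check-ignore -q .DS_Store && echo ".DS_Store is ignored"
git status --short --untracked-files=all   # .DS_Store does not appear
```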

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources

GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


DOC has a formalized Github policy: https://github.com/CommerceGov/Policies-and-Guidance/blob/master/GithubGuidanceforDepartmentofCommerce.md

Review

What is data science?

“Data science is the practice of transforming raw data into insights, products, and applications to empower data-driven decision making. It combines proven, time-tested methods from fields including statistics, natural sciences, computer science, operations research, and design in ways that are particularly well-suited to the data age. These methods, which range from data mining and visualization to predictive modeling, can scale from small to large datasets and can handle structured data as well as unstructured data like text and images.”

Jeff Chen, Chief Data Scientist, US Department of Commerce

How is data science different from data analytics?

What is hypothesis-driven development?

Hypothesis-Driven Development (ThoughtWorks)

We Believe That <this capability>

Will Result In <this outcome>

We Will Know We Have Succeeded When <we see a measurable signal>

What tools do data scientists use?

What is the data science pipeline?

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product?

How are data products different from analytical insights?

Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.

Benjamin Bengfort

What is software engineering?

What does collaboration look like in a data group?

Screenshot: an example Waffle.io board (waffleio/serenity) whose cards, such as “check ship for survivors”, “repair ambulance shuttle”, “fix ship's engine problem”, and “buy a solid ship”, move across Backlog, Ready, In Progress, and Done columns and carry labels like trainjob, financial, and startup.

Version Control

Examples

Google Drive, SharePoint, Dropbox, TortoiseSVN, Bitbucket

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: The management of changes to electronic documents, and in particular computer programs.

“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization

Diagram: a file on the local computer is checked out from a version database that holds Version 1, Version 2, and Version 3.


Branches and revisions through time - example scenario

Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

Screenshot of git-scm.com (“git --distributed-is-the-new-centralized”):

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
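Once Git is installed on either platform, a quick version check and a one-time identity setup are worth doing before the first commit. This sketch assumes a bash shell (the same git commands also work in PowerShell); the name and email are placeholders, and for this demo only, HOME is pointed at a scratch directory so your real ~/.gitconfig is not touched. Drop that line when configuring for real.

```bash
#!/usr/bin/env bash
# Verify the installation and record who you are in Git's global config.
set -e
export HOME="$(mktemp -d)"    # demo only: keep your real ~/.gitconfig untouched

git --version                 # confirms Git is on your PATH

# Identity stored in every commit you make (placeholders; use your own).
git config --global user.name "Example Student"
git config --global user.email "student@example.com"

git config --global --list    # show what was written
```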

Originally conceived and created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index / Staging Area

file snapshots to be included in the next commit

Working Directory

the “physical” files on a computer

Git - “Places”
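Each of these places can be inspected directly. In this sketch (bash, a scratch repository, a hypothetical greeting.txt), the object database and the index live inside the .git directory, git ls-files --stage lists what the index holds, and git cat-file reads objects back out of the object database.

```bash
#!/usr/bin/env bash
# Peek at the working directory, the index, and the object database.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

echo "hello" > greeting.txt     # working directory: a plain file on disk
git add greeting.txt            # index: records a snapshot of the file
git ls-files --stage            # mode, blob id, stage number, path

git commit -q -m "Add greeting"
ls .git                         # objects/ (object database), index, HEAD, refs/, ...
git cat-file -t HEAD            # "commit": the type of object HEAD points to
git cat-file -p HEAD            # the commit's tree, author, committer, and message
git cat-file -p HEAD:greeting.txt   # the file content as stored: hello
```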

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed

Git - “Stages”
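git status --short shows these stages as two columns: the left one for the index (staged) and the right one for the working directory (modified). The sketch assumes bash and a scratch repository with a hypothetical data.csv.

```bash
#!/usr/bin/env bash
# Walk one file through modified -> staged -> committed.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

echo "a,b" > data.csv
git add data.csv
git commit -q -m "Add data"     # committed: status prints nothing
git status --short

echo "c,d" >> data.csv          # modified: " M data.csv" (right column)
git status --short

git add data.csv                # staged: "M  data.csv" (left column)
git status --short

git commit -q -m "Add a row"    # committed again: status prints nothing
git status --short
```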

Diagram: Working Directory, Staging Area, and git directory (Repository)

Git - Areas/places

Git Commands

git init: creates a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual new or modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes the changes from the staging area and adds them, with their metadata, to the object database

Git - Basic Commands
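The four commands above, in order, as a bash sketch. To stay self-contained it clones a local folder rather than a GitHub address (with GitHub you would pass the repository's URL instead); the folder names, file name, and identity are hypothetical.

```bash
#!/usr/bin/env bash
# init -> add -> commit, then clone the result.
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
cd "$(mktemp -d)"

mkdir analysis && cd analysis
git init -q                              # start tracking the current folder
echo "print('hello')" > hello.py
git add hello.py                         # stage the new file
git commit -q -m "Add hello script"      # record the staged snapshot
cd ..

git clone -q analysis analysis-copy      # copy an existing repository (path or URL)
cd analysis-copy
git log --oneline                        # the history came along with the files
```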

git fetch <server> <branch>: updates your object database but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the branch currently checked out in your working directory

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest commits on that branch to the remote server

Git - Basic Commands
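A sketch of fetch, merge, pull, and push between two teammates. As an assumption so it runs offline, a local bare repository (shared.git) stands in for GitHub; the names alice and bob, the file, and the identity are hypothetical.

```bash
#!/usr/bin/env bash
# Two clones share work through a bare repository playing GitHub's role.
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
cd "$(mktemp -d)"

git init -q --bare shared.git            # stands in for the GitHub repository
git clone -q shared.git alice            # may warn that the repository is empty
cd alice
echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on your setup
git push -q origin "$branch"             # push: send local commits to the remote
cd ..

git clone -q shared.git bob              # a teammate starts from the remote copy

cd alice
echo "line 2" >> notes.txt
git commit -q -am "Add line 2"
git push -q origin "$branch"

cd ../bob
git fetch -q origin                      # fetch: bob's object database now has line 2...
cat notes.txt                            # ...but his working directory shows only line 1
git merge -q "origin/$branch"            # merge: apply the fetched commits to bob's branch
cat notes.txt                            # line 1 and line 2

cd ../alice
echo "line 3" >> notes.txt
git commit -q -am "Add line 3"
git push -q origin "$branch"

cd ../bob
git pull -q origin "$branch"             # pull: fetch + merge in one step
cat notes.txt                            # all three lines
```

Every merge here is a fast-forward because bob never commits on his own; when both sides have new commits, the merge step is where conflicts can appear.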

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github


Github

A remote git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

Non-local git repositories are called “remotes”


Diagram: a server computer holds a version database (Version 1, Version 2, Version 3), and Computer A and Computer B each hold a full copy of the same version database.

Github: A Distributed Version Control example

The “origin” remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about “origin”; it is just a default name

Git - “Origin”
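A sketch of that behavior in bash, cloning a local folder instead of a GitHub URL; the folder names and the alternative remote name upstream are hypothetical. git remote -v shows the origin created by the clone, and renaming it shows that the name is only a convention.

```bash
#!/usr/bin/env bash
# The clone creates "origin"; renaming it changes nothing but the label.
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"
cd "$(mktemp -d)"

git init -q project
git -C project commit -q --allow-empty -m "Initial commit"

git clone -q project my-copy        # (git clone --origin <name> picks another name up front)
cd my-copy
git remote -v                       # origin  .../project (fetch) and (push)
git remote rename origin upstream   # nothing special about origin: it's just a name
git remote -v                       # same URL, now called upstream
git fetch -q upstream               # use the new name wherever you used origin
```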


bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 7: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Review

What is data science?

“Data science is the practice of transforming raw data into insights, products, and applications to empower data-driven decision making. It combines proven, time-tested methods from fields including statistics, natural sciences, computer science, operations research, and design in ways that are particularly well-suited to the data age. These methods, which range from data mining and visualization to predictive modeling, can scale from small to large datasets and can handle structured data as well as unstructured data like text and images.”

Jeff Chen, Chief Data Scientist, U.S. Department of Commerce

How is data science different from data analytics?

What is hypothesis-driven development?

COMMERCE DATA SERVICE

Hypothesis-Driven Development (ThoughtWorks)

We Believe That <this capability>
Will Result In <this outcome>
We Will Know We Have Succeeded When <we see a measurable signal>

What tools do data scientists use?

What is the data science pipeline?

Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization

What is a data product?

How are data products different from analytical insights?

“Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.”

Benjamin Bengfort

What is software engineering?

What does collaboration look like in a data group?

[Screenshot: an example Waffle.io board (waffle.io/serenity) with Backlog, Ready, In Progress, and Done columns of labeled issue cards, such as “check ship for survivors,” “fix ship's engine problem,” and “buy a solid ship”]

Version Control

Examples

[Logos: Google Drive, SharePoint, Dropbox, TortoiseSVN, Bitbucket]

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: the management of changes to electronic documents, and in particular to computer programs.

“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Version Control: A Visualization

Local Version Control Systems

[Diagram: on a local computer, a file is checked out from a version database holding Version 3, Version 2, and Version 1]

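Branches and revisions through time: example scenario

[Diagram: revisions numbered 1 through 6 along a timeline, with branches labeled A, B, and C]

Branches and revisions through time: actual workflow

[Screenshot: a real project's branch and commit history, with dates along the top]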
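In Git, a scenario like the one above might look roughly like the following sketch. It is not taken from the slides: it assumes a POSIX shell with Git installed, runs in a throwaway folder created by mktemp, and uses a made-up branch name (idea-a) and a placeholder identity.

```shell
#!/bin/sh
# Sketch: one branch splits off the main line and is merged back in.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, stored only in this repo
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Revision 1: start notes"
main=$(git rev-parse --abbrev-ref HEAD)    # "master" or "main", depending on Git config

# Branch off to try an idea without disturbing the main line
git checkout -q -b idea-a
echo "idea A" > idea.txt
git add idea.txt
git commit -q -m "Revision 2: sketch idea A"

# Meanwhile, the main line keeps moving
git checkout -q "$main"
echo "v2" >> notes.txt
git commit -q -am "Revision 3: extend notes"

# Bring the branch back in; this creates a merge commit with two parents
git merge -q --no-edit idea-a
git log --oneline --graph --all
```

Because both lines of work changed different files, Git can combine them automatically; the merge commit is the point where the two branches rejoin.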

Distributed vs. Centralized

Centralized
What are the benefits?
What are the weaknesses?

Decentralized
What are the benefits?
What are the weaknesses?

Git

[Screenshot: the git-scm.com home page, with the tagline “git --distributed-is-the-new-centralized”]

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

The page also links to Try Git (learn Git in your browser for free), About (the advantages of Git compared to other source control systems), Documentation (command reference pages, Pro Git book content, videos, and other material), Downloads (GUI clients and binary releases for all major platforms), and Community (bug reporting, mailing list, chat, development, and more).

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git (Windows): http://git-for-windows.github.io

Installing Git (Mac): http://git-scm.com/download/mac

Git - History Lesson

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed version control
Open source
Initial release: 7 April 2005
All metadata is stored in the .git directory

Git - Advantages

Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - “Places”

Object Database: where Git stores every commit, together with the file snapshots and metadata (author, date, message) it records
Index (Staging Area): file snapshots to be included in the next commit
Working Directory: the “physical” files on your computer
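A minimal sketch of one file moving through the three places, assuming a POSIX shell (Mac OS X Terminal or Linux) with Git installed; the repository lives in a throwaway mktemp folder and the name and email are placeholders.

```shell
#!/bin/sh
# Sketch: follow one file from the working directory, through the index, into the object database.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Working directory: just an ordinary file on disk
echo "first draft" > notes.txt

# Index (staging area): git add stores the file's content in Git and lists it in the index
git add notes.txt
git ls-files --stage            # mode, blob hash, stage number, path

# Object database: git commit writes a commit object that points at the staged snapshot
git commit -q -m "Add notes"
git cat-file -t HEAD            # prints "commit"
git cat-file -p HEAD            # tree hash, author, committer, and message
```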

Git - “Stages”

Committed: data is safely stored in your local object database
Staged: marked so that the current state of the modified file will be included in the next commit
Modified: changed, but not yet staged or committed
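The three stages can be watched with git status. This sketch (same assumptions: POSIX shell, throwaway repository, placeholder identity) uses the script-friendly --porcelain form, whose two-letter code shows the staging-area state first and the working-directory state second.

```shell
#!/bin/sh
# Sketch: watch one file go from modified, to staged, to committed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "draft" > report.txt
git add report.txt
git commit -q -m "Add report"

# --porcelain is the stable, script-friendly form of `git status --short`
echo "more detail" >> report.txt
modified=$(git status --porcelain)    # " M report.txt": modified, not staged
git add report.txt
staged=$(git status --porcelain)      # "M  report.txt": staged for the next commit
git commit -q -m "Expand report"
committed=$(git status --porcelain)   # empty: everything is committed
printf '%s\n' "[$modified]" "[$staged]" "[$committed]"
```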

Git - Areas/“Places”

[Diagram: the Working Directory, the Staging Area, and the .git directory (Repository)]

Git Commands

Git - Basic Commands

git init: create a new Git repository to manage the current folder
git clone <repository address>: download an existing Git repository for the first time
git add <file path>: mark new or modified files to be added to the index (staging area) for the next commit
git commit -m <message>: record the staged changes as a new commit in the object database
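One way to try the four commands above, as a sketch under a few assumptions: a POSIX shell, a throwaway folder, a placeholder identity, and a local folder standing in for the repository you would normally clone from Github.

```shell
#!/bin/sh
# Sketch: git init, git add, git commit, and git clone in a throwaway folder.
set -e
work=$(mktemp -d)
cd "$work"

# git init: start managing a new project folder with Git
mkdir project
cd project
git init -q
git config user.name "Example User"         # placeholder identity, stored only in this repo
git config user.email "user@example.com"

# git add stages a file; git commit records the staged snapshot
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

# git clone: download a full copy of an existing repository
# (here the local folder above; normally a Github URL)
cd "$work"
git clone -q project project-copy
git -C project-copy log --oneline           # the copy has the same history
```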

Git - Basic Commands (continued)

git fetch <remote> <branch>: update your object database with new commits from the remote, without changing the working directory
git merge <source branch>: apply the commits from the source branch to the branch currently checked out in your working directory
git pull <remote> <branch>: perform a fetch, then merge those changes into the current branch and working directory
git push <remote> <branch>: send your latest commits on that branch to the remote server
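To see fetch, merge, pull, and push without a Github account, this sketch uses a local bare repository (git init --bare) as a stand-in for the remote; the two collaborators, Alice and Bob, are hypothetical. It assumes a POSIX shell with Git installed.

```shell
#!/bin/sh
# Sketch: fetch, merge, pull, and push between two clones of one shared repository.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare shared.git               # stand-in for a Github-hosted remote

# Alice clones the (still empty) shared repository and pushes a first commit
git clone -q shared.git alice 2>/dev/null   # hides the "cloned an empty repository" warning
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on Git config
git push -q origin "$branch"                # push: send local commits to the remote

# Bob clones, commits, and pushes a change of his own
cd "$work"
git clone -q -b "$branch" shared.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "from Bob" >> greeting.txt
git commit -q -am "Extend greeting"
git push -q origin "$branch"

# Alice fetches: her object database learns about Bob's commit...
cd "$work/alice"
git fetch -q origin
after_fetch=$(cat greeting.txt)             # ...but her working directory is unchanged
git merge -q "origin/$branch"               # merge: now the file has Bob's line

# pull = fetch + merge in one step (--ff-only refuses to create a merge commit)
cd "$work/bob"
echo "again from Bob" >> greeting.txt
git commit -q -am "Add another line"
git push -q origin "$branch"
cd "$work/alice"
git pull -q --ff-only origin "$branch"
cat greeting.txt
```

Right after git fetch, Alice's greeting.txt still holds only her own line; the file changes only when she merges (or pulls).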

Git Challenge (20 minutes): https://try.github.io/levels/1/challenges/1

Github

Github

A remote Git repository
A website that:
• provides secure access
• provides repository metadata & reports
• provides tools for development teams
Launched April 10, 2008
~10 million users in 2015

Non-local Git repositories are called “remotes.”

[Diagram: the Git “places” on your computer (Working Directory, Index/Staging Area, Object Database) alongside a remote repository]

Github: A Distributed Version Control example

[Diagram: a server computer and two other computers (Computer A and Computer B), each holding a full version database with Version 1, Version 2, and Version 3]
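The diagram's point that every computer holds the full version database can be checked directly. In this sketch a local folder named server stands in for the server computer (an assumption for the demo), and two clones play Computer A and Computer B.

```shell
#!/bin/sh
# Sketch: every clone carries the complete version database, not just the latest files.
set -e
work=$(mktemp -d)
cd "$work"
git init -q server
cd server
git config user.name "Example User"
git config user.email "user@example.com"
for v in 1 2 3; do
  echo "Version $v" > doc.txt
  git add doc.txt
  git commit -q -m "Version $v"
done

cd "$work"
git clone -q server computer-a
git clone -q server computer-b
git -C computer-a log --oneline                 # three commits, newest first
git -C computer-b log -1 --format=%s HEAD~2     # the oldest version is available locally
```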

Git - “Origin”

The “origin” remote is automatically created when you clone.
It is the default remote to use for pushing and pulling.
There is nothing special about “origin”; it is just a default name.
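A short sketch of the points above, assuming a POSIX shell; the folder names are hypothetical and a local repository stands in for a Github one.

```shell
#!/bin/sh
# Sketch: "origin" is created by git clone, and it is only a name.
set -e
work=$(mktemp -d)
cd "$work"
git init -q team-project                        # stands in for a Github repository
git -C team-project config user.name "Example User"
git -C team-project config user.email "user@example.com"
git -C team-project commit -q --allow-empty -m "Initial commit"

git clone -q team-project my-copy
cd my-copy
before=$(git remote)              # "origin", added automatically by the clone
git remote -v                     # origin's fetch and push addresses
git remote rename origin github   # nothing special: rename it like any other remote
git remote                        # now prints "github"
```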

User Account

[Screenshot: the Github profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, joined Sep 13, 2014, showing organizations, popular repositories, repositories contributed to, and a year-long contributions calendar]

Repo

[Screenshot: a Github repository page, rebeccabilbro's “A tour of ROC curves” (19 commits, 1 branch, 0 releases, 1 contributor), showing the Code, Issues, Pull requests, Wiki, Pulse, Graphs, and Settings tabs; the file list (data, figures, .DS_Store, .gitignore, LICENSE, README.md, and Python modules such as ingest.py and roc.py) with each item's latest commit message; and the rendered README.md]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.
• In “Search programs and files,” type: powershell
• Hit Enter.

Mac OS X: For Mac OS X you'll need to do this:

• Hold down COMMAND and hit the spacebar.
• In the top right, the blue search bar will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options > Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Where am I? (Mac OS X Terminal and Windows PowerShell)

What's my name? (Mac OS X Terminal and Windows PowerShell)

Make a directory

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OS X Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OS X Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OS X Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
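For Mac OS X (and Linux) users, the Terminal steps above can be run as one script. This is a sketch: it works in a throwaway folder, and it uses mkdir -p for the deeply nested folder, because the Unix mkdir, unlike PowerShell's, does not create missing parent folders without -p.

```shell
#!/bin/sh
# Sketch: the command-line tour as one script (Mac OS X / Linux), run in a scratch folder.
set -e
cd "$(mktemp -d)"        # work somewhere disposable
pwd                      # "Where am I?": print the current directory
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p also creates missing parent folders
cd temp                  # change into a directory
pwd
touch iamcool.txt        # make an empty file
ls                       # list files and directories
```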

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
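A small sketch of how a merge conflict arises and is resolved, assuming a POSIX shell with Git installed; the file, branch name, and meeting times are invented for the example.

```shell
#!/bin/sh
# Sketch: create a merge conflict on purpose, then resolve it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "Meeting at 10am" > plan.txt
git add plan.txt
git commit -q -m "Add plan"
main=$(git rev-parse --abbrev-ref HEAD)

# A teammate changes the line on their own branch...
git checkout -q -b teammate
echo "Meeting at 11:30am" > plan.txt
git commit -q -am "Move meeting to 11:30am"

# ...while the same line changes on the main branch
git checkout -q "$main"
echo "Meeting at 9am" > plan.txt
git commit -q -am "Move meeting to 9am"

# Both sides edited the same line, so Git stops and marks the conflict
if git merge teammate; then
  echo "expected a conflict" >&2
  exit 1
fi
conflicted=$(git diff --name-only --diff-filter=U)   # files still unmerged
cat plan.txt        # <<<<<<<, =======, >>>>>>> markers around both versions

# Resolve: write the agreed-on content, stage it, and finish the merge commit
echo "Meeting at 10:30am" > plan.txt
git add plan.txt
git commit -q --no-edit
git log --oneline --graph
```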

Compare

[Diagram: the working directory, index, local repo, and Github repo, with arrows for the commands that compare or move changes between them, including revert (checkout HEAD), diff --cached, and fetch]
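The local commands named in the diagram can be tried in a sketch like this (POSIX shell, throwaway repository, placeholder identity): git diff compares the working directory with the index, git diff --cached compares the index with the last commit, and git checkout HEAD -- <file> reverts a file to its committed version.

```shell
#!/bin/sh
# Sketch: compare working directory, index, and last commit, then revert a change.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "draft" > model.txt
git add model.txt
git commit -q -m "Add model notes"

echo "revised draft" > model.txt
unstaged=$(git diff --name-only)          # working directory vs. index
git add model.txt
staged=$(git diff --cached --name-only)   # index vs. last commit (HEAD)
git diff --cached                         # the staged change itself

# Revert: restore the file in both the index and the working directory from HEAD
git checkout HEAD -- model.txt
cat model.txt                             # prints "draft"
```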

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on Github (“A startup within DOC focused on building data products with and for the bureaus,” Washington DC, http://www.commerce.gov, data@doc.gov), with 20 people, 4 teams, and repositories such as DataService_WebSite (forked from timwood/DataCorps_WebSite), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (“Course materials offered by the Commerce Data Academy”)]

Waffle

[Screenshot: the Waffle board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress, and Done columns of Github issues labeled by priority, type (feature, bug, technical debt, question, task), and version; cards include “Dataset Searching,” “Async Upload with Celery,” and “Large files hang uploader”]

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
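A sketch of a more helpful commit message, assuming a POSIX shell; the file and message text are invented for illustration. Passing -m twice gives a short summary line followed by a body paragraph explaining why the change was made.

```shell
#!/bin/sh
# Sketch: a commit message with a short summary line plus an explanatory body.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'label,value\n' > data.csv
git add data.csv
# Each -m becomes its own paragraph: first the summary, then the "why"
git commit -q -m "Add header row to data.csv" \
  -m "The ingest step expects named columns, so the file needs a header row."
git log -1 --format=%s    # the summary line
git log -1 --format=%b    # the body
```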

Why

Why do data scientists need version control?

Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md
.gitignore
fixtures
requirements.txt
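A sketch of the conventional layout, assuming a Python project (the dependency names and .gitignore patterns are illustrative, not prescribed by the slides) and treating fixtures as a folder of small sample data files. git check-ignore confirms which files Git will skip.

```shell
#!/bin/sh
# Sketch: scaffold the conventional top-level files, then check what Git will ignore.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "# Project title" > README.md                      # what the project is and how to run it
printf 'pandas\nscikit-learn\n' > requirements.txt      # hypothetical Python dependencies
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore  # files Git should never track
mkdir fixtures
echo "id,label" > fixtures/sample.csv                   # small sample data for tests and demos
touch .DS_Store                                         # a macOS folder-metadata file
git check-ignore .DS_Store                              # prints ".DS_Store": it is ignored
git add .
git status --porcelain                                  # .DS_Store is not staged
```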

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

Page 8: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

What is data science

ldquoData science is the practice of transforming raw data into insights products

and applications to empower data-driven decision making It combines

proven time-tested methods from fields including statistics natural sciences

computer science operations research and design in ways that are

particularly well-suited to the data age These methods which range from

data mining and visualization to predictive modeling can scale from small to

large datasets and can handle structured data as well as unstructured data

like text and imagesrdquo

Jeff Chen Chief Data Scientist US Department of Commerce

How is data science different fromdata analytics

What is hypothesis-driven development

COMMERCE DATA SERVICE

rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot

We Believe That ~ fht~ C--Jf Jbt jf1gt

Will Result In ~ fh~ OfJfCAJfYle-gt

We Will Know We Have Succeeded When

lt we- ie-e- a rne-atwabe- tigtialgt

What tools do data scientists use

What is the data science pipeline

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

[Screenshot: a GitHub repository page owned by rebeccabilbro, described as "A tour of ROC curves", showing 19 commits, 1 branch, 0 releases, and 1 contributor, with the folders data and figures, files such as .DS_Store, .gitignore, LICENSE, README.md, ingest.py, and roc.py, and the latest commit message for each.]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type powershell.

• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type terminal.

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open, and it's in your Dock so you can get to it.

Where am I?

[Screenshots: Mac OSX Terminal and Windows PowerShell]

What’s my name?

[Screenshots: Mac OSX Terminal and Windows PowerShell]
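The screenshots for these two slides did not survive. Assuming they showed the usual commands for these questions, `pwd` and `whoami`, a minimal bash sketch is:

```shell
# Where am I? Print the working directory (pwd also works in PowerShell).
pwd

# What's my name? Print the current user name (whoami also exists on Windows).
whoami
```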

Make a directory

Windows PowerShell:

> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

[Screenshot: Mac OSX Terminal]
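The same directories can be made in the Mac OSX Terminal. A minimal bash sketch, run inside a throwaway directory; note the `-p` flag, which bash's `mkdir` needs in order to create the missing parent folders of the last, deeply nested path in one step.

```shell
base="$(mktemp -d)"     # practice somewhere disposable
cd "$base"

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things

# Without -p, this fails because frank/, joe/, and alex/ do not exist yet.
mkdir -p temp/stuff/things/frank/joe/alex/john
```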

Change between directories

Windows PowerShell:

> cd temp
> pwd

Mac OSX Terminal:

$ cd temp
$ pwd

List files and directories

Windows PowerShell:

> dir

Mac OSX Terminal:

$ ls

Make an empty file

Windows PowerShell:

> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:

$ cd temp
$ touch iamcool.txt
$ ls
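Putting the last three slides together in bash: a minimal sketch that changes into a new directory, checks where it is, creates an empty file, and lists it. It works inside a throwaway directory, so it is safe to repeat.

```shell
base="$(mktemp -d)"     # disposable practice area
cd "$base"
mkdir temp

cd temp                 # change into the directory
pwd                     # the path now ends in /temp
touch iamcool.txt       # create an empty file (New-Item iamcool.txt -type file in PowerShell)
ls                      # iamcool.txt
cd ..                   # go back up one level
```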

Zed Shaw’s book

Let’s use what we’ve learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
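Before the workshop, it may help to see a conflict happen on purpose. The sketch below uses a throwaway local repository with hypothetical file and branch names (`settings.txt`, `tweak`) and a placeholder identity: two branches change the same line, `git merge` stops on the conflict, and the conflict is then resolved by hand.

```shell
work="$(mktemp -d)"                       # throwaway directory
cd "$work"
git init -q conflict-demo
cd conflict-demo
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"

echo "threshold = 0.5" > settings.txt
git add settings.txt
git commit -q -m "Set threshold"
main_branch="$(git rev-parse --abbrev-ref HEAD)"   # master or main, depending on your Git

# One teammate raises the value on a branch...
git checkout -q -b tweak
echo "threshold = 0.7" > settings.txt
git commit -q -am "Raise threshold"

# ...while another lowers the same line on the main branch.
git checkout -q "$main_branch"
echo "threshold = 0.3" > settings.txt
git commit -q -am "Lower threshold"

# The merge stops with a conflict, and git merge exits non-zero.
if ! git merge tweak; then
  cat settings.txt    # both versions, between <<<<<<<, =======, and >>>>>>> markers
fi

# Resolve: write the version you want, then stage it and commit to finish the merge.
echo "threshold = 0.6" > settings.txt
git add settings.txt
git commit -q -m "Merge tweak: settle on a threshold of 0.6"
```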

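[Diagram: the working directory, the index, the local repo, and the GitHub repo, with arrows labeled "Revert", "diff --cached", "fetch", and "checkout HEAD" showing how changes are compared and moved between them.]

Compare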
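The comparisons in the diagram map onto a few commands. A minimal bash sketch in a throwaway repository (the file name and contents are made up for illustration): one change is staged and a second is left unstaged, so each form of `git diff` compares a different pair of places, and `git checkout HEAD -- <file>` then reverts the file to its last committed version.

```shell
work="$(mktemp -d)"                 # throwaway directory
cd "$work"
git init -q compare-demo
cd compare-demo
git config user.name "Demo User"    # placeholder identity
git config user.email "demo@example.com"
echo "rows: 100" > summary.txt
git add summary.txt
git commit -q -m "Add summary"

echo "rows: 120" > summary.txt
git add summary.txt                 # staged change: 100 -> 120
echo "rows: 150" > summary.txt      # further, unstaged change: 120 -> 150

git diff              # working directory vs. index:        120 -> 150
git diff --cached     # index vs. last commit (HEAD):        100 -> 120
git diff HEAD         # working directory vs. last commit:  100 -> 150

# Revert: restore the committed version in both the index and the working directory.
git checkout HEAD -- summary.txt
cat summary.txt       # rows: 100

# To compare with GitHub, fetch first, then diff against the remote-tracking branch:
#   git fetch origin && git diff HEAD origin/<branch>    (needs a remote; not run here)
```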

Teamwork (makes the dream work)

Organization

[Screenshot: the GitHub organization page for the Commerce Data Service ("A startup within DOC focused on building data products with and for the bureaus", Washington DC, http://www.commerce.gov), listing 20 people, 4 teams, and repositories including DataService_WebSite (forked from timwood/DataCorps_WebSite, "The website for the Commerce Data Service - A startup within the Department of Commerce"), ITA_Principal_Travel, and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy").]

Waffle

[Screenshot: a Waffle.io board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress, and Done columns of GitHub issues such as "Better Licensing", "username check", "Dataset Searching", "Async Upload with Celery", and "Large files hang uploader", each tagged with labels for priority, type (feature, bug, technical debt), and version.]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
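One way to follow this advice is to give every commit a short summary line plus a body that explains why. A minimal sketch in a throwaway repository (the file and message text are invented for illustration); passing `-m` twice makes Git store the two values as separate paragraphs of one message.

```shell
work="$(mktemp -d)"                 # throwaway directory
cd "$work"
git init -q messages-demo
cd messages-demo
git config user.name "Demo User"    # placeholder identity
git config user.email "demo@example.com"
echo "def load(path): ..." > ingest.py
git add ingest.py

# First -m: a short summary. Second -m: the body, explaining why.
git commit -q -m "Add CSV loader for ingest step" \
  -m "Loads raw files into a list of rows so later cleaning steps share one entry point."

git log --oneline          # <short hash> Add CSV loader for ingest step
git log -1 --format=%B     # the full message: summary, blank line, body
```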

Why?

Why do data scientists need version control?

Data Ingestion → Data Munging and Wrangling → Computation and Analyses → Modeling and Application → Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on GitHub

README.md

.gitignore

fixtures

requirements.txt
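A minimal sketch of scaffolding those four pieces in a new repository. The file contents (package names, sample rows, ignore patterns) are illustrative assumptions, not taken from the slides; `git check-ignore -v` confirms that `.DS_Store`, the Mac Finder metadata file, will stay out of commits.

```shell
work="$(mktemp -d)"                 # throwaway directory
cd "$work"
mkdir my-project
cd my-project
git init -q

# README.md: what the project is and how to run it.
printf '# My Project\n\nWhat this project does and how to run it.\n' > README.md

# requirements.txt: the Python packages the code needs (example entries).
printf 'pandas\nmatplotlib\n' > requirements.txt

# fixtures/: small sample data files for tests and examples.
mkdir fixtures
printf 'id,value\n1,0.5\n' > fixtures/sample.csv

# .gitignore: files Git should never track, such as OS clutter and bulky raw data.
printf '.DS_Store\n__pycache__/\n*.pyc\ndata/raw/\n' > .gitignore

touch .DS_Store
git check-ignore -v .DS_Store   # prints the .gitignore rule that matches
git status --short              # lists the new files, but not .DS_Store
```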

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

GitHub Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

“Data science is the practice of transforming raw data into insights, products, and applications to empower data-driven decision making. It combines proven, time-tested methods from fields including statistics, natural sciences, computer science, operations research, and design in ways that are particularly well-suited to the data age. These methods, which range from data mining and visualization to predictive modeling, can scale from small to large datasets and can handle structured data as well as unstructured data like text and images.”

Jeff Chen, Chief Data Scientist, US Department of Commerce

How is data science different from data analytics?

What is hypothesis-driven development?

Hypothesis-Driven Development (ThoughtWorks)

We believe that <this capability>

Will result in <this outcome>

We will know we have succeeded when <we see a measurable signal>

What tools do data scientists use?

What is the data science pipeline?

Data Ingestion → Data Munging and Wrangling → Computation and Analyses → Modeling and Application → Reporting and Visualization

What is a data product?

How are data products different from analytical insights?

“Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.”

Benjamin Bengfort

What is software engineering?

What does collaboration look like in a data group?

[Screenshot: a Waffle.io board, waffle.io/serenity, with Backlog, Ready, In Progress, and Done columns of Firefly-themed tasks such as "check ship for survivors", "retrieve cargo from train", "fix ship’s engine problem", and "buy a solid ship", labeled trainjob, financial, startup, and wontfix.]

Version Control

Examples: Google Drive, SharePoint, Dropbox, TortoiseSVN, Bitbucket

What is version control?

Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: the management of changes to electronic documents and, in particular, computer programs.

“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization

[Diagram: on a local computer, a file is checked out from a version database that stores Version 1, Version 2, and Version 3.]

[Diagram: revisions numbered 1 through 6 spread across branches labeled A, B, and C]

Branches and revisions through time - example scenario

[Figure: a branch graph from a real project]

Branches and revisions through time - actual workflow

Distributed vs. Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

[Screenshot: the git-scm.com home page, with the tagline "--distributed-is-the-new-centralized"]

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
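After installing, a quick check that Git works and a one-time identity setup are sketched below for bash (the Mac OSX Terminal, or Git Bash, which comes with Git for Windows); the `git` commands are the same in PowerShell. The name and email are placeholders to replace with your own.

```shell
# Confirm that Git is on your PATH; prints something like "git version 2.x.y".
git --version

# Record your identity once per machine; Git stamps it on every commit you make.
# --global writes to your user-level config file (e.g. ~/.gitconfig).
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Read the settings back to check them.
git config --global --get user.name
git config --global --get user.email
```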

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where Git stores each commit, including the file snapshots and metadata it records

Index (Staging Area)

file snapshots to be included in the next commit

Working Directory

the “physical” files on a computer

Git - “Places”

Committed: data is safely stored in your local object database

Staged: marked so that the current state of the modified file will be included in the next commit

Modified: changed, but not yet staged or committed

Git - “Stages”

[Diagram: the three areas, Working Directory, Staging Area, and .git directory (Repository)]

Git - Areas/Places
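To watch a file move through these stages, a minimal bash sketch in a throwaway repository (the file name and identity are placeholders). `git status --short` prints a two-character code per file: `??` for untracked, `A ` for staged as new, ` M` for modified but not staged, and nothing at all once everything is committed.

```shell
work="$(mktemp -d)"                     # throwaway directory
cd "$work"
git init -q stages-demo
cd stages-demo
git config user.name "Demo User"        # placeholder identity, this repo only
git config user.email "demo@example.com"

# Untracked: a new file that exists only in the working directory.
echo "first draft" > notes.txt
git status --short        # ?? notes.txt

# Staged: the file's current contents are copied into the index.
git add notes.txt
git status --short        # A  notes.txt

# Committed: the staged snapshot is stored in the object database.
git commit -q -m "Add notes"
git status --short        # (prints nothing: the working directory is clean)

# Modified: changed again, but not yet staged or committed.
echo "second draft" >> notes.txt
git status --short        #  M notes.txt
```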

Git Commands

git init: creates a new Git repository to manage the current folder

git clone <repository address>: downloads a copy of an existing Git repository, including its full history, for the first time

git add <file path>: marks individual new or modified files to be added to the index (staging area) for the next commit

git commit -m <message>: records the staged changes as a new commit in the object database

Git - Basic Commands
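The four local commands in order, as a minimal bash sketch in a throwaway directory. The folder names are hypothetical, the identity is a placeholder, and a local path stands in for a GitHub address in `git clone`.

```shell
work="$(mktemp -d)"                     # throwaway directory
cd "$work"

# git init: turn a folder into a repository (this creates the hidden .git directory).
mkdir analysis
cd analysis
git init -q
git config user.name "Demo User"        # placeholder identity, this repo only
git config user.email "demo@example.com"
ls -A                                   # .git

# git add: stage a new file for the next commit.
echo "# Analysis" > README.md
git add README.md

# git commit -m: record the staged snapshot in the object database.
git commit -q -m "Add README"
git log --oneline                       # <short hash> Add README

# git clone: copy an existing repository together with its full history.
cd "$work"
git clone -q analysis analysis-copy
git -C analysis-copy log --oneline      # the same commit as above
```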

git fetch <server> <branch>: downloads new commits from the remote into your object database, but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the branch currently checked out in your working directory

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest branch commits to the remote server

Git - Basic Commands
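A minimal sketch of the remote commands, assuming two collaborators (hypothetically, Alice and Bob) share one repository. A local bare repository, `hub.git`, stands in for GitHub so the example runs offline; with GitHub you would clone its HTTPS or SSH address instead.

```shell
work="$(mktemp -d)"                      # throwaway directory
cd "$work"

# A bare repository has no working directory; here it plays the role of GitHub.
git init -q --bare hub.git

# Alice creates the project and pushes her first commit.
git clone -q hub.git alice 2>/dev/null   # silence the "empty repository" warning
cd "$work/alice"
git config user.name "Alice"; git config user.email "alice@example.com"
echo "v1" > data.csv
git add data.csv
git commit -q -m "Add data"
git push -q origin HEAD                  # push: send this branch's commits to the remote

# Bob clones the shared repository.
git clone -q "$work/hub.git" "$work/bob"
git -C "$work/bob" config user.name "Bob"
git -C "$work/bob" config user.email "bob@example.com"

# Alice pushes a second commit.
echo "v2" >> data.csv
git commit -q -am "Update data"
git push -q origin HEAD

# Bob fetches: his object database and origin/* branches update, his files do not.
cd "$work/bob"
git fetch -q origin
cat data.csv                             # v1
git merge -q '@{u}'                      # merge his upstream (origin/<branch>) into his branch
cat data.csv                             # v1, v2

# Alice pushes a third commit; Bob uses pull, which is a fetch followed by a merge.
cd "$work/alice"
echo "v3" >> data.csv
git commit -q -am "Update data again"
git push -q origin HEAD
cd "$work/bob"
git pull -q
cat data.csv                             # v1, v2, v3
```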

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

GitHub

A remote Git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local Git repositories are called “remotes”.
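Cloning sets up a remote for you, but a repository that started on your own machine needs one added by hand. A minimal sketch, assuming you have created an empty repository on GitHub; here a local bare repository with the hypothetical name `project-on-github.git` plays that role, and the identity is a placeholder.

```shell
work="$(mktemp -d)"                       # throwaway directory
cd "$work"

# Stand-in for an empty repository you just created on GitHub.
git init -q --bare project-on-github.git

# An existing local project with one commit.
mkdir project
cd project
git init -q
git config user.name "Demo User"; git config user.email "demo@example.com"
echo "print('hello')" > main.py
git add main.py
git commit -q -m "First commit"

# Register the remote under the conventional name "origin".
# With GitHub, use the repository's HTTPS or SSH address instead of a local path.
git remote add origin "$work/project-on-github.git"
git remote -v                             # origin ... (fetch) and origin ... (push)

# Push the current branch and record origin as its upstream (-u),
# so that a plain "git push" or "git pull" knows where to go next time.
git push -q -u origin HEAD
```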

Page 10: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

How is data science different fromdata analytics

What is hypothesis-driven development

COMMERCE DATA SERVICE

rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot

We Believe That ~ fht~ C--Jf Jbt jf1gt

Will Result In ~ fh~ OfJfCAJfYle-gt

We Will Know We Have Succeeded When

lt we- ie-e- a rne-atwabe- tigtialgt

What tools do data scientists use

What is the data science pipeline

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

[Screenshot: a Github repository page, "A tour of ROC curves": 19 commits, 1 branch, 0 releases, 1 contributor. The master branch holds data/, figures/, .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, and roc.py, each listed with its latest commit message (e.g. "Initial commit", "basic implementation of roc curve plotter", "added method to guess the label column"), followed by the rendered README.md]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.
• In "Search programs and files", type powershell.
• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.
• In the top right, the blue search bar will pop up.
• Type terminal.
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Where am I? (Mac OSX Terminal / Windows PowerShell)

What's my name? (Mac OSX Terminal / Windows PowerShell)
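The two questions above each map to one command. A minimal sketch for the Mac OSX Terminal; as far as we know, PowerShell also accepts pwd (an alias for Get-Location) and whoami, so the same two lines should work there too.

```bash
# "Where am I?" prints the full path of the folder you are currently in
pwd

# "What's my name?" prints the user name you are logged in as
whoami
```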

Make a directory

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
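On the Mac OSX Terminal the same exercise needs one extra flag: mkdir -p creates any missing parent folders along the way, which PowerShell's mkdir already does on its own. A minimal sketch, assuming you run it inside a throwaway folder created by mktemp -d so nothing in your home directory is touched:

```bash
# Work in a throwaway folder so the exercise is safe to repeat
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
# Without -p this line fails, because frank/, joe/, and alex/ do not exist yet
mkdir -p temp/stuff/things/frank/joe/alex/john

# List everything under temp/ to confirm the nested folders
ls -R temp
```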

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OSX Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
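Put together, the Mac OSX Terminal steps look like this. It is a minimal sketch run in a throwaway folder from mktemp -d (an assumption, so it is safe to repeat); iamcool.txt is just the example file name from the slide.

```bash
WORKDIR=$(mktemp -d)      # throwaway folder for the exercise
cd "$WORKDIR"
mkdir temp

cd temp                   # change into the new directory
pwd                       # confirm where you are
touch iamcool.txt         # create an empty file
ls                        # list files and directories: shows iamcool.txt
```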

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
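If you want to rehearse before the workshop, the sketch below provokes a merge conflict in a throwaway local repository and then resolves it by hand. The file name, the branch name teammate-a, and the Example User identity are all hypothetical; no Github account is needed.

```bash
WORKDIR=$(mktemp -d)                     # throwaway location
cd "$WORKDIR"
git init -q conflict-demo
cd conflict-demo
git config user.name "Example User"      # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "The meeting is on Monday" > notes.txt
git add notes.txt
git commit -q -m "Add meeting notes"
git branch -M main                       # name the starting branch main

# Teammate A changes the line on their own branch...
git checkout -q -b teammate-a
echo "The meeting is on Tuesday" > notes.txt
git commit -q -am "Move meeting to Tuesday"

# ...while the same line is changed differently on main
git checkout -q main
echo "The meeting is on Wednesday" > notes.txt
git commit -q -am "Move meeting to Wednesday"

# Git cannot pick a winner, so the merge stops with a conflict
if ! git merge teammate-a; then
  cat notes.txt                          # shows the <<<<<<<, =======, >>>>>>> markers
  # Resolve by keeping the version the team agreed on, then commit the merge
  echo "The meeting is on Tuesday" > notes.txt
  git add notes.txt
  git commit -q -m "Resolve meeting-day conflict"
fi

git log --oneline --graph
```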

[Diagram: the working directory, the index, the local repo, and the Github repo, with arrows labeled Revert, diff --cached, fetch, and checkout HEAD]

Compare
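Several of these arrows are ways of comparing two places. A minimal sketch in a throwaway repository (the file name, contents, and identity are hypothetical) shows which pair of places each form of git diff compares:

```bash
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
git init -q compare-demo
cd compare-demo
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "version 1" > file.txt
git add file.txt
git commit -q -m "First version"         # local repo now holds version 1

echo "version 2" > file.txt
git add file.txt                         # index now holds version 2
echo "version 3" > file.txt              # working directory now holds version 3

git diff           # working directory vs index:       version 2 -> version 3
git diff --cached  # index vs last commit:             version 1 -> version 2
git diff HEAD      # working directory vs last commit: version 1 -> version 3
```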

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus", Washington DC, http://www.commerce.gov), showing 20 people, 4 teams, and repositories such as DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite), ITA_Principal_Travel (CSS), and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy")]

Waffle

[Screenshot: a Waffle board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Each card is a Github issue, such as "Better Licensing", "username check", "Dataset Searching", "AJAXify the uploader", "Async Upload with Celery", "Large files hang uploader", and "Upload Error: line contains NULL byte", labeled by priority (medium), type (feature, bug, technical debt), and version (0.3). The Done column reads: "Issues closed in the last week are shown in this column. Drag issues here to close them."]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
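Git also lets you pass -m more than once: the first becomes the short summary line and each later one becomes a paragraph of the message body, which is a handy place to explain why a change was made. A minimal sketch in a throwaway repository; the file, identity, and message text are hypothetical:

```bash
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
git init -q msg-demo
cd msg-demo
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "print('hello')" > ingest.py
git add ingest.py

# First -m: a short summary. Second -m: a body paragraph saying why.
git commit -q -m "Add ingest script stub" \
  -m "Starting point for the bulk ingest step so teammates can wire up the data loader."

git log -1                               # shows the summary and the body
```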

Why?

Why do data scientists need version control?

Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
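One way to lay out a new project along these lines, as a minimal sketch: the project name, the package names in requirements.txt, and the .gitignore entries are assumptions to adapt, and the empty .gitkeep file is only there because git does not track empty folders.

```bash
WORKDIR=$(mktemp -d)                     # throwaway location for the sketch
mkdir -p "$WORKDIR/my-project/fixtures"
cd "$WORKDIR/my-project"

# README.md: what the project is and how to run it
printf '# My Project\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files git should never pick up (OS clutter, caches, secrets)
printf '.DS_Store\n__pycache__/\n*.pyc\n.env\n' > .gitignore

# requirements.txt: the Python packages the code needs (illustrative names)
printf 'pandas\nmatplotlib\n' > requirements.txt

# fixtures/: small sample data for tests; .gitkeep lets git track the folder
touch fixtures/.gitkeep

git init -q
touch .DS_Store                          # simulate the clutter a Mac creates
git add .
git status --short                       # .DS_Store is not listed: .gitignore excludes it
```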

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


What is hypothesis-driven development?

Hypothesis Driven Development (ThoughtWorks):

We Believe That <this capability>
Will Result In <this outcome>
We Will Know We Have Succeeded When <we see a measurable signal>

What tools do data scientists use?

What is the data science pipeline?

Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization

What is a data product?

How are data products different from analytical insights?

"Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data."

Benjamin Bengfort

What is software engineering?

What does collaboration look like in a data group?

[Screenshot: an example Waffle board (waffleio/serenity) with Backlog, Ready, In Progress, and Done columns. The cards are tasks such as "check ship for survivors", "repair ambulance shuttle", "retrieve cargo from train", "fix ship's engine problem", and "find a mechanic for the ship", tagged with labels like startup, train job, hospital job, financial, enhancement, and wontfix]

Version Control

Examples

Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: The management of changes to electronic documents, and in particular computer programs.

"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization
[Diagram: on the local computer, a file is checked out from a version database holding Version 1, Version 2, and Version 3]

[Diagram: revisions 1 to 6 and branches A, B, and C laid out over time]
Branches and revisions through time - example scenario

[Diagram: a timeline of real branches and revisions]
Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Learn Git in your browser for free with Try Git.

[Screenshot: the git-scm.com home page, with links to About, Documentation, Downloads, and Community]

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
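Whichever installer you use, a quick check from the terminal confirms that git is on your path. The sketch below is written for the Mac OSX Terminal (bash); the name and email in the commented-out lines are placeholders for your own.

```bash
# Is the git command available on this machine?
if command -v git >/dev/null 2>&1; then
  git --version        # prints something like "git version 2.x.y"
else
  echo "git not found; install it from git-scm.com first" >&2
fi

# One-time setup so your commits are labeled with your name (uncomment and edit):
# git config --global user.name "Your Name"
# git config --global user.email "you@example.com"
```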

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database: where git stores metadata about each commit

Index / Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer

Git - "Places"

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed

Git - "Stages"
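To watch a file move through the three stages, git status --short prints a two-letter code per file: the left column describes the staging area and the right column the working directory. A minimal sketch in a throwaway repository (the file name and identity are hypothetical):

```bash
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
git init -q stages-demo
cd stages-demo
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "draft" > report.txt
git add report.txt
git commit -q -m "Add report"            # committed: stored in the object database

echo "more text" >> report.txt
git status --short                       # " M report.txt": modified, not staged

git add report.txt
git status --short                       # "M  report.txt": staged for the next commit

git commit -q -m "Expand report"
git status --short                       # no output: committed, nothing pending
```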

Working Directory

Staging Area

.git directory (Repository)

Git - Areas/places

Git Commands

git init: creates a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes the staged changes, plus commit metadata, and adds them to the object database
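The four commands above, run in order as a minimal sketch. It assumes a throwaway folder, a hypothetical data.csv, and a placeholder identity; git clone is pointed at a local folder here, whereas with a Github repository you would pass its address instead.

```bash
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# git init: start tracking a folder
mkdir analysis
cd analysis
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

# git add: put the file's current state into the staging area
echo "x,y" > data.csv
git add data.csv

# git commit: record everything staged as a new commit
git commit -q -m "Add raw data"

# git clone: copy a whole repository, history included (here from a local path)
cd "$WORKDIR"
git clone -q analysis analysis-copy
git -C analysis-copy log --oneline
```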

Git - Basic Commands

git fetch <server> <branch>: updates your object database but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the branch you currently have checked out (the branch your working directory reflects)

git pull <server> <branch>: performs a fetch and then merges those changes into your working directory

git push <server> <branch>: sends your latest branch commits to the remote server
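To see the difference between fetch, merge, pull, and push without touching Github, the sketch below uses a local bare repository (shared.git, a stand-in for the remote) and two hypothetical teammates, Alice and Bob, each working in their own clone.

```bash
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
git init -q --bare shared.git           # stands in for the copy on Github

# Alice clones the (empty) shared repository, commits, and pushes
git clone -q shared.git alice
cd alice
git config user.name "Alice"            # placeholder identity
git config user.email "alice@example.com"
echo "step 1" > pipeline.txt
git add pipeline.txt
git commit -q -m "Add pipeline step 1"
git branch -M main
git push -q origin main                 # push: send local commits to the remote

# Bob clones the main branch
cd "$WORKDIR"
git clone -q -b main shared.git bob

# Alice adds more work and pushes again
cd "$WORKDIR/alice"
echo "step 2" >> pipeline.txt
git commit -q -am "Add pipeline step 2"
git push -q origin main

# fetch: Bob's object database learns about step 2, but his files are unchanged
cd "$WORKDIR/bob"
git fetch -q origin
cat pipeline.txt                        # still only "step 1"

# merge: bring the fetched commits into Bob's checked-out branch
git merge -q origin/main
cat pipeline.txt                        # now "step 1" and "step 2"

# git pull origin main would have done the fetch and the merge in one step
```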

Git - Basic Commands

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local git repositories are called "remotes".

Object Database: where git stores metadata about each commit

Index / Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer

Git - "Places"

[Diagram: a server computer and two client computers, A and B, each holding a full version database with Version 1, Version 2, and Version 3]

Github: A Distributed Version Control example

The "origin" remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about "origin"; it is just a default name

Git - "Origin"
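A minimal sketch of these points, using a local bare repository (team-repo.git, hypothetical) in place of Github: cloning creates the remote, and renaming it shows that "origin" is only a default name.

```bash
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
git init -q --bare team-repo.git         # stands in for a repository on Github
git clone -q team-repo.git my-copy
cd my-copy

git remote -v                            # lists "origin", created by the clone
git remote rename origin upstream        # the name is only a convention...
git remote -v                            # ...so the same remote is now "upstream"
```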

User Account

[Screenshot: the Github profile of Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, showing a contribution graph, follower, starred, and following counts, and organizations]

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 12: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

rt~poth~i~ Dviv~n o~v~loprvi~n+ ThoughtWorksmiddot

We Believe That ~ fht~ C--Jf Jbt jf1gt

Will Result In ~ fh~ OfJfCAJfYle-gt

We Will Know We Have Succeeded When

lt we- ie-e- a rne-atwabe- tigtialgt

What tools do data scientists use

What is the data science pipeline

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

Compare

Figure: the working directory, the index, the local repo and the Github repo, with arrows labelled Revert, diff --cached, fetch and checkout HEAD.
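A sketch of the Compare and Revert arrows, assuming a throwaway local repository and a hypothetical notes.txt: git diff compares the working directory with the index, git diff --cached compares the index with the last commit, and git checkout HEAD -- <file> copies the committed version back over both:

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # hypothetical identity for this demo repo
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "version 2" > notes.txt
git add notes.txt                # the index now holds version 2
echo "version 3" > notes.txt     # the working directory moves on to version 3

git diff --name-only             # working directory vs index: lists notes.txt
git diff --cached --name-only    # index vs last commit (HEAD): lists notes.txt

# Revert: copy the committed version over both the index and the working directory.
git checkout HEAD -- notes.txt
cat notes.txt                    # prints: version 1
```

Git 2.23 and later also offer git restore --source=HEAD --staged --worktree notes.txt for the same revert.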

Teamwork (makes the dream work)

Organization

Screenshot: the Commerce Data Service organization on Github ("A startup within DOC focused on building data products with and for the bureaus"; Washington, DC; 20 people, 4 teams). Its repositories include DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite: "The website for the Commerce Data Service - A startup within the Department of Commerce"), ITA_Principal_Travel and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy").

Waffle

Screenshot: the Waffle.io board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress and Done columns. Cards include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "500 error on upload w/ missing col/row values", "AJAXify the uploader", "Sampling technique for bigger datasets", "Feature nomination tool for visualization", "Data file uploading", "Research Auto-analysis Feature", "Implement beta auto analysis", "Dropdown Dataset Edit Form", "Async Upload with Celery", "Dimension Histograms and Ranking", "Large files hang uploader" and "Upload Error: line contains NULL byte", tagged with labels such as priority: medium, type: feature, type: bug, type: technical debt and Version 0.3. The Done column reads: "Issues closed in the last week are shown in this column. Drag issues here to close them."

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
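One way to follow this advice, sketched in a throwaway repository: give git commit a short summary with the first -m and add a second -m for a paragraph explaining why. Git stores each -m value as its own paragraph. The file name, identity and explanation below are hypothetical:

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # hypothetical identity for this demo repo
git config user.email "user@example.com"

echo "import random" > ingest.py
git add ingest.py

# First -m: summary line (what one-line logs show).
# Second -m: body paragraph for your team and future you.
git commit -q -m "Add randomizer to ingest" \
  -m "Shuffle rows before splitting so the train and test sets are not ordered by date."

git log -1 --format=%s    # prints: Add randomizer to ingest
git log -1 --format=%b    # prints the body paragraph
```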

Why

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
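A sketch of scaffolding these four conventional entries in a new repository. The roles in the comments are the usual ones rather than anything the slide spells out, and the .gitignore and requirements.txt entries are hypothetical examples; the .DS_Store pattern targets the kind of stray macOS file that shows up in the repository listing earlier in this deck:

```shell
cd "$(mktemp -d)"
git init -q

# README.md: what the project is and how to run it (Github renders it on the repo page).
printf '# Project name\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: patterns for files Git should never track.
printf '%s\n' '.DS_Store' '__pycache__/' '*.pyc' > .gitignore

# fixtures: small sample data files for tests. Git does not track empty
# directories, so a placeholder file (commonly named .gitkeep) keeps it.
mkdir fixtures
touch fixtures/.gitkeep

# requirements.txt: Python dependencies, one per line, for pip install -r.
printf '%s\n' 'pandas' 'matplotlib' > requirements.txt

touch .DS_Store          # the kind of file macOS Finder leaves behind
git add .
git status --short       # .DS_Store is not listed: .gitignore excluded it
```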

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


What tools do data scientists use?

What is the data science pipeline?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product?

How are data products different from analytical insights?

“Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data.”

Benjamin Bengfort

What is software engineering?

What does collaboration look like in a data group?

Screenshot: the sample Waffle.io board waffleio/serenity, with Backlog, Ready, In Progress and Done columns. Cards include "check ship for survivors", "secure identification keycards and uniforms", "lower onto train and secure cargo", "repair ambulance shuttle", "capture an Alliance anti-aircraft gun", "collect package from post master", "disable explosive set by trap", "recover hidden loot at Canton", "retrieve cargo from train", "join Mal in boarding train", "collect remaining funds to pay for shipmate's release", "alert others of distress call", "fix ship's engine problem", "unload and pen cattle", "get cargo from abandoned carrier", "find a brand new compression coil for the steamer", "find a captain for the ship", "find a mechanic for the ship" and "buy a solid ship", tagged with labels such as trainjob, hospitaljob, financial, enhancement, startup and wontfix.

Version Control Examples

Google Drive, SharePoint, Dropbox, TortoiseSVN, Bitbucket

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: The management of changes to electronic documents, and in particular computer programs.

“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization

Figure: a File on the Local Computer is checked out from a Version Database holding Version 1, Version 2 and Version 3.

Branches and revisions through time - example scenario

Figure: revisions numbered 1 to 6 alongside branches labelled A, B and C.

Branches and revisions through time - actual workflow
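A minimal command-line sketch of branching as pictured in these slides: start a branch, commit on it, and merge it back. The repository, the identity, the file names and the branch name precision-recall are hypothetical; --no-ff is used only so the merge shows up as its own commit in the graph:

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"         # hypothetical identity for this demo repo
git config user.email "user@example.com"

echo "print('roc')" > roc.py
git add roc.py
git commit -q -m "Basic ROC plotter"
git branch -M main                           # call the first branch main, whatever Git's default is

# Branch off to work on a feature without disturbing main.
git checkout -q -b precision-recall
echo "print('pr')" > pr.py
git add pr.py
git commit -q -m "Add precision-recall plot"

# Back on main, pr.py does not exist yet.
git checkout -q main
ls                                           # prints: roc.py

# Bring the feature in; --no-ff records a merge commit even though a
# fast-forward would have been possible.
git merge -q --no-ff -m "Merge precision-recall" precision-recall
git log --oneline --graph                    # the branch and the merge, drawn as a graph
```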

Distributed vs Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
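After installing, a quick way to confirm Git is on your PATH and to tell it who you are, sketched for the Mac OS X Terminal (the git lines work the same way in PowerShell). The name and email are placeholders to replace with your own; --global writes them to your user-level Git configuration, so they apply to every repository on the machine:

```shell
# Which git will run? Prints its location, e.g. /usr/bin/git.
command -v git

# Which version? Prints a line starting with "git version".
git --version

# Record your identity once; Git stamps it on every commit you make.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Show the value Git will now use.
git config --global user.name
```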

Git - History Lesson

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory
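To see that a repository's metadata really does live in a single .git directory, a sketch you can run in a throwaway folder (the exact list of files inside .git varies slightly between Git versions):

```shell
cd "$(mktemp -d)"
git init -q          # turns this folder into a repository by creating .git
ls -A                # the folder's only entry is .git
ls .git              # includes HEAD, config, objects and refs
cat .git/HEAD        # ref: refs/heads/ followed by the default branch name
```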

Git - Advantages

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - “Places”

Object Database: where git stores each commit, that is, its metadata plus the snapshot of files it records

Index/Staging Area: file snapshots to be included in the next commit

Working Directory: the “physical” files on a computer

Git - “Stages”

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed

Git - Areas/Places

Figure: the Working Directory, the Staging Area and the .git directory (Repository).
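A sketch that walks one hypothetical file, data.csv, through these stages in a throwaway repository. git status --short reports the stage with a two-character code, and git cat-file confirms the commit now sits in the object database; the identity is a placeholder:

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # hypothetical identity for this demo repo
git config user.email "user@example.com"

echo "a,b" > data.csv
git status --short     # ?? data.csv    untracked: Git is not managing it yet

git add data.csv
git status --short     # A  data.csv    staged: this snapshot goes into the next commit

git commit -q -m "Add data"
git status --short     # (no output)    committed: stored in the object database

echo "c,d" >> data.csv
git status --short     # " M data.csv"  modified: changed but not staged or committed

git cat-file -t HEAD   # prints: commit (HEAD names a commit object in the database)
```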

Git Commands

Git - Basic Commands

git init: creates a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes the staged changes and adds them to the object database as a new commit
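The four commands above in sequence, as a sketch: the repository address given to git clone is a local path here instead of a Github URL, which Git accepts just the same. Directory names, the identity and the README text are hypothetical:

```shell
cd "$(mktemp -d)"

git init -q project                    # new repository in ./project
cd project
git config user.name "Example User"    # hypothetical identity for this demo repo
git config user.email "user@example.com"
echo "# A tour of ROC curves" > README.md
git add README.md                      # put the file in the staging area
git commit -q -m "Initial commit"      # record the staged snapshot
cd ..

# The repository address can be a Github URL (https or SSH) or, as here,
# a path on the same machine.
git clone -q project project-copy
ls project-copy                        # prints: README.md
```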

Git - Basic Commands

git fetch <server> <branch>: downloads new commits into your object database but does not change the working directory

git merge <source branch>: applies the commits from <source branch> to the branch you currently have checked out (the branch your working directory reflects)

git pull <server> <branch>: performs a fetch and then merges those changes into your working directory

git push <server> <branch>: sends your latest branch commits to the remote server
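A sketch of push, fetch and merge between two collaborators, assuming a local bare repository (server.git) stands in for the Github remote so that it runs without a network or an account. Alice, Bob and model.txt are hypothetical:

```shell
cd "$(mktemp -d)"
git init -q --bare server.git        # stands in for the repository on Github

# Alice creates the project and pushes it to the server.
git init -q alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > model.txt
git add model.txt
git commit -q -m "First model"
git branch -M main
git remote add origin ../server.git
git push -q origin main              # send Alice's commits to the remote
cd ..

# Bob clones the project, improves it, and pushes.
git clone -q -b main server.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "v2" > model.txt
git commit -q -am "Improve model"
git push -q origin main
cd ../alice

# Alice fetches: her object database learns about Bob's commit,
# but her working directory is untouched.
git fetch -q origin
cat model.txt                        # still prints: v1

# Merging origin/main brings Bob's commit into her branch and files.
git merge -q origin/main
cat model.txt                        # now prints: v2
# (git pull origin main would have done the fetch and the merge in one step.)
```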

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote git repository

A website that:
• provides secure access
• provides repository metadata & reports
• provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local git repositories are called “remotes”.

Github: A Distributed Version Control example

Figure: a Server Computer and two clients, Computer A and Computer B, each holding a full Version Database with Version 1, Version 2 and Version 3.

Git - “Origin”

The “origin” remote is automatically created when you clone.

It is the default remote to use for pushing and pulling.

There is nothing special about “origin”; it is just a default name.
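A sketch showing that clone creates "origin" for you and that the name is only a label: after renaming it, fetch works the same under the new name. The source repository here is a hypothetical local one; on Github the clone address would be a URL:

```shell
cd "$(mktemp -d)"

# A small source repository to clone.
git init -q source
git -C source config user.name "Example User"    # hypothetical identity
git -C source config user.email "user@example.com"
git -C source commit -q --allow-empty -m "Initial commit"

git clone -q source work
cd work
git remote -v                  # origin, pointing at the source path, once for fetch and once for push

# Nothing special about the name: rename it and Git keeps working.
git remote rename origin upstream
git remote                     # prints: upstream
git fetch -q upstream
git branch -r                  # remote-tracking branches now start with upstream/
```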

User Account

Screenshot: the Github profile page of Rebecca Bilbro (rebeccabilbro), Washington, DC, joined on Sep 13, 2014, with 17 followers, 11 starred and 39 following. It shows popular repositories (xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards), repositories contributed to (DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket and a sql-tutorial repository), and a year of contributions (pull requests, issues opened and commits).

Repo

Screenshot: the Github page of rebeccabilbro's repository “A tour of ROC curves”: 19 commits, 1 branch, 0 releases, 1 contributor, viewed on branch master. Latest commit 382b9ca (4 days ago) by rebeccabilbro: “added method to guess the label column”. Contents, each with its latest commit message:

data: starting to flesh out bulk ingest method for UGI data (16 days ago)
figures: added precision recall image (19 days ago)
.DS_Store: basic implementation of roc curve plotter (9 days ago)
.gitignore: basic implementation of roc curve plotter (9 days ago)
LICENSE: Initial commit (19 days ago)
README.md: added plotting template to readme (9 days ago)
classi.py: added method to guess the label column (4 days ago)
ingest.py: added randomizer to ingest (9 days ago)
roc.py: basic implementation of roc curve plotter (9 days ago)

README.md

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 14: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

What is the data science pipeline

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
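One way to scaffold those conventions, sketched under assumptions: the project name, the .gitignore patterns, and the two listed packages are hypothetical examples for a Python data project. The .gitkeep file is a common community convention, not a Git feature, for keeping an otherwise empty fixtures folder in the repository.

```shell
#!/usr/bin/env bash
# Sketch: create the conventional top-level files for a data project repo.
# Project name, ignore patterns, and packages are hypothetical examples.
set -e

PROJECT=$(mktemp -d)/my-data-project
mkdir -p "$PROJECT/fixtures"          # small sample data for tests/examples
cd "$PROJECT"

# README.md: what the project is and how to run it (Github renders it).
printf '# my-data-project\n\nHow to install, run, and test this project.\n' > README.md

# .gitignore: files Git should never track (caches, secrets, raw data).
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
.ipynb_checkpoints/
.DS_Store
.env
data/raw/
EOF

# requirements.txt: Python dependencies, installable with pip install -r.
printf 'pandas\nscikit-learn\n' > requirements.txt

# Placeholder so the otherwise empty fixtures/ directory is committed.
touch fixtures/.gitkeep

git init -q
git add .
git status --short
```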

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

What is a data product?

How are data products different from analytical insights?

"Data products are self-adapting, broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data."

Benjamin Bengfort

What is software engineering?

What does collaboration look like in a data group?

Waffle board for waffleio/serenity (screenshot): a sample project whose issues move through Backlog, Ready, In Progress, and Done columns, with labels such as train job, hospital job, financial, startup, enhancement, and wontfix. Example cards include "check ship for survivors", "repair ambulance shuttle", "retrieve cargo from train", "fix ship's engine problem", "find a mechanic for the ship", and "buy a solid ship".

Version Control

Examples: Google Drive, SharePoint, Dropbox, TortoiseSVN, Bitbucket

What is version control?

Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: The management of changes to electronic documents, and in particular computer programs.

"In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code."

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization

Diagram: on the local computer, a file is checked out from a version database that holds Version 1, Version 2, and Version 3.

Branches and revisions through time - example scenario (diagram of numbered revisions 1 to 6 and branches labeled A, B, and C)

Branches and revisions through time - actual workflow (diagram)

Distributed vs. Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

git-scm.com home page (screenshot): git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

The page also links to Try Git (learn Git in your browser for free), About, Documentation, Downloads, and Community.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
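Once the installer finishes, a quick check from Terminal confirms Git is on your PATH (git --version works the same way in PowerShell). This is a minimal bash sketch; the name and email in the commented-out setup lines are placeholders you would replace with your own.

```shell
#!/usr/bin/env bash
# Sketch: confirm Git is installed and reachable on the PATH.
if command -v git >/dev/null 2>&1; then
  git --version                 # prints the installed version string
else
  echo "git not found: rerun the installer, then open a new terminal" >&2
  exit 1
fi

# First-time identity setup (placeholders; uncomment and edit before running):
# git config --global user.name "Your Name"
# git config --global user.email "you@example.com"
```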

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database: where Git stores each commit (its snapshot and metadata)

Index/Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer

Git - "Places"

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed

Git - "Stages"
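To watch a file move through those three stages, here is a sketch in a throwaway repository (the file name and identity are hypothetical). git status --short prints a two-letter code: the first column describes the staging area and the second describes the working directory.

```shell
#!/usr/bin/env bash
# Sketch: one file going Modified -> Staged -> Committed.
# Throwaway repo; file name and identity are hypothetical.
set -e

DEMO=$(mktemp -d)
cd "$DEMO"
git init -q
git config user.name "Example Analyst"
git config user.email "analyst@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"       # notes.txt is committed

echo "v2" >> notes.txt
git status --short                 # " M notes.txt": modified, not staged

git add notes.txt
git status --short                 # "M  notes.txt": staged for next commit

git commit -q -m "Update notes"
git status --short                 # no output: everything is committed
```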

Git - Areas/places (diagram): Working Directory, Staging Area, .git directory (Repository)

Git Commands

git init: creates a new Git repository to manage the current folder

git clone <repository address>: downloads an existing Git repository for the first time

git add <file path>: marks individual modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes the staged changes and records them, with metadata such as the author and message, as a new commit in the object database

Git - Basic Commands
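The four commands above can be tried end to end in a scratch folder. This is a sketch under assumptions: the folder "analysis", the file clean.py, and the identity are hypothetical, and a local folder stands in for a Github repository address in the clone step.

```shell
#!/usr/bin/env bash
# Sketch: git init, add, commit, and clone in a scratch folder.
# Names are hypothetical; a local path stands in for a repository address.
set -e

WORK=$(mktemp -d)
cd "$WORK"

mkdir analysis
cd analysis
git init -q                                   # start tracking this folder
git config user.name "Example Analyst"        # repo-local identity
git config user.email "analyst@example.com"

echo "print('hello')" > clean.py
git add clean.py                              # stage one specific file
git commit -q -m "Add data cleaning script"   # record the staged snapshot

cd "$WORK"
git clone -q analysis analysis-copy           # full copy, history included
git -C analysis-copy log --oneline
```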

git fetch <server> <branch>: updates your object database (including the remote-tracking branches) but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the branch currently checked out in your working directory

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest branch commits to the remote server

Git - Basic Commands
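A sketch of fetch, merge, pull, and push between two teammates, assuming a local bare repository stands in for the Github server. The names "alice", "bob", and pipeline.txt are hypothetical, and the branch name is read from Git rather than assumed, because it may be master or main depending on your Git configuration.

```shell
#!/usr/bin/env bash
# Sketch: two clones of one shared remote exchanging commits.
# server.git stands in for Github; all names here are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)
cd "$WORK"
git init -q --bare server.git                # the shared "remote"

# Alice starts the project and pushes it.
git clone -q server.git alice 2>/dev/null    # hides the empty-repo warning
cd alice
echo "step 1" > pipeline.txt
git add pipeline.txt
git commit -q -m "Start pipeline"
BRANCH=$(git rev-parse --abbrev-ref HEAD)    # master or main
git push -q origin "$BRANCH"

# Bob clones, commits, and pushes.
cd "$WORK"
git clone -q server.git bob
cd bob
echo "step 2" >> pipeline.txt
git commit -q -am "Add step 2"
git push -q origin "$BRANCH"

# Alice: fetch updates origin/$BRANCH but leaves her files alone...
cd "$WORK/alice"
git fetch -q origin
cat pipeline.txt                             # still only "step 1"
# ...and merge applies those commits to her checked-out branch.
git merge -q "origin/$BRANCH"
cat pipeline.txt                             # "step 1" and "step 2"

# Bob pushes again; Alice uses pull (fetch + merge in one step).
cd "$WORK/bob"
echo "step 3" >> pipeline.txt
git commit -q -am "Add step 3"
git push -q origin "$BRANCH"
cd "$WORK/alice"
git pull -q origin "$BRANCH"
```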

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local Git repositories are called "remotes"

Object Database: where Git stores each commit (its snapshot and metadata)

Index/Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer

Git - "Places"

Github: A Distributed Version Control example (diagram: a server computer and Computers A and B each hold a full version database with Version 1, Version 2, and Version 3)

The "origin" remote is automatically created when you clone.

It is the default remote used for pushing and pulling.

There is nothing special about "origin"; it is just a default name.

Git - "Origin"
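A short sketch showing that "origin" is only a name, assuming a local bare repository stands in for a Github repository. The remote names "upstream" and "backup" are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: inspect, rename, and add remotes after a clone.
# shared.git stands in for Github; "upstream"/"backup" are hypothetical.
set -e

WORK=$(mktemp -d)
cd "$WORK"
git init -q --bare shared.git
git clone -q shared.git project 2>/dev/null  # hides the empty-repo warning
cd project

git remote -v      # origin -> the address you cloned from (fetch and push)

# Nothing special about "origin": rename it, or add more remotes.
git remote rename origin upstream
git remote add backup "$WORK/shared.git"
git remote         # lists the remote names now configured
```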

User Account

Github profile page (screenshot): Rebecca Bilbro (rebeccabilbro), Washington, DC, joined on Sep 13, 2014, with 17 followers, 11 starred, and 39 following. The page lists popular repositories (xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards), repositories contributed to (e.g. DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket), and a contributions calendar summarizing pull requests, issues opened, and commits.

Repo

Github repository page (screenshot): rebeccabilbro's "A tour of ROC curves" repository on branch master, with 19 commits, 1 branch, 0 releases, and 1 contributor. The file list (data, figures, .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, roc.py) shows each item's latest commit message and age (latest commit 382b9ca, 4 days ago), and README.md is rendered below it.

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files," type: powershell

• Hit Enter.

Mac OS X: For Mac OS X you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type: terminal

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Where am I?
(Mac OS X Terminal / Windows PowerShell)

What's my name?
(Mac OS X Terminal / Windows PowerShell)

Make a directory
Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Change between directories
Windows PowerShell:
> cd temp
> pwd
Mac OS X Terminal:
$ cd temp
$ pwd

List files and directories
Windows PowerShell:
> dir
Mac OS X Terminal:
$ ls

Make an empty file
Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OS X Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
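The Mac OS X Terminal steps can be strung together into one bash sketch. It runs in a fresh scratch directory so nothing in your home folder changes. One assumption to note: unlike PowerShell's mkdir, bash's mkdir needs -p to create a whole chain of nested directories at once.

```shell
#!/usr/bin/env bash
# Sketch: the command-line basics in bash, inside a scratch directory.
BASE=$(mktemp -d)
cd "$BASE"

pwd                                        # Where am I?
whoami                                     # What's my name?

mkdir temp                                 # Make a directory
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p creates every level

cd temp                                    # Change into it
pwd
ls                                         # List files and directories: stuff

touch iamcool.txt                          # Make an empty file
ls
```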

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
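Before the workshop, it may help to see a conflict happen on purpose. This sketch uses a throwaway repository; the branch "feature", the file greeting.txt, and the resolved text are hypothetical and are not the workshop's own exercise. Two branches change the same line, the merge stops, and the conflict is resolved by editing the file, adding it, and committing.

```shell
#!/usr/bin/env bash
# Sketch: create a merge conflict deliberately, then resolve it.
# Throwaway repo; branch, file, and text are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

DEMO=$(mktemp -d)
cd "$DEMO"
git init -q
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
MAIN=$(git rev-parse --abbrev-ref HEAD)      # master or main

# Two branches change the same line in different ways.
git checkout -q -b feature
echo "hello from feature" > greeting.txt
git commit -q -am "Feature greeting"

git checkout -q "$MAIN"
echo "hello from main" > greeting.txt
git commit -q -am "Main greeting"

# The merge stops and writes conflict markers into the file.
if ! git merge -q feature; then
  git status --short     # "UU greeting.txt": unmerged on both sides
  cat greeting.txt       # shows the <<<<<<<, =======, >>>>>>> markers
fi

# Resolve: write the version you want, then add and commit.
echo "hello from both" > greeting.txt
git add greeting.txt
git commit -q -m "Merge feature, combining greetings"
```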

Diagram: working directory, index, local repo, and Github repo, with operations labeled Revert, diff --cached, fetch, checkout HEAD, and Compare
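Here is a sketch of the commands named in that diagram, as we read its labels: git diff compares the working directory to the index, git diff --cached compares the index to the last commit, and git checkout HEAD -- <file> throws away a file's edits by restoring it from the last commit. The repository, file, and identity are hypothetical, and the remote comparison is shown only as a comment because this demo has no remote.

```shell
#!/usr/bin/env bash
# Sketch: comparing and reverting across working directory, index, and repo.
# Throwaway repo; file and identity are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

DEMO=$(mktemp -d)
cd "$DEMO"
git init -q
echo "a" > model.txt
git add model.txt
git commit -q -m "Add model notes"

echo "b" >> model.txt
git diff --stat              # working directory vs index

git add model.txt
git diff --cached --stat     # index vs last commit in the local repo

# Revert the file (index and working copy) to the last commit.
git checkout HEAD -- model.txt
cat model.txt                # back to "a"

# With a remote, "git fetch origin" then "git diff HEAD origin/<branch>"
# would compare the local repo against the Github copy.
```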

Teamwork (makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 16: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

What is a data product

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 17: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

How are data products different fromanalytical insights

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Figure: an example Waffle.io board (“serenity”) with Backlog, Ready, In Progress and Done columns of task cards such as “repair ambulance shuttle”, “fix ship's engine problem” and “buy a solid ship”, each tagged with labels like train job, financial and startup.

Version Control Examples

Google Drive

SharePoint

Dropbox

TortoiseSVN

Bitbucket

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: the management of changes to electronic documents and, in particular, computer programs.

“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization

Figure: on the local computer, a file is checked out from a version database that holds Version 1, Version 2 and Version 3.
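In Git, the “version database” in this diagram is the repository history. A checkout can bring any earlier version back into the working directory. The sketch below is minimal and assumes a throwaway repository with a made-up file, `report.txt`. It saves three versions of the file and then retrieves the first one.

```shell
#!/usr/bin/env bash
# Sketch: save three versions of one file, then retrieve Version 1 (throwaway repo).
set -euo pipefail

cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"          # local identity, example values only
git config user.email "user@example.com"

for v in 1 2 3; do
  echo "Version $v" > report.txt              # hypothetical file
  git add report.txt
  git commit --quiet -m "Version $v"
done

git log --oneline                             # three commits, newest first
git show HEAD~2:report.txt                    # prints "Version 1" straight from the history

# Bring Version 1 back into the working directory (and the staging area).
git checkout HEAD~2 -- report.txt
```

`HEAD~2` means “two commits before the current one”. The three versions stay safely in the history no matter which one is checked out.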

Figure: numbered revisions (1 to 6) on branches labeled A, B and C.

Branches and revisions through time - example scenario

Figure: a project's actual branch history plotted over time.

Branches and revisions through time - actual workflow

Distributed vs. Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

Figure: the git-scm.com home page (“git --distributed-is-the-new-centralized”), which describes Git this way:

“Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.”

“Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.”

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
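Once Git is installed, a quick check is to ask for its version. The next step is to set the name and email it records in every commit (see the “Git Commands” link in the resources at the end). The sketch below uses placeholder name and email values. Its `export HOME=...` line is an assumption added only so the sketch writes to a throwaway `~/.gitconfig`; drop that line when configuring your own machine.

```shell
#!/usr/bin/env bash
# Sketch: confirm Git is installed and set the identity used for commits.
set -euo pipefail

# Throwaway home directory so this example does not edit your real ~/.gitconfig.
# Remove this line to configure your own account.
export HOME="$(mktemp -d)"

git --version                                 # e.g. "git version 2.x.y"

# --global saves the setting in ~/.gitconfig, so it applies to all your repositories.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

git config --global --list                    # show what was just saved
```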

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages
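The cheap branching and non-linear development listed above take only a few commands. This is a minimal sketch in a throwaway repository with a made-up file, `model.py`. It creates a branch, commits on it, confirms the main line is untouched, and then merges the branch back.

```shell
#!/usr/bin/env bash
# Sketch: create a branch, work on it, and merge it back (throwaway repo).
set -euo pipefail

cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"          # local identity, example values only
git config user.email "user@example.com"

echo "baseline" > model.py                   # hypothetical file
git add model.py
git commit --quiet -m "Add baseline model"
main=$(git symbolic-ref --short HEAD)        # "master" or "main", depending on Git defaults

# A branch is just a movable name for a commit, so creating one copies no files.
git checkout --quiet -b experiment
echo "new feature" >> model.py
git commit --quiet -am "Try a new feature"

git checkout --quiet "$main"
cat model.py                                  # only "baseline": the main line is untouched

git merge --quiet experiment                  # fast-forward: main now includes the experiment
git branch -d experiment                      # safe to delete once merged
```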

Object Database

where Git stores every commit, along with the file snapshots and metadata it records

Index / Staging Area

file snapshots to be included in the next commit

Working Directory

the “physical” files on a computer

Git - “Places”

Committed: the data is safely stored in your local object database

Staged: marked so that the current state of the modified file will be included in the next commit

Modified: changed, but not yet staged or committed

Git - “Stages”

Figure: the Working Directory, the Staging Area, and the .git directory (Repository).

Git - Areas/places
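`git status --short` shows which of these states a file is in. The sketch below is minimal, runs in a throwaway repository and uses a made-up `notes.txt`. It takes one file through committed, modified, staged and committed again. In the two-letter status code, the left column describes the staging area and the right column the working directory.

```shell
#!/usr/bin/env bash
# Sketch: move one file through Git's three states (throwaway repo).
set -euo pipefail

cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"          # local identity, example values only
git config user.email "user@example.com"

echo "first draft" > notes.txt                # hypothetical file
git add notes.txt
git commit --quiet -m "Add notes"
git status --short                            # no output: everything is committed

echo "second draft" >> notes.txt
git status --short                            # " M notes.txt": modified, not staged

git add notes.txt
git status --short                            # "M  notes.txt": staged for the next commit

git commit --quiet -m "Expand notes"
git status --short                            # no output again: committed
```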

Git Commands

git init: create a new Git repository to manage the current folder

git clone <repository address>: download an existing Git repository for the first time

git add <file path>: mark individual new or modified files to be added to the index (staging area) for the next commit

git commit -m <message>: take the changes from the staging area and record them in the object database

Git - Basic Commands
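Here is a minimal sketch of the four commands working together in a throwaway directory. `git init`, `git add` and `git commit` build a small repository with the made-up name `project`. `git clone` then copies it, history included. Cloning from a local path here stands in for cloning from a repository address such as a GitHub URL.

```shell
#!/usr/bin/env bash
# Sketch: init, add, commit, then clone the result (throwaway directories).
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# git init: start tracking a new folder.
mkdir project
cd project
git init --quiet
git config user.name "Example User"          # local identity, example values only
git config user.email "user@example.com"

# git add + git commit: stage a file, then record it in the object database.
echo "# project" > README.md
git add README.md
git commit --quiet -m "Add README"
cd ..

# git clone: download an existing repository (a local path here, a URL in practice).
git clone --quiet project project-copy
git -C project-copy log --oneline             # the clone already has the "Add README" commit
```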

git fetch <server> <branch>: download new commits into your object database without changing the working directory

git merge <source branch>: combine the commits from the source branch into the branch you currently have checked out, updating the working directory to match

git pull <server> <branch>: perform a fetch and then merge those changes into your current branch and working directory

git push <server> <branch>: send your latest branch commits to the remote server

Git - Basic Commands
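These four commands can be tried without a GitHub account. In the sketch below, a local bare repository (`server.git`) plays the role of the remote server, and two clones stand in for two teammates, “alice” and “bob”. All three names are made up for the example. Bob first uses fetch followed by merge, then pull, which does both in one step.

```shell
#!/usr/bin/env bash
# Sketch: two clones sharing one "server" repository (all local, throwaway).
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the remote server (GitHub in class).
git init --quiet --bare server.git

# Alice clones the (empty) server, commits, and pushes.
git clone --quiet server.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "line 1" > README.md
git add README.md
git commit --quiet -m "Add README"
branch=$(git symbolic-ref --short HEAD)       # "master" or "main", depending on Git defaults
git push --quiet origin "$branch"             # push: send local commits to the server
cd ..

# Bob clones after Alice's first push.
git clone --quiet server.git bob

# Alice pushes a second change.
cd alice
echo "line 2" >> README.md
git commit --quiet -am "Add line 2"
git push --quiet origin "$branch"
cd ..

cd bob
git fetch --quiet origin                      # fetch: new objects arrive, files unchanged
cat README.md                                 # still just "line 1"
git merge --quiet "origin/$branch"            # merge: the working directory now has line 2
cd ..

# Alice pushes a third change; this time Bob uses pull (fetch + merge in one step).
cd alice
echo "line 3" >> README.md
git commit --quiet -am "Add line 3"
git push --quiet origin "$branch"
cd ../bob
git pull --quiet origin "$branch"
cd ..
```

Both of Bob's updates are fast-forwards because he has no commits of his own. When both sides have new commits, merge (and therefore pull) creates a merge commit, or stops at a conflict.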

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local Git repositories are called “remotes”.

Object Database

where Git stores every commit, along with the file snapshots and metadata it records

Index / Staging Area

file snapshots to be included in the next commit

Working Directory

the “physical” files on a computer

Git - “Places”

Figure: a server computer and two clients, Computer A and Computer B; each keeps a full copy of the version database (Version 1, Version 2, Version 3).

Github: A Distributed Version Control example

The “origin” remote is automatically created when you clone.

It is the default remote used for pushing and pulling.

There is nothing special about “origin”; it is just a default name.

Git - “Origin”
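To see that “origin” is only a name, you can inspect, rename and add remotes. The sketch below is minimal. It assumes a local bare repository (`shared.git`) as a stand-in for a GitHub repository, and the `backup` remote and its path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: "origin" is only the default name for the remote a clone came from.
set -euo pipefail

work=$(mktemp -d)
cd "$work"
git init --quiet --bare shared.git            # local stand-in for a GitHub repository
git clone --quiet shared.git mycopy           # clone creates a remote named "origin"
cd mycopy

git remote -v                                 # origin, listed once for fetch and once for push
git remote rename origin upstream             # the name changes; the URL stays the same
git remote add backup "$work/backup.git"      # a repo can have several remotes (hypothetical path)
git remote                                    # now lists backup and upstream, no origin
```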

User Account

Figure: a GitHub user profile (Rebecca Bilbro, rebeccabilbro, Washington DC) showing follower, starred and following counts, organizations, popular repositories, repositories contributed to, and a contributions graph.

Repo

Figure: a GitHub repository page (“A tour of ROC curves”: 19 commits, 1 branch, 0 releases, 1 contributor) showing the Code, Issues, Pull requests, Wiki, Pulse, Graphs and Settings tabs, the file list with each file's latest commit message, and the rendered README.md.

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.
• In “Search programs and files”, type powershell.
• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.
• In the top right, the blue search bar will pop up.
• Type terminal.
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options > Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Mac OSX Terminal / Windows PowerShell

Where am I?

Mac OSX Terminal / Windows PowerShell

What's my name?

Mac OSX Terminal / Windows PowerShell

Make a directory

> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Mac OSX Terminal / Windows PowerShell

Change between directories

> cd temp
> pwd

$ cd temp
$ pwd

Mac OSX Terminal / Windows PowerShell

List files and directories

> dir

$ ls

Mac OSX Terminal / Windows PowerShell

Make an empty file

> cd temp
> New-Item iamcool.txt -type file
> dir

$ cd temp
$ touch iamcool.txt
$ ls
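The Mac OSX Terminal (bash) commands from these slides can also run as one short script. This version works in a throwaway directory so it leaves no clutter. One difference from PowerShell: in bash, the deeply nested `mkdir` needs `-p` to create the missing intermediate folders.

```shell
#!/usr/bin/env bash
# Sketch: the Mac OSX Terminal (bash) steps from the slides, in a throwaway directory.
set -euo pipefail

cd "$(mktemp -d)"
pwd                                            # where am I?

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john # -p creates each missing level

cd temp                                        # change directory
pwd                                            # now ends in /temp
ls                                             # lists: stuff

touch iamcool.txt                              # make an empty file
ls                                             # now also lists iamcool.txt
```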

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
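As a warm-up for the workshop, the sketch below manufactures a conflict in a throwaway repository so you can see what Git does. Two branches change the same line of a file, and the merge stops until a person decides on the final text. The branch name `retitle`, the file `slides.txt` and its contents are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: create a merge conflict on purpose, then resolve it (throwaway repo).
set -euo pipefail

cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"          # local identity, example values only
git config user.email "user@example.com"

echo "title: Intro to Git" > slides.txt       # hypothetical file
git add slides.txt
git commit --quiet -m "Add slide title"
main=$(git symbolic-ref --short HEAD)        # "master" or "main", depending on Git defaults

# One branch edits the title...
git checkout --quiet -b retitle
echo "title: Intro to Git and GitHub" > slides.txt
git commit --quiet -am "Mention GitHub in title"

# ...and the main branch edits the same line differently.
git checkout --quiet "$main"
echo "title: Git for Data Scientists" > slides.txt
git commit --quiet -am "Retarget title"

# Both sides changed the same line, so the merge stops with a conflict.
if ! git merge --quiet retitle; then
  git status --short                          # "UU slides.txt": unmerged
  cat slides.txt                              # shows the <<<<<<< ======= >>>>>>> markers
fi

# Resolve: write the version you want, stage it, and finish the merge commit.
echo "title: Intro to Git and GitHub for Data Scientists" > slides.txt
git add slides.txt
git commit --quiet --no-edit
```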

Figure: how changes move between the working directory, the index, the local repo and the GitHub repo, with arrows labeled revert, diff --cached, fetch, checkout HEAD and compare.
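Several of the arrows in this diagram are everyday commands. The sketch below is minimal and uses a throwaway repository with a made-up `data.csv`. First it compares the working directory with the index and the index with the last commit. Then `checkout HEAD` discards a staged change, and `revert` undoes a change that has already been committed.

```shell
#!/usr/bin/env bash
# Sketch: compare and undo changes across the working directory, index and repo.
set -euo pipefail

cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"          # local identity, example values only
git config user.email "user@example.com"

echo "v1" > data.csv                          # hypothetical file
git add data.csv
git commit --quiet -m "Add data"

echo "v2" > data.csv
git diff --stat                               # working directory vs. index
git add data.csv
git diff --cached --stat                      # index vs. last commit (HEAD)

# checkout HEAD: discard the change in both the index and the working directory.
git checkout HEAD -- data.csv

# revert: undo a commit that is already in history by adding a new commit.
echo "v2" > data.csv
git commit --quiet -am "Update data"
git revert --no-edit HEAD
```

Revert is the safer choice once commits have been pushed. It adds history rather than rewriting it, so teammates who already pulled the original commit are not affected.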

Teamwork (makes the dream work)

Organization

Figure: the Commerce Data Service organization page on GitHub (“A startup within DOC focused on building data products with and for the bureaus”, Washington DC), with 20 people, 4 teams, and repositories such as DataService_WebSite, ITA_Principal_Travel and Commerce_Data_Academy_Courses (“Course materials offered by the Commerce Data Academy”).

Waffle

Figure: a Waffle.io board for DistrictDataLabs/trinket, where GitHub issues (e.g., “Better Licensing”, “Dataset Searching”, “Async Upload with Celery”, “Large files hang uploader”) carry priority, type and version labels and move through Backlog, Ready, In Progress and Done columns.

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
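A helpful message usually has a short summary line plus, when needed, a body that explains why the change was made. The sketch below is minimal and uses a throwaway repository. The file `ingest.py` and both message texts are invented for illustration. It relies on the fact that each `-m` passed to `git commit` becomes a separate paragraph of the message.

```shell
#!/usr/bin/env bash
# Sketch: a commit message with a short summary line and an explanatory body.
set -euo pipefail

cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"          # local identity, example values only
git config user.email "user@example.com"

echo "print('ingest')" > ingest.py            # hypothetical file
git add ingest.py

# Each -m becomes its own paragraph: first the summary, then the "why".
git commit --quiet \
  -m "Add ingest script for raw survey data" \
  -m "Reads the raw CSV export and writes a cleaned copy, so later steps do not depend on the original column names."

git log -1 --format=%s                        # the summary line only
git log -1 --format=%b                        # the body paragraph
```

Keeping the summary short pays off in one-line views such as `git log --oneline`, where only that first line is shown.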

Why

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
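These conventions can be scaffolded in a few lines. The sketch below is minimal and assumes a throwaway project called `example-project`. All file contents are placeholders: `pandas` and `scikit-learn` stand in for whatever the project actually needs, and the ignore patterns are common examples rather than a required list. The last steps check that `.gitignore` keeps an OS clutter file out of the repository.

```shell
#!/usr/bin/env bash
# Sketch: scaffold the conventional files in a throwaway project (contents are examples).
set -euo pipefail

proj="$(mktemp -d)/example-project"
mkdir -p "$proj/fixtures"
cd "$proj"
git init --quiet

printf '# example-project\n\nWhat this project does and how to run it.\n' > README.md
printf 'pandas\nscikit-learn\n' > requirements.txt     # Python dependencies, one per line
printf 'id,value\n1,0.5\n' > fixtures/sample.csv       # small test data kept in the repo

# .gitignore: files Git should never pick up, such as OS and editor clutter.
printf '.DS_Store\n__pycache__/\n*.pyc\n.ipynb_checkpoints/\n' > .gitignore
touch .DS_Store

git check-ignore .DS_Store                    # prints ".DS_Store" because a pattern matches
git add .
git status --short                            # .DS_Store is not listed
```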

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

Page 18: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Data products are self-adapting broadly applicable economic engines that derive their value from data and generate more data by influencing human behavior or by making inferences or predictions upon new data

Benjamin Bengfort

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 19: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

What is software engineering

What does collaboration look like in a data group

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceived and created by Linus Torvalds (after the Linux kernel community lost free use of the proprietary BitKeeper tool)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson
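To see for yourself that everything lives in the .git directory, you can create an empty repository in a scratch folder and list what git init created. This is a sketch; the exact set of entries varies a little between Git versions.

```shell
# Work in a throwaway temporary folder
cd "$(mktemp -d)"

# Turn the folder into a Git repository
git init -q

# Everything Git knows about the project (history, branches, settings) lives in .git;
# typical entries include HEAD, config, objects/ and refs/
ls -a .git
```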

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database: where Git stores each commit, along with the snapshots of files and folders it records

Index / Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer

Git - "Places"
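The three places can be observed directly with a few commands. A minimal sketch, run in a scratch folder; the identity (Demo User) and the file name (notes.txt) are placeholders, not anything from the course.

```shell
# Scratch repository with a placeholder identity (stored only in this repo)
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Working directory: an ordinary file on disk
echo "hello" > notes.txt

# Index / staging area: git add stores the file's content as a blob
# and lists it in the index for the next commit
git add notes.txt
git ls-files --stage          # mode, blob id, stage number, path

# Object database: git commit adds a tree object and a commit object
git commit -q -m "Add notes"
git cat-file -t HEAD          # prints: commit
git cat-file -p 'HEAD^{tree}' # one entry: the blob for notes.txt
```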

Committed: data is safely stored in your local object database

Staged: a modified file is marked so that its current state will be included in the next commit

Modified: changed but not yet staged or committed

Git - "Stages"
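git status --short reports which stage each file is in, so it is a handy way to watch a file move from modified to staged to committed. A sketch with a placeholder file, report.txt:

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > report.txt
git add report.txt
git commit -q -m "First version"

echo "v2" > report.txt
git status --short    # " M report.txt"  modified: changed, not staged

git add report.txt
git status --short    # "M  report.txt"  staged: will go into the next commit

git commit -q -m "Second version"
git status --short    # no output        committed: nothing left to record
```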

Working Directory

Staging Area

.git directory (Repository)

Git - Areas/Places

Git Commands

git init: creates a new Git repository to manage the current folder

git clone <repository address>: downloads an existing Git repository for the first time

git add <file path>: marks individual new or modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes the staged changes and records them in the object database as a new commit

Git - Basic Commands
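Here are the four basic commands in the order you would typically meet them. To keep the sketch self-contained it clones a local folder; against GitHub you would pass the repository's URL to git clone instead. The project and file names are placeholders.

```shell
cd "$(mktemp -d)"

# git init: start tracking a new project folder
mkdir my-project
cd my-project
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# git add: stage a new file for the next commit
echo "# My Project" > README.md
git add README.md

# git commit -m: record the staged snapshot in the object database
git commit -q -m "Add README"
git log --oneline

# git clone: copy an existing repository, including its full history.
# Here the source is the local folder above; normally it is a GitHub URL.
cd ..
git clone -q my-project my-project-copy
```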

git fetch <server> <branch>: updates your object database but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the branch you currently have checked out (the branch your working directory reflects)

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest commits on that branch to the remote server

Git - Basic Commands
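You can try fetch, merge, pull, and push without a GitHub account by letting a local "bare" repository (one with no working directory) play the server. In this sketch, server.git, alice, bob, and data.csv are made-up names; the branch name is read from Git because it may be master or main depending on your Git version and settings.

```shell
cd "$(mktemp -d)"

# A bare repository stands in for the remote server (GitHub in class)
git init -q --bare server.git

# Alice clones the empty server, makes a first commit, and pushes it
git clone -q server.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "line 1" > data.csv
git add data.csv
git commit -q -m "Add data"
branch=$(git symbolic-ref --short HEAD)   # master or main
git push -q origin "$branch"

# Bob clones the server and receives Alice's commit
cd ..
git clone -q server.git bob

# Alice pushes a second commit
cd alice
echo "line 2" >> data.csv
git commit -q -am "Add line 2"
git push -q origin "$branch"

# git fetch: Bob's object database learns about the new commit...
cd ../bob
git fetch -q origin
cat data.csv                  # ...but his working directory still shows only "line 1"

# git merge: bring origin's version of the branch into Bob's current branch
git merge -q "origin/$branch"
cat data.csv                  # now "line 1" and "line 2"

# git pull: fetch + merge in one step (nothing new to pull at this point)
git pull -q origin "$branch"
```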

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A host for remote Git repositories

A website that:

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015
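A common first task is publishing a project you started locally to a new, empty GitHub repository. This sketch uses a local bare repository as a stand-in so it runs offline; with a real GitHub repository you would pass its URL (for example https://github.com/USERNAME/REPO.git, a placeholder) to git remote add.

```shell
cd "$(mktemp -d)"

# Stand-in for an empty repository created on github.com
git init -q --bare github-standin.git

# A project that so far exists only on your computer
mkdir my-repo
cd my-repo
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# my-repo" > README.md
git add README.md
git commit -q -m "Initial commit"

# Connect it to the remote and publish; -u records origin/<branch> as the
# upstream, so later a plain "git push" or "git pull" knows where to go
git remote add origin ../github-standin.git
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"

git status -sb | head -n 1    # e.g. "## main...origin/main"
```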

Non-local Git repositories are called "remotes"

(Diagram: the object database, index/staging area, and working directory from the "Places" slide, shown alongside a remote repository)

Git - "Places"

(Diagram: a server computer and two client computers, A and B, each holding a complete version database with Versions 1, 2, and 3)

Github: A Distributed Version Control example
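The diagram's key idea, that every computer holds the full history, can be checked by cloning a repository twice and then deleting the original. In this sketch a local folder named project stands in for the server; the file name and commit messages are placeholders.

```shell
cd "$(mktemp -d)"

# "Server": a repository with three versions of a file
git init -q project
cd project
git config user.name "Demo User"
git config user.email "demo@example.com"
for v in 1 2 3; do
  echo "Version $v" > file.txt
  git add file.txt
  git commit -q -m "Version $v"
done
cd ..

# Computer A and Computer B each clone it, receiving the whole history
git clone -q project computer-a
git clone -q project computer-b

# Even if the server disappears, each clone still has Versions 1-3
rm -rf project
git -C computer-a log --oneline
git -C computer-b log --oneline
```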

The "origin" remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about "origin"; it is just a default name

Git - "Origin"
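Because "origin" is just a default name, you can inspect, rename, or add remotes freely. A sketch, again using a local bare repository (shared.git, a placeholder) in place of GitHub; backup is a hypothetical second remote name.

```shell
cd "$(mktemp -d)"
git init -q --bare shared.git     # stand-in for a GitHub repository
git clone -q shared.git work
cd work

# clone has already set up a remote called origin (fetch and push URLs)
git remote -v

# The name is only a convention: rename it, or add more remotes
git remote rename origin upstream
git remote add backup ../shared.git    # hypothetical second remote
git remote                              # lists backup and upstream
```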

User Account

(Screenshot: the GitHub profile page of Rebecca Bilbro (rebeccabilbro, Washington DC, joined Sep 13, 2014), showing popular repositories, repositories contributed to such as CommerceData/recordtagger and DistrictDataLabs/trinket, and a year-long contribution graph)

Repo

(Screenshot: a GitHub repository page for "A tour of ROC curves" by rebeccabilbro, with 19 commits, 1 branch, 0 releases, and 1 contributor, listing data/, figures/, .gitignore, LICENSE, README.md, and Python modules such as classi.py, ingest.py, and roc.py, each next to its latest commit message; latest commit 382b9ca)

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.
• In "Search programs and files" type: powershell
• Hit Enter.

Mac OS X: For Mac OS X you'll need to do this:

• Hold down COMMAND and hit the spacebar.
• In the top right the blue search bar will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open, and it's in your Dock so you can get to it.

Where am I? (Mac OS X Terminal and Windows PowerShell)

What's my name? (Mac OS X Terminal and Windows PowerShell)

Make a directory

Windows PowerShell:
> mkdir temp
> mkdir temp\stuff
> mkdir temp\stuff\things
> mkdir temp\stuff\things\frank\joe\alex\john

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OS X Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OS X Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OS X Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
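For Mac OS X (and Linux) users, the Terminal steps from these slides can be run as one script. This sketch starts in a temporary folder so nothing lands in your home directory, and uses mkdir -p, which creates any missing parent folders, for the deepest path.

```shell
# Work in a temporary folder so nothing lands in your home directory
cd "$(mktemp -d)"
pwd                                    # Where am I?

# Make a directory (and nested ones)
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john

# Change between directories
cd temp
pwd                                    # now ends in /temp

# Make an empty file, then list files and directories
touch iamcool.txt
ls                                     # iamcool.txt and stuff
```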

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
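Before the workshop, it can help to see a conflict happen on purpose. This sketch (the file name, branch name, and values are made up) edits the same line on two branches, merges them, and then resolves the conflict by hand.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "threshold = 0.5" > config.txt
git add config.txt
git commit -q -m "Set threshold"
main=$(git symbolic-ref --short HEAD)     # master or main

# Branch "experiment" changes the line one way...
git checkout -q -b experiment
echo "threshold = 0.7" > config.txt
git commit -q -am "Try 0.7"

# ...while the original branch changes the same line another way
git checkout -q "$main"
echo "threshold = 0.6" > config.txt
git commit -q -am "Try 0.6"

# The merge stops and marks the conflict with <<<<<<<, =======, >>>>>>>
git merge experiment || echo "Conflict: edit config.txt, then add and commit"
cat config.txt

# Resolve: write the version you want, stage it, and finish the merge
echo "threshold = 0.6" > config.txt
git add config.txt
git commit -q -m "Merge experiment, keep threshold 0.6"
```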

(Diagram: the working directory, index, local repo, and GitHub repo, with Revert and Compare operations between them such as checkout HEAD, diff --cached, and fetch)
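A small sketch of the compare and revert commands named in the diagram, using a placeholder file, table.csv: git diff compares the working directory with the index, git diff --cached compares the index with the last commit, and git checkout HEAD -- <file> restores a file from the last commit.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "a,b" > table.csv
git add table.csv
git commit -q -m "Add table"

echo "a,b,c" > table.csv
git diff                        # Compare: working directory vs index

git add table.csv
git diff --cached               # Compare: index vs last commit (HEAD)

# Revert: restore the file in both the index and the working directory from HEAD
git checkout HEAD -- table.csv
cat table.csv                   # a,b
```

Git 2.23 and later also offer git restore --staged --worktree <file> for the same job.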

Teamwork (makes the dream work)

Organization

(Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus", Washington DC), with 20 people, 4 teams, and repositories such as DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy"))

Waffle

(Screenshot: the Waffle.io board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress, and Done columns; each card is a GitHub issue, such as "Async Upload with Celery", "Large files hang uploader", or "Dataset Searching", tagged with labels like priority: medium, type: feature, type: bug, and Version 0.3)

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
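One way to make messages helpful is a short summary line followed by a paragraph explaining why; each -m flag becomes its own paragraph. The file and message text here are invented for illustration.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "print('cleaning')" > clean.py
git add clean.py

# First -m: short summary of what changed; second -m: why it changed
git commit -q -m "Drop rows with missing ZIP codes" \
  -m "Blank ZIP codes made the census join fail silently, so remove them before merging."

git log -1 --format='%s%n%n%b'    # summary line, blank line, body
```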

Why?

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures/

requirements.txt
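A sketch of scaffolding a new project with these conventions. The file contents, the package names in requirements.txt, and the data/raw/ ignore rule are illustrative assumptions, not a required layout.

```shell
cd "$(mktemp -d)"
mkdir my-analysis
cd my-analysis
git init -q

# README.md: what the project is and how to run it
printf '# My Analysis\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: patterns for files Git should never track
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
.ipynb_checkpoints/
.DS_Store
data/raw/
EOF

# fixtures/: small sample data that tests and examples can rely on
mkdir fixtures
echo "id,value" > fixtures/sample.csv

# requirements.txt: Python packages, installed with: pip install -r requirements.txt
printf 'pandas\nscikit-learn\n' > requirements.txt

# A large raw data file that should stay out of the repository
mkdir -p data/raw
echo "lots of rows" > data/raw/huge.csv

git add .
git status --short    # data/raw/huge.csv is not listed: .gitignore excludes it
```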

Where to go from here

Additional Tutorials

http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources

GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Page 21: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

[Screenshot: the Github profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, joined Sep 13, 2014, showing her popular repositories, the repositories she has contributed to, and her contribution activity]

Repo

[Screenshot: a Github repository page by rebeccabilbro, “A tour of ROC curves”, showing 19 commits, 1 branch, 0 releases and 1 contributor, the file list with each file's latest commit message and age, and the rendered README.md]

Command Line

Shifting to the command line


Windows: On Windows we’re going to use PowerShell. People used to work with a program called cmd.exe, but it’s not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.
• In “Search programs and files” type powershell.
• Hit Enter.

Mac OSX: For Mac OSX you’ll need to do this:

• Hold down COMMAND and hit the spacebar.
• In the top right the blue search bar will pop up.
• Type terminal.
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock.

Now you have your Terminal open and it’s in your Dock, so you can get to it.

Where am I?

[Screenshots: Mac OSX Terminal and Windows PowerShell]

What’s my name?

[Screenshots: Mac OSX Terminal and Windows PowerShell]
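The terminal screenshots for the two slides above did not survive. On Mac OSX, the commands usually used for these questions are pwd and whoami. PowerShell accepts the same two names: pwd is an alias for Get-Location, and whoami is a Windows program. The following is a minimal sketch for the Mac OSX Terminal.

```shell
#!/bin/sh
# Where am I? Print the current (working) directory.
pwd

# What's my name? Print the user name you are logged in as.
whoami
```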

Make a directory

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
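Only the PowerShell commands for this slide survived. The following is a minimal sketch of a Mac OSX Terminal equivalent, run in a throwaway directory. On Mac and Linux, mkdir needs the -p flag to create several missing nested directories at once, which is why the last line differs from the PowerShell version.

```shell
#!/bin/sh
# Sketch: Mac OSX Terminal version of the "Make a directory" slide.
set -e
SCRATCH=$(mktemp -d)   # scratch directory, so nothing in your home folder changes
cd "$SCRATCH"

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
# -p creates any missing parent directories along the way
mkdir -p temp/stuff/things/frank/joe/alex/john

find temp -type d      # list every directory that now exists under temp
```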

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OSX Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls

Zed Shaw’s book

Let’s use what we’ve learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
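A merge conflict happens when two branches change the same lines of the same file. The following sketch provokes one in a throwaway repository and then resolves it, as a warm-up for the workshop. It is not the workshop's own exercise. The file plan.txt, the branch names main and teammate, and the identity are hypothetical.

```shell
#!/bin/sh
# Sketch: create, inspect, and resolve a merge conflict in a scratch repository.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init --quiet
git symbolic-ref HEAD refs/heads/main   # name the first branch main
echo "meet at noon" > plan.txt
git add plan.txt
git commit --quiet -m "Initial plan"

# Two branches edit the same line
git checkout --quiet -b teammate
echo "meet at 1pm" > plan.txt
git commit --quiet -am "Teammate's plan"
git checkout --quiet main
echo "meet at 2pm" > plan.txt
git commit --quiet -am "My plan"

# Expected to stop with "CONFLICT (content)"; || true keeps set -e from exiting
git merge teammate || true
git status --short    # "UU plan.txt" marks the unmerged file
cat plan.txt          # shows the <<<<<<<, ======= and >>>>>>> markers

# Resolve by writing the agreed content, then stage and commit the merge
echo "meet at 1pm" > plan.txt
git add plan.txt
git commit --quiet -m "Merge teammate: agree on 1pm"
```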

[Diagram: the working directory, index, local repo and Github repo, with the commands that move or compare changes between them, including checkout HEAD (revert), diff --cached (compare) and fetch]
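You can try two of the operations named on this slide directly. git diff --cached compares the index with the last commit. git checkout HEAD -- <file> reverts a file to its committed version and discards uncommitted edits in both the index and the working directory. The following is a minimal sketch that uses a hypothetical file, report.txt.

```shell
#!/bin/sh
# Sketch: compare the index with HEAD, then revert a file to its committed state.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init --quiet
echo "draft 1" > report.txt
git add report.txt
git commit --quiet -m "First draft"

echo "draft 2" > report.txt
git add report.txt
git diff --cached          # staged change (index vs. last commit): -draft 1 / +draft 2
git diff                   # empty: the working directory matches the index

# Revert: restore both the index and the working directory from HEAD
git checkout HEAD -- report.txt
cat report.txt             # draft 1
git status --short         # no output: nothing left to commit
```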

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on Github, “A startup within DOC focused on building data products with and for the bureaus”, Washington DC, with 20 people, 4 teams, and repositories such as DataService_WebSite (forked from timwood/DataCorps_WebSite), ITA_Principal_Travel and Commerce_Data_Academy_Courses]

Waffle

[Screenshot: the Waffle board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress and Done columns of Github issues labeled by priority, type and version, such as “Better Licensing”, “Dataset Searching”, “Async Upload with Celery” and “Large files hang uploader”]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
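One way to follow this advice is to write a short summary line followed by a longer body. Each -m flag given to git commit becomes its own paragraph. The following sketch writes a subject and a body, then shows that git log --oneline displays only the subject. The repository, the file roc.py and the message text are hypothetical.

```shell
#!/bin/sh
# Sketch: a commit message with a short subject line and an explanatory body.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init --quiet
touch roc.py
git add roc.py

# First -m: a short summary of what changed.
# Second -m: why it changed; git stores it as a separate paragraph.
git commit --quiet -m "Add basic ROC curve plotter" \
  -m "Plots true positive rate against false positive rate so we can compare classifiers."

git log --oneline          # one line per commit: abbreviated hash + subject only
git log -1 --format=%B     # the full message: subject, blank line, body
```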

Why

Why do data scientists need version control?

Data Ingestion
Data Munging and Wrangling
Computation and Analyses
Modeling and Application
Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md
.gitignore
fixtures
requirements.txt
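The following minimal sketch scaffolds the layout above in a new repository. Several details are assumptions: fixtures/ is taken to hold small sample data files for tests, and the fixture file and the .gitignore entries are examples only. .DS_Store is the metadata file macOS Finder creates, so it is a common entry. git check-ignore -v reports which .gitignore rule matches a path.

```shell
#!/bin/sh
# Sketch: scaffold the conventional top-level files of a data science repo.
set -e
REPO=$(mktemp -d)
cd "$REPO"
git init --quiet

echo "# Project name" > README.md          # what the project does and how to run it
touch requirements.txt                      # Python packages the code needs, one per line
mkdir fixtures                              # small sample data for tests (assumed use)
echo "id,value" > fixtures/sample.csv       # hypothetical fixture file

# Example ignore rules (adjust to your project): macOS clutter, Python caches, raw data
cat > .gitignore <<'EOF'
.DS_Store
__pycache__/
*.pyc
data/raw/
EOF

touch .DS_Store                             # the kind of file you never want to commit
git check-ignore -v .DS_Store               # names the .gitignore rule that matches
git status --short --untracked-files=all    # .DS_Store is not listed
```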

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
Github Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


[Screenshot: an example Waffle board, waffle.io/serenity, with Backlog, Ready, In Progress and Done columns of task cards such as “check ship for survivors”, “fix ships engine problem” and “buy a solid ship”, labeled train job, hospital job, financial, startup or enhancement]

Version Control

Examples

Google Drive
SharePoint
Dropbox
TortoiseSVN
Bitbucket

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: The management of changes to electronic documents, and in particular computer programs.

“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Version Control: A Visualization

Local Version Control Systems

[Diagram: on a local computer, a file is checked out from a version database holding Versions 1, 2 and 3]

[Diagram: Branches and revisions through time - example scenario]

[Diagram: Branches and revisions through time - actual workflow]

Distributed vs Centralized

Centralized
What are the benefits?
What are the weaknesses?

Decentralized
What are the benefits?
What are the weaknesses?

Git

From the git-scm.com home page:

git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with Powershell, and sets up solid credential caching and sane CRLF settings. We’ll learn more about those things a little later, but suffice it to say they’re things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
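After installing, a quick way to check that Git is on your PATH is to ask for its version. In PowerShell, `git --version` on its own is enough. The following sketch is for the Mac OSX Terminal. It also prints the name and email Git will attach to your commits, if they are set. The commented-out `git config --global` lines show how you would set them once, replacing the hypothetical placeholders with your own details.

```shell
#!/bin/sh
# Sketch: confirm the installation worked.
if command -v git >/dev/null 2>&1; then
  git --version                      # e.g. "git version 2.x.y"
else
  echo "git not found: check the installer finished, then reopen your terminal"
fi

# Show the identity Git will put on your commits ("(not set)" if missing)
echo "user.name:  $(git config --global user.name || echo '(not set)')"
echo "user.email: $(git config --global user.email || echo '(not set)')"

# Run once, with your own details (the placeholders below are hypothetical):
# git config --global user.name "Your Name"
# git config --global user.email "you@example.com"
```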

Git - History Lesson

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
Distributed Version Control
Open Source
Initial release: 7 April 2005
All metadata is stored in the .git directory

Git - Advantages

Speed
Simple design
Strong support for non-linear development (thousands of parallel branches)
Fully distributed
Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - “Places”

Object Database: where git stores metadata about each commit

Index/Staging Area: file snapshots to be included in the next commit

Working Directory: the “physical” files on a computer

Git - “Stages”

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed
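You can watch the three stages with git status --short. In its two-letter codes, the left column describes the index and the right column describes the working directory. The following is a minimal sketch that uses a hypothetical file, analysis.py.

```shell
#!/bin/sh
# Sketch: follow one file through the Modified -> Staged -> Committed stages.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init --quiet

echo "print('v1')" > analysis.py
git status --short        # ?? analysis.py  (untracked: git does not know it yet)
git add analysis.py
git status --short        # A  analysis.py  (staged: new file in the index)
git commit --quiet -m "Add analysis script"
git status --short        # no output       (committed: stored in the object database)

echo "print('v2')" >> analysis.py
git status --short        #  M analysis.py  (modified: changed but not staged)
git add analysis.py
git status --short        # M  analysis.py  (staged: will go into the next commit)
```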

Git - Areas/Places

[Diagram: Working Directory, Staging Area, and git directory (Repository)]

Git Commands

git init: create a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit
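The following is a minimal sketch of git init, git add and git clone in a scratch directory. A local path stands in for the repository address you would normally copy from Github. The folder names my-project and my-copy and the file contents are hypothetical.

```shell
#!/bin/sh
# Sketch: start a repository with init/add/commit, then copy it with clone.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
cd "$WORK"

# git init: start tracking an existing folder
mkdir my-project
cd my-project
git init --quiet                      # creates the hidden .git directory
echo "# My project" > README.md
git add README.md                     # stage the file for the next commit
git commit --quiet -m "Add README"
cd ..

# git clone: download a full copy of an existing repository, history included
git clone --quiet my-project my-copy
ls my-copy                            # README.md
git -C my-copy log --oneline          # the "Add README" commit came along
```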

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 23: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Backlog

waffleioserenity

24

uniforms

31

I trainjob I

22

Ihospi taljob IQ FiiNF

32

W f$flampF

check ship for survivors

II

secure identification keycards and

0 shy

lower onto train and secure cargo

repair ambulance shuttle

capture an Alliance anti-aircraft gun

1--1-lo

collect package from post master

Ready

20

disable explosive set by trap

II 18

recover hidden loot at Canton

financial

4

retrieve cargo from train

I ttain job I enhancement

30

join Mal in boarding train

[ trainjob I

21

collect remaining funds to pay for

shipmates release

financial I - 1- lo

In Progress

alert others of distress call

fix ships engine problem

mmm bull 1-L II 13

unload and pen cattle

M MMUi L II

get cargo from abandoned carrier

Done

29

find a brand new compression coil for the

steamer

wontfix

find a captain for t h

Istartup I e ship

II 27

find a mechanic for the ship

Istartup I II 16

buy a solid ship

Istartup I () II

Version Control

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Where am I?
[Screenshots: Mac OS X Terminal and Windows PowerShell]

What's my name?
[Screenshots: Mac OS X Terminal and Windows PowerShell]

Make a directory
Windows PowerShell:
> mkdir temp
> mkdir temp\stuff
> mkdir temp\stuff\things
> mkdir temp\stuff\things\frank\joe\alex\john

Change between directories
Windows PowerShell:
> cd temp
> pwd
Mac OS X Terminal:
$ cd temp
$ pwd

List files and directories
Windows PowerShell:
> dir
Mac OS X Terminal:
$ ls

Make an empty file
Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir
Mac OS X Terminal:
$ cd temp
$ touch iamcool.txt
$ ls

Zed Shaw's book
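On Mac OS X (or Linux), the exercises above can be run as one short script. This is a sketch under a few assumptions: a POSIX shell such as bash, a throwaway directory from `mktemp -d` so nothing in your home folder is touched, and `mkdir -p` for the deep nested path, since the Mac commands for that step did not survive on the slide.

```shell
#!/usr/bin/env bash
set -e

# Work in a throwaway directory so the exercise is safe to repeat
scratch=$(mktemp -d)
cd "$scratch"

# Where am I? Print the working directory
pwd

# Make a directory, then directories inside it
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p creates any missing parents

# Change between directories
cd temp
pwd                       # now ends in /temp

# List files and directories
ls                        # shows the stuff directory

# Make an empty file, then list again
touch iamcool.txt
ls                        # now shows iamcool.txt as well
```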

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
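If you'd like to see a conflict before the workshop, the sketch below manufactures one in a scratch repository: two branches change the same line of the same file, so the merge stops and asks you to decide. The file `notes.txt`, the branch name `feature` and the placeholder identity (exported so commits work without any prior `git config`) are all invented for the demo.

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "The meeting is on Monday" > notes.txt
git add notes.txt
git commit -q -m "Add meeting notes"
base=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on your config

# Branch off and change the line one way...
git checkout -q -b feature
echo "The meeting is on Tuesday" > notes.txt
git commit -q -am "Move meeting to Tuesday"

# ...then change the same line differently on the original branch
git checkout -q "$base"
echo "The meeting is on Wednesday" > notes.txt
git commit -q -am "Move meeting to Wednesday"

# Git cannot pick a winner, so the merge stops with a conflict
if ! git merge feature; then
  git status --short        # "UU notes.txt" marks the unmerged file
  cat notes.txt             # shows the <<<<<<< ======= >>>>>>> conflict markers
fi

# Resolve by writing the version you want, then stage it and commit the merge
echo "The meeting is on Tuesday" > notes.txt
git add notes.txt
git commit -q -m "Merge feature: keep Tuesday"
```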

Compare
[Diagram: the working directory, index, local repo and GitHub repo, connected by arrows labeled revert, diff --cached, fetch and checkout HEAD]
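The arrows in the diagram map onto everyday commands. A minimal sketch, using a scratch repository and a hypothetical `data.csv`, of comparing the working directory with the index (`git diff`), the index with the last commit (`git diff --cached`), and throwing edits away by checking the file out of `HEAD` again. The remote side (fetch) is left out here because it needs a second repository.

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'id,value\n1,10\n' > data.csv
git add data.csv
git commit -q -m "Add data"

# Edit the file: the working directory now differs from the index
printf 'id,value\n1,10\n2,20\n' > data.csv
git diff --stat              # working directory vs index: data.csv changed
git diff --cached --stat     # index vs last commit: nothing staged, so no output

# Stage the edit: now the index differs from the last commit instead
git add data.csv
git diff --stat              # no output: working directory matches the index
git diff --cached --stat     # index vs last commit: data.csv changed

# Discard everything: copy HEAD's version back over both the index and the file
git checkout HEAD -- data.csv
git status --short           # no output: all three places agree again
```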

Teamwork (makes the dream work)

Organization
[Screenshot: the Commerce Data Service organization on GitHub ("A startup within DOC focused on building data products with and for the bureaus", Washington, DC), listing repositories such as DataService_WebSite, ITA_Principal_Travel and Commerce_Data_Academy_Courses]

Waffle
[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with Backlog, Ready, In Progress and Done columns of GitHub issues such as "Dataset Searching", "Async Upload with Celery" and "Large files hang uploader", each labeled by priority, type and version]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
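One way to be helpful is a short summary line followed by a sentence on why. Passing `-m` more than once does this from the command line, because Git makes each `-m` its own paragraph. The file and message below are invented for illustration.

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "rows = shuffle(rows)" > ingest.py    # hypothetical change
git add ingest.py

# First -m: a short summary; second -m: why the change was made
git commit -q \
  -m "Add randomizer to ingest step" \
  -m "Shuffle rows before splitting so train and test sets are not ordered by date."

git log -1 --format=%s    # the summary line
git log -1 --format=%b    # the explanation
```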

Why?

Why do data scientists need version control?

Where does version control fit into the data science pipeline?
[Diagram: Data Ingestion -> Data Munging and Wrangling -> Computation and Analyses -> Modeling and Application -> Reporting and Visualization]

Folder structure conventions on GitHub
• README.md
• .gitignore
• fixtures
• requirements.txt
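A minimal sketch of scaffolding a new project with these conventional files. The project name `my-analysis` and every file's contents are placeholders; the `.gitignore` and `requirements.txt` entries assume a Python (Anaconda) workflow like the one in the course prerequisites, and `fixtures` is treated as a folder of small sample data.

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

proj="$(mktemp -d)/my-analysis"
mkdir -p "$proj/fixtures"
cd "$proj"
git init -q

# README.md: what the project is and how to run it
printf '# my-analysis\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files Git should never track (OS clutter, caches, local secrets)
printf '.DS_Store\n__pycache__/\n*.pyc\n.env\n' > .gitignore

# requirements.txt: Python packages needed to reproduce the work
printf 'pandas\nscikit-learn\n' > requirements.txt

# fixtures: small sample data; Git does not track empty folders, so add a file
printf 'id,label\n1,0\n' > fixtures/sample.csv

touch .DS_Store              # simulate macOS clutter
git add .
git commit -q -m "Scaffold project layout"
git ls-files                 # .DS_Store is absent because .gitignore excludes it
```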

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
GitHub Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Version Control

Examples
Google Drive, SharePoint, Dropbox, TortoiseSVN, Bitbucket

What is version control? Other names?

What problems does this solve?

What are the benefits?

What are some common features?

Definition: The management of changes to electronic documents, and in particular computer programs.

“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization
[Diagram: on the local computer, a file is checked out from a version database holding Version 1, Version 2 and Version 3]
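The visualization can be reproduced with Git itself: each commit is one version in the database, and checking a file out of an older commit copies that version back into your working file. A sketch with a hypothetical `report.txt`:

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# Save three versions of the same file into the version database
for v in 1 2 3; do
  echo "Version $v" > report.txt
  git add report.txt
  git commit -q -m "Version $v of report"
done

git log --oneline             # newest first: Version 3, then 2, then 1

# Checkout: copy Version 1 (two commits before HEAD) back into the working file
git checkout HEAD~2 -- report.txt
cat report.txt                # Version 1
```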

Branches and revisions through time - example scenario
[Diagram: revisions numbered 1 to 6 spread across branches labeled A, B and C]

Branches and revisions through time - actual workflow
[Diagram: branch and revision history of a real project over time]
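A small branching scenario in the same spirit, sketched with a hypothetical branch `branch-a`: work on the branch proceeds independently of the main line, and a merge brings the two back together. Because the default branch may be called `master` or `main` depending on your Git configuration, the script looks the name up instead of assuming it.

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "baseline" > model.txt
git add model.txt
git commit -q -m "Revision 1"
main=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on your config

# Branch A: try an idea without disturbing the main line
git checkout -q -b branch-a
echo "idea A" >> model.txt
git commit -q -am "Revision 2 (branch A)"

# Meanwhile, work continues on the main line
git checkout -q "$main"
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Revision 3 (main line)"

# Merge branch A back in; the histories diverged, so Git creates a merge commit
git merge -q --no-edit branch-a
git log --oneline --graph --all           # draws both lines of work and the merge
```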

Distributed vs. Centralized

Centralized
• What are the benefits?
• What are the weaknesses?

Decentralized
• What are the benefits?
• What are the weaknesses?

Git

git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

[Screenshot: the git-scm.com home page, with About, Documentation, Downloads and Community sections and a "Try Git" link for learning Git in the browser]

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git (Windows): http://git-for-windows.github.io

Installing Git (Mac): http://git-scm.com/download/mac
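After installing, a quick check is to ask Git for its version, and the usual next step is telling Git who you are. This sketch sets the name and email on a scratch repository only, so it does not change your real settings; the values are placeholders, and on your own machine you would normally add `--global`.

```shell
#!/usr/bin/env bash
set -e

git --version                 # any "git version ..." line means Git is installed and on your PATH

repo=$(mktemp -d)
cd "$repo"
git init -q

# Tell Git who you are (placeholder values). For a real setup you would
# normally add --global so the setting applies to every repository.
git config user.name "Your Name"
git config user.email "you@example.com"
git config user.name          # prints: Your Name
```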

Git - History Lesson
• Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)
• Distributed Version Control
• Open Source
• Initial release: 7 April 2005
• All metadata is stored in the .git directory

Git - Advantages
• Speed
• Simple design
• Strong support for non-linear development (thousands of parallel branches)
• Fully distributed
• Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - “Places”
• Object Database: where Git stores every commit, including the snapshot of your files and the metadata (author, date, message) about it
• Index / Staging Area: file snapshots to be included in the next commit
• Working Directory: the “physical” files on a computer
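The three places can be seen directly in a scratch repository. A minimal sketch; `a.txt` is a placeholder file:

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "hello" > a.txt          # working directory: an ordinary file on disk
git add a.txt                 # index: records a snapshot of a.txt for the next commit
git ls-files --stage          # index entry: mode, object id, stage number, path

git commit -q -m "Add a.txt"  # object database: stores the new commit and its tree
git cat-file -t HEAD          # prints: commit
git cat-file -p HEAD          # the commit's tree, author, committer and message

ls .git                       # objects/ holds the object database; index holds the staging area
```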

Git - “Stages”
• Committed: data is safely stored in your local object database
• Staged: marked such that the current state of the modified file will be included in the next commit
• Modified: changed but not staged or committed
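`git status --porcelain` reports these stages compactly, with the first column describing the index and the second the working directory. A sketch walking one hypothetical file, `f.txt`, through all three:

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > f.txt
git add f.txt
git commit -q -m "Add f.txt"
git status --porcelain        # committed: no output

echo "v2" > f.txt
git status --porcelain        # modified: " M f.txt" (second column = working directory)

git add f.txt
git status --porcelain        # staged: "M  f.txt" (first column = index)

git commit -q -m "Update f.txt"
git status --porcelain        # committed again: no output
```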

Git - Areas/Places
[Diagram: working directory, staging area, and .git directory (repository)]

Git Commands

Git - Basic Commands
• git init: creates a new Git repository to manage the current folder
• git clone <repository address>: downloads an existing Git repository, with its full history, for the first time
• git add <file path>: marks new or modified files to be added to the index (staging area) for the next commit
• git commit -m <message>: takes the changes from the staging area and records them, with your message, as a new commit in the object database
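The four commands together, sketched in a scratch folder. Since `git clone` accepts a local path as well as a GitHub URL, the script clones its own repository to show that a clone carries the full history; the paths are placeholders.

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
mkdir "$work/project"
cd "$work/project"

git init -q                                  # create a repository for this folder
echo "# project" > README.md
git add README.md                            # stage the new file
git commit -q -m "Add README"                # record it in the object database

# git clone works with a local path as well as a GitHub URL
git clone -q "$work/project" "$work/project-copy"
ls "$work/project-copy"                      # README.md
git -C "$work/project-copy" log --oneline    # the clone has the full history
```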

Git - Basic Commands
• git fetch <server> <branch>: downloads new commits from the remote into your object database, but does not change your working directory
• git merge <source branch>: applies the commits from the source branch to the branch you currently have checked out, updating your working directory to match
• git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory
• git push <server> <branch>: sends your latest commits on that branch to the remote server
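These commands need a remote, so this sketch uses a local bare repository (`server.git`) as a stand-in for a GitHub repository and two clones, `alice` and `bob`, as teammates. All names are hypothetical. It shows that `fetch` updates Bob's object database without touching his files, and that `merge` then brings the files up to date; `git pull origin <branch>` would do both steps at once.

```shell
#!/usr/bin/env bash
set -e

# Placeholder identity so commits work without any prior git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q --bare "$work/server.git"       # stand-in for a GitHub repository

# Alice clones, commits, and pushes
git clone -q "$work/server.git" "$work/alice"
cd "$work/alice"
echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data"
branch=$(git rev-parse --abbrev-ref HEAD)    # master or main, depending on your config
git push -q origin "$branch"

# Bob clones; then Alice pushes a second commit
git clone -q "$work/server.git" "$work/bob"
echo "v2" > data.txt
git commit -q -am "Update data"
git push -q origin "$branch"

# Bob fetches: his object database learns about the new commit...
cd "$work/bob"
git fetch -q origin
cat data.txt                                 # ...but his file still says v1
git log --oneline "origin/$branch"           # the new commit is visible on origin/<branch>

# Bob merges the fetched commits into his branch and working directory
git merge -q "origin/$branch"
cat data.txt                                 # now v2
```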

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

GitHub
• A remote Git repository
• A website that:
  • provides secure access
  • provides repository metadata & reports
  • provides tools for development teams
• Launched April 10, 2008
• ~10 million users in 2015


Non-local Git repositories are called “remotes.”


GitHub: A Distributed Version Control Example
[Diagram: a server computer and two client computers (A and B), each holding a complete version database with Versions 1, 2 and 3]

Git - “Origin”
• The “origin” remote is automatically created when you clone.
• It is the default remote to use for pushing and pulling.
• There is nothing special about “origin”: it is just a default name.
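To see that `origin` is only a default name, this sketch clones a local bare repository standing in for a team's GitHub repository, renames `origin` to `upstream`, and adds a second remote under the name `origin`. The names `team.git`, `fork.git` and `upstream` are assumptions chosen for the example, loosely modeled on a fork-based workflow.

```shell
#!/usr/bin/env bash
set -e

work=$(mktemp -d)
git init -q --bare "$work/team.git"          # stand-in for the team's GitHub repository
git clone -q "$work/team.git" "$work/mine"
cd "$work/mine"
git remote -v                                # origin was created by the clone

# "origin" is only a name: rename it, then reuse the name for a second remote
git remote rename origin upstream
git init -q --bare "$work/fork.git"          # stand-in for your own fork
git remote add origin "$work/fork.git"
git remote -v                                # now lists both origin and upstream
```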

User Account
[Screenshot: Rebecca Bilbro's (rebeccabilbro) GitHub profile, Washington, DC, joined Sep 13, 2014, showing popular repositories, repositories contributed to (such as CommerceData/recordtagger, CommerceData/newexporters and DistrictDataLabs/trinket) and a year of contribution activity]

Repo
[Screenshot: the GitHub page for rebeccabilbro's repository "A tour of ROC curves", with Code, Issues, Pull requests, Wiki, Pulse, Graphs and Settings tabs]


Page 25: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Examples

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 26: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Google Drive

I~ SharePoint

rop ox

Tortoise SVN8Bitbucket

What is version controlOther names

What problems does this solve

What are the benefits

What are some common features

Definition The management of changes to electronic documents and in particular computer programs

ldquoIn computer software engineering revision control is any kind of practice that tracks and provides control over changes to source coderdquo

Wikipedia knows everything

Tell us about a time when you could have used someversion control

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

[Screenshot: git-scm.com home page]

git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
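After installing, you can check from Terminal or PowerShell that Git is on your PATH, and give Git the name and email it should stamp on your commits. This is a minimal sketch. The name and email below are placeholders to replace with your own, and the same `git` commands work unchanged in PowerShell.

```shell
# Confirm Git is installed and on your PATH; prints something like "git version 2.x.y".
git --version

# One-time setup: the identity recorded in every commit you make.
# "Your Name" and "you@example.com" are placeholders; use your own details.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Read the values back to check them.
git config --global user.name
git config --global user.email
```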

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index / Staging Area

file snapshots to be included in next commit

Working Directory

the “physical” files on a computer

Git - “Places”

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed

Git - “Stages”
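You can watch the three stages directly with `git status --short`. It prints a two-letter code for each file: the left column describes the staging area and the right column describes the working directory. This is a minimal sketch run in a throwaway folder. The file name `notes.txt` and the example identity are hypothetical.

```shell
# Work in a disposable folder so nothing real is touched.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"          # committed: stored in the object database

echo "second line" >> notes.txt            # modified: changed, but not staged
git status --short                         # prints " M notes.txt"

git add notes.txt                          # staged: this version goes into the next commit
git status --short                         # prints "M  notes.txt"

git commit --quiet -m "Extend notes"       # committed again
git status --short                         # prints nothing: the working directory is clean
```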

[Figure: Git - Areas/places: Working Directory, Staging Area, and .git directory (Repository)]

Git Commands

git init: create a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes metadata/changes from staging and adds them to the object database

Git - Basic Commands
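Together, these four commands form the everyday local loop. This is a minimal sketch, assuming a hypothetical project folder called `analysis`. A local path stands in for the GitHub address you would normally pass to `git clone`.

```shell
# Hypothetical workspace in a temporary directory.
work=$(mktemp -d)
mkdir "$work/analysis"
cd "$work/analysis"

git init --quiet                           # start tracking this folder (creates .git)
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "# Analysis" > README.md
git add README.md                          # put the file in the staging area
git commit --quiet -m "Add README"         # record the staged snapshot
git log --oneline                          # one line per commit: short hash and message

# Clone copies the whole repository, history included.
# With GitHub you would pass the repository URL instead of a local path.
cd "$work"
git clone --quiet analysis analysis-copy
ls analysis-copy                           # lists README.md
```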

git fetch <server> <branch>: updates your object database but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the current branch, whose checked-out files are what the working directory shows

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest branch commits to the remote server

Git - Basic Commands
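You can try fetch, merge, pull, and push without a network by letting a bare repository on disk play the part of the server. This is a sketch under that assumption. `server.git`, the two users Alice and Bob, and `data.csv` are all hypothetical. The sketch uses `git push origin HEAD` so it works whatever your default branch is called (`master` or `main`).

```shell
base=$(mktemp -d)
cd "$base"

# A bare repository (no working directory) stands in for the remote server.
git init --quiet --bare server.git

# Alice clones the empty server (Git may warn that it is empty; that is expected).
git clone --quiet server.git alice
cd "$base/alice"
git config user.name "Alice"               # hypothetical identity, local to this clone
git config user.email "alice@example.com"
echo "id,value" > data.csv
git add data.csv
git commit --quiet -m "Add data file"
git push --quiet origin HEAD               # send the current branch's commits to origin

# Bob clones after that push, so he starts with Alice's first commit.
cd "$base"
git clone --quiet server.git bob

# Alice commits again and pushes.
cd "$base/alice"
echo "1,42" >> data.csv
git commit --quiet -am "Add first row"
git push --quiet origin HEAD

# Bob fetches: his object database learns about the new commit,
# but his working directory is unchanged until he merges.
cd "$base/bob"
branch=$(git rev-parse --abbrev-ref HEAD)
git fetch --quiet origin
cat data.csv                               # still only the header line
git merge --quiet "origin/$branch"         # fast-forward: the new row appears
cat data.csv

# One more change from Alice; this time Bob uses pull (fetch + merge in one step).
cd "$base/alice"
echo "2,7" >> data.csv
git commit --quiet -am "Add second row"
git push --quiet origin HEAD
cd "$base/bob"
git pull --quiet origin "$branch"
```

With GitHub, the only difference is that `origin` points at a URL on github.com instead of a folder on your disk.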

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local git repositories are called “remotes”

[Figure: Github: A Distributed Version Control example. A server computer, Computer A, and Computer B each hold a full version database with Version 1, Version 2, and Version 3.]

The “origin” remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about “origin”; it is just a default name

Git - “Origin”
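This short sketch confirms that “origin” is only a name. After cloning, `git remote -v` shows it, and `git remote rename` can change it without affecting anything else. The bare repository `team.git` is a hypothetical stand-in for a GitHub repository.

```shell
base=$(mktemp -d)
cd "$base"
git init --quiet --bare team.git          # hypothetical shared repository
git clone --quiet team.git project        # clone creates the "origin" remote
cd project

git remote -v                             # origin, listed once for fetch and once for push
git remote rename origin upstream         # rename it; the URL is unchanged
git remote -v                             # the same URL, now listed as "upstream"
```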

User Account

[Screenshot: GitHub user profile page for Rebecca Bilbro (rebeccabilbro), showing popular repositories, repositories contributed to, and a yearly contributions graph]

Repo

[Screenshot: GitHub repository page for rebeccabilbro's “A tour of ROC curves”, showing 19 commits, 1 branch (master), 1 contributor, the file listing with each file's latest commit message, and the rendered README.md]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files" type: powershell

• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type: terminal

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open, and it's in your Dock so you can get to it.

Where am I? (Mac OSX Terminal / Windows PowerShell)

What’s my name? (Mac OSX Terminal / Windows PowerShell)

Make a directory (Mac OSX Terminal / Windows PowerShell)

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OSX Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls

Zed Shaw’s book
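To check your understanding, you can run the Mac OSX Terminal commands as one short script. This sketch assumes a bash-like shell (Terminal on Mac, or Git Bash on Windows) and works in a throwaway folder rather than your home directory. Note that, unlike PowerShell's `mkdir`, the Unix `mkdir` needs `-p` to create several nested folders at once.

```shell
# Start in a disposable folder instead of your home directory.
cd "$(mktemp -d)"

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p creates any missing parent folders

cd temp
pwd                  # prints the full path of the current folder, ending in /temp
touch iamcool.txt    # creates an empty file
ls                   # lists iamcool.txt and stuff
```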

Let’s use what we’ve learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
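Before the workshop, it can help to trigger a conflict on purpose. In this sketch, two branches change the same line and `git merge` stops with a conflict. You resolve it by editing the file, staging it, and committing. The file `report.txt`, the branch `retitle`, and the identity are hypothetical.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "title: Draft" > report.txt
git add report.txt
git commit --quiet -m "Start report"
main=$(git rev-parse --abbrev-ref HEAD)    # "master" or "main", depending on your Git

# Change the line on a new branch...
git checkout --quiet -b retitle
echo "title: Final" > report.txt
git commit --quiet -am "Retitle report"

# ...and change the same line differently on the original branch.
git checkout --quiet "$main"
echo "title: Summary" > report.txt
git commit --quiet -am "Rename report to summary"

# The merge reports CONFLICT in report.txt and exits with an error status.
git merge retitle || echo "Merge stopped: fix the conflict, then add and commit"
cat report.txt           # shows both versions between <<<<<<<, =======, and >>>>>>> markers

# Resolve: write the text you want, then stage and commit to finish the merge.
echo "title: Final summary" > report.txt
git add report.txt
git commit --quiet -m "Merge retitle branch, combining both titles"
```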

[Figure: commands that move or compare changes between the working directory, the index, the local repo, and the Github repo: Revert, diff --cached, fetch, checkout HEAD, Compare]
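You can try the comparison commands from the figure on a single file. In this sketch, `git diff` compares the working directory with the index, and `git diff --cached` compares the index with the last commit. `git checkout HEAD -- <file>` throws away uncommitted changes by restoring the committed version. The file `f.txt` and the identity are hypothetical.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"
echo "a" > f.txt
git add f.txt
git commit --quiet -m "Add f.txt"

echo "b" >> f.txt
git diff --stat            # working directory vs index: f.txt has changed
git add f.txt
git diff --stat            # prints nothing: the index now matches the working directory
git diff --cached --stat   # index vs last commit: f.txt has changed

# Revert: restore f.txt in both the index and the working directory from HEAD.
git checkout HEAD -- f.txt
cat f.txt                  # prints "a"
```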

Teamwork (makes the dream work)

Organization

[Screenshot: GitHub organization page for the Commerce Data Service, “A startup within DOC focused on building data products with and for the bureaus”, listing repositories such as DataService_WebSite (forked from timwood/DataCorps_WebSite), ITA_Principal_Travel, and Commerce_Data_Academy_Courses (“Course materials offered by the Commerce Data Academy”)]

Waffle

[Screenshot: Waffle.io board for DistrictDataLabs/trinket, with Backlog, Ready, In Progress, and Done columns of GitHub issues (for example “Better Licensing”, “Dataset Searching”, “Async Upload with Celery”, “Large files hang uploader”) labelled by priority, type, and version]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
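One common convention (not a Git requirement) is a short summary line, then a blank line, then a few sentences on why the change was made. Passing `-m` twice produces exactly that layout, because Git turns each `-m` value into its own paragraph. The repository, file, and message text below are hypothetical.

```shell
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"        # hypothetical identity, local to this repo
git config user.email "user@example.com"

echo "pandas" > requirements.txt
git add requirements.txt

# First -m: a short summary. Second -m: the "why", for your team and future you.
git commit --quiet \
  -m "Add requirements.txt listing pandas" \
  -m "New team members could not run the notebooks without knowing which packages to install."

git log -1 --format=%B     # prints the summary, a blank line, then the explanation
```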

Why

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
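You can set up these conventions in a few commands. This sketch rests on two assumptions. The `.gitignore` entries are typical choices for a Python or Jupyter project on a Mac, not a fixed rule. `fixtures/.gitkeep` is a common placeholder convention, needed because Git tracks files, not empty folders. `git check-ignore` then confirms which files Git will skip.

```shell
cd "$(mktemp -d)"
git init --quiet

touch README.md requirements.txt
mkdir fixtures
touch fixtures/.gitkeep          # placeholder so the otherwise empty folder can be committed

# Typical ignore rules for a Python/Jupyter project on a Mac (adjust to your project).
cat > .gitignore <<'EOF'
.DS_Store
__pycache__/
*.pyc
.ipynb_checkpoints/
EOF

touch .DS_Store                  # a file macOS Finder creates on its own
git check-ignore .DS_Store       # prints ".DS_Store": Git will not track it
git status --short               # lists .gitignore, README.md, fixtures/, requirements.txt as "??", but not .DS_Store
```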

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config


“In computer software engineering, revision control is any kind of practice that tracks and provides control over changes to source code.”

Wikipedia knows everything

Tell us about a time when you could have used some version control.

Local Version Control Systems

Version Control: A Visualization

COMMERCE DATA SERVICE

[Diagram: on the Local Computer, a File is checked out from a Version Database holding Version 3, Version 2 and Version 1]
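The Local Version Control Systems slide describes one version database on a single computer, from which a file is checked out. A minimal Python sketch of that idea, using a hypothetical `LocalVersionDB` class (not part of Git or any library) that simply keeps every saved version of one file in a list:

```python
from __future__ import annotations


class LocalVersionDB:
    """Toy local version database: every saved version of one file, oldest first.

    Hypothetical illustration only; real tools keep their history on disk.
    """

    def __init__(self) -> None:
        self._versions: list[str] = []

    def commit(self, content: str) -> int:
        """Save the file's current content as a new version; return its number (1-based)."""
        self._versions.append(content)
        return len(self._versions)

    def checkout(self, version: int) -> str:
        """Return the file's content as it was at `version`."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"no such version: {version}")
        return self._versions[version - 1]


db = LocalVersionDB()
db.commit("draft")
db.commit("draft + intro")
latest = db.commit("draft + intro + results")
print(latest)           # 3
print(db.checkout(1))   # draft
```

Because everything lives on one machine, losing that computer loses the whole history, which is the weakness the centralized and distributed systems below address.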

[Diagram: revisions numbered 1 to 6 and branches labeled A, B and C]

Branches and revisions through time - example scenario

[Screenshot: commit and branch history]

Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

[Screenshot: the git-scm.com home page]

git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.


Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac

Git - History Lesson

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - Advantages

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - “Places”

Object Database: where git stores metadata about each commit

Index / Staging Area: file snapshots to be included in the next commit

Working Directory: the “physical” files on a computer
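Git's object database is content-addressed: each stored object is named by a hash of its contents. For a file (a "blob"), Git hashes the header `blob <size>` plus a NUL byte, followed by the raw bytes; in a default (SHA-1) repository this is what `git hash-object` prints. A small Python sketch reproducing that calculation with the standard `hashlib` module; the function name `git_blob_id` is our own:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git gives a file with these bytes.

    Git hashes the header "blob <size>" and a NUL byte, then the content itself.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))                 # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"test content\n"))   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

Identical content always gets the same ID, so a file that appears unchanged in many commits is stored only once.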

Git - “Stages”

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed

Git - Areas/places

Working Directory

Staging Area

.git directory (Repository)
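The three places and three stages can be modelled with a few dictionaries. A minimal Python sketch, assuming a hypothetical `ToyRepo` class whose methods loosely mirror `git add`, `git commit` and `git status`; real Git stores snapshots as objects in `.git`, not Python dicts:

```python
from __future__ import annotations


class ToyRepo:
    """Toy model of Git's three places: working directory, index and object database."""

    def __init__(self) -> None:
        self.working: dict[str, str] = {}  # the "physical" files: path -> content
        self.index: dict[str, str] = {}    # staging area: snapshots for the next commit
        self.commits: list[tuple[str, dict[str, str]]] = []  # (message, snapshot)

    def add(self, path: str) -> None:
        """Like `git add <path>`: stage the file's current content."""
        self.index[path] = self.working[path]

    def commit(self, message: str) -> int:
        """Like `git commit -m <message>`: store a copy of the index; return commit count."""
        self.commits.append((message, dict(self.index)))
        return len(self.commits)

    def status(self, path: str) -> str:
        """Return 'modified', 'staged' or 'committed' for one file.

        Simplification: a brand-new (untracked) file is also reported as 'modified'.
        """
        last = self.commits[-1][1] if self.commits else {}
        if self.working.get(path) != self.index.get(path):
            return "modified"
        if self.index.get(path) != last.get(path):
            return "staged"
        return "committed"


repo = ToyRepo()
repo.working["notes.txt"] = "first draft"
print(repo.status("notes.txt"))  # modified
repo.add("notes.txt")
print(repo.status("notes.txt"))  # staged
repo.commit("Add notes")
print(repo.status("notes.txt"))  # committed
repo.working["notes.txt"] = "second draft"
print(repo.status("notes.txt"))  # modified
```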

Git Commands

Git - Basic Commands

git init: creates a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual/modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes metadata/changes from the staging area and adds them to the object database
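The four basic commands can be tried end to end from Python by calling the `git` executable. A sketch assuming Git is installed and on your PATH; the helper names `git` and `basic_workflow`, the file `notes.txt` and the throwaway author identity are made up for the example (`-c name=value` sets a config value for that one command only):

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return what it printed."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def basic_workflow(repo: Path) -> str:
    """git init, edit a file, git add, git commit; return the one-line log."""
    git(repo, "init")                                 # create a repository here
    (repo / "notes.txt").write_text("first draft\n")  # modify the working directory
    git(repo, "add", "notes.txt")                     # stage the file in the index
    git(
        repo,
        "-c", "user.name=Example User",               # throwaway identity for this
        "-c", "user.email=user@example.com",          # one command only
        "-c", "commit.gpgsign=false",
        "commit", "-m", "Add notes",                  # store the snapshot
    )
    return git(repo, "log", "--oneline")


if shutil.which("git") is None:
    print("git is not installed or not on PATH")
else:
    with tempfile.TemporaryDirectory() as tmp:
        print(basic_workflow(Path(tmp)))  # one line: "<short hash> Add notes"
```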

Git - Basic Commands

git fetch <server> <branch>: updates your object database but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the branch currently checked out in your working directory

git pull <server> <branch>: performs a fetch and then merges those changes into your working directory

git push <server> <branch>: sends your latest branch commits to the remote server
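How fetch, merge, pull and push relate can be sketched with a toy model in which a branch is just a list of commit names. This is a simplification under stated assumptions: the `Remote` and `Clone` classes are hypothetical, only fast-forward updates are modelled (a real `git merge` can also create merge commits or stop on conflicts), and `origin_main` stands in for Git's `origin/main` remote-tracking branch:

```python
from __future__ import annotations


class Remote:
    """Toy server repository (think: Github) with a single branch."""

    def __init__(self) -> None:
        self.main: list[str] = []  # commit names, oldest first


class Clone:
    """Toy local repository cloned from `remote`, which it calls origin."""

    def __init__(self, remote: Remote) -> None:
        self.remote = remote
        self.origin_main = list(remote.main)  # what we last saw on the server
        self.main = list(remote.main)         # our local branch
        self.checked_out: str | None = self.main[-1] if self.main else None

    def commit(self, name: str) -> None:
        self.main.append(name)
        self.checked_out = name

    def fetch(self) -> None:
        """Download new commits; the local branch and working files are untouched."""
        self.origin_main = list(self.remote.main)

    def merge(self) -> None:
        """Bring origin_main into main (toy: fast-forward only)."""
        if self.main[: len(self.origin_main)] == self.origin_main:
            return  # nothing new on origin_main
        if self.origin_main[: len(self.main)] != self.main:
            raise RuntimeError("histories diverged; a real merge commit would be needed")
        self.main = list(self.origin_main)
        self.checked_out = self.main[-1]

    def pull(self) -> None:
        """pull = fetch, then merge."""
        self.fetch()
        self.merge()

    def push(self) -> None:
        """Upload local commits; refuse if the server has commits we lack."""
        if self.main[: len(self.remote.main)] != self.remote.main:
            raise RuntimeError("push rejected: pull first")
        self.remote.main = list(self.main)
        self.origin_main = list(self.main)


server = Remote()
alice, bob = Clone(server), Clone(server)
alice.commit("a1")
alice.push()
bob.fetch()
print(bob.origin_main, bob.main, bob.checked_out)  # ['a1'] [] None
bob.merge()
print(bob.main, bob.checked_out)                   # ['a1'] a1
```

After `fetch`, Bob already has Alice's commit but his branch and working files have not moved; only `merge` (or `pull`, which does both) updates them.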

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

A remote git repository

A website that:
provides secure access
provides repository metadata & reports
provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local git repositories are called “remotes”


[Diagram: a Server Computer, Computer A and Computer B, each holding a complete Version Database (Version 1, Version 2, Version 3)]

Github: A Distributed Version Control example

Git - “Origin”

The “origin” remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about “origin”; it is just a default name
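To see that "origin" is only a default name, here is a sketch that clones a repository and then renames its remote with `git remote rename`. It assumes Git is installed; a local bare repository in a temporary folder stands in for Github, and the folder names `server.git` and `computer_b` and the new name `upstream` are arbitrary choices for the example. (`git clone --origin <name>` would pick a different name at clone time instead.)

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(cwd: Path, *args: str) -> str:
    """Run one git command in `cwd` and return its standard output."""
    return subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    ).stdout


def remote_names(repo: Path) -> list[str]:
    """List a repository's remotes, like running `git remote`."""
    return git(repo, "remote").split()


def rename_origin_demo(base: Path) -> tuple[list[str], list[str]]:
    git(base, "init", "--bare", "server.git")             # stands in for Github
    git(base, "clone", "server.git", "computer_b")        # clone creates "origin"
    clone = base / "computer_b"
    before = remote_names(clone)
    git(clone, "remote", "rename", "origin", "upstream")  # just a name, so rename it
    return before, remote_names(clone)


if shutil.which("git") is None:
    print("git is not installed or not on PATH")
else:
    with tempfile.TemporaryDirectory() as tmp:
        print(rename_origin_demo(Path(tmp)))  # (['origin'], ['upstream'])
```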

User Account

[Screenshot: a Github user profile page (rebeccabilbro)]

Repo

[Screenshot: a Github repository page (rebeccabilbro's "A tour of ROC curves"), showing its files, commit messages and README.md]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.
• In "Search programs and files", type: powershell
• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.
• In the top right, the blue search bar will pop up.
• Type: terminal
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Where am I?

[Screenshots: Mac OSX Terminal and Windows PowerShell]

What’s my name?

[Screenshots: Mac OSX Terminal and Windows PowerShell]

Make a directory

[Screenshot: Mac OSX Terminal]

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OSX Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
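The same terminal exercises can be repeated from Python, which behaves identically on Mac OSX and Windows. A sketch using only the standard library (`os`, `pathlib`, `tempfile`); the folder and file names follow the slides, the function name `command_line_tour` is hypothetical, and everything happens inside a temporary directory so nothing is left behind. `parents=True` plays the role of `mkdir -p` for the nested path:

```python
import os
import tempfile
from pathlib import Path


def command_line_tour(base: Path) -> list[str]:
    """Repeat the terminal exercises with pathlib; return the final listing."""
    os.chdir(base)                                        # cd <base>
    Path("temp").mkdir()                                  # mkdir temp
    Path("temp/stuff/things/frank/joe/alex/john").mkdir(parents=True)  # nested dirs
    os.chdir("temp")                                      # cd temp
    print(Path.cwd())                                     # pwd: "Where am I?"
    Path("iamcool.txt").touch()                           # touch / New-Item -type file
    return sorted(p.name for p in Path.cwd().iterdir())   # ls / dir


original = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        print(command_line_tour(Path(tmp)))  # ['iamcool.txt', 'stuff']
    finally:
        os.chdir(original)  # step back out so the temporary folder can be deleted
```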

Zed Shaw’s book

Let’s use what we’ve learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git

[Diagram: working directory, index, local repo and Github repo, connected by arrows labeled Revert, diff --cached, fetch, checkout HEAD and Compare]

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on Github, listing its repositories, people and teams]

Waffle

[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with Github issues sorted into Backlog, Ready, In Progress and Done columns]

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)

Why

Why do data scientists need version control?

Where does version control fit into the data science pipeline?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
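A small script can lay out those conventional files when you start a project. A sketch under assumptions: the function name `scaffold` is hypothetical, `fixtures/` is assumed to hold small sample data, and the starter contents written to each file (the README title, the ignore patterns, an empty requirements list) are placeholders to adapt, not material from the course:

```python
import tempfile
from pathlib import Path


def scaffold(root: Path, project: str) -> list[str]:
    """Create README.md, .gitignore, fixtures/ and requirements.txt under `root`."""
    root.mkdir(parents=True, exist_ok=True)
    # README.md: Github renders it on the repository's front page.
    (root / "README.md").write_text(f"# {project}\n\nWhat this project does and how to run it.\n")
    # .gitignore: patterns for files git should not track (example patterns only).
    (root / ".gitignore").write_text(".DS_Store\n__pycache__/\n*.pyc\n")
    # fixtures/: assumed here to hold small sample data files.
    (root / "fixtures").mkdir(exist_ok=True)
    # Git does not track empty folders; ".gitkeep" is only a conventional placeholder name.
    (root / "fixtures" / ".gitkeep").touch()
    # requirements.txt: one pip requirement per line (empty to start with).
    (root / "requirements.txt").write_text("")
    return sorted(p.name for p in root.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    print(scaffold(Path(tmp) / "my-project", "My Project"))
    # ['.gitignore', 'README.md', 'fixtures', 'requirements.txt']
```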

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
Git Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Page 31: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Local Version Control Systems

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs. Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

[Screenshot: the git-scm.com home page]

git --distributed-is-the-new-centralized

"Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency."

"Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows."


Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
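After installing, a quick check confirms that Git is on your PATH. Run it from a bash shell: Terminal on a Mac, or Git Bash on Windows. PowerShell users can type the git commands directly. This is a sketch, not an official installer test.

```bash
#!/usr/bin/env bash
# Stop early if the git command cannot be found.
if ! command -v git >/dev/null 2>&1; then
  echo "Git was not found on your PATH; install it first." >&2
  exit 1
fi

version="$(git --version)"        # e.g. "git version 2.x.y"
echo "$version"

# Every commit records a name and an email; show what is configured, if anything.
echo "user.name:  $(git config --get user.name || echo '(not set)')"
echo "user.email: $(git config --get user.email || echo '(not set)')"
```

If either value shows "(not set)", set it once with git config --global user.name "Your Name" and git config --global user.email "you@example.com".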

Git - History Lesson

Originally conceived and created by Linus Torvalds (after the Linux kernel project lost free use of BitKeeper, the proprietary version control system it had been using)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory
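To see that the .git directory really holds everything, this sketch creates an empty repository in a temporary folder and lists what git init put there. The folder name is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# A fresh, empty repository in a temporary folder (hypothetical name).
repo="$(mktemp -d)/history-demo"
git init -q "$repo"

# All of Git's data for this project lives inside .git/
ls -a "$repo/.git"

# HEAD records which branch you are on; objects/ is the object database;
# refs/ holds the branch and tag pointers.
head_ref="$(cat "$repo/.git/HEAD")"
echo "$head_ref"     # "ref: refs/heads/<branch>"; the name depends on your Git settings
```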

Git - Advantages

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)
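This short sketch shows why branching is cheap. Creating a branch only records a pointer to an existing commit rather than copying files, and switching branches updates the working files to match. The repository, file and branch names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical repository with one committed file.
repo="$(mktemp -d)/branch-demo"
git init -q "$repo"
cd "$repo"
git config user.name "Ada Example"
git config user.email "ada@example.com"
echo "v1" > model.txt
git add model.txt
git commit -q -m "First model"

# Creating a branch just records a new pointer to the current commit.
git branch experiment
git rev-parse HEAD experiment        # prints the same commit id twice

# Commit on the new branch without disturbing the original one.
git checkout -q experiment
echo "v2" > model.txt
git commit -q -am "Try a different model"

# Switch back: the working file returns to the original branch's version.
git checkout -q -
cat model.txt                        # v1
git log --oneline --all --graph
```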

Git - "Places"

Object Database: where Git stores every commit, together with the file contents and directory trees it records

Index / Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on your computer

Git - "Stages"

Committed: data is safely stored in your local object database

Staged: a modified file has been marked so that its current state will be included in the next commit

Modified: changed, but not yet staged or committed
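This minimal sketch walks one file through the three stages and captures git status --short at each step. The repository and file names are hypothetical. In the short format, the first column describes the staging area and the second describes the working directory.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo="$(mktemp -d)/stages-demo"
git init -q "$repo"
cd "$repo"
git config user.name "Ada Example"
git config user.email "ada@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"             # notes.txt is now committed

echo "second draft" > notes.txt
modified="$(git status --short)"         # " M notes.txt": modified, not staged
git add notes.txt
staged="$(git status --short)"           # "M  notes.txt": staged for the next commit
git commit -q -m "Revise notes"
committed="$(git status --short)"        # empty: nothing left to commit

printf 'modified:  [%s]\nstaged:    [%s]\ncommitted: [%s]\n' "$modified" "$staged" "$committed"
```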

Git - Areas/Places

[Diagram: Working Directory, Staging Area, and .git directory (Repository)]

Git Commands

Git - Basic Commands

git init: create a new Git repository to manage the current folder

git clone <repository address>: download an existing Git repository for the first time

git add <file path>: mark individual new or modified files to be added to the index (staging area) for the next commit

git commit -m <message>: record the staged changes as a new commit in the object database
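The four commands above can be run end to end in a temporary folder, as sketched below. A local path stands in for the repository address that git clone would normally take from Github. The folder, file and identity values are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
work="$(mktemp -d)"                    # disposable parent folder

# git init: start tracking a new folder
mkdir "$work/analysis"
cd "$work/analysis"
git init -q
git config user.name "Ada Example"     # identity for this repository only
git config user.email "ada@example.com"

# git add: stage a new file for the next commit
echo "print('hello')" > clean.py
git add clean.py

# git commit: record the staged snapshot in the object database
git commit -q -m "Add cleaning script"
git log --oneline

# git clone: copy an existing repository; a local path stands in for a URL
cd "$work"
git clone -q analysis analysis-copy
ls analysis-copy                       # clean.py
```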

Git - Basic Commands

git fetch <server> <branch>: update your object database with the server's new commits, without changing your working directory

git merge <source branch>: merge the commits from <source branch> into the branch you currently have checked out, updating the working directory to match

git pull <server> <branch>: perform a fetch, then merge those changes into your current branch

git push <server> <branch>: send your latest branch commits to the remote server
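The next example is a self-contained sketch of push, fetch and merge. It assumes a bare repository on the same machine stands in for the remote server (Github in this course). Two clones, alice and bob, play two teammates; all names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
work="$(mktemp -d)"
cd "$work"

# A bare repository (no working files) stands in for the remote server.
git init -q --bare server.git

# Alice clones the empty remote, commits, and pushes her branch.
git clone -q server.git alice 2>/dev/null    # hides the "empty repository" warning
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
echo "rows: 100" > alice/summary.txt
git -C alice add summary.txt
git -C alice commit -q -m "Add summary"
git -C alice push -q origin HEAD             # push the current branch

# Bob clones the remote, which now has Alice's first commit.
git clone -q server.git bob

# Alice commits and pushes again.
echo "rows: 120" > alice/summary.txt
git -C alice commit -q -am "Update row count"
git -C alice push -q origin HEAD

# git fetch: Bob's object database receives the new commit...
git -C bob fetch -q origin
before_merge="$(cat bob/summary.txt)"        # ...but his file still says rows: 100

# git merge: bring the fetched commits into Bob's current branch.
git -C bob merge -q '@{u}'                   # @{u} = the branch Bob's branch tracks
after_merge="$(cat bob/summary.txt)"         # rows: 120

echo "before merge: $before_merge"
echo "after merge:  $after_merge"
```

In day-to-day work, git pull does the fetch and the merge in a single step.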

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote Git repository

A website:

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015


Non-local Git repositories are called "remotes".


Github: A Distributed Version Control Example

[Diagram: a server computer and two client computers (Computer A and Computer B) each hold a complete version database containing Version 1, Version 2, and Version 3]

Git - "Origin"

The "origin" remote is automatically created when you clone.

It is the default remote for pushing and pulling.

There is nothing special about "origin"; it is just a default name.
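This quick sketch shows that origin appears automatically after a clone and can be renamed like any other remote. The local paths and names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
work="$(mktemp -d)"
cd "$work"

# A small local repository to clone (stands in for a Github repository).
git init -q source
git -C source config user.name "Ada Example"
git -C source config user.email "ada@example.com"
echo "data" > source/readme.txt
git -C source add readme.txt
git -C source commit -q -m "Initial commit"

# Cloning creates a remote named origin that points back at the source.
git clone -q source my-copy
cd my-copy
first_remote="$(git remote)"       # origin
git remote -v                      # origin  <path to source> (fetch) and (push)

# "origin" is only a default name; it can be renamed like any other remote.
git remote rename origin upstream
git remote                         # upstream
```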

User Account

[Screenshot: a Github user profile page (Rebecca Bilbro, @rebeccabilbro, Washington, DC) showing popular repositories, repositories contributed to, and a contributions calendar]

Repo

[Screenshot: a Github repository page by rebeccabilbro described as "A tour of ROC curves", showing 19 commits on 1 branch (master), the file list (including data, figures, .gitignore, LICENSE, and README.md), and latest commit 382b9ca]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type: powershell

• Hit Enter.

Mac OS X: For Mac OS X you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right the blue search bar will pop up.

• Type: terminal

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open and it's in your Dock so you can get to it.

Where am I?

[Screenshots: Mac OS X Terminal and Windows PowerShell]

What's my name?

[Screenshots: Mac OS X Terminal and Windows PowerShell]
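The screenshots for these two slides were not recoverable. As a hedged sketch, the usual commands for both questions in a Mac (or Linux) shell are pwd and whoami. PowerShell also accepts pwd, which is an alias for Get-Location, and whoami.

```bash
#!/usr/bin/env bash
# Where am I? Print the current working directory.
here="$(pwd)"
echo "$here"

# What's my name? Print the name of the logged-in user.
me="$(whoami)"
echo "$me"
```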

Make a directory

Windows PowerShell:

> mkdir temp
> mkdir temp\stuff
> mkdir temp\stuff\things
> mkdir temp\stuff\things\frank\joe\alex\john
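The Mac OS X version of this slide was not recoverable. The bash sketch below does the same steps, assuming a disposable temporary folder. On Mac and Linux, mkdir needs the -p flag to create the missing intermediate folders in the last path. PowerShell's mkdir creates them automatically.

```bash
#!/usr/bin/env bash
set -euo pipefail
base="$(mktemp -d)"        # disposable folder so nothing else is touched
cd "$base"

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
# frank, joe, and alex do not exist yet, so -p is needed to create them too.
mkdir -p temp/stuff/things/frank/joe/alex/john
ls -R temp
```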

Change between directories

Windows PowerShell:

> cd temp
> pwd

Mac OS X Terminal:

$ cd temp
$ pwd

List files and directories

Windows PowerShell:

> dir

Mac OS X Terminal:

$ ls

Make an empty file

Windows PowerShell:

> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OS X Terminal:

$ cd temp
$ touch iamcool.txt
$ ls
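The Mac OS X column can be run as the bash sketch below. It works in a temporary folder (an assumption) so nothing in your home directory changes.

```bash
#!/usr/bin/env bash
set -euo pipefail
base="$(mktemp -d)"        # disposable folder standing in for your home directory
cd "$base"
mkdir temp

cd temp                    # move into the folder
pwd                        # confirm where you are
touch iamcool.txt          # create an empty file
listing="$(ls)"            # list what is here
echo "$listing"            # iamcool.txt
```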

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
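Before the workshop, this small sketch shows how a merge conflict arises and is resolved. Two branches edit the same line of a hypothetical settings.txt. Git stops, writes conflict markers into the file, and waits for a person to choose the final content.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo="$(mktemp -d)/conflict-demo"
git init -q "$repo"
cd "$repo"
git config user.name "Ada Example"
git config user.email "ada@example.com"

echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Set color"
main_branch="$(git rev-parse --abbrev-ref HEAD)"   # master or main, depending on settings

# A teammate's branch changes the line one way...
git checkout -q -b teammate
echo "color = green" > settings.txt
git commit -q -am "Use green"

# ...while the original branch changes the same line another way.
git checkout -q "$main_branch"
echo "color = red" > settings.txt
git commit -q -am "Use red"

# The merge stops with a conflict instead of guessing.
if git merge teammate >/dev/null; then
  echo "merged cleanly (not expected here)"
else
  echo "merge conflict in settings.txt"
fi
conflicted="$(cat settings.txt)"
echo "$conflicted"        # <<<<<<< HEAD ... ======= ... >>>>>>> teammate

# Resolve: keep the content the team agreed on, then stage and commit.
echo "color = purple" > settings.txt
git add settings.txt
git commit -q -m "Merge teammate branch, settle on purple"
git log --oneline --graph
```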

[Diagram: how changes move between the working directory, the index, the local repo, and the Github repo, with the labels Revert, diff --cached, fetch, checkout HEAD, and Compare]
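The bash sketch below gives one reading of the diagram:

- git diff compares the working directory with the index.
- git diff --cached compares the index with the last commit.
- git checkout HEAD -- <file> reverts both the index and the working directory to the committed version.

Treating "Revert" as discarding uncommitted edits is an assumption. Git's separate git revert command instead undoes a commit by adding a new one. The file and repository names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
repo="$(mktemp -d)/compare-demo"
git init -q "$repo"
cd "$repo"
git config user.name "Ada Example"
git config user.email "ada@example.com"
echo "alpha" > data.csv
git add data.csv
git commit -q -m "Add data"

echo "beta" >> data.csv
unstaged="$(git diff --stat)"                  # working directory vs. index: shows data.csv
nothing_staged="$(git diff --cached --stat)"   # index vs. last commit: empty so far

git add data.csv
staged="$(git diff --cached --stat)"           # index vs. last commit: now shows data.csv
echo "$staged"

# Discard the edit in both the index and the working directory.
git checkout HEAD -- data.csv
restored="$(cat data.csv)"                     # alpha
remaining="$(git status --short)"              # empty: matches the last commit
```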

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus"; Washington, DC), listing repositories such as DataService_WebSite ("The website for the Commerce Data Service"), ITA_Principal_Travel, and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy")]

Waffle

[Screenshot: a Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns; cards include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "500 error on upload w/ missing col/row values", "AJAXify the uploader", "3D tours", "Sampling technique for bigger datasets", "Feature nomination tool for visualization", "Data file uploading", "Research Auto-analysis Feature", "Implement beta auto analysis", "Dropdown Dataset Edit Form", and "Async Upload with Celery", tagged with priority, type, and version labels]


Page 32: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Version ControlA Visualization

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 33: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Checkout

File

Local Computer

Version Database

Version 3

Version 2

Version 1

1 2

A

3

B

4

C

5 6

Branches and revisions through time - example scenario

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

[Screenshot: the Github page for a repository of rebeccabilbro's described as "A tour of ROC curves" (19 commits, 1 branch, master), listing the data and figures folders, .DS_Store, .gitignore, LICENSE, README.md, and several Python scripts, each with its latest commit message]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type "powershell".

• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type "terminal".

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock.

Now you have your Terminal open, and it's in your Dock so you can get to it.

Mac OSX Terminal / Windows PowerShell

Where am I?

Mac OSX Terminal / Windows PowerShell

What's my name?

Mac OSX Terminal / Windows PowerShell

Make a directory

Windows PowerShell:
> mkdir temp
> mkdir temp\stuff
> mkdir temp\stuff\things
> mkdir temp\stuff\things\frank\joe\alex\john

Mac OSX Terminal / Windows PowerShell

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

Mac OSX Terminal / Windows PowerShell

List files and directories

Windows PowerShell:
> dir

Mac OSX Terminal:
$ ls

Mac OSX Terminal / Windows PowerShell

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
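On the Mac OSX side, the same walkthrough can be sketched as one shell session. `pwd` and `whoami` are the usual answers to "Where am I?" and "What's my name?". `mkdir -p` creates the whole nested path in one step. The folder names are the ones used above, written with forward slashes. The session runs in a throwaway directory so it leaves nothing behind.

```shell
#!/bin/sh
cd "$(mktemp -d)"        # work in a throwaway directory

pwd                      # Where am I? Prints the current directory
whoami                   # What's my name? Prints your user name

mkdir temp               # make a directory...
mkdir -p temp/stuff/things/frank/joe/alex/john   # ...or a whole nested path at once

cd temp                  # change into it
pwd                      # confirm the move

touch iamcool.txt        # create an empty file
ls                       # list files and directories (shows iamcool.txt and stuff)
```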

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git

[Diagram: how data moves between the working directory, the index, the local repo, and the Github repo, with commands such as fetch, checkout HEAD, and diff --cached grouped under "Revert" and "Compare"]
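The following sketch shows the compare and revert commands from that diagram in a throwaway repository. It assumes a single tracked file, the hypothetical `notes.txt`, and a placeholder author identity. `git diff` compares the working directory with the index. `git diff --cached` compares the index with the last commit. `git checkout HEAD -- <file>` restores a file from the last commit.

```shell
#!/bin/sh
set -e
# Placeholder identity for this demo; normally set once with git config --global
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "second draft" > notes.txt
git diff                 # Compare: working directory vs index (shows the edit)
git add notes.txt
git diff                 # now empty: working directory and index match
git diff --cached        # Compare: index vs last commit (shows the staged edit)

# Revert: overwrite both the index and the working copy with the committed version
git checkout HEAD -- notes.txt
cat notes.txt            # first draft
```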

Teamwork (makes the dream work)

Organization

[Screenshot: the Github organization page for the Commerce Data Service, "A startup within DOC focused on building data products with and for the bureaus", Washington DC, with 20 people, 4 teams, and repositories including DataService_WebSite (forked from timwood/DataCorps_WebSite), ITA_Principal_Travel, and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy")]

Waffle

[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with Backlog, Ready, In Progress, and Done columns of issue cards (for example "Dataset Searching", "Async Upload with Celery", and "Large files hang uploader"), each labeled by priority, type, and version]

Pair programming

Make your own waffle

Communication

Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
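One way to follow that advice is to write a short summary line, followed by a body that explains why the change was made. Git turns each `-m` option into its own paragraph. The sketch below uses a hypothetical file, hypothetical message text, and a placeholder identity.

```shell
#!/bin/sh
set -e
# Placeholder identity for this demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo

echo "threshold = 0.5" > config.txt
git add config.txt

# First -m: a short summary; second -m: why the change was made
git commit -q -m "Set default classification threshold" \
              -m "Downstream scripts need a default value; tune it later."

git log -1 --format=%s     # prints only the summary line
git log -1 --format=%B     # prints summary, blank line, and body
```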

Why?

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
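The following sketch scaffolds those conventions for a new project. The project name `my-analysis`, the file contents, and the `.gitignore` patterns are illustrative assumptions, not a prescribed layout. Here, `fixtures/` is assumed to mean a folder of small sample data used by tests.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q my-analysis && cd my-analysis

printf '# my-analysis\n\nWhat the project does and how to run it.\n' > README.md
printf '.DS_Store\n__pycache__/\n*.pyc\n.ipynb_checkpoints/\n' > .gitignore
mkdir fixtures                           # small, shareable sample data for tests
printf 'id,value\n1,0.5\n' > fixtures/sample.csv
printf 'pandas\nscikit-learn\n' > requirements.txt   # one package per line

git check-ignore .DS_Store               # prints .DS_Store: the pattern matches
git status --short                       # the new files, ready to add
```

Collaborators can then install the listed packages with `pip install -r requirements.txt`.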

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


[Figure: Branches and revisions through time - example scenario]

[Figure: Branches and revisions through time - actual workflow]
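Branches are parallel lines of revisions. The following sketch shows that pattern with a hypothetical `feature` branch: work continues on the branch, and then its commits are merged back. It reads the starting branch name from Git rather than assuming it, because that name may be `master` or `main` depending on setup. The author identity is a placeholder.

```shell
#!/bin/sh
set -e
# Placeholder identity for this demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo

echo "base" > model.txt
git add model.txt && git commit -q -m "Initial model"
base=$(git rev-parse --abbrev-ref HEAD)   # master or main

git checkout -q -b feature                # start a parallel line of revisions
echo "new feature" >> model.txt
git commit -q -am "Try a new feature"

git checkout -q "$base"                   # back to the main line, unaffected so far
git merge -q feature                      # bring the feature's revisions in
git log --oneline --graph --all           # draw the history
```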

Distributed vs Centralized

Centralized

What are the benefits?

What are the weaknesses?

Decentralized

What are the benefits?

What are the weaknesses?

Git

[Screenshot: the git-scm.com home page, with the tagline "git --distributed-is-the-new-centralized"]

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Learn Git in your browser for free with Try Git.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
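Once the installer finishes, a quick check from Terminal or PowerShell confirms that Git is on your path. A one-time setup then records the name and email that will be attached to your commits. The name and email below are placeholders. So that this sketch does not touch a real configuration, it points `HOME` at a scratch folder; on your own machine, you would skip that line.

```shell
#!/bin/sh
set -e
export HOME="$(mktemp -d)"      # demo only: keeps your real ~/.gitconfig untouched

git --version                   # e.g. "git version 2.x"; any output means Git is installed

# One-time setup: these values are stamped on every commit you make
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

git config --global --list      # review what you set
```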

Originally conceived/created by Linus Torvalds (after the Linux kernel project lost free use of BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson
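You can check the last point for yourself: `git init` creates a hidden `.git` folder, and everything Git knows about the project lives there. The following sketch runs in a scratch directory; the folder name `demo` is arbitrary.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir demo && cd demo
git init -q                # creates the hidden .git directory

ls -a                      # shows . .. .git
ls .git                    # HEAD, config, objects, refs, ... (exact list varies by Git version)

git rev-parse --git-dir    # prints .git: where this repository's metadata lives
```

Deleting `.git` turns the folder back into an ordinary directory and discards its history, so treat it with care.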

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores each commit, along with the file snapshots it records

Index / Staging Area

file snapshots to be included in the next commit

Working Directory

the "physical" files on a computer

Git - "Places"

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed

Git - "Stages"
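The three states appear in the two-letter codes printed by `git status --short`. The left column describes the staging area, and the right column describes the working directory. The following sketch follows one hypothetical file, `analysis.py`, through all three states, using a placeholder author identity.

```shell
#!/bin/sh
set -e
# Placeholder identity for this demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)" && git init -q demo && cd demo

echo "print('v1')" > analysis.py
git add analysis.py && git commit -q -m "Add analysis script"
git status --short        # no output: committed, nothing pending

echo "print('v2')" >> analysis.py
git status --short        # " M analysis.py": modified, not staged

git add analysis.py
git status --short        # "M  analysis.py": staged for the next commit

git commit -q -m "Print a second line"
git status --short        # no output again: committed
```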

[Diagram: Working Directory, Staging Area, and .git directory (Repository)]

Git - Areas/places

Git Commands

git init: create a new git repository to manage the current folder

git clone <repository address>: download an existing git repository for the first time

git add <file path>: mark individual new or modified files to be added to the index/staging area for the next commit

git commit -m "<message>": take the changes in the staging area and record them as a new commit in the object database

Git - Basic Commands
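The following sketch puts the four commands together. It uses two scratch folders on one machine and clones by local path instead of by Github address. The folder names, file name, and author identity are hypothetical.

```shell
#!/bin/sh
set -e
# Placeholder identity for this demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

mkdir project && cd project
git init -q                          # git init: start tracking this folder
echo "# Project" > README.md
git add README.md                    # git add: stage the file
git commit -q -m "Add README"        # git commit: record the staged snapshot
cd ..

git clone -q project project-copy    # git clone: full copy, history included
git -C project-copy log --oneline    # shows the "Add README" commit
```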

git fetch <server> <branch>: update your object database with the remote's new commits, without changing the working directory

git merge <source branch>: apply the commits from <source branch> to the branch currently checked out in your working directory

git pull <server> <branch>: perform a fetch and then merge those changes into your working directory

git push <server> <branch>: send your latest branch commits to the remote server

Git - Basic Commands
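The following sketch runs the four commands between two clones of one shared repository, which stand in for two teammates. Everything is local: `shared.git` is a bare repository playing the role of Github, and `alice` and `bob` are hypothetical clones. The branch name is read from Git rather than assumed, because newer setups may default to `main` instead of `master`.

```shell
#!/bin/sh
set -e
# Placeholder identity for this demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"
git init -q --bare shared.git                # plays the role of the Github repo

git clone -q shared.git alice
cd alice
echo "v1" > data.txt
git add data.txt && git commit -q -m "Add data"
branch=$(git rev-parse --abbrev-ref HEAD)    # master or main, depending on setup
git push -q origin "$branch"                 # push: send commits to the remote
cd ..

git clone -q shared.git bob                  # bob starts with alice's first commit

cd alice
echo "v2" >> data.txt
git commit -q -am "Extend data"
git push -q origin "$branch"
cd ../bob

git fetch -q origin                          # fetch: update bob's object database only
cat data.txt                                 # still just v1: working directory unchanged
git merge -q "origin/$branch"                # merge: bring the fetched commits in
cat data.txt                                 # now v1 and v2
# "git pull origin $branch" would do the fetch and the merge in one step
```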

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote git repository

A website:

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Non-local git repositories are called "remotes"

Object Database

where git stores each commit, along with the file snapshots it records

Index / Staging Area

file snapshots to be included in the next commit

Working Directory

the "physical" files on a computer

Git - "Places"
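Any repository you can reach can be added as a remote under a name of your choosing. The following sketch again uses local bare repositories as stand-ins for Github. The repository names and the remote name `backup` are hypothetical.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q --bare team.git           # stand-in for a shared repository
git init -q --bare backup.git         # a second, independent remote
git clone -q team.git work && cd work

git remote add backup ../backup.git   # register another remote by name
git remote -v                         # origin and backup, each listed for fetch and push
```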

[Diagram: a Server Computer and two client computers, Computer A and Computer B, each holding its own complete Version Database with Versions 1, 2, and 3]

Github: A Distributed Version Control example
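You can check the diagram's claim that every clone carries the full version history: make three commits, clone the repository, and then delete the original. The clone still has all three versions. The following sketch uses hypothetical folder names and a placeholder author identity.

```shell
#!/bin/sh
set -e
# Placeholder identity for this demo
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"
cd "$(mktemp -d)"

git init -q server && cd server           # plays the "Server Computer"
for v in 1 2 3; do
  echo "Version $v" > report.txt
  git add report.txt
  git commit -q -m "Version $v"
done
cd ..

git clone -q server computer-a            # Computer A gets the whole history
rm -rf server                             # the "server" goes away...

git -C computer-a log --oneline           # ...yet all three versions are still listed
git -C computer-a show HEAD~2:report.txt  # and Version 1 can still be read
```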

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 35: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Aug

27 28 5

J

l Branches and revisions through time - actual workflow

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 36: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Distributed vs Centralized

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
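Once Git is installed, a quick check from the command line confirms it is on your PATH, and a one-time identity setup makes your commits attributable. A minimal sketch, assuming a bash shell (Terminal on a Mac, or the Git Bash shell that comes with Git for Windows); the name and email below are placeholders to replace with your own.

```shell
# Confirm Git is installed and on your PATH (prints e.g. "git version 2.x.y")
git --version

# One-time setup: every commit records this name and email.
# "Jane Doe" and the address are placeholders; use your own.
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.com"

# Show the values Git will use from now on
git config --global user.name
git config --global user.email
```

On Windows, `git config --global core.autocrlf true` additionally makes Git convert line endings to CRLF on checkout and back to LF when you commit.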

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database: where Git stores every commit, along with the file snapshots and metadata it points to

Index / Staging Area: file snapshots to be included in the next commit

Working Directory: the “physical” files on a computer

Git - “Places”

Committed: data is safely stored in your local object database

Staged: marked so that the current state of the modified file will be included in the next commit

Modified: changed, but not yet staged or committed

Git - “Stages”


Working Directory

Staging Area

.git directory (Repository)

Git - Areas/places
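To see a file move through these areas and stages, `git status --short` can be checked after each step. A minimal sketch in bash, run in a throwaway directory; the file name and commit identity are placeholders. In the two-letter code, the left column describes the index and the right column the working directory.

```shell
set -e
# Work in a throwaway directory so nothing real is touched
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # placeholder identity, stored only in this repo
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git status --short    # ?? notes.txt  (untracked: exists only in the working directory)

git add notes.txt
git status --short    # A  notes.txt  (staged: snapshot is in the index)

git commit -q -m "Add notes"
git status --short    # no output     (committed: stored in the object database)

echo "second draft" >> notes.txt
git status --short    #  M notes.txt  (modified: changed but not staged)
```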

Git Commands

git init: creates a new Git repository to manage the current folder

git clone <repository address>: downloads an existing Git repository for the first time

git add <file path>: marks individual new or modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes the changes from the staging area and records them in the object database as a new commit

Git - Basic Commands
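The four local commands, strung together in a minimal bash sketch. It runs in a temporary directory, and `git clone` copies a local repository rather than a GitHub URL so the example works offline; the project name, file, and identity are placeholders.

```shell
set -e
work="$(mktemp -d)"

# git init: create a new repository to manage the current folder
mkdir "$work/project"
cd "$work/project"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# git add: stage a file; git commit: record the staged snapshot
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"
git log --oneline      # one line: <short hash> Add README

# git clone: copy an existing repository (here the local one, not GitHub)
cd "$work"
git clone -q project project-copy
ls project-copy        # README.md
```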

git fetch <server> <branch>: updates your object database (including remote-tracking branches such as origin/master) but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the branch currently checked out in your working directory

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest branch commits to the remote server

Git - Basic Commands
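A minimal sketch of the four remote commands, assuming a bash shell. To keep it offline, a bare repository in a temporary directory stands in for the server (GitHub in class), and two clones, `alice` and `bob`, stand in for two teammates; all names are placeholders. The branch name is read from Git rather than hard-coded, because some Git installations are configured to name the first branch `main` rather than `master`.

```shell
set -e
sandbox="$(mktemp -d)"
cd "$sandbox"

# A bare repository plays the role of the server
git init -q --bare server.git

# Alice: clone, commit, push
git clone -q server.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "v1"
branch="$(git symbolic-ref --short HEAD)"   # "master" or "main"
git push -q origin "$branch"

# Bob: clone the same server ("origin" is created automatically)
cd "$sandbox"
git clone -q server.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"
git -C bob config pull.rebase false         # make "git pull" mean fetch + merge

# Alice pushes a second version
echo "v2" > alice/notes.txt
git -C alice commit -q -am "v2"
git -C alice push -q origin "$branch"

# fetch: Bob's object database learns about v2, but his files do not change
git -C bob fetch -q origin
cat bob/notes.txt                           # v1

# merge: apply origin's commits to Bob's checked-out branch
git -C bob merge -q "origin/$branch"
cat bob/notes.txt                           # v2

# pull: fetch + merge in one step
echo "v3" > alice/notes.txt
git -C alice commit -q -am "v3"
git -C alice push -q origin "$branch"
git -C bob pull -q origin "$branch"
cat bob/notes.txt                           # v3
```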

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github


Github

A remote git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015


Non-local Git repositories are called “remotes”

Object Database: where Git stores every commit, along with the file snapshots and metadata it points to

Index / Staging Area: file snapshots to be included in the next commit

Working Directory: the “physical” files on a computer

Git - “Places”

Diagram: a server computer and two client computers (Computer A and Computer B), each holding a complete version database with Versions 1, 2, and 3.

Github: A Distributed Version Control example

The “origin” remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about “origin”; it is just a default name

Git - “Origin”
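Cloning creates `origin` for you; for a project that started locally, `git remote add` creates it by hand, which is the usual way to push an existing folder to a new GitHub repository. A minimal bash sketch in which a local bare repository stands in for the GitHub repository (on GitHub you would pass the repository's URL instead); the paths, file, and identity are placeholders.

```shell
set -e
sandbox="$(mktemp -d)"

# Stand-in for a new, empty GitHub repository. On GitHub you would use its
# URL instead, e.g. https://github.com/<username>/<repo>.git (placeholder)
git init -q --bare "$sandbox/github-standin.git"

# An existing local project that has never been pushed anywhere
mkdir "$sandbox/analysis"
cd "$sandbox/analysis"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "print('hello')" > main.py
git add main.py
git commit -q -m "First analysis script"

# Register the remote under the conventional name "origin", then push
git remote add origin "$sandbox/github-standin.git"
branch="$(git symbolic-ref --short HEAD)"
git push -q -u origin "$branch"   # -u records origin/<branch> as the upstream
git remote -v                     # origin listed twice: (fetch) and (push)

# "origin" is only a name; it can be renamed like any other remote
git remote rename origin upstream
git remote                        # upstream
```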

User Account

Screenshot: the GitHub profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, showing her popular repositories, the repositories she has contributed to, and her contributions graph.

Repo

Screenshot: a GitHub repository page (“A tour of ROC curves”) showing the Code, Issues, Pull requests, Wiki, Pulse, Graphs, and Settings tabs, a summary of 19 commits on 1 branch, the file list with each file's latest commit message, and the rendered README.md.

Command Line

Shifting to the command line


Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type "powershell".

• Hit Enter.

Mac OS X: For Mac OS X you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type "terminal".

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options > Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Where am I? (Mac OS X Terminal and Windows PowerShell)

What’s my name? (Mac OS X Terminal and Windows PowerShell)

Make a directory

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OS X Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OS X Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OS X Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
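The Mac side of these command-line exercises, as one bash sketch. It starts in a temporary directory instead of your home folder, and it uses `mkdir -p` for the nested path, since on a Mac `mkdir` will not create missing parent folders on its own (PowerShell's `mkdir` does). For "What's my name?", the `whoami` command prints the current user name on both Mac and Windows.

```shell
set -e
cd "$(mktemp -d)"    # a disposable starting point instead of your home folder
pwd                  # "Where am I?": print the current directory

# Make a directory, then nested ones
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p creates the missing parents

# Change into a directory and confirm where you are
cd temp
pwd                  # ends in /temp

# List files and directories
ls                   # stuff

# Make an empty file, then list again
touch iamcool.txt
ls                   # iamcool.txt and stuff
```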

Zed Shaw’s book

Let’s use what we’ve learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
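A merge conflict can be reproduced locally before the workshop. A minimal bash sketch, assuming nothing beyond Git itself: two branches edit the same line of a placeholder file, `git merge` stops and marks the conflict, and the conflict is resolved by editing the file, staging it, and committing. The branch name `teammate`, the file, and its contents are made up for illustration.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"         # placeholder identity for this repo only
git config user.email "demo@example.com"

echo "Meeting at 10am" > plan.txt
git add plan.txt
git commit -q -m "Add plan"
main="$(git symbolic-ref --short HEAD)"  # "master" or "main", depending on setup

# A teammate's branch changes the line one way...
git checkout -q -b teammate
echo "Meeting at 2pm" > plan.txt
git commit -q -am "Move meeting to 2pm"

# ...while the main branch changes the same line another way
git checkout -q "$main"
echo "Meeting at 11am" > plan.txt
git commit -q -am "Move meeting to 11am"

# Git cannot pick a winner, so the merge stops and reports a conflict
git merge teammate || echo "Conflict: edit plan.txt, then add and commit"
cat plan.txt   # both versions, separated by <<<<<<<, =======, and >>>>>>> markers

# Resolve: write the agreed content, stage it, and commit to finish the merge
echo "Meeting at 1pm" > plan.txt
git add plan.txt
git commit -q -m "Merge teammate: settle on 1pm"
git log --oneline --graph   # shows the two lines of history joining again
```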

Diagram: how changes move between the working directory, the index, the local repo, and the GitHub repo, with arrows labeled revert, diff --cached, fetch, checkout HEAD, and compare.
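Several of these comparisons map onto everyday commands: `git diff` compares the working directory with the index, `git diff --cached` compares the index with the last commit, and `git checkout HEAD -- <file>` restores a file in both the index and the working directory from the last commit. A minimal bash sketch in a throwaway repository; the file name and contents are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "alpha" > model.txt
git add model.txt
git commit -q -m "Add model notes"

echo "beta" > model.txt
git add model.txt                  # the index now holds "beta"
echo "gamma" > model.txt           # the working directory now holds "gamma"

git diff                           # working directory vs index: beta -> gamma
git diff --cached                  # index vs last commit (HEAD): alpha -> beta

# Throw away both changes and restore the committed version of the file
git checkout HEAD -- model.txt
cat model.txt                      # alpha
git status --short                 # no output: nothing modified or staged
```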

Teamwork (makes the dream work)

Organization

Screenshot: the Commerce Data Service organization page on GitHub (“A startup within DOC focused on building data products with and for the bureaus”, Washington DC), listing repositories such as DataService_WebSite, ITA_Principal_Travel, and Commerce_Data_Academy_Courses (“Course materials offered by the Commerce Data Academy”).

Waffle

Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with GitHub issues such as “Better Licensing”, “Dataset Searching”, and “Async Upload with Celery” arranged in Backlog, Ready, In Progress, and Done columns and tagged with priority, type, and version labels.

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
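One way to be helpful in a commit message is a short summary line followed by a body that explains why. `git commit` accepts `-m` more than once, and each value becomes its own paragraph. A minimal bash sketch; the file, the message text, and the ingest script it mentions are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "label,value" > data.csv
git add data.csv

# First -m: a short summary line; second -m: a body paragraph explaining why
git commit -q -m "Add sample CSV for ingest tests" \
  -m "The ingest script needs a tiny file with a header row so tests run offline."

git log -1 --format=%s     # Add sample CSV for ingest tests
git log -1 --format=%b     # the body paragraph
```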

Why

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
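A minimal bash sketch of a project skeleton following these conventions, assuming a Python project; the file contents, the `pandas` dependency, and the `data/raw/` folder are placeholders. `git check-ignore -v` reports which `.gitignore` rule matched a path. Git does not track empty directories, so `fixtures` only shows up once it contains a file.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# README.md: what the project is and how to run it (placeholder text)
printf '# Project title\n\nWhat this project does and how to run it.\n' > README.md

# requirements.txt: one Python dependency per line (pandas is just an example)
echo "pandas" > requirements.txt

# fixtures: small sample files for tests (invisible to Git while empty)
mkdir fixtures

# .gitignore: files Git should never track
cat > .gitignore <<'EOF'
.DS_Store
__pycache__/
*.pyc
data/raw/
EOF

# A large raw download that should stay out of the repository
mkdir -p data/raw
touch data/raw/big_download.csv
git check-ignore -v data/raw/big_download.csv   # names the .gitignore line that matched
git status --short                              # lists .gitignore, README.md, requirements.txt as untracked
```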

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

Page 37: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Centralized

What are the benefits

What are the weaknesses

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 38: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Decentralized

What are the benefits

What are the weaknesses

Git

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Make an empty file

Windows PowerShell:

> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OS X Terminal:

$ cd temp
$ touch iamcool.txt
$ ls
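The last three slides chain together into one short session. A sketch for the Mac OS X Terminal (any POSIX shell should behave the same), again run in a scratch directory from mktemp -d that is our own addition: change into temp, confirm where you are, create an empty file, and list it. touch creates the file if it does not exist and otherwise only updates its timestamp.

```shell
# Scratch directory (our addition) holding the temp folder from the slides
cd "$(mktemp -d)"
mkdir temp

# Change into the new directory and confirm where we are
cd temp
pwd

# Create an empty file (touch only updates the timestamp if it already exists)
touch iamcool.txt

# List the directory; -l shows the size column, which is 0 for the new file
ls -l
```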

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
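Before the workshop, it can help to see a conflict happen on purpose. The sketch below builds a throwaway repository, edits the same line of a file on two branches, and merges them; Git stops, reports the conflict, and writes conflict markers into the file, which you then resolve by hand. The branch names main and feature, the file greeting.txt, and the commit identity are all made up for the example.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" regardless of Git's default
git config user.name "Example User"     # hypothetical identity for this scratch repo
git config user.email "user@example.com"

echo "Hello, world" > greeting.txt
git add greeting.txt
git commit --quiet -m "Add greeting"

# On a feature branch, change the line one way...
git checkout --quiet -b feature
echo "Hello, team" > greeting.txt
git commit --quiet -am "Greet the team"

# ...and on main, change the same line a different way
git checkout --quiet main
echo "Hello, Commerce" > greeting.txt
git commit --quiet -am "Greet Commerce"

# Git cannot combine the two edits automatically, so the merge exits non-zero
markers=0
if git merge feature; then
  echo "Unexpected: the merge succeeded"
else
  echo "Conflict! greeting.txt now contains:"
  cat greeting.txt
  markers=$(grep -c '^<<<<<<<' greeting.txt)
fi

# Resolve: edit the file to the version you want, stage it, and commit the merge
echo "Hello, Commerce team" > greeting.txt
git add greeting.txt
git commit --quiet -m "Merge feature, combining both greetings"
git log --oneline --graph
```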

[Diagram: Git commands that move changes between the working directory, the index, the local repo, and the Github repo, with labels including Revert, diff --cached, fetch, checkout HEAD, and Compare]
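Several of the diagram's labels are everyday commands. A sketch in a scratch repository (file name and contents invented): git diff compares the working directory with the index, git diff --cached compares the index with the last commit, git diff HEAD compares the working directory with the last commit, and git checkout HEAD -- <file> throws away local changes by copying the committed version back into both the index and the working directory.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"     # hypothetical identity for this scratch repo
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# Stage one change, then make a second, unstaged change on top of it
echo "version 2" > notes.txt
git add notes.txt
echo "version 3" > notes.txt

echo "--- git diff: working directory vs. index ---"
git diff
echo "--- git diff --cached: index vs. last commit ---"
git diff --cached
echo "--- git diff HEAD: working directory vs. last commit ---"
git diff HEAD

# checkout HEAD: restore the committed version into the index and working directory
git checkout HEAD -- notes.txt
cat notes.txt          # prints: version 1
git status --short     # prints nothing: no differences remain
```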

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus", Washington DC, http://www.commerce.gov; 20 people, 4 teams), listing repositories including DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite: "The website for the Commerce Data Service - A startup within the Department of Commerce"), ITA_Principal_Travel (CSS), and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy")]

Waffle

[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository with Backlog, Ready, In Progress, and Done columns. Issue cards include "Better Licensing", "username check", "Dataset Searching", "Dataset Overwrite", "500 error on upload w/ missing col/row values", "AJAXify the uploader", "Sampling technique for bigger datasets", "Feature nomination tool for visualization", "Data file uploading", "Implement beta auto analysis", "Dropdown Dataset Edit Form", "Async Upload with Celery", "Dimension Histograms and Ranking", "Large files hang uploader", and "Upload Error: line contains NULL byte", labeled by priority (e.g. medium), type (feature, bug, technical debt, question, task), and version (0.3). The Done column notes that issues closed in the last week are shown there and that issues can be dragged into it to close them.]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
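One way to follow this advice, sketched in a scratch repository: pass -m twice. Git joins the values as separate paragraphs, so the first becomes a short summary line (what git log --oneline shows) and the second a longer body explaining why the change was made. The file and message text here are invented for illustration.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"     # hypothetical identity for this scratch repo
git config user.email "user@example.com"

echo "id,label" > labels.csv            # hypothetical data file
git add labels.csv

# First -m: a short summary line; second -m: a body explaining the "why"
git commit --quiet \
  -m "Add label column to training data" \
  -m "The classifier needs a column to predict; ingest now writes it out."

git log --oneline        # summary line only
git log -1 --format=%B   # full message: summary, blank line, body
```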

Why?

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
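A sketch that scaffolds this layout in a new repository. What goes in each file is our own illustrative assumption, not part of the slide: README.md describes the project, .gitignore lists files Git should not track (here macOS .DS_Store files and Python bytecode), fixtures holds small sample data, and requirements.txt names Python dependencies. Git does not track empty directories, so fixtures gets a sample file; git check-ignore confirms that .DS_Store is ignored.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet project
cd project
git config user.name "Example User"     # hypothetical identity for this scratch repo
git config user.email "user@example.com"

# README.md: what the project is and how to use it
printf '# Example project\n\nA short description of the analysis.\n' > README.md

# .gitignore: files Git should never track
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# fixtures: small sample data (hypothetical file; Git ignores empty directories)
mkdir fixtures
printf 'id,value\n1,42\n' > fixtures/sample.csv

# requirements.txt: Python packages the code depends on (hypothetical list)
printf 'pandas\nscikit-learn\n' > requirements.txt

touch .DS_Store          # an OS-generated file we do NOT want committed
git add .
git commit --quiet -m "Scaffold project layout"

git ls-files                    # .DS_Store is not listed
git check-ignore -v .DS_Store   # shows which .gitignore rule matched
```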

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Git

git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.

Learn Git in your browser for free with Try Git.

About: The advantages of Git compared to other source control systems.

Documentation: Command reference pages, Pro Git book content, videos and other material.

Downloads: GUI clients and binary releases for all major platforms.

Community: Get involved! Bug reporting, mailing list, chat, development and more.

Installing Git

Installing on Windows

There are also a few ways to install Git on Windows. The most official build is available for download on the Git website. Just go to http://git-scm.com/download/win and the download will start automatically. Note that this is a project called Git for Windows, which is separate from Git itself; for more information on it, go to https://git-for-windows.github.io/.

Another easy way to get Git installed is by installing GitHub for Windows. The installer includes a command line version of Git as well as the GUI. It also works well with PowerShell, and sets up solid credential caching and sane CRLF settings. We'll learn more about those things a little later, but suffice it to say they're things you want. You can download this from the GitHub for Windows website, at http://windows.github.com.

Installing Git

http://git-for-windows.github.io

Installing Git

http://git-scm.com/download/mac
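Whichever installer you use, it's worth checking that Git is on your PATH and telling it who you are before your first commit. A sketch for a Unix-style shell (the Mac OS X Terminal, or the Git Bash that comes with Git for Windows): git --version confirms the install, and git config --global records the name and email attached to your commits. The name and email are placeholders, and for this example only we point HOME at a scratch directory so your real ~/.gitconfig is left untouched; drop that line when configuring your own machine.

```shell
set -eu
# Confirm Git is installed and on your PATH
git --version

# For this sketch only: a throwaway home directory so ~/.gitconfig is untouched
export HOME="$(mktemp -d)"

# First-time setup: the identity recorded in every commit (placeholder values)
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Show the global settings Git will use
git config --global --list
```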

Git - History Lesson

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release: 7 April 2005

All metadata is stored in the .git directory
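To see that Git's metadata really does live in one place, a sketch: initialize a repository in an empty scratch directory and look inside .git. The exact set of entries varies a little between Git versions, but HEAD, config, objects, and refs are part of every standard repository.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet

# Everything Git knows about this project lives under .git/
ls -a .git

# HEAD names the current branch; objects/ is the object database
cat .git/HEAD
```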

Git - Advantages

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - "Places"

Object Database: where git stores metadata about each commit

Index/Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer

Git - "Stages"

Committed: data is safely stored in your local object database

Staged: marked such that the current state of the modified file will be included in the next commit

Modified: changed but not staged or committed
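The three stages can be watched with git status --short, which prints a two-column code per file: the left column describes the staging area, the right column the working directory. A sketch in a scratch repository (file name and contents invented):

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"     # hypothetical identity for this scratch repo
git config user.email "user@example.com"

echo "first draft" > report.txt
git add report.txt
git commit --quiet -m "Add report"

# Modified: changed in the working directory, not yet staged
echo "second draft" > report.txt
git status --short     # prints " M report.txt" (M in the right-hand column)

# Staged: the current state is marked for the next commit
git add report.txt
git status --short     # prints "M  report.txt" (M in the left-hand column)

# Committed: safely stored in the local object database
git commit --quiet -m "Revise report"
git status --short     # prints nothing: working directory matches the last commit
```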

Git - Areas/Places

[Diagram: the three areas of a Git project: Working Directory, Staging Area, and .git directory (Repository)]

Git Commands

Git - Basic Commands

git init: creates a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual modified files to be added to the index/staging area for the next commit

git commit -m <message>: takes the staged changes and records them, with their metadata, in the object database
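The four commands in order, as a sketch in a scratch directory. A local path stands in for the repository address that git clone would normally receive from Github, and the project and file names are made up.

```shell
set -eu
cd "$(mktemp -d)"

# git init: start tracking a new folder (hypothetical project name)
mkdir analysis
cd analysis
git init --quiet
git config user.name "Example User"     # hypothetical identity for this scratch repo
git config user.email "user@example.com"

# git add: stage a new file for the next commit
echo "print('hello')" > hello.py
git add hello.py

# git commit: record the staged snapshot, with author, date, and message
git commit --quiet -m "Add hello script"
git log --oneline

# git clone: download a full copy of an existing repository (a local path here)
cd ..
git clone --quiet analysis analysis-copy
ls analysis-copy
```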

Git - Basic Commands

git fetch <server> <branch>: updates your object database but does not change the working directory

git merge <source branch>: applies the commits from the source branch to the current branch (the branch your working directory currently reflects)

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest branch commits to the remote server
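A sketch of these four commands working together, entirely on one machine: a bare repository in a scratch directory plays the part of the server (Github, in class), and two clones, "alice" and "bob", play two teammates. All names are hypothetical. Git may warn that the first clone is empty; that is expected.

```shell
set -eu
root="$(mktemp -d)"
cd "$root"

# A bare repository stands in for the server
git init --quiet --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/main

# Alice clones the (still empty) server, commits, and pushes
git clone --quiet server.git alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > data.txt
git add data.txt
git commit --quiet -m "Add data"
git push --quiet origin main

# Bob clones the server, which now has Alice's commit
cd "$root"
git clone --quiet server.git bob

# Alice pushes a second commit
cd "$root/alice"
echo "v2" > data.txt
git commit --quiet -am "Update data"
git push --quiet origin main

# git fetch: Bob's object database learns about the new commit,
# but his working directory still shows the old file
cd "$root/bob"
git fetch --quiet origin
cat data.txt                          # prints: v1
git log --oneline main..origin/main   # the commit fetched but not yet merged

# git merge: apply the fetched commits to Bob's current branch
git merge --quiet origin/main
cat data.txt                          # prints: v2

# git pull: fetch + merge in one step (nothing new to get this time)
git pull --quiet origin main
```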

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

Github

A remote git repository

A website

provides secure access

provides repository metadata & reports

provides tools for development teams

Launched April 10, 2008

~10 million users in 2015

Git - "Places"

Non-local git repositories are called "remotes"

Object Database: where git stores metadata about each commit

Index/Staging Area: file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer

Github: A Distributed Version Control example

[Diagram: a server computer and two client computers (Computer A and Computer B), each holding a full version database containing Version 1, Version 2, and Version 3]

Git - "Origin"

The "origin" remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about "origin"; it is just a default name
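A sketch showing both points: cloning creates a remote called origin automatically, and because the name is only a default, it can be renamed like any other remote. A local repository in a scratch directory stands in for a Github URL, and the new name upstream is an arbitrary choice.

```shell
set -eu
root="$(mktemp -d)"
cd "$root"

# A local repository stands in for the one you would clone from Github
git init --quiet project
git -C project config user.name "Example User"     # hypothetical identity
git -C project config user.email "user@example.com"
echo "hello" > project/README.md
git -C project add README.md
git -C project commit --quiet -m "Initial commit"

# Cloning created a remote named "origin" that points back at the source
git clone --quiet project my-copy
cd my-copy
git remote -v

# "origin" is just a name: rename it, and fetching works under the new name
git remote rename origin upstream
git remote               # prints: upstream
git fetch --quiet upstream
```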

User Account

[Screenshot: the Github profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, with 17 followers, 11 starred, and 39 following. It lists popular repositories (xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards), repositories contributed to (including DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, and DistrictDataLabs/trinket), and a contributions calendar summarizing pull requests, issues opened, and commits.]

Repo

[Screenshot: a Github repository page owned by rebeccabilbro, described as "A tour of ROC curves", with 19 commits, 1 branch, 0 releases, and 1 contributor, showing the Code, Issues, Pull requests, Wiki, Pulse, Graphs, and Settings tabs on branch master]

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 40: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

bull git --distributed-is-the-new-centralized

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency

Git is easy to learn and has a tiny footprint with lightning fast performance It outclasses SCM tools like Subversion CVS Perforce and ClearCase with features like cheap local branching convenient staging areas and multiple workflows

middot middot Learn Git in your browser for free with Try Git

00 About m Documentation The advantages of Git Command reference pages Pro compared to other source Git book content videos and control systems other material

~ Downloads

p~ Community

GUI clients and binary releases Get involved Bug reporting for all major platforms mailing list chat development

and more

Q Search entire site

Installing Git

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 41: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Installing on Windows

There are also a few ways to install Git on Windows The most official build is available for download on

the Git website Just go to httpllgit-scmcomldownloadlwin and the download will start automatically Note

that this is a project called Git for Windows which is separate from Git itself for more information on it go

to httpsllgit-for-windowsgithubiol

Another easy way to get Git installed is by installing GitHub for Windows The installer includes a

command line version of Git as well as the GUI It also works well with Powershell and sets up solid

credential caching and sane CALF settings Well learn more about those things a little later but suffice it

to say theyre things you want You can download this from the GitHub for Windows website at

httpwindowsgithubcom

Installing Git

httpgit-for-windowsgithubio

Installing Git

httpgit-scmcomdownloadmac

Originally conceivedcreated by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - History Lesson

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

(Screenshot: an example Github user profile, rebeccabilbro, showing popular repositories, repositories contributed to and a contributions graph)

Repo

(Screenshot: an example Github repository page, “A tour of ROC curves” by rebeccabilbro, showing its file list, latest commits and README.md)

Command Line

Shifting to the command line

Windows: On Windows we’re going to use PowerShell. People used to work with a program called cmd.exe, but it’s not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start

• In “Search programs and files”, type powershell

• Hit Enter

Mac OSX: For Mac OSX you’ll need to do this:

• Hold down COMMAND and hit the spacebar

• In the top right, the blue search bar will pop up

• Type terminal

• Click on the Terminal application that looks kind of like a black box

• This will open Terminal

• You can now go to your Dock, CTRL-click the Terminal icon to pull up the menu, then select Options -> Keep In Dock

Now you have your Terminal open, and it’s in your Dock so you can get to it

Where am I?

(Screenshots: Mac OSX Terminal and Windows Powershell)

What’s my name?

(Screenshots: Mac OSX Terminal and Windows Powershell)

Make a directory

Windows Powershell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Change between directories

Windows Powershell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

List files and directories

Windows Powershell:
> dir

Mac OSX Terminal:
$ ls

Make an empty file

Windows Powershell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
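On the Mac OSX Terminal, the same sequence can be run as one script. This is a sketch assuming a bash shell; it works inside a scratch folder from mktemp rather than your home folder, an assumption made so nothing is left behind among your files. Unlike PowerShell's mkdir, the Unix mkdir needs -p to create the missing frank/joe/alex folders in one go.

```shell
#!/usr/bin/env bash
# Minimal sketch of the Terminal side of these exercises, run inside a
# scratch folder (a hypothetical choice) so your home folder is unchanged.
set -euo pipefail
SCRATCH="$(mktemp -d)"
cd "$SCRATCH"

pwd                                   # where am I? prints the scratch folder's path
mkdir temp                            # make a directory
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p also creates missing parents
cd temp                               # change directory
pwd                                   # the path now ends in /temp
touch iamcool.txt                     # make an empty file
ls                                    # lists iamcool.txt and stuff
```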

Zed Shaw’s book

Let’s use what we’ve learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
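Before the workshop, it can help to see a conflict in isolation. This sketch (bash, git installed) creates two branches that edit the same line of a hypothetical plan.txt, merges them, shows the conflict, and resolves it by choosing one version. The branch names main and friday and the file contents are illustrative.

```shell
#!/usr/bin/env bash
# Minimal sketch: create, inspect and resolve a merge conflict.
# Branch names, file contents and identity are hypothetical placeholders.
set -euo pipefail
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

REPO="$(mktemp -d)/conflict-demo"
git init -q "$REPO"
cd "$REPO"
echo "The meeting is on Monday." > plan.txt
git add plan.txt
git commit -q -m "Add plan"
git branch -M main                     # name the first branch explicitly

# Two branches change the same line in different ways.
git checkout -q -b friday
echo "The meeting is on Friday." > plan.txt
git commit -q -am "Move meeting to Friday"
git checkout -q main
echo "The meeting is on Tuesday." > plan.txt
git commit -q -am "Move meeting to Tuesday"

# git cannot choose between the two edits, so the merge stops with a conflict.
if ! git merge friday; then
  git status --short                   # "UU plan.txt": both sides modified it
  cat plan.txt                         # shows the <<<<<<< / ======= / >>>>>>> markers
fi

# Resolve: write the version the team agreed on, stage it, finish the merge.
echo "The meeting is on Friday." > plan.txt
git add plan.txt
git commit -q -m "Merge friday: keep Friday"
```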

(Diagram: working directory, index, local repo and Github repo, with arrows labeled Compare, diff --cached, Revert, checkout HEAD and fetch)
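The Compare, diff --cached and Revert arrows correspond to ordinary git commands. In this sketch (bash, git installed, hypothetical file.txt), git diff compares the working directory with the index, git diff --cached compares the index with the last commit, and git checkout HEAD -- file.txt reverts both from the last commit.

```shell
#!/usr/bin/env bash
# Minimal sketch: comparing and reverting between the working directory,
# the index and the local repository. File name and identity are placeholders.
set -euo pipefail
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

REPO="$(mktemp -d)/places-demo"
git init -q "$REPO"
cd "$REPO"
echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"

echo "v2" > file.txt
git diff --stat            # compare: working directory vs index
git add file.txt
git diff --cached --stat   # diff --cached: index vs last commit (HEAD)

echo "v3" > file.txt       # one more edit, this time left unstaged

# Revert: copy file.txt from HEAD back into both the index and the
# working directory, discarding the staged "v2" and the unstaged "v3".
git checkout HEAD -- file.txt
cat file.txt               # prints: v1
```

On git 2.23 or later, git restore --source=HEAD --staged --worktree file.txt performs the same revert with a more explicit name.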

Teamwork (makes the dream work)

Organization

(Screenshot: the Commerce Data Service organization on Github, “A startup within DOC focused on building data products with and for the bureaus”, listing repositories such as DataService_WebSite, ITA_Principal_Travel and Commerce_Data_Academy_Courses)

Waffle

(Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with Github issues such as “Better Licensing”, “Dataset Searching” and “Async Upload with Celery” arranged in Backlog, Ready, In Progress and Done columns)

Pair programming

Make your own waffle

Communication

Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
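Passing -m more than once produces a message with a short summary line followed by a body paragraph, one way to be helpful to your team and to future you. The file labels.csv and the wording of the message are hypothetical; the sketch assumes bash and git.

```shell
#!/usr/bin/env bash
# Minimal sketch: a commit message with a short summary line and a body
# explaining why. The file, wording and identity are hypothetical.
set -euo pipefail
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

REPO="$(mktemp -d)/message-demo"
git init -q "$REPO"
cd "$REPO"
printf 'id,label\n' > labels.csv
git add labels.csv

# Each -m becomes its own paragraph: summary first, then the explanation.
git commit -q \
  -m "Add header row to labels.csv" \
  -m "The ingest step expects named columns; without a header row the first record was read as column names."

git log -1 --format=%s    # prints: Add header row to labels.csv
git log -1 --format=%b    # prints the explanatory paragraph
```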

Why

Why do data scientists need version control?

Where does version control fit into the data science pipeline?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
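A sketch of scaffolding these conventions in a new repository, assuming bash and git. The project name, the ignore patterns (chosen here for Python and Jupyter work on a Mac) and the packages in requirements.txt are example choices, not requirements; git check-ignore confirms that a file is matched by a .gitignore rule.

```shell
#!/usr/bin/env bash
# Minimal sketch: scaffold the conventional top-level files in a new repo.
# The project name, ignore patterns and package names are example choices.
set -euo pipefail
PROJECT="$(mktemp -d)/analysis-project"
mkdir -p "$PROJECT/fixtures"
cd "$PROJECT"
git init -q

# README.md: what the project is and how to run it (Github renders it).
cat > README.md <<'EOF'
# Analysis project
What this project does and how to run it.
EOF

# .gitignore: files git should never track.
cat > .gitignore <<'EOF'
.DS_Store
__pycache__/
*.pyc
.ipynb_checkpoints/
EOF

# requirements.txt: Python dependencies, one per line.
cat > requirements.txt <<'EOF'
pandas
scikit-learn
EOF

# fixtures/: small sample data for tests and demos.
echo "id,value" > fixtures/sample.csv

touch .DS_Store                # a macOS metadata file we do not want tracked
git check-ignore .DS_Store     # prints .DS_Store: it matches a .gitignore rule
git add .
git status --short             # four new files staged; .DS_Store is not among them
```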

Where to go from here

Additional Tutorials

http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources

Github Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Installing Git

http://git-scm.com/download/mac
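Once git is installed, a quick check is to ask for its version and then set the identity git records in each commit. This sketch assumes bash; the name and email are placeholders, and it points HOME at a scratch folder only so that running it does not overwrite your real ~/.gitconfig.

```shell
#!/usr/bin/env bash
# Minimal sketch: confirm git is installed, then set the name and email that
# git records in every commit. The values shown are placeholders.
set -euo pipefail
git --version                   # prints something like "git version 2.x.y"

# --global writes to ~/.gitconfig. For this sketch only, HOME points at a
# scratch folder so your real settings are left alone; drop this line
# when configuring your own machine.
export HOME="$(mktemp -d)"
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
git config --global user.name   # prints: Your Name
```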

Git - History Lesson

Originally conceived/created by Linus Torvalds (after a fight with BitKeeper)

Distributed Version Control

Open Source

Initial release 7 April 2005

All metadata is stored in the git directory

Git - Advantages

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - “Places”

Object Database

where git stores metadata about each commit

Index / Staging Area

file snapshots to be included in the next commit

Working Directory

the “physical” files on a computer

Git - “Stages”

Committed: data is safely stored in your local object database

Staged: marked so that the current state of the modified file will be included in the next commit

Modified: changed but not yet staged or committed

Git - Areas/places

Working Directory

Staging Area

git directory (Repository)
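git status --short reports which stage each file is in. In this sketch (bash, git installed, hypothetical report.txt), the two-letter code moves from “ M” (modified, not staged) to “M ” (staged) to no output at all (committed).

```shell
#!/usr/bin/env bash
# Minimal sketch: watch one file move through modified -> staged -> committed.
# The file name and identity are hypothetical placeholders.
set -euo pipefail
export GIT_AUTHOR_NAME="Example User"
export GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User"
export GIT_COMMITTER_EMAIL="user@example.com"

REPO="$(mktemp -d)/stages-demo"
git init -q "$REPO"
cd "$REPO"
echo "first draft" > report.txt
git add report.txt
git commit -q -m "Add report"
git status --short   # no output: everything is committed

echo "second draft" > report.txt
git status --short   # " M report.txt": modified, not staged

git add report.txt
git status --short   # "M  report.txt": staged for the next commit

git commit -q -m "Revise report"
git status --short   # no output again: safely stored in the object database
```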



Page 44: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Speed

Simple design

Strong support for non-linear development (thousands of parallel branches)

Fully distributed

Able to handle large projects like the Linux kernel efficiently (speed and data size)

Git - Advantages

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

[Screenshot: the GitHub profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014, showing popular repositories, repositories contributed to, and a contributions graph]

Repo

[Screenshot: the GitHub page of rebeccabilbro's repository "A tour of ROC curves" (19 commits, 1 branch, 0 releases, 1 contributor), listing the data and figures folders, .DS_Store, .gitignore, LICENSE, README.md and several Python modules, with the rendered README.md below]

Command Line

Shifting to the command line


Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type: powershell

• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type: terminal

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open and it's in your Dock so you can get to it.

Where am I? (Mac OSX Terminal / Windows PowerShell)

What's my name? (Mac OSX Terminal / Windows PowerShell)

Make a directory (Mac OSX Terminal / Windows PowerShell)

Windows PowerShell:
> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john

Change between directories (Mac OSX Terminal / Windows PowerShell)

Windows PowerShell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

List files and directories (Mac OSX Terminal / Windows PowerShell)

Windows PowerShell:
> dir

Mac OSX Terminal:
$ ls

Make an empty file (Mac OSX Terminal / Windows PowerShell)

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls
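The Mac OSX Terminal steps above can be replayed as one bash script. This is a sketch, assuming a bash-like shell; the screenshots for "Where am I?" and "What's my name?" did not survive, so answering them with pwd and whoami is an assumption. It works inside a temporary folder instead of your home directory, and it uses mkdir -p because plain mkdir in bash, unlike PowerShell's mkdir, does not create missing parent folders.

```shell
#!/usr/bin/env bash

cd "$(mktemp -d)"      # practice somewhere disposable
pwd                    # Where am I? prints the current directory
whoami                 # What's my name? prints your user name (assumed command)

# Make a directory, then a nested chain of directories
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p creates the missing parents

# Change between directories
cd temp
pwd                    # now ends in /temp

# Make an empty file, then list files and directories
touch iamcool.txt
ls                     # lists iamcool.txt and stuff
```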

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
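As a self-contained warm-up for the workshop, here is a sketch, assuming git and a bash-like shell, that manufactures a merge conflict between two branches in a throwaway repository and then resolves it. The branch name teammate, the file report.txt and the titles are hypothetical.

```shell
#!/usr/bin/env bash
set -e

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo "Title: Draft" > report.txt
git add report.txt
git commit -q -m "Initial report"
main=$(git symbolic-ref --short HEAD)      # master or main, depending on your git setup

# A teammate changes the title on a separate branch
git checkout -q -b teammate
echo "Title: Final Report" > report.txt
git commit -q -am "Teammate renames report"

# You change the same line differently on the original branch
git checkout -q "$main"
echo "Title: Quarterly Report" > report.txt
git commit -q -am "Rename report"

# The merge stops with a conflict; git writes both versions into the file
git merge teammate || true
cat report.txt     # shows <<<<<<< HEAD ... ======= ... >>>>>>> teammate

# Resolve by writing the agreed version, then stage it and finish the merge
echo "Title: Quarterly Report (Final)" > report.txt
git add report.txt
git commit -q -m "Merge teammate, resolve title conflict"
```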


[Diagram: working directory, index, local repo and Github repo, with the labels Revert, diff --cached, fetch, checkout HEAD and Compare]
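A sketch of the local comparisons, assuming the diagram's labels refer to the standard commands git diff, git diff --cached and git checkout HEAD -- <file>. It assumes git and a bash-like shell; the file data.csv is hypothetical.

```shell
#!/usr/bin/env bash
set -e

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "v1" > data.csv
git add data.csv
git commit -q -m "Add data"

echo "v2" > data.csv
git diff --stat            # working directory vs. index: data.csv changed

git add data.csv
git diff --cached --stat   # index vs. last commit (HEAD): data.csv is staged

# Revert: restore HEAD's version in both the index and the working directory
git checkout HEAD -- data.csv
cat data.csv               # v1
```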

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus", Washington DC), listing repositories such as DataService_WebSite, ITA_Principal_Travel and Commerce_Data_Academy_Courses]

Waffle

[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with GitHub issues such as "Dataset Searching", "Async Upload with Celery" and "Large files hang uploader" arranged in Backlog, Ready, In Progress and Done columns and labeled by priority, type and version]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
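One way to follow this advice, sketched under the assumption of git and a bash-like shell: give git commit two -m options, a short summary line and a paragraph explaining why. Git joins multiple -m values as separate paragraphs of one message. The file and the message text are hypothetical.

```shell
#!/usr/bin/env bash
set -e

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "date,value" > sales.csv
git add sales.csv

# First -m: short summary line. Second -m: the "why", for your team and future you
git commit -q \
  -m "Add raw sales export for March" \
  -m "Needed to reproduce the monthly summary; the file is the unmodified export, no cleaning applied."

git log -1 --format=%s     # the summary line, as shown in one-line views
```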

Why

Why do data scientists need version control?

Where does version control fit into the data science pipeline?

[Diagram: the data science pipeline, from Data Ingestion through Data Munging and Wrangling, Computation and Analyses, and Modeling and Application, to Reporting and Visualization]

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
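A sketch of scaffolding these conventions, assuming git and a bash-like shell. The project name, the package list in requirements.txt and the .gitignore patterns are hypothetical examples, and fixtures/ is assumed to hold small sample data files for tests. The last command checks that git really ignores .DS_Store, a macOS file that otherwise tends to end up in repositories.

```shell
#!/usr/bin/env bash
set -e

project=$(mktemp -d)/my-analysis      # hypothetical project name
mkdir -p "$project/fixtures"
cd "$project"

# README.md: what the project is and how to run it
printf '# My Analysis\n\nHow to install, run, and reproduce the results.\n' > README.md

# .gitignore: files git should never track (example patterns)
printf '.DS_Store\n__pycache__/\n*.pyc\n.ipynb_checkpoints/\n' > .gitignore

# requirements.txt: Python packages the code needs (pip install -r requirements.txt)
printf 'pandas\nmatplotlib\n' > requirements.txt

# fixtures/: small sample data files (assumed use: tests and examples)
printf 'id,value\n1,0.5\n' > fixtures/sample.csv

git init -q
touch .DS_Store                       # macOS creates these automatically
git status --porcelain --ignored      # .DS_Store appears with "!!", meaning ignored
```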

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Git - "Stages"

Committed: data is safely stored in your local object database

Staged: the modified file is marked so that its current state will be included in the next commit

Modified: changed, but not yet staged or committed
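A sketch, assuming git and a bash-like shell, that walks one file through the three stages in a throwaway repository and uses git status --short to show each one. The file table.csv is hypothetical.

```shell
#!/usr/bin/env bash
set -e

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo "a,b" > table.csv
git add table.csv
git commit -q -m "Add table"               # table.csv is now committed

echo "c,d" >> table.csv
git status --short                         # " M table.csv": modified
git add table.csv
git status --short                         # "M  table.csv": staged
git commit -q -m "Add second row"
git status --short                         # no output: committed
```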

Git - Areas/places

[Diagram: Working Directory, Staging Area, git directory (Repository)]

Git Commands

Git - Basic Commands

git init: creates a new git repository to manage the current folder

git clone <repository address>: downloads an existing git repository for the first time

git add <file path>: marks individual modified files to be added to the index (staging area) for the next commit

git commit -m <message>: takes the staged changes and records them as a new commit in the object database
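A sketch of these four commands, assuming git and a bash-like shell. To stay self-contained it clones a local folder; with Github you would pass the repository's HTTPS or SSH address instead. The folder, file and identity names are hypothetical.

```shell
#!/usr/bin/env bash
set -e

base=$(mktemp -d)
cd "$base"

# git init: turn a folder into a git repository
mkdir project && cd project
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# git add + git commit: stage a file, then record it in the object database
echo "print('hello')" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"

# git clone: download a full copy of an existing repository
cd "$base"
git clone -q project project-copy
ls project-copy                            # analysis.py
```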

Git - Basic Commands

git fetch <server> <branch>: updates your object database but does not change the working directory

git merge <source branch>: applies the commits from the source branch to your current branch (the branch your working directory reflects)

git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory

git push <server> <branch>: sends your latest branch commits to the remote server
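A sketch of fetch, merge, pull and push between two teammates, assuming git and a bash-like shell, with a local bare repository standing in for Github. The names alice, bob, team.git and pipeline.txt are hypothetical. The branch name is read from git rather than hard-coded, because git setups may default to either master or main.

```shell
#!/usr/bin/env bash
set -e

base=$(mktemp -d)
cd "$base"
git init -q --bare team.git                  # stands in for the Github repository

# Alice clones, commits, and pushes her branch to the server
git clone -q team.git alice
cd alice
git config user.name "Alice"                 # hypothetical identities
git config user.email "alice@example.com"
branch=$(git symbolic-ref --short HEAD)      # master or main
echo "step 1" > pipeline.txt
git add pipeline.txt
git commit -q -m "Add step 1"
git push -q origin "$branch"

# Bob clones; afterwards Alice pushes another commit
cd "$base"
git clone -q team.git bob
cd "$base/alice"
echo "step 2" >> pipeline.txt
git commit -q -am "Add step 2"
git push -q origin "$branch"

# Bob: fetch updates his object database only; his files are unchanged
cd "$base/bob"
git fetch -q origin
cat pipeline.txt                             # still only "step 1"

# merge applies the fetched commits to his current branch and working directory
git merge -q "origin/$branch"
cat pipeline.txt                             # step 1, then step 2
# (git pull origin "$branch" would do the fetch and the merge in one step)
```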

Git Challenge (20 minutes): https://try.github.io/levels/1/challenges/1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 46: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Committed data is safely stored in your local object database

Staged marked such that the current state of the modified file will be included in the next commit

Modified changed but not staged or committed

Git - ldquoStagesrdquo

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 47: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Working Directory

Staging Area

git directory (Repository)

Git - Areasplaces

Git Commands

git init create a new git repository to manage the current folder

git clone ltrepository addressgt downloads an existing git repository for the first time

git add ltfile pathgt marks individualmodified files to be added to the indexstaging area for next commit

git commit -m ltmessagegt takes metadatachanges from staging and adds to the object database

Git - Basic Commands

git fetch ltservergt ltbranchgt updates your object database but does not change the working directory

git merge ltsource branchgt applies the commits from source branch to the current working directory (which is the manifestation of another branch)

git pull ltservergt ltbranchgt performs a fetch and then merges those changes into your working directory

git push ltservergt ltbranchgt sends your latest branch commits to the remote server

Git - Basic Commands

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Change between directories

Windows PowerShell:
> cd temp
> pwd

Mac OSX Terminal:
$ cd temp
$ pwd

List files and directories

Windows PowerShell:
> dir

Mac OSX Terminal:
$ ls

Make an empty file

Windows PowerShell:
> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:
$ cd temp
$ touch iamcool.txt
$ ls

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
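Before the workshop, it can help to watch a conflict happen somewhere safe. The script below is a minimal sketch, not the workshop's own material. It builds a throwaway repository and edits the same line on two branches. It then lets git merge stop with a conflict and resolves it by hand. The identity values, branch names and file contents are all hypothetical placeholders.

```shell
set -e
# Placeholder identity (hypothetical) so commits succeed even if
# git config user.name / user.email has not been set up yet.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Throwaway repository in a scratch folder.
cd "$(mktemp -d)"
git init -q conflict-demo
cd conflict-demo

# Shared starting point, on a branch we name main.
echo "Our team meets on Monday" > notes.txt
git add notes.txt
git commit -q -m "Add meeting notes"
git branch -M main

# Teammate A edits the line on a feature branch...
git checkout -q -b feature
echo "Our team meets on Tuesday" > notes.txt
git commit -q -am "Move meeting to Tuesday"

# ...while teammate B edits the same line on main.
git checkout -q main
echo "Our team meets on Wednesday" > notes.txt
git commit -q -am "Move meeting to Wednesday"

# Same line changed on both sides: git stops and reports a conflict.
# The merge exits non-zero, so "|| true" keeps set -e from aborting.
git merge feature || true
cat notes.txt          # <<<<<<< HEAD / ======= / >>>>>>> feature markers
git status --short     # "UU notes.txt": unmerged, both sides modified

# Resolve: write the text the team agreed on, stage it, commit the merge.
echo "Our team meets on Tuesday" > notes.txt
git add notes.txt
git commit -q -m "Merge feature: keep Tuesday"
git log --oneline --graph
```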


Diagram: the working directory, the index, the local repo and the Github repo, connected by arrows labeled Revert, diff --cached, fetch, checkout HEAD and Compare.
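Several of the diagram's labels are ordinary git commands. The script below is a hedged sketch with a made-up file name and contents. It compares the working directory, the index and the last commit, then uses git checkout HEAD to restore a file. The fetch arrow from the Github repo needs a remote, so it is left out here.

```shell
set -e
# Placeholder identity (hypothetical) so the commit works on a fresh machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

cd "$(mktemp -d)"
git init -q compare-demo
cd compare-demo
echo "version 1" > report.txt
git add report.txt
git commit -q -m "First version"

# Edit and stage the file, then edit it again without staging.
echo "version 2" > report.txt
git add report.txt
echo "version 3" > report.txt

git diff            # working directory vs. index: version 2 -> version 3
git diff --cached   # index vs. last commit (HEAD): version 1 -> version 2
git diff HEAD       # working directory vs. HEAD: version 1 -> version 3

# Discard both the staged and the unstaged edits by restoring from HEAD.
git checkout HEAD -- report.txt
cat report.txt      # version 1
```

On git 2.23 or later, `git restore --source=HEAD --staged --worktree report.txt` performs the same restore as the checkout line.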

Teamwork (makes the dream work)

Organization

Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus", Washington DC; 20 people, 4 teams). It lists repositories including DataService_WebSite (forked from timwood/DataCorps_WebSite; "The website for the Commerce Data Service - A startup within the Department of Commerce"), ITA_Principal_Travel and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy").

Waffle

Screenshot: a Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress and Done columns. Each card is a Github issue, for example "Better Licensing", "username check", "Dataset Searching", "AJAXify the uploader", "Async Upload with Celery" and "Large files hang uploader". The cards carry labels for priority, type (feature, bug, technical debt) and version (Version 0.3).

Pair programming: Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
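Commit messages are mostly read through git log. One common convention is a short summary line, a blank line, and then a longer explanation of why the change was made. Git does not enforce this; it is a guideline. Passing -m twice produces exactly that shape. In this sketch the file, the messages and the identity are hypothetical.

```shell
set -e
# Placeholder identity (hypothetical) so the commit works on a fresh machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

cd "$(mktemp -d)"
git init -q message-demo
cd message-demo
printf 'id,label\n' > labels.csv
git add labels.csv

# Each -m becomes a separate paragraph: a short summary line first,
# then a body that tells your team (and future you) why the change was made.
git commit -q \
  -m "Add header row to labels file" \
  -m "The ingest step looks up the label column by name, so the file needs a header."

git log -1 --format=%s    # summary line only
git log -1 --format=%b    # body only
```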

Why

Why do data scientists need version control?

Where does version control fit into the data science pipeline?

Data Ingestion → Data Munging and Wrangling → Computation and Analyses → Modeling and Application → Reporting and Visualization

Folder structure conventions on Github

README.md
.gitignore
fixtures
requirements.txt
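Here is a sketch of that layout for a small Python data project. The slide gives only the names. The project name, file contents, package list and ignore patterns below are our own illustrative assumptions.

```shell
set -e
cd "$(mktemp -d)"
mkdir my-project && cd my-project    # hypothetical project name

# README.md: what the project does and how to run it; Github renders it on the repo page.
printf '# my-project\n\nWhat this project does and how to run it.\n' > README.md

# requirements.txt: Python dependencies, installable with: pip install -r requirements.txt
printf 'pandas\nmatplotlib\n' > requirements.txt

# fixtures: small sample data used by tests and examples.
mkdir fixtures
printf 'id,label\n1,yes\n2,no\n' > fixtures/sample.csv

# .gitignore: files git should never track (OS clutter, caches, notebook checkpoints).
cat > .gitignore <<'EOF'
.DS_Store
__pycache__/
*.pyc
.ipynb_checkpoints/
EOF

git init -q
touch .DS_Store                      # the kind of file macOS Finder creates
git check-ignore -v .DS_Store        # shows which .gitignore rule matches
git status --short --untracked-files=all
```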

Where to go from here

Additional Tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
Github Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Git Commands

Git - Basic Commands

git init: creates a new git repository to manage the current folder
git clone <repository address>: downloads an existing git repository, with its full history, for the first time
git add <file path>: marks individual new or modified files to be added to the index (staging area) for the next commit
git commit -m <message>: takes the staged changes, plus metadata such as the author and message, and records them as a commit in the object database
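You can try these four commands end to end on your own machine. In this sketch a local folder stands in for the Github address that git clone would normally receive. The folder names, file name and commit identity are placeholders.

```shell
set -e
# Placeholder identity (hypothetical) so the commit works on a fresh machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

# git init: start tracking a new folder.
mkdir analysis && cd analysis
git init -q

# git add: put a snapshot of the new file in the index.
echo "print('hello')" > clean.py
git add clean.py

# git commit: record the staged snapshot in the object database.
git commit -q -m "Add cleaning script"
cd ..

# git clone: copy the repository, with its history, into a new folder.
# A local path stands in for a Github address here.
git clone -q analysis analysis-copy
git -C analysis-copy log --oneline
```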

Git - Basic Commands

git fetch <server> <branch>: updates your object database and remote-tracking branches but does not change the working directory
git merge <source branch>: applies the commits from the source branch to the branch currently checked out in your working directory
git pull <server> <branch>: performs a fetch and then merges those changes into your current branch and working directory
git push <server> <branch>: sends your latest commits on that branch to the remote server
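You can try fetch, merge, pull and push without a network connection or a Github account. A bare repository on your own disk can stand in for the server. This is a hedged sketch of that setup. The names alice, bob, server.git and model.txt are hypothetical. We chose main as the branch name; older git versions default to master.

```shell
set -e
# Placeholder identity (hypothetical) so commits work on a fresh machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

# A bare repository (no working directory) stands in for the Github server.
git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/main   # server's default branch

# Alice starts the project and pushes it.
mkdir alice && cd alice
git init -q
echo "v1" > model.txt
git add model.txt
git commit -q -m "First model"
git branch -M main
git remote add origin ../server.git
git push -q origin main        # push: send Alice's commits to the server
cd ..

# Bob clones, improves the model and pushes.
git clone -q server.git bob
cd bob
echo "v2" > model.txt
git commit -q -am "Improve model"
git push -q origin main
cd ../alice

# fetch: Bob's commit lands in Alice's object database...
git fetch -q origin main
cat model.txt                  # ...but her working directory still says v1
git log --oneline origin/main  # the new commit is visible as origin/main

# merge: apply origin/main to the branch Alice has checked out.
git merge -q origin/main
cat model.txt                  # v2

# pull = fetch + merge in one step (already up to date here).
git pull -q origin main
```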

Git Challenge (20 minutes)

https://try.github.io/levels/1/challenges/1

Github

• A remote git repository
• A website that:
  - provides secure access
  - provides repository metadata & reports
  - provides tools for development teams
• Launched April 10, 2008
• ~10 million users in 2015

Git - "Places"

Object Database: where git stores each commit, together with its metadata
Index (Staging Area): file snapshots to be included in the next commit
Working Directory: the "physical" files on a computer

Non-local git repositories are called "remotes".
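You can watch a file move through these three places with two commands. git status --short prints two-letter codes that describe the index and the working directory. git cat-file reads objects straight from the object database. Here is a minimal sketch with a made-up file and identity.

```shell
set -e
# Placeholder identity (hypothetical) so the commit works on a fresh machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

cd "$(mktemp -d)"
git init -q places-demo
cd places-demo

# 1. Working directory only: git sees an untracked file.
echo "raw data" > data.csv
git status --short        # ?? data.csv

# 2. Index: git add records a snapshot of the file in the staging area.
git add data.csv
git status --short        # A  data.csv
git ls-files --stage      # mode, blob id and path of the staged snapshot

# 3. Object database: git commit records the staged snapshot as a commit.
git commit -q -m "Add raw data"
git status --short        # (no output: all three places agree)
git cat-file -t HEAD      # commit
git cat-file -p HEAD      # tree id, author, committer and message
```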

Github: A Distributed Version Control example

Diagram: a server computer and two client computers (Computer A and Computer B), each holding its own full version database with Version 1, Version 2 and Version 3.

Git - "Origin"

• The "origin" remote is automatically created when you clone.
• It is the default remote to use for pushing and pulling.
• There is nothing special about "origin"; it is just a default name.
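Here is a quick way to see that "origin" is just a name: clone a repository, list its remotes, and rename origin. The sketch uses a local repository in place of a Github URL. All names in it are hypothetical.

```shell
set -e
# Placeholder identity (hypothetical) so the commit works on a fresh machine.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
cd "$(mktemp -d)"

# A small local repository stands in for a project on Github.
git init -q project
git -C project commit -q --allow-empty -m "Initial commit"

# Cloning creates a remote named "origin" that points back to the source.
git clone -q project my-copy
cd my-copy
git remote -v                     # origin  .../project (fetch) and (push)

# "origin" is only a default name: rename it and everything still works.
git remote rename origin upstream
git remote -v                     # upstream  .../project (fetch) and (push)
git fetch -q upstream
```

You can also pick a different name at clone time with `git clone --origin <name>` (or `-o <name>`).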

User Account

Screenshot: the Github profile page of Rebecca Bilbro (rebeccabilbro), Washington DC, joined on Sep 13, 2014. It shows popular repositories (xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards), repositories contributed to (including DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters and DistrictDataLabs/trinket), and a year-long contributions graph.

Repo


Page 51: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Git Challenge (20 minutes)

httpstrygithubiolevels1challenges1

Github

COMMERCE DATA SERVICE

Github

A remote Git repository

A website:

• provides secure access
• provides repository metadata & reports
• provides tools for development teams

Launched April 10, 2008

~10 million users in 2015


Non-local Git repositories are called "remotes."

Git - "Places"

Object Database: where Git stores every committed snapshot (file contents, directory trees, and the metadata of each commit)

Index (Staging Area): file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer
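This minimal sketch walks one file through the three places in a throwaway repository. It assumes Git is installed. The file name notes.txt and the identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > notes.txt   # 1. Working directory: an ordinary file on disk
git status --short               # "?? notes.txt": untracked, exists only in the working directory

git add notes.txt                # 2. Index: the snapshot is staged for the next commit
git diff --cached --name-only    # lists what is staged: notes.txt

git commit -q -m "Add notes"     # 3. Object database: the snapshot is stored as a commit
git cat-file -t HEAD             # prints "commit": HEAD names an object in the database

echo "second draft" > notes.txt  # edit again: the working directory now differs
git diff --stat                  # working directory vs. index
```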

Github: A Distributed Version Control Example

[Diagram: a server computer and two clients, Computer A and Computer B, each holding a full version database with Versions 1, 2, and 3.]
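The diagram says that every copy holds the full history, and you can check this locally. In this sketch a plain local directory stands in for the server; in practice Github would host it. The repository and file names are made up.

```shell
base=$(mktemp -d)

# "Server computer": a repository with three versions of one file.
git init -q "$base/server"
cd "$base/server"
git config user.name "Example User"
git config user.email "user@example.com"
for v in 1 2 3; do
  echo "Version $v" > report.txt
  git add report.txt
  git commit -q -m "Version $v"
done

# "Computer A" and "Computer B" each clone the server.
git clone -q "$base/server" "$base/computer-a"
git clone -q "$base/server" "$base/computer-b"

# Every clone carries the complete history, not just the latest file.
git -C "$base/computer-a" log --oneline
git -C "$base/computer-b" rev-list --count HEAD
```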

Git - "Origin"

The "origin" remote is created automatically when you clone a repository.

It is the default remote used for pushing and pulling.

There is nothing special about "origin"; it is just a default name.
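This minimal sketch shows both points: the clone creates origin, and the name can be changed. It assumes Git is installed. A local directory stands in for the Github repository, and the new name "github" is an arbitrary example.

```shell
base=$(mktemp -d)

# A stand-in "remote" repository with one commit.
git init -q "$base/upstream"
git -C "$base/upstream" config user.name "Example User"
git -C "$base/upstream" config user.email "user@example.com"
echo "hello" > "$base/upstream/README.md"
git -C "$base/upstream" add README.md
git -C "$base/upstream" commit -q -m "Initial commit"

# Cloning creates a remote called "origin" pointing back at the source.
git clone -q "$base/upstream" "$base/mycopy"
cd "$base/mycopy"
git remote -v        # origin <path to upstream> (fetch) and (push)

# "origin" is only a default name; rename it like any other remote.
git remote rename origin github
git remote          # now lists: github
git fetch github    # fetching and pushing now use the new name
```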

User Account


[Screenshot: Rebecca Bilbro's Github profile (rebeccabilbro; Washington DC; joined Sep 13, 2014) showing her organizations. It lists popular repositories (xbus-503-ipython-demos, calendar, capstone, Colonials, dashboards) and repositories contributed to (DistrictDataLabs/Blogs, CommerceData/recordtagger, CommerceData/newexporters, DistrictDataLabs/trinket, and a sql-tutorial repository). It also shows a contributions graph.]

Repo

[Screenshot: the Github page for rebeccabilbro's repository "A tour of ROC curves," with 19 commits, 1 branch, 0 releases, and 1 contributor. The master-branch file list (data, figures, .DS_Store, .gitignore, LICENSE, README.md, classi.py, ingest.py, roc.py) shows each entry's latest commit message and age, followed by the rendered README.md.]

Command Line

Shifting to the command line


Windows: On Windows, we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files," type powershell.

• Hit Enter.

Mac OSX: For Mac OSX, you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type terminal.

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock.

Now you have your Terminal open, and it's in your Dock so you can get to it.

Where am I?

[Screenshots: Mac OSX Terminal and Windows PowerShell]

What's my name?

[Screenshots: Mac OSX Terminal and Windows PowerShell]
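The screenshots for these two slides did not survive. They probably showed the usual commands, and this sketch assumes so: pwd answers "Where am I?" and whoami answers "What's my name?". Both names also work in Windows PowerShell (pwd is an alias there, and whoami is a Windows program). The sketch below uses the Mac shell.

```shell
pwd        # where am I? prints the current working directory
cd ~       # jump to your home directory
pwd        # now prints your home directory
whoami     # what's my name? prints the user you are logged in as
```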

Make a directory

Windows PowerShell:

> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
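Only the PowerShell version of this slide survived. On a Mac the first three commands are identical. The last one needs the -p flag, because without it mkdir refuses to create a directory whose parents do not exist yet. PowerShell's mkdir creates those parents for you. Here is a sketch in a disposable scratch directory:

```shell
scratch=$(mktemp -d)   # practice somewhere disposable
cd "$scratch"

mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things

# Without -p, mkdir fails if any parent in the path is missing;
# with -p it also creates frank, joe, and alex along the way.
mkdir -p temp/stuff/things/frank/joe/alex/john

ls temp/stuff/things   # frank
```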

Change between directories

Windows PowerShell:

> cd temp
> pwd

Mac OSX Terminal:

$ cd temp
$ pwd
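Going one step past the slide, cd also accepts multi-level relative paths, ".." for the parent directory, and absolute paths. This is a sketch in a scratch directory, with the temp/stuff names reused from the previous slide.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/temp/stuff"
cd "$scratch"

cd temp/stuff        # relative path: two levels down at once
pwd                  # .../temp/stuff
cd ..                # ".." is the parent directory
pwd                  # .../temp
cd ..
pwd                  # back to the scratch directory
cd "$scratch/temp"   # absolute path: works from anywhere
pwd
```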

List files and directories

Windows PowerShell:

> dir

Mac OSX Terminal:

$ ls

Make an empty file

Windows PowerShell:

> cd temp
> New-Item iamcool.txt -type file
> dir

Mac OSX Terminal:

$ cd temp
$ touch iamcool.txt
$ ls
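Here is a small sketch of the Mac side in a scratch directory. It also confirms that the new file really is empty (zero bytes).

```shell
scratch=$(mktemp -d)
mkdir "$scratch/temp"
cd "$scratch/temp"

touch iamcool.txt    # creates an empty file (or updates its timestamp if it exists)
ls                   # iamcool.txt
ls -l iamcool.txt    # the size column shows 0
```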

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
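This is a minimal, self-contained sketch of a merge conflict and its resolution, and it is not the workshop's own material. It assumes Git is installed. Two branches edit the same line of a hypothetical config.txt, the merge stops with a conflict, and the conflict is resolved by hand. The branch name "teammate" and the values are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "threshold = 0.5" > config.txt
git add config.txt
git commit -q -m "Set initial threshold"
main=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on your Git setup

# Teammate 1 changes the line on a branch...
git checkout -q -b teammate
echo "threshold = 0.7" > config.txt
git commit -q -am "Raise threshold"

# ...while teammate 2 changes the same line on the main branch.
git checkout -q "$main"
echo "threshold = 0.3" > config.txt
git commit -q -am "Lower threshold"

# The merge stops with a conflict and writes both versions into the file.
git merge teammate || echo "Merge stopped: resolve the conflict, then commit."
cat config.txt   # shows <<<<<<<, =======, and >>>>>>> markers

# Resolve by editing the file to the agreed content, then stage and commit.
echo "threshold = 0.6" > config.txt
git add config.txt
git commit -q -m "Merge teammate: settle on threshold 0.6"
```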

[Diagram: Compare and Revert. Four locations (working directory, index, local repo, Github repo) are connected by arrows labeled with commands, including diff --cached, fetch, and checkout HEAD.]
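You can try the diagram's commands end to end. This is a minimal sketch, assuming Git is installed. A local directory stands in for the Github repo, and model.py is a hypothetical file. "Revert" here is taken to mean throwing away uncommitted changes with git checkout HEAD -- <file>. The separate git revert command works differently: it records a new commit that undoes an earlier one.

```shell
base=$(mktemp -d)
git init -q "$base/github"                      # stand-in for the Github repo
git -C "$base/github" config user.name "Example User"
git -C "$base/github" config user.email "user@example.com"
echo "v1" > "$base/github/model.py"
git -C "$base/github" add model.py
git -C "$base/github" commit -q -m "v1"

git clone -q "$base/github" "$base/local"       # local repo + working directory
cd "$base/local"

# Compare
echo "v2" > model.py
git diff                          # working directory vs. index
git add model.py
git diff --cached                 # index vs. last commit (HEAD)

# Revert uncommitted work: restore the last committed version
git checkout HEAD -- model.py     # resets both the index and the working-directory copy

# Fetch: download a teammate's new commit without touching your files
echo "v2 from a teammate" > "$base/github/model.py"
git -C "$base/github" commit -q -am "v2"
git fetch -q origin
git diff HEAD "origin/$(git symbolic-ref --short HEAD)"   # what the teammate changed
```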

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on Github ("A startup within DOC focused on building data products with and for the bureaus," Washington DC), listing 20 people and 4 teams. Repositories shown include DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite, "The website for the Commerce Data Service - A startup within the Department of Commerce"), ITA_Principal_Travel (CSS), and Commerce_Data_Academy_Courses ("Course materials offered by the Commerce Data Academy").]

Waffle

[Screenshot: a Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Backlog cards include "Better Licensing," "username check," "Dataset Searching," "Dataset Overwrite," "500 error on upload w/ missing col/row values," "AJAXify the uploader," "3D tours," "Sampling technique for bigger datasets," and "Feature nomination tool for visualization." Other visible cards include "Data file uploading" and "Research Auto-analysis Feature." Cards carry labels such as priority: medium, type: feature, type: bug, and type: technical debt. Issues closed in the last week are shown in the Done column.]

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 52: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Github

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 53: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 54: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Github

A remote git repository

A website

provides secure access

provides repository metadata amp reports

provides tools for development teams

Launched April 10 2008

~10 million users in 2015

COMMERCE DATA SERVICE

0 bull

0 0 bull bull

Non-local git repositories are called ldquoremotesrdquo

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository. GitHub issues such as "Better Licensing", "Dataset Searching", "Async Upload with Celery", and "Large files hang uploader" are arranged in Backlog, Ready, In Progress, and Done columns and labeled by priority, type (feature, bug, technical debt), and version.]

Pair programming

Make your own waffle

Communication: Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
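A commit message can have a one-line summary followed by a longer explanation. Passing `-m` more than once makes each value a separate paragraph of the message. This is a sketch in a throwaway repository; the file name and the message text are hypothetical.

```shell
#!/bin/sh
# Sketch: a commit with a short summary line and an explanatory body.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "id,value" > data.csv                 # hypothetical data file
git add data.csv
# First -m: what changed, in a few words. Second -m: why it changed.
git commit -q -m "Add sample data file" \
  -m "Gives the ingest step a small fixture to test against."

git log -1 --format=%s                     # summary line only
git log -1 --format=%b                     # body only
```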

Why?

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on GitHub

README.md

.gitignore

fixtures/

requirements.txt
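One way to start a project that follows these conventions is sketched below. The project name `my-project`, the sample file in `fixtures/`, the `.gitignore` patterns, and the packages listed in `requirements.txt` are all assumptions chosen for illustration, not a prescribed template.

```shell
#!/bin/sh
# Sketch: scaffold a project following the folder conventions above.
set -e
cd "$(mktemp -d)"
mkdir my-project && cd my-project          # hypothetical project name

# README.md: what the project is and how to use it.
printf '# my-project\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: generated or machine-specific files Git should not track.
printf '%s\n' '__pycache__/' '*.pyc' '.DS_Store' '.ipynb_checkpoints/' > .gitignore

# fixtures/: small sample data for tests and demos.
mkdir fixtures
printf 'id,value\n1,42\n' > fixtures/sample.csv

# requirements.txt: Python dependencies, one per line (example packages).
printf '%s\n' 'pandas' 'matplotlib' > requirements.txt

git init -q
git status --short                         # the new files show up as untracked
```

Anyone who clones the repository can then install the dependencies with `pip install -r requirements.txt`.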

Where to go from here

Additional Tutorials: http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources: GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

GitHub Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Non-local Git repositories are called "remotes".

Git - "Places"

Object Database: where Git stores every commit, along with its metadata and the file snapshots it records

Index (Staging Area): file snapshots to be included in the next commit

Working Directory: the "physical" files on a computer
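The sketch below walks one hypothetical file, `report.txt`, through these three places in a throwaway repository. The comments note which place each step touches.

```shell
#!/bin/sh
# Sketch: follow report.txt (hypothetical) from the working directory,
# through the index, into the object database.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "draft" > report.txt      # 1. exists only in the working directory
git status --short             #    "?? report.txt" means untracked

git add report.txt             # 2. the index records a snapshot (blob) of the file
git ls-files --stage           #    index entry: mode, blob id, stage, path

git commit -q -m "Add report"  # 3. a commit object is written to the object database
git cat-file -t HEAD           #    prints "commit"
git cat-file -p HEAD           #    tree id, author, committer, and message
```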

GitHub: A Distributed Version Control example

[Diagram: a server computer and two client computers (Computer A and Computer B), each holding its own complete version database with Versions 1, 2, and 3.]
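To see that every clone carries the full version database, this sketch uses a bare repository in a local directory as a stand-in for the server (GitHub plays that role in practice). The names `server.git`, `computer-a`, `computer-b`, and `main` are hypothetical.

```shell
#!/bin/sh
# Sketch: one "server" repository and two complete clones of it.
set -e
cd "$(mktemp -d)"
git init -q --bare server.git                     # stand-in for the repo on GitHub
# Name the branch explicitly so the sketch does not depend on Git's default.
git --git-dir=server.git symbolic-ref HEAD refs/heads/main

git clone -q server.git computer-a 2>/dev/null    # hides the "empty repository" warning
cd computer-a
git symbolic-ref HEAD refs/heads/main
git config user.name "Computer A"
git config user.email "a@example.com"
for v in 1 2 3; do                                # record Versions 1, 2, and 3
  echo "version $v" > analysis.txt
  git add analysis.txt
  git commit -q -m "Version $v"
done
git push -q origin main                           # copy the history to the server
cd ..

git clone -q server.git computer-b                # Computer B receives every version
git -C computer-b log --oneline                   # readable without contacting the server
```

Because `computer-b` holds the whole history locally, it could itself serve as a backup if the server copy were lost.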

Git - "Origin"

The "origin" remote is automatically created when you clone.

It is the default remote used for pushing and pulling.

There is nothing special about "origin"; it is just a default name.
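A sketch showing that `origin` is just a name. It clones a local stand-in repository, lists its remotes, and then renames `origin`. The directory names and the new name `upstream` are hypothetical choices.

```shell
#!/bin/sh
# Sketch: "origin" is created by git clone and is an ordinary, renamable remote.
set -e
cd "$(mktemp -d)"
git init -q --bare shared.git                 # stand-in for a GitHub repository
git clone -q shared.git my-copy 2>/dev/null   # clone creates the "origin" remote
cd my-copy

git remote -v                                 # origin <path> (fetch) and (push)
git config --get remote.origin.url            # the location "origin" stands for
git remote rename origin upstream             # the name itself carries no meaning
git remote                                    # prints "upstream"
```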

User Account

[Screenshot: Rebecca Bilbro's GitHub profile page (rebeccabilbro, Washington DC, joined Sep 13, 2014), showing her organizations, popular repositories, repositories contributed to (for example DistrictDataLabs/trinket and several CommerceData repositories), and a year-long contributions graph.]

Repo

[Screenshot: a GitHub repository page for "A tour of ROC curves" (19 commits, 1 branch, 1 contributor), with Code, Issues, Pull requests, Wiki, Pulse, Graphs, and Settings tabs, a file listing (data, figures, .DS_Store, .gitignore, LICENSE, README.md, ingest.py, roc.py, and others) showing each item's latest commit message, and the rendered README.md below it.]

Command Line

Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type "powershell".

• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type "terminal".

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options -> Keep In Dock.

Now you have your Terminal open and it's in your Dock, so you can get to it.

Mac OSX Terminal

Windows PowerShell

Where am I?

Mac OSX Terminal

Windows PowerShell

What's my name?

Mac OSX Terminal

Windows PowerShell
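The screenshots for these two slides did not survive extraction. Assuming they showed the usual commands, the Mac OSX Terminal answers are `pwd` and `whoami`. PowerShell also accepts `pwd` (an alias for `Get-Location`), and Windows ships its own `whoami` command.

```shell
#!/bin/sh
# Where am I? Print the full path of the current working directory.
pwd
# What's my name? Print the user name you are logged in as.
whoami
```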

Make a directory

Windows PowerShell:

> mkdir temp
> mkdir temp/stuff
> mkdir temp/stuff/things
> mkdir temp/stuff/things/frank/joe/alex/john
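The slide's text preserves only the PowerShell prompt for this exercise. On Mac OSX the last line needs `mkdir -p`, because plain `mkdir` will not create the missing parent directories `frank`, `joe`, and `alex` (PowerShell's `mkdir` creates them automatically). This sketch runs in a temporary directory so it leaves nothing behind.

```shell
#!/bin/sh
# Mac OSX Terminal version of the "Make a directory" exercise.
set -e
cd "$(mktemp -d)"
mkdir temp
mkdir temp/stuff
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p also creates frank, joe, alex
ls -R temp                                        # show the whole tree
```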

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 56: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Object Database

where git stores metadata about each commit

Index Staging Area

file snapshots to be included in next commit

Working Directory

the ldquophysicalrdquo files on a computer

Git - ldquoPlacesrdquo

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 57: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

Server Computer

Version Database

Version 3

Version 2

Version 1

Computer A Computer B

Version Database Version Database

Version 3 Version 3

Version 2 Version 2

Version 1 Version 1

Github A Distributed Version Control example

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 58: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

The ldquooriginrdquo remote is automatically created when you clone

It is the default remote to use for pushing and pulling

There is nothing special about ldquooriginrdquo it is just a default name

Git - ldquoOriginrdquo

User Account

bull bullbull bull bullbull bull bull bull

bull bull bull bull bull bull bull bull bull

bullbullbull bull bull bullbullbullbull bullbull bull bullbullbull bullbull bull bullbullbull bullbullbull bull bullbull

bull bull

COMMERCE DATA SERVICE

0 Search GitHub

Rebecca Bilbro rebeccabilbro

) Washington DC

C9 Joined on Sep 13 2014

17 11 39 Followers Starred Following

Organizations

MObullu O

Pull requests Issues Gist

Edit profile[plusmn]Contributions Q Repositories 3 Public activity

v Popular repositories Repositories contributed to

xbus-503-ipython-demos 8 DistrictDataLabsBlogs o o

v Demonstration code for XBUS-503 Data Wran Data Science related biogs for DDL

calendar Q CommerceData recordtagger o o

Building a simple Python application - Calenda NOAA metadata record tagger that implement

v capstone 8 CommerceData newexporters o o

v Capstone project as part of Data Analysis certi building a predictive model for new exporters

Colonials Q DistrictDataLabsltrinket o 3

v GT Colonials Multidimensional data explorer and visualizatio

dashboards Q georgetown-an sql-tutorial o 1

Responsive dashboard templates for Bootstrap A brief tutorial on SOL with Python (using SQL

Contributions

Mar Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb

M bull bull bull bull bullbull bull

bullbullbullbullbullbullw bull bull bullbullbull bull bull

Summary of pull requests issues opened and commits Learn how we count contributions Less bull More

Repo

COMMERCE DATA SERVICE

0 This repository Search Pull requests Issues Gist

iJ rebeccabilbro I orlo 0Unwatchbull Star VFork

ltgtCode CD Issues 4 1 Pull requests o Wiki -+- Pulse 1J Graphs 0 Settings

A tour of ROC curves - Edit

iLl 19 commits ii 1 branch V O releases 1 contributor

Branch mastermiddot New pull request New file Upload files Find file SSHmiddot gi tg i thub com r ebeccabill Download ZIP

bull rebeccabilbro added method to guess the label column

ii data starting to flesh out bulk ingest method for UGI data

ii figures added precision recall image

~ DS_Store basic implementation of roe curve plotter

~ gitignore basic implementation of roe curve plotter

~ LICENSE Initial commit

~ READMEmd added plotting template to readme

~ classipy added method to guess the label column

~ ingestpy added randomizer to ingest

~ rocpy basic implementation of roe curve plotter

Latest commit 382b9ca 4 days ago

16 days ago

19 days ago

9 days ago

9 days ago

19 days ago

9 days ago

4 days ago

9 days ago

9 days ago

lillJ READMEmd

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

[Screenshot: Waffle.io board for DistrictDataLabs/trinket with Backlog, Ready, In Progress, and Done columns. Issue cards such as "Better Licensing", "username check", "Dataset Searching", "Async Upload with Celery", and "Large files hang uploader" carry labels for type (feature, bug, technical debt), priority, and version 0.3]

Pair programming

Make your own waffle

Communication

Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
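One way to make a commit message more helpful is to add a body. `git commit` accepts `-m` more than once, and each value becomes a separate paragraph of the message. That lets a short summary line be followed by an explanation of why the change was made. The sketch below runs in a throwaway repository. The file name, identity, and message text are hypothetical.

```shell
#!/usr/bin/env bash
set -e
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "label,value" > data.csv            # hypothetical file
git add data.csv
# First -m: a short summary line; second -m: a body explaining why
git commit -q -m "Add sample data file with header row" \
  -m "The ingest script expects a header, so start from a minimal CSV."

git log -1 --format='%s'   # prints just the summary line
```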

Why

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
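The sketch below scaffolds a new project folder that follows these conventions. The project name, README text, and .gitignore patterns are illustrative assumptions, not taken from the slides. Git does not track empty directories, so a placeholder file is added inside fixtures; the .gitkeep name is a common habit, not a Git feature.

```shell
#!/usr/bin/env bash
set -e
cd "$(mktemp -d)"
mkdir my-project && cd my-project   # hypothetical project name

# README.md: what the project is and how to run it
printf '# my-project\n\nWhat this project does and how to run it.\n' > README.md

# .gitignore: files Git should never track (example patterns)
printf '.DS_Store\n__pycache__/\n*.pyc\n' > .gitignore

# fixtures/: small, shareable sample data for tests and demos
mkdir fixtures
touch fixtures/.gitkeep   # placeholder so Git records the folder

# requirements.txt: the Python packages the project depends on
touch requirements.txt

ls -A
```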

Where to go from here

Additional Tutorials
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources
Github Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

Page 59: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

User Account

[Screenshot: Github profile page for Rebecca Bilbro (rebeccabilbro), Washington DC, joined Sep 13 2014, showing organizations, popular repositories, repositories contributed to (including CommerceData/recordtagger and DistrictDataLabs/trinket), and a year of contributions]

Repo

[Screenshot: a repository page in rebeccabilbro's account, "A tour of ROC curves". It shows 19 commits, 1 branch, 0 releases, and 1 contributor. The listing includes the data and figures folders and files such as .gitignore, LICENSE, README.md, ingest.py, and roc.py, each shown with its last commit message (e.g. "Initial commit", "added randomizer to ingest")]

Command Line

Shifting to the command line


Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:

• Click Start.

• In "Search programs and files", type powershell.

• Hit Enter.

Mac OS X: For Mac OS X you'll need to do this:

• Hold down COMMAND and hit the spacebar.

• In the top right, the blue search bar will pop up.

• Type terminal.

• Click on the Terminal application that looks kind of like a black box.

• This will open Terminal.

• You can now go to your Dock and CTRL-click to pull up the menu, then select Options → Keep in Dock.

Now you have Terminal open, and it's in your Dock so you can get to it.

Where am I?

Mac OS X Terminal / Windows PowerShell

What's my name?

Mac OS X Terminal / Windows PowerShell
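The command text for these two slides did not survive extraction. As an assumption, here are two standard commands that answer these questions in the Mac OS X Terminal. Both names also work in PowerShell: `pwd` is an alias for Get-Location, and Windows ships a `whoami` program.

```shell
#!/usr/bin/env bash
set -e
pwd      # "Where am I?": print the full path of the current directory
whoami   # "What's my name?": print the user name you are logged in as
```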

Page 63: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Command Line

Shifting to the command line

COMMERCE DATA SERVICE

Windows On Windows were going to use PowerShell People used to work with a program called cmdexe but it s not nearly as usable as

PowerShell If you have Windows 7 or later do this

bull Click Start

bull In Search programs and files type powershell

bull Hit Enter

Mac OSX For Mac OSX youll need to do this

bull Hold down COMMAND and hit the spacebar

bull In the top right the blue search bar will pop up

bull Type terminal

bull Click on the Terminal application that looks kind of like a black box

bull This will open Terminal

bull You can now go to your Dock and CTRL-click to pull up the menu then select Options-gtKeep In Dock

Now you have your Terminal open and its in your Dock so you can get to it

Mac OSX Terminal

Windows Powershell

Where am I

Mac OSX Terminal

Windows Powershell

Whatrsquos my name

Mac OSX Terminal

Windows Powershell

Make a directory

gt mkdir temp gt mkdir tempstuff gt mkdir tempstuffthings gt mkdir tempstuffthingsfrankjoealexjohn gt

Mac OSX Terminal

Windows Powershell

Change between directories

gt cd temp gt pwd gt

$ cd temp $ pwd $

Mac OSX Terminal

Windows Powershell

List files and directories

gt dir gt

$ ls $

Mac OSX Terminal

Windows Powershell

Make an empty file

gt cd temp gt New-Item iamcooltxt -type file gt dir gt

$ cd temp $ touch iamcooltxt $ ls $

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

[Screenshot: a Waffle.io board for the DistrictDataLabs/trinket repository, with Backlog, Ready, In Progress, and Done columns. Each card is a Github issue labeled by priority, by type (feature, bug, technical debt, question, task), and by version. Example cards include:
• "Better Licensing"
• "username check"
• "Dataset Searching"
• "Dataset Overwrite"
• "500 error on upload w/ missing col/row values"
• "AJAXify the uploader"
• "Sampling technique for bigger datasets"
• "Feature nomination tool for visualization"
• "Data file uploading"
• "Implement beta auto analysis"
• "Async Upload with Celery"
• "Dimension Histograms and Ranking"
• "Large files hang uploader"
• "Upload Error: line contains NULL byte"
The Done column shows issues closed in the last week; dragging an issue there closes it.]

Pair programming

Make your own waffle

Communication

Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
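One way to make a commit message more helpful is to write a short summary line followed by a body that explains why the change was made. Each extra `-m` becomes its own paragraph in the commit message. The sketch below uses a throwaway repository, and the file and message text are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q msg-demo
cd msg-demo
git config user.name "Demo User"          # placeholder identity for this demo only
git config user.email "demo@example.com"

echo "id,value" > results.csv
git add results.csv
# First -m: the summary line; second -m: a body paragraph explaining why
git commit -q -m "Add baseline results for sampling experiment" \
              -m "Generated from a 10% sample so the team can review the format before the full run."

git log --oneline        # one line per commit: short hash + summary
git log -1 --format=%B   # the full message of the latest commit
```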

Why

Why do data scientists need version control?

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on Github

README.md

.gitignore

fixtures

requirements.txt
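This sketch builds the conventional layout named above in a throwaway repository. The following details are illustrative assumptions, not rules from the course:
• the file contents
• the `pandas` requirement
• the `data/` folder
• the `.gitignore` patterns

`fixtures` typically holds small sample files that tests can rely on.

```shell
set -e
cd "$(mktemp -d)"
mkdir my-project
cd my-project
git init -q

echo "# My Project" > README.md          # what the project does and how to run it
echo "pandas" > requirements.txt         # Python packages: pip install -r requirements.txt
mkdir fixtures
echo "id,value" > fixtures/sample.csv    # small, shareable sample data for tests

# Patterns Git should never track: Python caches, notebook checkpoints, raw data
printf '%s\n' '__pycache__/' '*.pyc' '.ipynb_checkpoints/' 'data/' > .gitignore
mkdir data
echo "1,2" > data/raw.csv                # stands in for a large local dataset

git add .
git status --short   # README.md, requirements.txt, fixtures/, .gitignore; no data/
```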

Where to go from here

Additional Tutorials
• http://pcottle.github.io/learnGitBranching
• http://rogerdudler.github.io/git-guide
• http://www.tutorialspoint.com/git

Resources
• Github Desktop: https://desktop.github.com
• TortoiseGit: https://tortoisegit.org
• Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
• Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
• Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
• Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
• Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
• Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
• Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config


Shifting to the command line

Windows: On Windows we're going to use PowerShell. People used to work with a program called cmd.exe, but it's not nearly as usable as PowerShell. If you have Windows 7 or later, do this:
• Click Start.
• In "Search programs and files" type "powershell".
• Hit Enter.

Mac OSX: For Mac OSX you'll need to do this:
• Hold down COMMAND and hit the spacebar.
• In the top right the blue search bar will pop up.
• Type "terminal".
• Click on the Terminal application that looks kind of like a black box.
• This will open Terminal.
• You can now go to your Dock and CTRL-click to pull up the menu, then select Options->Keep In Dock.

Now you have your Terminal open, and it's in your Dock so you can get to it.

Where am I?
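The commands for this slide did not survive in this transcript. This is a minimal sketch that assumes the slide covered `pwd` (print working directory). `pwd` works in both Mac OSX Terminal and PowerShell, where it is an alias for `Get-Location`. If you get lost, `cd ~` takes you back to your home folder.

```shell
pwd    # print working directory: where am I right now?
cd ~   # lost? jump back to your home directory
pwd    # now prints your home directory
```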

What's my name?
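This slide's command is also missing from the transcript. Assuming it showed `whoami`, here is a one-line sketch. `whoami` is available in Mac OSX Terminal. On Windows it is available as `whoami.exe`, which prints your user name prefixed by the computer or domain name.

```shell
whoami   # prints the user name you are logged in as
```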

Make a directory

Windows PowerShell:
> mkdir temp
> mkdir temp\stuff
> mkdir temp\stuff\things
> mkdir temp\stuff\things\frank\joe\alex\john
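The Mac OSX Terminal half of this slide is missing here, so this is a sketch of the equivalent Terminal commands, run in a scratch folder. On a Mac, plain `mkdir` will not create missing parent folders, so the deep path needs `-p`. PowerShell's `mkdir` creates the parent folders for you.

```shell
cd "$(mktemp -d)"   # scratch folder so nothing of yours is touched

mkdir temp
mkdir temp/stuff               # Mac paths use / where PowerShell uses \
mkdir temp/stuff/things
mkdir -p temp/stuff/things/frank/joe/alex/john   # -p creates any missing parents
```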

Page 71: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Make an empty file

Mac OS X (Terminal):
$ cd temp
$ touch iamcool.txt
$ ls

Windows (PowerShell):
> cd temp
> New-Item iamcool.txt -type file
> dir
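Once the empty file exists, you can ask Git whether it sees the file. Below is a minimal sketch for a POSIX shell such as the Mac Terminal. It assumes Git is installed and on the PATH, and it uses a throwaway directory from `mktemp` in place of the slide's `temp` folder. The same `git` commands also work unchanged in PowerShell.

```shell
# Work in a throwaway directory so nothing real is touched
# (assumption: a mktemp directory stands in for the slide's "temp" folder).
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Turn the directory into a Git repository (creates a hidden .git folder).
git init -q

# Make the empty file, exactly as on the slide.
touch iamcool.txt

# Short-format status: "??" marks a file Git can see but is not tracking yet.
git status --short
```

This should print `?? iamcool.txt`. If you then run `git add iamcool.txt`, the file moves into the index (staging area), and the status line starts with `A` instead.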

Zed Shaw's book

Let's use what we've learned

Merge Conflict Workshop (20 minutes): http://bit.ly/xbus501-workshop-git
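To rehearse a merge conflict offline, the sketch below manufactures one in a scratch repository and then resolves it. Two branches change the same line, the merge stops, and you write the agreed version, stage it, and commit. The file name, the contents, and the `feature` branch name are all hypothetical. None of them come from the workshop materials.

```shell
# Scratch repository with a local-only identity (hypothetical name/email).
REPO="$(mktemp -d)"
cd "$REPO"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Shared starting point; rename the default branch to "main".
echo "Hello from the Commerce Data Academy" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
git branch -M main

# One teammate edits the line on a feature branch...
git checkout -q -b feature
echo "Hello from the feature branch" > greeting.txt
git commit -q -am "Reword greeting on feature"

# ...while someone else edits the same line on main.
git checkout -q main
echo "Hello from main" > greeting.txt
git commit -q -am "Reword greeting on main"

# The merge stops with a conflict and exits non-zero, so handle that case.
if ! git merge feature; then
  echo "Conflict detected; unmerged files:"
  git diff --name-only --diff-filter=U
fi

# Resolve: write the version the team agrees on, stage it, then commit.
echo "Hello from main and the feature branch" > greeting.txt
git add greeting.txt
git commit -q -m "Merge feature, combining both greetings"
```

In a real conflict, Git writes `<<<<<<<`, `=======` and `>>>>>>>` markers into the file. Here the sketch simply overwrites the file, but normally you would edit those markers away by hand.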

COMMERCE DATA SERVICE

[Diagram: how data moves between the working directory, the index, the local repo, and the GitHub repo, with arrows labeled revert, diff --cached, fetch, checkout HEAD, and compare.]
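The sketch below walks a change through the areas in the diagram:

- `git add` moves a change from the working directory to the index.
- `git commit` moves it from the index to the local repo.
- `git push` sends it on to the remote, and `git fetch` brings remote commits back into the local repo.
- `git diff` compares the working directory with the index, and `git diff --cached` compares the index with the last commit.
- `git checkout HEAD -- <file>` restores a file from the last commit.
- `git revert` adds a new commit that undoes an earlier one.

To keep the example runnable offline, it assumes a local bare repository as a stand-in for the GitHub repo. The paths, file name, and messages are hypothetical.

```shell
# A local bare repository plays the role of the GitHub (remote) repo.
BASE="$(mktemp -d)"
git init -q --bare "$BASE/github.git"
git clone -q "$BASE/github.git" "$BASE/local"   # warns that the repo is empty; that is expected
cd "$BASE/local"
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt                    # working directory -> index
git diff --cached --stat             # compare index with the last commit (none yet)
git commit -q -m "Add notes"         # index -> local repo
git branch -M main
git push -q origin main              # local repo -> remote repo

echo "v2" > notes.txt
git diff                             # compare working directory with the index
git checkout HEAD -- notes.txt       # discard the edit by restoring the last commit

echo "v3" > notes.txt
git commit -q -am "Bump notes to v3"
git revert --no-edit HEAD            # new commit that undoes the previous one

git fetch -q origin                  # remote repo -> local repo (remote-tracking refs)
git log --oneline origin/main..main  # local commits the remote does not have yet
```

At the end, `notes.txt` holds `v1` again, and the local repo is two commits ahead of the stand-in remote.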

Teamwork (makes the dream work)

Organization

[Screenshot: the Commerce Data Service organization page on GitHub ("A startup within DOC focused on building data products with and for the bureaus", Washington DC, http://www.commerce.gov). It lists 20 people, 4 teams, and repositories including:
- DataService_WebSite (JavaScript, forked from timwood/DataCorps_WebSite): "The website for the Commerce Data Service - A startup within the Department of Commerce"
- ITA_Principal_Travel (CSS)
- Commerce_Data_Academy_Courses: "Course materials offered by the Commerce Data Academy"]

Waffle

[Screenshot: the Waffle.io board for the DistrictDataLabs/trinket repository, with Backlog, Ready, In Progress, and Done columns.

Each card is a GitHub issue. Labels give the type (feature, bug, technical debt, question, task), the priority, and the version (e.g. Version 0.3). Example cards include "Better Licensing", "Dataset Searching", "AJAXify the uploader", "Async Upload with Celery", "Large files hang uploader", and "Upload Error: line contains NULL byte".

The Done column shows issues closed in the last week, and dragging an issue there closes it.]

Pair programming

Make your own waffle

Communication

Commit Messages

git commit -m "try to be as helpful as possible"

(To your team and to future you)
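One common way to follow that advice is a short summary line, then a blank line, then a longer explanation. If you pass `-m` more than once, Git turns each value into its own paragraph of the message. The sketch below shows this; the repository, file, and message wording are hypothetical.

```shell
# Scratch repository with a local-only identity (hypothetical name/email).
REPO="$(mktemp -d)"
cd "$REPO"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'col_a,col_b\n1,2\n' > sample.csv
git add sample.csv

# First -m: the summary line. Second -m: the body, stored as a separate paragraph.
git commit -q \
  -m "Add small sample CSV for the parser tests" \
  -m "Two columns and one row keep the fixture small enough to read in a diff."

# Show only the summary line, the part one-line history views display.
git log -1 --format=%s
```

Tools like `git log --oneline` and GitHub's commit lists show only the summary line. That is why the summary should make sense on its own, to your team and to future you.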

Why

Why do data scientists need version control?

Data Ingestion

Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into the data science pipeline?

Folder structure conventions on GitHub

README.md

.gitignore

fixtures

requirements.txt
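Here is a minimal sketch that scaffolds those conventional top-level files in a new project folder. The project name, the single dependency, and the `.gitignore` patterns are illustrative choices for a Python data project, not a prescribed template. The `.gitkeep` file name is only a community convention, needed because Git does not track empty folders.

```shell
# Hypothetical project folder inside a throwaway directory.
PROJECT="$(mktemp -d)/my-data-project"
mkdir -p "$PROJECT/fixtures"           # small sample data for tests and examples
cd "$PROJECT"

printf '# my-data-project\n\nWhat this project does and how to run it.\n' > README.md
printf 'pandas\n' > requirements.txt   # one dependency per line; pip install -r requirements.txt

# Keep generated files and local secrets out of the repository.
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
.ipynb_checkpoints/
.env
EOF

git init -q
touch fixtures/.gitkeep                # placeholder so the empty folder is tracked
git add .
git status --short
```

The status output should list `.gitignore`, `README.md`, `fixtures/.gitkeep`, and `requirements.txt` as added. Any `*.pyc` files created later are ignored.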

Where to go from here

Additional tutorials:
http://pcottle.github.io/learnGitBranching
http://rogerdudler.github.io/git-guide
http://www.tutorialspoint.com/git

Resources:
GitHub Desktop: https://desktop.github.com
TortoiseGit: https://tortoisegit.org
Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf
Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository
Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
GitHub Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration
Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config

Page 72: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Zed Shawrsquos book

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 73: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Letrsquos use what wersquove learned

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 74: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Merge Conflict Workshop (20 minutes)httpbitlyxbus501-workshop-git

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 75: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

working local GlthubIndexdirectory repo repo

Revert

diff-cached

fetch

checkout HEAD

Compare

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 76: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Teamwork(makes the dream work)

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 77: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Organization

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 78: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

0 This organization Search Pull requests Issues Gist

Commerce Data Service A startup within DOC focused on building data products with and for the bureaus

Washington DC httpJlwwwcommercegov datadocgov

IQ Repositories People 20 l Teams 4

Fiiters ~ Q Find a repository +New repository People 20 gt

DataService_ WebSite JavaScript 1 V4

IV forked from timwoodDataCorps_ WebSite

The website for the Commerce Data Service - A startup within the Department of

Commerce

Updated 19 hours ago

ITA_Principal_ Travel css o Vamp Updated a day ago

Commerce_Data_Academy _Courses Course materials offered by the Commerce Data Academy

Updated a day ago

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 79: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

Waffle

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials httppcottlegithubiolearnGitBranching

httprogerdudlergithubiogit-guide

httpwwwtutorialspointcomgit

Resources Git Desktop httpsdesktopgithubcom

TortoiseGit httpstortoisegitorg

Git Cheat Sheet httpstraininggithubcomkitdownloadsgithub-git-cheat-sheetpdf

Getting Started httpsgit-scmcombookenv2Getting-Started-About-Version-Control

Basics httpsgit-scmcombookenv2Git-Basics-Getting-a-Git-Repository

Branching httpsgit-scmcombookenv2Git-Branching-Branches-in-a-Nutshell

Github Setup httpsgit-scmcombookenv2GitHub-Account-Setup-and-Configuration

Git Tools httpsgit-scmcombookenv2Git-Tools-Revision-Selection

Git Commands httpsgit-scmcombookenv2Git-Commands-Setup-and-Config

Page 80: Commerce Data Academy: Intro to Github and Git...the Git website. Just go to http:llgit-scm.comldownloadlwin and the download will start automatically. Note that this is a project

COMMERCE DATA SERVICE

DistricDatalabstrinket

Backlog

S6

Better Licensing

- typefeature

SS

username check

priority medium type bug

so Dataset Searching

priority medium type feature

Dataset Overwrite

- type technicaldebt

4S

500 error on upload w missing col row values

AJAXify the uptoader

priority medium type featurC

middotmiddotOM

0

0

0 ~

0 ~ () ~

() ~ bull38

3Dtours

0 ~

37

Sampling technique for bigger datasets

0 ~

Feature nomination tool for visualization

Ready In Progress Done bull S4 14

Data file uploading Research Auto-analysis Feature Done lversiono31 - typefeature 0 IVersion 03 I priority medium question task 0~ bull Issues closed in the last week are shown in this 43 10

column Drag issues here to close them Implement beta auto analysis Dropdown Dataset Edit Form

lversiono31 - typefeature () middotbull IVersion 03 I priority medium type feature

~ Async Upload with Celery

IVersion 03 I priority medium type feature middotbull 13

Dimension Histograms and Ranking 10

IVersion 03 I priority medium type feature 0 middotbull Large files hang uploader

type buglversiono3IBlll () ~ Upload Error l ine contains NULL byte

IVersiono3 IBlll type bug () ~ 36

Pair programmingMake your own waffle

CommunicationCommit Messages

git commit -m ldquotry to be as helpful as possiblerdquo

(To your team and to future you)

Why

Why do data scientists need version control

Data Ingestion Data Munging and Wrangling

Computation and Analyses

Modeling and Application

Reporting and Visualization

Where does version control fit into thedata science pipeline

Folder structure conventions on Github

READMEmd

gitignore

fixtures

requirementstxt

Where to go from here

Additional Tutorials:

http://pcottle.github.io/learnGitBranching

http://rogerdudler.github.io/git-guide

http://www.tutorialspoint.com/git

Resources:

GitHub Desktop: https://desktop.github.com

TortoiseGit: https://tortoisegit.org

Git Cheat Sheet: https://training.github.com/kit/downloads/github-git-cheat-sheet.pdf

Getting Started: https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

Basics: https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Branching: https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

Github Setup: https://git-scm.com/book/en/v2/GitHub-Account-Setup-and-Configuration

Git Tools: https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection

Git Commands: https://git-scm.com/book/en/v2/Git-Commands-Setup-and-Config
