ETC1010: Data Modelling and Computing
Week of introduction: Rmarkdown
Professor Di Cook & Dr. Nicholas Tierney
EBS, Monash U.
2019-08-02


What is this song?


Recap

Traffic Light System

Red Post-it

I need a hand
Slow down

Green Post-it

I am up to speed
I have completed the thing


R essentials: A short list (for now)

Functions are (most often) verbs, followed by what they will be applied to in parentheses:

    do_this(to_this)
    do_that(to_this, to_that, with_those)

Columns (variables) in data frames are accessed with $:

    dataframe$var_name

Packages are installed with the install.packages function and loaded with the library function, once per session:

    install.packages("package_name")
    library(package_name)
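As a small illustration of these three ideas together, the sketch below uses the built-in airquality data frame (the same one used later in these notes). The variable names temps, avg_temp and avg_temp_rounded are made up for this example.

```r
# Packages are loaded with library(); datasets ships with R, so no
# install.packages() call is needed for this sketch.
library(datasets)

# Columns are accessed with $: pull the Temp column out of airquality
temps <- airquality$Temp

# do_this(to_this): apply the verb "mean" to the column
avg_temp <- mean(temps)

# do_that(to_this, with_those): round the mean to 1 decimal place
avg_temp_rounded <- round(avg_temp, digits = 1)

print(avg_temp_rounded)
```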


Today: Outline

- Why we care about Reproducibility
- R + markdown = Rmarkdown
- Controlling output and input of rmarkdown
- Exercises on creating rmarkdown reports on the humble platypus
- Form up assignment groups
- Quiz
- Release assignment (later today)


We are in a tight spot with reproducibility


Only 6 out of 53 landmark results could be reproduced

-- Amgen, 2014*

* Heard via Garrett Grolemund's great talk


An estimated 75% - 90% of preclinical results cannot be reproduced

-- Begley, 2015*

* Heard via Garrett Grolemund's great talk


Estimated annual cost of irreproducibility for biomedical industry = 28 Billion USD

-- Freedman, 2015*

* Heard via Garrett Grolemund's great talk


So what can we do about it?


Reproducibility checklist

Near-term goals:

- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear why it was done? (e.g., how were parameter settings chosen?)


Reproducibility checklist

Long-term goals:

- Can the code be used for other data?
- Can you extend the code to do other things?


Literate programming is a partial solution

Literate programming shines some light on this dark area of science.

An idea from Donald Knuth where you combine your text with your code output to create a document.

A blend of your literature (text), and your programming (code), to create something you can read from top to bottom.


So

Imagine a report:

Introduction, methods, results, discussion, and conclusion,

and all the bits of code that make each section.

With rmarkdown, you can see all the pieces of your data analysis all together.

Markdown as a new player to legibility

In 2004, John Gruber, of Daring Fireball, created markdown, a simple way to create text that rendered into a HTML webpage.


- bullet list
- bullet list
- bullet list

bullet list
bullet list
bullet list


1. numbered list
2. numbered list
3. numbered list

__bold__, **bold**, _italic_, *italic*

> quote of something profound

1. numbered list
2. numbered list
3. numbered list

bold, bold, italic, italic

quote of something profound


With very little marking up, we can create rich text that actually resembles the text that we want to see.


Learn to use markdown

In your small groups, spend five minutes working through markdowntutorial.com

Rmarkdown helps complete the solution to the reproducibility problem

Q: How do we combine markdown + R code into a "literate programming environment"?

A: Rmarkdown


Rmarkdown...

Provides an environment where you can write your complete analysis, and marries your text and code together into a rich document.

You write your code as code chunks, put your text around that, and then hey presto, you have a document you can reproduce.


Reminder: You've already used rmarkdown!


How will we use R Markdown?

- Every assignment + project is an R Markdown document
- You'll always have a template R Markdown document to start with
- The amount of scaffolding in the template will decrease over the semester
- These lecture notes are created using R Markdown (!)


The anatomy of an rmarkdown document

There are three parts to an rmarkdown document.

- Metadata (YAML)
- Text (markdown formatting)
- Code (code formatting)

DEMO

Metadata: YAML (YAML Ain't Markup Language)

The metadata of the document tells you how it is formed - what the title is, what date to put, and other control information.

If you're familiar with LaTeX, this is similar to how you specify document type, styles, fonts, options, etc. in the front matter / preamble.

Metadata: YAML

Rmarkdown documents use YAML to provide the metadata. It looks like this:

---
title: "An example document"
author: "Nicholas Tierney"
output: html_document
---

It starts and ends with three dashes ---, and has fields like the following: title, author, and output.

Text

Is markdown, as we discussed in the earlier section.

It provides a simple way to mark up text:

1. bullet list
2. bullet list
3. bullet list

1. bullet list
2. bullet list
3. bullet list

Code

We refer to code in an rmarkdown document in two ways:

1. Code chunks, and
2. Inline code.


Code: Code chunks

Code chunks are marked by three backticks and curly braces with r inside them:

```{r chunk-name}
# a code chunk
```


A backtick is a special character you might not have seen before. It is typically located under the tilde key (~). On USA / Australia keyboards, it is under the escape key.


Code: Inline code

Sometimes you want to run the code inside a sentence. This is called running the code "inline".

You might want to run the code inline to name the number of variables or rows in a dataset in a sentence like:

There are XXX observations in the airquality dataset, and XXX variables.


Code: Inline code

You can call code "inline" like so:

There are `r nrow(airquality)` observations in the airquality dataset, and `r ncol(airquality)` variables.

Which gives you the following sentence:

There are 153 observations in the airquality dataset, and 6 variables.
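To see where those two numbers come from, the sketch below runs the same two calls in plain R and pastes the results into the sentence, which is roughly what knitting does for inline code. The names n_obs, n_vars and sentence are made up for this example.

```r
# Compute the two numbers that the inline `r ...` calls would insert
n_obs  <- nrow(airquality)   # number of rows (observations)
n_vars <- ncol(airquality)   # number of columns (variables)

# Build the sentence by hand, the way the knitted document would read
sentence <- paste0(
  "There are ", n_obs, " observations in the airquality dataset, and ",
  n_vars, " variables."
)
print(sentence)
```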

Code: Inline code

What's great about this is that if your data changes upstream, then you don't need to work out where you mentioned your data, you just update the document.


Your Turn: Put it together

Go to rstudio.cloud and

- open the document "01-oz-atlas.Rmd"
- knit the document
- change the data section at the top to be from a different state instead of "New South Wales"
- knit the document again

How do the text and figures in the document change?

Break

Code: Chunk names

Straight after the ```{r you can use a text string to name the chunk:

    ```{r read-crime-data}

```{r read-crime-data}
crime <- read_csv("data/crime-data.csv")
```


Code: Chunk names

Naming code chunks has three advantages:

1. Navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor.
2. Graphics produced by chunks now have useful names.
3. You can set up networks of cached chunks to avoid re-performing expensive computations on every run.


Code: Chunk names

Every chunk should ideally have a name. Naming things is hard, but follow these rules and you'll be fine:

1. One word that describes the action (e.g., "read")
2. One word that describes the thing inside the code (e.g., "gapminder")
3. Separate words with "-" (e.g., read-gapminder)

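The naming rules for chunks can be sketched as a tiny helper. This function is hypothetical and written only for illustration; it is not part of knitr or rmarkdown.

```r
# Hypothetical helper (not part of knitr or rmarkdown): build a chunk
# name from an action word and a thing word, separated by "-".
make_chunk_name <- function(action, thing) {
  paste(tolower(action), tolower(thing), sep = "-")
}

make_chunk_name("read", "gapminder")  # "read-gapminder"
make_chunk_name("plot", "crime")      # "plot-crime"
```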

Code: Chunk options

You can control how the code is output by changing the code chunk options which follow the title.

```{r read-gapminder, eval=FALSE, echo=TRUE}
gap <- read_csv("gapminder.csv")
```

What do you think this does?


Code: Chunk options

The code chunk options you need to know about right now are:

- cache: TRUE / FALSE. Do you want to save the output of the chunk so it doesn't have to run next time?
- eval: TRUE / FALSE. Do you want to evaluate the code?
- echo: TRUE / FALSE. Do you want to print the code?
- include: TRUE / FALSE. Do you want to include code output in the final output document? Setting to FALSE means nothing is put into the output document, but the code is still run.

You can read more about the options at the official documentation: https://yihui.name/knitr/options/#code-evaluation
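One way to check what eval and echo do is to knit a tiny document from inside R and look at the result. The sketch below assumes the knitr package is installed. The two chunk names (show-only, run-only) and the toy arithmetic are invented for this example.

```r
# Assumes the knitr package is installed; nothing runs if it is missing.
if (requireNamespace("knitr", quietly = TRUE)) {
  fence <- strrep("`", 3)  # three backticks, built here to keep the example tidy

  # A two-chunk document: the first chunk is shown but not run,
  # the second is run but its code is hidden.
  doc <- c(
    paste0(fence, "{r show-only, eval=FALSE, echo=TRUE}"),
    "1 + 1",
    fence,
    "",
    paste0(fence, "{r run-only, eval=TRUE, echo=FALSE}"),
    "2 + 3",
    fence
  )

  # knit() with text= returns the knitted markdown as a character string
  out <- knitr::knit(text = doc, quiet = TRUE)
  cat(out)
}
```

In the printed result, the first chunk should appear as code with no result under it, and the second should appear as a result with no code above it.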


Your turn

- Go to rstudio.cloud, open document 01-oz-atlas.Rmd and change the document so that the code output is hidden, but the graphics are shown. (Hint: Google "rstudio rmarkdown cheatsheet" for some tips!)
- Re-knit the document.
- Take a look at the R Markdown Gallery.


Global options: Set and forget

You can set the default chunk behaviour once at the top of the .Rmd file using a chunk like:

    knitr::opts_chunk$set(echo = FALSE, cache = TRUE)

then you will only need to add chunk options when you have the occasional one that you'd like to behave differently.
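To confirm that the defaults really changed, you can read them back in the console. This is a minimal sketch assuming the knitr package is installed. The variable name current is made up, and the restore step at the end is only there so the example leaves your session as it found it.

```r
# Assumes the knitr package is installed; nothing runs if it is missing.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Set the defaults that every later chunk will inherit
  knitr::opts_chunk$set(echo = FALSE, cache = TRUE)

  # Look up the current defaults to confirm the change
  current <- knitr::opts_chunk$get(c("echo", "cache"))
  print(current)

  # Put knitr's built-in defaults back (handy in an interactive session)
  knitr::opts_chunk$restore()
}
```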

Your turn

- Go to your 01-oz-atlas.Rmd document on rstudio.cloud and change the global settings at the top of the rmarkdown document to echo = FALSE, and cache = TRUE
- Update the other code chunks by removing the code chunk options.


DEMO

The many different outputs of rmarkdown


Your turn: Different types of documents

1. Change the output of your current R Markdown file to produce a Word document. Now try to produce pdf - this may not work! That's OK, we don't need it right now.
2. Create a new document that will produce a slide show: File > New R Markdown > Presentation
3. Create a flexdashboard document - see this option in the File > New R Markdown > From template list.
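Besides editing the output field or using the Knit button, the output type can also be chosen from the console. The sketch below assumes the rmarkdown package and pandoc are installed and that 01-oz-atlas.Rmd (the exercise file from earlier) is in the working directory; it does nothing if any of these is missing.

```r
# Assumptions: rmarkdown and pandoc are installed, and the exercise file
# 01-oz-atlas.Rmd is in the working directory.
input_file <- "01-oz-atlas.Rmd"

can_render <- file.exists(input_file) &&
  requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # output_format selects the output type for this call
  rmarkdown::render(input_file, output_format = "word_document")
  rmarkdown::render(input_file, output_format = "html_document")
}
```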


Your Turn: Making the groups

We are going to set up the groups for doing assignment work.

1. Choose a quote from the bag.
2. Find the other people in the class with the same quote as you.
3. Grab your gear and claim a table to work together at.


Your Turn: Ask your team mates these questions:

1. What is one food you'd never want to taste again?
2. If you were a comic strip character, who would you be and why?

LASTLY, come up with a name for your team and tell this to a tutor, along with the names of members of the team.


Your Turn

- Go to rstudio.cloud and open oz-atlas-final.Rmd
- Read through the document and add text where prompted to learn more about the Australian native platypus!


Recap:

- There is a Reproducibility Crisis
- rmarkdown = YAML + text + code
- rmarkdown has many different output types
- Platypus are interesting!
- Assignment will be announced later today


Learning more:

Keep the R Markdown cheat sheet and Markdown Quick Reference (Help -> Markdown Quick Reference) handy; we'll refer to them often as the course progresses.


Lab quiz

Take the quiz for today from ED.


Share and share alike

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
