Finding and Accessing Human Genomics Datasets
TRANSCRIPT
![Page 1: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/1.jpg)
We are always looking for data
Finding & Accessing
Human Genomic Datasets
CRUK, 7th November 2016
Tweets welcome: #CamFindData @repositiveio
![Page 2: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/2.jpg)
Outline of the day
- Data sources and data access
- Case study: University of Cambridge
- Coffee break
- Introduction to Repositive
- Hands-on session: searching for data
- Round up and closure
![Page 3: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/3.jpg)
On-line tools used during the workshop
To ask questions during the presentation and answer questions:
go to slido.com
enter event code: 7315
![Page 5: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/5.jpg)
Genome Technology Evolution
• 2001: First Human Genome Sequence
• 2005: Personal Genome Project
• 2008: UK10K
• 2013: UK100K Project
• 2015: 1M Precision Medicine US
• 2016: AstraZeneca–HLI 2M
• Many other national and international projects
![Page 6: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/6.jpg)
OPPORTUNITY
• Consensus among researchers, clinicians, politicians & the public that genomics will transform biomedical research, healthcare and lifestyle choices (Stephan Beck, UCL)
![Page 7: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/7.jpg)
Data should be made available
![Page 8: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/8.jpg)
Public Repositories
• Required by funders
• Cannot publish unless accession number given
• Specialised
  • ENA
  • EGA
  • dbGaP
  • dbSNP…
• Generalist
  • Dryad
  • figshare
![Page 9: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/9.jpg)
GOVERNANCE Models
• Open Access
  • E.g. PGP, CC0
  • Bermuda Accord
• Managed (Restricted or Controlled Access)
  • Data Access Committee
  • No effective agreement (policy vacuum)
• Global Alliance for Genomics & Health
  • enable compatible, readily accessible, and scalable approaches for sharing
![Page 10: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/10.jpg)
Open vs Managed Access
Open Access: 75,000,000 per month
Managed Access: 150 per month
500,000-fold difference (Stephan Beck, UCL)
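The fold difference quoted on this slide can be checked with a couple of lines of arithmetic. This is only an illustrative sketch: the two monthly figures are taken from the slide as given, and the variable names are hypothetical.

```python
# Monthly figures as quoted on the slide (attributed to Stephan Beck, UCL).
open_access_per_month = 75_000_000
managed_access_per_month = 150

# Integer division is exact here because 75,000,000 is a multiple of 150.
fold_difference = open_access_per_month // managed_access_per_month

print(f"{fold_difference:,}-fold difference")  # prints "500,000-fold difference"
```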
![Page 11: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/11.jpg)
Large amounts of data, but not accessible
[Chart, 2006 to 2015: 80+ PB sequenced genome data available in public repos, ≈0.5 PB open access; exponential growth rate]
Under-utilised data has huge potential for medical research
![Page 12: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/12.jpg)
Access to Managed Data
Benefits:
• Strict governance
• Individuals are protected
• Review of consent
• Applicant signs for full responsibility for governance
Disadvantages:
• No control of data once access is given
• High barrier for access – too high?
![Page 13: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/13.jpg)
Often a long process
Bottlenecks:
• Finding relevant and usable data
• Getting authorisation to access data
• Formatting data
• Storing and moving data
We studied the problem with qualitative interviews followed by a survey of researchers in human genetics
T. A. van Schaik et al., The need to redefine genomic data sharing: a focus on data accessibility, Applied & Translational Genomics, 2014. http://tinyurl.com/schaik-dnadigest
![Page 14: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/14.jpg)
dbGaP application process
[Flowchart. Decision points (yes/no): NIH / eRA Commons login; organisation registered with eRA; organisation has DUNS number. Steps: write research proposal; submit proposal; access granted; find/download/decrypt data; science… Delays shown on the chart, in extraction order: + 2-3 days, + 1-2 weeks, + 1 week, + 1-2 days, + 1-4 weeks, + 1-2 days.]
PRO Tip: If you use human genomic data, apply for the GRU (General Research Use) datasets in dbGaP: one application gives access to all the GRU datasets.
Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/
![Page 15: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/15.jpg)
EGA application process
[Flowchart. Decision point (yes/no): Sanger eDAM account. Steps: write research proposal; submit proposal; access granted; find/download/decrypt data; science… Delays shown on the chart, in extraction order: + 1 hour, + 1-2 days, + 2-7 days, + 1-2 days.]
Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-ega/
![Page 16: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/16.jpg)
FACTS
• Finding specific relevant genomic data for research can take up to six months for an untrained researcher without dedicated tools
• Application & response time for data access committees can vary widely depending on
  • the type of dataset
  • consent regulations of the study
• => there is no consensus for the ‘contracts’ between each dataset
![Page 17: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/17.jpg)
Researchers often choose to not access data at all
![Page 18: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/18.jpg)
WHY should we bother?
![Page 19: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/19.jpg)
Why datasets are useful
• Validate existing studies
• Avoid unnecessary duplication
• Compare to new studies
• Enhance new datasets
![Page 20: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/20.jpg)
Case studies
Raquel, PhD Student, London, UK.
Researching genes associated with rare eye disorders.
Problems:
- Doesn’t know where to look for data.
- Doesn't know if data even exists.
“I gave up on finding the data - it was very time consuming and not proving fruitful – so I started focusing more on generating my own data.”
![Page 21: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/21.jpg)
Case studies
Mahantesh, Academic Researcher, Taipei, Taiwan.
Studying pharmacogenomics in cardiovascular epidemiology.
Problems:
- Needs lots of data.
- Knows it exists but struggles with getting access to it.
“Often it’s very hard to get the required number of cases and controls to carry out research in public health and epidemiology.”
![Page 22: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/22.jpg)
Case studies
Jana, Company Biocurator, Zurich, Switzerland.
Biocurating microarray and RNA-Seq data.
Problems:
- Needs lots of data.
- Lots of data out there but hard to filter down to ‘useful/relevant’ data.
“Many repositories don’t list the metadata details I need to know if a dataset is useful to me, I can waste a lot of time searching.”
![Page 23: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/23.jpg)
How many data sources?
How many sources of human genomics data do you know about?
![Page 24: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/24.jpg)
Data sources across the globe
Geolocation of 278 data sources analysed.
Found by tracking IP address of the source.
These include:
• Public Repositories
• Universities
• Companies
• BioBanks
• Research consortiums
![Page 25: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/25.jpg)
Data source content
Assay Types
Dedicated to…
![Page 26: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/26.jpg)
DATA is fragmented
![Page 27: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/27.jpg)
Hundreds of data sources… but they aren’t easy to find!
First 30 data sources listed here: http://tinyurl.com/plos-biology-repositive
[Chart: number of data sources over time, Jan-15 to Jun-16 (Jan-15, Mar-15, Jun-15, Sep-15, Dec-15, Mar-16, Jun-16); y-axis 0 to 300; plotted values 10, 25, 33, 35, 102, 174, 239]
![Page 28: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/28.jpg)
Cambridge specific Case Study
![Page 29: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/29.jpg)
Cambridge specific Case Study
• Postdoctoral researcher at University of Cambridge Medical School
• Working on genetic inheritance and cancer
• Using NGS data and bioinformatics
• After searching for data online she decided to apply for:
  • 2 dbGaP datasets
  • 3 EGA datasets
Blog Post: Pending… will be on http://blog.repositive.io/
![Page 30: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/30.jpg)
Cambridge specific Case Study
The Research Operations Office will help you with the contracts (Data Transfer Agreements, DTAs) and signatures.
• Has a designated individual who processes all dbGaP applications, as they all abide by NIH legal restrictions and regulations about how to handle the data once granted access
• For EGA applications, each DTA must be processed separately because there is no consensus for the ‘contracts’ between each dataset.
![Page 31: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/31.jpg)
Cambridge specific Case Study
The nominated IT director will be specific to your department.
• They will need to confirm you can support the requirements of the DTA.
• If the head of your departmental IT is not happy to sign, the head of IT for the University will be able to sign it off.
![Page 32: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/32.jpg)
Cambridge specific Case Study
Top Tips:
• Think about your storage space!
• Think about what sort of analysis and processing you are going to do with the data once you do have it. After such a long process, the approval could be too quick.
• Understand what you need before you start the application process!
• You may have access for a limited period
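The storage tip above can be turned into a quick pre-download check. The following is a minimal sketch rather than a recommended procedure: the function name `has_enough_space`, the 2 TB dataset size and the `headroom` factor (extra room for decrypted and processed copies) are all illustrative assumptions, not figures from the workshop. Only Python's standard `shutil.disk_usage` is used.

```python
import shutil


def has_enough_space(path: str, dataset_bytes: int, headroom: float = 2.0) -> bool:
    """Return True if the filesystem holding `path` can fit the dataset.

    `headroom` is a hypothetical multiplier that reserves extra room for
    decrypted and intermediate files produced during analysis.
    """
    required = dataset_bytes * headroom
    free = shutil.disk_usage(path).free  # free bytes on the target filesystem
    return free >= required


TB = 10**12  # decimal terabyte

# Hypothetical example: a 2 TB download into the current directory.
print(has_enough_space(".", 2 * TB))
```

Running a check like this before submitting an application gives time to arrange extra storage while the request is being reviewed.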
![Page 33: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/33.jpg)
COFFEE BREAK
Back in 10’
![Page 34: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/34.jpg)
@repositiveio
![Page 35: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/35.jpg)
1-click to human genomic data access
to make finding data as easy as finding a book on Amazon or booking a hotel on Expedia!
![Page 36: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/36.jpg)
Simpler workflow for data access
Our expertise is data search platforms
• Discover and access
• Search, see related results
• Find colleagues & their data interests
• Co-annotate data & community feedback
![Page 37: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/37.jpg)
We are enabling best practices
MAKE DATA DISCOVERABLE
SIMPLIFY WORKFLOWS
CONTRIBUTE TO COMMUNITY
DNAdigest and Repositive – Connecting the world of genomic data
http://www.tinyurl.com/plos-biology-repositive
![Page 38: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/38.jpg)
Connecting the world of genomic data
![Page 39: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/39.jpg)
![Page 40: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/40.jpg)
Hands on
1. Form groups of 2-3 people
2. Select a leader & a spokesperson
3. Choose 1 data theme you are interested in
   E.g. colon cancer, prostate cancer, breast cancer
4. Sign up at https://discover.repositive.io/
5. Search Repositive with your selected theme
![Page 41: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/41.jpg)
Team presentation: 2 minutes
1. Introduction
   What data did you try to find and why? Have you tried to search for this data before?
2. Methods
   The 5 main steps you took on Repositive to try and find this data.
3. Results
   Did you find the data on Repositive? What challenges did you encounter?
4. Conclusion
   Sum up your experience in 1 sentence.
![Page 42: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/42.jpg)
Feedback on the workshop
Bugs and feedback to: Charlotte at Repositive.io
Please leave your feedback on the workshop:
http://tinyurl.com/feedback280916
![Page 43: Finding and Accessing Human Genomics Datasets](https://reader034.vdocuments.us/reader034/viewer/2022042611/58738b8d1a28ab272d8b6b93/html5/thumbnails/43.jpg)
http://discover.repositive.io @repositive