myGrid - putting the scientist at the centre
A case study investigating Williams-Beuren Syndrome
The scientist’s (Hannah’s) problem
Chr 7 ~155 Mb
~1.5 Mb
7q11.23
CTA-315H11
CTB-51J22
‘Gap’
Physical Map
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
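The excerpt above follows the GenBank flat-file convention: coordinate numbers followed by blocks of ten bases. As a minimal sketch, assuming that convention, the text can be joined into one contiguous sequence by dropping the numeric tokens. The function name `join_genbank_lines` is hypothetical and is not part of myGrid or Taverna.

```python
def join_genbank_lines(text):
    """Join GenBank-style sequence text into (start_coordinate, sequence)."""
    tokens = text.split()
    # Numeric tokens are coordinates; every other token is a block of bases.
    coords = [int(t) for t in tokens if t.isdigit()]
    bases = "".join(t for t in tokens if not t.isdigit())
    if not coords:
        raise ValueError("no coordinate found")
    return coords[0], bases.upper()


# First two lines of the excerpt shown on the slide.
excerpt = """
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
"""
start, seq = join_genbank_lines(excerpt)
print(start, len(seq), seq[:10])
```

Because the sketch splits on any whitespace, it works whether the excerpt is laid out one coordinate per line or run together on a single line.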
1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at nucleotide and amino acid level
Cutting and pasting between numerous web-based services, e.g. BLAST and InterProScan
A B C
The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
Recording Architecture
GI        Accession    Score  Description
19747251  AC005089.3   831    Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
15145617  AC073846.6   815    Homo sapiens BAC clone RP11-622P13 from 7, complete sequence
15384807  AL365366.20  46.1   Human DNA sequence from clone RP11-553N16 on chromosome 1, complete sequence
7717376   AL163282.2   44.1   Homo sapiens chromosome 21 segment HS21C082
16304790  AL133523.5   44.1   Human chromosome 14 DNA sequence BAC R-775G15 of library RPCI-11 from chromosome 14 of Homo sapiens (Human), complete sequence
34367431  BX648272.1   44.1   Homo sapiens mRNA; cDNA DKFZp686G08119 (from clone DKFZp686G08119)
5629923   AC007298.17  44.1   Homo sapiens 12q22 BAC RPCI11-256L6 (Roswell Park Cancer Institute Human BAC Library) complete sequence
34533695  AK126986.1   44.1   Homo sapiens cDNA FLJ45040 fis, clone BRAWH3020486
20377057  AC069363.10  44.1   Homo sapiens chromosome 17, clone RP11-104J23, complete sequence
4191263   AL031674.1   44.1   Human DNA sequence from clone RP4-715N11 on chromosome 20q13.1-13.2 Contains two putative novel genes, ESTs, STSs and GSSs, complete sequence
17977487  AC093690.5   44.1   Homo sapiens BAC clone RP11-731I19 from 2, complete sequence
17048246  AC012568.7   44.1   Homo sapiens chromosome 15, clone RP11-342M21, complete sequence
14485328  AL355339.7   44.1   Human DNA sequence from clone RP11-461K13 on chromosome 10, complete sequence
5757554   AC007074.2   44.1   Homo sapiens PAC clone RP3-368G6 from X, complete sequence
4176355   AC005509.1   44.1   Homo sapiens chromosome 4 clone B200N5 map 4q25, complete sequence
2829108   AF042090.1   44.1   Homo sapiens chromosome 21q22.3 PAC 171F15, complete sequence
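Workflow A looks for new sequence that overlaps a clone already on the physical map. In the hit list above, two hits score far higher than the rest. The sketch below shows one way to pick out such candidates. It assumes that the number after each accession is the BLAST score, that CTA-315H11 (AC005089.3) is the clone already known, and that a cut-off of 100 separates genuine overlaps from background. The cut-off and the function name `new_overlaps` are illustrative choices, not values taken from the slides.

```python
# A few hits copied from the list above as (gi, accession, score, description).
hits = [
    ("19747251", "AC005089.3", 831.0,
     "Homo sapiens BAC clone CTA-315H11 from 7, complete sequence"),
    ("15145617", "AC073846.6", 815.0,
     "Homo sapiens BAC clone RP11-622P13 from 7, complete sequence"),
    ("15384807", "AL365366.20", 46.1,
     "Human DNA sequence from clone RP11-553N16 on chromosome 1, complete sequence"),
    ("7717376", "AL163282.2", 44.1,
     "Homo sapiens chromosome 21 segment HS21C082"),
]


def new_overlaps(hits, known_accessions, min_score=100.0):
    """Keep high-scoring hits that are not already on the physical map."""
    return [h for h in hits
            if h[2] >= min_score and h[1] not in known_accessions]


# Assumption: AC005089.3 (CTA-315H11) is the clone we started from.
candidates = new_overlaps(hits, known_accessions={"AC005089.3"})
for gi, accession, score, description in candidates:
    print(accession, score, description)
```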
>gi|19747251|gb|AC005089.3| Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
AAGCTTTTCTGGCACTGTTTCCTTCTTCCTGATAACCAGAGAAGGAAAAGATCTCCATTTTACAGATGAGGAAACAGGCTCAGAGAGGTCAAGGCTCTGGCTCAAGGTCACACAGCCTGGGAACGGCAAAGCTGATATTCAAACCCAAGCATCTTGGCTCCAAAGCCCTGGTTTCTGTTCCCACTACTGTCAGTGACCTTGGCAAGCCCTGTCCTCCTCCGGGCTTCACTCTGCACACCTGTAACCTGGGGTTAAATGGGCTCACCTGGACTGTTGAGCG
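The FASTA record above starts with an NCBI-style header of the form `>gi|GI|db|ACCESSION| description`. As a rough sketch, assuming that layout, the identifiers can be pulled out of the header with plain string handling. The helper name `parse_ncbi_defline` is hypothetical.

```python
def parse_ncbi_defline(defline):
    """Split a header of the form '>gi|GI|db|ACCESSION| description'."""
    if not defline.startswith(">"):
        raise ValueError("not a FASTA header")
    ids, _, description = defline[1:].partition(" ")
    fields = ids.strip("|").split("|")
    # Fields alternate tag, value: ['gi', '19747251', 'gb', 'AC005089.3'].
    record = dict(zip(fields[0::2], fields[1::2]))
    record["description"] = description.strip()
    return record


header = (">gi|19747251|gb|AC005089.3| Homo sapiens BAC clone "
          "CTA-315H11 from 7, complete sequence")
record = parse_ncbi_defline(header)
print(record["gi"], record["gb"], record["description"])
```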
[Figure: provenance graph for a workflow run]
Data items: urn:lsid:taverna:datathing:15, urn:lsid:taverna:datathing:13
Types (rdf:type): BLAST_Report, nucleotide_sequence
Data relations: similar_sequences_to, masked_sequence_of, filtered_version_of
Entities: service invocation, workflow invocation, workflow definition, experiment definition, project, person, group, organisation, service description
Entity relations: created_by, described_by, run_during, invocation_of, part_of, works_for, author, run_for
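The labels in the provenance graph above can be read as subject, predicate, object statements. The sketch below stores a few of them as plain tuples and follows them back from a result to the run that produced it. The two LSIDs and the predicate names come from the slide. The direction of each edge, the pairing of LSIDs with types, and the identifiers `service_invocation:1` and `workflow_invocation:1` are assumptions, since the slide image is not part of the transcript.

```python
# Each statement is (subject, predicate, object).
# Assumed: which LSID is the report and which is the query sequence.
BLAST_REPORT = "urn:lsid:taverna:datathing:15"
QUERY_SEQ = "urn:lsid:taverna:datathing:13"

triples = [
    (BLAST_REPORT, "rdf:type", "BLAST_Report"),
    (QUERY_SEQ, "rdf:type", "nucleotide_sequence"),
    (BLAST_REPORT, "similar_sequences_to", QUERY_SEQ),
    # Hypothetical identifiers for the invocations behind the report.
    (BLAST_REPORT, "created_by", "service_invocation:1"),
    ("service_invocation:1", "run_during", "workflow_invocation:1"),
]


def objects(subject, predicate):
    """Return every object linked to subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def trace_origin(item):
    """Follow created_by, then run_during, to find the run behind an item."""
    runs = []
    for invocation in objects(item, "created_by"):
        runs.extend(objects(invocation, "run_during"))
    return runs


print(objects(BLAST_REPORT, "rdf:type"))
print(trace_origin(BLAST_REPORT))
```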
The myGrid Information Model: Annotation & argumentation
Using workflows and web services
• Automation
– Capturing processes in an explicit manner
– Tedium! Computers don’t get bored/distracted/hungry/impatient!
– Saves repeated time and effort
• Modification, maintenance, substitution and personalisation
• Easy to share, explain, relocate, reuse and build
• Available to wider audience: don’t need to be a coder, just need to know how to do Bioinformatics
• Releases Scientists/Bioinformaticians to do other work
• Record
– Provenance: what the data is like, where it came from, its quality
– Management of data (LSID - Life Science IDentifiers)
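Life Science Identifiers take the form `urn:lsid:authority:namespace:object`, with an optional revision as a final field. A minimal sketch of splitting one into its parts, using an identifier of the kind Taverna assigns to data items. The names `LSID` and `parse_lsid` are illustrative and do not come from any myGrid library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into fields."""
    parts = text.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


lsid = parse_lsid("urn:lsid:taverna:datathing:15")
print(lsid.authority, lsid.namespace, lsid.object_id)
```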
Demonstration topics
• Taverna – using a workflow editing environment to capture bioinformatics protocols
• Personalisation – setting context to allow later personalisation
• Provenance – retaining information on the origin of results
The myGrid Information Model: Programmes, studies & experiments
[Diagram: UML class model]
Classes: InvestigationProgramme, ProgrammeResource, Study, StudyRole, PersonStudyParticipation, StudyParticipationEpisode, LabBookView, ExperimentDesign, ExperimentInstance, Operation, Workflow, WebServiceOperation
Associations: has participants, uses, contains, method, episodes, lab books, participates in, acts in, selected studies, instances, initiates
Multiplicities shown: 1, 0..*
Note on diagram: example operation types
The myGrid Information Model: Provenance metadata
[Diagram: UML class model]
Classes: StudyParticipation, LifeScienceDocument, ActualInputParameter, ActualOutputParameter, WorkflowTrace, WebServiceTrace, OperationTrace, DirectCreation, CreationTypeDataProvenance, InvestigationExperimentInstance
Associations: created via, created by, outputs, inputs, includes, value, has provenance trace, initiates
Multiplicities shown: 1, 0..1, 0..*
Note on diagram: example trace types
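One possible reading of the provenance model above, sketched as Python dataclasses. The class names come from the slide. How the classes connect is an assumption, since the diagram itself is not in the transcript: here a WorkflowTrace is treated as a kind of OperationTrace that includes other traces, and each parameter's value is an identifier for a LifeScienceDocument. The helper `producers` and the example run are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActualInputParameter:
    name: str
    value: str  # assumed: identifier of a LifeScienceDocument


@dataclass
class ActualOutputParameter:
    name: str
    value: str  # assumed: identifier of a LifeScienceDocument


@dataclass
class OperationTrace:
    operation: str
    inputs: List[ActualInputParameter] = field(default_factory=list)
    outputs: List[ActualOutputParameter] = field(default_factory=list)


@dataclass
class WorkflowTrace(OperationTrace):
    # Assumed meaning of the "includes" association.
    includes: List[OperationTrace] = field(default_factory=list)


def producers(trace, document_id):
    """Names of the included operations that output the given document."""
    return [op.operation for op in trace.includes
            if any(out.value == document_id for out in op.outputs)]


# Hypothetical run: one BLAST step inside workflow A.
blast = OperationTrace(
    operation="BLAST",
    inputs=[ActualInputParameter("query", "urn:lsid:taverna:datathing:13")],
    outputs=[ActualOutputParameter("report", "urn:lsid:taverna:datathing:15")],
)
run = WorkflowTrace(operation="Workflow A", includes=[blast])
print(producers(run, "urn:lsid:taverna:datathing:15"))
```

Keeping inputs and outputs on every trace is what lets a result be followed back to the step, and from there to the run, that created it.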