
Upload: bethany-lee

Post on 20-Jan-2016


Page 1

Analysis: Discovery of possible regulatory motifs

Scenario 5

What follows is a simulation of the proposed graphical interface. As you go through the simulation, please consider what capabilities you would want to serve your research and annotation interests.

A narrative to help you go through the simulation appears in a red-bordered box, such as the one below.

To begin:
1. Click on Slide Show (on the upper toolbar)
2. Click View Show
3. Click the Continue button

Page 2

Analysis: Discovery of possible regulatory motifs

Scenario 5

You’ve decided you want to know what regulates the expression of nif genes, which encode the machinery for nitrogen fixation. Here’s your strategy:

• Collect nif genes from Anabaena PCC 7120 into set
• Include in set orthologs of the Anabaena genes
• Extract 5’ sequences from all genes in set
• Analyze set of 5’ sequences for motifs
• (Search for other genes with same motifs)

Page 3

Menu bar: Build set, Display set, Modify set, Set operation

Click on Build set to begin finding orfs with the desired specifications.

Page 4

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in ...

Choose set type (highlighted: All open reading frames of):
• All open reading frames of
• All amino acid sequences of
• All intergenic regions of
• Human-annotated orfs of
• Private set
• Public set

The first goal is to find all open reading frames within Anabaena PCC 7120 annotated as nif genes, so click on All open reading frames of.

Page 5

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of ...

Choose database (highlighted: Anabaena PCC 7120):
• Arthrospira platensis
• Gloeobacter violaceus
• Microcystis aeruginosa
• Nostoc punctiforme
• Nostoc PCC 7120
• Prochlorococcus MED4
• Prochlorococcus MIT9313
• Prochlorococcus S120
• Synechococcus PCC6301
• Synechococcus PCC7942
• Synechococcus WH
• Synechocystis PCC 6803
• Thermosynechococcus
• Trichodesmium
• Unicellular
• Filamentous
• All

Click on Anabaena PCC 7120.

Page 6

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of Anabaena PCC 7120 such that: ...

Specification tools: Variable, Data, Operation, Function, Done

You want to compare the description of each orf with “nif”. To get a tool to extract the description, click on Function.

Page 7

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of Anabaena PCC 7120 such that: ... (item)

Specification tools: Variable, Data, Operation, Function, Done

Choose function (highlighted: Description of):
• Closest ortholog of
• Protein product of
• Upstream region of
• Downstream region of
• Description of
• Category of
• Annotation level of

Click on Description of.

Page 8

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of Anabaena PCC 7120 such that: Description of (item) ...

Specification tools: Variable, Data, Operation, Function, Done

Op (highlighted: includes): =, includes, excludes

You want to find orfs whose description includes the word “nif”. Click on includes.

Page 9

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of Anabaena PCC 7120 such that: Description of (item) includes ...

Specification tools: Data, Operation, Function, Done

Type description term(s): nif

You can type in any characters to search for. For this simulation, the term “nif” is provided. Press the Enter key.

Page 10

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of Anabaena PCC 7120 such that: Description of (item) includes nif

Specification tools: Variable, Data, Operation, Function, Done

No more specifications. Press the Done button.

Page 11

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of Anabaena PCC 7120 such that: Description of (item) includes nif

Done (highlighted: Save only results):
• Save results and script
• Save only results

If this were a complicated search, you might want to save the specifications as a script. In this case, just save the results by clicking on Save only results.

Page 12

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: All items in All open reading frames of Anabaena PCC 7120 such that: Description of (item) includes nif

Type name of set: 7120 nif genes

All orfs of Anabaena whose descriptions include “nif” will be collected into a set. You can name the set anything you want. For this simulation, a name is provided. Press the Enter key.

Page 13

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes

Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, C terminus
Anab7120:all0687 hupL [NiFe] uptake hydrogenase large subunit, N terminus
Anab7120:all0688 hupS [NiFe] uptake hydrogenase small subunit
Anab7120:alr0692 similar to nifU
Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:asr1309 similar to nifU
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:asr1408 nifZ iron-sulfur cofactor synthesis
Anab7120:asr1409 nifT
<< more items >>

This is the result of the search. The set is displayed both as a list of orfs and as a graphical representation of the genetic neighborhood of each orf. You can find out more about an orf by clicking its name or its arrow. For now, just press Continue.

Page 14

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes (list unchanged from Page 13)

This search, like most, is only a beginning. It brought up some unintended hits (“nif” found “NiFe”). More seriously, it brought up many genes that are probably in the middle of operons and unlikely to be preceded by regulatory motifs. The genetic neighborhood gives clues as to operon structure. Select the two orfs most likely to begin operons by clicking on the circles next to alr0874 and alr1407.
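The unintended “NiFe” hits are easy to reproduce outside the interface. What follows is a minimal sketch in Python, not part of the proposed interface: the descriptions are copied from the set display, `includes` assumes that the interface's includes operation is a case-insensitive substring test (which is what the “NiFe” hits suggest), and `names_nif_gene` is a hypothetical stricter filter that requires a word beginning with lowercase “nif” followed by a capital letter.

```python
import re

# Descriptions copied from the "7120 nif genes" set display.
descriptions = {
    "Anab7120:all0688": "hupS [NiFe] uptake hydrogenase small subunit",
    "Anab7120:alr0692": "similar to nifU",
    "Anab7120:alr0874": "nifH2 dinitrogenase reductase",
    "Anab7120:alr1407": "nifV1 homocitrate synthase",
    "Anab7120:asr1409": "nifT",
}


def includes(description, term):
    """Assumed behaviour of 'includes': case-insensitive substring test."""
    return term.lower() in description.lower()


# Hypothetical stricter rule: a word starting with lowercase "nif"
# followed by a capital letter, as in nifH2 or nifU.
NIF_GENE_NAME = re.compile(r"\bnif[A-Z]")


def names_nif_gene(description):
    return NIF_GENE_NAME.search(description) is not None


loose = sorted(orf for orf, d in descriptions.items() if includes(d, "nif"))
strict = sorted(orf for orf, d in descriptions.items() if names_nif_gene(d))

print("includes 'nif':", loose)
print("names a nif gene:", strict)
```

The stricter rule drops the hydrogenase entry but does nothing about the more serious problem, genes in the middle of operons. That still takes a look at the genetic neighborhood.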

Page 15

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes (list unchanged from Page 13)

Let’s suppose you proceed in a like fashion through the rest of the list. Press Done.

Page 16

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes

Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:all1438 nifE nitrogenase Fe/Mo cofactor
Anab7120:all1455 nifH dinitrogenase reductase
Anab7120:all1517 nifB nitrogen fixation protein
Anab7120:alr2968 nifV2 homocitrate synthase

The set now consists of the six Anabaena nif genes that you judged most likely to be preceded by transcriptional signals. It might be interesting to see where this set is located on the genome. To do this, click Display set, then make some room by clicking on Show graphic.

Display set menu:
• Show orf ID
• Show gene name
• Show description
• Show coordinates
• Show graphic
• Show neighbors: +/- 1
• Show map

Page 17

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes (list unchanged from Page 16, graphic hidden)

Replace the space-consuming description with coordinates by clicking on Show description, and then click Show coordinates and finally Show map.

Display set menu: Show orf ID, Show gene name, Show description, Show coordinates, Show graphic, Show neighbors: +/- 1, Show map

Page 18

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes

Anab7120:alr0874 nifH2
Anab7120:alr1407 nifV1
Anab7120:all1438 nifE
Anab7120:all1455 nifH
Anab7120:all1517 nifB
Anab7120:alr2968 nifV2

Replace the space-consuming description with coordinates by clicking on Show description, and then click Show coordinates and finally Show map.

Display set menu: Show orf ID, Show gene name, Show description, Show coordinates, Show graphic, Show neighbors: +/- 1, Show map

Page 19

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes

Anab7120:alr0874 nifH2 1008496 -> 1009389
Anab7120:alr1407 nifV1 1671878 -> 1673011
Anab7120:all1438 nifE 1696389 <- 1697831
Anab7120:all1455 nifH 1713396 <- 1714283
Anab7120:all1517 nifB 1776670 <- 1778097
Anab7120:alr2968 nifV2 3609625 -> 3611012

Replace the space-consuming description with coordinates by clicking on Show description and then Show coordinates, and finally, click on Show map.

Display set menu: Show orf ID, Show gene name, Show description, Show coordinates, Show graphic, Show neighbors: +/- 1, Show map

Page 20

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: 7120 nif genes (list with coordinates unchanged from Page 19)

Map: Anabaena chromosome, 6413771 bp

Four of the six putative nif operons are clustered near 1.7 Mb... but back to business. Our idea was to extend the set to include orthologs in other nitrogen-fixing cyanobacteria. To do this, click Set operation, then Transformations, then Ortholog of.

Set operation menu (highlighted: Transformations):
• Maintenance
• Set operations
• Analysis tools
• Discovery tools
• Transformations

Transformations menu (highlighted: Ortholog of):
• Closest ortholog of
• Protein product of
• Upstream region of
• Downstream region of
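If you want to check the clustering near 1.7 Mb by arithmetic rather than by eye, a few lines of Python will do. This is only a sketch: the coordinates and the chromosome length are taken from the simulated display, while the 1.6 to 1.8 Mb window and the helper `midpoint_mb` are arbitrary choices made for illustration.

```python
CHROMOSOME_LENGTH = 6_413_771  # bp, as shown on the simulated map

# (orf, gene, left coordinate, right coordinate) from the set display.
nif_set = [
    ("alr0874", "nifH2", 1008496, 1009389),
    ("alr1407", "nifV1", 1671878, 1673011),
    ("all1438", "nifE", 1696389, 1697831),
    ("all1455", "nifH", 1713396, 1714283),
    ("all1517", "nifB", 1776670, 1778097),
    ("alr2968", "nifV2", 3609625, 3611012),
]


def midpoint_mb(left, right):
    """Midpoint of an orf in megabases."""
    return (left + right) / 2 / 1_000_000


# Illustrative window around 1.7 Mb; the width is an assumption.
WINDOW_MB = (1.6, 1.8)

clustered = [
    (gene, left, right)
    for _, gene, left, right in nif_set
    if WINDOW_MB[0] <= midpoint_mb(left, right) <= WINDOW_MB[1]
]

# Span from the leftmost to the rightmost base of the clustered orfs.
span_bp = max(r for _, _, r in clustered) - min(l for _, l, _ in clustered)
fraction = span_bp / CHROMOSOME_LENGTH

print([gene for gene, _, _ in clustered])
print(f"cluster spans {span_bp} bp, {fraction:.1%} of the chromosome")
```

With these numbers, nifV1, nifE, nifH and nifB fall inside the window, within a stretch that is less than 2% of the chromosome.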

Page 21

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: Orthologs of ( ...

Choose set type (highlighted: Private set):
• All open reading frames of
• All amino acid sequences of
• All intergenic regions of
• Human-annotated orfs of
• Public set
• Private set

You want the orthologs of the orfs in the set you just made. This set is yours (a private set), as opposed to certain sets that are available to all users. Click Private set.

Page 22

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: Orthologs of (Private set ...

Choose set (highlighted: 7120 nif genes):
• 7120 IS895 seqs
• 7120 nif genes
• 7120 STTR7 regions
• Light-specific genes
• Npun STTR7 regions

The list of choices will consist of whatever sets you may have created. Choose the one you just made: 7120 nif genes.

Page 23

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: Orthologs of (Private set 7120 nif genes in ... )

Choose database (highlighted: Filamentous):
• Arthrospira platensis
• Gloeobacter violaceus
• Microcystis aeruginosa
• Nostoc punctiforme
• Anabaena PCC 7120
• Prochlorococcus MED4
• Prochlorococcus MIT9313
• Prochlorococcus S120
• Synechococcus PCC6301
• Synechococcus PCC7942
• Synechococcus WH8102
• Synechocystis PCC 6803
• Thermosynechococcus
• Trichodesmium erythraeum
• Unicellular
• Filamentous
• All

At present, the set of filamentous cyanobacteria includes just the nitrogen-fixing strains Nostoc punctiforme, Trichodesmium erythraeum, and Anabaena. Click on Filamentous.

Page 24

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: Orthologs of (Private set 7120 nif genes in Filamentous)

Type name of set: all nif genes

All orthologs of the selected nif genes will be combined and saved in a set of your choice. For this simulation, a name is provided. Press the Enter key.
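One way to picture what Orthologs of ( ... in Filamentous) does is as a lookup applied to every member of a set. The Python sketch below rests on several assumptions. The table `ORTHOLOGS` is hypothetical, with Anabaena and Nostoc punctiforme identifiers paired only because they carry the same gene name in the simulated displays. The function `orthologs_of` is invented for illustration. Trichodesmium is left out because the simulation shows no identifiers for it.

```python
# Hypothetical ortholog table, keyed by Anabaena orf ID. Pairings are
# inferred from matching gene names; the proposal does not say how
# orthologs would actually be determined.
ORTHOLOGS = {
    "Anab7120:alr0874": {"NostPunc": "NostPunc:637.025"},  # nifH2
    "Anab7120:alr1407": {"NostPunc": "NostPunc:510.011"},  # nifV1
    "Anab7120:all1438": {"NostPunc": "NostPunc:651.072"},  # nifE
    "Anab7120:all1517": {"NostPunc": "NostPunc:510.021"},  # nifB
}

# Genome prefixes standing in for the "Filamentous" database group.
FILAMENTOUS = ["Anab7120", "NostPunc"]

nif_7120 = [
    "Anab7120:alr0874",
    "Anab7120:alr1407",
    "Anab7120:all1438",
    "Anab7120:all1455",
    "Anab7120:all1517",
    "Anab7120:alr2968",
]


def orthologs_of(orf_set, genomes):
    """Collect, genome by genome, the ortholog of each orf in orf_set.

    An orf counts as its own ortholog within its own genome. Orfs with
    no known ortholog in a genome contribute nothing for that genome.
    """
    result = []
    for genome in genomes:
        for orf in orf_set:
            if orf.split(":")[0] == genome:
                hit = orf
            else:
                hit = ORTHOLOGS.get(orf, {}).get(genome)
            if hit is not None and hit not in result:
                result.append(hit)
    return result


all_nif_genes = orthologs_of(nif_7120, FILAMENTOUS)
for orf in all_nif_genes:
    print(orf)
```

The point of the sketch is that a transformation takes a set and returns a set, so its result can be fed directly into the next operation.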

Page 25

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: all nif genes

Anab7120:alr0874 nifH2 dinitrogenase reductase
Anab7120:alr1407 nifV1 homocitrate synthase
Anab7120:all1438 nifE nitrogenase Fe/Mo cofactor
Anab7120:all1455 nifH dinitrogenase reductase
Anab7120:all1517 nifB nitrogen fixation protein
Anab7120:alr2968 nifV2 homocitrate synthase
NostPunc:637.025 nifH2 dinitrogenase reductase
NostPunc:510.011 nifV1 homocitrate synthase
NostPunc:651.072 nifE nitrogenase Fe/Mo cofactor
NostPunc:510.021 nifB nitrogen fixation protein
<< more items >>

The set now consists of nif genes from all filamentous cyanobacteria. From this set we want to extract the upstream sequences. Click on Set operation, then click on Transformations and Upstream region of.

Set operation menu (highlighted: Transformations):
• Maintenance
• Set operations
• Analysis tools
• Discovery tools
• Transformations

Transformations menu (highlighted: Upstream region of):
• Ortholog of
• Protein product of
• Upstream region of
• Downstream region of

Page 26

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: Upstream region of ( ...

Choose set type (highlighted: Private set):
• All open reading frames of
• Human-annotated orfs of
• Public set
• Private set

Again you want the orfs from a set you made yourself, so click on Private set.

Page 27

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: Upstream region of (Private set ... )

Choose set (highlighted: all nif genes):
• 7120 IS895 seqs
• 7120 nif genes
• 7120 STTR7 regions
• all nif genes
• Light-specific genes
• Npun STTR7 regions

The set you just defined magically appears on the list (no chance for misspelling). Click on it.

Page 28

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: Upstream region of (Private set all nif genes)

Type name of set: all nif genes – 5’

Give this new set of 5’ regions a descriptive name (done here for you). Press the Enter key.

Page 29

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: all nif genes – 5’

Anab7120.C:1006982-1008496d
Anab7120.C:1671462-1671878d
Anab7120.C:1697832-1698138c
Anab7120.C:1713264-1713395c
Anab7120.C:1778098-1779034c
Anab7120.C:3609273-3609624d
NostPunc.637:37288-37376d
NostPunc.510:15955-16325d
NostPunc.651:60311-60584c
NostPunc.510:5239-6338c
<< more items >>

The resulting set consists of sequences, not orfs, and so the elements are defined by coordinates. Clicking on a coordinate brings up the sequence display (see Scenario 6). Clicking on a graph of an orf brings up the orf’s annotation page. Click Continue.
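The coordinate notation in this set is compact enough to take apart with a short script. The Python sketch below assumes that the trailing d and c stand for the direct and complementary strands, which the simulation does not state. The names `Region`, `parse_region` and `upstream_region` are invented for illustration. How the interface decides how far upstream to reach is also not stated, so the region length is passed in as a parameter.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    replicon: str
    start: int
    end: int
    strand: str  # "d" or "c"; assumed to mean direct / complementary

    @property
    def length(self):
        return self.end - self.start + 1


REGION = re.compile(
    r"^(?P<replicon>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?P<strand>[dc])$"
)


def parse_region(text):
    """Parse a string such as 'Anab7120.C:1697832-1698138c'."""
    m = REGION.match(text)
    if m is None:
        raise ValueError(f"not a region: {text!r}")
    return Region(m["replicon"], int(m["start"]), int(m["end"]), m["strand"])


def upstream_region(replicon, left, right, strand, length):
    """Region of `length` bases immediately 5' of an orf.

    On the direct strand the 5' side lies to the left of the orf;
    on the complementary strand it lies to the right.
    """
    if strand == "d":
        return Region(replicon, left - length, left - 1, "d")
    return Region(replicon, right + 1, right + length, "c")


nifE_5prime = parse_region("Anab7120.C:1697832-1698138c")
print(nifE_5prime, nifE_5prime.length)

# all1438 (nifE) is displayed as 1696389 <- 1697831.
print(upstream_region("Anab7120.C", 1696389, 1697831, "c", 307))
```

Given the nifE coordinates and a length of 307, `upstream_region` returns the same region as the third entry in the list. The first two direct-strand entries in the simulated display end on the first base of the orf rather than one base before it, so this sketch does not reproduce them exactly.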

Page 30

Menu bar: Build set, Display set, Modify set, Set operation, Done

Set: all nif genes – 5’ (list unchanged from Page 29)

The final step in this procedure is to analyze the set of upstream sequences of nif genes, hoping to find a common motif. Click on Set operation, then Analysis tools. Tools based on Position-Specific Scoring Matrices (PSSMs) are most often used for the task. Click on one of these: Meme.

Set operation menu (highlighted: Analysis tools):
• Maintenance
• Set operations
• Analysis tools
• Discovery tools
• Transformations

Analysis tools menu (highlighted: PSSM: Meme):
• Align
• PSSM: Gibbs sampler
• PSSM: Meme
• Make HMM
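If Position-Specific Scoring Matrices are unfamiliar, a small sketch may help. The Python below is not Meme: Meme has to discover the motif instances in unaligned sequences, whereas this sketch starts from instances that are already aligned and only shows how a log-odds matrix is built from them and used to score a sequence. The sites and the test sequence are made up for illustration and are not real nif upstream sequences. The uniform background and the pseudocount of 1 are assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0):
    """Log-odds PSSM (log2) from aligned sites of equal length.

    Assumes a uniform background of 0.25 per base.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / 0.25)
            for base in ALPHABET
        }
        pssm.append(column)
    return pssm


def score(pssm, window):
    """Sum of the column scores for one window of the motif's width."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, offset) of the best-scoring window in sequence."""
    width = len(pssm)
    return max(
        (score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Made-up motif instances and a made-up upstream sequence.
sites = ["TGTAAC", "TGTAGC", "TGTTAC", "TGTAAC"]
pssm = build_pssm(sites)
hit_score, hit_offset = best_hit(pssm, "CCGATGTAACGGA")
print(f"best window starts at offset {hit_offset}, score {hit_score:.2f}")
```

The same matrix could then be slid along other upstream regions, which is the idea behind the last, parenthesized step of the strategy: searching for other genes with the same motifs.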

Page 31

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: PSSM: Meme of ( ...

Choose set type (highlighted: Private set):
• Public set
• Private set

Click Private set and then all nif genes – 5’ to give Meme the set of 5’ sequences.

Page 32

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: PSSM: Meme of (Private set ... )

Choose set (highlighted: all nif genes – 5’):
• 7120 IS895 seqs
• 7120 nif genes
• 7120 STTR7 regions
• all nif genes
• all nif genes – 5’
• Npun STTR7 regions

Click Private set and then all nif genes – 5’ to give Meme the set of 5’ sequences.

Page 33

Menu bar: Build set, Display set, Modify set, Set operation, Cancel

Query so far: PSSM: Meme of (Private set all nif genes – 5’)

Type name of results: PSSM:all nif – 5’

Give the results a name, press Enter, and the task is accomplished.

Page 34

Analysis: Discovery of possible regulatory motifs

Scenario 5

Summary

• The interface facilitates operations on sets of genes and sequences
• The interface puts powerful tools (that already exist) at your disposal, without the need to figure out a different computer environment
• Taken together, these capabilities allow those not particularly adept at computer programming to focus on the function of noncoding sequences

But don’t be fooled: the interface does not yet exist. That’s the point of the proposal!