CFTR – gene cloning and initial bioinformatic analysis
Riordan et 12(*) et Tsui (1989) Science 245:1066
Carlow IT Bioinformatics
November 2006
* Including Francis Collins, later leader of the Human Genome Sequencing Project
Cystic fibrosis
• Horrible inherited disease
– Affecting lung, pancreas, sweat glands
• Abnormally high trans-membrane electrical potential
– Decreased Cl- ion membrane transport
• Often associated with failure to respond to cAMP-dependent protein kinase – no phosphorylation: no function
More symptoms etc.
• Difficult breathing
• Early death (life expectancy: 6 months in 1959, 38 years in 2006)
• More prone to infections (thicker mucus)
• Can do pre-natal diagnosis or sweat test
• "Woe is the child who tastes salty from a kiss on the brow, for he is cursed, and soon must die" (German proverb, 1700s)
• We modify AMPs (defensins): can we make one that is effective in a high-salt environment?
Genetics & epidemiology
• Located on chr 7q31.2; 180 kb gene
• 1 in 25 Europeans carries a CFTR mutation, so 1 in 2500 live births has the disease
• Males and females equally affected
• Life expectancy higher in males – nobody knows why
• Why so common?
• Cholera toxin requires normal CFTR
• Also possible connection with typhoid fever
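The step from "1 in 25 carriers" to "1 in 2500 births" is simple Mendelian arithmetic. A minimal sketch, assuming random mating and a fully recessive disease (the function name is illustrative, not from the paper):

```python
from fractions import Fraction

def affected_birth_rate(carrier_freq: Fraction) -> Fraction:
    """Expected fraction of live births affected by a recessive disease.

    Assumes random mating, so both parents are carriers with
    probability carrier_freq squared.
    """
    both_parents_carriers = carrier_freq * carrier_freq
    child_inherits_both = Fraction(1, 4)  # Mendelian ratio for two carrier parents
    return both_parents_carriers * child_inherits_both

rate = affected_birth_rate(Fraction(1, 25))
print(rate)  # 1/2500
```

Exact fractions are used so the result reads the same way as the slide.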
Mapping
• Genetic association with markers pinpoints chromosome 7
• Chromosome walking to zero in
• NO genome sequence in those days
Clone and sequence
• Why bother?
– Because we can!
– ? Can predict features/functions
– ? Can compare CF v normal to identify mutation
• Working with cDNA, not genomic
• Generate cDNA libraries from cells & cell-lines
• Screen for cDNAs that hybridise with known CFTR fragment
• Eventually (much hard work) got 19 overlapping cDNA clones
Gene sequence
• Clones span 6.1 kb of RNA
• ORF encodes a protein of 1480 amino acids
– So bigger than the 300 AA average
• In 1989 << 1000 human genes sequenced
• Bioinformatic analysis possible then:
– Start codon, consensus seq for translation start + AUG
– Secondary structure prediction
– Hydropathy plot
– Homology searches (pre-BLAST)
– Glycosylation sites, Ser/Thr kinase sites
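Checking the start codon against a translation-start consensus can be sketched as below. This is an illustration only: it assumes the Kozak-style rule of a purine at -3 and a G at +4, takes the first AUG as the candidate, and uses a made-up toy sequence, not the CFTR cDNA.

```python
def start_codon_context(mrna: str) -> dict:
    """Report the sequence context of the first AUG in an mRNA/cDNA string."""
    seq = mrna.upper().replace("T", "U")  # accept cDNA (T) or mRNA (U)
    pos = seq.find("AUG")
    if pos == -1:
        return {"found": False}
    # Position -3 is three bases upstream of the A; +4 is the base after AUG.
    minus3 = seq[pos - 3] if pos >= 3 else None
    plus4 = seq[pos + 3] if pos + 3 < len(seq) else None
    return {
        "found": True,
        "position": pos + 1,                       # 1-based position of the A
        "purine_at_minus3": minus3 in ("A", "G"),  # favoured: A or G
        "g_at_plus4": plus4 == "G",                # favoured: G
    }

toy = "GGCGCCACCAUGGCGUUU"  # hypothetical sequence
print(start_codon_context(toy))
```

The first AUG is not always the real start, so this kind of check only supports a prediction, it does not prove it.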
Protein analysis
Whole protein is two similar halves, each with 6 membrane-spanning domains (hydropathic peaks) and one NBF (hydrophilic region), so two NBFs in total, plus a single charged R region
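Membrane-spanning domains show up as peaks in a sliding-window hydropathy profile. A minimal sketch, assuming the Kyte-Doolittle scale and a 19-residue window (the slides do not say which scale or window was used); the toy sequence is invented:

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window along the sequence.

    Raises KeyError on non-standard residue letters.
    """
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Hypothetical protein: charged flanks around a 19-residue hydrophobic stretch.
toy = "DDKKEE" + "LLIVVALLIAVLLIVAFLL" + "RRDDEEK"
profile = hydropathy_profile(toy)
peak = max(profile)
print(f"peak {peak:.2f} at window starting at residue {profile.index(peak) + 1}")
```

On a real protein you would plot the profile and count the peaks; twelve peaks in two groups of six would match the picture on this slide.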
Fig6 – homology/similarity F508
Comparing two conserved regions in CFTR and other proteins: some with two, some with one similar region (multidrug resistance proteins, transporters etc.)
Conserved, hydrophobic aromatic position at 508
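Similarity between aligned regions is usually summarised as percent identity. A minimal sketch, assuming the two sequences are already aligned (gaps written as "-") and that gap columns are left out of the denominator; other conventions exist, and the toy alignment is invented:

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over the gap-free columns of an alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

print(percent_identity("GKSTLL-MI", "GKTSLLQMV"))  # 62.5
print(round(100 * 27 / 66, 1))                     # 40.9
```

The second line is the same calculation for the 27/66 identities quoted for the two halves of CFTR.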
Structure of the fold
• Two halves similar structure but low AA conservation (best is only 27/66 identities)
• Others in family have much tighter conservation
• No signal peptide says that orientation of first TM domain is (i – o)
• External loops very short
• …except between TM7 and TM8, where there is an N-glycosylation site
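Candidate N-glycosylation sites can be found by scanning for the sequon N-X-S/T, where X is any residue except proline. A minimal sketch with an invented toy sequence (a sequon only marks a possible site; whether it is used depends on it facing the outside of the membrane):

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported too.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the Asn in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

toy = "MKNGSAANPTLLNNTS"  # hypothetical sequence
print(find_sequons(toy))  # [3, 13, 14]
```

Note that N-P-T at position 8 is skipped because of the proline.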
More…
• R domain is one exon; 69/241 residues are polar, in alternating +ve and –ve charged regions
• Also has most of the kinase phosphorylation sites
• All family members secrete something:
– Chloride (CFTR)
– Pigment (Drosophila white gene)
– Lytic peptide (E. coli hemolysin)
• …so what about the “function unknown” mbpX gene in liverwort chloroplasts ?
More…
• Hypothesise that CFTR is the ion channel
• 10/12 of TM domains have >1 +ve AA
– i.e. amphipathic helix
– cf. brain Na+ channel & GABA-R Cl- channel
• Contrast P-glycoprotein
– Closely related but no +ve TM AAs
• Big protein – maybe also other functions
Conclude
• From very little data and a very small DB to compare with, can make predictions about structure and function that have stood the test of time
• DB size:
Year   Bases            Seqs
1988   23,800,000       20,579
1989   34,762,585       28,791
1990   49,179,285       39,533
2000   11,101,066,288   10,106,023
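To put the database figures in proportion, the fold growth between two years can be worked out directly. A small sketch using only the numbers from the slide (the function name is illustrative):

```python
# Database size by year, copied from the slide: (bases, sequences).
DB_SIZE = {
    1988: (23_800_000, 20_579),
    1989: (34_762_585, 28_791),
    1990: (49_179_285, 39_533),
    2000: (11_101_066_288, 10_106_023),
}

def fold_growth(start_year: int, end_year: int) -> tuple[float, float]:
    """Return (bases fold-change, sequences fold-change) between two years."""
    b0, s0 = DB_SIZE[start_year]
    b1, s1 = DB_SIZE[end_year]
    return b1 / b0, s1 / s0

bases_fold, seqs_fold = fold_growth(1989, 2000)
print(f"1989 -> 2000: bases x{bases_fold:.0f}, sequences x{seqs_fold:.0f}")
```

Both measures grew by more than 300-fold between the year of the paper and 2000.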
Postscript
• F508 may be about delivery of protein to the membrane
– Functions fine if you trick cells to deliver!
• By 1995, 300 different mutations identified in the gene
• Last month, 1531 different mutations listed at
– http://www.genet.sickkids.on.ca/cftr/StatisticsPage.html
• With human genome, SNPs, ESTs much easier to interpret sequence information