rna production protein/ligand -...
Protein: cloning
Production
Purification
Structural analysis:
• NMR
• X-ray
Protein/ligand interactions
Bioinformatics
RNA
Cell-free production
Alt.
Cpep10 course: Protein expression in E. coli
Problem 1: Structural integrity
Problem 2: Space- and time-dependent interaction network
Problem 3: Size
Problem 4: Unstructured pieces
Problem 5: Codon usage
Specials: Expression vectors, tags, cell-free systems
Change from native to host environment
The host
• Inexpensive
• Easy to manipulate
• Well characterized
• Grows quickly
• Recombinant protein up to 50% of total protein
• Easy and cheap labeling protocols for NMR and X-ray
• Many expression vectors and fusion proteins available
• Simple mutagenesis and screening protocols
• Use of cheap in vitro transcription/translation systems
How complex it can be! Natural synthesis of actin
Controlled polymerization
Binds a number of proteins: nebulin/tropomyosin, troponins/myosin, thymosin/profilin, gelsolin/actinin ...
mRNA transport to a specific location
Problem: folding pathway / controlled interactions
Specific eukaryotic chaperone pathway
Acetylated N-terminus
Where many proteins interact: the muscle
• Strong interactions
• Strong forces
• Many interactions
• Highly regulated interactions
• Flexibility
Result of TAP (tandem affinity purification) analysis: the average number of interaction partners of a given yeast protein is 5-10
Problem: interaction modes / multiple interactions
A 3 MDa protein structure to solve: a titanic enterprise
Problem: size / unstructured pieces
Typical medium sized protein
• Independent domains
• Head-to-tail interaction
• Posttranslational modification
• Conformational change
• Linear peptides
• Protein/protein interaction
• Protein/lipid interaction
Domain phasing
Figure: constructs from N to C terminus (insoluble, soluble, insoluble)
Domain boundaries are well defined
Defining a domain with a multiple sequence alignment
The domain boundaries of the KH domain of FMR were predicted.
Prokaryotic member NusA stops here
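A minimal sketch of how column occupancy in a multiple sequence alignment can suggest a conserved core, and hence candidate domain boundaries. The toy alignment, the 0.75 occupancy threshold and the function names are hypothetical; real boundary prediction also weighs conservation and secondary structure, and, as the KH domain example shows, may still need redefining after expression.

```python
# Minimal sketch: estimate a conserved core from a multiple sequence alignment
# by column occupancy (fraction of sequences with a residue rather than a gap).

def column_occupancy(alignment):
    """Fraction of non-gap characters in each alignment column."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(seq[i] != "-" for seq in alignment) / n_seqs for i in range(length)]


def conserved_core(alignment, min_occupancy=0.75):
    """First and last column (0-based) whose occupancy reaches the threshold."""
    occupancy = column_occupancy(alignment)
    kept = [i for i, frac in enumerate(occupancy) if frac >= min_occupancy]
    return (kept[0], kept[-1]) if kept else None


def column_to_residue(seq, column):
    """1-based residue number in one gapped sequence up to a given column."""
    return sum(1 for ch in seq[: column + 1] if ch != "-")


if __name__ == "__main__":
    # Hypothetical toy alignment (not real KH domain data).
    aln = ["--MKLVAGH--", "-AMKLIAGHE-", "GAMRLVSGHEK", "--MKLVAGH--"]
    first, last = conserved_core(aln)
    print(first, last)  # 2 8
    print(column_to_residue(aln[2], first), column_to_residue(aln[2], last))  # 3 9
```

Adding or dropping a divergent sequence (for example a prokaryotic homologue with a different topology) changes the occupancy profile and can move the predicted boundaries.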
Expressing the KH domain defined by multiple sequence alignment
After prediction of the domain boundaries, the produced protein was very unstable and expressed at low levels.
More variants
Redefining a domain after expression
Domain boundaries of the KH domain after expression and solving the structure
Missing helix of KH domain
Pastore group, London
Domain phasing: too much input
The multiple sequence alignment includes a prokaryotic sequence with a different topology.
Literature
Domain phasing: the missing domain
Complex domain interface
No independence
Domain interfaces vary and change in another structural context
Attempts to express the PH domain (443-551) in E. coli were not successful; the protein product is insoluble. An N-terminal extension with a small part of the DbH domain (422-551) is soluble.
Literature
Get rid of flexible bits
FERM structure: N- and C-termini come together
• Flexible central part removed
• N- and C-terminal pieces expressed independently
• Complex reassembled
X-ray samples
Domain independent?
Figure: insoluble and soluble constructs
Domain A is integrated in the structure
Domain A is partially independent
Are there simple rules?
Rule of thumb:
• Good start: N-end rule; 5’ codons: AAA = K
• Start and end hydrophilic; secondary structure; next-neighbour domain
• Full-length limited proteolysis
Message: Wrong borders - no folding - no expression system
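The "start and end hydrophilic" rule of thumb can be checked quickly by averaging a hydropathy scale over the first and last few residues of a construct. This sketch uses the Kyte-Doolittle scale; the choice of scale, the 5-residue window and the cutoff of 0 are assumptions, not values from the course.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average Kyte-Doolittle value of a peptide segment."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def terminal_hydropathy(seq, window=5):
    """Mean hydropathy of the first and last `window` residues of a construct."""
    return mean_hydropathy(seq[:window]), mean_hydropathy(seq[-window:])


def ends_hydrophilic(seq, window=5, cutoff=0.0):
    """True if both construct ends fall below the (assumed) hydropathy cutoff."""
    n_term, c_term = terminal_hydropathy(seq, window)
    return n_term < cutoff and c_term < cutoff


if __name__ == "__main__":
    # Hypothetical construct sequences.
    print(ends_hydrophilic("SEKDRAILVFGKEDNQ"))  # True
    print(ends_hydrophilic("LIVFAKEDVILFA"))     # False
```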
Domain phasing: multiple constructs
Staggered oligos spanning N to C terminus; the soluble construct is the winning team
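Staggered oligos amount to a grid of N- and C-terminal boundaries around the predicted ones. A minimal sketch; the step size, grid width, minimum length and full-length value are all hypothetical parameters. Using the PH domain boundaries quoted earlier (443-551), a step of 7 residues and three steps in each direction, the grid happens to include the soluble 422-551 construct.

```python
def staggered_constructs(n_start, c_end, seq_length, step=3, n_steps=2, min_length=40):
    """(start, end) pairs, 1-based inclusive, staggered around predicted boundaries.

    All parameters are illustrative assumptions, not values from the course.
    """
    offsets = [k * step for k in range(-n_steps, n_steps + 1)]
    constructs = []
    for ds in offsets:
        for de in offsets:
            start, end = n_start + ds, c_end + de
            # Keep only constructs inside the full-length protein and long enough to fold.
            if start >= 1 and end <= seq_length and end - start + 1 >= min_length:
                constructs.append((start, end))
    return constructs


if __name__ == "__main__":
    # Predicted boundaries 443-551; seq_length=600 is a hypothetical full length.
    grid = staggered_constructs(443, 551, seq_length=600, step=7, n_steps=3)
    print(len(grid))            # 49
    print((422, 551) in grid)   # True
```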
Typical week
• Multiple PCRs and vectors
• Small-scale production via autoinduction
• Solubility screen
• 15N probe
• Scale up
• X-tals ???
Is your protein folded?
• Creation of mutant library
• Random mutations / DNA shuffling
• Deletion series (like nuclease-treated full-length DNA)
• Reporter protein
• Fusion to C-terminus of target protein
• If the reporter folds, it will give a signal
• N-terminus (target protein) should be folded
Reporters: CAT (chloramphenicol resistance), GFP, complementation methods, marker genes / proteomics
The vectors: typical structure
Promoter, carrier, passenger; vector backbone with origin and resistance marker
Carrier: affinity, high stability, high production, cleavable
Expression story 1: Extracellular Ig domain with disulfide bridge
Screening of tags:
• GST
• His
• Trx
• DsbA
Screening of protease cleavage sites:
• Thrombin
• TEV protease
Result: DsbA-TEV construct with low yield, but folded
Leaderless version of DsbA
Strains with mutations in the redox system [Origami…]
unpublished
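When screening cleavage sites it helps to check where each protease cuts and whether the passenger itself contains a site. A simplified sketch using regular expressions; the motifs are the commonly cited ones for TEV protease (ENLYFQ followed by G or S) and thrombin (LVPR followed by GS), the example sequences are hypothetical, and real cleavage efficiency also depends on structural context.

```python
import re

# Commonly cited recognition motifs (check the supplier's documentation).
# The lookahead keeps the P1' residue(s) on the downstream fragment.
CLEAVAGE_MOTIFS = {
    "TEV": r"ENLYFQ(?=[GS])",   # cuts between Q and G/S
    "thrombin": r"LVPR(?=GS)",  # cuts between R and G
}


def cleave(seq, protease):
    """Split a protein sequence at every matching site (simplified model)."""
    cuts = [m.end() for m in re.finditer(CLEAVAGE_MOTIFS[protease], seq)]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


def internal_sites(target, protease):
    """Number of sites inside the passenger itself (these would also be cut)."""
    return len(re.findall(CLEAVAGE_MOTIFS[protease], target))


if __name__ == "__main__":
    # Hypothetical His-tag + TEV site + passenger sequence.
    print(cleave("MHHHHHHSSGENLYFQGAMSDKIIHLTDDS", "TEV"))
    # ['MHHHHHHSSGENLYFQ', 'GAMSDKIIHLTDDS']
```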
2. Double tag to get full length
Natural unstructured titin peptide [PEVK element]
Labeit group, Mannheim, 1997, unpublished
Double Tag vector
PEVK
Publication on purification of a recombinant PEVK fragment
Why it’s important:
• Design might be wrong and the peptide is unfolded
• Design is OK and the peptide is unstructured
Ligand interaction
3. Unstructured = native?
Kellenberger et al., EMBL 2001
Why test multiple vectors: yield
Carrier proteins + CH domains
Protease
Parallel preparation:
Uwe Sauer, Umeå 1998-2006
The vectors (figure: gene x cloned into several vectors)
Unpublished
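Parallel preparation multiplies quickly: every construct is cloned into every vector and tested in every strain. A minimal sketch that enumerates the combinations and assigns them row by row to the wells of a 96-well plate; the construct and vector labels are hypothetical placeholders.

```python
from itertools import product


def plate_layout(constructs, vectors, strains, rows="ABCDEFGH", n_cols=12):
    """Assign every construct x vector x strain combination to a well, row by row."""
    combos = list(product(constructs, vectors, strains))
    wells = [f"{row}{col}" for row in rows for col in range(1, n_cols + 1)]
    if len(combos) > len(wells):
        raise ValueError(f"{len(combos)} combinations do not fit on one plate")
    return dict(zip(wells, combos))


if __name__ == "__main__":
    # Hypothetical labels: two boundary variants, three carrier vectors, two strains.
    layout = plate_layout(["c1", "c2"], ["His", "GST", "Trx"], ["BL21", "Rosetta"])
    for well, combo in layout.items():
        print(well, *combo)
```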
Vector background, cloning advantage:
• Cut a medium-sized gene out of the vector
• This gene produces a colored protein
• False positives (vector still carrying it) will be colored
Alternative if the domain is well defined: MonoGFP fusion
List of vectors
Example map: pETM13N_HIS_NUSA_GSTEV_GFP, 7559 bp (Kan®, ori, T7, lacI, His6, nusA carrier, TEV site, YFP; sites XhoI (158), NotI, BamHI, Acc65I (204), NcoI (932), XbaI)
File with map/features/sequence features = data file
• MW, pI
• Gels
• Purification data
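MW and pI are the first entries in each construct's data file. A minimal sketch for computing them from the protein sequence; the average residue masses are standard values, while the pKa set is an assumption (values similar to those used by EMBOSS), so the pI will differ slightly from other tools.

```python
# Average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-terminus

# Assumed pKa set (similar to EMBOSS); other tools use other values.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def average_mass(seq):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisection for the pH where the net charge crosses zero (charge falls with pH)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    construct = "MHHHHHHSSGENLYFQGAMSDK"  # hypothetical tagged construct
    print(round(average_mass(construct), 1), round(isoelectric_point(construct), 2))
```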
Carrier | His_GSTev/N_His | His_PreScission/N_His | His_Enterokinase/N_His | His_Thrombin/N_His | N_His | __ dir | C-His | N-strep/C-strep | C-fusion
Trx */* */* */ */ * *
GST */* */* */* */* * *
MBP */* */* * * * * - / *
DsbA */ */ * - / *
NusA */* */* * *
DsbC * * - / *
Ztag1
Ztag2 */* */* * * *
GB1 */* */* * * *
DsbAin * * *
DsbCin * * *
EFtag * * *
Mistic
ZZ tag * *
TolA
mGFP *
fGFP *
sMBP - / *
T4lyso *
Zr-pep
SF1pep
Web pages: EMBL core facility; new KBC web pages, Umeå
Coexpression: inclusion bodies
Unlike Max, Myc does not form homodimers, and it is expressed in inclusion bodies in E. coli.
Very hydrophobic interface
Coexpression: no inclusion bodies
Myc forms a stable heterodimer with Max and is expressed in soluble form in the complex.
Very hydrophobic interface
Two different heterodimeric complexes
One partner is tagged, the other is not.
Max/Myc, HNF1/DCoH
Coexpression: results
Schallreuter et al., BBRC, 2003
Coexpression: two-plasmid system
Plasmid 1 (KanR, ori1) + plasmid 2 (AmpR, ori2), carrying gene I and gene II
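For the two-plasmid system both plasmids must be maintained in the same cell, which requires origins from different incompatibility groups and different resistance markers. A sketch of that check; the origin-to-group table covers only a few common replicons, and the plasmid dictionaries are hypothetical.

```python
# Illustrative mapping of common origins to incompatibility groups. Plasmids in
# the same group compete for replication and are not stably co-maintained.
INCOMPATIBILITY_GROUP = {
    "ColE1": "ColE1", "pMB1": "ColE1", "pUC": "ColE1",
    "p15A": "p15A", "pSC101": "pSC101",
    "CloDF13": "CloDF13", "RSF1030": "RSF1030",
}


def can_coexpress(plasmid_a, plasmid_b):
    """Two plasmids need different origin groups and different selection markers."""
    group_a = INCOMPATIBILITY_GROUP[plasmid_a["ori"]]
    group_b = INCOMPATIBILITY_GROUP[plasmid_b["ori"]]
    return group_a != group_b and plasmid_a["marker"] != plasmid_b["marker"]


if __name__ == "__main__":
    # Hypothetical plasmids for gene I and gene II.
    gene_1 = {"ori": "pMB1", "marker": "KanR"}
    gene_2 = {"ori": "p15A", "marker": "AmpR"}
    print(can_coexpress(gene_1, gene_2))  # True
```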
Coexpression: dicistronic
T7 promoter; SD + gene I and SD + gene II, assembled with XbaI/SpeI sites
+ RNase-deficient strain BL21 Star
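The slide lists XbaI and SpeI; these two enzymes leave identical CTAG overhangs, so an XbaI end can be ligated to a SpeI end and the resulting junction is cut by neither enzyme. Assuming this is how the second cistron is added (the slide does not spell out the cloning route), the sketch below computes the junction sequences; function names are hypothetical.

```python
# Recognition sites (top strand) and cut position: XbaI T^CTAGA, SpeI A^CTAGT.
ENZYMES = {"XbaI": ("TCTAGA", 1), "SpeI": ("ACTAGT", 1)}


def overhang(enzyme):
    """4-nt 5' overhang left by a palindromic 6-bp site cut after position 1."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def junction(upstream_enzyme, downstream_enzyme):
    """Top-strand sequence when an upstream cut end is ligated to a downstream cut end."""
    if overhang(upstream_enzyme) != overhang(downstream_enzyme):
        raise ValueError("overhangs are not compatible")
    up_site, up_cut = ENZYMES[upstream_enzyme]
    down_site, down_cut = ENZYMES[downstream_enzyme]
    return up_site[:up_cut] + down_site[down_cut:]


def recut_by(sequence):
    """Enzymes from the table whose recognition site occurs in the sequence."""
    return [name for name, (site, _) in ENZYMES.items() if site in sequence]


if __name__ == "__main__":
    for up, down in [("SpeI", "XbaI"), ("XbaI", "SpeI")]:
        scar = junction(up, down)
        print(up, "->", down, scar, recut_by(scar))
```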
Advantage in cotranslational folding: assembly of 7 nucleoporins into a 0.5 MDa complex [Lutzmann et al.]
Old expression problem solved: Csk_src
Src tyrosine kinase: insoluble
Src tyrosine kinase + Yersinia phosphatase: 10% soluble and functional
Expression in baculovirus-infected insect cells and yeast
Levinson et al.
Where to go for coexpression?
• Lac repressor: pREP4
• T7 Lysozyme : pLysS
• rare tRNAs : CodonPlus strain
• Protein modifications :
• ASF/SF2 phosphorylated by SRPKinase 1
• Farnesyl group added by transferase (lipid group)
• Dephosphorylation of tyrosine kinases (abl, src etc) for better solubility
• Heterodimers to keep protein soluble: max/myc
• Chaperones GroEL/ES to increase solubility
Message: Wrong partners - no folding - no expression system
Rare codon effects
A transcription factor expressed in E. coli creates an additional sub-band
Figure: SDS-PAGE (MW marker; - / + IPTG); bands: additional peptide, full length; codons AGGAGG, CGACGG; ptRNA
Remenyi 2001, EMBL
Rare codon effects: misincorporation of lysine
The FMR KH domain shows an unexpected second species differing in mass by 28 Da
Figure: mass spectrum (additional peak; full length)
Reason: rare arginine codons (AGA or AGG) are loaded with lysine tRNAs; MW difference of 28 Da
Pastore group, London
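The 28 Da shift follows directly from residue masses. A minimal sketch using average residue masses; it also shows that an Arg-to-Gln substitution (CGG read as CAG) gives almost the same shift, because the average masses of Lys and Gln differ by only about 0.04 Da. Function names are hypothetical.

```python
# Average residue masses (Da) for the residues involved.
AVERAGE_RESIDUE_MASS = {"R": 156.1875, "K": 128.1741, "Q": 128.1307}


def misincorporation_shift(expected, incorporated, events=1):
    """Mass change (Da) of the protein when `events` residues are substituted."""
    return events * (AVERAGE_RESIDUE_MASS[incorporated] - AVERAGE_RESIDUE_MASS[expected])


def point_mismatches(codon_a, codon_b):
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


if __name__ == "__main__":
    print(round(misincorporation_shift("R", "K"), 2))  # -28.01
    print(round(misincorporation_shift("R", "Q"), 2))  # -28.06
    print(point_mismatches("AGA", "AAA"), point_mismatches("CGG", "CAG"))  # 1 1
```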
Codon Frequency in E. coli
Codon usage in E. coli:
Problems:
• Arg (R)
• Ile (I)
• Leu (L)
• Gly (G)
• Pro (P)
Codon usage problems
Signs:
• Mass difference: AGA read as AAA [K] / CGG read as CAG [Q]
• Consecutive rare codon spots
• Protein ladder after purification with N-tag, or
• No expression
• Signs of toxicity
Solutions:
• CodonPlus / Rosetta strains
• Patch rare codons: partial gene synthesis
• Scattered rare codons: gene synthesis
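Consecutive rare codon spots can be found by scanning the coding sequence codon by codon. A sketch; the rare-codon set below covers Arg, Ile, Leu, Gly and Pro codons often named as rare in E. coli and is an assumption to be checked against a codon usage table, and the example sequence is hypothetical.

```python
# Assumed rare-codon set (Arg, Ile, Leu, Gly, Pro); check a codon usage table.
RARE_CODONS = {"AGA", "AGG", "CGA", "CGG", "ATA", "CTA", "GGA", "CCC"}


def codons(dna):
    """Split an in-frame coding sequence into codons (trailing bases ignored)."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def rare_codon_positions(dna):
    """1-based codon indices of rare codons."""
    return [i + 1 for i, codon in enumerate(codons(dna)) if codon in RARE_CODONS]


def rare_codon_runs(dna, min_run=2):
    """(first, last) 1-based codon indices of runs of consecutive rare codons."""
    flags = [codon in RARE_CODONS for codon in codons(dna)]
    runs, start = [], None
    for i, is_rare in enumerate(flags + [False]):  # sentinel closes a trailing run
        if is_rare and start is None:
            start = i
        elif not is_rare and start is not None:
            if i - start >= min_run:
                runs.append((start + 1, i))
            start = None
    return runs


if __name__ == "__main__":
    dna = "ATGAAAAGGAGACTGATATAA"  # ATG AAA AGG AGA CTG ATA TAA
    print(rare_codon_positions(dna))  # [3, 4, 6]
    print(rare_codon_runs(dna))       # [(3, 4)]
```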
Cell-free expression
Membrane proteins: + nanodiscs and/or lipids
High copy number
Very pure
Kit
Apolipoprotein as tool
Lessons from protein structures: Expression problems in E. coli
Problem 1 (structural integrity): multiple constructs / coexpression of modifiers
Problem 2 (space- and time-dependent interaction network): coexpression strategies and/or reassembly
Problem 3 (size): cut in pieces and reassemble
Problem 4 (unstructured pieces): unstructured doesn’t mean unfolded or non-native
Parallel strategies in cloning, vector selection, strains, …..