Protein Engineering and Directed Evolution
May 24, 2011
Protein Engineering
Sequence
Structure
Function
A linear combination of 20 amino acids. Protein sequence is dictated by gene sequence.
The protein structure is dictated by its linear sequence.
Protein function is dictated by its three-dimensional structure.
Optimize functions of proteins/enzymes
Structural information can be a guide
Generate variants through alteration of DNA sequence
Enzymes in Food Processing
Increasing the quality of beer
The enzymes alpha-amylase, glucoamylase and glucose isomerase convert starch to high fructose corn syrup (HFCS). Alpha-amylase is used to liquefy starch slurry so that the starch is solubilized and readied for the next steps. Alpha-amylase splits the large amylose and amylopectin molecules that make up the starch into soluble dextrin fragments.
Starch Processing
www.genencor.com
What is protein engineering?
• Optimize the properties of enzymes/proteins through changes in protein sequence. Asking enzymes to function more efficiently, work in harsh conditions, last longer, etc.
– Industrial applications
– Chemical applications
– Agricultural applications
– Pharmaceutical applications
– Many more
• Introduce new properties into enzymes. Go beyond the biological context.
– Perform catalysis under completely foreign conditions
– Catalyze what is not observed in nature
– De novo design of enzyme function (difficult, although progressing)
Which properties can we improve?
Biological Functions:
• Catalysts
• Immune Response
• Control Units
• Structural Scaffolds
GOOD Properties:
• Catalyze many classes of reactions
• High specificity / selectivity
• Low energy input
BAD Properties:
• Low activities for non-natural reactions
• Marginally stable
• Industrial conditions
• Low expression in heterologous hosts
Engineer:
• Chemical Synthesis
• Bioremediation
• Chemical Sensors
• Pharmaceuticals
• Metabolic Control
• Enzymes are inherently designable
• Only “optimized” in the context of their biological systems
• Naturally evolvable and evolved. Divergent evolution is nature’s way of generating diversity
These enzymes also function using the same catalytic residues (serine, histidine and aspartic acid). However, they catalyze the cleavage of different substrates (divergence)
P1 (cleavage site) preference: Large (e.g. Tryptophan), Small (e.g. Alanine), Positive (e.g. Lysine)
Nature evolves new sequences, behaviors and biological functions
Evolution--mutation, recombination, and natural selection--has generated a fantastic array of functional molecules
We can use the evolution algorithm to create “new” enzymes
Cirino
Evolutionary approaches are generally more powerful as long as a suitable search strategy is available.
[Figure: log-scale axis (1, 10^-1, 10^-2, 10^-3) against number of mutations, running from the wild-type enzyme performing its natural function to the wild-type enzyme performing a new function.]
Cirino
We can design enzymes to accommodate our needs:
– novel specificity / activity / stability
New functions can be achieved if:
1) It is physically possible
2) It is evolutionarily feasible (a path of functional enzymes exists in sequence space)
3) We can generate genetic diversity
4) We can select / screen for improvements
Sequence Space
2 residues, 2 amino acids
3 residues, 2 amino acids
500 residues, 20 amino acids
Huge: 20^500 ≈ 10^650 sequences
Highly dimensional: 19 × 500 ≈ 10^4 dimensions
Cirino
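The sequence-space figures on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative and do not come from the lecture.

```python
import math

def log10_sequence_count(length: int, alphabet: int = 20) -> float:
    """Return log10 of the number of possible sequences of the given length."""
    return length * math.log10(alphabet)

def single_substitution_neighbors(length: int, alphabet: int = 20) -> int:
    """Count sequences exactly one substitution away from a given sequence."""
    return (alphabet - 1) * length

if __name__ == "__main__":
    print(2 ** 2)                                # 2 residues, 2 amino acids -> 4
    print(2 ** 3)                                # 3 residues, 2 amino acids -> 8
    print(round(log10_sequence_count(500), 1))   # -> 650.5, i.e. about 10^650
    print(single_substitution_neighbors(500))    # -> 9500, roughly 10^4
```

Working with the logarithm keeps the numbers readable; Python could also compute 20**500 exactly as an integer.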
The Fitness Landscape
[Figure: fitness plotted over sequence space, with a starting point, a finish point, and local maxima marked.]
Fitness Landscape
The mapping from genotype (target sequence) to phenotype (fitness, as measured in the experiment). Directed evolution is an optimization on the fitness landscape.
Arnold, Nat. Rev. Mol. Cell Biol. 2009
Evolution is a random walk on a fitness landscape in sequence space, survival of the fittest
Cirino
Ruggedness in Proteins
[Figure: fitness of Wild-type, Intermediate Mutant A, Intermediate Mutant B, and Improved Mutant AB. The red and green residues are interacting.]
Mutations can be beneficial and can also be deleterious.
Two single mutations may each be deleterious,
but the combination of the two may be beneficial.
Cirino
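A toy example may make the ruggedness point concrete. The fitness values below are invented purely for illustration (they are not data from the lecture): each single mutant is worse than wild-type, yet the double mutant is better, so a walk that only accepts improving single mutations never leaves wild-type.

```python
# Toy fitness values (invented for illustration): each single mutant is
# worse than wild-type, but the double mutant AB is better.
FITNESS = {
    frozenset(): 1.0,            # wild-type
    frozenset({"A"}): 0.6,       # intermediate mutant A
    frozenset({"B"}): 0.7,       # intermediate mutant B
    frozenset({"A", "B"}): 1.8,  # improved mutant AB
}
MUTATIONS = ("A", "B")

def greedy_walk(start=frozenset()):
    """Add one mutation at a time, accepting only steps that raise fitness."""
    current = start
    while True:
        neighbors = [current | {m} for m in MUTATIONS if m not in current]
        better = [g for g in neighbors if FITNESS[g] > FITNESS[current]]
        if not better:
            return current  # local maximum: no single mutation helps
        current = max(better, key=FITNESS.get)

def epistasis():
    """Deviation of the double mutant from the sum of single-mutant effects."""
    wt = FITNESS[frozenset()]
    effect_a = FITNESS[frozenset({"A"})] - wt
    effect_b = FITNESS[frozenset({"B"})] - wt
    return FITNESS[frozenset({"A", "B"})] - (wt + effect_a + effect_b)

if __name__ == "__main__":
    print(sorted(greedy_walk()))                   # stays at wild-type
    print(sorted(greedy_walk(frozenset({"A"}))))   # from A, adding B is uphill
    print(epistasis())                             # positive: A and B interact
```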
Success requires an intelligent working strategy!
Protein space: sequences for a 300-amino-acid protein
20^300: impossibly large and mostly empty
Search technologies are limited (~10^6 to 10^9 clones)
Beneficial mutations are rare
Local exploration of sequence space around existing functional proteins (1-2 amino acid substitutions)
Generating new, useful proteins requires accumulation of multiple mutations
Rapid screen (or selection) to identify small improvements
Cirino
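To see why local exploration of 1-2 substitutions fits within a 10^6 to 10^9 clone search, one can count the variants at each mutational distance from a 300-residue parent. A minimal sketch (the helper name is illustrative):

```python
from math import comb

def variants_with_k_substitutions(length: int, k: int, alphabet: int = 20) -> int:
    """Count sequences that differ from a parent at exactly k positions."""
    return comb(length, k) * (alphabet - 1) ** k

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, variants_with_k_substitutions(300, k))
    # 1 5700
    # 2 16190850
    # 3 30557530900
```

Single and double mutants fall inside the stated search capacity, while all triple mutants already exceed 10^9.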
What can directed evolution do? Examples
Example: Glyphosate
• Very effective herbicide
• Toxic towards most crops
• Decreases crop yield
Herbicide tolerance: more than 75% of genetically modified plants are engineered for herbicide tolerance
Acetyl-glyphosate: not herbicidal
DNA shuffling to improve activity
Gradual improvement of catalytic properties
A very robust enzyme with applications in transgenic plant development
GFP Can be evolved to Other FPs
Tsien, Annu Rev. Biochem 1998
Red Fluorescent Protein (RFP)
Isolated from nonbioluminescent reef corals
Tsien, Nature Biotechnology, 2000
Classic mutagenesis
Chemical mutagenesis with mutagens. Works mostly at the whole-cell level.
Deamination with nitrous acid
C → U: pairs with A instead of G
A → H (hypoxanthine): pairs with C instead of T
Alkylation with EMS or nitrosoguanidine
G → O6-alkylguanine: pairs with T instead of C
Classic mutagenesis by radiation
UV crosslinks two neighboring pyrimidine bases. Errors and mutations are introduced during DNA repair by host enzymes.
Directed Mutagenesis/Evolution Strategies
• Most mutagenesis strategies rely on PCR.
• Most times, knowing the gene sequence is essential.
• Site-directed mutagenesis (point mutation).
– Introduce a specific mutation at a specified location in the gene
• Random mutagenesis.
– Introduce random mutations at a specified position or throughout the gene of interest
• DNA shuffling.
– Shuffle mutants of the same gene to achieve diversity
• DNA family shuffling.
– Shuffle homologous genes from different species to explore large sequence space
• Genome shuffling.
– Shuffle genomes through homologous recombination.
Site-directed mutations
• To probe the importance of a specific amino acid in a protein sequence.
– Is the amino acid involved in catalysis?
– Does the amino acid dictate specificity?
– Is the amino acid essential for protein function?
• If the importance of the amino acid is known (from crystal structure, biochemical analysis)
– Mutate the amino acid to enhance enzyme properties
– Alter the size of the amino acid to tighten/loosen enzyme substrate specificity
Subtilisin stability can be improved by point mutations
• What is subtilisin:
– A serine protease from Bacillus bacteria
– Broadly specific for proteins that commonly soil cloth
– Used widely as the “enzymatic additive” in commercial laundry detergents
• Wild-type subtilisin can be easily inactivated
– In the presence of bleach, the protein becomes inactive very quickly (~90% inactivation)
– The inactivation is due to oxidation of the methionine at position 222 (M222)
Second Generation Subtilisin
• M222 was systematically mutated to each of the 19 other amino acids and the stability of the mutant enzymes was investigated (Genencor). (Estell, JBC, 1985)
[Figure: percent enzyme activity vs. time (min) in 1 M H2O2.]
Site-directed mutagenesis is limited
• Site-directed mutagenesis is limited in its scope.
– Difficult to predict which substitution can be beneficial.
– More than one residue can contribute to enzyme activity and stability.
– “Key” residues unknown.
– The availability of the crystal structure helps, but does not allow a reliable prediction of what/where the mutations should be.
• Protein engineers therefore need to generate all possible amino acid changes at one residue, or at a combination of residues.
• How do we modify the PCR-based mutagenesis procedures to
– 1) introduce all possible mutations at a single position?
– 2) introduce multiple mutations?
Degenerate Oligonucleotides
5’ ACG GTC GAT GTA CCA GGG CCC AAC 3’
During normal primer synthesis, the desired nucleotide is added to the growing oligonucleotide. Each nucleotide pool is 100% pure.
N = 25% A, 25% C, 25% G, 25% T
To make a degenerate primer, a mixed nucleotide pool is used in addition to the four pure pools. During DNA synthesis, N can be added to the oligonucleotide instead of one of the four pure nucleotides.
5’ ACG GTC GAT GTA NNN GGG CCC AAC 3’
64 possible combinations, covering all 20 amino acids and including stop codons.
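The count of 64 combinations can be reproduced by expanding the NNN codon against the standard genetic code. The sketch below assumes the standard code (NCBI translation table 1) and handles only the degenerate symbol N; the names `expand` and `CODON_TABLE` are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Only the symbols needed for this slide; N is an equal mix of all four bases.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

def expand(degenerate_codon: str) -> list[str]:
    """List every concrete codon encoded by a degenerate codon such as NNN."""
    pools = (DEGENERATE[symbol] for symbol in degenerate_codon)
    return ["".join(codon) for codon in product(*pools)]

if __name__ == "__main__":
    codons = expand("NNN")
    encoded = [CODON_TABLE[c] for c in codons]
    print(len(codons))                 # 64 codons
    print(len(set(encoded) - {"*"}))   # 20 amino acids
    print(encoded.count("*"))          # 3 stop codons
```

Because the code is redundant, the 64 codons are not spread evenly: some amino acids are encoded by six codons and others by one.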
Saturation mutagenesis example
These authors found 5 residues that interact with the substrate directly.
Saturation mutagenesis was performed simultaneously at all five positions.
Library size: 20 × 20 × 20 × 20 × 20 = 3.2 million possible combinations
The desired mutant contained mutations in four of the five residues. The new enzyme property cannot be achieved with single-residue mutations.
M. jannaschii TyrRS bound to tyrosine
Wang and Schultz, 2003
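The library size above, and how much of such a library a given number of clones would sample, can be estimated as follows. This is a rough sketch and not part of the original slides: the coverage formula assumes every variant is equally likely to be picked, which real libraries only approximate, and the function names are illustrative.

```python
import math

def library_size(positions: int, options_per_position: int) -> int:
    """Number of combinations when each position is varied independently."""
    return options_per_position ** positions

def expected_coverage(size: int, clones: int) -> float:
    """Expected fraction of distinct variants seen among `clones` random picks.

    Assumes uniform sampling with replacement (an idealization).
    """
    return 1.0 - math.exp(clones * math.log1p(-1.0 / size))

if __name__ == "__main__":
    protein_level = library_size(5, 20)   # amino acid combinations
    codon_level = library_size(5, 64)     # NNN codon combinations
    print(protein_level)                  # 3200000
    print(codon_level)                    # 1073741824
    for multiple in (1, 3):
        clones = multiple * protein_level
        print(multiple, round(expected_coverage(protein_level, clones), 3))
```

Under these assumptions, sampling as many clones as there are variants covers roughly 63% of the library, and about three times as many are needed to reach 95%.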
Error-prone PCR generates random point mutation(s)
Parent gene
Cirino
Error-prone PCR: Random Mutagenesis
• Altering the PCR conditions to make it prone to errors during amplification → random incorporation of substitutions.
• Normal PCR reaction: MgCl2, 0.2 mM dNTPs, template DNA, primers, DNA polymerase, thermal cycling (95 °C, 55 °C, 72 °C)
– Taq polymerase error rate: 2 × 10^-4
– Pfu polymerase error rate: 7 × 10^-7
• Error-prone PCR conditions which INCREASE error rates of Taq polymerase and accumulate mutations
– Staggered dNTP concentrations (0.2 mM dATP & dGTP, 1.0 mM dCTP & dTTP)
– Addition of MnCl2 (affects Taq error rate)
– Increase the number of PCR cycles
– Increase the length of the molecule to be amplified
Cirino
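The effect of error rate, gene length and cycle number on mutational load can be sketched numerically. The slide does not state the units of the error rates, so the code below assumes they are errors per base per duplication; it also assumes substitutions are uniform over the three alternative bases, whereas real polymerases show mutational bias. The gene length and number of duplications in the example are made-up inputs.

```python
import random

def expected_mutations(error_rate: float, gene_length: int, duplications: int) -> float:
    """Mean substitutions per gene copy, assuming errors per base per duplication."""
    return error_rate * gene_length * duplications

def mutate(sequence: str, per_base_rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability per_base_rate."""
    mutated = []
    for base in sequence:
        if rng.random() < per_base_rate:
            # Uniform choice among the other three bases (a simplification).
            mutated.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)

if __name__ == "__main__":
    # Hypothetical 1000 bp gene amplified through 30 duplications.
    print(expected_mutations(2e-4, 1000, 30))   # Taq rate from the slide
    print(expected_mutations(7e-7, 1000, 30))   # Pfu rate from the slide

    rng = random.Random(0)
    parent = "ATGGCTAAAGGTCCTGAA"
    print(mutate(parent, 0.1, rng))
```

The linear dependence on length and duplications matches the last two bullets above: more cycles or a longer amplicon both raise the number of mutations per clone.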
Library creation
1. Add primers containing restriction sites to the gene of interest (GOI).
2. Error-prone PCR: mutation and amplification.
3. Cut PCR product with NdeI and BamHI; purify insert library.
4. pET28 expression plasmid (NdeI, BamHI) → library of mutant expression plasmids.
5. Screen for desired properties.
Error Prone PCR – Subtilisin Example
• Goal: To have subtilisin function in a nonaqueous solvent.
• Unlike the previous example, this property cannot be predicted and one has no idea where to start.
• Solution: error prone PCR. 10 successive rounds of mutagenesis were performed. In each round, the improved mutant was selected. The gene encoding the mutant serves as template for the next round of error prone PCR.
Chen and Arnold, PNAS, 1993, p5618
[Figure: change in activity (log scale).]
Aiming for great sequence diversity
If the fitness landscape is rugged, point mutations alone are likely to lead to local optima. Point mutations are too gradual to allow the block changes that are required for continued sequence evolution.
[Figure: The Fitness Landscape, fitness plotted over sequence space.]
DNA shuffling recombines different mutants to allow greater sequence space exploration
Different mutants from a single gene → all combinations of mutations
Cirino
DNA shuffling recombines mutants
• DNA recombination allows us to look at a larger portion of sequence space (compared to what point mutagenesis allows).
• Those sequences which are being explored are already “solutions” (i.e., the sequences already correspond to fold and function, at least in another protein) → reduction in search space
• Combines additive mutations and removes deleterious mutations (e.g., after several rounds of error-prone PCR)
• More likely to result in “new” functions (compared to accumulating single point mutations)
DNA Shuffling
1. Digest PCR products of homologous genes (DNase I digestion). Create pool of ssDNA fragments (short single-stranded DNA).
2. Perform “primerless” PCR to reassemble genes.
3. Cut and clone reassembled genes for expression.
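The fragment-and-reassemble idea can be imitated in silico. The sketch below is a deliberate simplification and not the laboratory protocol: it assumes the parents are already aligned and of equal length, cuts at fixed intervals rather than randomly, and lets each fragment come from a randomly chosen parent. The parent sequences and the name `shuffle_once` are hypothetical.

```python
import random

def shuffle_once(parents: list[str], fragment_length: int, rng: random.Random) -> str:
    """Build one chimera by copying successive fragments from random parents."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    pieces = []
    for start in range(0, length, fragment_length):
        template = rng.choice(parents)  # template switch at fragment boundary
        pieces.append(template[start:start + fragment_length])
    return "".join(pieces)

if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical mutants of one gene, differing at two positions.
    mutant_1 = "ATGGCTAAAGGTCCTGAA"
    mutant_2 = "ATGGCAAAAGGTCCGGAA"
    library = [shuffle_once([mutant_1, mutant_2], 6, rng) for _ in range(8)]
    for chimera in sorted(set(library)):
        print(chimera)
```

Every position in a chimera comes from one of the parents, so the library contains combinations of the parental mutations but no new point mutations.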
Genetic recombination assay
Wild-type LacZα on pUC18 plasmid
LacZα mutants (stop codons, 75 b.p.)
Transform E. coli in presence of X-Gal
Stemmer, W. P. PNAS Vol. 91 pp. 10747-10751 1994
Genetic recombination assay (cont.)
Recombine mutant genes (Mutant 1, Mutant 2)
[Figure: colony outcomes labeled white, white, white, and blue.]
Transform E. coli in presence of X-Gal. Count blue and white colonies to measure recombination frequency.
Ratio of active recombinant colonies after assembling 50-100bp fragments was 24% (n=386)
Negative mutations are suppressed
Starting mutants may have both positive and negative mutations. The net change of the mutant may be positive → negative mutations masked.
DNA shuffling generates all possible combinations of point mutants → large library.
Backcrosses with the wild-type region can remove negative mutations.
Recombinants with a large number of negative mutations are eliminated from the next round of DNA shuffling. Positive mutants are selected to go to the next round of shuffling.
Error prone PCR and Shuffling together are powerful protein engineering techniques
[Figure: relative activity (axis ticks 1, 10, 20) vs. generation (0 to 6), starting from wt, with steps labeled random mutagenesis, recombination, random mutagenesis, and further shuffling.]
Family shuffling
Key: the starting genes are already nature’s solutions after natural evolution. They contain functional domains.
Example of family shuffling
Goal: Increase the activity of cephalosporinase towards moxalactam (an antibiotic)
1. Select four related cephalosporinases from different species
Nature, 391, 1998, p288
2. Generate point mutants of each gene and shuffle the mutants of each gene separately (8-fold improvement in activity for each cephalosporinase).
3. Combine all the mutants from all four genes and perform family shuffling. The best mutants from family shuffling were 270- to 540-fold more active.
Cephalosporinase Family Shuffling
Genome Shuffling of Antibiotic Producing Streptomyces Strains
• Streptomyces are important industrial organisms for the production of antibiotics, anticancer drugs and other small molecule pharmaceutical compounds
• Examples: Tetracyclines, erythromycin, daunorubicin, mithramycin, lovastatin (Mevacor)
• Streptomyces are soil-borne, Gram-positive bacteria that live under unfavorable conditions (starvation, among a population of other bacteria)
• The antibiotics are produced as secondary metabolites, mostly for self-defense.
Classic Mutagenesis is often used to find high-producing mutant strains
• How do we find a mutant strain of Streptomyces fradiae that produces higher amounts of the antibiotic tylosin (Eli Lilly)?
• The directed evolution of microorganisms has traditionally been carried out through the asexual process of classical strain improvement (CSI): sequential random mutagenesis and screening.
• The sequential mutagenesis is performed using mutagens and UV radiation.
• Most of the time, the nature of the mutation is not important. (Black box approach)
• Although CSI is the method of choice in pharmaceutical companies, the process is inefficient and usually takes decades and $$$$ to isolate a significantly improved mutant.
CSI vs. Genome shuffling
In CSI, during one round of mutagenesis, a large number of mutants can be recovered. Usually, only the best-performing mutant strain will be selected and subjected to additional mutagenesis. Genome shuffling takes all the mutants that show improvement over the parent strain and shuffles the genomes together to generate combinations of mutations (mimicking the natural evolution of species). This process is analogous to DNA shuffling, but on a much grander scale (genomes vs. genes).
Maxygen, Nature, 2002
How is genome shuffling possible?
• Combine the cellular contents of several mutant strains through protoplast fusion.
• During protoplast fusion, homologous recombination between homologous chromosomal regions will take place, allowing mutations to be passed from one strain to another.
• Fused protoplasts can be regenerated into single cells carrying shuffled genomes.
Genome Shuffling
Screening / Selecting Improved Variants (generally considered the hard part)
Key Point: You get what you screen for! And other properties or functions not selected for may be lost.
Some Concerns:
• How well does your screen reflect your desired function?
• Sensitivity of the screen (what is the background, and how well can you identify small improvements?)
• Screening capabilities / sampling of library / library size
• Equipment requirements (robotics, cell sorter, imaging)
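The sampling concern can be put in numbers: if improved variants occur at some frequency in the library, how many clones must be screened to expect at least one? A minimal sketch, assuming clones are independent random draws; the frequency used in the example is hypothetical and the function names are illustrative.

```python
import math

def probability_of_hit(frequency: float, clones_screened: int) -> float:
    """Chance that at least one screened clone is an improved variant."""
    return 1.0 - (1.0 - frequency) ** clones_screened

def clones_needed(frequency: float, confidence: float = 0.95) -> int:
    """Smallest number of clones giving the requested chance of a hit."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-frequency))

if __name__ == "__main__":
    frequency = 1e-3  # hypothetical: 1 improved clone per 1000
    print(round(probability_of_hit(frequency, 1000), 3))
    print(clones_needed(frequency))
```

The required number of clones scales roughly as 3 divided by the frequency for 95% confidence, which is why rare improvements call for high-throughput screens or selections.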
How do we look for the desired mutant?
Selection vs screening
• Selection is unambiguous (as long as all the control experiments have been done). It is easy to identify a mutant enzyme that has evolved to allow the bacteria to survive under certain selection criteria.
• For many enzymes, selection is difficult or impossible to set up (i.e. many enzyme functions, such as those of therapeutic proteins, are not essential to bacterial function)
• Screening is the systematic method to find the mutant of choice from a large library of mutants.
– Screen the cell phenotype if possible.
– Color or fluorescence screening is efficient and easy.
– The least efficient method is to analyze each sample manually for the desired properties (e.g. product formation)
Screening – improving with technology
A few clones needed for screening / site-directed mutagenesis
Hand-picked colonies by an unlucky graduate student
Color assay
Product assay
Picked by a robot
Examples of selection
• A mutant aminoacyl-tRNA synthetase that can incorporate an unnatural amino acid in an antibiotic selection marker
• Improved antibiotic resistance enzymes that allow the cells to survive higher concentrations of an antibiotic
• A regulatory protein that can turn on gene expression when induced by a small molecule
Examples of screening
• Antibodies with high affinities for antigen.
• Design a small-molecule inhibitor that tightly binds to and blocks a cell-surface protein or an enzyme inside a cell
• To generate a growth factor or hormone with increased affinity for its receptor
• Mutant enzyme catalyzing a novel reaction