Multiplex Assessment of Protein Variant Abundance by Massively Parallel Sequencing

Kenneth A. Matreyek 1,7, Lea M. Starita 1,7, Jason J. Stephany 1, Beth Martin 1, Melissa A. Chiasson 1, Vanessa E. Gray 1, Martin Kircher 1, Arineh Khechaduri 1, Jennifer N. Dines 2, Ronald J. Hause 1, Smita Bhatia 3, William E. Evans 4, Mary V. Relling 4, Wenjian Yang 4, Jay Shendure 1,5,*, Douglas M. Fowler 1,6,*

1 Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
2 Department of Medical Genetics, University of Washington, Seattle, Washington, USA.
3 School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA.
4 Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee, USA.
5 Howard Hughes Medical Institute, Seattle, Washington, USA.
6 Department of Bioengineering, University of Washington, Seattle, Washington, USA.
7 These authors contributed equally to this work.
* Correspondence should be addressed to D.M.F ([email protected]) or J.S. ([email protected]).

bioRxiv preprint doi: https://doi.org/10.1101/211011; this version posted January 16, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ABSTRACT

Determining the pathogenicity of human genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes will likely require generalizable, scalable assays. Here we describe Variant Abundance by Massively Parallel Sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance in a single experiment. We apply VAMP-seq to quantify the abundance of 7,595 single amino acid variants of two proteins, PTEN and TPMT, in which functional variants are clinically actionable. We identify 1,079 PTEN and 805 TPMT single amino acid variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and our abundance data suggest that a PTEN variant accounting for ~10% of PTEN missense variants in melanomas functions via a dominant negative mechanism. Finally, we demonstrate that VAMP-seq can be applied to other genes, highlighting its potential as a generalizable assay for characterizing missense variants.

INTRODUCTION

Every possible nucleotide change that is compatible with life is likely present in the germline of a living human1. Some of these variants alter protein activity or abundance, and, consequently, may impact disease risk. However, only ~2% of all presently reported missense variants have clinical interpretations2,3. Most of the remaining variants, as well as nearly all missense variants not yet observed, are rare and cannot be interpreted using traditional genetic approaches. Furthermore, computational approaches are insufficiently accurate. These limitations create a major challenge for the clinical use of genomic information. Somatic mutations further complicate this picture. Every cancer genome harbors additional missense variants (~44 on average, in one survey)4, and distinguishing between driver and passenger mutations remains a difficult challenge.

Deep mutational scans, which enable the simultaneous functional characterization of thousands of missense variants of a protein, offer one potential solution to the variant interpretation problem5–7. For example, the effects of nearly all possible single amino acid variants of the RING domain of BRCA1 on E3 ligase and BARD1 binding activity were quantified in a single study8. In another example, the effects of all possible single amino acid variants of PPARγ on the expression of CD36 in response to different agonists were measured9. In both cases, the functional data led to the accurate identification of most of the known pathogenic variants, suggesting that it could be useful in the interpretation of newly observed variants.

So far, deep mutational scans, including the BRCA1 and PPARγ scans, have relied on assays specific for each protein’s molecular function. However, developing specific assays for each of the thousands of disease-related proteins is impractical. To overcome this challenge, we sought to devise a functional assay that was both informative of variant effect and generalizable to many proteins. We based our assay on the fact that, despite their diversity, most proteins share a key requirement: they must be abundant enough to perform their molecular function. Variants can interfere with the steady-state abundance of a protein in cells via a variety of mechanisms, including by diminishing thermodynamic stability, altering post-transcriptional regulation or interrupting trafficking. In fact, as much as 75% of the pathogenic variation in monogenic disease is thought to disrupt thermodynamic stability and, consequently, alter abundance10,11. Furthermore, low-abundance variants of tumor suppressors can lead to cancer12,13, while low-abundance variants of drug-metabolizing enzymes can alter drug response14.

Here, we describe Variant Abundance by Massively Parallel Sequencing (VAMP-seq), which measures the steady-state abundance of variants of a protein in cultured human cells. We applied VAMP-seq to assess 3,946 single amino acid variants of the tumor suppressor PTEN and 3,649 variants of the enzyme TPMT. Our results reveal how changes in protein biophysical properties and interactions within and between proteins alter protein abundance in cells. We identify 1,079 previously uncharacterized, low-abundance single amino acid variants of PTEN that are likely to be pathogenic, and 805 TPMT single amino acid variants that are likely to be unable to adequately methylate and thereby inactivate thiopurine drugs. We observe selection for low-abundance PTEN variants in cancer and identify a dominant negative mechanism for PTEN variant P38S, which accounts for ~10% of PTEN missense variants observed in melanomas. Finally, we demonstrate that VAMP-seq can be applied to other clinically important proteins including VKOR, CYP2C9, CYP2C19, MLH1, and PMS2.

RESULTS

Multiplex assessment of the abundance of PTEN and TPMT variants

Inspired by high-throughput methods to assess the stability of protein variants in yeast15 and bacteria16, and by an earlier microarray-based assay that profiled protein abundances across the proteome17, we developed VAMP-seq. VAMP-seq is a multiplex assay that uses fluorescent reporters to measure the steady-state abundance of protein variants in cultured human cells (Fig. 1). Each cell expresses a single variant directly fused to EGFP. The stability of the variant dictates the abundance of the EGFP fusion and, accordingly, the green fluorescence signal of the cell. To control for expression level, mCherry is either co-transcriptionally or co-translationally expressed from the same construct.

We first evaluated the suitability of VAMP-seq to quantify abundance of the tumor suppressor protein PTEN and the enzyme TPMT. Each wild type open reading frame was N-terminally tagged with EGFP and recombined into a single genomic locus of an engineered HEK293T cell line18. We also constructed cell lines that expressed known low-abundance variants of each protein. After inducing expression of the integrated variants with doxycycline, we assessed the EGFP:mCherry ratio by flow cytometry. We found that cells expressing wild type PTEN or TPMT had ~5-fold higher EGFP:mCherry ratios than the known low-abundance variants (Fig. 2a; Supplementary Fig. 1b, c).

We next applied VAMP-seq to measure the steady state abundance of thousands of PTEN and TPMT single amino acid variants in parallel. Barcoded, site saturation mutagenesis libraries of each protein were separately recombined into our engineered HEK293T cell line18,19. Cells harboring each library had EGFP:mCherry ratios that spanned the range of our wild type (WT) and known low-abundance variant controls (Fig. 2a). Cells were flow sorted into bins according to their EGFP:mCherry ratio, and high-throughput DNA sequencing was used to quantify each variant’s frequency in each bin. Finally, an abundance score was calculated for each variant based on its distribution across the bins (Fig. 1; Supplementary Table 1). Abundance scores ranged from about zero, indicating total loss of abundance, to about one, indicating WT-like abundance (Fig. 2b).

Abundance scores correlated modestly well between replicates (mean r = 0.68 and mean ρ = 0.66 for both PTEN and TPMT; Supplementary Fig. 2). To improve accuracy, final abundance scores and confidence intervals were computed from many replicate experiments (PTEN, n = 8; TPMT, n = 8). The resulting data set describes the effects of 3,946 of the 7,638 possible single amino acid PTEN variants and 3,649 of the 4,655 possible TPMT variants (Fig. 2c, d; Supplementary Tables 2, 3). VAMP-seq-derived abundance scores were highly correlated with individually assessed variant abundance (n = 26, r = 0.91, ρ = 0.96 for PTEN; n = 18, r = 0.8, ρ = 0.68 for TPMT; Supplementary Fig. 3a, b). Furthermore, PTEN variant abundance measured using full-length EGFP or a fifteen amino acid split-GFP tag20 were in agreement (n = 6, r = 0.98, ρ = 0.94; Supplementary Fig. 1d). Finally, our abundance scores were consistent with 41 PTEN and 20 TPMT variant abundance effects assessed by western blotting (Supplementary Fig. 3c, d). Thus, we concluded that VAMP-seq accurately quantifies steady-state protein variant abundance.

For both proteins, the distribution of abundance scores was bimodal with peaks that overlapped WT synonyms and nonsense variants (Fig. 2b). Nonsense variants exhibited consistently low scores, except for those at the extreme N- or C-termini of each protein (Supplementary Fig. 3e). A larger fraction of PTEN variants had low abundance scores than TPMT variants, possibly reflecting the lower thermostability of PTEN (Tm = 40.3 °C) relative to TPMT (Tm = ~60 °C) (Supplementary Fig. 3f)21,22. This inverse relationship between the fraction of low-abundance variants and thermostability is consistent with the results of a deep mutational scan of GFP, which has an even higher melting temperature (Tm = ~78 °C) and had very few variants with a large effect on fluorescence23,24. Median variant abundance scores at each position illustrated the tolerance of each position to amino acid substitution (Fig. 2g, h; Supplementary Tables 4, 5). Positional tolerance was inversely related to positional conservation (ρ = -0.25 and -0.60 for PTEN and TPMT, respectively; Fig. 2i, j; Supplementary Fig. 3g, h). PTEN positions within alpha helices and beta sheets were less tolerant to substitution, while those in the flexible loops were highly tolerant (Fig. 2k, l; Supplementary Fig. 3i). TPMT positions within the beta sheets, which comprise the core of the protein, were less tolerant of substitution (Supplementary Fig. 3j).

Thermodynamic stability is a determinant of variant abundance

Variants can alter protein abundance inside cells via a variety of mechanisms, including by changing thermodynamic stability. We compared our abundance scores to various biochemical and biophysical features and found that hydrophobic packing, which is known to affect thermodynamic stability in vitro25–27, was a key driver of abundance. Mutation of WT hydrophobic aromatic, methionine, or long nonpolar aliphatic amino acids produced the largest decreases in abundance for both proteins (Fig. 3a). In fact, WT amino acid hydrophobicity was negatively correlated with abundance score (Fig. 3b, WT hydroΦ), whereas mutant amino acid hydrophobicity was positively correlated with abundance score (MT hydroΦ). Conversely, mutations of WT amino acids with high relative solvent accessibility (RSA), polarity (WT Polarity), and crystal-structure temperature factor (B-factor), all features associated with polar residues present on the protein surface, were associated with high abundance scores (Fig. 3b).
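
The feature comparisons above amount to correlating a per-variant property with the abundance score. The following is a minimal sketch of one way to do that, assuming a rank correlation (Spearman's ρ, the statistic reported elsewhere in the text) is an appropriate choice here. The function names, the use of the Kyte-Doolittle hydropathy scale for WT residue hydrophobicity, and every abundance value in `toy_variants` are illustrative assumptions and are not taken from the study's data or methods.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: (WT residue, Kyte-Doolittle hydropathy,
# made-up abundance score of a variant replacing that residue).
toy_variants = [
    ("I", 4.5, 0.35), ("L", 3.8, 0.42), ("F", 2.8, 0.30), ("M", 1.9, 0.55),
    ("A", 1.8, 0.70), ("G", -0.4, 0.65), ("S", -0.8, 0.90), ("E", -3.5, 0.95),
    ("K", -3.9, 1.02), ("R", -4.5, 0.98),
]
wt_hydrophobicity = [h for _, h, _ in toy_variants]
abundance_scores = [s for _, _, s in toy_variants]
rho = spearman(wt_hydrophobicity, abundance_scores)
print(f"Spearman rho (toy data): {rho:.2f}")
```

With these invented values the coefficient is negative, mirroring the direction of the reported relationship between WT hydrophobicity and abundance score; its magnitude says nothing about the real data.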
Consistent with the importance of hydrophobic packing, positions with the lowest average abundance scores were largely in the solvent inaccessible interiors of each protein (Fig. 3c, d). Finally, PTEN abundance scores correlated strongly with in vitro melting temperatures21 (n = 5, r = 0.97, ρ = 0.90; Supplementary Fig. 4a). These observations, consistent between PTEN and TPMT, suggest that variant thermodynamic stability is a major driver of variant abundance in vivo.

Next, we explored the role of polar contacts, using the PTEN structure to identify all side-chains predicted to form hydrogen bonds and ion pairs. Of the 76 positions potentially participating in these interactions, only 22 were intolerant to mutation (Supplementary Fig. 4b). These 22 intolerant positions largely clustered into discrete groups in three-dimensional space (Fig. 3e; Supplementary Fig. 4c). The groups highlighted regions of PTEN particularly important for abundance, and often included positions distant in primary sequence. For example, group 2 positions, along with S170, mediate inter-domain contacts between the PTEN phosphatase and C2 domains28, and we find that mutations at these positions result in a loss of abundance (Fig. 3e). Mutations at these positions also frequently occur in different types of cancer28; our data suggest that they may compromise function by virtue of their low abundance. Similarly, loss of abundance from abrogation of intra-domain polar contacts may account for the high frequency of mutations at K66, Y68, or D107 (group 1) in cancers (Fig. 3e; Supplementary Fig. 4d). TPMT lacked clusters of intolerant, polar-contact positions, possibly because it is a smaller, single domain protein with a higher melting temperature.

Interactions with the cell membrane modulate PTEN variant abundance

Though VAMP-seq does not explicitly query post-translational modification, trafficking or partner binding, each can have a profound impact on abundance. Therefore, we searched for signatures of these properties in our abundance data. PTEN mediates the removal of the 3’ phosphate from phosphatidylinositol 3,4,5-trisphosphate (PIP3) to produce phosphatidylinositol 4,5-bisphosphate (PIP2) at the membrane29. Interaction with the membrane is aided by phospholipid-binding positions present in both PTEN domains (Fig. 3f)30,31. Furthermore, PTEN membrane binding and activity are negatively regulated by phosphorylation of its unstructured C-terminal tail29,32. Active site or C-terminal regulatory phosphosite variants have been found to decrease activity, reduce membrane binding and increase abundance, hinting at the existence of a negative feedback mechanism that degrades membrane-bound, active PTEN32,33.

We therefore asked whether any PTEN variants increased abundance, perhaps by altering membrane interaction. We identified 41 positions in PTEN that had mean abundance scores higher than WT. 19 of these enhanced-abundance positions were in structurally resolved regions, and 53% of them were within 7 Å of known phospholipid-binding positions. In comparison, only 13% of all structurally resolved PTEN positions were within 7 Å of phospholipid-binding positions (Supplementary Fig. 4e). Thus, positions with abundance-enhancing variants tended to be near the membrane-proximal face of PTEN, and included those important for binding PIP3, PIP2 or PI(3)P31,34,35 (Fig. 3f). Furthermore, phosphomimetic substitutions at the S385 PTEN C-terminal regulatory phosphosite exhibited the highest abundance scores, whereas positively charged substitutions had low scores, supporting the impact of phosphorylation at this site on abundance (Supplementary Fig. 4f). Thus, many of the enhanced-abundance variants we identified likely disrupt PTEN membrane localization or PIP3 phosphatase function.

New potentially pathogenic variants in PTEN revealed by abundance data

In addition to revealing the biochemical and biological determinants of protein abundance, VAMP-seq scores can also be used to identify potentially pathogenic variants. To simplify comparisons to clinical variant effects, we classified PTEN missense single nucleotide variants (SNVs) as either low abundance, possibly low abundance, possibly WT-like abundance, or WT-like abundance based on how each variant’s abundance score and confidence interval compared to the distribution of WT synonym scores (Fig. 4a, Supplementary Fig. 5a). Then, we analyzed variants present in public databases of either germline or somatic variation in the light of these abundance classifications.

Heterozygous loss of PTEN activity in the germline can cause a spectrum of clinical findings including multiple hamartomas, carcinoma, and macrocephaly, collectively known as PTEN Hamartoma Tumor Syndrome (PHTS)36, which includes Cowden Syndrome. There are 216 PTEN germline missense SNVs in ClinVar, a submission-driven database of variants identified primarily through clinical testing3. 41 of the 216 PTEN missense variants are annotated as pathogenic, 24 of which had abundance scores from VAMP-seq. Of these 24, 15 (62%) were classified as low abundance (Fig. 4b), a significantly higher proportion than the 24% of all scored missense variants that are low abundance (Resampling test, n = 24, P < 0.0001; Fig. 4a; Supplementary Fig. 5b; Supplementary Table 6). Of the remaining nine variants, four were possibly low abundance and three were active site variants (H93R, G129E, and R130L) known to be inactive without loss of abundance. The remaining two variants (D24G and R234Q) were distal to the active site and likely alter PTEN function by an unknown mechanism37,38.
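
The resampling test quoted above asks how often a random set of 24 scored variants would contain at least 15 low-abundance variants when only 24% of scored variants are low abundance. The sketch below shows one plausible form of such a test. The three figures (24 variants drawn, 15 observed, 24% background) come from the text; the pool size of 1,000, sampling without replacement, the one-sided comparison, the add-one correction, and the function name `resampling_p_value` are all assumptions made for illustration and may differ from the procedure actually used.

```python
import random


def resampling_p_value(pool_labels, n_draw, observed_hits, n_iter=20000, seed=0):
    """Estimate a one-sided empirical P value by resampling.

    pool_labels: list of 0/1 flags, 1 meaning "low abundance".
    Returns the fraction of random draws of size n_draw (without
    replacement) that contain at least observed_hits flagged variants.
    """
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_iter):
        draw = rng.sample(pool_labels, n_draw)
        if sum(draw) >= observed_hits:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_iter + 1)


# Hypothetical pool: 1,000 scored variants, 24% flagged as low abundance.
pool = [1] * 240 + [0] * 760

# Figures from the text: 24 scored pathogenic variants, 15 of them low abundance.
p = resampling_p_value(pool, n_draw=24, observed_hits=15)
print(f"Empirical one-sided P (toy pool): {p:.5f}")
```

With 20,000 iterations the smallest value this estimator can return is about 5 × 10^-5, so resolving P values well below 0.0001 would need more iterations.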
Thus, VAMP-seq-derived abundance scores, combined with structural knowledge of the PTEN active-site, reveal >90% of known PTEN pathogenic variants.

We could not formally assess the VAMP-seq false positive rate because no PTEN variants are currently classified as benign. However, as has been done before9, we were able to identify likely non-damaging variants based on their frequency in the population. Germline PTEN variants cause Cowden Syndrome, a high-penetrance, dominantly-inherited Mendelian disease, at a rate of at least ~1 per 200,000 individuals36,39. Thus, on average, a particular dominant damaging variant should occur less than once in a cohort of 200,000 individuals, each harboring two copies of PTEN, corresponding to an allele frequency of 2.5 × 10^-6 (that is, 1 in 400,000 alleles). 21 missense variants and 1 nonsense variant in the GnomAD database had allele frequencies above this threshold. We excluded two of the variants, R130X and R173H, which were in cancer databases and thus are possibly damaging. Of the remaining 20 likely non-damaging variants, 16 were scored in our assay and all 16 were classified as WT-like or possibly WT-like abundance (Supplementary Fig. 5c). Three of the variants, A79T, P354Q, and S294R, had frequencies higher than 5 × 10^-5, strongly suggesting that they are not damaging2 (Fig. 4a). This analysis suggests that the PTEN abundance score data have a very low false positive rate.

An additional 41 PTEN variants are annotated as likely pathogenic in ClinVar. Of these, 22 had abundance scores, 9 (41%) of which were classified as low abundance (Fig. 4c; Supplementary Fig. 5b). Thus, the likely pathogenic category also had more low-abundance variants than expected based on chance (Resampling test, n = 22, P = 0.0343; Supplementary Table 6). The 134 remaining ClinVar variants are of uncertain significance. 81 of these variants had abundance scores, and 23 (28%) were low abundance (Fig. 4d).

By providing additional evidence that supports pathogenicity, our abundance data could be used to alter variant clinical interpretations40. For example, of the 9 low-abundance, likely pathogenic ClinVar variants, one variant (I335K) could be reclassified as pathogenic by adding the low-abundance classification to publicly available information (Supplementary Fig. 6)40. Furthermore, 23 variants of uncertain significance along with 263 possible but not-yet-observed missense variants are low-abundance and could potentially be moved to the likely pathogenic category once observed in the appropriate clinical setting (Supplementary Table 7). However, we currently lack clinical data for these variants, and the absence of bona fide benign PTEN variants means that we cannot formally assess the specificity of our assay. Identifying best practices for integrating our PTEN variant abundance measurements into clinical practice will likely require further study and discussion by the community.

Abundance data identifies mechanisms of PTEN dysregulation in cancer

Somatic inactivation of PTEN by missense variation is an important contributor to multiple types of cancer41. We asked whether VAMP-seq derived abundance data could yield insight into the contribution of previously reported somatic PTEN variants to tumorigenesis. We collected PTEN missense or nonsense variants found in The Cancer Genome Atlas42 and the AACR Project GENIE43, and compared the observed frequencies of PTEN variants of each abundance class to the expected frequencies based on cancer type-specific nucleotide mutation spectra42. We observed significantly more low-abundance PTEN variants than expected for every cancer type analyzed (Resampling test, all P values ≤ 0.0002; Fig. 4e; see Supplementary Table 8 for p-values). This pattern suggests that selection for PTEN inactivation through loss-of-abundance is a common oncogenic mechanism.

Some inactive variants of PTEN such as C124S, G129E, R130G, and R130Q are of wild type-like abundance. These inactive variants exert a dominant negative effect on PTEN activity, leading to enhanced Akt phosphorylation and enhanced tumorigenesis in mouse models44–46. As expected, known dominant negative variants had WT-like or higher abundance, with C124S, R130G and G129E exhibiting abundance scores of 1.21, 1.08, and 0.76, respectively. Known dominant negative variants were also significantly enriched in cancer, largely driven by the high frequencies of R130G and R130Q44,47 (Fig. 4e; Supplementary Fig. 5d; see Supplementary Table 8 for p-values).

Unlike for every other cancer type we examined, melanoma lacked an enrichment of known dominant negative variants. However, the P38S variant was significantly enriched, accounting for 10.4% of PTEN missense variants (Resampling test, n = 77, P < 0.0001; Fig. 4e; Supplementary Fig. 5d; see Supplementary Table 8 for p-values). P38S has been previously observed in melanoma cancer cell lines, yet had never been functionally characterized48. P38S had a slightly higher abundance score than WT (1.13) in our assay. Based on its prevalence in melanoma and its WT-like abundance, we hypothesized that it might exert a dominant negative effect. Indeed, we found that P38S, like known dominant-negative variants, drove increased Akt phosphorylation in the presence of endogenous wild type PTEN (Fig. 4f; Supplementary Fig. 5e). In contrast, computational predictors suggested that P38S is thermodynamically unstable, highlighting the utility of VAMP-seq (Supplementary Fig. 5f). Overall, our results show that loss-of-abundance is an important mechanism by which PTEN variants cause cancer and reveal a new dominant negative variant, P38S, that is over-represented in melanoma.

Implications of TPMT abundance for thiopurine drug treatment

TPMT is one of 17 pharmacogenes whose genotype can be used to guide drug dosing49. Functional TPMT is required to metabolize thiopurine drugs such as 6-mercaptopurine (6-MP) and its prodrug, azathioprine. Thiopurine drugs are used to treat individuals with leukemia, rheumatic disease, inflammatory bowel disease, or rejection in solid organ transplant.
Increased exposure to thiopurines causes treatment interruption or even life-threatening myelosuppression and hepatotoxicity. Three known nonfunctional variants of TPMT, A80P, A154T and Y240C, are found at high allele frequencies (combined MAF = 0.066) and are responsible for 95% of decreased-function alleles in the population50. The drug toxicity to carriers of these variants can be explained, at least in part, by the fact that they result in lower abundance of TPMT relative to wild type14,22 (Fig. 5a). Accordingly, both abundance scores (Fig. 5a) and individually assessed EGFP:mCherry values (Fig. 2a; Supplementary Fig. 1c) were lower for these nonfunctional variants compared to the WT allele.

Since our abundance scores accurately identify known decreased-function alleles, we analyzed the abundance of rare TPMT variants of unknown function. In a clinical study of patients with acute lymphoblastic leukemia (ALL), 884 patients were analyzed by exome array. 278 of these patients also had exome sequencing data available. Red blood cell (RBC) TPMT activity and 6-MP dose intensity, the dose at which each individual became sensitive to 6-MP, were also measured51. The three known, high-frequency drug sensitivity variants were identified, along with four rare variants: S125L, Q179H, R215H and R226Q (combined MAF < 0.0053). The mean RBC activity of individuals heterozygous for Q179H, R215H, and R226Q was lower than the mean activity of individuals without TPMT variants, but higher than the activity of individuals heterozygous for the high-frequency drug sensitivity variants (Supplementary Fig. 7a, b). In contrast, RBC activity for S125L was higher than WT. Thiopurine dose intensity, which is affected by TPMT activity, is highly correlated with variant abundance (r = 0.99, ρ = 1, n = 6; Fig. 5b; Supplementary Fig. 7c). Though their RBC activity varied over a wide range, the individuals heterozygous for these rare variants tolerated a higher mean dose of 6-MP than individuals heterozygous for the known sensitivity variants. Additionally, each of the four rare variants is surface accessible, and they are classified as WT-like based on VAMP-seq abundance data. Individual assessment confirmed that these rare alleles do not affect abundance (Supplementary Fig. 7d). Thus, we suggest that S125L, Q179H, R215H and R226Q may not be decreased-function variants.

Sequencing of the human population2 and individuals intolerant to thiopurine drugs52 has revealed an additional 120 rare TPMT variants. These variants (MAF range = 0.000004–0.00066) are carried, in aggregate, by 0.2% of the population2, but the impact of most of these variants on TPMT activity and abundance is unknown53. We measured abundance scores for 94 of these variants, classifying fourteen (15%) as low abundance and eighteen (19%) as possibly low abundance. When these or any of the other 365 missense variants we classified as low or possibly low abundance are identified in the clinic, we suggest that they may be decreased-function variants and that the risk for thiopurine toxicity may be elevated. Dose reduction or closer monitoring could minimize toxicity and directly improve outcomes50.

General utility of VAMP-seq for assessing variant abundance

To demonstrate that VAMP-seq is applicable to diverse proteins, we evaluated wild type and known or predicted low-abundance variants for an additional set of seven pharmacogenes or “clinically actionable” genes54,55 (Supplementary Table 9). For CYP2C9, CYP2C19, and VKOR, we found large differences in the EGFP:mCherry ratios of the wild type and known or predicted low-abundance missense variants (Fig. 6), whereas MLH1 and PMS2 yielded smaller differences. For these five proteins, VAMP-seq could be used to test variant effects on abundance. Furthermore, ~52% of human proteins tested yielded at least as much fluorescence as MLH1 when expressed as N-terminal EGFP fusions in a genome-wide screen17, suggesting that many human proteins are compatible with VAMP-seq (Supplementary Fig. 8).
However, preliminary experiments for BRCA1 and LMNA resulted in low EGFP signal or no difference in the EGFP:mCherry ratio between wild type and known low-abundance variants (Fig. 6 and data not shown). Thus, VAMP-seq will not be applicable in all cases. In particular, proteins that are marginally stable like BRCA1, make large complexes like LMNA, or are secreted and therefore break the link between variant genotype and phenotype are not amenable to VAMP-seq.

DISCUSSION

VAMP-seq is a generalizable method for multiplex measurement of steady-state protein variant abundance. Since alterations in abundance may account for a large fraction of known pathogenic variation¹⁰,¹¹, an important application of VAMP-seq may be to aid clinical geneticists in understanding the effects of newly discovered missense variants. Indeed, the American College of Medical Genetics suggests that well-established functional assays can provide strong evidence of pathogenicity⁴⁰. Thus, in the context of monogenic diseases where protein inactivation is pathogenic, VAMP-seq-derived abundance data can help to identify pathogenic variants. The utility of VAMP-seq for this purpose is highlighted by the fact that 62% of known PTEN pathogenic missense variants were of low abundance. If other proteins yielded similar results, VAMP-seq could provide evidence of pathogenicity for greater than half of the pathogenic missense variants we will eventually find as more human genomes are sequenced.

Besides the known PTEN pathogenic missense variants, we also identified 1,064 low-abundance PTEN single amino acid variants that would likely confer an increased risk of PTEN Hamartoma Tumor Syndrome. Additionally, we identified 805 low-abundance single amino acid TPMT variants, which would likely require altered drug dosing. Our prospective functional characterization of these loss-of-function variants, available in our interactive web interface, could enable increased cancer surveillance in the case of PTEN carriers or prevent drug toxicity in the case of TPMT carriers.

Interpretation of somatic variation is more difficult, but functional data can reveal driver variants and, therefore, potential treatments. For example, variation in PTEN, presumably resulting in PTEN loss-of-function, is associated with increased sensitivity to PI3K, AKT, and mTOR inhibitors, and decreased sensitivity to receptor tyrosine kinase inhibitors⁵⁶. Our PTEN abundance data reveal many loss-of-function variants, which could help to clarify the link between PTEN inactivation and altered drug sensitivity, and as such might inform cancer treatment. Furthermore, aided by our abundance data, we identified P38S as a candidate PTEN dominant negative variant in melanoma. We showed that cells expressing the P38S variant have elevated levels of activated AKT, supporting the notion that P38S acts in a dominant negative fashion. Since the known dominant negative variants G129E and C124S result in exacerbated oncogenic phenotypes in mice⁴⁴,⁴⁶, P38S status might help to predict tumor aggressiveness.

Despite its utility, VAMP-seq has limitations. Bottlenecks in our library generation method were the main culprit in the absence of approximately half of all possible PTEN variants in the final data set. In the future, early validation of library quality using deep sequencing along with utilization of other well-validated library generation methods⁹ could improve coverage. Additionally, like any experimental assay, VAMP-seq abundance data is subject to uncertainty. To address this concern, we quantified the uncertainty associated with each variant’s abundance score. We suggest that abundance score uncertainty should be taken into consideration, as we did when classifying variant abundance. VAMP-seq relies on fusion of the protein of interest to EGFP. We showed a high concordance between VAMP-seq abundance data and abundance as measured by other methods, but this might not always be the case.
Furthermore, VAMP-seq cannot yield insight into variants that are pathogenic because of reduced enzymatic activity, altered localization, or effects on splicing. Thus, while VAMP-seq abundance data is useful for providing evidence of variant pathogenicity, it should not be used to conclude that a variant is benign.

In addition to providing evidence for clinical variant interpretation, VAMP-seq data can also yield insight into the biophysical and biochemical features that influence protein abundance and function inside the cell. Proteins with single functions and limited post-translational regulation, such as TPMT, yield variant abundance profiles that largely reflect molecular determinants of folding and thermodynamic stability. Alternatively, proteins with multiple functions, intramolecular interactions and levels of regulation, such as PTEN, yield abundance profiles that are a composite of these many factors. For example, we observed that variants at PTEN positions known to influence membrane interaction generally resulted in elevated abundance. This effect could be due to a negative feedback mechanism wherein membrane-associated, active PTEN is particularly susceptible to degradation³²,³³. Thus, further study of high-abundance PTEN variants might reveal novel features of PTEN membrane interaction. Proteins such as Src and EGFR are also believed to possess negative feedback mechanisms regulating abundance³³, and are thus high-value targets for VAMP-seq.

We suggest that generalizable assays like VAMP-seq are a promising way to understand the functional effects of missense variation at scale. In addition to demonstrating its effectiveness for PTEN and TPMT, we provide preliminary evidence that VAMP-seq could be applied to many other clinically relevant human proteins. VAMP-seq also avoids time-intensive steps like engineering knockouts of each gene of interest, which can be required for some functional assays. Furthermore, repeating VAMP-seq assays in different cell lines could reveal cell-type specific regulation of variant abundance. Comparing variant abundance data in wild type and chaperone knockout cells could reveal what makes a protein a chaperone client. Combining VAMP-seq with small molecule modulators of chaperone or protein degradation machinery may even reveal variant-specific treatments that could rescue low-abundance variants. Thus, VAMP-seq greatly expands our ability to measure the impact of missense variants on abundance, a generally important, fundamental property that underlies protein function.

ACKNOWLEDGEMENTS

We thank Jason Underwood and Katy Munson of the University of Washington PacBio Sequencing Services for assistance with long-read sequencing; Anh Leith of the University of Washington Foege Flow Lab and Lemuel Gitari and Donna Prunkard of the University of Washington Pathology Flow Cytometry Core Facility for assistance with cell sorting; and Brian Shirts and Colin Pritchard in the University of Washington Department of Lab Medicine for advice. The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors.

FUNDING STATEMENT

This work was supported by the National Institute of General Medical Sciences (1R01GM109110 and 5R24GM115277 to D.M.F., P50GM115279 to M.V.R. and W.E.E., National Cancer Institute R01CA096670 to S.B. and P30CA21765 to M.V.R.) and an NIH Director’s Pioneer Award (DP1HG007811-05 to J.S.). K.A.M. is an American Cancer Society Fellow (PF-15-221-01), and was supported by a National Cancer Institute Interdisciplinary Training Grant in Cancer (2T32CA080416). J.N.D. is supported by a National Institute of General Medical Sciences Training Grant (T32GM007454). J.S.
is an Investigator of the Howard Hughes Medical Institute.

CONTRIBUTIONS

D.M.F., J.S., K.A.M., and L.M.S. conceived of, designed and managed the experiments and analyses, and wrote the manuscript; J.J.S. and B.M. cloned expression constructs and libraries and prepped and performed NGS sequencing; K.A.M., M.A.C. and A.K. provided constructs and data for additional disease genes and pharmacogenes; M.K. wrote the scripts to extract barcodes and variable regions from long-read sequences; J.N.D. assisted in using the ACMG guidelines to reclassify PTEN variants; R.J.H. provided constructs for TPMT experiments; V.E.G. designed the website; and S.B., W.E.E., M.V.R., and W.Y. provided clinical data for TPMT comparison.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

REFERENCES

1. Shirts, B. H., Pritchard, C. C. & Walsh, T. Family-Specific Variants and the Limits of Human Genetics. Trends Mol. Med. 22, 925–934 (2016).

2. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

3. Landrum, M. J. et al. ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, 980–985 (2014).


4. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

5. Fowler, D. M., Stephany, J. J. & Fields, S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protoc. 9, 2267–2284 (2014).

6. Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).

7. Manolio, T. A. et al. Commentary Bedside Back to Bench: Building Bridges between Basic and Clinical Genomic Research. Cell 169, 6–12 (2017).

8. Starita, L. M. et al. Massively Parallel Functional Analysis of BRCA1 RING Domain Variants. Genetics 200, 413–422 (2015).

9. Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).

10. Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353, 459–473 (2005).

11. Redler, R. L., Das, J., Diaz, J. R. & Dokholyan, N. V. Protein Destabilization as a Common Factor in Diverse Inherited Disorders. J. Mol. Evol. 82, 11–16 (2016).

12. Berger, A. H., Knudson, A. G. & Pandolfi, P. P. A continuum model for tumour suppression. Nature 476, 163–169 (2011).

13. Lee, M. S. et al. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays. Cancer Res. 70, 4880–4890 (2010).

14. Tai, H. L., Krynetski, E. Y., Schuetz, E. G., Yanishevski, Y. & Evans, W. E. Enhanced proteolysis of thiopurine S-methyltransferase (TPMT) encoded by mutant alleles in humans (TPMT*3A, TPMT*2): mechanisms for the genetic polymorphism of TPMT activity. Proc. Natl. Acad. Sci. U.S.A. 94, 6444–6449 (1997).

15. Kim, I., Miller, C. R., Young, D. L. & Fields, S. High-throughput analysis of in vivo protein stability. Mol. Cell. Proteomics 12, 3370–3378 (2013).

16. Klesmith, J. R., Bacik, J.-P., Wrenbeck, E. E., Michalczyk, R. & Whitehead, T. A. Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning. Proc. Natl. Acad. Sci. U.S.A. 114, 2265–2270 (2017).

17. Yen, H.-C. S., Xu, Q., Chou, D. M., Zhao, Z. & Elledge, S. J. Global protein stability profiling in mammalian cells. Science 322, 918–923 (2008).

18. Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).

19. Jain, P. C. & Varadarajan, R. A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library. Anal. Biochem. 449, 90–98 (2014).

20. Cabantous, S., Terwilliger, T. C. & Waldo, G. S. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 (2005).

21. Johnston, S. B. & Raines, R. T. Conformational Stability and Catalytic Activity of PTEN Variants Linked to Cancers and Autism Spectrum Disorders. Biochemistry 54, 1576–1582 (2015).

22. Wu, H. et al. Structural basis of allele variation of human thiopurine-S-methyltransferase. Proteins 67, 198–208 (2007).

23. Ward, W. W., Prentice, H. J., Roth, A. F., Cody, C. W. & Reeves, S. C. Spectral Perturbations of the Aequorea Green-Fluorescent Protein. Photochem. Photobiol. 35, 803–808 (1982).

24. Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).

25. Zhou, H. & Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins 322, 315–322 (2004).


26. Kauzmann, W. Some Factors in the Interpretation of Protein Denaturation. Adv. Protein Chem. 14, 1–63 (1959).

27. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).

28. Lee, J. O. et al. Crystal structure of the PTEN tumor suppressor: implications for its phosphoinositide phosphatase activity and membrane association. Cell 99, 323–334 (1999).

29. Song, M. S., Salmena, L. & Pandolfi, P. P. The functions and regulation of the PTEN tumour suppressor. Nat. Rev. Mol. Cell Biol. 13, 283–296 (2012).

30. Nguyen, H.-N. et al. A new class of cancer-associated PTEN mutations defined by membrane translocation defects. Oncogene 34, 3737–3743 (2015).

31. Walker, S. M., Leslie, N. R., Perera, N. M., Batty, I. H. & Downes, C. P. The tumour-suppressor function of PTEN requires an N-terminal lipid-binding motif. Biochem. J. 379, 301–307 (2004).

32. Das, S., Dixon, J. E. & Cho, W. Membrane-binding and activation mechanism of PTEN. Proc. Natl. Acad. Sci. U.S.A. 100, 7491–7496 (2003).

33. Vazquez, F., Ramaswamy, S., Nakamura, N. & Sellers, W. R. Phosphorylation of the PTEN Tail Regulates Protein Stability and Function. Mol. Cell. Biol. 20, 5010–5018 (2000).

34. Wei, Y., Stec, B., Redfield, A. G., Weerapana, E. & Roberts, M. F. Phospholipid-binding sites of phosphatase and tensin homolog (PTEN): Exploring the mechanism of phosphatidylinositol 4,5-bisphosphate activation. J. Biol. Chem. 290, 1592–1606 (2015).

35. Naguib, A. et al. PTEN Functions by Recruitment to Cytoplasmic Vesicles. Mol. Cell 58, 255–268 (2015).

36. Hobert, J. A. & Eng, C. PTEN hamartoma tumor syndrome: an overview. Genet. Med. 11, 687–694 (2009).

37. Melbārde-Gorkuša, I. et al. Challenges in the management of a patient with Cowden syndrome: case report and literature review. Hered. Cancer Clin. Pract. 10, 5 (2012).

38. Staal, F. J. T. et al. A novel germline mutation of PTEN associated with brain tumours of multiple lineages. Br. J. Cancer 86, 1586–1591 (2002).

39. Nelen, M. R. et al. Novel PTEN mutations in patients with Cowden disease: Absence of clear genotype-phenotype correlations. Eur. J. Hum. Genet. 7, 267–273 (1999).

40. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).

41. Hollander, M. C., Blumenthal, G. M. & Dennis, P. A. PTEN loss in the continuum of common cancers, rare syndromes and mouse models. Nat. Rev. Cancer 11, 289–301 (2011).

42. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).

43. AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831 (2017).

44. Papa, A. et al. Cancer-Associated PTEN Mutants Act in a Dominant-Negative Manner to Suppress PTEN Protein Function. Cell 157, 595–610 (2014).

45. Leslie, N. R. & Longy, M. Inherited PTEN mutations and the prediction of phenotype. Semin. Cell Dev. Biol. 52, 30–38 (2016).

46. Wang, H. et al. Allele-specific tumor spectrum in pten knockin mice. Proc. Natl. Acad. Sci. U.S.A. 107, 5142–5147 (2010).

47. Bonneau, D. & Longy, M. Mutations of the human PTEN gene. Hum. Mutat. 16, 109–122 (2000).

48. Aguissa-Touré, A.-H. & Li, G. Genetic alterations of PTEN in human melanoma. Cell. Mol. Life Sci. 69, 1475–1491 (2012).

49. Hodges, L. M. et al. Very important pharmacogene summary. Pharmacogenet. Genomics 21, 152–161 (2011).

50. Relling, M. V. et al. Clinical Pharmacogenetics Implementation Consortium Guidelines for Thiopurine Methyltransferase Genotype and Thiopurine Dosing: 2013 Update. Clin. Pharmacol. Ther. 93, 324–325 (2013).

51. Liu, C. et al. Genomewide Approach Validates Thiopurine Methyltransferase Activity Is a Monogenic Pharmacogenomic Trait. Clin. Pharmacol. Ther. 101, 373–381 (2017).

52. Appell, M. L. et al. Nomenclature for alleles of the thiopurine methyltransferase gene. Pharmacogenet. Genomics 23, 242–248 (2013).

53. Hamdan-Khalil, R. et al. In vitro characterization of four novel non-functional variants of the thiopurine S-methyltransferase. Biochem. Biophys. Res. Commun. 309, 1005–1010 (2003).

54. Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 1–7 (2016).

55. Relling, M. et al. New Pharmacogenomics Research Network: An Open Community Catalyzing Research and Translation in Precision Medicine. Clin. Pharmacol. Ther. 0, 1–6 (2017).

56. Dillon, L. M. & Miller, T. W. Therapeutic targeting of cancers with loss of PTEN function. Curr. Drug Targets 15, 65–79 (2014).


Figure 1 | Overview of Variant Abundance by Massively Parallel Sequencing (VAMP-seq). A mixed population of cells each expressing one protein variant fused to EGFP is created. The variant dictates the abundance of the variant-EGFP fusion protein, resulting in a range of cellular EGFP fluorescence levels. Cells are then sorted into bins based on their level of fluorescence, and high throughput sequencing is used to quantify every variant in each bin. VAMP-seq scores are calculated from the scaled, weighted average of variants across bins. The resulting sequence-function maps describe the relative intracellular abundance of thousands of protein variants.
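The scoring step named in the Figure 1 legend (a scaled, weighted average of each variant across sorted bins) can be illustrated with a short sketch. This is not the authors' pipeline. The four-bin layout, the bin weights (0.25, 0.5, 0.75, 1.0), the function names, and the choice to rescale so that the median nonsense score maps to 0 and the median synonymous score maps to 1 are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical weights, one per sorted bin, ordered from lowest to highest
# fluorescence. The real weights are not stated in this section.
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)


def bin_frequencies(counts_by_bin):
    """Convert per-bin read counts {variant: count} into per-bin frequencies."""
    freqs = []
    for counts in counts_by_bin:
        total = sum(counts.values())
        freqs.append({v: c / total for v, c in counts.items()} if total else {})
    return freqs


def weighted_average(variant, freqs, weights=BIN_WEIGHTS):
    """Weighted average of one variant's frequency distribution across bins."""
    per_bin = [f.get(variant, 0.0) for f in freqs]
    total = sum(per_bin)
    if total == 0:
        return None  # variant was not observed in any bin
    return sum(w * f for w, f in zip(weights, per_bin)) / total


def scale_scores(raw, nonsense, synonymous):
    """Rescale so the median nonsense score is 0 and the median synonymous score is 1.

    Assumes the two medians differ; otherwise the division is undefined.
    """
    low = median(raw[v] for v in nonsense)
    high = median(raw[v] for v in synonymous)
    return {v: (s - low) / (high - low) for v, s in raw.items()}


# Toy read counts (invented): a nonsense variant confined to the dimmest bin,
# a synonymous variant confined to the brightest bin, and a missense variant
# split between the two middle bins.
bins = [{"R130*": 100}, {"M1": 100}, {"M1": 100}, {"S10=": 100}]
freqs = bin_frequencies(bins)
raw = {v: weighted_average(v, freqs) for v in ("R130*", "M1", "S10=")}
scaled = scale_scores(raw, nonsense=["R130*"], synonymous=["S10="])
```

Working with within-bin frequencies rather than raw counts keeps a deeply sequenced bin from dominating the average. Whether the published scores use exactly this normalization is not established by this section.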


Figure 2 | VAMP-seq abundance scores for PTEN and TPMT. a, Flow cytometry profiles for PTEN (left) and TPMT (right), with WT (red), known low-abundance variant controls (blue), and the variant libraries (gray) overlaid. Bin thresholds used to sort the library are shown above the plots. Each smoothed histogram was generated from at least 1,500 recombined cells from control constructs, and at least 6,000 recombined cells from the library. b, VAMP-seq abundance score density plots for PTEN (left) and TPMT (right) nonsense variants (blue dotted line), synonymous variants (red dotted line), and missense variants (filled, solid line). The missense variant densities are colored as gradients between the lowest 10% of abundance scores (blue), the WT abundance score (white), and abundance scores above WT (red). c, d, Heatmap of PTEN (c) and TPMT (d) abundance scores, colored according to the scale in b. Variants that were not scored are colored gray. e, f, Number of amino acid substitutions scored at each position for PTEN and TPMT. g, h, Positional median PTEN and TPMT abundance scores, computed for positions with a minimum of 5 variants, are shown as dots. The gray line represents the mean abundance score in a three-residue sliding window. i, j, PTEN and TPMT position-specific PSIC conservation scores are shown as dots, and the gray line represents the mean PSIC score within a three-residue sliding window. k, l, PTEN and TPMT domain architecture is shown, with positions in alpha helices and beta sheets colored cyan and pink, respectively.
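The positional summaries in panels g and h (a per-position median requiring at least 5 scored variants, smoothed with a three-residue sliding window) could be computed along the following lines. This is a sketch under stated assumptions: the window is taken to be centered on each residue, the smoothing is applied to the positional medians, and positions without a value are skipped rather than imputed. The function names are hypothetical.

```python
from statistics import mean, median


def positional_medians(scores_by_position, min_variants=5):
    """Median abundance score per position, dropping sparsely covered positions."""
    return {
        pos: median(scores)
        for pos, scores in scores_by_position.items()
        if len(scores) >= min_variants
    }


def sliding_window_mean(values_by_position, window=3):
    """Mean over a centered window of residues, ignoring positions with no value."""
    half = window // 2
    smoothed = {}
    for pos in values_by_position:
        neighbors = [
            values_by_position[p]
            for p in range(pos - half, pos + half + 1)
            if p in values_by_position
        ]
        smoothed[pos] = mean(neighbors)
    return smoothed


# Invented scores: position 4 has only two scored variants and is excluded.
scores = {1: [1.0] * 5, 2: [0.0] * 5, 3: [0.5] * 5, 4: [0.2, 0.3]}
medians = positional_medians(scores)
smoothed = sliding_window_mean(medians)
```

At the ends of the protein, or next to excluded positions, the window here simply averages whichever neighbors exist. Other edge conventions are equally plausible.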


Figure 3 | Biochemical features influencing intracellular protein abundance. a, Scatterplots of variant abundance scores averaged over all twenty WT residues (left) or mutant residues (right) for PTEN (x-axis) and TPMT (y-axis). b, A scatterplot of Spearman’s rho values for PTEN (x-axis) or TPMT (y-axis) abundance score correlations with various evolutionary (red), structural (blue), or primary protein sequence (cyan) features. See legend of Supplementary Table 2 for information regarding these features. c, d, PTEN (c, PDB: 1d5r) and TPMT (d, PDB: 2h11) crystal structures are shown. Chains are colored according to positional median abundance scores using a gradient between the lowest 10% of positional median abundance scores (blue), the WT abundance score (white), and abundance scores above WT (red). The 20% of positions with the lowest scores are shown as a semi-transparent surface. The substrate mimicking compounds tartrate and S-adenosyl-L-homocysteine are displayed as magenta spheres. e, Low-abundance PTEN residues with predicted hydrogen bonds or salt bridges are shown as sticks with a semi-transparent surface representation. Residues within 11 Å of each other are clustered and colored as discrete groups. The residues in each group are identified by number, followed, in parentheses, by the number of times any variant at the residue is found in the COSMIC database. f, Residues with high abundance scores are shown as semi-transparent red spheres, and known membrane-interacting side-chains shown as opaque cyan spheres. Residues that are both membrane-interacting and have high abundance scores are shown in gray.
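The grouping in panel e (residues within 11 Å of each other are clustered) can be read as a distance-threshold clustering. The sketch below shows one possible interpretation, single-linkage clustering via union-find, in which a residue joins a group if it lies within the cutoff of any member. The legend does not say which linkage rule or which atom coordinates were used, so both are assumptions, and the coordinates in the example are invented.

```python
from itertools import combinations
from math import dist


def cluster_residues(coords, cutoff=11.0):
    """Group residues whose representative coordinates are linked by distances <= cutoff.

    `coords` maps residue number -> (x, y, z) in angstroms. One point per
    residue (for example the C-alpha atom) is an assumption of this sketch.
    """
    parent = {res: res for res in coords}

    def find(res):
        # Follow parent links to the group representative, shortening the path.
        while parent[res] != res:
            parent[res] = parent[parent[res]]
            res = parent[res]
        return res

    for a, b in combinations(coords, 2):
        if dist(coords[a], coords[b]) <= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for res in coords:
        groups.setdefault(find(res), []).append(res)
    return sorted(sorted(members) for members in groups.values())


# Invented coordinates: residues 10, 11, 12 form a chain of 10 Å steps,
# and residue 50 is far from all of them.
example = {
    10: (0.0, 0.0, 0.0),
    11: (10.0, 0.0, 0.0),
    12: (20.0, 0.0, 0.0),
    50: (100.0, 0.0, 0.0),
}
clusters = cluster_residues(example)
```

Under single linkage, residues 10 and 12 end up in the same group even though they are 20 Å apart, because residue 11 bridges them. A stricter rule requiring every pair in a group to be within the cutoff would split such chains.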


Figure 4 | PTEN variant abundance classes across PHTS and cancer. a, A histogram of PTEN abundance scores for all missense variants observed in the experiment, with bars colored according to abundance classification. Abundance scores for three possibly benign variants present in the gnomAD database are shown as dots colored by classification. b, c, d, Abundance score histograms, colored by abundance classification, for PTEN germline variants listed in ClinVar as known pathogenic (b), likely pathogenic (c), or variants of uncertain significance (d). e, PTEN missense and nonsense variants in TCGA and the AACR GENIE project databases are arranged by cancer type. The top bar in each cancer type panel shows the observed frequency of variants in each abundance class as determined using VAMP-seq data. The bottom bar in each cancer type panel shows the expected abundance class frequencies based on cancer type-specific nucleotide substitution rates. Abundance classes are colored blue (low-abundance), light blue (possibly low-abundance), pink (possibly WT-like), or red (WT-like). The P38S variant is additionally colored with yellow stripes. The four known PTEN dominant negative variants are colored yellow. Variants not scored in the experiment are colored gray. f, A western blot analysis of cells stably expressing WT or missense variants of N-terminally HA-tagged PTEN.


Figure 5 | TPMT variant abundance classes across pharmacogenomics phenotypes. a, A histogram of TPMT abundance scores for all missense variants observed in the experiment, with bars colored according to abundance classification (top). Abundance scores for variants previously identified and characterized in patients are shown as dots colored by classification. Variants found in gnomAD at frequencies higher than 4 × 10⁻⁶ are also shown (bottom). b, A scatterplot of abundance score and mean 6-MP dose tolerated by individuals heterozygous for each variant. Dose intensity is the dose at which 6-MP becomes toxic to the patient before the 100% protocol dose of 75 mg/m².


Figure 6 | Additional drug- and disease-related genes are compatible with VAMP-seq. Representative flow cytometry EGFP:mCherry smoothed histogram plots for WT (red) and known or predicted destabilized variants (blue) for VKOR, CYP2C9, CYP2C19, MLH1, PMS2, and LMNA. Each smoothed histogram was generated from at least 1,000 recombined cells.


ONLINE METHODS

General reagents, DNA oligonucleotides and plasmids.

Unless otherwise noted, all chemicals were obtained from Sigma and all enzymes were obtained from New England Biolabs. E. coli were cultured at 37 °C in Luria Broth. All cell culture reagents were purchased from ThermoFisher Scientific unless otherwise noted. HEK 293T cells and derivatives thereof were cultured in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (FBS), 100 U/mL penicillin, and 0.1 mg/mL streptomycin. Induction medium was furthermore supplemented with 2 μg/mL doxycycline (Sigma-Aldrich). Cells were passaged by detachment with trypsin–EDTA 0.25%. All synthetic oligonucleotides were obtained from IDT and can be found in Supplementary Table 10. All non-library related plasmid modifications were performed with Gibson assembly⁵⁷.

The PTEN open reading frame was obtained from 1066 pBabe puroL PTEN, which was a gift from William Sellers (Addgene plasmid #10785), and combined with additional previously-used coding sequences¹⁸ to create attB-EGFP-PTEN-IRES-mCherry-562bgl. This plasmid was modified through splitting of the EGFP coding sequence to create attB-sGFP-PTEN-IRES-mCherry-bGFP, which was used in assessing fluorescence ratios of WT or mutant PTEN using the split-GFP format²⁰. The blasticidin resistance gene was obtained from pLenti CMV rtTA3 Blast (w756-1), which was a gift from Eric Campeau (Addgene plasmid #26429), and fused C-terminally to mCherry to create attB-EGFP-PTEN-IRES-mCherry-BlastR. This construct was used to create the large panel of individually tested PTEN variants. The ampicillin resistance cassette in attB-EGFP-PTEN-IRES-mCherry-562bgl was replaced with a kanamycin resistance cassette to create attB-EGFP-PTEN-IRES-mCherry-562bgl-KanR, which was used to shuttle the mutagenized PTEN open reading frame in the library generation process. The PTEN coding region in attB-EGFP-PTEN-IRES-mCherry-562bgl was replaced to create the constructs used to test VKOR (IDT gBlock), MLH1, and LMNA. CYP2C9 and CYP2C19 plasmids were also created using the backbone of attB-EGFP-PTEN-IRES-mCherry-562bgl by replacing the PTEN coding sequence with CYP2C9 or CYP2C19 ORFs (IDT gBlocks) and moving the EGFP tag to the C-terminus of the protein. The MLH1 vector was additionally modified to create attB-EGFP-PMS2-2A-MLH1-IRES-mCherry, as MLH1 co-expression was necessary to observe signal with EGFP-fused PMS2. MLH1 was cloned from pCEP9 MLH1, which was a gift from Bert Vogelstein (Addgene plasmid #16458)⁵⁸. PMS2 was cloned from pSG5 PMS2-wt, which was a gift from Bert Vogelstein (Addgene plasmid #16475)⁵⁹. LMNA was cloned from pBABE-puro-GFP-wt-lamin A, which was a gift from Tom Misteli (Addgene plasmid #17662)⁶⁰. pCAG-NLS-HA-Bxb1 was a gift from Pawel Pelczar (Addgene plasmid #51271)⁶¹.

The attB_mCherry_P2A_MCS plasmid was built from the pcDNA5/FRT/TO backbone (ThermoFisher). mCherry_P2A was synthesized (gBlocks, IDT), and EGFP, amplified from pHAGE-CMV-eGFP-N (gift from Alejandro Balazs) using primers eGFP1 and 2, was added by Gibson assembly. Wild-type TPMT (NM_000367.3) was synthesized (gBlocks, IDT) and cloned in-frame with the EGFP by Gibson Assembly. The CMV promoter was replaced with the synthesized AttB sequence (gBlocks, IDT). The final vector was shortened by removing all of the intervening sequence between the E. coli Ori and the BGH poly-A signal that follows the EGFP-X fusion by inverse PCR with Inv_attB_GPS_AscI_R and Inv_attB_GPS_AscI_F, cutting with AscI and religation. Single amino acid mutations were made using the same inverse PCR method described below.

Construction of barcoded, site-saturation mutagenesis libraries for TPMT and PTEN


Site-saturationmutagenesislibrariesofTPMTandPTENwereconstructedusinginversePCR19.ForTMPT,wildtypeTPMTwasfirstclonedintopUC19.Next,foreachcodon,mutagenicprimerswereorderedwithmachine-mixedNNKbasesatthe5’endofthesenseoligonucleotide.MutagenizedTPMTwasclonedintotheHind-III/Xho-IsitesofaatB_mCherry_P2A_MCS.A15base,degeneratebarcodewasthenclonedintotheXbaIsiteofthemultiplecloningsitebyGibsonAssembly57.Owingtopoorcoverageintheinitiallibrary,aseparate“fill-in”librarywasconstructedforTPMTaminoacids192-239bythesameprotocol.Colonycountsrevealedapproximately40,000and10,000barcodeclonesforthemainTPMTandTPMTfill-inplasmidlibrariesrespectively.ForPTEN,eightrandomlychosencodonswereusedtooptimizedinversePCRamplification,usingattB-EGFP-PTEN-IRES-mCherry-562bglasthetemplate.Templateconcentrationsbetween0.02pgthrough20,000pgwereusedtoidentifytheminimumamountoftemplateneededtoseebandsonanagarosegelafter20cyclesusingprimerconcentrationsbetween0.25and0.5µΜ.Thefinalconcentrationswere250pgoftemplateplasmidand0.25uMofforwardandreverseprimers.Eachcodonamplificationwasdoneinatotalvolumeof10uLusing20cyclesatthestandardconditionsrecommendedforKapaHiFi(95°Cfor3minutesfollowedby20cyclesof98°Cfor20s,60°Cfor15sand72°Cfor30s/kboftemplateplasmid,followedbyafinalextensionof5min).TwoµLofeachamplifiedproductwererunona0.7%agarosegelforvisualvalidationofamplification,andtheremaining8µLofproductwasdiluted1:10withwater.TwoµLofthisdilutedproductwasquantifiedusingPicoGreen(ThermoFisher)onaBioTekH1platereader.PicoGreenmeasurementswereignoredforcodonswheremultipleamplifiedbandsofmultiplesizeswereobserved,andinsteadreplacedbyPicoGreenmeasurementsforadjacentcodonswithamplifiedbandsoftheintendedsizeofsimilarintensitytotheamplifiedbandoftheintendedsizeforthecodoninquestion.BasedonthesePicoGreen-derivedconcentrations,allampliconsweremixedtogethersothatapproximatelyequalamountsofthebandsofintendedsizewerepresentforallamplifiedcodons.Thisfinalmixtureofthelibrarywascleanedandconcentratedbyethanolprecipitation.Theprecip
itatedproductwasresuspendedin100μLofddH2O.Tophosphorylatedtheamplifiedproduct,16μLofcleanedproductat~11.5ng/μLwasmixedwith2μLof10xT4DNAligasebuffer(NewEnglandBiolabs)and2μLofT4PNKenzyme,andincubatedat37°Cfor1hour.Tocircularizetheamplifiedproduct,theentire20μLreactionwasthenmixedwith4μL10xT4DNAligasebuffer,14μLofddH2O,and2μLofT4DNAligase,incubatedat16°Cfor1hour,25°Cfor10min,andheatinactivatedat65°Cfor10min.Residualtemplateplasmidwasthenremovedbyadding1μLofDPNIenzymetothetube,andincubatedat37°Cfor1hour.Theligatedproductwascleanedandconcentratedintoafinal6µLvolumeusingaZymoCleanandConcentratekit,andthentransformedintoNEB10-betaelectrocompetentE.coli.ToselectagainstinputplasmidandplasmidscontainingshortPCRproducts,thelibrarywasthenshuttledintoattB-EGFP-PTEN-IRES-mCherry-562bgl-KanRviadirectionalcloningusingXbaIandEcoRI.Barcodeswereaddedtothelibrarybyfillinginalongoligo(PTEN_BC_F1.1)supplementedwithashortreverseoligo(PTEN_BC_R)usingKlenow(-exo)polymerase.Here,0.25µΜ ofPTEN_BC_F1.1andPTEN_BC_Rweremeltedandannealedtogetherat98°Cfor3minutesinBuffer2.1(NewEnglandBiolabs)andcooledto25˚Catarateof–0.1°C/sec.4000unitsofKlenow(-exo)and0.033µΜ dNTP’swereadded,andthemixturewasincubatedfor15minutesat25°C.Thepolymerasewasinactivatedbyincubatingfor20minutessat70°C,andtheproductwascooledto37°Catarateof-0.1°C/sec.ThecooledproductwasthendigestedwithEcoRIandSacIIinBuffer2.1,purifiedwithaZymoCleanandConcentratekit,andelutedin30μLofddH2O.TodigestthemutagenizedPTENlibraryintheattB-EGFP-PTEN-IRES-mCherry-562bgl-KanRvector,2μg

of plasmid was mixed with 5 µL of 10x CutSmart buffer, 1 µL EcoRI-HF, and 1 µL SacII in a 50 µL reaction, digested at 37 °C for 1 hour, and purified with a Zymo Clean and Concentrate kit. Both purified digestion products were mixed together, ligated with T4 DNA ligase, purified with a Zymo Clean and Concentrate kit, and transformed into NEB 10-beta electrocompetent E. coli (New England Biolabs). Colony counts estimated this library to contain roughly 35,200 barcodes.

Single Molecule Real Time (SMRT) sequencing to link each TPMT and PTEN variant to its barcode

For both PTEN and TPMT, the relationship between variants and barcodes was established using SMRT sequencing (Pacific Biosciences). To prepare the circular SMRT-bell templates [62], library plasmids were digested with restriction enzymes to release the barcode and open reading frame. Hairpin SMRT-bell oligonucleotides with complementary sticky ends and SMRT priming sequences were ligated to the fragments. TPMT libraries were digested using BsrGI and SphI. The correct fragment was size-selected on 1% agarose and gel-purified with the NEB Monarch DNA Gel Extraction kit (New England Biolabs). Custom SMRT bell adapters pb_SphI and pb_BsrGI were sticky-end ligated to the purified fragment. To make a working stock of 20 µM SMRT bell adaptors in 10 mM Tris, 0.1 mM EDTA, 100 mM NaCl, they were heated to 85 °C and snap cooled on ice. The ligation reaction contained 500 ng purified fragment, 2.5 µM of each adaptor, 1 µL of BsrGI, 1 µL of SphI, 1X ligase buffer, and 2 µL of T4 ligase in a 40 µL reaction. The ligation was performed at room temperature for 2 hours, then heat inactivated at 65 °C for 20 minutes. 1 µL each of ExoIII and ExoVII were added and incubated at 37 °C for 1 hour. The final SMRT bell fragments were purified via Ampure PB (Pacific Biosciences) at 1.8X concentration, washed in 70% ethanol, eluted in 15 µL 10 mM Tris and quantified by BioAnalyzer (Agilent). The PTEN library was digested using SacII and XbaI. The correct fragment was size-selected on 1% agarose and gel-purified with a Qiagen Gel Extraction kit (Qiagen). Custom SMRT bell adapters XbaI_SMRTBell and SacII_SMRTBell were sticky-end ligated to ~150 ng of the purified fragment in a 50 µL reaction using 1x T4 DNA ligase buffer, 1 µM of each oligo, 800 units of T4 DNA ligase, 5 units of SacII, and 5 units of XbaI. The ligation was performed at room temperature for 30 minutes, then heat inactivated at 65 °C for 10 minutes. Ten units of Exonuclease VII (ThermoFisher) and 100 units of Exonuclease III (Enzymatics) were added to the mixture, which was incubated for 30 minutes at 37 °C. The final SMRT bell fragments were purified with Ampure PB (Pacific Biosciences) at 1.8X concentration, washed twice in 70% ethanol, eluted in 20 µL 10 mM Tris, and quantified using a QuBit (ThermoFisher) and BioAnalyzer (Agilent). The TPMT and PTEN constructs were sequenced on a Pacific Biosciences RS II sequencer. The main TPMT library was sequenced using four SMRT cells and the fill-in TPMT library was sequenced using two. The PTEN library was sequenced using five SMRT cells. Base call files were converted from the bax format to the bam format using bax2bam (version 0.0.2) and then bam files for each library from separate lanes were concatenated. Consensus sequences for each sequenced molecule in every library were determined using the Circular Consensus Sequencing 2 algorithm (version 2.0.0) with default parameters (bax2bam and ccs can be found on GitHub, https://github.com/PacificBiosciences/unanimity/blob/master/doc/PBCCS.md). Each resulting consensus sequence was then aligned to either the TPMT or PTEN reference sequence using Burrows-Wheeler Aligner [63] (http://bio-bwa.sourceforge.net/). Barcodes and insert sequences were extracted from each alignment using custom scripts that parsed the CIGAR and MD strings. For barcodes sequenced more than once, if barcode-variant

sequences differed, the barcode was assigned to the variant that represented more than 50% of the sequences. Barcodes lacking a majority variant sequence were assigned the variant sequence with the highest average quality score as determined by the ccs2 algorithm. The barcode-variant extraction and barcode unification scripts can be found at https://github.com/shendurelab/AssemblyByPacBio/. Metrics regarding the processing of sequencing data for the barcode-variant assignments can be found in Supplementary Table 13. The final TPMT libraries have 26,416 barcodes associated with 6,251 full-length nucleotide sequence variants that encoded 3,994 unique protein sequences with zero or one amino acid change. The final PTEN library had 22,707 barcodes associated with 7,756 full-length nucleotide sequence variants that encoded 5,043 unique protein sequences with zero or one amino acid change. For both TPMT and PTEN, a barcode-variant map file was created that contains each barcode and its nucleotide sequence.

Integration of single variant clones or barcoded libraries into the HEK293-landing pad cell line

Barcoded variant libraries or single variant clones were recombined into the Tet-on landing pad in engineered HEK 293T TetBxb1BFP Clone4 cells that we generated previously [18]. These cells harbor exactly one copy of a tet-inducible promoter followed by a Bxb1 recombinase site. Integration of a promoterless plasmid containing a Bxb1 recombinase site results in expression of one variant per cell. First, FuGENE 6 (Promega) was used to transfect the Bxb1 recombinase-expressing pCAG–NLS–HA–Bxb1 plasmid, followed 24-48 hours later by the single variant or library plasmid. Two days after transfection, variant expression was induced by adding 0.5-2 µg/mL doxycycline to the media (DMEM + 10% FBS). Then, cells were prepared for sorting by lifting from 10 cm plates with Versene solution (0.48 mM EDTA in PBS), washing 1X in PBS, resuspending in sort buffer (1X PBS + 1% heat-inactivated FBS, 1 mM EDTA and 25 mM HEPES pH 7.0) and filtering through 35 µm nylon mesh. Cells were sorted on a BD Aria III FACS machine using an 85 or 100 µm nozzle. mTagBFP2, expressed from the unrecombined landing pad, was excited with a 405 nm laser, and emitted light was collected after passing through a 450/50 nm bandpass filter. EGFP, expressed after successful recombination of the variant or library plasmid, was excited with a 488 nm laser, and emitted light was collected after passing through 505 nm long-pass and 530/30 nm bandpass filters. mCherry, also expressed after successful recombination of the variant or library plasmid, was excited with a 561 nm laser, and emission was detected using 600 nm long-pass and 610/20 bandpass filters. Before analysis of fluorescence, live, single cells were gated using FSC-A and SSC-A (for live cells) or FSC-A and FSC-H (for single cells) signals. Recombinant mTagBFP2 negative, mCherry positive cells were isolated, with mCherry fluorescence values at least 10 times higher than the median fluorescence value of negative or control cells, and mTagBFP2 fluorescence at least 10 times lower than the median of the unrecombined mTagBFP2 positive cells (see Supplementary Fig. 1a for gating example). Multiple replicate integrations were conducted and sorted for recombinants (Supplementary Table 1). After sorting, the libraries were uniformly mTagBFP2 negative and mCherry positive. Analytical flow cytometry was performed with a BD LSR II flow cytometer, equipped with filter sets identical to those described for the Aria III, with the exception of mCherry emission, which was detected using 595 nm long-pass and 610/20 bandpass filters.

FACS to bin cells by mCherry:EGFP ratio

Cells harboring variant libraries, prepared as described above, were sorted using a FACS Aria III (BD Biosciences) into bins according to the abundance of their expressed, EGFP-tagged variant. First, live, single, recombinant cells were selected using forward and

side scatter, mCherry and mTagBFP2 signals. Then, a FITC:PE-Texas Red ratiometric parameter in the BD FACSDIVA software was created. A histogram of the FITC:PE-Texas Red ratio was created and gates dividing the library into four equally populated bins based on the ratio were established. The details of replicate sorts can be found in Supplementary Table 1.

Sorted library genomic DNA preparation, barcode amplification and sequencing

For the TPMT experiments, sorted cells were collected by centrifugation and the FACS sheath buffer was aspirated. Cells were transferred into a microfuge tube, pelleted and stored at -20 °C. Genomic DNA was prepared using the GentraPrep kit (Qiagen). For each bin, all the purified DNA was spread over eight 25 µL PCR reactions containing Kapa Robust, primers GPS-landing-f (in the genome) and BC-GPS-P7-i#-UMI (3’ of the barcode) to tag the barcodes with a unique molecular index (UMI) and add a sample index. UMI-tagging PCRs were performed using the following conditions: initial denaturation at 95 °C for 2 minutes, followed by three cycles of (95 °C for 15 seconds, 60 °C for 20 seconds, 72 °C for 3 minutes). The eight PCR reactions were pooled and the PCR amplicon was purified using 1x Ampure XP (Beckman Coulter). To shorten the amplicon and add the p5 and p7 Illumina cluster-generating sequences, the UMI-tagged barcodes were then amplified with primers BC-TPMT-P5-v2 and Illumina p7. This PCR was performed with Kapa Robust and SYBR green II on a Bio-Rad mini-opticon qPCR machine; reactions were monitored and removed before saturation of the SYBR green II signal, at around 25 cycles. The amplicons were pooled and gel purified. Barcodes were read twice by paired-end sequencing primers TPMT_Read1 and TPMT_Read2. The UMI and index were sequenced by the index read and primer TPMT_Index using a NextSeq 500 (Illumina). After converting from the BCL to FASTQ format using Illumina’s bcl2fastq version 2.18, a custom script was used to demultiplex the samples by index and call a consensus barcode from the read 1 and read 2 sequences. To collapse the barcode copies associated with unique UMIs, the UMI (bases 1-10 of the index read) was pasted onto the consensus barcode and unique combinations were identified (sort | uniq -c). The barcode from each unique barcode-UMI pair was used to populate a FASTQ file that could be used by the Enrich2 software package to count variants.

For the PTEN experiments, sorted cells were replated onto 10 cm plates and allowed to grow for approximately five days. Cells were then collected, pelleted by centrifugation, and stored at -20 °C. Genomic DNA was prepared using a DNeasy kit, according to the manufacturer’s instructions (Qiagen) with the addition of a 30 minute incubation at 37 °C with RNAse in the re-suspension step. Eight 50 µL first-round PCR reactions were each prepared with a final concentration of ~50 ng/µL input genomic DNA, 1x Kapa HiFi ReadyMix, and 0.25 µM of the KAM499/JJS_501a primers. The reaction conditions were 95 °C for 5 minutes, 98 °C for 20 seconds, 60 °C for 15 seconds, 72 °C for 90 seconds, repeat 7 times, 72 °C for 2 minutes, 4 °C hold. Eight 50 µL reactions were combined, bound to AMPure XP (Beckman Coulter), cleaned, and eluted with 40 µL water. 40% of the eluted volume was mixed with 2x Kapa Robust ReadyMix; JJS_seq_F and one of the indexed reverse primers, JJS_seq_R1a through JJS_seq_R12a, were added at 0.25 µM each. Reaction conditions for the second round PCR were 95 °C for 3 minutes, 95 °C for 15 seconds, 60 °C for 15 seconds, 72 °C for 30 seconds, repeat 14 times, 72 °C for 1 minute, 4 °C hold. Amplicons were extracted after separation on a 1.5% TBE/agarose gel using a Quantum Prep Freeze ‘N Squeeze DNA Gel Extraction Kit (Bio-Rad). Extracted amplicons were quantified using a KAPA Library Quantification Kit (Kapa Biosystems) and sequenced on a NextSeq 500 using a NextSeq 500/550 High Output v2 75 cycle kit (Illumina), using primers JJS_read_1, JJS_index_1, and JJS_read_2. Sequencing reads were converted to FASTQ format and de-multiplexed with bcl2fastq. Barcode paired sequencing reads for PTEN

experiments 1 through 4 were joined using the fastq-join tool within the ea-utils package (http://expressionanalysis.github.io/ea-utils/) using the default parameters, whereas only one barcode read was collected for PTEN experiments 5 through 8. Technical amplification and sequencing replicates were conducted for every sample, and compared to assess variability in quantitation stemming from amplification and sequencing. Experiments with poor technical replication across multiple bins were reamplified and resequenced in their entirety, leaving eight replicate experiments with technical replicates shown here (Supplementary Fig. 9). FASTQ files from these technical replicate amplification and sequencing runs were concatenated for analysis with Enrich2.

Barcode counting and variant calling

Enrich2 was used to count the barcodes, associate each barcode with a nucleotide variant, and then translate and count both the unique-nucleotide and unique-amino acid variants [64]. FASTQ files containing either UMI-collapsed barcodes (TPMT) or total barcodes (PTEN) and the barcode-map for each protein were used as input for Enrich2. Enrich2 configuration files for each experiment are available on the GitHub repository (http://github.com/FowlerLab/VAMPseq). Barcodes assigned to variants containing insertions, deletions or multiple amino acid mutations were removed from the analysis.

Calculating VAMP-seq scores and classifications

RStudio v1.0.136 was used for all subsequent analysis of the Enrich2 output. The count for each variant in a bin was divided by the sum of counts recorded in that bin to obtain the frequency of each variant (F_v) within that bin. This calculation was repeated for every bin in each replicate experiment. The frequencies of a variant in all four bins of an experiment were added together to obtain the total frequency value (F_v,total) for each variant for each experiment. This total frequency value was used for filtering low-frequency variants, which we reasoned would be subject to high levels of counting noise, out of the subsequent calculations. We set the F_v,total filtering threshold based on the assumption that accurately scored synonymous variants should create a clear, unimodal distribution around WT. We examined how different minimum F_v,total filtering threshold values affected the spread and central tendency of the synonymous distribution (Supplementary Fig. 10). We empirically selected 1 × 10⁻⁴ as the F_v,total filtering threshold value as it minimized the skew and coefficient of variation of the synonymous variant abundance score distribution while retaining the majority of missense variants. Next, for each experiment, a weighted average was calculated for each variant (W_v) passing the F_v,total filtering threshold value using the following equation:

$$W_v = \frac{(F_{v,\mathrm{bin1}} \times 0.25) + (F_{v,\mathrm{bin2}} \times 0.5) + (F_{v,\mathrm{bin3}} \times 0.75) + (F_{v,\mathrm{bin4}} \times 1)}{4}$$

Thus, all weighted average values ranged from a value of 0.25 to 1. Finally, for each experiment, an abundance score for each variant (S_v) was obtained by subjecting the weighted average of each variant to min-max normalization, using the weighted average value of WT (W_wt), which was given a score of 1, and the median weighted average value for non-terminal nonsense variants (W_nonsense) at positions 51 through 349 for PTEN, or positions 51 through 219 for TPMT, which was given an abundance score of 0, using the following equation:

$$S_v = \frac{W_v - W_{\mathrm{nonsense}}}{W_{\mathrm{wt}} - W_{\mathrm{nonsense}}}$$
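As a rough illustration of the scoring arithmetic (the authors' own analysis was done in R), the Python sketch below starts from per-replicate weighted averages and applies the min-max normalization above. It then computes the replicate mean, standard error, 95% confidence interval, and four-way classification described in the paragraphs that follow. The function names, the example numbers, the use of the sample standard deviation, and the treatment of a score exactly equal to the threshold are assumptions made for this example, not details taken from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def abundance_score(w_v, w_wt, w_nonsense):
    """Min-max normalize a weighted average: nonsense median -> 0, WT -> 1."""
    return (w_v - w_nonsense) / (w_wt - w_nonsense)


def summarize_replicates(scores):
    """Return (mean score, lower bound, upper bound) of the 95% CI."""
    n = len(scores)
    if n < 2:
        # The text retains only variants scored in two or more replicates.
        raise ValueError("need scores from at least two replicates")
    score = mean(scores)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    std_err = stdev(scores) / sqrt(n)
    z = NormalDist().inv_cdf(0.975)  # 97.5th percentile of the standard normal
    return score, score - z * std_err, score + z * std_err


def classify(score, lower, upper, threshold):
    """Assign one of the four classes relative to the synonymous threshold."""
    if score < threshold:
        return "low abundance" if upper < threshold else "possibly low abundance"
    # Assumption: a score exactly at the threshold is grouped with those above it.
    return "WT-like abundance" if lower > threshold else "possibly WT-like abundance"


# Hypothetical (W_v, W_wt, W_nonsense) values for one variant in two replicates.
replicate_w = [(0.55, 0.80, 0.30), (0.65, 0.80, 0.30)]
scores = [abundance_score(*w) for w in replicate_w]
score, lower, upper = summarize_replicates(scores)
print(classify(score, lower, upper, threshold=0.689))
```

The threshold passed in the last line is the PTEN synonymous threshold quoted in the text; the TPMT value (0.723) would be used the same way.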

The final abundance score for each variant was calculated by taking the mean of the min-max normalized abundance scores across the eight replicate experiments in which it could have been observed. Only variants which were scored in two or more replicate experiments were retained in the analysis. We implemented this filter because many sources of noise are not captured in count-based estimates of variance and because having replicate-level variance estimates was critical to our abundance classification scheme. A standard error for each abundance score was calculated by dividing the standard deviation of the min-max normalized values for each variant by the square root of the number of replicate experiments in which it was observed. Lastly, the lower bound of the 95% confidence interval was calculated by multiplying the standard error by the 97.5 percentile value of a normal distribution and subtracting this product from the abundance score. The upper bound of the 95% confidence interval was calculated by instead adding the product to the abundance score. Positional VAMP-seq scores were calculated by taking the median of all single amino acid VAMP-seq scores at each position.

In NNK mutagenesis schemes like the one we employed, synonymous variants can be generated at 50 of the 61 amino acid-coding codons that may exist in the template sequence. Notably, the following codons in the template sequence preclude generation of a synonymous variant at that position: ATG (M), ATT (I), TTT (F), GAG (E), GAT (D), AAG (K), AAT (N), CAG (Q), CAT (H), TAT (Y), and TGT (C). Thus, synonymous variants were theoretically possible at 272 and 167 codons for the PTEN and TPMT proteins, respectively. Of these, synonymous variants were observed at 151 PTEN and 138 TPMT codons in our final dataset.

For both TPMT and PTEN, the distribution of wild-type synonyms was used to create VAMP-seq classifications for every variant (see Supplementary Fig. 5a for scheme). First, we established a synonymous score threshold by determining the abundance score that separated the 95% most abundant synonymous variants from the 5% lowest abundance synonymous variants (0.689 for PTEN, and 0.723 for TPMT). Variants whose abundance score and upper confidence interval were both below this synonymous threshold value were classified as “low abundance” variants, whereas those with abundance scores below this threshold but upper confidence interval over this threshold were classified as “possibly low abundance”. Variants with scores above this threshold but lower confidence intervals below the threshold were considered “possibly WT-like abundance”. Variants with scores and lower confidence interval above the threshold were classified as “WT-like abundance”.

For both TPMT and PTEN, substitution-intolerant positions were determined based on the proportion of variants at the position with scores below the synonymous threshold, determined as described above. Positions where 5 or more variants were scored and greater than 90% of the scores were below the synonymous variant threshold value were considered substitution intolerant. Enhanced-abundance positions were determined based on the proportion of variants at the position with scores above the median of the synonymous distribution. Positions where 5 or more variants were scored and more than 5 variants had scores above the median of the synonymous distribution were considered enhanced-abundance positions.

Assessment of the PTEN library composition

To better understand the sources of bottlenecking in the PTEN experiments, the composition of the PTEN plasmid library preparation used to generate recombinant cells was assessed by determining barcode frequencies using high-throughput Illumina sequencing. Two reactions were independently performed from the same plasmid preparation and served as technical replicates. Each 50 µL first-round PCR reaction was prepared with a final concentration of ~50 ng/µL input plasmid DNA, 1x Kapa HiFi ReadyMix, and 0.25 µM each of the JJS_seq_F/JJS_501a primers. The reaction conditions were 95 °C for 3 minutes, 98 °C for 20 seconds, 60 °C for 15 seconds, 72 °C for 15 seconds, repeat 5 times, 72 °C for 2 minutes, 4 °C hold. The reaction was bound to AMPure XP beads (Beckman Coulter), cleaned, and eluted with 16 µL water. 15 µL of the eluted volume was mixed with 2x Kapa Robust ReadyMix; JJS_P5_(short) and either JJS_seq_R1a for technical replicate 1 or JJS_seq_R2a for technical replicate 2 were added at 0.25 µM each. Reaction conditions for the second round PCR were 95 °C for 3 minutes, 95 °C for 15 seconds, 60 °C for 15 seconds, 72 °C for 30 seconds, repeat 19 times, 72 °C for 1 minute, 4 °C hold. Amplicons were extracted after separation on a 1.5% TBE/agarose gel using a Quantum Prep Freeze ‘N Squeeze DNA Gel Extraction Kit (Bio-Rad). Extracted amplicons were quantified using a KAPA Library Quantification Kit (Kapa Biosystems) and sequenced on a NextSeq 500 using a NextSeq 500/550 High Output v2 75 cycle kit (Illumina), using primers JJS_read_1, JJS_index_1, and JJS_read_2. Sequencing reads were converted to FASTQ format and de-multiplexed with bcl2fastq. Barcode paired sequencing reads were joined using the fastq-join tool within the ea-utils package. Enrich2 was used to count the barcodes in the reads, using a minimum quality filter of 20. High correlation (Pearson’s r = 0.99) of barcode counts was observed between technical replicate amplifications (Supplementary Fig. 11a). After barcode counts in both replicates were combined, a minimum count filter of 200 was imposed to remove barcodes arising from sequencing error (Supplementary Fig. 11b). Each barcode’s count was divided by the total number of barcode reads passing this filter to obtain frequencies for each barcode. Using the barcode-variant map generated by PacBio subassembly, a protein sequence was assigned to each barcode. Barcodes missing from the barcode-variant map were categorized as “Not subassembled”. The frequency of each type of sequence was determined (Supplementary Fig. 11c). The composition of the single amino acid variants in the library was next analyzed to determine sources of potential library bottlenecking. The nucleotide frequencies at each mutated codon were determined (Supplementary Fig. 11d), and relative frequencies of each amino acid variant observed in the library were calculated (Supplementary Fig. 11e). Single amino acid substitution coverage was determined for each position along the protein (Supplementary Fig. 11f and 11g). Lastly, the distribution of single amino acid variants within the library was determined (Supplementary Fig. 11h), and simulations of sample sizes required to observe each PTEN single amino acid variant were performed (Supplementary Fig. 11i).

Variant annotation from online databases

Published western blotting results for PTEN and TPMT variants are listed, along with references, in Supplementary Table 11 and Supplementary Table 12. We collected structural feature information, including absolute solvent accessibilities, using DSSP [65,66] based on PDB structure 1d5r for PTEN and 2H11 for TPMT. For each amino acid in both proteins, we divided the absolute solvent accessibility derived from DSSP by the empirically determined maximum accessibility of that amino acid to yield relative solvent accessibility [67]. The COSMIC (Catalogue of Somatic Mutations in Cancer) release v81 was used for the analyses we presented [68]. Cancer genomics data, including those from The Cancer Genome Atlas and AACR Project GENIE [43], were accessed from cBioPortal [69] on

2/15/2017 and 2/21/2017, respectively. PTEN variants observed in the GBM, LGG-GBM, and Glioma cancer categories were combined into a single brain cancer category for the analysis. ClinVar [3] data was accessed on 6/29/2017 and filtered to exclude everything except germline missense and nonsense variants. Average evolutionary coupling [70] values by position were calculated using data from http://evfold.org/. Mutational spectra from the six transition or transversion categories for breast adenocarcinoma, lung squamous cell carcinoma, uterine corpus endometrial carcinoma, glioblastoma multiforme, colon and rectal carcinoma, ovarian serous carcinoma [42], and melanoma [71] were used to create expected PTEN variant frequency distributions. Minor allele frequencies were extracted from the GnomAD database (Feb. 2017 release) [2]. TPMT allele names and RSID numbers were taken from http://www.imh.liu.se/tpmtalleles/tabell-over-tpmt-alleler?l=en. The PTEN variant effect predictions were obtained from Polyphen-2 (http://genetics.bwh.harvard.edu/pph2/) [72], Provean (http://provean.jcvi.org/) [73], SIFT (http://sift.jcvi.org/) [74], Snap2 (https://rostlab.org/services/snap2web/) [75], Mutation assessor (http://mutationassessor.org/r3/) [76], and FATHMM (http://fathmm.biocompute.org.uk/) [77] by querying their respective websites. PTENpred [78] was downloaded and all predictions were run locally. The predictions for LRT [79], MutationTaster [80], MetaSVM [81], MetaLR [81], MCap [82], and CADD [83] were collected with dbNSFP [84], which was downloaded and run locally.

PTEN ClinVar and cancer genomics analyses

Nine PTEN variants were listed in ClinVar as both likely pathogenic and pathogenic. We examined the evidence for these variants (H61R, Y68H, L108P, G127R, R130L, R130Q, G132V, R173C, and R173H) and, following the ACMG-AMP guidelines [40], all nine were deemed to belong in the likely pathogenic category. An additional two variants (R15K and P96S) had an interpretation of uncertain significance along with another interpretation of likely pathogenic or pathogenic, and thus the clinical significance of the variant was listed as “Conflicting interpretations of pathogenicity”. As recommended by the ACMG/AMP guidelines [40], variants with conflicting interpretations were considered variants of unknown significance. For our statistical analysis of the enrichments of low-abundance variants in the pathogenic, likely pathogenic, and uncertain significance ClinVar categories, we used a resampling approach. We drew 10,000 random samples, with replacement, corresponding to the number of variants scored from each category in ClinVar (pathogenic = 24; likely pathogenic = 22; uncertain significance = 81) from the 1,313 PTEN missense variants (i.e., single nucleotide variants that change an amino acid) with abundance scores. We recorded the frequency of low-abundance variants in each round of resampling. Then, we computed the P-value for each category by dividing the number of times the observed frequency of PTEN low-abundance variants fell below the frequencies of low-abundance variants in the resampled sets by 10,000. For our statistical analysis of enrichments of low-abundance, dominant negative, or P38S variants in different cancer types, we first used the rates of single nucleotide transitions and transversions observed in TCGA [42,71] to create mutational probabilities for every possible PTEN missense or nonsense variant. Based on these probabilities we drew 10,000 random samples of PTEN variants of size equal to the number of PTEN variants found in each cancer type (n = 337, 192, 153, 186, 77, 113, and 327 for brain, breast, colorectal, endometrial, melanoma, NSCLC, and uterine cancers, respectively). For each cancer type, this created the null distribution of PTEN variant frequencies based on the mutation spectrum alone. Then, for each cancer type, we computed the P-value by dividing the number of times the observed frequency of low-abundance, dominant negative or P38S

variants fell below the frequency of the appropriate type of variants in the resampled sets by 10,000.

Rosetta ΔΔG predictions

Computational predictions of PTEN variant losses in folding energy (i.e., ΔΔGs) were performed using the 2017.08 release of Rosetta. The PTEN protein data bank (PDB) file 1d5r was renumbered to accommodate missing residues, and the TLA ligand was removed. Preminimization of the ensuing file was performed using Rosetta minimize_with_cst, followed by the convert_to_cst_file shell script. Fine-grain estimations of folding energy changes upon PTEN mutation were created with Rosetta ddg_monomer [85] using the talaris2014 scoring function, and the following flags: -ddg::weight_file soft_rep_design, -fa_max_dis 9.0, -ddg::iterations 50, -ddg::dump_pdbs true, -ignore_unrecognized_res, -ddg::local_opt_only false, -ddg::min_cst true, -constraints::cst_file input.cst, -ddg::suppress_checkpointing true, -in::file::fullatom, -ddg::mean false, -ddg::min true, -ddg::sc_min_only false, -ddg::ramp_repulsive true, -ddg::output_silent true.

Comparison of TPMT red blood cell activity or dose intensity to abundance scores

Genotypes, TPMT red blood cell activity that was normalized by cohort, and dose intensity data for 884 ALL patients were provided from the study described in Liu et al. [51]. The mean TPMT red blood cell activity and dose intensity from individuals heterozygous for each unique TPMT variant were calculated. These values were directly compared to abundance scores for that variant from the VAMP-seq assay or the wild-type normalized GFP:mCherry ratio from individual flow cytometry experiments (Figure 5; Supplementary Fig. 7).

Western blotting

HEK 293T TetBxb1BFP Clone4 cells [18] were transfected with the pCAG-NLS-HA-Bxb1 expression vector and either an attB-PTEN-HA-IRES-mCherry plasmid encoding a PTEN variant or an attB-mCherry_2A_GFP plasmid encoding a TPMT variant. Two days after transfection, cells were switched to media containing 2 µg/mL doxycycline. For each variant, approximately 8,000 mTagBFP2 negative, mCherry positive cells were sorted using a FACS Aria III sorter (BD Biosciences), and allowed to grow to confluence in 6-well plates with Dox-containing media. Cells expressing PTEN variants were then collected with Trypsin-EDTA, washed in PBS, and incubated with lysis buffer (20 mM Tris pH 8.0, 150 mM NaCl, 1% Triton X-100, and Protease Inhibitor Cocktail (Sigma-Aldrich)) for 10 minutes at 4 °C. The tubes were centrifuged at 21,000 x g for 5 minutes, the supernatant was collected, and protein concentration was determined by the DC Protein assay (Bio-Rad) against a standard curve of bovine serum albumin. 40 µg of protein was loaded per well of a NuPage 4-12% Bis-Tris gel (Invitrogen) in MOPS buffer, using Spectra Multicolor Broad Range Protein Ladder (ThermoFisher Scientific) for size comparison. Proteins were transferred to a PVDF membrane using a Genie Blotter (Idea Scientific). Western blotting was performed using a 1:2,000 dilution of anti-phospho-AKT (T308; 13038; Cell Signaling Technology) followed by detection with a 1:10,000 dilution of anti-rabbit-HRP (NA934V; GE Healthcare); a 1:2,000 dilution of anti-pan-AKT (2920; Cell Signaling Technology) followed by detection with a 1:10,000 dilution of anti-mouse-HRP (NA931V; GE Healthcare); a 1:4,000 dilution of anti-GFP antibody (11814460001; Roche), followed by detection with a 1:10,000 dilution of anti-mouse-HRP; a 1:5,000 dilution of anti-HA-HRP (3F10; Roche); or a 1:5,000 dilution of anti-beta-actin-HRP (ab8224; Abcam), using the SuperSignal™ West Dura extended duration substrate (ThermoFisher Scientific). TPMT expressing cells were removed from the plate with cold PBS, pelleted and resuspended in lysis buffer (50 mM Tris pH 8.0, 150 mM NaCl, 1% NP-40, and Protease

Inhibitor Cocktail (Roche)). Protein concentration was determined by Bradford Assay (Bio-Rad). 45, 15 and 5 µg of lysate was loaded per well of a NuPage 4-12% Bis-Tris gel (Invitrogen) in MOPS buffer, using SeeBlue Plus2 Protein Ladder (ThermoFisher Scientific) for size comparison. Proteins were transferred to a PVDF membrane using a Genie Blotter (Idea Scientific). Western blotting was performed using a 1:3,000 dilution of anti-GFP antibody (11814460001; Roche) followed by detection with a 1:10,000 dilution of anti-mouse-HRP (NA934V; GE Healthcare) or a 1:5,000 dilution of anti-beta-actin-HRP (ab8224; Abcam), using the SuperSignal™ West Dura extended duration substrate (ThermoFisher Scientific).

Data and code availability

The data presented in the manuscript are available as Supplementary Tables. Code used for the analyses performed in this work is included as Supplementary File 1, and also available at http://github.com/FowlerLab/VAMPseq. Code used for subassembly by PacBio is available at http://github.com/shendurelab/AssemblyByPacBio. The Illumina and PacBio raw sequencing files and barcode-variant maps can be accessed at the NCBI Gene Expression Omnibus (GEO) repository under accession number GSE108727 (released upon publication).

Online Methods References

57. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–5 (2009).

58. Papadopoulos, N. et al. Mutation of a mutL homolog in hereditary colon cancer. Science 263, 1625–9 (1994).

59. Nicolaides, N. C., Littman, S. J., Modrich, P., Kinzler, K. W. & Vogelstein, B. A naturally occurring hPMS2 mutation can confer a dominant negative mutator phenotype. Mol. Cell. Biol. 18, 1635–1641 (1998).

60. Scaffidi, P. & Misteli, T. Lamin A-dependent misregulation of adult stem cells associated with accelerated ageing. Nat. Cell Biol. 10, 452–459 (2008).

61. Hermann, M. et al. Binary recombinase systems for high-resolution conditional mutagenesis. Nucleic Acids Res. 42, 3894–3907 (2014).

62. Travers, K. J. et al. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 38, e159 (2010).

63. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

64. Rubin, A. F. et al. Enrich2: a statistical framework for analyzing deep mutational scanning data. bioRxiv 75150 (2016). doi:10.1101/075150

65. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–637 (1983).

66. Touw, W. G. et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43, D364–D368 (2015).

67. Tien, M. Z., Meyer, A. G., Sydykova, D. K., Spielman, S. J. & Wilke, C. O. Maximum allowed solvent accessibilities of residues in proteins. PLoS One 8, (2013).

68. Forbes, S. A. et al. COSMIC: Somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).

69. Gao, J., Aksoy, B., Dogrusoz, U. & Dresdner, G. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, 1–20 (2013).

70. Marks, D. S., Hopf, T. A. & Sander, C. Protein structure prediction from sequence variation. Nat. Biotechnol. 30, 1072–80 (2012).



71. Krauthammer, M. et al. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat. Genet. 44, 1006–14 (2012).

72. Adzhubei, I., Jordan, D. M. & Sunyaev, S. R. Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. 7.20.1-7.20.41 (2013). doi:10.1002/0471142905.hg0720s76

73. Choi, Y. & Chan, A. P. PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747 (2015).

74. Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).

75. Hecht, M., Bromberg, Y. & Rost, B. Better prediction of functional effects for sequence variants. BMC Genomics 16, S1 (2015).

76. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: Application to cancer genomics. Nucleic Acids Res. 39, (2011).

77. Shihab, H. A. et al. Predicting the Functional, Molecular, and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat. 34, 57–65 (2013).

78. Johnston, S. B. & Raines, R. T. PTENpred: A Designer Protein Impact Predictor for PTEN-related Disorders. J. Comput. Biol. 23, 1–7 (2016).

79. Chun, S. & Fay, J. C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).

80. Schwarz, J. M., Cooper, D. N., Schuelke, M. & Seelow, D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat. Methods 11, 361–2 (2014).

81. Dong, C. et al. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum. Mol. Genet. 24, 2125–2137 (2015).

82. Jagadeesh, K. A. et al. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat. Genet. 48, 1581–1586 (2016).

83. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–5 (2014).

84. Liu, X., Wu, C., Li, C. & Boerwinkle, E. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum. Mutat. 37, 235–241 (2016).

85. Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011).
