supplementary material for...had undergone dna capture with either the agilent sureselect human all...
TRANSCRIPT
www.sciencemag.org/cgi/content/full/science.aar6731/DC1
Supplementary Material for
Quantifying the contribution of recessive coding variation to developmental disorders
Hilary C. Martin*, Wendy D. Jones, Rebecca McIntyre, Gabriela Sanchez-Andrade,
Mark Sanderson, James D. Stephenson, Carla P. Jones, Juliet Handsaker, Giuseppe Gallone, Michaela Bruntraeger, Jeremy F. McRae, Elena Prigmore,
Patrick Short, Mari Niemi, Joanna Kaplanis, Elizabeth J. Radford, Nadia Akawi, Meena Balasubramanian, John Dean, Rachel Horton, Alice Hulbert, Diana S. Johnson, Katie Johnson, Dhavendra Kumar, Sally Ann Lynch, Sarju G. Mehta, Jenny Morton,
Michael J. Parker, Miranda Splitt, Peter D Turnpenny, Pradeep C. Vasudevan, Michael Wright, Andrew Bassett, Sebastian S. Gerety, Caroline F. Wright,
David R. FitzPatrick, Helen V. Firth, Matthew E. Hurles, Jeffrey C. Barrett*, on behalf of the DDD Study
*Corresponding author. Email: [email protected] (H.C.M.); [email protected] (J.C.B.)
Published 8 November 2018 as Science First Release
DOI: 10.1126/science.aar6731
This PDF file includes: Materials and Methods Figs. S1 to S20 References Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/content/science.aar6731/DC1)
Tables S1 to S7 as separate Excel files
Funding statement
The DDD study presents independent research commissioned by the Health Innovation
Challenge Fund (grant HICF-1009-003), a parallel funding partnership between the Wellcome
Trust and the UK Department of Health, and the Wellcome Trust Sanger Institute (grant
WT098051). The views expressed in this publication are those of the author(s) and not
necessarily those of the Wellcome Trust or the UK Department of Health. The study has UK
Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research
Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics
Committee). The research team acknowledges the support of the National Institutes for Health
Research, through the Comprehensive Clinical Research Network. This study makes use of
DECIPHER (http://decipher.sanger.ac.uk), which is funded by the Wellcome Trust.
Materials and Methods
Family recruitment
The DDD project recruited individuals with severe, undiagnosed developmental disorders from
24 clinical genetics centres within the United Kingdom National Health Service and the
Republic of Ireland as described previously ( 11) , using specific criteria
(https://decipher.sanger.ac.uk/files/ddd/documents/policy/ddd_project_referral_guide_clinical_
genetics.pdf). Families gave informed consent to participate, and the study was approved by the
UK Research Ethics Committee (10/H0305/83, granted by the Cambridge South Research
Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics
Committee). DNA was collected from saliva samples obtained from the probands and their
parents, and from blood obtained from the probands, then samples were processed as previously
described (1). The analyses in paper are based on a data freeze that includes 7,831 trios from
7,446 families and 1,791 patients without parental samples (Fig. S2). These individuals include
those analysed in previous publications (4 , 9).
Clinical features
The patients were systematically phenotyped: detailed developmental phenotypes were recorded
using Human Phenotype Ontology (HPO) terms (27), and growth measurements, family history,
developmental milestones etc. were collected using a standard restricted-term questionnaire
within DECIPHER ( 28). Some HPO terms fall under multiple organ systems (e.g. microcephaly
falls under "nervous system", "head or neck" and "skeletal system"), and for Fig. 1 and Table
S1, we wanted to avoid counting multiple organ systems that represented a single HPO term
multiple times. Thus, we first ranked organ systems according to the number of raw counts of
individuals with at least one term under that system in the full DDD cohort. We then took
individuals with at least one HPO under the organ system most commonly affected, and
assigned these individuals a count of one organ system. We then removed these HPOs from the
patients’ lists, and identified individuals with at least one HPO in the organ system ranked next
most commonly affected. We continued to count organ systems and remove HPOs for 19
non-overlapping systems. The organ systems in Fig. 1A and Table S1 were determined using
this procedure (e.g. a patient with microcephaly is only included under “nervous system”, not
under “head or neck” or “skeletal system”), and Fig. 1B shows the distribution of the
non-redundant counts.
We note that certain DDD clinicians tend to be more diligent about reporting multiple HPO
terms, and some tend to report certain organ systems more thoroughly. Thus, the small
differences in clinical features between EABI and PABI (Fig. 1; Fig. S3; Table S1) could be
partly because clinicians who work in areas with a high South Asian population have different
reporting styles from others. The slightly greater severity of the PABI group might also partly
be because, in communities enriched for cousin marriages, there is a higher number of children
with genetic disorders (29 ), and clinical geneticists are only able to see the more severe cases.
Exome sequencing and variant quality control
Exome sequencing, alignment and calling of single-nucleotide variants and small insertions and
deletions was carried out as previously described (4), as was the filtering of de novo mutations.
For the analysis of biallelic genotypes, we chose thresholds for genotype and site filters to
balance sensitivity (number of retained variants) and specificity (as assessed by Mendelian error
rate and transition/transversion ratio). We removed sites with a strand bias test p-value < 0.001.
We then set individual genotypes to missing if they had genotype quality < 20, depth < 7 or, for
heterozygous calls, a p-value from a binomial test for allele balance < 0.001. Since the samples
had undergone DNA capture with either the Agilent SureSelect Human All Exon V3 or V5 kit,
we subsequently only retained sites that passed a missingness cutoff in both the V3 and the V5
samples. We found that, after setting a depth filter, the proportion of missing genotypes allowed
had a more substantial effect on the number of Mendelian errors than genotype quality and
allele balance cutoffs (Fig. S20). Thus, we ran the biallelic burden analysis on two different
callsets, using a 10% (strict) or a 50% (lenient) missingness filter, and found that the results
were very similar. We report results from the more lenient filter in this paper, since it allowed
us to include more variants. Genotypes were set to missing for a trio if there was a Mendelian
error, and variants were removed if more than one trio had a Mendelian error and if the ratio of
trios with Mendelian errors to trios carrying the variant without a Mendelian error was greater
than 0.1. If any of the individuals in a trio had a missing genotype at a variant, all three
individuals were set to missing for that variant.
Variants were annotated with Ensembl Variant Effect Predictor (30) based on Ensembl gene
build 83, using the LOFTEE plugin. The transcript with the most severe consequence was
selected. We analyzed three categories of variant based on the predicted consequence: (1)
synonymous variants; (2) loss-of-function variants (LoFs) classed as “high confidence” by
LOFTEE (including the annotations splice donor, splice acceptor, stop gained, frameshift,
initiator codon and conserved exon terminus variant); (3) damaging missense variants (i.e. those
not classed as “benign” by PolyPhen or SIFT, with CADD>25). Variants were also annotated
with MAF data from four different populations of the 1000 Genomes Project (31 ) (American,
Asian, African and European), two populations from the NHLBI GO Exome Sequencing
Project (European Americans and African Americans) and six populations from the Exome
Aggregation Consortium (ExAC) (African, East Asian, non-Finnish European, Finnish, South
Asian, Latino), and an internal allele frequency generated using unaffected parents from the
DDD.
Ancestry inference
We ran a principal components analysis in EIGENSOFT (32) on 5,853 common exonic SNPs
defined by the ExAC project. We set genotypes with GL<20 to missing and excluded SNPs
with >2% missingness. We calculated principal components in the 1000 Genomes Phase III
samples and then projected the DDD samples onto them. We grouped samples into three broad
ancestry groups (European, South Asian, and Other) as shown in Fig. S2 (right hand plots). By
drawing ellipses around the densest clusters of DDD samples, we defined two narrower groups:
European Ancestry from the British Isles (EABI) and Pakistani Ancestry from the British Isles
(PABI).
For the burden and gene-based analysis, we primarily focused on these narrowly-defined EABI
and PABI groups because it is difficult to accurately estimate population allele frequencies in
more broadly defined groups. For example, in 4,942 European-ancestry probands, the number
of observed biallelic synonymous variants was slightly higher than the number expected (ratio =
1.06; p=2.7×10-4).
Calling autozygous regions
To call autozygous regions, we ran bcftools/roh(33 ) (bcftools version 1.5-4-gb0d640e)
separately on the different broad ancestry groups. We LD pruned our data to avoid overcalling
small runs of homozygosity as autozygous regions. Because rates of consanguinity differ
dramatically between EABI and PABI, we chose r2 cutoffs for each that brought the ratio of
observed to expected biallelic synonymous variants with MAF<0.01 closest to 1 (see below for
calculation of the number expected): PLINK options --indep-pairwise 50 5 0.4 for EABI and
--indep-pairwise 50 5 0.8 for PABI.
Sample quality control and subsetting
Our strategy for filtering probands and defining different subsets is shown in Fig. S1. For our
main analyses we excluded individuals without parental data (though we did incorporate them
into some of the follow-up on EIF3F and KDM5B). Next, we removed 1,403 probands whose
ancestry was not in the EABI or PABI groups described above. Finally we removed 388
probands that failed quality control filters. Specifically: 47 probands with >5% missingness at
the SNPs used for PCA (since their ancestry assignment was likely uncertain), 7 probands with
uniparental disomy, and one individual from every pair of probands who were related (kinship >
0.044, estimated by PCRelate (34), equivalent to third-degree relatives).
For estimating cumulative frequencies of rare variants, we removed 924 parents reported to be
affected, since one might expect these to be enriched for damaging variants compared to the
general population, and 9 European parents with an abnormally high number of rare (MAF<1%)
synonymous genotypes (>834, compared to the 99.9th percentile of 223). However, we retained
their offspring in the burden and gene-based analyses.
We stratified probands by high autozygosity (>2% of the genome classed as autozygous),
whether or not they had an affected sibling, and whether or not they already had a likely
diagnostic dominant or X-linked exonic mutation (a likely damaging de novo mutation or
inherited damaging variant in a known monoallelic DDG2P gene
(http://www.ebi.ac.uk/gene2phenotype/) [if the parent was affected] or a damaging X-linked
variant in a known X-linked DDG2P gene). The 4,458 patients who had no such diagnostic
variants were included in the “undiagnosed” set, along with 193 patients who had diagnostic
variants in recessive DDG2P genes or potentially diagnostic variants in monoallelic or X-linked
DDG2P genes but had high autozygosity or affected siblings. There were 1,366 EABI and 23
PABI probands in the diagnosed set, and 4,318 EABI and 333 PABI probands in the
undiagnosed set. For the set of probands with affected siblings shown in Figures 1 and 2, we
restricted to families from which more than one independent (i.e. non-MZ twin) child was
included in DDD and in which the siblings’ phenotypes were more similar than expected by
chance given the distribution of HPO terms in the full cohort (HPO similarity p-value < 0.05
(9 ) ).
Burden analyses and gene-based tests
Variants were filtered on class (LoF, damaging missense or synonymous) and by different MAF
cutoffs. Variants failing the MAF cutoff in any of the publicly available control populations, the
full set of unaffected DDD parents, or the unaffected DDD parents in that population subset
(PABI or EABI) were removed.
Following the approach we used previously (9), we calculated Bg,c, the expected number of rare
biallelic genotypes of class c (LoF, damaging missense or synonymous) in each gene g, as
follows:
(B ) N λE g,c = prob g,c
where is the number of probands and is the expected frequency of biallelic Nprob
λg,c
genotypes of class c in gene, calculated as follows:
(1 )f fλg,c = − ag c,g2 + ag c,g
where fc,g is the cumulative frequency of variants of class c in gene g with MAF less than the
cutoff, and ag is the fraction of individuals autozygous at gene g. An individual was defined as
being autozygous if he/she had a region of homozygosity with any overlap of gene g; in
practice, autozygous regions almost always overlapped genes completely rather than partially.
The rate of LoF/damaging missense compound heterozygous genotypes is:
(1 )[2f f (1 )]λg,LoF /miss = − ag LoF ,g miss,g − f
LoF ,g
To calculate the cumulative frequency of variants of class c in gene g, f c,g, we first phased the
variants in the DDD parents based on the inheritance information. The cumulative frequency is
then given by:
fc,g = hc,g
Nhaps
where is the number of parental haplotypes with at least one variant of class c in gene g , and hc
is the total number of parental haplotypes.Nhaps
For each gene, we calculated the binomial probability (given probands and rate of Nprob
)λg,c
the observed number of biallelic genotypes of class c. We did this for four consequence classes
(biallelic LoF, biallelic LoF+LoF/damaging missense, biallelic damaging missense, and biallelic
LoF+LoF/damaging missense+biallelic damaging missense). Our reasoning was that, in some
genes, biallelic LoFs might be embryonic lethal and LoF/damaging missense compound
heterozygotes might cause DD, but in other genes, including rare damaging missense variants in
the analysis might drown out signal from truly pathogenic LoFs. We conducted the test on two
sets of probands (EABI only, and EABI+PABI). We did not analyze PABI separately due to
low power.
For the set of EABI only, we conducted a simple binomial test. For the combined EABI+PABI
test, we took into account the different ways in which n or more probands with the relevant
genotype could be distributed between the two groups and the probability of observing each
combination using population-specific rates (e.g. two observed biallelic genotypes could be
both seen in EABI, both in PABI, or one in each). We then summed these probabilities across
all possible combinations to obtain an aggregate probability for sampling n or more probands by
chance, as described in (9 ) .
For some genes, was estimated to be 0 in one or both populations because there were no λg,c
variants in the parents that passed filtering. The vast majority of these also had . We (B )O g,c = 0
dropped these genes from the tests, but still included them in our Bonferroni correction. We also
excluded 715 genes either because they were in the HLA region or because they were classed as
having suspiciously many or suspiciously few synonymous or synonymous+missense variants
in ExAC, leaving 18,630 genes. We thus set a significance threshold of 0.05/(8 tests 18,630 ×
genes) = p<3.4×10-7. For Fig. S8, we ordered the genes by their lowest p-value, randomized the
order of genes with the same p-value, then tested for a difference in the distribution of ranks
between recessive DDG2P genes and all other genes using a Kolmogorov-Smirnov test.
For the burden analysis, we summed up the observed and expected number of biallelic
genotypes across all genes to give and , then calculated their difference (B ) O c (B )E c
(the excess) and their ratio . Under the null hypothesis, we expect(B ) (B )O c − E c E(B ) c
O(B ) c (B )O c
to follow a Poisson distribution with rate .(B )E c
Alternative methods for estimating the expected number of biallelic
genotypes
For Fig. S4, we used two alternative methods to the one described above for estimating the
expected number of biallelic genotypes. Firstly, we calculated the cumulative frequency based
on ExAC by summing the frequencies of individual variants of class c in gene g separately for
non-Finnish Europeans (NFE) for EABI and from South Asians (SAS) for PABI:
fc,g,ExAC = ∑
V
v=1qc,g,v
For this, we restricted to ExAC variants within the intersection of the V3 and V5 Agilent probes
used in DDD (including 100bp flanks on each side), and removed variants if they had >50%
missingness in the relevant ExAC population.
We also used a modified version of the method described by Jin et al. (3) for approximating the
expected number of biallelic genotypes by fitting a polynomial regression on the mutability:
(B ) m mO g = β0 + β1 g + β22g
+ ε
where is the observed number of biallelic genotypes for gene g, is the mutability for (B )O g mg
gene g calculated using the method of Samocha et al. (35 ) , , and are coefficients, and β0 β1 β2
is a random noise term. We fitted this model to the observed number of biallelic synonymousε
genotypes (MAF<0.01) for each gene to estimate , and , then calculated the expected β0 β1 β2
number of biallelic genotypes for each gene using the fitted value:
(B ) m mE g = β0 + β1 g + β22g
Note that we did not do the additional step described in Jin et al. in which they calculate the
expected number of biallelic genotypes for gene g as , since this (B )∑
genes
O g
β +β m +β m0 1 g 22g
+β m +β m∑
genes
β0 1 g 22g
constrains the total expected number of biallelic genotypes to be equal to the number observed,
making exome-wide burden analysis impossible.
Estimating the proportion of cases with diagnostic biallelic coding variants or
de novo mutations
We are interested in estimating the proportion of probands with diagnostic variants of ,πc
consequence class c. Under the null hypothesis in which none of the genotypes of class c are
pathogenic, the number of such genotypes we expect to see in probands is:Npr
NE(b )probands,c null = λpr,c pr
where is the total expected frequency of genotypes of class c across all genes.λpr,c = ∑G
g=1λg,c
However, under the alternative hypothesis, suppose that some fraction of genotypes in φcausal,c
class c cause DD, and some fraction are lethal. Assuming complete penetrance, we can φlethal,c
thus split into genotypes that are due to chance and those that are diagnostic:(b )Eprobands,c
+NE(b )probands.c alt = (1 )λ− φ
causal,c − φlethal,c pr,c pr Nπc pr
φ λcausal,c pr,c
1−e−φ λcausal,c pr,c
The component due to chance is , and is the average N(1 )λ− φcausal,c − φ
lethal,c pr,c pr φ λcausal,c pr,c
1−e−φ λcausal,c pr,c
number of pathogenic genotypes per individual, given that the individual has at least one such
genotype.
In healthy parents, biallelic genotypes of class c are all due to chance from the portion of c Npa
that is not pathogenic:
NE(b )parents.c alt = (1 )λ− φ
causal,c − φlethal,c pa,c pa
where is the expected rate of biallelic genotypes of class c in the parents, given the λpa,c
cumulative frequencies estimated in the same set of people, and the autozygosity rates. We can
thus obtain a maximum likelihood estimate for using , the observed φc = φcausal,c + φ
lethal,c Opa,c
number of biallelic genotypes of class c in the parents:
φc = 1 − Opa,c
λ Npa,c pa
The 95% confidence interval for is , . We show the φc 1( − λ Npa,c pa
O +1.96pa,c √Opa,c )λ Npa,c pa
O −1.96pa,c √Opa,c
estimates of from the EABI and PABI parents in Fig. S7. To estimate , we combined data φc πc
from both populations for MAF<0.01 variants to estimate , and obtained the following φc
maximum likelihood estimates and 95% confidence intervals: 0.141 (0.046,0.238), φLoF /LoF =
0.083 (-0.009,0.175), and 0.007 (-0.028,0.042).φLoF /miss = φ
miss/miss =
For biallelic genotypes, we can substitute into the expression above, substitute for φc Opr,c
and rearrange to obtain a maximum likelihood estimate of :(b ) Eprobands
πc
1 )) ≈( )πc = ( Opr,c
λ Npr,c pr
− ( − φc φcausal,c
1−e−φ λcausal,c pr,c Opr,c
λ Npr,c pr
− 1 + φc φ,c1−e−φ λ,c pr,c
We cannot disentangle and with the available data, but we find that the ratio φcausal,c φ
lethal,c
makes very little difference to the estimate of , so we make the assumption thatφlethal,c
φcausal,c πc
. The 95% confidence interval is thenφcausal,c = φc
.( ) , ) ][ λ Npr,c pr
O −1.96pr,c √Opr,c − 1 + φc φc1−e−φ λc pr,c ( λ Npr,c pr
O +1.96pr,c √Opr,c − 1 + φc φc1−e−φ λc pr,c
De novo mutations were called as previously described (4), selecting the threshold on ppDNM
(posterior probability of a de novo mutation) such that the observed number of synonymous de
novos matched the number expected. Using Sanger validation data from an earlier dataset (1),
we adjusted the observed number of mutations to account for specificity and sensitivity as
follows:
Ode novo, adjusted = 0.95β
O αde novo, raw
where is the positive predicted value (the proportion of candidate mutations that are true α
positive at the chosen threshold) and is the sensitivity to true positives at the same threshold. β
The adjustment by 0.95 is due to exome sequencing being only about 95% sensitive. The
overall de novo mutation rate was calculated in different sets of probands using the λde novo,c,pr
model from (35), adjusting for sex, as described previously (4) .
Since we cannot estimate for de novo mutations using the parents, as we did for recessive φc
variants, we instead set to 0.099, the fraction of genes with pLI>0.99. This estimate is φDN LoF
more speculative than the directly observed depletion of biallelic genotypes above, but we note
that the estimate of for the full set of 7,832 trios only increases from ~0.129 to ~0.154 πDN LoF
if we increase from 0.01 to 0.3. To estimate , we make use of this φDN LoF φ
DN missense
relationship:
≈λ φ N Pr(DDD|c)πc c c BI where is the population of the British Isles, and is the probability that an N
BI r(DDD|c)P
individual is recruited to the DDD given he/she has a pathogenic mutation of class c. If we
assume that this recruitment probability is the same for de novo missense mutations as for de
novo LoFs, we can write:
=πDN LoF
πDN missense
λ φDN LoF DN LoF
λ φDN missense DN missense
We know and , will assume so we can estimate , λDN missense λ
DN LoF .099φDN LoF = 0 π
DN LoF
and can thus write the number of de novo missense mutations we expect to see as:
+ ) NE(mpr,DN missense = (1 )λ− φ
DN missense pr,DN missense pr Npr
(φ λ ) πDN missense prDN missense
2DN LoF
(λ φ )(1−e )DN LoF DN LoF
−φ λDN missense pr,DN missense
We calculated for a range of values of and found that )E(mpr,DN missense φ
DN missense
best matched the observed data, so we used this value for estimating.036φDN missense = 0
.πDN missense
Structural analysis of EIF3F
This is shown in Fig. S11. Human EIF3:f (pdb 3j8c:f) was submitted to the Protein structure
comparison service PDBeFold at the European Bioinformatics Institute (36, 37 ). Of the close
structural matches returned, the X-ray yeast structure pdb entry 4OCN was chosen to display
the human variant position, as the structural resolution (2.25Å) was better than the human
EIF3:f pdb 3j8c:f structure (11.6Å) and it was the most complete structure among the yeast
models. In order to map the Phe232 variant onto the equivalent position on the yeast structure,
the structural alignment from PDBeFold was used. Solvent accessibility was calculated using
the Naccess software (38) using the standard parameters of a 1.4Å probe radius. Amino acid
sequence conservation was calculated using the Scorecons server (39) and displayed using
sequence logos ( 40).
Introducing the EIF3F Phe232Val variant with CRISPR/Cas9
The Phe232Val mutation in the human EIF3F gene was generated by a single base substitution
(T>G) using CRISPR/Cas9-induced homology directed repair in the kolf2_C1 human iPSC line,
a clonal derivative of the kolf2 line generated by the HipSci consortium (41 ). This was achieved
by nucleofection (Lonza, P3 buffer, program CA137) of 106 cells with Cas9-crRNA-tracrRNA
ribonucleoprotein (RNP) complexes. Synthetic RNA oligonucleotides (target site:
5’-AGGCGTGAACATCACTCCCA-3’, 225 pmol crRNA/tracrRNA, IDT) were annealed by
heating to 95°C for 2 min in duplex buffer (IDT) and cooling slowly, followed by addition of
122 pmol recombinant eSpCas9_1.1 protein (in 10 mM Tris-HCl, pH 7.4, 300 mM NaCl, 0.1
mM EDTA, 1 mM DTT, PMID: 26628643), and 500 pmol of a 100 nt ssDNA oligonucleotide
(5’-CTCCGATGCGTTCAGTGTCGTAGTACGCGTATTTCACTGTCAGAGGCGT
GACCATCACTCCCATGGTCCTCCCAGGGACTCCCATTAAAGTGCTGCGGAAGG-3’,
IDT Ultramer) as a homology-directed repair template to introduce the desired base change.
After recovery, plating at single cell density and colony picking into 96 well plates, 384 clones
were screened for heterozygous and homozygous mutations by high throughput sequencing of
amplicons spanning the target site using an Illumina MiSeq instrument.
Crude DNA lysates were prepared by incubation of cells in 100 l of yolk sac lysis buffer (10
mM Tris-HCl pH 8.3, 50 mM KCl, 2 mM MgCl2, 0.45% IGEPAL CA-630, 0.45% Tween 20,
400 g/ml proteinase K) at 60°C 1 h, 95C 10 min followed by dilution 10x in water. The region
surrounding the mutation was amplified by nested PCR using 1 l diluted lysate and KAPA HiFi
hotstart polymerase (Kapa Biosystems) by 35 cycles of {98°C 20 s, 65°C 15 s, 72°C 30 s},
using primers eIF3F_500_F and eIF3F_500_R (see below). Subsequently, reactions were
diluted 10x and re-amplified with primers eIF3F_200_F and eIF3F_200_R (see below) (Tm =
60°C) to ensure specificity for the EIF3F gene. Indexed Illumina sequencing adaptors were
added by a third round of PCR to identify the location of positive clones. Final cell lines were
further validated by Sanger sequencing. Since this region is highly similar to several other
regions in the genome, we analysed the final clones for off-target mutagenesis at the
homologous sites, and found no evidence of this at the two closest off-target sites on
chr2:58252105-58252196 and chr12:5032866-5032941 in any of the lines obtained.
Primer sequences: (5’-3’)
eIF3F_500_F: tctggggttcatttggtccc
eIF3F_500_R: ctgctcagtcacacttcctg
eIF3F_200_F: ACACTCTTTCCCTACACGACGCTCTTCCGATCTagacatggaaaccctctccc
eIF3F_200_R: TCGGCATTCCTGCTGAACCGCTCTTCCGATCTagtcccttttcaaaccaccc
Investigating EIF3F protein levels in iPSC lines containing the EIF3F variant
iPS cells were cultured on Vitronectin XF (07180, Stemcell Technologies)-coated plates in
TeSR-E8 medium (05990, Stemcell Technologies) and incubated at 37oC in 5%CO 2:95% air.
At 75% confluence, iPS cells were incubated in Gentle Cell Dissociation Reagent (07174,
Stemcell Technologies) for 5 min, collected into a tube, centrifuged at 300 g for 3 min and
resuspended in ice-cold 20 mM Tris-HCL pH 7.4 containing protease inhibitor cocktail
(1861281, Thermo Scientific). Cells were agitated at 4oC for 30 min before passing through a 26
gauge needle several times and centrifuging at 13000 g for 15 min at 4oC. Protein in the
supernatant was quantified using the Qubit fluorometer (Invitrogen). We repeated this for two
independent cultures of iPSC lines. Samples were reduced (NP0009, Invitrogen) and mixed
with sample buffer (NP0007, Invitrogen) before an equal amount of total protein from each cell
line (6 µg for culture 1, 4 µg for culture 2) was electrophoresed through a 4-12% Bis-Tris gel
(NP0322, Invitrogen) and transferred onto a membrane (IPVH00010, Immobilon P) using the
X-Cell SureLock system, according to the manufacturer’s protocol (E19051, Invitrogen). Blots
were blocked in 5% non-fat milk (Marvel) in tris buffered saline containing 0.1% Tween 20
(TBST) for 1 h at room temperature. Primary antibodies were as follows: 1:500 anti-rabbit
EIF3F (638202, Biolegend) or 1:1000 beta III tubulin (EP1569Y; AB52623, Abcam), both
diluted in 5% non-fat milk in TBST and incubated overnight at 4oC. Rabbit-HRP-linked
secondary antibody (7074, Cell Signalling Technologies) was diluted 1:2000 in 5% non-fat milk
in TBST and incubated at room temperature for 2 h. Blots were incubated for 2 min with
Western Bright ECL Spray (Advansta) followed by detection with the ImageQuant LAS 4000
(GE Healthcare). Band intensities were measured using ImageJ 1.48v and EIF3F expression
values were normalised to beta-tubulin.
We jointly analysed the data from the two independent replicates, having normalised the
relative expression values by dividing by the mean of the wild-type lines for each replicate. We
tested for lower relative EIF3F expression in the homozygous cells compared to heterozygous
and wild-type cells combined using a one-sided t-test (Fig. S10).
Investigating protein synthesis in iPSC lines containing the EIF3F variant
Protein synthesis was assayed using the Click-iT L-homopropargylglycine (HPG) Alexa Fluor
488 Protein Synthesis Assay Kit (C10429, Molecular Probes). Briefly, in this assay, we took
iPSCs with or without the EIF3F variant and depleted them of methionine to stop translation,
then added a methionine analogue that contains an alkyne moiety which is incorporated into
nascent proteins. We then added Alexa Fluor 488 azide, which leads to a chemoselective "click"
reaction between the fluorescent azide and the alkyne, allowing the modified proteins to be
detected by image-based analysis as a proxy for the amount of protein synthesis.
We optimised the assay for use with iPSCs, using the translational elongation inhibitor
cycloheximide as a control to confirm the sensitivity and reproducibility of the assay (Fig. S12).
iPS cells were cultured on Vitronectin XF (07180, Stemcell Technologies)-coated plates in
TeSR-E8 medium (05990, Stemcell Technologies) and incubated at 37oC in 5%CO2:95% air.
Cells were washed in PBS (D8537, Sigma), dissociated with Accutase (A11105-01, Gibco),
then 30 000 cells were seeded per well of a 96 well plate, in triplicate in TeSR-E8 containing 10
M ROCK inhibitor (Y-27632 dihydrochloride; Ab120129, Abcam). The following day, cells
were depleted of methionine for 1 h by incubation in methionine-free DMEM (21013-024,
Gibco), which was supplemented with insulin, transferrin and selenium (41400045, Gibco), 20
ng/ml human FGF-basic protein (100-18C, PeproTech), sodium pyruvate (11360-039, Gibco),
L-glutamine (25030-81, Gibco) and N-acetyl cysteine (A7250, Sigma), at 37℃ in 5%CO2:95%
air. The media was then changed to methionine-free DMEM plus supplements and 100 M
Click-iT HPG (L-homopropargylglycine; C10202, Molecular Probes), a methionine-analogue,
or vehicle (DMSO; D2650, Sigma), and cells were incubated for 2 h at 37℃ in 5%CO2:95%
air. As a control, cycloheximide was added to control wells at the same time, at different
concentrations (Fig. S12B).
At the end of the incubation, cells were washed briefly with PBS, trypsinised for 5 min
(25300-054, Gibco) to dissociate single cells, diluted in DMEM containing 20% serum to
quench the trypsin, transferred to a V-bottom 96-well plate and triplicate reactions were pooled.
Cells were then processed using standard immunocytochemical methods through a series of
incubations, centrifugations (all performed at 300 g for 5 min) and washes in PBS, all at room
temperature, in the following order: incubation in 3.7% formaldehyde in PBS (28906, Thermo
Scientific) for 15 min to fix the cells; incubation in 0.5% Triton X-100 (X100, Sigma) for 20
min to permeabilise the cells; incubation in 3% bovine serum albumin (A9543, Sigma) in PBS
for 10 min to block non-specific binding sites; incubation in 100 l Click-iT reaction cocktail,
which was prepared according to the manufacturer’s recommendation, for 1 h, followed by a
wash with Click-iT reaction rinse buffer, before resuspending in PBS and analysing
fluorescence at 488 nm on a BD LSRFortessa (BD Biosciences).
This experiment was repeated three times for each of the two independent cell lines, for each of
the three genotypes (wild-type, heterozygous, homozygous). Data were analysed using FlowJo
(v10.4.2). After gating on single cells, we obtained a bimodal distribution of Alexa Fluor
488-positive cells (Fig. S12A), due to the different rates of protein synthesis during the cell
cycle (G1/G2 have high rates of protein synthesis and S/M phase have lower rates). We gated
on the AlexaFlour 488-positive population (fluorescence > 104), which comprised ~80% of the
cells in the absence of cycloheximide (Fig. S12A). This proportion did not significantly depend
on genotype, replicate or cell line (ANOVA p>0.05). After gating, there were an average of 139
cells per replicate (standard deviation = 76), on which we determined the median fluorescence
intensity. We then tested for the effect of genotype on median fluorescence intensity using
linear regression, controlling for replicate. As a summary, we averaged the median fluorescence
for the two cell lines of the same genotype for each replicate, calculated the ratio of homozygote
to wild-type, and then averaged this across replicates.
Investigating proliferation and apoptosis in iPSC lines containing the EIF3F
variant
We investigated proliferation of iPS cell lines using two methods. In the first assay, 50,000
single cells were plated and four days later total cells were counted (Fig. 13A). In the second
assay, the membrane of single cells was stained with the fluorescent dye CellTrace Violet TM, the
fluorescence of which reduces with each cell division and was assessed using flow cytometry
(Fig. 3C; Fig. 13B).
For these assays, iPS cells were cultured on Vitronectin XF (07180, Stemcell
Technologies)-coated plates in TeSR-E8 medium (05990, Stemcell Technologies) and
incubated at 37oC in 5%CO2:95% air. At 65% confluence, iPS cells were washed with PBS
(D8537, Sigma), dissociated with Accutase (A11105-01, Gibco), resuspended in 10X volume of
TeSR-E8 medium (05990, Stemcell Technologies) containing 10 μM ROCK inhibitor (Y-27632
dihydrochloride; Ab120129, Abcam), passed through a 10 μm cell strainer (pluriStrainer,
pluriSelect; 43-10010-40) and centrifuged at 300 g for 3 min.
In the first assay, cells were resuspended in TeSR-E8 medium and 50,000 live cells (assessed
using Trypan blue exclusion; Invitrogen, 15250061) were plated per well of a 6-well plate
containing TeSR-E8 medium and 10 μM ROCK inhibitor. Cells were cultured for the next four
days in TeSR-E8 medium, before trypsinising for 5 min (25300-054, Gibco), and counting the
total number of live cells (Figure S13A).
In the second assay, cells were resuspended in PBS containing 4 μM CellTrace Violet(TM)
(Molecular Probes, C34557) and incubated at room temperature, in the dark for 20 min. Cells
were diluted in TeSR-E8 medium containing 10 μM ROCK inhibitor and centrifuged at 300 g
for 3 min, and then 40,000 cells were plated per well of a 48-well plate. Cells were cultured for
the next two days in TeSR-E8 medium, washed in PBS, dissociated to single cells with
Accutase, transferred to a V-bottom plate and centrifuged at 400 g for 4 min before being fixed
in 100 μl fixation buffer (eBioscience™ IC Fixation Buffer, 00-8222-49) and resuspended with
staining buffer (eBioscience™ Flow Cytometry Staining Buffer, 00-4222-26). Cells stained and
fixed at day zero were used as a ‘no division’ control. Note that Fig. 3B shows the distributions
from one representative replicate per genotype. The data were consistent across cell lines and
replicates (Fig. S13B) and clearly showed that the homozygous lines had a higher fraction of
cells that had only undergone one division than both the wild-type and heterozygous cells. To
test this formally, we fitted a linear regression of the number of cells in division 1 against the
genotype (0=wild-type or heterozygous; 1=homozygous) and the total number of cells.
We assessed apoptosis by staining with annexin V conjugated to a fluorescent dye and detected
dead cells with propidium iodide (PI). Briefly, phosphatidylserine is externalised in apoptotic
cells and is bound by recombinant annexin V-FITC (green fluorescence), while PI is excluded
by the membrane of live cells and only stains necrotic cells or those in late apoptosis with red
fluorescence. After incubation with both probes, early apoptotic cells show green fluorescence,
and late apoptotic cells show both red and green fluorescence. Necrotic cells show red only and
live cells show little or no fluorescence. We used a topoisomerase inhibitor (topotecan) as a
positive control in this assay. Cells were harvested at 65% confluence (including the positive
control pre-incubated for 2 h in 10 μM topotecan) by dissociation with Accutase, and processed
according to the manufacturer’s protocol (Dead Cell Apoptosis Kit; Invitrogen, V13242).
Flow cytometry data were acquired on an LSRFortessa (Becton Dickinson) instrument.
Doublets cells were excluded and CTV, Annexin V or PI fluorescence analyzed using FlowJo
10.4.2 software package (LCC). Fig. 3B and Fig. S13B summarise the results from the CTV
proliferation assay, and Fig. S14 shows the results from the apoptosis assay.
Validation of KDM5B variants by targeted re-sequencing
We re-sequenced all KDM5B de novo mutations and inherited LoF variants, with the exception
of two large deletions. PCR primers were designed using Primer3 to amplify the site of interest,
generating approximately a 230 bp product centred on the site. PCR amplification of the
targeted regions was carried out using JumpStart TM AccuTaqTM LA DNA Polymerase
(Sigma-Aldrich), using 40 ng of input DNA from the proband and their parents. Unique
identifying tag sequences were introduced into the PCR amplicons in a second round of PCR
using KAPA HiFi HotStart ReadyMixPCR Kit (KapaBiosystems). PCR amplicons were pooled
and 96 products were sequenced in one MiSeq lane using 250 bp paired-end reads. Reference
and alternate read counts extracted from the resulting bam files and were used determine the
presence of the variant in question. In addition, read data were visualised using IGV.
Transmission-disequilibrium test on KDM5B LoFs
We observed 15 trios in which one parent transmitted a LoF to the child, 5 trios in which one
parent had a LoF that was not transmitted, 2 quartets in which one parent had a LoF that was
transmitted to one out of two affected children, and 4 trios in which both parents transmitted a
LoF to the child. We tested for significant over-transmission using the
transmission-disequilibrium test as described by Knapp (42 ) . There were 7 LoFs (including one
large deletion) observed in probands whose parents were not originally sequenced, which we
excluded from the TDT. Of the six for which we attempted validation and segregation analysis,
one was found to be de novo and five inherited.
Searching for coding and regulatory modifiers of KDM5B
We defined a set of genes that might modify KDM5B function as: interactors of KDM5B
obtained from the STRING database of protein-protein interactions (43) (HIST2H3A , MYC,
TFAP2C , CDKN1A, TFAP2A, SETD1A, SETD1B, KDM1A, KDM2B, PAX9) plus those
mentioned by Klein et al. ( 44 ) (RBBP4, HDAC1, HDAC4 , MTA2 , CHD4 , FOXG1, FOXC9), as
well as all lysine demethylases, lysine methyltransferases, histone deacetylases, and SET
domain-containing genes from http://www.genecards.org/. The final list contained 95 genes. We
looked for LoF or rare missense variants in these genes in the monoallelic KDM5B LoF carriers
that might have a modifying effect, but found none that were shared by more than two of the de
novo carriers.
We also looked for indirect evidence of a regulatory “second hit” near KDM5B by examining
the haplotypes of common SNPs in the region (Fig. S15). DDD probands and a subset of their
parents were genotyped on either the Illumina OmniExpress chip or the Illumina CoreExome
chip. We performed variant and sample quality control for each dataset separately. Briefly, we
removed variants and samples with high data missingness (>=0.03), samples with high or low
heterozygosity, sample duplicates, individuals of African and East Asian ancestry, and SNPs
with MAF<0.005. We then ran SHAPEIT2 ( 45) to phase the SNPs within 2Mb either side of
KDM5B. To make Fig. S15, we used the heatmap() function in R to cluster the phased
haplotypes using the default hierarchical clustering method (based on Euclidean distance).
Analysis of DNA methylation in KDM5B probands
DNA from 64 DDD whole blood samples comprising 41 probands with a KDM5B variant and
23 negative controls was run on an Illumina EPIC 850K methylation array. Negative controls
were selected from DDD probands with de novo mutations in genes not expressed in whole
blood (SCN2A , KCNQ2, SLC6A1 , and FOXG1), since we would not expect these to
significantly impact the methylation phenotype in that tissue. Samples were randomised on the
array to reduce batch effects, and were QCed using a combination of data from control probes
and numbers of CpGs that failed to meet the standard detection p-value of 0.05. Based on these
criteria, two samples failed and were excluded from further analysis (one of the negative
controls and one of the inherited KDM5B LoF carriers).
We looked at methylation levels in the KDM5B LoF carriers to search for an “epimutation”
(hypermethylation on or around the promoter) that might be acting as second hit. We analyzed a
subset of CpGs in and around the KDM5B promoter region: the CpG island in the KDM5B
promoter itself, and a CpG island in the promoter of KDM5B-AS1, a lnc-RNA not specifically
associated with KDM5B, but also highly expressed in the testis. We also extended analysis 5kb
on either side of the start and stop sites of the KDM5B promoter. We examined the distribution
of the beta values (the ratio of methylated to unmethylated alleles) at each of the CpGs in the
10kb region (Fig. S16).
A final post-QC set of 754,273 methylation probes and samples were analysed for
whole-genome methylation changes using principal component analysis (Fig. S17). One of the
samples with a de novo was removed from PCA analysis as it was a significant outlier. PCA
was run on the post-QC set of beta values using the standard prcomp() functions in R (version
3.4.1).
Generation of the KDM5B knockout mice
A mouse Kdm5b loss-of-function allele (MGI:6153378) was generated by
CRISPR/CAS9-mediated deletion of coding exon 7 (ENSMUSE00001331577), leading to a
premature translational termination due to a downstream frameshift. CAS9 mRNA (46) and two
pairs of guide RNAs flanking exon 7 (targeting CCTAGTAACACTAGGTGTTAATA,
GTGTTTGGTTGTCAGTTAGAGGG, CCTCTCGTACATACATCCTAGGC,
CCTAGGCTCGAACTTCACCATGT) were injected into 1-cell stage C57BL/6NJ zygotes,
which were then implanted into host mothers. Mice born from these transfers were tested for
germ line transmission, and identified founders were bred on a C57BL/6NJ background to
establish colonies for further testing.
Breeding and housing of mice and all experimental procedures were assessed by the Animal
Welfare and Ethical Review Body of the Wellcome Sanger Institute and conducted under the
regulation of UK Home Office license (P6320B89B), and in accordance with institutional
guidelines. From in-crosses of heterozygous Kdm5b mutant mice, we recovered 350 pups at
weaning, of which 39 were homozygous (11.1% versus 25% expected). Kdm5b -/- pups were
raised and weaned together with Kdm5b+/+ pups. Mice were housed in mixed genotype cages
(2-5 mice) with food and water ad-libitum, under controlled temperature and humidity and a
12h light cycle (light on at 7am) at the Research Support Facility of the Wellcome Sanger
Institute.
Phenotyping of wild-type and KDM5B knockout mice
At 10 weeks of age, a cohort of 16 wild-type and 16 homozygous knockout male mice
underwent a series of behavioral tests. These were carried out between 9am – 5pm, after 1h of
habituation to the testing room. Experimenters were blind to genotype, mouse movements were
recorded with an overhead infrared video camera and tracked by automated video tracking
(Ethovision XT 11.5, Noldus Information Technology).
Light-dark box (Fig. 4C) : This test was adapted from Gapp et al. (47) . Mice were housed in
pairs for 20 minutes before introducing them individually into the light-dark box, a plastic box
(40 × 42 × 26 cm) divided in two compartments. One is smaller, closed and dark (1/3 of the
total surface area) and is connected through a door (5 cm) to a larger, brightly lit (370 lux with
an overhead lamp) compartment (2/3 of the box). Each mouse was placed in the dark
compartment, the door was then opened and the animal was left to explore for 10 min. The time
spent in the light compartment was tracked and the difference between genotypes was estimated
using a t-test.
Three-Chamber Sociability Assay (Fig. 4D): The protocol was adapted from Dias et al. (48 ).
In brief, the test arena was divided into 3 equally sized chambers (25 x 37.5 cm) connected by
small doors (5 x 5 cm). Mice were first individually habituated to the arena for 5 minutes, where
the central chamber was empty and each of the side chambers had an empty cylindrical corral (8
cm diameter, 15 cm tall, 1 cm gap between bars). The mouse was then returned to the central
chamber and doors were closed, while a stimulus mouse of a different strain (129Sv) was
introduced into one of the corrals. Doors were opened and the test mouse was allowed to
explore all three chambers: one side chamber with an empty corral and the other with a corral
containing a freely moving novel mouse, for 5 minutes. The time spent investigating both
corrals was recorded and measured as total investigation. The time spent investigating the
mouse was assessed as a percentage of the total investigation time, i.e. [time investigating corral
with mouse]/[time investigating corral with mouse + time investigating empty corral]. A t-test
was used to evaluate the difference between genotypes of the time spent investigating the
mouse.
Social Recognition Assay (Fig. 4E): This paradigm assesses memory and relies on the natural
preference of mice to investigate a novel mouse over a familiar one. The assay consists of two
days, as reported by Dias et al. ( 48). On both days, mice were habituated for 10 minutes to the
test box, an empty, clean cage. Briefly, on the first day, mice were tested on a habituation
paradigm to assess their olfaction and social investigation. An anesthetized stimulus mouse was
placed in the center of the arena for 1 minute, repeated 4 times at 10 minute intervals. On the
5th trial, a novel mouse was presented. Both Kdm5b wild-type and homozygous knockout mice
showed a similar decreasing investigation of the repeatedly-presented stimulus mouse, and an
increased investigation on trial 5 of the novel mouse. On day 2, after 24 hours, the
discrimination test was performed. Mice were simultaneously presented with the familiar mouse
from Day 1 and a completely novel mouse. The amount of time the test animal spent
investigating each stimulus mouse by close-proximity (sniffing, oronasal contact, or
approaching within 1–2 cm) was recorded. Using predefined exclusion criteria, one wild-type
mouse was excluded on day 1, and a second wild-type was excluded on day 2, both for
insufficient overall investigation. A two-way ANOVA was used to assess the discrimination
difference between genotypes, followed by Bonferroni's multiple comparisons test (two
comparisons, familiar versus unfamiliar mouse for wild type and mutant).
Barnes Maze probe trial (Fig. S19AB) : This assay is a test of visuo-spatial learning and
memory on a circular maze (120cm diameter table) with 20 holes around the perimeter. One of
the holes leads to a small dark box (Target) where they can escape from the brightly lit maze.
Mice were trained for three days, 10 trials (4 min maximum each), to find the target location.
On the probe trial, 72h after the last training day, the escape box was removed. Each mouse was
given 4 minutes to explore the maze. The mouse's movements were tracked, and the amount of
time spent around each of the holes was analyzed. Data is expressed as the percentage of time
spent dwelling around each hole, relative to the total amount of time spent investigating all
holes. With 20 holes to investigate, the amount of time spent by chance around each hole would
be 5%. The difference between genotypes in remembering the target location was evaluated
using a two-way ANOVA with Bonferroni's multiple comparisons test.
Statistical analysis for all mouse behavior experiments was performed with GraphPad Prism 7
(GraphPad Software).
X-rays (Fig. S19C): Seven wild-type and seven homozygous Kdm5b knockout mice were
anaesthetised with ketamine/xylazine (100mg/ 10mg per kg of body weight) and then placed in
an MX-20 X-ray machine (Faxitron X-Ray LLC). Whole body radiographs were taken in
dorso-ventral and lateral positions. Images were then analysed and morphological abnormalities
assessed using Sante DICOM Viewer v7.2.1 (Santesoft LTD).
Supplementary Figures
Fig. S1: Flow diagram indicating the final number of samples used in the various analyses, and
why samples were removed. Note that we use “diagnosed” and “undiagnosed” as shorthand in
the manuscript, but these terms are not fully accurate descriptors of these groups (see Methods
section on “Sample quality control and subsetting”). The “diagnosed” probands include only
those with diagnostic dominant or X-linked exonic variants. The “undiagnosed” set includes
193 patients who had diagnostic variants in recessive DDG2P genes or potentially diagnostic
variants in monoallelic or X-linked DDG2P genes but had high autozygosity or affected siblings
Fig. S2: Principal components analysis of the 1000 Genomes Phase 3 samples (left) with DDD
samples projected on top of them (right). The ellipses used to define the EABI and PABI
populations in DDD are shown on the PC2 versus PC3 plot.
Fig. S3: Distribution of the number of organ systems affected per proband (10 ). P-values are
from Wilcoxon rank-sum tests.
Fig. S4: Number of observed and expected biallelic synonymous genotypes per individual,
estimated across 5684 EABI and 356 PABI probands. The grey lines indicate the observed
number and the points with small black lines represent the expected number with a 95%
confidence interval. Two-sided p-values are indicated. The expected numbers were calculated in
different ways as described in Methods. Note that the estimate based on the DDD parental
haplotypes (black points) best matches the observed data, and that use of the ExAC frequencies
or the polynomial model of Jin et al. (3) gives estimates of the expected number that are
significantly different from the observed number.
Fig. S5: Histograms of levels of autozygosity across EABI and PABI probands.
Fig. S6. Burden of biallelic damaging genotypes in different gene sets, for undiagnosed EABI
and PABI probands combined (N=4,651). This includes biallelic LoFs, biallelic damaging
missense and compound heterozygous LoF/damaging missense. The lines show 95%
confidence intervals and the p-values are from a one-sided Poisson test. Note the strong
enrichment in known recessive DD genes and genes predicted to be intolerant of biallelic LoFs
based on the pRec score (22), and lack of enrichment in known dominant DD genes and genes
predicted to be intolerant of monoallelic LoFs (pLI>0.9).
Fig. S7: Estimates of , the proportion of biallelic genotypes that are causal for DD or lethal. φ
These were estimated from the parental data. See Methods for details. The points show
maximum likelihood estimates and the lines show 95% confidence intervals. Points at the same
MAF cutoff have been slightly scattered along the x-axis for ease of visualisation.
Fig. S8: Distribution of the ranks of minimum p-values per gene for known recessive genes
versus all other genes. The order of genes with the same minimum p-value was randomised. A
Kolmogorov-Smirnov (KS) test indicated that these distributions were significantly different.
This, combined with the fact that fourteen of the sixteen genes with p<10-4 are known recessive
DD genes, suggests our approach for identifying these genes is valid and Bonferroni correction
is conservative.
Fig. S9: Anterior-posterior facial photographs of selected individuals with the homozygous
Phe232Val variant in EIF3F. DECIPHER IDs are shown in the top right corner. Affected
individuals did not have a distinctive facial appearance. Individual 265452 (leftmost) had
muscle atrophy, as demonstrated in photographs of the anterior surface of the hands which show
wasting of the thenar and hypothenar eminences. This is notable because it is only reported in
one other proband in the DDD study and, in mice, Eif3f has been shown to play a role in
regulating skeletal muscle size via interaction with the mTOR pathway (49 ). None of the other
individuals were either assessed to have or previously recorded to have muscle atrophy.
Fig. S10. Results from Western blot showing lower levels of the EIF3F protein in iPSCs
homozygous for the Phe232Val variant. WT: wild-type, HET and HOM: heterozygous and
homozygous for Phe232Val. Note that two independent cell lines were used for each genotype,
and that the two culture represent independent replicates. Beta-tubulin was used as a
normalising control. Band intensities were quantified using ImageJ. We jointly analysed the
data from both independent cultures, having normalised the relative expression values by
dividing by the mean of the wild-type lines for each replicate. There was significantly lower
relative EIF3F expression in the homozygous cells compared to heterozygous and wild-type
cells combined (one-sided t-test; p=0.003), with a mean reduction of 26.6%.
Fig. S11: Predicted structural effects of the pathogenic Phe232Val variant in EIF3F. The
secondary structure, domain architecture and 3D fold of EIF3F is conserved between species
but sequence similarity is low (29% between yeast and humans). A) Section of the amino acid
sequence logo for EIF3F where the strength of conservation across species is indicated by the
size of the letters. The sequence below represents the human EIF3F. Boxed characters are the
aromatic residues conserved between humans and yeast and proximal in space to Phe232. B)
Structure of the section of EIF3F containing the Phe232Val variant, highlighted in green.
Amino acids conserved between yeast and human sequences as highlighted in panel A are
shown in grey. The conserved Phe232 side chain is buried (solvent accessibility 0.7%) and
likely plays a stabilizing role, so the Phe232Val variant may disrupt protein stability. See (10 )
for details of structure prediction.
Fig. S12: Optimising the sensitivity of the Alexa Fluor 488 Protein Synthesis Assay Kit for
assessing protein synthesis in iPSCs with or without the EIF3F Phe232Val variant. A)
Representative histogram of fluorescence intensities (AlexaFluor-488), a proxy for the rate of
nascent protein synthesis, across multiple wild-type iPSCs measured by flow cytometry. This
plot shows a single replicate. We gated the major population (blue rectangle; i.e. fluorescence >
104), and used its median (red dotted line) in downstream analysis. B) Dose-response curve for
iPSCs treated with cycloheximide. Data are represented as mean and standard deviation of
median fluorescence intensity of the major Alexa-Fluor 488-positive population across four
independent replicates, three of which were run in parallel with experiments presented in Figure
3C. Note that, for the leftmost point, the error bars are smaller than the dot and thus not visible.
Fig. S13. The EIF3F Phe232Val variant reduces proliferation of iPSCs in the homozygous but
not heterozygous state. (A) Mean cell counts four days after plating 50,000 cells per line, with
95% confidence intervals, from three independent replicates. (B) Proportions of cells that have
undergone zero, one or multiple divisions in a cell trace violet (CTV) proliferation assay. There
are four replicates for each of the two independent cell lines for each genotype. The
homozygous line had significantly more cells in division 1 than wild-type and heterozygous
lines (p=2x10-5; linear regression including total number of cells as covariate). See Fig. 3C for
representative distributions of CTV intensity, one for each genotype.
Fig. S14: No difference in cell apoptosis of iPSCs containing the EIF3F variant versus
wild-type cells. Graph shows mean and standard deviation of the percentage of live, apoptotic
or necrotic cells, assayed using annexin V or propidium iodide staining (see (10)). Data derived
from three replicates.
Fig. S15: Plot showing haplotypes of common SNPs around KDM5B in individuals with de
novo missense or LoF mutations or with monoallelic or biallelic LoFs. These is no evidence for
a local haplotype shared by multiple probands with monoallelic LoFs that was not also present
in an unaffected parent with a monoallelic LoF. The region shown lies between two
recombination hotspots. The rows represent phased haplotypes, with orange and green
rectangles corresponding to the different alleles at the SNPs at the positions indicated along the
bottom. Hierarchical clustering has been applied to the haplotypes, as indicated by the
dendrogram on the left, and the labels on the right indicate which individual carries the
haplotype, and whether the individual was a proband carrying a de novo (purple), a biallelic
LoF (dark green), or an inherited heterozygous LoF (yellow), or a parent carrying a
heterozygous LoF (pink).
Fig. S16: Violin plots of the beta values (the ratio of methylated to unmethylated alleles) at each
of the CpGs in the 10kb region around the KDM5B promoter. The CpGs within the KDM5B and
KDM5B-AS1 promoters are annotated below the plot, with coordinates relative to hg19. The
bottom panel shows the negative controls (probands with likely causal de novo mutations in
known DD genes not expressed in blood), and the other panels show probands with variants in
KDM5B that are either biallelic (top panel), de novo (second panel) or monoallelic and inherited
(third panel).
Fig. S17: Results from a principal components analysis of genome-wide DNA methylation in
probands with de novo mutations or biallelic or inherited monoallelic LoFs in KDM5B. The lack
of differences between groups suggests that LoFs in KDM5B do not manifest in DNA
methylation changes, at least not in whole blood of children in this age range (age 18 months to
16 years).
Fig. S18: Anterior-posterior facial photographs of one of the individuals with biallelic KDM5B
variants. Note the narrow palpebral fissures, arched or thick eyebrows, dark eyelashes, low
hanging columella, smooth philtrum and thin upper vermillion border.
Fig. S19: Cognitive and skeletal defects of homozygous Kdm5b knockout mice. A) Impaired
long-term spatial memory in the Probe trial of Barnes Maze assay. 72 hours after learning the
location of the target hole, wild-type mice spent significantly more time around the target hole
compared to knockout mice. Two-way ANOVA, p=0.026 for hole ✕ genotype interaction; with
Bonferroni’s multiple comparison test, p<0.001 for Target hole. B) Representative heatmap of
wild-type mouse activity around the Barnes Maze during the 72 hour probe trial. C) Skeletal
defects identified from X-ray analysis. All homozygous Kdm5b knockout mice analysed (n=7)
had transitional vertebrae (one-tailed Fisher's exact test p=0.03) compared to none of the
wild-types (n=7). This is an X-ray image of the dorsal view of wild-type and homozygous
Kdm5b knockout mice. This case shows a transitional Lumbar (L1) to Sacral (S1) vertebra,
resulting in an extra Sacral vertebra (arrow), and the absence of one Thoracic vertebra (T13,
arrowhead). Green dots indicate Thoracic vertebra, blue are Lumbar, red are Sacral and brown
are Caudal vertebra.
Fig. S20: Plots showing effect of variant filtering strategies on number of variants, Mendelian
errors and Ti/Tv. We first set genotypes to missing based on genotype quality (GQ), depth (AD)
and the p-value from a test of allele balance (pAB), and then removed sites according to the
proportion of missing genotypes.
Supplementary Tables
Table S1: Summary of the phenotypes in the study, for different subsets of probands. We show
the proportion of patients with each higher-level phenotype (based on HPO terms), and results
from Fisher’s exact tests for a difference in these proportions between different subsets of
probands. We also show summary statistics for the number of HPO terms, various
developmental milestones, and results from Mann-Whitney U tests comparing these quantitative
metrics between different subsets of probands. We also summarise the number of probands who
achieved (or who had not yet achieved) the milestone at an age ≤2SD, 2SD-4SD, 4SD-8SD and
>8SD from the mean for normal individuals. For these, the distributions for normal children
were extracted from (50) for age at first words and (51 ) for age at walking. % Within the ID/DD
category, the classes are mutually exclusive of one another and based on HPO terms that fall
under HP:0001249 (Intellectual Disability) or HP:0001263 (Global Developmental Delay). For
creating this combined category, we took the most severe or (if unclear) more specific term (e.g.
mild ID and severe/profound DD → severe/profound ID/DD; mild DD and DD of unknown
severity → mild ID/DD). & Several cognition-related terms do not come under HP:0001249
(Intellectual Disability) or HP:0001263 (Global Developmental Delay) (i.e. HP:0100543
(Cognitive impairment), HP:0001328 (Specific learning disability) and HP:0000750 (Delayed
speech and language development)), so we included these here. * For the mean, median,
standard deviation and Mann-Whitney U tests of the developmental milestones, if probands had
not yet achieved the milestone, we set them to missing if they were under age 18 months and to
their age at assessment if over 18 months. + The organ system counts were carried out as
described in the Methods, in order to avoid double-counting the same HPO term that occurred
under multiple organ systems.
Table S2: Estimates of the , the proportion of probands explained by diagnostic biallelic π
coding genotypes or de novo coding mutations, for different sample sets. Shown are the
maximum likelihood estimates for , and a 95% confidence interval. See Methods for how π
these were calculated.
Table S3: Results from tests of an excess of damaging biallelic genotypes for all genes. The
lowest p-value out of the eight tests conducted for each gene is shown. We give results for the
stringent ancestry filter (4318 EABI probands and 333 PABI probands), as shown in Table 1, as
well as the lenient ancestry filter (4942 European ancestry probands and 498 South Asian
ancestry probands).
Table S4: Genes enriched for damaging biallelic coding genotypes with p<1×10-4 (for eight
tests; see ( 10)). Shown are the number of observed biallelic genotypes of different consequence
classes, the lowest p-value out of the eight tests (achieved using EABI alone for all genes except
for VPS13B), details of the corresponding test, and the p-value for phenotypic similarity for the
relevant probands (9). Known recessive DD genes from the DDG2P list are indicated
(http://www.ebi.ac.uk/gene2phenotype/). LZTR1 has been implicated as a dominant cause of
Noonan syndrome ( 52) and there is some prior evidence supporting LINS as a recessive ID gene
(53, 54).
Table S5: Phenotypes of the nine patients homozygous for the EIF3F Phe232Val variant. Only
five were used in the initial gene discovery analysis, which excluded one sibling from each of
families 2 and 3, 260950 (who had a potentially diagnostic inherited X-linked variant in
HUWE1 , subsequently deemed to be benign since it did not segregate with disease in his
family), and 265452 (who had no parental genetic data available).
Table S6: De novo mutations in KDM5B from this and previous studies, and inherited LoFs in
KDM5B from this study. Note that, of the 26 probands with inherited LoFs, five of them were
reported to have a parent who had at least one clinical phenotype shared with the child, but for
only two families was this the parent who carried the LoF.
Table S7: Phenotypes of the probands with biallelic or de novo KDM5B variants.
References and Notes 1. The Deciphering Developmental Disorders Study, Large-scale discovery of novel genetic
causes of developmental disorders. Nature 519, 223–228 (2015). doi:10.1038/nature14135 Medline
2. R. K. C Yuen, D. Merico, M. Bookman, J. L Howe, B. Thiruvahindrapuram, R. V. Patel, J. Whitney, N. Deflaux, J. Bingham, Z. Wang, G. Pellecchia, J. A. Buchanan, S. Walker, C. R. Marshall, M. Uddin, M. Zarrei, E. Deneault, L. D’Abate, A. J. S. Chan, S. Koyanagi, T. Paton, S. L. Pereira, N. Hoang, W. Engchuan, E. J. Higginbotham, K. Ho, S. Lamoureux, W. Li, J. R. MacDonald, T. Nalpathamkalam, W. W. L. Sung, F. J. Tsoi, J. Wei, L. Xu, A.-M. Tasse, E. Kirby, W. Van Etten, S. Twigger, W. Roberts, I. Drmic, S. Jilderda, B. M. K. Modi, B. Kellam, M. Szego, C. Cytrynbaum, R. Weksberg, L. Zwaigenbaum, M. Woodbury-Smith, J. Brian, L. Senman, A. Iaboni, K. Doyle-Thomas, A. Thompson, C. Chrysler, J. Leef, T. Savion-Lemieux, I. M. Smith, X. Liu, R. Nicolson, V. Seifer, A. Fedele, E. H. Cook, S. Dager, A. Estes, L. Gallagher, B. A. Malow, J. R. Parr, S. J. Spence, J. Vorstman, B. J. Frey, J. T. Robinson, L. J. Strug, B. A. Fernandez, M. Elsabbagh, M. T. Carter, J. Hallmayer, B. M. Knoppers, E. Anagnostou, P. Szatmari, R. H. Ring, D. Glazer, M. T. Pletcher, S. W. Scherer, Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017). doi:10.1038/nn.4524 Medline
3. S. C. Jin, J. Homsy, S. Zaidi, Q. Lu, S. Morton, S. R. DePalma, X. Zeng, H. Qi, W. Chang, M. C. Sierant, W.-C. Hung, S. Haider, J. Zhang, J. Knight, R. D. Bjornson, C. Castaldi, I. R. Tikhonoa, K. Bilguvar, S. M. Mane, S. J. Sanders, S. Mital, M. W. Russell, J. W. Gaynor, J. Deanfield, A. Giardini, G. A. Porter Jr., D. Srivastava, C. W. Lo, Y. Shen, W. S. Watkins, M. Yandell, H. J. Yost, M. Tristani-Firouzi, J. W. Newburger, A. E. Roberts, R. Kim, H. Zhao, J. R. Kaltman, E. Goldmuntz, W. K. Chung, J. G. Seidman, B. D. Gelb, C. E. Seidman, R. P. Lifton, M. Brueckner, Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet. 49, 1593–1601 (2017). doi:10.1038/ng.3970 Medline
4. Deciphering Developmental Disorders Study, Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017). doi:10.1038/nature21062 Medline
5. H. H. Ropers, Genetics of early onset cognitive impairment. Annu. Rev. Genomics Hum. Genet. 11, 161–187 (2010). doi:10.1146/annurev-genom-082509-141640 Medline
6. L. E. L. M. Vissers, C. Gilissen, J. A. Veltman, Genetic studies in intellectual disability and related disorders. Nat. Rev. Genet. 17, 9–18 (2016). doi:10.1038/nrg3999 Medline
7. P. A. Baird, T. W. Anderson, H. B. Newcombe, R. B. Lowry, Genetic disorders in children and young adults: A population study. Am. J. Hum. Genet. 42, 677–693 (1988). Medline
8. S. J. Schrodi, A. DeBarber, M. He, Z. Ye, P. Peissig, J. J. Van Wormer, R. Haws, M. H. Brilliant, R. D. Steiner, Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data. Hum. Genet. 134, 659–669 (2015). doi:10.1007/s00439-015-1551-8 Medline
9. N. Akawi, J. McRae, M. Ansari, M. Balasubramanian, M. Blyth, A. F. Brady, S. Clayton, T. Cole, C. Deshpande, T. W. Fitzgerald, N. Foulds, R. Francis, G. Gabriel, S. S. Gerety, J. Goodship, E. Hobson, W. D. Jones, S. Joss, D. King, N. Klena, A. Kumar, M. Lees, C. Lelliott, J. Lord, D. McMullan, M. O’Regan, D. Osio, V. Piombo, E. Prigmore, D. Rajan, E. Rosser, A. Sifrim, A. Smith, G. J. Swaminathan, P. Turnpenny, J. Whitworth, C. F. Wright, H. V. Firth, J. C. Barrett, C. W. Lo, D. R. FitzPatrick, M. E. Hurles; DDD study, Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nat. Genet. 47, 1363–1369 (2015). doi:10.1038/ng.3410 Medline
10. Materials and methods are available as supplementary materials. 11. C. F. Wright, T. W. Fitzgerald, W. D. Jones, S. Clayton, J. F. McRae, M. van Kogelenberg,
D. A. King, K. Ambridge, D. M. Barrett, T. Bayzetinova, A. P. Bevan, E. Bragin, E. A. Chatzimichali, S. Gribble, P. Jones, N. Krishnappa, L. E. Mason, R. Miller, K. I. Morley, V. Parthiban, E. Prigmore, D. Rajan, A. Sifrim, G. J. Swaminathan, A. R. Tivey, A. Middleton, M. Parker, N. P. Carter, J. C. Barrett, M. E. Hurles, D. R. FitzPatrick, H. V. Firth; DDD study, Genetic diagnosis of developmental disorders in the DDD study: A scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015). doi:10.1016/S0140-6736(14)61705-0 Medline
12. A. S. Teebi, Autosomal recessive disorders among Arabs: An overview from Kuwait. J. Med. Genet. 31, 224–233 (1994). doi:10.1136/jmg.31.3.224 Medline
13. G. O. Tadmouri, P. Nair, T. Obeid, M. T. Al Ali, N. Al Khaja, H. A. Hamamy, Consanguinity and reproductive health among Arabs. Reprod. Health 6, 17 (2009). doi:10.1186/1742-4755-6-17 Medline
14. C. Gilissen, J. Y. Hehir-Kwa, D. T. Thung, M. van de Vorst, B. W. M. van Bon, M. H. Willemsen, M. Kwint, I. M. Janssen, A. Hoischen, A. Schenck, R. Leach, R. Klein, R. Tearle, T. Bo, R. Pfundt, H. G. Yntema, B. B. A. de Vries, T. Kleefstra, H. G. Brunner, L. E. L. M. Vissers, J. A. Veltman, Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344–347 (2014). doi:10.1038/nature13394 Medline
15. C. L. Beaulieu, L. Huang, A. M. Innes, M.-A. Akimenko, E. G. Puffenberger, C. Schwartz, P. Jerry, C. Ober, R. A. Hegele, D. R. McLeod, J. Schwartzentruber, J. Majewski, D. E. Bulman, J. S. Parboosingh, K. M. Boycott; FORGE Canada Consortium, Intellectual disability associated with a homozygous missense mutation in THOC6. Orphanet J. Rare Dis. 8, 62 (2013). doi:10.1186/1750-1172-8-62 Medline
16. A. Fogli, O. Boespflug-Tanguy, The large spectrum of eIF2B-related diseases. Biochem. Soc. Trans. 34, 22–29 (2006). doi:10.1042/BST0340022 Medline
17. V. Faundes, W. G. Newman, L. Bernardini, N. Canham, J. Clayton-Smith, B. Dallapiccola, S. J. Davies, M. K. Demos, A. Goldman, H. Gill, R. Horton, B. Kerr, D. Kumar, A. Lehman, S. McKee, J. Morton, M. J. Parker, J. Rankin, L. Robertson, I. K. Temple, S. Banka; Clinical Assessment of the Utility of Sequencing and Evaluation as a Service (CAUSES) Study; Deciphering Developmental Disorders (DDD) Study, Histone Lysine
Methylases and Demethylases in the Landscape of Human Developmental Disorders. Am. J. Hum. Genet. 102, 175–187 (2018). doi:10.1016/j.ajhg.2017.11.013 Medline
18. I. Iossifov, B. J. O’Roak, S. J. Sanders, M. Ronemus, N. Krumm, D. Levy, H. A. Stessman, K. T. Witherspoon, L. Vives, K. E. Patterson, J. D. Smith, B. Paeper, D. A. Nickerson, J. Dea, S. Dong, L. E. Gonzalez, J. D. Mandell, S. M. Mane, M. T. Murtha, C. A. Sullivan, M. F. Walker, Z. Waqar, L. Wei, A. J. Willsey, B. Yamrom, Y. H. Lee, E. Grabowska, E. Dalkic, Z. Wang, S. Marks, P. Andrews, A. Leotta, J. Kendall, I. Hakker, J. Rosenbaum, B. Ma, L. Rodgers, J. Troge, G. Narzisi, S. Yoon, M. C. Schatz, K. Ye, W. R. McCombie, J. Shendure, E. E. Eichler, M. W. State, M. Wigler, The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014). doi:10.1038/nature13908 Medline
19. G. L. Carvill, H. C. Mefford, Microdeletion syndromes. Curr. Opin. Genet. Dev. 23, 232–239 (2013). doi:10.1016/j.gde.2013.03.004 Medline
20. H. H. Ropers, T. Wienker, Penetrance of pathogenic mutations in haploinsufficient genes for intellectual disability and related disorders. Eur. J. Med. Genet. 58, 715–718 (2015). doi:10.1016/j.ejmg.2015.10.007 Medline
21. C. N. Vallianatos, S. Iwase, Disrupted intricacy of histone H3K4 methylation in neurodevelopmental disorders. Epigenomics 7, 503–519 (2015). doi:10.2217/epi.15.1 Medline
22. M. Lek, K. J. Karczewski, E. V. Minikel, K. E. Samocha, E. Banks, T. Fennell, A. H. O’Donnell-Luria, J. S. Ware, A. J. Hill, B. B. Cummings, T. Tukiainen, D. P. Birnbaum, J. A. Kosmicki, L. E. Duncan, K. Estrada, F. Zhao, J. Zou, E. Pierce-Hoffman, J. Berghout, D. N. Cooper, N. Deflaux, M. DePristo, R. Do, J. Flannick, M. Fromer, L. Gauthier, J. Goldstein, N. Gupta, D. Howrigan, A. Kiezun, M. I. Kurki, A. L. Moonshine, P. Natarajan, L. Orozco, G. M. Peloso, R. Poplin, M. A. Rivas, V. Ruano-Rubio, S. A. Rose, D. M. Ruderfer, K. Shakir, P. D. Stenson, C. Stevens, B. P. Thomas, G. Tiao, M. T. Tusie-Luna, B. Weisburd, H.-H. Won, D. Yu, D. M. Altshuler, D. Ardissino, M. Boehnke, J. Danesh, S. Donnelly, R. Elosua, J. C. Florez, S. B. Gabriel, G. Getz, S. J. Glatt, C. M. Hultman, S. Kathiresan, M. Laakso, S. McCarroll, M. I. McCarthy, D. McGovern, R. McPherson, B. M. Neale, A. Palotie, S. M. Purcell, D. Saleheen, J. M. Scharf, P. Sklar, P. F. Sullivan, J. Tuomilehto, M. T. Tsuang, H. C. Watkins, J. G. Wilson, M. J. Daly, D. G. MacArthur; Exome Aggregation Consortium, Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016). doi:10.1038/nature19057 Medline
23. J. X. Chong, M. J. McMillin, K. M. Shively, A. E. Beck, C. T. Marvin, J. R. Armenteros, K. J. Buckingham, N. T. Nkinsi, E. A. Boyle, M. N. Berry, M. Bocian, N. Foulds, M. L. G. Uzielli, C. Haldeman-Englert, R. C. M. Hennekam, P. Kaplan, A. D. Kline, C. L. Mercer, M. J. M. Nowaczyk, J. S. Klein Wassink-Ruiter, E. W. McPherson, R. A. Moreno, A. E. Scheuerle, V. Shashi, C. A. Stevens, J. C. Carey, A. Monteil, P. Lory, H. K. Tabor, J. D. Smith, J. Shendure, D. A. Nickerson, M. J. Bamshad, M. J. Bamshad, J. Shendure, D. A. Nickerson, G. R. Abecasis, P. Anderson, E. M. Blue, M. Annable, B. L. Browning, K. J. Buckingham, C. Chen, J. Chin, J. X. Chong, G. M. Cooper, C. P. Davis, C. Frazar, T. M.
Harrell, Z. He, P. Jain, G. P. Jarvik, G. Jimenez, E. Johanson, G. Jun, M. Kircher, T. Kolar, S. A. Krauter, N. Krumm, S. M. Leal, D. Luksic, C. T. Marvin, M. J. McMillin, S. McGee, P. O’Reilly, B. Paeper, K. Patterson, M. Perez, S. W. Phillips, J. Pijoan, C. Poel, F. Reinier, P. D. Robertson, R. Santos-Cortez, T. Shaffer, C. Shephard, K. M. Shively, D. L. Siegel, J. D. Smith, J. C. Staples, H. K. Tabor, M. Tackett, J. G. Underwood, M. Wegener, G. Wang, M. M. Wheeler, Q. Yi; University of Washington Center for Mendelian Genomics, De novo mutations in NALCN cause a syndrome characterized by congenital contractures of the limbs and face, hypotonia, and developmental delay. Am. J. Hum. Genet. 96, 462–473 (2015). doi:10.1016/j.ajhg.2015.01.003 Medline
24. M. D. Al-Sayed, H. Al-Zaidan, A. Albakheet, H. Hakami, R. Kenana, Y. Al-Yafee, M. Al-Dosary, A. Qari, T. Al-Sheddi, M. Al-Muheiza, W. Al-Qubbaj, Y. Lakmache, H. Al-Hindi, M. Ghaziuddin, D. Colak, N. Kaya, Mutations in NALCN cause an autosomal-recessive syndrome with severe hypotonia, speech impairment, and cognitive delay. Am. J. Hum. Genet. 93, 721–726 (2013). doi:10.1016/j.ajhg.2013.08.001 Medline
25. J. Rainger, D. Pehlivan, S. Johansson, H. Bengani, L. Sanchez-Pulido, K. A. Williamson, M. Ture, H. Barker, K. Rosendahl, J. Spranger, D. Horn, A. Meynert, J. A. B. Floyd, T. Prescott, C. A. Anderson, J. K. Rainger, E. Karaca, C. Gonzaga-Jauregui, S. Jhangiani, D. M. Muzny, A. Seawright, D. C. Soares, M. Kharbanda, V. Murday, A. Finch, R. A. Gibbs, V. van Heyningen, M. S. Taylor, T. Yakut, P. M. Knappskog, M. E. Hurles, C. P. Ponting, J. R. Lupski, G. Houge, D. R. FitzPatrick, M. Hurles, D. R. FitzPatrick, S. Al-Turki, C. Anderson, I. Barroso, P. Beales, J. Bentham, S. Bhattacharya, K. Carss, K. Chatterjee, S. Cirak, C. Cosgrove, A. Daly, J. Floyd, C. Franklin, M. Futema, S. Humphries, S. McCarthy, H. Mitchison, F. Muntoni, A. Onoufriadis, V. Parker, F. Payne, V. Plagnol, L. Raymond, D. Savage, P. Scambler, M. Schmidts, R. Semple, E. Serra, J. Stalker, M. van Kogelenberg, P. Vijayarangakannan, K. Walter, G. Wood; UK10K; Baylor-Hopkins Center for Mendelian Genomics, Monoallelic and biallelic mutations in MAB21L2 cause a spectrum of major eye malformations. Am. J. Hum. Genet. 94, 915–923 (2014). doi:10.1016/j.ajhg.2014.05.005 Medline
26. M. Albert, S. U. Schmitz, S. M. Kooistra, M. Malatesta, C. Morales Torres, J. C. Rekling, J. V. Johansen, I. Abarrategui, K. Helin, The histone demethylase Jarid1b ensures faithful mouse development by protecting developmental genes from aberrant H3K4me3. PLOS Genet. 9, e1003461 (2013). doi:10.1371/journal.pgen.1003461 Medline
27. S. Köhler, M. H. Schulz, P. Krawitz, S. Bauer, S. Dölken, C. E. Ott, C. Mundlos, D. Horn, S. Mundlos, P. N. Robinson, Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 85, 457–464 (2009). doi:10.1016/j.ajhg.2009.09.003 Medline
28. E. Bragin, E. A. Chatzimichali, C. F. Wright, M. E. Hurles, H. V. Firth, A. P. Bevan, G. J. Swaminathan, DECIPHER: Database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation. Nucleic Acids Res. 42 (D1), D993–D1000 (2014). doi:10.1093/nar/gkt937 Medline
29. E. Sheridan, J. Wright, N. Small, P. C. Corry, S. Oddie, C. Whibley, E. S. Petherick, T. Malik, N. Pawson, P. A. McKinney, R. C. Parslow, Risk factors for congenital anomaly
in a multiethnic birth cohort: An analysis of the Born in Bradford study. Lancet 382, 1350–1359 (2013). doi:10.1016/S0140-6736(13)61132-0 Medline
30. W. McLaren, L. Gil, S. E. Hunt, H. S. Riat, G. R. S. Ritchie, A. Thormann, P. Flicek, F. Cunningham, The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016). doi:10.1186/s13059-016-0974-4 Medline
31. 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526, 68–74 (2015). doi:10.1038/nature15393 Medline
32. A. L. Price, N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick, D. Reich, Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006). doi:10.1038/ng1847 Medline
33. V. Narasimhan, P. Danecek, A. Scally, Y. Xue, C. Tyler-Smith, R. Durbin, BCFtools/RoH: A hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics 32, 1749–1751 (2016). doi:10.1093/bioinformatics/btw044 Medline
34. M. P. Conomos, A. P. Reiner, B. S. Weir, T. A. Thornton, Model-free Estimation of Recent Genetic Relatedness. Am. J. Hum. Genet. 98, 127–148 (2016). doi:10.1016/j.ajhg.2015.11.022 Medline
35. K. E. Samocha, E. B. Robinson, S. J. Sanders, C. Stevens, A. Sabo, L. M. McGrath, J. A. Kosmicki, K. Rehnström, S. Mallick, A. Kirby, D. P. Wall, D. G. MacArthur, S. B. Gabriel, M. DePristo, S. M. Purcell, A. Palotie, E. Boerwinkle, J. D. Buxbaum, E. H. Cook Jr., R. A. Gibbs, G. D. Schellenberg, J. S. Sutcliffe, B. Devlin, K. Roeder, B. M. Neale, M. J. Daly, A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014). doi:10.1038/ng.3050 Medline
36. E. Krissinel, K. Henrick, Multiple alignment of protein structures in three dimensions. CompLife 3695, 67–78 (2005).
37. E. Krissinel, K. Henrick, Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D Biol. Crystallogr. 60, 2256–2268 (2004). doi:10.1107/S0907444904026460 Medline
38. S. J. Hubbard, J. M. Thornton, NACCESS-Computer Program. 1993. Department of Biochemistry and Molecular Biology, University College London.
39. W. S. J. Valdar, Scoring residue conservation. Proteins 48, 227–241 (2002). doi:10.1002/prot.10146 Medline
40. G. E. Crooks, G. Hon, J.-M. Chandonia, S. E. Brenner, WebLogo: A sequence logo generator. Genome Res. 14, 1188–1190 (2004). doi:10.1101/gr.849004 Medline
41. H. Kilpinen, A. Goncalves, A. Leha, V. Afzal, K. Alasoo, S. Ashford, S. Bala, D. Bensaddek, F. P. Casale, O. J. Culley, P. Danecek, A. Faulconbridge, P. W. Harrison, A. Kathuria, D. McCarthy, S. A. McCarthy, R. Meleckyte, Y. Memari, N. Moens, F. Soares, A. Mann, I. Streeter, C. A. Agu, A. Alderton, R. Nelson, S. Harper, M. Patel, A. White, S. R. Patel, L. Clarke, R. Halai, C. M. Kirton, A. Kolb-Kokocinski, P. Beales, E. Birney, D. Danovi, A.
I. Lamond, W. H. Ouwehand, L. Vallier, F. M. Watt, R. Durbin, O. Stegle, D. J. Gaffney, Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017). doi:10.1038/nature22403 Medline
42. M. Knapp, The transmission/disequilibrium test and parental-genotype reconstruction: The reconstruction-combined transmission/ disequilibrium test. Am. J. Hum. Genet. 64, 861–870 (1999). doi:10.1086/302285 Medline
43. D. Szklarczyk, J. H. Morris, H. Cook, M. Kuhn, S. Wyder, M. Simonovic, A. Santos, N. T. Doncheva, A. Roth, P. Bork, L. J. Jensen, C. von Mering, The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45 (D1), D362–D368 (2017). doi:10.1093/nar/gkw937 Medline
44. B. J. Klein, L. Piao, Y. Xi, H. Rincon-Arano, S. B. Rothbart, D. Peng, H. Wen, C. Larson, X. Zhang, X. Zheng, M. A. Cortazar, P. V. Peña, A. Mangan, D. L. Bentley, B. D. Strahl, M. Groudine, W. Li, X. Shi, T. G. Kutateladze, The histone-H3K4-specific demethylase KDM5B binds to its substrate and product through distinct PHD fingers. Cell Reports 6, 325–335 (2014). doi:10.1016/j.celrep.2013.12.021 Medline
45. O. Delaneau, J.-F. Zagury, J. Marchini, Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013). doi:10.1038/nmeth.2307 Medline
46. P. Mali, L. Yang, K. M. Esvelt, J. Aach, M. Guell, J. E. DiCarlo, J. E. Norville, G. M. Church, RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013). doi:10.1126/science.1232033 Medline
47. K. Gapp, J. Bohacek, J. Grossmann, A. M. Brunner, F. Manuella, P. Nanni, I. M. Mansuy, Potential of Environmental Enrichment to Prevent Transgenerational Effects of Paternal Trauma. Neuropsychopharmacology 41, 2749–2758 (2016). doi:10.1038/npp.2016.87 Medline
48. C. Dias, S. B. Estruch, S. A. Graham, J. McRae, S. J. Sawiak, J. A. Hurst, S. K. Joss, S. E. Holder, J. E. V. Morton, C. Turner, J. Thevenon, K. Mellul, G. Sánchez-Andrade, X. Ibarra-Soria, P. Deriziotis, R. F. Santos, S.-C. Lee, L. Faivre, T. Kleefstra, P. Liu, M. E. Hurles, S. E. Fisher, D. W. Logan; DDD Study, BCL11A Haploinsufficiency Causes an Intellectual Disability Syndrome and Dysregulates Transcription. Am. J. Hum. Genet. 99, 253–274 (2016). doi:10.1016/j.ajhg.2016.05.030 Medline
49. A. Csibi, K. Cornille, M.-P. Leibovitch, A. Poupon, L. A. Tintignac, A. M. J. Sanchez, S. A. Leibovitch, The translation regulatory subunit eIF3f controls the kinase-dependent mTOR signaling required for muscle differentiation and hypertrophy in mouse. PLOS ONE 5, e8994 (2010). doi:10.1371/journal.pone.0008994 Medline
50. M. C. Frank, M. Braginsky, D. Yurovsky, V. A. Marchman, Wordbank: An open repository for developmental vocabulary data. J. Child Lang. 44, 677–694 (2017). doi:10.1017/S0305000916000209 Medline
51. M. E. Smith, G. Lecker, J. W. Dunlap, E. E. Cureton, The Effects of Race, Sex, and Environment on the Age at Which Children Walk. Pedagog. Semin. J. Genet. Psychol. 38, 489–498 (1930). doi:10.1080/08856559.1930.10532284
52. G. L. Yamamoto, M. Aguena, M. Gos, C. Hung, J. Pilch, S. Fahiminiya, A. Abramowicz, I. Cristian, M. Buscarilli, M. S. Naslavsky, A. C. Malaquias, M. Zatz, O. Bodamer, J. Majewski, A. A. L. Jorge, A. C. Pereira, C. A. Kim, M. R. Passos-Bueno, D. R. Bertola, Rare variants in SOS2 and LZTR1 are associated with Noonan syndrome. J. Med. Genet. 52, 413–421 (2015). doi:10.1136/jmedgenet-2015-103018 Medline
53. N. A. Akawi, F. Al-Jasmi, A. M. Al-Shamsi, B. R. Ali, L. Al-Gazali, LINS, a modulator of the WNT signaling pathway, is involved in human cognition. Orphanet J. Rare Dis. 8, 87 (2013). doi:10.1186/1750-1172-8-87 Medline
54. H. Najmabadi, H. Hu, M. Garshasbi, T. Zemojtel, S. S. Abedini, W. Chen, M. Hosseini, F. Behjati, S. Haas, P. Jamali, A. Zecha, M. Mohseni, L. Püttmann, L. N. Vahid, C. Jensen, L. A. Moheb, M. Bienek, F. Larti, I. Mueller, R. Weissmann, H. Darvish, K. Wrogemann, V. Hadavi, B. Lipkowitz, S. Esmaeeli-Nieh, D. Wieczorek, R. Kariminejad, S. G. Firouzabadi, M. Cohen, Z. Fattahi, I. Rost, F. Mojahedi, C. Hertzberg, A. Dehghan, A. Rajab, M. J. S. Banavandi, J. Hoffer, M. Falah, L. Musante, V. Kalscheuer, R. Ullmann, A. W. Kuss, A. Tzschach, K. Kahrizi, H. H. Ropers, Deep sequencing reveals 50 novel genes for recessive cognitive disorders. Nature 478, 57–63 (2011). doi:10.1038/nature10423 Medline