supplementary material for...had undergone dna capture with either the agilent sureselect human all...

www.sciencemag.org/cgi/content/full/science.aar6731/DC1

Supplementary Material for

Quantifying the contribution of recessive coding variation to developmental disorders

Hilary C. Martin*, Wendy D. Jones, Rebecca McIntyre, Gabriela Sanchez-Andrade,

Mark Sanderson, James D. Stephenson, Carla P. Jones, Juliet Handsaker, Giuseppe Gallone, Michaela Bruntraeger, Jeremy F. McRae, Elena Prigmore,

Patrick Short, Mari Niemi, Joanna Kaplanis, Elizabeth J. Radford, Nadia Akawi, Meena Balasubramanian, John Dean, Rachel Horton, Alice Hulbert, Diana S. Johnson, Katie Johnson, Dhavendra Kumar, Sally Ann Lynch, Sarju G. Mehta, Jenny Morton,

Michael J. Parker, Miranda Splitt, Peter D Turnpenny, Pradeep C. Vasudevan, Michael Wright, Andrew Bassett, Sebastian S. Gerety, Caroline F. Wright,

David R. FitzPatrick, Helen V. Firth, Matthew E. Hurles, Jeffrey C. Barrett*, on behalf of the DDD Study

*Corresponding author. Email: [email protected] (H.C.M.); [email protected] (J.C.B.)

Published 8 November 2018 as Science First Release

DOI: 10.1126/science.aar6731

This PDF file includes: Materials and Methods Figs. S1 to S20 References Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/content/science.aar6731/DC1)

Tables S1 to S7 as separate Excel files

Funding statement

The DDD study presents independent research commissioned by the Health Innovation

Challenge Fund (grant HICF-1009-003), a parallel funding partnership between the Wellcome

Trust and the UK Department of Health, and the Wellcome Trust Sanger Institute (grant

WT098051). The views expressed in this publication are those of the author(s) and not

necessarily those of the Wellcome Trust or the UK Department of Health. The study has UK

Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research

Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics

Committee). The research team acknowledges the support of the National Institutes for Health

Research, through the Comprehensive Clinical Research Network. This study makes use of

DECIPHER (http://decipher.sanger.ac.uk), which is funded by the Wellcome Trust.

Materials and Methods

Family recruitment

The DDD project recruited individuals with severe, undiagnosed developmental disorders from

24 clinical genetics centres within the United Kingdom National Health Service and the

Republic of Ireland as described previously ( 11) , using specific criteria

(https://decipher.sanger.ac.uk/files/ddd/documents/policy/ddd_project_referral_guide_clinical_

genetics.pdf). Families gave informed consent to participate, and the study was approved by the

UK Research Ethics Committee (10/H0305/83, granted by the Cambridge South Research

Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics

Committee). DNA was collected from saliva samples obtained from the probands and their

parents, and from blood obtained from the probands, then samples were processed as previously

described (1). The analyses in paper are based on a data freeze that includes 7,831 trios from

7,446 families and 1,791 patients without parental samples (Fig. S2). These individuals include

those analysed in previous publications (4 , 9).

https://paperpile.com/c/PfoAAF/wzYCC



https://paperpile.com/c/PfoAAF/JGTJ



https://paperpile.com/c/PfoAAF/dw7J6+OGJ0g





Clinical features

The patients were systematically phenotyped: detailed developmental phenotypes were recorded

using Human Phenotype Ontology (HPO) terms (27), and growth measurements, family history,

developmental milestones etc. were collected using a standard restricted-term questionnaire

within DECIPHER ( 28). Some HPO terms fall under multiple organ systems (e.g. microcephaly

falls under "nervous system", "head or neck" and "skeletal system"), and for Fig. 1 and Table

S1, we wanted to avoid counting multiple organ systems that represented a single HPO term

multiple times. Thus, we first ranked organ systems according to the number of raw counts of

individuals with at least one term under that system in the full DDD cohort. We then took

individuals with at least one HPO under the organ system most commonly affected, and

assigned these individuals a count of one organ system. We then removed these HPOs from the

patients’ lists, and identified individuals with at least one HPO in the organ system ranked next

most commonly affected. We continued to count organ systems and remove HPOs for 19

non-overlapping systems. The organ systems in Fig. 1A and Table S1 were determined using

this procedure (e.g. a patient with microcephaly is only included under “nervous system”, not

under “head or neck” or “skeletal system”), and Fig. 1B shows the distribution of the

non-redundant counts.

We note that certain DDD clinicians tend to be more diligent about reporting multiple HPO

terms, and some tend to report certain organ systems more thoroughly. Thus, the small

differences in clinical features between EABI and PABI (Fig. 1; Fig. S3; Table S1) could be

partly because clinicians who work in areas with a high South Asian population have different

reporting styles from others. The slightly greater severity of the PABI group might also partly

be because, in communities enriched for cousin marriages, there is a higher number of children

with genetic disorders (29 ), and clinical geneticists are only able to see the more severe cases.

Exome sequencing and variant quality control

https://paperpile.com/c/PfoAAF/eVZzg



https://paperpile.com/c/PfoAAF/22M2W



https://paperpile.com/c/PfoAAF/mzoxG



Exome sequencing, alignment and calling of single-nucleotide variants and small insertions and

deletions was carried out as previously described (4), as was the filtering of de novo mutations.

For the analysis of biallelic genotypes, we chose thresholds for genotype and site filters to

balance sensitivity (number of retained variants) and specificity (as assessed by Mendelian error

rate and transition/transversion ratio). We removed sites with a strand bias test p-value < 0.001.

We then set individual genotypes to missing if they had genotype quality < 20, depth < 7 or, for

heterozygous calls, a p-value from a binomial test for allele balance < 0.001. Since the samples

had undergone DNA capture with either the Agilent SureSelect Human All Exon V3 or V5 kit,

we subsequently only retained sites that passed a missingness cutoff in both the V3 and the V5

samples. We found that, after setting a depth filter, the proportion of missing genotypes allowed

had a more substantial effect on the number of Mendelian errors than genotype quality and

allele balance cutoffs (Fig. S20). Thus, we ran the biallelic burden analysis on two different

callsets, using a 10% (strict) or a 50% (lenient) missingness filter, and found that the results

were very similar. We report results from the more lenient filter in this paper, since it allowed

us to include more variants. Genotypes were set to missing for a trio if there was a Mendelian

error, and variants were removed if more than one trio had a Mendelian error and if the ratio of

trios with Mendelian errors to trios carrying the variant without a Mendelian error was greater

than 0.1. If any of the individuals in a trio had a missing genotype at a variant, all three

individuals were set to missing for that variant.

Variants were annotated with Ensembl Variant Effect Predictor (30) based on Ensembl gene

build 83, using the LOFTEE plugin. The transcript with the most severe consequence was

selected. We analyzed three categories of variant based on the predicted consequence: (1)

synonymous variants; (2) loss-of-function variants (LoFs) classed as “high confidence” by

LOFTEE (including the annotations splice donor, splice acceptor, stop gained, frameshift,

initiator codon and conserved exon terminus variant); (3) damaging missense variants (i.e. those

not classed as “benign” by PolyPhen or SIFT, with CADD>25). Variants were also annotated

with MAF data from four different populations of the 1000 Genomes Project (31 ) (American,

https://paperpile.com/c/PfoAAF/dw7J6



https://paperpile.com/c/PfoAAF/7TshA



https://paperpile.com/c/PfoAAF/LTsnn



Asian, African and European), two populations from the NHLBI GO Exome Sequencing

Project (European Americans and African Americans) and six populations from the Exome

Aggregation Consortium (ExAC) (African, East Asian, non-Finnish European, Finnish, South

Asian, Latino), and an internal allele frequency generated using unaffected parents from the

DDD.

Ancestry inference

We ran a principal components analysis in EIGENSOFT (32) on 5,853 common exonic SNPs

defined by the ExAC project. We set genotypes with GL<20 to missing and excluded SNPs

with >2% missingness. We calculated principal components in the 1000 Genomes Phase III

samples and then projected the DDD samples onto them. We grouped samples into three broad

ancestry groups (European, South Asian, and Other) as shown in Fig. S2 (right hand plots). By

drawing ellipses around the densest clusters of DDD samples, we defined two narrower groups:

European Ancestry from the British Isles (EABI) and Pakistani Ancestry from the British Isles

(PABI).

For the burden and gene-based analysis, we primarily focused on these narrowly-defined EABI

and PABI groups because it is difficult to accurately estimate population allele frequencies in

more broadly defined groups. For example, in 4,942 European-ancestry probands, the number

of observed biallelic synonymous variants was slightly higher than the number expected (ratio =

1.06; p=2.7×10-4).

Calling autozygous regions

To call autozygous regions, we ran bcftools/roh(33 ) (bcftools version 1.5-4-gb0d640e)

separately on the different broad ancestry groups. We LD pruned our data to avoid overcalling

small runs of homozygosity as autozygous regions. Because rates of consanguinity differ

dramatically between EABI and PABI, we chose r2 cutoffs for each that brought the ratio of

observed to expected biallelic synonymous variants with MAF<0.01 closest to 1 (see below for

https://paperpile.com/c/PfoAAF/9vwmt



https://paperpile.com/c/PfoAAF/lYEew



calculation of the number expected): PLINK options --indep-pairwise 50 5 0.4 for EABI and

--indep-pairwise 50 5 0.8 for PABI.

Sample quality control and subsetting

Our strategy for filtering probands and defining different subsets is shown in Fig. S1. For our

main analyses we excluded individuals without parental data (though we did incorporate them

into some of the follow-up on EIF3F and KDM5B). Next, we removed 1,403 probands whose

ancestry was not in the EABI or PABI groups described above. Finally we removed 388

probands that failed quality control filters. Specifically: 47 probands with >5% missingness at

the SNPs used for PCA (since their ancestry assignment was likely uncertain), 7 probands with

uniparental disomy, and one individual from every pair of probands who were related (kinship >

0.044, estimated by PCRelate (34), equivalent to third-degree relatives).

For estimating cumulative frequencies of rare variants, we removed 924 parents reported to be

affected, since one might expect these to be enriched for damaging variants compared to the

general population, and 9 European parents with an abnormally high number of rare (MAF<1%)

synonymous genotypes (>834, compared to the 99.9th percentile of 223). However, we retained

their offspring in the burden and gene-based analyses.

We stratified probands by high autozygosity (>2% of the genome classed as autozygous),

whether or not they had an affected sibling, and whether or not they already had a likely

diagnostic dominant or X-linked exonic mutation (a likely damaging de novo mutation or

inherited damaging variant in a known monoallelic DDG2P gene

(http://www.ebi.ac.uk/gene2phenotype/) [if the parent was affected] or a damaging X-linked

variant in a known X-linked DDG2P gene). The 4,458 patients who had no such diagnostic

variants were included in the “undiagnosed” set, along with 193 patients who had diagnostic

variants in recessive DDG2P genes or potentially diagnostic variants in monoallelic or X-linked

DDG2P genes but had high autozygosity or affected siblings. There were 1,366 EABI and 23

https://paperpile.com/c/PfoAAF/dyrUs



http://www.ebi.ac.uk/gene2phenotype/

PABI probands in the diagnosed set, and 4,318 EABI and 333 PABI probands in the

undiagnosed set. For the set of probands with affected siblings shown in Figures 1 and 2, we

restricted to families from which more than one independent (i.e. non-MZ twin) child was

included in DDD and in which the siblings’ phenotypes were more similar than expected by

chance given the distribution of HPO terms in the full cohort (HPO similarity p-value < 0.05

(9 ) ).

Burden analyses and gene-based tests

Variants were filtered on class (LoF, damaging missense or synonymous) and by different MAF

cutoffs. Variants failing the MAF cutoff in any of the publicly available control populations, the

full set of unaffected DDD parents, or the unaffected DDD parents in that population subset

(PABI or EABI) were removed.

Following the approach we used previously (9), we calculated Bg,c, the expected number of rare

biallelic genotypes of class c (LoF, damaging missense or synonymous) in each gene g, as

follows:

(B ) N λE g,c = prob g,c

where is the number of probands and is the expected frequency of biallelic Nprob

λg,c

genotypes of class c in gene, calculated as follows:

(1 )f fλg,c = − ag c,g2 + ag c,g

where fc,g is the cumulative frequency of variants of class c in gene g with MAF less than the

cutoff, and ag is the fraction of individuals autozygous at gene g. An individual was defined as

being autozygous if he/she had a region of homozygosity with any overlap of gene g; in

practice, autozygous regions almost always overlapped genes completely rather than partially.

The rate of LoF/damaging missense compound heterozygous genotypes is:

(1 )[2f f (1 )]λg,LoF /miss = − ag LoF ,g miss,g − f

LoF ,g

https://paperpile.com/c/PfoAAF/OGJ0g






To calculate the cumulative frequency of variants of class c in gene g, f c,g, we first phased the

variants in the DDD parents based on the inheritance information. The cumulative frequency is

then given by:

fc,g = hc,g

Nhaps

where is the number of parental haplotypes with at least one variant of class c in gene g , and hc

is the total number of parental haplotypes.Nhaps

For each gene, we calculated the binomial probability (given probands and rate of Nprob

)λg,c

the observed number of biallelic genotypes of class c. We did this for four consequence classes

(biallelic LoF, biallelic LoF+LoF/damaging missense, biallelic damaging missense, and biallelic

LoF+LoF/damaging missense+biallelic damaging missense). Our reasoning was that, in some

genes, biallelic LoFs might be embryonic lethal and LoF/damaging missense compound

heterozygotes might cause DD, but in other genes, including rare damaging missense variants in

the analysis might drown out signal from truly pathogenic LoFs. We conducted the test on two

sets of probands (EABI only, and EABI+PABI). We did not analyze PABI separately due to

low power.

For the set of EABI only, we conducted a simple binomial test. For the combined EABI+PABI

test, we took into account the different ways in which n or more probands with the relevant

genotype could be distributed between the two groups and the probability of observing each

combination using population-specific rates (e.g. two observed biallelic genotypes could be

both seen in EABI, both in PABI, or one in each). We then summed these probabilities across

all possible combinations to obtain an aggregate probability for sampling n or more probands by

chance, as described in (9 ) .

For some genes, was estimated to be 0 in one or both populations because there were no λg,c

variants in the parents that passed filtering. The vast majority of these also had . We (B )O g,c = 0

dropped these genes from the tests, but still included them in our Bonferroni correction. We also




excluded 715 genes either because they were in the HLA region or because they were classed as

having suspiciously many or suspiciously few synonymous or synonymous+missense variants

in ExAC, leaving 18,630 genes. We thus set a significance threshold of 0.05/(8 tests 18,630 ×

genes) = p<3.4×10-7. For Fig. S8, we ordered the genes by their lowest p-value, randomized the

order of genes with the same p-value, then tested for a difference in the distribution of ranks

between recessive DDG2P genes and all other genes using a Kolmogorov-Smirnov test.

For the burden analysis, we summed up the observed and expected number of biallelic

genotypes across all genes to give and , then calculated their difference (B ) O c (B )E c

(the excess) and their ratio . Under the null hypothesis, we expect(B ) (B )O c − E c E(B ) c

O(B ) c (B )O c

to follow a Poisson distribution with rate .(B )E c

Alternative methods for estimating the expected number of biallelic

genotypes

For Fig. S4, we used two alternative methods to the one described above for estimating the

expected number of biallelic genotypes. Firstly, we calculated the cumulative frequency based

on ExAC by summing the frequencies of individual variants of class c in gene g separately for

non-Finnish Europeans (NFE) for EABI and from South Asians (SAS) for PABI:

fc,g,ExAC = ∑

V

v=1qc,g,v

For this, we restricted to ExAC variants within the intersection of the V3 and V5 Agilent probes

used in DDD (including 100bp flanks on each side), and removed variants if they had >50%

missingness in the relevant ExAC population.

We also used a modified version of the method described by Jin et al. (3) for approximating the

expected number of biallelic genotypes by fitting a polynomial regression on the mutability:

(B ) m mO g = β0 + β1 g + β22g

+ ε

https://paperpile.com/c/PfoAAF/72wNj



where is the observed number of biallelic genotypes for gene g, is the mutability for (B )O g mg

gene g calculated using the method of Samocha et al. (35 ) , , and are coefficients, and β0 β1 β2

is a random noise term. We fitted this model to the observed number of biallelic synonymousε

genotypes (MAF<0.01) for each gene to estimate , and , then calculated the expected β0 β1 β2

number of biallelic genotypes for each gene using the fitted value:

(B ) m mE g = β0 + β1 g + β22g

Note that we did not do the additional step described in Jin et al. in which they calculate the

expected number of biallelic genotypes for gene g as , since this (B )∑

genes

O g

β +β m +β m0 1 g 22g

+β m +β m∑

genes

β0 1 g 22g

constrains the total expected number of biallelic genotypes to be equal to the number observed,

making exome-wide burden analysis impossible.

Estimating the proportion of cases with diagnostic biallelic coding variants or

de novo mutations

We are interested in estimating the proportion of probands with diagnostic variants of ,πc

consequence class c. Under the null hypothesis in which none of the genotypes of class c are

pathogenic, the number of such genotypes we expect to see in probands is:Npr

NE(b )probands,c null = λpr,c pr

where is the total expected frequency of genotypes of class c across all genes.λpr,c = ∑G

g=1λg,c

However, under the alternative hypothesis, suppose that some fraction of genotypes in φcausal,c

class c cause DD, and some fraction are lethal. Assuming complete penetrance, we can φlethal,c

thus split into genotypes that are due to chance and those that are diagnostic:(b )Eprobands,c

+NE(b )probands.c alt = (1 )λ− φ

causal,c − φlethal,c pr,c pr Nπc pr

φ λcausal,c pr,c

1−e−φ λcausal,c pr,c

https://paperpile.com/c/PfoAAF/VpkbS



The component due to chance is , and is the average N(1 )λ− φcausal,c − φ

lethal,c pr,c pr φ λcausal,c pr,c

1−e−φ λcausal,c pr,c

number of pathogenic genotypes per individual, given that the individual has at least one such

genotype.

In healthy parents, biallelic genotypes of class c are all due to chance from the portion of c Npa

that is not pathogenic:

NE(b )parents.c alt = (1 )λ− φ

causal,c − φlethal,c pa,c pa

where is the expected rate of biallelic genotypes of class c in the parents, given the λpa,c

cumulative frequencies estimated in the same set of people, and the autozygosity rates. We can

thus obtain a maximum likelihood estimate for using , the observed φc = φcausal,c + φ

lethal,c Opa,c

number of biallelic genotypes of class c in the parents:

φc = 1 − Opa,c

λ Npa,c pa

The 95% confidence interval for is , . We show the φc 1( − λ Npa,c pa

O +1.96pa,c √Opa,c )λ Npa,c pa

O −1.96pa,c √Opa,c

estimates of from the EABI and PABI parents in Fig. S7. To estimate , we combined data φc πc

from both populations for MAF<0.01 variants to estimate , and obtained the following φc

maximum likelihood estimates and 95% confidence intervals: 0.141 (0.046,0.238), φLoF /LoF =

0.083 (-0.009,0.175), and 0.007 (-0.028,0.042).φLoF /miss = φ

miss/miss =

For biallelic genotypes, we can substitute into the expression above, substitute for φc Opr,c

and rearrange to obtain a maximum likelihood estimate of :(b ) Eprobands

πc

1 )) ≈( )πc = ( Opr,c

λ Npr,c pr

− ( − φc φcausal,c

1−e−φ λcausal,c pr,c Opr,c

λ Npr,c pr

− 1 + φc φ,c1−e−φ λ,c pr,c

We cannot disentangle and with the available data, but we find that the ratio φcausal,c φ

lethal,c

makes very little difference to the estimate of , so we make the assumption thatφlethal,c

φcausal,c πc

. The 95% confidence interval is thenφcausal,c = φc

.( ) , ) ][ λ Npr,c pr

O −1.96pr,c √Opr,c − 1 + φc φc1−e−φ λc pr,c ( λ Npr,c pr

O +1.96pr,c √Opr,c − 1 + φc φc1−e−φ λc pr,c

De novo mutations were called as previously described (4), selecting the threshold on ppDNM

(posterior probability of a de novo mutation) such that the observed number of synonymous de

novos matched the number expected. Using Sanger validation data from an earlier dataset (1),

we adjusted the observed number of mutations to account for specificity and sensitivity as

follows:

Ode novo, adjusted = 0.95β

O αde novo, raw

where is the positive predicted value (the proportion of candidate mutations that are true α

positive at the chosen threshold) and is the sensitivity to true positives at the same threshold. β

The adjustment by 0.95 is due to exome sequencing being only about 95% sensitive. The

overall de novo mutation rate was calculated in different sets of probands using the λde novo,c,pr

model from (35), adjusting for sex, as described previously (4) .

Since we cannot estimate for de novo mutations using the parents, as we did for recessive φc

variants, we instead set to 0.099, the fraction of genes with pLI>0.99. This estimate is φDN LoF

more speculative than the directly observed depletion of biallelic genotypes above, but we note

that the estimate of for the full set of 7,832 trios only increases from ~0.129 to ~0.154 πDN LoF

if we increase from 0.01 to 0.3. To estimate , we make use of this φDN LoF φ

DN missense

relationship:

≈λ φ N Pr(DDD|c)πc c c BI where is the population of the British Isles, and is the probability that an N

BI r(DDD|c)P

individual is recruited to the DDD given he/she has a pathogenic mutation of class c. If we

assume that this recruitment probability is the same for de novo missense mutations as for de

novo LoFs, we can write:

=πDN LoF

πDN missense

λ φDN LoF DN LoF

λ φDN missense DN missense

We know and , will assume so we can estimate , λDN missense λ

DN LoF .099φDN LoF = 0 π

DN LoF

and can thus write the number of de novo missense mutations we expect to see as:













+ ) NE(mpr,DN missense = (1 )λ− φ

DN missense pr,DN missense pr Npr

(φ λ ) πDN missense prDN missense

2DN LoF

(λ φ )(1−e )DN LoF DN LoF

−φ λDN missense pr,DN missense

We calculated for a range of values of and found that )E(mpr,DN missense φ

DN missense

best matched the observed data, so we used this value for estimating.036φDN missense = 0

.πDN missense

Structural analysis of EIF3F

This is shown in Fig. S11. Human EIF3:f (pdb 3j8c:f) was submitted to the Protein structure

comparison service PDBeFold at the European Bioinformatics Institute (36, 37 ). Of the close

structural matches returned, the X-ray yeast structure pdb entry 4OCN was chosen to display

the human variant position, as the structural resolution (2.25Å) was better than the human

EIF3:f pdb 3j8c:f structure (11.6Å) and it was the most complete structure among the yeast

models. In order to map the Phe232 variant onto the equivalent position on the yeast structure,

the structural alignment from PDBeFold was used. Solvent accessibility was calculated using

the Naccess software (38) using the standard parameters of a 1.4Å probe radius. Amino acid

sequence conservation was calculated using the Scorecons server (39) and displayed using

sequence logos ( 40).

Introducing the EIF3F Phe232Val variant with CRISPR/Cas9

The Phe232Val mutation in the human EIF3F gene was generated by a single base substitution

(T>G) using CRISPR/Cas9-induced homology directed repair in the kolf2_C1 human iPSC line,

a clonal derivative of the kolf2 line generated by the HipSci consortium (41 ). This was achieved

by nucleofection (Lonza, P3 buffer, program CA137) of 106 cells with Cas9-crRNA-tracrRNA

ribonucleoprotein (RNP) complexes. Synthetic RNA oligonucleotides (target site:

5’-AGGCGTGAACATCACTCCCA-3’, 225 pmol crRNA/tracrRNA, IDT) were annealed by

heating to 95°C for 2 min in duplex buffer (IDT) and cooling slowly, followed by addition of

122 pmol recombinant eSpCas9_1.1 protein (in 10 mM Tris-HCl, pH 7.4, 300 mM NaCl, 0.1

https://paperpile.com/c/PfoAAF/w8nSq+1ORKR





https://paperpile.com/c/PfoAAF/BaxOg



https://paperpile.com/c/PfoAAF/WNuSH



https://paperpile.com/c/PfoAAF/4sR8p



https://paperpile.com/c/PfoAAF/HorKx



mM EDTA, 1 mM DTT, PMID: 26628643), and 500 pmol of a 100 nt ssDNA oligonucleotide

(5’-CTCCGATGCGTTCAGTGTCGTAGTACGCGTATTTCACTGTCAGAGGCGT

GACCATCACTCCCATGGTCCTCCCAGGGACTCCCATTAAAGTGCTGCGGAAGG-3’,

IDT Ultramer) as a homology-directed repair template to introduce the desired base change.

After recovery, plating at single cell density and colony picking into 96 well plates, 384 clones

were screened for heterozygous and homozygous mutations by high throughput sequencing of

amplicons spanning the target site using an Illumina MiSeq instrument.

Crude DNA lysates were prepared by incubation of cells in 100 l of yolk sac lysis buffer (10

mM Tris-HCl pH 8.3, 50 mM KCl, 2 mM MgCl2, 0.45% IGEPAL CA-630, 0.45% Tween 20,

400 g/ml proteinase K) at 60°C 1 h, 95C 10 min followed by dilution 10x in water. The region

surrounding the mutation was amplified by nested PCR using 1 l diluted lysate and KAPA HiFi

hotstart polymerase (Kapa Biosystems) by 35 cycles of {98°C 20 s, 65°C 15 s, 72°C 30 s},

using primers eIF3F_500_F and eIF3F_500_R (see below). Subsequently, reactions were

diluted 10x and re-amplified with primers eIF3F_200_F and eIF3F_200_R (see below) (Tm =

60°C) to ensure specificity for the EIF3F gene. Indexed Illumina sequencing adaptors were

added by a third round of PCR to identify the location of positive clones. Final cell lines were

further validated by Sanger sequencing. Since this region is highly similar to several other

regions in the genome, we analysed the final clones for off-target mutagenesis at the

homologous sites, and found no evidence of this at the two closest off-target sites on

chr2:58252105-58252196 and chr12:5032866-5032941 in any of the lines obtained.

Primer sequences: (5’-3’)

eIF3F_500_F: tctggggttcatttggtccc

eIF3F_500_R: ctgctcagtcacacttcctg

eIF3F_200_F: ACACTCTTTCCCTACACGACGCTCTTCCGATCTagacatggaaaccctctccc

eIF3F_200_R: TCGGCATTCCTGCTGAACCGCTCTTCCGATCTagtcccttttcaaaccaccc

Investigating EIF3F protein levels in iPSC lines containing the EIF3F variant

iPS cells were cultured on Vitronectin XF (07180, Stemcell Technologies)-coated plates in

TeSR-E8 medium (05990, Stemcell Technologies) and incubated at 37oC in 5%CO 2:95% air.

At 75% confluence, iPS cells were incubated in Gentle Cell Dissociation Reagent (07174,

Stemcell Technologies) for 5 min, collected into a tube, centrifuged at 300 g for 3 min and

resuspended in ice-cold 20 mM Tris-HCL pH 7.4 containing protease inhibitor cocktail

(1861281, Thermo Scientific). Cells were agitated at 4oC for 30 min before passing through a 26

gauge needle several times and centrifuging at 13000 g for 15 min at 4oC. Protein in the

supernatant was quantified using the Qubit fluorometer (Invitrogen). We repeated this for two

independent cultures of iPSC lines. Samples were reduced (NP0009, Invitrogen) and mixed

with sample buffer (NP0007, Invitrogen) before an equal amount of total protein from each cell

line (6 µg for culture 1, 4 µg for culture 2) was electrophoresed through a 4-12% Bis-Tris gel

(NP0322, Invitrogen) and transferred onto a membrane (IPVH00010, Immobilon P) using the

X-Cell SureLock system, according to the manufacturer’s protocol (E19051, Invitrogen). Blots

were blocked in 5% non-fat milk (Marvel) in tris buffered saline containing 0.1% Tween 20

(TBST) for 1 h at room temperature. Primary antibodies were as follows: 1:500 anti-rabbit

EIF3F (638202, Biolegend) or 1:1000 beta III tubulin (EP1569Y; AB52623, Abcam), both

diluted in 5% non-fat milk in TBST and incubated overnight at 4oC. Rabbit-HRP-linked

secondary antibody (7074, Cell Signalling Technologies) was diluted 1:2000 in 5% non-fat milk

in TBST and incubated at room temperature for 2 h. Blots were incubated for 2 min with

Western Bright ECL Spray (Advansta) followed by detection with the ImageQuant LAS 4000

(GE Healthcare). Band intensities were measured using ImageJ 1.48v and EIF3F expression

values were normalised to beta-tubulin.

We jointly analysed the data from the two independent replicates, having normalised the

relative expression values by dividing by the mean of the wild-type lines for each replicate. We

tested for lower relative EIF3F expression in the homozygous cells compared to heterozygous

and wild-type cells combined using a one-sided t-test (Fig. S10).

Investigating protein synthesis in iPSC lines containing the EIF3F variant

Protein synthesis was assayed using the Click-iT L-homopropargylglycine (HPG) Alexa Fluor

488 Protein Synthesis Assay Kit (C10429, Molecular Probes). Briefly, in this assay, we took

iPSCs with or without the EIF3F variant and depleted them of methionine to stop translation,

then added a methionine analogue that contains an alkyne moiety which is incorporated into

nascent proteins. We then added Alexa Fluor 488 azide, which leads to a chemoselective "click"

reaction between the fluorescent azide and the alkyne, allowing the modified proteins to be

detected by image-based analysis as a proxy for the amount of protein synthesis.

We optimised the assay for use with iPSCs, using the translational elongation inhibitor

cycloheximide as a control to confirm the sensitivity and reproducibility of the assay (Fig. S12).

iPS cells were cultured on Vitronectin XF (07180, Stemcell Technologies)-coated plates in

TeSR-E8 medium (05990, Stemcell Technologies) and incubated at 37oC in 5%CO2:95% air.

Cells were washed in PBS (D8537, Sigma), dissociated with Accutase (A11105-01, Gibco),

then 30 000 cells were seeded per well of a 96 well plate, in triplicate in TeSR-E8 containing 10

M ROCK inhibitor (Y-27632 dihydrochloride; Ab120129, Abcam). The following day, cells

were depleted of methionine for 1 h by incubation in methionine-free DMEM (21013-024,

Gibco), which was supplemented with insulin, transferrin and selenium (41400045, Gibco), 20

ng/ml human FGF-basic protein (100-18C, PeproTech), sodium pyruvate (11360-039, Gibco),

L-glutamine (25030-81, Gibco) and N-acetyl cysteine (A7250, Sigma), at 37℃ in 5%CO2:95%

air. The media was then changed to methionine-free DMEM plus supplements and 100 M

Click-iT HPG (L-homopropargylglycine; C10202, Molecular Probes), a methionine-analogue,

or vehicle (DMSO; D2650, Sigma), and cells were incubated for 2 h at 37℃ in 5%CO2:95%

air. As a control, cycloheximide was added to control wells at the same time, at different

concentrations (Fig. S12B).

At the end of the incubation, cells were washed briefly with PBS, trypsinised for 5 min

(25300-054, Gibco) to dissociate single cells, diluted in DMEM containing 20% serum to

quench the trypsin, transferred to a V-bottom 96-well plate and triplicate reactions were pooled.

Cells were then processed using standard immunocytochemical methods through a series of

incubations, centrifugations (all performed at 300 g for 5 min) and washes in PBS, all at room

temperature, in the following order: incubation in 3.7% formaldehyde in PBS (28906, Thermo

Scientific) for 15 min to fix the cells; incubation in 0.5% Triton X-100 (X100, Sigma) for 20

min to permeabilise the cells; incubation in 3% bovine serum albumin (A9543, Sigma) in PBS

for 10 min to block non-specific binding sites; incubation in 100 l Click-iT reaction cocktail,

which was prepared according to the manufacturer’s recommendation, for 1 h, followed by a

wash with Click-iT reaction rinse buffer, before resuspending in PBS and analysing

fluorescence at 488 nm on a BD LSRFortessa (BD Biosciences).

This experiment was repeated three times for each of the two independent cell lines, for each of

the three genotypes (wild-type, heterozygous, homozygous). Data were analysed using FlowJo

(v10.4.2). After gating on single cells, we obtained a bimodal distribution of Alexa Fluor

488-positive cells (Fig. S12A), due to the different rates of protein synthesis during the cell

cycle (G1/G2 have high rates of protein synthesis and S/M phase have lower rates). We gated

on the AlexaFlour 488-positive population (fluorescence > 104), which comprised ~80% of the

cells in the absence of cycloheximide (Fig. S12A). This proportion did not significantly depend

on genotype, replicate or cell line (ANOVA p>0.05). After gating, there were an average of 139

cells per replicate (standard deviation = 76), on which we determined the median fluorescence

intensity. We then tested for the effect of genotype on median fluorescence intensity using

linear regression, controlling for replicate. As a summary, we averaged the median fluorescence

for the two cell lines of the same genotype for each replicate, calculated the ratio of homozygote

to wild-type, and then averaged this across replicates.

Investigating proliferation and apoptosis in iPSC lines containing the EIF3F

variant

We investigated proliferation of iPS cell lines using two methods. In the first assay, 50,000

single cells were plated and four days later total cells were counted (Fig. 13A). In the second

assay, the membrane of single cells was stained with the fluorescent dye CellTrace Violet TM, the

fluorescence of which reduces with each cell division and was assessed using flow cytometry

(Fig. 3C; Fig. 13B).

For these assays, iPS cells were cultured on Vitronectin XF (07180, Stemcell

Technologies)-coated plates in TeSR-E8 medium (05990, Stemcell Technologies) and

incubated at 37oC in 5%CO2:95% air. At 65% confluence, iPS cells were washed with PBS

(D8537, Sigma), dissociated with Accutase (A11105-01, Gibco), resuspended in 10X volume of

TeSR-E8 medium (05990, Stemcell Technologies) containing 10 μM ROCK inhibitor (Y-27632

dihydrochloride; Ab120129, Abcam), passed through a 10 μm cell strainer (pluriStrainer,

pluriSelect; 43-10010-40) and centrifuged at 300 g for 3 min.

In the first assay, cells were resuspended in TeSR-E8 medium and 50,000 live cells (assessed

using Trypan blue exclusion; Invitrogen, 15250061) were plated per well of a 6-well plate

containing TeSR-E8 medium and 10 μM ROCK inhibitor. Cells were cultured for the next four

days in TeSR-E8 medium, before trypsinising for 5 min (25300-054, Gibco), and counting the

total number of live cells (Figure S13A).

In the second assay, cells were resuspended in PBS containing 4 μM CellTrace Violet(TM)

(Molecular Probes, C34557) and incubated at room temperature, in the dark for 20 min. Cells

were diluted in TeSR-E8 medium containing 10 μM ROCK inhibitor and centrifuged at 300 g

for 3 min, and then 40,000 cells were plated per well of a 48-well plate. Cells were cultured for

the next two days in TeSR-E8 medium, washed in PBS, dissociated to single cells with

Accutase, transferred to a V-bottom plate and centrifuged at 400 g for 4 min before being fixed

in 100 μl fixation buffer (eBioscience™ IC Fixation Buffer, 00-8222-49) and resuspended with

staining buffer (eBioscience™ Flow Cytometry Staining Buffer, 00-4222-26). Cells stained and

fixed at day zero were used as a ‘no division’ control. Note that Fig. 3B shows the distributions

from one representative replicate per genotype. The data were consistent across cell lines and

replicates (Fig. S13B) and clearly showed that the homozygous lines had a higher fraction of

cells that had only undergone one division than both the wild-type and heterozygous cells. To

test this formally, we fitted a linear regression of the number of cells in division 1 against the

genotype (0=wild-type or heterozygous; 1=homozygous) and the total number of cells.

We assessed apoptosis by staining with annexin V conjugated to a fluorescent dye and detected

dead cells with propidium iodide (PI). Briefly, phosphatidylserine is externalised in apoptotic

cells and is bound by recombinant annexin V-FITC (green fluorescence), while PI is excluded

by the membrane of live cells and only stains necrotic cells or those in late apoptosis with red

fluorescence. After incubation with both probes, early apoptotic cells show green fluorescence,

and late apoptotic cells show both red and green fluorescence. Necrotic cells show red only and

live cells show little or no fluorescence. We used a topoisomerase inhibitor (topotecan) as a

positive control in this assay. Cells were harvested at 65% confluence (including the positive

control pre-incubated for 2 h in 10 μM topotecan) by dissociation with Accutase, and processed

according to the manufacturer’s protocol (Dead Cell Apoptosis Kit; Invitrogen, V13242).

Flow cytometry data were acquired on an LSRFortessa (Becton Dickinson) instrument.

Doublets cells were excluded and CTV, Annexin V or PI fluorescence analyzed using FlowJo

10.4.2 software package (LCC). Fig. 3B and Fig. S13B summarise the results from the CTV

proliferation assay, and Fig. S14 shows the results from the apoptosis assay.

Validation of KDM5B variants by targeted re-sequencing

We re-sequenced all KDM5B de novo mutations and inherited LoF variants, with the exception

of two large deletions. PCR primers were designed using Primer3 to amplify the site of interest,

generating approximately a 230 bp product centred on the site. PCR amplification of the

targeted regions was carried out using JumpStart TM AccuTaqTM LA DNA Polymerase

(Sigma-Aldrich), using 40 ng of input DNA from the proband and their parents. Unique

identifying tag sequences were introduced into the PCR amplicons in a second round of PCR

using KAPA HiFi HotStart ReadyMixPCR Kit (KapaBiosystems). PCR amplicons were pooled

and 96 products were sequenced in one MiSeq lane using 250 bp paired-end reads. Reference

and alternate read counts extracted from the resulting bam files and were used determine the

presence of the variant in question. In addition, read data were visualised using IGV.

Transmission-disequilibrium test on KDM5B LoFs

We observed 15 trios in which one parent transmitted a LoF to the child, 5 trios in which one

parent had a LoF that was not transmitted, 2 quartets in which one parent had a LoF that was

transmitted to one out of two affected children, and 4 trios in which both parents transmitted a

LoF to the child. We tested for significant over-transmission using the

transmission-disequilibrium test as described by Knapp (42 ) . There were 7 LoFs (including one

large deletion) observed in probands whose parents were not originally sequenced, which we

excluded from the TDT. Of the six for which we attempted validation and segregation analysis,

one was found to be de novo and five inherited.

Searching for coding and regulatory modifiers of KDM5B

We defined a set of genes that might modify KDM5B function as: interactors of KDM5B

obtained from the STRING database of protein-protein interactions (43) (HIST2H3A , MYC,

TFAP2C , CDKN1A, TFAP2A, SETD1A, SETD1B, KDM1A, KDM2B, PAX9) plus those

mentioned by Klein et al. ( 44 ) (RBBP4, HDAC1, HDAC4 , MTA2 , CHD4 , FOXG1, FOXC9), as

well as all lysine demethylases, lysine methyltransferases, histone deacetylases, and SET

domain-containing genes from http://www.genecards.org/. The final list contained 95 genes. We

looked for LoF or rare missense variants in these genes in the monoallelic KDM5B LoF carriers

that might have a modifying effect, but found none that were shared by more than two of the de

novo carriers.

https://paperpile.com/c/PfoAAF/HNNxX/?noauthor=1



https://paperpile.com/c/PfoAAF/Znx9n



https://paperpile.com/c/PfoAAF/urc7q/?noauthor=1



http://www.genecards.org/

We also looked for indirect evidence of a regulatory “second hit” near KDM5B by examining

the haplotypes of common SNPs in the region (Fig. S15). DDD probands and a subset of their

parents were genotyped on either the Illumina OmniExpress chip or the Illumina CoreExome

chip. We performed variant and sample quality control for each dataset separately. Briefly, we

removed variants and samples with high data missingness (>=0.03), samples with high or low

heterozygosity, sample duplicates, individuals of African and East Asian ancestry, and SNPs

with MAF<0.005. We then ran SHAPEIT2 ( 45) to phase the SNPs within 2Mb either side of

KDM5B. To make Fig. S15, we used the heatmap() function in R to cluster the phased

haplotypes using the default hierarchical clustering method (based on Euclidean distance).

Analysis of DNA methylation in KDM5B probands

DNA from 64 DDD whole blood samples comprising 41 probands with a KDM5B variant and

23 negative controls was run on an Illumina EPIC 850K methylation array. Negative controls

were selected from DDD probands with de novo mutations in genes not expressed in whole

blood (SCN2A , KCNQ2, SLC6A1 , and FOXG1), since we would not expect these to

significantly impact the methylation phenotype in that tissue. Samples were randomised on the

array to reduce batch effects, and were QCed using a combination of data from control probes

and numbers of CpGs that failed to meet the standard detection p-value of 0.05. Based on these

criteria, two samples failed and were excluded from further analysis (one of the negative

controls and one of the inherited KDM5B LoF carriers).

We looked at methylation levels in the KDM5B LoF carriers to search for an “epimutation”

(hypermethylation on or around the promoter) that might be acting as second hit. We analyzed a

subset of CpGs in and around the KDM5B promoter region: the CpG island in the KDM5B

promoter itself, and a CpG island in the promoter of KDM5B-AS1, a lnc-RNA not specifically

associated with KDM5B, but also highly expressed in the testis. We also extended analysis 5kb

on either side of the start and stop sites of the KDM5B promoter. We examined the distribution

https://paperpile.com/c/PfoAAF/k4rVn



of the beta values (the ratio of methylated to unmethylated alleles) at each of the CpGs in the

10kb region (Fig. S16).

A final post-QC set of 754,273 methylation probes and samples were analysed for

whole-genome methylation changes using principal component analysis (Fig. S17). One of the

samples with a de novo was removed from PCA analysis as it was a significant outlier. PCA

was run on the post-QC set of beta values using the standard prcomp() functions in R (version

3.4.1).

Generation of the KDM5B knockout mice

A mouse Kdm5b loss-of-function allele (MGI:6153378) was generated by

CRISPR/CAS9-mediated deletion of coding exon 7 (ENSMUSE00001331577), leading to a

premature translational termination due to a downstream frameshift. CAS9 mRNA (46) and two

pairs of guide RNAs flanking exon 7 (targeting CCTAGTAACACTAGGTGTTAATA,

GTGTTTGGTTGTCAGTTAGAGGG, CCTCTCGTACATACATCCTAGGC,

CCTAGGCTCGAACTTCACCATGT) were injected into 1-cell stage C57BL/6NJ zygotes,

which were then implanted into host mothers. Mice born from these transfers were tested for

germ line transmission, and identified founders were bred on a C57BL/6NJ background to

establish colonies for further testing.

Breeding and housing of mice and all experimental procedures were assessed by the Animal

Welfare and Ethical Review Body of the Wellcome Sanger Institute and conducted under the

regulation of UK Home Office license (P6320B89B), and in accordance with institutional

guidelines. From in-crosses of heterozygous Kdm5b mutant mice, we recovered 350 pups at

weaning, of which 39 were homozygous (11.1% versus 25% expected). Kdm5b -/- pups were

raised and weaned together with Kdm5b+/+ pups. Mice were housed in mixed genotype cages

(2-5 mice) with food and water ad-libitum, under controlled temperature and humidity and a

https://paperpile.com/c/PfoAAF/buc9i



12h light cycle (light on at 7am) at the Research Support Facility of the Wellcome Sanger

Institute.

Phenotyping of wild-type and KDM5B knockout mice

At 10 weeks of age, a cohort of 16 wild-type and 16 homozygous knockout male mice

underwent a series of behavioral tests. These were carried out between 9am – 5pm, after 1h of

habituation to the testing room. Experimenters were blind to genotype, mouse movements were

recorded with an overhead infrared video camera and tracked by automated video tracking

(Ethovision XT 11.5, Noldus Information Technology).

Light-dark box (Fig. 4C) : This test was adapted from Gapp et al. (47) . Mice were housed in

pairs for 20 minutes before introducing them individually into the light-dark box, a plastic box

(40 × 42 × 26 cm) divided in two compartments. One is smaller, closed and dark (1/3 of the

total surface area) and is connected through a door (5 cm) to a larger, brightly lit (370 lux with

an overhead lamp) compartment (2/3 of the box). Each mouse was placed in the dark

compartment, the door was then opened and the animal was left to explore for 10 min. The time

spent in the light compartment was tracked and the difference between genotypes was estimated

using a t-test.

Three-Chamber Sociability Assay (Fig. 4D): The protocol was adapted from Dias et al. (48 ).

In brief, the test arena was divided into 3 equally sized chambers (25 x 37.5 cm) connected by

small doors (5 x 5 cm). Mice were first individually habituated to the arena for 5 minutes, where

the central chamber was empty and each of the side chambers had an empty cylindrical corral (8

cm diameter, 15 cm tall, 1 cm gap between bars). The mouse was then returned to the central

chamber and doors were closed, while a stimulus mouse of a different strain (129Sv) was

introduced into one of the corrals. Doors were opened and the test mouse was allowed to

explore all three chambers: one side chamber with an empty corral and the other with a corral

containing a freely moving novel mouse, for 5 minutes. The time spent investigating both

https://paperpile.com/c/PfoAAF/TomEv

https://paperpile.com/c/PfoAAF/tqwAw



corrals was recorded and measured as total investigation. The time spent investigating the

mouse was assessed as a percentage of the total investigation time, i.e. [time investigating corral

with mouse]/[time investigating corral with mouse + time investigating empty corral]. A t-test

was used to evaluate the difference between genotypes of the time spent investigating the

mouse.

Social Recognition Assay (Fig. 4E): This paradigm assesses memory and relies on the natural

preference of mice to investigate a novel mouse over a familiar one. The assay consists of two

days, as reported by Dias et al. ( 48). On both days, mice were habituated for 10 minutes to the

test box, an empty, clean cage. Briefly, on the first day, mice were tested on a habituation

paradigm to assess their olfaction and social investigation. An anesthetized stimulus mouse was

placed in the center of the arena for 1 minute, repeated 4 times at 10 minute intervals. On the

5th trial, a novel mouse was presented. Both Kdm5b wild-type and homozygous knockout mice

showed a similar decreasing investigation of the repeatedly-presented stimulus mouse, and an

increased investigation on trial 5 of the novel mouse. On day 2, after 24 hours, the

discrimination test was performed. Mice were simultaneously presented with the familiar mouse

from Day 1 and a completely novel mouse. The amount of time the test animal spent

investigating each stimulus mouse by close-proximity (sniffing, oronasal contact, or

approaching within 1–2 cm) was recorded. Using predefined exclusion criteria, one wild-type

mouse was excluded on day 1, and a second wild-type was excluded on day 2, both for

insufficient overall investigation. A two-way ANOVA was used to assess the discrimination

difference between genotypes, followed by Bonferroni's multiple comparisons test (two

comparisons, familiar versus unfamiliar mouse for wild type and mutant).

Barnes Maze probe trial (Fig. S19AB) : This assay is a test of visuo-spatial learning and

memory on a circular maze (120cm diameter table) with 20 holes around the perimeter. One of

the holes leads to a small dark box (Target) where they can escape from the brightly lit maze.

Mice were trained for three days, 10 trials (4 min maximum each), to find the target location.




On the probe trial, 72h after the last training day, the escape box was removed. Each mouse was

given 4 minutes to explore the maze. The mouse's movements were tracked, and the amount of

time spent around each of the holes was analyzed. Data is expressed as the percentage of time

spent dwelling around each hole, relative to the total amount of time spent investigating all

holes. With 20 holes to investigate, the amount of time spent by chance around each hole would

be 5%. The difference between genotypes in remembering the target location was evaluated

using a two-way ANOVA with Bonferroni's multiple comparisons test.

Statistical analysis for all mouse behavior experiments was performed with GraphPad Prism 7

(GraphPad Software).

X-rays (Fig. S19C): Seven wild-type and seven homozygous Kdm5b knockout mice were

anaesthetised with ketamine/xylazine (100mg/ 10mg per kg of body weight) and then placed in

an MX-20 X-ray machine (Faxitron X-Ray LLC). Whole body radiographs were taken in

dorso-ventral and lateral positions. Images were then analysed and morphological abnormalities

assessed using Sante DICOM Viewer v7.2.1 (Santesoft LTD).

Supplementary Figures

Fig. S1: Flow diagram indicating the final number of samples used in the various analyses, and

why samples were removed. Note that we use “diagnosed” and “undiagnosed” as shorthand in

the manuscript, but these terms are not fully accurate descriptors of these groups (see Methods

section on “Sample quality control and subsetting”). The “diagnosed” probands include only

those with diagnostic dominant or X-linked exonic variants. The “undiagnosed” set includes

193 patients who had diagnostic variants in recessive DDG2P genes or potentially diagnostic

variants in monoallelic or X-linked DDG2P genes but had high autozygosity or affected siblings

Fig. S2: Principal components analysis of the 1000 Genomes Phase 3 samples (left) with DDD

samples projected on top of them (right). The ellipses used to define the EABI and PABI

populations in DDD are shown on the PC2 versus PC3 plot.

Fig. S3: Distribution of the number of organ systems affected per proband (10 ). P-values are

from Wilcoxon rank-sum tests.

https://paperpile.com/c/PfoAAF/KvzkU



Fig. S4: Number of observed and expected biallelic synonymous genotypes per individual,

estimated across 5684 EABI and 356 PABI probands. The grey lines indicate the observed

number and the points with small black lines represent the expected number with a 95%

confidence interval. Two-sided p-values are indicated. The expected numbers were calculated in

different ways as described in Methods. Note that the estimate based on the DDD parental

haplotypes (black points) best matches the observed data, and that use of the ExAC frequencies

or the polynomial model of Jin et al. (3) gives estimates of the expected number that are

significantly different from the observed number.


Fig. S5: Histograms of levels of autozygosity across EABI and PABI probands.

Fig. S6. Burden of biallelic damaging genotypes in different gene sets, for undiagnosed EABI

and PABI probands combined (N=4,651). This includes biallelic LoFs, biallelic damaging

missense and compound heterozygous LoF/damaging missense. The lines show 95%

confidence intervals and the p-values are from a one-sided Poisson test. Note the strong

enrichment in known recessive DD genes and genes predicted to be intolerant of biallelic LoFs

based on the pRec score (22), and lack of enrichment in known dominant DD genes and genes

predicted to be intolerant of monoallelic LoFs (pLI>0.9).

https://paperpile.com/c/PfoAAF/bTosz



Fig. S7: Estimates of , the proportion of biallelic genotypes that are causal for DD or lethal. φ

These were estimated from the parental data. See Methods for details. The points show

maximum likelihood estimates and the lines show 95% confidence intervals. Points at the same

MAF cutoff have been slightly scattered along the x-axis for ease of visualisation.

Fig. S8: Distribution of the ranks of minimum p-values per gene for known recessive genes

versus all other genes. The order of genes with the same minimum p-value was randomised. A

Kolmogorov-Smirnov (KS) test indicated that these distributions were significantly different.

This, combined with the fact that fourteen of the sixteen genes with p<10-4 are known recessive

DD genes, suggests our approach for identifying these genes is valid and Bonferroni correction

is conservative.

Fig. S9: Anterior-posterior facial photographs of selected individuals with the homozygous

Phe232Val variant in EIF3F. DECIPHER IDs are shown in the top right corner. Affected

individuals did not have a distinctive facial appearance. Individual 265452 (leftmost) had

muscle atrophy, as demonstrated in photographs of the anterior surface of the hands which show

wasting of the thenar and hypothenar eminences. This is notable because it is only reported in

one other proband in the DDD study and, in mice, Eif3f has been shown to play a role in

regulating skeletal muscle size via interaction with the mTOR pathway (49 ). None of the other

individuals were either assessed to have or previously recorded to have muscle atrophy.

Fig. S10. Results from Western blot showing lower levels of the EIF3F protein in iPSCs

homozygous for the Phe232Val variant. WT: wild-type, HET and HOM: heterozygous and

homozygous for Phe232Val. Note that two independent cell lines were used for each genotype,

and that the two culture represent independent replicates. Beta-tubulin was used as a

normalising control. Band intensities were quantified using ImageJ. We jointly analysed the

data from both independent cultures, having normalised the relative expression values by

dividing by the mean of the wild-type lines for each replicate. There was significantly lower

relative EIF3F expression in the homozygous cells compared to heterozygous and wild-type

cells combined (one-sided t-test; p=0.003), with a mean reduction of 26.6%.

Fig. S11: Predicted structural effects of the pathogenic Phe232Val variant in EIF3F. The

secondary structure, domain architecture and 3D fold of EIF3F is conserved between species

but sequence similarity is low (29% between yeast and humans). A) Section of the amino acid

sequence logo for EIF3F where the strength of conservation across species is indicated by the

size of the letters. The sequence below represents the human EIF3F. Boxed characters are the

aromatic residues conserved between humans and yeast and proximal in space to Phe232. B)

Structure of the section of EIF3F containing the Phe232Val variant, highlighted in green.

Amino acids conserved between yeast and human sequences as highlighted in panel A are

shown in grey. The conserved Phe232 side chain is buried (solvent accessibility 0.7%) and

likely plays a stabilizing role, so the Phe232Val variant may disrupt protein stability. See (10 )

for details of structure prediction.

Fig. S12: Optimising the sensitivity of the Alexa Fluor 488 Protein Synthesis Assay Kit for

assessing protein synthesis in iPSCs with or without the EIF3F Phe232Val variant. A)

Representative histogram of fluorescence intensities (AlexaFluor-488), a proxy for the rate of

nascent protein synthesis, across multiple wild-type iPSCs measured by flow cytometry. This

plot shows a single replicate. We gated the major population (blue rectangle; i.e. fluorescence >

104), and used its median (red dotted line) in downstream analysis. B) Dose-response curve for

iPSCs treated with cycloheximide. Data are represented as mean and standard deviation of

median fluorescence intensity of the major Alexa-Fluor 488-positive population across four

independent replicates, three of which were run in parallel with experiments presented in Figure

3C. Note that, for the leftmost point, the error bars are smaller than the dot and thus not visible.

Fig. S13. The EIF3F Phe232Val variant reduces proliferation of iPSCs in the homozygous but

not heterozygous state. (A) Mean cell counts four days after plating 50,000 cells per line, with

95% confidence intervals, from three independent replicates. (B) Proportions of cells that have

undergone zero, one or multiple divisions in a cell trace violet (CTV) proliferation assay. There

are four replicates for each of the two independent cell lines for each genotype. The

homozygous line had significantly more cells in division 1 than wild-type and heterozygous

lines (p=2x10-5; linear regression including total number of cells as covariate). See Fig. 3C for

representative distributions of CTV intensity, one for each genotype.

Fig. S14: No difference in cell apoptosis of iPSCs containing the EIF3F variant versus

wild-type cells. Graph shows mean and standard deviation of the percentage of live, apoptotic

or necrotic cells, assayed using annexin V or propidium iodide staining (see (10)). Data derived

from three replicates.

Fig. S15: Plot showing haplotypes of common SNPs around KDM5B in individuals with de

novo missense or LoF mutations or with monoallelic or biallelic LoFs. These is no evidence for

a local haplotype shared by multiple probands with monoallelic LoFs that was not also present

in an unaffected parent with a monoallelic LoF. The region shown lies between two

recombination hotspots. The rows represent phased haplotypes, with orange and green

rectangles corresponding to the different alleles at the SNPs at the positions indicated along the

bottom. Hierarchical clustering has been applied to the haplotypes, as indicated by the

dendrogram on the left, and the labels on the right indicate which individual carries the

haplotype, and whether the individual was a proband carrying a de novo (purple), a biallelic

LoF (dark green), or an inherited heterozygous LoF (yellow), or a parent carrying a

heterozygous LoF (pink).

Fig. S16: Violin plots of the beta values (the ratio of methylated to unmethylated alleles) at each

of the CpGs in the 10kb region around the KDM5B promoter. The CpGs within the KDM5B and

KDM5B-AS1 promoters are annotated below the plot, with coordinates relative to hg19. The

bottom panel shows the negative controls (probands with likely causal de novo mutations in

known DD genes not expressed in blood), and the other panels show probands with variants in

KDM5B that are either biallelic (top panel), de novo (second panel) or monoallelic and inherited

(third panel).

Fig. S17: Results from a principal components analysis of genome-wide DNA methylation in

probands with de novo mutations or biallelic or inherited monoallelic LoFs in KDM5B. The lack

of differences between groups suggests that LoFs in KDM5B do not manifest in DNA

methylation changes, at least not in whole blood of children in this age range (age 18 months to

16 years).

Fig. S18: Anterior-posterior facial photographs of one of the individuals with biallelic KDM5B

variants. Note the narrow palpebral fissures, arched or thick eyebrows, dark eyelashes, low

hanging columella, smooth philtrum and thin upper vermillion border.

Fig. S19: Cognitive and skeletal defects of homozygous Kdm5b knockout mice. A) Impaired

long-term spatial memory in the Probe trial of Barnes Maze assay. 72 hours after learning the

location of the target hole, wild-type mice spent significantly more time around the target hole

compared to knockout mice. Two-way ANOVA, p=0.026 for hole ✕ genotype interaction; with

Bonferroni’s multiple comparison test, p<0.001 for Target hole. B) Representative heatmap of

wild-type mouse activity around the Barnes Maze during the 72 hour probe trial. C) Skeletal

defects identified from X-ray analysis. All homozygous Kdm5b knockout mice analysed (n=7)

had transitional vertebrae (one-tailed Fisher's exact test p=0.03) compared to none of the

wild-types (n=7). This is an X-ray image of the dorsal view of wild-type and homozygous

Kdm5b knockout mice. This case shows a transitional Lumbar (L1) to Sacral (S1) vertebra,

resulting in an extra Sacral vertebra (arrow), and the absence of one Thoracic vertebra (T13,

arrowhead). Green dots indicate Thoracic vertebra, blue are Lumbar, red are Sacral and brown

are Caudal vertebra.

Fig. S20: Plots showing effect of variant filtering strategies on number of variants, Mendelian

errors and Ti/Tv. We first set genotypes to missing based on genotype quality (GQ), depth (AD)

and the p-value from a test of allele balance (pAB), and then removed sites according to the

proportion of missing genotypes.

Supplementary Tables

Table S1: Summary of the phenotypes in the study, for different subsets of probands. We show

the proportion of patients with each higher-level phenotype (based on HPO terms), and results

from Fisher’s exact tests for a difference in these proportions between different subsets of

probands. We also show summary statistics for the number of HPO terms, various

developmental milestones, and results from Mann-Whitney U tests comparing these quantitative

metrics between different subsets of probands. We also summarise the number of probands who

achieved (or who had not yet achieved) the milestone at an age ≤2SD, 2SD-4SD, 4SD-8SD and

>8SD from the mean for normal individuals. For these, the distributions for normal children

were extracted from (50) for age at first words and (51 ) for age at walking. % Within the ID/DD

category, the classes are mutually exclusive of one another and based on HPO terms that fall

under HP:0001249 (Intellectual Disability) or HP:0001263 (Global Developmental Delay). For

creating this combined category, we took the most severe or (if unclear) more specific term (e.g.

mild ID and severe/profound DD → severe/profound ID/DD; mild DD and DD of unknown

severity → mild ID/DD). & Several cognition-related terms do not come under HP:0001249

(Intellectual Disability) or HP:0001263 (Global Developmental Delay) (i.e. HP:0100543

(Cognitive impairment), HP:0001328 (Specific learning disability) and HP:0000750 (Delayed

speech and language development)), so we included these here. * For the mean, median,

standard deviation and Mann-Whitney U tests of the developmental milestones, if probands had

not yet achieved the milestone, we set them to missing if they were under age 18 months and to

their age at assessment if over 18 months. + The organ system counts were carried out as

described in the Methods, in order to avoid double-counting the same HPO term that occurred

under multiple organ systems.

Table S2: Estimates of the , the proportion of probands explained by diagnostic biallelic π

coding genotypes or de novo coding mutations, for different sample sets. Shown are the

maximum likelihood estimates for , and a 95% confidence interval. See Methods for how π

these were calculated.

Table S3: Results from tests of an excess of damaging biallelic genotypes for all genes. The

lowest p-value out of the eight tests conducted for each gene is shown. We give results for the

stringent ancestry filter (4318 EABI probands and 333 PABI probands), as shown in Table 1, as

well as the lenient ancestry filter (4942 European ancestry probands and 498 South Asian

ancestry probands).

Table S4: Genes enriched for damaging biallelic coding genotypes with p<1×10-4 (for eight

tests; see ( 10)). Shown are the number of observed biallelic genotypes of different consequence

classes, the lowest p-value out of the eight tests (achieved using EABI alone for all genes except

for VPS13B), details of the corresponding test, and the p-value for phenotypic similarity for the

relevant probands (9). Known recessive DD genes from the DDG2P list are indicated

(http://www.ebi.ac.uk/gene2phenotype/). LZTR1 has been implicated as a dominant cause of

Noonan syndrome ( 52) and there is some prior evidence supporting LINS as a recessive ID gene

(53, 54).

Table S5: Phenotypes of the nine patients homozygous for the EIF3F Phe232Val variant. Only

five were used in the initial gene discovery analysis, which excluded one sibling from each of

families 2 and 3, 260950 (who had a potentially diagnostic inherited X-linked variant in

HUWE1 , subsequently deemed to be benign since it did not segregate with disease in his

family), and 265452 (who had no parental genetic data available).

Table S6: De novo mutations in KDM5B from this and previous studies, and inherited LoFs in

KDM5B from this study. Note that, of the 26 probands with inherited LoFs, five of them were

reported to have a parent who had at least one clinical phenotype shared with the child, but for

only two families was this the parent who carried the LoF.

Table S7: Phenotypes of the probands with biallelic or de novo KDM5B variants.

References and Notes 1. The Deciphering Developmental Disorders Study, Large-scale discovery of novel genetic

causes of developmental disorders. Nature 519, 223–228 (2015). doi:10.1038/nature14135 Medline

2. R. K. C Yuen, D. Merico, M. Bookman, J. L Howe, B. Thiruvahindrapuram, R. V. Patel, J. Whitney, N. Deflaux, J. Bingham, Z. Wang, G. Pellecchia, J. A. Buchanan, S. Walker, C. R. Marshall, M. Uddin, M. Zarrei, E. Deneault, L. D’Abate, A. J. S. Chan, S. Koyanagi, T. Paton, S. L. Pereira, N. Hoang, W. Engchuan, E. J. Higginbotham, K. Ho, S. Lamoureux, W. Li, J. R. MacDonald, T. Nalpathamkalam, W. W. L. Sung, F. J. Tsoi, J. Wei, L. Xu, A.-M. Tasse, E. Kirby, W. Van Etten, S. Twigger, W. Roberts, I. Drmic, S. Jilderda, B. M. K. Modi, B. Kellam, M. Szego, C. Cytrynbaum, R. Weksberg, L. Zwaigenbaum, M. Woodbury-Smith, J. Brian, L. Senman, A. Iaboni, K. Doyle-Thomas, A. Thompson, C. Chrysler, J. Leef, T. Savion-Lemieux, I. M. Smith, X. Liu, R. Nicolson, V. Seifer, A. Fedele, E. H. Cook, S. Dager, A. Estes, L. Gallagher, B. A. Malow, J. R. Parr, S. J. Spence, J. Vorstman, B. J. Frey, J. T. Robinson, L. J. Strug, B. A. Fernandez, M. Elsabbagh, M. T. Carter, J. Hallmayer, B. M. Knoppers, E. Anagnostou, P. Szatmari, R. H. Ring, D. Glazer, M. T. Pletcher, S. W. Scherer, Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017). doi:10.1038/nn.4524 Medline

3. S. C. Jin, J. Homsy, S. Zaidi, Q. Lu, S. Morton, S. R. DePalma, X. Zeng, H. Qi, W. Chang, M. C. Sierant, W.-C. Hung, S. Haider, J. Zhang, J. Knight, R. D. Bjornson, C. Castaldi, I. R. Tikhonoa, K. Bilguvar, S. M. Mane, S. J. Sanders, S. Mital, M. W. Russell, J. W. Gaynor, J. Deanfield, A. Giardini, G. A. Porter Jr., D. Srivastava, C. W. Lo, Y. Shen, W. S. Watkins, M. Yandell, H. J. Yost, M. Tristani-Firouzi, J. W. Newburger, A. E. Roberts, R. Kim, H. Zhao, J. R. Kaltman, E. Goldmuntz, W. K. Chung, J. G. Seidman, B. D. Gelb, C. E. Seidman, R. P. Lifton, M. Brueckner, Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet. 49, 1593–1601 (2017). doi:10.1038/ng.3970 Medline

4. Deciphering Developmental Disorders Study, Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017). doi:10.1038/nature21062 Medline

5. H. H. Ropers, Genetics of early onset cognitive impairment. Annu. Rev. Genomics Hum. Genet. 11, 161–187 (2010). doi:10.1146/annurev-genom-082509-141640 Medline

6. L. E. L. M. Vissers, C. Gilissen, J. A. Veltman, Genetic studies in intellectual disability and related disorders. Nat. Rev. Genet. 17, 9–18 (2016). doi:10.1038/nrg3999 Medline

7. P. A. Baird, T. W. Anderson, H. B. Newcombe, R. B. Lowry, Genetic disorders in children and young adults: A population study. Am. J. Hum. Genet. 42, 677–693 (1988). Medline

8. S. J. Schrodi, A. DeBarber, M. He, Z. Ye, P. Peissig, J. J. Van Wormer, R. Haws, M. H. Brilliant, R. D. Steiner, Prevalence estimation for monogenic autosomal recessive diseases using population-based genetic data. Hum. Genet. 134, 659–669 (2015). doi:10.1007/s00439-015-1551-8 Medline

http://dx.doi.org/10.1038/nature14135

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=25533962&dopt=Abstract

http://dx.doi.org/10.1038/nn.4524


http://dx.doi.org/10.1038/ng.3970




http://dx.doi.org/10.1146/annurev-genom-082509-141640


http://dx.doi.org/10.1038/nrg3999



http://dx.doi.org/10.1007/s00439-015-1551-8


9. N. Akawi, J. McRae, M. Ansari, M. Balasubramanian, M. Blyth, A. F. Brady, S. Clayton, T. Cole, C. Deshpande, T. W. Fitzgerald, N. Foulds, R. Francis, G. Gabriel, S. S. Gerety, J. Goodship, E. Hobson, W. D. Jones, S. Joss, D. King, N. Klena, A. Kumar, M. Lees, C. Lelliott, J. Lord, D. McMullan, M. O’Regan, D. Osio, V. Piombo, E. Prigmore, D. Rajan, E. Rosser, A. Sifrim, A. Smith, G. J. Swaminathan, P. Turnpenny, J. Whitworth, C. F. Wright, H. V. Firth, J. C. Barrett, C. W. Lo, D. R. FitzPatrick, M. E. Hurles; DDD study, Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nat. Genet. 47, 1363–1369 (2015). doi:10.1038/ng.3410 Medline

10. Materials and methods are available as supplementary materials. 11. C. F. Wright, T. W. Fitzgerald, W. D. Jones, S. Clayton, J. F. McRae, M. van Kogelenberg,

D. A. King, K. Ambridge, D. M. Barrett, T. Bayzetinova, A. P. Bevan, E. Bragin, E. A. Chatzimichali, S. Gribble, P. Jones, N. Krishnappa, L. E. Mason, R. Miller, K. I. Morley, V. Parthiban, E. Prigmore, D. Rajan, A. Sifrim, G. J. Swaminathan, A. R. Tivey, A. Middleton, M. Parker, N. P. Carter, J. C. Barrett, M. E. Hurles, D. R. FitzPatrick, H. V. Firth; DDD study, Genetic diagnosis of developmental disorders in the DDD study: A scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015). doi:10.1016/S0140-6736(14)61705-0 Medline

12. A. S. Teebi, Autosomal recessive disorders among Arabs: An overview from Kuwait. J. Med. Genet. 31, 224–233 (1994). doi:10.1136/jmg.31.3.224 Medline

13. G. O. Tadmouri, P. Nair, T. Obeid, M. T. Al Ali, N. Al Khaja, H. A. Hamamy, Consanguinity and reproductive health among Arabs. Reprod. Health 6, 17 (2009). doi:10.1186/1742-4755-6-17 Medline

14. C. Gilissen, J. Y. Hehir-Kwa, D. T. Thung, M. van de Vorst, B. W. M. van Bon, M. H. Willemsen, M. Kwint, I. M. Janssen, A. Hoischen, A. Schenck, R. Leach, R. Klein, R. Tearle, T. Bo, R. Pfundt, H. G. Yntema, B. B. A. de Vries, T. Kleefstra, H. G. Brunner, L. E. L. M. Vissers, J. A. Veltman, Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344–347 (2014). doi:10.1038/nature13394 Medline

15. C. L. Beaulieu, L. Huang, A. M. Innes, M.-A. Akimenko, E. G. Puffenberger, C. Schwartz, P. Jerry, C. Ober, R. A. Hegele, D. R. McLeod, J. Schwartzentruber, J. Majewski, D. E. Bulman, J. S. Parboosingh, K. M. Boycott; FORGE Canada Consortium, Intellectual disability associated with a homozygous missense mutation in THOC6. Orphanet J. Rare Dis. 8, 62 (2013). doi:10.1186/1750-1172-8-62 Medline

16. A. Fogli, O. Boespflug-Tanguy, The large spectrum of eIF2B-related diseases. Biochem. Soc. Trans. 34, 22–29 (2006). doi:10.1042/BST0340022 Medline

17. V. Faundes, W. G. Newman, L. Bernardini, N. Canham, J. Clayton-Smith, B. Dallapiccola, S. J. Davies, M. K. Demos, A. Goldman, H. Gill, R. Horton, B. Kerr, D. Kumar, A. Lehman, S. McKee, J. Morton, M. J. Parker, J. Rankin, L. Robertson, I. K. Temple, S. Banka; Clinical Assessment of the Utility of Sequencing and Evaluation as a Service (CAUSES) Study; Deciphering Developmental Disorders (DDD) Study, Histone Lysine



http://dx.doi.org/10.1016/S0140-6736(14)61705-0


http://dx.doi.org/10.1136/jmg.31.3.224


http://dx.doi.org/10.1186/1742-4755-6-17




http://dx.doi.org/10.1186/1750-1172-8-62


http://dx.doi.org/10.1042/BST0340022


Methylases and Demethylases in the Landscape of Human Developmental Disorders. Am. J. Hum. Genet. 102, 175–187 (2018). doi:10.1016/j.ajhg.2017.11.013 Medline

18. I. Iossifov, B. J. O’Roak, S. J. Sanders, M. Ronemus, N. Krumm, D. Levy, H. A. Stessman, K. T. Witherspoon, L. Vives, K. E. Patterson, J. D. Smith, B. Paeper, D. A. Nickerson, J. Dea, S. Dong, L. E. Gonzalez, J. D. Mandell, S. M. Mane, M. T. Murtha, C. A. Sullivan, M. F. Walker, Z. Waqar, L. Wei, A. J. Willsey, B. Yamrom, Y. H. Lee, E. Grabowska, E. Dalkic, Z. Wang, S. Marks, P. Andrews, A. Leotta, J. Kendall, I. Hakker, J. Rosenbaum, B. Ma, L. Rodgers, J. Troge, G. Narzisi, S. Yoon, M. C. Schatz, K. Ye, W. R. McCombie, J. Shendure, E. E. Eichler, M. W. State, M. Wigler, The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014). doi:10.1038/nature13908 Medline

19. G. L. Carvill, H. C. Mefford, Microdeletion syndromes. Curr. Opin. Genet. Dev. 23, 232–239 (2013). doi:10.1016/j.gde.2013.03.004 Medline

20. H. H. Ropers, T. Wienker, Penetrance of pathogenic mutations in haploinsufficient genes for intellectual disability and related disorders. Eur. J. Med. Genet. 58, 715–718 (2015). doi:10.1016/j.ejmg.2015.10.007 Medline

21. C. N. Vallianatos, S. Iwase, Disrupted intricacy of histone H3K4 methylation in neurodevelopmental disorders. Epigenomics 7, 503–519 (2015). doi:10.2217/epi.15.1 Medline

22. M. Lek, K. J. Karczewski, E. V. Minikel, K. E. Samocha, E. Banks, T. Fennell, A. H. O’Donnell-Luria, J. S. Ware, A. J. Hill, B. B. Cummings, T. Tukiainen, D. P. Birnbaum, J. A. Kosmicki, L. E. Duncan, K. Estrada, F. Zhao, J. Zou, E. Pierce-Hoffman, J. Berghout, D. N. Cooper, N. Deflaux, M. DePristo, R. Do, J. Flannick, M. Fromer, L. Gauthier, J. Goldstein, N. Gupta, D. Howrigan, A. Kiezun, M. I. Kurki, A. L. Moonshine, P. Natarajan, L. Orozco, G. M. Peloso, R. Poplin, M. A. Rivas, V. Ruano-Rubio, S. A. Rose, D. M. Ruderfer, K. Shakir, P. D. Stenson, C. Stevens, B. P. Thomas, G. Tiao, M. T. Tusie-Luna, B. Weisburd, H.-H. Won, D. Yu, D. M. Altshuler, D. Ardissino, M. Boehnke, J. Danesh, S. Donnelly, R. Elosua, J. C. Florez, S. B. Gabriel, G. Getz, S. J. Glatt, C. M. Hultman, S. Kathiresan, M. Laakso, S. McCarroll, M. I. McCarthy, D. McGovern, R. McPherson, B. M. Neale, A. Palotie, S. M. Purcell, D. Saleheen, J. M. Scharf, P. Sklar, P. F. Sullivan, J. Tuomilehto, M. T. Tsuang, H. C. Watkins, J. G. Wilson, M. J. Daly, D. G. MacArthur; Exome Aggregation Consortium, Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016). doi:10.1038/nature19057 Medline

23. J. X. Chong, M. J. McMillin, K. M. Shively, A. E. Beck, C. T. Marvin, J. R. Armenteros, K. J. Buckingham, N. T. Nkinsi, E. A. Boyle, M. N. Berry, M. Bocian, N. Foulds, M. L. G. Uzielli, C. Haldeman-Englert, R. C. M. Hennekam, P. Kaplan, A. D. Kline, C. L. Mercer, M. J. M. Nowaczyk, J. S. Klein Wassink-Ruiter, E. W. McPherson, R. A. Moreno, A. E. Scheuerle, V. Shashi, C. A. Stevens, J. C. Carey, A. Monteil, P. Lory, H. K. Tabor, J. D. Smith, J. Shendure, D. A. Nickerson, M. J. Bamshad, M. J. Bamshad, J. Shendure, D. A. Nickerson, G. R. Abecasis, P. Anderson, E. M. Blue, M. Annable, B. L. Browning, K. J. Buckingham, C. Chen, J. Chin, J. X. Chong, G. M. Cooper, C. P. Davis, C. Frazar, T. M.

http://dx.doi.org/10.1016/j.ajhg.2017.11.013




http://dx.doi.org/10.1016/j.gde.2013.03.004


http://dx.doi.org/10.1016/j.ejmg.2015.10.007


http://dx.doi.org/10.2217/epi.15.1




Harrell, Z. He, P. Jain, G. P. Jarvik, G. Jimenez, E. Johanson, G. Jun, M. Kircher, T. Kolar, S. A. Krauter, N. Krumm, S. M. Leal, D. Luksic, C. T. Marvin, M. J. McMillin, S. McGee, P. O’Reilly, B. Paeper, K. Patterson, M. Perez, S. W. Phillips, J. Pijoan, C. Poel, F. Reinier, P. D. Robertson, R. Santos-Cortez, T. Shaffer, C. Shephard, K. M. Shively, D. L. Siegel, J. D. Smith, J. C. Staples, H. K. Tabor, M. Tackett, J. G. Underwood, M. Wegener, G. Wang, M. M. Wheeler, Q. Yi; University of Washington Center for Mendelian Genomics, De novo mutations in NALCN cause a syndrome characterized by congenital contractures of the limbs and face, hypotonia, and developmental delay. Am. J. Hum. Genet. 96, 462–473 (2015). doi:10.1016/j.ajhg.2015.01.003 Medline

24. M. D. Al-Sayed, H. Al-Zaidan, A. Albakheet, H. Hakami, R. Kenana, Y. Al-Yafee, M. Al-Dosary, A. Qari, T. Al-Sheddi, M. Al-Muheiza, W. Al-Qubbaj, Y. Lakmache, H. Al-Hindi, M. Ghaziuddin, D. Colak, N. Kaya, Mutations in NALCN cause an autosomal-recessive syndrome with severe hypotonia, speech impairment, and cognitive delay. Am. J. Hum. Genet. 93, 721–726 (2013). doi:10.1016/j.ajhg.2013.08.001 Medline

25. J. Rainger, D. Pehlivan, S. Johansson, H. Bengani, L. Sanchez-Pulido, K. A. Williamson, M. Ture, H. Barker, K. Rosendahl, J. Spranger, D. Horn, A. Meynert, J. A. B. Floyd, T. Prescott, C. A. Anderson, J. K. Rainger, E. Karaca, C. Gonzaga-Jauregui, S. Jhangiani, D. M. Muzny, A. Seawright, D. C. Soares, M. Kharbanda, V. Murday, A. Finch, R. A. Gibbs, V. van Heyningen, M. S. Taylor, T. Yakut, P. M. Knappskog, M. E. Hurles, C. P. Ponting, J. R. Lupski, G. Houge, D. R. FitzPatrick, M. Hurles, D. R. FitzPatrick, S. Al-Turki, C. Anderson, I. Barroso, P. Beales, J. Bentham, S. Bhattacharya, K. Carss, K. Chatterjee, S. Cirak, C. Cosgrove, A. Daly, J. Floyd, C. Franklin, M. Futema, S. Humphries, S. McCarthy, H. Mitchison, F. Muntoni, A. Onoufriadis, V. Parker, F. Payne, V. Plagnol, L. Raymond, D. Savage, P. Scambler, M. Schmidts, R. Semple, E. Serra, J. Stalker, M. van Kogelenberg, P. Vijayarangakannan, K. Walter, G. Wood; UK10K; Baylor-Hopkins Center for Mendelian Genomics, Monoallelic and biallelic mutations in MAB21L2 cause a spectrum of major eye malformations. Am. J. Hum. Genet. 94, 915–923 (2014). doi:10.1016/j.ajhg.2014.05.005 Medline

26. M. Albert, S. U. Schmitz, S. M. Kooistra, M. Malatesta, C. Morales Torres, J. C. Rekling, J. V. Johansen, I. Abarrategui, K. Helin, The histone demethylase Jarid1b ensures faithful mouse development by protecting developmental genes from aberrant H3K4me3. PLOS Genet. 9, e1003461 (2013). doi:10.1371/journal.pgen.1003461 Medline

27. S. Köhler, M. H. Schulz, P. Krawitz, S. Bauer, S. Dölken, C. E. Ott, C. Mundlos, D. Horn, S. Mundlos, P. N. Robinson, Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 85, 457–464 (2009). doi:10.1016/j.ajhg.2009.09.003 Medline

28. E. Bragin, E. A. Chatzimichali, C. F. Wright, M. E. Hurles, H. V. Firth, A. P. Bevan, G. J. Swaminathan, DECIPHER: Database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation. Nucleic Acids Res. 42 (D1), D993–D1000 (2014). doi:10.1093/nar/gkt937 Medline

29. E. Sheridan, J. Wright, N. Small, P. C. Corry, S. Oddie, C. Whibley, E. S. Petherick, T. Malik, N. Pawson, P. A. McKinney, R. C. Parslow, Risk factors for congenital anomaly







http://dx.doi.org/10.1371/journal.pgen.1003461




http://dx.doi.org/10.1093/nar/gkt937


in a multiethnic birth cohort: An analysis of the Born in Bradford study. Lancet 382, 1350–1359 (2013). doi:10.1016/S0140-6736(13)61132-0 Medline

30. W. McLaren, L. Gil, S. E. Hunt, H. S. Riat, G. R. S. Ritchie, A. Thormann, P. Flicek, F. Cunningham, The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016). doi:10.1186/s13059-016-0974-4 Medline

31. 1000 Genomes Project Consortium, A global reference for human genetic variation. Nature 526, 68–74 (2015). doi:10.1038/nature15393 Medline

32. A. L. Price, N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick, D. Reich, Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006). doi:10.1038/ng1847 Medline

33. V. Narasimhan, P. Danecek, A. Scally, Y. Xue, C. Tyler-Smith, R. Durbin, BCFtools/RoH: A hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics 32, 1749–1751 (2016). doi:10.1093/bioinformatics/btw044 Medline

34. M. P. Conomos, A. P. Reiner, B. S. Weir, T. A. Thornton, Model-free Estimation of Recent Genetic Relatedness. Am. J. Hum. Genet. 98, 127–148 (2016). doi:10.1016/j.ajhg.2015.11.022 Medline

35. K. E. Samocha, E. B. Robinson, S. J. Sanders, C. Stevens, A. Sabo, L. M. McGrath, J. A. Kosmicki, K. Rehnström, S. Mallick, A. Kirby, D. P. Wall, D. G. MacArthur, S. B. Gabriel, M. DePristo, S. M. Purcell, A. Palotie, E. Boerwinkle, J. D. Buxbaum, E. H. Cook Jr., R. A. Gibbs, G. D. Schellenberg, J. S. Sutcliffe, B. Devlin, K. Roeder, B. M. Neale, M. J. Daly, A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014). doi:10.1038/ng.3050 Medline

36. E. Krissinel, K. Henrick, Multiple alignment of protein structures in three dimensions. CompLife 3695, 67–78 (2005).

37. E. Krissinel, K. Henrick, Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. D Biol. Crystallogr. 60, 2256–2268 (2004). doi:10.1107/S0907444904026460 Medline

38. S. J. Hubbard, J. M. Thornton, NACCESS-Computer Program. 1993. Department of Biochemistry and Molecular Biology, University College London.

39. W. S. J. Valdar, Scoring residue conservation. Proteins 48, 227–241 (2002). doi:10.1002/prot.10146 Medline

40. G. E. Crooks, G. Hon, J.-M. Chandonia, S. E. Brenner, WebLogo: A sequence logo generator. Genome Res. 14, 1188–1190 (2004). doi:10.1101/gr.849004 Medline

41. H. Kilpinen, A. Goncalves, A. Leha, V. Afzal, K. Alasoo, S. Ashford, S. Bala, D. Bensaddek, F. P. Casale, O. J. Culley, P. Danecek, A. Faulconbridge, P. W. Harrison, A. Kathuria, D. McCarthy, S. A. McCarthy, R. Meleckyte, Y. Memari, N. Moens, F. Soares, A. Mann, I. Streeter, C. A. Agu, A. Alderton, R. Nelson, S. Harper, M. Patel, A. White, S. R. Patel, L. Clarke, R. Halai, C. M. Kirton, A. Kolb-Kokocinski, P. Beales, E. Birney, D. Danovi, A.

http://dx.doi.org/10.1016/S0140-6736(13)61132-0


http://dx.doi.org/10.1186/s13059-016-0974-4




http://dx.doi.org/10.1038/ng1847


http://dx.doi.org/10.1093/bioinformatics/btw044






http://dx.doi.org/10.1107/S0907444904026460


http://dx.doi.org/10.1002/prot.10146


http://dx.doi.org/10.1101/gr.849004


I. Lamond, W. H. Ouwehand, L. Vallier, F. M. Watt, R. Durbin, O. Stegle, D. J. Gaffney, Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017). doi:10.1038/nature22403 Medline

42. M. Knapp, The transmission/disequilibrium test and parental-genotype reconstruction: The reconstruction-combined transmission/ disequilibrium test. Am. J. Hum. Genet. 64, 861–870 (1999). doi:10.1086/302285 Medline

43. D. Szklarczyk, J. H. Morris, H. Cook, M. Kuhn, S. Wyder, M. Simonovic, A. Santos, N. T. Doncheva, A. Roth, P. Bork, L. J. Jensen, C. von Mering, The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45 (D1), D362–D368 (2017). doi:10.1093/nar/gkw937 Medline

44. B. J. Klein, L. Piao, Y. Xi, H. Rincon-Arano, S. B. Rothbart, D. Peng, H. Wen, C. Larson, X. Zhang, X. Zheng, M. A. Cortazar, P. V. Peña, A. Mangan, D. L. Bentley, B. D. Strahl, M. Groudine, W. Li, X. Shi, T. G. Kutateladze, The histone-H3K4-specific demethylase KDM5B binds to its substrate and product through distinct PHD fingers. Cell Reports 6, 325–335 (2014). doi:10.1016/j.celrep.2013.12.021 Medline

45. O. Delaneau, J.-F. Zagury, J. Marchini, Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013). doi:10.1038/nmeth.2307 Medline

46. P. Mali, L. Yang, K. M. Esvelt, J. Aach, M. Guell, J. E. DiCarlo, J. E. Norville, G. M. Church, RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013). doi:10.1126/science.1232033 Medline

47. K. Gapp, J. Bohacek, J. Grossmann, A. M. Brunner, F. Manuella, P. Nanni, I. M. Mansuy, Potential of Environmental Enrichment to Prevent Transgenerational Effects of Paternal Trauma. Neuropsychopharmacology 41, 2749–2758 (2016). doi:10.1038/npp.2016.87 Medline

48. C. Dias, S. B. Estruch, S. A. Graham, J. McRae, S. J. Sawiak, J. A. Hurst, S. K. Joss, S. E. Holder, J. E. V. Morton, C. Turner, J. Thevenon, K. Mellul, G. Sánchez-Andrade, X. Ibarra-Soria, P. Deriziotis, R. F. Santos, S.-C. Lee, L. Faivre, T. Kleefstra, P. Liu, M. E. Hurles, S. E. Fisher, D. W. Logan; DDD Study, BCL11A Haploinsufficiency Causes an Intellectual Disability Syndrome and Dysregulates Transcription. Am. J. Hum. Genet. 99, 253–274 (2016). doi:10.1016/j.ajhg.2016.05.030 Medline

49. A. Csibi, K. Cornille, M.-P. Leibovitch, A. Poupon, L. A. Tintignac, A. M. J. Sanchez, S. A. Leibovitch, The translation regulatory subunit eIF3f controls the kinase-dependent mTOR signaling required for muscle differentiation and hypertrophy in mouse. PLOS ONE 5, e8994 (2010). doi:10.1371/journal.pone.0008994 Medline

50. M. C. Frank, M. Braginsky, D. Yurovsky, V. A. Marchman, Wordbank: An open repository for developmental vocabulary data. J. Child Lang. 44, 677–694 (2017). doi:10.1017/S0305000916000209 Medline



http://dx.doi.org/10.1086/302285


http://dx.doi.org/10.1093/nar/gkw937


http://dx.doi.org/10.1016/j.celrep.2013.12.021


http://dx.doi.org/10.1038/nmeth.2307


http://dx.doi.org/10.1126/science.1232033


http://dx.doi.org/10.1038/npp.2016.87




http://dx.doi.org/10.1371/journal.pone.0008994


http://dx.doi.org/10.1017/S0305000916000209


51. M. E. Smith, G. Lecker, J. W. Dunlap, E. E. Cureton, The Effects of Race, Sex, and Environment on the Age at Which Children Walk. Pedagog. Semin. J. Genet. Psychol. 38, 489–498 (1930). doi:10.1080/08856559.1930.10532284

52. G. L. Yamamoto, M. Aguena, M. Gos, C. Hung, J. Pilch, S. Fahiminiya, A. Abramowicz, I. Cristian, M. Buscarilli, M. S. Naslavsky, A. C. Malaquias, M. Zatz, O. Bodamer, J. Majewski, A. A. L. Jorge, A. C. Pereira, C. A. Kim, M. R. Passos-Bueno, D. R. Bertola, Rare variants in SOS2 and LZTR1 are associated with Noonan syndrome. J. Med. Genet. 52, 413–421 (2015). doi:10.1136/jmedgenet-2015-103018 Medline

53. N. A. Akawi, F. Al-Jasmi, A. M. Al-Shamsi, B. R. Ali, L. Al-Gazali, LINS, a modulator of the WNT signaling pathway, is involved in human cognition. Orphanet J. Rare Dis. 8, 87 (2013). doi:10.1186/1750-1172-8-87 Medline

54. H. Najmabadi, H. Hu, M. Garshasbi, T. Zemojtel, S. S. Abedini, W. Chen, M. Hosseini, F. Behjati, S. Haas, P. Jamali, A. Zecha, M. Mohseni, L. Püttmann, L. N. Vahid, C. Jensen, L. A. Moheb, M. Bienek, F. Larti, I. Mueller, R. Weissmann, H. Darvish, K. Wrogemann, V. Hadavi, B. Lipkowitz, S. Esmaeeli-Nieh, D. Wieczorek, R. Kariminejad, S. G. Firouzabadi, M. Cohen, Z. Fattahi, I. Rost, F. Mojahedi, C. Hertzberg, A. Dehghan, A. Rajab, M. J. S. Banavandi, J. Hoffer, M. Falah, L. Musante, V. Kalscheuer, R. Ullmann, A. W. Kuss, A. Tzschach, K. Kahrizi, H. H. Ropers, Deep sequencing reveals 50 novel genes for recessive cognitive disorders. Nature 478, 57–63 (2011). doi:10.1038/nature10423 Medline

http://dx.doi.org/10.1080/08856559.1930.10532284

http://dx.doi.org/10.1136/jmedgenet-2015-103018


http://dx.doi.org/10.1186/1750-1172-8-87




supplementary material for...had undergone dna capture with either the agilent sureselect human all...

Documents