Page | 1

CentoMD® 3.0 Handbook_V1_May2016

Handbook

Precautions/warnings:

For professional use only.

To support clinical diagnosis.

Contents

Introduction

Intended use

Facts and Features

Technologies used

Data acquisition and curation policy

Database curators

Data acquisition

Curation workflow

Quality status

Variant-related information

Genetic variants

Variant location

Variant type on DNA level

Coding effect

Variant zygosity

Allele frequency at CentoMD®

Publication status

Clinical significance according to CentoMD®

Information on disease and inheritance

Individual-related information on phenotype and demographics

Clinical statement of CENTOGENE AG

Appendix

Abbreviations used in CentoMD® 3.0

Evidence-based annotation rules to determine the clinical statement

Glossary

Introduction

Diagnosing a patient with a rare disease is a complex task because not all existing genetic

variants have been described or precisely annotated. Medical professionals need to obtain

all available knowledge about the genetic variants detected in a patient in order to

establish the most accurate diagnosis possible.

CentoMD® is a holistic database that combines phenotype and genotype information

gathered from genetic tests conducted at CENTOGENE AG. This means that every variant

reported in CentoMD® is linked to at least one clinically described individual analyzed

through a standardized workflow with accredited quality. CentoMD® is a

growing database; newly generated data are imported quarterly.

This handbook describes the content of CentoMD®, how this content is generated, how

clinical significance classes are defined, and how quality standards are fulfilled. The

accompanying CentoMD® user guide provides a detailed description of how to use this

web-based database.

Intended use

CentoMD® is browser-based software that supports a comprehensive and unique

repository of genetic and clinical information based on patients' diagnoses. It aids

medically trained professionals in the evaluation of the genetic variants that have been

identified in their own patients. This enhances the validity of the genetic analytical

workflow and aids the clinicians in evaluating treatment options for patients with

hereditary diseases. It correlates the clinical information of consented patients and

probands of different ethnic backgrounds with a large dataset of genetic variants and

biomarkers (where available). The genetic variants are detected utilizing accredited

laboratory technologies for Sanger, NGS and WES sequencing as well as insertion and

deletion analysis by MLPA/qPCR.

Facts and Features

CentoMD® provides detailed information on variants detected in consented individuals

who were referred for genetic testing by their physicians in order to evaluate whether

they are affected by or are carriers of mutations which cause rare hereditary diseases.

This patient cohort is a unique representation of the global population originating from

more than 100 countries. The allele frequencies stated in CentoMD® reflect the

frequency observed in this particular worldwide cohort. For every analyzed individual,

CentoMD® provides information about the genotype-phenotype correlation based on

tested clinical cases. Therefore, all genetic variants are associated with epidemiological

data and clinical information – such as signs and symptoms of the disease – if described by

the physician.

CentoMD® 3.0 contains more than 40,000 variants which are classified and curated (see

Variant quality status). In total, ~2,900 phenotypes and ~200 million alleles have been

identified in >74,000 screened individuals. The current release contains more than

11,000 HPO (Human Phenotype Ontology) terms and approximately 23,000

individual-HPO term associations.

CentoMD® 3.0 provides the following key features:

o Classified variants and WES variants are now integrated in the Genotype to Phenotype

module: Based on approved gene symbols given by the users, CentoMD® provides

detailed data on corresponding genetic variants and the associated epidemiological

data and clinical information following HPO nomenclature.

o Advanced Phenotype to Genotype module: Based on HPO terms provided by the

users, CentoMD® provides hints on candidate genes and related variants underlying

the phenotype of interest.

o Interactive search interface: Users can search, sort, filter and access specific

data content with simple clicks.

o For clinically relevant (CRV) and uncertain (VUS) variants, data can be retrieved at

4 different levels: variant rationale, curated individuals, statistics, and individual

view. Users can see the reasons behind the variant classification, and view

statistics and detailed individual-related data.

Rationale: Summary supporting the clinical significance according to the ACMG

guidelines and internal evidence. In the current release, ~3,700 CRV/VUS are

linked with rationales.

Curated individuals: Detailed information on curated individuals tested

positive for the variant of interest.

Statistics: Statistical analyses of curated individuals tested positive for the

variant of interest.

Individual view: Information on individuals (curated and uncurated) tested

positive for the variant of interest as well as classified and curated and/or

classified variants associated with each individual.

o Co-occurrences are indicated: Users can view the association of the variant of

interest with other CRV/VUS variants in the same gene or other genes.

o Data export functions: Users can export data into a read-only Excel file.

o The annotation and classification of genetic variants are strictly curated by medical

professionals: Users have access to high quality data.

o 56% of CentoMD® classified and curated CRV/VUS variants are unpublished: Users

can access data of CRV/VUS which have not been previously published in the

literature.

o Users are notified when variants are re-classified: Users get the latest information

on the clinical significance class of variants of interest.

Technologies used

The following validated technologies are used at CENTOGENE AG to detect changes at

the genetic level and to identify the cause of the disease:

o Sanger: Classical method of DNA sequencing, developed by Fred Sanger, using

chemically altered "dideoxy" bases to terminate newly synthesized DNA fragments

at specific bases (either A, C, T, or G). These fragments are then size-separated,

and the DNA sequence can be read.

o NGS: Next-Generation Sequencing: High-throughput sequencing technology,

allowing the parallel sequencing of multiple genes, producing thousands or millions

of sequences concurrently.

o qPCR: Quantitative Polymerase Chain Reaction: Method to amplify and

simultaneously quantify a targeted DNA molecule. Used especially for detecting

large/gross deletions/duplications and gene rearrangements.

o MLPA: Multiplex Ligation-dependent Probe Amplification: Variation of the

multiplex PCR that permits multiple targets to be amplified with only a single

primer pair. Used especially for detecting large/gross deletions/duplications and gene rearrangements.

o Other method: Used when another methodology has been employed to detect the

variants (e.g. fragment length analysis).

o WES: Whole Exome Sequencing: Brute-force approach that involves modern-day

sequencing technology and DNA sequence assembly tools to piece together all

coding portions of the genome. The sequence is then compared to a reference

genome and any differences are noted.

Interpretations of the enzymatic activities and biomarker levels are provided, when

available, as supporting evidence for the relevance of the detected genetic change. For

example, for Fabry disease, which is an X-linked rare genetic lysosomal storage disease,

measurements of enzymatic activities are conducted in males, and measurements of the

biomarker levels are conducted in both males and females.

The terms used to describe the results of biochemical analyses are explained below:

o Biochemical analysis: Method to analyze enzymatic activity or levels of biomarkers

in samples obtained from patients usually suspected of being affected by a

metabolic disorder. This is a test performed via Tandem Mass Spectrometry to

detect, diagnose, and monitor diseases, disease processes, and susceptibility, and

to determine a course of treatment.

o Biomarker interpretation: Evaluation of the biomarker levels compared to the

reference interval.

Normal: Biomarker levels are within the normal range (no change).

Pathological: Biomarker levels are significantly increased compared to the

normal range.

Slightly decreased: Biomarker levels are only slightly decreased compared to

the normal range.

Slightly increased: Biomarker levels are only slightly increased compared to

the normal range.

o Enzyme interpretation: Evaluation of the enzyme activity compared to the

reference interval.

Normal: Levels of activity are within the normal range (no change).

Pathological: Levels of activity are significantly decreased compared to the

normal range.

Slightly decreased: Levels of activity are only slightly decreased compared

to the normal range.

Slightly increased: Levels of activity are only slightly increased compared to

the normal range.

Data acquisition and curation policy

Curation is the process of collection, association, update and review of genetic and

phenotypic data of patients genetically analyzed at CENTOGENE AG into a structured and

standardized format. It utilizes a combination of computer-based tools and manual

review in order to assure the accuracy, efficiency and quality of the curation process.

Database curators

CentoMD® curators are biologists with a strong background in human genetics. They

continuously undergo extensive training to ensure curation consistency and

standardization. They verify that CentoMD® entries are error-free (items properly associated

and interpreted, with no inconsistencies or discrepancies against observations made in

house and in external sources), and close the curation process by manually approving that

the reviewed and curated data agree with the standard procedures established in house.

Data acquisition

Data gathering and variant curation are procedures developed and implemented in

web-based software that is compliant with the HGNC, HGVS and HPO nomenclatures, allowing

collection of variants detected in nuclear coding, nuclear non-coding and mitochondrial

genes. The software integrates in-house sample management systems and analysis

platforms, and additionally utilizes external databases, providing the curator with a

comprehensive and straightforward overview of the evidence regarding

genotype-phenotype correlation available in-house versus external information.

The data is gathered by a combination of manual submission and data import following an

individual-oriented model where characteristics belonging to a particular individual

(patient information, clinical data, methodology and detected genetic variants) are

stored and associated together.

Curation workflow

To provide high-quality data, the curation process at CENTOGENE AG is divided into 3

phases: variant-wise, individual-wise and warnings-wise procedures.

Curation by variant: To begin the curation process, the variant-linked information is

reviewed. This includes approval of variant nomenclature, terminology, accuracy,

consistency and record completeness.

Curation by individual: In order to start curation by individual, all variants detected in

this individual must be approved. It aims at assuring that the entries belonging to an

individual follow the rules for clinical statement closely, and that all associated data is in

agreement with the agreed guidelines. The following factors are considered as critical for

the clinical statement: variant clinical significance, patient genotype (number of

clinically relevant changes, their zygosity and location, i.e. cis vs. trans), inheritance

pattern of the disorder, the sex of the patient (for X-linked diseases), the phenotypic

description and, if available, levels of biomarkers.

Curation by warning: The database generates warnings at different levels (variant,

individual, gene, database levels) to detect errors, invalid terms and nomenclatures,

inconsistencies, and can provide hints where updates and reviews are necessary. Mostly

these warnings are due to additional evidence obtained internally (medical reports

issued at CENTOGENE AG) or detected externally (e.g. additional articles, publications

and external databases). Each warning is manually resolved. Whenever additional

evidence becomes available, the variants are revised and re-classified accordingly.

Quarterly, all approved individuals are anonymized and then released to CentoMD®,

offering the most complete and up-to-date information possible to its users.

CentoMD® is a constantly growing and enriched database. Whenever additional evidence

provided by the medical professionals in house or by peer-reviewed literature becomes

available, the variants are revised and re-classified accordingly. A detailed overview of

the clinical significance classes captured in CentoMD® is provided in the chapters

“Variant-related information” and “Clinical significance according to CentoMD®”.

Quality status

CentoMD® 3.0 offers a dataset of variants derived from the integration of classified

variants and WES variants and processed through a standardized workflow which follows

international standards and ensures high data quality. In CentoMD® 3.0, different types

of variant and individual quality status are indicated.

There are three types of variant quality status:

o Classified and curated (++): a variant has been assigned to a clinical significance

class and curated by strictly following the ACMG guidelines and internal expertise.

o Classified (+): a variant has been assigned to a clinical significance class according

to ACMG guidelines but has not yet been curated.

o Unclassified (0): a variant has not yet been assigned to any clinical significance

class due to the lack of information. Further evaluation is required.

There are two types of individual quality status:

o Curated (++): An individual associated with classified and curated CRV and/or VUS.

o Uncurated (+): An individual associated with classified (only) CRV and/or VUS or

with unclassified variants.
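
The two status scales above can be read as a small data model. The following Python sketch is illustrative only and is not part of CentoMD®: the names `VariantQuality`, `IndividualQuality` and `individual_quality` are hypothetical, and the rule that an individual counts as curated only when every associated variant is classified and curated is an assumption, drawn from the requirement that all variants detected in an individual be approved before curation by individual starts.

```python
from enum import Enum


class VariantQuality(Enum):
    """Variant quality status, using the handbook's symbols as values."""
    CLASSIFIED_AND_CURATED = "++"
    CLASSIFIED = "+"
    UNCLASSIFIED = "0"


class IndividualQuality(Enum):
    """Individual quality status, using the handbook's symbols as values."""
    CURATED = "++"
    UNCURATED = "+"


def individual_quality(variant_statuses):
    """Derive an individual's status from the statuses of its variants.

    Assumption (not stated explicitly in the handbook): an individual is
    curated only if it has at least one variant and all of its variants
    are classified and curated.
    """
    statuses = list(variant_statuses)
    if statuses and all(
        status is VariantQuality.CLASSIFIED_AND_CURATED for status in statuses
    ):
        return IndividualQuality.CURATED
    return IndividualQuality.UNCURATED
```

Under this reading, a single variant that is only classified (+) or unclassified (0) is enough to keep the individual in the uncurated (+) state.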

Variant-related information

Genetic variants

CentoMD® includes genetic variants detected in all types of genes. A gene is defined by a

sequence of DNA that represents a basic unit of heredity, being expressed in RNA and

proteins.

o Mitochondrial: A gene located in the mitochondria.

o Nuclear coding: A gene located in the cell nucleus of a eukaryote that encodes a

protein.

o Nuclear non-coding: A gene located in the cell nucleus that does not encode a

protein product.

In CentoMD®, each gene is linked with a transcript or reference sequence, i.e. a digital

nucleic acid sequence, assembled by scientists as a representative example of a species'

set of genes. Coding DNA reference sequence refers to a cDNA-derived sequence

containing the full length of all coding regions and non-coding untranslated regions.

According to the reference sequence used, the genetic variants are linked with the

corresponding location within the gene, with a particular mutation type on three

different levels: genomic/mitochondrial, cDNA, and protein, closely following the HGVS

guidelines and recommendations, for both small changes and gross changes/gene rearrangements.

o Genomic DNA change: Change at gDNA level following numbering based on genomic

DNA reference sequence.

o cDNA change: Change at cDNA level following numbering based on coding DNA

reference sequences.

o Protein change: Change at protein level following numbering based on the amino

acid sequence, using the one-letter amino acid code and X to designate a translation

termination codon.

Variant location

Variant location refers to the location of the DNA change relative to the transcriptional

initiation site, initiation codon, polyadenylation site, or termination codon of the

corresponding gene.

o Upstream: The region located 5' (upstream) from the 5'UTR region of the gene.

o 5'UTR (5'-Untranslated Region): Sequence at the 5' end of messenger RNA (mRNA)

that is not translated into protein. It extends from the transcription start site to just

before the ATG translation initiation codon. 5' UTR may contain sequences that

regulate translation efficiency or mRNA stability.

o Exon: The protein-coding DNA sequences of the gene.

o Intron: The non-coding regions of a gene that interrupt the protein coding regions

(exons).

o 3'UTR (3' Untranslated Region): Particular section of mRNA that starts with the

nucleotide immediately following the stop codon of the coding region. This region

contains transcription and translation regulating sequences.

o Downstream: The region located 3' (downstream) from the polyadenylation signal of

the gene.

For large deletions/duplications and gene rearrangements, the location is indicated by

the first and the last exon affected by the change (for example, e1_e9 stands for a large

deletion/duplication affecting exon 1 to exon 9). If, for example, only one exon is linked

with a large deletion, this indicates that the particular exon is completely removed (see

mutation types below).

Please note that for mitochondrial genes, only the following locations are valid:

upstream, exon 1, and downstream.

For nuclear non-coding genes, 5’UTR and 3’UTR are invalid entries.
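
The exon-range notation described above (for example, e1_e9) can be unpacked mechanically. The following Python sketch is a minimal illustration, not CentoMD® code: the function names `parse_exon_span` and `exons_affected` are hypothetical, and the spelling of a single affected exon as a lone "e3" is an assumption, since the handbook only shows the two-exon form.

```python
import re

# Hypothetical pattern: "e<first>_e<last>" as in the handbook's "e1_e9" example,
# or a lone "e<n>" for a single affected exon (the single-exon spelling is assumed).
_EXON_SPAN = re.compile(r"e(\d+)(?:_e(\d+))?")


def parse_exon_span(label):
    """Return (first_exon, last_exon) for a location label such as 'e1_e9'."""
    match = _EXON_SPAN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not an exon span: {label!r}")
    first = int(match.group(1))
    # A label without a second exon is read as a single-exon event.
    last = int(match.group(2)) if match.group(2) else first
    if last < first:
        raise ValueError(f"last exon precedes first exon: {label!r}")
    return first, last


def exons_affected(label):
    """Number of exons covered by the span, counting both ends."""
    first, last = parse_exon_span(label)
    return last - first + 1
```

For instance, `parse_exon_span("e1_e9")` yields the pair (1, 9), matching the handbook's reading of a change affecting exon 1 to exon 9.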

Variant type on DNA level

The variant type describes the different types of changes that can occur in the DNA

sequence. The following types are included in CentoMD®:

o Chromosomal deletion: Loss of parts of chromosomes.

o Complex rearrangement: A change involving the structure or number of the

chromosomes; also referred to as a chromosome mutation or chromosomal

rearrangement.

o Conversion: Non-reciprocal transfer of information between homologous

sequences; one DNA sequence replaces a homologous sequence such that the

sequences become identical after the conversion event.

o Deletion: An abnormality in which part of a chromosome (carrying genetic

material) is lost.

o Duplication: Duplication of a sequence of DNA or section of chromosome.

o Gain of methylation: Increase above the normal DNA methylation level.

o Gene & regulatory region(s) deletion: Refers to loss of the entire gene and flanking

regions.

o Gene & regulatory region(s) duplication: Refers to the gain of the entire gene and

flanking regions.

o Gene deletion: Refers to loss of the entire gene.

o Gene duplication: Refers to gain/duplication of the entire gene.

o Gross deletion: Refers to loss of parts of a gene.

o Gross duplication: Refers to gain/duplication of part(s) of a gene.

o Gross inversion: Refers to 180-degree inversion of part(s) of a gene.

o Insertion/Deletion (Indel): Refers to the mutation class that includes a combination

of both insertions and deletions.

o Insertion: Genetic mutation where one or more nucleotides are added (inserted)

into a DNA sequence; it may also involve portions of a chromosome.

o Inversion: Chromosomal abnormality where a segment of a chromosome is rotated

180 degrees and reinserted.

o Loss of methylation: Loss of the normal DNA methylation level.

o Other/complex: Refers to all other types not included in any category.

o Pathological allele (D4Z4 motif): Deletion of 3.3-kb repeats from a chromosomal

tandem repeat called D4Z4 located near the end of chromosome 4 at the 4q35-ter

location. D4Z4 is a large polymorphic repeat structure consisting of 1–100 KpnI units,

each containing an ORF encoding a putative homeobox protein called DUX4.

o Repeat expansion: Refers to an increased number of repeats of a tandemly repeated

genomic DNA sequence.

o Retrotransposon insertion: Retrotransposons (also called transposons via RNA

intermediates) are genetic elements that can amplify themselves in a genome, and

can induce mutations by inserting near or within genes. Retrotransposon-induced

mutations are relatively stable, because the sequence at the insertion site is

retained as they transpose via the replication mechanism.

o Substitution: A sequence change where one nucleotide is replaced by one other

nucleotide. Substitutions are described using a ">"-character (indicating "changes

to").
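
To make the ">" notation for substitutions concrete, the following Python sketch splits a simple description into its parts. It is an illustration under stated assumptions, not the parser used by CentoMD®: the function name `parse_substitution` is hypothetical, the description `c.76A>T` is an invented example, and only plain positive positions with the HGVS prefixes c., g. and m. are handled (intronic offsets and UTR positions are left out).

```python
import re

# Simplified pattern: prefix (c, g or m), a plain positive position, then REF>ALT.
# Intronic offsets (e.g. c.88+1G>T) and UTR positions (e.g. c.-14G>C) are
# deliberately not handled in this sketch.
_SUBSTITUTION = re.compile(r"([cgm])\.(\d+)([ACGT])>([ACGT])")


def parse_substitution(description):
    """Split a simple substitution such as 'c.76A>T' into its components."""
    match = _SUBSTITUTION.fullmatch(description)
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    level, position, ref, alt = match.groups()
    if ref == alt:
        # A substitution replaces one nucleotide by a different one.
        raise ValueError(f"reference and alternative are identical: {description!r}")
    return {"level": level, "position": int(position), "ref": ref, "alt": alt}
```

Here `parse_substitution("c.76A>T")` reads the change as "A changes to T at position 76 of the coding DNA reference sequence".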

Coding effect

The coding effect describes the sequence changes at protein level. The following types

are distinguished:

o Effect unknown: The coding effect on protein level has not been analyzed. An

effect is expected but difficult to predict.

o Frameshift: Special type of amino acid deletion/insertion affecting an amino acid

between the first (initiation, ATG) and last codon (termination, stop), replacing

the normal C-terminal sequence with one encoded by another reading frame.

o Increased polyglutamine tract/expanded polyQ: Expansion of the portion of a protein

that consists of a sequence of several glutamine (Gln; Q) units.

o In-frame: A mutation that does not cause a shift in the triplet reading frame.

o Missense: Point mutation in which a single nucleotide change results in a codon

that codes for a different amino acid. Not all missense mutations are deleterious;

some changes can have no effect. Because of the ambiguity of missense mutations,

it is often difficult to interpret the consequences of these mutations in causing

disease.

o New translation initiation site: A change affecting the translation initiation codon

(Met-1) introducing a new upstream initiation codon extending the N-terminus of

the encoded protein.

o Non-coding: The change on DNA level produces no effect on protein, or the effect

of regulatory mutations is unknown.

o Nonsense: Point mutation in a sequence of DNA that results in a premature stop

codon, and in a truncated, incomplete protein product.

o Silent: A form of point mutation at DNA level resulting in a codon that codes for

the same amino acid without any functional change in the protein product.

o Splicing mutation: DNA changes affecting the splicing process (i.e. intron removal

and exons joining). Splice-site mutations occur within genes in the noncoding

regions (introns) just next to the coding regions (exons). Splice-site mutations can

eliminate an existing donor or acceptor site, which will cause an exon to be

skipped and possibly result in a frameshift.

o Start loss: A start-loss mutation is a point mutation in the ATG start codon that

prevents the original start translation site from being used. This kind of mutation

is expected to eliminate gene function.

o New translation termination codon: A change affecting the translation termination

codon (Ter/*) introducing a new downstream termination codon extending the C-

terminus of the encoded protein.

Variant zygosity

Zygosity indicates if a variant is detected on one chromosome or on both chromosomes

and therefore describes the degree of similarity of the alleles for a trait in an organism.

The following zygosities are included in CentoMD®:

o Heterozygous (Het): A gene locus at which the cells contain two different alleles of a gene.

o Homozygous (Hom): A gene locus at which identical alleles of the gene are present on both

homologous chromosomes.

o Hemizygous (Hem): Used for alleles detected in genes located on the X chromosome

in males.

For the mitochondrial variants, the zygosity must be read as the degree of heteroplasmy,

i.e. as a mixture of more than one type of mitochondrial DNA (mDNA) within a

cell/individual. In those cases where a mutation in mDNA is responsible for a disease, the

larger the proportion of mutant mitochondria, the more likely the person will show

symptoms of the disease.

Two degrees of heteroplasmy are included:

o Heteroplasmic: The cell has some mitochondria that have a mutation in the mDNA

and some that do not.

o Homoplasmic: The cell has a uniform collection of mDNA: either completely normal

mDNA or completely mutant mDNA.

Allele frequency at CentoMD®

This number indicates the allele frequency of a particular variant which was observed at

CENTOGENE AG in comparison to the total number of analyzed individuals.
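
As a rough illustration of the ratio described above, the following Python sketch divides the number of observed variant alleles by the number of alleles screened. The handbook does not state the exact denominator, so the default of two alleles per analyzed individual (a diploid, autosomal locus) is an assumption, as is the function name `allele_frequency`; this is not CentoMD®'s documented calculation.

```python
def allele_frequency(variant_allele_count, individuals_analyzed, alleles_per_individual=2):
    """Observed allele frequency of a variant within a screened cohort.

    Assumption: each analyzed individual contributes `alleles_per_individual`
    alleles (2 for an autosomal locus); pass 1 for loci present in one copy.
    """
    if individuals_analyzed <= 0:
        raise ValueError("at least one individual must have been analyzed")
    total_alleles = individuals_analyzed * alleles_per_individual
    if not 0 <= variant_allele_count <= total_alleles:
        raise ValueError("variant allele count is outside the possible range")
    return variant_allele_count / total_alleles
```

Because the frequency is computed on the tested cohort, it reflects individuals referred for genetic testing rather than the general population.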

Publication status

The publication status indicates if the identified variant has previously been published in

the literature as a disease-causing variant or not.

Additionally, the Single Nucleotide Polymorphism Database (dbSNP) ID is provided, if

available. The dbSNP is an archive of genetic variations within and across different

species developed and hosted by the National Center for Biotechnology Information

(NCBI) in collaboration with the National Human Genome Research Institute (NHGRI) and

available to the public.

Clinical significance according to CentoMD®

The classification of genetic germline variants is done according to the ACMG guidelines

(Richards et al. (2015), Genet. Med., doi:10.1038/gim.2015.30 - except that neutral is

used instead of benign) with some modifications. These modifications arise from our

continuously growing internal expertise in the field of molecular diagnostics and are

represented mainly by new evidence regarding internally observed frequencies,

segregation data, genotype-phenotype correlation, co-occurrence, and enzymatic and

biomarker levels.

The detected genetic variants are first classified into one of the three classes concerning

their likelihood to predispose to or to cause the observed phenotype/disease (see Figure

1): clinically relevant variants (CRV), clinically irrelevant variants (CIV) and uncertain

variants (VUS).

The CRV class includes the following subclasses: pathogenic, likely pathogenic, risk

factors and modifiers. Classification is based on their impact on disease presence,

severity or increased susceptibility. The main adjustment to the ACMG guidelines concerns the

classification as pathogenic, which is only assigned to published variants for which there is

enough evidence for pathogenicity (e.g. loss-of-function (LoF) variants, found in at least

two unrelated patients, with well-established functional studies or biochemistry data).

Variants that were classified as pathogenic or likely pathogenic according to HGMD are

re-assessed by evaluation of the original papers and reclassified accordingly.

Novel variants (publication not available) are classified as likely pathogenic if they are

LoF variants, or missense variants leading to a novel amino acid change at a position where

a pathogenic variant was previously described (highly conserved and predicted as damaging

by in silico tools). De novo variants which lead to insertions or deletions within a

non-repetitive region, or to a de novo amino acid change (if highly conserved and

predicted as damaging by in silico tools), are likewise classified as likely pathogenic.

Additionally, variants found in at least 3

unrelated, similarly affected patients or in 2 unrelated similarly affected patients for

whom biochemical confirmation is available or familial segregation is present are also

classified as likely pathogenic.
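
The patient-count criterion in the last sentence above can be written as a short predicate. The following Python sketch covers only that one criterion and is not the full classification logic; the function and parameter names are hypothetical, and the counts are assumed to refer to unrelated, similarly affected patients as described in the text.

```python
def meets_patient_count_criterion(
    unrelated_affected_patients,
    biochemical_confirmation=False,
    familial_segregation=False,
):
    """Check the 'likely pathogenic' criterion based on patient counts.

    True if the variant was found in at least 3 unrelated, similarly affected
    patients, or in at least 2 such patients when biochemical confirmation is
    available or familial segregation is present.
    """
    if unrelated_affected_patients >= 3:
        return True
    supported = biochemical_confirmation or familial_segregation
    return unrelated_affected_patients >= 2 and supported
```

Two unrelated patients without biochemical or segregation support do not satisfy this criterion on their own; the variant may still qualify through one of the other rules in this section.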

Figure 1: Classification of genetic variants in CentoMD® 3.0. The classification rules determining the clinical significance of a genetic variant are provided in the text. CG: CENTOGENE.

The CIV class includes the following sub-classes: neutral, likely neutral, disease-associated

polymorphisms, and CENTOGENE (likely) neutral - published as (likely) pathogenic. Variants are

classified into this category based on their high frequency in population(s), no observed

impact on disease presence/severity/susceptibility, or detected non-segregation and/or

co-occurrence, etc. In addition, disease-associated polymorphisms are included for

disorders with known multigene, complex inheritance. Reported variants must have a

maximum MAF of 5% in public databases and the association should be replicated by at

least 2 independent studies or in 1 study with functional evidence. When the internal

evidence regarding the clinical significance of a variant is inconsistent compared to other

external resources, the sub-class “CENTOGENE (likely) neutral - published as (likely)

pathogenic” is used in order to emphasize the importance of this observation. Variants of

this category were detected in at least 2 unrelated, healthy/unaffected individuals

(taking into account, for example, the age at onset of the disease) or were found

in at least 1 patient (affected with another genetic disease) in whom a CRV has been

previously identified.
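
The inclusion criteria for disease-associated polymorphisms stated in this section (a maximum MAF of 5% in public databases, and replication by at least 2 independent studies or 1 study with functional evidence) can be sketched as follows. The function and parameter names are hypothetical, and the MAF is assumed to be given as a fraction between 0 and 1 rather than as a percentage.

```python
def qualifies_as_disease_associated_polymorphism(
    max_public_maf,
    independent_studies,
    functional_evidence=False,
):
    """Check the stated criteria for a disease-associated polymorphism.

    max_public_maf: highest minor allele frequency found in public databases,
        as a fraction (0.05 corresponds to 5%).
    independent_studies: number of independent studies reporting the association.
    functional_evidence: whether a reporting study includes functional evidence.
    """
    if not 0.0 <= max_public_maf <= 1.0:
        raise ValueError("MAF must be a fraction between 0 and 1")
    frequency_ok = max_public_maf <= 0.05
    replicated = independent_studies >= 2 or (
        independent_studies >= 1 and functional_evidence
    )
    return frequency_ok and replicated
```

Both conditions must hold: a well-replicated association does not qualify if the variant exceeds the 5% MAF ceiling, and a rare variant does not qualify on a single study without functional evidence.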

The VUS class includes rare variants reported or not in the literature with unknown risk of

developing/causing the disease, or when prediction software shows inconsistent effects

or, family studies did not support clear statement on its impact on the phenotype.

Variant re-evaluation and re-classification is a key feature of CentoMD® and is performed

regularly in the light of the literature, publicly available clinical databases and, most

importantly, CENTOGENE AG’s own continuously growing and improving

proprietary information.

Information on disease and inheritance

Every genetic disorder which has been suggested or suspected by the physician is

described according to the OMIM catalog. OMIM (Online Mendelian Inheritance in Man)

was developed for the World Wide Web by NCBI and contains a list of human genes and

genetic diseases with links to other relevant resources

(http://www.ncbi.nlm.nih.gov/omim). Every entry in OMIM is linked to a unique

identifier, which is also captured in CentoMD®.

Each genetic disorder is linked with the observed mode of inheritance (MOI). MOI is

defined by the manner in which a particular genetic trait or disorder is passed from one

generation to the next. The following MOIs are included in CentoMD®:

o Autosomal dominant (AD): The pattern of inheritance in which an affected

individual has one copy of a mutant gene and one normal gene on a pair of

autosomal chromosomes.

o Autosomal recessive (AR): The pattern of inheritance in which both copies of an

autosomal gene must be abnormal for a genetic condition or disease to occur.

o Digenic (Di): The pattern of inheritance that is similar to recessive inheritance,

except that the trait only develops when mutations are found in one copy of each

of the two independent genes simultaneously.

o Imprinting/Epigenetic (Imp/Epi): The pattern of inheritance by mechanisms that do not

directly involve nucleotide sequences, such as paramutations and parental

imprinting.

o Mitochondrial (Mito): The pattern of inheritance of a trait encoded in the

mitochondrial genome.

o Multifactorial (MF): The pattern of inheritance caused by the interplay between

genetic factors and environmental factors.

o Pseudoautosomal dominant (P-AD): The inheritance pattern seen with genes in the

pseudoautosomal region of the X and Y chromosome that can exchange regularly

between the two sex chromosomes. Alleles for genes in the pseudoautosomal

region can show male-to-male transmission, and therefore mimic autosomal

inheritance, because they can cross over from the X to the Y during male

gametogenesis and be passed on from a father to his male offspring.

o X-linked (X): The mode of inheritance in which a mutation in a gene on the X

chromosome causes the phenotype to be expressed in males who are hemizygous

for the mutated gene (i.e., they have only one X chromosome) and in females who

are homozygous for the mutated gene (i.e., they have a copy of the gene mutation

on each of their two X chromosomes). Carrier females who have only one copy of

the mutation do not usually express the phenotype, although differences in X-

chromosome inactivation can lead to varying degrees of clinical expression in

carrier females.

o Y-linked (Y): The pattern of inheritance that may result from a mutant gene

located on a Y chromosome. By definition, only males are affected.

o Unknown (?): This mode of inheritance is selected for genes not yet associated with

any pathological condition or disease, therefore no pattern of inheritance has been

observed.
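
For readers who process exported MOI values programmatically, the list above can be represented as a small enumeration. This is a minimal sketch: the class and function names are hypothetical, and only the abbreviations themselves are taken from the list.

```python
from enum import Enum


class MOI(Enum):
    """Modes of inheritance; each value is the abbreviation from the list above."""
    AUTOSOMAL_DOMINANT = "AD"
    AUTOSOMAL_RECESSIVE = "AR"
    DIGENIC = "Di"
    IMPRINTING_EPIGENETIC = "Imp/Epi"
    MITOCHONDRIAL = "Mito"
    MULTIFACTORIAL = "MF"
    PSEUDOAUTOSOMAL_DOMINANT = "P-AD"
    X_LINKED = "X"
    Y_LINKED = "Y"
    UNKNOWN = "?"


def parse_moi(abbreviation: str) -> MOI:
    """Look up an MOI by abbreviation; an unrecognised value raises ValueError."""
    return MOI(abbreviation.strip())
```

Looking up by value keeps abbreviations such as "Imp/Epi" and "?" usable even though they are not valid Python identifiers.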

Individual-related information on phenotype and demographics

All patient data in CentoMD® is fully anonymized. The following epidemiological and

clinical data are reported for individuals associated with classified and curated CRV

and/or VUS in CentoMD®:

o Random patient ID: Unique identifier assigned to each consented individual in

CentoMD®.

o Finding: Indicates if a variant is related to the indication for testing. Primary findings

are variants related to the indication for testing. Incidental findings are derived from

whole exome sequencing (WES) and are pathogenic or likely pathogenic variants

identified in genes for which incidental findings are reported, based on the ACMG

recommendations for reporting of incidental findings in clinical exome and genome

sequencing (Genetics in Medicine, 2013). Incidental findings are unrelated to the

indication for testing.

o OMIM disease: OMIM number of the disease suspected by the corresponding physician

according to the clinical symptoms.

o MOI: Mode of inheritance. It is defined by the manner in which a particular genetic

trait or disorder is passed from one generation to the next.

o Anonymized random family number (ARFN): Unique family number used to keep all

members together when relationship links are provided.

o Pedigree: Indicates the connection/relation among individuals by blood, marriage, or

adoption. Based on the ARFN and the relationships within one family, it is possible to

reconstruct the family trees accordingly. In each family, the index patient is

indicated. The index patient represents the affected individual through whom the

family with a genetic disorder is first diagnosed.

o Sex: Indicates the biological state of the individual of being male, female or unknown

sex (when no information was provided or a prenatal case was analyzed).

o Age: Age at diagnosis. It is calculated as date of sample entry at CENTOGENE AG

minus date of birth, and is expressed in years. For patients referred to CENTOGENE AG

several times, the date of the first order entry is used by default to calculate the age

at diagnosis.

o Country: Country of sample origin. It indicates the area of the world the patient is

coming from. The basis for this information is the country from which the sample has

been sent to CENTOGENE AG. If the physician provides information about the ethnicity of

the patient (e.g. Canadian citizen of German origin), then the country of origin (in this

case Germany) is selected instead.

o Region: Continental region the patient is coming from.

o Clinical information (HPO terms): Description of features and characteristics that the

corresponding physician has provided as supporting evidence of the presence of a

particular disease translated into the vocabulary defined by the HPO

(http://www.human-phenotype-ontology.org/) by medical experts.

Sometimes it is not possible to describe the clinical picture accurately, because the

details are not given by the physician or only general assumptions have been made.

Such cases are documented in CentoMD® in the following manner:

No information/unknown: selected when no clinical information has been

provided;

Not affected/asymptomatic: selected when the physician has explicitly

indicated that the person is healthy, asymptomatic, or not affected;

Suspected/affected: selected when only very general statements are provided

by the physician (e.g. “patient suffering from Breast Cancer” or “clinical

features of Parkinson”).

o Variant zygosity: Indicates whether the variant is detected on one chromosome or on both

chromosomes.

o Total number of variants: Total number of detected variants for this case (clinically

relevant; clinically irrelevant) on this particular gene. For example, “10 (1 ; 9)” is to

be interpreted as follows: the total number of variants that were identified in this

proband/patient for this particular gene is 10, one of which is clinically relevant,

while 9 are clinically irrelevant variants.

o Genotype: Genetic constitution of this case with respect to the number of alleles and

their clinical significance for this particular gene.

o Enzyme and Biomarker interpretation: Interpretation of the enzyme activity and

biomarker levels compared to the reference interval.

o Clinical statement: The finding or the conclusion of the molecular genetic test

conducted at CENTOGENE AG.

o Sample type: Includes DNA, Cells, Tissue, Blood, DBS (dried blood spot), AF/CV

(amniotic fluid/ chorionic villi).

o Age at onset: Refers to the age at which an individual acquires, develops, or first

experiences a condition or symptoms of a disease or disorder.

o Carrier testing: Indicates whether the individual requested carrier

screening after a specific genetic variant had already been detected in other

family members.

o Consanguineous parents: Refers to the marriage between two genetically related

persons.

o Family history: Indicates the presence or the absence of a particular disorder or

symptomatology in blood relatives of a patient.

o Detailed family history: Detailed description of disorders from which direct blood

relatives of the patient have suffered.
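
Two of the fields described above lend themselves to a short worked example: the age at diagnosis (date of first sample entry minus date of birth, in years) and the "Total number of variants" notation such as "10 (1 ; 9)". The sketch below is illustrative only. The function names are hypothetical, the rounding to completed years is an assumption (the handbook only says the age is expressed in years), and the exact string format accepted by the parser is inferred from the single example given.

```python
import re
from datetime import date
from typing import List, Tuple


def age_at_diagnosis(date_of_birth: date, sample_entry_dates: List[date]) -> int:
    """Completed years between birth and the FIRST sample entry (assumed rounding)."""
    first_entry = min(sample_entry_dates)  # first order entry is used by default
    years = first_entry.year - date_of_birth.year
    # Subtract one year if the birthday has not yet been reached in that year.
    if (first_entry.month, first_entry.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def parse_total_variants(text: str) -> Tuple[int, int, int]:
    """Parse 'total (relevant ; irrelevant)' into three integers."""
    match = re.fullmatch(r"\s*(\d+)\s*\(\s*(\d+)\s*;\s*(\d+)\s*\)\s*", text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    total, relevant, irrelevant = (int(g) for g in match.groups())
    if total != relevant + irrelevant:
        raise ValueError("total does not equal relevant + irrelevant")
    return total, relevant, irrelevant
```

For the example from the text, `parse_total_variants("10 (1 ; 9)")` yields a total of 10 variants, 1 clinically relevant and 9 clinically irrelevant.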

Clinical statement of CENTOGENE AG

The clinical statement is the finding or the conclusion of the molecular genetic test

conducted at CENTOGENE AG. The clinical statement may confirm or disprove the

suspected diagnosis, or serve to elucidate the genetic cause of an uncertain or

questionable condition or disease. When deriving the clinical statement, the following

factors are considered:

o The mode of inheritance of the disorder

o The patient’s genotype

o The clinical significance of all identified genetic variants

o The clinical data provided, if available

o Additionally, sex and/or biochemical evidence, if applicable

The evidence-based rules determining the clinical statement are summarized in

Table 1 and Figure 2. The following clinical statements are used in CentoMD®:

o Affected: Indicates an individual for whom the rules applied to determine the clinical

statement confirmed the suspected diagnosis.

o Probably affected: Refers only to Fabry male patients carrying a VUS associated

with only pathological enzymatic levels, but not with pathological biomarker

levels. Identification of males carrying that particular VUS with pathological

biomarker levels induces the VUS re-classification into a likely pathogenic variant.

o At least carrier: Describes a patient suspected for a disease with autosomal

recessive mode of inheritance, who carries one CRV or VUS.

o Probably carrier: Indicates a carrier of a VUS, either an individual screened for a recessive

disorder or a female screened for an X-linked disorder.

o Carrier: An individual who is heterozygote or other/complex (like 2 heterozygous

mutations located in cis) for an autosomal recessive disorder. This statement is not

accepted for autosomal dominant disorders.

o Increased risk of developing the disease: Describes an individual carrying the

disease-causing mutation(s) where either the clinical details were not provided or

the patient is too young to develop the disorder. Usually used for late-onset

disorders.

o Uncertain: Indicates an individual carrying genetic variant(s) with unknown clinical

significance.

o Unaffected: Indicates an individual for whom susceptibility to the disease was not

confirmed with respect to the screened gene.

For example, for an autosomal dominant disorder where the patient’s genotype is heterozygote,

meaning the patient carries one clinically relevant variant (except VUS), the expected clinical statement is

either “Affected” or “Increased risk of developing the disease” (according to the provided

clinical information).
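
The example above (heterozygote genotype, autosomal dominant disorder) can be sketched as a small decision function. This is not CENTOGENE's implementation. The function name and the encoding of the clinical information as True/False/None are hypothetical. The set of variant classes is taken from the Table 1 footnote on "Path", and the combined result returned when no clinical information is available is an assumption based on one reading of Table 1.

```python
from typing import Optional

# Classes grouped as "Path" in the Table 1 footnotes.
PATH_LIKE = {"pathogenic", "likely pathogenic", "risk factor"}


def statement_ad_heterozygote(significance: str,
                              symptoms_present: Optional[bool]) -> str:
    """Clinical statement for a heterozygote in an autosomal dominant disorder.

    symptoms_present: True (signs and symptoms reported), False (healthy or
    unaffected) or None (no clinical information provided).
    """
    if significance == "uncertain":
        # A VUS is excluded from the rule in the text and yields "Uncertain".
        return "Uncertain"
    if significance not in PATH_LIKE:
        raise ValueError(f"unsupported significance: {significance!r}")
    if symptoms_present is True:
        return "Affected"
    if symptoms_present is False:
        return "Increased risk of developing the disease"
    # ASSUMPTION: without clinical information both statements remain possible.
    return "Affected / increased risk"
```

Other genotypes and modes of inheritance follow the rules in Table 1 and Figure 2 and are deliberately not covered by this sketch.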

Appendix

Abbreviations used in CentoMD® 3.0

Evidence-based annotation rules to determine the clinical statement

(next 2 pages)

MOI Mode of Inheritance

Abbreviation Definition

AD Autosomal dominant

AR Autosomal recessive

Di Digenic

Imp/Epi Imprinting/Epigenetic

Mito Mitochondrial

MF Multifactorial

P-AD Pseudoautosomal dominant

X X-linked

Y Y-linked

? Unknown

Genotype

Abbreviation Definition

Comp Het Compound heterozygote

Hem Hemizygote

Het Heterozygote

Hom Homozygote

Other Other/complex

WT Wild type

Zygosity

Abbreviation Definition

Hem Hemizygous

Het Heterozygous

Hom Homozygous

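Genotype 1) | MOI 2) (AD, AR, X-linked 7)) | Significance 3) (Path 5), VUS 6)) | Significance 2 3) | CI 4) (-, +, ?) | Clinical statement
Hom/Hem | AD | Path | | - | increased risk
Hom/Hem | AD | Path | | + | affected
Hom/Hem | AD | Path | | ? | affected / increased risk
Hom/Hem | AD | VUS | | - | uncertain
Hom/Hem | AD | VUS | | + | uncertain
Hom/Hem | AD | VUS | | ? | uncertain
Hom/Hem | AR | Path | | - | increased risk
Hom/Hem | AR | Path | | + | affected
Hom/Hem | AR | Path | | ? | affected
Hom/Hem | AR | VUS | | - | uncertain
Hom/Hem | AR | VUS | | + | uncertain
Hom/Hem | AR | VUS | | ? | uncertain
Hom/Hem | X-linked | Path | | - | increased risk
Hom/Hem | X-linked | Path | | + | affected
Hom/Hem | X-linked | Path | | ? | affected / increased risk
Hom/Hem | X-linked | VUS | | - | uncertain
Hom/Hem | X-linked | VUS | | + | uncertain
Hom/Hem | X-linked | VUS | | ? | uncertain
Het | AD | Path | | - | increased risk
Het | AD | Path | | + | affected
Het | AD | Path | | ? | affected / increased risk
Het | AD | VUS | | - | uncertain
Het | AD | VUS | | + | uncertain
Het | AD | VUS | | ? | uncertain
Het | AR | Path | | - | carrier
Het | AR | Path | | + | carrier
Het | AR | Path | | ? | carrier
Het | AR | VUS | | - | probably carrier
Het | AR | VUS | | + | probably carrier
Het | AR | VUS | | ? | probably carrier
Het | X-linked | Path | | - | carrier
Het | X-linked | Path | | + | carrier
Het | X-linked | Path | | ? | carrier
Het | X-linked | VUS | | - | uncertain
Het | X-linked | VUS | | + | uncertain
Het | X-linked | VUS | | ? | uncertain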

Table 1: Evidence-based annotation rules to determine the clinical statement at CentoMD®. See Figure 2 for further illustration of the decision process.

1) The most frequently detected annotation classes are included. The wild type genotype is excluded; for wild type, the clinical statement is “Unaffected”.

2) Mode of inheritance.

3) Indicates the clinical significance of the identified variant.

4) Clinical information. -: indicates the absence of signs and symptoms of the disease (i.e. healthy/unaffected); +: indicates the presence of signs and symptoms of the disease; ?: indicates that no clinical information was provided.

5) Refers to a variant annotated as pathogenic, likely pathogenic or risk factor.

6) Uncertain variant.

7) Two X-linked diseases (i.e. Fabry disease and Hunter disease) do not follow these definitions closely, as additional information is available and used as a decision factor when selecting the finding. For these two diseases, please see the decision trees presented in Figure 3.

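Genotype | MOI | Significance | Significance 2 | CI | Clinical statement
Comp Het | AD | Path | Path | - | increased risk
Comp Het | AD | Path | Path | + | affected
Comp Het | AD | Path | Path | ? | affected / increased risk
Comp Het | AD | Path | VUS | - | increased risk
Comp Het | AD | Path | VUS | + | affected
Comp Het | AD | Path | VUS | ? | affected / increased risk
Comp Het | AD | VUS | VUS | - | uncertain
Comp Het | AD | VUS | VUS | + | uncertain
Comp Het | AD | VUS | VUS | ? | uncertain
Comp Het | AR | Path | Path | - | increased risk
Comp Het | AR | Path | Path | + | affected
Comp Het | AR | Path | Path | ? | affected
Comp Het | AR | Path | VUS | - | at least carrier
Comp Het | AR | Path | VUS | + | at least carrier
Comp Het | AR | Path | VUS | ? | at least carrier
Comp Het | AR | VUS | VUS | - | uncertain
Comp Het | AR | VUS | VUS | + | uncertain
Comp Het | AR | VUS | VUS | ? | uncertain
Comp Het | X-linked | Path | Path | - | increased risk
Comp Het | X-linked | Path | Path | + | affected
Comp Het | X-linked | Path | Path | ? | affected
Comp Het | X-linked | Path | VUS | - | at least carrier
Comp Het | X-linked | Path | VUS | + | affected
Comp Het | X-linked | Path | VUS | ? | at least carrier
Comp Het | X-linked | VUS | VUS | - | uncertain
Comp Het | X-linked | VUS | VUS | + | uncertain
Comp Het | X-linked | VUS | VUS | ? | uncertain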

Figure 2: Decision trees that illustrate the evidence-based annotation rules which determine the clinical statement at CentoMD®. The decision levels illustrated are: MOI – Genotype – Clinical significance (variant effect) – Clinical information – Clinical statement (the caption of Table 1 also applies to this figure).

Figure 3: Decision trees that illustrate the evidence-based annotation rules which determine the clinical statement for Fabry and Hunter disease. The decision levels illustrated are: MOI – Genotype – Clinical significance (variant effect) – Clinical information – Clinical statement (the caption of Table 1 also applies to this figure).

Glossary

Biochemical analysis

Method to analyze enzymatic activity or levels of biomarkers in samples

obtained from patients usually suspected of being affected by a metabolic

disorder.

Enzyme

interpretation Evaluation of the enzyme activity compared to the reference interval.

Pathological Levels of activity are significantly decreased compared to the normal range.

Normal Levels of activity are comparable with the normal range (no change).

Slightly increased Levels of activity are only slightly increased compared to the normal range.

Slightly decreased Levels of activity are only slightly decreased compared to the normal range.

Biomarker

interpretation Evaluation of the biomarker levels compared to the reference interval.

Pathological Biomarker levels are significantly increased compared to the normal range.

Biomarker level-

Normal Biomarker levels are comparable with the normal range (no change).

Slightly increased Biomarker levels are only slightly increased compared to the normal range.

Slightly decreased Biomarker levels are only slightly decreased compared to the normal range.

Disease

Particular abnormal, pathological condition that affects part or all of an

organism. It is often construed as a medical condition associated with

specific symptoms and signs.

Mode of Inheritance

(MOI)

The manner in which a particular genetic trait or disorder is passed from one

generation to the next.

Autosomal dominant

(AD)

The pattern of inheritance in which an affected individual has one copy of a

mutant gene and one normal gene on a pair of autosomal chromosomes.

Autosomal recessive

(AR)

The pattern of inheritance in which both copies of an autosomal gene must

be abnormal for a genetic condition or disease to occur.

Digenic (Di)

The pattern of inheritance that is similar to recessive inheritance, except

that the trait only develops when mutations are found in one copy of each of

the two independent genes simultaneously.

Imprinting/Epigenetic

(Imp/Epi)

The pattern of inheritance by mechanisms not directly involving nucleotide

sequences, but paramutations and parental imprinting.

Mitochondrial (Mito) The pattern of inheritance of a trait encoded in the mitochondrial genome.

Multifactorial (MF) The pattern of inheritance caused by the interplay between genetic factors

and environmental factors.

Pseudoautosomal

dominant (P-AD)

The inheritance pattern seen with genes in the pseudoautosomal region of

the X and Y chromosome that can exchange regularly between the two sex

chromosomes. Alleles for genes in the pseudoautosomal region can show

male-to-male transmission, and therefore mimic autosomal inheritance,

because they can cross over from the X to the Y during male gametogenesis

and be passed on from a father to his male offspring.

Unknown (?)

This mode of inheritance is selected for genes not yet associated with

any pathological condition or disease, for which no pattern of

inheritance has therefore been observed.

X-linked (X)

The mode of inheritance in which a mutation in a gene on the X chromosome

causes the phenotype to be expressed in males who are hemizygous for the

mutated gene and in females who are homozygous for the mutated gene.

Y-linked (Y) The pattern of inheritance that may result from a mutant gene located on a

Y chromosome. By definition, only males are affected.

Gene Sequence of DNA that represents a basic unit of heredity, being

expressed in RNA and proteins.

Gene symbol The HUGO Gene Nomenclature Committee (HGNC) has assigned unique gene

symbols and names to almost 38,000 human loci, of which around 19,000 are

protein coding.

Nuclear coding A gene located in the cell nucleus of a eukaryote that encodes a protein.

Nuclear non-coding A gene located in the cell nucleus that does not encode a protein

product.

Mitochondrial A gene located in the mitochondria.

Transcript/Reference

Sequence

Digital nucleic acid sequence, assembled by scientists as a representative

example of a species' set of genes. Coding DNA reference sequence refers to

a cDNA-derived sequence containing the full length of all coding regions and

non-coding untranslated regions.

cDNA DNA that is synthesized from a messenger RNA template; the single-stranded

form is often used as a probe in physical mapping.

mDNA An extranuclear double-stranded DNA found exclusively in mitochondria that

in most eukaryotes is a circular molecule and is maternally inherited.

Transcript used in

CentoMD®

The transcript that is used at CENTOGENE AG/CentoMD® as a reference

sequence.

Genotype Represents the genetic constitution of an individual with respect to the

number of alleles and their clinical significance identified for a particular

gene.

Compound heterozygote (Comp Het)

An individual carrying two different, heterozygous, in trans, clinically

relevant (includes uncertain, likely pathogenic, pathogenic, risk factor)

alleles at a given locus.

Hemizygote (Hem) A male individual carrying one clinically significant (includes pathogenic,

likely pathogenic, uncertain, risk factor) allele located on X-chromosome.

Heterozygote (Het) An individual carrying one clinically significant (includes pathogenic, likely

pathogenic, uncertain, risk factor) allele.

Homozygote (Hom) An individual carrying two identical, clinically relevant (includes pathogenic,

likely pathogenic, uncertain, risk factor) alleles at one locus.

Other/complex

(other)

Individuals carrying clinically relevant (includes pathogenic, likely

pathogenic, uncertain, risk factor) alleles in other combinations than

described above (e.g. two alleles located in cis, three heterozygous

mutations, one homozygous and one heterozygous, etc.).

Wild type (WT) Individuals carrying alleles with no clinical significance (includes neutral,

likely neutral, disease-associated polymorphism, CENTOGENE (likely) neutral

- published as (likely) pathogenic).

Clinical statement of

CENTOGENE AG

The clinical statement is the finding or the conclusion of the molecular

genetic test conducted at CENTOGENE AG.

Affected Indicates an individual for whom the rules applied to determine the final statement

confirmed the suspected diagnosis.

At least carrier

Describes a patient suspected for a disease with autosomal recessive mode

of inheritance, who carries one (likely) pathogenic variant or one variant

with uncertain clinical significance.

Carrier

An individual who is heterozygote or other/complex (like 2 heterozygous

mutations located in cis) for an autosomal recessive disorder. This statement

is not accepted for autosomal dominant disorders.

Increased risk of

developing the

disease (Risk)

Describes an individual carrying the disease-causing mutation(s) where

either the clinical details were not provided or the patient is too young to

develop the disorder. Usually used for late-onset disorders.

Probably affected

Refers only to Fabry male patients carrying a VUS associated with only

pathological enzymatic levels, but not with pathological biomarker levels.

Identification of males carrying that particular VUS with pathological

biomarker levels induces the VUS re-classification into a likely pathogenic

variant.

Probably carrier Indicates a carrier of a VUS, either an individual screened for a recessive disorder or a

female screened for an X-linked disorder.

Uncertain Indicates an individual carrying genetic variant(s) with unknown clinical

significance.

Unaffected Indicates an individual for whom susceptibility to the disease was not

confirmed with respect to the screened gene.

Screening method The test used to identify the cause of the disease.

MLPA Multiplex Ligation-dependent Probe Amplification: Variation of the

multiplex PCR that permits multiple targets to be amplified with only a

single primer pair. Used especially for detecting large/gross and gene

rearrangements.

NGS Next-Generation Sequencing: High-throughput sequencing technology,

allowing the parallel sequencing of multiple genes, producing thousands or

millions of sequences concurrently.

Other method Other methodology used to detect the variants (like fragment length).

qPCR Quantitative Polymerase Chain Reaction: Method to amplify and

simultaneously quantify a targeted DNA molecule. Used especially to detect

large/gross and gene rearrangements.

Sanger Classical method of DNA sequencing, developed by Fred Sanger, using

chemically altered "dideoxy" bases to terminate newly synthesized DNA

fragments at specific bases (either A, C, T, or G). These fragments are then

size-separated, and the DNA sequence can be read.

WES Whole Exome Sequencing: Brute-force approach that involves modern day

sequencing technology and DNA sequence assembly tools to piece together

all coding portions of the genome. The sequence is then compared to a

reference genome and any differences are noted.

Phenotype

Case ID Random patient ID referring to a consented individual where the diagnosis

was confirmed by genetic testing at CENTOGENE AG.

HPO ID Unique HPO identifier for the attributed HPO term.

HPO term Phenotypic description of individuals provided by medical experts and

translated into the vocabulary defined by the HPO.

Shared HPO terms Indicates how many HPO terms of a case analyzed at CENTOGENE AG

match the HPO terms provided by the users.

P-value Defines the likelihood of obtaining the corresponding similarity score or

higher by chance. The p-value is calculated by comparing individuals with

random symptoms and their similarity scores, i.e. it is derived from the

similarity score distribution. The higher the p-value, the more likely it is to

obtain the corresponding similarity score by chance. The p-value ranges

from 0 to 1, where 0 is best.

Similarity score Phenotypic semantic similarity measure based on the HPO. The similarity

score of two patients is a formal measure of their resemblance with

respect to their standardized symptoms. The score is calculated by a

pairwise comparison between each symptom of the two patients. The

higher the score, the more similar the patients.

Similar cases The cases analyzed at CENTOGENE AG which match the HPO terms

provided by the user. In the Phenotype to Genotype module, by default only

similar cases sharing a minimum similarity score of 1 are indicated.

Clinical information

(HPO terms)

Description of features and characteristics that the corresponding physician

has provided as supporting evidence of the presence of a particular disease

translated into the vocabulary defined by the HPO by medical experts.

No information/

unknown

Selected when no clinical information has been provided.

Not affected/

asymptomatic

Selected when the physician has explicitly indicated that the person is

healthy, asymptomatic, or not affected.

Suspected/

Affected

Selected when only very general statements are provided by the physician

(e.g. "patient is suffering from Breast Cancer" or "clinical features of

Parkinson").

Individual Represents a unique individual who was tested for a certain disease,

condition or carrier status at CENTOGENE AG.

Sex Indicates the biological state of the individual of being male (m), female (f)

or unknown (?) sex (when no information was provided or a prenatal case

was analyzed).

Age at diagnosis Is calculated as date of sample entry at CENTOGENE AG minus date of

birth, and is expressed in years. For patients referred to CENTOGENE AG

several times, the date of the first order entry is used by default to

calculate the age at diagnosis.

Age at onset Refers to the age at which an individual acquires, develops or first

experiences a condition or symptoms of a disorder.

Country Indicates the area of the world the patient is coming from. The basis for

this information is the country where the patient lives. If the physician

provides information about the ethnicity of the patient (e.g. Canadian

citizen of German origin), then the country of origin (in this case Germany)

is selected instead.

Pedigree Indicates the connection/relation among individuals by blood, marriage, or

adoption.

Index patient Represents the affected individual through whom the family with a genetic

disorder is brought to the attention of others.

Anonymized random

family number (ARFN)

Unique family number used to keep all members together when

relationship links are provided.

Variant A sequence variation in a gene.

Allele frequency at

CentoMD®

Indicates the allele frequency of a particular variant which was observed at

CENTOGENE AG in comparison to the total number of analyzed individuals.

cDNA change Change at cDNA level following numbering based on coding DNA reference

sequences.

Genomic DNA change Change at gDNA level following numbering based on genomic DNA

reference sequence.

Protein change Change at protein level following numbering based on the amino acid

sequence, using one letter amino acid code and X for designating a

translation termination codon.

Total number of

variants

The total number of detected variants for a case (clinically

relevant/uncertain; clinically irrelevant) on a particular gene.

Positive individuals Indicates how many times a particular variant was observed at

CENTOGENE AG in comparison to the total number of analyzed individuals

for a particular gene.

Positive individuals (%) Indicates how many times a particular variant was observed at

CENTOGENE AG relative to the number of analyzed individuals for a

particular gene (provided as %).

Location The location of the DNA change relative to the transcriptional initiation

site, initiation codon, polyadenylation site or termination codon of the

corresponding gene.

Downstream The region placed 3' (downstream) from the polyadenylation signal of the

gene.

Exon The protein-coding DNA sequences of the gene.

Intron The non-coding regions of a gene that interrupt the protein coding regions

(exons).

Upstream The region located 5' (upstream) from the 5'UTR region of the gene.

3'UTR 3' Untranslated Region: Particular section of messenger RNA (mRNA) that

starts with the nucleotide immediately following the stop codon of the

coding region. This region contains transcription and translation regulating

sequences.

5'UTR 5'-Untranslated Region: Sequences on the 5' end of mRNA but not

translated into protein. It extends from the transcription start site to just

before the ATG translation initiation codon. 5' UTR may contain sequences

that regulate translation efficiency or mRNA stability.

Clinical significance

according to CentoMD

Indicates the likelihood of this variant to predispose to or to cause the

disorder.

CENTOGENE (likely) neutral - published as (likely) pathogenic

Variants published in the literature as (likely) pathogenic, but at

CENTOGENE re-classified as (likely) neutral based on the observed

frequency or family segregation studies.

Clinically irrelevant

variant (CIV)

Includes variants of the following significance: neutral, likely neutral,

disease-associated polymorphism, CENTOGENE (likely) neutral - published

as (likely) pathogenic.

Clinically relevant

variant (CRV)

Includes variants of the following significance: likely pathogenic,

pathogenic, risk factor, modifier.

Disease associated

polymorphism (DP)

Variant reported to be significantly associated with a phenotype/disease.

Likely neutral Variants reported to be likely neutral, for which prediction software indicates a

probably non-pathological effect, and/or for which a high frequency in the population

is observed. This classification class is equivalent to “likely benign”.

Likely pathogenic Variants with probable pathogenicity, or the effect on the protein function

is predicted to be likely deleterious (>90% probability to cause the

disease).

Neutral Variants reported not to influence the disease risk of the individual, or

predicted to be neutral based on the high frequency in population, no

effect on protein or regulatory regions. This classification class is

equivalent to “benign”.

Modifier A genetic variant that can alter the expression of another gene in the

phenotype of an individual.

Uncertain variant

(VUS)

Variants reported in the literature with unknown risk of developing/causing

the disease, variants for which prediction software shows inconsistent effects, or

variants for which family studies did not support a clear statement on their impact on the

phenotype.

Pathogenic Variants that are known to cause the phenotype/disease.

Pathological D4Z4 allele

Large, polymorphic repeat structure associated with a rough and inverse

relationship between clinical severity and the residual repeat size, with

the smallest repeats causing the most severe phenotype.

Risk factor Variants reported to be associated with the phenotype/disease and

influencing the function(s) of the protein.

Secondary mitochondrial mutation

The primary molecular defect resides in a nuclear gene, which leads to

secondary mDNA abnormalities, such as loss of mDNA copy number or

multiple mDNA deletions.

Type of variant on DNA level

Different types of change that can occur in the DNA sequence.

Chromosomal deletion Loss of parts of chromosomes.

Complex rearrangement

Involves the structure or number of chromosomes; also referred to as

chromosome mutation or chromosomal rearrangement.

Conversion Non-reciprocal transfer of information between homologous sequences;

one DNA sequence replaces a homologous sequence such that the

sequences become identical after the conversion event.

Deletion An abnormality in which part of a chromosome (carrying genetic material)

is lost.

Duplication Duplication of a sequence of DNA or section of chromosome.

Gain of methylation Increase above the normal DNA methylation level.

Gene deletion Refers to loss of the entire gene.

Gene duplication Refers to gain/duplication of the entire gene.

Gene & regulatory region(s) deletion

Refers to loss of the entire gene and flanking regions.

Gene & regulatory region(s) duplication

Refers to the gain of the entire gene and flanking regions.

Gross deletion Refers to loss of parts of a gene.

Gross duplication Refers to gain/duplication of part(s) of a gene.

Gross inversion Refers to 180-degree inversion of part(s) of a gene.

Insertion/Deletion (Indel)

Refers to the mutation class that includes a combination of both insertions

and deletions.

Insertion Genetic mutation where one or more nucleotides are added (inserted) into

a DNA sequence; it may also involve larger portions of a chromosome.

Inversion Chromosomal abnormality where a segment of a chromosome is rotated

180° and reinserted.
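At the sequence level, a segment that is rotated 180° and reinserted reads as the reverse complement of the original segment. The sketch below illustrates this on a short string; the function name `invert_segment` and the 0-based, end-exclusive coordinates are assumptions of the example, not CentoMD conventions.

```python
# Translation table mapping each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(sequence: str, start: int, end: int) -> str:
    """Return `sequence` with sequence[start:end] replaced by its reverse complement.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("segment coordinates out of range")
    segment = sequence[start:end]
    inverted = segment.translate(COMPLEMENT)[::-1]  # complement, then reverse
    return sequence[:start] + inverted + sequence[end:]
```

Applying the function twice to the same coordinates restores the original sequence, which is a quick sanity check on the inversion.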

Loss of methylation Loss of the normal DNA methylation level.

Other/complex Refers to all other types not included in any category under Variant-

Mutation type.

Pathological allele (D4Z4 motif)

Deletion of 3.3-kb repeats from a chromosomal tandem repeat called D4Z4

located near the end of chromosome 4 at the 4q35-ter location. D4Z4 is a

large polymorphic repeat structure consisting of 1–100 KpnI units and

contains an ORF encoding a putative homeobox protein called DUX4.

Repeat expansion Refers to an increased number of repeats of a genomic tandemly repeated

DNA sequence.

Retrotransposon insertion

Retrotransposons (also called transposons via RNA intermediates) are

genetic elements that can amplify themselves in a genome, and can induce

mutations by inserting near or within genes. Retrotransposon-induced

mutations are relatively stable, because the sequence at the insertion site

is retained as they transpose via the replication mechanism.

Substitution A sequence change where one nucleotide is replaced by one other

nucleotide. Substitutions are described using a ">"-character (indicating

"changes to").
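The ">" notation for substitutions can be read mechanically. The following is a minimal sketch that parses simple descriptions such as `c.76A>G`; the optional `c.`/`g.`/`m.`/`n.` prefix, the restriction to positive integer positions, and the names `Substitution` and `parse_substitution` are assumptions of this example and do not cover the full range of variant descriptions.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    position: int
    ref: str  # nucleotide before the change
    alt: str  # nucleotide after the change


# Optional one-letter prefix, a position, then "<ref>><alt>".
_PATTERN = re.compile(r"^(?:[cgmn]\.)?(\d+)([ACGT])>([ACGT])$")


def parse_substitution(description: str) -> Substitution:
    """Parse a simple single-nucleotide substitution such as 'c.76A>G'."""
    match = _PATTERN.match(description.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {description!r}")
    position, ref, alt = match.groups()
    if ref == alt:
        raise ValueError("reference and replacement nucleotide are identical")
    return Substitution(int(position), ref, alt)
```

Intronic or UTR positions (with offsets or signs) are deliberately rejected here to keep the sketch short.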

Coding effect Refers to the impact the observed DNA change has on protein level.

Effect unknown The coding effect on protein level has not been analyzed. An effect is

expected but difficult to predict.

Extension Affects either the first codon (start, translation initiation, N-terminus, ATG) or the last codon (translation termination, stop) and, as a consequence, extends the protein sequence N- or C-terminally by one or more amino acids.

Frameshift Special type of amino acid deletion/insertion affecting an amino acid

between the first (initiation, ATG) and last codon (termination, stop),

replacing the normal C-terminal sequence with one encoded by another

reading frame.

Increased polyglutamine tract/expanded polyQ

Portion of a protein consisting of a sequence of several glutamine (Gln; Q)

units.

In-frame A mutation that does not cause a shift in the triplet reading frame.
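The distinction between the Frameshift and In-frame entries comes down to whether the net change in length is a multiple of three nucleotides. A minimal sketch, assuming the change lies entirely within the coding sequence (the function name `frame_effect` is hypothetical):

```python
def frame_effect(deleted_bases: int, inserted_bases: int) -> str:
    """Classify a coding-sequence insertion/deletion by its effect on the reading frame.

    Assumes the whole change lies within the coding sequence.
    """
    if deleted_bases < 0 or inserted_bases < 0:
        raise ValueError("base counts must be non-negative")
    net_change = inserted_bases - deleted_bases
    # The triplet reading frame is preserved only when the net length
    # change is a multiple of three nucleotides.
    return "in-frame" if net_change % 3 == 0 else "frameshift"
```

For instance, deleting three bases keeps the frame, while deleting a single base shifts it.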

Missense Point mutation in which a single nucleotide change results in a codon that

codes for a different amino acid. Not all missense mutations are

deleterious; some changes can have no effect. Because of the ambiguity of

missense mutations, it is often difficult to interpret the consequences of

these mutations in causing disease.

New translation initiation site

A change affecting the translation initiation codon (Met-1) introducing a

new upstream initiation codon extending the N-terminus of the encoded

protein.

Non-coding The change on DNA level produces no effect on protein, or the effect of

regulatory mutations is unknown.

Nonsense Point mutation in a sequence of DNA that results in a premature stop

codon, and in a truncated, incomplete protein product.

Silent A form of point mutation at DNA level resulting in a codon that codes for

the same amino acid but without any functional change in the protein

product.
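The Missense, Nonsense, and Silent entries above can be told apart by translating the reference and altered codon with the standard genetic code. The sketch below does this for a single codon; the function name `classify_codon_change` is an assumption, and changes affecting a stop codon in the reference are simply reported as "other" instead of being interpreted.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as 'silent', 'nonsense', 'missense', or 'other'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"    # same amino acid encoded
    if ref_aa == "*":
        return "other"     # reference stop codon changed; not handled in this sketch
    if alt_aa == "*":
        return "nonsense"  # premature stop codon introduced
    return "missense"      # different amino acid encoded
```

As an example, `GAG` to `GTG` (Glu to Val) is classified as missense, and `CAG` to `TAG` (Gln to stop) as nonsense.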

Splicing mutation DNA changes affecting the splicing process (i.e. intron removal and exon

joining). Splice-site mutations occur within genes in the noncoding regions

(introns) just next to the coding regions (exons). Splice site mutations can

eliminate an existing donor or acceptor site, which will cause an exon to be

skipped and possibly result in a frameshift.

Start loss A start-loss mutation is a point mutation in the ATG start codon that

prevents the original start translation site from being used. This kind of

mutation is expected to eliminate gene function.

Translation initiation codon

A translation initiation codon variant is a point mutation creating a new ATG start

codon upstream of the original start translation site. If the new ATG is

close enough to the original one (so that it is within the processed

transcript and downstream of a ribosome-binding site) and in frame, it will

be used to initiate translation, adding amino acids to the amino terminus

of the original protein.

Translation termination codon

A change affecting the translation termination codon (Ter/*) introducing a

new downstream termination codon extending the C-terminus of the

encoded protein.

Zygosity Indicates if a variant is detected on one chromosome or on both

chromosomes. Describes the degree of similarity of the alleles for a trait in

an organism.

Hemizygous (Hem) Used for alleles detected in genes located on the X chromosome in male

individuals.

Heterozygous (Het) A gene locus at which cells contain two different alleles of a gene.

Het/Hom/Hem Ratio indicating the number of individuals relative to variant zygosity.

Homozygous (Hom) A gene locus at which identical alleles of the gene are present on both

homologous chromosomes.
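The Het/Hom/Hem ratio described above is a count of individuals per zygosity. A minimal sketch of how such a ratio could be tallied follows; the input format (one label per individual) and the function name `zygosity_ratio` are assumptions of this example, not the CentoMD data model.

```python
from collections import Counter
from typing import Iterable

ZYGOSITY_LABELS = ("Het", "Hom", "Hem")


def zygosity_ratio(observations: Iterable[str]) -> str:
    """Return the 'Het/Hom/Hem' counts for one variant.

    `observations` holds one zygosity label per individual carrying the variant.
    """
    counts = Counter(observations)
    unknown = set(counts) - set(ZYGOSITY_LABELS)
    if unknown:
        raise ValueError(f"unknown zygosity labels: {sorted(unknown)}")
    # Counter returns 0 for labels that were never observed.
    return "/".join(str(counts[label]) for label in ZYGOSITY_LABELS)
```

Three heterozygous, one homozygous, and one hemizygous individual would be reported as `3/1/1`.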

Degree of heteroplasmy

Mixture of more than one type of mitochondrial DNA (mDNA) within a

cell/individual. In those cases where a mutant in mDNA is responsible for a

disease, the larger the proportion of mutant mitochondria, the more likely

the person will show symptoms of the disease.

Heteroplasmic Cell has some mitochondria that have a mutation in the mDNA and some

that do not.

Homoplasmic Cell has a uniform collection of mDNA: either completely normal mDNA or

completely mutant mDNA.
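The heteroplasmy entries above can be illustrated numerically. The sketch below estimates the degree of heteroplasmy as the fraction of sequencing reads carrying the mutant allele; using read counts as a proxy for the proportion of mutant mDNA is an assumption of this example, as are the function names and the strict 0/1 cut-off (a real analysis would typically allow for measurement error).

```python
def degree_of_heteroplasmy(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads carrying the mutant mDNA allele (0.0 to 1.0)."""
    if total_reads <= 0 or not 0 <= mutant_reads <= total_reads:
        raise ValueError("read counts are inconsistent")
    return mutant_reads / total_reads


def plasmy_status(fraction: float) -> str:
    """Label a mutant fraction as homoplasmic (all or none) or heteroplasmic."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    # Homoplasmic: completely normal (0.0) or completely mutant (1.0) mDNA.
    return "homoplasmic" if fraction in (0.0, 1.0) else "heteroplasmic"
```

With 30 mutant reads out of 120, the fraction is 0.25 and the status is heteroplasmic.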

Publication status Indicates if the identified variant has previously been published in the

literature as a disease-causing variant or not.

dbSNP The Single Nucleotide Polymorphism Database (dbSNP) is a free public

archive for genetic variation within and across different species developed

and hosted by the National Center for Biotechnology Information (NCBI) in

collaboration with the National Human Genome Research Institute (NHGRI).

PMID PubMed identifier (also called PubMed unique identifier): a unique

number assigned to each PubMed record.

Published Indicates that the identified genetic variant has already been published

and/or characterized and associated with clinical data.

Unpublished Indicates that the detected genetic variant has either not been previously

published in literature, or is not yet associated with any disease.