Whole exome analysis of primary immunodeficiency

Первичный иммунодефицит представляет собой гетерогенную группу редких наследственных мутаций единичного гена, вызывающих сбой в работе иммунной системы человека и проявляющихся в предрасположенности паци ентов к тяжелым жизнеугрожающим инфекциям. Гетерогенная природа первичного иммунодефицита, при которой мутации могут быть подвержены по меньшей мере 300 различных генов, серьезно осложняет его диагностику. И хотя было подсчитано, что от этого заболевания могут страдать около шести миллионов человек, только немногие из них могут рассчитывать на постановку правильного диагноза. Однако развитие методов секвенирования ДНК и доступность высокотехнологичного оборудования позволили сделать значительный шаг вперед в области молекулярных исследований генетических заболеваний. Технология полноэкзомного анализа ДНК может оказать су щественную помощь врачам при диагностировании менделевских предрасположенностей к микробактериальным инфекциям и других форм редких генетических заболеваний. В представленном исследовании мы использовали метод полноэкзомного анализа ДНК для обследования двух младенцев с симптомами первичного иммунодефицита, такими как гемофагоцитарный лимфогистиоцитоз (ГЛГ) и тяжелый комбинированный иммунодефицит (ТКИД). Полноэкзомный анализ выявил мутацию UNC13D гена (NM_199242.2:c.627delT) у пациента с ГЛГ и мутацию RAG1 гена (NM_000448.2:c.322C>G) – у пациента с ТКИД. Исследование показало, что полноэкзомный анализ – это быстрый и экономичный метод, помогающий поставить правильный диагноз пациентам с первичным иммунодефицитом.

The human primary immunodeficiency diseases (PIDs) refer to a rare heterogeneous group of single-gene inherited disorders causing malfunctions in the immune system, and thus the affected patients have a predisposition to severe life-threatening infections. The heterogeneous nature of PIDs, which involves at list 300 different genes, makes diagnosis of the disease a complex issue. Although studies revealed that six million people have a kind of PID, but due to a complex diagnosis procedure many affected individuals have not gotten a correct diagnosis. However, thanks to advancing in the DNA sequencing method and availability of sophisticated sequencers molecular characterization of genetic disorders have been revolutionized. The whole exome sequencing (WES) method can help clinicians detect Mendelian disease and other complex genetic disorders. The presented study used WES to investigate two infants with symptoms of primary immunodeficiency including hemophagocytic lymphohistiocytosis (HLH) and severe combined immunodeficiency (SCID). It has been shown that the HLH patient had a mutation in the UNC13D gene (NM_199242.2:c.627delT), and the SCID patient had a mutation in the RAG1 gene (NM_000448.2:c.322C>G). It has been demonstrated that WES is a fast and cost-effective method facilitating genetic diagnosis in PID sufferers.
H uman primary immunodeficiencies (PIDs) include at least 300 genetically-defined single-gene inherited disorders causing malfunctions in the immune system . Recent studies have revealed that six million people worldwide have a kind of PID, whereas only 27000-60000 individual have been diagnosed. Despite the fact that the prevalence of PIDs is highest among children, there are also many adult patients with PIDs (Bousfiha et al., 2013). Patients with PIDs are predisposed to severe infections such as the EBV, Neisseria, Papillomavirus, Streptococcus pneumoniae, weakly virulent mycobacteria, Herpes simplex virus, and Candida Albicans. Furthermore, the immune dysregulation and aberrant inflammatory responses (Abolhassani et al., 2014) such as Allergy, Angioedema, Hemophagocytosis, Autoinflammation, and Autoimmunity have also been diagnosed in such patients (Bousfiha et al., 2013). PIDs are classified into nine groups based on the clinical and laboratory parameters . The first group includes the most common combined T-and B-cell immunodeficiency (CID) and a severe PID form known as severe combined immunodeficiency (SCID) (Bonilla et al., 2005). SCID is characterized by profound defects of T-cell development and it affects some of the B and NK cells (Kwan, Puck, 2015). Hemophagocytic lymphohistiocytosis (HLH) is a heterogeneous group of disorders related to dysregulation of the immune system (PID class 4) that are classified into two groups, namely genetic (familial HLH (FHL)) and acquired forms based on the etiology. FHL includes five forms of loss-of-function mutation (FHL1-5) leading to defects in the cytotoxic granule secretion pathway in NK and CD8 + T-cells, and they consequently lead to failure in exocytosis of granules in immunologic synopsis and completely eradicate target cells, in situation of immune response (Sifers et al., 2016). HLH can also occur due to infections and autoinflammatory/autoimmune and malignant diseases which are known as the acquired forms (Janka, 2012).
Current procedures for PID diagnosis are very complex and involve using specialized immunologic tests, including lymphocyte proliferation and cytotoxic assay, evaluation of serum immunoglobulin level, flow cytometry, neutrophil function assays and complementary analysis (McCusker, Warrington, 2011). However, immunological evaluation is performed to assess a patient's immune status for primary PID diagnosis. In addition, phenotype -based PID diagnosis is often complex, expensive and not always successful. Genetic investigation of PIDs is also very complex, and covers more than 300 genes that may be involved. Allelic heterogeneity and locus heterogeneity also increase the complexity of genetic analysis (Moens et al., 2014;Stoddard et al., 2014).
Advances in next-generation sequencing (NGS), particularly in whole exome sequencing (WES) have revolutionized molecular diagnosis of Mendelian disorders (Gilissen et al., 2012), and thus the traditional methods can be replaced by interrogation of a large set of genes in the single test in a timely and cost-effective manner instead of gene-by-gene approaches (Stoddard et al., 2014;Vrijenhoek et al., 2015). Thirty four new gene defects in PIDs have been diagnosed using NGS technology, so far (Conley, Casanova, 2014). The most recent research by S. Tamura et al. (2015) described the identification of novel compound heterozygous mutation in the DNA ligase IV (LIG4) gene through WES.
The present study was aimed at using WES for molecular characterization of two families with PID-affected children to be confirmed by the Sanger sequencing. We found a mutation in UNC13D and RAG1 genes in HLH and SCID families respectively. Our results have proved WES to be a useful method for detecting pathogenic variants in PID sufferers.

Patients, materials, and methods
Patients. The patients selected for molecular characterization using the whole exome sequencing were two children from two Iranian families in consanguineous marriage hospitalized in Children's medical center (Tehran) and their PIDs were confirmed with specialized immunology methods. The first patient was a six-month female infant with clinical symptoms of HLH, and the second -a two-month male infant diagnosed with SCID. The HLH patient had prolonged fever, hepatosplenomegaly, an infection caused by Epstein-Barr virus (EBV) and increased levels of ferritin, while the SCID patient suffered of recurrent diarrhea, respiratory infections without any circulating T and B cells (T -B -SCID). For the children could participate in the study, the parents had signed an ethical consent form.
Blood sampling. Peripheral blood samples were taken from the children and their family members. The clinical information obtained from their medical records included date and year of diagnosis, disease class and severity, surgical history, medication and family history. Genomic DNA was extracted using salting-out protocol (Miller et al., 1988).
Whole exome sequencing and data analysis. One hundred ng/μl of high-quality genomic DNA was first used for whole exome enrichment in the IonAmpliSeq Exome RDY plates and Ion AmpliSeq HiFi Mix. After ligation of Proton adapters and quantification by qPCR, the final library was sequenced using the Ion Proton platform which produced raw FASTQ files at an average coverage depth of 50X.
The following general workflow was used for performing bioinformatics analysis of the FASTQ raw data to prioritize causative variants (Fig. 1).
The raw FASTQ files were processed using the NGS QC toolkit (Patel, Jain, 2012) to estimate the quality and states of sequence reads. Sequencing reads, which have some errors such as adaptor and primer contamination, low quality 5′ and 3′ end bases, short reads and those with quality scores (Phered score) bellow 20, were trimmed from the FASTQ files using the FASTX-toolkit (http://hannonlab.cshl.edu/ fastx_toolkit/). BWA-MEM algorithm (Caboche et al., 2014) was used to align reads against the reference genome sequence (GRCh37), and then the results were stored in the SAM (Sequence Alignment/Map) file format. BWA-MEM is fast and accurate alignment software for nucleotide sequences of about 70 bp-1 Mbp. Duplicate reads were marked by Picard tools (https://broadinstitute.github.io/picard/). Afterwards, the SNP and InDel calling was performed with the Genome Analysis Toolkit (GATK; v 3.6) (Van der Auwera et al., 2013). The functional variant annotation was performed using the following software and databases: ANNOVAR (Wang et al., 2010), KGGSeq (v1.0) (Li et al., 2012) (Harrow et al., 2012), knownGene, RefGene (UCSC), mouse phenotype (Eppig et al., 2015) and DDD study (Deciphering Developmental Disorders) (Firth et al., 2009). The KGGSeq filtering strategy was executed to filter out both common benign variants and recurrent artifact as well as to find causal variants. At the first level, variant quality control was checked using the KGGSeq software, which includes various filters such as genotype QC, variants QC and sample QC to filter out errors and low-quality variants. A cut-off of 20, 50 and 20 was done for variants' Phred quality score, mapping quality score and depth coverage respectively. SIFT (Ng, Henikoff, 2003), PolyPhen-2 (Adzhubei et al., 2010) and CADD (Kircher et al., 2014) tools were used for predicting and scoring the possible impact of amino acid substitution on the structure and function of human proteins. Prioritization was performed with the focus on a primary immunodeficiency panel, which was gathered from the NCBI gene database (http://www.mcbi.nlm.nih.gov/gene/) and recommended genes from National Immunology Society resulting in 400 genes and genomic regions. The causative variants were limited to following criteria: 1) a minor allele frequency (MAF) of less than 0.1 % in data from the 1000-Genome Project, EVS, and ExAC, 2) minimum CADD score of 20, and 3) high or moderate effect determined by SNPEff (v 4.2) (Cingolani et al., 2012).
Sanger sequencing validation. The chromosomal regions containing the candidate causative variants in patients were amplified by polymerase chain reaction (Eppendorf, Germany) with Taq DNA polymerase (Amplicon, Germany) and designed primer for each of them at PCR conditions as summarized in Table 1. The sequencing diagrams were obtained using the Chromas software (v 2.6) (http://technelysium.com.au).

Results
Molecular diagnosis of PIDs by means of whole exome sequencing is a cost-effective and valuable approach for patients. In the present project, WES was used to study two families with HLH and SCID symptoms respectively. After performing the sequencing, the total reads were about 44 and 55 million for HHL and SCID patients respectively (Table 2). QC analysis of the FASTQ files showed that the average length of reads consisted of 190 nucleotides and 51 % of GC content, and more than 86 present of reads had quality scores equal or more than the cut-off (Q20) (see Table 2). After QC checking and trimming, the trimmed files were mapped to human reference genome version of GRCh37. Alignment statistics report using Samtools (Li et al., 2009) indicated that 95 % The green circles denote software, while the orange box shows all the databases and algorithms provided by KGGSeq that were used for annotation and filtering. of the reads were mapped to reference genome in the both files (see Table 2). 49426 variants were observed in the HLH patient, and 49629 variants -in the SCID patient after variant calling (see Table 2). After using the KGGSeq filters, the observed variations for the two patients were 48838 (HLH) and 49181 (SCID) respectively. Allele frequency trimming performed using 1000G, ESP6500 and ExAc indicated that 30569 and 29968 variants had greater than 1 % of MAF in the HLH and SCID patients respectively. Functional effect prediction of variants was performed using the SIFT, Polyphen-2 and CADD software resulting in 80 pathogenic and deleterious variants in the both patients. Top 10 variants with higher scores are presented in Table 3. Finally, two variants were considered as potential causative variants for the abovementioned phenotype of HLH and SCID. Selected variants included a frameshift variant in Exon 8 of the UNC13D gene (NM_199242.2:c.627delT) and a premature stop codon variant in Exon 2 of the RAG1 gene (NM_000448.2:c.322C>G) in the HLH and SCID patient respectively (Table 4). Sanger sequencing confirmed the both variants (Fig. 2).

Discussion
Molecular diagnosis of PIDs with at least 300 genes, allelic and locus heterogeneity is a challenging issue. Recent studies indicate that 121 gene defects have been identified in these disea ses in addition to the list of genes involved in PID (179 ge nes) since 2011 (Al-Herz et al., 2011;Picard et al., 2015). Utilization of the Sanger sequencing as a direct sequencing method for finding mutations in genes has become a de facto issue in genetic diagnosis (Sanger et al., 1977). Despite the fact that the gene-by-gene approach is a conventional diagnostic test for monogenic diseases, such methods are too expensive and time-consuming in multigenic diseases such as the PIDs (Chou et al., 2012;Sikkema-Raddatz et al., 2013). In contrast, WES method has become a favorable test for genetic diagnosis of multigenic diseases and it sequences all coding regions using amplification and parallel sequencing in a single test (Sikkema-Raddatz et al., 2013;Stoddard et al., 2014). For example, E. Mukda et al. (2017) evaluated 25 HLH patients and diagnosed pathogenic mutations in PFR1, UNC13D, STXBP2, LYST and XIAP genes. G.S. Schulert et al. (2015) also conducted WES in 16 HLH patients and diagnosed the disease-causing mutations in PFR1 and LYST gene. CARD11 gene inactivation due to a pre mature stop codon was diagnosed in a SCID patient by means of WES (Greil et al., 2013). Accordingly, the present study sought to obtain a molecular diagnosis of two patients with HLH and SCID by means of WES. A variant (rs755619812) in UNC13D gene and a variant (rs193922464) in RAG1 gene were detected in HLH and SCID patients respectively after exome sequencing and processing the raw data. In HLH case, the UNC13D gene (17q25.1) comprised 32 exons and 4.5-kb transcript with the highest expression in spleen, thymus, and peripheral blood leukocytes. In lymphocytes, the encoded protein (Munc13-4, 1090 amino acids, Uni-Prot ID: Q70J99) played an important role in cytotoxic granule exocytosis. Munc13-4 is an essential protein for maturation, docking, and priming of cytotoxic granules in cytotoxic cells (CTLs and NK) (Feldmann et al., 2003). Pathogenic mutations in the UNC13D gene led to an ineffective protein create type 3 of familial HLH (FHL-3). We found a homozygous nucleotide deletion (c.627delT:p.V210Wfs*39) in exon 8 of UNC13D, and it caused a frameshift mutation (see Fig. 2, a). This damaging mutation led to the substitution of valine with tryptophan at position 210 resulting in a premature stop codon at 39 codons after this substitution which produced a truncated protein. This mutation results in defects in the killing ability of NK and CD8 + T-cells and uncontrolled hyperinflammation in HLH patients. In 2006, Stasdt et al. studied 63 HLH patients and described the c.627delT mutation in two infants in order to find the mutations spectra of the PFR1, UNC13D, STX11 and RAB27A genes (Stadt et al., 2006).
In the second patient with clinical symptoms of SCID, we found a homozygous single nucleotide substitution in the first position of Arginine codon 108 in Exon 2 of the RAG1 gene (c.322C>G: R108*) leading to a premature stop codon and consequently a truncated protein (see Fig. 2, b). The human RAG1 gene (11p12) consists of 2 exons and encodes a protein with 1043 amino acids (UniProt ID: P15918). Recombinationactivating protein 1 (RAG1) is a catalytic component of the RAG complex as a multi-protein complex that mediates the DNA cleavage phase during V(D) J recombination. In the RAG complex (RAG1/2), RAG1 mediates the DNAbinding to the conserved recombination a -in the HLH patient, IGV screenshot shows a homozygous deletion in the UNC13D gene confirmed both in the proband and the parents; b -IGV screenshot from the SCID patient also shows a homozygous single nucleotide substitution in the RAG1 gene confirmed both in the proband and the parents.  T  T  T  T  T  T   T  T  T  T  T  signal sequences (RSS) and catalyzes DNA cleavage activities by introducing a double-strand break between RSS and the adjacency to each coding V, D, and J DNA segment (Melek, Gellert, 2000). This somatic recombination leads to the diversity of immunoglobulins and T-cell receptors (TCRs). The pathogenic mutations affecting the active core (amino acids 384-1009) of RAG1 produce the clinical symptoms of SCID disease such as absence of lymphocyte (B-and T-cells) circulation (Corneo et al., 2001) observed in our patient.

Conclusion
Whole exome sequencing has proved itself as a fast and costeffective method for detection of causative rare mutations in PID patients.