Abstract
Facioscapulohumeral muscular dystrophy (FSHD) (OMIM 158900) is a common hereditary myopathy with a prevalence of approximately 1:20,000. FSHD1 has been associated with a heterozygous reduction in the number of tandemly arrayed repetitive elements, named D4Z4, located in the subtelomeric region of the long arm of chromosome 4, at 4q35. One hypothesis to explain disease pathogenesis is that the reduction of D4Z4 elements to fewer than 11, in association with the telomeric 4qA polymorphism, results in toxic transcription of the DUX4 retrogene. FSHD2 (OMIM 158901) comprises the 5% of clinical FSHD cases that carry two D4Z4 arrays with 11 or more repeats. In these cases, mutations in genes encoding the chromatin-remodeling factors SMCHD1, DNMT3B and LRIF1 have been described. These findings led to the hypothesis that the disease results from chromatin changes that cause anomalous expression of 4q35 genes. On this basis, reduced CpG methylation of D4Z4 has been adopted as a disease biomarker. Each copy of the 3.3 kb D4Z4 repeat contains two homeoboxes and two previously described repetitive sequences, LSau and a GC-rich low-copy repeat designated hhspm3. These heterochromatic elements are highly repetitive and polymorphic and are scattered throughout the human genome. The T2T-CHM13 assembly released by the Telomere-to-Telomere (T2T) Consortium in 2022 provides a complete human genome sequence, including gapless telomere-to-telomere assemblies for all chromosomes, and opens new possibilities for investigating the functional role of repetitive elements. Here we present the results of a high-depth methylation analysis of the D4Z4 repeat, covering more than 900 CpGs, in bisulfite-treated DNA from a heterogeneous cohort of subjects. We performed a comprehensive bioinformatic analysis using the T2T genome as reference and mapped the D4Z4-like sequences.
Our analysis revealed the complexity and widespread distribution of these sequences in the human genome; in particular, enrichment was found on the short arms of all acrocentric chromosomes and chromosome 1. Bioinformatic analysis uncovered great inter- and intra-individual variability of repetitive elements undetectable in the hg38 assembly. The study of FSHD2 cases revealed a heterogeneous methylation pattern of the D4Z4-like elements and showed that reduced D4Z4 methylation at 4q and 10q is associated with SMCHD1 mutation. We also studied D4Z4 methylation in carriers of pathogenic or likely pathogenic SMCHD1 variants from six clinically heterogeneous cohorts. We analyzed aggregated data consisting of 22,370 SMCHD1 variants identified by whole-genome, whole-exome and panel sequencing, provided by six centers in Italy, France and Finland. We found 64 pathogenic or likely pathogenic variants and selected 19 of them to investigate D4Z4 methylation status in relation to SMCHD1 mutation. D4Z4 methylation analysis was conducted with the same procedure. Of the 23 samples, 10 displayed reduced D4Z4 methylation. In these cases, the variants were likely to cause loss of function. Collectively, the results of our studies provide new insights into the interpretation of the epigenetic status of D4Z4 elements in FSHD. In particular, they indicate that caution should be exercised in the interpretation of SMCHD1 variants. Based on these results, we propose a new methodology, based on the T2T genome assembly, to study the epigenetic setting of repetitive elements. This analytical approach provides valuable tools for preventing biased interpretation and demonstrates that the availability of genomic data from large cohorts offers the opportunity for a more precise understanding of WES or WGS data.