Skip to main content

Comparative meta-analysis of transcriptomic studies in spinal muscular atrophy: comparison between tissues and mouse models

Abstract

Background

Spinal Muscular Atrophy (SMA), a neuromuscular disorder that leads to weakness in the muscles due to degeneration of motor neurons. Mutations in the survival motor neuron 1 (SMN1) gene leads to the deficiency of SMN protein that causes SMA. The molecular alterations associated with SMA extends across the transcriptome and proteome. Although several studies have examined the transcriptomic profile of SMA, the difference in experimental settings across these studies highlight the need for a comparative meta-analysis to better understand these differences.

Methods and data

We conducted a systematic comparative meta-analysis of publicly available gene expression data from six selected studies to elucidate variations in the transcriptomic landscape across different experimental conditions, including tissue types and mouse models. We used both microarray and RNA-seq datasets, retrieved from Gene Expression Omnibus (GEO) and ArrayExpress (AE). Methods included normalization, differential expression analysis, gene-set enrichment analysis (GSEA), network reconstruction and co-expression analysis.

Results

Differential expression analysis revealed varying numbers of differentially expressed genes ranging between zero and 1,655 across the selected studies. Notably, the Metallothionein gene Mt2 was common in several of the eight comparisons. This highlights its role in oxidative stress and detoxification. Additionally, genes such as Hspb1, St14 and Sult1a1 were among the top ten differentially expressed genes in more than one comparison. The Snrpa1 gene, involved in pre-mRNA splicing, was upregulated in the spinal cord and has a strong correlation with other differentially expressed genes from other comparisons in our network reconstruction analysis. Gene-set enrichment analysis identified significant GO terms such as contractile fibers and myosin complexes in more than one comparison which highlights its significant role in SMA.

Conclusions

Our comparative meta-analysis identified only few genes and pathways that were consistently dysregulated in SMA across different tissues and experimental settings. Conversely, many genes and pathways appeared to play a tissue-specific role in SMA. In comparison with the original studies, reproducibility was rather weak.

Peer Review reports

Introduction

Spinal muscular atrophy (SMA) is a genetic neuromuscular disorder caused by the degeneration of motor neurons in the spinal cord and brain stem leading to muscle weakness [24, 33]. The survival motor neuron 1 (SMN1) gene is either deleted or shows point mutations, which in turn leads to the deficiency of the SMN protein. As a consequence, the decrease of SMN causes to transcriptomic and proteomic alterations. Currently, five types of SMA are known, where Type 0 is the most severe and rare form, developing already prenatally (Mercury et al., 2022). Therapies that have been developed so far target the SMN protein, e.g. by replacing SMN1.

A number of studies has been performed to study and investigate the transcriptome under SMA. These studies focus on the transcriptome in different organs and under different experimental settings which are performed at different time points. While individual transcriptomic studies provide insights under specific experimental settings, the aim of this work is to systematically re-analyse publicly available gene expression data of a selection of these studies to better understand the differences in the transcriptome under different settings such as different organs and mouse models. Moreover, the scientific evidence of individual studies is usually limited [6] [29] and only the synthesis of multiple research outcomes can provide a more conclusive picture. The selected studies in our comparative analysis involve both microarray and RNAseq study types [7]. Data of high-throughput gene expression, measured by DNA microarrays or RNAseq, can be retrieved from public databases such as Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) [1, 18] or BioStudies-ArrayExpress (AE, https://www.ebi.ac.uk/biostudies/arrayexpress) [49]. Raw public gene expression data opens up to conduct classical meta-analysis, which means to combine results from individual studies in the form of fold change or p-value combination [38, 46]. Alternatively, raw data can be merged, batch-effect corrected [32, 60] and analysed as a whole [57].

While classical meta-analysis aims to combine studies of the same setting in order to increase statistical power [53] and thereby scientific evidence and reproducibility [6] [52], we focus here on studies with different settings in a comparative way. Comparative meta-analysis allows to study the differences in the transcriptomic changes in different organs, species or in general under different experimental settings [48, 54]. The advantage of doing a comparative meta-analysis with raw data – in contrast to a literature review – is that all steps of analyses on the individual studies can be harmonized. In this study, our data analysis includes standard analyses such as normalization, differential expression analysis, gene-set enrichment analysis (GSEA). In addition to the method spectrum of the original publications, we included network reconstruction and co-expression analysis. While data were analysed with different approaches in the original publications with a different aim of study, the same methods for data analysis can be used in a comparative meta-analysis. Additionally, original publications often do not provide the full set of results such as gene lists with fold changes.

In this study, we detail the process of selecting studies and the data sets included as well as our approach for comparative analysis. Thereafter, the results of differential expression analysis, gene-set based enrichment analysis, network reconstruction and co-expression analysis are presented. We conclude with a discussion of the computational approach and their corresponding biological findings.

Methods and data

In the following we describe the selection of relevant datasets from the GEO and AE databases, the methods for data preparation and normalization, differential expression analysis and GSEA, as well as for network reconstruction and co-expression analysis. All data analyses were performed in the R programming environment ([43], www.r-project.org, V4.3.2).

Database search and data selection

A systematic search was performed in Gene Expression Omnibus (GEO) and Array Express (AE) using the search term ‘Spinal Muscular Atrophy’ following the ‘Preferred Reporting Items for Systematic Reviews and Meta-Analyses’ (PRISMA) guidelines [42]. A total number of 113 hits from GEO and 111 hits from AE were retrieved on 09 August 2023. Studies that were not related to ‘Mus musculus’ were excluded. Next, studies from GEO that were also in AE were removed. Furthermore, studies that were not SMA and not a series were excluded. Lastly, in order to meet the selection criteria for meta-analysis, series that still had other diseases, no control group and no proper data were excluded. Finally, 6 studies were included for meta-analysis (Supplementary Data S1 SF1). These studies include three microarray and three RNA seq study types, where cells and tissues included are skeletal muscle, brain, motor neurons, spinal cord and embryonic stem cell- derived motor neurons. All GSE series matrix files for microarray data were downloaded into the R programming environment using the GEOquery R-package from Bioconductor (www.bioconductor.org, V3.2). Raw FASTQ-files for RNA-seq data were downloaded using the SRAToolkit [34]. A summary of the six selected studies is given in Table 1.

Table 1 Detailed information on the six selected studies including their GEO or Array Express accession ID and their corresponding Pubmed references

Data preparation and differential analysis

RNA-seq data (GSE154106, GSE102204 and GSE56284) were obtained from Sequence Read Archive (SRA). SRA sample files were downloaded using the prefetch command from SRAToolKit [34]. The software FastQC [56] was used for quality control, CutAdapt [39] was used for trimming and removing the adapters, the STAR tool [16] for mapping to the mm39 genome, and duplicates were removed using the picard tool (http://broadinstitute.github.io/picard/) and finally feature counts [35] was used to count the reads. Differential expression analysis was performed for these count data using the DESeq2 R-package [36]. For primary identification of differentially expressed genes, we used a cut-off of 5% for FDR-adjusted p-values according to the method of Benjamini and Hochberg [2].

To select genes for the intersection analysis between the different studies, we used two additional selection criteria. First, a combined criterion requiring an FDR-adjusted p-value < 0.05 and an absolute logFC > 2, in order to avoid too many false positives. Second, in order not to miss relevant genes with a weak effect but having maybe biological importance we also studied the set of top 500 genes in each comparison, ranked by their raw p-values.

Microarray data of the studies GSE207890 and E-MEXP-2428 were not yet log-transformed, so we performed this step prior to normalization. However, the dataset GSE10224 was already publicly provided with the log-transformed version. All microarray datasets were normalized using quantile normalization [4] and differential expression analysis was performed using the R-package limma [47].

Differential expression analyses were done between a control group and a diseased group using the limma-package. For those studies that had more than two groups (GSE207890 and GSE10224), multiple differential analyses were done. In particular, for dataset GSE207890, the comparisons include Smn1 knockout treated with amniotic fluid stem (Smn1deltaSKM + AFS) versus Ctrl (Control group) and Smn1 knockout (Smn1deltaSKM) versus Ctrl. Furthermore, for dataset GSE10224, the comparisons included SMA versus Ctrl and transplanted SMA (transSMA) versus Ctrl. Thus, a total of eight different group comparisons were performed.

Gene-set enrichment analysis

GSEA for Gene Ontology (GO) terms: biological processes (BP), molecular functions (MF) and, cellular components (CC) was performed subsequent to all eight differential expression analyses using the clusterProfiler R-package [58]. We chose GSEA instead of over representation because it is only based on the ranking of genes and not on a particular set of differentially expressed genes. Due to the different study settings, having such sets available can be a problem. In addition to the GO enrichment analysis, enrichment analysis for Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathways were performed using clusterProfiler, too. To analyse the robustness of the GSEA results, we used the bootGSEA R-package [23] to understand and interpret the robustness of GO terms and pathways. This approach repeats GSEA multiple times with subsets of the gene sets based on bootstrap approach to see whether individual genes have an effect on the outcome of GSEA. The bootstrap based analyses resulted in original and bootstrap ranks for each GO terms and pathways. The resulting tables, ordered by bootstrap ranks, show the difference in ranks between original and bootstrap analyses, thereby providing robust assessment of GSEA.

The biomaRt R-package [17] was used in order to get the GO annotations from the Gene Ontology database (https://geneontology.org/).

Network reconstruction

To better understand the interplay between the most significant genes across all comparisons in each study group, the union of top5 differentially expressed genes from all 8 comparisons was built. Thus, a reconstructed network for a particular study group involved also genes that might not be relevant in this tissue but in another tissue under SMA. We limited the input to the top5 from each comparison in order to obtain too large and too hard to interpret networks.

Networks were constructed for these genes with their count matrix for RNA sequencing studies and expression levels for microarray studies using the minet package [40] in R which helps in inferring mutual information networks to understand gene interactions. These networks were then visualized using the igraph R package [14], providing a comprehensive view of the interactions and relationships between the key genes identified.

Co-expression network analysis

To further understand the role of genes with similar expression, Weighted Gene Co-expression Network Analysis (WGCNA) was used to analyse and identify co-expressed genes within SMA and wild type groups individually. The microarray expression data were normalized individually for SMA and wildtype groups using quantile normalization from limma R-package. RNA seq data on the other hand, normalization was done using the normalize function from broman Rpackage for SMA and wild type groups separately.

WGCNA R package [30] was used to construct a co-expression network based on pairwise correlations between genes using the topological overlap matrix (TOM), where a soft-threshold power was determined using the scale independence criteria to approximate a scale-free topology. Furthermore, minimum cluster size parameter was adjusted between 100 and 800 in order to obtain biologically relevant modules for study groups (Supplementary Data S2) and cutHeight was set to 0.98 for all groups. Genes were clustered based on dissimilarity measure (1-TOM).

Top Hub genes were identified for modules related to neuromuscular disease using the R function “chooseTopHubInEachModule” in WGCNA R package and studied for their biological significance in SMA. Furthermore, common genes among the modules between SMA and wild type have been analysed to explore potential gene signatures with respect to SMA.

Results

Our approach towards comparative meta-analysis revealed significant overlaps in differentially expressed genes and enriched gene-sets across the selected studies. Furthermore, networks were constructed from the resulting top 5 differentially expressed genes (DEGs), providing information into the interconnections and potential regulatory relationships among these genes.

Differential expression analysis

The eight differential expression analyses were performed between the control and diseased groups. The numbers of selected genes, according to different criteria are listed in Table ST1 of Supplementary Data S1. After FDR-adjustment, there were three comparisons with a very large number (several thousand) of significant genes, two comparisons with a moderate number (several hundred) of significant genes, and three data sets with zero or only one significant genes. The largest effects (i.e., the largest portion of differentially expressed genes) by SMA on the transcriptome were seen in the studies on skeletal muscle (GSE207890) and embryonic stem cell-derived motor neurons (GSE56284). Moderate effects were also seen in the study on mouse brain tissues (GSE154106 and GSE102204). The analyses based on motor neurons of the spinal cord (GSE10224) and the whole spinal cord (E-MEXP-2428) yielded no or only one significant gene after p-values adjustment.

Due to these different sizes of effects, and because the overall number of genes in each experiment was quite different, we used two different criteria for intersection analysis. With the first – rather strong – criterion (adjusted p-value < 0.05 and absolute logFC > 2), we identified 31 genes which occurred in at least 3 out of 8 comparisons (Fig. 1, left). With the second criterion (gene among the top500), 61 genes occurring in 3 out of 8 comparisons, were selected (Fig. 1, right). The genes of these two sets are reported in Table ST2 of Supplementary Data S1. The overlap of both sets were the four genes Hbegf, Mt2, Sim1, and Sox9. While Mt2 was also among the top10 in the comparison ‘GSE10224: SMA versus Ctrl’, none of the three other genes was observed among the top10 from each comparison (Table 2). Among the top10 lists, only St14 and Sult1a1 occurred twice. Apart from these two genes, the top10 lists were disjunct.

Fig. 1
figure 1

Upset plots [12] of common DEGs among all the group comparisons based on either the selection criterion with an FDR-adjusted p-values < 0.05 and an absolute logFC > 2 (left) or on the criterion of being among the top500 (right). Only intersections among at least three comparisons are shown

Table 2 Top 10 differentially expressed genes from the 8 group comparisons with raw and FDR-adjusted p-values as well as the log fold change

Gene-set enrichment analysis

For seven of the eight comparison significantly enriched GO terms were found after FDR-adjustment of enrichment p-values, the most ones in the domain of biological processes (Table 3). Only for data set GSE102204 (brain tissue), no significant terms were found. The bootstrap analysis showed a high robustness of the standard GO enrichment analysis, i.e. there were only moderate changes in the original ranking. There were larger overlaps between pairs of these lists, but only a smaller overlap between more than three lists. E.g., there were 14 GO terms related to BP that occurred in at least four of the eight comparisons (Fig. 2). Fourteen GO terms were also detected for MFs in at least four lists and 32 GO terms related to CCs (Fig SF2, Supplementary Data S1).

Table 3 Number of significantly enriched GO-terms from in the categories of biological processes, molecular function and cellular component in the eight comparisons
Fig. 2
figure 2

Upset plot (left) showing the number of GO terms related to biological processes (BP) that occur among the top 100 in at least four of the eight comparisons, and corresponding table (right) listing the 14 GO terms highlighted in the upset plot. The comparison number in the table corresponds to the order (from top to bottom) in the upset plot

Among the top 5 GO terms related to BPs for each comparison (ranked by the bootstrap approach, Table 4) were two which could directly be related to SMA, ‘muscle tissue development’ (GO:0060537) and ‘muscle cell development’ (GO:0055001). While most GO-BP terms occurred only once among the top 5, three terms occurred twice: ‘extracellular matrix organization’ (GO:0030198), ‘extracellular structure organization’ (GO:0043062), ‘external encapsulating structure organization’ (GO:0045229).

Table 4 Top 5 GO terms related to biological processes (BP) from the eight comparisons, including the number of genes associated with each gene, and their ranks from the original and bootstrap based GSEA

The top5 GO terms related to MFs and CCs are given in Supplementary Data S3 and S4. Overall, there was a bit more overlap compared to BP terms (Figure SF2, Supplementary Data S1). Specifically, these were the MF-terms ‘actin binding’ (GO:0003779), ‘actin filament binding’ (GO:0051015), ‘calmodulin binding’ (GO:0005516), ‘calcium ion binding’ (GO:0005509), ‘DNA-binding transcription activator activity, RNA polymerase II-specific’ (GO:0001228), ‘DNA-binding transcription activator activity’ (GO:0001216), ‘extracellular matrix structural constituent’ (GO:0005201), ‘structural molecule activity’ (GO:0005198), ‘signalling receptor regulator activity’ (GO:0030545), and the CC-terms membrane raft (GO:0045121), cell leading edge (GO:0031252), membrane microdomain (GO:0098857), collagen-containing extracellular matrix (GO:0062023), mitochondrial protein-containing complex (GO:0098798) which occurred more than once among the top 5.

From the analysis of KEGG and Reactome pathways, a moderate number of significantly enriched pathways emerged for most of the eight comparisons (Table ST3 Supplementary Data S1), except for the comparison in the brain tissue samples of data set GSE102204 which resulted in zero enriched pathways after FDR-adjustment. The largest number of enriched pathways was detected in the group comparisons of the motor neuron data set (GSE10224), ranging between 133 and 214 enriched pathways, while for the other group comparison the numbers ranged between 1ß and 87 enriched pathways.

Among the top5 from all comparisons, there were six KEGG pathways that were detected twice (Supplementary Data S5): ‘MicroRNAs in cancer’ (mmu05206), ‘Rap1 signaling pathway’ (mmu04015), ‘PI3K-Akt signaling pathway’ (mmu04151), ‘Focal adhesion’ (mmu04510) and ‘Neuroactive ligand-receptor interaction’ (mmu04080). From the enrichment analysis of Reactome pathways, seven pathways occurred more than once among the top5 of all comparisons, two of them even four time (Supplementary Data S6): ‘Hemostasis’ (R-MMU-109582) and ‘Signaling by Receptor Tyrosine Kinases’ (R-MMU-9006934). These other five pathways occurred twice each: ‘Developmental Biology’ (R-MMU-1266738), ‘Extracellular matrix organization’ (R-MMU-1474244), ‘Integrin cell surface interactions’ (R-MMU-216083), ‘mRNA Splicing’ (R-MMU-72163), and ‘Processing of Capped Intron-Containing Pre-mRNA’ (R-MMU-72203).

Network reconstruction

The construction of networks was based on differential expression analyses from the eight different comparisons where the top5 genes were selected and merged to create a gene set for each comparison. We focus exemplarily on the networks arising from the data of data set GSE10224 with the comparison of SMA versus control and on connections with the Smn1 genes, the central player of SMA. For the SMA group (Fig. 3a), Smn1 has stronger correlation with genes Snrap1 (Small Nuclear Ribonucleoprotein Polypeptide A), Fam134b, Slco1b2, Igk-V1, Nr0b1, F2rl2, Aste1, Tnfsf10 and Rad, whereas in the control group Smn1 has strong correlation with genes Sult1a1, P4ha1, Fam134b, St14, Aste1, Tnfsf10 and Chodl (Fig. 3b). Furthermore, consistent patterns were observed between genes that includes Fam134b, Aste1 and Tnfsf10 across both SMA and control group. This could be because these genes have stable interactions or shared functional relationships. A strong change of correlations of the Smn1 gene with other genes between the disease group and the control group was also seen in the other data sets (Fig. SF3-SF9, Supplementary Data S1).

Fig. 3
figure 3

Networks constructed using the expression profiles from the GSE10224 data, SMA versus control, for the top 5 DEGs from all the datasets that were merged creating a gene set for each study group, where the top plot corresponds to SMA group and the plot at the bottom corresponds to control group. Nodes represents the genes and edges the interaction between them. The size and colour of the node shows the interaction size of the genes. The thickness of edge represents the correlation between the genes

Co-expression network analysis

To understand more about the expression of genes across samples in each group for each study Weighted Gene Co-expression Network Analysis (WGCNA) analysis was performed. As a result, we identified co-expression modules for SMA and control groups where these modules represent clusters of genes with similar expression patterns across samples within each group (Fig. 4). In data set GSE102204 (brain tissue), SMA group, with17,097 genes, we identified nine modules excluding the grey module which corresponds to the less correlated group of genes. The size of each module varied between 1,020 (magenta module) and 2,800 (blue module) (Fig. 4a). Furthermore, in data set GSE102204, control group with 16,900 genes, a total of seven modules were identified with each module having genes between 951 (turquoise module) and 4,825 (blue module respectively) (Fig. 4b). To further understand these modules, top hub genes from each module were identified that denote the highly connected gene in each module. The top hub genes in the the SMA group of data set GSE102204 are Rrp1, Ints9, Ptgds, Nnt, 2010111I01Rik, Snx2, Gm26869, Agpat6 and Irak1. In the control group on the other hand with seven modules include Ube2v2, Apba1, Gm1043, Ids, Tm9sf2, Fkbp4 and Tmem183a. These hub genes could potentially represent important regulators of these modules. Furthermore, common genes among the modules between the control and SMA group of each study were analysed to better understand their role in SMA. In data set GSE102204 (Fig. 5), blue and turquoise modules have higher percentage of common genes and therefore over representation analysis was performed for these genes. Interestingly, genes like Smn1 and Snrpa1 which had strong correlations between genes (Fig. 5) are present in the blue and turquoise module respectively of both SMA and control group.

Fig. 4
figure 4

Co-expression modules and Topological Overlap Matrix (TOM) Plot associated with SMA (a, b) and Control (c, d) group of data set GSE102204. Plot a & c) WGCNA dendogram where the coloured bars towards the x-axis represent the module colours and y-axis represents the similarity between genes based on the height of the branches in the dendogram. Higher the value, greater the similarity between genes. SMA group has 9 modules and Control group has 7 modules excluding the grey module which correspond to the less correlated gene group. Plot b & d) TOM plot showing the topological overlap of the genes from data set GSE102204. Rows and columns represent the genes, darker the red colour higher the correlation between the genes. Their corresponding gene dendogram are towards the top and along the left of the plot

Fig. 5
figure 5

Heat map representing the amount of common genes in percentage between modules in the SMA and control group of data set GSE102204. Rows denote the control group and columns denote the SMA group

Discussion

SMA has been studied at the level of the transcriptome in several studies, each with different settings and different approaches for data analysis. The idea of our meta-analysis was to aggregate the information from six selected studies by reanalysing the publicly available gene expression profiles in a harmonized way. Besides standard steps of analysis, such as differential expression analysis and gene-set enrichment analysis, we included network reconstruction and co-expression analysis which were not performed in the original studies. Overall, we could identify only small similarities between the results from the individual studies, however some differences which can be related to the different study settings such as mouse strains, types of organs and time points. Lists of differentially expressed genes are also provided in Supplementary Data S7. Potentially SMA-relevant genes were selected on different level: by differential expression analysis, by intersection analysis, and by network analysis. From these results, we highlight and discuss in the following some tissue specific and some commonly identified genes.

Tissue-specific genes and gene-sets

Among the top10 genes from the two group comparisons in the skeletal muscle study (GSE207890), Apoa1 was related to have an important function in the skeletal muscle for several diseases [11] and particular in the context of neuronal injury [50]. Furthermore, Mdb5 was mentioned I the context of neurodevelopmental abnormalities [25]. While there was no overlap between the top10 of both comparisons, some of the top10 were also in the overall intersect of both comparisons (according pFDR < 0.05 and abs(logFC) > 2): 1700057K13Rik, Rrad, Fam134b, H60b, Sap30, Egln2.

The top10 arising from the analysis of the embryonic stem cell-derived motor neurons include the ASIC4 which was also reported as an important player in muscle functions [21], and Cdh7 which was reported to have a central role in neural development [19, 28]. In the motor neurons from the spinal cord (GSE10224), only one significant gene was identified, 1700067P10Rik, which was not mentioned so far in the context of neuro-degenerative or muscle-related diseases. Despite the weak effects, we observed in these data, we want to highlight St14, which occurred among the top10, and which was already mentioned in the context of muscular dystrophy in the 1980s [22, 26]. Furthermore, Sult1a1 was observed among the top10, which was reported in several studies on neurodegenerative diseases [8] and it was found among the top10 in one of the brain studies (GSE102204) as well.

Regarding the top10 from the group comparisons in brain samples (studies GSE102204 & GSE154106), we want to highlight the chemokine Ccl21a which was also related to neurological diseases [9] and Slc38a5 which has been shown to cause developmental delay in motor dysfunction [44]. Furthermore, Abca7 which was related to multiple neurological diseases, in particular cortical and hippocampal atrophy [45].

Among the GO-BP-terms that occurred only once among the top5 lists were several which can directly be linked to SMA: ‘Muscle tissue development’ (GO:0060537) in one of the skeletal muscle tissues, and ‘Striated muscle cell development’ (GO:0055002) and ‘Muscle cell development’ (GO:0055001) in motor neurons. Furthermore, there were two terms which can be related to neurological diseases: ‘Ensheathmeant of neurons’ (GO:0007272) in the spinal cord data and ‘Synapse assembly’ (GO:0007416) in the motor neurons derived from embryonic stem cells. While it is difficult to name a tissue-specific GO-MF-term which can be related to SMA or neurological diseases, there were several GO-CC-terms: ‘Striated muscle thin filament’ (GO0005865) in the motor neuron data and several terms related to synapses in the motor neurons derived from embryonic stem cells. From the top5 KEGG pathways, only ‘Axon guidance’ (mmu04360) in one of the brain tissues can be directly related to neurological diseases, and from the top5 Reactome pathways ‘Muscle contradiction’ (R-MMU-397014) and ‘Neuronal System’ (R-MMU-112316) make a direct sense in the context of SMA or neurological diseases.

Tissue unspecific genes and gene-sets

From the 31 genes that showed some overlap between the eight comparisons, we want to highlight Mt2 (Metallothionein–1), a members of the Metallothionein family which is a metal binding protein involved in ion homeostasis, protection against oxidative stress and detoxification [15], and was is upregulated in motor neurons and skeletal muscle. Furthermore, Metallothioneins, Mt2 but also Mt1, have been linked to neuroprotection and neuro-regeneration which also promote activities such as neuronal survival and synaptic plasticity which in turn is important in the maintenance of neuronal function and integrity in neurodegenerative diseases [55] including SMA.

The three GO-BP terms that occurred among the top5 in motor neurons and spinal cord tissue, ‘Extracellular matrix organization’ (GO:00300198) has also been mentioned in the context of Duchenne muscular atrophy [59]. The GO-MF term ‘Actin binding’ (GO:0003779) which occurred three times among the top5 has been shown to play a role in sarcopenia, another muscular disease [20]. From the GO-CC enrichment analysis, ‘Collagen-containing extracellular matrix’ (GO:0062023) was among the top5 in motor neurons and skeletal muscle and was recently also mentioned in the context of intervertebral disc generation [61]. Furthermore the CC-term ‘mitochondrial protein-containing complex’ (GO:0098798) was mentioned to play a role in human juvenile Huntington’s disease, another neurological disease (Podvon et al., 2023).

Among the KEGG pathways that were among the top5 of the spinal cord and motor neuron analysis, was ‘Neuroactive ligand-receptor interaction’ (mmu04080) which was reported in the context of seizures and epilepsis [27,; Xiao et al., 2020). Interestingly, also two pathways which are related to cancer were among the top5: ‘MicroRNAs in cancer’ (mmu05206) and ‘PI3K-Akt signaling pathway’ (mmu04151). From the Reactome pathways that were among the top5 in several of the eight comparisons, ‘Hemostasis’ (R-MMU-109582) is naturally linked to muscle cells, and ‘Extracellular matrix organization’ (R-MMU-1474244) has regularly mentioned in the context of neurological disorder 51, 5].

Comparison to original studies, reproducibility issues and limitations

A comparison of the top10 selected genes from our analysis with the findings reported in the original studies showed a very weak overlap. In fact, there were only four genes among our top10 lists that where directly names in the original publications of the six studies: Hspb1, Cdkn1a, Smn1 and Rgs5. Although many of the other genes selected by our analysis can be found in supplementary tables of the original publications, our study shows the weak reproducibility of original findings in transcriptomics studies if different reporting criteria are applied.

While reporting guidelines for microarray and RNAseq experiments where published long time ago – the MIAMI and the MINSEQE guidelines (https://www.ncbi.nlm.nih.gov/geo/info/MIAME.html) – the comparison of our meta-analysis with the original publications shows that an extension of these guidelines would be important for comparative meta-analyses. Specifically, the existing guidelines ask analysts to report methods for data processing and filtering of genes, but they do not make a recommendation which of the results should presented and in which form. For example, not all of the original publications provided lists of differentially expressed genes with p-values and fold changes but we did.

Furthermore, as an extension of the original analyses, we could identify interesting correlations between the top differentially expressed genes and modules of co-expressed genes by means of network reconstruction and co-expression analysis.

Although, a comparative meta-analysis, as presented here, has several advantages over individual studies, such as the harmonized way of analysis and the possibility of comparisons between study settings, there remain the same limitations as in single studies. Among these limitations are the risk for false negatives and false positives, and in general a risk for making arbitrary or biased selections. Therefore, we avoided to over-interpret the results and are aware that the studies included, here, have very small sample sizes. This summary of six transcriptomic studies can therefore not be seen as a proof of facts, but may help to further support findings of future omics studies on SMA.

Data availability

The datasets analysed during the current study are available in the Gene Expression Omnibus (GEO) and ArrayExpress (AE) repository,

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE207890

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE154106

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102204

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE56284

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSe10224

https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MEXP-2428?query=E-MEXP-2428

Abbreviations

AE:

Array Express

Aste1:

Asteroid Homolog 1

BP:

Biological Process

Cacna2d2:

Calcium Voltage-Gated Channel Auxiliary Subunit Alpha2delta 2

CC:

Cellular Component

cDNA:

Complementary DNA

Chodl:

Chondrolectin

DEGs:

Differentially Expressed Genes

ES:

Enrichment score

F2rl2:

Coagulation Factor II Thrombin Receptor Like 2

Fam134b:

Reticulophagy Regulator 1

GEO:

Gene Expression Omnibus

GO:

Gene Ontology

GSEA:

Gene-Set Enrichment Analysis

Hspb1:

Heat Shock Protein Family B Member 1

Igk-V1:

Immunoglobulin Kappa Variable 1

KEGG:

Kyoto Encyclopedia of genes and genomes

MF:

Molecular Function

mRNA:

Messenger RNA

Mt1:

Metallothionein 1A

Mt2:

Metallothionein 2A

Nr0b1:

Nuclear Receptor Subfamily 0 Group B Member 1

P4ha1:

Prolyl 4-Hydroxylase Subunit Alpha 1

PRISMA:

Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Rad:

Ras Related Glycolysis Inhibitor And Calcium Channel Regulator

Slco1b2:

Solute carrier organic anion transporter family, member 1b2

SMA:

Spinal Muscular Atrophy

Snrpa1:

Small Nuclear Ribonucleoprotein Polypeptide A

SRA:

Sequence Read Archive

St14:

Transmembrane Serine Protease Matriptase

Sult1a1:

Sulfotransferase Family 1A Member 1

Tnfsf10:

TNF Superfamily Member 10

TOM:

Topological Overlap Matrix

WGCNA:

Weighted Gene Co-expression Network Analysis

References

  1. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Soboleva A, et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 2012;41(D1):D991–5.

    Article  PubMed  PubMed Central  Google Scholar 

  2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc: Ser B (Methodol). 1995;57(1):289–300.

    Article  Google Scholar 

  3. Bernabo P, Tebaldi T, Groen EJ, Lane FM, Perenthaler E, Mattedi F, Viero G, et al. In vivo tranpaolslatome profiling in spinal muscular atrophy reveals a role for SMN protein in ribosome biology. Cell Rep. 2017;21(4):953–65.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  4. Bolstad BM, Irizarry RA, Åstrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19(2):185–93.

    Article  PubMed  CAS  Google Scholar 

  5. Bonneh-Barkay D, Wiley CA. Brain extracellular matrix in neurodegeneration. Brain Pathol. 2009;19(4):573–85.

    Article  PubMed  CAS  Google Scholar 

  6. Brown LA, Peirson SN. Improving reproducibility and candidate selection in transcriptomics using meta-analysis. J Exp Neurosci. 2018;12:1179069518756296.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Brown LA, Williams J, Taylor L, Thomson RJ, Nolan PM, Foster RG, Peirson SN. Meta-analysis of transcriptomic datasets identifies genes enriched in the mammalian circadian pacemaker. Nucleic Acids Res. 2017;45(17):9860–73.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  8. Butcher NJ, Horne MK, Mellick GD, Fowler CJ, Masters CL, Minchin RF. Sulfotransferase 1A3/4 copy number variation is associated with neurodegenerative disease. Pharmacogenomics J. 2018;18(2):209–14.

    Article  PubMed  CAS  Google Scholar 

  9. Chen SC, Leach MW, Chen Y, Cai XY, Sullivan L, Wiekowski M, Lira SA, et al. Central nervous system inflammation and neurological disease in transgenic mice expressing the CC chemokine CCL21 in oligodendrocytes. J Immunol. 2002;168(3):1009–17.

    Article  PubMed  CAS  Google Scholar 

  10. Chemello F, Pozzobon M, Tsansizi LI, Varanita T, Quintana-Cabrera R, Bonesso D, Bean C, et al. Dysfunctional mitochondria accumulate in a skeletal muscle knockout model of Smn1, the causal gene of spinal muscular atrophy. Cell Death Dis. 2023;14(2):162.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  11. Cochran BJ, Ong KL, Manandhar B, Rye KA. APOA1: a protein with multiple therapeutic functions. Curr Atheroscler Rep. 2021;23:1–10.

    Article  Google Scholar 

  12. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics. 2017;33(18):2938–40.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  13. Corti S, Nizzardo M, Nardini M, Donadoni C, Salani S, Ronchi D, Comi GP, et al. Neural stem cell transplantation can ameliorate the phenotype of a mouse model of spinal muscular atrophy. J Clin Invest. 2008;118(10):3316–30.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  14. Csardi G, Nepusz T. The igraph software. Complex Syst. 2006;1695:1–9.

    Google Scholar 

  15. Dai H, Wang L, Li L, Huang Z, Ye L. Metallothionein 1: a new spotlight on inflammatory diseases. Front Immunol. 2021;12:739918.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  16. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Gingeras TR, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21.

    Article  PubMed  CAS  Google Scholar 

  17. Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc. 2009;4(8):1184–91.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  18. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30(1):207–10.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  19. Feng W, Shao C, Liu HK. Versatile roles of the chromatin remodeler CHD7 during brain development and disease. Front Mol Neurosci. 2017;10:309.

    Article  PubMed  PubMed Central  Google Scholar 

  20. Furutani M, Suganuma M, Akiyama S, Mitsumori R, Takemura M, Matsui Y, Shigemizu D, et al. RNA-sequencing analysis identification of potential biomarkers for diagnosis of sarcopenia. J Gerontol: Ser A. 2023;78(11):1991–8.

    Article  CAS  Google Scholar 

  21. Grifoni SC, Jernigan NL, Hamilton G, Drummond HA. ASIC proteins regulate smooth muscle cell migration. Microvasc Res. 2008;75(2):202–10.

    Article  PubMed  CAS  Google Scholar 

  22. Heilig R, Lemaire C, Mandel JL, Dandolo L, Amar L, Avner P. Localization of the region homologous to the Duchenne muscular dystrophy locus on the mouse X chromosome. Nature. 1987;328(6126):168–70.

    Article  PubMed  CAS  Google Scholar 

  23. Hemandhar Kumar S, Tapken I, Kuhn D, Claus P, Jung K. bootGSEA: a bootstrap and rank aggregation pipeline for multi-study and multi-omics enrichment analyses. Front Bioinform. 2024;4:1380928.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Hensel N, Cieri F, Santonicola P, Tapken I, Schüning T, Taiana M, Claus P, et al. Impairment of the neurotrophic signaling hub B-Raf contributes to motoneuron degeneration in spinal muscular atrophy. Proc Natl Acad Sci. 2021;118(18):e2007785118.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  25. Hodge JC, Mitchell E, Pillalamarri V, Toler TL, Bartel F, Kearney HM, Talkowski ME, et al. Disruption of MBD5 contributes to a spectrum of psychopathology and neurodevelopmental abnormalities. Mol Psychiatry. 2014;19(3):368–79.

    Article  PubMed  CAS  Google Scholar 

  26. Hodgson S, Boswinkel E, Cole C, Walker A, Dubowitz V, Granata C, Bobrow M, et al. A linkage study of Emery-Dreifuss muscular dystrophy. Hum Genet. 1986;74:409–16.

    Article  PubMed  CAS  Google Scholar 

  27. Hu X, Fu X, Jiang AO, Yang X, Fang X, Gong G, Wei C. Multiomic analysis of mice epilepsy models suggest that miR-21a expression modulates mRNA and protein levels related to seizure deterioration. Genet Res. 2015;97:e22.

    Article  Google Scholar 

  28. Hurd EA, Poucher HK, Cheng K, Raphael Y, Martin DM. The ATP-dependent chromatin remodeling enzyme CHD7 regulates pro-neural gene expression and neurogenesis in the inner ear. Development. 2010;137(18):3139–50.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  29. Kosch R, Jung K. Conducting gene set tests in meta-analyses of transcriptome expression data. Res Synth Methods. 2019;10(1):99–112.

    Article  PubMed  Google Scholar 

  30. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:1–13.

    Article  Google Scholar 

  31. Lauria F, Bernabò P, Tebaldi T, Groen EJN, Perenthaler E, Maniscalco F, Viero G, et al. SMN-primed ribosomes modulate the translation of transcripts related to spinal muscular atrophy. Nat Cell Biol. 2020;22(10):1239–51.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  32. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  33. Lefebvre S, Bürglen L, Reboullet S, Clermont O, Burlet P, Viollet L, Melki J, et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell. 1995;80(1):155–65.

    Article  PubMed  CAS  Google Scholar 

  34. Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2010;39(suppl_1):D19–21.

    PubMed  PubMed Central  Google Scholar 

  35. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30.

    Article  PubMed  CAS  Google Scholar 

  36. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:1–21.

    Article  Google Scholar 

  37. Maeda M, Harris AW, Kingham BF, Lumpkin CJ, Opdenaker LM, McCahan SM, Butchbach ME, et al. Transcriptome profiling of spinal muscular atrophy motor neurons derived from mouse embryonic stem cells. PloS one. 2014;9(9):e106818.

    Article  PubMed  PubMed Central  Google Scholar 

  38. Marot G, Foulley JL, Mayer CD, Jaffrézic F. Moderated effect size and P-value combinations for microarray meta-analyses. Bioinformatics. 2009;25(20):2692–9.

    Article  PubMed  CAS  Google Scholar 

  39. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17(1):10–2.

    Article  Google Scholar 

  40. Meyer PE, Lafitte F, Bontempi G. minet: AR/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics. 2008;9:1–10.

    Article  Google Scholar 

  41. Murray LM, Lee S, Bäumer D, Parson SH, Talbot K, Gillingwater TH. Pre-symptomatic development of lower motor neuron connectivity in a mouse model of severe spinal muscular atrophy. Hum Mol Genet. 2010;19(3):420–33.

    Article  PubMed  CAS  Google Scholar 

  42. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Moher D, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.

    Article  PubMed  PubMed Central  Google Scholar 

  43. R Core Team, R. (2013). R: A language and environment for statistical computing

    Google Scholar 

  44. Radzishevsky I, Odeh M, Bodner O, Zubedat S, Shaulov L, Litvak M, Wolosker H, et al. Impairment of serine transport across the blood–brain barrier by deletion of Slc38a5 causes developmental delay and motor dysfunction. Proc Natl Acad Sci. 2023;120(42):e2302780120.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  45. Ramirez LM, Goukasian N, Porat S, Hwang KS, Eastman JA, Hurtz S, Apostolova LG, et al. Common variants in ABCA7 and MS4A6A are associated with cortical and hippocampal atrophy. Neurobiol Aging. 2016;39:82–9.

    Article  PubMed  CAS  Google Scholar 

  46. Rau A, Marot G, Jaffrézic F. Differential meta-analysis of RNA-seq data from multiple studies. BMC Bioinformatics. 2014;15:1–10.

    Article  Google Scholar 

  47. Ritchie ME, Phipson B, Wu DI, Hu Y, Law CW, Shi W, Smyth GK. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47–e47.

    Article  PubMed  PubMed Central  Google Scholar 

  48. Saedi S, Panahi R, Orak N, Jafarzadeh Shirazi MR. Comparative meta-analysis of adipose tissue transcriptomics data in PCOS patients and healthy control women. Reprod Sci. 2023;30(6):1823–33.

    Article  PubMed  CAS  Google Scholar 

  49. Sarkans U, Gostev M, Athar A, Behrangi E, Melnichuk O, Ali A, McEntyre J, et al. The BioStudies database—one stop shop for all data supporting a life sciences study. Nucleic Acids Res. 2018;46(D1):D1266–70.

    Article  PubMed  CAS  Google Scholar 

  50. Sengupta MB, Saha S, Mohanty PK, Mukhopadhyay KK, Mukhopadhyay D. Increased expression of ApoA1 after neuronal injury may be beneficial for healing. Mol Cell Biochem. 2017;424:45–55.

    Article  PubMed  CAS  Google Scholar 

  51. Soleman S, Filippov MA, Dityatev A, Fawcett JW. Targeting the neural extracellular matrix in neurological disorders. Neuroscience. 2013;253:194–213.

    Article  PubMed  CAS  Google Scholar 

  52. Sweeney TE, Haynes WA, Vallania F, Ioannidis JP, Khatri P. Methods to increase reproducibility in differential gene expression via meta-analysis. Nucleic Acids Res. 2017;45(1):e1–e1.

    Article  PubMed  Google Scholar 

  53. Tseng GC, Ghosh D, Feingold E. Comprehensive literature review and statistical considerations for microarray meta-analysis. Nucleic Acids Res. 2012;40(9):3785–99.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  54. Vitti Gambim V, Laufer-Amorim R, Fonseca Alves RH, Grieco V, Fonseca-Alves CE. A comparative meta-analysis and in silico analysis of differentially expressed genes and proteins in canine and human bladder cancer. Front Vet Sci. 2020;7:558978.

    Article  PubMed  PubMed Central  Google Scholar 

  55. West AK, Hidalgo J, Eddins D, Levin ED, Aschner M. Metallothionein in the central nervous system: roles in protection, regeneration and cognition. Neurotoxicology. 2008;29(3):489–503.

    Article  PubMed  CAS  Google Scholar 

  56. Wingett SW, Andrews S. FastQ screen: a tool for multi-genome mapping and quality control. F1000Res. 2018;7:1338.

    Article  PubMed  PubMed Central  Google Scholar 

  57. Winter C, Camarão AA, Steffen I, Jung K. Network meta-analysis of transcriptome expression changes in different manifestations of dengue virus infection. BMC Genomics. 2022;23(1):165.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  58. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Yu G, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation. 2021;2(3):100141.

    PubMed  PubMed Central  CAS  Google Scholar 

  59. Xu X, Hao Y, Xiong S, He Z. Comprehensive analysis of long non-coding RNA-associated competing endogenous RNA network in Duchenne muscular dystrophy. Interdiscip Sci: Comput Life Sci. 2020;12:447–60.

    Article  CAS  Google Scholar 

  60. Zhang Y, Parmigiani G, Johnson WE. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom Bioinform. 2020;2(3):lqaa078.

    Article  PubMed  PubMed Central  Google Scholar 

  61. Zhang GZ, Li L, Luo ZB, Zhang CY, Wang YG, Kang XW. Identification and experimental validation of key extracellular proteins as potential targets in intervertebral disc degeneration. Bone Joint Res. 2023;12(9):522–35.

    Article  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

Not applicable

Funding

Open Access funding enabled and organized by Projekt DEAL. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956185. We acknowledge financial support by the Open Access Publication Fund of the University of Veterinary Medicine Hannover, Foundation.

Author information

Authors and Affiliations

Authors

Contributions

SHK contributed to analysis and interpretation of the data, and drafted the manuscript. PC, KB contributed to the interpretation of the data. KJ contributed to the concept and design of the work and to the analysis and interpretation of the data. All authors have contributed to finalize the manuscript and read and approved the final manuscript.

Corresponding author

Correspondence to Klaus Jung.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kumar, S.H., Brandt, K., Claus, P. et al. Comparative meta-analysis of transcriptomic studies in spinal muscular atrophy: comparison between tissues and mouse models. BMC Med Genomics 17, 266 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12920-024-02040-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12920-024-02040-0

Keywords