Cell Rep. 2021 Nov 16;37(7):110022.
doi: 10.1016/j.celrep.2021.110022.

Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing

Szi Kay Leung et al. Cell Rep.

Abstract

Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.

Keywords: isoform, transcript, expression, brain, cortex, mouse, human, adult, fetal, long-read sequencing, alternative splicing.

Conflict of interest statement

Declaration of interests: Z.A. and D.A.C. were full-time employees of Eli Lilly & Company, Ltd., and E.T. was a full-time employee of PacBio at the time this work was performed. All other authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Generation of high-quality long-read transcriptome datasets for human and mouse cerebral cortex.
(A) The distribution of CCS read lengths in our human (n = 7 biologically independent samples) and mouse (n = 12 biologically independent samples) cortex Iso-Seq datasets. The distribution of CCS read lengths for individual samples can be found in Figure S3. (B) Distance between transcription start site (TSS) and closest annotated CAGE peak. A negative value refers to a CAGE peak located upstream of a TSS. (C) The distribution of coding potential scores for all transcripts detected in the human cortex. (D) The ORF lengths for transcripts predicted to be protein-coding. Equivalent plots for mouse cortex can be found in Figures S7A and S7B. (E) The number of isoforms identified per gene detected in the human and mouse cortex. (F) UCSC genome browser track of transcripts annotated to MEG3 in the human cortex. Transcripts are colored based on SQANTI2 classification categories (blue = FSM; cyan = ISM; red = NIC; orange = NNC).
Figure 2
A large proportion of cortical transcripts are not described in existing annotations.
(A) A transcript was classified as “FSM” if it aligned with the reference genome with the same splice junctions and contained the same number of exons; “ISM” if it contained fewer 5′ exons than the reference genome; “NIC” if it represented a novel transcript containing a combination of known donor or acceptor sites; and “NNC” if it represented a novel transcript with at least one novel donor or acceptor site. (B) Approximately half of all transcripts identified in the human cortex were FSM, with a large proportion of transcripts assigned as being novel (NIC, NNC). (C and D) Distribution of (C) ORF length and (D) coding probability of transcripts by category. A similar ORF length and CPAT probability score profile was observed for FSM, NIC, and NNC transcripts. Equivalent plots for mouse cortex can be found in Figures S7C and S7D. (E) Shown is a UCSC genome browser track of VTI1A in the human cortex. Interrogation of human protein data identified a peptide (NELLGDDGNSSENQLIK, highlighted blue) that confirmed inclusion of a novel exon.
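The category definitions in Figure 2A can be read as a decision rule on splice junctions. The following is a minimal Python sketch of that rule, not SQANTI2's implementation: it assumes multi-exon transcripts on a single strand, with each transcript represented as a list of (donor, acceptor) positions in 5′ to 3′ order. The function name `classify_transcript` and the coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A splice junction as (donor position, acceptor position).
# Assumption: junctions are listed in 5'->3' order along the transcript.
Junction = Tuple[int, int]


def classify_transcript(query: List[Junction],
                        references: List[List[Junction]]) -> str:
    """Assign FSM, ISM, NIC or NNC following the wording of Figure 2A.

    Simplified illustration only: single strand, multi-exon transcripts.
    """
    if not query:
        raise ValueError("sketch covers multi-exon transcripts only")

    # Pool every donor and acceptor site seen in the reference annotation.
    known_donors = {donor for ref in references for donor, _ in ref}
    known_acceptors = {acceptor for ref in references for _, acceptor in ref}

    # NNC: at least one novel donor or acceptor site.
    for donor, acceptor in query:
        if donor not in known_donors or acceptor not in known_acceptors:
            return "NNC"

    # FSM: identical junction chain, hence the same number of exons.
    if any(query == ref for ref in references):
        return "FSM"

    # ISM: 5' exons missing, so the chain is a proper 3' suffix of a reference.
    for ref in references:
        if len(query) < len(ref) and ref[-len(query):] == query:
            return "ISM"

    # NIC: only known sites, joined in a combination absent from the reference.
    return "NIC"


# Hypothetical reference transcript with four exons (three junctions).
reference = [[(100, 200), (300, 400), (500, 600)]]
print(classify_transcript([(300, 400), (500, 600)], reference))  # ISM
```

Checking for novel sites first keeps the remaining tests simple, because any transcript that reaches them is built entirely from annotated donors and acceptors.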
Figure 3
A subset of genes are characterized by dramatic differences in cortical transcript diversity between species (human and mouse) and between developmental stages (fetal and adult).
(A–E) UCSC genome browser tracks showing transcripts detected for (A) SORBS1 in human cortex (n = 5 transcripts); (B) Sorbs1 in mouse cortex (n = 86 transcripts); (C) TMEM191C in human cortex (n = 1 transcript); (D) Tmem191c in mouse cortex (n = 30 transcripts); and (E) SEPT4 in human adult cortex (n = 34 transcripts) and human fetal cortex (n = 2 transcripts). Additional examples of genes with considerable differences in the number of transcripts between human and mouse cortex are shown in Figures S9A–S9D. Additional examples of genes with considerable differences in the number of transcripts between fetal and adult cortex are shown in Figures S16A and S16B. For each gene, RNA-seq data tracks from human cortex (n = 3 samples) and mouse cortex (n = 12 samples) are also displayed. Transcripts are colored based on SQANTI2 classification categories (blue = FSM; cyan = ISM; red = NNC; orange = NIC).
Figure 4
Examples of fusion transcripts in the cortex.
(A) A fusion transcript incorporating exons from ELAC1 and SMAD4 in the human cortex. (B) Two read-through transcripts incorporating exons from MAPK3 and GDPD3 in the human cortex. Of note, one of the fusion transcripts is characterized by intron retention, as observed in another novel isoform of MAPK3. (C) A fusion transcript incorporating exons from FOXG1 and LINC01551 in the human cortex. (D) A fusion transcript incorporating exons across three pseudogenes in the human cortex. (E) Fusion transcripts with exons from SMIM17/Smim17 were identified in both human and mouse cortex. Additional examples of overlapping fusion transcripts between human and mouse cortex are shown in Figures S12A–S12D. (F) An example of a novel antisense transcript spanning Serpina1e and Serpina11 in the mouse cortex. Transcripts are colored based on SQANTI2 classification categories (blue = FSM; cyan = ISM; red = NNC; orange = NIC).
Figure 5
Alternative splicing (AS) events make a major contribution to transcript diversity in the cortex.
(A) An overview of the different types of AS considered in our analysis. (B) Alternative first (AF) exon use is the most prevalent AS event in both the human cortex and mouse cortex (Figure S14A). (C) The majority of human cortex-expressed genes are predominantly characterized by AF and SE. (D) AF events are supported by RNA-seq data. The differing lengths of first exon of CELF2 in human cortex correspond to differing RNA-seq coverage. (E) A large proportion of AS genes in human and mouse cortex are characterized by more than one type of splicing event. (F) Shown is a UCSC genome browser track of RELCH with a novel peptide (VAEHEVPLQER, highlighted blue) spanning across exons 2 and 4 of RELCH while skipping exon 3, confirming exon skipping in a novel transcript. (G) A novel peptide (GAELAGIGVGLR, highlighted blue) confirms translation of a retained intronic region observed in a transcript of RGS11.
