Misha Batin

About the length of genes

A long, long time ago, three years ago, when we started building Open Genes, Kostya Rafikov and I agreed that we should record as many physical characteristics of genes as possible in the database. What if we spotted some interesting patterns among the genes associated with aging and longevity? So what do we know about gene sizes?

Genetics

More evolutionarily conserved genes are often longer. They also tend to carry a larger intron load, that is, more non-protein-coding fragments inside the gene, called introns. Introns, as Eugene Koonin told us, appeared in the DNA of our ancient ancestors after the symbiosis with mitochondria. Along with the mitochondria, the new cell received, among other things, bacterial transposons, which began to integrate into the cell’s genome and turn into non-coding inserts: introns. This is how alternative splicing came about, so that many more proteins can now be read from a single gene. More precisely, proteoforms: one gene, one protein, but in several forms. Proteoforms usually come in three types:

  1. alternative splicing products;
  2. proteoforms containing single amino acid polymorphisms (SAPs) arising from nonsynonymous single nucleotide polymorphisms (nsSNPs);
  3. those that undergo post-translational modifications.


In 2016, biochemists from Russia’s Institute of Biomedical Chemistry counted [1] more than 6 million proteoforms. And there are only 19 (or 21.5?) thousand genes. Do you see now how important alternative splicing is, and how interesting a parameter gene length is?

By the way, some proteins are difficult or even impossible to detect. Membranes contain many insoluble proteins. And there are many more genes whose transcripts cannot be caught at all, because some regions of the chromosomes have inaccessible chromatin. And if the expression level is too low for the sensitivity of modern instruments, mass spectrometry will not see the products. There are also peptides (short proteins): as of 2022, 2,954,162 of them had been counted. Peptide regulation, by the way, is a top topic in aging and requires an incredible amount of research, just like everything else.

According to Koonin, the merger with mitochondria also gave the cell its nucleus, that is, the separation of transcription (in the nucleus) from translation (in the cytosol). But that is another story.

At the same time, gene length is a fickle thing. A gene gets longer as it evolves, partly through insertions of transposable elements. But after duplication, a gene may become shorter. Duplication is when, bang, a second copy of a gene appears because of various chromosomal events. Gene length thus correlates with both gene duplication and alternative splicing: longer genes are less likely to produce duplicates and more likely to show alternative splicing.

The evolutionary lineage leading to multicellular animals is marked by growing complexity of the organism, expressed in the development of cellular components and the emergence of new cell types. And aging is closely tied to multicellularity. That is how prokaryotes beat aging and managed without it for two billion years (replicative aging does not count).
Such growing complexity of living things must take place against a background of genomic changes, and gene duplication and alternative splicing play a major role here. Both of these sources of evolutionary novelty allow new features to emerge without affecting pre-existing ones, so they can supply the variation needed for complexity to grow. More exotic modes of innovation, such as the de novo origin of protein-coding genes from non-coding sequences, may also contribute to genome evolution, but their contribution is thought unlikely to be comparable to that of gene duplication. In addition, gene loss, which is an integral part of the evolutionary process and has been extensive in some lineages, can also give rise to seemingly new genes. Sometimes a gene present in a particular lineage appears to be new only because its homologues in other lineages were lost in the course of evolution.
At the same time, the emergence and loss of a gene are inextricably linked, since both usually occur during a period of evolutionary “free fall”, when the gene is free from the constraints of purifying selection.

Another thing related to gene size is expression level [2]. The highest expression is typical of short genes. Highly expressed short genes produce shorter proteins, which reduces the organism’s translation costs. Selection could not pass up such a chance to gain an evolutionary advantage.

Why, then, have long genes been preserved, when making protein from them takes the body longer and is harder? For example, transcription elongation takes only a few seconds for a 100 bp gene, but a whole day for a 2 million bp gene.

One possible reason for such a variety of gene sizes is that it lets the cell organize signaling cascades by synthesizing proteins in turn from genes of different lengths. That is, the necessary proteins appear in response to a stimulus not all at once, but in a sequence set by the lengths of their genes: a staggered, echeloned response. The stress response is arranged the same way. The paper on this by Mats Ljungman and colleagues is titled “Gene length as a biological timer to establish temporal transcriptional regulation” [3].

Another possible reason long genes have been preserved in the course of evolution is that alternative splicing, which is characteristic of long genes, expands protein diversity. Here a fantastically interesting question arises: why does complexity increase in the course of evolution? Why wouldn’t a human turn back into a bacterium? Everything in this life is so complicated.

As Joao Pedro de Magalhaes, the main person in the systems biology of aging, described in his 2021 paper [4], long genes are expressed mainly in blood vessels, thyroid, brain and nervous tissue.
The shortest, as a rule, are expressed in the pancreas, skin, stomach, vagina and testes. Natural selection suppresses changes in genes with longer transcripts and favors changes in genes with shorter ones. The authors also observed that genes with longer transcripts tend to have more co-expressed genes and more protein-protein interactions. Functional analysis showed that longer transcripts are often associated with the development and functioning of neurons, while shorter ones tend to play a role in skin development and the immune system. In addition, longer genes are involved in signaling pathways associated with the development of cancer and heart disease, and shorter ones are present in pathways associated with immune responses and neurodegenerative diseases. Finally, as the authors emphasize, longer genes are, as a rule, associated with functions that are important in the early stages of development, whereas shorter genes tend to play a role in everyday functions that are important throughout life and require a quick response to a stimulus.
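
The timing claim above (a few seconds for a 100 bp gene, roughly a day for a 2 million bp gene) is easy to check with back-of-the-envelope arithmetic. Here is a minimal sketch of the "gene length as a timer" idea. The elongation rate of 1,500 bp per minute is an assumption picked for illustration, not a number from this text, and the gene names are hypothetical; only the 100 bp and 2 million bp lengths come from the paragraph above.

```python
# Assumed average RNA polymerase II elongation rate, in base pairs per minute.
# Illustrative assumption only; real rates vary between genes and cell types.
ELONGATION_RATE_BP_PER_MIN = 1500


def elongation_minutes(gene_length_bp, rate_bp_per_min=ELONGATION_RATE_BP_PER_MIN):
    """Return the minutes needed to transcribe a gene from start to end."""
    if gene_length_bp <= 0 or rate_bp_per_min <= 0:
        raise ValueError("length and rate must be positive")
    return gene_length_bp / rate_bp_per_min


def arrival_order(gene_lengths_bp, rate_bp_per_min=ELONGATION_RATE_BP_PER_MIN):
    """Sort genes by the time at which their first full transcript is ready.

    All genes are assumed to start transcription at the same moment,
    so the order is set purely by gene length.
    """
    times = {
        name: elongation_minutes(length, rate_bp_per_min)
        for name, length in gene_lengths_bp.items()
    }
    return sorted(times.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical gene names; 100 bp and 2,000,000 bp are the lengths from the text.
    genes = {"short_gene": 100, "medium_gene": 50_000, "long_gene": 2_000_000}
    for name, minutes in arrival_order(genes):
        print(f"{name}: {minutes * 60:.0f} s ({minutes / 60:.2f} h)")
```

With this assumed rate the short gene is done in seconds and the long one takes most of a day, so a single stimulus produces transcripts in waves ordered by gene length. A different rate shifts the absolute times but not the order.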

Now aging. Which are "our" genes, the ones associated with longevity: long or short?

Long.

What could this be connected with? Longer genes are characterized by a higher mutation rate and greater genome instability.

In addition, as the team of Lorna Harries showed, the mechanism that regulates splicing breaks down with age, which also disrupts the normal function of these genes.

In addition, as was recently shown [5], long transcripts are downregulated and short transcripts are upregulated with age, which the authors called a “length-driven transcriptome imbalance”. In humans, it is most noticeable in the brain.
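
One simple way to look for such a length-driven imbalance in your own data is to rank-correlate transcript length with the old-versus-young change in expression. The sketch below is only an illustration of that idea, not the method used in the cited paper. The function names are made up for the example, and the numbers are invented toy values, not real measurements.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Invented toy data: transcript lengths (bp) and log2(old / young) expression.
    lengths = [1_000, 5_000, 20_000, 100_000, 500_000]
    log2_fold_change = [0.8, 0.4, 0.1, -0.3, -0.9]
    print(f"Spearman rho = {spearman(lengths, log2_fold_change):.2f}")
```

A clearly negative rho would mean that the longer the transcript, the more its expression drops with age, which is the direction of the imbalance described above. A rank correlation is used here because transcript lengths span several orders of magnitude.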

Another characteristic feature of long genes was described last year by W. Li and colleagues [6]. As they found, L1 retrotransposons are predominantly located in longer genes. Naturally.

(By the way, our new book Open Longevity has a whole chapter dedicated to retrotransposons.)

Some of these long genes are involved in DNA repair and in suppressing retrotransposons.

That is, retrotransposons suppress those who suppress them.

Retrotransposons can act as transcriptional “barriers” that prevent the expression of host genes. A special type of methylation of their RNA helps retrotransposons do this: N6-methyladenosine.

Do you see?! Someone smart is about to make N6-methyladenosine inhibitors and end up with a startup worth a few hundred million dollars.

If this text gets 1000 comments, we will make a cure for old age. Hahaha. I think 3 people have read this far.

The authors suggest that this RNA methylation-driven L1–host interaction may play a widespread role in gene regulation, genome integrity, human development, and pathology.

It seems like such a simple topic, gene length, but look how much you and I have dug up.

I was asked what to work on in aging. Study all of this. Experiment with N6-methyladenosine.

It is also good to systematize all the information useful for combating aging, as we do at open-genes.com

You can’t create a cure for old age by accident. You have to see the whole picture.

Write to us

  1. Ponomarenko, EA, Poverennaya, EV, Ilgisonis, EV, Pyatnitskiy, MA, Kopylov, AT, Zgoda, VG, Lisitsa, AV, & Archakov, AI (2016). The Size of the Human Proteome: The Width and Depth. International journal of analytical chemistry, 2016, 7436849. https://doi.org/10.1155/2016/7436849
  2. Grishkevich, V., & Yanai, I. (2014). Gene length and expression level shape genomic novelties. Genome research, 24(9), 1497–1503. https://doi.org/10.1101/gr.169722.113
  3. Kirkconnell, KS, Magnuson, B., Paulsen, MT, Lu, B., Bedi, K., & Ljungman, M. (2017). Gene length as a biological timer to establish temporal transcriptional regulation. Cell cycle (Georgetown, Tex.), 16(3), 259–270. https://doi.org/10.1080/15384101.2016.1234550
  4. Lopes, I., Altab, G., Raina, P., & de Magalhaes, JP (2021). Gene Size Matters: An Analysis of Gene Length in the Human Genome. Frontiers in genetics, 12, 559998. https://doi.org/10.3389/fgene.2021.559998
  5. Stoeger, T., Grant, RA, McQuattie-Pimentel, AC, Anekalla, K., Liu, SS, Tejedor-Navarro, H., … & Amaral, LAN (2019). Aging is associated with a systemic length-driven transcriptome imbalance. BioRxiv, 691154. https://doi.org/10.1101/691154
  6. Xiong, F., Wang, R., Lee, JH, Li, S., Chen, SF, Liao, Z., Hasani, LA, Nguyen, PT, Zhu, X., Krakowiak, J., Lee, DF, Han, L., Tsai, KL, Liu, Y., & Li, W. (2021). RNA m6A modification orchestrates a LINE-1-host interaction that facilitates retrotransposition and contributes to long gene vulnerability. Cell research, 31(8), 861–885. https://doi.org/10.1038/s41422-021-00515-8