Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability

dc.contributor.author: Guerrero Tamayo, Ana
dc.contributor.author: Sanz Urquijo, Borja
dc.contributor.author: Olivares, Isabel
dc.contributor.author: Moragues Tosantos, María Dolores
dc.contributor.author: Casado, Concepción
dc.contributor.author: Pastor López, Iker
dc.date.accessioned: 2025-03-06T10:37:49Z
dc.date.available: 2025-03-06T10:37:49Z
dc.date.issued: 2024-08
dc.date.updated: 2025-03-06T10:37:49Z
dc.description.abstract: The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution to anticipate new viruses or variants. Genetic recombination is a fundamental mechanism in viral evolution, yet it remains poorly understood. In this study, we conducted a comprehensive investigation of the genetic regions associated with genetic recombination features in SARS-CoV-2. To this end, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we utilized a pre-trained VGG-16 model with genomic spectrograms of HIV-1, and in the second phase, we applied the HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. Key recombination hot zones were identified using the Grad-CAM interpretability tool, and the results were analyzed using mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region in the genetic recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12. In recombinant sequences, the sharp prominence of the main hot zone in the Spike protein clearly indicated a frequency of 1/6. These findings suggest that in the arithmetic series, every 6 nucleotides (two triplets) in S may encode crucial information, potentially concealing essential details about viral characteristics, in this case, the recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism’s unique attributes. [en]
dc.description.sponsorship: This work was supported by the Research Training Grants Program - University of Deusto: Ref. FPI UD_2021_10 [en]
dc.identifier.citation: Guerrero-Tamayo, A., Sanz Urquijo, B., Olivares, I., Moragues Tosantos, M.-D., Casado, C., & Pastor-López, I. (2024). Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability. PLoS ONE, 19(8). https://doi.org/10.1371/JOURNAL.PONE.0309391
dc.identifier.doi: 10.1371/JOURNAL.PONE.0309391
dc.identifier.eissn: 1932-6203
dc.identifier.uri: http://hdl.handle.net/20.500.14454/2463
dc.language.iso: eng
dc.publisher: Public Library of Science
dc.rights: © 2024 Guerrero-Tamayo et al.
dc.title: Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability [en]
dc.type: journal article
dcterms.accessRights: open access
oaire.citation.issue: 8
oaire.citation.title: PLoS ONE
oaire.citation.volume: 19
oaire.licenseCondition: https://creativecommons.org/licenses/by/4.0/
oaire.version: VoR
Files
Original bundle
Name: guerrero_classification_2024.pdf
Size: 3.26 MB
Format: Adobe Portable Document Format