Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability
dc.contributor.author | Guerrero Tamayo, Ana | |
dc.contributor.author | Sanz Urquijo, Borja | |
dc.contributor.author | Olivares, Isabel | |
dc.contributor.author | Moragues Tosantos, María Dolores | |
dc.contributor.author | Casado, Concepción | |
dc.contributor.author | Pastor López, Iker | |
dc.date.accessioned | 2025-03-06T10:37:49Z | |
dc.date.available | 2025-03-06T10:37:49Z | |
dc.date.issued | 2024-08 | |
dc.date.updated | 2025-03-06T10:37:49Z | |
dc.description.abstract | The global impact of the SARS-CoV-2 pandemic has underscored the need for a deeper understanding of viral evolution to anticipate new viruses or variants. Genetic recombination is a fundamental mechanism in viral evolution, yet it remains poorly understood. In this study, we conducted a comprehensive study of the genetic regions associated with genetic recombination features in SARS-CoV-2. With this aim, we implemented a two-phase transfer learning approach using genomic spectrograms of complete SARS-CoV-2 sequences. In the first phase, we utilized a pre-trained VGG-16 model with genomic spectrograms of HIV-1, and in the second phase, we applied the resulting HIV-1 VGG-16 model to SARS-CoV-2 spectrograms. The identification of key recombination hot zones was achieved using the Grad-CAM interpretability tool, and the results were analyzed using mathematical and image processing techniques. Our findings unequivocally identify the SARS-CoV-2 Spike protein (S protein) as the pivotal region in the genetic recombination feature. For non-recombinant sequences, the relevant frequencies clustered around 1/6 and 1/12. In recombinant sequences, the sharp prominence of the main hot zone in the Spike protein clearly indicated a frequency of 1/6. These findings suggest that in the arithmetic series, every 6 nucleotides (two triplets) in S may encode crucial information, potentially concealing essential details about viral characteristics, in this case, the recombinant feature of a SARS-CoV-2 genetic sequence. This insight further underscores the potential presence of multifaceted information within the genome, including mathematical signatures that define an organism’s unique attributes. | en |
dc.description.sponsorship | This work was supported by the Research Training Grants Program - University of Deusto: Ref. FPI UD_2021_10 | en |
dc.identifier.citation | Guerrero-Tamayo, A., Sanz Urquijo, B., Olivares, I., Moragues Tosantos, M.-D., Casado, C., & Pastor-López, I. (2024). Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability. PLoS ONE, 19(8). https://doi.org/10.1371/JOURNAL.PONE.0309391 | |
dc.identifier.doi | 10.1371/JOURNAL.PONE.0309391 | |
dc.identifier.eissn | 1932-6203 | |
dc.identifier.uri | http://hdl.handle.net/20.500.14454/2463 | |
dc.language.iso | eng | |
dc.publisher | Public Library of Science | |
dc.rights | © 2024 Guerrero-Tamayo et al. | |
dc.title | Classification of SARS-CoV-2 sequences as recombinants via a pre-trained CNN and identification of a mathematical signature relative to recombinant feature at Spike, via interpretability | en |
dc.type | journal article | |
dcterms.accessRights | open access | |
oaire.citation.issue | 8 | |
oaire.citation.title | PLoS ONE | |
oaire.citation.volume | 19 | |
oaire.licenseCondition | https://creativecommons.org/licenses/by/4.0/ | |
oaire.version | VoR |
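The abstract reports relevant frequencies of 1/6 and 1/12, that is, patterns repeating every 6 or 12 nucleotides. The following is a minimal sketch of what such a frequency means in signal terms, not the authors' pipeline (which uses genomic spectrograms, a VGG-16 model, and Grad-CAM). It assumes a binary indicator encoding of the four bases, which this record does not specify, and the function name `indicator_power` and the toy sequence are hypothetical.

```python
import cmath


def indicator_power(seq: str, freq: float) -> float:
    """Spectral power of `seq` at `freq` (cycles per nucleotide).

    Assumption: each base A, C, G, T is encoded as a binary indicator
    sequence (1 where the base occurs, 0 elsewhere). The power is the sum
    of the squared DFT magnitudes of the four indicator sequences,
    normalized by the sequence length.
    """
    total = 0.0
    for base in "ACGT":
        acc = 0j
        for n, ch in enumerate(seq):
            if ch == base:
                # One DFT term: e^{-2*pi*i*freq*n}
                acc += cmath.exp(-2j * cmath.pi * freq * n)
        total += abs(acc) ** 2
    return total / len(seq)


# Synthetic toy sequence (not real viral data): a 6-nucleotide motif repeated.
toy = "ATGGCC" * 50

power_at_sixth = indicator_power(toy, 1 / 6)    # frequency matching the motif period
power_off_peak = indicator_power(toy, 1 / 7)    # a nearby frequency for comparison

print(power_at_sixth > power_off_peak)
```

A sequence with a 6-nucleotide repeat concentrates power at frequency 1/6, so the first value is much larger than the second. In a spectrogram, this computation would be repeated over sliding windows along the genome to give one frequency profile per position.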
Files
Original bundle
1 - 1 of 1
- Name:
- guerrero_classification_2024.pdf
- Size:
- 3.26 MB
- Format:
- Adobe Portable Document Format