Understanding the role of diversity in ensemble-based AutoML methods for classification tasks

dc.contributor.author: Osei, Salomey
dc.contributor.author: Masegosa, Andrés R.
dc.contributor.author: Masegosa Arredondo, Antonio David
dc.date.accessioned: 2025-05-14T11:05:42Z
dc.date.available: 2025-05-14T11:05:42Z
dc.date.issued: 2025-04-17
dc.date.updated: 2025-05-14T11:05:42Z
dc.description.abstract: Ensemble-based Automated Machine Learning (AutoML) methods have gained prominence for their ability to combine diverse machine learning models, achieving superior generalization performance. Despite their empirical success, the underlying mechanisms driving this performance, particularly the role of model diversity, are not yet adequately understood. This study uses recently proposed theoretical frameworks on the role of diversity in ensembles to shed light on this issue. In this work, we focus on AutoML methods for classification tasks. We use AUTO-SKLEARN (a widely used ensemble-based AutoML method) as a basis. More specifically, we examine how individual model diversity and performance evolve across the four key phases of AUTO-SKLEARN (base-learners, meta-learning, Bayesian Optimization (BO), and Caruana Ensemble). We also examine how these phases contribute to the diversity and performance of the final ensemble produced by the AutoML method. Using datasets from the AutoML benchmark, we empirically validate these insights by analyzing error rates and diversity measures across the mentioned phases. Our findings highlight the trade-off between individual model accuracy and ensemble diversity, showing that phases like BO improve the mean error rate of classifiers by nearly 50% but reduce their mean diversity by 20%. However, the Caruana phase increases diversity by 50% compared to the BO phase, allowing better generalization despite the higher mean error rate of the selected individual models (48% higher than BO). This work provides theoretical and empirical evidence that diversity is critical to the success of ensemble-based AutoML methods, and a deeper understanding of diversity’s impact on generalization performance and of the role of the different AutoML phases. These findings can help advance the development of more robust and theoretically grounded AutoML frameworks.
dc.description.sponsorship: This work was supported in part by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Grant 847624; in part by the Spanish Ministry of Science and Innovation under Project PID2022-140612OB-I00; and in part by the Basque Government under Grant IT1564-22, Grant KK-2023/00012, and Grant KK-2023/00038.
dc.identifier.citation: Osei, S., Masegosa, A. R., & Masegosa, A. D. (2025). Understanding the role of diversity in ensemble-based AutoML methods for classification tasks. IEEE Access, 13, 63566-63586. https://doi.org/10.1109/ACCESS.2025.3554093
dc.identifier.doi: 10.1109/ACCESS.2025.3554093
dc.identifier.eissn: 2169-3536
dc.identifier.uri: http://hdl.handle.net/20.500.14454/2755
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.rights: © 2025 The Authors
dc.subject.other: AUTO-SKLEARN
dc.subject.other: Automated machine learning (AutoML)
dc.subject.other: Bayesian optimization (BO)
dc.subject.other: Diversity
dc.subject.other: Ensemble learning
dc.title: Understanding the role of diversity in ensemble-based AutoML methods for classification tasks
dc.type: journal article
dcterms.accessRights: open access
oaire.citation.endPage: 63586
oaire.citation.startPage: 63566
oaire.citation.title: IEEE Access
oaire.citation.volume: 13
oaire.licenseCondition: https://creativecommons.org/licenses/by/4.0/
oaire.version: VoR
Files
Name: osei_understanding_2025.pdf
Size: 5.82 MB
Format: Adobe Portable Document Format