Understanding the role of diversity in ensemble-based AutoML methods for classification tasks

Osei, Salomey; Masegosa, Andrés R.; Masegosa Arredondo, Antonio David

Files

osei_understanding_2025.pdf (5.82 MB)

Date

2025-04-17

Authors

Osei, Salomey

Masegosa, Andrés R.

Masegosa Arredondo, Antonio David

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

Ensemble-based Automated Machine Learning (AutoML) methods have gained prominence for their ability to combine diverse machine learning models and achieve superior generalization performance. Despite their empirical success, the mechanisms driving this performance, particularly the role of model diversity, are not yet well understood. This study draws on recently proposed theoretical frameworks for the role of diversity in ensembles to shed light on this issue. We focus on AutoML methods for classification tasks and use AUTO-SKLEARN, a widely used ensemble-based AutoML method, as a basis. More specifically, we examine how the diversity and performance of individual models evolve across the four key phases of AUTO-SKLEARN (base learners, meta-learning, Bayesian Optimization (BO), and Caruana Ensemble), and how they contribute to the diversity and performance of the final ensemble produced by the AutoML method. Using datasets from the AutoML benchmark, we empirically validate these insights by analyzing error rates and diversity measures across these phases. Our findings highlight the trade-off between individual model accuracy and ensemble diversity: phases such as BO improve the mean error rate of the classifiers by nearly 50% but reduce their mean diversity by 20%. However, the Caruana phase increases diversity by 50% compared to the BO phase, enabling better generalization despite the higher mean error rate of the selected individual models (48% higher than BO). This work provides theoretical and empirical evidence that diversity is critical to the success of ensemble-based AutoML methods, as well as a deeper understanding of diversity's impact on generalization performance and of the role of the different AutoML phases. These findings can help advance the development of more robust and theoretically grounded AutoML frameworks.
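The abstract refers to a Caruana Ensemble phase and to a trade-off between the error of individual models and the diversity of the ensemble. The sketch below illustrates both ideas under stated assumptions. It implements greedy forward ensemble selection with replacement in the style of Caruana et al., scoring candidates by 0-1 validation error, and it measures diversity as the gap between the members' weighted average cross-entropy and the cross-entropy of the averaged ensemble (non-negative by Jensen's inequality). The function names, the choice of this particular diversity measure, the selection metric, and the toy data are all hypothetical illustrations, not the measures or implementation used in the paper or in AUTO-SKLEARN.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical helper names and toy data, for illustration only.
# probs[m][i][c]: probability that model m assigns to class c on validation example i.
Probs = Sequence[Sequence[Sequence[float]]]


def ensemble_probs(probs: Probs, weights: Sequence[float]) -> List[List[float]]:
    """Weighted average of the members' predicted class probabilities."""
    n_examples, n_classes = len(probs[0]), len(probs[0][0])
    out = [[0.0] * n_classes for _ in range(n_examples)]
    for m, w in enumerate(weights):
        if w == 0.0:
            continue
        for i in range(n_examples):
            for c in range(n_classes):
                out[i][c] += w * probs[m][i][c]
    return out


def error_rate(p: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """0-1 error of argmax predictions (ties resolve to the lowest class index)."""
    wrong = sum(1 for row, label in zip(p, y)
                if max(range(len(row)), key=row.__getitem__) != label)
    return wrong / len(y)


def caruana_selection(probs: Probs, y: Sequence[int], n_iter: int) -> List[float]:
    """Greedy forward selection with replacement, in the style of Caruana et al.

    Each step adds the model whose inclusion gives the lowest validation error
    (ties go to the lowest model index). Weights are selection counts / n_iter.
    """
    n_models = len(probs)
    counts: Counter = Counter()
    for step in range(1, n_iter + 1):
        best_m, best_err = 0, math.inf
        for m in range(n_models):
            trial = counts.copy()
            trial[m] += 1
            weights = [trial[j] / step for j in range(n_models)]
            err = error_rate(ensemble_probs(probs, weights), y)
            if err < best_err:
                best_m, best_err = m, err
        counts[best_m] += 1
    return [counts[j] / n_iter for j in range(n_models)]


def log_loss_diversity(probs: Probs, y: Sequence[int], weights: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Weighted mean member cross-entropy minus the ensemble's cross-entropy.

    By Jensen's inequality this gap is >= 0; it grows when members disagree
    on the probability they assign to the true class.
    """
    gap = 0.0
    for i, label in enumerate(y):
        p_true = [max(probs[m][i][label], eps) for m in range(len(probs))]
        member_ce = sum(w * -math.log(p) for w, p in zip(weights, p_true))
        ensemble_ce = -math.log(sum(w * p for w, p in zip(weights, p_true)))
        gap += member_ce - ensemble_ce
    return gap / len(y)


# Toy usage: three binary classifiers, each wrong on a different example.
y = [0, 1, 1, 0]
probs = [
    [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.3, 0.7]],  # model A: wrong on example 3
    [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.9, 0.1]],  # model B: wrong on example 2
    [[0.9, 0.1], [0.6, 0.4], [0.1, 0.9], [0.8, 0.2]],  # model C: wrong on example 1
]
w = caruana_selection(probs, y, n_iter=3)
member_errors = [error_rate(p, y) for p in probs]
print("weights:", w)
print("member errors:", member_errors)
print("ensemble error:", error_rate(ensemble_probs(probs, w), y))
print("diversity:", log_loss_diversity(probs, y, w))
```

In this toy setting each member has a 0.25 error rate, yet the equally weighted ensemble that greedy selection arrives at makes no errors on the validation examples, because the members err on different examples. This is the kind of situation in which diversity can compensate for individually weaker members.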

Keywords

AUTO-SKLEARN
Automated machine learning (AutoML)
Bayesian optimization (BO)
Diversity
Ensemble learning

Citation

Osei, S., Masegosa, A. R., & Masegosa, A. D. (2025). Understanding the role of diversity in ensemble-based AutoML methods for classification tasks. IEEE Access, 13, 63566-63586. https://doi.org/10.1109/ACCESS.2025.3554093

URI

http://hdl.handle.net/20.500.14454/2755

Collections

Articles