Browsing by Author "Elsayed, Amgad Monir Mohamed"
Showing 1 - 4 of 4
Item: An analysis of heuristic metrics for classifier ensemble pruning based on ordered aggregation (Elsevier Ltd, 2022-04)
Elsayed, Amgad Monir Mohamed; Onieva Caracuel, Enrique; Woźniak, Michał; Martínez Muñoz, Gonzalo
Classifier ensemble pruning is a strategy through which a subensemble can be identified by optimizing a predefined performance criterion. Choosing an optimal or suboptimal subensemble decreases the initial ensemble size and increases its predictive performance. In this article, a set of heuristic metrics is analyzed to guide the pruning process. The analyzed metrics modify the order of the classifiers produced by the bagging algorithm and then select the first classifiers in the queue. These criteria include general accuracy, complementarity of decisions, ensemble diversity, margin of samples, minimum redundancy, discriminant classifiers, and margin hybrid diversity. The efficacy of these metrics is affected by the original ensemble size, the required subensemble size, the kind of individual classifiers, and the number of classes, while their efficiency is measured in terms of computational cost and memory space requirements. The performance of these metrics is assessed over fifteen binary and fifteen multiclass benchmark classification tasks. In addition, their behavior against randomness is measured in terms of the distribution of their accuracy around the median. Results show that ordered aggregation is an efficient strategy for generating subensembles that improve both the predictive performance and the computational and memory complexity of the whole bagging ensemble.
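As a rough illustration of the ordered-aggregation idea described above, the sketch below greedily reorders a bagging ensemble by validation accuracy (one of the simpler heuristics among those analyzed) and keeps the first classifiers in the queue. The dataset, ensemble size, validation split, and cut-off point are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Train a full bagging ensemble, then reorder its members greedily.
bagging = BaggingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
preds = np.array([est.predict(X_val) for est in bagging.estimators_])  # (n_est, n_val)

ordered, remaining = [], list(range(preds.shape[0]))
votes = np.zeros(preds.shape[1])  # running sum of 0/1 predictions
for _ in range(preds.shape[0]):
    # Pick the classifier whose addition maximizes majority-vote accuracy.
    best = max(remaining, key=lambda i: np.mean(
        ((votes + preds[i]) / (len(ordered) + 1) > 0.5) == y_val))
    ordered.append(best)
    remaining.remove(best)
    votes += preds[best]

# "Pruning" amounts to keeping only the first classifiers in the new order.
subensemble = [bagging.estimators_[i] for i in ordered[:20]]
```

The other metrics listed in the abstract (complementarity, diversity, margins, etc.) would replace the accuracy criterion inside the greedy selection step while the ordering-and-truncation scheme stays the same.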
Item: Enhancement of ensemble data mining techniques via soft computing (Universidad de Deusto, 2021-03-23)
Elsayed, Amgad Monir Mohamed; Onieva Caracuel, Enrique; Facultad de Ingeniería; Programa de Doctorado en Ingeniería para la Sociedad de la Información y Desarrollo Sostenible por la Universidad de Deusto
Machine learning (ML) is the area of study that gives computers the ability to learn without being explicitly programmed. Sometimes this reveals unsuspected correlations and leads to a deeper understanding of the problem. The magic is to learn from data, as we are surrounded by data everywhere (user logs, financial data, production data, medical records, etc.). Machine learning is well suited to complex problems for which no good solution exists, and to fluctuating environments, since it can adapt to new data. Data mining is a related field that aims to discover patterns that are not immediately apparent. Two important factors drive this area: the use of effective models that capture complex data, and the design of scalable learning systems that learn from massive datasets. It has been extensively reported in the literature that pooling learning models together is a desirable strategy for constructing robust data mining systems; this is known as ensemble data mining. Ensemble systems for pattern classification have been studied in the literature under the name of multiple classifier systems (MCS). Classification tasks pose various challenges, e.g., the data size, the number of classes, the dimensionality of the feature space, the overlap between instances, the balance between class categories, and the nonlinear complexity of the true unknown hypotheses. These challenges make perfect solutions difficult to obtain.
A promising solution is to train a set of diverse and accurate base classifiers and to combine them. A primary drawback of classifier ensembles, despite their remarkable performance, is that a large number of classifiers must be combined to ensure that the error converges to its asymptotic value. This entails high computational requirements, including the cost of training, the storage requirements, and the prediction time. In addition, when classifiers are spread over a network, substantial communication costs are incurred. To alleviate these drawbacks, this thesis proposes various strategies, in particular ways to incorporate soft computing techniques into MCS. Soft computing methods, also known as computational intelligence, are computing paradigms that parallel the extraordinary ability of the human mind to reason and learn; they use approximate calculations to provide imprecise but usable solutions to problems that are otherwise unsolvable or too time-consuming. In the MCS literature, soft computing methods have mostly been proposed either to optimize the classifiers' combination function or to select a subset of classifiers instead of aggregating all of them. However, the efficiency and efficacy of MCS can still be improved through the contributions of this thesis. The efficiency of MCS concerns fast training, lower storage requirements, higher classification speed, and lower communication cost between distributed models. Two directions were followed to achieve this. First, at the data level, we apply instance selection (IS) methods as a preprocessing mechanism to decrease the training data size. This speeds up the training of MCS, and the accuracy of the models can be increased by focusing on informative samples. Related to this part, we evaluate the interconnection between IS and MCS. Second, at the classifier level, ensemble pruning is a strategy by which a subset of classifiers can be selected while maintaining, or even improving, the performance of the original ensemble. To this end, we propose a guided-search pruning method that combines multiple pruning metrics while retaining their performance. In addition, the simultaneous effect of downsizing the number of samples and the number of classifiers is analyzed. Furthermore, we analyze recent reordering-based MCS pruning metrics, which are recognized as accurate and fast strategies for identifying a subset of classifiers. The efficacy of MCS concerns the predictive performance, with the goal of going beyond what can be achieved with state-of-the-art ensemble algorithms. Related to this part, we propose swarm intelligence (SI) algorithms, as soft computing techniques, to integrate multiple classifier decisions. In connection with that, a framework is proposed that combines three computational intelligence paradigms: IS, MCS, and SI algorithms. The objective is to build a more diverse and highly accurate MCS from only a reduced portion of the available data. In summary, this research introduces novel and improved strategies to increase the efficiency and efficacy of MCS. Soft computing is applied to optimize the integration of classifiers and to identify the best classifier subsets. The results obtained throughout the thesis show that the performance of ensemble systems can be boosted by applying IS methods as a data preprocessing technique, and that SI algorithms, or hybrid versions thereof, are promising tools for effectively integrating individual decisions. Furthermore, small ensembles trained on fewer samples can significantly outperform large ensembles that use the whole training data. Finally, an analysis of recent heuristic metrics for pruning bagging ensembles has been conducted.
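As a rough illustration of the instance-selection preprocessing advocated in the thesis, the sketch below filters the training set with Wilson's edited nearest neighbour (ENN) rule, used here as a stand-in for the IS methods actually studied, and then trains a bagging ensemble on the reduced set. The dataset, neighbourhood size, and ensemble size are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, random_state=1)

# ENN-style editing: keep only instances whose class label agrees with the
# majority of their 3 nearest neighbours (the query point itself is dropped).
knn = KNeighborsClassifier(n_neighbors=4).fit(X, y)
neigh = knn.kneighbors(X, n_neighbors=4, return_distance=False)[:, 1:]
keep = np.array([np.bincount(y[idx]).argmax() == label
                 for idx, label in zip(neigh, y)])
X_red, y_red = X[keep], y[keep]

# Train the ensemble on the reduced set: fewer samples, lighter members.
ensemble = BaggingClassifier(n_estimators=50, random_state=1).fit(X_red, y_red)
print(f"kept {keep.mean():.0%} of the training instances")
```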
Item: Selective ensemble of classifiers trained on selective samples (Elsevier B.V., 2022-04-14)
Elsayed, Amgad Monir Mohamed; Onieva Caracuel, Enrique; Woźniak, Michał
Classifier ensembles are characterized by high classification quality thanks to their generalization ability. Most existing ensemble algorithms use all learning samples to train the base classifiers, which may negatively impact the ensemble's diversity. Moreover, existing ensemble pruning algorithms often return suboptimal solutions that are biased by their selection criteria. In this work, we present a proposal to alleviate these drawbacks. We employ an instance selection method to obtain a reduced training set, which lowers both the space complexity of the resulting ensemble members and the time complexity of classifying an instance. Additionally, we propose a guided search-based pruning schema that efficiently explores large ensembles and yields a near-optimal subensemble with lower computational requirements, reduced memory space, and improved prediction time. We show experimentally how the proposed method can be an alternative to large ensembles, and demonstrate how to form less complex, small, and highly accurate ensembles. Experiments on 25 datasets show that the proposed method produces effective ensembles that outperform Random Forest and baseline classifier pruning methods. Moreover, our proposal is comparable with the Extreme Gradient Boosting algorithm in terms of accuracy.

Item: Training set selection and swarm intelligence for enhanced integration in multiple classifier systems (Elsevier Ltd, 2020-10)
Elsayed, Amgad Monir Mohamed; Onieva Caracuel, Enrique; Woźniak, Michał
Multiple classifier systems (MCSs) constitute one of the most competitive paradigms for obtaining more accurate predictions in the field of machine learning. Systems of this type should be designed efficiently in all of their stages, from data preprocessing to multioutput decision fusion. In this article, we present a framework that harnesses the power of instance selection methods and the search capabilities of swarm intelligence to train learning models and to aggregate their decisions. The process consists of three steps: first, the essence of the complete training data set is captured in a reduced set via intelligent data sampling; second, the reduced set is used to train a group of heterogeneous classifiers using bagging and distance-based feature sampling; finally, swarm intelligence techniques are applied to identify a pattern among the multiple decisions and enhance the fusion process by assigning class-specific weights to each classifier. The proposed methodology yielded competitive results in experiments conducted on 25 benchmark datasets. The Matthews correlation coefficient (MCC) serves as the objective to be maximized by various nature-inspired metaheuristics, including the moth-flame optimization algorithm (MFO), the grey wolf optimizer (GWO), and the whale optimization algorithm (WOA). A sketch of this class-weighted fusion step follows.
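As a rough illustration of the fusion step described above, the sketch below tunes class-specific weights for a small heterogeneous ensemble by maximizing the MCC on a validation split. SciPy's differential evolution stands in for the MFO/GWO/WOA metaheuristics used in the article; the models, data, and split are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           random_state=2)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=2)

# A small heterogeneous pool of base classifiers.
models = [DecisionTreeClassifier(random_state=2).fit(X_tr, y_tr),
          GaussianNB().fit(X_tr, y_tr),
          LogisticRegression(max_iter=1000).fit(X_tr, y_tr)]
probas = np.stack([m.predict_proba(X_val) for m in models])  # (models, n_val, classes)
n_models, _, n_classes = probas.shape

def neg_mcc(w):
    # w flattens a (n_models, n_classes) matrix: one weight per classifier per class.
    W = w.reshape(n_models, 1, n_classes)
    fused = (W * probas).sum(axis=0)  # weighted sum of class scores
    return -matthews_corrcoef(y_val, fused.argmax(axis=1))

result = differential_evolution(neg_mcc, bounds=[(0, 1)] * (n_models * n_classes),
                                seed=2, maxiter=50)
print("validation MCC:", -result.fun)
```

Any population-based metaheuristic can be dropped into the optimizer slot; the class-specific weight matrix is what distinguishes this fusion scheme from a single scalar weight per classifier.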