PhD defence - Tarek Abdunabi

Tuesday, May 3, 2016 1:00 pm EDT (GMT -04:00)

Candidate

Tarek Abdunabi

Title

A Framework for Ensemble Predictive Modeling

Supervisor

Otman Basir

Abstract

Predictive modeling commonly refers to the process of using a learning algorithm to develop a mathematical model that approximates the relationship between a target variable and a set of independent variables. The developed model is then used to predict future, unknown values of the target variable. However, system designers face two limitations that often prevent a proper approximation of this relationship. Firstly, real-world data sets often contain a substantial amount of noise (e.g. errors, or uninformative or highly correlated predictors), which can mislead the learning algorithm and produce suboptimal or wrong approximations. Secondly, most learning algorithms have inherent limitations in how they operate, so the model space a learning algorithm considers for a given problem may not contain the optimal model. As a result of these limitations, building a perfect model/classifier for a given problem is often impossible. On the other hand, different learning algorithms interpret the data and noise differently, which may lead to different approximations of the relationship between the target and its variables. This diversity among learning algorithms has led to the development of ensemble systems.

Ensemble systems have been successfully applied in many fields, such as finance, bioinformatics, medicine, cheminformatics, manufacturing, geography, information security, information retrieval, image retrieval, and recommender systems. The ultimate objective of an ensemble system is to produce better predictions by combining the approximations of different classifiers/models. However, ensemble performance depends on three main design features. The first is the diversity/independence of the base models/classifiers: if all models/classifiers produce similar/correlated predictions, combining those predictions will not provide any improvement. Diversity is therefore considered a key design feature of any successful ensemble system. The second is the fusion topology, namely, the selection of a representative topology. The third is the fusion function, namely, the selection of a suitable function for combining the base predictions. Accordingly, building an effective ensemble system is a complex and challenging process that requires intuition, deep knowledge of the problem context, and a well-defined predictive modeling process.
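Why diversity matters for the fusion step can be sketched with a small example. The abstract does not say which fusion function the thesis uses; plain majority voting is assumed here only as one common choice, and the labels and base classifiers below are synthetic and hypothetical. Three base classifiers that are each 70% accurate but err on different samples are fused into a perfect ensemble, while three perfectly correlated copies of one classifier gain nothing.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hypothetical fusion function: plurality vote over base-classifier labels.

    predictions[m][i] is the label that base classifier m assigns to sample i.
    Ties go to the label encountered first among the votes, because
    Counter.most_common orders equal counts by first occurrence.
    """
    fused = []
    for votes in zip(*predictions):  # all votes cast for one sample
        label, _ = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def flip(labels: Sequence[int], positions: set[int]) -> list[int]:
    """Copy of binary labels with the given positions flipped (simulated errors)."""
    return [1 - y if i in positions else y for i, y in enumerate(labels)]


# Synthetic binary ground truth (illustrative data, not from the thesis)
y_true = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Diverse ensemble: each member is 70% accurate, errors on disjoint samples
diverse = [flip(y_true, {0, 1, 2}), flip(y_true, {3, 4, 5}), flip(y_true, {6, 7, 8})]
# Correlated ensemble: three identical members make identical errors
correlated = [flip(y_true, {0, 1, 2})] * 3

print(accuracy(y_true, diverse[0]))                 # 0.7
print(accuracy(y_true, majority_vote(diverse)))     # 1.0
print(accuracy(y_true, majority_vote(correlated)))  # 0.7
```

With disjoint errors, every sample has at most one wrong vote out of three, so the vote always recovers the true label. With identical members the fused output equals the single classifier's output, which is the "no improvement" case described above.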

Although several taxonomies that aim to categorize ensemble systems from the system designer's point of view have been reported in the literature, research gaps remain to be addressed. First, a comprehensive framework for developing ensemble systems is not yet available. Second, several strategies have been proposed to inject model diversity into the ensemble; however, there is a shortage of empirical studies comparing the effectiveness of these strategies. Third, most ensemble systems research has concentrated on simple problems and relatively small/low-dimensional data sets. Further experimental research is required to investigate the application of ensemble systems to large and/or high-dimensional data sets with a variety of data types.

This research attempts to fill these gaps. First, the thesis proposes a framework for ensemble predictive modeling, coining the term “ensemble predictive modeling” to refer to the process of developing ensemble systems. Second, the thesis empirically compares several diversity injection strategies. Third, the thesis validates the proposed framework using two real-world, large/high-dimensional case studies covering regression and classification. The empirical results indicate the effectiveness of the proposed framework.