Grad Seminar: Feature Analysis and Classification of Inflammatory Bowel Disease and Hidradenitis Suppurativa Using Data Mining

Monday, March 6, 2023 2:30 pm - 3:30 pm EST (GMT -05:00)

Abstract

Inflammatory Bowel Disease (IBD) refers to a group of conditions that primarily affect the gut and cause inflammation. In contrast, Hidradenitis Suppurativa (HS) is a chronic immune-mediated condition characterized by boils in a person's underarms, groin, and/or under the breasts. In recent years, research on HS has attracted growing interest, as reliably distinguishing these two diseases (i.e., IBD and HS) has become crucial in clinical settings.

In this study, multiple machine learning and data mining algorithms, namely Decision Tree, Random Forest, Naive Bayes, and k-Nearest Neighbor, are investigated to shed light on the distinction between HS and IBD. These candidate solutions for recognizing the HS-IBD boundary are used to classify IBD and HS based on multiple features such as age, illness history, and clinical observations. The thesis conducts a comparative study of the classification strategies that machine learning offers for recognizing these two diseases. The methods were applied to an IBD/HS dataset collected by medical professionals at the Mayo Clinic, Rochester, MN, USA. The dataset consists of 198 records and 52 attributes; however, a data cleaning process was necessary before the machine learning algorithms could be applied. During the evaluation, the performance of the approaches was compared with respect to accuracy, a commonly used metric. The comparison showed that the Random Forest approach performed best, achieving an accuracy of 93.8% on a reduced dataset containing 20 features per patient. The detailed analysis of the results is supported by several visualization techniques such as t-SNE.
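
As a rough illustration of the kind of comparison described above, the following minimal sketch scores the four classifier families by cross-validated accuracy. It is not the thesis's code: the use of scikit-learn, the synthetic data standing in for the Mayo Clinic dataset, the 5-fold cross-validation, and all hyperparameters are assumptions made for this example only.

```python
# Sketch only: synthetic data stands in for the IBD/HS dataset, which is not public.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: 198 records and 20 features, mirroring the sizes quoted in the abstract.
X, y = make_classification(
    n_samples=198, n_features=20, n_informative=8, random_state=0
)

# Assumption: default-style hyperparameters; the abstract does not state the real ones.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Naive Bayes": GaussianNB(),
    # k-NN is distance based, so features are standardized first.
    "k-Nearest Neighbor": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}


def compare_accuracy(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_accuracy(models, X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

The numbers printed by this sketch come from synthetic data and say nothing about the reported 93.8% result; only the evaluation pattern is meant to carry over.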

In addition, the thesis attempts to determine a precise set of criteria and to identify the features that are most significant in separating these two diseases from one another. The results of this study give medical professionals the opportunity to investigate aspects previously assumed not to play a significant role in clinical practice. To the best of the author’s knowledge, this is the first applied study to use machine learning and data mining techniques for IBD and HS classification.
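
One common way to rank features by how much they help separate two classes is the impurity-based importance of a Random Forest. The sketch below shows that technique under stated assumptions: the abstract does not say how the 20-feature subset was chosen, so this method, the synthetic data, and the placeholder feature names are all hypothetical.

```python
# Sketch only: impurity-based importance is an assumed ranking method,
# not necessarily the one used in the thesis.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumption: synthetic stand-in with 198 records and 52 attributes.
X, y = make_classification(
    n_samples=198, n_features=52, n_informative=10, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Sort features from most to least important and keep the top 20.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top20 = ranked[:20]

for name, importance in top20[:5]:
    print(f"{name:12s} {importance:.4f}")
```

Impurity-based importances can favor features with many distinct values, so a ranking like this is usually cross-checked, for example with permutation importance, before any clinical interpretation.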

Presenter

Soheila Nadalian, MASc candidate in Systems Design Engineering 

Attending this seminar will count towards the graduate student seminar attendance milestone!