MASc Seminar: Dataset Creation and Imbalance Mitigation in Big Data: Enhancing Machine Learning Models for Forest Fire Prediction

Tuesday, October 10, 2023 1:00 pm - 2:00 pm EDT (GMT -04:00)

Candidate: Fatemeh Tavakoli

Location: EIT 3142

Supervisor(s): Kshirasagar Naik

Abstract: 

Forest fires pose a mounting threat to both ecological systems and human communities, particularly in the context of climate change. Traditional methods of predicting and managing forest fires have become increasingly inadequate, necessitating innovative approaches. This research addresses the urgent need for effective forest fire prediction and management strategies, specifically in the Canadian context, by harnessing machine learning methodologies. Using Copernicus reanalysis data, this study establishes a comprehensive predictive framework employing four cutting-edge machine learning algorithms: Random Forest, XGBoost, LightGBM, and CatBoost. The study features a robust data preprocessing pipeline, class imbalance correction, and rigorous model evaluation measures. Key contributions include the creation of a feature-rich dataset, pioneering methods for addressing class imbalance in large-scale datasets, and the development of a machine learning framework tailored for forest fire classification. Importantly, the study validates the best-performing model on unseen data to assess real-world applicability and generalizability. The findings have significant implications for data-driven forest management strategies, with the aim of facilitating proactive fire prevention measures at scale. One pronounced challenge was the class imbalance inherent in fire detection datasets. To counter this, resampling methodologies, encompassing NearMiss, SMOTE, and SMOTE-ENN, were integrated; the NearMiss method with a 0.09 sampling ratio proved instrumental in addressing the imbalance. This approach resulted in the Random Forest model achieving 78.3% accuracy, 74.8% sensitivity, and 78.3% specificity.
Further analysis of XGBoost and LightGBM after resampling revealed compelling outcomes: XGBoost, paired with NearMiss version 3 at a 0.09 ratio, surpassed the other models, achieving 98.08% accuracy, 86.06% sensitivity, and 93.03% specificity. Meanwhile, LightGBM reported 72.38% accuracy, 76.03% sensitivity, and 72.36% specificity. The findings indicate that while the high recall of NearMiss version 3 optimized sensitivity, there was sometimes a trade-off with precision.
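To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch of NearMiss-style undersampling followed by Random Forest evaluation, assuming scikit-learn is available. The synthetic data, the `nearmiss_undersample` helper, and all parameter values are illustrative assumptions, not the study's actual code, data, or results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

def nearmiss_undersample(X, y, ratio=0.09, k=3, seed=0):
    """Keep every minority (fire) sample; keep only the majority samples
    whose mean distance to their k nearest minority neighbours is smallest,
    until minority/majority equals `ratio` (a NearMiss-version-1-style
    heuristic; illustrative, not the study's implementation)."""
    min_idx = np.flatnonzero(y == 1)
    maj_idx = np.flatnonzero(y == 0)
    n_keep = min(int(len(min_idx) / ratio), len(maj_idx))
    nn = NearestNeighbors(n_neighbors=k).fit(X[min_idx])
    dist, _ = nn.kneighbors(X[maj_idx])
    order = np.argsort(dist.mean(axis=1))        # majority closest to fires first
    keep = np.concatenate([min_idx, maj_idx[order[:n_keep]]])
    rng = np.random.default_rng(seed)
    rng.shuffle(keep)
    return X[keep], y[keep]

# Synthetic imbalanced data standing in for the gridded fire dataset (~3% fires).
X, y = make_classification(n_samples=20000, n_features=10, weights=[0.97],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Resample only the training split, then fit and score on the untouched test split.
X_rs, y_rs = nearmiss_undersample(X_tr, y_tr, ratio=0.09)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_rs, y_rs)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print(f"sensitivity={tp / (tp + fn):.3f} specificity={tn / (tn + fp):.3f}")
```

Note that resampling is applied to the training split only, so the reported sensitivity and specificity reflect the original, imbalanced class distribution of the held-out data.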