2025/12/5
Fatemeh Daneshfar

Academic rank: Assistant Professor
ORCID:
Education: PhD.
H-Index:
Faculty: Faculty of Engineering
ScholarId:
E-mail: f.daneshfar [at] uok.ac.ir
ScopusId:
Phone:
ResearchGate:

Research

Title
Explainable Multi-Class Classification of Student Performance through Ensemble Machine Learning and Graph-Based Feature Engineering
Type
Thesis
Keywords
Student Performance Prediction, Graph-Based Features, Ensemble Learning, Explainable AI, Online Learning Analytics
Year
2025
Researchers
Halalah Ali Ahmad (Student), Fatemeh Daneshfar (Primary Advisor), Sadegh Sulaimany (Advisor)

Abstract

Predicting student performance in online learning environments is pivotal for enabling timely interventions and personalized educational strategies, yet challenges such as class imbalance and lack of model transparency often limit practical adoption. This thesis proposes a novel machine learning framework for multi-class prediction of student outcomes (Fail, Pass, Distinction, Withdrawn) using the AAA module of the Open University Learning Analytics Dataset (OULAD), comprising 712 unique student records with 18 traditional features (e.g., demographic, academic, behavioral) and six graph-based features (e.g., degree centrality, clustering coefficient). By integrating advanced feature engineering, ensemble learning, and explainable AI, the framework delivers high predictive accuracy and interpretable insights, addressing shortcomings in traditional predictive approaches. The methodology leverages a Gower distance-based graph construction to generate relational features, capturing complex student interaction patterns within OULAD. Class weighting was applied to address the class imbalance (469 Pass, 116 Withdrawn, 84 Fail, 43 Distinction), enhancing predictions for minority classes such as Distinction and Fail. A Voting Classifier, combining Random Forest, Gradient Boosting, AdaBoost, XGBoost, and CatBoost, was evaluated through 5-fold cross-validation. Local Interpretable Model-agnostic Explanations (LIME) ensured transparency by identifying key predictors driving outcome classifications. The framework achieved robust performance, with the Voting Classifier yielding an accuracy of 82.02%, precision of 81.31%, recall of 82.02%, F1-score of 80.88%, and AUC of 92.77%, an improvement of approximately 5.9% in F1-score over recent studies.
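To make the class-weighting step concrete, the short sketch below computes inverse-frequency weights from the four class counts reported above. The abstract does not state which weighting scheme the thesis used, so the "balanced" formula n_samples / (n_classes * count) and the function name balanced_class_weights are assumptions chosen for illustration only.

```python
# Class counts as reported in the abstract (OULAD, module AAA).
counts = {"Pass": 469, "Withdrawn": 116, "Fail": 84, "Distinction": 43}


def balanced_class_weights(class_counts):
    """Inverse-frequency weights: n_samples / (n_classes * count).

    Assumption: this is the common "balanced" heuristic, not
    necessarily the scheme applied in the thesis.
    """
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label:12s} n={counts[label]:3d} weight={weight:.3f}")
```

Under this assumed scheme, every class contributes the same total weight (count times weight), so a misclassified Distinction record would cost about 10.9 times as much as a misclassified Pass record, matching the 469:43 ratio of the two class sizes.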
LIME explanations provided actionable insights, enabling educators to understand student-specific factors and tailor interventions, such as increasing virtual learning environment (VLE) engagement for at-risk students. The framework’s multi-class classification and interpretability mark significant advancements, supporting personalized education in online learning environments. This research advances educational data mining by integrating graph-based feature engineering, ensemble learning, and explainability, setting a new benchmark for student performance prediction. Limitations include the moderate computational complexity of the Voting Classifier and reliance on static features, which may overlook temporal dynamics in student behavior. Future work will explore longitudinal data to model performance trajectories, incorporate Graph Neural Networks (GNNs) for enhanced relational modeling, and validate the framework on diverse datasets to improve generalizability. These advancements will further strengthen the framework’s potential to deliver scalable, interpretable solutions for optimizing student outcomes in online learning.
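The Gower distance-based graph construction mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption: the toy records and field names are hypothetical (they are not OULAD rows), and connecting two students when their distance falls below a fixed threshold is only one plausible edge rule, since the abstract does not say how edges were defined. The sketch computes two of the graph-based features the abstract names, degree centrality and clustering coefficient.

```python
from itertools import combinations

# Hypothetical toy records with mixed feature types (not OULAD data).
students = [
    {"age_band": "0-35", "gender": "F", "clicks": 100.0, "score": 80.0},
    {"age_band": "0-35", "gender": "F", "clicks": 120.0, "score": 75.0},
    {"age_band": "0-35", "gender": "F", "clicks": 110.0, "score": 85.0},
    {"age_band": "35-55", "gender": "M", "clicks": 500.0, "score": 40.0},
    {"age_band": "35-55", "gender": "M", "clicks": 480.0, "score": 45.0},
]
NUMERIC = ("clicks", "score")
CATEGORICAL = ("age_band", "gender")


def gower_distance(a, b, ranges):
    """Average per-feature dissimilarity between two records.

    Numeric features use |difference| / range; categorical features
    use 0 for a match and 1 for a mismatch.
    """
    total = 0.0
    for f in NUMERIC:
        total += abs(a[f] - b[f]) / ranges[f] if ranges[f] else 0.0
    for f in CATEGORICAL:
        total += 0.0 if a[f] == b[f] else 1.0
    return total / (len(NUMERIC) + len(CATEGORICAL))


def build_graph(records, threshold):
    """Link two records when their Gower distance is <= threshold (assumed rule)."""
    ranges = {
        f: max(r[f] for r in records) - min(r[f] for r in records)
        for f in NUMERIC
    }
    adj = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if gower_distance(records[i], records[j], ranges) <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adj)
    return {v: len(nb) / (n - 1) for v, nb in adj.items()}


def clustering_coefficient(adj):
    """Fraction of each node's neighbor pairs that are themselves linked."""
    coeffs = {}
    for v, nb in adj.items():
        k = len(nb)
        if k < 2:
            coeffs[v] = 0.0
            continue
        links = sum(1 for a, b in combinations(sorted(nb), 2) if b in adj[a])
        coeffs[v] = 2.0 * links / (k * (k - 1))
    return coeffs


graph = build_graph(students, threshold=0.3)  # threshold is hypothetical
centrality = degree_centrality(graph)
clustering = clustering_coefficient(graph)
for i in graph:
    print(i, sorted(graph[i]), centrality[i], clustering[i])
```

Features derived this way would be appended to each student's record as extra columns before model training. The pairwise loop is quadratic in the number of records, which stays manageable at the scale of 712 students reported above.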