Research Details

Title: Improving the Classification Accuracy of Diabetes Using Three Way Clustering
Research type: Thesis
Keywords: Diabetes Mellitus; Machine Learning; Three-Way Clustering; Explainability; Feature Enrichment
Abstract: Diabetes mellitus poses a significant global health challenge, and early detection is critical to mitigating complications and improving patient outcomes. This study addresses the need for accurate and interpretable predictive models for diabetes classification, overcoming the limitations of black-box machine learning approaches and of traditional clustering methods, which struggle with class imbalance and lack transparency. By integrating a novel three-way clustering step with machine learning and explainability techniques, the framework improves both predictive performance and clinical applicability. The proposed methodology uses K-Medoids with cosine distance to generate three-way clustering features (core, fringe, outlier), enriching the feature space of two real-world datasets: the Mendeley dataset (1,000 patients; multi-class: Non-Diabetic, Prediabetic, Diabetic) and the KRD dataset (1,012 pregnant women; binary: Non-Diabetic, Diabetic). SMOTE (Synthetic Minority Over-sampling Technique) preprocessing balanced the datasets to 2,454 and 1,012 samples, respectively, after which XGBoost was trained with stratified 10-fold cross-validation. SHAP (Shapley Additive Explanations) provided global and local interpretability, making the model's predictions transparent. The framework achieved strong results: XGBoost reached an accuracy of 0.9952, an F1-score of 0.9952, and an AUC of 0.9999 on the Mendeley dataset, and an accuracy of 0.8804, an F1-score of 0.8799, and an AUC of 0.9380 on the KRD dataset. The three-way clustering features significantly reduced false negatives, improving early detection of prediabetic and diabetic cases. SHAP analysis identified the key predictors (HbA1c, BMI, heredity, cholesterol, triglycerides), which align with clinical guidelines, and provided patient-specific insights that help bridge the gap between algorithmic performance and clinical trust.
The framework outperforms the results reported in recent studies, offering a robust balance of predictive accuracy and interpretability, which makes it a promising decision-support tool for precision medicine. Future work should validate the model on larger, more diverse datasets, incorporate longitudinal and lifestyle factors, and optimize computational efficiency for real-time clinical use. Combining SHAP with complementary explainability methods such as LIME could further strengthen clinician confidence and patient engagement.
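The abstract names the clustering step (K-Medoids with cosine distance producing core, fringe, and outlier features) but does not state the rules that assign each region. The pure-Python sketch below shows one plausible way such features could be built. The farthest-first initialisation, the `fringe_ratio` and `outlier_dist` thresholds, and the helper names `k_medoids` and `three_way_features` are hypothetical choices for illustration, not the thesis's implementation. Medoids are fitted on one set of rows and then applied to any rows, so the step could be kept inside each training fold.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as distance 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def k_medoids(X, k, max_iter=100):
    """Alternating (Voronoi-iteration) K-Medoids with cosine distance.

    Hypothetical choice: farthest-first initialisation starting from row 0.
    Returns the medoid rows themselves.
    """
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(len(X)) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(
            cosine_distance(X[i], X[m]) for m in medoids)))
    for _ in range(max_iter):
        # Assignment: each row joins its nearest medoid (ties go to the earlier one).
        clusters = {m: [] for m in medoids}
        for i, x in enumerate(X):
            nearest = min(medoids, key=lambda m: cosine_distance(x, X[m]))
            clusters[nearest].append(i)
        # Update: the member with the smallest total in-cluster distance.
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                new_medoids.append(m)
                continue
            new_medoids.append(min(members, key=lambda c: sum(
                cosine_distance(X[c], X[j]) for j in members)))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [X[m] for m in medoids]


def three_way_features(X, medoids, fringe_ratio=0.8, outlier_dist=0.5):
    """Append [cluster, is_core, is_fringe, is_outlier] to every row.

    Hypothetical rules: outlier if the nearest medoid is farther than
    outlier_dist; fringe if nearest/second-nearest distance exceeds
    fringe_ratio (the row sits between two clusters); core otherwise.
    """
    enriched = []
    for x in X:
        ds = sorted((cosine_distance(x, m), c) for c, m in enumerate(medoids))
        d1, cluster = ds[0]
        d2 = ds[1][0] if len(ds) > 1 else math.inf
        if d1 > outlier_dist:
            region = (0, 0, 1)
        elif d2 > 0 and d1 / d2 <= fringe_ratio:
            region = (1, 0, 0)
        else:
            region = (0, 1, 0)
        enriched.append(list(x) + [cluster, *region])
    return enriched


if __name__ == "__main__":
    X = [[1, 0], [0.9, 0.1], [1, 0.05], [0, 1], [0.1, 0.9], [0.05, 1], [1, 1]]
    medoids = k_medoids(X, k=2)
    for row in three_way_features(X + [[-1, -1]], medoids):
        print(row)
```

In this toy example, the rows lying close to either axis come out as core, `[1, 1]` (nearly equidistant from both medoids) as fringe, and the unseen row `[-1, -1]` as an outlier. The enriched rows could then be passed to a classifier such as XGBoost, which is how the abstract describes the features being used.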
Researchers: صادق سلیمانی (supervisor), کاردو ابراهیم نورالدین (student), چیمن حیدر صالح (advisor)