A Diabetes Mellitus Prediction Model Based on Supervised Machine Learning Techniques

Document Type : Original Article

Authors

1 Electronics and Communications Department, Faculty of Engineering, Mansoura University, Dakahlia, Egypt

2 Computers and Control Systems Engineering Department, Faculty of Engineering, Mansoura University, Mansoura, Egypt

3 Head of the Cyber Security Department, Faculty of Artificial Intelligence, Delta University for Science and Technology, Dakahlia, Egypt.

4 Electronics and Communications Department, Faculty of Engineering, Mansoura University, Dakahlia, Egypt.

Abstract

Diabetes is one of the most common chronic diseases. Diabetic patients are at high risk of complications such as renal failure, heart stroke, and nerve and eye damage that can lead to blindness. Detecting and predicting diabetes mellitus is not an easy process. Moreover, diagnostic tests are costly and hospitals are busy, so it would be revolutionary if people could learn their risk of being diabetic without visiting a hospital. Artificial intelligence makes this possible. In this paper, a classification model is proposed for diabetes mellitus classification and prediction, so that early diagnosis and treatment can prolong patients’ lives and minimize risk factors. The classification of medical healthcare data is hindered by the difficulty of obtaining suitable datasets. The data were preprocessed through null-value imputation, normalization and encoding. To assess the effectiveness of the proposed model, three supervised algorithms were applied: Random Forest (RF), Extreme Gradient Boosting (XGB) and Neural Network (NN). Results were compared using five performance metrics: accuracy, precision, recall, F1-score and run time. Training and testing were performed on two datasets. On the first dataset, RF outperformed the other two techniques, achieving 80.5% accuracy compared to 79.65% for XGB and 76.36% for NN. On the second dataset, RF was again superior, achieving an accuracy of 97.11% compared to 93.38% and 93.26% for NN and XGB, respectively.
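The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the abstract does not say which imputation or normalization method was used, so mean imputation and min-max scaling are assumptions here, the function names are hypothetical, and the sample values are made up for illustration rather than taken from either dataset.

```python
import time
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values (assumed method)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0 (assumed method)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def label_encode(column):
    """Map each distinct category to an integer code, assigned in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative (made-up) feature columns with one missing value.
glucose = [148, None, 183, 89]
gender = ["female", "male", "female", "male"]

glucose_filled = impute_mean(glucose)
glucose_scaled = min_max_normalize(glucose_filled)
gender_codes, gender_mapping = label_encode(gender)

# Illustrative (made-up) true labels and classifier predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Run time, the fifth metric, measured around the evaluation call.
start = time.perf_counter()
scores = binary_metrics(y_true, y_pred)
run_time = time.perf_counter() - start

print(glucose_scaled, gender_codes, scores, run_time)
```

In practice the predictions would come from trained RF, XGB and NN classifiers, and the timing would wrap the training and testing calls rather than the metric computation alone.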

Keywords