An Enhancement of K-Nearest Neighbor Algorithm’s Data Pre-Processing for Dataset Classifications in Predicting Multiple Medical Diseases
Abstract
Purpose – This research aims to improve the data pre-processing of the K-Nearest Neighbor (KNN) algorithm for disease prediction across datasets of varied sizes by addressing imbalanced datasets and optimizing the selection of an effective k value.
Method – The researchers utilized SMOTE and GridSearch to address challenges in the K-Nearest Neighbor Algorithm. SMOTE balanced the datasets to prevent skewed class representation, while GridSearch searched for an effective k value rather than relying on a constant, fixed k. Together, these techniques contributed to the study's effectiveness in accurately predicting diseases.
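The two pre-processing steps named in the Method can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a simplified SMOTE-style interpolation for a binary problem, and a plain cross-validated search over candidate k values. The function names (smote_like, grid_search_k), the toy data, and the candidate list are hypothetical. In practice one would more likely use SMOTE from imbalanced-learn and GridSearchCV from scikit-learn. Note also that this sketch oversamples before cross-validation for brevity, which lets synthetic points appear in validation folds.

```python
import math
import random
from collections import Counter


def nearest(points, query, k):
    """Indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


def smote_like(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until the two classes are equal in size.

    Simplified SMOTE idea: each synthetic point lies on the segment between
    a minority sample and one of its k nearest minority neighbours.
    Assumes a binary problem with at least two minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    needed = (len(X) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(needed, 0)):
        base = rng.choice(minority_pts)
        # Drop the closest match, which is normally the base point itself.
        neighbours = nearest(minority_pts, base, k + 1)[1:]
        other = minority_pts[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        X_new.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_new.append(minority)
    return X_new, y_new


def knn_predict(X_train, y_train, query, k):
    """Majority vote among the k nearest training samples."""
    votes = Counter(y_train[i] for i in nearest(X_train, query, k))
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=5):
    """Accuracy of KNN with the given k, using interleaved folds."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        correct += sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in test)
    return correct / len(X)


def grid_search_k(X, y, candidates, folds=5):
    """Return the candidate k with the best cross-validated accuracy."""
    scores = {k: cv_accuracy(X, y, k, folds) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy imbalanced data (hypothetical): 40 majority vs 8 minority samples.
data_rng = random.Random(42)
majority_pts = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
minority_pts = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(8)]
X = majority_pts + minority_pts
y = [0] * 40 + [1] * 8

X_bal, y_bal = smote_like(X, y, minority=1)
best_k, scores = grid_search_k(X_bal, y_bal, candidates=[1, 3, 5, 7, 9])
print("class counts after balancing:", dict(Counter(y_bal)))
print("selected k:", best_k)
```

The search replaces a fixed k with the candidate that scores best under cross-validation on the balanced data, which mirrors the role the Method assigns to GridSearch.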
Results – Evaluated on eight datasets, the enhanced K-Nearest Neighbor algorithm consistently surpassed the existing KNN approach in terms of accuracy, precision, RMSE, MSE, and t-test evaluation. This resulted in improved performance in predicting a wide range of medical conditions across the eight datasets.
Conclusion – The study aimed to boost the performance of the K-Nearest Neighbor (KNN) algorithm in classifying medical conditions through enhanced data pre-processing techniques. The findings show that the enhanced KNN algorithm is effective in accurately predicting medical diseases across a variety of datasets.
Recommendations – The researchers recommend employing high-dimensional datasets to address the 'curse of dimensionality' and to further ascertain the significance of this study. The results of this study will help improve medical diagnostics by predicting diseases more accurately.
Research Implications – The outcomes of this study offer improved medical diagnostics through more precise disease prediction, hence improving the effectiveness of the K-Nearest Neighbor (KNN) algorithm in identifying various health conditions.
Practical Implications – Through these enhancements, healthcare practitioners will be able to take action quickly, providing early treatment interventions and individualized treatment approaches, as disease prediction becomes more accurate.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.