An Enhancement of K-Nearest Neighbor Algorithm’s Data Pre-Processing for Dataset Classifications in Predicting Multiple Medical Diseases

  • Madeleine S. Tisang Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines
  • Jaira Venessa C. Obmina Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines
  • Francis Arlando L. Atienza Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines
  • Jonathan C. Morano Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines
  • Leisyl M. Mahusay Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines
  • Jamillah S. Guialil Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines

Abstract

Purpose – This research aims to improve the data pre-processing of the K-Nearest Neighbor Algorithm, with a focus on better disease prediction across datasets of varied sizes by addressing class imbalance and optimizing the selection of an effective k value.

Method – The researchers used SMOTE (Synthetic Minority Over-sampling Technique) and GridSearch to address challenges in the K-Nearest Neighbor Algorithm. SMOTE balanced the datasets so that minority classes were not under-represented, while GridSearch selected the k value by searching over candidate values, reducing the problems caused by a single fixed k. Together, these techniques contributed to the study's overall effectiveness in accurately predicting diseases.
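The two steps described in the Method can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation and does not reproduce any library's API: the function names `knn_predict`, `smote_like`, and `grid_search_k` are hypothetical, the oversampler is a simplified SMOTE-style interpolation, and applying it only to the training portion of each fold is an assumption about good practice rather than something the abstract states.

```python
import random
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def smote_like(X, y, k=3, seed=0):
    """Simplified SMOTE-style oversampling (an illustrative assumption).

    Each minority class is grown to the majority-class count by
    interpolating between a sample and one of its k nearest
    same-class neighbors.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    new_X, new_y = list(X), list(y)
    for label, n in counts.items():
        pts = [x for x, lab in zip(X, y) if lab == label]
        if len(pts) < 2:
            continue  # no same-class neighbor to interpolate with
        for _ in range(target - n):
            b = rng.randrange(len(pts))
            others = sorted((j for j in range(len(pts)) if j != b),
                            key=lambda j: dist(pts[j], pts[b]))[:k]
            o = rng.choice(others)
            gap = rng.random()  # position along the segment between the two points
            new_X.append(tuple(p + gap * (q - p) for p, q in zip(pts[b], pts[o])))
            new_y.append(label)
    return new_X, new_y


def grid_search_k(X, y, k_values, folds=3):
    """Return (best_k, accuracy) using simple interleaved cross-validation.

    Oversampling is applied to the training part of each fold only,
    so synthetic points never leak into the held-out samples.
    """
    best_k, best_acc = None, -1.0
    for k in k_values:
        correct = 0
        for f in range(folds):
            test = set(range(f, len(X), folds))
            train = [i for i in range(len(X)) if i not in test]
            tX, ty = smote_like([X[i] for i in train], [y[i] for i in train])
            correct += sum(knn_predict(tX, ty, X[i], k) == y[i] for i in test)
        acc = correct / len(X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

Calling `grid_search_k(X, y, k_values=[1, 3, 5])` on a list of feature tuples `X` and labels `y` returns the candidate k with the highest cross-validated accuracy, which replaces a hand-picked fixed k.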

Results – When evaluated on eight datasets, the improved K-Nearest Neighbor algorithm consistently outperformed the existing KNN approach in terms of accuracy, precision, RMSE, MSE, and t-test evaluation. This resulted in improved performance in predicting a wide range of medical conditions across the eight datasets.
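For readers unfamiliar with the evaluation measures named in the Results, the following is a minimal sketch of how they can be computed with the Python standard library. The helper names are hypothetical, labels are assumed to be numerically encoded so that MSE and RMSE are defined, and the paired form of the t-test is an assumption, since the abstract does not say which variant was used.

```python
from math import sqrt
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the samples predicted as `positive`, the fraction that truly are."""
    truths = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths:
        return 0.0  # no positive predictions were made
    return sum(t == positive for t in truths) / len(truths)


def mse(y_true, y_pred):
    """Mean squared error over numerically encoded labels."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(mse(y_true, y_pred))


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired scores, e.g. one score per dataset for each method."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

Higher accuracy and precision and lower MSE and RMSE indicate better predictions; the t statistic indicates whether the per-dataset difference between two methods is large relative to its variability.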

Conclusion – The study set out to improve the performance of the K-Nearest Neighbor (KNN) algorithm in classifying medical conditions through enhanced data pre-processing techniques. Its findings show that the enhanced KNN algorithm is effective in accurately predicting medical diseases across a variety of datasets.

Recommendations – The researchers recommend evaluating the approach on high-dimensional datasets to address the ‘curse of dimensionality’ and to further establish the significance of this study. The results of this study will help improve medical diagnostics by predicting diseases more accurately.

Research Implications – The outcomes of this study offer improved medical diagnostics through more precise disease prediction, hence improving the effectiveness of the K-Nearest Neighbor (KNN) algorithm in identifying various health conditions.

Practical Implications – Through these enhancements, healthcare practitioners will be able to take action quickly, providing early treatment interventions and individualized treatment approaches, as disease prediction becomes more accurate.

Author Biographies

Madeleine S. Tisang, Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines

Madeleine Tisang is an aspiring programmer who is proactive in learning and utilizing programming languages tailored to specific projects. She also practices designing with the aim of professional growth. She demonstrated the ability to document progress during system analysis. Through their collective efforts and experiences, both authors are committed to continuously seeking knowledge in expanding their expertise along with emerging technologies.

Jaira Venessa C. Obmina, Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines

Jaira Venessa Obmina is a dedicated leader and creative designer committed to personal and professional growth. She and her partner poured their hearts into completing this research study, demonstrating resilience and determination. Driven by a passion for learning, Jaira continually seeks new experiences to enhance her skills and contributions to her field. Her innovative approach reflects her commitment to excellence and adaptability.

Francis Arlando L. Atienza, Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines

Francis Arlando L. Atienza is a professor at Pamantasan ng Lungsod ng Maynila. He served as the adviser for this research, guiding his advisees and providing insights and related studies that contributed substantially to the study.

Jonathan C. Morano, Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines

Jonathan C. Morano is a Lecturer I at Pamantasan ng Lungsod ng Maynila. He has more than 20 years of experience working and teaching in the field of Information Technology. In this position, Jonathan oversees the thesis process, guaranteeing that students obtain the necessary guidance and resources to complete their research.

Leisyl M. Mahusay, Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines

Leisyl M. Mahusay serves as the Thesis Coordinator at the College of Information Systems and Technology Management, Pamantasan ng Lungsod ng Maynila, Philippines. She supervises the thesis process, guaranteeing that all elements, from proposal to defense, are executed seamlessly and effectively.

Jamillah S. Guialil, Computer Science Department, Pamantasan ng Lungsod ng Maynila, Philippines

Jamillah S. Guialil earned a Bachelor of Science in Computer Studies, major in Computer Science, in 2018. She is currently pursuing a Master of Information Technology at Pamantasan ng Lungsod ng Maynila. She also teaches as a part-time faculty member at the College of Information Systems and Technology Management at the same university. As an expert in information systems, she is instrumental in assessing and offering feedback on student theses to ensure compliance with academic and industry standards.

 

Published
2025-05-11
How to Cite
TISANG, Madeleine S. et al. An Enhancement of K-Nearest Neighbor Algorithm’s Data Pre-Processing for Dataset Classifications in Predicting Multiple Medical Diseases. International Journal of Computing Sciences Research, [S.l.], v. 9, p. 3659-3673, may 2025. ISSN 2546-115X. Available at: <//www.stepacademic.net/ijcsr/article/view/720>. Date accessed: 08 june 2025.
Section
Articles
