MACHINE LEARNING OPTIMIZATION USING CORRELATION FEATURE SELECTION AND SMOTE-ENN FOR EDUCATOR SENTIMENT

Indonesia

Authors

  • Jupriadi Jupriadi Universitas Bumigora, NTB
  • Anthony Anggarawan Universitas Bumigora, NTB
  • Hairani Hairani Universitas Bumigora, NTB

DOI:

https://doi.org/10.63893/jetcom.v4i3.314

Keywords:

sentiment analysis, educators, SVM, SMOTE, feature selection

Abstract


Abstract: Educators play a strategic role in advancing national education, so understanding their sentiment is important for improving both their well-being and the quality of educational services. Sentiment data processing commonly faces two challenges: class imbalance between majority and minority sentiments, and a large number of features, which makes the data high-dimensional. This study analyzes the attitude sentiment of Indonesian educators using Machine Learning classification, combining correlation-based feature selection with data balancing through the Synthetic Minority Over-sampling Technique–Edited Nearest Neighbours (SMOTE-ENN). Classification models were built with the Naïve Bayes and Support Vector Machine (SVM) algorithms. SVM achieved higher accuracy than Naïve Bayes on both the original data and the SMOTE-ENN-balanced data. After correlation-based feature selection, Naïve Bayes accuracy rose from 61% to 89% and SVM accuracy from 69% to 97%. These results indicate that combining SVM, SMOTE-ENN, and correlation-based feature selection substantially improves the accuracy of sentiment classification for Indonesian educators.
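The abstract names the two preprocessing steps but not their exact settings, so the following is only a minimal, stdlib-only sketch under stated assumptions, not the authors' code. It assumes numeric per-document features and binary labels (1 = minority sentiment). It approximates "correlation-based feature selection" with a hypothetical `select_features` filter that keeps features whose absolute Pearson correlation with the label is at least an arbitrary threshold of 0.3; the paper's actual criterion (for example Hall's CFS merit, which also penalizes feature-feature redundancy) is not stated here. It also assumes feature selection runs before resampling, and that ENN uses Wilson's majority-vote rule with k = 3. All function names and the toy data are hypothetical, and the Naïve Bayes and SVM classifiers are omitted.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(X, y, threshold=0.3):
    """Hypothetical filter: keep column j if |r(column j, label)| >= threshold."""
    return [j for j in range(len(X[0]))
            if abs(pearson([row[j] for row in X], y)) >= threshold]


def neighbours(i, X, k):
    """Indices of the k points closest to X[i] (Euclidean), excluding i itself."""
    order = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in order[:k]]


def smote(X, y, minority, k=3, seed=0):
    """SMOTE: add synthetic minority points on segments to random minority neighbours
    until the minority class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    if len(minority_pts) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(max(counts.values()) - counts[minority]):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(neighbours(i, minority_pts, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        base, other = minority_pts[i], minority_pts[j]
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Wilson's ENN: drop each sample whose label differs from the majority label
    of its k nearest neighbours (decisions are made on the unedited set)."""
    keep = [i for i in range(len(X))
            if Counter(y[j] for j in neighbours(i, X, k)).most_common(1)[0][0] == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 8 majority (0) vs 4 minority (1) documents, 3 features each.
X = [[0.1, 0.5, 0.0], [0.2, 0.4, 1.0], [0.0, 0.6, 0.0], [0.3, 0.5, 1.0],
     [0.1, 0.4, 0.0], [0.2, 0.6, 1.0], [0.0, 0.5, 0.0], [0.3, 0.4, 1.0],
     [0.9, 0.5, 0.0], [0.8, 0.4, 1.0], [1.0, 0.6, 0.0], [0.9, 0.5, 1.0]]
y = [0] * 8 + [1] * 4

cols = select_features(X, y, threshold=0.3)   # only column 0 passes on this data
X_sel = [[row[j] for j in cols] for row in X]
X_bal, y_bal = smote(X_sel, y, minority=1)      # 8 vs 8 after oversampling
X_clean, y_clean = enn(X_bal, y_bal)            # nothing is removed on this well-separated set
print(cols, Counter(y_bal), Counter(y_clean))
```

The cleaned `X_clean` and `y_clean` would then be used to train the Naïve Bayes and SVM models. In a real experiment the resampling would normally be fitted on the training split only, so that synthetic samples never appear in the test set. Libraries such as imbalanced-learn provide a maintained `SMOTEENN` implementation for production use.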

Keywords: Sentiment analysis, Educators, SVM, SMOTE, Feature selection


Published

2025-11-24