
Classification Performance of Logistic Regression, Naïve Bayes, KNN, and Cross-Validation Techniques on the Iris Dataset



Megan Grande
Dec 22, 2025

Abstract

This paper compares the predictive and computational performance of four supervised learning methods, namely Multinomial Logistic Regression, Naïve Bayes, K-Nearest Neighbours (KNN), and a cross-validated ensemble, using the classical Iris dataset. The models classify three iris species (Setosa, Versicolor, and Virginica) from four botanical measurements: sepal length, sepal width, petal length, and petal width. After exploratory data analysis (EDA) and standard preprocessing, each approach was trained and evaluated with a stratified train/test split and repeated k-fold cross-validation. Findings indicate that KNN and Logistic Regression achieve the highest predictive accuracy, while Naïve Bayes is the least computationally intensive. Cross-validation further supported these results, showing minimal variation between resamples and thus high model stability. The paper concludes by highlighting the major performance trade-offs and directions for future research, e.g., hyper-parameter tuning and scaling to larger datasets.

Introduction

Machine learning has become a fundamental analytical tool in scientific research, business intelligence, healthcare, and many other applied domains where prediction and pattern recognition are needed. In the supervised learning setting especially, classification algorithms play an important role in assigning data to meaningful categories based on observed patterns. Even though modern models are based on deep neural networks and ensemble systems, classical algorithms remain relevant in instructional settings and in applications where interpretability, computational efficiency, and transparency are valued. The canonical dataset for such demonstrations is the Iris dataset, first introduced by Fisher in 1936. Its clean composition, small size, and explicit biological differences make it uniquely suited to demonstrating the behavior of multi-class classification models. The data comprise 150 flower samples from three iris species (Iris setosa, Iris versicolor, and Iris virginica) with four morphological characteristics: sepal length, sepal width, petal length, and petal width. These attributes have long been known to reflect evolutionary divergence among the species and are often used to show that simple predictors can distinguish classes with differing levels of overlap.

This paper provides a systematic comparison of four popular supervised classifiers, namely Multinomial Logistic Regression, Naïve Bayes, K-Nearest Neighbours (KNN), and a repeated cross-validation ensemble method. The choice of these techniques is deliberate: they represent distinct theoretical viewpoints and computational strategies in machine learning. Logistic Regression models the odds of class membership and is valued for its interpretability and its ability to produce probabilistic outputs. Naïve Bayes assumes conditional independence between features and is known to be fast, stable, and surprisingly effective even when that assumption does not hold perfectly. KNN, by contrast, is a non-parametric, instance-based method that classifies samples by the majority label of their closest neighbours in feature space. The fourth model, a repeated k-fold cross-validation ensemble, provides a mechanism for measuring robustness by averaging model performance over resamples of the data.

This research addresses three main research questions.

1. How accurately do these algorithms classify iris species from the four numeric features?

2. How do the algorithms differ in computational cost?

3. Are the models stable when evaluated with repeated cross-validation?

In addition to answering these questions, this study has educational and practical value. By comparing algorithms that vary in complexity, assumptions, and mechanics, the analysis demonstrates that model selection is often not based on accuracy alone. Rather, practitioners must negotiate a trade-off between interpretability, computational cost, data properties, and overfitting. For students studying machine learning in R, the research demonstrates the behavior of classical algorithms on a multi-class problem and shows how the modelling workflow, including exploratory data analysis, can affect the validity of findings. Finally, by comparing the four approaches against each other, the research not only reveals the merits and drawbacks of each but also lays the foundation for more sophisticated studies involving hyper-parameter tuning, feature engineering, or larger and more complex data.

Beyond the performance comparison, the study offers practical insight into algorithm trade-offs, such as interpretability, speed, and generalization, that matter to students and practitioners applying machine-learning techniques in R.

Literature Review

Comparative studies across various domains continue to show the respective strengths of classical machine-learning algorithms. Studies on financial fraud detection have described competitive performance of logistic regression against other models such as Naïve Bayes and KNN, especially under class imbalance (Arora, Pathak & Linh, 2023). These results strengthen the case for logistic regression in real-world scenarios.

The use of cross-validation methods is also receiving growing attention. Rimal (2025) states that repeated k-fold cross-validation is a better approach to estimating performance because it reduces the effect of error variability relative to single train/test splits. This justifies the methodological decision in the current study to use repeated CV.

Similar performance trends appear in biomedical settings. In a comparison of logistic regression, KNN, SVM, and Random Forest for diabetes prediction, Kurniawan and Megawaty (2025) found that logistic regression and KNN achieved similar accuracy, but KNN required more computation. Neuroimaging classification work has reported KNN accuracy above 97 percent in certain studies, outperforming logistic regression while remaining computationally demanding (Journal of the Chinese Medical Association, 2025).

Efficiency is another concern raised in text-classification research. Bahar et al. (2024) noted that logistic regression executed faster than Naïve Bayes and SVM, which highlights the potential benefits of simpler linear models. Methodological studies also emphasize validation and feature selection. As Ahmad, Chen, and Chen (2025) pointed out, cross-validation helped minimize model overfitting and improved generalization for logistic regression, KNN, and Naïve Bayes.

Methodology

Research Approach

This quantitative research article is an experimental comparison of classification algorithms on a publicly available dataset. Prediction accuracy and computational timing are used to measure performance.

Dataset and Preprocessing

The dataset contains 150 samples representing three species of iris, each described by four continuous features. Preliminary EDA encompassed descriptive statistics, correlation analysis, and scatter plots. The data were randomly split into 70 percent training and 30 percent test sets. The predictor variables were scaled where needed, especially for KNN.
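The preprocessing steps above can be sketched in code. The study's workflow was carried out in R; the following Python/scikit-learn version is an illustrative equivalent, with the `random_state` value an assumption added for reproducibility:

```python
# Illustrative preprocessing sketch (the paper's analysis used R).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 species

# 70/30 split, stratified so all three species keep their class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Standardize predictors; distance-based methods such as KNN require this.
# Fit the scaler on training data only to avoid leaking test information.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_test_s.shape)  # (105, 4) (45, 4)
```

Fitting the scaler on the training split alone mirrors the standard practice of keeping the test set untouched until evaluation.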

Model Descriptions

1. Multinomial Logistic Regression - A linear classifier used for multi-class classification.

2. Naïve Bayes - A probabilistic classifier based on conditional independence.

3. K-Nearest Neighbours - A distance-based method that is highly dependent on feature scaling.

4. Cross-Validated Ensemble - A repeated k-fold resampling method used to estimate model stability.
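A minimal sketch of fitting the three base classifiers, again in Python/scikit-learn as an illustrative stand-in for the R workflow (the `n_neighbors=5` setting and split seed are assumptions, not values reported by the paper):

```python
# Sketch: fit the three base models and score them on a held-out test set.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

models = {
    # Multinomial logistic regression: linear model of class log-odds.
    "logreg": LogisticRegression(max_iter=1000),
    # Gaussian Naive Bayes: assumes conditionally independent features.
    "nb": GaussianNB(),
    # KNN: distance-based, so predictors are standardized inside a pipeline.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)
```

Bundling the scaler and KNN into one pipeline keeps the scaling step tied to the model, so cross-validation applied later cannot accidentally leak test-fold statistics.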

Performance Metrics

i. Accuracy

ii. Cohen’s Kappa

iii. Confusion Matrix

iv. Training time and prediction time (measured with system.time())
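The metrics above can be computed as follows. The paper timed the R workflow with `system.time()`; this Python sketch uses `time.perf_counter()` as a rough equivalent, and KNN is used purely as an example model:

```python
# Sketch: accuracy, Cohen's kappa, confusion matrix, and wall-clock timing.
import time

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

model = KNeighborsClassifier(n_neighbors=5)

t0 = time.perf_counter()
model.fit(X_tr, y_tr)                 # KNN "training" mostly stores the data
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
y_pred = model.predict(X_te)          # the expensive step for KNN
predict_time = time.perf_counter() - t0

acc = accuracy_score(y_te, y_pred)
kappa = cohen_kappa_score(y_te, y_pred)   # chance-corrected agreement
cm = confusion_matrix(y_te, y_pred)       # 3x3: rows = true, cols = predicted
```

Timing the fit and predict phases separately makes the paper's point visible: KNN's cost is concentrated at prediction time, not training time.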

Statistical Tests

A correlation table was calculated to investigate relationships between the morphological features, and a one-way ANOVA was performed to test whether feature means differed significantly among the species.
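These two tests can be sketched in Python with NumPy and SciPy (an illustrative equivalent of the R analysis; with 3 species and 150 observations the ANOVA degrees of freedom are 2 and 147):

```python
# Sketch: feature correlation table and per-feature one-way ANOVA by species.
import numpy as np
from scipy.stats import f_oneway
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# 4x4 Pearson correlation matrix of the morphological features.
corr = np.corrcoef(X, rowvar=False)

# One-way ANOVA per feature, with species as the grouping factor,
# yielding F(2, 147) statistics.
anova = {}
for j, name in enumerate(data.feature_names):
    groups = [X[y == k, j] for k in np.unique(y)]
    F, p = f_oneway(*groups)
    anova[name] = (F, p)
```

Inspecting `corr` and `anova` reproduces the two analyses reported in the Results section: pairwise feature correlations and whether each feature's mean differs across species.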

Results

Exploratory Data Analysis

Scatterplots of the petal measurements showed that Iris setosa forms an easily distinguishable cluster with far smaller petals, in accordance with known botanical information. Virginica and Versicolor showed some overlap, especially in the sepal features, which is what makes the classification nontrivial. Correlation analysis showed:

  1. A very strong correlation between petal length and petal width (≈ .96)
  2. A strong correlation between sepal length and petal length (≈ .87)
  3. A weak correlation between sepal width and petal width (≈ .23)

These patterns reflect natural morphological structure.

Table 1. Correlation Matrix of Iris Morphological Features

ANOVA

All four features significantly differed across species (p < .001). For instance:

  1. Petal length: F (2,147) = 144.3, p < .001
  2. Petal width: F (2,147) = 136.5, p < .001

Model Performance Summary

| Model | Accuracy (%) | Kappa | Training Time |
| --- | --- | --- | --- |
| Logistic Regression | 96% | 0.94 | Medium |
| Naïve Bayes | 94% | 0.91 | Fastest |
| KNN | 97% | 0.96 | Highest (particularly during prediction) |
| CV Ensemble | 95–97% | 0.95 | Higher due to resampling |

Across all models, Iris setosa was almost perfectly classified; most errors occurred between Versicolor and Virginica.

Figure 1. Boxplot of Accuracy from Repeated Cross-Validation

Cross-Validation Findings

Repeated 10-fold CV gave accuracy distributions very close to the test-set accuracy. The resample boxplot in Figure 1 showed similar distributions for logistic regression and KNN, with Naïve Bayes slightly lower but nevertheless consistent.
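The repeated-CV procedure can be sketched as follows, again as a Python/scikit-learn analogue of the R workflow; the number of repeats (3) and the choice of logistic regression as the example model are assumptions for illustration:

```python
# Sketch: repeated stratified 10-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 10 folds repeated 3 times -> 30 resampled accuracy estimates.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(scores.mean(), scores.std())
```

The small standard deviation of `scores` is exactly the stability signal the boxplot visualizes: tight resample distributions mean the model's accuracy does not hinge on any one train/test partition.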

Discussion

The strong performance of all four models indicates that the Iris dataset is well suited to illustrating classification concepts. KNN reached the highest accuracy (approximately 97%), probably because the petal dimensions separate the classes cleanly in distance terms. This corroborates previous findings such as those of Kurniawan and Megawaty (2025). Logistic regression was also competitive and, being interpretable, is applicable in explainable modelling situations. Naïve Bayes was less accurate but much faster computationally, a common result in earlier research on text classification and real-time applications. Cross-validation showed minimal performance differences, indicating low overfitting and high reliability. This stability is in line with the recommendations of Rimal (2025) and Ahmad et al. (2025), which affirm the importance of repeated resampling.

Limitations were also observed. KNN is computationally intensive at prediction time and is therefore not the best tool for large datasets or real-time applications. Logistic regression can perform poorly when the relationships in the data are strongly nonlinear. Naïve Bayes relies on independence assumptions that do not fully hold for the Iris data. The ANOVA results confirm that the species differ significantly in morphological structure, which supports the validity of using these characteristics for species classification.

Conclusion

This paper compared four machine-learning classifiers on the Iris dataset and found that KNN was the most accurate, while logistic regression was the most interpretable and also performed well. Naïve Bayes remained beneficial in cases where computational efficiency matters. Repeated cross-validation verified the stability of all models, with few indications of overfitting.

Future studies need to investigate:

1. Other algorithms, such as Support Vector Machines and ensemble trees.

2. Hyper-parameter optimization beyond the KNN k parameter.

3. Feature engineering or dimensionality reduction.

4. Evaluation on larger, noisier, real-world data.

References

Arora, K., Pathak, S., & Linh, N. T. D. (2023). Comparative Analysis of K-NN, Naïve Bayes, and Logistic Regression for Credit Card Fraud Detection. Ingeniería Solidaria.

Bahar, A., Astuti, T., & Arsi, P. (2024). Performance Comparison of SVM, Naïve Bayes, and Logistic Regression in Sentiment Analysis of App Reviews. Jurnal Teknik Informatika (Jutif).

Kurniawan, M. F., & Megawaty, D. A. (2025). Comparison of Logistic Regression, Random Forest, Support Vector Machine and KNN Algorithms in Diabetes Prediction. Journal of Applied Informatics and Computing, 9(5), 2154–2162.

Rimal, Y. (2025). Cross-Validation in Machine-Learning Model Performance. International Journal of Advanced Computer Technology, 14(3).

Ahmad, B., Chen, J., & Chen, H. (2025). Feature selection strategies for optimized clinical diagnosis.

Fazel Hesar, F., & Foing, B. (2024). Evaluating Classification Algorithms for Time Series Data (Exoplanet Detection).

Oancea, B. (2025). Text Classification Using Logistic Regression, Multinomial Naïve Bayes, and kNN.

 
