RESEARCH

For the dataset, we will use the Iris dataset, which is commonly used for classification tasks

24 JANUARY 2025

Mark Sikaundi - Data Scientist and AI Researcher.


Why are we using the Iris dataset? The Iris dataset is a classic dataset for classification, machine learning, and data visualization. It is often used in introductory machine learning tutorials and courses. The dataset is small and easy to understand, making it ideal for educational purposes.

About the Iris dataset

The Iris dataset contains 150 samples of iris flowers, 50 from each of three species. There are four features (sepal length, sepal width, petal length, and petal width) and one target variable (species), which takes one of three values: setosa, versicolor, or virginica. The goal is to predict the species of an iris flower from its four measurements.

Features of the Iris dataset

The Iris dataset has four features: sepal length, sepal width, petal length, and petal width, all measured in centimeters. The sepals are the outer parts of the flower that enclose and protect the bud before it opens. The petals are the inner parts that surround the flower's reproductive organs and help attract pollinators; pollen itself is produced by the stamens, not the petals. The length and width features record the dimensions of these two parts, and together the four features are used to predict the species of each iris flower.
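To see these measurements directly, here is a minimal sketch that loads the data as a pandas DataFrame (using the `as_frame=True` option of `load_iris`) and averages each feature per species. The per-species grouping is our own illustration, not part of the original tutorial.

```python
import pandas as pd
from sklearn.datasets import load_iris

# as_frame=True returns the data as a pandas DataFrame: four feature columns plus "target"
iris = load_iris(as_frame=True)
df = iris.frame

# Map the integer target codes (0, 1, 2) to species names for readability
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Feature column names include the unit, e.g. "petal length (cm)"
print(iris.feature_names)

# Mean of each measurement (in cm) for each species
means = df.groupby("species")[iris.feature_names].mean().round(2)
print(means)
```

The petal columns differ most between species, which is one reason simple classifiers do well on this dataset.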

Who is behind the Iris dataset

The Iris dataset was introduced by the British biologist and statistician Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems". The dataset is named after the iris flower, which is a genus of flowering plants in the family Iridaceae. The dataset is commonly used in machine learning and data science to demonstrate classification algorithms.

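Target variable of the Iris dataset

The target variable is the species. scikit-learn stores it as integer codes: 0 for setosa, 1 for versicolor, and 2 for virginica (the names are available in iris.target_names).

What you need to know about the Iris dataset: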

It is a small dataset with 150 samples
It has four features (sepal length, sepal width, petal length, petal width)
It has one target variable (species) with three possible values (setosa, versicolor, virginica)
It is commonly used for classification tasks
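The facts in this list are easy to check in code. A minimal sketch using scikit-learn's bundled copy of the data:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

n_samples, n_features = X.shape   # rows are flowers, columns are measurements
class_counts = np.bincount(y)     # number of samples for each species code 0, 1, 2

print(f"Samples: {n_samples}, features: {n_features}")  # Samples: 150, features: 4
for name, count in zip(iris.target_names, class_counts):
    print(f"{name}: {count}")                          # 50 for each species
```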

Step 1: Import Libraries

First, we need to import the necessary libraries.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris  # Loader for scikit-learn's bundled copy of the Iris dataset
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

Step 2: Load the Iris dataset

Next, we load the Iris dataset with the load_iris function from the sklearn.datasets module, then split it into training and testing sets with train_test_split, holding out 20% of the samples for testing.


# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
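Because the test set here holds only 30 samples (20% of 150), a purely random split can leave the three species slightly unbalanced between training and testing. One option, not used in the original code, is to pass `stratify=y` to `train_test_split` so that each species keeps its one-third share in both sets:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# stratify=y preserves the class proportions of y in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train class counts:", np.bincount(y_train))  # [40 40 40]
print("Test class counts:", np.bincount(y_test))    # [10 10 10]
```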

Step 3: Train a Logistic Regression model

Now, we'll create an instance of the LogisticRegression class and train it using the training data.


# Create a logistic regression model
model = LogisticRegression(max_iter=200)

# Train the model
model.fit(X_train, y_train)
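After `fit`, the learned parameters are available as `coef_` and `intercept_`. For the three Iris classes, the model learns one weight per feature for each class, so `coef_` has shape (3, 4). A short sketch for inspecting them; the DataFrame layout is just one convenient way to display the weights:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# One row of weights per species, one column per feature, plus the per-class intercept
weights = pd.DataFrame(model.coef_, index=iris.target_names, columns=iris.feature_names)
weights["intercept"] = model.intercept_
print(weights.round(2))
```

Because the features are not standardized here, the size of each weight also reflects the scale of its feature, so compare weights with care.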

Step 4: Make predictions

Next, we'll use the trained model to make predictions on the test data.


# Make predictions
y_pred = model.predict(X_test)
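`predict` returns the most probable class for each sample. To see the probabilities themselves, use `predict_proba`. With the default `lbfgs` solver on a three-class problem, scikit-learn fits a multinomial model, so these probabilities are the softmax of the per-class scores returned by `decision_function`. The sketch below checks that relationship numerically; the `softmax` helper is our own, written with NumPy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

def softmax(scores):
    """Row-wise softmax; subtracting each row's max keeps exp() numerically stable."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

proba = model.predict_proba(X_test)                # one row per test flower, one column per species
manual = softmax(model.decision_function(X_test))  # softmax of the raw per-class scores

print(np.allclose(proba, manual))

# predict() agrees with picking the highest-probability column
y_pred = model.predict(X_test)
print(iris.target_names[y_pred[:5]])
print(proba[:5].round(3))
```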

Step 5: Evaluate the model

Finally, we'll evaluate the model by calculating the accuracy, confusion matrix, and classification report.


# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Generate confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(conf_matrix)

# Generate classification report
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)
print('Classification Report:')
print(class_report)
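With only 30 test samples, a single split gives a noisy accuracy estimate: each misclassified flower moves accuracy by about 3.3 percentage points (1/30). A common complement, not part of the original tutorial, is k-fold cross-validation with `cross_val_score`, which trains and scores the model on several different splits. A minimal sketch with 5 folds:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(max_iter=200)

# cv=5 with a classifier: scikit-learn uses stratified 5-fold splits by default
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean together with the spread across folds gives a fairer picture than a single test-set score.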

Get the full source code on Lupleg Community.