A Beeswarm Plot (also called a Swarm Plot) is a powerful visualization used to display the distribution of data points across different categories. Unlike a simple scatter plot, a beeswarm plot adjusts the position of points so they don’t overlap, making it easier to see how data is spread within each category.
In this example, we use the Iris dataset to visualize how petal length varies across different flower species.
🔹 Why Use a Beeswarm Plot?
Beeswarm plots are useful when you want to:
- Show individual data points
- Understand the distribution of values
- Compare multiple categories
- Avoid the overlapping points you get in regular scatter plots
They are commonly used in data analysis, exploratory data science, and statistical visualization.
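To make the overlap problem concrete, here is a minimal sketch that draws the same Iris petal lengths twice: once as a strip plot with jitter turned off, where identical values stack on top of each other, and once as a swarm plot. The output file name strip_vs_swarm.png, the marker size, and the non-interactive Agg backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file instead of opening a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

# Build the same DataFrame used in the rest of this post
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left: all points of a species sit on one vertical line, so ties hide each other
sns.stripplot(data=df, x="Species", y="petal length (cm)", jitter=False, ax=axes[0])
axes[0].set_title("Strip plot without jitter")

# Right: points are shifted sideways so every observation stays visible
sns.swarmplot(data=df, x="Species", y="petal length (cm)", size=4, ax=axes[1])
axes[1].set_title("Swarm plot")

fig.tight_layout()
fig.savefig("strip_vs_swarm.png")  # hypothetical output file name
```

In the left panel many of the 50 points per species are hidden behind one another; in the right panel each one can be counted.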
📊 Dataset Used
We are using the Iris dataset, one of the most popular datasets in machine learning and statistics.
The dataset contains 150 iris flowers (50 per species). Each row has four measurements in centimetres plus a species label:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
- Species
The three species are:
- Setosa
- Versicolor
- Virginica
In this visualization, we compare petal length across these species.
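Before plotting, it can help to look at what scikit-learn actually returns. The short sketch below prints the shape of the data, the feature names, and the species labels. Note that scikit-learn stores the species names in lowercase, and that the petal length column is named "petal length (cm)".

```python
from sklearn.datasets import load_iris

iris = load_iris()

# 150 flowers, 4 numeric measurements each
print(iris.data.shape)

# Column names, including "petal length (cm)" used later in the plot
print(iris.feature_names)

# Species labels as plain Python strings
print(iris.target_names.tolist())
```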
🧠 Python Code
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import pandas as pd
# Load dataset
iris = load_iris()
# Create dataframe
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]
# Create Beeswarm Plot
plt.figure(figsize=(8,5))
sns.swarmplot(data=df, x="Species", y="petal length (cm)")
# Title
plt.title("Beeswarm Plot: Petal Length by Species")
plt.tight_layout()
plt.show()
🔍 Code Explanation
1️⃣ Import Libraries
We import the required libraries:
- Seaborn → for statistical visualizations
- Matplotlib → for plotting
- scikit-learn → to load the Iris dataset
- Pandas → for data manipulation
2️⃣ Load the Dataset
iris = load_iris()
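This loads the Iris dataset from scikit-learn. The returned object holds the measurements (iris.data), the integer class codes (iris.target), and the matching names (iris.feature_names and iris.target_names).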
3️⃣ Create a DataFrame
df = pd.DataFrame(iris.data, columns=iris.feature_names)
We convert the dataset into a pandas DataFrame for easier handling.
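Then we add the species column. iris.target holds integer class codes (0, 1, 2), and indexing iris.target_names with it maps each code to its species name:
df["Species"] = iris.target_names[iris.target]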
4️⃣ Create the Beeswarm Plot
sns.swarmplot(data=df, x="Species", y="petal length (cm)")
This line creates the beeswarm plot where:
- x-axis → flower species
- y-axis → petal length
- Each dot represents one observation
The swarm algorithm spreads points horizontally to avoid overlap.
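To give a feel for how such a layout can work, here is a minimal pure-Python sketch of a greedy beeswarm placement. It is an illustration only and not Seaborn's actual implementation: the function names naive_beeswarm and collides, the diameter parameter, and the quarter-diameter search step are all assumptions. Each point keeps its value on the y-axis and is pushed sideways by the smallest offset that does not collide with an already placed point.

```python
def collides(x, y, placed, diameter):
    """Return True if a circle centred at (x, y) overlaps any placed circle."""
    limit = diameter ** 2 - 1e-12  # small tolerance for floating-point error
    return any((x - px) ** 2 + (y - py) ** 2 < limit for px, py in placed)


def naive_beeswarm(values, diameter=0.1):
    """Greedy layout: return one horizontal offset per value.

    Hypothetical helper for illustration; not how seaborn.swarmplot is written.
    """
    step = diameter / 4
    order = sorted(range(len(values)), key=lambda i: values[i])
    placed = []                      # (x, y) centres positioned so far
    offsets = [0.0] * len(values)
    for i in order:
        y = values[i]
        k = 0
        while True:
            # Try the centre line first, then +step, -step, +2*step, -2*step, ...
            candidates = [0.0] if k == 0 else [k * step, -k * step]
            x = next((c for c in candidates
                      if not collides(c, y, placed, diameter)), None)
            if x is not None:
                break
            k += 1
        placed.append((x, y))
        offsets[i] = x
    return offsets


# Three identical values must fan out; the distant fourth stays centred
values = [1.0, 1.0, 1.0, 2.0]
offsets = naive_beeswarm(values, diameter=0.1)
print(offsets)  # [0.0, 0.1, -0.1, 0.0]
```

The first point of each tie stays on the centre line and later ones alternate right and left, which produces the familiar swarm shape. This naive search compares every new point against all placed points, so it is only suitable for small inputs.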
5️⃣ Add Title and Display
plt.title("Beeswarm Plot: Petal Length by Species")
plt.show()
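This adds a chart title and displays the plot. In the full script, plt.tight_layout() is called just before plt.show() so that the title and axis labels fit inside the figure.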
📈 What Insights Can We See?
From the beeswarm plot:
- Setosa flowers have the smallest petals, roughly 1 to 2 cm long
- Versicolor has medium petal lengths, roughly 3 to 5 cm
- Virginica generally has the largest petals, roughly 4.5 to 7 cm
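Setosa forms a cluster that is completely separated from the other two species, while Versicolor and Virginica overlap slightly (roughly between 4.5 and 5.1 cm). This mix of easy and harder-to-separate classes is one reason the Iris dataset is so often used for classification problems in machine learning.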
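The visual impression can be checked numerically. The sketch below rebuilds the same DataFrame and summarizes petal length per species; the variable name summary and the choice of min, median, and max are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame from the main example
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]

# Smallest, middle, and largest petal length for each species
summary = df.groupby("Species")["petal length (cm)"].agg(["min", "median", "max"])
print(summary)
```

If the largest Setosa value is below the smallest Versicolor value, the two species do not overlap at all on this feature; comparing the Versicolor maximum with the Virginica minimum shows whether those two do.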
🚀 When Should You Use Beeswarm Plots?
Use beeswarm plots when you want to:
- Show raw data points
- Compare distributions across categories
- Avoid overlapping points
- Perform exploratory data analysis
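They are especially useful in data science, biology, statistics, and machine learning. They work best with small to moderate sample sizes, because the layout does not scale well to very large numbers of observations per category.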
🎯 Conclusion
The Beeswarm Plot is a simple yet powerful way to visualize categorical data distributions while preserving individual data points. Using Seaborn in Python, creating this plot becomes quick and effective for exploring patterns within your dataset.
In just a few lines of code, we were able to visualize petal length differences across iris species, revealing clear distinctions between the groups.