Day 47: Mosaic Plot in Python
On Day 47 of our Data Visualization series, we explored a powerful chart for analyzing relationships between categorical variables — the Mosaic Plot.
When you want to understand how two (or more) categorical variables interact with each other, Mosaic Plots provide a clear and intuitive visual representation.
Today, we applied it to the classic Iris dataset to examine the relationship between Species and Petal Size category.
🎯 What is a Mosaic Plot?
A Mosaic Plot is a graphical method for visualizing contingency tables (cross-tabulated categorical data).
It represents:
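- Categories as rectangles
- Width proportional to the frequency of each category of the first variable
- Height, within each column, proportional to the share of each category of the second variable given the first
- Area representing the joint frequency or proportion of each category combination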
👉 The larger the rectangle, the higher the frequency of that category combination.
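To make the area rule concrete, here is a minimal sketch of how a two-variable mosaic could lay out its tiles in a unit square. It assumes no gaps between tiles and a non-zero total for every first-variable category. The `mosaic_tiles` helper and the counts are hypothetical, made up for illustration; statsmodels' own layout code also handles gaps and more than two variables.

```python
from fractions import Fraction


def mosaic_tiles(table):
    """Return {(first_cat, second_cat): (x, y, width, height)} in the unit square.

    `table` maps each category of the first variable to a dict of counts for
    the second variable. Gaps between tiles are ignored in this sketch, and
    every first-variable category is assumed to have a non-zero total.
    """
    grand_total = sum(sum(inner.values()) for inner in table.values())
    tiles = {}
    x = Fraction(0)
    for first, inner in table.items():
        column_total = sum(inner.values())
        # Column width = marginal share of this first-variable category
        width = Fraction(column_total, grand_total)
        y = Fraction(0)
        for second, count in inner.items():
            # Tile height = conditional share of `second` given `first`
            height = Fraction(count, column_total)
            tiles[(first, second)] = (x, y, width, height)
            y += height
        x += width
    return tiles


# Hypothetical counts, made up for illustration only
counts = {
    "A": {"Yes": 30, "No": 10},
    "B": {"Yes": 15, "No": 45},
}
for key, (x, y, w, h) in mosaic_tiles(counts).items():
    # width * height equals count / grand total
    print(key, f"area={float(w * h):.2f}")
# ('A', 'Yes') area=0.30
# ('A', 'No') area=0.10
# ('B', 'Yes') area=0.15
# ('B', 'No') area=0.45
```

Because width times height equals count divided by the grand total, every tile's area is exactly its joint proportion, and all areas add up to 1.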
📊 Dataset Used: Iris Dataset
The Iris dataset contains 150 flowers (50 per species), each with:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
- Species (Setosa, Versicolor, Virginica)
For this visualization, we:
- Used Species as one categorical variable
- Converted Petal Length into 3 categories:
  - Small
  - Medium
  - Large

This helps us visually compare petal size distribution across species.
🧑‍💻 Python Implementation
✅ Step 1: Import Libraries
```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
from sklearn.datasets import load_iris
```
- Pandas → Data manipulation
- Matplotlib → Plot rendering
- Statsmodels → Mosaic plot function
- Scikit-learn → Dataset loading
✅ Step 2: Load and Prepare Data
```python
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]
```
We convert numeric species labels into readable category names.
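The one-liner `iris.target_names[iris.target]` works through NumPy integer-array ("fancy") indexing: each integer code picks the name at that position. A minimal sketch with a short, hand-written label array (hypothetical values, not the real `iris.target`):

```python
import numpy as np

# Class names in the order of their integer codes 0, 1, 2 (as in iris.target_names)
target_names = np.array(["setosa", "versicolor", "virginica"])
# A few integer labels, standing in for iris.target
target = np.array([0, 0, 1, 2, 2])

# Indexing with an integer array returns one name per code, in the same order
species = target_names[target]
print(species.tolist())  # ['setosa', 'setosa', 'versicolor', 'virginica', 'virginica']
```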
✅ Step 3: Create Petal Size Categories
```python
df["Petal Size"] = pd.cut(
    df["petal length (cm)"],
    3,
    labels=["Small", "Medium", "Large"]
)
```
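Passing the integer 3 to pd.cut splits the range of petal lengths into three equal-width bins (bins of the same size in centimetres, not bins holding the same number of flowers).
This transforms a continuous variable into a categorical one.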
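The difference between equal-width and equal-frequency bins shows up clearly on a handful of values. This sketch uses hypothetical petal lengths (not the Iris data) and compares `pd.cut`, which this post uses, with `pd.qcut`, which places the bin edges at quantiles instead:

```python
import pandas as pd

# Hypothetical petal lengths (cm), deliberately skewed toward small values
lengths = pd.Series([1.0, 1.2, 1.4, 1.5, 4.5, 6.9])
labels = ["Small", "Medium", "Large"]

# Equal-width bins: the value range is split into 3 intervals of the same width
by_width, width_edges = pd.cut(lengths, 3, labels=labels, retbins=True)
# Equal-frequency bins: edges sit at quantiles, so each bin gets about 1/3 of the rows
by_count, count_edges = pd.qcut(lengths, 3, labels=labels, retbins=True)

print(width_edges)
print(by_width.value_counts().reindex(labels).tolist())  # [4, 1, 1]
print(by_count.value_counts().reindex(labels).tolist())  # [2, 2, 2]
```

Equal-width bins keep a fixed meaning in centimetres, which suits a species comparison; `pd.qcut` balances the bin sizes, but its edges move whenever the data changes.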
✅ Step 4: Create the Mosaic Plot
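```python
# Create the figure and axes first so that figsize applies to the mosaic
fig, ax = plt.subplots(figsize=(9, 5))
# Draw on those axes; without ax=..., mosaic opens a new default-size figure
mosaic(df, ["Species", "Petal Size"], gap=0.02, ax=ax)
ax.set_title("Mosaic Plot: Species vs Petal Size")
plt.tight_layout()
plt.show()
```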
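Key Parameters:
- ["Species", "Petal Size"] → The categorical columns to cross-tabulate; the first splits the plot horizontally (tile widths), the second splits each column vertically (tile heights)
- gap=0.02 → Adds spacing between tiles
- figsize=(9, 5) → Controls the figure size; it only affects the mosaic when mosaic draws on that figure's axes (passed via ax=)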
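statsmodels' `mosaic` also accepts `statistic=True`, which colors each tile by how far its count deviates from what independence between the two variables would predict. A minimal sketch, rebuilding the DataFrame from Steps 2 and 3 so it runs on its own, and passing the axes explicitly so `figsize` takes effect:

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from statsmodels.graphics.mosaicplot import mosaic

# Rebuild the DataFrame from Steps 2 and 3
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]
df["Petal Size"] = pd.cut(df["petal length (cm)"], 3, labels=["Small", "Medium", "Large"])

# Create the axes first so figsize applies, then draw the mosaic on them
fig, ax = plt.subplots(figsize=(9, 5))
fig, rects = mosaic(
    df,
    ["Species", "Petal Size"],
    ax=ax,
    gap=0.02,
    statistic=True,  # shade tiles by deviation from the independence model
)
ax.set_title("Species vs Petal Size, shaded by deviation from independence")
plt.tight_layout()
plt.show()
```

Setting `horizontal=False` would start the split along the vertical axis instead, swapping the roles of widths and heights.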
📈 What the Visualization Reveals
From the Mosaic Plot:
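🌸 Setosa
- Entirely in the Small petal category
- Very little variation (petal lengths stay between 1.0 and 1.9 cm)
🌿 Versicolor
- Mostly in the Medium category
- A few flowers reach the Large category; none fall into Small, because the shortest Versicolor petal (3.0 cm) is above the Small cutoff (about 2.97 cm)
🌺 Virginica
- Predominantly in the Large category
- A few flowers in Medium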
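To check these readings numerically, the cell counts behind the tiles can be computed with `pd.crosstab`. This sketch repeats Steps 2 and 3 so it runs on its own; the row-normalized table corresponds to the tile heights within each species column in the default horizontal layout.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]
df["Petal Size"] = pd.cut(df["petal length (cm)"], 3, labels=["Small", "Medium", "Large"])

# The contingency table the mosaic visualizes: one row per species, one column per bin
table = pd.crosstab(df["Species"], df["Petal Size"])
print(table)

# Share of each petal-size bin within each species (rows sum to 1)
print(pd.crosstab(df["Species"], df["Petal Size"], normalize="index").round(2))
```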
🔍 Key Insight
The Mosaic Plot clearly shows that:
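- Petal size is strongly associated with species
- Setosa is completely separated from the other two species by petal length, while Versicolor and Virginica overlap only slightly around the Medium/Large boundary
- This helps explain why petal measurements are such strong features in Iris classification models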
Even without machine learning, we can visually detect separation patterns.
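The visual impression of a strong association can also be put into numbers. A minimal sketch, assuming SciPy is installed (it is not used elsewhere in this post), runs a chi-square test of independence on the Species × Petal Size table and rescales the statistic to Cramér's V, where 0 means no association and 1 means perfect association:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.datasets import load_iris

iris = load_iris()
species = pd.Series(iris.target_names[iris.target], name="Species")
# Column 2 of iris.data is petal length (cm); same 3 equal-width bins as Step 3
petal_size = pd.cut(pd.Series(iris.data[:, 2]), 3, labels=["Small", "Medium", "Large"])

table = pd.crosstab(species, petal_size)
chi2, p_value, dof, expected = chi2_contingency(table)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n = table.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.2e}, Cramer's V={cramers_v:.2f}")
```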
💡 Why Use Mosaic Plots?
✔ Excellent for categorical comparisons
✔ Shows proportional relationships clearly
✔ Works well with contingency tables
✔ Complements statistical tests of association, such as the chi-square test of independence
✔ Easy to read once you know that area encodes frequency
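Because a mosaic plot is a picture of a contingency table, statsmodels' `mosaic` can also draw directly from counts that are already aggregated, using a dict keyed by category tuples. A minimal sketch with hypothetical survey counts, made up for illustration:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

# Hypothetical survey counts keyed by (region, answer); made up for illustration
counts = {
    ("North", "Yes"): 40,
    ("North", "No"): 10,
    ("South", "Yes"): 15,
    ("South", "No"): 35,
}

fig, ax = plt.subplots(figsize=(6, 4))
# A dict of counts with tuple keys is drawn as-is; no raw rows are needed
fig, rects = mosaic(counts, ax=ax, gap=0.02, title="Hypothetical survey: region vs answer")
plt.show()
```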
🚀 Real-World Applications
- Marketing: Customer segment vs product category
- Healthcare: Disease type vs severity level
- Education: Grade vs performance category
- Business: Region vs sales category
- Survey Analysis
📌 Day 47 Takeaway
Mosaic Plots transform categorical relationships into visual area comparisons.
They help you:
- Understand category dominance
- Identify imbalances
- Discover associations
- Check assumptions such as independence between categorical variables