Saturday, 7 March 2026

📊 Day 47: Mosaic Plot in Python

 

On Day 47 of our Data Visualization series, we explore a powerful chart for analyzing relationships between categorical variables: the Mosaic Plot.

When you want to understand how two (or more) categorical variables interact with each other, Mosaic Plots provide a clear and intuitive visual representation.

Today, we applied it to the classic Iris dataset to examine the relationship between Species and Petal Size category.


🎯 What is a Mosaic Plot?

A Mosaic Plot is a graphical method for visualizing contingency tables (cross-tabulated categorical data).

It represents:

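  • Each combination of categories as a rectangle (tile)

  • Width proportional to the share of each category of the first variable

  • Height proportional to the share of each category of the second variable within that first category

  • Area proportional to the frequency of the combination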

👉 The larger the rectangle, the higher the frequency of that category combination.
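To make the geometry concrete, here is a minimal sketch that derives tile widths, heights and areas from a contingency table. The table values and the names `table`, `widths`, `heights` and `areas` are made up for illustration, and the small gaps a plotting library adds between tiles are ignored.

```python
import pandas as pd

# Hypothetical toy contingency table (rows: first variable, columns: second variable).
table = pd.DataFrame(
    {"Yes": [30, 10], "No": [20, 40]},
    index=pd.Index(["Group A", "Group B"], name="group"),
)

total = table.to_numpy().sum()

# Width of each column of tiles: share of all observations in that row category.
widths = table.sum(axis=1) / total

# Height of each tile inside its column: share of the second variable
# within that row category (a conditional proportion).
heights = table.div(table.sum(axis=1), axis=0)

# Area = width x height, which equals the joint proportion of each combination.
areas = heights.mul(widths, axis=0)

print(widths)
print(heights)
print(areas)
```

Because area is width times height, the areas add up to 1 and each one matches the cell count divided by the grand total.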


📊 Dataset Used: Iris Dataset

The Iris dataset contains:

  • Sepal Length

  • Sepal Width

  • Petal Length

  • Petal Width

  • Species (Setosa, Versicolor, Virginica)

For this visualization, we:

  1. Used Species as one categorical variable

  2. Converted Petal Length into 3 categories:

    • Small

    • Medium

    • Large

This helps us visually compare petal size distribution across species.


🧑‍💻 Python Implementation


✅ Step 1: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic
from sklearn.datasets import load_iris

  • Pandas → Data manipulation

  • Matplotlib → Plot rendering

  • Statsmodels → Mosaic plot function

  • Scikit-learn → Dataset loading


✅ Step 2: Load and Prepare Data

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]

We convert numeric species labels into readable category names.
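The conversion relies on NumPy integer-array indexing. The sketch below uses shortened stand-in arrays (not the real dataset) to show the lookup, plus an optional pandas `Categorical` alternative that is not used in this post.

```python
import numpy as np
import pandas as pd

# Stand-ins for iris.target_names and iris.target (shortened for illustration).
target_names = np.array(["setosa", "versicolor", "virginica"])
target = np.array([0, 0, 1, 2, 1])

# Indexing an array with an integer array looks up one name per code.
species = target_names[target]
print(species)

# Optional alternative: a pandas Categorical keeps all three levels
# in a fixed order, even if one is absent from a subset.
species_cat = pd.Categorical.from_codes(target, categories=target_names)
print(species_cat.categories.tolist())
```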


✅ Step 3: Create Petal Size Categories

df["Petal Size"] = pd.cut(
    df["petal length (cm)"],
    3,
    labels=["Small", "Medium", "Large"]
)

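Here pd.cut divides the observed range of petal length into three equal-width bins, so the bins cover equal ranges in centimetres but need not contain equal numbers of flowers.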

This transforms a continuous variable into a categorical one.
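If you want to see where the cut points fall, the sketch below asks `pd.cut` to return its bin edges and compares the result with `pd.qcut`, an equal-count alternative that this post does not use. The variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Column 2 of the feature matrix is petal length (cm).
petal_length = pd.Series(load_iris().data[:, 2], name="petal length (cm)")

labels = ["Small", "Medium", "Large"]

# Equal-width bins: retbins=True also returns the cut points pandas chose.
equal_width, edges = pd.cut(petal_length, 3, labels=labels, retbins=True)
print(edges.round(3))
print(equal_width.value_counts(sort=False))

# Equal-count alternative: qcut places edges at quantiles instead,
# so the groups are roughly the same size (ties can make them uneven).
equal_count = pd.qcut(petal_length, 3, labels=labels)
print(equal_count.value_counts(sort=False))
```

Which binning is better depends on the question: equal-width bins keep the "Small / Medium / Large" labels tied to actual lengths, while equal-count bins balance the group sizes.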


✅ Step 4: Create the Mosaic Plot

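fig, ax = plt.subplots(figsize=(9, 5))
# Pass ax explicitly: without it, mosaic() draws on a new figure of its own
mosaic(df, ["Species", "Petal Size"], gap=0.02, ax=ax)
ax.set_title("Mosaic Plot: Species vs Petal Size")
fig.tight_layout()
plt.show()

Key Parameters:

  • ["Species", "Petal Size"] → Columns to cross-tabulate; the first is split along the width, the second along the height

  • gap=0.02 → Adds spacing between tiles

  • ax=ax → Draws on the axes we created, so figsize takes effect

  • figsize → Controls plot size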
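Beyond these parameters, `mosaic` also accepts `properties` and `labelizer` callables that receive each tile's key (a tuple of strings). The sketch below is one possible customization, not part of the original walkthrough: the colour palette and the helper names `tile_style` and `tile_label` are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from statsmodels.graphics.mosaicplot import mosaic

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]
df["Petal Size"] = pd.cut(
    df["petal length (cm)"], 3, labels=["Small", "Medium", "Large"]
)

# Count every Species x Petal Size combination, keyed by a tuple of strings.
table = pd.crosstab(df["Species"], df["Petal Size"])
counts = {
    (species, str(size)): int(n)
    for species, row in table.iterrows()
    for size, n in row.items()
}

# Illustrative colour per petal size category (an arbitrary choice).
palette = {"Small": "#a6cee3", "Medium": "#b2df8a", "Large": "#fb9a99"}


def tile_style(key):
    """Colour each tile by its petal size category."""
    return {"facecolor": palette[key[1]], "edgecolor": "white"}


def tile_label(key):
    """Show the count, and leave empty combinations unlabelled."""
    return str(counts[key]) if counts[key] else ""


fig, ax = plt.subplots(figsize=(9, 5))
_, rects = mosaic(
    counts, gap=0.02, ax=ax, properties=tile_style, labelizer=tile_label
)
ax.set_title("Species vs Petal Size (tile labels show counts)")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Blanking the label for zero-count combinations avoids text piling up on tiles that have no visible area.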


📈 What the Visualization Reveals

From the Mosaic Plot:

🌸 Setosa

  • Entirely in the Small petal category

  • Very little variation

🌿 Versicolor

  • Mostly in the Medium category

  • A small spillover into Large, and none into Small with these bins

🌺 Virginica

  • Dominantly in the Large category

  • Some presence in Medium
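To read exact numbers off the picture, you can print the contingency table that the mosaic is drawn from. This is a small sketch using `pd.crosstab`; it rebuilds the same DataFrame so it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["Species"] = iris.target_names[iris.target]
df["Petal Size"] = pd.cut(
    df["petal length (cm)"], 3, labels=["Small", "Medium", "Large"]
)

# Raw counts for every Species x Petal Size combination.
counts = pd.crosstab(df["Species"], df["Petal Size"])
print(counts)

# Row percentages: how each species splits across the petal size bins.
row_pct = pd.crosstab(df["Species"], df["Petal Size"], normalize="index") * 100
print(row_pct.round(1))
```

The row percentages correspond to the tile heights within each species column of the mosaic.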


🔍 Key Insight

The Mosaic Plot clearly shows that:

  • Petal size is strongly associated with species

  • Setosa is cleanly separated by petal length, while Versicolor and Virginica overlap only slightly

  • This is consistent with petal measurements being highly informative features in classification models

Even without machine learning, we can visually detect separation patterns.
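If you would like a number to go with the visual impression, one common option is a chi-square test of independence plus Cramér's V as an effect size. This is a sketch that goes beyond the original post; it assumes SciPy is installed, and because the bins were derived from the same data, the result is best read as descriptive.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.datasets import load_iris

iris = load_iris()
species = pd.Series(iris.target_names[iris.target], name="Species")
petal_size = pd.cut(
    pd.Series(iris.data[:, 2], name="Petal Size"),
    3,
    labels=["Small", "Medium", "Large"],
)

table = pd.crosstab(species, petal_size)

# Chi-square test of independence on the contingency table.
chi2, p_value, dof, expected = chi2_contingency(table)

# Cramér's V rescales chi-square to a 0-1 effect size.
n = table.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.3g}")
print(f"Cramér's V = {cramers_v:.2f}")
```

A value of Cramér's V close to 1 indicates a strong association, which matches what the tile pattern suggests.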


💡 Why Use Mosaic Plots?

✔ Excellent for categorical comparisons
✔ Shows proportional relationships clearly
✔ Works well with contingency tables
✔ Helpful in statistical analysis
✔ Easy to interpret once understood


🚀 Real-World Applications

  • Marketing: Customer segment vs product category

  • Healthcare: Disease type vs severity level

  • Education: Grade vs performance category

  • Business: Region vs sales category

  • Survey Analysis: Respondent group vs answer choice


📌 Day 47 Takeaway

Mosaic Plots transform categorical relationships into visual area comparisons.

They help you:

  • Understand category dominance

  • Identify imbalances

  • Discover associations

  • Check visually whether variables look independent before running a formal test
