Day 17: Web Scraping using Python
What is Web Scraping?
Web scraping is the process of extracting data from websites automatically with code instead of copying it by hand.
It is useful for:
- Data collection
- Automation
- Building datasets
- Market research
Tools Required
1. requests
- Used to fetch webpage or API data
- Sends HTTP requests and handles the responses
2. BeautifulSoup
- Parses HTML content
- Helps extract specific elements such as headings, links, and tables
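Both libraries install with pip (pip install requests beautifulsoup4) and are typically used together: requests downloads the raw HTML, and BeautifulSoup parses it. A minimal sketch, using example.com as a placeholder URL:

import requests
from bs4 import BeautifulSoup

# Download the page (example.com is a placeholder URL)
response = requests.get("https://example.com", timeout=10)
response.raise_for_status()  # raise an error for 4xx/5xx responses

# Parse the HTML and extract one element
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.text)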
Understanding the Data Flow
HTML Scraping Flow:
Website → HTML → BeautifulSoup → Data
API Data Flow:
API Endpoint → JSON → Python → Data
Sample HTML File (index.html)
<!DOCTYPE html>
<html>
<head>
    <title>Sample Webpage</title>
</head>
<body>
    <h1>Main Heading</h1>
    <h2>Sub Heading</h2>
    <h3>Section Heading</h3>

    <p>This is a paragraph about web scraping.</p>
    <p>Python makes scraping easy using BeautifulSoup.</p>

    <a href="https://www.google.com">Google</a><br>
    <a href="https://www.github.com">GitHub</a>

    <h2>Student Table</h2>
    <table border="1">
        <tr>
            <th>Name</th>
            <th>Age</th>
            <th>City</th>
            <th>Email</th>
        </tr>
        <tr>
            <td>Piyush</td>
            <td>21</td>
            <td>Nagpur</td>
            <td>piyush@example.com</td>
        </tr>
        <tr>
            <td>Rahul</td>
            <td>22</td>
            <td>Pune</td>
            <td>Rahul@gmail.com</td>
        </tr>
    </table>
</body>
</html>
Web Scraping using BeautifulSoup
from bs4 import BeautifulSoup

with open("index.html", "r", encoding="utf-8") as file:
    html_content = file.read()

soup = BeautifulSoup(html_content, "html.parser")

# 1. Title
print(f"Title: {soup.title.text}")

# 2. Headings
for tag in ["h1", "h2", "h3"]:
    for heading in soup.find_all(tag):
        print(f"{tag.upper()}: {heading.text}")

# 3. Paragraphs
for p in soup.find_all("p"):
    print(p.text)

# 4. Links
for a in soup.find_all("a"):
    print(f"Text: {a.get_text()}, URL: {a.get('href')}")

# 5. Table Data
table = soup.find("table")
rows = table.find_all("tr")
for row in rows:
    cols = row.find_all(["td", "th"])
    data = [col.text.strip() for col in cols]
    print(data)

# 6. Extract all text
print(soup.get_text(separator="\n"))
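Besides find() and find_all(), BeautifulSoup also supports CSS selectors through select(). Continuing from the script above, the headings and links steps can be written more compactly; this is a sketch of the alternative style, not a replacement for the version above:

# 2. Headings, rewritten with a single CSS selector
for heading in soup.select("h1, h2, h3"):
    print(f"{heading.name.upper()}: {heading.get_text(strip=True)}")

# 4. Links, restricted to anchors that actually have an href attribute
for a in soup.select("a[href]"):
    print(a["href"])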
Web Scraping using APIs (JSON Data)
import requests

url = "https://jsonplaceholder.typicode.com/posts"
response = requests.get(url)

if response.status_code == 200:
    data = response.json()
    for post in data[:5]:
        print(f"Title: {post['title']}")
        print(f"Body: {post['body']}")
else:
    print("Error:", response.status_code)
Advanced Example: Fetch API Data and Save to CSV
import requests
import csv

def fetch_api_data(url):
    headers = {
        "User-Agent": "Mozilla/5.0"
    }
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print("Error:", e)
        return None

url = "https://jsonplaceholder.typicode.com/users"
data = fetch_api_data(url)

if data:
    for user in data:
        print(f"{user['name']} - {user['email']}")

    with open("HR.csv", "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(['Name', 'Email'])
        for user in data:
            writer.writerow([user['name'], user['email']])
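To verify the file, you can read it back with csv.DictReader, which maps each row to a dictionary keyed by the header row:

import csv

# Read the CSV back; each row becomes a dict keyed by the header ('Name', 'Email')
with open("HR.csv", "r", encoding="utf-8") as file:
    for row in csv.DictReader(file):
        print(row["Name"], "-", row["Email"])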
Key Concepts Learned
- Difference between HTML scraping and API scraping
- How to parse HTML using BeautifulSoup
- Extracting headings, paragraphs, links, and tables
- Fetching JSON data using requests
- Saving extracted data into CSV files
Best Practices for Web Scraping
- Always check website permissions (robots.txt)
- Avoid sending too many requests (rate limiting)
- Use headers like User-Agent
- Prefer APIs over HTML scraping when available
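The first three practices can be automated. A minimal sketch using the standard library's urllib.robotparser plus a fixed delay between requests; the URLs and the one-second delay are placeholders:

import time
import urllib.robotparser

import requests

# Parse the site's robots.txt first (example.com is a placeholder domain)
robots = urllib.robotparser.RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

headers = {"User-Agent": "Mozilla/5.0"}
urls = ["https://example.com/page1", "https://example.com/page2"]

for url in urls:
    # Skip anything robots.txt disallows for our user agent
    if not robots.can_fetch(headers["User-Agent"], url):
        print("Disallowed by robots.txt:", url)
        continue
    response = requests.get(url, headers=headers, timeout=10)
    print(url, response.status_code)
    time.sleep(1)  # simple rate limiting: at most one request per second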
Summary
Web scraping is a powerful skill for automating data extraction.
Using BeautifulSoup, you can extract structured data from HTML, while requests helps fetch data from APIs efficiently.
Assignment Questions
Theory-Based
- What is web scraping? Explain with an example.
- Difference between web scraping and API data fetching.
- What is BeautifulSoup used for?
- Why is JSON preferred in APIs?
- What are headers in HTTP requests and why are they used?
Practical Questions
- From the given HTML file, extract all:
  - Headings (h1, h2, h3)
  - Paragraphs
  - Links
- Modify the code to extract only the emails from the table.
- Scrape only the table data and convert it into a list of dictionaries.
- Fetch data from https://jsonplaceholder.typicode.com/comments and print the name and email of the first 10 records.
- Save the API data into a CSV file with the columns ID, Name, Email.
Challenge Tasks
- Build a scraper that:
  - Extracts all links from a webpage
  - Saves them into a text file
- Create a script that:
  - Scrapes table data
  - Converts it into JSON format
- Combine both:
  - Scrape HTML data
  - Store it in CSV
  - Also fetch API data and merge both datasets
- Add error handling:
  - Handle missing tags
  - Handle request failures