1 Hour to Web Scraping with Python
Build a real web scraper in 60 minutes. Extract data from websites, handle pagination, and save to CSV. No prior scraping experience needed.
By the end of this tutorial, you'll have a working web scraper that extracts product data from a real website and saves it to CSV.
🎯 What You'll Build
A Python script that:
- Fetches HTML from a website
- Extracts structured data (titles, prices, ratings)
- Handles pagination (multiple pages)
- Saves results to CSV
⏱️ Time Breakdown
- 0–10 min: Install tools
- 10–15 min: Understand HTML structure
- 15–25 min: Extract one item
- 25–40 min: Extract all items
- 40–55 min: Handle pagination
- 55–60 min: Save to CSV
📋 Prerequisites
- Python 3.8+ (see 1 Hour to Python Basics if you're new to Python)
- Basic HTML knowledge (tags like <div>, <a>, <span>)
Step 1: Install Tools (0–10 min)
Install requests (fetch HTML) and beautifulsoup4 (parse HTML):
pip install requests beautifulsoup4
Test it:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title.string)
Checkpoint
You should see Example Domain printed. If you get ModuleNotFoundError, re-run the pip install command and make sure it targets the same Python environment you use to run the script.
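Before moving on, it's worth knowing how to confirm a request actually succeeded. A minimal sketch (raise_for_status() raises an exception for 4xx/5xx responses):

import requests

response = requests.get("https://example.com", timeout=10)
print(response.status_code)  # 200 means the request succeeded
response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx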
Step 2: Understand HTML Structure (10–15 min)
We'll scrape books.toscrape.com (a practice site).
Open it in your browser → right-click a book → Inspect.
You'll see:
<article class="product_pod">
  <h3><a href="..." title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="star-rating Three">...</p>
</article>
Key selectors:
- Book title: article.product_pod h3 a
- Price: p.price_color
- Rating: p.star-rating (the element's class list contains the rating word)
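BeautifulSoup can use these CSS selectors directly through its select() method, as an alternative to find()/find_all(). A quick sketch, assuming a soup parsed from the books.toscrape.com homepage (you'll build one in Step 3):

# The same queries, expressed as CSS selectors
titles = [a['title'] for a in soup.select('article.product_pod h3 a')]
prices = [p.text for p in soup.select('p.price_color')]
print(titles[0], prices[0])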
Step 3: Extract One Item (15–25 min)
Create scraper.py:
import requests
from bs4 import BeautifulSoup
url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')  # .content (raw bytes) lets BeautifulSoup detect the page's encoding

# Find the first book
book = soup.find('article', class_='product_pod')

title = book.h3.a['title']  # the title attribute holds the full, untruncated title
price = book.find('p', class_='price_color').text
rating_class = book.find('p', class_='star-rating')['class'][1]  # class list is ['star-rating', 'Three']
print(f"Title: {title}")
print(f"Price: {price}")
print(f"Rating: {rating_class}")
Run:
python scraper.py
Checkpoint
You should see one book's title, price, and rating (e.g., "Three").
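The rating arrives as a word ("Three"). If you'd rather have a number, a small lookup dictionary does the job. A sketch, where RATING_WORDS is just an illustrative name, not part of any library:

# Map the rating class word to an integer (illustrative helper)
RATING_WORDS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}
rating_num = RATING_WORDS.get(rating_class, 0)  # 0 if the word is unrecognized
print(f"Rating as number: {rating_num}")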
Step 4: Extract All Items (25–40 min)
Loop through all books on the page:
books = soup.find_all('article', class_='product_pod')
for book in books:
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text
    rating = book.find('p', class_='star-rating')['class'][1]
    print(f"{title} | {price} | {rating}")
Checkpoint
You should see 20 books printed (one page has 20 items).
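With all 20 books in hand, you can already do quick analysis. A sketch that converts prices to floats and finds the cheapest book on the page, assuming the books list from above:

parsed = []
for book in books:
    price_text = book.find('p', class_='price_color').text  # e.g. "£51.77"
    parsed.append({
        'title': book.h3.a['title'],
        'price': float(price_text.lstrip('£')),  # drop the currency symbol
    })

cheapest = min(parsed, key=lambda b: b['price'])
print(f"Cheapest on this page: {cheapest['title']} at £{cheapest['price']}")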
Step 5: Handle Pagination (40–55 min)
The site has a "next" button. Let's scrape multiple pages:
import requests
from bs4 import BeautifulSoup

base_url = "http://books.toscrape.com/catalogue/"
page_url = "page-{}.html"

all_books = []

for page_num in range(1, 4):  # Scrape 3 pages
    if page_num == 1:
        url = "http://books.toscrape.com/"
    else:
        url = base_url + page_url.format(page_num)

    print(f"Scraping {url}...")
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')  # .content lets BeautifulSoup detect the encoding

    books = soup.find_all('article', class_='product_pod')
    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text.strip('£')
        rating = book.find('p', class_='star-rating')['class'][1]
        all_books.append({
            'title': title,
            'price': price,
            'rating': rating
        })
print(f"Total books scraped: {len(all_books)}")
Checkpoint
You should see Total books scraped: 60 (3 pages × 20 books).
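Hard-coding page numbers works for a fixed range, but you can also follow the site's "next" link until it disappears. A sketch (note this visits all 50 pages, so add a delay between requests; see the Bonus section):

from urllib.parse import urljoin

url = "http://books.toscrape.com/"
while url:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # ... extract books exactly as above ...
    next_link = soup.find('li', class_='next')
    # The href is relative, so resolve it against the current page's URL
    url = urljoin(url, next_link.a['href']) if next_link else None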
Step 6: Save to CSV (55–60 min)
import csv
# ... (previous scraping code) ...
# Save to CSV
with open('books.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'price', 'rating'])
    writer.writeheader()
    writer.writerows(all_books)
print("Saved to books.csv")
Run:
python scraper.py
Open books.csv in Excel or any text editor. You should see 60 books!
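To sanity-check the file from Python instead of Excel, read it back with csv.DictReader:

import csv

with open('books.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))

print(len(rows))  # should print 60
print(rows[0])    # the first book as a dict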
🎉 You just built a real web scraper in 60 minutes!
💡 Bonus
Add delays (be polite):
import time

for page_num in range(1, 4):
    # ... scraping code ...
    time.sleep(1)  # Wait 1 second between pages
Handle errors:
# Inside the page loop:
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
except requests.RequestException as e:
    print(f"Error: {e}")
    continue  # skip to the next page
Use headers (avoid blocks):
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
response = requests.get(url, headers=headers)
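If you're making many requests, a requests.Session lets you set headers once and reuses the underlying connection, which is both tidier and faster. A minimal sketch:

import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
})
response = session.get("http://books.toscrape.com/", timeout=10)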
🚀 Next Steps
📚 Resources
- Beautiful Soup Docs
- Requests Docs
- Scrapy (advanced framework)
⚠️ Legal Note
Always check a website's robots.txt and Terms of Service before scraping. Respect rate limits and don't overload servers.
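Python's standard library can check robots.txt for you. A sketch using urllib.robotparser (the "*" user agent means "any bot"):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://books.toscrape.com/robots.txt")
rp.read()  # fetch and parse robots.txt
print(rp.can_fetch("*", "http://books.toscrape.com/catalogue/page-2.html"))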