💻 Code • 60 min • Intermediate • May 2, 2026

1 Hour to Web Scraping with Python

Build a real web scraper in 60 minutes. Extract data from websites, handle pagination, and save to CSV. No prior scraping experience needed.

#python #scraping #automation #data

By the end of this tutorial, you'll have a working web scraper that extracts product data from a real website and saves it to CSV.

🎯 What You'll Build

A Python script that:

  • Fetches HTML from a website
  • Extracts structured data (titles, prices, ratings)
  • Handles pagination (multiple pages)
  • Saves results to CSV

ā±ļø Time Breakdown

0–10min
Install tools & understand HTML
10–25min
First scrape: extract one item
25–40min
Extract all items on a page
40–55min
Handle pagination
55–60min
Save to CSV & test

📋 Prerequisites

  • Python 3 installed (check with python --version)
  • Basic Python: variables, loops, and functions
  • A terminal and a text editor

Step 1: Install Tools (0–10 min)

Install requests (fetch HTML) and beautifulsoup4 (parse HTML):

pip install requests beautifulsoup4

Test it:

import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com")
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title.string)
✅ Checkpoint

You should see Example Domain printed. If you get ModuleNotFoundError, re-run pip install.

Step 2: Understand HTML Structure (10–15 min)

We'll scrape books.toscrape.com (a practice site).

Open it in your browser → Right-click a book → Inspect.

You'll see:

<article class="product_pod">
  <h3><a href="..." title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="star-rating Three">...</p>
</article>

Key selectors:

  • Book title: article.product_pod h3 a
  • Price: p.price_color
  • Rating: p.star-rating (class name contains rating)
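
Before writing the full script, you can sanity-check these selectors from a Python shell. BeautifulSoup also accepts CSS selectors directly: select_one returns the first element matching a selector. A minimal check:

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("http://books.toscrape.com/").text, 'html.parser')

# First match for each selector above
print(soup.select_one('article.product_pod h3 a')['title'])           # book title
print(soup.select_one('article.product_pod p.price_color').text)     # price
print(soup.select_one('article.product_pod p.star-rating')['class']) # ['star-rating', 'Three']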

Step 3: Extract One Item (15–25 min)

Create scraper.py:

import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Find first book
book = soup.find('article', class_='product_pod')

title = book.h3.a['title']
price = book.find('p', class_='price_color').text
rating_class = book.find('p', class_='star-rating')['class'][1]

print(f"Title: {title}")
print(f"Price: {price}")
print(f"Rating: {rating_class}")

Run:

python scraper.py
✅ Checkpoint

You should see one book's title, price, and rating (e.g., "Three").

Step 4: Extract All Items (25–40 min)

Now loop through all books on the page. In scraper.py, replace the single-book find with find_all and iterate:

books = soup.find_all('article', class_='product_pod')

for book in books:
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text
    rating = book.find('p', class_='star-rating')['class'][1]
    
    print(f"{title} | {price} | {rating}")
✅ Checkpoint

You should see 20 books printed (one page has 20 items).
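
To check the count directly instead of counting printed lines, add one line after the loop:

print(len(books))  # expect 20 on the first page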

Step 5: Handle Pagination (40–55 min)

The site has a "next" button, and its catalogue pages follow a predictable URL pattern (page-2.html, page-3.html, …). Let's scrape multiple pages using that pattern:

import requests
from bs4 import BeautifulSoup

base_url = "http://books.toscrape.com/catalogue/"
page_url = "page-{}.html"
all_books = []

for page_num in range(1, 4):  # Scrape 3 pages
    if page_num == 1:
        url = "http://books.toscrape.com/"
    else:
        url = base_url + page_url.format(page_num)
    
    print(f"Scraping {url}...")
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    books = soup.find_all('article', class_='product_pod')
    
    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text.strip('£')
        rating = book.find('p', class_='star-rating')['class'][1]
        
        all_books.append({
            'title': title,
            'price': price,
            'rating': rating
        })

print(f"Total books scraped: {len(all_books)}")
✅ Checkpoint

You should see Total books scraped: 60 (3 pages × 20 books).
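
Hard-coded page numbers work here, but a more general pattern is to follow the "next" button's link and stop when it disappears. A sketch of the same loop driven by that link (on books.toscrape.com the pager link matches li.next a; urljoin resolves its relative href):

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
pages_scraped = 0

while url and pages_scraped < 3:  # same 3-page cap as above
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    # ... extract books as in the loop above ...
    pages_scraped += 1
    next_link = soup.select_one('li.next a')  # absent on the last page
    url = urljoin(url, next_link['href']) if next_link else None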

Step 6: Save to CSV (55–60 min)

import csv

# ... (previous scraping code) ...

# Save to CSV
with open('books.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'price', 'rating'])
    writer.writeheader()
    writer.writerows(all_books)

print("Saved to books.csv")

Run:

python scraper.py

Open books.csv in Excel or any text editor. You should see 60 books!
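
You can also verify the file from Python by reading it back with the same csv module:

import csv

with open('books.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))

print(len(rows))  # expect 60
print(rows[0])    # the first book as a dict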

🎉 You just built a real web scraper in 60 minutes!

šŸŽ Bonus

Add delays (be polite):

import time

for page_num in range(1, 4):
    # ... scraping code ...
    time.sleep(1)  # Wait 1 second between pages

Handle errors:

# inside the page loop from Step 5
try:
    response = requests.get(url, timeout=10)  # give up if no reply in 10s
    response.raise_for_status()  # raise on 4xx/5xx status codes
except requests.RequestException as e:
    print(f"Error: {e}")
    continue  # skip this page and move on to the next

Use headers (avoid blocks):

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
response = requests.get(url, headers=headers)
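
Putting all three together, the page loop from Step 5 might look like this (a sketch; same URLs and selectors as before):

import time

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}

for page_num in range(1, 4):
    if page_num == 1:
        url = "http://books.toscrape.com/"
    else:
        url = f"http://books.toscrape.com/catalogue/page-{page_num}.html"

    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error: {e}")
        continue  # skip this page, move on

    soup = BeautifulSoup(response.text, 'html.parser')
    # ... extract books as in Step 5 ...
    time.sleep(1)  # be polite: wait 1 second between pages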

📚 Next Steps

  • 1 Hour to Python Basics (60 min): Go from zero to writing your first real Python script in 60 minutes.
  • 1 Hour to Docker Basics (60 min): Containerize an app in 60 minutes. From installing Docker to running your own container and publishing an image to Docker Hub.

🔗 Resources

Always check a website's robots.txt and Terms of Service before scraping. Respect rate limits and don't overload servers.
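
The standard library can check robots.txt rules for you. A minimal sketch with urllib.robotparser:

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://books.toscrape.com/robots.txt")
rp.read()  # a missing robots.txt (404) means everything is allowed

# True if the rules allow any crawler ("*") to fetch this URL
print(rp.can_fetch("*", "http://books.toscrape.com/catalogue/page-2.html"))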