1 Hour to LangChain: Build Your First LLM App That Actually Does Something
In 60 minutes, build a working AI research assistant that summarizes web articles and answers questions about them.
By the end of this hour, you'll have a working AI research assistant that can fetch any web article, summarize it, and answer specific questions about the content using LangChain and OpenAI.
What You'll Build
A command-line research assistant that takes a URL, extracts the content, creates an intelligent summary, and lets you ask follow-up questions:
$ python research_assistant.py
Enter article URL: https://example.com/article
Article processed: "The Future of AI"
Summary: This article discusses emerging trends in artificial intelligence...
Ask a question (or 'quit'): What are the main benefits mentioned?
The article highlights three key benefits: automation of repetitive tasks...
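Time Breakdown
- 0–10 min: Set up your LangChain environment
- 10–25 min: Build the web scraper
- 25–40 min: Create the summarization chain
- 40–55 min: Add Q&A with memory
- 55–60 min: Ship it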
Prerequisites
- Python 3.8+ installed on your machine
- OpenAI API key (API calls are billed per token, so make sure your account has credit available)
- Basic familiarity with Python and command line
- Text editor or IDE of your choice
Step 1: Set Up Your LangChain Environment (0–10 min)
Create a new project directory and install the required packages:
mkdir langchain-assistant
cd langchain-assistant
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
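Install LangChain and dependencies. The code in this tutorial uses LangChain's legacy import paths (for example langchain.llms), so pin a pre-0.1 release. The summarizer and the Q&A step also need tiktoken, and the vector store in Step 4 needs faiss-cpu:
pip install "langchain<0.1" openai requests beautifulsoup4 python-dotenv faiss-cpu tiktoken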
Create your environment file:
echo "OPENAI_API_KEY=your_api_key_here" > .env
Replace your_api_key_here with your actual OpenAI API key from platform.openai.com.
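Before moving on, it can help to confirm that the key is actually visible to Python. The sketch below is one minimal way to do that, assuming the variable is named OPENAI_API_KEY as in the .env file above. check_api_key is a hypothetical helper, not part of LangChain or python-dotenv; it takes a plain mapping so you can pass os.environ after calling load_dotenv().

```python
import os


def check_api_key(env, name="OPENAI_API_KEY", placeholder="your_api_key_here"):
    """Return a short status string for the API key stored in `env`.

    `env` is any mapping of variable names to values, e.g. os.environ.
    The helper name and the status strings are illustrative assumptions.
    """
    value = (env.get(name) or "").strip()
    if not value:
        return "missing"
    if value == placeholder:
        return "placeholder"
    return "ok"


if __name__ == "__main__":
    # Run this after load_dotenv() has populated os.environ.
    print(f"OPENAI_API_KEY status: {check_api_key(os.environ)}")
```

A result of "placeholder" means the .env file still contains the template value and needs your real key.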
Checkpoint
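Run python -c "import langchain; print('LangChain installed successfully!')". If the install worked, it prints the success message; an ImportError usually means the virtual environment is not active or the install failed.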
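If you want a slightly more detailed checkpoint, the following sketch lists the installed version of each package from this step. It relies only on the standard library's importlib.metadata; installed_version is a hypothetical helper name, and the package list simply mirrors the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as they are passed to pip in this step.
    for name in ["langchain", "openai", "requests", "beautifulsoup4", "python-dotenv"]:
        print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```

Any line that reports NOT INSTALLED points at a package to reinstall inside the active virtual environment.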
Step 2: Build the Web Scraper (10–25 min)
Create scraper.py to fetch and clean web content:
import requests
from bs4 import BeautifulSoup


class WebScraper:
    def __init__(self):
        # Reuse one session so every request sends the same headers
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (compatible; ResearchBot/1.0)'
        })

    def scrape_article(self, url):
        """Extract main content from a web article"""
        try:
            response = self.session.get(url, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')

            # Remove unwanted elements
            for element in soup(['script', 'style', 'nav', 'footer', 'header']):
                element.decompose()

            # Try to find main content
            content = self._extract_main_content(soup)
            title = soup.find('title')
            title_text = title.get_text().strip() if title else "Unknown Title"

            return {
                'title': title_text,
                'content': content,
                'url': url
            }
        except Exception as e:
            # Keep the original exception attached for debugging
            raise Exception(f"Failed to scrape {url}: {str(e)}") from e

    def _extract_main_content(self, soup):
        """Extract the main text content"""
        # Common content selectors, tried in order
        selectors = ['article', 'main', '.content', '.post-content', '.entry-content']
        for selector in selectors:
            content_div = soup.select_one(selector)
            if content_div:
                return content_div.get_text(separator=' ', strip=True)

        # Fallback to body
        body = soup.find('body')
        return body.get_text(separator=' ', strip=True) if body else ""
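The scraper passes whatever string it receives straight to requests, so a typo such as a missing scheme only surfaces as a network error. A small pre-check can catch that earlier. The sketch below uses urllib.parse from the standard library; is_fetchable_url is a hypothetical helper that is not part of the scraper above, and restricting input to http and https is an assumption about what the assistant should accept.

```python
from urllib.parse import urlparse


def is_fetchable_url(url):
    """Return True if `url` is an absolute http(s) URL with a host part."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ["https://example.com/article", "example.com/article"]:
        print(candidate, "->", is_fetchable_url(candidate))
```

You could call it at the top of scrape_article and raise a ValueError with a clearer message when it returns False.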
Test your scraper:
# test_scraper.py
from scraper import WebScraper
scraper = WebScraper()
article = scraper.scrape_article("https://example.com")
print(f"Title: {article['title']}")
print(f"Content length: {len(article['content'])} characters")
Checkpoint
Test scraping a simple article URL - does it return title and content without errors?
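To see what the cleaning step is doing without touching the network, here is a rough offline illustration. It swaps BeautifulSoup for the standard library's html.parser, so it is a simplified stand-in rather than how BeautifulSoup works internally; TextExtractor, extract_text, and the sample HTML are all made up for this example.

```python
from html.parser import HTMLParser

# Same tags the scraper removes before extracting text.
SKIP_TAGS = {"script", "style", "nav", "footer", "header"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._parts.append(text)

    def text(self):
        return " ".join(self._parts)


def extract_text(html):
    """Return the text of `html` with skipped elements removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


SAMPLE = (
    "<body><nav>Menu</nav><article><p>Hello</p> <p>world</p></article>"
    "<script>var x = 1;</script><footer>Foot</footer></body>"
)

if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

Only the article text survives; the navigation, script, and footer content is dropped, which is the same effect the decompose() loop aims for.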
Step 3: Create the Summarization Chain (25–40 min)
Create summarizer.py to build your first LangChain chain:
import os
from dotenv import load_dotenv
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate

# Read OPENAI_API_KEY from the .env file
load_dotenv()


class ArticleSummarizer:
    def __init__(self):
        self.llm = OpenAI(
            temperature=0.3,
            openai_api_key=os.getenv("OPENAI_API_KEY")
        )
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=3000,
            chunk_overlap=200
        )

        # Custom prompt for better summaries
        self.prompt_template = """
        Summarize this article section in a clear, concise way:

        {text}

        Focus on:
        - Main points and key insights
        - Important facts and data
        - Conclusions or recommendations

        Summary:
        """
        self.prompt = PromptTemplate(
            template=self.prompt_template,
            input_variables=["text"]
        )

    def summarize_article(self, article_data):
        """Create an intelligent summary of the article"""
        content = article_data['content']
        if len(content) < 100:
            raise ValueError("Article content too short to summarize")

        # Split text into manageable chunks
        texts = self.text_splitter.split_text(content)
        docs = [Document(page_content=text) for text in texts]

        # Create summarization chain
        chain = load_summarize_chain(
            self.llm,
            chain_type="map_reduce",
            map_prompt=self.prompt,
            combine_prompt=self.prompt
        )

        # Generate summary
        summary = chain.run(docs)

        return {
            'title': article_data['title'],
            'url': article_data['url'],
            'summary': summary.strip(),
            'original_length': len(content),
            'chunks_processed': len(docs)
        }
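The chunk_size and chunk_overlap settings are easier to reason about with a concrete picture. The sketch below is a deliberately simplified fixed-window splitter, not LangChain's algorithm: RecursiveCharacterTextSplitter prefers to break on paragraph and line boundaries, so its chunks will differ. chunk_text is a hypothetical function written only to show how overlap makes neighbouring chunks share text.

```python
def chunk_text(text, chunk_size=3000, chunk_overlap=200):
    """Split `text` into windows of at most `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(7000))
    pieces = chunk_text(sample)
    print(len(pieces), [len(p) for p in pieces])
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.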
Test the summarizer:
# test_summary.py
from scraper import WebScraper
from summarizer import ArticleSummarizer
scraper = WebScraper()
summarizer = ArticleSummarizer()
article = scraper.scrape_article("https://example.com/your-test-article")
summary = summarizer.summarize_article(article)
print(f"Original: {summary['original_length']} chars")
print(f"Summary: {summary['summary']}")
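The chain_type="map_reduce" setting means each chunk is summarized on its own and the partial summaries are then summarized together. The sketch below shows that control flow with a stand-in for the model. It is an illustration under assumptions, not LangChain's implementation (the real chain may do extra work, for instance when the partial summaries are too long to combine in one call); map_reduce_summarize and first_sentence are hypothetical names.

```python
def map_reduce_summarize(chunks, summarize):
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is any callable that takes a string and returns a shorter
    one; in the tutorial that role is played by the LLM plus the prompt.
    """
    if not chunks:
        raise ValueError("nothing to summarize")
    partial = [summarize(chunk) for chunk in chunks]  # map step
    return summarize("\n".join(partial))              # reduce step


def first_sentence(text):
    """Stand-in 'model': keep only the first sentence of the text."""
    return text.split(".")[0].strip() + "."


if __name__ == "__main__":
    demo_chunks = ["Cats sleep a lot. They nap.", "Dogs bark. They run."]
    print(map_reduce_summarize(demo_chunks, first_sentence))
```

Because the same prompt is used for both steps in summarizer.py, the model is called once per chunk plus at least once more for the final combination, which is worth keeping in mind for cost on long articles.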
Step 4: Add Q&A with Memory (40–55 min)
Create qa_system.py to handle questions about the article:
from dotenv import load_dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Read OPENAI_API_KEY from the .env file so this module also works on its own
load_dotenv()


class QASystem:
    def __init__(self):
        self.llm = OpenAI(temperature=0.1)
        self.embeddings = OpenAIEmbeddings()
        self.memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True,
            # The chain returns both 'answer' and 'source_documents';
            # tell the memory which one to store
            output_key="answer"
        )
        self.qa_chain = None
        self.vectorstore = None

    def setup_qa_chain(self, article_data):
        """Set up Q&A system for a specific article"""
        # Create document chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=100
        )
        texts = text_splitter.split_text(article_data['content'])
        docs = [Document(page_content=text) for text in texts]

        # Create vector store for semantic search
        self.vectorstore = FAISS.from_documents(docs, self.embeddings)

        # Set up conversational chain
        self.qa_chain = ConversationalRetrievalChain.from_llm(
            self.llm,
            retriever=self.vectorstore.as_retriever(search_kwargs={"k": 3}),
            memory=self.memory,
            return_source_documents=True
        )

    def ask_question(self, question):
        """Ask a question about the article"""
        if not self.qa_chain:
            raise ValueError("QA system not initialized. Call setup_qa_chain first.")

        result = self.qa_chain({"question": question})

        return {
            'answer': result['answer'],
            'sources': len(result['source_documents']),
            'confidence': 'high' if len(result['source_documents']) >= 2 else 'medium'
        }

    def reset_conversation(self):
        """Clear conversation history"""
        self.memory.clear()
Checkpoint
Initialize the QA system with an article and ask "What is this article about?" - do you get a relevant answer?
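If the retriever feels like a black box, the toy example below shows the idea behind search_kwargs={"k": 3}: turn the question and every chunk into vectors, score each chunk against the question, and keep the best k. Everything here is a stand-in chosen for illustration. Word counts replace OpenAI embeddings, a plain sort replaces FAISS, and cosine similarity is an assumption (the metric a real vector store uses depends on its configuration).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (NOT a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question, chunks, k=3):
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    demo = [
        "Automation reduces repetitive tasks.",
        "The weather was sunny in Paris.",
        "Benefits include automation and lower cost.",
    ]
    for chunk in top_k("What are the benefits of automation?", demo, k=2):
        print(chunk)
```

In the real chain, the retrieved chunks are passed to the LLM together with the question and the chat history, and the model writes the answer from them.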
Step 5: Ship It (55–60 min)
Create the main application research_assistant.py:
#!/usr/bin/env python3
from scraper import WebScraper
from summarizer import ArticleSummarizer
from qa_system import QASystem


def main():
    print("AI Research Assistant")
    print("=" * 40)

    # Initialize components
    scraper = WebScraper()
    summarizer = ArticleSummarizer()
    qa_system = QASystem()

    try:
        # Get article URL
        url = input("Enter article URL: ").strip()

        print("Fetching article...")
        article = scraper.scrape_article(url)

        print("Generating summary...")
        summary_result = summarizer.summarize_article(article)

        print(f"\nArticle processed: \"{summary_result['title'][:50]}...\"")
        print(f"Summary ({summary_result['chunks_processed']} sections):")
        print(f"{summary_result['summary']}\n")

        # Set up Q&A
        print("Setting up Q&A system...")
        qa_system.setup_qa_chain(article)

        # Interactive Q&A loop
        print("Ask questions about the article (type 'quit' to exit):")
        while True:
            question = input("\nYour question: ").strip()
            if question.lower() in ['quit', 'exit', 'q']:
                break
            if not question:
                continue
            try:
                response = qa_system.ask_question(question)
                print(f"{response['answer']}")
                print(f"   (Confidence: {response['confidence']}, Sources: {response['sources']})")
            except Exception as e:
                print(f"Error answering question: {e}")

        print("\nThanks for using AI Research Assistant!")
    except KeyboardInterrupt:
        print("\nGoodbye!")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
Make it executable and test:
chmod +x research_assistant.py
python research_assistant.py
Your AI research assistant is ready! Test it with a news article or blog post to see how it summarizes content and answers your questions.
Bonus
- Custom extraction: Add support for PDF files using PyPDF2 for research papers
- Web interface: Build a simple Flask/Streamlit web UI instead of command-line
- Export feature: Save summaries and Q&A sessions to markdown files for later reference
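As a starting point for the export idea, here is one possible sketch using only the standard library. It assumes the summary dictionary has the title, url, and summary keys returned by summarize_article, and that you collect (question, answer) pairs yourself during the Q&A loop. session_to_markdown and save_session are hypothetical helpers, and the Markdown layout is just one reasonable choice.

```python
from pathlib import Path


def session_to_markdown(summary_result, qa_pairs):
    """Render a summary dict and a list of (question, answer) pairs as Markdown."""
    lines = [
        f"# {summary_result['title']}",
        "",
        f"Source: {summary_result['url']}",
        "",
        "## Summary",
        "",
        summary_result["summary"],
        "",
        "## Questions",
        "",
    ]
    for question, answer in qa_pairs:
        lines.append(f"**Q:** {question}")
        lines.append("")
        lines.append(f"**A:** {answer}")
        lines.append("")
    return "\n".join(lines)


def save_session(path, summary_result, qa_pairs):
    """Write the rendered session to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(session_to_markdown(summary_result, qa_pairs), encoding="utf-8")
    return target
```

In research_assistant.py you could append each question and response['answer'] to a list inside the loop and call save_session once the user quits.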
Next Steps
Resources
- LangChain Documentation - Complete framework reference
- OpenAI API Guide - API usage and best practices
- Vector Databases Explained - Understanding embeddings and search
- Prompt Engineering Guide - Writing better prompts for LLMs
- LangChain Cookbook - Real-world examples and patterns