How to Build a Smart Document Q&A Agent with OpenAI and LangChain (Tutorial)

By Theo Grant / June 24, 2026



AI Automation Playbook

Step-by-step workflows for automating content, email, social media, and research with AI agents.


1. Setting Up Your Development Environment

Install Python 3.10+ and create a virtual environment with `venv` or `conda`.
Install required libraries: `openai`, `langchain`, `chromadb`, `pypdf`, and `python-dotenv`.
Set up your OpenAI API key securely using a `.env` file and `load_dotenv()`.
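As a minimal sketch of the key-loading step, assuming the key is stored under the conventional `OPENAI_API_KEY` variable name: the helper names `get_openai_api_key` and `load_key_from_dotenv` are hypothetical, chosen only for illustration.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
    return key


def load_key_from_dotenv():
    """Read .env from the working directory, then return the key."""
    # Imported lazily so get_openai_api_key works without python-dotenv installed.
    from dotenv import load_dotenv

    # Copies KEY=value pairs from .env into os.environ; by default,
    # variables that are already set are not overwritten.
    load_dotenv()
    return get_openai_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the API client. Keep the `.env` file out of version control.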

2. Preparing the Document – Chunking and Embedding

Load a PDF or text file using LangChain’s document loaders (e.g., `PyPDFLoader`).
Split the text into overlapping chunks with `RecursiveCharacterTextSplitter` (chunk size of about 500 characters, overlap of 50 characters).
Generate embeddings with OpenAI’s `text-embedding-3-small` and store them in a Chroma vector database.
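The three steps above might look roughly like the sketch below. It assumes the split LangChain packages (`langchain-community`, `langchain-text-splitters`, `langchain-openai`) are installed alongside the libraries listed in section 1; the function names and the `chroma_db` directory are hypothetical. `naive_chunks` is a simplified fixed-width window included only to show what a 50-character overlap means; it is not how `RecursiveCharacterTextSplitter` chooses its boundaries.

```python
CHUNK_SIZE = 500     # characters per chunk
CHUNK_OVERLAP = 50   # characters shared by neighbouring chunks


def naive_chunks(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Fixed-width sliding window, only to illustrate what overlap means."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks


def build_vector_store(pdf_path, persist_dir="chroma_db"):
    """Load a PDF, split it, embed the chunks, and store them in Chroma."""
    # Lazy imports: these need the third-party packages named above.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    pages = PyPDFLoader(pdf_path).load()  # one Document per PDF page
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP
    )
    chunks = splitter.split_documents(pages)
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    # Embeds every chunk (this calls the OpenAI API) and writes to persist_dir.
    return Chroma.from_documents(chunks, embeddings, persist_directory=persist_dir)
```

The overlap means a sentence that straddles a chunk boundary is still likely to appear whole in at least one chunk, at the cost of embedding some text twice.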

3. Building the Retrieval Chain with LangChain

Create a retriever from the vector store that fetches the top‑k most relevant chunks (k=3–5).
Define a prompt template that instructs the LLM to answer strictly from the retrieved context.
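Assemble the chain: `RetrievalQA` or a custom `LLMChain` with the retriever and the `gpt-4-turbo` model. Recent LangChain releases mark both classes as deprecated, so check which chain-building API your installed version recommends.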
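One possible way to wire the retriever, prompt, and model together is sketched below. Note the substitution: instead of the `RetrievalQA` class named above, this sketch composes the pieces with LangChain's runnable pipe syntax, and it assumes the `langchain-core` and `langchain-openai` packages are installed. The prompt wording, the function names, and the default `k=4` are illustrative choices, not part of the original outline.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
"""


def format_docs(docs):
    """Join the retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_chain(vector_store, k=4):
    """Return a runnable that maps a question string to an answer string."""
    # Lazy imports: these need langchain-core and langchain-openai installed.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_openai import ChatOpenAI

    # Fetch the k most similar chunks for each question.
    retriever = vector_store.as_retriever(search_kwargs={"k": k})
    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)

    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

With a vector store in hand, `build_chain(store).invoke("What does the document say about X?")` would return the answer as plain text. Setting the temperature to 0 is a common choice for question answering because it reduces variation between runs.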

4. Implementing a Simple CLI Interface

Write a `while True` loop that accepts user queries from the command line.
Pass each query through the retrieval chain and print the generated answer.
Add a “quit” command to exit gracefully and handle empty inputs.
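The loop described above can be sketched with the standard library alone. Here `answer_fn` stands in for whatever callable runs the retrieval chain; that name, along with `input_fn` and `output_fn`, is a hypothetical parameter added so the loop can be exercised without a terminal.

```python
def run_cli(answer_fn, input_fn=input, output_fn=print):
    """Read questions until the user types 'quit'; skip empty lines."""
    while True:
        try:
            query = input_fn("Question (or 'quit'): ").strip()
        except (EOFError, KeyboardInterrupt):
            break  # Ctrl-D or Ctrl-C also exits cleanly
        if not query:
            continue  # ignore empty input and prompt again
        if query.lower() == "quit":
            break
        output_fn(answer_fn(query))
    output_fn("Goodbye.")
```

In a real session you would call something like `run_cli(chain.invoke)`, where `chain` is the retrieval chain built in the previous section.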

5. Enhancing Accuracy – Prompt Engineering & Source Citation

Modify the prompt to force the model to say “I don’t know” when the answer is not in the context.
Instruct the model to quote exact sentences and return the source page/chunk ID.
Test with ambiguous queries and tweak chunk overlap or retrieval k‑value to improve recall.
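A stricter prompt and a context formatter that labels each chunk could look like the sketch below. The prompt wording and the `[chunk N, page M]` label format are illustrative assumptions. The code also assumes each retrieved document carries a zero-based `page` entry in its metadata, as `PyPDFLoader` typically provides; chunks without it are labelled by chunk number only.

```python
STRICT_PROMPT = """Answer using only the numbered context chunks below.
Quote the exact sentence that supports your answer and cite its label,
for example [chunk 2, page 5].
If the answer is not in the context, reply exactly: I don't know.

Context:
{context}

Question: {question}
"""


def format_docs_with_sources(docs):
    """Prefix every chunk with a citable label before joining them."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page")
        if isinstance(page, int):
            # Convert the zero-based page index to a human-readable page number.
            label = f"[chunk {i}, page {page + 1}]"
        else:
            label = f"[chunk {i}]"
        parts.append(f"{label}\n{doc.page_content}")
    return "\n\n".join(parts)
```

Because the labels appear in the context itself, the model can copy them into its answer, and you can check each citation against the chunk it points to.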

6. Deploying as a Web App with Streamlit

Build a minimal Streamlit UI with a text input and an “Ask” button.
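A minimal version of that UI might be sketched as follows. The function name `render_app` and the idea of passing the Streamlit module in as the `st` parameter are assumptions made for illustration, so the layout logic can be tried out with a stand-in object; `answer_fn` again stands in for the retrieval chain.

```python
def render_app(st, answer_fn):
    """Draw a title, a text input, and an Ask button; show the answer on click."""
    st.title("Document Q&A")
    question = st.text_input("Ask a question about the document")
    if st.button("Ask"):
        if question.strip():
            st.write(answer_fn(question))
        else:
            st.warning("Please enter a question.")
```

To use it, a script such as `app.py` would run `import streamlit as st` and then call `render_app(st, chain.invoke)`; start the app with `streamlit run app.py`.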
