AI Automation Playbook
Step-by-step workflows for automating content, email, social media, and research with AI agents.
How to Build a Smart Document Q&A Agent with OpenAI and LangChain (Tutorial)
1. Setting Up Your Development Environment
- Install Python 3.10+ and create a virtual environment with `venv` or `conda`.
- Install required libraries: `openai`, `langchain`, `chromadb`, `pypdf`, and `python-dotenv`.
- Set up your OpenAI API key securely using a `.env` file and `load_dotenv()`.
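As a minimal sketch of the key-loading step, the helper below reads the key from the environment after calling `load_dotenv()`. The function name `load_api_key` and the fallback when `python-dotenv` is missing are illustrative assumptions, not part of any library.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, loading a .env file first if possible.

    `load_api_key` is a hypothetical helper name used for illustration.
    """
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package
        # Copies entries from a .env file (if one is found) into os.environ.
        # By default it does not override variables that are already set.
        load_dotenv()
    except ImportError:
        # Assumption: without python-dotenv, rely on variables exported in the shell.
        pass

    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; add it to .env or export it.")
    return key
```

The `.env` file itself would hold a single line such as `OPENAI_API_KEY=sk-...` and should be listed in `.gitignore` so the key never reaches version control.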
2. Preparing the Document – Chunking and Embedding
- Load a PDF or text file using LangChain’s document loaders (e.g., `PyPDFLoader`).
- Split the text into overlapping chunks with `RecursiveCharacterTextSplitter` (chunk size ~500 characters, overlap 50 characters).
- Generate embeddings with OpenAI’s `text-embedding-3-small` and store them in a Chroma vector database.
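The three steps above might be combined roughly as follows. This is a sketch, not a definitive implementation: the function name `build_vector_store` and the `chroma_db` directory are hypothetical, and the import paths assume a recent LangChain release with the split packages `langchain-community`, `langchain-text-splitters`, `langchain-openai`, and `langchain-chroma` installed (older releases expose the same classes under `langchain.*`).

```python
def build_vector_store(
    pdf_path: str,
    persist_dir: str = "chroma_db",  # hypothetical directory name
    chunk_size: int = 500,
    chunk_overlap: int = 50,
):
    """Load a PDF, split it into overlapping chunks, embed them, and store them in Chroma."""
    # Imports are local so the module can be imported before the packages are installed.
    # Assumption: package layout of recent LangChain versions.
    from langchain_chroma import Chroma
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # One Document per PDF page, each carrying source and page metadata.
    pages = PyPDFLoader(pdf_path).load()

    # Sizes are measured in characters by default.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(pages)

    # Requires OPENAI_API_KEY to be present in the environment.
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    return Chroma.from_documents(
        documents=chunks, embedding=embeddings, persist_directory=persist_dir
    )
```

Passing a `persist_directory` keeps the index on disk, so the document does not need to be re-embedded on every run.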
3. Building the Retrieval Chain with LangChain
- Create a retriever from the vector store that fetches the top‑k most relevant chunks (k=3–5).
- Define a prompt template that instructs the LLM to answer strictly from the retrieved context.
- Assemble the chain with `RetrievalQA` or a custom `LLMChain`, combining the retriever with GPT‑4‑turbo (`gpt-4-turbo`).
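One way to picture the chain is as a plain function that retrieves, fills the prompt, and calls the model. The sketch below takes the custom route rather than `RetrievalQA`; `PROMPT`, `answer_question`, and `build_components` are hypothetical names, and the `ChatOpenAI` import assumes the `langchain-openai` package.

```python
from typing import Any

# Hypothetical prompt wording; adjust to taste.
PROMPT = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def answer_question(question: str, retriever: Any, llm: Any) -> str:
    """Retrieve relevant chunks, build the prompt, and return the model's reply."""
    docs = retriever.invoke(question)  # list of Document-like objects
    context = "\n\n".join(doc.page_content for doc in docs)
    reply = llm.invoke(PROMPT.format(context=context, question=question))
    # Chat models return a message object with .content; plain LLMs return a string.
    return getattr(reply, "content", reply)


def build_components(vector_store: Any, k: int = 4):
    """Create a top-k retriever and a chat model from an existing vector store."""
    from langchain_openai import ChatOpenAI  # assumption: langchain-openai is installed

    retriever = vector_store.as_retriever(search_kwargs={"k": k})
    llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
    return retriever, llm
```

Because `answer_question` only relies on `invoke`, it can be exercised with stub objects before any API call is made, which keeps prompt changes cheap to test.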
4. Implementing a Simple CLI Interface
- Write a `while True` loop that accepts user queries from the command line.
- Pass each query through the retrieval chain and print the generated answer.
- Add a “quit” command to exit gracefully and handle empty inputs.
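The loop described above could look like the sketch below. `run_cli` is a hypothetical name, and `answer_fn` stands in for whatever function wraps the retrieval chain; the injectable `input_fn` and `output_fn` are an assumption added to make the loop testable without a terminal.

```python
from typing import Callable


def run_cli(
    answer_fn: Callable[[str], str],
    input_fn: Callable[[str], str] = input,
    output_fn: Callable[[str], None] = print,
) -> None:
    """Read questions until the user types 'quit', printing one answer per question."""
    while True:
        try:
            query = input_fn("Question (type 'quit' to exit): ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl-D or Ctrl-C ends the session instead of raising a traceback.
            output_fn("Goodbye.")
            break

        if not query:
            output_fn("Please enter a question.")
            continue
        if query.lower() == "quit":
            output_fn("Goodbye.")
            break

        output_fn(answer_fn(query))
```

In a real script the call would be something like `run_cli(my_answer_fn)`, where `my_answer_fn` takes the question string and returns the chain's answer.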
5. Enhancing Accuracy – Prompt Engineering & Source Citation
- Modify the prompt to force the model to say “I don’t know” when the answer is not in the context.
- Instruct the model to quote exact sentences and return the source page/chunk ID.
- Test with ambiguous queries and tweak chunk overlap or retrieval k‑value to improve recall.
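A stricter prompt with labelled chunks might be sketched as below. The wording of `STRICT_PROMPT` and the helper names are illustrative assumptions; the page handling assumes each chunk carries a zero-based `page` entry in its metadata, as `PyPDFLoader` typically provides.

```python
from typing import Any, Iterable

# Hypothetical prompt wording that covers both the refusal and the citation rule.
STRICT_PROMPT = (
    "You are answering questions about a document.\n"
    "Use ONLY the labelled context chunks below.\n"
    'If the answer is not in the context, reply exactly: "I don\'t know."\n'
    "Otherwise, quote the exact supporting sentence and cite its label, "
    "for example [chunk 2, page 7].\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def format_context(docs: Iterable[Any]) -> str:
    """Prefix each retrieved chunk with a label the model can cite."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page")
        # Assumption: page metadata is zero-based, so show it one-based.
        page_label = page + 1 if isinstance(page, int) else "unknown"
        blocks.append(f"[chunk {i}, page {page_label}]\n{doc.page_content}")
    return "\n\n".join(blocks)


def build_prompt(question: str, docs: Iterable[Any]) -> str:
    """Fill the strict template with the labelled context and the question."""
    return STRICT_PROMPT.format(context=format_context(docs), question=question)
```

Labelling chunks inside the prompt lets the model's citation be checked against the retrieved text, which is useful when tuning overlap or the retrieval k-value.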
6. Deploying as a Web App with Streamlit
- Build a minimal Streamlit UI with a text input and an “Ask” button.
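A minimal version of that UI might look like the sketch below. `render_app` is a hypothetical name, the page title and labels are placeholders, and `answer_fn` stands in for the function that calls the retrieval chain.

```python
from typing import Callable


def render_app(answer_fn: Callable[[str], str]) -> None:
    """Draw a text input and an "Ask" button, and show the answer when it is clicked."""
    import streamlit as st  # local import so the module loads without Streamlit

    st.title("Document Q&A")
    question = st.text_input("Ask a question about the document")

    # st.button returns True only on the rerun triggered by the click.
    if st.button("Ask"):
        if question.strip():
            st.write(answer_fn(question.strip()))
        else:
            st.warning("Please enter a question first.")
```

To try it, call `render_app(my_answer_fn)` at the bottom of a file such as `app.py` and start it with `streamlit run app.py`.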


