
Build a Local RAG Chatbot with Python: A Step-by-Step Tutorial

1. What is RAG and Why Build It Locally?

Understand the core concept: RAG (Retrieval-Augmented Generation) combines a document retrieval system with an LLM to generate accurate, context-aware answers grounded in your own data.
Learn the key benefits of local deployment: your data never leaves your machine (which simplifies privacy obligations such as GDPR), there are no recurring API costs, and the chatbot works offline.
Review the prerequisites: Intermediate Python knowledge, a machine with at least 8GB of RAM, and Python 3.10 or higher installed.

Create and activate a dedicated Python virtual environment to keep dependencies isolated (e.g., python -m venv rag_env, then source rag_env/bin/activate on macOS/Linux or rag_env\Scripts\activate on Windows).
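Install the core libraries: langchain, chromadb, sentence-transformers, and ollama (or llama-cpp-python if you prefer to use a GGUF model file directly). Note that the ollama package is only the Python client; the Ollama runtime itself is installed separately.
Get your local models: a lightweight embedding model (BAAI/bge-small-en, which Sentence Transformers downloads on first use) and a chat LLM pulled through Ollama (e.g., ollama pull llama3.2:3b or ollama pull mistral:7b).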
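Before moving on, it can help to confirm that the environment is ready. The following is a minimal sketch using only the standard library; the helper names missing_modules and python_ok are illustrative, and the module list assumes the import names match the packages listed above (sentence_transformers uses an underscore when imported).

```python
import importlib.util
import sys

# Import names for the libraries listed above. The pip package is
# "sentence-transformers", but the importable module uses an underscore.
REQUIRED_MODULES = ["langchain", "chromadb", "sentence_transformers", "ollama"]
MIN_PYTHON = (3, 10)


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version=sys.version_info, minimum=MIN_PYTHON):
    """True when the interpreter meets the tutorial's minimum version."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it inside the activated virtual environment; any name reported as missing still needs a pip install.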

Load documents from a local folder using LangChain's DirectoryLoader and TextLoader. TextLoader handles plain-text files such as .txt and .md; PDFs and other formats need an additional loader (for example, PyPDFLoader).
Implement a splitting strategy using RecursiveCharacterTextSplitter with a chunk size of 500 and an overlap of 50 characters to balance context and precision.
Generate embeddings for every chunk using your chosen embedding model and index them into a persistent ChromaDB vector store for fast semantic search.
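The three ingestion steps above can be sketched as one function. This is an illustration under stated assumptions, not the definitive setup: it assumes the integration packages langchain-community, langchain-text-splitters, langchain-huggingface, and langchain-chroma are installed in addition to the core libraries, and the folder names docs and chroma_db, the IngestConfig class, and the build_index function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestConfig:
    docs_dir: str = "docs"            # hypothetical folder containing .txt files
    persist_dir: str = "chroma_db"    # hypothetical on-disk location for the index
    embedding_model: str = "BAAI/bge-small-en"
    chunk_size: int = 500             # characters per chunk
    chunk_overlap: int = 50           # characters shared by neighbouring chunks


def build_index(config: IngestConfig = IngestConfig()):
    """Load, split, embed, and persist the documents; returns the vector store."""
    # Imported here so this module can be loaded before the heavy
    # dependencies are installed (assumed package layout, see lead-in).
    from langchain_chroma import Chroma
    from langchain_community.document_loaders import DirectoryLoader, TextLoader
    from langchain_huggingface import HuggingFaceEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Load every .txt file under docs_dir, including subfolders.
    loader = DirectoryLoader(config.docs_dir, glob="**/*.txt", loader_cls=TextLoader)
    documents = loader.load()
    if not documents:
        raise ValueError(f"No .txt files found under {config.docs_dir!r}")

    # 2. Split into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=config.chunk_size, chunk_overlap=config.chunk_overlap
    )
    chunks = splitter.split_documents(documents)

    # 3. Embed each chunk and write the index to disk.
    embeddings = HuggingFaceEmbeddings(model_name=config.embedding_model)
    return Chroma.from_documents(
        documents=chunks, embedding=embeddings, persist_directory=config.persist_dir
    )
```

Calling build_index() once is enough for a static document set; raising early on an empty folder avoids creating an index that can never return results.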

Design a custom prompt template that forces the LLM to answer strictly based on the retrieved context, with instructions to say “I don't know” if no relevant data is found.
Set up the retriever to perform a similarity search on the ChromaDB collection, fetching the top 3-4 most relevant document chunks for every user query.
Orchestrate the full RAG pipeline with a custom LCEL chain or LangChain's RetrievalQA chain (marked as deprecated in recent LangChain releases) to connect the retriever, prompt, and local LLM.
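To make the flow of retrieve, prompt, and generate concrete, here is a library-agnostic sketch. It is not the RetrievalQA or LCEL API: retrieve and generate are hypothetical callables standing in for the ChromaDB retriever and the local LLM, and the prompt wording is only one possible phrasing of the "answer strictly from context" rule.

```python
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, reply exactly: I don't know.

Context:
{context}

Question: {question}
Answer:"""

FALLBACK_ANSWER = "I don't know."


def build_prompt(chunks, question):
    """Fill the template with numbered context chunks and the user question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer_question(question, retrieve, generate, k=4):
    """Retrieve up to k chunks, build the grounded prompt, and call the LLM.

    retrieve(question, k) -> list[str] and generate(prompt) -> str are
    hypothetical callables standing in for the vector store and local model.
    """
    chunks = retrieve(question, k)
    if not chunks:
        # Nothing relevant was found, so skip the LLM call entirely.
        return FALLBACK_ANSWER
    return generate(build_prompt(chunks, question)).strip()
```

In a real setup, retrieve might wrap the vector store's similarity_search(question, k=k) and return each result's page_content, while generate might wrap a call to the Ollama client. Keeping both as plain callables makes the pipeline easy to test with fakes before any model is loaded.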

Build a simple Python script with a while True loop that prompts the user to type a question and exits gracefully when they type exit or quit.
Pass the user's raw input directly to the RAG chain and print the formatted answer to the console.
Add basic error handling for cases where the vector store is empty, the LLM fails to load, or the query is blank.
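The loop described above might look like the following sketch. The answer_fn parameter is a hypothetical callable (question in, answer string out) standing in for the RAG chain, and read and write default to input and print so they can be swapped for fakes in tests. One small departure from passing the raw input: the sketch strips surrounding whitespace so that blank queries are easy to detect.

```python
EXIT_COMMANDS = {"exit", "quit"}


def chat_loop(answer_fn, read=input, write=print):
    """Run a console chat until the user types exit or quit.

    answer_fn is a hypothetical callable: question string in, answer string out.
    """
    while True:
        try:
            question = read("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl-D or Ctrl-C ends the session cleanly.
            write("\nGoodbye!")
            break
        if question.lower() in EXIT_COMMANDS:
            write("Goodbye!")
            break
        if not question:
            write("Please type a question, or 'exit' to quit.")
            continue
        try:
            write(f"Bot: {answer_fn(question)}")
        except Exception as exc:
            # Keep the session alive if the retriever or the LLM fails.
            write(f"Sorry, something went wrong: {exc}")
```

Checks that only need to happen once, such as verifying that the vector store is not empty or that the model loads, fit better before the loop starts, so the user sees the problem immediately instead of after the first question.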