🚀 Setting Up Ollama & Running DeepSeek R1 Locally for a Powerful RAG System
🤖 Ollama
Ollama is a framework for running large language models (LLMs) locally on your machine. It lets you download, run, and interact with AI models without needing cloud-based APIs.
🔹 Example: ollama run deepseek-r1:1.5b – runs DeepSeek R1 locally.
🔹 Why use it? Free, private, fast, and works offline.
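🔹 Quick taste: once a model is pulled, you can also call it from Python. A minimal sketch, assuming you have installed the ollama Python client (pip install ollama) and already pulled deepseek-r1:1.5b:

import ollama

# Ask the locally running DeepSeek R1 model a question via the Ollama client.
# Assumes the Ollama server is running and deepseek-r1:1.5b has been pulled.
response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(response["message"]["content"])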
🔗 LangChain
LangChain is a Python/JS framework for building AI-powered applications by integrating LLMs with data sources, APIs, and memory.
🔹 Why use it? It helps connect LLMs to real-world applications like chatbots, document processing, and RAG.
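🔹 For example, here is a minimal sketch of pointing LangChain at a local Ollama model (assumes langchain-community is installed and Ollama is running with deepseek-r1:1.5b pulled):

from langchain_community.llms import Ollama

# LangChain wrapper around the local Ollama server; the model must already be pulled.
llm = Ollama(model="deepseek-r1:1.5b")
print(llm.invoke("What is retrieval-augmented generation?"))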
📄 RAG (Retrieval-Augmented Generation)
RAG is an AI technique that retrieves relevant external data (e.g., from PDFs or databases) and adds it to the LLM's prompt, so responses are grounded in that data.
🔹 Why use it? Improves accuracy and reduces hallucinations by referencing actual documents.
🔹 Example: AI-powered PDF Q&A system that fetches relevant document content before generating answers.
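🔹 Conceptually the pattern is "retrieve, then generate". A rough sketch (the retriever and llm objects below are placeholders for whatever vector store and model you wire in later, such as the FAISS retriever and Ollama LLM used in this guide):

# Conceptual sketch of RAG: retrieve relevant chunks, then answer with them as context.
# `retriever` and `llm` are placeholders (e.g., a FAISS retriever and an Ollama LLM).
def answer_with_rag(question, retriever, llm):
    docs = retriever.get_relevant_documents(question)            # retrieval
    context = "\n\n".join(doc.page_content for doc in docs)      # augmentation
    prompt = f"Use the context to answer.\nContext: {context}\nQuestion: {question}\nAnswer:"
    return llm.invoke(prompt)                                    # generation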
⚡ DeepSeek R1
DeepSeek R1 is an open-source AI model optimized for reasoning, problem-solving, and factual retrieval.
🔹 Why use it? Strong logical capabilities, great for RAG applications, and can be run locally with Ollama.
🚀 How Do They Work Together?
- Ollama runs DeepSeek R1 locally.
- LangChain connects the AI model to external data.
- RAG enhances responses by retrieving relevant information.
- DeepSeek R1 generates high-quality answers.
💡 Example Use Case: A Q&A system that allows users to upload a PDF and ask questions about it, powered by DeepSeek R1 + RAG + LangChain on Ollama! 🚀
🎯 Why Run DeepSeek R1 Locally?
| Benefit | Cloud-Based Models | Local DeepSeek R1 |
|---|---|---|
| Privacy | ❌ Data sent to external servers | ✅ 100% Local & Secure |
| Speed | ⏳ API latency & network delays | ⚡ Instant inference |
| Cost | 💰 Pay per API request | 🆓 Free after setup |
| Customization | ❌ Limited fine-tuning | ✅ Full model control |
| Deployment | 🌍 Cloud-dependent | 🔥 Works offline & on-premises |
🛠 Step 1: Installing Ollama
🔹 Download Ollama
Ollama is available for macOS, Linux, and Windows. Follow these steps to install it:
1️⃣ Go to the official Ollama download page
🔗 Download Ollama
2️⃣ Select your operating system (macOS, Linux, Windows)
3️⃣ Click on the Download button
4️⃣ Install it following the system-specific instructions
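5️⃣ Verify the install from a terminal (assuming the ollama binary is on your PATH):
ollama --version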
🛠 Step 2: Running DeepSeek R1 on Ollama
Once Ollama is installed, you can run DeepSeek R1 models.
🔹 Pull the DeepSeek R1 Model
To pull the DeepSeek R1 1.5B-parameter model, run:
ollama pull deepseek-r1:1.5b
This will download and set up the DeepSeek R1 model.
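🔹 You can confirm the model is available by listing the models installed locally:
ollama list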
🔹 Running DeepSeek R1
Once the model is downloaded, you can interact with it by running:
ollama run deepseek-r1:1.5b
This initializes the model and opens an interactive prompt where you can type queries.
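🔹 Ollama also serves a local HTTP API (by default on port 11434), which is what LangChain talks to under the hood. A quick sketch of querying it directly:
curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": false}'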
🛠 Step 3: Setting Up a RAG System Using Streamlit
Now that you have DeepSeek R1 running, let's integrate it into a retrieval-augmented generation (RAG) system using Streamlit.
🔹 Prerequisites
Before running the RAG system, make sure you have:
- Python installed
- A Conda environment (recommended for package management)
- Required Python packages
pip install -U langchain langchain-community langchain-experimental
pip install streamlit
pip install pdfplumber
pip install sentence-transformers
pip install faiss-cpu
pip install ollama
For detailed setup, follow this guide:
🔗 Setting Up a Conda Environment for Python Projects
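🔹 Optional sanity check: if the environment is set up correctly, these imports should all succeed (a quick throwaway script, nothing more):

# Quick check that the RAG dependencies are importable.
import streamlit
import pdfplumber
import faiss                      # provided by faiss-cpu
import langchain
import langchain_community
import langchain_experimental
import sentence_transformers
print("All RAG dependencies are importable.")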
🛠 Step 4: Running the RAG System
🔹 Clone or Create the Project
1️⃣ Create a new project directory
mkdir rag-system && cd rag-system
2️⃣ Create a Python script (app.py)
Paste the following Streamlit-based script:
import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import RetrievalQA

# Streamlit UI
st.title("📄 RAG System with DeepSeek R1 & Ollama")

uploaded_file = st.file_uploader("Upload your PDF file here", type="pdf")

if uploaded_file:
    # Save the uploaded PDF to disk so PDFPlumberLoader can read it
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())

    # Load and parse the PDF
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()

    # Split the document into semantically coherent chunks
    embedder = HuggingFaceEmbeddings()
    text_splitter = SemanticChunker(embedder)
    documents = text_splitter.split_documents(docs)

    # Embed the chunks and index them in a FAISS vector store
    vector = FAISS.from_documents(documents, embedder)
    retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})

    # DeepSeek R1 served locally by Ollama
    llm = Ollama(model="deepseek-r1:1.5b")

    prompt = """
    Use the following context to answer the question.
    Context: {context}
    Question: {question}
    Answer:"""
    QA_PROMPT = PromptTemplate.from_template(prompt)

    # Stuff the retrieved chunks into the prompt, then ask the LLM
    llm_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
    combine_documents_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="context")
    qa = RetrievalQA(combine_documents_chain=combine_documents_chain, retriever=retriever)

    user_input = st.text_input("Ask a question about your document:")

    if user_input:
        response = qa(user_input)["result"]
        st.write("**Response:**")
        st.write(response)
🛠 Step 5: Running the App
Once the script is ready, start your Streamlit app:
streamlit run app.py
🔗 Check the GitHub repo for the complete code
🔗 Learn the basics here
🎯 Final Thoughts
✅ You have successfully set up Ollama and DeepSeek R1!
✅ You can now build AI-powered RAG applications with local LLMs!
✅ Try uploading PDFs and asking questions dynamically.
💡 Want to learn more? Follow my Dev.to blog for more development tutorials! 🚀