Paperless-NGX + AI: The Ultimate Guide for Debian
Self-Hosted, Private, and Cloud-Free
Paperless-NGX is the gold standard for the paperless office. However, the standard installation has a weakness: the built-in OCR (text recognition) is basic, and metadata often has to be maintained by hand. In this guide, we recreate the "Super-Stack" from TechnoTim's video: we combine Paperless-NGX with a local AI (Ollama) that understands documents, names them automatically, and can even analyze handwriting or complex layouts via vision models.
The best part: No data leaves your server. Everything runs locally.
📋 Prerequisites
Before we start, make sure your hardware is ready. AI models require resources.
OS: Debian 11/12 or Ubuntu 22.04/24.04 LTS.
CPU: Modern Multi-Core CPU (Intel N100 or better recommended).
RAM: At least 16 GB RAM recommended (for the AI stack). 8 GB is the absolute minimum and may lead to crashes.
GPU (Optional but recommended): NVIDIA GPU for fast AI responses. Without a GPU, everything runs on the CPU (slower, but functional).
Software: Root access or Sudo privileges.
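Before installing anything, you can sanity-check the host against these requirements. This is just a sketch using standard Linux tools (`nproc` and `/proc/meminfo`); the thresholds mirror the recommendations above:

```shell
#!/bin/sh
# Report CPU cores and total RAM so you know what the AI stack has to work with.
cores=$(nproc)
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_mb=$((mem_kb / 1024))
echo "CPU cores: $cores"
echo "RAM: $((mem_mb / 1024)) GB (approx.)"
if [ "$mem_mb" -lt 7500 ]; then
  echo "WARNING: below the 8 GB minimum - expect crashes under AI load."
elif [ "$mem_mb" -lt 15500 ]; then
  echo "NOTE: 16 GB is recommended for the full AI stack."
fi
```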
🚀 Step 1: Prepare System & Install Docker
We will update the system and install the Docker Engine.
Bash
# Update system
sudo apt update && sudo apt upgrade -y

# Install necessary packages
sudo apt install -y curl git apt-transport-https ca-certificates gnupg lsb-release

# Add Docker Repository (Official method)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Test installation
sudo docker compose version
📂 Step 2: Create Folder Structure
We create a clean structure so that data (PDFs, databases) resides safely outside the containers.
Bash
# Create main directory
mkdir -p /opt/paperless-ai-stack
cd /opt/paperless-ai-stack

# Subfolders for data persistence
mkdir -p paperless/data paperless/media paperless/consume paperless/export
mkdir -p postgres/data
mkdir -p ollama/data
mkdir -p open-webui/data
mkdir -p paperless-ai/data
mkdir -p paperless-gpt/config
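One detail worth checking: the compose file in the next step maps the Paperless container to UID/GID 1000 (`USERMAP_UID`/`USERMAP_GID`). If the folders end up owned by a different user, the container cannot write to them. A small check (the `chown` shown in the comment is only needed if the owner differs):

```shell
# Show the numeric owner of the Paperless data folder; the webserver
# container runs as UID/GID 1000 (USERMAP_UID/USERMAP_GID in the compose file).
owner=$(stat -c '%u:%g' /opt/paperless-ai-stack/paperless 2>/dev/null || echo "missing")
echo "paperless/ owner: $owner"
# If the owner is not 1000:1000, fix it on the server:
# sudo chown -R 1000:1000 /opt/paperless-ai-stack/paperless
```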
🛠️ Step 3: The Docker-Compose Stack
This is the core. We combine Paperless-NGX, Tika/Gotenberg (for Office files), Ollama (AI Engine), and the AI bridges paperless-ai and paperless-gpt.
Create the file docker-compose.yml:
Bash
nano docker-compose.yml
Paste the following content (Copy & Paste):
YAML
services:
  # --- CORE: Paperless-NGX ---
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - ./postgres/data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: ${PAPERLESS_DB_NAME}
      POSTGRES_USER: ${PAPERLESS_DB_USER}
      POSTGRES_PASSWORD: ${PAPERLESS_DB_PASS}

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - ./paperless/data:/usr/src/paperless/data
      - ./paperless/media:/usr/src/paperless/media
      - ./paperless/export:/usr/src/paperless/export
      - ./paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPASS: ${PAPERLESS_DB_PASS}
      PAPERLESS_DBUSER: ${PAPERLESS_DB_USER}
      PAPERLESS_DBNAME: ${PAPERLESS_DB_NAME}
      PAPERLESS_SECRET_KEY: ${PAPERLESS_SECRET_KEY}
      PAPERLESS_URL: ${PAPERLESS_URL}
      PAPERLESS_TIME_ZONE: "Europe/Berlin"
      PAPERLESS_OCR_LANGUAGE: "deu+eng"
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      USERMAP_UID: 1000
      USERMAP_GID: 1000

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-routes=true"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

  # --- AI BRAIN: Ollama & WebUI ---
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ./ollama/data:/root/.ollama
    # If you have an NVIDIA GPU, uncomment the following block:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    restart: unless-stopped
    ports:
      - "3001:8080"
    volumes:
      - ./open-webui/data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434

  # --- AI INTEGRATION 1: Paperless-AI (Metadata & Chat) ---
  paperless-ai:
    image: ghcr.io/clusterzx/paperless-ai:latest
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - ./paperless-ai/data:/app/data
    environment:
      - PAPERLESS_API_URL=http://webserver:8000/api
      - PAPERLESS_API_TOKEN=${PAPERLESS_API_TOKEN}
      - AI_PROVIDER=ollama
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODEL=llama3.2:3b

  # --- AI INTEGRATION 2: Paperless-GPT (Vision OCR) ---
  paperless-gpt:
    image: ghcr.io/icereed/paperless-gpt:latest
    restart: unless-stopped
    ports:
      - "3002:8080"
    environment:
      - PAPERLESS_BASE_URL=http://webserver:8000
      - PAPERLESS_API_TOKEN=${PAPERLESS_API_TOKEN}
      - OLLAMA_HOST=http://ollama:11434
      # The Vision Model for better OCR
      - OCR_MODEL_NAME=minicpm-v:8b

volumes:
  redis_data:
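Before the first start, it is worth confirming that every bind-mount path from Step 2 actually exists; if a host directory is missing, Docker creates it owned by root, which can cause permission errors later. A small check (assumes the stack lives in `/opt/paperless-ai-stack`):

```shell
#!/bin/sh
# Check that all host directories referenced by the compose file exist.
base=/opt/paperless-ai-stack
missing=0
for d in paperless/data paperless/media paperless/consume paperless/export \
         postgres/data ollama/data open-webui/data paperless-ai/data \
         paperless-gpt/config; do
  if [ -d "$base/$d" ]; then
    echo "ok      $d"
  else
    echo "MISSING $d"
    missing=$((missing + 1))
  fi
done
echo "$missing directories missing"
```

You can also run `sudo docker compose config` in the stack folder to have Compose validate the YAML itself before starting anything.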
🔐 Step 4: Environment Variables (.env)
Create the .env file in the same folder. This is where passwords and tokens are defined.
Bash
nano .env
Insert the following (Modify the values!):
Ini
# Database & Paperless Configuration
PAPERLESS_DB_USER=paperless
PAPERLESS_DB_NAME=paperless
PAPERLESS_DB_PASS=YourSecureDBPassword123!

# Generate a long random string for this (e.g., with 'openssl rand -hex 32')
PAPERLESS_SECRET_KEY=a1b2c3d4e5f6...

# The URL under which Paperless is accessible (Important for CORS)
PAPERLESS_URL=http://YOUR-SERVER-IP:8000

# API Token (Will be generated in Step 6 and added here later!)
# Leave this empty or set a placeholder for now
PAPERLESS_API_TOKEN=placeholder
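The secret key and database password should be long and random. Here is one way to generate candidate values to paste into `.env`; since the file holds credentials, tightening its permissions afterwards is also sensible:

```shell
# Generate candidate values for .env
key=$(openssl rand -hex 32)       # -> PAPERLESS_SECRET_KEY
dbpass=$(openssl rand -base64 24) # -> PAPERLESS_DB_PASS
echo "PAPERLESS_SECRET_KEY=$key"
echo "PAPERLESS_DB_PASS=$dbpass"
# .env holds credentials - make it readable by its owner only
if [ -f .env ]; then chmod 600 .env; fi
```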
▶️ Step 5: First Start & Create User
We start the Core Stack to create the Admin User.
Bash
# Start containers
sudo docker compose up -d

# Wait until everything is running, then create the admin user for Paperless
sudo docker compose run --rm webserver createsuperuser
Follow the instructions (Username, Email, Password).
Now log in at: http://YOUR-SERVER-IP:8000
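The first start takes a moment (database migrations, container setup), so the web UI may not answer immediately. A small wait loop you can run instead of refreshing the browser; `YOUR-SERVER-IP` is a placeholder and the timings are arbitrary:

```shell
#!/bin/sh
# Poll the Paperless web UI until it answers (max ~30 s in this sketch).
URL="http://YOUR-SERVER-IP:8000"
up=0
for i in $(seq 1 15); do
  if curl -fsS --max-time 2 -o /dev/null "$URL"; then
    up=1
    break
  fi
  sleep 2
done
if [ "$up" -eq 1 ]; then
  echo "Paperless is up: $URL"
else
  echo "Still not reachable - check: sudo docker compose logs webserver"
fi
```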
🧠 Step 6: Load AI Models & Generate Token
For the AI to work, we need to load the models into Ollama and connect Paperless with the AI containers.
1. Load Models:
Open the Open-WebUI (http://YOUR-SERVER-IP:3001) or use the CLI. We will use the CLI as it is faster:
Bash
# Small, fast model for Text/Metadata
sudo docker compose exec ollama ollama pull llama3.2:3b

# Vision model for superior OCR (reading images)
sudo docker compose exec ollama ollama pull minicpm-v:8b
2. Generate API Token:
Go to Paperless (Port 8000) -> Settings (Profile top right).
Tab API Tokens.
Create a new token. Copy it.
3. Update .env:
Open your .env file again:
Bash
nano .env
Paste the copied token into PAPERLESS_API_TOKEN.
4. Restart:
Bash
sudo docker compose down && sudo docker compose up -d
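After the restart you can verify the wiring. On the server, `sudo docker compose exec ollama ollama list` should show both models; the snippet below checks that Paperless accepts the API token over HTTP (host and token are placeholders you must replace):

```shell
# Check that Paperless accepts the API token (placeholders: host + token).
TOKEN="paste-your-token-here"
code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
  -H "Authorization: Token $TOKEN" \
  "http://YOUR-SERVER-IP:8000/api/documents/")
if [ "$code" = "200" ]; then
  echo "token OK"
else
  echo "got HTTP $code - check URL and token"
fi
```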
⚙️ Step 7: Configuring the AI Tools
A. Paperless-AI (Metadata & Chat)
Accessible at: http://YOUR-SERVER-IP:3000
Log in (initial setup wizard).
Connect it to Paperless (Internal URL: http://webserver:8000, Token from Step 6).
Select "Ollama" as AI Provider and model llama3.2:3b.
Feature: You can configure rules so that documents automatically receive tags ("Invoice", "Insurance"), or you can "chat" with your documents ("How much did I pay for electricity in 2024?").
B. Paperless-GPT (Vision OCR)
Accessible at: http://YOUR-SERVER-IP:3002
Paperless-GPT is particularly powerful for reading poor scans.
It uses the model minicpm-v:8b.
Setup Workflow: Create a workflow in Paperless-NGX that automatically assigns the tag paperless-gpt-auto upon upload.
Paperless-GPT monitors this tag. As soon as a document has the tag, the AI reads the document visually (instead of just the text layer) and writes the content back into Paperless. The result is often drastically better than standard OCR.
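You can watch this loop from the outside: once the workflow applies the tag, the document shows up in a tag-filtered API query until paperless-gpt has processed it. A sketch using Paperless's search syntax (host and token are placeholders):

```shell
# List documents currently carrying the trigger tag (placeholders below).
TOKEN="paste-your-token-here"
curl -fsS --max-time 5 -H "Authorization: Token $TOKEN" \
  "http://YOUR-SERVER-IP:8000/api/documents/?query=tag:paperless-gpt-auto" \
  || echo "request failed - check URL and token"
```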
📝 Summary & Conclusion
What we have done:
We set up a Debian-based Docker Host.
We installed Paperless-NGX (Document Management).
We integrated Ollama (Local AI) to ensure data privacy.
We configured Paperless-AI for intelligent metadata (Tags, Correspondents).
We set up Paperless-GPT for Vision-based OCR to make even the most difficult documents readable.
Backup Note:
Regularly back up the /opt/paperless-ai-stack folder. It contains your database, your PDFs, and your configurations.
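A sketch of how such a backup could be automated. The `/backup` target path is an assumption you should adjust; the script combines Paperless's built-in `document_exporter` (a portable export of documents plus metadata) with a plain archive of the stack folder. Save the generated script somewhere like `/usr/local/bin` and call it from cron:

```shell
# Write a small backup script (adjust paths to your environment).
cat > paperless-backup.sh <<'EOF'
#!/bin/sh
set -e
cd /opt/paperless-ai-stack
# Portable export (documents + metadata) via Paperless's built-in exporter
docker compose exec -T webserver document_exporter ../export
# Full archive of the stack: database, media, models, config
tar -czf "/backup/paperless-stack-$(date +%F).tar.gz" /opt/paperless-ai-stack
EOF
chmod +x paperless-backup.sh
echo "wrote $(pwd)/paperless-backup.sh"
```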
Good luck with your truly smart, paperless office!