
Paperless AI: The Ultimate Guide for Debian

Self-Hosted, Private, and Cloud-Free

January 29, 2026 by Mário Santiago

Paperless-NGX is the gold standard for the paperless office. However, the standard installation has a weakness: its OCR (text recognition) is basic, and metadata often has to be maintained by hand. In this guide, we recreate the "Super-Stack" from TechnoTim's video: we combine Paperless-NGX with local AI (Ollama) to understand documents intelligently, name them automatically, and even analyze handwriting or complex layouts via vision models.

The best part: No data leaves your server. Everything runs locally.

📋 Prerequisites

Before we start, make sure your hardware is ready. AI models require resources.

  • OS: Debian 11/12 or Ubuntu 22.04/24.04 LTS.

  • CPU: Modern Multi-Core CPU (Intel N100 or better recommended).

  • RAM: At least 16 GB RAM recommended (for the AI stack). 8 GB is the absolute minimum and may lead to crashes.

  • GPU (Optional but recommended): NVIDIA GPU for fast AI responses. Without a GPU, everything runs on the CPU (slower, but functional).

  • Software: Root access or Sudo privileges.
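You can quickly check whether your machine meets these numbers before proceeding. A small sketch, assuming a standard Linux system with GNU coreutils and `/proc/meminfo`:

```shell
# Print CPU core count and total memory in GiB
echo "CPU cores: $(nproc)"
awk '/^MemTotal:/ {printf "RAM: %.1f GiB\n", $2 / 1024 / 1024}' /proc/meminfo
```

If the RAM line shows less than 16 GiB, expect the AI containers to swap heavily under load.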

🚀 Step 1: Prepare System & Install Docker

We will update the system and install the Docker Engine.

Bash

# Update system
sudo apt update && sudo apt upgrade -y

# Install necessary packages
sudo apt install -y curl git apt-transport-https ca-certificates gnupg lsb-release

# Add Docker Repository (Official method)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Test installation
sudo docker compose version

📂 Step 2: Create Folder Structure

We create a clean structure so that data (PDFs, databases) resides safely outside the containers.

Bash

# Create main directory
mkdir -p /opt/paperless-ai-stack
cd /opt/paperless-ai-stack

# Subfolders for data persistence
mkdir -p paperless/data paperless/media paperless/consume paperless/export
mkdir -p postgres/data
mkdir -p ollama/data
mkdir -p open-webui/data
mkdir -p paperless-ai/data
mkdir -p paperless-gpt/config
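To double-check the layout before wiring it into Docker, a small loop can verify that every directory exists. The sketch below recreates the structure in a scratch directory so it is safe to run anywhere; in practice, run only the `for` loop from inside /opt/paperless-ai-stack:

```shell
# Recreate the structure in a scratch directory and verify each path exists
base="$(mktemp -d)"
cd "$base"
mkdir -p paperless/data paperless/media paperless/consume paperless/export \
         postgres/data ollama/data open-webui/data paperless-ai/data paperless-gpt/config
for d in paperless/data paperless/media paperless/consume paperless/export \
         postgres/data ollama/data open-webui/data paperless-ai/data paperless-gpt/config; do
  [ -d "$d" ] || { echo "MISSING: $d"; exit 1; }
done
echo "all directories present"
```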

🛠️ Step 3: The Docker-Compose Stack

This is the core. We combine Paperless-NGX, Tika/Gotenberg (for Office files), Ollama (AI Engine), and the AI bridges paperless-ai and paperless-gpt.

Create the file docker-compose.yml:

Bash

nano docker-compose.yml

Paste the following content (Copy & Paste):

YAML

services:
  # --- CORE: Paperless-NGX ---
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redis_data:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - ./postgres/data:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: ${PAPERLESS_DB_NAME}
      POSTGRES_USER: ${PAPERLESS_DB_USER}
      POSTGRES_PASSWORD: ${PAPERLESS_DB_PASS}

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - ./paperless/data:/usr/src/paperless/data
      - ./paperless/media:/usr/src/paperless/media
      - ./paperless/export:/usr/src/paperless/export
      - ./paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBPASS: ${PAPERLESS_DB_PASS}
      PAPERLESS_DBUSER: ${PAPERLESS_DB_USER}
      PAPERLESS_DBNAME: ${PAPERLESS_DB_NAME}
      PAPERLESS_SECRET_KEY: ${PAPERLESS_SECRET_KEY}
      PAPERLESS_URL: ${PAPERLESS_URL}
      PAPERLESS_TIME_ZONE: "Europe/Berlin"
      PAPERLESS_OCR_LANGUAGE: "deu+eng"
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      USERMAP_UID: 1000
      USERMAP_GID: 1000

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    # The chromium route converts .eml files; disable JavaScript and restrict
    # file access, as in the official Paperless-NGX sample compose file
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

  # --- AI BRAIN: Ollama & WebUI ---
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ./ollama/data:/root/.ollama
    # If you have an NVIDIA GPU, uncomment the following block:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    restart: unless-stopped
    ports:
      - "3001:8080"
    volumes:
      - ./open-webui/data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434

  # --- AI INTEGRATION 1: Paperless-AI (Metadata & Chat) ---
  paperless-ai:
    image: ghcr.io/clusterzx/paperless-ai:latest
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - ./paperless-ai/data:/app/data
    environment:
      - PAPERLESS_API_URL=http://webserver:8000/api
      - PAPERLESS_API_TOKEN=${PAPERLESS_API_TOKEN}
      - AI_PROVIDER=ollama
      - OLLAMA_API_URL=http://ollama:11434
      - OLLAMA_MODEL=llama3.2:3b

  # --- AI INTEGRATION 2: Paperless-GPT (Vision OCR) ---
  paperless-gpt:
    image: ghcr.io/icereed/paperless-gpt:latest
    restart: unless-stopped
    ports:
      - "3002:8080"
    environment:
      - PAPERLESS_BASE_URL=http://webserver:8000
      - PAPERLESS_API_TOKEN=${PAPERLESS_API_TOKEN}
      - LLM_PROVIDER=ollama
      - LLM_MODEL=llama3.2:3b
      - OLLAMA_HOST=http://ollama:11434
      # The Vision Model for better OCR
      - VISION_LLM_PROVIDER=ollama
      - VISION_LLM_MODEL=minicpm-v:8b

volumes:
  redis_data:

🔐 Step 4: Environment Variables (.env)

Create the .env file in the same folder. This is where passwords and tokens are defined.

Bash

nano .env

Insert the following (Modify the values!):

Ini

# Database & Paperless Configuration
PAPERLESS_DB_USER=paperless
PAPERLESS_DB_NAME=paperless
PAPERLESS_DB_PASS=YourSecureDBPassword123!
# Generate a long random string for this (e.g., with 'openssl rand -hex 32')
PAPERLESS_SECRET_KEY=a1b2c3d4e5f6...

# The URL under which Paperless is accessible (Important for CORS)
PAPERLESS_URL=http://YOUR-SERVER-IP:8000

# API Token (Will be generated in Step 6 and added here later!)
# Leave this empty or set a placeholder for now
PAPERLESS_API_TOKEN=placeholder
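You can generate the secret key and patch it into the file in one go. A sketch assuming openssl and GNU sed, demonstrated on a scratch copy of .env; run the `sed` line from /opt/paperless-ai-stack for the real file:

```shell
# Demonstrated in a scratch directory - for real use, cd /opt/paperless-ai-stack instead
cd "$(mktemp -d)"
printf 'PAPERLESS_SECRET_KEY=a1b2c3d4e5f6...\n' > .env

# openssl rand -hex 32 yields a 64-character hex string
SECRET="$(openssl rand -hex 32)"
sed -i "s|^PAPERLESS_SECRET_KEY=.*|PAPERLESS_SECRET_KEY=${SECRET}|" .env
grep '^PAPERLESS_SECRET_KEY=' .env
```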

▶️ Step 5: First Start & Create User

We start the Core Stack to create the Admin User.

Bash

# Start containers
sudo docker compose up -d

# Wait until everything is running, then create Admin User for Paperless
sudo docker compose run --rm webserver createsuperuser

Follow the instructions (Username, Email, Password).

Now log in at: http://YOUR-SERVER-IP:8000

🧠 Step 6: Load AI Models & Generate Token

For the AI to work, we need to load the models into Ollama and connect Paperless with the AI containers.

1. Load Models:

Open the Open-WebUI (http://YOUR-SERVER-IP:3001) or use the CLI. We will use the CLI as it is faster:

Bash

# Small, fast model for Text/Metadata
sudo docker compose exec ollama ollama pull llama3.2:3b

# Vision model for superior OCR (reading images)
sudo docker compose exec ollama ollama pull minicpm-v:8b

2. Generate API Token:

  1. In Paperless (port 8000), open the user menu (top right) -> My Profile.

  2. Find the API Auth Token section.

  3. Generate a new token and copy it.

3. Update .env:

Open your .env file again:

Bash

nano .env

Paste the copied token into PAPERLESS_API_TOKEN.

4. Restart:

Bash

sudo docker compose down && sudo docker compose up -d

⚙️ Step 7: Configuring the AI Tools

A. Paperless-AI (Metadata & Chat)

Accessible at: http://YOUR-SERVER-IP:3000

  1. Log in (initial setup wizard).

  2. Connect it to Paperless (Internal URL: http://webserver:8000, Token from Step 6).

  3. Select "Ollama" as AI Provider and model llama3.2:3b.

  4. Feature: You can have documents automatically tagged ("Invoice", "Insurance"), or "chat" with your documents ("How much did I pay for electricity in 2024?").

B. Paperless-GPT (Vision OCR)

Accessible at: http://YOUR-SERVER-IP:3002

Paperless-GPT is particularly powerful for reading poor scans.

  1. It uses the model minicpm-v:8b.

  2. Setup Workflow: Create a workflow in Paperless-NGX that automatically assigns the tag paperless-gpt-auto upon upload.

  3. Paperless-GPT monitors this tag. As soon as a document has the tag, the AI reads the document visually (instead of just the text layer) and writes the content back into Paperless. The result is often drastically better than standard OCR.
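If you prefer the shell over the web UI, the tag can also be created via the Paperless-NGX REST API (`/api/tags/`). In the sketch below, the host and token are placeholders you must replace; the actual POST is commented out so the snippet is safe to dry-run:

```shell
# Sketch: create the "paperless-gpt-auto" tag via the Paperless-NGX REST API.
# PAPERLESS_HOST and PAPERLESS_API_TOKEN are placeholders - fill in your own.
PAPERLESS_HOST="http://YOUR-SERVER-IP:8000"
PAPERLESS_API_TOKEN="your-token-here"
PAYLOAD='{"name": "paperless-gpt-auto"}'

# Validate the JSON payload locally before sending anything
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# Uncomment to actually create the tag on your server:
# curl -s -X POST "$PAPERLESS_HOST/api/tags/" \
#   -H "Authorization: Token $PAPERLESS_API_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$PAYLOAD"
```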

📝 Summary & Conclusion

What we have done:

  1. We set up a Debian-based Docker Host.

  2. We installed Paperless-NGX (Document Management).

  3. We integrated Ollama (Local AI) to ensure data privacy.

  4. We configured Paperless-AI for intelligent metadata (Tags, Correspondents).

  5. We set up Paperless-GPT for Vision-based OCR to make even the most difficult documents readable.

Backup Note:

Regularly back up the /opt/paperless-ai-stack folder. It contains your database, your PDFs, and your configurations.
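A minimal backup could look like the sketch below (assuming tar and enough free disk space). Stop the stack first with `sudo docker compose down` so the Postgres data directory is in a consistent state. The demo runs against a scratch copy; for real use, point PARENT at /opt:

```shell
# Demo on a scratch copy - for real use, set PARENT=/opt (and drop the mkdir/echo lines)
PARENT="$(mktemp -d)"
NAME="paperless-ai-stack"
mkdir -p "$PARENT/$NAME/paperless/media"
echo "demo" > "$PARENT/$NAME/paperless/media/example.pdf"

# Create a dated, compressed archive of the whole stack directory
STAMP="$(date +%Y-%m-%d)"
tar -czf "/tmp/paperless-backup-${STAMP}.tar.gz" -C "$PARENT" "$NAME"

# Verify the archive contains the expected files
tar -tzf "/tmp/paperless-backup-${STAMP}.tar.gz" | grep example.pdf
```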

Good luck with your truly smart, paperless office!
