K3S Ollama Setup Guide

Overview

This guide explains how to set up a generic Ollama service on K3S that can handle multiple models, suitable for translation and other AI tasks.

Current Issues

  1. Model-specific deployments: the current setup runs one deployment per model (e.g., ollama-model-phi)
  2. Network issues: images cannot be pulled from Docker Hub (DNS/network connectivity problems)
  3. Image pull failures: all pods are stuck in ImagePullBackOff

Architecture

A single generic Ollama service that:

  • Runs one Ollama server instance
  • Stores all models in persistent storage
  • Exposes a single service endpoint
  • Can load any model on-demand via API (see the example after this list)
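
Because the model is chosen per request, a single deployment can serve every model you have pulled. A minimal sketch, assuming a model named qwen2.5:7b has already been pulled and using the in-cluster service endpoint created later in this guide:

# The "model" field selects (and, on first use, loads) the model for this request
curl -s http://ollama.ollama.svc.cluster.local:11434/api/generate \
  -d '{"model": "qwen2.5:7b", "prompt": "Translate to English: Сәлам!", "stream": false}'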

Components

  1. StatefulSet: Ollama server with persistent storage for models
  2. Service: ClusterIP service exposing Ollama API
  3. PersistentVolumeClaim: Storage for models (200GB+ recommended)
  4. ConfigMap: Optional configuration

Step 1: Fix Network/DNS Issues

Check DNS Resolution

ssh root@10.10.10.10
# Test DNS
nslookup registry-1.docker.io
ping -c 2 registry-1.docker.io

# If DNS fails, check resolv.conf
cat /etc/resolv.conf

Fix DNS (if needed)

# Add reliable DNS servers
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
echo "nameserver 1.1.1.1" >> /etc/resolv.conf

# Or configure systemd-resolved
resolvectl status
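
Note that edits to /etc/resolv.conf may be overwritten on hosts managed by systemd-resolved or NetworkManager. A sketch for persisting the change with a systemd-resolved drop-in (the DNS servers are examples); CoreDNS forwards cluster DNS to the node's resolver, so it may also need a restart afterwards:

# Persist upstream DNS servers via a systemd-resolved drop-in
mkdir -p /etc/systemd/resolved.conf.d
printf '[Resolve]\nDNS=8.8.8.8 1.1.1.1\n' > /etc/systemd/resolved.conf.d/dns.conf
systemctl restart systemd-resolved

# Let in-cluster DNS (CoreDNS) pick up the change
kubectl -n kube-system rollout restart deployment coredns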

Test Docker Hub Access

curl -I https://registry-1.docker.io/v2/
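# An HTTP 401 Unauthorized response here is expected and simply confirms
# the registry is reachable; DNS or timeout errors point back to the fixes above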

Step 2: Create Generic Ollama Deployment

Namespace

apiVersion: v1
kind: Namespace
metadata:
  name: ollama

PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models-pvc
  namespace: ollama
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 200Gi  # Adjust based on models you want

Generic Ollama Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 11434
          name: http
          protocol: TCP
        env:
        - name: OLLAMA_HOST
          value: "0.0.0.0:11434"
        - name: OLLAMA_KEEP_ALIVE
          value: "24h"
        volumeMounts:
        - name: models-storage
          mountPath: /root/.ollama
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "8Gi"
            cpu: "4"
            # Uncomment if you have GPU
            # nvidia.com/gpu: "1"
        livenessProbe:
          httpGet:
            path: /api/tags
            port: 11434
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /api/tags
            port: 11434
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
      volumes:
      - name: models-storage
        persistentVolumeClaim:
          claimName: ollama-models-pvc
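
The commented nvidia.com/gpu limit only takes effect if the node actually advertises GPUs, which on K3S generally requires the NVIDIA drivers, the NVIDIA container runtime, and the NVIDIA device plugin to be installed first. A quick check:

# A GPU-enabled node should list nvidia.com/gpu under Capacity/Allocatable
kubectl describe nodes | grep -i 'nvidia.com/gpu'

Depending on how the NVIDIA runtime is configured on the node, the pod spec may also need runtimeClassName: nvidia.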

Service

apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  type: ClusterIP
  ports:
  - port: 11434
    targetPort: 11434
    protocol: TCP
    name: http
  selector:
    app: ollama

Optional: NodePort Service (for external access)

apiVersion: v1
kind: Service
metadata:
  name: ollama-nodeport
  namespace: ollama
spec:
  type: NodePort
  ports:
  - port: 11434
    targetPort: 11434
    nodePort: 31134  # Accessible on <node-ip>:31134
    protocol: TCP
  selector:
    app: ollama

Step 3: Deploy and Verify

Apply Manifests

kubectl apply -f ollama-namespace.yaml
kubectl apply -f ollama-pvc.yaml
kubectl apply -f ollama-deployment.yaml
kubectl apply -f ollama-service.yaml

Check Status

kubectl get pods -n ollama
kubectl get svc -n ollama
kubectl logs -n ollama deployment/ollama
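
To avoid testing against a pod that is still starting, you can wait for the rollout to finish first:

# Block until the deployment is available (useful in scripts)
kubectl rollout status deployment/ollama -n ollama --timeout=300s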

Test Service

# From within cluster
kubectl run -it --rm curl-test --image=curlimages/curl:latest --restart=Never -- \
  curl http://ollama.ollama.svc.cluster.local:11434/api/tags

# From node
curl http://10.43.x.x:11434/api/tags  # Use ClusterIP from service
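# A fresh install should return an empty model list, e.g. {"models":[]}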

Step 4: Download Models

Recommended Models

  1. qwen2.5:7b - Good for multilingual translation (Russian, English, Tatar)
  2. llama3.1:8b - General purpose, good translation
  3. mistral:7b - Fast and efficient
  4. phi3:3.8b - Lightweight option

Pull Models via API

# Get service ClusterIP
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')

# Pull models
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "qwen2.5:7b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "llama3.1:8b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "mistral:7b"}'
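
The pull endpoint streams JSON progress objects by default; for a single consolidated response (easier to script against), streaming can be disabled. A sketch, reusing OLLAMA_IP from above:

# Disable streaming to get one final JSON response when the pull completes
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "phi3:3.8b", "stream": false}'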

Or Use kubectl exec

kubectl exec -it -n ollama deployment/ollama -- ollama pull qwen2.5:7b
kubectl exec -it -n ollama deployment/ollama -- ollama pull llama3.1:8b
kubectl exec -it -n ollama deployment/ollama -- ollama pull mistral:7b

List Downloaded Models

kubectl exec -it -n ollama deployment/ollama -- ollama list

Step 5: Configure CLI to Use K3S Ollama

Get Service Endpoint

# Option 1: Port-forward the service to localhost
kubectl port-forward -n ollama svc/ollama 11434:11434

# Option 2: Use NodePort (if NodePort service created)
# Access via <node-ip>:31134

# Option 3: Use ClusterIP directly (if running from node)
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')
echo "Ollama URL: http://${OLLAMA_IP}:11434"

Update CLI Command

# Use port-forward
./cli heritage translate en site --ollama-url http://localhost:11434

# Or use NodePort
./cli heritage translate en site --ollama-url http://10.10.10.10:31134
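
Before starting a long translation run, it can help to confirm the URL the CLI will use responds and that the intended model is actually present (adjust the URL to whichever option you chose above):

# Should print the names of the pulled models
curl -s http://localhost:11434/api/tags | grep -o '"name":"[^"]*"'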

Troubleshooting

Image Pull Issues

# Check DNS
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup registry-1.docker.io

# Check network connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -I https://registry-1.docker.io/v2/

# Fix: Configure image pull secrets or use local registry
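
If the node cannot reach Docker Hub at all, K3S can be pointed at a registry mirror via /etc/rancher/k3s/registries.yaml. A minimal sketch, assuming a reachable internal mirror at registry.internal:5000 (replace with your own):

cat > /etc/rancher/k3s/registries.yaml <<'EOF'
mirrors:
  docker.io:
    endpoint:
      - "http://registry.internal:5000"   # assumption: your internal mirror
EOF
systemctl restart k3s   # k3s must be restarted to pick up registry config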

Pod Not Starting

# Check events
kubectl describe pod -n ollama -l app=ollama

# Check logs
kubectl logs -n ollama -l app=ollama

# Check storage
kubectl get pvc -n ollama

Models Not Persisting

  • Ensure the PVC is bound and properly mounted (see the check after this list)
  • Check the storage class and available space
  • Verify the volume mount path: /root/.ollama
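
A quick way to confirm the models directory is backed by the PVC rather than the container's ephemeral filesystem:

# /root/.ollama should appear as its own mount with the PVC's capacity
kubectl exec -n ollama deployment/ollama -- df -h /root/.ollama
kubectl get pvc ollama-models-pvc -n ollama   # STATUS should be Bound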

Performance Issues

  • Increase resource limits if needed
  • Consider GPU allocation if available
  • Adjust OLLAMA_KEEP_ALIVE for model caching

Migration from Model-Specific Setup

1. Backup Existing Models (if any)

# If models-store pod was working, backup models
kubectl exec -it -n ollama ollama-models-store-0 -- \
  tar czf /tmp/models-backup.tar.gz /root/.ollama
kubectl cp ollama/ollama-models-store-0:/tmp/models-backup.tar.gz ./models-backup.tar.gz
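
To restore that backup into the new generic deployment later, a sketch (the pod name is resolved via the app label; the tarball stores paths relative to /, so it extracts back into /root/.ollama):

POD=$(kubectl get pod -n ollama -l app=ollama -o jsonpath='{.items[0].metadata.name}')
kubectl cp ./models-backup.tar.gz ollama/${POD}:/tmp/models-backup.tar.gz
kubectl exec -n ollama ${POD} -- tar xzf /tmp/models-backup.tar.gz -C /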

2. Delete Old Resources

kubectl delete deployment -n ollama -l ollama.ayaka.io/type=model
kubectl delete service -n ollama -l ollama.ayaka.io/type=model

3. Deploy New Generic Setup

Follow steps 2-4 above.

Best Practices

  1. Storage: Use at least 200GB for multiple models
  2. Resources: Allocate 4-8GB RAM minimum, more for larger models
  3. GPU: Enable GPU support if available for better performance
  4. Backup: Regularly backup /root/.ollama directory
  5. Monitoring: Set up health checks and monitoring
  6. Security: Use NetworkPolicies to restrict access if needed (a sketch follows this list)
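
A minimal NetworkPolicy sketch that only admits traffic from a single client namespace; the namespace name (translation) is an assumption and should be replaced with the namespace your CLI or jobs run in. K3S includes an embedded network policy controller, so the policy is enforced even with the default flannel backend:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ollama-restrict-ingress
  namespace: ollama
spec:
  podSelector:
    matchLabels:
      app: ollama
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: translation  # assumption: client namespace
      ports:
        - protocol: TCP
          port: 11434
EOF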

Model Recommendations for Translation

Model         Size    Best For                  Speed
qwen2.5:7b    4.4GB   Multilingual (RU/EN/TT)   Fast
llama3.1:8b   4.6GB   General purpose           Medium
mistral:7b    4.1GB   Fast inference            Very Fast
phi3:3.8b     2.3GB   Lightweight               Fastest

References