# K3S Ollama Setup Guide

## Overview

This guide explains how to set up a generic Ollama service on K3S that can handle multiple models, suitable for translation and other AI tasks.

## Current Issues

1. **Model-specific deployments**: The current setup uses one deployment per model (e.g., `ollama-model-phi`)
2. **Network issues**: The cluster cannot pull images from Docker Hub (DNS/network connectivity)
3. **Image pull failures**: All pods are stuck in the `ImagePullBackOff` state

## Architecture

### Recommended Setup

A **single generic Ollama service** that:

- Runs one Ollama server instance
- Stores all models in persistent storage
- Exposes a single service endpoint
- Can load any model on-demand via API

### Components

1. **Deployment**: Ollama server with persistent storage for models (this guide uses a Deployment plus a PVC rather than a StatefulSet)
2. **Service**: ClusterIP service exposing the Ollama API
3. **PersistentVolumeClaim**: Storage for models (100GB+ recommended)
4. **ConfigMap**: Optional configuration

## Step 1: Fix Network/DNS Issues

### Check DNS Resolution

```bash
ssh root@10.10.10.10

# Test DNS
nslookup registry-1.docker.io
ping -c 2 registry-1.docker.io

# If DNS fails, check resolv.conf
cat /etc/resolv.conf
```

### Fix DNS (if needed)

```bash
# Add reliable DNS servers
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
echo "nameserver 1.1.1.1" >> /etc/resolv.conf

# Or configure systemd-resolved
resolvectl status
```

### Test Docker Hub Access

```bash
curl -I https://registry-1.docker.io/v2/
```

## Step 2: Create Generic Ollama Deployment

### Namespace

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
```

### PersistentVolumeClaim

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models-pvc
  namespace: ollama
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 200Gi  # Adjust based on the models you want to keep
```

### Generic Ollama Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 11434
              name: http
              protocol: TCP
          env:
            - name: OLLAMA_HOST
              value: "0.0.0.0:11434"
            - name: OLLAMA_KEEP_ALIVE
              value: "24h"
          volumeMounts:
            - name: models-storage
              mountPath: /root/.ollama
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "4"
              # Uncomment if you have a GPU
              # nvidia.com/gpu: "1"
          livenessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
      volumes:
        - name: models-storage
          persistentVolumeClaim:
            claimName: ollama-models-pvc
```

### Service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  type: ClusterIP
  ports:
    - port: 11434
      targetPort: 11434
      protocol: TCP
      name: http
  selector:
    app: ollama
```

### Optional: NodePort Service (for external access)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-nodeport
  namespace: ollama
spec:
  type: NodePort
  ports:
    - port: 11434
      targetPort: 11434
      nodePort: 31134  # Accessible on <node-ip>:31134
      protocol: TCP
  selector:
    app: ollama
```

## Step 3: Deploy and Verify

### Apply Manifests

```bash
kubectl apply -f ollama-namespace.yaml
kubectl apply -f ollama-pvc.yaml
kubectl apply -f ollama-deployment.yaml
kubectl apply -f ollama-service.yaml
```
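Before checking individual resources, it can help to wait for the rollout to finish. A minimal check, assuming the `ollama` namespace and Deployment name from the manifests above:

```bash
# Block until the Deployment reports all replicas available (or time out)
kubectl rollout status deployment/ollama -n ollama --timeout=300s
```

The first rollout may take a few minutes while the `ollama/ollama` image is pulled.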
### Check Status

```bash
kubectl get pods -n ollama
kubectl get svc -n ollama
kubectl logs -n ollama deployment/ollama
```

### Test Service

```bash
# From within the cluster
kubectl run -it --rm curl-test --image=curlimages/curl:latest --restart=Never -- \
  curl http://ollama.ollama.svc.cluster.local:11434/api/tags

# From a node
curl http://10.43.x.x:11434/api/tags  # Use the ClusterIP from the service
```

## Step 4: Download Models

### Recommended Models for Translation

1. **qwen2.5:7b** - Good for multilingual translation (Russian, English, Tatar)
2. **llama3.1:8b** - General purpose, good translation
3. **mistral:7b** - Fast and efficient
4. **phi3:3.8b** - Lightweight option

### Pull Models via API

```bash
# Get the service ClusterIP
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')

# Pull models
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "qwen2.5:7b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "llama3.1:8b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "mistral:7b"}'
```

### Or Use kubectl exec

```bash
kubectl exec -it -n ollama deployment/ollama -- ollama pull qwen2.5:7b
kubectl exec -it -n ollama deployment/ollama -- ollama pull llama3.1:8b
kubectl exec -it -n ollama deployment/ollama -- ollama pull mistral:7b
```

### List Downloaded Models

```bash
kubectl exec -it -n ollama deployment/ollama -- ollama list
```

## Step 5: Configure CLI to Use K3S Ollama

### Get Service Endpoint

```bash
# Option 1: Port-forward the service (from your workstation)
kubectl port-forward -n ollama svc/ollama 11434:11434

# Option 2: Use the NodePort (if the NodePort service was created)
# Access via <node-ip>:31134

# Option 3: Use the ClusterIP directly (if running from a node)
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')
echo "Ollama URL: http://${OLLAMA_IP}:11434"
```

### Update CLI Command

```bash
# Use port-forward
./cli heritage translate en site --ollama-url http://localhost:11434

# Or use NodePort
./cli heritage translate en site --ollama-url http://10.10.10.10:31134
```

## Troubleshooting

### Image Pull Issues

```bash
# Check DNS
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup registry-1.docker.io

# Check network connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -I https://registry-1.docker.io/v2/

# Fix: configure image pull secrets or use a local registry
```

### Pod Not Starting

```bash
# Check events
kubectl describe pod -n ollama -l app=ollama

# Check logs
kubectl logs -n ollama -l app=ollama

# Check storage
kubectl get pvc -n ollama
```

### Models Not Persisting

- Ensure the PVC is properly mounted
- Check the storage class and available space
- Verify the volume mount path: `/root/.ollama`

### Performance Issues

- Increase resource limits if needed
- Consider GPU allocation if available
- Adjust `OLLAMA_KEEP_ALIVE` for model caching

## Migration from Model-Specific Setup

### 1. Backup Existing Models (if any)

```bash
# If the models-store pod was working, back up its models
kubectl exec -it -n ollama ollama-models-store-0 -- \
  tar czf /tmp/models-backup.tar.gz /root/.ollama
kubectl cp ollama/ollama-models-store-0:/tmp/models-backup.tar.gz ./models-backup.tar.gz
```

### 2. Delete Old Resources

```bash
kubectl delete deployment -n ollama -l ollama.ayaka.io/type=model
kubectl delete service -n ollama -l ollama.ayaka.io/type=model
```

### 3. Deploy New Generic Setup

Follow steps 2-4 above, then restore any backup into the new pod (see the sketch below).
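If you made a backup in step 1, you can restore it into the new generic deployment. A minimal sketch, assuming the `./models-backup.tar.gz` file created above and the `app: ollama` label from the new Deployment:

```bash
# Find the pod created by the generic Deployment
POD=$(kubectl get pod -n ollama -l app=ollama -o jsonpath='{.items[0].metadata.name}')

# Copy the backup into the pod and unpack it; the archive was created from
# /root/.ollama, so extracting at / restores the original paths
kubectl cp ./models-backup.tar.gz ollama/${POD}:/tmp/models-backup.tar.gz
kubectl exec -n ollama ${POD} -- tar xzf /tmp/models-backup.tar.gz -C /

# Verify the restored models are visible
kubectl exec -n ollama ${POD} -- ollama list
```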
## Best Practices

1. **Storage**: Use at least 200GB for multiple models
2. **Resources**: Allocate 4-8GB RAM minimum, more for larger models
3. **GPU**: Enable GPU support if available for better performance
4. **Backup**: Regularly back up the `/root/.ollama` directory
5. **Monitoring**: Set up health checks and monitoring
6. **Security**: Use NetworkPolicies to restrict access if needed

## Model Recommendations for Translation

| Model | Size | Best For | Speed |
|-------|------|----------|-------|
| qwen2.5:7b | 4.4GB | Multilingual (RU/EN/TT) | Fast |
| llama3.1:8b | 4.6GB | General purpose | Medium |
| mistral:7b | 4.1GB | Fast inference | Very Fast |
| phi3:3.8b | 2.3GB | Lightweight | Fastest |

A sample translation request using one of these models is included at the end of this guide.

## References

- [Ollama Official Docs](https://docs.ollama.com/)
- [Ollama API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [K3S Documentation](https://docs.k3s.io/)
- [Kubernetes Storage](https://kubernetes.io/docs/concepts/storage/)
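## Example: Translation Request

As an end-to-end check that the generic service works for the translation use case, here is a sample request. It is only a sketch: it assumes `qwen2.5:7b` was pulled in Step 4 and uses the port-forward from Step 5 (substitute the NodePort or ClusterIP URL if you prefer).

```bash
# Forward the service to localhost (leave running in another terminal)
kubectl port-forward -n ollama svc/ollama 11434:11434

# Ask the model for a single, non-streamed translation
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Translate the following sentence from Russian to English: Добрый день! Как дела?",
  "stream": false
}'
```

The response is a JSON object whose `response` field contains the translated text.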