# K3S Ollama Setup Guide

## Overview
This guide explains how to set up a generic Ollama service on K3S that can handle multiple models, suitable for translation and other AI tasks.
## Current Issues

- Model-specific deployments: the current setup uses model-specific deployments (e.g., `ollama-model-phi`)
- Network issues: cannot pull images from Docker Hub (DNS/network connectivity)
- Image pull failures: all pods are stuck in `ImagePullBackOff`
## Architecture

### Recommended Setup
A single generic Ollama service that:
- Runs one Ollama server instance
- Stores all models in persistent storage
- Exposes a single service endpoint
- Can load any model on demand via the API (see the example below)
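
For illustration, loading a model on demand only requires naming it in a request. A minimal sketch, assuming the `ollama` Service from Step 2 is reachable on `localhost:11434` (e.g., via `kubectl port-forward`) and `qwen2.5:7b` has already been pulled in Step 4:

```bash
# Any request that names a model makes Ollama load it into memory on demand.
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Translate to English: Сәлам, дөнья!",
  "stream": false
}'
```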
### Components

- Deployment: Ollama server with persistent storage for the models
- Service: ClusterIP Service exposing the Ollama API
- PersistentVolumeClaim: Storage for the models (100GB+ recommended)
- ConfigMap: Optional configuration (a sketch follows this list)
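
The manifests in Step 2 configure Ollama purely through environment variables, so no ConfigMap is strictly required. If you prefer to keep those settings out of the Deployment, one hypothetical sketch is a ConfigMap consumed via `envFrom`:

```yaml
# Hypothetical example: the Deployment in Step 2 sets these values inline instead.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ollama-config
  namespace: ollama
data:
  OLLAMA_HOST: "0.0.0.0:11434"
  OLLAMA_KEEP_ALIVE: "24h"
```

The container would then reference it with `envFrom: [{configMapRef: {name: ollama-config}}]` instead of the inline `env` entries.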
## Step 1: Fix Network/DNS Issues

### Check DNS Resolution

```bash
ssh root@10.10.10.10

# Test DNS
nslookup registry-1.docker.io
ping -c 2 registry-1.docker.io

# If DNS fails, check resolv.conf
cat /etc/resolv.conf
```
### Fix DNS (if needed)

```bash
# Add reliable DNS servers
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
echo "nameserver 1.1.1.1" >> /etc/resolv.conf

# Or configure systemd-resolved
resolvectl status
```
### Test Docker Hub Access

```bash
curl -I https://registry-1.docker.io/v2/
```
## Step 2: Create Generic Ollama Deployment

### Namespace

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
```
### PersistentVolumeClaim

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models-pvc
  namespace: ollama
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 200Gi # Adjust based on the models you want
```
### Generic Ollama Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 11434
              name: http
              protocol: TCP
          env:
            - name: OLLAMA_HOST
              value: "0.0.0.0:11434"
            - name: OLLAMA_KEEP_ALIVE
              value: "24h"
          volumeMounts:
            - name: models-storage
              mountPath: /root/.ollama
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "4"
              # Uncomment if you have a GPU
              # nvidia.com/gpu: "1"
          livenessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
      volumes:
        - name: models-storage
          persistentVolumeClaim:
            claimName: ollama-models-pvc
```
### Service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  type: ClusterIP
  ports:
    - port: 11434
      targetPort: 11434
      protocol: TCP
      name: http
  selector:
    app: ollama
```
### Optional: NodePort Service (for external access)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-nodeport
  namespace: ollama
spec:
  type: NodePort
  ports:
    - port: 11434
      targetPort: 11434
      nodePort: 31134 # Accessible on <node-ip>:31134
      protocol: TCP
  selector:
    app: ollama
```
## Step 3: Deploy and Verify

### Apply Manifests

```bash
kubectl apply -f ollama-namespace.yaml
kubectl apply -f ollama-pvc.yaml
kubectl apply -f ollama-deployment.yaml
kubectl apply -f ollama-service.yaml
```
### Check Status

```bash
kubectl get pods -n ollama
kubectl get svc -n ollama
kubectl logs -n ollama deployment/ollama
```
### Test Service

```bash
# From within the cluster
kubectl run -it --rm curl-test --image=curlimages/curl:latest --restart=Never -- \
  curl http://ollama.ollama.svc.cluster.local:11434/api/tags

# From a node
curl http://10.43.x.x:11434/api/tags # Use the ClusterIP from the Service
```
## Step 4: Download Models

### Recommended Models for Translation

- `qwen2.5:7b` - Good for multilingual translation (Russian, English, Tatar)
- `llama3.1:8b` - General purpose, good translation quality
- `mistral:7b` - Fast and efficient
- `phi3:3.8b` - Lightweight option
### Pull Models via API

```bash
# Get the service ClusterIP
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')

# Pull models
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "qwen2.5:7b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "llama3.1:8b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "mistral:7b"}'
```
### Or Use kubectl exec

```bash
kubectl exec -it -n ollama deployment/ollama -- ollama pull qwen2.5:7b
kubectl exec -it -n ollama deployment/ollama -- ollama pull llama3.1:8b
kubectl exec -it -n ollama deployment/ollama -- ollama pull mistral:7b
```
### List Downloaded Models

```bash
kubectl exec -it -n ollama deployment/ollama -- ollama list
```
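
To confirm a downloaded model actually answers requests, you can also smoke-test it directly in the pod. A minimal sketch, assuming `qwen2.5:7b` was pulled above:

```bash
# One-off test inside the pod: loads the model and prints a single completion.
kubectl exec -it -n ollama deployment/ollama -- \
  ollama run qwen2.5:7b "Translate to English: Рәхим итегез!"
```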
## Step 5: Configure CLI to Use K3S Ollama

### Get Service Endpoint

```bash
# Option 1: Port-forward the ClusterIP Service (works from any machine with kubectl access)
kubectl port-forward -n ollama svc/ollama 11434:11434

# Option 2: Use the NodePort (if the NodePort Service was created)
# Access via <node-ip>:31134

# Option 3: Use the ClusterIP directly (if running from a node)
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')
echo "Ollama URL: http://${OLLAMA_IP}:11434"
```
### Update CLI Command

```bash
# Using port-forward
./cli heritage translate en site --ollama-url http://localhost:11434

# Or using the NodePort
./cli heritage translate en site --ollama-url http://10.10.10.10:31134
```
## Troubleshooting

### Image Pull Issues

```bash
# Check DNS
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup registry-1.docker.io

# Check network connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -I https://registry-1.docker.io/v2/

# Fix: configure image pull secrets or use a local registry
```
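
If an authenticated registry or mirror is the way around the connectivity problem, an image pull secret can be wired up roughly like this (a sketch; the registry URL and credentials are placeholders):

```bash
# Create a docker-registry secret in the ollama namespace (placeholder credentials).
kubectl create secret docker-registry regcred -n ollama \
  --docker-server=<your-registry> \
  --docker-username=<user> \
  --docker-password=<password>

# Attach it to the Deployment's pod spec so the kubelet uses it for pulls.
kubectl patch deployment ollama -n ollama --type merge \
  -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"regcred"}]}}}}'
```

You would also need to point the `image:` field in the Deployment at that registry.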
### Pod Not Starting

```bash
# Check events
kubectl describe pod -n ollama -l app=ollama

# Check logs
kubectl logs -n ollama -l app=ollama

# Check storage
kubectl get pvc -n ollama
```
### Models Not Persisting

- Ensure the PVC is properly mounted
- Check the storage class and available space
- Verify the volume mount path: `/root/.ollama` (see the check below)
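
One way to verify the mount from inside the running pod (a sketch; it simply lists the mount and the model store):

```bash
# Confirm the PVC is mounted at the expected path and holds the model data.
kubectl exec -it -n ollama deployment/ollama -- df -h /root/.ollama
kubectl exec -it -n ollama deployment/ollama -- ls -la /root/.ollama/models
```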
### Performance Issues

- Increase resource limits if needed
- Consider GPU allocation if available
- Adjust `OLLAMA_KEEP_ALIVE` for model caching (see the example below)
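
For example, to keep models resident longer without editing the manifest, the environment variable can be changed in place (a sketch; this triggers a rolling restart of the pod):

```bash
# Keep loaded models in memory for 48h instead of 24h.
kubectl set env deployment/ollama -n ollama OLLAMA_KEEP_ALIVE=48h
```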
## Migration from Model-Specific Setup

### 1. Backup Existing Models (if any)

```bash
# If the models-store pod was working, back up the models
kubectl exec -it -n ollama ollama-models-store-0 -- \
  tar czf /tmp/models-backup.tar.gz /root/.ollama
kubectl cp ollama/ollama-models-store-0:/tmp/models-backup.tar.gz ./models-backup.tar.gz
```
### 2. Delete Old Resources

```bash
kubectl delete deployment -n ollama -l ollama.ayaka.io/type=model
kubectl delete service -n ollama -l ollama.ayaka.io/type=model
```
### 3. Deploy New Generic Setup

Follow steps 2-4 above. If you took a backup in step 1, it can be restored into the new pod as shown below.
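
A possible restore, mirroring the backup command from step 1 (a sketch; assumes the new `ollama` pod is running and the archive was created as above):

```bash
# Copy the archive into the new pod and unpack it over the model store.
POD=$(kubectl get pod -n ollama -l app=ollama -o jsonpath='{.items[0].metadata.name}')
kubectl cp ./models-backup.tar.gz ollama/${POD}:/tmp/models-backup.tar.gz
kubectl exec -it -n ollama ${POD} -- tar xzf /tmp/models-backup.tar.gz -C /
```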
## Best Practices

- Storage: Use at least 200GB for multiple models
- Resources: Allocate 4-8GB RAM minimum, more for larger models
- GPU: Enable GPU support if available for better performance
- Backup: Regularly back up the `/root/.ollama` directory
- Monitoring: Set up health checks and monitoring
- Security: Use NetworkPolicies to restrict access if needed (a sketch follows this list)
## Model Recommendations for Translation
| Model | Size | Best For | Speed |
|---|---|---|---|
| qwen2.5:7b | 4.4GB | Multilingual (RU/EN/TT) | Fast |
| llama3.1:8b | 4.6GB | General purpose | Medium |
| mistral:7b | 4.1GB | Fast inference | Very Fast |
| phi3:3.8b | 2.3GB | Lightweight | Fastest |