# K3S Ollama Setup Guide

## Overview

This guide explains how to set up a generic Ollama service on K3S that can serve multiple models, making it suitable for translation and other AI tasks.
## Current Issues

1. **Model-specific deployments**: The current setup uses one deployment per model (e.g., `ollama-model-phi`)
2. **Network issues**: Nodes cannot pull images from Docker Hub (DNS/network connectivity failures)
3. **Image pull failures**: All pods are stuck in `ImagePullBackOff`
## Architecture

### Recommended Setup

A **single generic Ollama service** that:

- Runs one Ollama server instance
- Stores all models in persistent storage
- Exposes a single service endpoint
- Can load any model on demand via the API (see the preload sketch below)
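For example, any pulled model is loaded into memory the first time a request names it; sending a request that names a model but carries no prompt simply preloads it. A minimal sketch, assuming the service is reachable on `localhost:11434` (e.g., via the port-forward in Step 5) and that `qwen2.5:7b` has already been pulled (Step 4):

```bash
# Naming a model with no prompt preloads it into memory; keep_alive controls
# how long it stays resident (mirrors OLLAMA_KEEP_ALIVE in the Deployment below)
curl http://localhost:11434/api/generate -d '{"model": "qwen2.5:7b", "keep_alive": "24h"}'
```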
### Components

1. **Deployment**: Ollama server with persistent storage for models
2. **Service**: ClusterIP Service exposing the Ollama API
3. **PersistentVolumeClaim**: Storage for models (100GB+ recommended)
4. **ConfigMap**: Optional configuration (a sketch follows this list)
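The manifests in Step 2 inline Ollama's settings as environment variables; if you prefer a ConfigMap instead, here is a minimal sketch (the name `ollama-config` is illustrative, and the Deployment would then reference it via `envFrom` rather than inline `env`):

```bash
# Hypothetical optional ConfigMap; reference it from the Deployment with
# `envFrom: [{configMapRef: {name: ollama-config}}]` instead of inline env vars.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: ollama-config
  namespace: ollama
data:
  OLLAMA_HOST: "0.0.0.0:11434"
  OLLAMA_KEEP_ALIVE: "24h"
EOF
```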
## Step 1: Fix Network/DNS Issues

### Check DNS Resolution

```bash
ssh root@10.10.10.10

# Test DNS
nslookup registry-1.docker.io
ping -c 2 registry-1.docker.io

# If DNS fails, check resolv.conf
cat /etc/resolv.conf
```
### Fix DNS (if needed)

```bash
# Add reliable DNS servers (note: network managers may overwrite this file)
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
echo "nameserver 1.1.1.1" >> /etc/resolv.conf

# Or inspect/configure systemd-resolved
resolvectl status
```
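K3S's bundled CoreDNS forwards external lookups to the node's resolver, so after changing `/etc/resolv.conf` it is worth restarting CoreDNS so the change propagates to in-cluster DNS (an assumption to verify against your cluster's CoreDNS config):

```bash
# Restart CoreDNS so it re-reads the node's resolver configuration
kubectl -n kube-system rollout restart deployment coredns
kubectl -n kube-system rollout status deployment coredns
```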
### Test Docker Hub Access

```bash
curl -I https://registry-1.docker.io/v2/
# An HTTP 401 response here is fine: it means the registry is reachable.
```
## Step 2: Create Generic Ollama Deployment

### Namespace

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ollama
```
### PersistentVolumeClaim

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models-pvc
  namespace: ollama
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 200Gi # Adjust based on the models you want
```
### Generic Ollama Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  strategy:
    type: Recreate # avoids two pods contending for the ReadWriteOnce volume during updates
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 11434
              name: http
              protocol: TCP
          env:
            - name: OLLAMA_HOST
              value: "0.0.0.0:11434"
            - name: OLLAMA_KEEP_ALIVE
              value: "24h"
          volumeMounts:
            - name: models-storage
              mountPath: /root/.ollama
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "4"
              # Uncomment if you have GPU
              # nvidia.com/gpu: "1"
          livenessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /api/tags
              port: 11434
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
      volumes:
        - name: models-storage
          persistentVolumeClaim:
            claimName: ollama-models-pvc
```
### Service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ollama
  labels:
    app: ollama
spec:
  type: ClusterIP
  ports:
    - port: 11434
      targetPort: 11434
      protocol: TCP
      name: http
  selector:
    app: ollama
```
### Optional: NodePort Service (for external access)

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-nodeport
  namespace: ollama
spec:
  type: NodePort
  ports:
    - port: 11434
      targetPort: 11434
      nodePort: 31134 # Accessible on <node-ip>:31134; must fall in the default 30000-32767 range
      protocol: TCP
  selector:
    app: ollama
```
## Step 3: Deploy and Verify

### Apply Manifests

```bash
kubectl apply -f ollama-namespace.yaml
kubectl apply -f ollama-pvc.yaml
kubectl apply -f ollama-deployment.yaml
kubectl apply -f ollama-service.yaml
```
### Check Status

```bash
kubectl get pods -n ollama
kubectl get svc -n ollama
kubectl logs -n ollama deployment/ollama
```
### Test Service

```bash
# From within cluster
kubectl run -it --rm curl-test --image=curlimages/curl:latest --restart=Never -- \
  curl http://ollama.ollama.svc.cluster.local:11434/api/tags

# From node
curl http://10.43.x.x:11434/api/tags # Use ClusterIP from service
```
## Step 4: Download Models

### Recommended Models for Translation

1. **qwen2.5:7b** - Good for multilingual translation (Russian, English, Tatar)
2. **llama3.1:8b** - General purpose, good at translation
3. **mistral:7b** - Fast and efficient
4. **phi3:3.8b** - Lightweight option
### Pull Models via API

```bash
# Get the Service ClusterIP
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')

# Pull models (each call streams JSON progress lines until the pull completes)
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "qwen2.5:7b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "llama3.1:8b"}'
curl -X POST http://${OLLAMA_IP}:11434/api/pull -d '{"name": "mistral:7b"}'
```
### Or Use kubectl exec

```bash
kubectl exec -it -n ollama deployment/ollama -- ollama pull qwen2.5:7b
kubectl exec -it -n ollama deployment/ollama -- ollama pull llama3.1:8b
kubectl exec -it -n ollama deployment/ollama -- ollama pull mistral:7b
```
### List Downloaded Models

```bash
kubectl exec -it -n ollama deployment/ollama -- ollama list
```
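Before wiring up the CLI in Step 5, a quick smoke test of translation through the API confirms a model actually works; a sketch assuming `qwen2.5:7b` was pulled and `OLLAMA_IP` is still set from the pull step:

```bash
# One-off translation request; "stream": false returns a single JSON object
curl http://${OLLAMA_IP}:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Translate the following Russian text to English: Добро пожаловать!",
  "stream": false
}'
```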
## Step 5: Configure CLI to Use K3S Ollama

### Get Service Endpoint

```bash
# Option 1: Port-forward the Service to localhost (works anywhere kubectl works)
kubectl port-forward -n ollama svc/ollama 11434:11434

# Option 2: Use the NodePort (if the NodePort Service was created)
# Access via <node-ip>:31134

# Option 3: Use the ClusterIP directly (if running from a node)
OLLAMA_IP=$(kubectl get svc ollama -n ollama -o jsonpath='{.spec.clusterIP}')
echo "Ollama URL: http://${OLLAMA_IP}:11434"
```
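Whichever option you use, a quick check that the endpoint answers (shown here against the port-forward; `/api/version` is a lightweight endpoint that returns the server version):

```bash
# Should return a small JSON object such as {"version":"0.x.y"}
curl http://localhost:11434/api/version
```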
### Update CLI Command

```bash
# Use port-forward
./cli heritage translate en site --ollama-url http://localhost:11434

# Or use NodePort
./cli heritage translate en site --ollama-url http://10.10.10.10:31134
```
## Troubleshooting

### Image Pull Issues

```bash
# Check DNS from inside the cluster
kubectl run -it --rm debug --image=busybox --restart=Never -- nslookup registry-1.docker.io

# Check network connectivity
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl -I https://registry-1.docker.io/v2/

# Fix: configure image pull secrets, use a local registry, or pre-load the image (see below)
```
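If the nodes simply cannot reach Docker Hub, one workaround is to pre-load the image into K3S's embedded containerd rather than pulling from inside the cluster; a sketch assuming a workstation with working Docker access and SSH to the node (the Deployment's `imagePullPolicy: IfNotPresent` will then use the local copy):

```bash
# Pull and export the image where networking works...
docker pull ollama/ollama:latest
docker save ollama/ollama:latest -o ollama.tar

# ...then copy it to the node and import it into K3S's containerd
scp ollama.tar root@10.10.10.10:/tmp/
ssh root@10.10.10.10 'k3s ctr images import /tmp/ollama.tar'
```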
### Pod Not Starting

```bash
# Check events
kubectl describe pod -n ollama -l app=ollama

# Check logs
kubectl logs -n ollama -l app=ollama

# Check storage
kubectl get pvc -n ollama
```
### Models Not Persisting

- Ensure the PVC is bound and properly mounted (see the check below)
- Check the storage class and available space
- Verify the volume mount path: `/root/.ollama`
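A quick way to confirm the mount and remaining space from inside the pod (assuming the Deployment from Step 2):

```bash
# The mount point should show the PVC-backed volume, not the container's overlay fs
kubectl exec -n ollama deployment/ollama -- df -h /root/.ollama
kubectl get pvc -n ollama ollama-models-pvc
```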
### Performance Issues

- Increase resource limits if needed
- Consider GPU allocation if available
- Adjust `OLLAMA_KEEP_ALIVE` to keep models cached in memory longer (see below)
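The keep-alive window, for example, can be adjusted without editing the manifest; note that this restarts the pod, so any model currently in memory is dropped:

```bash
# Keep loaded models in memory for 48h instead of 24h
kubectl set env deployment/ollama -n ollama OLLAMA_KEEP_ALIVE=48h
```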
## Migration from Model-Specific Setup

### 1. Backup Existing Models (if any)

```bash
# If the models-store pod was working, back up its models
kubectl exec -it -n ollama ollama-models-store-0 -- \
  tar czf /tmp/models-backup.tar.gz /root/.ollama
kubectl cp ollama/ollama-models-store-0:/tmp/models-backup.tar.gz ./models-backup.tar.gz
```
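To restore the archive into the new generic deployment afterwards, it can be copied back the same way (a sketch; the pod name is looked up by label):

```bash
# Copy the archive into the new pod and unpack it over the empty models volume
POD=$(kubectl get pods -n ollama -l app=ollama -o jsonpath='{.items[0].metadata.name}')
kubectl cp ./models-backup.tar.gz ollama/${POD}:/tmp/models-backup.tar.gz
kubectl exec -n ollama ${POD} -- tar xzf /tmp/models-backup.tar.gz -C /
```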
### 2. Delete Old Resources

```bash
kubectl delete deployment -n ollama -l ollama.ayaka.io/type=model
kubectl delete service -n ollama -l ollama.ayaka.io/type=model
```
### 3. Deploy New Generic Setup

Follow steps 2-4 above.
## Best Practices

1. **Storage**: Use at least 200GB for multiple models
2. **Resources**: Allocate 4-8GB RAM minimum, more for larger models
3. **GPU**: Enable GPU support if available for better performance
4. **Backup**: Regularly back up the `/root/.ollama` directory
5. **Monitoring**: Set up health checks and monitoring
6. **Security**: Use NetworkPolicies to restrict access if needed (see the sketch after this list)
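As an illustration of the last point, a minimal NetworkPolicy that only admits traffic to the Ollama port from namespaces you label as clients (the policy name and the `ollama-client` label are illustrative, not part of the original setup; K3S enforces NetworkPolicies with its embedded controller):

```bash
# Hypothetical policy: only pods in namespaces labeled ollama-client=true
# may reach the Ollama pods on port 11434; all other ingress is denied.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ollama-clients
  namespace: ollama
spec:
  podSelector:
    matchLabels:
      app: ollama
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              ollama-client: "true"
      ports:
        - protocol: TCP
          port: 11434
EOF
```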
## Model Recommendations for Translation

| Model       | Size  | Best For                | Speed     |
|-------------|-------|-------------------------|-----------|
| qwen2.5:7b  | 4.4GB | Multilingual (RU/EN/TT) | Fast      |
| llama3.1:8b | 4.6GB | General purpose         | Medium    |
| mistral:7b  | 4.1GB | Fast inference          | Very Fast |
| phi3:3.8b   | 2.3GB | Lightweight             | Fastest   |
## References

- [Ollama Official Docs](https://docs.ollama.com/)
- [Ollama API Reference](https://github.com/ollama/ollama/blob/main/docs/api.md)
- [K3S Documentation](https://docs.k3s.io/)
- [Kubernetes Storage](https://kubernetes.io/docs/concepts/storage/)