Cluster-level disaster recovery for Kubernetes. Back up all resources, persistent volumes, and configs. Restore an entire namespace (or the whole cluster) to a new environment in minutes.
“My cluster’s etcd is corrupted. Do I have backups?” - Questions you don’t want to ask at 2am
Why Velero#
I had per-app backups (Sonarr config, Radarr database) but no cluster-level disaster recovery. One bad Helm upgrade cascaded and broke networking. I spent four hours manually recreating ingress rules, secrets, and PVCs from memory.
Velero gives you:
- Full namespace backups - All resources (Deployments, Services, PVCs, Secrets, ConfigMaps)
- Persistent volume snapshots - Actual data, not just resource definitions
- Scheduled backups - Daily/weekly cron-like automation
- Cross-cluster restore - Rebuild on new hardware
- Selective recovery - Restore one app or the whole cluster
This complements per-app backups. Velero captures the Kubernetes layer (manifests, volumes). App backups capture internal state (databases, configs).
Architecture#
Velero (velero namespace)
    ↓
Scheduled Backups:
    • media-daily (all resources + PVCs)
    • cluster-weekly (full cluster state)
    ↓
Storage:
    • NFS backend (Synology: /volume1/nfs01/velero-backups)
    • Restic for volume snapshots (filesystem-level)
    ↓
Restore:
    • Same cluster (rollback bad upgrades)
    • New cluster (disaster recovery)
Deployment Repo#
Full source: k8s-velero-backups on GitHub
k8s-velero-backups/
├── values.yaml        # Velero Helm values
├── backup-schedules/  # CronJob-style backup definitions
│   ├── media-daily.yaml
│   └── cluster-weekly.yaml
├── deploy.sh          # Automated deployment
├── restore.sh         # Interactive restore script
└── verify-backup.sh   # Test backup integrity
Storage Backend#
Velero writes backups to S3-compatible object storage (or GCS/Azure Blob via the corresponding plugins). For a homelab, the simplest option is NFS fronted by an S3-compatible gateway, which is what the MinIO setup below provides.
NFS Setup on Synology#
SSH to your NAS and create the backup directory:
ssh jlambert@192.168.2.129
sudo mkdir -p /volume1/nfs01/velero-backups
sudo chown -R nobody:nogroup /volume1/nfs01/velero-backups
sudo chmod 755 /volume1/nfs01/velero-backups
Verify the NFS export in DSM: Control Panel → Shared Folder → nfs01 → Edit → NFS Permissions.
Ensure your K8s subnet (192.168.2.0/24) has read/write access.
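Before deploying anything on top of it, confirm a worker node can actually reach the export. A quick check from any cluster node (a sketch assuming the NFS client utilities, including showmount, are installed on the node; the IP and path are the ones above):
# List exports from the NAS, then do a throwaway mount/unmount of the share
showmount -e 192.168.2.129
sudo mount -t nfs 192.168.2.129:/volume1/nfs01 /mnt && sudo umount /mnt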
Helm Values#
# values.yaml
image:
  repository: velero/velero
  tag: v1.14.1

initContainers:
  - name: velero-plugin-for-aws
    image: velero/velero-plugin-for-aws:v1.10.1
    volumeMounts:
      - mountPath: /target
        name: plugins

configuration:
  # NFS storage via S3 API (MinIO running on NFS)
  backupStorageLocation:
    - name: default
      provider: aws
      bucket: velero
      config:
        region: minio
        s3ForcePathStyle: "true"
        s3Url: http://minio.velero.svc.cluster.local:9000
        publicUrl: http://minio.velero.svc.cluster.local:9000
  volumeSnapshotLocation:
    - name: default
      provider: aws
      config:
        region: minio
  # Use Restic for filesystem-level PVC backups
  uploaderType: restic
  defaultVolumesToFsBackup: true
  # How often Velero reconciles backup metadata from object storage
  backupSyncPeriod: 1h
  restoreOnlyMode: false

# Node agent DaemonSet mounts pod volumes for Restic backups
deployNodeAgent: true
nodeAgent:
  podVolumePath: /var/lib/kubelet/pods
  privileged: false
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi

credentials:
  useSecret: true
  existingSecret: velero-credentials

# Schedules (defined separately as CRDs)
schedules: {}

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

rbac:
  create: true

serviceAccount:
  server:
    create: true
Key decisions:
- MinIO as S3 gateway - Wraps NFS in S3 API (Velero’s native interface)
- Restic for volumes - Filesystem-level snapshots (doesn’t require CSI snapshot support)
- Node agent - Runs DaemonSet to access PVCs for backup
Deploy#
1. Install MinIO (S3 Backend)#
MinIO provides the S3 API that Velero expects, backed by NFS.
Create minio-values.yaml:
mode: standalone
replicas: 1

persistence:
  enabled: true
  storageClass: nfs-appdata
  size: 50Gi

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

service:
  type: ClusterIP
  port: 9000

consoleService:
  enabled: true
  port: 9001

buckets:
  - name: velero
    policy: none
    purge: false

users:
  - accessKey: velero
    secretKey: velero-secret-key
    policy: readwrite

ingress:
  enabled: true
  ingressClassName: traefik
  hosts:
    - minio.media.lan
  tls: []
Deploy:
helm repo add minio https://charts.min.io/
helm repo update
kubectl create namespace velero
helm upgrade --install minio minio/minio \
  -n velero -f minio-values.yaml --wait
Verify:
kubectl get pods -n velero
kubectl get svc -n velero
Access MinIO console: http://minio.media.lan (user: velero, password: velero-secret-key)
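The chart should have created the velero bucket. To confirm from the CLI instead of the console, a throwaway MinIO client pod works; a sketch that assumes the minio service name and the credentials from the values above:
# One-off mc pod: list buckets via the in-cluster MinIO service
kubectl run mc --rm -i --restart=Never -n velero --image=minio/mc \
  --env MC_HOST_local=http://velero:velero-secret-key@minio:9000 \
  --command -- mc ls local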
2. Create Velero Credentials Secret#
cat <<EOF > credentials-velero
[default]
aws_access_key_id = velero
aws_secret_access_key = velero-secret-key
EOF
kubectl create secret generic velero-credentials \
-n velero \
--from-file=cloud=credentials-velero
rm credentials-velero
3. Install Velero#
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
helm upgrade --install velero vmware-tanzu/velero \
  -n velero -f values.yaml --wait
Verify:
kubectl get pods -n velero
kubectl logs -n velero -l app.kubernetes.io/name=velero
You should see: "Backup storage location is valid"
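Two more quick checks: the backup storage location should report as available, and the node agent DaemonSet (named node-agent in recent chart versions, created because deployNodeAgent is true) should have a pod on every worker:
velero backup-location get
kubectl get daemonset node-agent -n velero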
Backup Schedules#
Create scheduled backups using Velero’s Schedule CRD (like CronJobs for backups).
Daily Media Namespace Backup#
# backup-schedules/media-daily.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: media-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"  # 2am daily
  template:
    includedNamespaces:
      - media
    includedResources:
      - '*'
    defaultVolumesToFsBackup: true
    storageLocation: default
    ttl: 168h  # Keep 7 days
Apply:
kubectl apply -f backup-schedules/media-daily.yaml
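Rather than waiting for 2am, you can exercise the schedule right away; recent Velero releases can create a one-off backup from a schedule's template:
velero backup create --from-schedule media-daily --wait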
Weekly Full Cluster Backup#
# backup-schedules/cluster-weekly.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: cluster-weekly
  namespace: velero
spec:
  schedule: "0 3 * * 0"  # 3am Sunday
  template:
    includedNamespaces:
      - '*'
    excludedNamespaces:
      - kube-system
      - kube-public
      - kube-node-lease
    includedResources:
      - '*'
    defaultVolumesToFsBackup: true
    storageLocation: default
    ttl: 720h  # Keep 30 days
Apply:
kubectl apply -f backup-schedules/cluster-weekly.yaml
Verify schedules:
velero schedule get
velero backup get
Manual Backups#
Trigger an immediate backup:
# Backup entire media namespace
velero backup create media-manual \
--include-namespaces media \
--default-volumes-to-fs-backup \
--wait
# Backup single app (Sonarr)
velero backup create sonarr-manual \
--include-namespaces media \
--selector app.kubernetes.io/name=sonarr \
--default-volumes-to-fs-backup \
--wait
# Full cluster backup
velero backup create cluster-manual \
--exclude-namespaces kube-system,kube-public,kube-node-lease \
--default-volumes-to-fs-backup \
--wait
Check status:
velero backup describe media-manual
velero backup logs media-manual
Restore#
Restore Entire Namespace#
Scenario: Bad Helm upgrade broke the media namespace.
# 1. Delete broken namespace (optional but cleaner)
kubectl delete namespace media
# 2. Restore from latest backup
velero restore create media-restore-$(date +%s) \
--from-backup media-daily-20260208020000 \
--wait
# 3. Verify
kubectl get all -n media
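If you'd rather not look up the timestamped backup name, recent Velero releases can also restore from the most recent successful backup of a schedule:
velero restore create media-restore-$(date +%s) \
  --from-schedule media-daily \
  --wait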
Restore Single App#
Scenario: Sonarr’s database corrupted. Restore just Sonarr.
# 1. Scale down Sonarr
kubectl scale -n media deploy/sonarr --replicas=0
# 2. Restore Sonarr resources
velero restore create sonarr-restore-$(date +%s) \
--from-backup media-daily-20260208020000 \
--include-resources deployment,service,ingress,pvc,secret,configmap \
--selector app.kubernetes.io/name=sonarr \
--wait
# 3. Verify
kubectl get pods -n media -l app.kubernetes.io/name=sonarr
Disaster Recovery (New Cluster)#
Scenario: Entire cluster lost (hardware failure, etcd corruption).
- Build new cluster - Use k8s-deploy Terraform repo
- Install foundation - MetalLB, Traefik, NFS CSI, Velero (same as original)
- Point Velero at existing backups:
# MinIO already deployed with same NFS backend
# Velero sees existing backups automatically
velero backup get
- Restore cluster state:
velero restore create full-restore-$(date +%s) \
--from-backup cluster-weekly-20260202030000 \
--wait
- Verify all namespaces:
kubectl get namespaces
kubectl get all -n media
kubectl get all -n monitoring
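Also check the restore objects themselves for warnings or partial failures (the restore name is whatever the create command above generated):
velero restore get
velero restore describe full-restore-<timestamp> --details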
Backup Verification#
Don’t trust backups you haven’t tested. The repo includes verify-backup.sh:
#!/bin/bash
set -euo pipefail

BACKUP_NAME="${1:-}"
if [[ -z "$BACKUP_NAME" ]]; then
  echo "Usage: $0 <backup-name>"
  echo ""
  echo "Available backups:"
  velero backup get
  exit 1
fi

echo "Verifying backup: $BACKUP_NAME"

# Check backup completed successfully
STATUS=$(velero backup describe "$BACKUP_NAME" --details | grep -i phase | awk '{print $2}')
if [[ "$STATUS" != "Completed" ]]; then
  echo "❌ Backup status: $STATUS"
  exit 1
fi
echo "✅ Backup status: Completed"

# Check for errors
ERRORS=$(velero backup describe "$BACKUP_NAME" --details | grep -i errors | awk '{print $2}')
if [[ "$ERRORS" != "0" ]]; then
  echo "⚠️ Backup has $ERRORS errors:"
  velero backup logs "$BACKUP_NAME" | grep -i error
  exit 1
fi
echo "✅ No errors"

# Verify volumes backed up
VOLUMES=$(velero backup describe "$BACKUP_NAME" --details | grep -A 20 "Restic Backups" | grep "Completed: " | awk '{print $2}')
echo "✅ Volumes backed up: $VOLUMES"

# Check backup size in MinIO
echo ""
echo "Backup stored in MinIO (velero bucket)"
Run monthly:
./verify-backup.sh media-daily-20260208020000
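To avoid remembering the timestamped name, you can feed the script the most recent daily backup by parsing the CLI's table output (a small convenience, assuming your backup names keep the media-daily prefix):
./verify-backup.sh "$(velero backup get | awk '/^media-daily-/ {print $1}' | sort | tail -1)"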
Troubleshooting#
Backup Stuck in Progress#
Symptom: Backup never completes.
velero backup describe <backup-name>
Common causes:
Restic timeout - Large volumes take time. Increase timeout:
# values.yaml
configuration:
  fsBackupTimeout: 4h  # Default 1h
PVC not found - Velero can’t access PVC. Check node agent pods:
kubectl get pods -n velero -l name=node-agent
kubectl logs -n velero -l name=node-agent
Restore Fails with “Already Exists”#
Symptom: velero restore fails because resources already exist.
Fix: Delete and retry, or use --preserve-nodeports=false for Services.
kubectl delete namespace media
velero restore create media-restore-$(date +%s) --from-backup <backup> --wait
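Alternatively, newer Velero releases can update existing resources in place during a restore instead of colliding with them; check that your version supports the flag:
velero restore create media-restore-$(date +%s) \
  --from-backup <backup> \
  --existing-resource-policy=update \
  --wait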
MinIO Connection Refused#
Symptom: Velero logs show connection refused to MinIO.
Check:
kubectl get svc -n velero minio
kubectl logs -n velero -l app.kubernetes.io/name=velero | grep -i minio
Fix: Verify MinIO service is running and accessible:
kubectl exec -n velero deploy/velero -- wget -O- http://minio.velero.svc.cluster.local:9000
Backup Storage Location Unavailable#
Symptom: velero backup-location get shows Unavailable.
Check:
velero backup-location describe default
Common causes:
- Wrong credentials - Verify the velero-credentials secret
- MinIO not running - Check kubectl get pods -n velero
- Bucket doesn’t exist - Create the velero bucket in the MinIO console
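A quick way to rule out the credentials cause is to dump the secret Velero is actually using and compare it against the MinIO user from the chart values:
kubectl get secret velero-credentials -n velero -o jsonpath='{.data.cloud}' | base64 -d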
Resource Usage#
Tested on 2-worker cluster (2 vCPU, 4 GB RAM per worker):
- Velero server: 50 MB RAM, <1% CPU (idle)
- Node agent (per node): 100 MB RAM, <5% CPU (during backup)
- MinIO: 200 MB RAM, <5% CPU
Backup times:
- Media namespace (8 apps, 40 GB PVCs): ~15 minutes
- Full cluster (3 namespaces, 60 GB total): ~25 minutes
Storage:
- Daily media backups: ~8 GB each (with compression)
- 7-day retention: ~56 GB
- Weekly full cluster: ~15 GB each
- 30-day retention: ~120 GB total
Provision 200 GB on your NAS for Velero backups.
What I Learned#
1. Test Restores, Not Just Backups#
Backups are worthless until proven. I schedule a quarterly “chaos day” where I delete a namespace and restore from backup. I found three issues this way before they mattered:
- Velero couldn’t restore ingress due to missing CRDs
- PVC restore failed because StorageClass disappeared
- Secrets with immutable fields broke updates
Now I fix these proactively.
2. Separate App-Level and Cluster-Level Backups#
Velero backs up Kubernetes state. Per-app backups back up internal databases. You need both.
Example: Sonarr’s Kubernetes resources (Deployment, Service, PVC) exist, but the SQLite database inside is corrupted. Velero restores the PVC (empty or old data). App-level backup restores the database.
3. MinIO Adds Latency but Simplifies Ops#
I considered backing up directly to NFS instead of going through an S3 gateway. MinIO adds a hop, but:
- S3 API is Velero’s native interface (less buggy)
- MinIO console makes browsing backups easy
- Portable - switch to real S3 or Backblaze B2 later by changing only the endpoint and credentials
The 50 MB of extra memory is worth the operational simplicity.
4. Backup Everything, Restore Selectively#
Full cluster backups sound expensive. They’re cheap (15 GB). Storage is cheaper than reconstruction time.
I back up everything weekly. Restore only what’s needed. Deleted the wrong namespace? Restore it. Entire cluster? Restore everything. Having options reduces stress.
5. Retention Policies Save Disk Space#
First month, I kept every backup forever. Hit 500 GB. Set TTLs on schedules:
- Daily: 7 days
- Weekly: 30 days
- Monthly: 1 year
Now 200 GB covers everything. Auto-pruning prevents “I’ll clean this up later” debt.
What’s Next#
You have disaster recovery for your Kubernetes cluster. Restore individual apps or rebuild from scratch.
Optional enhancements:
- Off-site backups - Sync MinIO to Backblaze B2 or AWS S3 for geographic redundancy
- Pre/post hooks - Quiesce databases before backup (flush writes, snapshot consistency); see the annotation sketch after this list
- Monitoring integration - Alert on failed backups via Uptime Kuma
- Immutable backups - Enable S3 object lock to prevent ransomware deletion
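For the hooks item, Velero reads pre/post backup hooks from pod annotations. A minimal sketch for a quick test against a hypothetical Postgres pod (the container name and command are assumptions; for a durable setup, put the annotations on the Deployment's pod template instead):
# Annotate the running pod so Velero runs a CHECKPOINT before backing up its volumes
kubectl -n media annotate pod -l app.kubernetes.io/name=postgres \
  pre.hook.backup.velero.io/container=postgres \
  pre.hook.backup.velero.io/command='["/bin/sh", "-c", "psql -U postgres -c CHECKPOINT"]' \
  pre.hook.backup.velero.io/timeout=120s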
The core setup is production-ready. Sleep better knowing you can rebuild in minutes.