Bare-metal LoadBalancer, ingress controller, and persistent storage for a Talos Kubernetes cluster. The foundation before deploying workloads.
“Why doesn’t LoadBalancer just work?” - Welcome to bare metal, where nothing is free
Problem
Fresh Kubernetes cluster. Runs perfectly. Does nothing useful. You try to expose a service with type: LoadBalancer and it stays <pending> forever. You create a PersistentVolumeClaim and it stays unbound. You want to access apps via hostname, but there’s no ingress controller.
Cloud providers give you these primitives for free. On bare metal, you build them yourself. It’s not hard, but you have to know the pieces.
Solution
Three Helm charts turn your cluster from “technically running” to “actually useful”:
| Component | What Problem It Solves |
|---|---|
| MetalLB | Makes type: LoadBalancer actually work by assigning real LAN IPs |
| Traefik | Routes app.domain.local to the right pod without 8 different IPs |
| NFS CSI Driver | Lets pods use your NAS for persistent storage (works with Talos) |
I’ve deployed this foundation on four different clusters. Same pattern every time. Install these three, then everything else just works.
Full source: k8s-media-stack (foundation/ directory)
MetalLB: Bare-Metal LoadBalancer
In cloud Kubernetes, you set type: LoadBalancer and get an IP. On bare metal, you get <pending> and confusion. The first time it happened, I spent 30 minutes thinking my cluster was broken.
MetalLB makes LoadBalancer work. You give it a pool of IPs from your LAN. It watches for LoadBalancer services and assigns them IPs from that pool. In L2 mode, it responds to ARP requests so any device on your network can reach those IPs.
Think of it as a software load balancer that speaks ARP.
Install
helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm upgrade --install metallb metallb/metallb \
-n metallb-system --create-namespace \
--wait --timeout 120s
Wait for the controller to be ready before applying the config:
kubectl rollout status -n metallb-system deploy/metallb-controller --timeout=90s
Configure the IP Pool
Pick a range of IPs outside your DHCP scope. I use the top of my subnet (192.168.2.244-254). Check your router’s DHCP settings to see what’s actually in use.
# foundation/metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: lan-pool
namespace: metallb-system
spec:
addresses:
- <LB_RANGE_START>-<LB_RANGE_END> # e.g. 192.168.2.244-192.168.2.254
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: lan-l2
namespace: metallb-system
spec:
ipAddressPools:
- lan-pool
kubectl apply -f foundation/metallb-config.yaml
Smoke test MetalLB immediately after configuring the IP pool. Deploy a quick nginx service and confirm it gets an external IP:
kubectl create deployment nginx-test --image=nginx
kubectl expose deployment nginx-test --type=LoadBalancer --port=80
kubectl get svc nginx-test # EXTERNAL-IP should appear within seconds
curl http://<EXTERNAL-IP> # Should return the nginx welcome page
kubectl delete deployment nginx-test && kubectl delete svc nginx-test
If the IP stays <pending>, troubleshoot MetalLB before adding Traefik on top.
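A few places to look when that happens (assuming the default Helm release name metallb, which names the speaker DaemonSet metallb-speaker):
kubectl describe svc nginx-test                                # Events explain why no IP was assigned
kubectl get ipaddresspools,l2advertisements -n metallb-system  # Both the pool and the L2Advertisement must exist
kubectl logs -n metallb-system deploy/metallb-controller
kubectl logs -n metallb-system ds/metallb-speaker              # ARP announcements are logged here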
“I’ll just remember which IP is for what.” - Narrator: They did not remember
Traefik: Ingress Controller
Without an ingress controller, every service needs its own LoadBalancer IP. Got 8 apps? That’s 8 IPs to remember. Worse, you’re accessing everything by IP and port like it’s 2010.
Traefik solves this. It gets one LoadBalancer IP from MetalLB. You point *.media.lan at that IP in DNS. Traefik looks at the hostname in each request and routes to the right pod. One IP, clean hostnames, TLS termination included.
Install
helm repo add traefik https://traefik.github.io/charts
helm repo update
helm upgrade --install traefik traefik/traefik \
-n traefik --create-namespace \
-f foundation/traefik-values.yaml \
--wait --timeout 120s
Helm Values
# foundation/traefik-values.yaml
service:
type: LoadBalancer
annotations:
metallb.universe.tf/loadBalancerIPs: "<TRAEFIK_IP>" # e.g. 192.168.2.244
ports:
web:
exposedPort: 80
websecure:
exposedPort: 443
ingressRoute:
dashboard:
enabled: false # Access via port-forward if needed
providers:
kubernetesIngress:
enabled: true
publishedService:
enabled: true
logs:
general:
level: INFO
The metallb.universe.tf/loadBalancerIPs annotation pins Traefik to a specific IP. Without
it, MetalLB assigns the next available address from the pool.
publishedService updates the status.loadBalancer field on Ingress resources so
kubectl get ingress shows the external IP.
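Illustrative output once an Ingress is routed (using the example Ingress from the routing section below):
kubectl get ingress -A
NAMESPACE      NAME     CLASS     HOSTS                ADDRESS         PORTS
my-namespace   my-app   traefik   my-app.example.lan   192.168.2.244   80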
Verify
kubectl get svc -n traefik
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
traefik LoadBalancer 10.96.x.x 192.168.2.244 80:xxxxx/TCP,443:xxxxx/TCP
The EXTERNAL-IP column should show your pinned IP. If it shows <pending>, MetalLB isn’t
running or the IP pool isn’t configured.
How Ingress Routing Works
Once Traefik is running, any Ingress resource with ingressClassName: traefik gets picked
up automatically:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-app
namespace: my-namespace
spec:
ingressClassName: traefik
rules:
- host: my-app.example.lan
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-app
port:
number: 8080
All traffic to my-app.example.lan hits Traefik’s LoadBalancer IP, and Traefik forwards it
to the my-app service on port 8080. You just need DNS pointing the hostname at Traefik’s IP.
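How you create that record depends on your DNS server. As a sketch, with dnsmasq (used by Pi-hole, OpenWrt, and many router firmwares), one wildcard line resolves every name under the domain to Traefik; the file path is illustrative:
# /etc/dnsmasq.d/10-traefik.conf
address=/media.lan/192.168.2.244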
*.media.lan → 192.168.2.244 handles all services behind Traefik at once.
Access the Traefik dashboard without exposing it publicly:
kubectl port-forward -n traefik svc/traefik 9000:9000
Then open http://localhost:9000/dashboard/ to see all active routes and entrypoints.
NFS CSI Driver: Persistent Storage
Kubernetes pods are ephemeral. Config files, databases, and media libraries need persistent volumes. On a homelab, NFS is the simplest shared storage backend.
Talos Linux has no package manager, so you can’t install nfs-utils on the host. The NFS CSI
driver solves this by handling NFS mounts inside the CSI pod itself.
Install
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
-n kube-system \
--wait --timeout 120s
Create a StorageClass
The StorageClass tells the CSI driver where to provision volumes. Each PVC gets its own subdirectory under the NFS share:
# foundation/nfs-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-appdata
provisioner: nfs.csi.k8s.io
parameters:
server: "<NAS_IP>" # e.g. 192.168.2.129
share: "<NFS_SHARE_PATH>" # e.g. /volume1/nfs01/k8s-appdata
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
- nfsvers=3
- nolock
kubectl apply -f foundation/nfs-storageclass.yaml
reclaimPolicy: Retain keeps the data on the NAS even if the PVC is deleted. For throwaway
workloads, use Delete.
Using It
Any PVC referencing this StorageClass gets a dynamically provisioned NFS volume:
persistence:
config:
type: persistentVolumeClaim
accessMode: ReadWriteOnce
size: 2Gi
storageClass: nfs-appdata
globalMounts:
- path: /config
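The block above is Helm values in the bjw-s app-template format (the chart mentioned under What’s Next); the raw Kubernetes equivalent is a plain PersistentVolumeClaim. A minimal sketch, with illustrative name and namespace:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-config
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-appdata
  resources:
    requests:
      storage: 2Gi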
Shared Volumes (ReadWriteMany)
For workloads that need to share the same filesystem (e.g., a download client and a media manager both accessing the same files), create a static PV/PVC pair:
apiVersion: v1
kind: PersistentVolume
metadata:
name: shared-data-pv
spec:
capacity:
storage: 10Ti
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
csi:
driver: nfs.csi.k8s.io
volumeHandle: shared-data-pv
volumeAttributes:
server: "<NAS_IP>"
share: "<NFS_DATA_PATH>"
mountOptions:
- nfsvers=3
- nolock
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-data
namespace: my-namespace
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
resources:
requests:
storage: 10Ti
volumeName: shared-data-pv
Multiple pods can mount this PVC simultaneously with full read/write access.
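As a sketch of what consuming it looks like, each workload simply references the claim in its pod spec; the container name, image, and mount path below are illustrative:
# Excerpt from a Deployment's pod spec
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-data
containers:
  - name: my-app
    image: ghcr.io/example/my-app:latest
    volumeMounts:
      - name: data
        mountPath: /data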
If a pod is stuck in ContainerCreating, check kubectl describe pod for mount errors and verify the NFS export permissions on your NAS.
NFSv3 with nolock is simple but has no authentication. For anything beyond a trusted LAN, use NFSv4 with Kerberos or a dedicated storage system (Ceph, Longhorn, OpenEBS).
Deployment Order
“Dependency management is just graph theory with better error messages.” - Engineer doing incident response at 3am
Order matters. MetalLB must be running before Traefik can get an IP, and the CSI driver must be running before any PVCs can bind:
# 1. Helm repos
helm repo add metallb https://metallb.github.io/metallb
helm repo add traefik https://traefik.github.io/charts
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
# 2. MetalLB
helm upgrade --install metallb metallb/metallb -n metallb-system --create-namespace --wait
kubectl rollout status -n metallb-system deploy/metallb-controller --timeout=90s
kubectl apply -f foundation/metallb-config.yaml
# 3. NFS CSI
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs -n kube-system --wait
kubectl apply -f foundation/nfs-storageclass.yaml
# 4. Traefik
helm upgrade --install traefik traefik/traefik -n traefik --create-namespace \
-f foundation/traefik-values.yaml --wait
Verify
# MetalLB
kubectl get pods -n metallb-system
kubectl get ipaddresspools -n metallb-system
# NFS CSI
kubectl get pods -n kube-system -l app.kubernetes.io/name=csi-driver-nfs
kubectl get storageclass
# Traefik
kubectl get svc -n traefik
kubectl get pods -n traefik
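To smoke test the storage path end to end, a throwaway PVC (names are illustrative) should bind within a few seconds, since the StorageClass uses volumeBindingMode: Immediate:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-smoke-test
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-appdata
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc -n default nfs-smoke-test   # STATUS should be Bound
kubectl delete pvc -n default nfs-smoke-test
Because the StorageClass uses reclaimPolicy: Retain, the released PV and its subdirectory on the NAS stick around after deletion; remove them manually if you want a clean slate.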
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| LoadBalancer IP shows <pending> | MetalLB not running or no IP pool | Check MetalLB pods, verify IPAddressPool exists |
| Traefik gets wrong IP | Pool exhausted or no annotation | Pin IP with metallb.universe.tf/loadBalancerIPs |
| PVC stuck in Pending | CSI driver not ready or NFS unreachable | Check CSI pods, verify NFS export and network |
| Pod stuck in ContainerCreating | NFS mount failed | kubectl describe pod for mount errors, check NAS permissions |
| Ingress returns 404 | Wrong ingressClassName or host mismatch | Verify ingressClassName: traefik and DNS |
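For the 404 case, sending the Host header straight at Traefik’s IP separates routing problems from DNS problems (the hostname is illustrative):
curl -H "Host: my-app.example.lan" http://<TRAEFIK_IP>/
# 404 here as well -> Traefik has no matching route: check ingressClassName, host, and namespace
# 200/302 here     -> routing works: the problem is DNS resolution on the client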
What’s Next
The foundation is in place. Next steps:
- Deploy a media stack on top of this infrastructure using Helm and the bjw-s app-template chart
- Migrate to Flux GitOps for automated delivery from Git