Bare-metal LoadBalancer, ingress controller, and persistent storage for a Talos Kubernetes cluster. The foundation before deploying workloads.
“Why doesn’t LoadBalancer just work?” - Welcome to bare metal, where nothing is free
Problem
Fresh Kubernetes cluster. Runs perfectly. Does nothing useful. You try to expose a service with type: LoadBalancer and it stays <pending> forever. You create a PersistentVolumeClaim and it stays unbound. You want to access apps via hostname, but there’s no ingress controller.
Cloud providers give you these primitives for free. On bare metal, you build them yourself. It’s not hard, but you have to know the pieces.
Solution
Three Helm charts turn your cluster from “technically running” to “actually useful”:
| Component | What Problem It Solves |
|---|---|
| MetalLB | Makes type: LoadBalancer actually work by assigning real LAN IPs |
| Traefik | Routes app.domain.local to the right pod without 8 different IPs |
| NFS CSI Driver | Lets pods use your NAS for persistent storage (works with Talos) |
I’ve deployed this foundation on four different clusters. Same pattern every time. Install these three, then everything else just works.
Full source: k8s-media-stack (foundation/ directory)
MetalLB: Bare-Metal LoadBalancer
In cloud Kubernetes, you set type: LoadBalancer and get an IP. On bare metal, you get <pending> and confusion. The first time it happened, I spent 30 minutes thinking my cluster was broken.
MetalLB makes LoadBalancer work. You give it a pool of IPs from your LAN. It watches for LoadBalancer services and assigns them IPs from that pool. In L2 mode, it responds to ARP requests so any device on your network can reach those IPs.
Think of it as a software load balancer that speaks ARP.
Install
helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm upgrade --install metallb metallb/metallb \
-n metallb-system --create-namespace \
--wait --timeout 120s
Wait for the controller to be ready before applying the config:
kubectl rollout status -n metallb-system deploy/metallb-controller --timeout=90s
Configure the IP Pool
Pick a range of IPs outside your DHCP scope. I use the top of my subnet (192.168.2.244-254). Check your router’s DHCP settings to see what’s actually in use.
# foundation/metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: lan-pool
namespace: metallb-system
spec:
addresses:
- <LB_RANGE_START>-<LB_RANGE_END> # e.g. 192.168.2.244-192.168.2.254
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: lan-l2
namespace: metallb-system
spec:
ipAddressPools:
- lan-pool
kubectl apply -f foundation/metallb-config.yaml
Smoke test MetalLB immediately after configuring the IP pool. Deploy a quick nginx service and confirm it gets an external IP:
kubectl create deployment nginx-test --image=nginx
kubectl expose deployment nginx-test --type=LoadBalancer --port=80
kubectl get svc nginx-test # EXTERNAL-IP should appear within seconds
curl http://<EXTERNAL-IP> # Should return the nginx welcome page
kubectl delete deployment nginx-test && kubectl delete svc nginx-test
If the IP stays <pending>, troubleshoot MetalLB before adding Traefik on top.
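A few places to look when that happens (assuming the default Helm release name metallb, which names the speaker DaemonSet metallb-speaker):
kubectl describe svc nginx-test                                # Events explain why no IP was assigned
kubectl get ipaddresspools,l2advertisements -n metallb-system  # Both the pool and the L2Advertisement must exist
kubectl logs -n metallb-system deploy/metallb-controller
kubectl logs -n metallb-system ds/metallb-speaker              # ARP announcements are logged here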
“I’ll just remember which IP is for what.” - Narrator: They did not remember
Traefik: Ingress Controller
Without an ingress controller, every service needs its own LoadBalancer IP. Got 8 apps? That’s 8 IPs to remember. Worse, you’re accessing everything by IP and port like it’s 2010.
Traefik solves this. It gets one LoadBalancer IP from MetalLB. You point *.media.lan at that IP in DNS. Traefik looks at the hostname in each request and routes to the right pod. One IP, clean hostnames, TLS termination included.
Install
helm repo add traefik https://traefik.github.io/charts
helm repo update
helm upgrade --install traefik traefik/traefik \
-n traefik --create-namespace \
-f foundation/traefik-values.yaml \
--wait --timeout 120s
Helm Values
# foundation/traefik-values.yaml
service:
type: LoadBalancer
annotations:
metallb.universe.tf/loadBalancerIPs: "<TRAEFIK_IP>" # e.g. 192.168.2.244
ports:
web:
exposedPort: 80
websecure:
exposedPort: 443
ingressRoute:
dashboard:
enabled: false # Access via port-forward if needed
providers:
kubernetesIngress:
enabled: true
publishedService:
enabled: true
logs:
general:
level: INFO
The metallb.universe.tf/loadBalancerIPs annotation pins Traefik to a specific IP. Without
it, MetalLB assigns the next available address from the pool.
publishedService updates the status.loadBalancer field on Ingress resources so
kubectl get ingress shows the external IP.
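Illustrative output once an Ingress is routed (using the example Ingress from the routing section below):
kubectl get ingress -A
NAMESPACE      NAME     CLASS     HOSTS                ADDRESS         PORTS
my-namespace   my-app   traefik   my-app.example.lan   192.168.2.244   80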
Verify
kubectl get svc -n traefik
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
traefik LoadBalancer 10.96.x.x 192.168.2.244 80:xxxxx/TCP,443:xxxxx/TCP
The EXTERNAL-IP column should show your pinned IP. If it shows <pending>, MetalLB isn’t
running or the IP pool isn’t configured.
How Ingress Routing Works
Once Traefik is running, any Ingress resource with ingressClassName: traefik gets picked
up automatically:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: my-app
namespace: my-namespace
spec:
ingressClassName: traefik
rules:
- host: my-app.example.lan
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-app
port:
number: 8080
All traffic to my-app.example.lan hits Traefik’s LoadBalancer IP, and Traefik forwards it
to the my-app service on port 8080. You just need DNS pointing the hostname at Traefik’s IP.
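How you create that record depends on your DNS server. As a sketch, with dnsmasq (used by Pi-hole, OpenWrt, and many router firmwares), one wildcard line resolves every name under the domain to Traefik; the file path is illustrative:
# /etc/dnsmasq.d/10-traefik.conf
address=/media.lan/192.168.2.244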
*.media.lan → 192.168.2.244 handles all services behind Traefik at once.
Access the Traefik dashboard without exposing it publicly:
kubectl port-forward -n traefik svc/traefik 9000:9000
Then open http://localhost:9000/dashboard/ to see all active routes and entrypoints.
NFS CSI Driver: Persistent Storage
Kubernetes pods are ephemeral. Config files, databases, and media libraries need persistent volumes. On a homelab, NFS is the simplest shared storage backend.
Talos Linux has no package manager, so you can’t install nfs-utils on the host. The NFS CSI
driver solves this by handling NFS mounts inside the CSI pod itself.
Install
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
-n kube-system \
--wait --timeout 120s
Create a StorageClass
The StorageClass tells the CSI driver where to provision volumes. Each PVC gets its own subdirectory under the NFS share:
# foundation/nfs-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-appdata
provisioner: nfs.csi.k8s.io
parameters:
server: "<NAS_IP>" # e.g. 192.168.2.129
share: "<NFS_SHARE_PATH>" # e.g. /volume1/nfs01/k8s-appdata
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
- nfsvers=3
- nolock
kubectl apply -f foundation/nfs-storageclass.yaml
reclaimPolicy: Retain keeps the data on the NAS even if the PVC is deleted. For throwaway
workloads, use Delete.
Using It
Any PVC referencing this StorageClass gets a dynamically provisioned NFS volume:
persistence:
config:
type: persistentVolumeClaim
accessMode: ReadWriteOnce
size: 2Gi
storageClass: nfs-appdata
globalMounts:
- path: /config
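The block above is Helm values in the bjw-s app-template format (the chart mentioned under What’s Next); the raw Kubernetes equivalent is a plain PersistentVolumeClaim. A minimal sketch, with illustrative name and namespace:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-config
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-appdata
  resources:
    requests:
      storage: 2Gi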
Shared Volumes (ReadWriteMany)
For workloads that need to share the same filesystem (e.g., a download client and a media manager both accessing the same files), create a static PV/PVC pair:
apiVersion: v1
kind: PersistentVolume
metadata:
name: shared-data-pv
spec:
capacity:
storage: 10Ti
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
csi:
driver: nfs.csi.k8s.io
volumeHandle: shared-data-pv
volumeAttributes:
server: "<NAS_IP>"
share: "<NFS_DATA_PATH>"
mountOptions:
- nfsvers=3
- nolock
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-data
namespace: my-namespace
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
resources:
requests:
storage: 10Ti
volumeName: shared-data-pv
Multiple pods can mount this PVC simultaneously with full read/write access.
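As a sketch of what consuming it looks like, each workload simply references the claim in its pod spec; the container name, image, and mount path below are illustrative:
# Excerpt from a Deployment's pod spec
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-data
containers:
  - name: my-app
    image: ghcr.io/example/my-app:latest
    volumeMounts:
      - name: data
        mountPath: /data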
If a pod is stuck in ContainerCreating, check kubectl describe pod for mount errors and verify the NFS export permissions on your NAS.
NFSv3 with nolock is simple but has no authentication. For anything beyond a trusted LAN, use NFSv4 with Kerberos or a dedicated storage system (Ceph, Longhorn, OpenEBS).
Deployment Order
“Dependency management is just graph theory with better error messages.” - Engineer doing incident response at 3am
Order matters. MetalLB must be running before Traefik can get an IP, and the CSI driver must be running before any PVCs can bind:
# 1. Helm repos
helm repo add metallb https://metallb.github.io/metallb
helm repo add traefik https://traefik.github.io/charts
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
# 2. MetalLB
helm upgrade --install metallb metallb/metallb -n metallb-system --create-namespace --wait
kubectl rollout status -n metallb-system deploy/metallb-controller --timeout=90s
kubectl apply -f foundation/metallb-config.yaml
# 3. NFS CSI
helm upgrade --install csi-driver-nfs csi-driver-nfs/csi-driver-nfs -n kube-system --wait
kubectl apply -f foundation/nfs-storageclass.yaml
# 4. Traefik
helm upgrade --install traefik traefik/traefik -n traefik --create-namespace \
-f foundation/traefik-values.yaml --wait
Verify
# MetalLB
kubectl get pods -n metallb-system
kubectl get ipaddresspools -n metallb-system
# NFS CSI
kubectl get pods -n kube-system -l app.kubernetes.io/name=csi-driver-nfs
kubectl get storageclass
# Traefik
kubectl get svc -n traefik
kubectl get pods -n traefik
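To smoke test the storage path end to end, a throwaway PVC (names are illustrative) should bind within a few seconds, since the StorageClass uses volumeBindingMode: Immediate:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-smoke-test
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-appdata
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc -n default nfs-smoke-test   # STATUS should be Bound
kubectl delete pvc -n default nfs-smoke-test
Because the StorageClass uses reclaimPolicy: Retain, the released PV and its subdirectory on the NAS stick around after deletion; remove them manually if you want a clean slate.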
Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| LoadBalancer IP shows <pending> | MetalLB not running or no IP pool | Check MetalLB pods, verify IPAddressPool exists |
| Traefik gets wrong IP | Pool exhausted or no annotation | Pin IP with metallb.universe.tf/loadBalancerIPs |
| PVC stuck in Pending | CSI driver not ready or NFS unreachable | Check CSI pods, verify NFS export and network |
| Pod stuck in ContainerCreating | NFS mount failed | kubectl describe pod for mount errors, check NAS permissions |
| Ingress returns 404 | Wrong ingressClassName or host mismatch | Verify ingressClassName: traefik and DNS |
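For the 404 case, sending the Host header straight at Traefik’s IP separates routing problems from DNS problems (the hostname is illustrative):
curl -H "Host: my-app.example.lan" http://<TRAEFIK_IP>/
# 404 here as well -> Traefik has no matching route: check ingressClassName, host, and namespace
# 200/302 here     -> routing works: the problem is DNS resolution on the client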
What’s Next
The foundation is in place. Next steps:
- Deploy a media stack on top of this infrastructure using Helm and the bjw-s app-template chart
- Migrate to Flux GitOps for automated delivery from Git