Component: OfflineFixJob CRD + NBD Prep + Offline-Fix VM
Version: v1.0.0
Prerequisites: Kubernetes cluster, KubeVirt installed
This guide deploys the complete Phase 4 offline-fix system:
┌─────────────────────────────────────────────────────────────┐
│ OfflineFixJob CR │
└──────────────────┬──────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Controller (Python/kopf) │
│ • Selects NBD-capable node │
│ • Annotates node for DaemonSet │
└──────────────────┬──────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ NBD Prep DaemonSet │
│ • Attaches disk to NBD device │
│ • Mounts filesystem to /var/lib/kubevirt-offline/ │
└──────────────────┬──────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ KubeVirt VM (offline-fix-vm) │
│ • Mounts HostDisk from NBD mount │
│ • Runs Phase 3 fixers (fstab, initramfs, grub, selinux) │
└──────────────────┬──────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Cleanup │
│ • DaemonSet unmounts and disconnects NBD │
└─────────────────────────────────────────────────────────────┘
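The lifecycle above maps to the job phases reported in `.status`. The phase names below are taken from the expected output later in this guide; the transition table itself is an illustrative sketch, not the controller's actual code:

```python
from enum import Enum
from typing import Optional

class Phase(str, Enum):
    """Phases an OfflineFixJob moves through, matching the flow above."""
    PENDING = "Pending"
    NBD_PREPARED = "NBDPrepared"
    VM_RUNNING = "VMRunning"
    FIXING = "Fixing"
    COMPLETED = "Completed"

# Allowed forward transitions; cleanup runs after Completed.
TRANSITIONS = {
    Phase.PENDING: Phase.NBD_PREPARED,
    Phase.NBD_PREPARED: Phase.VM_RUNNING,
    Phase.VM_RUNNING: Phase.FIXING,
    Phase.FIXING: Phase.COMPLETED,
}

def next_phase(current: Phase) -> Optional[Phase]:
    """Return the next phase, or None once the job has completed."""
    return TRANSITIONS.get(current)
```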
Nodes that will run offline-fix operations need:
# NBD kernel module
sudo modprobe nbd max_part=16
lsmod | grep nbd
# Required tools
sudo apt-get install -y qemu-utils lvm2
# Directory for NBD mounts
sudo mkdir -p /var/lib/kubevirt-offline
sudo mkdir -p /var/lib/imports
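These prerequisites can also be checked programmatically. A small sketch — the `root` parameter exists only so the check can be exercised against a fake filesystem tree, and the helper itself is not part of the shipped tooling:

```python
import os

REQUIRED_DIRS = ["/var/lib/kubevirt-offline", "/var/lib/imports"]

def node_ready(root: str = "/") -> list:
    """Return a list of problems found; an empty list means the node
    satisfies the offline-fix prerequisites checked here."""
    problems = []
    modules = os.path.join(root, "proc/modules")
    try:
        with open(modules) as f:
            # /proc/modules lists one loaded module per line, name first.
            if not any(line.startswith("nbd ") for line in f):
                problems.append("nbd kernel module not loaded")
    except OSError:
        problems.append("cannot read " + modules)
    for d in REQUIRED_DIRS:
        if not os.path.isdir(os.path.join(root, d.lstrip("/"))):
            problems.append("missing directory " + d)
    return problems
```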
Verify KubeVirt is installed:
kubectl get kubevirt -n kubevirt
kubectl get vmi -A
cd /path/to/hyper2kvm
# Set registry (change to your registry)
export REGISTRY=quay.io/yourusername
export VERSION=v1.0.0
# Build images
./scripts/build-phase4-images.sh
# Push to registry
docker push $REGISTRY/nbd-prep:$VERSION
docker push $REGISTRY/offline-fix-vm:$VERSION
If using pre-built images from quay.io/hyper2kvm, skip to Step 2.
Deploy the OfflineFixJob Custom Resource Definition:
kubectl apply -f k8s/operator/crds/offlinefixjob.yaml
Verify:
kubectl get crd offlinefixjobs.hyper2kvm.io
kubectl api-resources | grep offlinefixjob
Label nodes that can perform NBD operations:
# Check nodes
kubectl get nodes
# Label nodes (adjust node names)
kubectl label node worker-1 hyper2kvm.io/nbd-capable=true
kubectl label node worker-2 hyper2kvm.io/nbd-capable=true
# Verify
kubectl get nodes -L hyper2kvm.io/nbd-capable
Edit k8s/daemon/nbd-prep-daemonset.yaml if using custom registry:
containers:
  - name: nbd-prep
    image: quay.io/yourusername/nbd-prep:v1.0.0  # Change this
kubectl apply -f k8s/daemon/nbd-prep-daemonset.yaml
# Check DaemonSet
kubectl get ds -n hyper2kvm-system nbd-prep
# Check pods on labeled nodes
kubectl get pods -n hyper2kvm-system -l app=nbd-prep -o wide
# Check logs
kubectl logs -n hyper2kvm-system -l app=nbd-prep --tail=50
Expected output:
NBD Prep Daemon starting on node: worker-1
Loading NBD kernel module
NBD module loaded successfully
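Under the hood, the daemon's attach step is essentially qemu-nbd plus a mount. A minimal sketch of the command sequence follows; the device name, partition index, and mountpoint here are illustrative placeholders (the real daemon selects them per job), and the builder functions themselves are hypothetical:

```python
def nbd_attach_commands(image_path, fmt="qcow2",
                        device="/dev/nbd0",
                        mountpoint="/var/lib/kubevirt-offline/job0"):
    """Build the command sequence that exposes a disk image on an NBD
    device and mounts its first partition."""
    return [
        ["qemu-nbd", "--connect", device, "--format", fmt, image_path],
        ["partprobe", device],                 # re-read the partition table
        ["mount", device + "p1", mountpoint],  # mount the first partition
    ]

def nbd_detach_commands(device="/dev/nbd0",
                        mountpoint="/var/lib/kubevirt-offline/job0"):
    """Reverse of attach: unmount, then disconnect the NBD device."""
    return [
        ["umount", mountpoint],
        ["qemu-nbd", "--disconnect", device],
    ]
```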
The OfflineFixJob controller runs as part of the main operator.
If you’re rebuilding the operator to include Phase 4:
cd /path/to/hyper2kvm
# Build operator image
docker build -t quay.io/yourusername/operator:v1.5.0 .
docker push quay.io/yourusername/operator:v1.5.0
# If operator not deployed yet
kubectl apply -f k8s/operator/deployment.yaml
# If updating existing operator
kubectl set image deployment/hyper2kvm-operator \
  operator=quay.io/yourusername/operator:v1.5.0 \
  -n hyper2kvm-system
# Restart operator to load new controller
kubectl rollout restart deployment/hyper2kvm-operator -n hyper2kvm-system
# Check operator logs for OfflineFixJob controller registration
kubectl logs -n hyper2kvm-system -l app=hyper2kvm-operator --tail=100 | grep -i offline
Place a test VMDK/qcow2 on NBD-capable nodes:
# On each NBD-capable node
sudo mkdir -p /var/lib/imports
# Copy or download test image
sudo cp /path/to/centos9.qcow2 /var/lib/imports/
# Or download
sudo curl -o /var/lib/imports/centos9.qcow2 https://example.com/centos9.qcow2
# Verify
ls -lh /var/lib/imports/
# test-offlinefixjob.yaml
apiVersion: hyper2kvm.io/v1alpha1
kind: OfflineFixJob
metadata:
  name: test-centos9-fix
  namespace: hyper2kvm-system
spec:
  source:
    disk:
      type: qcow2
      path: /var/lib/imports/centos9.qcow2
  fixes:
    - fstab
    - initramfs
    - grub
    - selinux
  execution:
    mode: kubevirt-vm
    vmImage: quay.io/hyper2kvm/offline-fix-vm:v1.0.0
    resources:
      memory: "2Gi"
      cpu: "2"
  safety:
    readOnly: false
    allowXfsRepair: true
  nodeSelector:
    hyper2kvm.io/nbd-capable: "true"
  timeout: "30m"
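The manifest above can be sanity-checked client-side before applying it. A minimal sketch — the field names mirror the manifest, but the accepted disk types and the validator itself are assumptions, not the CRD's published schema:

```python
KNOWN_FIXES = {"fstab", "initramfs", "grub", "selinux"}

def validate_spec(spec: dict) -> list:
    """Return validation errors for an OfflineFixJob spec dict (empty = OK)."""
    errors = []
    disk = spec.get("source", {}).get("disk", {})
    # Disk types assumed here: the guide mentions qcow2 and VMDK; raw is a guess.
    if disk.get("type") not in ("qcow2", "vmdk", "raw"):
        errors.append("unsupported disk type: %r" % disk.get("type"))
    if not disk.get("path", "").startswith("/"):
        errors.append("source.disk.path must be absolute")
    unknown = set(spec.get("fixes", [])) - KNOWN_FIXES
    if unknown:
        errors.append("unknown fixes: %s" % sorted(unknown))
    return errors
```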
kubectl apply -f test-offlinefixjob.yaml
# Watch job status
kubectl get offlinefixjob -n hyper2kvm-system -w
# Expected phases:
# Pending → NBDPrepared → VMRunning → Fixing → Completed
# Get job details
kubectl describe offlinefixjob test-centos9-fix -n hyper2kvm-system
# Check conditions
kubectl get offlinefixjob test-centos9-fix -n hyper2kvm-system -o jsonpath='{.status.conditions}' | jq
# NBD prep daemon logs
kubectl logs -n hyper2kvm-system -l app=nbd-prep --tail=50 -f
# Operator logs
kubectl logs -n hyper2kvm-system -l app=hyper2kvm-operator --tail=50 -f
# VM logs (when running)
VM_NAME=$(kubectl get offlinefixjob test-centos9-fix -n hyper2kvm-system -o jsonpath='{.status.vmName}')
kubectl logs -n hyper2kvm-system $VM_NAME
# See NBD status on node
NODE=$(kubectl get offlinefixjob test-centos9-fix -n hyper2kvm-system -o jsonpath='{.status.node}')
kubectl get node $NODE -o jsonpath='{.metadata.annotations}' | jq | grep offlinefix
# Final status
kubectl get offlinefixjob test-centos9-fix -n hyper2kvm-system -o yaml
# Check result
kubectl get offlinefixjob test-centos9-fix -n hyper2kvm-system \
  -o jsonpath='{.status.result}' | jq
Expected output:
{
  "success": true,
  "operations": [
    {
      "operation": "fstab",
      "success": true,
      "message": "...",
      "durationSeconds": 2.5
    },
    {
      "operation": "initramfs",
      "success": true,
      "message": "...",
      "durationSeconds": 45.2
    },
    {
      "operation": "grub",
      "success": true,
      "message": "...",
      "durationSeconds": 12.1
    },
    {
      "operation": "selinux",
      "success": true,
      "message": "...",
      "durationSeconds": 0.3
    }
  ],
  "bootConfidence": 95
}
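A result document in this shape is easy to post-process. A small helper, using the field names from the expected output above (the helper itself is hypothetical, not part of the operator):

```python
def summarize(result: dict) -> str:
    """One-line summary of an OfflineFixJob result document."""
    ops = result.get("operations", [])
    failed = [o["operation"] for o in ops if not o.get("success")]
    total = sum(o.get("durationSeconds", 0) for o in ops)
    status = "OK" if result.get("success") and not failed else "FAILED: %s" % failed
    return "%s | %d ops in %.1fs | boot confidence %s" % (
        status, len(ops), total, result.get("bootConfidence", "n/a"))
```

It can be fed directly from the `-o jsonpath='{.status.result}'` output, e.g. `summarize(json.loads(raw))`.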
kubectl get offlinefixjob test-centos9-fix -n hyper2kvm-system \
  -o jsonpath='{.status.result.bootConfidence}'
kubectl delete offlinefixjob test-centos9-fix -n hyper2kvm-system
This triggers automatic cleanup:
# Check VM deleted
kubectl get vmi -n hyper2kvm-system
# Check node annotations cleared
kubectl get node $NODE -o jsonpath='{.metadata.annotations}' | jq | grep offlinefix
# Check NBD disconnected (on node)
ssh $NODE "lsblk | grep nbd"
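The annotation check above can be scripted as well. A small filter helper — the annotation keys shown are illustrative, but the `offlinefix` prefix matches the grep used above:

```python
def leftover_offlinefix_annotations(annotations: dict) -> dict:
    """Keep only offlinefix-related node annotations; an empty result
    means the controller's cleanup completed."""
    return {k: v for k, v in annotations.items() if "offlinefix" in k}
```

Feed it the parsed output of `kubectl get node $NODE -o jsonpath='{.metadata.annotations}'`.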
# Check DaemonSet logs
kubectl logs -n hyper2kvm-system -l app=nbd-prep
# Check NBD module on node
kubectl exec -n hyper2kvm-system <nbd-prep-pod> -- lsmod | grep nbd
# Check NBD devices
kubectl exec -n hyper2kvm-system <nbd-prep-pod> -- ls -la /dev/nbd*
# Check KubeVirt
kubectl get kubevirt -n kubevirt
# Check VMI status
kubectl get vmi -n hyper2kvm-system
# Check VMI events
kubectl describe vmi <vm-name> -n hyper2kvm-system
# Check virt-handler logs
kubectl logs -n kubevirt -l kubevirt.io=virt-handler --tail=100
# Check VM logs
kubectl logs -n hyper2kvm-system <vm-name>
# Check if /vmroot is mounted correctly
kubectl exec -n hyper2kvm-system <vm-name> -- ls -la /vmroot/etc/
# Check job spec ConfigMap
kubectl get cm offline-fix-spec-test-centos9-fix -n hyper2kvm-system -o yaml
# Check node annotations
kubectl get node $NODE -o jsonpath='{.metadata.annotations}' | jq
# Manually trigger cleanup (if needed)
kubectl annotate node $NODE offlinefix.hyper2kvm.io/cleanup=true
# Check DaemonSet logs
kubectl logs -n hyper2kvm-system -l app=nbd-prep --tail=50
Before deploying to production:
To integrate OfflineFixJob with the Migration CRD:
Add Prometheus metrics:
Phase 4 deployment provides:
✅ Kubernetes-native offline VM repair
✅ KubeVirt-safe architecture
✅ Production-ready error handling
✅ Clean lifecycle management
✅ Integration with Phase 3 fixers
The system is now ready for production use!