Comprehensive best practices for VM migrations with Hyper2KVM, based on real-world experience and lessons learned.
✅ DO:
# Test with non-critical VM first
command: local
vmdk: /vms/test-dev-server.vmdk
output_dir: /kvm/test
to_output: test-server.qcow2
# Enable testing
libvirt_test: true
❌ DON’T:
# Convert a critical production VM as your very first migration
Why: Testing identifies issues before they impact production.
✅ DO:
# Essential options for Linux VMs
fstab_mode: stabilize-all
regen_initramfs: true
update_grub: true
# Essential for Windows VMs
inject_virtio_drivers: true
windows_version: 2019
❌ DON’T:
# Missing critical options
fstab_mode: preserve # Will likely cause boot failures
regen_initramfs: false # Missing VirtIO drivers
Why: These options prevent the most common boot failures (missing VirtIO drivers, stale fstab device references).
✅ DO:
# Always inspect VMDK first
./scripts/vmdk_inspect.py /vms/production.vmdk > inspection-report.txt
# Review the report
cat inspection-report.txt
# Check for issues
grep -i "issue\|warning\|error" inspection-report.txt
❌ DON’T:
# Convert blindly without inspecting the VMDK first
Why: Early detection of issues saves time and prevents failures.
✅ DO:
# Keep originals by default
keep_original: true
# Don't delete source VMs immediately
# Keep for 1-2 weeks after successful migration
❌ DON’T:
# Delete source VMs as soon as conversion finishes
keep_original: false # No way back if problems surface later
Why: Easy rollback if issues are discovered later.
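Keeping originals is far more useful with an integrity record next to them. A minimal sketch, assuming nothing about Hyper2KVM itself (paths and the function name are illustrative):

```shell
# Archive a source VMDK with a checksum before any cleanup.
archive_original() {
    local src="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/"
    # Record a checksum next to the copy so the archive can be verified later
    ( cd "$dest_dir" && sha256sum "$(basename "$src")" > "$(basename "$src").sha256" )
}

# Weeks later, before finally deleting anything:
# ( cd /archive/vmdk && sha256sum -c vm.vmdk.sha256 )
```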
✅ DO:
Document every migration: VM name, date, config used, issues encountered, fixes applied.
❌ DON’T:
Rely on memory for what was changed and why.
Why: Documentation helps troubleshooting and future migrations.
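One lightweight way to capture this is a small record file per VM; a sketch (the field names are illustrative, not a Hyper2KVM format):

```shell
# Write a per-VM migration record alongside the logs (fields illustrative).
record_migration() {
    local vm="$1" out="$2"
    cat > "$out" <<EOF
vm: $vm
date: $(date -u +%Y-%m-%dT%H:%M:%SZ)
operator: ${USER:-unknown}
source_hypervisor: vmware
target_hypervisor: kvm
EOF
}

# record_migration web-01 /var/log/hyper2kvm/web-01.record.yaml
```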
✅ DO:
# Remote fetch with SSH key
command: fetch-and-fix
host: esxi-01.example.com
user: root
identity: ~/.ssh/migration_key # Dedicated key
remote: /vmfs/volumes/datastore1/vm.vmdk
output_dir: /kvm/vms
# Generate dedicated key for migrations
ssh-keygen -t rsa -b 4096 -f ~/.ssh/migration_key -C "migration-key"
# Use strong passphrase
# Store passphrase in secure vault
❌ DON’T:
# Don't store passwords in configs
vcenter_password: "MyPassword123" # BAD!
Why: SSH keys are more secure and auditable.
✅ DO:
Use dedicated, least-privilege accounts and keys that exist only for migrations.
❌ DON’T:
Reuse a shared root account and key across every host.
Why: Limits potential damage from compromised credentials.
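On the source host, the dedicated migration key can be constrained with standard authorized_keys options; a sketch (the option set is illustrative; tune it to what the fetch actually needs):

```shell
# Append a restricted authorized_keys entry for the migration key.
restrict_key() {
    local pubkey="$1" authorized_keys="$2"
    printf 'no-port-forwarding,no-agent-forwarding,no-X11-forwarding %s\n' \
        "$pubkey" >> "$authorized_keys"
}

# restrict_key "$(cat ~/.ssh/migration_key.pub)" /root/.ssh/authorized_keys
```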
✅ DO:
# Encrypt QCOW2 images if they contain sensitive data
# Define the passphrase secret (sec0) and reference it during conversion
qemu-img convert -O qcow2 \
  --object secret,id=sec0,file=passphrase.txt \
  -o encrypt.format=luks,encrypt.key-secret=sec0 \
  input.qcow2 encrypted-output.qcow2
# Use encrypted channels for remote transfers
# SSH, HTTPS, VPN
❌ DON’T:
# Transfer disk images over unencrypted channels (plain HTTP, NFS across a WAN)
Why: Protects data in transit and at rest.
✅ DO:
# Enable detailed logging
log_level: INFO
log_file: /var/log/hyper2kvm/migration-${VM_NAME}.log
# Keep logs for compliance
# Archive logs for audit trail
# Review logs after migration
grep -i "error\|warning" /var/log/hyper2kvm/*.log
❌ DON’T:
# Run migrations with logging disabled, or discard logs right after
Why: Logs provide audit trail and help with security investigations.
✅ DO:
# After migration, verify security
# Check firewall rules
iptables -L -n
# Check SELinux status
getenforce
# Check open ports
ss -tlnp
# Verify SSH configuration
sshd -T | grep -i permit
❌ DON’T:
# Assume firewall, SELinux, and SSH settings survived unchanged
Why: Migration may change security configurations.
✅ DO:
# For databases and high-I/O workloads
out_format: raw
compress: false
# For general workloads
out_format: qcow2
compress: false
# For storage-constrained environments
out_format: qcow2
compress: true
compression_level: 6
❌ DON’T:
# Don't compress database VMs
out_format: qcow2
compress: true # Adds CPU overhead
Why: Format impacts runtime performance.
Performance Comparison: raw gives the best I/O throughput; qcow2 adds snapshots and thin provisioning at a modest overhead; compressed qcow2 trades CPU and conversion speed for the smallest footprint.
✅ DO:
# Use fast local storage for conversion
conversion_dir: /fast/ssd/temp
output_dir: /fast/storage/vms
❌ DON’T:
# Convert on slow network storage
conversion_dir: /mnt/slow-nfs/temp # I/O-bound conversions crawl
Why: Migration speed is often I/O-bound.
✅ DO:
# Batch with appropriate parallelism
batch_parallel: 3 # For 4-core system
# Consider system resources
# CPUs: 4 → parallel: 2-3
# CPUs: 8 → parallel: 4-6
# CPUs: 16 → parallel: 8-12
❌ DON’T:
# Too many parallel migrations
batch_parallel: 10 # On 4-core system - BAD!
Why: Over-parallelization causes resource contention.
Rule of Thumb: parallel = (CPU_count / 2) to (CPU_count - 1)
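The rule of thumb above can be computed directly; a sketch (cap the result further if memory or I/O, not CPU, is the constraint):

```shell
# Suggest a batch_parallel range from the CPU count:
# lower bound CPU/2, upper bound CPU-1, never below 1.
suggest_parallel() {
    local cpus="$1"
    local lo=$(( cpus / 2 )) hi=$(( cpus - 1 ))
    if [ "$lo" -lt 1 ]; then lo=1; fi
    if [ "$hi" -lt "$lo" ]; then hi="$lo"; fi
    echo "$lo $hi"
}

# suggest_parallel "$(nproc)"   # e.g. "2 3" on a 4-core host
```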
✅ DO:
# Monitor during migration
watch -n 2 'ps aux | grep [h]yper2kvm; iostat -x 1 1; free -h'
# Check for bottlenecks
# CPU: Should be high (80-95%)
# I/O wait: Should be low (<20%)
# Memory: Should have 2+ GB available
❌ DON’T:
# Start a batch and walk away until it finishes
Why: Monitoring identifies bottlenecks and prevents issues.
✅ DO:
# For remote migrations
command: fetch-and-fix
host: esxi.example.com
user: root
identity: ~/.ssh/id_rsa
# Use compression for slow networks
compress: true
# Adjust timeouts for slow networks
timeout: 7200
network_retry: 5
# Test network speed first
iperf3 -c esxi.example.com
❌ DON’T:
# Pull multi-terabyte images across production links during business hours
Why: Network is often the bottleneck for remote migrations.
✅ DO:
# Use configs that can be re-run
h2kvmctl --config migration.yaml
# If migration fails, fix issue and re-run same command
# Hyper2KVM is designed to be idempotent
❌ DON’T:
# Hand-patch a half-finished migration instead of re-running it
Why: Makes recovery from failures easier.
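Because re-running the same config is safe, a simple retry wrapper covers transient failures; a sketch (the retry count is illustrative):

```shell
# Retry an idempotent migration a few times before giving up.
retry_migration() {
    local cfg="$1" tries="${2:-3}" i
    for i in $(seq 1 "$tries"); do
        h2kvmctl --config "$cfg" && return 0
        echo "attempt $i of $tries failed for $cfg" >&2
    done
    return 1
}

# retry_migration migration.yaml 3
```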
✅ DO:
# Check exit codes
if h2kvmctl --config migration.yaml; then
echo "Migration successful"
# Continue with validation
else
echo "Migration failed"
# Alert team
# Review logs
# Don't proceed to cutover
fi
❌ DON’T:
# Ignore errors
h2kvmctl --config migration.yaml || true # BAD!
Why: Proper error handling prevents cascading failures.
✅ DO:
Migration Phases:
1. Test/Dev VMs (validate process)
2. Non-critical production (validate at scale)
3. Critical production (apply lessons learned)
❌ DON’T:
Migrate everything in one big-bang cutover.
Why: Staged rollouts reduce risk and allow for process improvements.
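The phases can be driven from per-phase lists of config files; a sketch (the file names and the one-config-per-VM h2kvmctl invocation are assumptions):

```shell
# Run every migration config listed in a phase file, stopping on first failure.
run_phase() {
    local list="$1" cfg
    while IFS= read -r cfg; do
        [ -z "$cfg" ] && continue
        h2kvmctl --config "$cfg" || { echo "phase halted at: $cfg" >&2; return 1; }
    done < "$list"
}

# run_phase phase1-test-dev.txt \
#   && run_phase phase2-noncritical.txt \
#   && run_phase phase3-critical.txt
```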
✅ DO:
Keep a tested rollback plan: source VM intact, checksums verified, cutover reversible.
❌ DON’T:
Cut over with no way to return to the source VM.
Why: Rollback capability is your safety net.
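A cheap guard before cutover, sketched under the assumption that the archived source and a checksum file for it still exist (paths illustrative):

```shell
# Refuse cutover unless a rollback path still exists:
# the source image and its checksum record must be present and verify.
can_rollback() {
    local src="$1" sums="$2"
    [ -f "$src" ] && [ -f "$sums" ] || return 1
    ( cd "$(dirname "$sums")" && sha256sum -c --quiet "$(basename "$sums")" )
}

# can_rollback /archive/vm.vmdk /archive/vm.vmdk.sha256 || echo "do not cut over"
```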
✅ DO:
# Multi-level validation
# 1. File integrity
qemu-img check output.qcow2
# 2. Boot test
virsh start vm-name
# 3. Application test
curl http://vm-name/health
# 4. Performance test
run-benchmark.sh
# 5. Integration test
test-dependencies.sh
❌ DON’T:
# Declare success after a boot test alone
Why: Thorough validation catches issues before users do.
✅ DO:
# Analyze VM resource usage before migration
# Right-size based on actual usage, not allocated resources
❌ DON’T:
# Copy allocated CPU/RAM 1:1 without checking actual usage
Why: Migration is an opportunity to optimize costs.
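The arithmetic is simple: size from the observed peak plus headroom, not from the allocation. A toy sketch (the 30% default headroom is an assumption, not a Hyper2KVM setting):

```shell
# Memory target (MB) = observed peak + headroom percentage.
rightsize_mb() {
    local peak_mb="$1" headroom_pct="${2:-30}"
    echo $(( peak_mb + peak_mb * headroom_pct / 100 ))
}

# A VM allocated 16 GiB but peaking at 6 GiB:
# rightsize_mb 6144   # → 7987, so an 8 GiB target is ample
```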
✅ DO:
# Use compression for archival/cold storage
out_format: qcow2
compress: true
compression_level: 9 # Maximum compression
❌ DON’T:
# Don't compress if storage is cheap and abundant
compress: true # Wastes CPU time during migration
Why: Compression trades CPU time for storage savings.
When to Compress:
- Archival or cold-storage images: yes
- Transfers over slow links: yes
- Databases and other high-I/O workloads: no
✅ DO:
# Batch similar VMs together
# Migrate during off-hours to use spare capacity
batch_parallel: 4
❌ DON’T:
# Migrate VMs one at a time during peak hours
Why: Batching maximizes resource utilization.
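Scheduling the batch into a quiet window can be as simple as a cron entry; a sketch (the times and config path are illustrative):

```shell
# Crontab fragment: run the batch at 02:00 on Saturdays and keep the output.
# m  h  dom mon dow  command
# 0  2  *   *   6    h2kvmctl --config /etc/hyper2kvm/batch.yaml >> /var/log/hyper2kvm/batch.log 2>&1
```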
✅ DO:
Place migrated images on a storage tier that matches the workload's IOPS and SLA needs.
❌ DON’T:
Put every VM on premium storage by default.
Why: Storage tier should match VM requirements and SLA.
✅ DO:
Migration Team Roles:
- Migration Lead: Overall coordination
- Technical Lead: Technical decisions
- Migration Engineer: Execute migrations
- Application Owner: Validate applications
- Operations: Post-migration support
❌ DON’T:
Run migrations without clear ownership of each task.
Why: Clear roles prevent confusion and ensure accountability.
✅ DO:
Template: Migration Runbook Template
❌ DON’T:
Improvise each migration from memory.
Why: Runbooks ensure consistency and knowledge transfer.
✅ DO:
Communication Schedule:
- T-1 week: Initial notification
- T-3 days: Reminder with details
- T-1 day: Final confirmation
- T-0: Start notification
- T+0: Completion notification
- T+1 day: Status update
❌ DON’T:
Announce the migration only after users notice the outage.
Why: Proactive communication reduces user impact and builds trust.
✅ DO:
Retrospective Questions:
- What went well?
- What didn't go as planned?
- What will we change for the next wave?
❌ DON’T:
Skip the retrospective and repeat the same mistakes.
Why: Continuous improvement increases success rate.
✅ DO:
Record what was migrated, what broke, and how it was fixed.
❌ DON’T:
Leave the next team to rediscover every workaround.
Why: Good documentation accelerates future migrations.
What: Migrating all VMs at once without testing.
Why it’s bad: A single undetected conversion problem hits every VM at the same time.
✅ Instead: Use a staged rollout: test/dev first, then non-critical, then critical production.
What: Starting migration without validation.
Why it’s bad: Broken conversions are only discovered after cutover.
✅ Instead: Inspect the VMDK first and validate every image before cutover.
What: Proceeding despite VMDK inspection warnings.
Why it’s bad: Today's warnings usually become tomorrow's boot failures.
✅ Instead: Resolve every warning in the inspection report before converting.
What: Migrating without a way to rollback.
Why it’s bad: A failed cutover becomes an outage with no way back.
✅ Instead: Keep the source VM intact until the migrated VM is fully validated.
What: Superficial or no testing before production.
Why it’s bad: Users find the issues you did not.
✅ Instead: Run the full validation chain: integrity, boot, application, performance, integration.
What: Manually modifying VMs instead of using automation.
Why it’s bad: Manual steps are unrepeatable, undocumented, and error-prone.
✅ Instead: Encode fixes in the migration config so every run is reproducible.
What: Deleting source VMs immediately after migration.
Why it’s bad: There is no rollback when issues surface days later.
✅ Instead: Keep sources for 1-2 weeks after a validated migration.
What: Treating security as an afterthought.
Why it’s bad: Migration can silently change firewall, SELinux, and SSH settings.
✅ Instead: Verify firewall rules, SELinux, open ports, and SSH config after every migration.
What: Not communicating with stakeholders.
Why it’s bad: Surprised users escalate, and maintenance windows get missed.
✅ Instead: Follow a communication schedule from T-1 week through T+1 day.
What: Not documenting migrations.
Why it’s bad: Every migration repeats the same troubleshooting from scratch.
✅ Instead: Keep a runbook and a per-VM migration record.
Track these metrics to measure migration success:
| Metric | Target | Good | Acceptable |
|---|---|---|---|
| Success Rate | 100% | >95% | >90% |
| First Boot Success | 100% | >97% | >95% |
| Rollback Rate | 0% | <3% | <5% |
| Actual vs Planned Time | 100% | >90% | >80% |
| Issues Found in Production | 0 | <2% | <5% |
| Stakeholder Satisfaction | 100% | >90% | >80% |
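These rates are easy to compute mechanically if each migration appends a result line somewhere; a sketch assuming a simple `vm,ok|fail` log format (not a Hyper2KVM output format):

```shell
# Percent of migrations that succeeded, from a "name,ok" / "name,fail" log.
success_rate() {
    local log="$1" total ok
    total=$(wc -l < "$log")
    ok=$(grep -c ',ok$' "$log" || true)
    echo $(( 100 * ok / total ))
}
```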
Last Updated: February 2026
Documentation Version: 2.1.0