hyper2kvm

Best Practices Guide

Comprehensive best practices for VM migrations with Hyper2KVM based on real-world experience and lessons learned.



General Best Practices

1. Always Test First

✅ DO:

# Test with non-critical VM first
command: local
vmdk: /vms/test-dev-server.vmdk
output_dir: /kvm/test
to_output: test-server.qcow2

# Enable testing
libvirt_test: true

❌ DON’T:

Why: Testing identifies issues before they impact production.


2. Always Use Essential Features

✅ DO:

# Essential options for Linux VMs
fstab_mode: stabilize-all
regen_initramfs: true
update_grub: true

# Essential for Windows VMs
inject_virtio_drivers: true
windows_version: 2019

❌ DON’T:

# Missing critical options
fstab_mode: preserve  # Will likely cause boot failures
regen_initramfs: false  # Initramfs will lack VirtIO drivers

Why: These options prevent 90% of boot failures.


3. Inspect Before Migrating

✅ DO:

# Always inspect VMDK first
./scripts/vmdk_inspect.py /vms/production.vmdk > inspection-report.txt

# Review the report
cat inspection-report.txt

# Check for issues
grep -i "issue\|warning\|error" inspection-report.txt

❌ DON’T:

Why: Early detection of issues saves time and prevents failures.
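
The review step above can be turned into a gate that blocks the migration while the report still contains findings. This is a minimal sketch: the function name `report_is_clean` is hypothetical, and it assumes the report flags problems with the same keywords that the grep above searches for.

```bash
#!/usr/bin/env bash
# Gate a migration on a clean inspection report.
# Assumption: findings contain "issue", "warning" or "error" (as in the grep above).
report_is_clean() {
    local report=$1
    if [ ! -r "$report" ]; then
        echo "cannot read report: $report" >&2
        return 2
    fi
    # Print matching lines with line numbers so they can be reviewed.
    if grep -Ein 'issue|warning|error' "$report" >&2; then
        return 1
    fi
    return 0
}

# Example (commented out so the sketch has no side effects):
# report_is_clean inspection-report.txt || exit 1
```

A plain keyword match also trips on harmless lines such as "0 errors", so treat a non-zero result as a prompt to read the report, not as proof of a problem.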


4. Keep Backups Until Verified

✅ DO:

# Keep originals by default
keep_original: true
# Don't delete source VMs immediately
# Keep for 1-2 weeks after successful migration

❌ DON’T:

Why: Easy rollback if issues are discovered later.


5. Document Everything

✅ DO:

❌ DON’T:

Why: Documentation helps troubleshooting and future migrations.


Security Best Practices

1. Use SSH Keys, Not Passwords

✅ DO:

# Remote fetch with SSH key
command: fetch-and-fix
host: esxi-01.example.com
user: root
identity: ~/.ssh/migration_key  # Dedicated key
remote: /vmfs/volumes/datastore1/vm.vmdk
output_dir: /kvm/vms

# Generate dedicated key for migrations
ssh-keygen -t rsa -b 4096 -f ~/.ssh/migration_key -C "migration-key"

# Use strong passphrase
# Store passphrase in secure vault

❌ DON’T:

# Don't store passwords in configs
vcenter_password: "MyPassword123"  # BAD!

Why: SSH keys are more secure and auditable.
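
One way to enforce the rule against stored passwords is a small lint that runs before any config is used. The sketch below is an assumption, not a Hyper2KVM feature: the function name `config_has_inline_password` is hypothetical, and it only looks for uncommented keys ending in `password` (such as `vcenter_password` in the example above) that carry a value.

```bash
#!/usr/bin/env bash
# Return 0 if the config file sets any "*password:" key to a non-empty value.
# Commented-out lines are ignored because the key must start the line.
config_has_inline_password() {
    local config=$1
    grep -Eiq '^[[:space:]]*[a-z_]*password[[:space:]]*:[[:space:]]*[^[:space:]#]' "$config"
}

# Example (commented out so the sketch has no side effects):
# if config_has_inline_password migration.yaml; then
#     echo "refusing to run: password stored in config" >&2
#     exit 1
# fi
```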


2. Use Least Privilege

✅ DO:

❌ DON’T:

Why: Limits potential damage from compromised credentials.


3. Encrypt Sensitive Data

✅ DO:

# Encrypt QCOW2 images if they contain sensitive data
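# sec0 must be defined as a secret object; passphrase.txt is a placeholder path
qemu-img convert --object secret,id=sec0,file=passphrase.txt \
  -O qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 \
  input.qcow2 encrypted-output.qcow2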

# Use encrypted channels for remote transfers
# SSH, HTTPS, VPN

❌ DON’T:

Why: Protects data in transit and at rest.
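
The `encrypt.key-secret=sec0` option refers to a QEMU secret object, which is commonly backed by a passphrase file. The sketch below shows one way to create such a file with owner-only permissions. The function name `make_secret_file` and the 32-byte length are assumptions chosen for illustration; in production the passphrase would normally come from your secrets vault.

```bash
#!/usr/bin/env bash
# Write a random 64-character hex passphrase to the given path.
# The function body is a subshell, so the umask change does not leak out.
make_secret_file() (
    umask 077    # new file is created readable and writable by the owner only
    # 32 random bytes, hex-encoded, written without a trailing newline
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$1"
)

# Example (commented out so the sketch has no side effects):
# make_secret_file ./passphrase.txt
```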


4. Audit and Log Everything

✅ DO:

# Enable detailed logging
log_level: INFO
log_file: /var/log/hyper2kvm/migration-${VM_NAME}.log

# Keep logs for compliance
# Archive logs for audit trail

# Review logs after migration
grep -i "error\|warning" /var/log/hyper2kvm/*.log

❌ DON’T:

Why: Logs provide audit trail and help with security investigations.


5. Validate Security Posture After Migration

✅ DO:

# After migration, verify security
# Check firewall rules
iptables -L -n

# Check SELinux status
getenforce

# Check open ports
ss -tlnp

# Verify SSH configuration
sshd -T | grep -i permit

❌ DON’T:

Why: Migration may change security configurations.


Performance Best Practices

1. Choose the Right Output Format

✅ DO:

# For databases and high-I/O workloads
out_format: raw
compress: false

# For general workloads
out_format: qcow2
compress: false

# For storage-constrained environments
out_format: qcow2
compress: true
compression_level: 6

❌ DON’T:

# Don't compress database VMs
out_format: qcow2
compress: true  # Adds CPU overhead

Why: Format impacts runtime performance.

Performance Comparison:


2. Use Appropriate Storage

✅ DO:

# Use fast local storage for conversion
conversion_dir: /fast/ssd/temp
output_dir: /fast/storage/vms

❌ DON’T:

Why: Migration speed is often I/O-bound.


3. Optimize Parallel Migrations

✅ DO:

# Batch with appropriate parallelism
batch_parallel: 3  # For 4-core system

# Consider system resources
# CPUs: 4 → parallel: 2-3
# CPUs: 8 → parallel: 4-6
# CPUs: 16 → parallel: 8-12

❌ DON’T:

# Too many parallel migrations
batch_parallel: 10  # On 4-core system - BAD!

Why: Over-parallelization causes resource contention.

Rule of Thumb: parallel = (CPU_count / 2) to (3 × CPU_count / 4), matching the ranges listed above (for example, 8 CPUs → 4 to 6).
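
The per-CPU ranges in the DO block (4 → 2-3, 8 → 4-6, 16 → 8-12) correspond to half and three quarters of the core count. A minimal sketch of that arithmetic follows; the function name `suggest_parallel` is hypothetical, and the result is only a starting point for `batch_parallel`, since disk and network limits may call for a lower value.

```bash
#!/usr/bin/env bash
# Suggest a batch_parallel range for a given CPU count.
# Lower bound: half the cores. Upper bound: three quarters of the cores.
suggest_parallel() {
    local cpus=$1
    local low=$(( cpus / 2 ))
    local high=$(( cpus * 3 / 4 ))
    # Never suggest fewer than one migration at a time.
    if (( low < 1 )); then low=1; fi
    if (( high < low )); then high=$low; fi
    echo "${low}-${high}"
}

# Detect the core count on this host (nproc on Linux, getconf elsewhere).
cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "CPUs: ${cpus}, suggested batch_parallel: $(suggest_parallel "$cpus")"
```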


4. Monitor Resource Usage

✅ DO:

# Monitor during migration ([h] keeps grep from matching its own process)
watch -n 2 'ps aux | grep "[h]yper2kvm"; iostat -x 1 1; free -h'

# Check for bottlenecks
# CPU: Should be high (80-95%)
# I/O wait: Should be low (<20%)
# Memory: Should have 2+ GB available

❌ DON’T:

Why: Monitoring identifies bottlenecks and prevents issues.
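
The memory threshold above (2+ GB available) can be checked automatically before each migration starts. This sketch assumes a Linux host, where `/proc/meminfo` reports `MemAvailable` in kB; the function name `has_free_memory` is hypothetical, and the optional second argument exists only so the check can be exercised against a sample file.

```bash
#!/usr/bin/env bash
# Return 0 if at least min_gb (default 2) of memory is available.
has_free_memory() {
    local min_gb=${1:-2}
    local meminfo=${2:-/proc/meminfo}
    local avail_kb
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
    # Fail if the field is missing or below the threshold (GB converted to kB).
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# Example (commented out so the sketch has no side effects):
# has_free_memory 2 || { echo "less than 2 GB available, postponing" >&2; exit 1; }
```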


5. Optimize Network Transfers

✅ DO:

# For remote migrations
command: fetch-and-fix
host: esxi.example.com
user: root
identity: ~/.ssh/id_rsa

# Use compression for slow networks
compress: true

# Adjust timeouts for slow networks
timeout: 7200
network_retry: 5

# Test network speed first
iperf3 -c esxi.example.com

❌ DON’T:

Why: Network is often the bottleneck for remote migrations.


Reliability Best Practices

1. Use Idempotent Operations

✅ DO:

# Use configs that can be re-run
h2kvmctl --config migration.yaml

# If migration fails, fix issue and re-run same command
# Hyper2KVM is designed to be idempotent

❌ DON’T:

Why: Makes recovery from failures easier.


2. Implement Proper Error Handling

✅ DO:

# Check exit codes
if h2kvmctl --config migration.yaml; then
    echo "Migration successful"
    # Continue with validation
else
    echo "Migration failed"
    # Alert team
    # Review logs
    # Don't proceed to cutover
fi

❌ DON’T:

# Ignore errors
h2kvmctl --config migration.yaml || true  # BAD!

Why: Proper error handling prevents cascading failures.
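
When a migration script has several steps, the exit-code check above can be factored into a helper so that every step reports its status and passes the original exit code on. This is a sketch under that assumption; `run_step` is a hypothetical helper name, not part of Hyper2KVM.

```bash
#!/usr/bin/env bash
# Run a command, report the outcome, and preserve its exit code.
# Usage: run_step <label> <command> [args...]
run_step() {
    local label=$1
    shift
    if "$@"; then
        echo "OK: ${label}"
    else
        local rc=$?    # exit code of the failed command
        echo "FAILED: ${label} (exit ${rc})" >&2
        return "$rc"
    fi
}

# Example (commented out so the sketch has no side effects):
# run_step "migrate" h2kvmctl --config migration.yaml || exit 1
# run_step "image check" qemu-img check output.qcow2 || exit 1
```

Because the helper returns the failing command's exit code, callers can stop before cutover or branch on specific codes.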


3. Use Staged Rollouts

✅ DO:

Migration Phases:
1. Test/Dev VMs (validate process)
2. Non-critical production (validate at scale)
3. Critical production (apply lessons learned)

❌ DON’T:

Why: Staged rollouts reduce risk and allow for process improvements.


4. Maintain Rollback Capability

✅ DO:

❌ DON’T:

Why: Rollback capability is your safety net.


5. Validate Thoroughly

✅ DO:

# Multi-level validation
# 1. File integrity
qemu-img check output.qcow2

# 2. Boot test
virsh start vm-name

# 3. Application test
curl -fsS http://vm-name/health  # -f returns non-zero on HTTP errors

# 4. Performance test
run-benchmark.sh

# 5. Integration test
test-dependencies.sh

❌ DON’T:

Why: Thorough validation catches issues before users do.
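
A freshly started VM usually needs some time before its application responds, so a single health probe right after `virsh start` can fail even when the migration is fine. The sketch below polls a command until it succeeds or the attempts run out. The helper name `retry` and the attempt and delay values in the example are assumptions to tune for your environment.

```bash
#!/usr/bin/env bash
# Re-run a command until it succeeds or the attempts are exhausted.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for (( i = 1; i <= attempts; i++ )); do
        if "$@"; then
            return 0
        fi
        # Wait between attempts, but not after the last one.
        if (( i < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (commented out so the sketch has no side effects):
# retry 30 10 curl -fsS http://vm-name/health || echo "VM did not become healthy" >&2
```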


Cost Optimization Best Practices

1. Right-Size Resources

✅ DO:

# Analyze VM resource usage before migration
# Right-size based on actual usage, not allocated resources

❌ DON’T:

Why: Migration is an opportunity to optimize costs.


2. Use Compression Wisely

✅ DO:

# Use compression for archival/cold storage
out_format: qcow2
compress: true
compression_level: 9  # Maximum compression

❌ DON’T:

# Don't compress if storage is cheap and abundant
compress: true  # Wastes CPU time during migration

Why: Compression trades CPU time for storage savings.

When to Compress:


3. Batch Efficiently

✅ DO:

# Batch similar VMs together
# Migrate during off-hours to use spare capacity
batch_parallel: 4

❌ DON’T:

Why: Batching maximizes resource utilization.


4. Choose Cost-Effective Storage

✅ DO:

❌ DON’T:

Why: Storage tier should match VM requirements and SLA.


Team & Process Best Practices

1. Define Clear Roles

✅ DO:

Migration Team Roles:
- Migration Lead: Overall coordination
- Technical Lead: Technical decisions
- Migration Engineer: Execute migrations
- Application Owner: Validate applications
- Operations: Post-migration support

❌ DON’T:

Why: Clear roles prevent confusion and ensure accountability.


2. Use Runbooks

✅ DO:

Template: Migration Runbook Template

❌ DON’T:

Why: Runbooks ensure consistency and knowledge transfer.


3. Communicate Proactively

✅ DO:

Communication Schedule:
- T-1 week: Initial notification
- T-3 days: Reminder with details
- T-1 day: Final confirmation
- T-0: Start notification
- T+0: Completion notification
- T+1 day: Status update

❌ DON’T:

Why: Proactive communication reduces user impact and builds trust.


4. Learn from Each Migration

✅ DO:

Retrospective Questions:

❌ DON’T:

Why: Continuous improvement increases success rate.


5. Maintain Documentation

✅ DO:

❌ DON’T:

Why: Good documentation accelerates future migrations.


Common Anti-Patterns to Avoid

❌ Anti-Pattern 1: “Big Bang” Migration

What: Migrating all VMs at once without testing.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 2: Skipping Pre-Flight Checks

What: Starting migration without validation.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 3: Ignoring Inspection Warnings

What: Proceeding despite VMDK inspection warnings.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 4: No Rollback Plan

What: Migrating without a way to roll back.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 5: Inadequate Testing

What: Superficial or no testing before production.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 6: Manual Configuration

What: Manually modifying VMs instead of using automation.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 7: Premature Decommission

What: Deleting source VMs immediately after migration.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 8: Ignoring Security

What: Treating security as an afterthought.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 9: Poor Communication

What: Not communicating with stakeholders.

Why it’s bad:

✅ Instead:


❌ Anti-Pattern 10: Skipping Documentation

What: Not documenting migrations.

Why it’s bad:

✅ Instead:


Quick Reference: Best Practices Checklist

Before Migration

During Migration

After Migration


Success Metrics

Track these metrics to measure migration success:

Metric                       Target   Good    Acceptable
Success Rate                 100%     >95%    >90%
First Boot Success           100%     >97%    >95%
Rollback Rate                0%       <3%     <5%
Actual vs Planned Time       100%     >90%    >80%
Issues Found in Production   0        <2%     <5%
Stakeholder Satisfaction     100%     >90%    >80%
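
As a worked example of applying the Success Rate row, the sketch below classifies a batch from two counts: migrations that succeeded and migrations attempted. The function name `classify_success_rate` is hypothetical. It compares `ok * 100` against `threshold * total` so that no floating-point arithmetic is needed.

```bash
#!/usr/bin/env bash
# Classify a success rate against the thresholds in the table above.
# Usage: classify_success_rate <succeeded> <total>
classify_success_rate() {
    local ok=$1 total=$2
    if (( total <= 0 )); then
        echo "no migrations recorded" >&2
        return 1
    fi
    if (( ok == total )); then
        echo "target"                 # 100%
    elif (( ok * 100 > 95 * total )); then
        echo "good"                   # above 95%
    elif (( ok * 100 > 90 * total )); then
        echo "acceptable"             # above 90%
    else
        echo "below acceptable"
    fi
}

classify_success_rate 48 50    # 96% of 50 migrations
```

The thresholds are strict, as in the table: exactly 95% counts as acceptable, and exactly 90% falls below acceptable.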

Additional Resources


Last Updated: February 2026
Documentation Version: 2.1.0