This document provides an in-depth exploration of hyper2kvm’s module-level architecture, execution flow, and core architectural principles.
It’s designed for:
hyper2kvm is laser-focused on fixing “successful” conversions that fail at boot, lose network connectivity, or exhibit instability post-migration. This architecture document explains how the modular design achieves reliability through:
At the heart of every migration is this invariant flow:
FETCH → FLATTEN → INSPECT → PLAN → FIX → CONVERT → VALIDATE / TEST
Not every command executes every stage, but the order is sacred. Stages can be skipped, but never reordered or interleaved.
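The ordering rule can be sketched in a few lines. This is a minimal illustration, not hyper2kvm's actual implementation: the `Stage` enum and the `order_stages` and `check_order` helpers are hypothetical names introduced here.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages; the integer value encodes the canonical order."""
    FETCH = 1
    FLATTEN = 2
    INSPECT = 3
    PLAN = 4
    FIX = 5
    CONVERT = 6
    VALIDATE = 7


def order_stages(requested):
    """Return the requested stages in canonical order, without duplicates."""
    return sorted(set(requested))


def check_order(executed):
    """Raise ValueError if stages were repeated or run out of order."""
    for earlier, later in zip(executed, executed[1:]):
        if later <= earlier:
            raise ValueError(f"{later.name} cannot run after {earlier.name}")
```

Skipping a stage simply leaves it out of the list; reordering or repeating one is rejected.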
Acquire source disks and metadata from any source:
Key principle: Source-agnostic acquisition with unified interface.
Transform complex disk chains into single-image files:
Output: Clean, single-file disk images ready for inspection.
Offline deep-dive using libguestfs to extract ground truth:
Philosophy: Derive facts, never guess. Inspection over assumption.
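As one small illustration of deriving facts rather than guessing, the sketch below parses the text of a guest's /etc/os-release into a dictionary. It assumes the file contents have already been read out of the image; `parse_os_release` is a hypothetical helper, not a hyper2kvm function.

```python
import shlex


def parse_os_release(text):
    """Parse the contents of /etc/os-release into a dict of KEY -> value."""
    facts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # Values may be shell-quoted; shlex.split strips the quoting.
        parts = shlex.split(raw)
        facts[key] = parts[0] if parts else ""
    return facts
```

A planner can then branch on `facts["ID"]` instead of inferring the distribution from a hostname or file name.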
Strategic planning before execution:
Value: Plan smart, execute once. No trial-and-error.
Apply deterministic patches to ensure bootability:
Guarantee: Idempotent operations that tolerate re-runs.
Image format transformation via qemu-img:
Integration: Optional direct export pre/post-processing hooks.
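A hedged sketch of what a qemu-img invocation for this stage might look like. The `build_convert_command` helper is hypothetical; the flags themselves (`-p`, `-f`, `-O`, `-c`) are standard `qemu-img convert` options.

```python
def build_convert_command(source, output, source_format=None,
                          output_format="qcow2", compress=False):
    """Build an argv list for `qemu-img convert` (nothing is executed here)."""
    cmd = ["qemu-img", "convert", "-p"]      # -p: show progress
    if source_format:
        cmd += ["-f", source_format]         # -f: input format, e.g. vmdk
    cmd += ["-O", output_format]             # -O: output format
    if compress:
        cmd.append("-c")                     # -c: compress the output image
    cmd += [str(source), str(output)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it keeps the logic testable without qemu-img installed.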
Ruthless verification:
Motto: Does it boot? Does it network? Does it survive? Prove it.
flowchart LR
FETCH[FETCH<br/>Acquire Disks] --> FLATTEN[FLATTEN<br/>Collapse Chains]
FLATTEN --> INSPECT[INSPECT<br/>Extract Facts]
INSPECT --> PLAN[PLAN<br/>Strategy]
PLAN --> FIX[FIX<br/>Apply Patches]
FIX --> CONVERT[CONVERT<br/>Format Transform]
CONVERT --> VALIDATE[VALIDATE/TEST<br/>Boot Tests]
style FETCH fill:#4CAF50,stroke:#2E7D32,color:#fff
style FLATTEN fill:#2196F3,stroke:#1565C0,color:#fff
style INSPECT fill:#FF9800,stroke:#E65100,color:#fff
style PLAN fill:#9C27B0,stroke:#6A1B9A,color:#fff
style FIX fill:#F44336,stroke:#C62828,color:#fff
style CONVERT fill:#00BCD4,stroke:#006064,color:#fff
style VALIDATE fill:#8BC34A,stroke:#558B2F,color:#fff
Key Invariants:
This reflects the actual codebase structure as of the latest refactor:
hyper2kvm/
├── __init__.py # Package root
├── __main__.py # Entry point (python -m hyper2kvm)
│
├── cli/ # Command-line interface layer
│ ├── __init__.py
│ ├── argument_parser.py # Main argument parser (legacy entry)
│ ├── help_texts.py # User-facing help documentation
│ └── args/ # Refactored argument parsing (modular)
│     ├── __init__.py
│     ├── builder.py # Argument builder pattern
│     ├── groups.py # Argument group definitions
│     ├── helpers.py # Parsing utilities
│     ├── parser.py # Core parser logic
│     └── validators.py # Argument validation rules
│
├── config/ # Configuration management
│ ├── __init__.py
│ ├── config_loader.py # YAML config loading and merging
│ └── systemd_template.py # Systemd unit templates for guest injection
│
├── core/ # Foundational utilities and infrastructure
│ ├── __init__.py
│ ├── cred.py # Credential handling (secure storage)
│ ├── exceptions.py # Custom exception hierarchy
│ ├── file_ops.py # File operation utilities
│ ├── guest_identity.py # Guest OS identity detection
│ ├── guest_utils.py # Guest-specific utilities
│ ├── list_utils.py # List manipulation helpers
│ ├── logger.py # Structured logging (rich console)
│ ├── logging_utils.py # Logging configuration helpers
│ ├── optional_imports.py # Graceful optional dependency handling
│ ├── recovery_manager.py # Crash recovery and checkpointing
│ ├── retry.py # Retry logic with exponential backoff
│ ├── sanity_checker.py # Pre-flight sanity checks
│ ├── utils.py # General-purpose utilities
│ ├── validation_suite.py # Validation test suites
│ └── xml_utils.py # XML parsing and generation utilities
│
├── converters/ # Disk transformation engines
│ ├── __init__.py
│ ├── disk_resizer.py # Disk resizing operations
│ ├── fetch.py # Unified disk fetching interface
│ ├── flatten.py # Snapshot chain flattening
│ ├── extractors/ # Archive/container extractors
│ │ ├── __init__.py
│ │ ├── ami.py # AWS AMI tarball extractor
│ │ ├── ovf.py # OVF/OVA unpacker
│ │ ├── raw.py # RAW/tarball extractor with security checks
│ │ └── vhd.py # VHD/VHDX handler (Azure/Hyper-V)
│ └── qemu/ # QEMU image operations
│     ├── __init__.py
│     └── converter.py # qemu-img wrapper (convert, resize, info)
│
├── fixers/ # Guest OS repair and modification layer
│ ├── __init__.py
│ ├── base_fixer.py # Base class defining fixer interface
│ ├── cloud_init_injector.py # Cloud-init metadata injection
│ ├── network_fixer.py # Top-level network fixer coordinator
│ ├── offline_fixer.py # Top-level offline fixer coordinator
│ ├── report_writer.py # Migration report generation
│ │
│ ├── bootloader/ # Bootloader fixing subsystem
│ │ ├── __init__.py
│ │ ├── fixer.py # Bootloader fixer orchestration
│ │ └── grub.py # GRUB/GRUB2 specific fixes
│ │
│ ├── filesystem/ # Filesystem fixing subsystem
│ │ ├── __init__.py
│ │ ├── fixer.py # Filesystem fixer orchestration
│ │ └── fstab.py # /etc/fstab rewriting (UUID conversion)
│ │
│ ├── live/ # Live (SSH-based) fixing subsystem
│ │ ├── __init__.py
│ │ ├── fixer.py # Live SSH fixer
│ │ └── grub_fixer.py # Live GRUB regeneration via SSH
│ │
│ ├── network/ # Network fixing subsystem
│ │ ├── __init__.py
│ │ ├── backend.py # Network backend abstraction
│ │ ├── core.py # Core network fixing logic
│ │ ├── discovery.py # Network interface discovery
│ │ ├── model.py # Network configuration models
│ │ ├── topology.py # Network topology analysis
│ │ └── validation.py # Network config validation
│ │
│ ├── offline/ # Offline (libguestfs) fixing subsystem
│ │ ├── __init__.py
│ │ ├── config_rewriter.py # System config file rewriting
│ │ ├── mount.py # Guest filesystem mounting
│ │ ├── spec_converter.py # Spec file format conversions
│ │ ├── validation.py # Offline fix validation
│ │ └── vmware_tools_remover.py # Offline VMware Tools purge
│ │
│ └── windows/ # Windows-specific fixing subsystem
│     ├── __init__.py
│     ├── fixer.py # Main Windows fixer orchestrator
│     ├── network_fixer.py # Windows network fixing
│     ├── registry_core.py # Registry manipulation core
│     ├── registry/ # Windows Registry subsystem
│     │   ├── __init__.py
│     │   ├── encoding.py # Registry value encoding/decoding
│     │   ├── firstboot.py # First-boot registry tweaks
│     │   ├── io.py # Registry file I/O (hivex wrapper)
│     │   ├── mount.py # Registry hive mounting
│     │   ├── software.py # HKLM\Software modifications
│     │   └── system.py # HKLM\System modifications
│     └── virtio/ # Windows VirtIO driver injection
│         ├── __init__.py
│         ├── config.py # VirtIO configuration
│         ├── core.py # Core VirtIO injection logic
│         ├── detection.py # VirtIO ISO detection
│         ├── discovery.py # Driver discovery in VirtIO ISO
│         ├── install.py # Driver installation to registry
│         ├── paths.py # VirtIO ISO path resolution
│         └── utils.py # VirtIO utilities
│
├── libvirt/ # LibVirt integration layer
│ ├── domain_emitter.py # Generic domain XML emitter
│ ├── libvirt_utils.py # LibVirt utility functions
│ ├── linux_domain.py # Linux-specific domain XML generation
│ └── windows_domain.py # Windows-specific domain XML generation
│
├── modes/ # Specialized operational modes
│ ├── __init__.py
│ ├── inventory_mode.py # Read-only VM/disk inventory scanning
│ └── plan_mode.py # Dry-run planning mode (what-if)
│
├── orchestrator/ # Pipeline orchestration layer
│ ├── __init__.py
│ ├── README.md # Refactoring documentation
│ ├── orchestrator.py # Main pipeline coordinator (refactored)
│ ├── disk_discovery.py # Input disk discovery logic
│ ├── disk_processor.py # Disk processing pipeline executor
│ └── vsphere_exporter.py # vSphere VM export orchestration
│
├── ssh/ # SSH/SCP transport layer
│ ├── __init__.py
│ ├── ssh_client.py # Paramiko-based SSH client
│ └── ssh_config.py # SSH connection configuration
│
├── testers/ # Post-migration validation layer
│ ├── __init__.py
│ ├── libvirt_tester.py # LibVirt domain boot testing
│ └── qemu_tester.py # Direct QEMU boot testing
│
└── vmware/ # VMware ecosystem integration
    ├── __init__.py
    ├── clients/ # VMware API clients
    │   ├── __init__.py
    │   ├── client.py # pyvmomi SmartConnect wrapper
    │   ├── extensions.py # vSphere API extensions
    │   └── nfc_lease.py # NFC lease management for exports
    │
    ├── transports/ # Data-plane transport implementations
    │   ├── __init__.py
    │   ├── govc_common.py # govc CLI wrapper utilities
    │   ├── govc_export.py # govc export operations
    │   ├── http_client.py # HTTP datastore download client
    │   ├── http_progress.py # HTTP download progress tracking
    │   ├── ovftool_client.py # VMware ovftool wrapper
    │   ├── ovftool_loader.py # ovftool dynamic loader
    │   ├── vddk_client.py # VDDK high-speed transfer client
    │   └── vddk_loader.py # VDDK dynamic library loader
    │
    ├── utils/ # VMware utilities
    │   ├── __init__.py
    │   ├── datastore.py # Datastore path parsing
    │   ├── utils.py # General VMware utilities
    │   └── vmdk_parser.py # VMDK descriptor file parser
    │
    └── vsphere/ # vSphere control-plane operations
        ├── __init__.py
        ├── command.py # vSphere command abstraction
        ├── errors.py # vSphere error handling
        ├── govc.py # govc-specific operations
        └── mode.py # vSphere operational modes
**Total:** 27 directories, 117+ Python modules
The orchestrator was refactored from a single 1,197-line monolithic class into 4 focused components, each under 310 lines and following the Single Responsibility Principle.
orchestrator/orchestrator.py

Responsibility: Main pipeline coordinator
Key Methods:
- run() - Execute full migration pipeline
- _setup_recovery() - Initialize crash recovery
- _discover_disks() - Delegate to DiskDiscovery
- _process_disks() - Delegate to DiskProcessor
- _run_tests() - Execute validation tests
- _emit_domain_xml() - Generate libvirt domain XML

Philosophy: Coordinate, don’t implement. Delegate to specialists.
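The "coordinate, don't implement" idea can be sketched with plain dependency injection. This is an assumption-laden illustration: the constructor signature below is hypothetical, and the two callables merely stand in for DiskDiscovery and DiskProcessor.

```python
from typing import Callable, List


class Orchestrator:
    """Coordinates the pipeline; each step is delegated to an injected collaborator."""

    def __init__(self, discover: Callable[[], List[str]],
                 process: Callable[[str], str]):
        self._discover = discover   # stands in for DiskDiscovery
        self._process = process     # stands in for DiskProcessor

    def run(self) -> List[str]:
        """Discover the input disks, then hand each one to the processor."""
        disks = self._discover()
        return [self._process(disk) for disk in disks]
```

Because the coordinator holds no stage logic of its own, it can be tested with trivial fakes.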
orchestrator/disk_discovery.py

Responsibility: Input disk detection and preparation
Supported Sources:
Output: List of discovered disk paths + optional temp directory
orchestrator/disk_processor.py

Responsibility: Per-disk processing pipeline
Pipeline Stages:
Features:
orchestrator/vsphere_exporter.py

Responsibility: vSphere VM export orchestration
Export Modes:
Features:
| Aspect | Before (Monolithic) | After (Refactored) |
|---|---|---|
| Lines of Code | 1,197 lines, 50+ methods | 4 files, each < 310 lines |
| Testability | Difficult to test in isolation | Each component independently testable |
| Maintainability | All concerns mixed | Single Responsibility Principle |
| Reusability | Tightly coupled | Components usable independently |
| Debugging | Hard to isolate failures | Clear component boundaries |
VMware integration enforces strict separation between what to do (control) and how to move bytes (data).
Purpose: Answer “what exists, where, and what’s the plan?”
Never touches bulk data - keeps operations lean, fast, and auditable.
Tool: VMware’s official CLI (govc)
Capabilities:
Why govc:
Integration: vmware/vsphere/govc.py + vmware/vsphere/command.py
Library: VMware’s official Python SDK
Use Cases:
Integration: vmware/clients/client.py - SmartConnect wrapper with retry logic
Details:
- SmartConnect
- RetrievePropertiesEx

Modules: vmware/vsphere/mode.py + vmware/vsphere/command.py
Function: Translate user commands (vsphere inventory, vsphere plan) into pure metadata operations. No data hauling.
Purpose: Answer “how do we safely move disk data?”
No inventory logic - pure transport layer.
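One way to picture the split is as two interfaces that only meet through plain metadata. The sketch below is illustrative: `DiskRef`, `ControlPlane`, `DataPlane`, and `export_vm` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class DiskRef:
    """Metadata describing one disk; produced by the control plane only."""
    datastore: str
    path: str
    size_bytes: int


class ControlPlane(Protocol):
    def list_disks(self, vm_name: str) -> List[DiskRef]: ...


class DataPlane(Protocol):
    def download(self, disk: DiskRef, dest_dir: str) -> str: ...


def export_vm(vm_name: str, control: ControlPlane,
              data: DataPlane, dest_dir: str) -> List[str]:
    """Ask the control plane what exists, then hand each disk to the data plane."""
    return [data.download(disk, dest_dir) for disk in control.list_disks(vm_name)]
```

Swapping one transport for another then only means passing a different `DataPlane` implementation; the inventory code is untouched.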
Library: VMware Virtual Disk Development Kit
Module: vmware/transports/vddk_client.py
Features:
When to Use: Large VMs, bandwidth-constrained environments
Tool: VMware OVF Tool
Module: vmware/transports/ovftool_client.py
Features:
When to Use: Need OVF compatibility, vendor-specific flags
/folder (Datastore Downloads)

Protocol: HTTPS datastore browsing
Module: vmware/transports/http_client.py
Features:
When to Use: Simple downloads, no VDDK available
Protocol: SSH with SCP/SFTP
Module: ssh/ssh_client.py
Features:
When to Use: API access unavailable, ESXi direct access
Tool: govc export.ovf / export.ova
Module: vmware/transports/govc_export.py
Features:
When to Use: Lightweight exports, scripting
Module: fixers/offline/
Philosophy: Modify disk images without booting. No runtime dependencies.
Technology: libguestfs (QEMU + kernel appliance)
Advantages:
Subsystems:
- fixers/filesystem/: /etc/fstab rewriting (by-path → UUID/PARTUUID)
- fixers/bootloader/
- fixers/offline/config_rewriter.py
- fixers/offline/vmware_tools_remover.py

Module: fixers/live/
Philosophy: Execute fixes on running Linux guests via SSH.
Use Cases:
Safety:
Module: fixers/windows/
Principle: Windows logic never leaks into Linux fixers. Complete isolation.
fixers/windows/registry/

Purpose: Modify Windows Registry offline (no Windows boot required)
Technology: hivex (libguestfs registry manipulation)
Operations:
Modules:
- io.py - Registry hive I/O (read/write)
- encoding.py - Registry value encoding
- mount.py - Hive mounting (SYSTEM, SOFTWARE, SAM)
- firstboot.py - First-boot tweaks
- software.py - HKLM\Software modifications
- system.py - HKLM\System modifications (drivers, services)

fixers/windows/virtio/

Purpose: Inject VirtIO drivers for KVM compatibility
Challenge: Windows won’t boot on KVM without VirtIO drivers, but drivers can’t be installed without booting.
Solution: Offline registry modification to pre-install drivers.
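To make "pre-install" concrete: Windows loads a driver at boot when its service key under the SYSTEM hive has `Start = 0` (boot start) and `Type = 1` (kernel driver). The sketch below only builds those values. It is a simplified illustration, not hyper2kvm's implementation: the helper names and the `ControlSet001` default are assumptions, and a real injection must also copy the `.sys` file into the guest and bind the driver to its device IDs.

```python
SERVICE_BOOT_START = 0      # Start: loaded by the boot loader
SERVICE_KERNEL_DRIVER = 1   # Type: kernel-mode driver
ERROR_CONTROL_NORMAL = 1    # ErrorControl: log the error and continue booting


def boot_driver_values(driver_name):
    """Return registry values for a boot-start kernel driver service key."""
    return {
        "Type": SERVICE_KERNEL_DRIVER,
        "Start": SERVICE_BOOT_START,
        "ErrorControl": ERROR_CONTROL_NORMAL,
        "ImagePath": "system32\\drivers\\" + driver_name + ".sys",
    }


def service_key_path(driver_name, control_set="ControlSet001"):
    """Return the service key path inside the offline SYSTEM hive."""
    return control_set + "\\Services\\" + driver_name
```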
Workflow:
1. detection.py - Locate VirtIO ISO (local/remote)
2. discovery.py - Extract drivers matching guest OS version
3. install.py - Add driver registry entries
4. config.py - Configure driver load order

Drivers Injected:
- viostor - Storage controller
- netkvm - Network adapter
- vioscsi - SCSI controller
- viorng - RNG device
- balloon - Memory ballooning

Module: fixers/network/
Architecture: Modular backend system supporting multiple network managers.
Backends Supported:
Components:
- discovery.py
- topology.py
- core.py
- validation.py
- backend.py

Fixes Applied:
Module: libvirt/
Purpose: Generate libvirt domain XML for migrated VMs
Components:
- domain_emitter.py: Generic XML generation framework
- linux_domain.py: Linux-specific domain XML
- windows_domain.py: Windows-specific domain XML
Output: Ready-to-import libvirt XML (virsh define domain.xml)
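For orientation, here is a minimal sketch of a domain definition with a single VirtIO disk, built with the standard library. The element and attribute names follow the libvirt domain XML format; the `minimal_domain_xml` function and its defaults (q35 machine type, 2048 MiB, 2 vCPUs) are assumptions for illustration, and a real domain needs more devices (network, console, and so on).

```python
import xml.etree.ElementTree as ET


def minimal_domain_xml(name, disk_path, memory_mib=2048, vcpus=2):
    """Return a bare-bones KVM domain definition as an XML string."""
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)
    os_el = ET.SubElement(domain, "os")
    ET.SubElement(os_el, "type", arch="x86_64", machine="q35").text = "hvm"
    devices = ET.SubElement(domain, "devices")
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="qcow2")
    ET.SubElement(disk, "source", file=disk_path)
    # bus="virtio" is what requires VirtIO drivers inside the guest
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    return ET.tostring(domain, encoding="unicode")
```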
Module: core/
The foundational layer providing infrastructure for all other modules.
- guest_identity.py
- recovery_manager.py
- retry.py
- validation_suite.py
- file_ops.py
- logger.py, logging_utils.py

Module: core/vmcraft/
Version: v9.0
VMCraft is hyper2kvm’s pure Python disk image manipulation platform, serving as the primary VM inspection and modification engine.
VMCraft consists of 57 specialized modules organized into focused categories:
core/vmcraft/
├── main.py # Orchestrator
├── Core Infrastructure (4 modules)
│ ├── nbd.py # NBD device management
│ ├── storage.py # LVM/LUKS/RAID/ZFS
│ ├── mount.py # Filesystem mounting
│ └── file_ops.py # File operations (70+ methods)
├── OS Detection (3 modules)
│ ├── inspection.py # Orchestration
│ ├── linux_detection.py # 15+ Linux distros
│ └── windows_detection.py # 20+ Windows versions
├── Windows Support (6 modules)
│ ├── windows_registry.py # Registry operations
│ ├── windows_drivers.py # Driver injection
│ ├── windows_users.py # User management
│ ├── windows_services.py # Service control
│ ├── windows_applications.py # App detection
│ └── scheduled_tasks.py # Task Scheduler
├── Linux Support (1 module)
│ └── linux_services.py # Systemd/init services
├── Enterprise Intelligence (5 modules)
│ ├── ml_analyzer.py # AI/ML analytics
│ ├── cloud_optimizer.py # Cloud migration
│ ├── disaster_recovery.py # DR planning
│ ├── audit_trail.py # Compliance logging
│ └── resource_orchestrator.py # Auto-scaling
└── Operational Tools (5 modules)
    ├── backup.py # Backup/restore
    ├── security.py # Security auditing
    ├── optimization.py # Disk optimization
    ├── advanced_analysis.py # Forensics
    └── export.py # VM export
Core Operations:
Cross-Platform:
Enterprise Intelligence (v9.0):
VMCraft integrates into the migration pipeline at these stages:
| Operation | Time | Notes |
|---|---|---|
| Launch | ~1.9s | NBD + storage |
| OS Inspection | ~0.3s | Linux/Windows detection |
| File Read | <50ms | Direct filesystem access |
| Registry Read | ~150ms | Windows offline registry |
from hyper2kvm.core.vmcraft import VMCraft

with VMCraft() as g:
    g.add_drive_opts("/path/to/disk.vmdk", readonly=False)
    g.launch()

    # OS detection
    roots = g.inspect_os()

    # File operations
    g.write("/etc/motd", "Migrated to KVM\n")

    # Windows registry (if Windows)
    g.win_registry_write("SOFTWARE", r"Microsoft\...", "Key", "Value")

    # AI/ML analytics (v9.0); `metrics` is supplied by the caller
    anomalies = g.ml_detect_anomalies(metrics, "cpu")

    # Cloud optimization (v9.0); `system_info` is supplied by the caller
    readiness = g.cloud_analyze_readiness(system_info)
Documentation: See VMCraft Platform Guide for complete reference (307+ methods).
These principles are non-negotiable. Violating them leads to unreliable migrations.
Unless explicitly marked live, all fixers assume:
Runtime dependencies belong in fixers/live/.
Never guess. Always derive facts from:
Code must handle “unexpected but valid” configurations gracefully.
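/dev/disk/by-path is Radioactive

Guests installed on VMware often reference disks through /dev/disk/by-path. Those paths encode the virtual hardware topology, so they stop resolving once the guest runs on KVM.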
All fixer code must:
This is the #1 cause of boot failures if missed.
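A minimal sketch of the rewrite, assuming the by-path → UUID mapping has already been derived by inspection. `rewrite_fstab` is a hypothetical helper, not the code in fixers/filesystem/fstab.py.

```python
def rewrite_fstab(fstab_text, uuid_by_device):
    """Replace /dev/disk/by-path sources with UUID= using an inspected mapping."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/disk/by-path/"):
            uuid = uuid_by_device.get(fields[0])
            if uuid:
                # Replace only the device field; keep the rest of the line intact.
                line = line.replace(fields[0], f"UUID={uuid}", 1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Entries with no known UUID are left untouched rather than guessed at, and running the function on its own output changes nothing, which is what makes re-runs safe.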
Windows-specific code lives exclusively in fixers/windows/.
Linux fixers:
Cross-contamination is forbidden.
Control-plane (vmware/vsphere/, vmware/clients/):
Data-plane (vmware/transports/):
No module should perform both. Separation ensures:
Fixers should:
Only critical failures (unbootable guest) should halt the pipeline.
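The halt-only-on-critical rule might look like the following. This is a sketch under stated assumptions: `CriticalFixError` and `run_fixers` are hypothetical names, and the real exception hierarchy lives in core/exceptions.py.

```python
class CriticalFixError(Exception):
    """Raised by a fixer when the guest would be left unbootable."""


def run_fixers(fixers):
    """Run (name, callable) pairs; collect warnings, halt only on critical errors."""
    warnings = []
    for name, fix in fixers:
        try:
            fix()
        except CriticalFixError:
            raise  # unbootable guest: stop the pipeline
        except Exception as exc:
            # Non-critical failure: record it and keep going.
            warnings.append(f"{name}: {exc}")
    return warnings
```

The returned warnings can go straight into the migration report instead of being lost.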
cli/
Owns: User-facing command-line interface, argument parsing, help text.
Does NOT own: Business logic, execution.

config/
Owns: Configuration file loading (YAML), merging, defaults.
Does NOT own: Configuration validation (done in core/sanity_checker.py).
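"Merging" here can be pictured as a recursive dictionary merge in which later sources win. The exact semantics of config/config_loader.py are not spelled out in this document, so treat `deep_merge` as one common approach rather than the project's behavior.

```python
def deep_merge(base, override):
    """Return a new dict where override wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged
```

Merging defaults with a user file then overrides only the keys the user actually set.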
core/
Owns: Cross-cutting concerns (logging, errors, retries, recovery, validation).
Does NOT own: Domain-specific logic.

converters/
Owns: Format conversions (VMDK→qcow2), extractions (OVA, AMI, VHD), disk operations.
Does NOT own: Guest OS modifications (that’s fixers/).

fixers/
Owns: Guest OS modifications (offline and live), bootloader fixes, network cleanup, Windows drivers.
Does NOT own: Disk format conversions (that’s converters/).

libvirt/
Owns: LibVirt domain XML generation.
Does NOT own: QEMU execution (that’s testers/qemu_tester.py).

modes/
Owns: Read-only operational modes (inventory, planning).
Does NOT own: Write operations (migrations).

orchestrator/
Owns: Pipeline coordination, stage ordering, component delegation.
Does NOT own: Stage implementation (delegates to specialists).

ssh/
Owns: SSH/SCP transport, remote command execution.
Does NOT own: What commands to execute (that’s fixers/live/).

testers/
Owns: Post-migration validation (boot tests, network tests).
Does NOT own: Migration itself.

vmware/
Owns: VMware-specific integrations (vSphere API, VDDK, govc).
Does NOT own: Generic disk operations (that’s converters/).
Location: converters/extractors/azure.py or converters/fetch.py
Hook: Register in orchestrator/disk_discovery.py
Location: fixers/offline/selinux_fixer.py or extend fixers/offline/config_rewriter.py
Hook: Call from orchestrator/disk_processor.py
Location: fixers/network/backend.py (add backend class)
Hook: Auto-detected via backend discovery
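One way auto-detection of this kind can work is subclass self-registration. The sketch below is hypothetical: `NetworkBackend`, `NetplanBackend`, and `detect_backends` are illustrative names, and netplan is used only as an example of a backend whose presence is easy to detect from file paths.

```python
class NetworkBackend:
    """Base class; named subclasses register themselves when defined."""
    registry = {}
    name = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.name:
            NetworkBackend.registry[cls.name] = cls

    @classmethod
    def detect(cls, guest_files):
        """Return True if this backend's config is present in the guest."""
        raise NotImplementedError


class NetplanBackend(NetworkBackend):
    name = "netplan"

    @classmethod
    def detect(cls, guest_files):
        return any(path.startswith("/etc/netplan/") for path in guest_files)


def detect_backends(guest_files):
    """Return the names of all registered backends found in the guest."""
    return sorted(name for name, backend in NetworkBackend.registry.items()
                  if backend.detect(guest_files))
```

With this pattern, adding a backend means adding one class; no central list needs editing.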
Location: testers/storage_tester.py
Hook: Call from orchestrator/orchestrator.py:_run_tests()
Location: vmware/transports/nbd_client.py
Hook: Register in vmware/transports/__init__.py
Module: orchestrator/disk_processor.py
Option: args.parallel_processing = True
Implementation: ThreadPoolExecutor (multiple disks processed concurrently)
When to Use: Multi-disk VMs (e.g., VM with OS disk + data disks)
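A minimal sketch of the concurrent path, assuming each disk can be processed independently. `process_disks` and its parameters are hypothetical; only `ThreadPoolExecutor` comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor


def process_disks(disks, process_one, parallel=False, max_workers=4):
    """Process disks serially or concurrently; results keep the input order."""
    if not parallel or len(disks) < 2:
        return [process_one(disk) for disk in disks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises worker exceptions.
        return list(pool.map(process_one, disks))
```

Threads are a reasonable fit when the heavy lifting happens in external tools or I/O, since the Python-side work is mostly waiting.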
Benefit: 3-5x faster than HTTP downloads
Trade-off: Requires VDDK installation, complex setup

Benefit: Smaller output files, faster network transfers
Trade-off: CPU overhead during conversion
Recommendation: Use compression for network transfers, skip for local migrations.
Location: tests/unit/
Coverage:
- core/
- converters/
- fixers/
- fixers/network/

Technology: pytest, pytest-mock
Location: tests/integration/
Coverage:
Runs: GitHub Actions (Bandit, pip-audit)
Focus:
- converters/extractors/raw.py

Allow third-party fixers, transports, and validators without modifying core code.
Design:
Direct export to cloud providers without intermediate storage.
Candidates:
Module: converters/cloud/ (new)
Transactional migrations with automatic rollback on failure.
Design:
Module: Enhanced core/recovery_manager.py
Real-time progress tracking and performance metrics.
Design:
Module: core/metrics.py (new)
libguestfs: Library for accessing and modifying virtual machine disk images offline.
VDDK: VMware Virtual Disk Development Kit - high-performance API for disk access.
govc: VMware’s official CLI for vSphere operations.
pyvmomi: VMware’s official Python SDK for vSphere SOAP API.
VirtIO: Paravirtualized I/O drivers for KVM (storage, network, RNG, balloon).
hivex: Library for reading and writing Windows Registry hive files.
NBD: Network Block Device - protocol for accessing block devices over network.
CBT: Changed Block Tracking - VMware feature for incremental backups.
MoRef: Managed Object Reference - vSphere API identifier for objects.
NFC: Network File Copy - VMware protocol for efficient VM export.
from hyper2kvm.orchestrator.disk_processor import DiskProcessor
from hyper2kvm.core.guest_identity import GuestIdentity
# Initialize processor
processor = DiskProcessor()
# Process a VMDK
result = processor.process_disk(
    source_path='/data/vm.vmdk',
    output_path='/data/vm.qcow2',
    flatten=True,
    compress=True,
)
# Inspect guest OS
identity = GuestIdentity.from_disk('/data/vm.qcow2')
print(f"OS: {identity.os_family}")
print(f"Firmware: {identity.firmware_type}")
from hyper2kvm.fixers.offline_fixer import OfflineFixer
# Create fixer instance
fixer = OfflineFixer('/data/vm.qcow2')
# Apply specific fixes
fixer.fix_fstab(use_uuid=True)
fixer.fix_grub(regenerate=True)
fixer.fix_network(clean_mac=True)
# Verify fixes
fixer.validate()
import asyncio

from hyper2kvm.vmware.clients.client import VMwareClient


async def main():
    # Connect to vCenter
    client = VMwareClient(
        host='vcenter.example.com',
        username='administrator@vsphere.local',
        password='password',
    )
    # Export VM; `await` is only valid inside an async function
    await client.async_export_vm(
        vm_name='production-web',
        output_dir='/data/exports',
        export_mode='export',
    )


asyncio.run(main())
When proposing architectural changes:
hyper2kvm’s architecture achieves reliable, repeatable VM migrations through:
The result: Migrations that “just work” - boring, predictable, and successful.
Boring migrations are successful migrations.