# Daemon Mode Enhancements

This document details the eight major enhancements to daemon mode for production deployments.
| # | Feature | Benefit | Config |
|---|---|---|---|
| 1 | Concurrent Processing | 3-5x throughput | `max_concurrent_jobs: 3` |
| 2 | File Completion Detection | Prevents corruption | `file_stable_timeout: 30` |
| 3 | Statistics Tracking | Performance monitoring | Auto-enabled |
| 4 | Retry Mechanism | Handles transient failures | `retry_policy: {...}` |
| 5 | Control API | Runtime management | Auto-enabled |
| 6 | Notifications | Alerting | `notifications: {...}` |
| 7 | Deduplication | Prevents reprocessing | `enable_deduplication: true` |
| 8 | Error Context | Better troubleshooting | Auto-enabled |
## 1. Concurrent Processing

Process multiple VMs simultaneously using a thread pool, dramatically increasing throughput.
```yaml
max_concurrent_jobs: 3  # Adjust based on system resources
```
| System | Recommended Workers | Notes |
|---|---|---|
| 4 CPU, 8GB RAM | 2 | Conservative |
| 8 CPU, 16GB RAM | 3-4 | Balanced |
| 16+ CPU, 32GB+ RAM | 4-6 | Aggressive |
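The sizing table above can be condensed into a simple heuristic. The function below is an illustrative sketch, not part of hyper2kvm: roughly one worker per four CPUs, about 4 GB of RAM headroom per worker, capped at the "aggressive" ceiling of six.

```python
def suggest_max_concurrent_jobs(cpu_count: int, ram_gb: float) -> int:
    """Illustrative sizing heuristic mirroring the table above."""
    by_cpu = max(1, cpu_count // 4 + 1)  # 4 CPUs -> 2, 8 -> 3, 16 -> 5
    by_ram = max(1, int(ram_gb // 4))    # assume ~4 GB headroom per worker
    return min(by_cpu, by_ram, 6)        # cap at the "aggressive" ceiling

print(suggest_max_concurrent_jobs(8, 16))  # -> 3, matching the "Balanced" row
```

Pass `os.cpu_count()` and your measured free RAM for a starting point, then adjust based on the queue-depth and CPU monitoring commands below.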
```bash
# Check current queue depth
python3 -m hyper2kvm.cli.daemon_ctl stats | grep "Queue Depth"

# Monitor CPU usage
top -p $(pgrep -f "hyper2kvm.*daemon")
```
## 2. File Completion Detection

Waits for files to be completely written before processing, preventing corruption from incomplete transfers. A file is considered stable once its size stops changing for `file_stable_timeout` seconds:

```yaml
file_stable_timeout: 30  # Wait up to 30 seconds for stability
```
**When to increase the timeout:** large images arriving over slow or bursty network transfers. **When to decrease the timeout:** small files copied locally, where fast pickup matters more than caution.
```text
08:15:01 File still growing: large-vm.vmdk (10GB)
08:15:02 File still growing: large-vm.vmdk (25GB)
08:15:03 File still growing: large-vm.vmdk (50GB)
08:15:04 File stable: large-vm.vmdk (50GB)
08:15:04 📥 New file queued: large-vm.vmdk
```
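Conceptually, stability detection is a poll loop: re-check the file size until it stops changing between polls, or give up at the timeout. A minimal sketch of the idea, not the daemon's actual implementation:

```python
import os
import time

def wait_until_stable(path: str, timeout: float = 30.0, poll: float = 1.0) -> bool:
    """Treat the file as stable once its size is unchanged between polls.

    Returns False if the timeout elapses while the file is still growing.
    Illustrative sketch only."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        if size == last_size:
            return True
        last_size = size
        time.sleep(poll)
    return False
```

A real implementation would also handle the file disappearing mid-poll and could compare mtimes as a second signal.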
## 3. Statistics Tracking

Comprehensive metrics tracking for monitoring, troubleshooting, and capacity planning.
Metrics are grouped into three buckets: **Overall** totals, **Per File Type** breakdowns, and **Current State** (queue depth, uptime).
**Method 1: Control CLI**

```bash
python3 -m hyper2kvm.cli.daemon_ctl stats
```

Output:

```text
📊 Daemon Statistics:
  Uptime: 12.5 hours
  Processed: 45
  Failed: 2
  Success Rate: 95.7%
  Avg Processing Time: 125.3s
  Queue Depth: 3

  By File Type:
    vmdk: 30 ok, 1 failed, 96.8% success
    ova: 15 ok, 1 failed, 93.8% success
```
**Method 2: JSON File**

```bash
cat /var/lib/hyper2kvm/output/.daemon/stats.json
```
**Method 3: Signal (live daemon)**

```bash
# Send SIGUSR1 to print stats to the log
kill -USR1 $(pgrep -f "hyper2kvm.*daemon")

# View in journal
journalctl -u hyper2kvm -n 50
```
Stats are automatically persisted to `stats.json`, so external tooling can consume them directly:

```python
# Example: Prometheus exporter
import json
from pathlib import Path

stats = json.loads(Path('/var/lib/hyper2kvm/output/.daemon/stats.json').read_text())
print(f"hyper2kvm_processed_total {stats['total_processed']}")
print(f"hyper2kvm_failed_total {stats['total_failed']}")
print(f"hyper2kvm_success_rate {stats['success_rate_percent']}")
print(f"hyper2kvm_queue_depth {stats['current_queue_depth']}")
```
## 4. Retry Mechanism

Automatically retries failed conversions with exponential backoff, handling transient failures.
```yaml
retry_policy:
  enabled: true
  max_retries: 3
  retry_delay: 300        # 5 minutes initial delay
  backoff_multiplier: 2.0 # Doubles each retry
```
| Attempt | Delay | Time |
|---|---|---|
| 1 (initial) | 0s | 10:00:00 |
| 2 | 5 minutes | 10:05:00 |
| 3 | 10 minutes | 10:15:00 |
| 4 | 20 minutes | 10:35:00 |
| Final failure | - | 10:35:00 |
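The schedule above follows `delay = retry_delay * backoff_multiplier ** (attempt - 1)`. A one-liner to reproduce it (illustrative, not daemon code):

```python
def retry_delays(max_retries: int = 3, retry_delay: float = 300.0,
                 backoff_multiplier: float = 2.0) -> list[float]:
    """Seconds to wait before each retry under the policy above."""
    return [retry_delay * backoff_multiplier ** n for n in range(max_retries)]

print(retry_delays())  # -> [300.0, 600.0, 1200.0], i.e. 5, 10, 20 minutes
```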
**Good candidates for retry:** transient conditions such as network hiccups, temporary lock contention, or brief disk-space pressure. **Not suitable for retry:** deterministic failures such as corrupt images or unsupported formats, which fail identically on every attempt.
```bash
# View retry statistics
python3 -m hyper2kvm.cli.daemon_ctl stats | grep retried

# Check logs for retry attempts
journalctl -u hyper2kvm | grep -i retry
```
After the final retry fails, the file is moved to the `.errors/` directory rather than `.processed/`, so it can be inspected without blocking the queue.

## 5. Control API

Unix socket-based control interface for runtime management without restarting.

Socket path: `{output_dir}/.daemon/control.sock` (default: `/var/lib/hyper2kvm/output/.daemon/control.sock`)
| Command | Description | Use Case |
|---|---|---|
| `status` | Get running/paused state | Health checks |
| `stats` | Get full statistics | Monitoring |
| `pause` | Pause processing | Maintenance window |
| `resume` | Resume processing | After maintenance |
| `drain` | Finish queue and exit | Graceful shutdown |
| `stop` | Stop immediately | Emergency shutdown |
```bash
# Basic usage
python3 -m hyper2kvm.cli.daemon_ctl status

# Specify custom output directory
python3 -m hyper2kvm.cli.daemon_ctl --output-dir /custom/path status

# Get JSON output
python3 -m hyper2kvm.cli.daemon_ctl stats --json

# Full path to socket (if non-standard)
python3 -m hyper2kvm.cli.daemon_ctl --socket /path/to/control.sock status
```
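When the CLI wrapper is unavailable, the socket can in principle be queried directly. The sketch below is a hypothetical helper, not part of hyper2kvm, and it *assumes* a newline-terminated JSON request/response framing; verify the daemon's actual wire protocol before relying on it.

```python
import json
import socket

def send_control_command(sock_path: str, command: str) -> dict:
    """Send one command to the control socket and return the parsed reply.

    ASSUMPTION: one newline-terminated JSON object per request and per
    response. Check the daemon's real protocol before using this."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall((json.dumps({"command": command}) + "\n").encode())
        return json.loads(s.makefile().readline())
```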
**Pause for Maintenance:**

```bash
# Pause processing
python3 -m hyper2kvm.cli.daemon_ctl pause
# ✅ Daemon paused

# Perform maintenance...

# Resume
python3 -m hyper2kvm.cli.daemon_ctl resume
# ✅ Daemon resumed
```
**Graceful Drain:**

```bash
# Drain queue and exit when empty
python3 -m hyper2kvm.cli.daemon_ctl drain
# ✅ Draining queue

# The daemon will:
# 1. Stop accepting new files
# 2. Finish processing queued files
# 3. Exit cleanly
```
**Emergency Stop:**

```bash
python3 -m hyper2kvm.cli.daemon_ctl stop
# ✅ Stopping daemon
```
```bash
#!/bin/bash
# Health check script for monitoring systems
SOCKET="/var/lib/hyper2kvm/output/.daemon/control.sock"

if [ ! -S "$SOCKET" ]; then
    echo "CRITICAL: Daemon not running"
    exit 2
fi

STATS=$(python3 -m hyper2kvm.cli.daemon_ctl stats --json)
QUEUE=$(echo "$STATS" | jq -r '.stats.current_queue_depth')

if [ "$QUEUE" -gt 10 ]; then
    echo "WARNING: Queue depth is $QUEUE"
    exit 1
fi

echo "OK: Daemon healthy, queue depth $QUEUE"
exit 0
```
## 6. Notifications

Send alerts via webhooks or email for failures, stalls, and optionally successes.
```yaml
notifications:
  enabled: true

  # When to notify
  on_success: false  # Usually too noisy
  on_failure: true   # Alert on failures
  on_stalled: true   # Alert if idle >60min with items in queue

  # Webhook (Slack/Discord/Generic)
  webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
  webhook_type: "slack"  # slack, discord, or generic

  # Email (optional)
  email_enabled: false
  email_smtp_host: "smtp.gmail.com"
  email_smtp_port: 587
  email_from: "hyper2kvm@example.com"
  email_to: "admin@example.com"
  email_username: "hyper2kvm@example.com"
  email_password: "app-specific-password"
```
**Slack:**

```yaml
webhook_type: "slack"
webhook_url: "https://hooks.slack.com/services/T00/B00/XXX"
```

**Discord:**

```yaml
webhook_type: "discord"
webhook_url: "https://discord.com/api/webhooks/123/abc"
```

**Generic (any HTTP endpoint):**

```yaml
webhook_type: "generic"
webhook_url: "https://your-monitoring.com/webhooks/hyper2kvm"
```
**Success (if enabled):**

```json
{
  "event": "conversion_success",
  "filename": "vm1.vmdk",
  "duration_seconds": 125.3,
  "output_path": "/var/lib/hyper2kvm/output/2026-01-17/vm1",
  "timestamp": "2026-01-17T10:30:00Z"
}
```

**Failure:**

```json
{
  "event": "conversion_failure",
  "filename": "vm2.ova",
  "error": "Disk image corrupted",
  "retry_count": 2,
  "timestamp": "2026-01-17T11:00:00Z"
}
```

**Stalled:**

```json
{
  "event": "daemon_stalled",
  "queue_depth": 5,
  "idle_minutes": 75,
  "last_activity": "2026-01-17T09:45:00Z",
  "timestamp": "2026-01-17T11:00:00Z"
}
```
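For `webhook_type: "generic"`, any HTTP endpoint that accepts these JSON payloads will do. A minimal local receiver for testing deliveries might look like this (illustrative only; the handler name and port are arbitrary):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    """Logs incoming event payloads like the examples above; testing only."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        print(f"[{payload.get('timestamp')}] {payload.get('event')}: "
              f"{payload.get('filename', '')}")
        self.send_response(200)
        self.end_headers()

# To run standalone:
# HTTPServer(("127.0.0.1", 8080), WebhookHandler).serve_forever()
```

Point `webhook_url` at the listening address and trigger a test failure to see the payload printed.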
**Gmail Setup:**

```yaml
email_smtp_host: "smtp.gmail.com"
email_smtp_port: 587
email_username: "your-email@gmail.com"
email_password: "your-app-specific-password"
```
```bash
# Trigger a test failure by copying an invalid file
echo "corrupt" > /var/lib/hyper2kvm/queue/test.vmdk

# Check logs for notification attempt
journalctl -u hyper2kvm | grep -i "webhook\|email"
```
## 7. Deduplication

Tracks processed files in a SQLite database to prevent reprocessing duplicates.
```yaml
enable_deduplication: true

# Hash-based deduplication (slower but catches renamed files)
deduplication_use_md5: false
```
**Method 1: Filename + Size (default):** fast, but misses renamed copies.

**Method 2: MD5 Hash:** slower, but catches renamed files.

Database location: `{output_dir}/.daemon/deduplication.db`
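The difference between the two methods can be illustrated by computing both kinds of key. This is a sketch of the idea, not the daemon's actual code (key format and chunk size are assumptions):

```python
import hashlib
from pathlib import Path

def dedup_key(path: Path, use_md5: bool = False) -> str:
    """Illustrative dedup key: filename+size by default, MD5 of the
    content when use_md5 is True (catches renamed copies)."""
    if use_md5:
        h = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
                h.update(chunk)
        return h.hexdigest()
    return f"{path.name}:{path.stat().st_size}"
```

Two identical images under different names produce the same MD5 key but different filename+size keys, which is exactly the trade-off described above.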
Duplicates are skipped with a log entry:

```text
⏭️ Skipping duplicate: vm1.vmdk (originally processed: 2026-01-17T09:00:00Z)
```
The database is automatically cleaned up (records older than 90 days) on daemon shutdown. For manual cleanup, open it with `sqlite3 /var/lib/hyper2kvm/output/.daemon/deduplication.db` and run:

```sql
-- View recently processed files
SELECT filename, processed_at, status FROM processed_files ORDER BY processed_at DESC LIMIT 10;

-- Delete old records
DELETE FROM processed_files WHERE processed_at < datetime('now', '-90 days');

-- Check database size
.dbinfo
```
```python
# Check deduplication stats programmatically
import sqlite3
from pathlib import Path

db_path = Path('/var/lib/hyper2kvm/output/.daemon/deduplication.db')
with sqlite3.connect(db_path) as conn:
    cursor = conn.execute("""
        SELECT status, COUNT(*) AS count
        FROM processed_files
        GROUP BY status
    """)
    for status, count in cursor:
        print(f"{status}: {count} files")
```
## 8. Error Context

Detailed error information with actionable suggestions for faster troubleshooting.

```text
{watch_dir}/.errors/
├── failed-vm.vmdk             # Failed disk file
└── failed-vm.vmdk.error.json  # Detailed error context
```
```json
{
  "filename": "failed-vm.vmdk",
  "filepath": "/var/lib/hyper2kvm/queue/failed-vm.vmdk",
  "file_size_mb": 50.5,
  "timestamp": "2026-01-17T11:30:00Z",
  "error": "Disk image corrupted: invalid VMDK descriptor",
  "phase": "disk_extraction",
  "exception_traceback": "Traceback (most recent call last):\n...",
  "suggestion": "Re-export the VM from source, the disk image may be corrupted",
  "system_info": {
    "python_version": "3.11.2",
    "disk_space_free_gb": 250.5
  }
}
```
The `phase` field tracks where in the pipeline the error occurred:

- `initialization` - Setup phase
- `file_type_detection` - File extension validation
- `argument_preparation` - Config preparation
- `output_directory_creation` - Directory setup
- `conversion` - Main conversion pipeline
- `completion` - Final steps

The system provides context-aware suggestions based on the error:
| Error Pattern | Suggestion |
|---|---|
| Disk full/space | Free up disk space or configure different output directory |
| Permission denied | Check file permissions and ensure daemon has required access |
| Corrupt/invalid | Re-export the VM from source, the disk image may be corrupted |
| Network/timeout | Check network connectivity to source system |
| Memory error | Reduce max_concurrent_jobs or increase system memory |
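This mapping amounts to ordered pattern matching over the error message. The table above might be implemented roughly like this (illustrative; the daemon's real rules and fallback text may differ):

```python
import re

# Ordered (pattern, suggestion) pairs mirroring the table above.
SUGGESTIONS = [
    (r"disk full|no space", "Free up disk space or configure a different output directory"),
    (r"permission denied", "Check file permissions and ensure the daemon has required access"),
    (r"corrupt|invalid", "Re-export the VM from source, the disk image may be corrupted"),
    (r"network|timeout", "Check network connectivity to the source system"),
    (r"memory", "Reduce max_concurrent_jobs or increase system memory"),
]

def suggest(error: str) -> str:
    """Return the first matching suggestion, with a generic fallback."""
    for pattern, suggestion in SUGGESTIONS:
        if re.search(pattern, error, re.IGNORECASE):
            return suggestion
    return "Check the exception traceback in the .error.json file"
```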
```bash
# 1. List failed files
ls -lh /var/lib/hyper2kvm/queue/.errors/

# 2. Read error context
jq '.' /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json

# 3. Follow the suggestion
jq -r '.suggestion' /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json

# 4. Check system state if needed
jq '.system_info' /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json

# 5. Review full traceback if needed
jq -r '.exception_traceback' /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json
```
## Example Configuration

See `examples/yaml/50-daemon/daemon-enhanced.yaml` for a fully commented configuration demonstrating all features.
## Performance Overhead

| Feature | CPU Overhead | Memory Overhead | Disk I/O Overhead |
|---|---|---|---|
| Concurrent Processing | None (better utilization) | +50MB per worker | Parallel I/O (faster) |
| File Completion | Minimal | Negligible | None |
| Statistics | Minimal | ~1-2MB | ~100KB/hour |
| Retry | Minimal | Negligible | None |
| Control API | Minimal | ~1MB | None |
| Notifications | Minimal | Negligible | Network only |
| Deduplication (filename) | Minimal | ~1MB | ~10KB/file |
| Deduplication (MD5) | Moderate (hash calc) | ~1MB | Full file read |
| Error Context | Minimal | ~100KB/error | ~10KB/error |
Total overhead: ~5-10MB RAM and negligible CPU, except for MD5 deduplication. Tune `max_concurrent_jobs` for your system.