hyper2kvm

hyper2kvm Daemon Mode - Advanced Features

This document details the 8 major enhancements to daemon mode for production deployments.

Enhancement Summary

#	Feature	Benefit	Config
1	Concurrent Processing	3-5x throughput	`max_concurrent_jobs: 3`
2	File Completion Detection	Prevents corruption	`file_stable_timeout: 30`
3	Statistics Tracking	Performance monitoring	Auto-enabled
4	Retry Mechanism	Handles transient failures	`retry_policy: {...}`
5	Control API	Runtime management	Auto-enabled
6	Notifications	Alerting	`notifications: {...}`
7	Deduplication	Prevents reprocessing	`enable_deduplication: true`
8	Error Context	Better troubleshooting	Auto-enabled

1. Concurrent Processing

Overview

Process multiple VMs simultaneously using a thread pool, dramatically increasing throughput.

Configuration

max_concurrent_jobs: 3  # Adjust based on system resources

Benefits

3-5x faster for multiple small/medium VMs
Better resource utilization (CPU, disk I/O)
Configurable based on system capacity

Resource Planning

System	Recommended Workers	Notes
4 CPU, 8GB RAM	2	Conservative
8 CPU, 16GB RAM	3-4	Balanced
16+ CPU, 32GB+ RAM	4-6	Aggressive

Monitoring

# Check current queue depth
python3 -m hyper2kvm.cli.daemon_ctl stats | grep "Queue Depth"

# Monitor CPU usage
top -p $(pgrep -f "hyper2kvm.*daemon")

2. File Completion Detection

Overview

Waits for files to be completely written before processing, preventing corruption from incomplete transfers.

How It Works

File appears in watch directory
Daemon checks file size every second
Waits for size to stabilize (3 consecutive identical size checks)
Proceeds with processing only when stable
Times out after file_stable_timeout seconds

Configuration

file_stable_timeout: 30  # Wait up to 30 seconds for stability

Use Cases

When to increase timeout:

Large files (100GB+) over slow networks
Network file systems (NFS, CIFS)
Files being written incrementally

When to decrease timeout:

Fast local disk copies
Known-fast transfers
Testing with small files

Example

15:01 File still growing: large-vm.vmdk (10GB)
15:02 File still growing: large-vm.vmdk (25GB)
15:03 File still growing: large-vm.vmdk (50GB)
15:04 File stable: large-vm.vmdk (50GB)
15:04 📥 New file queued: large-vm.vmdk

3. Statistics Tracking

Overview

Comprehensive metrics tracking for monitoring, troubleshooting, and capacity planning.

Metrics Tracked

Overall:

Total processed/failed
Success rate
Average processing time
Uptime

Per File Type:

Count by extension (.vmdk, .ova, etc.)
Success rate by type
Average time by type

Current State:

Queue depth
Active jobs
Recent completions

Accessing Statistics

Method 1: Control CLI

python3 -m hyper2kvm.cli.daemon_ctl stats

# Output:
📊 Daemon Statistics:
  Uptime: 12.5 hours
  Processed: 45
  Failed: 2
  Success Rate: 95.7%
  Avg Processing Time: 125.3s
  Queue Depth: 3

  By File Type:
    vmdk: 30 ok, 1 failed, 96.8% success
    ova: 15 ok, 1 failed, 93.8% success

Method 2: JSON File

cat /var/lib/hyper2kvm/output/.daemon/stats.json

Method 3: Signal (live daemon)

# Send SIGUSR1 to print stats to log
kill -USR1 $(pgrep -f "hyper2kvm.*daemon")

# View in journal
journalctl -u hyper2kvm -n 50

Automatic Reporting

Stats are automatically:

Saved every 60 seconds to stats.json
Printed to logs every hour
Included in final shutdown summary

Monitoring Integration

# Example: Prometheus exporter
import json
from pathlib import Path

stats = json.load(open('/var/lib/hyper2kvm/output/.daemon/stats.json'))

print(f"hyper2kvm_processed_total {stats['total_processed']}")
print(f"hyper2kvm_failed_total {stats['total_failed']}")
print(f"hyper2kvm_success_rate {stats['success_rate_percent']}")
print(f"hyper2kvm_queue_depth {stats['current_queue_depth']}")

4. Retry Mechanism

Overview

Automatically retries failed conversions with exponential backoff, handling transient failures.

Configuration

retry_policy:
  enabled: true
  max_retries: 3
  retry_delay: 300  # 5 minutes initial delay
  backoff_multiplier: 2.0  # Doubles each retry

Retry Schedule Example

Attempt	Delay	Time
1 (initial)	0s	10:00:00
2	5 minutes	10:05:00
3	10 minutes	10:15:00
4	20 minutes	10:35:00
Final failure	-	10:35:00

When Retries Help

Good candidates for retry:

Temporary network issues
Disk I/O timeouts
Rate limiting
Transient libguestfs errors
Resource exhaustion (temporary)

Not suitable for retry:

Corrupted disk images
Unsupported formats
Permission errors
Disk full errors

Monitoring Retries

# View retry statistics
python3 -m hyper2kvm.cli.daemon_ctl stats | grep retried

# Check logs for retry attempts
journalctl -u hyper2kvm | grep -i retry

Retry Behavior

First failure → Schedule retry in 5 minutes
File stays in .errors/ directory
After delay, file moved back to watch directory
Processing attempted again
If successful → Moves to .processed/
If fails again → Next retry scheduled with longer delay
After max retries → Permanently moved to .errors/

5. Health Check & Control API

Overview

Unix socket-based control interface for runtime management without restarting.

Control Socket Location

{output_dir}/.daemon/control.sock

Default: /var/lib/hyper2kvm/output/.daemon/control.sock

Available Commands

Command	Description	Use Case
`status`	Get running/paused state	Health checks
`stats`	Get full statistics	Monitoring
`pause`	Pause processing	Maintenance window
`resume`	Resume processing	After maintenance
`drain`	Finish queue and exit	Graceful shutdown
`stop`	Stop immediately	Emergency shutdown

Using the Control CLI

# Basic usage
python3 -m hyper2kvm.cli.daemon_ctl status

# Specify custom output directory
python3 -m hyper2kvm.cli.daemon_ctl --output-dir /custom/path status

# Get JSON output
python3 -m hyper2kvm.cli.daemon_ctl stats --json

# Full path to socket (if non-standard)
python3 -m hyper2kvm.cli.daemon_ctl --socket /path/to/control.sock status

Examples

Pause for Maintenance:

# Pause processing
python3 -m hyper2kvm.cli.daemon_ctl pause
# ✅ Daemon paused

# Perform maintenance...
# ...

# Resume
python3 -m hyper2kvm.cli.daemon_ctl resume
# ✅ Daemon resumed

Graceful Drain:

# Drain queue and exit when empty
python3 -m hyper2kvm.cli.daemon_ctl drain
# ✅ Draining queue

# Daemon will:
# 1. Stop accepting new files
# 2. Finish processing queued files
# 3. Exit cleanly

Emergency Stop:

python3 -m hyper2kvm.cli.daemon_ctl stop
# ✅ Stopping daemon

Integration with Monitoring

#!/bin/bash
# Health check script for monitoring systems

SOCKET="/var/lib/hyper2kvm/output/.daemon/control.sock"

if [ ! -S "$SOCKET" ]; then
    echo "CRITICAL: Daemon not running"
    exit 2
fi

STATS=$(python3 -m hyper2kvm.cli.daemon_ctl stats --json)
QUEUE=$(echo "$STATS" | jq -r '.stats.current_queue_depth')

if [ "$QUEUE" -gt 10 ]; then
    echo "WARNING: Queue depth is $QUEUE"
    exit 1
fi

echo "OK: Daemon healthy, queue depth $QUEUE"
exit 0

6. Notifications

Overview

Send alerts via webhooks or email for failures, stalls, and optionally successes.

Configuration

notifications:
  enabled: true

  # When to notify
  on_success: false  # Usually too noisy
  on_failure: true   # Alert on failures
  on_stalled: true   # Alert if idle >60min with items in queue

  # Webhook (Slack/Discord/Generic)
  webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
  webhook_type: "slack"  # slack, discord, or generic

  # Email (optional)
  email_enabled: false
  email_smtp_host: "smtp.gmail.com"
  email_smtp_port: 587
  email_from: "hyper2kvm@example.com"
  email_to: "admin@example.com"
  email_username: "hyper2kvm@example.com"
  email_password: "app-specific-password"

Webhook Types

Slack:

webhook_type: "slack"
webhook_url: "https://hooks.slack.com/services/T00/B00/XXX"

Discord:

webhook_type: "discord"
webhook_url: "https://discord.com/api/webhooks/123/abc"

Generic (any HTTP endpoint):

webhook_type: "generic"
webhook_url: "https://your-monitoring.com/webhooks/hyper2kvm"

Notification Events

Success (if enabled):

{
  "event": "conversion_success",
  "filename": "vm1.vmdk",
  "duration_seconds": 125.3,
  "output_path": "/var/lib/hyper2kvm/output/2026-01-17/vm1",
  "timestamp": "2026-01-17T10:30:00Z"
}

Failure:

{
  "event": "conversion_failure",
  "filename": "vm2.ova",
  "error": "Disk image corrupted",
  "retry_count": 2,
  "timestamp": "2026-01-17T11:00:00Z"
}

Stalled:

{
  "event": "daemon_stalled",
  "queue_depth": 5,
  "idle_minutes": 75,
  "last_activity": "2026-01-17T09:45:00Z",
  "timestamp": "2026-01-17T11:00:00Z"
}

Email Notifications

Gmail Setup:

Enable 2FA on Google account
Generate app-specific password
Use in configuration:

email_smtp_host: "smtp.gmail.com"
email_smtp_port: 587
email_username: "your-email@gmail.com"
email_password: "your-app-specific-password"

Testing Notifications

# Trigger a test failure by copying an invalid file
echo "corrupt" > /var/lib/hyper2kvm/queue/test.vmdk

# Check logs for notification attempt
journalctl -u hyper2kvm | grep -i "webhook\|email"

7. File Deduplication

Overview

Tracks processed files in SQLite database to prevent reprocessing duplicates.

Configuration

enable_deduplication: true

# Hash-based deduplication (slower but catches renamed files)
deduplication_use_md5: false

Deduplication Methods

Method 1: Filename + Size (default)

Fast
Catches exact copies
Won’t detect renamed files

Method 2: MD5 Hash

Slower (reads entire file)
Catches renamed files with same content
More reliable but impacts performance

Database Location

{output_dir}/.daemon/deduplication.db

How It Works

New file appears
Check database for:
- Same filename + file size OR
- Same MD5 hash (if enabled)
If match found → Skip processing, log as duplicate
If no match → Process normally
After processing → Record in database

Example Output

⏭️ Skipping duplicate: vm1.vmdk (originally processed: 2026-01-17T09:00:00Z)

Database Maintenance

# Database is automatically cleaned up every 90 days on daemon shutdown
# Manual cleanup can be done with SQLite:

sqlite3 /var/lib/hyper2kvm/output/.daemon/deduplication.db

# View all processed files
sqlite> SELECT filename, processed_at, status FROM processed_files ORDER BY processed_at DESC LIMIT 10;

# Delete old records
sqlite> DELETE FROM processed_files WHERE processed_at < datetime('now', '-90 days');

# Check database size
sqlite> .dbinfo

Statistics

# Check deduplication stats programmatically
from pathlib import Path
import sqlite3

db_path = Path('/var/lib/hyper2kvm/output/.daemon/deduplication.db')
conn = sqlite3.connect(str(db_path))

cursor = conn.execute("""
    SELECT status, COUNT(*) as count
    FROM processed_files
    GROUP BY status
""")

for row in cursor:
    print(f"{row[0]}: {row[1]} files")

8. Enhanced Error Context

Overview

Detailed error information with actionable suggestions for faster troubleshooting.

Error Files Location

{watch_dir}/.errors/
  ├── failed-vm.vmdk           # Failed disk file
  └── failed-vm.vmdk.error.json  # Detailed error context

Error JSON Structure

{
  "filename": "failed-vm.vmdk",
  "filepath": "/var/lib/hyper2kvm/queue/failed-vm.vmdk",
  "file_size_mb": 50.5,
  "timestamp": "2026-01-17T11:30:00Z",
  "error": "Disk image corrupted: invalid VMDK descriptor",
  "phase": "disk_extraction",
  "exception_traceback": "Traceback (most recent call last):\n...",
  "suggestion": "Re-export the VM from source, the disk image may be corrupted",
  "system_info": {
    "python_version": "3.11.2",
    "disk_space_free_gb": 250.5
  }
}

Error Phases

Tracks where in the pipeline the error occurred:

initialization - Setup phase
file_type_detection - File extension validation
argument_preparation - Config preparation
output_directory_creation - Directory setup
conversion - Main conversion pipeline
completion - Final steps

Actionable Suggestions

The system provides context-aware suggestions based on the error:

Error Pattern	Suggestion
Disk full/space	Free up disk space or configure different output directory
Permission denied	Check file permissions and ensure daemon has required access
Corrupt/invalid	Re-export the VM from source, the disk image may be corrupted
Network/timeout	Check network connectivity to source system
Memory error	Reduce max_concurrent_jobs or increase system memory

Troubleshooting Workflow

# 1. List failed files
ls -lh /var/lib/hyper2kvm/queue/.errors/

# 2. Read error context
cat /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json | jq '.'

# 3. Follow suggestion
cat /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json | jq -r '.suggestion'

# 4. Check system state if needed
cat /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json | jq '.system_info'

# 5. Review full traceback if needed
cat /var/lib/hyper2kvm/queue/.errors/failed-vm.vmdk.error.json | jq -r '.exception_traceback'

Complete Configuration Example

See examples/yaml/50-daemon/daemon-enhanced.yaml for a fully-commented configuration demonstrating all features.

Performance Impact

Feature	CPU Overhead	Memory Overhead	Disk I/O Overhead
Concurrent Processing	None (better utilization)	+50MB per worker	Parallel I/O (faster)
File Completion	Minimal	Negligible	None
Statistics	Minimal	~1-2MB	~100KB/hour
Retry	Minimal	Negligible	None
Control API	Minimal	~1MB	None
Notifications	Minimal	Negligible	Network only
Deduplication (filename)	Minimal	~1MB	~10KB/file
Deduplication (MD5)	Moderate (hash calc)	~1MB	Full file read
Error Context	Minimal	~100KB/error	~10KB/error

Total overhead: ~5-10MB RAM, negligible CPU except MD5 deduplication

hyper2kvm

hyper2kvm Daemon Mode - Advanced Features

Enhancement Summary

1. Concurrent Processing

Overview

Configuration

Benefits

Resource Planning

Monitoring

2. File Completion Detection

Overview

How It Works

Configuration

Use Cases

Example

3. Statistics Tracking

Overview

Metrics Tracked

Accessing Statistics

Automatic Reporting

Monitoring Integration

4. Retry Mechanism

Overview

Configuration

Retry Schedule Example

When Retries Help

Monitoring Retries

Retry Behavior

5. Health Check & Control API

Overview

Control Socket Location

Available Commands

Using the Control CLI

Examples

Integration with Monitoring

6. Notifications

Overview

Configuration

Webhook Types

Notification Events

Email Notifications

Testing Notifications

7. File Deduplication

Overview

Configuration

Deduplication Methods

Database Location

How It Works

Example Output

Database Maintenance

Statistics

8. Enhanced Error Context

Overview

Error Files Location

Error JSON Structure

Error Phases

Actionable Suggestions

Troubleshooting Workflow

Complete Configuration Example

Performance Impact

Production Checklist

See Also