This document outlines potential features and enhancements for the HyperSDK ecosystem.
Priority: CRITICAL
Problem: Jobs are lost on daemon restart, and there is no historical tracking
Implementation:
// Add SQLite database backend
type JobStore interface {
    SaveJob(job *Job) error
    LoadJob(id string) (*Job, error)
    ListJobs(filter JobFilter) ([]*Job, error)
    UpdateJobStatus(id string, status JobStatus) error
}
// Schema
CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    name TEXT,
    vm_path TEXT,
    status TEXT,
    created_at TIMESTAMP,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    progress_json TEXT,
    error TEXT
);
CREATE INDEX idx_status ON jobs(status);
CREATE INDEX idx_created_at ON jobs(created_at DESC);
Benefits:
Effort: 2-3 weeks
Priority: HIGH
Implementation:
// Add /metrics endpoint
import "github.com/prometheus/client_golang/prometheus"
var (
    jobsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "hypersdk_jobs_total",
            Help: "Total number of jobs",
        },
        []string{"status", "provider"},
    )
    jobDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "hypersdk_job_duration_seconds",
            Help: "Job duration in seconds",
        },
        []string{"vm_type", "provider"},
    )
    exportedVMs = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "hypersdk_vms_exported_total",
            Help: "Total VMs exported",
        },
    )
    exportedBytes = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "hypersdk_bytes_exported_total",
            Help: "Total bytes exported",
        },
    )
)
Metrics to track:
Grafana Dashboard: Ship a pre-built dashboard alongside these metrics
Effort: 1-2 weeks
Priority: MEDIUM
Current: Export as OVF (separate files)
Enhancement: Package as a single OVA file
Implementation:
# Create TAR archive with specific structure
ovftool --acceptAllEulas export.ovf export.ova
# Or native implementation (the .ovf descriptor must be the first archive entry):
tar -cf vm.ova vm.ovf vm-disk1.vmdk vm.mf
Benefits:
Effort: 1 week
Priority: HIGH
Target: Windows Server Hyper-V
Implementation:
// providers/hyperv/client.go
type HyperVProvider struct {
    host     string
    username string
    password string
    client   *winrm.Client
}

// Use WinRM + PowerShell for VM operations
func (h *HyperVProvider) ListVMs() ([]VMInfo, error) {
    script := "Get-VM | ConvertTo-Json"
    out, err := h.runPowerShell(script) // raw JSON from the remote host
    if err != nil {
        return nil, err
    }
    return parseVMJSON(out) // helper: unmarshal ConvertTo-Json output into []VMInfo
}
Capabilities:
Testing: Requires Windows Server lab environment
Effort: 3-4 weeks
Priority: MEDIUM
Target: Proxmox Virtual Environment (PVE)
Implementation:
// providers/proxmox/client.go
type ProxmoxProvider struct {
    apiURL string
    token  string
    client *proxmox.Client
}

// Use Proxmox REST API
func (p *ProxmoxProvider) ListVMs() ([]VMInfo, error) {
    return p.client.GetVMs()
}
API Endpoints:
- /api2/json/cluster/resources?type=vm
- /api2/json/nodes/{node}/qemu/{vmid}/config

Effort: 2-3 weeks
Priority: MEDIUM
Implementation:
// providers/aws/client.go
import (
    "context"

    "github.com/aws/aws-sdk-go-v2/service/ec2"
)

type AWSProvider struct {
    region string
    client *ec2.Client
}

func (a *AWSProvider) ListVMs() ([]VMInfo, error) {
    ctx := context.TODO()
    result, err := a.client.DescribeInstances(ctx, &ec2.DescribeInstancesInput{})
    if err != nil {
        return nil, err
    }
    // Convert EC2 instances to VMInfo
    return convertInstances(result), nil // helper: map Reservations/Instances onto VMInfo
}

func (a *AWSProvider) ExportVM(vmID string) error {
    // 1. Create an AMI from the instance
    // 2. Start an export-image task to S3
    // 3. Download the resulting OVF/VMDK
    return nil
}
Challenges:
Effort: 4-5 weeks
Priority: HIGH
Configuration:
# config.yaml
webhooks:
  enabled: true
  endpoints:
    - url: https://slack.com/api/incoming/webhook
      events: [job.completed, job.failed]
      headers:
        Authorization: Bearer token123
    - url: https://myapp.com/api/migration-complete
      events: [job.completed]
      retry: 3
      timeout: 10s
Payload:
{
  "event": "job.completed",
  "timestamp": "2026-01-17T10:30:00Z",
  "job_id": "abc123",
  "vm_name": "web-server-01",
  "status": "completed",
  "duration_seconds": 1234,
  "exported_files": [
    "/exports/web-server-01/vm.ovf",
    "/exports/web-server-01/disk.vmdk"
  ]
}
Effort: 1-2 weeks
Priority: MEDIUM
Use Cases:
Implementation:
// Add cron-like scheduling
type ScheduledJob struct {
    ID          string
    JobTemplate JobDefinition
    Schedule    string // "0 2 * * *" (cron format)
    Enabled     bool
    NextRun     time.Time
}

// Scheduler using robfig/cron
func (s *Scheduler) Start() {
    c := cron.New()
    for _, job := range s.jobs {
        job := job // capture loop variable (needed before Go 1.22)
        c.AddFunc(job.Schedule, func() {
            s.executeScheduledJob(job)
        })
    }
    c.Start()
}
API Endpoints:
POST /schedules # Create scheduled job
GET /schedules # List all schedules
GET /schedules/{id} # Get specific schedule
DELETE /schedules/{id} # Delete schedule
PUT /schedules/{id} # Update schedule
Effort: 2-3 weeks
Priority: MEDIUM
Templates:
# templates/linux-vm-export.yaml
name: "-export"
vm_path: ""
output_path: "/exports/"
options:
  parallel_downloads: 8
  remove_cdrom: true
pre_hooks:
  - type: shutdown
    timeout: 300
  - type: snapshot
    name: "pre-migration-"
post_hooks:
  - type: convert
    format: qcow2
  - type: upload
    destination: "s3://migrations/"
Workflow DAG:
workflow:
  name: "datacenter-migration"
  steps:
    - id: export-dbs
      type: batch
      vms: [db-01, db-02, db-03]
      parallel: false # Sequential
    - id: export-apps
      type: batch
      vms: [app-01, app-02]
      parallel: true
      depends_on: [export-dbs]
    - id: verify
      type: script
      command: "./verify-migration.sh"
      depends_on: [export-apps]
    - id: notify
      type: webhook
      url: https://hooks.slack.com/...
      depends_on: [verify]
Effort: 3-4 weeks
Priority: HIGH
Technology Stack:
Pages:
Implementation:
/web
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ │ ├── Dashboard.tsx
│ │ │ ├── VMList.tsx
│ │ │ ├── JobHistory.tsx
│ │ │ └── JobProgress.tsx
│ │ ├── api/
│ │ │ └── client.ts
│ │ └── App.tsx
│ └── package.json
└── embed.go # Embed static files in Go binary
Effort: 6-8 weeks
Priority: MEDIUM
Implementation:
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

func (e *Exporter) ExportVM(ctx context.Context, vm VMInfo) error {
    ctx, span := otel.Tracer("hypersdk").Start(ctx, "export_vm")
    defer span.End()
    span.SetAttributes(
        attribute.String("vm.name", vm.Name),
        attribute.Int("vm.cpus", vm.NumCPU),
        attribute.Int64("vm.memory", vm.MemoryMB),
    )
    // Export logic with child spans
    ctx, downloadSpan := otel.Tracer("hypersdk").Start(ctx, "download_disks")
    // ...
    downloadSpan.End()
    return nil
}
Spans to track:
Backend: Jaeger or Tempo
Effort: 2-3 weeks
Priority: HIGH
Problem: govmomi uses HTTPS for downloads (slower)
Solution: VMware VDDK (Virtual Disk Development Kit) for native disk access
Performance Gain: 2-5x faster exports
Implementation:
// Use CGO to call the VDDK C library
// #cgo LDFLAGS: -lvixDiskLib
// #include <vixDiskLib.h>
import "C"

func (v *VDDKExporter) OpenDisk(path string) error {
    connection := C.VixDiskLib_ConnectEx(...)
    handle := C.VixDiskLib_Open(connection, path, ...)
    // Read disk blocks efficiently
}
Challenges:
Effort: 4-6 weeks
Priority: MEDIUM
Implementation:
type ConnectionPool struct {
    pools map[string]*vSpherePool
    mu    sync.RWMutex
}

type vSpherePool struct {
    conns chan *vim25.Client
    max   int
}

func (p *ConnectionPool) GetConnection(url string) (*vim25.Client, error) {
    pool := p.getOrCreatePool(url)
    select {
    case conn := <-pool.conns:
        return conn, nil // reuse an idle connection
    default:
        return p.createConnection(url) // pool empty: dial a new one
    }
}

// Callers return connections with a matching Put that sends back into
// pool.conns (dropping the connection if the channel is full).
Benefits:
Effort: 1-2 weeks
Priority: MEDIUM
Use Case: Export only changed blocks since last export
Implementation:
type IncrementalExport struct {
    BaseSnapshot    string
    CurrentSnapshot string
    ChangeBlockIDs  []string
}

// Use vSphere Changed Block Tracking (CBT)
func (e *Exporter) ExportIncremental(vm VMInfo, baseSnap string) error {
    changes := e.queryChangedDiskAreas(vm, baseSnap)
    for _, block := range changes {
        e.downloadBlock(block)
    }
    return nil
}
Requirements:
Effort: 3-4 weeks
Priority: HIGH (for enterprise use)
Roles:
roles:
  admin:
    - job.create
    - job.cancel
    - job.delete
    - vm.list
    - vm.export
    - settings.manage
  operator:
    - job.create
    - job.view
    - vm.list
    - vm.export
  viewer:
    - job.view
    - vm.list
Implementation:
type User struct {
    ID       string
    Username string
    Roles    []string
}

func (a *Authorizer) Authorize(user User, action string) bool {
    for _, role := range user.Roles {
        if a.roleHasPermission(role, action) {
            return true
        }
    }
    return false
}
Effort: 3-4 weeks
Priority: HIGH
Implementation:
type AuditLog struct {
    Timestamp time.Time
    User      string
    Action    string
    Resource  string
    Result    string
    IP        string
    Details   map[string]interface{}
}

// Log all API calls
func (a *AuditLogger) Log(ctx context.Context, action string) {
    user, _ := ctx.Value("user").(User) // placed in the context by auth middleware
    a.write(AuditLog{
        Timestamp: time.Now(),
        User:      user.Username,
        Action:    action,
        // ...
    })
}
Storage: Separate audit database, immutable logs
Effort: 2-3 weeks
Priority: MEDIUM
Encrypt:
Implementation:
import "golang.org/x/crypto/nacl/secretbox"

type SecretStore struct {
    key [32]byte
}

func (s *SecretStore) Encrypt(plaintext []byte) ([]byte, error) {
    nonce := randomNonce() // fresh 24-byte nonce per message
    // Prepend the nonce so Decrypt can slice it back off
    encrypted := secretbox.Seal(nonce[:], plaintext, &nonce, &s.key)
    return encrypted, nil
}
Effort: 2 weeks
Priority: LOW (complex, limited use cases)
Target: vMotion-style live migration to KVM
Challenges:
Research Phase: 3-6 months
Priority: LOW (experimental)
Capabilities:
Implementation:
# ML model training
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(historical_exports, durations)

# Predict new export duration
predicted_time = model.predict(vm_features)
Research Phase: 2-4 months
Priority: LOW
Features:
Effort: 6+ months (major feature)
| Feature | Priority | Effort | Impact | Quarter |
|---|---|---|---|---|
| Job Persistence | CRITICAL | 2-3w | High | Q1 2026 |
| Prometheus Metrics | HIGH | 1-2w | High | Q1 2026 |
| OVA Format | MEDIUM | 1w | Medium | Q1 2026 |
| Webhook Notifications | HIGH | 1-2w | High | Q3 2026 |
| Hyper-V Provider | HIGH | 3-4w | High | Q2 2026 |
| Proxmox Provider | MEDIUM | 2-3w | Medium | Q2 2026 |
| Web Dashboard | HIGH | 6-8w | Very High | Q4 2026 |
| Job Scheduling | MEDIUM | 2-3w | Medium | Q3 2026 |
| VDDK Integration | HIGH | 4-6w | Very High | 2027 |
| RBAC | HIGH | 3-4w | High | 2027 |
| OpenTelemetry | MEDIUM | 2-3w | Medium | Q4 2026 |
| AWS Provider | MEDIUM | 4-5w | Medium | Q2 2026 |
Easy entry points for external contributors:
Track these KPIs as features are added:
HyperSDK has a solid foundation with vSphere support and excellent CLI/API design. The roadmap focuses on:
Recommended First Steps:
This provides immediate value (persistence + metrics) while building towards a comprehensive multi-cloud migration platform.