Remote Actions and Hooks
The internal/actions package enables platform-triggered remote action execution on plexd mesh nodes. It supports built-in operations (diagnostics, connectivity checks) and custom hook scripts with SHA-256 integrity verification. Action results are reported back to the control plane.
Data Flow
Control Plane (SSE)
           │
           ▼
┌──────────────────────┐
│ HandleActionRequest  │  api.EventHandler for EventActionRequest
│ (handler.go)         │
└──────────┬───────────┘
           │ parse ActionRequest
           ▼
┌──────────────────────┐
│ Executor.Execute     │
│ (executor.go)        │
└──────────┬───────────┘
           │
     ┌─────┴──────────────────────────────────────┐
     │ 1. Check shuttingDown                      │
     │ 2. Check duplicate execution_id            │
     │ 3. Check MaxConcurrent                     │
     │ 4. Look up action (builtins → hooks)       │
     │ 5. Send ExecutionAck (accepted / rejected) │
     └──────────┬─────────────────────────────────┘
                │ if accepted
                ▼
        ┌───────────────┐
        │   goroutine   │
        │   runAction   │
        └───┬───────┬───┘
            │       │
    builtin │       │ hook
            ▼       ▼
      ┌──────────┐ ┌───────────────────────────────┐
      │runBuiltin│ │runHook                        │
      │ call fn  │ │ 1. Path traversal check       │
      └────┬─────┘ │ 2. File existence check       │
           │       │ 3. integrity.VerifyHook       │
           │       │ 4. exec.CommandContext        │
           │       │ 5. Capture stdout/stderr      │
           │       │ 6. Truncate to MaxOutputBytes │
           │       └─────────┬─────────────────────┘
           │                 │
           └────────┬────────┘
                    │
                    ▼
            ┌────────────────┐
            │  ReportResult  │  POST /v1/nodes/{id}/executions/{eid}/result
            └────────────────┘
Config
Config holds configuration for remote action execution.
| Field | Type | Default | Description |
|---|---|---|---|
| Enabled | bool | true | Whether action execution is active |
| HooksDir | string | /etc/plexd/hooks | Directory containing hook scripts |
| MaxConcurrent | int | 5 | Max simultaneous action executions |
| MaxActionTimeout | time.Duration | 10m | Max duration for a single action |
| MaxOutputBytes | int64 | 1 MiB | Max output capture size per action |
cfg := actions.Config{
	HooksDir: "/etc/plexd/hooks",
}
cfg.ApplyDefaults() // Enabled=true, HooksDir=/etc/plexd/hooks, MaxConcurrent=5, MaxActionTimeout=10m, MaxOutputBytes=1MiB
if err := cfg.Validate(); err != nil {
	log.Fatal(err)
}
ApplyDefaults uses zero-value detection: when every numeric field is zero (as in a zero-valued Config), all defaults are applied, including Enabled = true. If any numeric field is already set (indicating explicit construction), Enabled is left as-is.
Validation Rules
| Field | Rule | Error Message |
|---|---|---|
| MaxConcurrent | >= 1 when Enabled=true | actions: config: MaxConcurrent must be at least 1 |
| MaxActionTimeout | >= 10s when Enabled=true | actions: config: MaxActionTimeout must be at least 10s |
| MaxOutputBytes | >= 1024 when Enabled=true | actions: config: MaxOutputBytes must be at least 1024 |
Validation is skipped entirely when Enabled is false.
Executor
Central orchestrator for action execution, concurrency control, and result reporting.
Constructor
func NewExecutor(cfg Config, reporter ActionReporter, verifier HookVerifier, logger *slog.Logger) *Executor
| Parameter | Description |
|---|---|
| cfg | Actions configuration |
| reporter | Control plane adapter for acks and results |
| verifier | Hook integrity verification adapter |
| logger | Structured logger (log/slog) |
Logger is tagged with component=actions.
Methods
| Method | Signature | Description |
|---|---|---|
| RegisterBuiltin | (name, description string, params []api.ActionParam, fn BuiltinFunc) | Register a built-in action |
| SetHooks | (hooks []api.HookInfo) | Set the discovered hooks snapshot |
| Capabilities | () ([]api.ActionInfo, []api.HookInfo) | Return registered builtins and hooks for reporting |
| Execute | (ctx context.Context, nodeID string, req api.ActionRequest) | Main entry point for action execution |
| Shutdown | (ctx context.Context) | Cancel all running executions, reject new ones |
| ActiveCount | () int | Number of currently running actions |
Execute Flow
- Check shutting down: if shuttingDown, reject with reason=shutting_down
- Check duplicate: if executionID already active, reject with reason=duplicate_execution_id
- Check concurrency: if len(active) >= MaxConcurrent, reject with reason=max_concurrent_reached
- Look up action: search builtins map first, then hooks list
- Unknown action: reject with reason=unknown_action
- Accept: send ExecutionAck{Status: "accepted"} via ActionReporter.AckExecution
- Execute: launch goroutine calling runAction with timeout context
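The ordering of the admission checks matters, since the first failing check determines the rejection reason. A minimal sketch of that ordering is shown here; the admission type and its fields are hypothetical stand-ins for the Executor's internal state, and builtins and hooks are merged into one set for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// admission is a hypothetical stand-in for the Executor's internal state.
type admission struct {
	mu            sync.Mutex
	shuttingDown  bool
	maxConcurrent int
	active        map[string]struct{} // running executions, keyed by execution_id
	known         map[string]bool     // builtins and hooks merged into one set
}

// admit applies the checks in the documented order. It returns "" when the
// request is accepted, otherwise the rejection reason for the ack.
func (a *admission) admit(executionID, action string) string {
	a.mu.Lock()
	defer a.mu.Unlock()
	_, duplicate := a.active[executionID]
	switch {
	case a.shuttingDown:
		return "shutting_down"
	case duplicate:
		return "duplicate_execution_id"
	case len(a.active) >= a.maxConcurrent:
		return "max_concurrent_reached"
	case !a.known[action]:
		return "unknown_action"
	}
	a.active[executionID] = struct{}{} // reserve the slot before the goroutine starts
	return ""
}

func main() {
	a := &admission{
		maxConcurrent: 1,
		active:        map[string]struct{}{},
		known:         map[string]bool{"system.info": true},
	}
	fmt.Printf("%q\n", a.admit("exec_1", "system.info"))
	fmt.Printf("%q\n", a.admit("exec_1", "system.info"))
	fmt.Printf("%q\n", a.admit("exec_2", "system.info"))
}
```

Reserving the slot under the same lock as the checks keeps two concurrent requests with the same execution_id from both being accepted.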
runAction (goroutine)
- Parse timeout from ActionRequest.Timeout (capped by Config.MaxActionTimeout)
- Dispatch to runBuiltin or runHook
- Determine status: success, failed (non-zero exit), timeout, cancelled, error
- Build api.ExecutionResult with ExecutionID, Status, ExitCode, Stdout, Stderr, Duration, FinishedAt, TriggeredBy
- Report via ActionReporter.ReportResult
- Remove from active map
runHook
- Path traversal prevention: reject names containing /, \, or ..
- File existence: os.Stat the resolved path
- Integrity verification: call HookVerifier.VerifyHook(ctx, nodeID, hookPath, checksum)
- Execute: exec.CommandContext with WaitDelay=500ms
- Environment: minimal env (PATH, HOME, PLEXD_NODE_ID, PLEXD_EXECUTION_ID) plus PLEXD_PARAM_* vars
- Output capture: stdout and stderr captured in buffers, truncated to MaxOutputBytes
Shutdown
- Sets shuttingDown = true under mutex
- Collects all active cancel functions
- Calls each cancel function to cancel running contexts
- Subsequent Execute calls are rejected with reason=shutting_down
HandleActionRequest
SSE event handler for action_request events. Follows the same closure pattern as tunnel.HandleSSHSessionSetup.
func HandleActionRequest(executor *Executor, nodeID string, logger *slog.Logger) api.EventHandler
Returns an api.EventHandler that:
- Parses SignedEnvelope.Payload into api.ActionRequest
- Returns error on malformed JSON (no ack sent; logged by dispatcher)
- Returns error on missing execution_id
- When Config.Enabled is false: sends rejected ack with reason=actions_disabled
- Otherwise: delegates to Executor.Execute
ActionReporter
Interface abstracting control plane communication for testability.
type ActionReporter interface {
	AckExecution(ctx context.Context, nodeID, executionID string, ack api.ExecutionAck) error
	ReportResult(ctx context.Context, nodeID, executionID string, result api.ExecutionResult) error
}
A production implementation wraps api.ControlPlane.AckExecution and api.ControlPlane.ReportResult.
HookVerifier
Interface abstracting hook integrity verification for testability.
type HookVerifier interface {
	VerifyHook(ctx context.Context, nodeID, hookPath, expectedChecksum string) (bool, error)
}
The production implementation is integrity.Verifier, which computes SHA-256 of the hook file and compares against the expected checksum from the control plane.
BuiltinFunc
Signature for built-in action implementations.
type BuiltinFunc func(ctx context.Context, params map[string]string) (stdout string, stderr string, exitCode int, err error)
Built-in actions do not require integrity verification (they are compiled into the binary).
NodeInfoProvider
Interface for reading mesh state, injected into built-in actions.
type NodeInfoProvider interface {
	NodeID() string
	MeshIP() string
	PeerCount() int
}
Built-in Actions
diagnostics.collect
Collects system diagnostics (hostname, OS, architecture, CPU count, memory, disk, load average, kernel version, network interfaces, processes) and returns them as JSON. Gracefully handles missing /proc data by using fallback values.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| include_network | bool | no | true | Include network interface info |
| include_processes | bool | no | true | Include process listing |
{
  "hostname": "edge-us-west-42",
  "os": "linux",
  "arch": "amd64",
  "cpu_count": 4,
  "memory_total": 8589934592,
  "disk_total": 107374182400,
  "load_avg": "1.50 1.20 0.90 2/150 12345",
  "kernel_version": "6.1.0-amd64",
  "network_interfaces": "...",
  "processes": "..."
}
diagnostics.ping_peer
Pings a mesh peer and reports latency. Uses the system ping command with -c <count> -W 3.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| peer_id | string | yes | — | Peer mesh IP address |
| count | string | no | 1 | Number of pings (max 10) |
Returns ping output in stdout. Exit code 0 on success, 1 on failure (unreachable or invalid IP).
diagnostics.traceroute_peer
Traceroute to a mesh peer. Uses the system traceroute command with -n -m <max_hops> -w 3 flags.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| peer_id | string | yes | — | Peer mesh IP address |
| max_hops | string | no | 15 | Maximum number of hops |
Returns traceroute output in stdout. Exit code 1 if traceroute is not installed.
service.restart
Restarts the plexd service via systemctl restart plexd.service. No parameters required. Exit code 1 if systemctl is not available.
service.reload_config
Sends SIGHUP to the current process to trigger a configuration reload.
{
  "status": "reload_signal_sent",
  "pid": 12345
}
No parameters required.
service.upgrade
Upgrades plexd to a specified version. Downloads the new binary from the control plane's artifact store (GET /v1/artifacts/plexd/{version}/{os}/{arch}), verifies the SHA-256 checksum, atomically replaces the current binary, and triggers a systemd restart.
| Parameter | Type | Required | Description |
|---|---|---|---|
| version | string | yes | Target version (e.g. 1.5.0) |
| checksum | string | yes | Expected SHA-256 checksum (hex, optional sha256: prefix) |
On checksum mismatch, the upgrade is aborted and the original binary is preserved:
{
  "status": "checksum_mismatch",
  "message": "expected abc123..., got def456...",
  "version": "1.5.0"
}
On success:
{
  "status": "upgraded",
  "version": "1.5.0",
  "message": "binary replaced, restarting service"
}
system.info
Reports OS, kernel, hardware, and runtime info as JSON.
{
  "hostname": "edge-us-west-42",
  "os": "linux",
  "arch": "amd64",
  "go_version": "go1.24.0",
  "mesh_ip": "10.100.0.5",
  "peer_count": 12,
  "node_id": "node-abc123"
}
No parameters required.
health.check
Reports the node's health status.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| include_peers | bool | no | true | Include per-peer status |
{
  "tunnel_count": 3,
  "connected_peers": 5,
  "uptime": "2h30m15s",
  "last_heartbeat": "2026-02-15T10:30:00Z",
  "last_reconcile": "2026-02-15T10:25:00Z",
  "status": "healthy"
}
Status is "healthy" if tunnel_count > 0, otherwise "degraded".
mesh.reconnect
Triggers mesh reconnection via the reconciler. On success, returns {"status": "reconnected"}. On failure, returns exit code 1 with {"status": "failed", "error": "..."}.
No parameters required.
config.dump
Returns the current effective configuration with sensitive values redacted. Returns the config string in stdout. No parameters required.
logs.snapshot
Captures recent logs from the in-memory ring buffer and returns them as newline-separated text.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| lines | string | no | 100 | Number of lines (max: 10000) |
| since | string | no | — | Duration filter (e.g. 5m, 1h) |
Returns newline-separated log lines in stdout.
HookWatcher
Monitors a hooks directory for filesystem changes using fsnotify. Replaces the one-time DiscoverHooks call with a continuous watch loop.
Constructor
func NewHookWatcher(hooksDir string, onChange HookChangeCallback, onIntegrity IntegrityAlertCallback, logger *slog.Logger) *HookWatcher
| Parameter | Description |
|---|---|
| hooksDir | Directory containing hook scripts |
| onChange | Callback invoked with the full hooks list on change |
| onIntegrity | Callback invoked when a hook's checksum changes |
| logger | Structured logger (log/slog) |
Callbacks
type HookChangeCallback func(hooks []api.HookInfo)
type IntegrityAlertCallback func(hookName, oldChecksum, newChecksum string)
Methods
| Method | Signature | Description |
|---|---|---|
| Watch | (ctx context.Context) error | Monitor directory; blocks until ctx is cancelled |
| Hooks | () []api.HookInfo | Return sorted snapshot of current hooks |
Watch Lifecycle
- Create hooks directory if it does not exist
- Perform initial scan: read all executable files, compute checksums, call onChange
- Start fsnotify watcher on the hooks directory
- On file create/write/chmod: debounce (200ms), then re-read file, compute checksum, update hooks map, call onChange
- On file remove/rename: debounce, remove from hooks map, call onChange
- On .json sidecar change: debounce, re-read the parent hook's metadata
- On checksum change for an existing hook: call onIntegrity with old and new checksums
- On context cancellation: stop all timers, return nil
Integration with Executor
In cmd/plexd/cmd/up.go, the watcher is wired to the executor:
hookWatcher := actions.NewHookWatcher(cfg.Actions.HooksDir, executor.SetHooks, onIntegrityAlert, logger)
When hooks change, executor.SetHooks is called, updating Capabilities() output. The Hooks() method satisfies the nodeapi.HookReloader interface.
Local API Endpoints
The node API server (internal/nodeapi) exposes action and hook management endpoints over the Unix socket.
GET /v1/actions
Lists all registered built-in actions and hooks.
Response:
{
  "builtin_actions": [
    {"name": "diagnostics.collect", "description": "Collect system diagnostics"}
  ],
  "hooks": [
    {"name": "deploy.sh", "source": "local", "checksum": "sha256:abc...", "description": "Deploy"}
  ]
}
POST /v1/actions/run
Runs a built-in action synchronously and returns the result. The action provider must implement the LocalActionRunner interface (satisfied by Executor).
Request:
{
  "action": "diagnostics.collect",
  "parameters": {}
}
Response:
{
  "status": "success",
  "exit_code": 0,
  "stdout": "{...}",
  "stderr": ""
}
Status values: success (exit 0), failed (non-zero exit), error (internal error).
GET /v1/hooks
Lists all registered hooks (subset of GET /v1/actions response).
POST /v1/hooks/reload
Triggers a re-scan of hooks via the HookReloader interface (satisfied by HookWatcher.Hooks()).
Response:
{
  "status": "reloaded",
  "hooks": [...]
}
CLI Commands
plexd actions
Lists available actions via GET /v1/actions over Unix socket. Output is a tab-separated table with TYPE, NAME, and DESCRIPTION columns.
plexd actions run <name>
Runs an action via POST /v1/actions/run. Accepts --param key=value flags for passing parameters.
plexd hooks list
Lists hooks via GET /v1/hooks. Shows NAME, SOURCE, CHECKSUM (truncated to 12 chars), and DESCRIPTION.
plexd hooks verify
Reads hooks via GET /v1/hooks and checks that each hook has a checksum. Reports OK or WARN per hook.
plexd hooks reload
Triggers a hook re-scan via POST /v1/hooks/reload. Reports the status and hook count.
DiscoverHooks
Scans a directory for executable hook scripts and builds metadata.
func DiscoverHooks(hooksDir string, logger *slog.Logger) ([]api.HookInfo, error)
- Returns empty slice (not nil) if hooksDir is empty or does not exist
- Skips directories, non-executable files, and .json sidecar files
- Computes SHA-256 via integrity.HashFile for each executable
- Parses optional .json sidecar for metadata (description, parameters, timeout, sandbox)
- Results sorted by name
- Individual file errors logged at warn level; valid hooks still returned
Sidecar Metadata Format
A hook named deploy can have a sidecar file deploy.json:
{
  "description": "Deploy to production",
  "parameters": [
    {
      "name": "target",
      "type": "string",
      "required": true,
      "description": "Target address"
    }
  ],
  "timeout": "30s",
  "sandbox": "none"
}
Parameter Passing
Parameters from ActionRequest.Parameters are passed to hook scripts as environment variables with the PLEXD_PARAM_ prefix.
| Original Name | Environment Variable |
|---|---|
| target | PLEXD_PARAM_TARGET |
| region | PLEXD_PARAM_REGION |
| my-param.name! | PLEXD_PARAM_MY_PARAM_NAME_ |
Sanitization: non-alphanumeric characters (except underscore) are replaced with underscore, then uppercased.
Additional environment variables always set:
| Variable | Description |
|---|---|
| PATH | Inherited from agent process |
| HOME | Inherited from agent process |
| PLEXD_NODE_ID | Node ID of the executing node |
| PLEXD_EXECUTION_ID | Execution ID from the request |
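The sanitization rule and the fixed variables can be combined into a small environment builder. The function names below are hypothetical, and two details are assumptions of this sketch: "alphanumeric" is read as ASCII letters and digits, and parameters are emitted in sorted order for determinism.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// sanitizeParamName replaces every character outside [A-Za-z0-9_] with an
// underscore and uppercases the result. ASCII-only matching is an assumption.
func sanitizeParamName(name string) string {
	var b strings.Builder
	for _, r := range name {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return strings.ToUpper(b.String())
}

// hookEnv builds the minimal environment for a hook process.
func hookEnv(nodeID, executionID string, params map[string]string) []string {
	env := []string{
		"PATH=" + os.Getenv("PATH"),
		"HOME=" + os.Getenv("HOME"),
		"PLEXD_NODE_ID=" + nodeID,
		"PLEXD_EXECUTION_ID=" + executionID,
	}
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order (an assumption of this sketch)
	for _, k := range keys {
		env = append(env, "PLEXD_PARAM_"+sanitizeParamName(k)+"="+params[k])
	}
	return env
}

func main() {
	fmt.Println(sanitizeParamName("my-param.name!"))
	for _, kv := range hookEnv("node-abc123", "exec_a1b2c3d4", map[string]string{"target": "10.100.0.5"}) {
		fmt.Println(kv)
	}
}
```

The resulting slice can be assigned to the Env field of an exec.Cmd so the hook does not inherit the agent's full environment.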
Execution Status Values
| Status | Meaning |
|---|---|
| success | Action completed with exit code 0 |
| failed | Action completed with non-zero exit code |
| timeout | Action exceeded its timeout and was killed |
| cancelled | Action was cancelled (e.g., during shutdown) |
| error | Internal error (integrity failure, file not found, etc.) |
Ack Rejection Reasons
| Reason | Trigger |
|---|---|
| unknown_action | Action name not in builtins or hooks list |
| max_concurrent_reached | Active executions >= Config.MaxConcurrent |
| duplicate_execution_id | Execution ID already in progress |
| shutting_down | Agent is shutting down |
| actions_disabled | Config.Enabled is false |
API Types
Types defined in internal/api/types.go.
ActionRequest
SSE payload for action_request events.
type ActionRequest struct {
	ExecutionID string            `json:"execution_id"`
	Action      string            `json:"action"`
	Parameters  map[string]string `json:"parameters,omitempty"`
	Timeout     string            `json:"timeout"`
	Checksum    string            `json:"checksum,omitempty"`
	TriggeredBy *TriggeredBy      `json:"triggered_by,omitempty"`
}
ExecutionAck
Sent to POST /v1/nodes/{node_id}/executions/{execution_id}/ack.
type ExecutionAck struct {
	ExecutionID string `json:"execution_id"`
	Status      string `json:"status"` // "accepted" or "rejected"
	Reason      string `json:"reason"` // populated when rejected
}
ExecutionResult
Sent to POST /v1/nodes/{node_id}/executions/{execution_id}/result.
type ExecutionResult struct {
	ExecutionID string       `json:"execution_id"`
	Status      string       `json:"status"`
	ExitCode    int          `json:"exit_code"`
	Stdout      string       `json:"stdout"`
	Stderr      string       `json:"stderr"`
	Duration    string       `json:"duration"`
	FinishedAt  time.Time    `json:"finished_at"`
	TriggeredBy *TriggeredBy `json:"triggered_by,omitempty"`
}
CapabilitiesPayload
Sent to PUT /v1/nodes/{node_id}/capabilities.
type CapabilitiesPayload struct {
	Binary         *BinaryInfo  `json:"binary,omitempty"`
	BuiltinActions []ActionInfo `json:"builtin_actions"`
	Hooks          []HookInfo   `json:"hooks"`
}
Integration Points
With internal/api
- EventActionRequest constant defines the SSE event type
- api.ControlPlane.AckExecution and ReportResult are the production implementations of ActionReporter
- api.ControlPlane.UpdateCapabilities sends discovered capabilities
With internal/integrity
- integrity.Verifier implements HookVerifier for SHA-256 hook verification
- integrity.HashFile is used by DiscoverHooks for computing hook checksums
With internal/api (EventDispatcher)
HandleActionRequest returns an api.EventHandler registered with the EventDispatcher for EventActionRequest events, following the same pattern as tunnel.HandleSSHSessionSetup.
Lifecycle
// 1. Create config
cfg := actions.Config{HooksDir: "/etc/plexd/hooks"}
cfg.ApplyDefaults()
// 2. Create executor
exec := actions.NewExecutor(cfg, reporter, verifier, logger)
// 3. Register built-in actions
exec.RegisterBuiltin("diagnostics.collect", "Collect system diagnostics", collectParams, actions.DiagnosticsCollect())
exec.RegisterBuiltin("diagnostics.ping_peer", "Ping a mesh peer", peerIDParam, actions.PingPeer(nodeInfo))
exec.RegisterBuiltin("diagnostics.traceroute_peer", "Traceroute to peer", peerIDParam, actions.DiagnosticsTraceroutePeer(nodeInfo))
exec.RegisterBuiltin("service.restart", "Restart service", nil, actions.ServiceRestart())
exec.RegisterBuiltin("service.reload_config", "Reload config", nil, actions.ServiceReloadConfig())
exec.RegisterBuiltin("service.upgrade", "Upgrade plexd binary", upgradeParams, actions.ServiceUpgrade(apiClient))
exec.RegisterBuiltin("system.info", "Report system and runtime info", nil, actions.SystemInfo(nodeInfo))
exec.RegisterBuiltin("health.check", "Check health", healthParams, actions.HealthCheck(healthProvider))
exec.RegisterBuiltin("mesh.reconnect", "Reconnect mesh", nil, actions.MeshReconnect(reconnector))
exec.RegisterBuiltin("config.dump", "Dump config", nil, actions.ConfigDump(configProvider))
exec.RegisterBuiltin("logs.snapshot", "Snapshot logs", snapshotParams, actions.LogsSnapshot(logProvider))
// 4. Register SSE handler
sseMgr.RegisterHandler(api.EventActionRequest,
actions.HandleActionRequest(exec, nodeID, logger))
// 5. Create hook watcher (replaces one-time DiscoverHooks)
watcher := actions.NewHookWatcher(cfg.HooksDir, exec.SetHooks, onIntegrityAlert, logger)
// 6. Wire to nodeapi
nodeAPISrv.SetActionProvider(exec)
nodeAPISrv.SetHookReloader(watcher)
// 7. Start watcher goroutine
go watcher.Watch(ctx)
// 8. On shutdown
exec.Shutdown(ctx)
Error Handling
| Scenario | Behavior |
|---|---|
| Malformed SSE payload | Handler returns error (logged by dispatcher) |
| Missing execution_id | Handler returns error |
| Actions disabled | Rejected ack with reason=actions_disabled |
| Unknown action | Rejected ack with reason=unknown_action |
| Hook file missing | Accepted ack, then error result |
| Hook integrity failure | Accepted ack, then error result |
| Hook timeout | Process killed, result status=timeout |
| Hook non-zero exit | Result status=failed with actual exit code |
| Result report fails | Logged at warn level, agent continues |
| Ack report fails | Logged at warn level |
| Panic in action | Recovered, error result reported |
Logging
All log entries use component=actions.
| Level | Event | Keys |
|---|---|---|
| Info | action_request received | execution_id, action |
| Info | Action completed | execution_id, status, duration |
| Warn | Action rejected | execution_id, action, reason |
| Warn | Failed to send ack | execution_id, error |
| Warn | Failed to report result | execution_id, error |
| Error | Payload parse failed | event_id, error |
| Error | Missing execution_id | event_id |
| Warn | Actions disabled | execution_id, action |
SSE Event: action_request
The control plane sends an action_request event over the existing SSE stream to trigger an action on a node. Like all SSE events, it is wrapped in a signed envelope and verified before processing.
Payload
{
  "execution_id": "exec_a1b2c3d4",
  "action": "diagnostics.collect",
  "type": "builtin",
  "parameters": {
    "include_network": true,
    "include_processes": true
  },
  "timeout": "30s",
  "callback_url": "https://api.plexsphere.com/v1/nodes/n_abc123/executions/exec_a1b2c3d4"
}
| Field | Type | Description |
|---|---|---|
| execution_id | string | Unique identifier for this execution |
| action | string | Action name (e.g. diagnostics.collect, hooks/backup) |
| type | string | builtin or hook |
| parameters | object | Key-value parameters passed to the action |
| timeout | duration | Maximum execution time (default: 30s) |
| callback_url | string | URL for ACK/NACK and result delivery |
The issued_at, nonce, and signature fields are part of the signed event envelope and apply to all SSE events uniformly.
ACK/NACK and Result Formats
ACK/NACK (immediate)
POST {callback_url}/ack
{
  "execution_id": "exec_a1b2c3d4",
  "status": "accepted", // or "rejected"
  "reason": "" // Reason if rejected (e.g. "unknown action", "integrity violation")
}
Result (asynchronous)
POST {callback_url}/result
{
  "execution_id": "exec_a1b2c3d4",
  "status": "success", // success, failure, timeout, cancelled
  "exit_code": 0,
  "stdout": "...", // Truncated to 64 KiB
  "stderr": "...", // Truncated to 64 KiB
  "duration": "2.34s",
  "finished_at": "2025-01-15T10:30:02Z"
}
Retry and Persistence
- If the callback POST fails, plexd retries with exponential backoff (1s, 2s, 4s, ... up to 5 minutes).
- Pending results are persisted to data_dir and re-delivered when the SSE connection is re-established.
Capability Announcement
When plexd registers or when its capabilities change (e.g. hooks added/removed, binary updated), it announces its full capability set to the control plane.
Registration Flow
During POST /v1/register, the capabilities field is included in the registration payload:
{
  "token": "plx_enroll_a8f3c7...",
  "public_key": "...",
  "hostname": "web-01",
  "metadata": { },
  "capabilities": {
    "binary": {
      "version": "1.4.2",
      "checksum": "sha256:a1b2c3d4e5f6..."
    },
    "builtin_actions": [
      {
        "name": "diagnostics.collect",
        "description": "Collect system diagnostics",
        "parameters": [
          { "name": "include_network", "type": "bool", "required": false, "default": "true" },
          { "name": "include_processes", "type": "bool", "required": false, "default": "true" }
        ]
      }
    ],
    "hooks": [
      {
        "name": "backup",
        "description": "Run incremental backup of application data",
        "source": "script",
        "checksum": "sha256:f7e8d9c0b1a2...",
        "parameters": [
          { "name": "target", "type": "string", "required": true },
          { "name": "compress", "type": "bool", "required": false, "default": "true" }
        ],
        "timeout": "300s",
        "sandbox": "namespaced"
      },
      {
        "name": "db-backup",
        "description": "PostgreSQL backup to S3",
        "source": "crd",
        "checksum": "sha256:abc123...",
        "parameters": [
          { "name": "bucket", "type": "string", "required": true },
          { "name": "compress", "type": "bool", "required": false, "default": "true" }
        ],
        "timeout": "600s",
        "privileged": false
      }
    ]
  }
}
Runtime Capability Update
PUT /v1/nodes/{node_id}/capabilities
Used when capabilities change after initial registration (e.g. hook files added/removed/modified, PlexdHook CRs created/updated/deleted, plexd binary updated). Same capabilities payload structure as in the registration request.
Data Model
| Type | Fields |
|---|---|
| BinaryInfo | version, checksum |
| ActionCapability | name, description, parameters[] |
| HookCapability | name, description, source (script or crd), checksum, parameters[], timeout, sandbox (script) / privileged (crd) |
| ParameterDef | name, type, required, default, description |
Kubernetes CRD Hooks
When plexd runs as a DaemonSet in Kubernetes, hooks are defined as PlexdHook custom resources instead of script files. On action_request, plexd creates a Kubernetes Job on the target node.
Generated Job YAML
When action_request arrives with action: hooks/db-backup, plexd creates:
apiVersion: batch/v1
kind: Job
metadata:
  name: plexd-db-backup-exec-a1b2c3d4
  namespace: plexd-system
  labels:
    plexd.plexsphere.com/hook: db-backup
    plexd.plexsphere.com/execution-id: exec_a1b2c3d4
  ownerReferences:
    - apiVersion: plexd.plexsphere.com/v1alpha1
      kind: PlexdHook
      name: db-backup
spec:
  backoffLimit: 0
  activeDeadlineSeconds: 600
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-03
      serviceAccountName: plexd-hook-runner
      containers:
        - name: backup
          image: registry.example.com/tools/pg-backup:2.1@sha256:abc123...
          command: ["/usr/local/bin/pg-backup.sh"]
          env:
            - name: PLEXD_PARAM_BUCKET
              value: "s3://backups/prod"
            - name: PLEXD_PARAM_COMPRESS
              value: "true"
            - name: PLEXD_EXECUTION_ID
              value: "exec_a1b2c3d4"
            - name: PLEXD_ACTION_NAME
              value: "db-backup"
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql
              readOnly: true
      volumes:
        - name: pgdata
          hostPath:
            path: /var/lib/postgresql
      restartPolicy: Never
plexd pins the Job to the target node via nodeSelector, injects parameters as PLEXD_PARAM_* environment variables, and sets an ownerReference to the PlexdHook CR for garbage collection.
Result Mapping
plexd watches the Job and maps its status to the action callback:
| Job Condition | Callback Status | Notes |
|---|---|---|
| Succeeded | success | Exit code 0 |
| Failed | failure | Exit code from container termination state |
| activeDeadlineSeconds exceeded | timeout | Job killed by Kubernetes |
Stdout and stderr are captured from the pod logs via the Kubernetes API.
Security Considerations
- Signed delivery -- All SSE events (including action_request, peer_added, peer_removed, rotate_keys, etc.) are signed with the control plane's Ed25519 key. plexd verifies every signature before processing. Local action requests via Unix socket require a valid session JWT.
- Replay protection -- Every SSE event includes issued_at (max staleness: 5 minutes) and nonce (tracked in bounded set). Signature verification, staleness, and nonce checks are applied uniformly to all event types.
- Hook file permissions -- plexd verifies that hook files are owned by root and not group- or other-writable before execution.
- Symlink protection -- Hook paths are resolved and validated to prevent symlink escape outside the configured hooks directory.
- Checksum enforcement -- Hook checksums are verified before every execution. Binary checksums are reported continuously. On Kubernetes, image digests serve as checksums -- hooks without a pinned digest (@sha256:...) are rejected.
- Resource isolation -- Hooks run with cgroup limits at minimum; higher sandbox levels add namespace or container isolation. On Kubernetes, hooks always run as separate Pods with native resource limits.
- CRD privilege control -- Kubernetes hooks requiring host-level access (hostPID, hostNetwork, privileged) must declare privileged: true in the PlexdHook spec. The platform can enforce approval policies.
- Session token scoping -- JWTs are bound to a specific node (node_id claim) and a specific set of actions (actions claim). Tokens cannot be used on other nodes or for unauthorized actions.
- Session revocation -- When an SSH session ends, the control plane pushes session_revoked via SSE. plexd maintains a local revocation set to reject tokens from terminated sessions.
- Local emergency access -- plexd actions run --local requires root or plexd user, bypasses JWT authorization, but is logged as local_emergency and reported to the control plane.