
Kubernetes E2E Test

Validates that plexd deployed as a Kubernetes DaemonSet successfully registers, sends heartbeats, retrieves state, reports capabilities, detects drift, and forwards metrics, logs, and audit events to the Central API. The test uses kind to create a local single-node cluster, applies all production manifests from deploy/kubernetes/, deploys a mock-api as a ClusterIP Service, and polls the assertion endpoint to verify plexd's lifecycle calls.

Cluster Topology

```
┌─────────────────────────────────────────────────┐
│              kind cluster (plexd-e2e)            │
│                                                  │
│  ┌──────────────┐       ┌──────────────────┐    │
│  │   mock-api   │◄──────│  plexd DaemonSet │    │
│  │  Deployment  │       │  (host network)  │    │
│  │  :8080 (HTTP)│       │  1 pod per node  │    │
│  │  :8443 (TLS) │       │                  │    │
│  └──────────────┘       └──────────────────┘    │
│  ClusterIP Service                               │
│  mock-api.plexd-e2e:8080, :8443                  │
└─────────────────────────────────────────────────┘
        │ port-forward :18080

   localhost:18080/test/assertions
```
| Component | Image | Source | Purpose |
|---|---|---|---|
| mock-api | mockapi:e2e | test/e2e/mockapi/Dockerfile | Fixture-based mock Central API, tracks call counters |
| plexd | plexd:e2e | deploy/docker/Dockerfile | Agent under test, deployed as DaemonSet |

Test Phases

1. Pre-flight checks

Verifies that kind, kubectl, docker, curl, and jq are available on $PATH. Exits immediately if any tool is missing.
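The guard can be sketched as a small helper (the function name `require_tools` is illustrative, not taken from the script):

```bash
#!/usr/bin/env bash
# Illustrative pre-flight guard: report every missing tool, then fail fast.
require_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# The real script checks: kind kubectl docker curl jq
require_tools sh && echo "pre-flight ok"
```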

2. Cluster creation

Deletes any pre-existing cluster with the same name, then creates a new kind cluster with --wait 60s for node readiness.
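The delete-then-create sequence can be sketched as below; `recreate_cluster` is a hypothetical helper name, and `$CLUSTER_NAME` defaults to plexd-e2e as in the Configuration Variables table:

```bash
# Sketch of idempotent cluster creation.
CLUSTER_NAME="${CLUSTER_NAME:-plexd-e2e}"

recreate_cluster() {
  # Ignore the error if no previous cluster exists.
  kind delete cluster --name "$CLUSTER_NAME" >/dev/null 2>&1 || true
  # --wait blocks until the control-plane node reports Ready.
  kind create cluster --name "$CLUSTER_NAME" --wait 60s
}
```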

3. Image build and load

Builds both Docker images from the repository root, then loads them into the kind cluster node with kind load docker-image. Both use imagePullPolicy: Never to avoid registry pulls.
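A sketch of this step, assuming the image tags and Dockerfile paths from the component table (`build_and_load` is an illustrative name):

```bash
# Build both images from the repo root, then side-load them into the kind
# node so that imagePullPolicy: Never finds them locally.
build_and_load() {
  docker build -t plexd:e2e -f deploy/docker/Dockerfile .
  docker build -t mockapi:e2e -f test/e2e/mockapi/Dockerfile .
  kind load docker-image plexd:e2e mockapi:e2e --name "${CLUSTER_NAME:-plexd-e2e}"
}
```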

4. Manifest application

Manifests are applied in dependency order:

| Order | Resource | Source | Notes |
|---|---|---|---|
| 1 | Namespace plexd-e2e | kubectl create namespace | Test-specific namespace |
| 2 | PlexdNodeState CRD | deploy/kubernetes/crds/plexdnodestate-crd.yaml | Cluster-scoped |
| 3 | PlexdHook CRD | deploy/kubernetes/crds/plexdhook-crd.yaml | Cluster-scoped |
| 4 | ServiceAccount | deploy/kubernetes/serviceaccount.yaml | Namespace patched via sed |
| 5 | RBAC | deploy/kubernetes/rbac.yaml | ClusterRoleBinding patched to test namespace |
| 6 | Bootstrap Secret | kubectl create secret generic | Token: e2e-test-token |
| 7 | ConfigMap | kubectl create configmap | Inline config pointing to mock-api |
| 8 | mock-api Deployment + Service | test/e2e/kubernetes/mock-api-manifests.yaml | ClusterIP on ports 8080 (HTTP) and 8443 (TLS) |
| 9 | plexd DaemonSet | deploy/kubernetes/daemonset.yaml | Image and namespace patched via sed |

The DaemonSet manifest is patched at apply time using --dry-run=client -o yaml | sed to substitute the namespace, image tag, and pull policy.
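The substitution step can be sketched as a sed filter; the exact expressions below are assumptions, not the script's actual patterns:

```bash
# Illustrative patch filter: rewrite namespace, image, and pull policy in a
# rendered manifest stream before piping it back to kubectl apply -f -.
patch_manifest() {
  local ns="$1" image="$2"
  sed -e "s|namespace: .*|namespace: ${ns}|" \
      -e "s|image: .*|image: ${image}|" \
      -e "s|imagePullPolicy: .*|imagePullPolicy: Never|"
}

# Usage (sketch):
#   kubectl apply --dry-run=client -o yaml -f deploy/kubernetes/daemonset.yaml \
#     | patch_manifest plexd-e2e plexd:e2e | kubectl apply -f -
```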

5. Readiness wait

  • mock-api Deployment: kubectl rollout status with 60s timeout.
  • plexd DaemonSet: kubectl rollout status with configurable timeout (default 120s).
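The two waits above can be sketched together (`wait_ready` and the resource name `deployment/mock-api` are assumptions; `daemonset/plexd` appears in the debugging commands later in this document):

```bash
# Sketch of the readiness gate; TIMEOUT defaults to 120s per the
# Configuration Variables table.
wait_ready() {
  kubectl -n plexd-e2e rollout status deployment/mock-api --timeout=60s
  kubectl -n plexd-e2e rollout status daemonset/plexd --timeout="${TIMEOUT:-120s}"
}
```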

6. Port-forward and initial assertions

Port-forward from localhost:18080 to svc/mock-api:8080 is started in the background. The script polls GET /test/assertions every 5 seconds for up to 60 seconds until all 8 platform counters are >= 1.
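The poll can be sketched with a small generic retry helper (`poll_until` and `counters_ready` are illustrative names; the jq filter assumes the counter naming shown under Assertion Logic):

```bash
# Generic retry helper: run a command until it succeeds or the deadline passes.
poll_until() {  # poll_until TIMEOUT_S INTERVAL_S CMD...
  local deadline=$(( $(date +%s) + $1 )) interval="$2"
  shift 2
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "$interval"
  done
}

# True once every platform counter (the eight non-local_ ones) is >= 1.
counters_ready() {
  curl -fsS http://localhost:18080/test/assertions |
    jq -e 'to_entries | map(select(.key | startswith("local_") | not)) | all(.value >= 1)' >/dev/null
}

# Real script: poll every 5 seconds for up to 60 seconds.
# poll_until 60 5 counters_ready
```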

7. Request body validation

Uses GET /test/last-request/{endpoint} to verify the content of request payloads:

| Endpoint | Validated Fields |
|---|---|
| register | token (non-empty), hostname (non-empty) |
| heartbeat | Valid JSON with timestamp field |
| capabilities | builtin_actions (array with >= 1 entry) |
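For example, the register payload check can be sketched as a jq filter over the captured body (the helper name is illustrative):

```bash
# Passes only when both token and hostname are present and non-empty.
check_register_body() {
  jq -e '(.token // "") != "" and (.hostname // "") != ""' >/dev/null
}

# Real script (sketch): fetch the captured body, then validate it.
#   curl -fsS http://localhost:18080/test/last-request/register | check_register_body
```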

8. Periodic loop verification

Waits up to 60 seconds for heartbeat_count and metrics_count to reach >= 2, proving that self-generating periodic loops run continuously. Logs and audit are tested via pod restart.

9. Pod restart resilience

Deletes the plexd pod and waits for the DaemonSet controller to schedule a new pod. Verifies:

  • New pod becomes ready within 60 seconds
  • Heartbeat resumes (agent loads persisted identity from hostPath and enters steady state)
  • audit_count increases (new ProcessSource fires process_start in the new pod)

Note: registration_count does not increase because the identity persists via the hostPath volume at /var/lib/plexd. This is correct production behavior — the agent reuses its existing registration.
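The restart probe can be sketched as follows; the label selector matches the one used in the debugging commands below, and `restart_and_wait` is an illustrative name:

```bash
# Delete the plexd pod, then wait for the DaemonSet controller's replacement
# to become Ready within the 60-second budget.
restart_and_wait() {
  kubectl -n plexd-e2e delete pod -l app.kubernetes.io/name=plexd
  kubectl -n plexd-e2e wait pod -l app.kubernetes.io/name=plexd \
    --for=condition=Ready --timeout=60s
}
```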

10. Local endpoint delivery

Polls GET /test/assertions until local_metrics_count, local_logs_count, and local_audit_count are all >= 1 (timeout: 60s). Validates that the local endpoint credential chain works in Kubernetes: NSK from registration → secret fetch → AES-256-GCM decryption → Bearer token → HTTPS POST to mock-api.plexd-e2e:8443.

11. Cleanup

The cleanup function runs on EXIT trap (both success and failure). It kills the port-forward process, prints diagnostics, and deletes the kind cluster.

Assertion Logic

The test polls GET http://localhost:18080/test/assertions which returns JSON counters:

```json
{
  "registration_count": 1,
  "heartbeat_count": 3,
  "state_count": 1,
  "capabilities_count": 1,
  "drift_count": 1,
  "metrics_count": 1,
  "logs_count": 1,
  "audit_count": 1,
  "local_metrics_count": 1,
  "local_logs_count": 1,
  "local_audit_count": 1
}
```

The test passes when all eight platform counters are >= 1 (initial assertions), and separately verifies that all three local endpoint counters are >= 1 (Phase 10):

| Counter | Meaning |
|---|---|
| registration_count | plexd called POST /v1/register |
| heartbeat_count | plexd called POST /v1/nodes/{id}/heartbeat |
| state_count | plexd called GET /v1/nodes/{id}/state |
| capabilities_count | plexd called PUT /v1/nodes/{id}/capabilities |
| drift_count | plexd called POST /v1/nodes/{id}/drift |
| metrics_count | plexd called POST /v1/nodes/{id}/metrics |
| logs_count | plexd called POST /v1/nodes/{id}/logs |
| audit_count | plexd called POST /v1/nodes/{id}/audit |
| local_metrics_count | plexd sent metrics to POST /local/metrics (TLS) |
| local_logs_count | plexd sent logs to POST /local/logs (TLS) |
| local_audit_count | plexd sent audit to POST /local/audit (TLS) |

plexd Configuration

The ConfigMap is created inline by the test script (not from deploy/kubernetes/plexd-config-configmap.yaml) because the API URL must point to the in-cluster mock-api Service.

```yaml
api:
  base_url: http://mock-api.plexd-e2e:8080
registration:
  data_dir: /var/lib/plexd
node_api:
  data_dir: /var/lib/plexd
heartbeat:
  node_id: e2e-k8s-node
metrics:
  local_endpoint:
    url: https://mock-api.plexd-e2e:8443/local/metrics
    secret_key: local-bearer-token
    tls_insecure_skip_verify: true
log_fwd:
  local_endpoint:
    url: https://mock-api.plexd-e2e:8443/local/logs
    secret_key: local-bearer-token
    tls_insecure_skip_verify: true
audit_fwd:
  local_endpoint:
    url: https://mock-api.plexd-e2e:8443/local/audit
    secret_key: local-bearer-token
    tls_insecure_skip_verify: true
```

The bootstrap token is set via kubectl create secret generic plexd-bootstrap --from-literal=token=e2e-test-token.

Configuration Variables

| Variable | Default | Description |
|---|---|---|
| CLUSTER_NAME | plexd-e2e | Name of the kind cluster |
| TIMEOUT | 120s | DaemonSet rollout timeout |

Usage

```bash
make test-e2e-k8s
```

Or directly:

```bash
bash test/e2e/kubernetes/test.sh
```

Override configuration:

```bash
CLUSTER_NAME=my-cluster TIMEOUT=180s make test-e2e-k8s
```

Prerequisites

  • Docker
  • kind
  • kubectl
  • curl and jq on the host

Debugging Failures

DaemonSet does not become ready:

```bash
kubectl -n plexd-e2e describe daemonset/plexd
kubectl -n plexd-e2e logs -l app.kubernetes.io/name=plexd --tail=50
```

The DaemonSet uses hostNetwork: true with dnsPolicy: ClusterFirstWithHostNet. DNS resolution to the mock-api ClusterIP Service requires this policy. The plexd container also has readOnlyRootFilesystem: true and drops all capabilities except NET_ADMIN and NET_RAW. Since kind nodes lack the WireGuard kernel module, the test validates manifest correctness and API communication — not tunnel creation.

Assertions not met (counters stay at 0):

```bash
kubectl -n plexd-e2e logs -l app.kubernetes.io/name=plexd --tail=100
kubectl -n plexd-e2e logs -l app.kubernetes.io/name=mock-api --tail=50
```

Common causes:

  • DNS resolution failure — verify dnsPolicy: ClusterFirstWithHostNet is set on the DaemonSet.
  • ConfigMap not mounted — check that the plexd pod has /etc/plexd/config.yaml with the correct api.base_url.
  • Missing bootstrap token — the PLEXD_BOOTSTRAP_TOKEN env var must resolve from the plexd-bootstrap Secret.

Port-forward not reachable:

The script waits 2 seconds after starting the port-forward. If curl fails, check that mock-api is healthy:

```bash
kubectl -n plexd-e2e get pods -l app.kubernetes.io/name=mock-api
```
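If the fixed 2-second wait proves flaky, the port-forward could instead be polled until it answers. This is a sketch of an alternative, not what the script currently does:

```bash
# Poll the forwarded port for up to 10 seconds instead of sleeping a fixed 2s.
wait_for_port_forward() {
  local i
  for i in $(seq 1 10); do
    curl -fsS http://localhost:18080/test/assertions >/dev/null 2>&1 && return 0
    sleep 1
  done
  return 1
}
```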

Cluster not cleaned up:

The cleanup trap runs on EXIT, but if the script is killed with SIGKILL, run manually:

```bash
kind delete cluster --name plexd-e2e
```

Diagnostics Output

On any failure, the print_diagnostics function outputs:

| Command | Purpose |
|---|---|
| kubectl get pods -n plexd-e2e -o wide | Pod status and node assignment |
| kubectl describe daemonset/plexd -n plexd-e2e | Scheduling events and conditions |
| kubectl logs -l app.kubernetes.io/name=plexd --tail=50 | Recent plexd agent logs |
| kubectl logs -l app.kubernetes.io/name=mock-api --tail=50 | Recent mock-api server logs |

Key Files

| File | Purpose |
|---|---|
| test/e2e/kubernetes/test.sh | Orchestration script (build, deploy, assert, cleanup) |
| test/e2e/kubernetes/mock-api-manifests.yaml | mock-api Deployment + ClusterIP Service |
| test/e2e/mockapi/Dockerfile | Mock API image |
| test/e2e/mockapi/mockapi.go | Mock API server with /test/assertions endpoint |
| deploy/docker/Dockerfile | plexd production image |
| deploy/kubernetes/*.yaml | Production manifests applied by the test |
| Makefile | test-e2e-k8s target |

See also