InfraRules CRD

InfraRules is the cluster-scoped policy object for Layer-managed runtime infrastructure. There is exactly one object: InfraRules/default.

Pipelines and Functions do not reference a separate autoscaling resource. They set spec.scaling inline and choose a pool from InfraRules/default.spec.computePools.

InfraRules

apiVersion: hevlayer.com/v1alpha1
kind: InfraRules
metadata:
  name: default
spec:
  computePools:
    - name: cpu
      kind: cpu
      nodeSelector:
        layer.hev.dev/node-role: worker-cpu
        layer.hev.dev/compute: cpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-cpu
          effect: NoSchedule
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi
      maxReplicasPerWorkload: 32
    - name: cpu-large
      kind: cpu
      nodeSelector:
        layer.hev.dev/node-role: worker-cpu
        layer.hev.dev/compute: cpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-cpu
          effect: NoSchedule
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
          ephemeral-storage: 35Gi
        limits:
          cpu: "4"
          memory: 4Gi
          ephemeral-storage: 40Gi
      maxReplicasPerWorkload: 8
    - name: gpu
      kind: gpu
      nodeSelector:
        layer.hev.dev/node-role: worker-gpu
        layer.hev.dev/compute: gpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-gpu
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      resources:
        requests:
          cpu: 250m
          memory: 4Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "2"
          memory: 10Gi
          nvidia.com/gpu: "1"
      maxReplicasPerWorkload: 4
  documentCache:
    capGiB: 256
    replicationFactor: 1
    scaling:
      mode: autoscale
      nodes:
        min: 0
        max: 1

The operator validates that the object is named default. Helm can render the default object with operator.infraRules.create=true.
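For installs driven by a values file rather than `--set`, the same switch can be written as the fragment below. This is a minimal sketch: the key path comes from the sentence above, while the file name is only an assumption.

```yaml
# values.yaml (hypothetical file name)
operator:
  infraRules:
    create: true   # ask the chart to render the InfraRules/default object
```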

Compute pools

The Helm defaults define three well-known pools:

| Pool | Use |
| --- | --- |
| `cpu` | General CPU workers. |
| `cpu-large` | CPU workers that need local ephemeral-storage headroom. |
| `gpu` | One-NVIDIA-GPU workers for embedding and inference. |

The default pools select the Karpenter-backed worker nodes with layer.hev.dev/node-role=worker-cpu or worker-gpu. The default gpu pool also requests nvidia.com/gpu: "1" and includes the standard NVIDIA toleration. Override nodeSelector, gpuType, or resource envelopes in operator.infraRules.computePools when your cluster uses different worker pool names or specific SKUs.
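As a rough illustration of such an override, the values fragment below repoints the gpu pool at a differently labelled node group. It assumes that operator.infraRules.computePools accepts the same list shape as spec.computePools. The node-role value and the gpuType string are placeholders, not values the chart ships.

```yaml
operator:
  infraRules:
    create: true
    computePools:
      - name: gpu
        kind: gpu
        gpuType: a10g                               # placeholder descriptive value
        nodeSelector:
          layer.hev.dev/node-role: my-gpu-workers   # placeholder label value
          layer.hev.dev/compute: gpu
        tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
        resources:
          requests:
            cpu: 250m
            memory: 4Gi
            nvidia.com/gpu: "1"
          limits:
            cpu: "2"
            memory: 10Gi
            nvidia.com/gpu: "1"
        maxReplicasPerWorkload: 4
      # Restate the cpu and cpu-large entries here if they should stay defined.
```

Helm merges maps but replaces lists wholesale, so if the key is a list as assumed here, an override needs to restate every pool that should remain available.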

| Field | Purpose |
| --- | --- |
| `name` | Referenced by `spec.scaling.pool` on Pipeline and Function resources. |
| `kind` | Pool class label such as `cpu` or `gpu`. |
| `gpuType` | Optional descriptive GPU type for GPU pools. |
| `nodeSelector` | Applied to worker pods that choose the pool. |
| `tolerations` | Applied to worker pods that choose the pool. |
| `resources` | Container resources applied to worker pods. |
| `maxReplicasPerWorkload` | Hard ceiling for one Pipeline or Function. |

If a workload names an unknown pool or asks for more replicas than the pool ceiling, the operator leaves the workload unready and records a condition on its status.
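To make the ceiling check concrete, here is a hypothetical Pipeline that the stock pools would leave unready: the default gpu pool caps a workload at 4 replicas, and this spec asks for 6. The apiVersion and the metadata name are assumptions, and all other spec fields are omitted.

```yaml
apiVersion: hevlayer.com/v1alpha1   # assumed to share the InfraRules API group
kind: Pipeline
metadata:
  name: embed-docs                  # hypothetical name
spec:
  scaling:
    pool: gpu                       # stock pool with maxReplicasPerWorkload: 4
    mode: autoscale
    replicas:
      min: 0
      max: 6                        # above the pool ceiling, so the workload stays unready
```

Lowering replicas.max to 4, or raising maxReplicasPerWorkload on the pool, would bring the request back within the ceiling.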

Workload scaling

scaling:
  pool: gpu
  mode: autoscale
  warmWindowSeconds: 300
  replicas:
    min: 0
    max: 4

| Mode | Behavior |
| --- | --- |
| `autoscale` | Emit a KEDA `ScaledObject` and let queue depth scale the Deployment between `min` and `max`. |
| `fixed` | Set Deployment replicas to `replicas.min`; no KEDA object is emitted. |
| `disabled` | Scale the Deployment to 0; no KEDA object is emitted. |
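For comparison with the autoscale block above, a fixed-size workload might look like the sketch below. Whether replicas.max is still required in fixed mode is not stated here, so it is set equal to min as a cautious assumption.

```yaml
scaling:
  pool: cpu
  mode: fixed
  replicas:
    min: 3   # the Deployment is pinned at this count; no KEDA object is emitted
    max: 3   # assumption: kept equal to min, since fixed mode reads replicas.min
```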

| Field | Purpose |
| --- | --- |
| `pool` | Names a pool in `InfraRules/default.spec.computePools`. When omitted, the operator maps `worker.computeClass` to the stock `cpu` or `gpu` pool. |
| `mode` | `autoscale`, `fixed`, or `disabled` (see the table above). |
| `replicas` | `min`/`max` bounds for the Deployment. `max` may not exceed the pool’s `maxReplicasPerWorkload`. |
| `warmWindowSeconds` | Cooldown that holds a workload warm after its last scaling trigger drains, before `autoscale` returns it to `replicas.min`. See below. |
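The pool fallback can be pictured with the fragment below, which leaves pool unset and relies on worker.computeClass instead. The placement of worker under spec is an assumption made for illustration; only the field name worker.computeClass comes from the table.

```yaml
spec:
  worker:
    computeClass: gpu   # assumed location; maps to the stock gpu pool when pool is omitted
  scaling:
    mode: autoscale     # no pool key here
    replicas:
      min: 0
      max: 2
```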

Paused workloads also scale to 0. To keep a cold-start-heavy worker warm, set mode: autoscale and replicas.min: 1.

Warm window

warmWindowSeconds maps to cooldownPeriod on the KEDA ScaledObject: after the last trigger was reported active, KEDA waits this long before the Deployment scales back to replicas.min. It defaults to 60 when unset. A non-zero value also annotates the worker pods with karpenter.sh/do-not-disrupt, so Karpenter retains the node, not just the replica, for the window rather than consolidating it away.
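As a sketch of that mapping, the scaling block shown under Workload scaling (warmWindowSeconds: 300, replicas 0 to 4) might translate into something like the following. The object and Deployment names are hypothetical, and the queue-depth trigger is left out because its scaler type is not specified here. Only the KEDA field names and the Karpenter annotation key are standard.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: embed-docs            # hypothetical; operator naming is not documented here
spec:
  scaleTargetRef:
    name: embed-docs          # hypothetical worker Deployment name
  minReplicaCount: 0          # from replicas.min
  maxReplicaCount: 4          # from replicas.max
  cooldownPeriod: 300         # from warmWindowSeconds
  # triggers: queue-depth trigger omitted; the scaler type is not specified here
---
# Fragment of the worker pod template metadata when warmWindowSeconds is non-zero
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```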

This is aimed at scale-to-zero GPU pools, where each wake otherwise pays a full cold start (fresh NodeClaim, multi-GB image pull, model load). A warm window lets adjacent batches reuse one warm node, then lets the pool return to genuine scale-to-zero once the window elapses. warmWindowSeconds must be >= 0 and requires mode: autoscale; otherwise the operator leaves the workload unready and records a condition on its status.
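By way of a counterexample, the fragment below combines a warm window with fixed mode. Going by the rule that warmWindowSeconds requires mode: autoscale, the operator would be expected to leave a workload with this block unready. The replica values are illustrative.

```yaml
scaling:
  pool: gpu
  mode: fixed
  warmWindowSeconds: 300   # not accepted here: a warm window requires mode: autoscale
  replicas:
    min: 1
    max: 1
```

Switching mode to autoscale, or dropping warmWindowSeconds, would resolve the conflict.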

Document cache rules

documentCache captures the operator-owned document cache settings: capacity (capGiB), replication factor (replicationFactor), and node-count bounds (scaling.nodes.min and scaling.nodes.max). Helm still renders the document-cache KEDA object directly; InfraRules is the declared policy shape that the operator reports and validates against.