InfraRules CRD
InfraRules is the cluster-scoped policy object for Layer-managed
runtime infrastructure. There is exactly one object:
InfraRules/default.
Pipelines and Functions do not reference a separate autoscaling resource.
They set spec.scaling inline and choose a pool from
InfraRules/default.spec.computePools.
InfraRules
apiVersion: hevlayer.com/v1alpha1
kind: InfraRules
metadata:
  name: default
spec:
  computePools:
    - name: cpu
      kind: cpu
      nodeSelector:
        layer.hev.dev/node-role: worker-cpu
        layer.hev.dev/compute: cpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-cpu
          effect: NoSchedule
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi
      maxReplicasPerWorkload: 32
    - name: cpu-large
      kind: cpu
      nodeSelector:
        layer.hev.dev/node-role: worker-cpu
        layer.hev.dev/compute: cpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-cpu
          effect: NoSchedule
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
          ephemeral-storage: 35Gi
        limits:
          cpu: "4"
          memory: 4Gi
          ephemeral-storage: 40Gi
      maxReplicasPerWorkload: 8
    - name: gpu
      kind: gpu
      nodeSelector:
        layer.hev.dev/node-role: worker-gpu
        layer.hev.dev/compute: gpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-gpu
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      resources:
        requests:
          cpu: 250m
          memory: 4Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "2"
          memory: 10Gi
          nvidia.com/gpu: "1"
      maxReplicasPerWorkload: 4
  documentCache:
    capGiB: 256
    replicationFactor: 1
    scaling:
      mode: autoscale
      nodes:
        min: 0
        max: 1
The operator validates that the object is named default. Helm can
render the default object with operator.infraRules.create=true.
Compute pools
The Helm defaults define three well-known pools:
| Pool | Use |
|---|---|
| cpu | General CPU workers. |
| cpu-large | CPU workers that need local ephemeral-storage headroom. |
| gpu | One-NVIDIA-GPU workers for embedding and inference. |
The default pools select the Karpenter-backed worker nodes with
layer.hev.dev/node-role=worker-cpu or worker-gpu. The default gpu
pool also requests nvidia.com/gpu: "1" and includes the standard
NVIDIA toleration. Override nodeSelector, gpuType, or resource
envelopes in operator.infraRules.computePools when your cluster uses
different worker pool names or specific SKUs.
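As a rough sketch of such an override, the Helm values fragment below redefines the gpu pool for a cluster whose GPU nodes carry a different label. The nesting follows the operator.infraRules.* keys named above. It assumes that computePools in Helm values takes the same list shape as spec.computePools; the node label and the gpuType string are hypothetical placeholders, not values from the chart.

```yaml
# values.yaml fragment (sketch under the assumptions above,
# not the chart's shipped defaults).
operator:
  infraRules:
    create: true
    computePools:
      - name: gpu
        kind: gpu
        gpuType: example-gpu-sku              # hypothetical descriptive value
        nodeSelector:
          example.com/node-pool: gpu-workers  # hypothetical node label
        tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
        resources:
          requests:
            cpu: 250m
            memory: 4Gi
            nvidia.com/gpu: "1"
          limits:
            cpu: "2"
            memory: 10Gi
            nvidia.com/gpu: "1"
        maxReplicasPerWorkload: 4
```

If the chart does model computePools as a list, keep in mind that Helm replaces list values wholesale instead of merging them, so an override like this would also need to restate cpu and cpu-large if they should remain available.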
| Field | Purpose |
|---|---|
| name | Referenced by spec.scaling.pool on Pipeline and Function resources. |
| kind | Pool class label such as cpu or gpu. |
| gpuType | Optional descriptive GPU type for GPU pools. |
| nodeSelector | Applied to worker pods that choose the pool. |
| tolerations | Applied to worker pods that choose the pool. |
| resources | Container resources applied to worker pods. |
| maxReplicasPerWorkload | Hard ceiling for one Pipeline or Function. |
If a workload names an unknown pool or asks for more replicas than the pool ceiling, the operator leaves the workload unready and records a condition on its status.
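For illustration, the scaling fragment below would trip the ceiling check against the default pools: it targets the gpu pool, whose maxReplicasPerWorkload is 4, but asks for up to 8 replicas. It assumes the ceiling is compared against replicas.max, which the rule above does not spell out.

```yaml
# spec.scaling fragment of a Pipeline or Function (illustrative only).
scaling:
  pool: gpu        # default gpu pool sets maxReplicasPerWorkload: 4
  mode: autoscale
  replicas:
    min: 0
    max: 8         # above the pool ceiling, so the workload stays unready
```

Lowering replicas.max to 4, or raising maxReplicasPerWorkload on the pool, would bring the request back within the ceiling.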
Workload scaling
scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 4
| Mode | Behavior |
|---|---|
| autoscale | Emit a KEDA ScaledObject and let queue depth scale the Deployment between min and max. |
| fixed | Set Deployment replicas to replicas.min; no KEDA object is emitted. |
| disabled | Scale the Deployment to 0; no KEDA object is emitted. |
Paused workloads also scale to 0. To keep a cold-start-heavy worker
warm, set mode: autoscale and replicas.min: 1.
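The two variants below sketch how these modes might be written, using only the scaling fields described in this section. The pool choices and replica counts are arbitrary examples, and leaving out replicas.max in fixed mode is an assumption.

```yaml
# Variant A: keep one GPU worker warm and let queue depth
# scale the Deployment up to 4 replicas.
scaling:
  pool: gpu
  mode: autoscale
  replicas:
    min: 1
    max: 4
---
# Variant B: pin the Deployment at two replicas.
# No KEDA ScaledObject is emitted in fixed mode.
scaling:
  pool: cpu
  mode: fixed
  replicas:
    min: 2   # fixed mode uses replicas.min as the replica count
```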
When a Function or Pipeline omits scaling.pool, the operator uses
worker.computeClass to choose the stock cpu or gpu pool.
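A minimal sketch of that fallback is shown below. It assumes that worker sits beside scaling under spec and that computeClass accepts the values cpu and gpu; neither detail is confirmed here, so treat the field placement as hypothetical.

```yaml
# Fragment of a Function spec (sketch). scaling.pool is omitted,
# so the operator would pick a stock pool from worker.computeClass.
worker:
  computeClass: gpu   # assumed value; would resolve to the stock gpu pool
scaling:
  mode: autoscale
  replicas:
    min: 0
    max: 2
```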
Document cache rules
documentCache captures the operator-owned document cache settings:
capacity (capGiB), replication factor (replicationFactor), and
node-count bounds (scaling.nodes). Helm still renders the
document-cache KEDA object directly, so InfraRules does not create it;
it declares the policy that the operator reports and validates against.