InfraRules CRD

InfraRules is the cluster-scoped policy object for Layer-managed runtime infrastructure. There is exactly one object: InfraRules/default.

Pipelines and Functions do not reference a separate autoscaling resource. They set spec.scaling inline and choose a pool from InfraRules/default.spec.computePools.

InfraRules

apiVersion: hevlayer.com/v1alpha1
kind: InfraRules
metadata:
  name: default
spec:
  computePools:
    - name: cpu
      kind: cpu
      nodeSelector:
        layer.hev.dev/node-role: worker-cpu
        layer.hev.dev/compute: cpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-cpu
          effect: NoSchedule
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi
      maxReplicasPerWorkload: 32
    - name: cpu-large
      kind: cpu
      nodeSelector:
        layer.hev.dev/node-role: worker-cpu
        layer.hev.dev/compute: cpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-cpu
          effect: NoSchedule
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
          ephemeral-storage: 35Gi
        limits:
          cpu: "4"
          memory: 4Gi
          ephemeral-storage: 40Gi
      maxReplicasPerWorkload: 8
    - name: gpu
      kind: gpu
      nodeSelector:
        layer.hev.dev/node-role: worker-gpu
        layer.hev.dev/compute: gpu
      tolerations:
        - key: layer.hev.dev/node-role
          operator: Equal
          value: worker-gpu
          effect: NoSchedule
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      resources:
        requests:
          cpu: 250m
          memory: 4Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "2"
          memory: 10Gi
          nvidia.com/gpu: "1"
      maxReplicasPerWorkload: 4
  documentCache:
    capGiB: 256
    replicationFactor: 1
    scaling:
      mode: autoscale
      nodes:
        min: 0
        max: 1

The operator validates that the object is named default. To have the Helm chart render the default object, set operator.infraRules.create=true.
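As a minimal sketch, that setting corresponds to a Helm values fragment like the one below. The nesting is inferred from the dotted key path, so check it against the chart's own values file before relying on it.

```yaml
# values.yaml fragment (nesting inferred from operator.infraRules.create)
operator:
  infraRules:
    create: true   # ask the chart to render InfraRules/default
```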

Compute pools

The Helm defaults define three well-known pools:

| Pool | Use |
| --- | --- |
| cpu | General CPU workers. |
| cpu-large | CPU workers that need local ephemeral-storage headroom. |
| gpu | One-NVIDIA-GPU workers for embedding and inference. |

The default pools select the Karpenter-backed worker nodes with layer.hev.dev/node-role=worker-cpu or worker-gpu. The default gpu pool also requests nvidia.com/gpu: "1" and includes the standard NVIDIA toleration. Override nodeSelector, gpuType, or resource envelopes in operator.infraRules.computePools when your cluster uses different worker pool names or specific SKUs.

| Field | Purpose |
| --- | --- |
| name | Referenced by spec.scaling.pool on Pipeline and Function resources. |
| kind | Pool class label such as cpu or gpu. |
| gpuType | Optional descriptive GPU type for GPU pools. |
| nodeSelector | Applied to worker pods that choose the pool. |
| tolerations | Applied to worker pods that choose the pool. |
| resources | Container resources applied to worker pods. |
| maxReplicasPerWorkload | Hard ceiling for one Pipeline or Function. |
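An override of the gpu pool might look like the sketch below. It assumes that operator.infraRules.computePools in the Helm values takes the same list shape as spec.computePools. The gpuType value, the node label, and the toleration key are hypothetical placeholders, not names from this project.

```yaml
# values.yaml fragment (illustrative; schema assumed to mirror spec.computePools)
operator:
  infraRules:
    create: true
    computePools:
      - name: gpu
        kind: gpu
        gpuType: example-gpu-sku            # hypothetical descriptive label
        nodeSelector:
          example.com/node-pool: gpu-a      # hypothetical label on your GPU nodes
        tolerations:
          - key: example.com/node-pool      # hypothetical taint key
            operator: Equal
            value: gpu-a
            effect: NoSchedule
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
        resources:
          requests:
            cpu: 250m
            memory: 4Gi
            nvidia.com/gpu: "1"
          limits:
            cpu: "2"
            memory: 10Gi
            nvidia.com/gpu: "1"
        maxReplicasPerWorkload: 4
```

Helm replaces list values wholesale rather than merging them entry by entry, so if the chart does model pools as a list, an override like this would also need to restate the cpu and cpu-large pools you want to keep.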

If a workload names an unknown pool or asks for more replicas than the pool ceiling, the operator leaves the workload unready and records a condition on its status.
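For illustration, the scaling block below would exceed the ceiling of the gpu pool in the example object above, which sets maxReplicasPerWorkload: 4. This sketch assumes the ceiling is compared against replicas.max. The condition the operator records is not reproduced here.

```yaml
# Fragment of a Pipeline or Function spec (illustrative)
scaling:
  pool: gpu          # defined in InfraRules/default.spec.computePools
  mode: autoscale
  replicas:
    min: 0
    max: 8           # above the gpu pool's maxReplicasPerWorkload of 4
```

Lowering replicas.max to 4 or less, or raising the pool ceiling in InfraRules/default, would bring the workload back within policy.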

Workload scaling

scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 4

| Mode | Behavior |
| --- | --- |
| autoscale | Emit a KEDA ScaledObject and let queue depth scale the Deployment between min and max. |
| fixed | Set Deployment replicas to replicas.min; no KEDA object is emitted. |
| disabled | Scale the Deployment to 0; no KEDA object is emitted. |

Paused workloads also scale to 0. To keep a cold-start-heavy worker warm, set mode: autoscale and replicas.min: 1.
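The warm-worker setting described above can be sketched as follows. The choice of the gpu pool and a max of 4 is only an example, picked to stay within the default gpu ceiling.

```yaml
# Fragment of a Pipeline or Function spec (illustrative)
scaling:
  pool: gpu
  mode: autoscale
  replicas:
    min: 1   # keep one replica running so requests avoid a cold start
    max: 4   # upper bound for queue-driven scale-out
```

The trade-off is that one replica, and the node it occupies, stays allocated even when the queue is empty.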

When a Function or Pipeline omits scaling.pool, the operator uses worker.computeClass to choose the stock cpu or gpu pool.
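A spec that relies on this fallback might look like the fragment below. It assumes that worker and scaling are sibling keys under spec and that computeClass accepts the values cpu and gpu. Neither detail is confirmed by this page.

```yaml
# Fragment of a Pipeline or Function spec (field placement assumed)
worker:
  computeClass: gpu    # assumed value; the operator would pick the stock gpu pool
scaling:
  # no pool set, so the operator falls back to worker.computeClass
  mode: autoscale
  replicas:
    min: 0
    max: 2
```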

Document cache rules

documentCache captures the operator-owned document cache settings: capacity, replication factor, and node count. Helm still renders the document-cache KEDA object directly; InfraRules only declares the policy, which the operator reports and validates against.
