Pipeline CRD

The Pipeline CRD declares how the workers that ingest your data should scale. Ingestion typically runs in stages: a CPU stage for chunking and extraction, followed by a GPU stage for embedding. You can declare the spec in YAML, in code through the pipeline API, or with a combination of both. The recommended split is to declare the pipeline's scaling characteristics in YAML and set the namespace via the client. spec.sourceRef declares the pipeline's upstream source as well: the operator hands it to the worker as an environment variable, so the worker reads its source from config instead of hardcoding it.

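apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: product-images        # also the default pipelineId
  namespace: layer
spec:
  target:
    namespace: products       # Turbopuffer namespace the pipeline writes to
  sourceRef:                  # open JSON, injected as HEVLAYER_SOURCE_REF
    kind: sqs
    queueUrl: https://sqs.us-east-1.amazonaws.com/123456789/product-images
  worker:
    image: ghcr.io/hev/product-image-worker:latest
    computeClass: cpu         # cpu or gpu
    batchSize: 64             # work items per batch
    timeoutSeconds: 60        # worker call timeout
  scaling:
    pool: cpu                 # must name a pool in InfraRules/default
    mode: autoscale           # autoscale, fixed, or disabled
    replicas:
      min: 0
      max: 8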

Target

spec.target.namespace is the Turbopuffer namespace the pipeline writes to. The gateway pipeline API owns document state, chunks, and vector writes for that target namespace.

Pipeline id

spec.pipelineId names the gateway pipeline (the queue) that the worker stages into and scales on. It defaults to the resource name. Set it when multiple worker resources share one queue. For example, the extract and embed stages of a two-stage pipeline both set pipelineId: products.
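A minimal sketch of the shared-queue case, with one resource per stage. The resource names, worker images, batch sizes, and replica limits are hypothetical placeholders; only the field names are taken from this page.

```yaml
# Stage 1: CPU extraction. Name and image are hypothetical.
apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: products-extract
  namespace: layer
spec:
  pipelineId: products          # shared gateway pipeline (queue)
  target:
    namespace: products
  worker:
    image: ghcr.io/example/extract-worker:1.0.0   # hypothetical image
    computeClass: cpu
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    mode: autoscale
    replicas:
      min: 0
      max: 8
---
# Stage 2: GPU embedding. Name and image are hypothetical.
apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: products-embed
  namespace: layer
spec:
  pipelineId: products          # same queue as the extract stage
  target:
    namespace: products
  worker:
    image: ghcr.io/example/embed-worker:1.0.0     # hypothetical image
    computeClass: gpu
    batchSize: 16
    timeoutSeconds: 120
  scaling:
    mode: autoscale
    replicas:
      min: 0
      max: 2
```

Without the explicit pipelineId, each resource would default to its own name (products-extract and products-embed) and the two stages would not share a queue.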

Source

spec.sourceRef is intentionally open JSON for the external source that feeds the worker: SQS, Kafka, S3 events, a partner API, or a one-off migration source. The operator injects it into the worker pod verbatim as HEVLAYER_SOURCE_REF; the worker image owns source-specific behavior. See Extract and chunk for a worker reading it.
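Because the field is open JSON, its shape is a contract between you and your worker image rather than something the operator validates. As an illustration only, a Kafka-backed source might look like the fragment below; every key under sourceRef here (kind, brokers, topic, consumerGroup) is a hypothetical name that a worker image would have to define and parse itself.

```yaml
spec:
  sourceRef:
    # Hypothetical keys: the operator does not interpret them.
    kind: kafka
    brokers:
      - kafka-0.example.internal:9092
      - kafka-1.example.internal:9092
    topic: product-images
    consumerGroup: product-image-worker
```

The worker would receive this same object, serialized as JSON, in HEVLAYER_SOURCE_REF.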

Worker

| Field | Purpose |
| --- | --- |
| image | Worker image. |
| computeClass | cpu or gpu. Defaults to cpu; when scaling.pool is omitted, the operator maps this to the stock cpu or gpu pool. |
| batchSize | Work items per batch. |
| timeoutSeconds | Worker call timeout. |
| podSpec | Optional pod-level merge patch. |
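As a rough sketch of podSpec, the fragment below assumes the merge patch accepts standard Kubernetes PodSpec fields such as nodeSelector and tolerations. This page does not list the supported fields, so treat the keys under podSpec, the node label, and the taint as assumptions and check them against the CRD schema.

```yaml
spec:
  worker:
    image: ghcr.io/hev/product-image-worker:latest
    computeClass: cpu
    podSpec:
      # Assumption: standard Kubernetes PodSpec fields are merged into the worker pod.
      nodeSelector:
        example.com/workload: ingest      # hypothetical node label
      tolerations:
        - key: example.com/ingest-only    # hypothetical taint
          operator: Exists
          effect: NoSchedule
```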

The operator creates one Deployment per Pipeline and injects:

| Variable | Value |
| --- | --- |
| HEVLAYER_PIPELINE_ID | spec.pipelineId, defaulting to the resource name. |
| HEVLAYER_TARGET_NAMESPACE | spec.target.namespace. |
| HEVLAYER_BASE_URL | The gateway base URL. |
| HEVLAYER_SOURCE_REF | spec.sourceRef as JSON, when set. |
| LAYER_GATEWAY_API_KEY | Gateway bearer token. In deriveFromStore mode this is the default VectorStore credential; in keys mode it is the configured inbound worker key. |

Scaling

scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 8

spec.scaling.pool, when set, must name a pool in InfraRules/default. When omitted, the operator uses worker.computeClass to choose the stock cpu or gpu pool. Helm installs the well-known cpu, cpu-large, and gpu pools by default.

mode: autoscale creates a KEDA ScaledObject backed by pipeline queue depth. mode: fixed pins the Deployment to replicas.min. mode: disabled scales it to zero.
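For contrast with the autoscale example, here is a sketch of a fixed-size GPU worker that leaves pool unset. The replica counts are illustrative, and whether replicas.max may be omitted in fixed mode is not stated on this page, so it is kept here as a precaution.

```yaml
spec:
  worker:
    computeClass: gpu    # scaling.pool is omitted, so the stock gpu pool is chosen
  scaling:
    mode: fixed
    replicas:
      min: 2             # fixed mode pins the Deployment to this count
      max: 2             # assumption: kept equal to min; may not be required
```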

spec.paused: true also scales the worker to zero.
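A sketch of pausing a pipeline while leaving its scaling block declared as-is. That setting paused back to false (or removing it) resumes the declared scaling mode is an assumption here, not something this page states.

```yaml
spec:
  paused: true           # worker is scaled to zero while this is set
  scaling:
    mode: autoscale      # left unchanged in the spec
    replicas:
      min: 0
      max: 8
```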

Status

Use the pipeline status API for runtime status: queue counts, stage progress, and worker state. The Pipeline resource itself reports only managed object references and readiness conditions.
