Operations
Pipeline CRD
The Pipeline CRD declares the scaling characteristics you want for
ingesting data. Ingestion typically runs in stages: a CPU stage for
chunking and extraction, followed by a GPU stage for embedding. You can
declare the spec in YAML, in code through the pipeline API, or with a
combination of both. The recommended split is to declare your pipeline
scaling characteristics in YAML and set your namespace via the client.
spec.sourceRef also lets you declare your pipeline’s upstream source: the
operator hands it to the worker as an environment variable, so the worker
reads its source from config instead of hardcoding it.
apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: product-images
  namespace: layer
spec:
  target:
    namespace: products
  sourceRef:
    kind: sqs
    queueUrl: https://sqs.us-east-1.amazonaws.com/123456789/product-images
  worker:
    image: ghcr.io/hev/product-image-worker:latest
    computeClass: cpu
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 8
Target
spec.target.namespace is the Turbopuffer namespace the pipeline writes to.
The gateway pipeline API owns document state, chunks, and vector writes
for that target namespace.
Pipeline id
spec.pipelineId names the gateway pipeline (the queue) that the worker
stages into and scales on. It defaults to the resource name. Set it when
multiple worker resources share one queue: for example, the extract and
embed stages of a two-stage pipeline both set pipelineId: products.
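As a sketch of the shared-queue case, the two manifests below define an extract stage and an embed stage that both set pipelineId: products. The resource names, worker images, batch sizes, and replica limits are hypothetical. The embed stage omits scaling.pool, so the operator would fall back to worker.computeClass and pick the stock gpu pool.

```yaml
# Hypothetical two-stage pipeline: both resources share one gateway queue.
apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: products-extract          # hypothetical resource name
  namespace: layer
spec:
  pipelineId: products            # shared queue id
  target:
    namespace: products
  worker:
    image: ghcr.io/example/products-extract:latest   # hypothetical image
    computeClass: cpu
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 8
---
apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: products-embed            # hypothetical resource name
  namespace: layer
spec:
  pipelineId: products            # same id as the extract stage
  target:
    namespace: products
  worker:
    image: ghcr.io/example/products-embed:latest     # hypothetical image
    computeClass: gpu             # no scaling.pool set, so the stock gpu pool applies
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    mode: autoscale
    replicas:
      min: 0
      max: 2
```

Without an explicit pipelineId, each resource would default to its own name and the two stages would not share a queue.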
Source
spec.sourceRef is intentionally open JSON for the external source that
feeds the worker: SQS, Kafka, S3 events, a partner API, or a one-off
migration source. The operator injects it into the worker pod verbatim
as HEVLAYER_SOURCE_REF; the worker image owns source-specific
behavior. See Extract and chunk for a worker that reads it.
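Because sourceRef is free-form, its keys are whatever the worker image expects. As an illustration, the SQS block from the example at the top of this page would reach the worker roughly as shown in the trailing comment. The exact JSON serialization (key order, whitespace) is an assumption.

```yaml
# spec.sourceRef as declared on the Pipeline:
sourceRef:
  kind: sqs
  queueUrl: https://sqs.us-east-1.amazonaws.com/123456789/product-images

# Approximate value of HEVLAYER_SOURCE_REF inside the worker pod
# (serialization details such as key order are an assumption):
# {"kind":"sqs","queueUrl":"https://sqs.us-east-1.amazonaws.com/123456789/product-images"}
```

The operator does not interpret these keys, so validating them is left to the worker.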
Worker
| Field | Purpose |
|---|---|
| image | Worker image. |
| computeClass | cpu or gpu. Defaults to cpu; when scaling.pool is omitted, the operator maps this to the stock cpu or gpu pool. |
| batchSize | Work items per batch. |
| timeoutSeconds | Worker call timeout. |
| podSpec | Optional pod-level merge patch. |
The operator creates one Deployment per Pipeline and injects:
| Variable | Value |
|---|---|
| HEVLAYER_PIPELINE_ID | spec.pipelineId, defaulting to the resource name. |
| HEVLAYER_TARGET_NAMESPACE | spec.target.namespace. |
| HEVLAYER_BASE_URL | The gateway base URL. |
| HEVLAYER_SOURCE_REF | spec.sourceRef as JSON, when set. |
| LAYER_GATEWAY_API_KEY | Gateway bearer token. In deriveFromStore mode this is the default VectorStore credential; in keys mode it is the configured inbound worker key. |
Scaling
scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 8
spec.scaling.pool, when set, must name a pool in
InfraRules/default. When omitted, the
operator uses worker.computeClass to choose the stock cpu or gpu
pool. Helm installs the well-known cpu, cpu-large, and gpu pools
by default.
mode: autoscale creates a KEDA ScaledObject backed by pipeline queue
depth. mode: fixed pins the Deployment to replicas.min; mode: disabled scales it to zero.
spec.paused: true also scales the worker to zero.
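The non-autoscaled cases can be sketched as spec fragments. The replica counts are illustrative, and this page does not say whether replicas.max is still required in fixed mode, so add it if validation asks for it.

```yaml
# Fixed size: the Deployment is pinned to replicas.min (3 here, illustrative).
spec:
  scaling:
    pool: cpu-large        # one of the pools Helm installs by default
    mode: fixed
    replicas:
      min: 3
---
# Paused: the worker is scaled to zero regardless of the scaling block.
# Keeping the scaling block in place while paused is an assumption of this sketch.
spec:
  paused: true
  scaling:
    mode: autoscale
    replicas:
      min: 0
      max: 8
```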
Status
For queue counts, stage progress, and worker state, use the pipeline status API. The Pipeline resource itself reports only managed object references and readiness conditions.