Operations

Pipeline CRD

The Pipeline CRD declares the scaling characteristics you want for ingesting data. Ingestion typically runs in stages: a CPU stage for chunking and extraction, followed by a GPU stage for embedding. You can declare the spec in YAML, in code through the pipeline API, or with a combination of both. We recommend declaring the pipeline’s scaling characteristics in YAML and setting the namespace via the client. spec.sourceRef declares the pipeline’s upstream source: the operator hands it to the worker as an environment variable, so the worker reads its source from config instead of hardcoding it.

apiVersion: hevlayer.com/v1alpha1
kind: Pipeline
metadata:
  name: product-images
  namespace: layer
spec:
  target:
    namespace: products
  sourceRef:
    kind: sqs
    queueUrl: https://sqs.us-east-1.amazonaws.com/123456789/product-images
  schedule:
    cron: "0 2 * * *"
    leaseSeconds: 600
  worker:
    image: <acct>.dkr.ecr.us-east-1.amazonaws.com/hev-product-image-worker:latest
    computeClass: cpu
    batchSize: 64
    timeoutSeconds: 60
  scaling:
    pool: cpu
    mode: autoscale
    replicas:
      min: 0
      max: 8

Target

spec.target.namespace is the turbopuffer namespace the pipeline writes to. The gateway pipeline API owns document state, chunks, and vector writes for that target namespace.

Pipeline id

spec.pipelineId names the gateway pipeline (the queue) the worker stages into and scales on. It defaults to the resource name. Set it when multiple worker resources share one queue. For example, the extract and embed stages of a two-stage pipeline both set pipelineId: products.

Source

spec.sourceRef declares the external source that feeds the worker. Its kind selects how the operator treats it.

For open kinds — SQS, Kafka, S3 events, a partner API, a one-off migration — sourceRef is arbitrary JSON injected into the worker pod verbatim as HEVLAYER_SOURCE_REF; the worker image owns source-specific behavior. See Extract and chunk for a worker reading it.
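As a minimal sketch of the worker side, the snippet below parses HEVLAYER_SOURCE_REF and branches on kind. The function names and the dispatch are hypothetical illustrations, not part of any stock worker; the only details taken from this page are the variable name and the fact that it carries spec.sourceRef as JSON.

```python
import json
import os


def load_source_ref(environ=os.environ):
    """Return spec.sourceRef as a dict, or None when the Pipeline declares no source."""
    raw = environ.get("HEVLAYER_SOURCE_REF")
    if not raw:
        return None
    source = json.loads(raw)
    # Assumption: every sourceRef is a JSON object that carries a "kind" field.
    if not isinstance(source, dict) or "kind" not in source:
        raise ValueError("HEVLAYER_SOURCE_REF must be a JSON object with a 'kind' field")
    return source


def describe_source(source):
    """Hypothetical dispatch: the worker image, not the operator, decides what a kind means."""
    if source is None:
        return "no source configured"
    if source["kind"] == "sqs":
        return f"poll SQS queue {source['queueUrl']}"
    return f"unhandled source kind: {source['kind']}"


# Simulated environment using the SQS sourceRef from the manifest at the top of the page.
example_env = {
    "HEVLAYER_SOURCE_REF": json.dumps(
        {"kind": "sqs", "queueUrl": "https://sqs.us-east-1.amazonaws.com/123456789/product-images"}
    )
}
source = load_source_ref(example_env)
print(describe_source(source))
```

In a real pod the worker would call load_source_ref() with no argument so that it reads the process environment.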

Typed sources

For warehouse-backed kinds — snowflake, huggingface, and rest — kind selects a typed shape the operator validates. The source names a Warehouse with warehouseRef; the operator resolves it (it must be Verified), mounts its credential Secret, and injects connection details as HEVLAYER_WAREHOUSE with no credential material. spec.worker.image is then optional: omit it and the operator defaults to the stock worker for that kind from the mesh-account ECR registry (for example, <acct>.dkr.ecr.us-east-1.amazonaws.com/hev-huggingface-source or <acct>.dkr.ecr.us-east-1.amazonaws.com/hev-rest-source), so a typed source needs no custom image. Set worker.image to override with your own. The per-kind source fields are on the Warehouse CRD page.

Chunking

A source’s text column is often a whole document that must be split before it is embedded. An optional chunk block declares how to split it; the common strategies need no code. The block applies to any source whose worker honors it, and the stock workers do.

spec:
  sourceRef:
    kind: huggingface
    warehouseRef: hf-public
    dataset: wikimedia/wikipedia
    config: 20231101.en
    split: train
    mapping:
      text: text
      attributes: [title, url]
    chunk:
      strategy: recursive     # none | fixed | recursive | sentence | markdown
      unit: tokens            # tokens | characters
      size: 512
      overlap: 64
      tokenizer: cl100k_base  # when unit: tokens

`chunk` field	Purpose
`strategy`	`none` (default — one document per row), `fixed`, `recursive` (a paragraph→line→sentence→word ladder kept under `size`), `sentence`, or `markdown` (split on headings).
`unit`	`tokens` or `characters` — what `size` and `overlap` count in.
`size`	Target maximum chunk length.
`overlap`	Units repeated between adjacent chunks for context.
`tokenizer`	Token model when `unit: tokens`. Pinned so chunk boundaries stay reproducible.

Each source row maps to one document, and the document’s text splits into chunks. The chunk is the unit that is indexed and embedded: each chunk is a row with id {documentId}#{i} that carries the document’s attributes plus the reserved _hevlayer_parent_id and _hevlayer_chunk_index attributes. For splits the stock strategies can’t express, set spec.worker.image to your own chunker.
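The row shape described above can be sketched for the simplest case, strategy: fixed with unit: characters. This is an illustration under stated assumptions, not the stock worker's implementation: the function names, the "text" field name, and the zero-based chunk index are all hypothetical. Only the id pattern and the two reserved attribute names come from this page.

```python
def split_fixed(text, size, overlap):
    """Fixed-size character windows; each window repeats `overlap` characters of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def chunk_rows(document_id, text, attributes, size, overlap):
    """Build one row per chunk, copying the document's attributes onto each row."""
    rows = []
    for i, piece in enumerate(split_fixed(text, size, overlap)):
        row = dict(attributes)
        row.update({
            "id": f"{document_id}#{i}",          # id pattern from this page; 0-based i is assumed
            "text": piece,                        # field name is an assumption
            "_hevlayer_parent_id": document_id,   # reserved attribute
            "_hevlayer_chunk_index": i,           # reserved attribute
        })
        rows.append(row)
    return rows


rows = chunk_rows("doc-1", "abcdefghij", {"title": "Example"}, size=4, overlap=1)
for row in rows:
    print(row["id"], repr(row["text"]))
```

With size 4 and overlap 1, the ten-character input yields three chunks, and each chunk starts with the last character of the one before it.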

Schedule

spec.schedule is optional. When it is set, the operator wakes the Pipeline worker on a KEDA cron window instead of scaling it on pending pipeline queue depth:

schedule:
  cron: "0 2 * * *"      # 5-field UTC cron; minute must be a single integer
  leaseSeconds: 600      # sizes the cron window

The worker still owns source semantics: what to pull on wake, how to advance cursors, and how to stage rows. The schedule only controls when the worker runs. Scheduled Pipelines must use scaling.replicas.min: 0; the cron window is the wake trigger.
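The constraints stated in this section can be checked before applying a manifest. The helper below is a hypothetical pre-flight check, not the operator's own validation: it covers only the three rules named above (five cron fields, a single-integer minute, and scaling.replicas.min: 0), and it assumes an omitted replicas.min counts as 0.

```python
def schedule_errors(spec):
    """Return a list of problems with spec.schedule; empty when absent or apparently valid."""
    schedule = spec.get("schedule")
    if schedule is None:
        return []
    errors = []
    fields = str(schedule.get("cron", "")).split()
    if len(fields) != 5:
        errors.append("cron must have exactly 5 fields")
    else:
        minute = fields[0]
        # A single integer only: no "*", lists ("0,30"), ranges ("0-5"), or steps ("*/5").
        if not (minute.isascii() and minute.isdigit() and 0 <= int(minute) <= 59):
            errors.append("cron minute must be a single integer from 0 to 59")
    # Assumption: a missing replicas.min is treated as 0.
    min_replicas = spec.get("scaling", {}).get("replicas", {}).get("min", 0)
    if min_replicas != 0:
        errors.append("scheduled Pipelines must use scaling.replicas.min: 0")
    return errors


spec = {
    "schedule": {"cron": "0 2 * * *", "leaseSeconds": 600},
    "scaling": {"mode": "autoscale", "replicas": {"min": 0, "max": 8}},
}
print(schedule_errors(spec))
```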

Worker

Field	Purpose
`image`	Worker image. Optional for typed sources, where it defaults to the stock worker for the source kind; required otherwise.
`computeClass`	`cpu` or `gpu`. Defaults to `cpu`; when `scaling.pool` is omitted, the operator maps this to the stock `cpu` or `gpu` pool.
`batchSize`	Work items per batch.
`timeoutSeconds`	Worker call timeout.
`podSpec`	Optional pod-level merge patch.

The operator creates one Deployment per Pipeline and injects:

Variable	Value
`HEVLAYER_PIPELINE_ID`	`spec.pipelineId`, defaulting to the resource name.
`HEVLAYER_TARGET_NAMESPACE`	`spec.target.namespace`.
`HEVLAYER_BASE_URL`	The gateway base URL.
`HEVLAYER_SOURCE_REF`	`spec.sourceRef` as JSON, when set.
`HEVLAYER_PIPELINE_SCHEDULE`	`1` when `spec.schedule` is set.
`HEVLAYER_WAREHOUSE`	Resolved `Warehouse` connection JSON (no credential material), for typed sources. The credential Secret is mounted separately.
`LAYER_GATEWAY_API_KEY`	Gateway bearer token. In `deriveFromStore` mode this is the default `VectorStore` credential; in `keys` mode it is the configured inbound worker key.
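A worker might gather the injected variables into one typed object at startup. The sketch below is one way to do that; the WorkerEnv class and load_worker_env function are hypothetical names, and the code assumes the pipeline id, target namespace, base URL, and API key are always present while the source, schedule, and warehouse variables are optional.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerEnv:
    pipeline_id: str
    target_namespace: str
    base_url: str
    api_key: str
    scheduled: bool
    source_ref: Optional[dict]
    warehouse: Optional[dict]


def _optional_json(environ, name):
    """Parse a JSON-valued variable, or return None when it is unset or empty."""
    raw = environ.get(name)
    return json.loads(raw) if raw else None


def load_worker_env(environ: Mapping[str, str]) -> WorkerEnv:
    return WorkerEnv(
        # Assumed always injected; a missing key raises KeyError at startup.
        pipeline_id=environ["HEVLAYER_PIPELINE_ID"],
        target_namespace=environ["HEVLAYER_TARGET_NAMESPACE"],
        base_url=environ["HEVLAYER_BASE_URL"],
        api_key=environ["LAYER_GATEWAY_API_KEY"],
        # "1" when spec.schedule is set; anything else counts as unscheduled.
        scheduled=environ.get("HEVLAYER_PIPELINE_SCHEDULE") == "1",
        source_ref=_optional_json(environ, "HEVLAYER_SOURCE_REF"),
        warehouse=_optional_json(environ, "HEVLAYER_WAREHOUSE"),
    )


# Simulated environment with placeholder values; a real worker would pass os.environ.
env = load_worker_env({
    "HEVLAYER_PIPELINE_ID": "product-images",
    "HEVLAYER_TARGET_NAMESPACE": "products",
    "HEVLAYER_BASE_URL": "http://gateway.example.invalid",
    "LAYER_GATEWAY_API_KEY": "placeholder-token",
    "HEVLAYER_PIPELINE_SCHEDULE": "1",
    "HEVLAYER_SOURCE_REF": '{"kind": "sqs"}',
})
print(env.pipeline_id, env.target_namespace, env.scheduled)
```

Failing fast on a missing required variable keeps configuration errors at pod startup rather than at the first gateway call.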

Scaling

scaling:
  pool: cpu
  mode: autoscale
  replicas:
    min: 0
    max: 8

spec.scaling.pool, when set, must name a pool in InfraRules/default. When omitted, the operator uses worker.computeClass to choose the stock cpu or gpu pool. Helm installs the well-known cpu, cpu-large, and gpu pools by default.

mode: autoscale creates a KEDA ScaledObject backed by pipeline queue depth, or by the cron window when spec.schedule is set. mode: fixed pins the Deployment to replicas.min; mode: disabled scales it to zero.

spec.scaling.warmWindowSeconds sets a cooldown (and node retention) that holds the worker warm after its queue drains — see Workload scaling.

spec.paused: true also scales the worker to zero.
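The pool and replica rules in this section can be summarized as a small decision function. This is a reading aid written from the prose above, not the operator's code: resolve_pool and replica_bounds are hypothetical names, and the sketch assumes mode and replicas are given explicitly because this page does not state their defaults.

```python
def resolve_pool(spec):
    """Use scaling.pool when set; otherwise fall back to worker.computeClass (default cpu)."""
    pool = spec.get("scaling", {}).get("pool")
    if pool:
        return pool
    return spec.get("worker", {}).get("computeClass", "cpu")


def replica_bounds(spec):
    """Return the (min, max) replica range the Deployment can occupy."""
    scaling = spec["scaling"]
    mode = scaling["mode"]
    # paused: true and mode: disabled both scale the worker to zero.
    if spec.get("paused") or mode == "disabled":
        return (0, 0)
    replicas = scaling["replicas"]
    if mode == "fixed":
        return (replicas["min"], replicas["min"])  # pinned to replicas.min
    if mode == "autoscale":
        return (replicas["min"], replicas["max"])  # KEDA moves within this range
    raise ValueError(f"unknown scaling mode: {mode}")


spec = {
    "worker": {"computeClass": "gpu"},
    "scaling": {"mode": "autoscale", "replicas": {"min": 0, "max": 8}},
}
print(resolve_pool(spec), replica_bounds(spec))
```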

Status

Use the pipeline status API for status: queue counts, stage progress, and worker state. The resource itself reports only managed object references and readiness conditions.