Overview

Concepts

Control loops

Layer uses a control loop as a core primitive for managing your indexes. It reconciles index state against metrics emitted by the search system, which is how Layer applies row-level transformations (UDFs) and keeps an index’s stable view current.

Related: UDFs, snapshots, stable watermark.

Kubernetes autoscaling

Because Layer is stateless, you can autoscale every tier independently. Karpenter handles node-level scaling, and KEDA scales pods against signals from an embedded PostgreSQL queue. The data in that queue is used for scaling decisions only — it carries no non-recoverable system state.

Gateway enhancements

Where helpful, the gateway extends your search system with common query patterns and filtering primitives. Layer’s enhancements use reserved _hevlayer_* attributes; changing the schema on those attributes breaks Layer’s guarantees, though Layer should degrade gracefully. All functionality is exposed through one API surface, reachable from the Python, Go, or TypeScript client or over plain REST, so applications can route every call through the gateway. Layer works best when traffic flows through it consistently, even for requests that need no extra behavior.

Scatter/gather

Layer can partition a single namespace into hash buckets, called shards, by assigning each row a reserved _hevlayer_shard attribute (xxh64 of its id, modulo the shard count). The gateway then scatters a query to every bucket in parallel, one _hevlayer_shard-filtered query per shard, and gathers the results: it merges and re-ranks the combined rows down to your requested top_k before returning them. Sharding stays invisible to the client — you issue one query and get one ranked result set. The same scatter/gather path backs scans (filter, full-text, and radius) and UDF discovery scans.
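The scatter/gather flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the gateway's implementation: `Hit`, `shard_for`, `scatter_gather`, and the `query_shard` callback are hypothetical names, scores are assumed to be "higher is better", and the hash is a BLAKE2b stand-in because xxh64 is not in the Python standard library (so bucket numbers will differ from the ones Layer assigns).

```python
import hashlib
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Hit(NamedTuple):
    id: str
    score: float  # assumption: a higher score ranks first


def shard_for(row_id: str, shard_count: int) -> int:
    """Map a row id to a bucket in [0, shard_count).

    Stand-in hash: Layer uses xxh64; this sketch uses 8 bytes of BLAKE2b
    so it runs on the standard library alone.
    """
    digest = hashlib.blake2b(row_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % shard_count


def scatter_gather(
    query_shard: Callable[[int, int], list[Hit]],
    shard_count: int,
    top_k: int,
) -> list[Hit]:
    """Query every shard in parallel, then merge and cut to top_k.

    query_shard(shard, top_k) is a hypothetical callback standing in for
    one _hevlayer_shard-filtered query against the search system.
    """
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        per_shard = list(
            pool.map(lambda shard: query_shard(shard, top_k), range(shard_count))
        )
    # Gather: each shard returned at most top_k rows, so the global top_k
    # is guaranteed to be somewhere in the combined candidates.
    candidates = (hit for hits in per_shard for hit in hits)
    return heapq.nlargest(top_k, candidates, key=lambda hit: hit.score)
```

Asking each shard for the full `top_k` is what makes the merge exact: a row in the global top `top_k` is necessarily in its own shard's top `top_k`, so no candidate is lost before re-ranking.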

For an existing turbopuffer namespace adopted by Layer, initialize sharding with POST /v2/namespaces/{namespace}/init and a shard_count. The gateway writes a reserved namespace marker, stamps new writes immediately, and runs an embedded scan-and-patch backfill for rows that do not yet have _hevlayer_shard. Scatter/gather activates only after namespace metadata reports layer.shard_lag_rows: 0; until then queries and scans use the single-namespace path so unstamped rows are not missed.
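As a rough sketch, a client could build the init call and check for activation as follows. Only the path `/v2/namespaces/{namespace}/init`, the `shard_count` parameter, and the `layer.shard_lag_rows` field come from the text above. The base URL, the JSON body shape, the nested layout of the metadata, and both function names are assumptions made for illustration, and authentication is omitted.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://layer-gateway.example"  # hypothetical gateway address


def build_init_request(namespace: str, shard_count: int) -> urllib.request.Request:
    """Build (but do not send) the sharding init request.

    Assumption: shard_count is passed as a JSON body field.
    """
    body = json.dumps({"shard_count": shard_count}).encode("utf-8")
    path = f"/v2/namespaces/{urllib.parse.quote(namespace, safe='')}/init"
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def scatter_gather_active(metadata: dict) -> bool:
    """Return True once the backfill reports zero unstamped rows.

    Assumption: namespace metadata is JSON with a nested "layer" object.
    A missing field is treated as "not ready", which matches falling back
    to the single-namespace path.
    """
    return metadata.get("layer", {}).get("shard_lag_rows") == 0
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; the sketch stops short of that so it stays runnable without a gateway.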

Document cache

The Layer document cache does two jobs. Document reads are served pull-through: the gateway checks the cache first, and on a miss reads through to turbopuffer (or S3 for snapshots), returns the row, and backfills the cache best-effort. Pipeline chunk handoff uses the same store as the queue between CPU and GPU workers. Neither job makes the cache a hard dependency: document reads fall through to origin if the cache is unavailable, and chunk reads fall back to S3 backing (see Failure modes). One logical cache serves every path, with different uses (document fetch, pipeline chunks, snapshot field-values) separated into dedicated cache sets.
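The pull-through read path for documents can be illustrated with a minimal sketch. The `Cache` protocol, `read_document`, and the `fetch_origin` callback are hypothetical names chosen for this example; the real gateway is not written in Python and its interfaces are not shown in this document.

```python
from typing import Callable, Optional, Protocol


class Cache(Protocol):
    """Hypothetical cache interface used only by this sketch."""

    def get(self, key: str) -> Optional[dict]: ...

    def set(self, key: str, value: dict) -> None: ...


def read_document(
    cache: Cache,
    fetch_origin: Callable[[str], Optional[dict]],
    key: str,
) -> Optional[dict]:
    """Pull-through read: cache first, origin on a miss, best-effort backfill."""
    try:
        cached = cache.get(key)
    except Exception:
        cached = None  # cache unavailable: treat it as a miss, not a failure
    if cached is not None:
        return cached

    row = fetch_origin(key)  # stands in for turbopuffer, or S3 for snapshots
    if row is not None:
        try:
            cache.set(key, row)
        except Exception:
            pass  # backfill is best-effort; the read still succeeds
    return row
```

Both cache calls are wrapped so that a cache outage only costs latency: the read still returns whatever the origin returns.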

Glossary

Concept	Current meaning
Namespace	A turbopuffer namespace addressed through `/v2/namespaces/{namespace}`.
Document	A row id plus attributes, and optionally a vector when writing/searching.
Document cache	Layer-managed hot records keyed by namespace and document id, plus cache sets for pipeline chunks and snapshots.
Stable watermark	Epoch-ms cut tracked by the consistency watcher when turbopuffer reports up-to-date, or when a backing store without an index watermark settles its `row_count` across consecutive polls.
Ready signal	Whether a namespace is fully indexed: `indexed` / `index_lag_rows` on namespace metadata, reconciled from the latest snapshot when every row’s vector is indexed.
Pipeline	A PostgreSQL-backed state machine for CPU extraction and GPU embedding work.
Snapshot	A content-addressed S3 facet histogram written after a namespace is observed stable.
Facet listing	The distinct values for a field, precomputed in snapshots as `fields[].values[].v` or computed on demand by a values scan.
Facet count	The document count for a facet value, returned as `fields[].values[].n` in snapshots and `values[].n` in values scan results.
Scan	On-demand row selection by filter, full-text (`fts`), or radius (`ann`) that returns matching IDs or field values asynchronously, or a row count synchronously.
UDF	A stateless worker the gateway coordinates over existing rows to enrich, fan out, or re-upsert data.
Gateway	The Rust proxy fronting turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime.
Operator	The Kubernetes operator that reconciles Layer’s CRDs — functions, pipelines, scaling, and cluster config.
Shard	A hash bucket within a single namespace. Each row carries a reserved `_hevlayer_shard` value (xxh64 of its id, modulo the shard count) so the gateway can scatter/gather a query across buckets.
Leg	One subquery in a hybrid text expansion: the full-input BM25 leg or one per-token fuzzy leg. Every leg of a query reads the same stable watermark cut.
RRF	Reciprocal rank fusion — turbopuffer-native re-ranking (`rerank_by: ["RRF", ...]`) that merges legs into one list. Layer delegates all fusion math upstream.
Tokenizer policy	The documented transform from a `HybridText` input to query tokens: UAX #29 word boundaries and lowercasing via turbopuffer’s open-source `alyze` tokenizer (the production `word_v4` code), then drop tokens under 2 characters, dedupe, cap at 15.
Route	The retrieval strategy the query router picks for an `Auto` query (`hybrid_text`, `semantic`, or `fused`), chosen from the shape of the input text alone. Vector availability gates execution, not the choice.
Routing policy	The deterministic, versioned decision function behind `Auto`. The version travels in the `routing` echo block and search history so threshold changes are visible.
Deferral	The response to a vectorless `Auto` query routed `semantic` or `fused`: the routing decision with `executed: false` and no rows. The application embeds and re-issues with the route forced.
CRD	Custom Resource Definition: the Kubernetes-native resources the operator reconciles — functions, pipelines, scaling, and indexes.
PromQL	The Prometheus query language. The gateway proxies it to the embedded VictoriaMetrics so you can query metrics without a separate scraper.
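
The final steps of the Tokenizer policy entry above (drop tokens under 2 characters, dedupe, cap at 15) can be sketched as a small post-filter. This is an illustration only: `apply_token_policy` is a hypothetical name, the input is assumed to be already segmented and lowercased by the tokenizer, length is counted in Python code points, and keeping the first occurrence of a duplicate is an assumption the glossary does not spell out.

```python
def apply_token_policy(
    tokens: list[str],
    min_chars: int = 2,
    max_tokens: int = 15,
) -> list[str]:
    """Drop short tokens, dedupe in first-seen order, then cap the list.

    Assumes `tokens` were already split on word boundaries and lowercased
    upstream; this sketch does not perform UAX #29 segmentation itself.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for token in tokens:
        if len(token) < min_chars:
            continue  # under the minimum length
        if token in seen:
            continue  # duplicate of an earlier token
        seen.add(token)
        kept.append(token)
    return kept[:max_tokens]
```

For example, `apply_token_policy(["a", "red", "red", "tv", "x", "shoe"])` keeps `red`, `tv`, and `shoe`.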