Overview
Concepts
Control loops
Layer uses a control loop as a core primitive for managing your indexes. It reconciles index state against metrics emitted by the search system, which is how Layer applies row-level transformations (UDFs) and keeps an index’s stable view current.
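As a rough illustration of one reconcile step, the sketch below advances a namespace's stable watermark only when the observed index status reports up-to-date. The `IndexMetrics` and `NamespaceState` types, their field names, and the `"up-to-date"` status string are assumptions made for this example, not Layer's actual types.

```python
from dataclasses import dataclass


@dataclass
class IndexMetrics:
    # Hypothetical shape of one metrics observation from the search system.
    status: str          # e.g. "up-to-date" or "updating"
    observed_at_ms: int  # epoch milliseconds of the observation


@dataclass
class NamespaceState:
    # Hypothetical reconciled state kept per namespace.
    stable_watermark_ms: int = 0


def reconcile(state: NamespaceState, metrics: IndexMetrics) -> bool:
    """Run one loop iteration; return True if the watermark advanced."""
    if metrics.status != "up-to-date":
        return False  # index still catching up: keep the old stable view
    if metrics.observed_at_ms <= state.stable_watermark_ms:
        return False  # stale observation: never move the watermark backwards
    state.stable_watermark_ms = metrics.observed_at_ms
    return True


state = NamespaceState()
advanced_while_updating = reconcile(state, IndexMetrics("updating", 100))
advanced_when_stable = reconcile(state, IndexMetrics("up-to-date", 200))
advanced_on_stale = reconcile(state, IndexMetrics("up-to-date", 150))
```

A real loop would run this step repeatedly against fresh metrics; the point of the sketch is that each step is idempotent and only ever moves state forward.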
Related: UDFs, snapshots, stable watermark.
Kubernetes autoscaling
Because Layer is stateless, you can autoscale every tier independently. Karpenter handles node-level scaling, and KEDA scales pods against signals from an embedded PostgreSQL queue. The data in that queue is used for scaling decisions only — it carries no non-recoverable system state.
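To make the queue-driven scaling concrete, here is a minimal sketch of how an average-value target turns queue depth into a replica count, which is the general shape of the calculation the Kubernetes Horizontal Pod Autoscaler performs for KEDA-supplied metrics. The function name, the parameters, and the numbers are illustrative assumptions, not Layer's configuration.

```python
import math


def desired_replicas(queue_depth: int, target_per_pod: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replicas needed so each pod handles about target_per_pod queued items."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(queue_depth / target_per_pod)
    # Clamp to the configured bounds, as an autoscaler would.
    return max(min_replicas, min(max_replicas, wanted))


idle = desired_replicas(0, 5)        # empty queue, scale to the minimum
busy = desired_replicas(12, 5)       # 12 items at 5 per pod
saturated = desired_replicas(1000, 5)  # capped at max_replicas
```

Because the queue depth only feeds this calculation, losing it affects scaling decisions for a while but not correctness.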
Gateway enhancements
Where helpful, the gateway extends your search system with common query patterns and filtering primitives. Layer’s enhancements use reserved _hevlayer_* attributes; changing the schema of those attributes voids Layer’s guarantees, though behavior should degrade gracefully. All functionality is exposed through one API surface, reachable from the Python, Go, or TypeScript client or over plain REST, so applications can route every call through the gateway. Layer works best when traffic flows through it consistently, even for requests that need no extra behavior.
Scatter/gather
Layer can partition a single namespace into hash buckets, called shards, by assigning each row a reserved _hevlayer_shard attribute (xxh64 of its id, modulo the shard count). The gateway then scatters a query to every bucket in parallel, one _hevlayer_shard-filtered query per shard, and gathers the results: it merges and re-ranks the combined rows down to your requested top_k before returning them. Sharding stays invisible to the client — you issue one query and get one ranked result set. The same scatter/gather path backs scans (filter, full-text, and radius) and UDF discovery scans.
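The scatter/gather flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the gateway's implementation: xxh64 is not in the Python standard library, so `shard_of` substitutes an 8-byte BLAKE2b digest as a stand-in hash; `query_shard` is a hypothetical callable that runs one shard-filtered query; and scores are assumed to be comparable across shards.

```python
import hashlib
import heapq
from concurrent.futures import ThreadPoolExecutor


def shard_of(doc_id: str, shard_count: int) -> int:
    """Map an id to a bucket. Stand-in for xxh64(id) % shard_count."""
    digest = hashlib.blake2b(doc_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % shard_count


def scatter_gather(query_shard, shard_count: int, top_k: int):
    """Query every shard in parallel, then merge down to the global top_k.

    query_shard(shard) returns a list of (score, doc_id); higher is better.
    """
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        per_shard = list(pool.map(query_shard, range(shard_count)))
    merged = [row for rows in per_shard for row in rows]
    return heapq.nlargest(top_k, merged, key=lambda row: row[0])


# Toy data: ten rows whose score equals their index, spread over four shards.
SHARDS = 4
buckets = {s: [] for s in range(SHARDS)}
for i in range(10):
    doc_id = f"doc-{i}"
    buckets[shard_of(doc_id, SHARDS)].append((float(i), doc_id))


def query_shard(shard: int):
    # Each shard returns only its local top 3, as a top_k=3 query would.
    return heapq.nlargest(3, buckets[shard], key=lambda row: row[0])


top = scatter_gather(query_shard, SHARDS, top_k=3)
top_ids = [doc_id for _, doc_id in top]  # ['doc-9', 'doc-8', 'doc-7']
```

Asking each shard for its own top_k is enough to recover the global top_k, because any row in the global result must also rank in the top_k of the shard that holds it.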
Document cache
The document cache (NVMe-backed Aerospike) does two jobs. First, document reads are served pull-through: the gateway checks the cache, and on a miss reads through to Turbopuffer (or S3 for snapshots), returns the row, and backfills the cache best-effort. Second, pipeline chunk handoff uses the same store as the queue between CPU and GPU workers. Neither job makes the cache a hard dependency: document reads fall through to origin if the cache is unavailable, and chunk reads fall back to S3 backing (see Failure modes). One logical cache serves every path, with the different uses (document fetch, pipeline chunks, snapshot field-values) separated by Aerospike set.
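The pull-through read path can be sketched as follows. This is a minimal illustration, not the gateway's code: `CacheError`, `DictCache`, `DownCache`, `read_document`, and `fetch_origin` are all hypothetical names, and an in-memory dict stands in for Aerospike.

```python
class CacheError(Exception):
    """Hypothetical error raised when the cache is unreachable."""


class DictCache:
    """In-memory stand-in for the document cache."""

    def __init__(self):
        self.rows = {}

    def get(self, key):
        return self.rows.get(key)

    def put(self, key, row):
        self.rows[key] = row


class DownCache:
    """Simulates an unavailable cache: every call fails."""

    def get(self, key):
        raise CacheError("cache unavailable")

    def put(self, key, row):
        raise CacheError("cache unavailable")


def read_document(cache, fetch_origin, namespace, doc_id):
    """Check the cache, fall through to origin, backfill best-effort."""
    key = (namespace, doc_id)  # records are keyed by namespace and document id
    try:
        row = cache.get(key)
        if row is not None:
            return row  # cache hit
    except CacheError:
        pass  # cache is not a hard dependency: fall through to origin
    row = fetch_origin(namespace, doc_id)
    try:
        cache.put(key, row)  # best-effort backfill; failure is ignored
    except CacheError:
        pass
    return row


origin_calls = []


def fetch_origin(namespace, doc_id):
    origin_calls.append(doc_id)
    return {"id": doc_id, "title": "hello"}


cache = DictCache()
first = read_document(cache, fetch_origin, "products", "42")   # miss, backfills
second = read_document(cache, fetch_origin, "products", "42")  # hit
degraded = read_document(DownCache(), fetch_origin, "products", "42")
```

In this run the origin is called twice: once for the initial miss and once while the cache is down. The cached read in between never reaches it.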
Glossary
| Concept | Current meaning |
|---|---|
| Namespace | A Turbopuffer namespace addressed through /v2/namespaces/{namespace}. |
| Document | A row id plus attributes, and optionally a vector when writing/searching. |
| Document cache | NVMe-backed records keyed by namespace and document id, plus cache sets for pipeline chunks and snapshots. |
| Stable watermark | Epoch-ms cut tracked by the consistency watcher when Turbopuffer index status is up-to-date. |
| Ready signal | Whether a namespace is fully indexed: indexed / index_lag_rows on namespace metadata, reconciled from the latest snapshot when every row’s vector is indexed. |
| Pipeline | A PostgreSQL-backed state machine for CPU extraction and GPU embedding work. |
| Snapshot | A content-addressed S3 facet histogram written after a namespace is observed stable. |
| Facet listing | The distinct values for a field, precomputed in snapshots as fields[].values[].v or computed on demand by a values scan. |
| Facet count | The document count for a facet value, returned as fields[].values[].n in snapshots and values[].n in values scan results. |
| Scan | On-demand row selection by filter, full-text (fts), or radius (ann) that returns matching IDs or field values asynchronously, or a row count synchronously. |
| UDF | A stateless worker the gateway coordinates over existing rows to enrich, fan out, or re-upsert data. |
| Gateway | The Rust proxy fronting Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. |
| Operator | The Kubernetes operator that reconciles Layer’s CRDs — functions, pipelines, scaling, and cluster config. |
| Shard | A hash bucket within a single namespace. Each row carries a reserved _hevlayer_shard value (xxh64 of its id, modulo the shard count) so the gateway can scatter/gather a query across buckets. |
| Leg | One subquery in a hybrid text expansion: the full-input BM25 leg or one per-token fuzzy leg. Every leg of a query reads the same stable watermark cut. |
| RRF | Reciprocal rank fusion — Turbopuffer-native re-ranking (rerank_by: ["RRF", ...]) that merges legs into one list. Layer delegates all fusion math upstream. |
| Tokenizer policy | The documented transform from a HybridText input to query tokens: UAX #29 word boundaries and lowercasing via Turbopuffer’s open-source alyze tokenizer (the production word_v4 code), then drop tokens under 2 characters, dedupe, cap at 15. |
| Route | The retrieval strategy the query router picks for an Auto query (hybrid_text, semantic, or fused), chosen from the shape of the input text alone. Vector availability gates execution, not the choice. |
| Routing policy | The deterministic, versioned decision function behind Auto. The version travels in the routing echo block and search history so threshold changes are visible. |
| Deferral | The response to a vectorless Auto query routed semantic or fused: the routing decision with executed: false and no rows. The application embeds and re-issues with the route forced. |
| CRD | Custom Resource Definition: the Kubernetes-native resources the operator reconciles — functions, pipelines, scaling, and indexes. |
| PromQL | The Prometheus query language. The gateway proxies it to the embedded VictoriaMetrics so you can query metrics without a separate scraper. |
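
The post-tokenization steps of the tokenizer policy (lowercase, drop tokens under 2 characters, dedupe, cap at 15) can be sketched as below. The `\w+` regular expression is only a rough approximation of UAX #29 word boundaries and is not the alyze tokenizer; `query_tokens` and its parameter names are hypothetical.

```python
import re

# Rough stand-in for UAX #29 word segmentation; real boundaries differ.
_WORD = re.compile(r"\w+")


def query_tokens(text: str, min_len: int = 2, max_tokens: int = 15) -> list[str]:
    """Lowercase, split into words, drop short tokens, dedupe, then cap."""
    seen = set()
    tokens = []
    for match in _WORD.finditer(text.lower()):
        token = match.group(0)
        if len(token) < min_len or token in seen:
            continue  # too short, or already emitted
        seen.add(token)
        tokens.append(token)
        if len(tokens) == max_tokens:
            break  # cap reached; remaining input is ignored
    return tokens


short = query_tokens("A quick Quick fox, a b fox!")  # ['quick', 'fox']
capped = query_tokens(" ".join(f"w{i}" for i in range(30)))  # 15 tokens
```

Deduplication preserves first-occurrence order, so the cap keeps the earliest distinct tokens in the input.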