See it in action on the hev-shop demo store.

Overview

Concepts

Control loops

Layer uses a control loop as a core primitive for managing your indexes. It reconciles index state against metrics emitted by the search system, which is how Layer applies row-level transformations (UDFs) and keeps an index’s stable view current.
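
As a rough illustration of the reconcile pattern described above, the sketch below runs one pass of a loop that compares an observed metrics sample with tracked state and advances a stable watermark only when the index reports it has caught up. The names Observed, IndexState, and reconcile are hypothetical and are not part of Layer's API; the real loop also drives UDFs, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Hypothetical metrics sample reported by the search system."""
    index_up_to_date: bool
    observed_at_ms: int


@dataclass
class IndexState:
    """Hypothetical state the loop tracks for one index."""
    stable_watermark_ms: int = 0


def reconcile(state: IndexState, observed: Observed) -> IndexState:
    """One pass of the loop: compare observed metrics with tracked state."""
    if observed.index_up_to_date and observed.observed_at_ms > state.stable_watermark_ms:
        # The index has caught up, so the stable cut can move forward.
        return IndexState(stable_watermark_ms=observed.observed_at_ms)
    # Otherwise hold the previous cut; the next pass will try again.
    return state


state = IndexState()
samples = [Observed(False, 1_000), Observed(True, 2_000), Observed(False, 3_000)]
for sample in samples:
    state = reconcile(state, sample)
print(state.stable_watermark_ms)
```

Because each pass is a pure function of the tracked state and the latest observation, a missed or repeated pass is harmless: the next one converges on the same result.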

Related: UDFs, snapshots, stable watermark.

Kubernetes autoscaling

Because Layer is stateless, you can autoscale every tier independently. Karpenter handles node-level scaling, and KEDA scales pods against signals from an embedded PostgreSQL queue. The data in that queue is used for scaling decisions only — it carries no non-recoverable system state.

Gateway enhancements

Where helpful, the gateway extends your search system with common query patterns and filtering primitives. Layer’s enhancements use reserved _hevlayer_* attributes; changing the schema of those attributes breaks Layer’s guarantees, although behavior should degrade gracefully. All functionality is exposed through one API surface, available from the Python, Go, or TypeScript client or over plain REST, so applications can route every call through the gateway. Layer works best when traffic flows through it consistently, even for requests that need no extra behavior.

Scatter/gather

Layer can partition a single namespace into hash buckets, called shards, by assigning each row a reserved _hevlayer_shard attribute (xxh64 of its id, modulo the shard count). The gateway then scatters a query to every bucket in parallel, one _hevlayer_shard-filtered query per shard, and gathers the results: it merges and re-ranks the combined rows down to your requested top_k before returning them. Sharding stays invisible to the client — you issue one query and get one ranked result set. The same scatter/gather path backs scans (filter, full-text, and radius) and UDF discovery scans.
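
The scatter/gather flow above can be sketched in a few lines of Python. This is an in-memory illustration, not the gateway's implementation: BLAKE2 from the standard library stands in for xxh64 so the example has no third-party dependency, the shard count of 4 is arbitrary, and query_shard and scatter_gather are hypothetical names. Ranking here is by distance to a target number, standing in for whatever ranking the real query uses.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from heapq import nsmallest

SHARD_COUNT = 4  # hypothetical shard count


def shard_of(doc_id: str, shard_count: int = SHARD_COUNT) -> int:
    """Map an id to a bucket. Stand-in hash: Layer uses xxh64, not BLAKE2."""
    digest = hashlib.blake2b(doc_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % shard_count


def query_shard(rows, shard, target, top_k):
    """Hypothetical per-shard query: filter on the shard attribute, rank locally."""
    in_shard = [r for r in rows if r["_hevlayer_shard"] == shard]
    return nsmallest(top_k, in_shard, key=lambda r: abs(r["value"] - target))


def scatter_gather(rows, target, top_k, shard_count=SHARD_COUNT):
    """Scatter one query per shard in parallel, then merge and re-rank to top_k."""
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partials = pool.map(
            lambda s: query_shard(rows, s, target, top_k), range(shard_count)
        )
        merged = [row for part in partials for row in part]
    return nsmallest(top_k, merged, key=lambda r: abs(r["value"] - target))


rows = [{"id": f"doc-{i}", "value": float(i)} for i in range(100)]
for row in rows:
    row["_hevlayer_shard"] = shard_of(row["id"])

top = scatter_gather(rows, target=42.2, top_k=3)
print([r["id"] for r in top])
```

Each shard returns its own top_k, so the merged set always contains the global top_k; the final re-rank only has to trim it.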

Document cache

The document cache (NVMe-backed Aerospike) does two jobs. Document reads are served pull-through: the gateway checks the cache first, and on a miss reads through to Turbopuffer (or S3 for snapshots), returns the row, and backfills the cache best-effort. Pipeline chunk handoff uses the same store as the queue between CPU and GPU workers. Neither job makes it a hard dependency: document reads fall through to origin if the cache is unavailable, and chunk reads fall back to S3 backing (see Failure modes). One logical cache serves every path, with different uses (document fetch, pipeline chunks, snapshot field-values) separated by Aerospike set.
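
The pull-through read path described above can be sketched as follows. This is a minimal illustration under stated assumptions: DictCache is an in-memory stand-in for the document cache, origin is a hypothetical function standing in for the upstream read, and CacheUnavailable is an invented exception type. None of these names come from Layer or from the Aerospike client.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when the cache cannot be reached."""


class DictCache:
    """In-memory stand-in for the document cache, keyed by (namespace, id)."""

    def __init__(self, available=True):
        self.available = available
        self.records = {}

    def get(self, key):
        if not self.available:
            raise CacheUnavailable()
        return self.records.get(key)

    def put(self, key, row):
        if not self.available:
            raise CacheUnavailable()
        self.records[key] = row


def read_document(cache, origin, namespace, doc_id):
    """Pull-through read: cache first, origin on a miss, best-effort backfill."""
    key = (namespace, doc_id)
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        pass  # the cache is not a hard dependency; fall through to origin
    row = origin(namespace, doc_id)
    try:
        cache.put(key, row)  # best-effort: a failed backfill never fails the read
    except CacheUnavailable:
        pass
    return row


origin_calls = []


def origin(namespace, doc_id):
    """Hypothetical origin fetch standing in for the upstream store."""
    origin_calls.append((namespace, doc_id))
    return {"id": doc_id, "title": "example"}


cache = DictCache()
first = read_document(cache, origin, "products", "doc-1")   # miss, reads origin
second = read_document(cache, origin, "products", "doc-1")  # hit, served from cache
down = read_document(DictCache(available=False), origin, "products", "doc-1")
print(len(origin_calls))
```

The read succeeds in all three cases; only the number of origin calls changes, which is the property that keeps the cache from becoming a hard dependency.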

Glossary

| Concept | Current meaning |
| --- | --- |
| Namespace | A Turbopuffer namespace addressed through /v2/namespaces/{namespace}. |
| Document | A row id plus attributes, and optionally a vector when writing/searching. |
| Document cache | NVMe-backed records keyed by namespace and document id, plus cache sets for pipeline chunks and snapshots. |
| Stable watermark | Epoch-ms cut tracked by the consistency watcher when Turbopuffer index status is up-to-date. |
| Ready signal | Whether a namespace is fully indexed: indexed / index_lag_rows on namespace metadata, reconciled from the latest snapshot when every row’s vector is indexed. |
| Pipeline | A PostgreSQL-backed state machine for CPU extraction and GPU embedding work. |
| Snapshot | A content-addressed S3 facet histogram written after a namespace is observed stable. |
| Facet listing | The distinct values for a field, precomputed in snapshots as fields[].values[].v or computed on demand by a values scan. |
| Facet count | The document count for a facet value, returned as fields[].values[].n in snapshots and values[].n in values scan results. |
| Scan | On-demand row selection by filter, full-text (fts), or radius (ann) that returns matching IDs or field values asynchronously, or a row count synchronously. |
| UDF | A stateless worker the gateway coordinates over existing rows to enrich, fan out, or re-upsert data. |
| Gateway | The Rust proxy fronting Turbopuffer that serves the compatible API plus cache, scans, snapshots, pipelines, and the UDF runtime. |
| Operator | The Kubernetes operator that reconciles Layer’s CRDs: functions, pipelines, scaling, and cluster config. |
| Shard | A hash bucket within a single namespace. Each row carries a reserved _hevlayer_shard value (xxh64 of its id, modulo the shard count) so the gateway can scatter/gather a query across buckets. |
| Leg | One subquery in a hybrid text expansion: the full-input BM25 leg or one per-token fuzzy leg. Every leg of a query reads the same stable watermark cut. |
| RRF | Reciprocal rank fusion: Turbopuffer-native re-ranking (rerank_by: ["RRF", ...]) that merges legs into one list. Layer delegates all fusion math upstream. |
| Tokenizer policy | The documented transform from a HybridText input to query tokens: UAX #29 word boundaries and lowercasing via Turbopuffer’s open-source alyze tokenizer (the production word_v4 code), then drop tokens under 2 characters, dedupe, cap at 15. |
| Route | The retrieval strategy the query router picks for an Auto query (hybrid_text, semantic, or fused), chosen from the shape of the input text alone. Vector availability gates execution, not the choice. |
| Routing policy | The deterministic, versioned decision function behind Auto. The version travels in the routing echo block and search history so threshold changes are visible. |
| Deferral | The response to a vectorless Auto query routed semantic or fused: the routing decision with executed: false and no rows. The application embeds and re-issues with the route forced. |
| CRD | Custom Resource Definition: the Kubernetes-native resources the operator reconciles, namely functions, pipelines, scaling, and indexes. |
| PromQL | The Prometheus query language. The gateway proxies it to the embedded VictoriaMetrics so you can query metrics without a separate scraper. |
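
The tokenizer policy in the glossary above (lowercase, split into words, drop tokens under 2 characters, dedupe, cap at 15) can be approximated as below. This is a sketch only: the regular expression \w+ stands in for UAX #29 word segmentation and will differ from the alyze tokenizer on some inputs, and query_tokens is a hypothetical name rather than a Layer function.

```python
import re

MIN_TOKEN_CHARS = 2  # drop tokens shorter than this
MAX_TOKENS = 15      # cap on the number of query tokens


def query_tokens(text: str) -> list[str]:
    """Approximate the documented policy; \\w+ is a stand-in for UAX #29."""
    words = re.findall(r"\w+", text.lower())
    tokens = []
    seen = set()
    for word in words:
        if len(word) < MIN_TOKEN_CHARS or word in seen:
            continue  # too short, or a duplicate of an earlier token
        seen.add(word)
        tokens.append(word)
    return tokens[:MAX_TOKENS]


print(query_tokens("A red Red running shoe, size 9"))
```

Deduplicating before the cap means repeated words never crowd out distinct ones, so the 15 surviving tokens are the first 15 distinct tokens in input order.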