API

Query & Fetch

This is Layer’s query API. Layer reports its own metadata in x-layer-* response headers.

Stable reads

Layer’s query API defaults to stable reads. Every response carries an x-layer-stable-as-of watermark: the point the backing index is known to be caught up to. A query issued right after an upsert never returns partially-indexed rows and never 429s under write pressure, so derived views like facets and counts stay in sync with your index.

HTTP/1.1 200 OK
x-layer-stable-as-of: 1715600400000

{"rows":[{"id":"asin-B08N5WRWNW","$dist":0.42,"title":"..."}]}

This is achieved by:

Queries run at consistency=eventual upstream, so they never block on indexing.
A control loop polls each registered namespace’s index.status and records the latest status plus, when stable, a watermark equal to poll_start - safety_margin. Cold or updating namespaces use the fast polling interval; stable namespaces back off to the stable interval until the next write re-arms the fast tier.
Per-query decision:
- Updating → inject a hidden _hevlayer_upserted_at <= watermark predicate so the read never sees partially-indexed rows.
- Stable or Unknown → run without the predicate. The upstream index is caught up (or no contrary evidence exists).
On a 429 to an unfiltered query, Layer retries once with the watermark filter forced on.

Responses report x-layer-stable-as-of (epoch ms) when the watcher has a watermark for the namespace. It is omitted on a cold-start gateway that has not yet observed a stable poll. Query responses always carry a top-level next_cursor in the body — the next page token when more results may exist, or null on the last page — and mirror a non-null token in the x-layer-next-cursor header. Pass a non-null value back as cursor in the next request body. Native vector queries use a score-band cursor. Layer-fused HybridText and executed Auto routes use a bounded server-side re-fetch cursor, capped at 10,000 ranked rows.

To pin a query to an explicit temporal cut, pass either as_of or between in the request body. as_of: 1747300000123 conjoins _hevlayer_upserted_at <= 1747300000123; between: [lo, hi] conjoins lo < _hevlayer_upserted_at <= hi. They are mutually exclusive. Temporal selectors compose with filters, nearest_to_id, a top-level batch query, and the Layer HybridText / Auto rank expressions. Native turbopuffer passthrough bodies remain upstream-shaped and are not rewritten.

Stable-read behavior is set per namespace with the consistency field on the Index CRD. Two gateway tunables control the watcher:

Variable	Default	Purpose
`CONSISTENCY_POLL_INTERVAL_MS`	1000	Fast cadence for cold and updating namespaces.
`CONSISTENCY_STABLE_POLL_INTERVAL_MS`	60000	Slow cadence for namespaces last observed stable. Set equal to the fast interval to restore one uniform cadence.
`CONSISTENCY_SAFETY_MARGIN_MS`	500	Cushion between poll time and watermark to cover in-flight upserts.

Query by id

Pass nearest_to_id in place of vector to rank by stored document vectors instead of a raw query vector — exactly one of the two is required. nearest_to_id takes an array of document ids: the gateway resolves each id’s vector (document cache first, the namespace’s configured VectorStore on miss with a cache backfill) and averages them component-wise into a single centroid, then ranks nearest neighbors to that centroid. Pass one id to rank by a single document; pass several to get “more like these” over a set of seeds.

response = await client.query_namespace("products", {
    "nearest_to_id": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
    "top_k": 10,
    "include_attributes": ["title", "category"],
})

response, err := client.QueryNamespace(ctx, "products", &hevlayer.QueryRequest{
    NearestToID:       []string{"asin-B08N5WRWNW", "asin-B07PXGQC1Q"},
    TopK:              10,
    IncludeAttributes: []string{"title", "category"},
})

const response = await client.queryNamespace("products", {
  nearest_to_id: ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
  top_k: 10,
  include_attributes: ["title", "category"],
});

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/query" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "nearest_to_id": ["asin-B08N5WRWNW", "asin-B07PXGQC1Q"],
    "top_k": 10,
    "include_attributes": ["title", "category"]
  }'

Outcome	Status
Every id resolved (cache or origin)	200, ranked results
Any id has no stored vector anywhere	404 (names the missing ids)
`nearest_to_id` empty, or both/neither of `vector` / `nearest_to_id`	422

The centroid is an unweighted mean, so seed ids contribute equally regardless of how many you pass. All resolved vectors share the namespace’s dimensionality, so no reconciliation is needed across seeds. This fuses the seeds into one ranking; to run several independent rankings in a single request, see batch query.

Rank expressions

Pass rank_by with top_k when you need an explicit ranking operator instead of the top-level vector / nearest_to_id shape. Layer handles the portable subset with the same cache, history, and stable-read behavior as vector queries. Native upstream query bodies that omit top_k remain pass-through.

rank_by is mutually exclusive with vector and nearest_to_id.

Batch query

nearest_to_id fuses several seeds into a single ranking. To run several independent queries in one round trip, each with its own ranking, post a queries array. The response is a parallel results array — one ranked result set per query, in request order: { "results": [{ "rows": ... }] }. Layer holds every leg on the same stable cut, so a batch reads one consistent view of the index.

(The method is batch_query_namespace. It is named apart from turbopuffer’s own upstream multi-query — a rerank_by body, which Layer passes through unchanged, as noted at the end of this section — to keep the two distinct.)

batch = await client.batch_query_namespace("products", {
    "queries": [
        {"rank_by": ["vector", "ANN", [0.1, 0.2, 0.3]], "top_k": 10},
        {"rank_by": ["title", "BM25", "wireless earbuds"], "top_k": 10},
    ],
})
# batch.results[0].rows ranked by vector; batch.results[1].rows by text

batch, err := client.BatchQueryNamespace(ctx, "products",
    &hevlayer.BatchQueryRequest{
        Queries: []hevlayer.TurbopufferQueryRequest{
            {"rank_by": []any{"vector", "ANN", []float64{0.1, 0.2, 0.3}}, "top_k": 10},
            {"rank_by": []any{"title", "BM25", "wireless earbuds"}, "top_k": 10},
        },
    })

const batch = await client.batchQueryNamespace("products", {
  queries: [
    { rank_by: ["vector", "ANN", [0.1, 0.2, 0.3]], top_k: 10 },
    { rank_by: ["title", "BM25", "wireless earbuds"], top_k: 10 },
  ],
});
// batch.results[0].rows ranked by vector; batch.results[1].rows by text

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/query" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": [
      {"rank_by": ["vector", "ANN", [0.1, 0.2, 0.3]], "top_k": 10},
      {"rank_by": ["title", "BM25", "wireless earbuds"], "top_k": 10}
    ]
  }'

All legs in a non-fused batch share one x-layer-stable-as-of value. A leg may use native rank_by, or the Layer vector / nearest_to_id single-query shape; nearest_to_id is resolved before the leg is sent upstream. Batches must contain 2 to 16 legs. cursor is rejected at the top level and per leg because pagination is single-query only.

When rerank_by is present, Layer treats the request as an upstream fused query and passes the body through unchanged.

Reach for a batch query when you genuinely need N rankings — distinct user queries batched into one round trip, or hybrid retrieval fused upstream with RRF. Reach for nearest_to_id when many seeds should collapse into one “more like these” ranking. To get typo-tolerant text search without building the fused query yourself, see hybrid text fusion.

Every leg here targets the namespace in the path. To fan one query across a set of namespaces — and merge them into a single ranked list — see federated query.

Hybrid text fusion

BM25 misses typos and morphological variants; fuzzy matching alone loses the relevance signal BM25 provides. HybridText runs both in one request: the gateway tokenizes your input string, expands it into one BM25 leg plus one fuzzy leg per token, and the effective legs are RRF-fused into one ranking. One expression in, typo-tolerant ranked results out.

HybridText is a Layer-only rank_by spelling on the existing query route — no new endpoint, no client changes beyond the expression. The gateway tokenizes with alyze, turbopuffer’s own open-source tokenizer and the same code that segmented your text at index time, so query tokens match index terms by construction.

The ranked field must be indexed for both full-text and fuzzy matching — declare it {"type": "string", "full_text_search": true, "fuzzy": true} in the namespace schema. The BM25 leg uses the full-text index; the per-token fuzzy legs use the fuzzy index.

response = await client.query_namespace("support-tickets", {
    "rank_by": ["content", "HybridText", "conection timout kubernets"],
    "top_k": 10,
    "filters": ["tenant", "Eq", "t-42"],
    "include_attributes": ["content", "title"],
})

response, err := client.QueryNamespace(ctx, "support-tickets", &hevlayer.QueryRequest{
    RankBy:            []any{"content", "HybridText", "conection timout kubernets"},
    TopK:              10,
    Filters:           []any{"tenant", "Eq", "t-42"},
    IncludeAttributes: []string{"content", "title"},
})

const response = await client.queryNamespace("support-tickets", {
  rank_by: ["content", "HybridText", "conection timout kubernets"],
  top_k: 10,
  filters: ["tenant", "Eq", "t-42"],
  include_attributes: ["content", "title"],
});

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/support-tickets/query" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rank_by": ["content", "HybridText", "conection timout kubernets"],
    "top_k": 10,
    "filters": ["tenant", "Eq", "t-42"],
    "include_attributes": ["content", "title"]
  }'

An optional fourth tuple element tunes the expansion. Defaults:

["content", "HybridText", "conection timout kubernets", {
  "fuzziness": "auto",
  "rank_constant": 60,
  "per_leg_limit": null
}]

Option	Default	Meaning
`fuzziness`	`"auto"`	Edit-distance tolerance for the fuzzy legs, keyed to each token’s length (turbopuffer requires at least 3 query characters per edit). `"auto"` permits up to distance 2: exact for tokens of 3–5 characters, distance 1 for 6–8, distance 2 for 9 or more. Fixed `0`, `1`, or `2` caps the ladder, so `0` is exact-only.
`rank_constant`	`60`	turbopuffer’s RRF constant, passed through verbatim. Integer > 0.
`per_leg_limit`	`clamp(5 × top_k, 50, 200)`	How deep each leg retrieves before fusion. Integer > 0.
`threads`	`Index.spec.scan.threads`, else `8`	Maximum concurrent upstream requests when the gateway scatter/gathers the expansion across a sharded namespace — the same fan-out control as scans. Clamped to active shards. No effect on unsharded namespaces, where the expansion is a single fused upstream call.

Set top-level include_leg_breakdown: true to return per-row $fused.legs attribution. Each leg entry reports the leg label, that row’s 1-based rank within the leg, and the leg’s raw score or distance. rank and score are null when the row fell outside that leg’s per_leg_limit cut. Labels are bm25, fuzzy:<token>, and semantic on routed fused queries.

Tokenization

The input string becomes tokens under a fixed, documented policy:

Split on Unicode (UAX #29) word boundaries and lowercase, using alyze — the code behind turbopuffer’s production word_v4 tokenizer. Punctuation-only tokens never survive the split.
Drop tokens shorter than 2 characters.
Dedupe.
Cap at 15 tokens (15 fuzzy legs + 1 BM25 leg = 16, the upstream subquery limit). Tokens cut by the cap are counted in tokens_dropped.

Stemming, stopword removal, and language detection are not applied. The input must yield at least one token; one token is fine (that is still two legs, the RRF minimum).

Response

Results are the RRF-fused list. A hybrid block echoes the effective expansion so defaults are never invisible:

{
  "rows": [
    {
      "id": "ticket-4117",
      "$score": 0.0639,
      "content": "...",
      "title": "Connection timeout on Kubernetes ingress"
    }
  ],
  "hybrid": {
    "tokens": ["conection", "timout", "kubernets"],
    "tokens_dropped": 0,
    "fuzziness": "auto",
    "rank_constant": 60,
    "legs": 4,
    "per_leg_limit": 50
  },
  "next_cursor": null
}

Field	Meaning
`$score`	RRF score. Comparable within a response, not across requests — do not threshold on it.
`$fused.legs`	Present only when `include_leg_breakdown: true`. Per-leg attribution in effective leg order; each item has `leg`, `rank`, and `score`.
`tokens`	Tokens that produced fuzzy legs, post-policy.
`tokens_dropped`	Tokens removed by the 15-token cap (not by the length or punctuation rules).
`legs`	Total effective subqueries in the fused expansion. Normally the fuzzy legs + 1 BM25 leg (plus 1 ANN leg on routed fused queries). On the `surfaced` fallback there is no BM25 leg, so `legs` equals the token count (one fuzzy leg per token).
`surfaced`	Present and `true` only when the empty-result fallback fired (see Surfacing fallback). Absent on the normal path.
`next_cursor`	Top-level field (not inside `hybrid`), always present in the body: the next page token, or `null` on the last page. Mirrors the `x-layer-next-cursor` header. Pass a non-null value back as `cursor`.

The hybrid block appears only on HybridText responses. On sharded namespaces it also reports the effective threads fan-out width. Requests without a HybridText expression, including native turbopuffer multi-query + rerank_by bodies, keep their upstream-shaped responses byte-for-byte.

Surfacing fallback

Every primary leg ranks by BM25 over the full input, which upstream scores at zero — and drops — when no token matches a stored term exactly. A fully-misspelled query therefore fuses to zero rows. When the primary expansion returns nothing, Layer re-runs one fuzzy leg per token, reorders each leg by field/token edit distance, and fuses those instead, so a typo-heavy query still surfaces near matches.

The response then carries "surfaced": true in the hybrid block, and legs reflects the surfacing expansion — one fuzzy leg per token, with no BM25 leg. Working queries never reach this path; the fallback is purely additive and absent (surfaced omitted) on the normal path.

Semantics

Fusion. RRF uses the effective leg order: BM25 first, then one fuzzy leg per token, then the semantic ANN leg on routed fused queries. include_leg_breakdown: true can require one upstream query per leg on unsharded namespaces so Layer can report per-leg ranks.
One consistency cut. Request-level filters are replicated to every leg, and the stable-read watermark predicate is injected into every leg from a single read — all legs see the same cut. Responses carry x-layer-stable-as-of as usual.
All-or-nothing. Any leg failure fails the request; Layer does not return a partial fusion over surviving legs.
Replay as a unit. The query logs to search history as one entry carrying the HybridText expression, so replaying it reproduces the whole expansion.

Validation

All return 422:

Condition	Why
Input yields zero tokens under the policy	Nothing to expand.
`HybridText` inside a `queries` array	The expansion is already one batch deep by construction.
`fuzziness` not in `"auto" \| 0 \| 1 \| 2`; `rank_constant` ≤ 0; `per_leg_limit` ≤ 0; `threads` < 1	Out of range.

To let the gateway pick between hybrid text and semantic retrieval per query, see query routing.

Query routing

Real search boxes receive both "timout" and "why do pods lose their connection during deploys". The first wants hybrid text fusion; the second wants semantic retrieval — lexical legs add noise on long conversational input, and ANN underperforms on short identifier-shaped tokens. Auto is a Layer-only rank_by spelling that makes that call per query, so the branch doesn’t live ad hoc in your application code.

The gateway never embeds. The route is chosen from the shape of the input alone, and when the chosen route needs a query vector the request didn’t include, the response is the routing decision instead of results — your application embeds and re-issues with the route forced. Short keyword traffic executes immediately and never pays for an embedding.

response = await client.query_namespace("support-tickets", {
    "rank_by": ["content", "Auto", user_input],
    "top_k": 10,
    "filters": ["tenant", "Eq", "t-42"],
})

if not response.routing.executed:
    vector = await embed(user_input)  # your model or API
    response = await client.query_namespace("support-tickets", {
        "rank_by": ["content", "Auto", user_input, {
            "route": response.routing.route,
            "vector": vector,
        }],
        "top_k": 10,
        "filters": ["tenant", "Eq", "t-42"],
    })

response, err := client.QueryNamespace(ctx, "support-tickets", &hevlayer.QueryRequest{
    RankBy:  []any{"content", "Auto", userInput},
    TopK:    10,
    Filters: []any{"tenant", "Eq", "t-42"},
})

if err == nil && !response.Routing.Executed {
    vector := embed(userInput) // your model or API
    response, err = client.QueryNamespace(ctx, "support-tickets", &hevlayer.QueryRequest{
        RankBy: []any{"content", "Auto", userInput, map[string]any{
            "route":  response.Routing.Route,
            "vector": vector,
        }},
        TopK:    10,
        Filters: []any{"tenant", "Eq", "t-42"},
    })
}

let response = await client.queryNamespace("support-tickets", {
  rank_by: ["content", "Auto", userInput],
  top_k: 10,
  filters: ["tenant", "Eq", "t-42"],
});

if (!response.routing.executed) {
  const vector = await embed(userInput); // your model or API
  response = await client.queryNamespace("support-tickets", {
    rank_by: ["content", "Auto", userInput, {
      route: response.routing.route,
      vector,
    }],
    top_k: 10,
    filters: ["tenant", "Eq", "t-42"],
  });
}

# First request: no vector. Executes lexically, or returns the decision.
curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/support-tickets/query" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rank_by": ["content", "Auto", "why do pods lose their connection during deploys"],
    "top_k": 10,
    "filters": ["tenant", "Eq", "t-42"]
  }'

# Routed semantic without a vector, so the body is the decision, not rows:
# {"rows": [], "routing": {"route": "semantic", "policy": "v1", "tokens": 8, "executed": false}}

# Embed, then re-issue with the route forced:
curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/support-tickets/query" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rank_by": ["content", "Auto", "why do pods lose their connection during deploys", {
      "route": "semantic",
      "vector": [0.0012, -0.043]
    }],
    "top_k": 10,
    "filters": ["tenant", "Eq", "t-42"]
  }'

Routing policy

The v1 policy reads the token count of the input under the same tokenizer policy as hybrid text fusion:

Tokens	Route	Runs
≤ 2	`hybrid_text`	The hybrid text fusion expansion.
≥ 8	`semantic`	ANN over the supplied query vector.
3 – 7	`fused`	Both, merged upstream by RRF.

Vector availability never changes which route is chosen — only whether it executes in this request. hybrid_text always executes; semantic and fused execute when the request supplies a vector and defer otherwise. The policy is versioned ("policy": "v1") so threshold changes are visible in search history.

Options

The optional fourth tuple element:

Option	Default	Meaning
`route`	`"auto"`	Force `"hybrid_text"`, `"semantic"`, or `"fused"` instead of applying the policy. Used on re-issue after a deferral, and for A/B comparison of strategies on the same input.
`vector`	—	The query vector for the semantic leg. Dimensionality must match the namespace.
`fuzziness`	`"auto"`	Forwarded to the `HybridText` expansion on the `hybrid_text` and `fused` routes: `"auto"`, `0`, `1`, or `2`. `0` forces exact-only matching. No effect on the `semantic` route.

When the chosen route expands hybrid-text legs, the hybrid defaults apply and the hybrid echo block appears alongside routing. Set top-level include_leg_breakdown: true to add $fused.legs to each fused row; the fused route includes a final semantic leg after the BM25 and fuzzy-token legs.

Response

Every Auto response carries a routing block:

{
  "rows": [{"id": "ticket-4117", "$score": 0.0639, "title": "..."}],
  "routing": {
    "route": "hybrid_text",
    "policy": "v1",
    "tokens": 1,
    "executed": true
  },
  "hybrid": {"tokens": ["timout"], "tokens_dropped": 0, "fuzziness": "auto", "rank_constant": 60, "legs": 2, "per_leg_limit": 50}
}

Field	Meaning
`route`	The strategy chosen (or forced).
`policy`	Routing policy version that made the decision. `"forced"` when `route` was supplied.
`tokens`	Token count the policy read, post tokenizer policy.
`executed`	`false` on a deferral: the route needs a vector the request didn’t supply. `rows` is empty; embed and re-issue with the route forced.

Routed queries follow the same semantics as their underlying strategy: one consistency cut across all legs, all-or-nothing leg failure, and a single search history entry carrying the Auto expression and the decision.

Validation

All return 422:

Condition	Why
Forced `"semantic"` or `"fused"` without `vector`	Forcing asserts you have the vector; only auto-routing defers.
Input yields zero tokens under the policy	Nothing to route.
`vector` dimensionality mismatch	Same check as a plain vector query.
`Auto` inside a `queries` array	Inherited from hybrid text fusion.

Counting matches

To count how many rows match a full-text or vector query, use scan count mode with the fts or ann selector. Ranked counts share the single /scans endpoint with filter counts — fts is exact, ann is a radius scan flagged approximate, and both honor the exhaustive flag and the count deadline.

Fetch

Fetch is a Layer-only endpoint with no upstream equivalent. The document cache is checked first; on miss or error the gateway falls through to the backing store and backfills the cache best-effort.

Single fetch

doc = await client.fetch_document(
    "products",
    "asin-B08N5WRWNW",
    include_attributes=["title", "category"],
)

doc, err := client.FetchDocument(ctx, "products", "asin-B08N5WRWNW",
    &hevlayer.FetchDocumentParams{
        IncludeAttributes: []string{"title", "category"},
    })

const doc = await client.fetchDocument("products", "asin-B08N5WRWNW", {
  includeAttributes: ["title", "category"],
});

curl "$LAYER_GATEWAY_URL/v2/namespaces/products/documents/asin-B08N5WRWNW?include_attributes=title,category" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY"

Outcome	Status	Header
Cached hit	200	`x-layer-cache: hit`
Cache miss, upstream hit, cache backfilled	200	`x-layer-cache: miss`
Cache unavailable, upstream hit	200	`x-layer-cache: miss-on-error`
Missing from both layers	404	—

Batch fetch

batch = await client.fetch_documents("products", {
    "ids": ["asin-1", "asin-2", "asin-3"],
    "include_attributes": ["title"],
})

batch, err := client.FetchDocuments(ctx, "products", &hevlayer.FetchDocumentsRequest{
    Ids:               []string{"asin-1", "asin-2", "asin-3"},
    IncludeAttributes: []string{"title"},
})

const batch = await client.fetchDocuments("products", {
  ids: ["asin-1", "asin-2", "asin-3"],
  include_attributes: ["title"],
});

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/documents" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": ["asin-1", "asin-2", "asin-3"],
    "include_attributes": ["title"]
  }'

{
  "documents": [
    {"id": "asin-1", "attributes": {"title": "..."}},
    {"id": "asin-3", "attributes": {"title": "..."}}
  ],
  "missing": ["asin-2"]
}

Batch fetch returns found documents and missing ids inline instead of a partial 404. documents preserves request order; ids the gateway could not find anywhere land in missing. Because order is preserved, batch fetch is a convenient way to reassemble a pipeline’s chunks back into their original document — request the chunk ids in sequence and concatenate the results.

Behavior matrix

Cache state	Single fetch	Batch fetch
Hit	cache	cache
Miss, upstream present	upstream + backfill	upstream + backfill
Miss, upstream absent	404	inline `missing`
Cache unavailable	upstream, `miss-on-error`	upstream, `miss-on-error`