Scan

A scan is on-demand row selection over a namespace. It picks rows by one of three selectors and returns their IDs (mode: ids, an asynchronous job), their count (mode: count, synchronous), or the distinct values of one attribute field (mode: values, an asynchronous job):

| Input | Field | Meaning | Notes |
| --- | --- | --- | --- |
| Filter selector | filters | An attribute predicate, or all rows when omitted. | Exact |
| Full-text selector | fts | A BM25 predicate against a text field. | Exact |
| Radius selector | ann | Rows within radius of a query vector. | Approximate (ANN recall) |
| Fan-out control | threads | Maximum concurrent upstream requests for origin scatter/gather. | Origin only; defaults from Index.spec.scan.threads, then 8. |

A request carries at most one ranked selector (fts or ann). filters is always optional and, when present alongside a ranked selector, is ANDed onto the match set as an extra constraint. A request with both fts and ann is a 422.
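
The selector rule can be checked on the client before a request is sent. The sketch below is a minimal illustration under stated assumptions, not gateway code: `check_selectors` and `ScanValidationError` are hypothetical names, and the 422 status simply mirrors the rule described above.

```python
RANKED_SELECTORS = ("fts", "ann")


class ScanValidationError(ValueError):
    """Hypothetical client-side error mirroring the gateway's 422 response."""

    status = 422


def check_selectors(request: dict) -> str:
    """Return which selector drives the scan: 'fts', 'ann', or 'filter'."""
    ranked = [name for name in RANKED_SELECTORS if request.get(name) is not None]
    if len(ranked) > 1:
        raise ScanValidationError("at most one ranked selector (fts or ann) is allowed")
    # filters is optional; alongside a ranked selector it only narrows the match set.
    return ranked[0] if ranked else "filter"
```

A request with only filters, or with no selector at all, resolves to the filter selector, which matches the "all rows when omitted" behavior.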

At cutover, mode: ids is filter-only (ranked IDs are a defined fast-follow), while mode: count and mode: values support all three selectors. Use scans for bulk exports, manual inspection, UDF discovery debugging, cache/origin consistency checks, exact or approximate counts, and field value discovery.

Routes

| Route | Method | Behavior |
| --- | --- | --- |
| POST /v2/namespaces/{ns}/scans | POST | Create an ID or values scan job, or return a count. |
| GET /v2/namespaces/{ns}/scans | GET | List scan jobs for the namespace. |
| GET /v2/namespaces/{ns}/scans/{id} | GET | Read one scan job. |
| GET /v2/namespaces/{ns}/scans/{id}/results | GET | Read completed scan IDs or values. |
| DELETE /v2/namespaces/{ns}/scans/{id} | DELETE | Drop the in-memory scan job. |

ID Mode

Python:

job = await client.create_scan("products", {
    "source": "auto",
    "mode": "ids",
    "filters": ["category", "Eq", "Electronics"],
    "threads": 8,
    "page_size": 1000,
})

Go:

job, err := client.CreateScan(ctx, "products", &hevlayer.CreateScanRequest{
    Source:   "auto",
    Mode:     "ids",
    Filters:  []interface{}{"category", "Eq", "Electronics"},
    Threads:  8,
    PageSize: 1000,
})

TypeScript:

const job = await client.createScan("products", {
  source: "auto",
  mode: "ids",
  filters: ["category", "Eq", "Electronics"],
  threads: 8,
  page_size: 1000,
});

curl:

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/scans" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "auto",
    "mode": "ids",
    "filters": ["category", "Eq", "Electronics"],
    "threads": 8,
    "page_size": 1000
  }'

mode defaults to ids. Valid ID-mode sources are auto, cache, and origin. The Python and TypeScript clients also ship scan(...) helpers that create the job and poll until it completes; in Go, poll GetScan until status is completed.
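
Where no scan(...) helper is available, the polling loop is short to write by hand. The following is a minimal sketch, not the SDK's implementation: `wait_for_scan` is a hypothetical helper, `fetch_job` stands in for whatever call reads one scan job, and the interval and timeout defaults are arbitrary. It only looks for the `completed` status named above and makes no assumption about other terminal states.

```python
import asyncio
import time


async def wait_for_scan(fetch_job, interval_s=0.5, timeout_s=60.0):
    """Poll fetch_job() until the job reports status == 'completed'."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = await fetch_job()  # hypothetical: returns the scan job as a dict
        if job["status"] == "completed":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("scan did not complete before the local timeout")
        await asyncio.sleep(interval_s)
```

The timeout here is purely local: it stops the client from waiting and does not cancel the job on the gateway.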

The create response is 202 Accepted:

{
  "id": "scan-uuid",
  "namespace": "products",
  "source": "auto",
  "effective_source": "origin",
  "status": "running",
  "progress": 0,
  "documents_scanned": 0,
  "threads": 8,
  "created_at": "2026-05-26T10:00:00Z"
}

Read IDs after status is completed:

Python:

results = await client.get_scan_results("products", job.id, limit=1000, offset=0)

Go:

results, err := client.GetScanResults(ctx, "products", scanID,
    &hevlayer.GetScanResultsParams{Limit: 1000, Offset: 0})

TypeScript:

const results = await client.getScanResults("products", job.id, {
  limit: 1000,
  offset: 0,
});

curl:

curl "$LAYER_GATEWAY_URL/v2/namespaces/products/scans/scan-uuid/results?limit=1000&offset=0" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY"

Response:

{
  "ids": ["doc-1", "doc-2"],
  "total": 2
}
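
Larger result sets need more than one page. The sketch below walks limit/offset pages until every ID has been read. It assumes that `total` is the number of IDs in the whole completed job rather than in the page, which the two-row example suggests but does not confirm; `iter_scan_ids` and `fetch_page` are hypothetical names, with `fetch_page` standing in for a call such as get_scan_results.

```python
import asyncio


async def iter_scan_ids(fetch_page, page_size=1000):
    """Yield every scan ID by advancing offset one page at a time."""
    offset = 0
    while True:
        page = await fetch_page(limit=page_size, offset=offset)
        ids = page["ids"]
        for doc_id in ids:
            yield doc_id
        offset += len(ids)
        # Stop on an empty page as well, in case total is not what we assumed.
        if not ids or offset >= page["total"]:
            return


if __name__ == "__main__":
    all_ids = [f"doc-{i}" for i in range(5)]  # stand-in data for the demo

    async def fake_page(limit, offset):
        return {"ids": all_ids[offset:offset + limit], "total": len(all_ids)}

    async def main():
        print([doc_id async for doc_id in iter_scan_ids(fake_page, page_size=2)])

    asyncio.run(main())
```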

Count Mode

Python:

count = await client.create_scan("products", {
    "mode": "count",
    "source": "auto",
    "filters": ["category", "Eq", "Electronics"],
    "threads": 8,
    "timeout_seconds": 30,
})

Go:

count, err := client.CreateScan(ctx, "products", &hevlayer.CreateScanRequest{
    Mode:           "count",
    Source:         "auto",
    Filters:        []interface{}{"category", "Eq", "Electronics"},
    Threads:        8,
    TimeoutSeconds: 30,
})

TypeScript:

const count = await client.createScan("products", {
  mode: "count",
  source: "auto",
  filters: ["category", "Eq", "Electronics"],
  threads: 8,
  timeout_seconds: 30,
});

curl:

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/scans" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "count",
    "source": "auto",
    "filters": ["category", "Eq", "Electronics"],
    "threads": 8,
    "timeout_seconds": 30
  }'

Response:

{
  "count": 4210,
  "served_by": "snapshot",
  "snapshot_sha": "3f9e8b21",
  "watermark_ms": 1747300000123,
  "elapsed_ms": 3
}

When watermark_ms is present, the response also includes x-layer-stable-as-of with the same epoch-ms value.

Count-mode sources are auto, snapshot, cache, and origin. Snapshot reads are eligible only for a single leaf Eq or In filter on a field present in the latest snapshot fields[]. And, Or, Not, range operators, fields absent from the snapshot, and skipped fields fall through under auto and fail with 412 precondition_failed under source: snapshot.
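
The eligibility rule can be read as a small decision function. This is a sketch of the rule as stated, not the gateway's code: `snapshot_eligible`, `plan_count`, and `PreconditionFailed` are hypothetical names, and treating an omitted filter as ineligible is a literal reading of "only for a single leaf Eq or In filter".

```python
class PreconditionFailed(Exception):
    """Hypothetical error mirroring the 412 precondition_failed response."""

    status = 412


def snapshot_eligible(filters, fields, fields_skipped=()):
    """True for a single leaf Eq/In filter on a field listed in snapshot fields[]."""
    if not isinstance(filters, (list, tuple)) or len(filters) != 3:
        return False  # omitted filters (assumption) and And/Or/Not trees
    field, op, _operand = filters
    if not isinstance(field, str) or op not in ("Eq", "In"):
        return False  # range operators and other leaf operators
    return field in fields and field not in fields_skipped


def plan_count(source, filters, fields, fields_skipped=()):
    """Decide where a count with source 'auto' or 'snapshot' would be served."""
    if source not in ("auto", "snapshot"):
        raise ValueError("this sketch only models the auto and snapshot sources")
    if snapshot_eligible(filters, fields, fields_skipped):
        return "snapshot"
    if source == "snapshot":
        raise PreconditionFailed("filter is not snapshot-eligible")
    return "cache_or_origin"  # auto falls through to the live path
```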

Live count responses include:

{
  "count": 4210,
  "served_by": "origin",
  "bounded": false,
  "timed_out": false,
  "shards_saturated": 0,
  "shards_total": 1,
  "threads": 1,
  "elapsed_ms": 42
}

Values Mode

A values scan enumerates the distinct values of one attribute field over the rows the selector picks, each with its document count. Use it to discover a field’s value set — what product categories exist, what tags appear on rows matching a query — instead of confirming values you already know with counts.

field is required for mode: values (and rejected on other modes with 422). It must name a scalar string or integer attribute, or an array of strings — each array element counts once per containing document. Vector fields are a 422.

Python:

job = await client.create_scan("products", {
    "mode": "values",
    "field": "category",
    "source": "auto",
    "filters": ["in_stock", "Eq", True],
})

Go:

job, err := client.CreateScan(ctx, "products", &hevlayer.CreateScanRequest{
    Mode:    "values",
    Field:   "category",
    Source:  "auto",
    Filters: []interface{}{"in_stock", "Eq", true},
})

TypeScript:

const job = await client.createScan("products", {
  mode: "values",
  field: "category",
  source: "auto",
  filters: ["in_stock", "Eq", true],
});

curl:

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/scans" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "values",
    "field": "category",
    "source": "auto",
    "filters": ["in_stock", "Eq", true]
  }'

Like ID mode, the create response is a 202 Accepted job, and the scan(...) SDK helpers poll it to completion:

{
  "id": "scan-uuid",
  "namespace": "products",
  "mode": "values",
  "field": "category",
  "source": "auto",
  "effective_source": "origin",
  "status": "running",
  "progress": 0,
  "documents_scanned": 0,
  "threads": 8,
  "created_at": "2026-05-26T10:00:00Z"
}

Read values from the same results route after status is completed, with the same limit/offset pagination as scan IDs:

{
  "values": [
    {"v": "electronics", "n": 4210},
    {"v": "books", "n": 1240}
  ],
  "total": 2,
  "truncated": false
}

v/n is the same vocabulary snapshot facet histograms use: v is the value, n its document count. Ordering is deterministic: n descending, then v ascending. Counts are exact for filter-selector scans; on a ranked scan with a saturated shard the job carries bounded: true and each n is a lower bound (the true count is at least n).
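
To make the counting and ordering rules concrete, the sketch below builds a v/n listing from a handful of in-memory documents. It illustrates the documented semantics only: `value_histogram` is a hypothetical helper, and skipping documents that lack the field is an assumption.

```python
from collections import Counter


def value_histogram(docs, field):
    """Build a v/n listing ordered by n descending, then v ascending."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # assumption: rows without the field contribute nothing
        # An array field contributes each distinct element once per document.
        elements = set(value) if isinstance(value, list) else {value}
        counts.update(elements)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"v": v, "n": n} for v, n in ordered]
```

A document tagged ["a", "b", "a"] adds one to a and one to b, so the repeated element does not inflate the count.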

Precomputed serving

An unfiltered values scan (no filters, no ranked selector) on a field present in the latest snapshot fields[] is answered straight from the snapshot’s facet histogram: the job completes during the create call — the 202 body already shows status: completed — and carries effective_source: snapshot with snapshot_sha and watermark_ms. Fields in fields_skipped[] or absent from the snapshot fall through to cache/origin under auto and fail with 412 precondition_failed under explicit source: snapshot, as do scans carrying any selector.

High cardinality

Snapshot facet histograms cap each field at 10,000 distinct values and skip fields beyond it; values scans are the enumeration path for exactly those fields. A values job accumulates its histogram in gateway memory and caps the listing at 1,000,000 distinct values. A scan that crosses the cap completes rather than failing:

  • The cap applies after the full pass, so every emitted n stays exact.
  • The listing truncates deterministically to the top 1,000,000 values by count (value-ascending tiebreak); the low-count tail is dropped.
  • The job and its results carry truncated: true, meaning the listing is incomplete.
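
The truncation rule above can be sketched as a sort followed by a slice. This is an illustration of the stated behavior rather than the gateway's code; `cap_listing` is a hypothetical name, and the cap is a parameter so the behavior is easy to try with small inputs.

```python
def cap_listing(counts, cap=1_000_000):
    """Keep the top `cap` values by count; ties break on value ascending."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        # Counts come from the full pass, so every kept n is still exact.
        "values": [{"v": v, "n": n} for v, n in ordered[:cap]],
        "truncated": len(ordered) > cap,
    }
```

With a cap of 2 and counts of d=9, a=5, b=5, c=1, the listing keeps d and a: b ties with a on count but sorts after it, so it falls into the dropped tail.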

truncated, bounded, and approximate are independent flags: truncated is a gateway memory bound on the listing, bounded is upstream top_k saturation on a ranked scan’s counts, and approximate is ANN recall fuzz on a radius ball’s membership.

Fan-out width

Origin scans fan out one upstream request per active shard. threads sets the maximum number of those upstream requests a single scan may have in flight at once. It means concurrent requests, not operating-system threads; the gateway is async.

Resolution order:

  1. threads on the scan request.
  2. spec.scan.threads on the namespace’s Index resource.
  3. The gateway default, 8.

The effective value is clamped to the active shard count and the server cap, 32, then echoed as threads on origin responses and completed scan jobs. Snapshot and cache reads do not fan out, so they ignore this field and omit the echo.
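
The resolution order and clamp fit in a few lines. The sketch below restates them under the assumption that at least one shard is active; `effective_threads` and the constant names are hypothetical.

```python
GATEWAY_DEFAULT_THREADS = 8  # gateway default from the resolution order
SERVER_THREAD_CAP = 32       # server cap from the text


def effective_threads(request_threads, index_spec_threads, active_shards):
    """Resolve threads (request, then Index spec, then default) and clamp it."""
    if request_threads is not None:
        requested = request_threads
    elif index_spec_threads is not None:
        requested = index_spec_threads
    else:
        requested = GATEWAY_DEFAULT_THREADS
    # Clamp to the number of active shards and to the server-wide cap.
    return min(requested, active_shards, SERVER_THREAD_CAP)
```

This is consistent with the live count example earlier, where a one-shard namespace echoes threads: 1 even though the default is 8.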

Full-text count

Count rows matching a BM25 query with the fts selector. Full-text counts are exact and always run origin scatter/gather, so source must be omitted, auto, or origin. A filters array, when present, is ANDed on as an extra constraint.

Python:

count = await client.create_scan("products", {
    "mode": "count",
    "fts": {"field": "title", "query": "wireless headphones"},
    "filters": ["category", "Eq", "Electronics"],
    "exhaustive": True,
})

Go:

count, err := client.CreateScan(ctx, "products", &hevlayer.CreateScanRequest{
    Mode:       "count",
    Fts:        &hevlayer.FtsScan{Field: "title", Query: "wireless headphones"},
    Filters:    []interface{}{"category", "Eq", "Electronics"},
    Exhaustive: true,
})

TypeScript:

const count = await client.createScan("products", {
  mode: "count",
  fts: { field: "title", query: "wireless headphones" },
  filters: ["category", "Eq", "Electronics"],
  exhaustive: true,
});

curl:

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/scans" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "count",
    "fts": {"field": "title", "query": "wireless headphones"},
    "filters": ["category", "Eq", "Electronics"],
    "exhaustive": true
  }'

Radius count

Count rows within radius of a query vector with the ann selector, a distance-ball scan. radius is required and must be finite (without an upper bound every row is in the ball); field defaults to vector. Like fts, radius counts always run origin scatter/gather.

The count is approximate: ANN recall means the index’s membership of the ball may differ from the true set, independent of saturation, so the response carries approximate: true.

Python:

count = await client.create_scan("products", {
    "mode": "count",
    "ann": {"field": "vector", "vector": [0.12, -0.3, 0.88], "radius": 0.25},
})

Go:

count, err := client.CreateScan(ctx, "products", &hevlayer.CreateScanRequest{
    Mode: "count",
    Ann:  &hevlayer.AnnScan{Field: "vector", Vector: []float64{0.12, -0.3, 0.88}, Radius: 0.25},
})

TypeScript:

const count = await client.createScan("products", {
  mode: "count",
  ann: { field: "vector", vector: [0.12, -0.3, 0.88], radius: 0.25 },
});

curl:

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/scans" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "count",
    "ann": {"field": "vector", "vector": [0.12, -0.3, 0.88], "radius": 0.25}
  }'

Response:

{
  "count": 980,
  "served_by": "origin",
  "approximate": true,
  "bounded": false,
  "timed_out": false,
  "shards_saturated": 0,
  "shards_total": 1,
  "threads": 1,
  "elapsed_ms": 51
}

Bounding ranked scans

Ranked selectors fan out one Turbopuffer query per shard, each capped at top_k = 10_000. threads bounds fan-out width: how many shard requests can run at once. exhaustive and timeout_seconds bound depth: what happens when a shard hits that cap and how long recursion can run.

  • exhaustive: false (default) — one scatter/gather. A saturated shard contributes its cap as a lower bound; the response carries bounded: true with shards_saturated > 0.
  • exhaustive: true — recurse on each saturated shard via score-band pagination (BM25: $score < last with an id tiebreak; ANN: $dist > last) until every page is short or timeout_seconds elapses.

The same threads value applies to the initial round and every exhaustive round over the remaining saturated shards.

bounded and approximate are independent. bounded means a shard saturated, so the count is a lower bound on the rows the index would return; approximate means the distance ball’s membership is itself fuzzy. An ann count can be bounded: false yet still approximate: true.
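
The difference between the two depth modes can be modeled without a live index. The sketch below is a simplified model, not the gateway's algorithm: `first_round_count`, `drain_shard`, and `fetch_page` are hypothetical, a shard counts as saturated when its page is full, and the timeout_seconds deadline and the threads limit are left out.

```python
TOP_K = 10_000  # per-shard cap stated in the text


def first_round_count(page_sizes, top_k=TOP_K):
    """Combine each shard's first-page size into a non-exhaustive count."""
    saturated = sum(1 for size in page_sizes if size >= top_k)
    return {
        "count": sum(page_sizes),  # a lower bound whenever any shard saturated
        "bounded": saturated > 0,
        "shards_saturated": saturated,
        "shards_total": len(page_sizes),
    }


def drain_shard(fetch_page, top_k=TOP_K):
    """Exhaustive count for one shard: page past the last row until a page is short."""
    total, cursor = 0, None
    while True:
        page = fetch_page(cursor, top_k)  # hypothetical: rows ranked after `cursor`
        total += len(page)
        if len(page) < top_k:
            return total
        cursor = page[-1]
```

In this model a shard holding exactly top_k rows needs one extra, empty page before it is known to be drained, which is why a full page is treated as saturated.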

Sources

| Source | ID mode | Count mode | Values mode |
| --- | --- | --- | --- |
| auto | Cache when fresh enough, otherwise origin | Snapshot first, then cache/origin. | Snapshot when eligible, then cache/origin. |
| snapshot | Not supported | Latest snapshot only; requires eligible Eq or In. | Latest snapshot facet listing; requires an unfiltered scan on a field in fields[]. |
| cache | Aerospike document cache only | Aerospike document cache only | Aerospike document cache only. |
| origin | Turbopuffer paginated scan | Turbopuffer paginated scan | Turbopuffer paginated scan with gateway-side dedupe. |

This table covers the filter selector. The fts and ann selectors have no snapshot or cache evaluator, so they always run origin scatter/gather: omitted, auto, and origin all resolve to origin, and snapshot or cache returns 422.
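
For ranked selectors the source rule reduces to a short lookup. The sketch below restates it; `resolve_ranked_source` and `UnprocessableSource` are hypothetical names, and raising a plain ValueError for an unrecognized source is an assumption, since the text does not say how unknown values are handled.

```python
class UnprocessableSource(ValueError):
    """Hypothetical error mirroring the 422 for snapshot/cache on ranked scans."""

    status = 422


def resolve_ranked_source(source=None):
    """Return the effective source for an fts or ann scan."""
    if source in (None, "auto", "origin"):
        return "origin"  # ranked selectors always run origin scatter/gather
    if source in ("snapshot", "cache"):
        raise UnprocessableSource(f"source {source!r} has no ranked evaluator")
    raise ValueError(f"unknown source: {source!r}")  # assumption, see lead-in
```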

Filters

Scans accept the same Turbopuffer filter array as query. On origin scans, the filter is pushed to Turbopuffer. On cache scans, the gateway evaluates it against cached document attributes.

Supported cache operators are Eq, NotEq, Gt, Gte, Lt, Lte, In, NotIn, And, Or, and Not. If auto sees a filter the cache cannot evaluate, it uses origin. Explicit source: cache with an unsupported filter fails rather than returning partial results.
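
As a rough picture of cache-side evaluation, the sketch below checks a filter array against one document's attributes. It is not the gateway's evaluator: `matches` is a hypothetical function, the nested shapes `["And", [f1, f2]]`, `["Or", [f1, f2]]`, and `["Not", f]` are assumed from Turbopuffer's filter syntax, and treating a missing attribute as a non-match for range operators is also an assumption.

```python
import operator

LEAF_OPS = {
    "Eq": operator.eq, "NotEq": operator.ne,
    "Gt": operator.gt, "Gte": operator.ge,
    "Lt": operator.lt, "Lte": operator.le,
    "In": lambda value, options: value in options,
    "NotIn": lambda value, options: value not in options,
}
RANGE_OPS = {"Gt", "Gte", "Lt", "Lte"}


def matches(doc, flt):
    """Evaluate a filter array against a dict of document attributes."""
    head = flt[0]
    if head == "And":
        return all(matches(doc, sub) for sub in flt[1])
    if head == "Or":
        return any(matches(doc, sub) for sub in flt[1])
    if head == "Not":
        return not matches(doc, flt[1])
    field, op, operand = flt
    if op not in LEAF_OPS:
        # Fail loudly instead of returning partial results.
        raise ValueError(f"unsupported cache operator: {op}")
    value = doc.get(field)
    if value is None and op in RANGE_OPS:
        return False  # assumption: missing attributes never satisfy a range
    return LEAF_OPS[op](value, operand)
```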

Auto-Mode Policy

Auto ties cache freshness to the same consistency watermark used by stable reads. The gateway tracks per-namespace cache_warmed_through, the watermark observed at the end of the last successful origin warm.

| Cache state | Watermark state | Action |
| --- | --- | --- |
| Empty | any | Run origin and stamp cache_warmed_through. |
| Populated, cache_warmed_through >= watermark | observed | Serve cache. |
| Populated, cache_warmed_through < watermark | observed | Serve cache and start a background origin warm. |
| Populated, no cache_warmed_through yet | observed | Serve cache and start a background origin warm. |
| Populated | not yet observed | Serve cache. |

When cache is used, _hevlayer_upserted_at <= cache_warmed_through is added before the user filter so the scan is a stable warmed view.
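
The policy table and the stable-view constraint can be expressed as two small functions. This is a sketch of the documented decisions only: `auto_action` and `stable_filter` are hypothetical names, None stands for "not yet stamped" or "not yet observed", and wrapping the two predicates in an `And` node is an assumed filter shape.

```python
def auto_action(cache_populated, warmed_through, watermark):
    """Map cache and watermark state (epoch ms or None) to an auto-mode action."""
    if not cache_populated:
        return {"serve": "origin", "background_warm": False}  # origin run stamps the cache
    if watermark is None:
        return {"serve": "cache", "background_warm": False}   # watermark not yet observed
    fresh = warmed_through is not None and warmed_through >= watermark
    return {"serve": "cache", "background_warm": not fresh}


def stable_filter(user_filter, warmed_through):
    """Place the warmed-view constraint ahead of the user's filter."""
    stable = ["_hevlayer_upserted_at", "Lte", warmed_through]
    if user_filter is None:
        return stable
    return ["And", [stable, user_filter]]  # assumed shape for combining predicates
```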

Operational notes

  • ID and values scan state is in-memory and ephemeral; it resets on gateway restart.
  • Count scans have a deadline, default 30s and maximum 300s.
  • Values jobs cap at 1,000,000 distinct values per scan and set truncated: true when crossed; the listing keeps the top values by count, each with an exact count.
  • Origin scan fan-out defaults to 8 concurrent upstream requests per scan unless the request or Index.spec.scan.threads sets a different value.
  • Snapshot-served count scans are exact at the snapshot watermark_ms.