Warm cache

Layer exposes two warm endpoints. hint_cache_warm is the Turbopuffer-compatible hint; warm is the Layer-only shortcut that creates a gateway warm job.

Hint-cache warm

With no query parameters, the call is a raw passthrough: the gateway forwards it to Turbopuffer unchanged and returns the upstream response verbatim. Existing Turbopuffer clients keep their exact wire behavior.

curl "$LAYER_GATEWAY_URL/v1/namespaces/products/hint_cache_warm" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY"

Supplying any warm option (turbopuffer, documents, snapshots, page_size) switches the call into Layer orchestration. Steps then default on; each is independently toggleable:

Step              What it does
turbopuffer=true  Forwards the warm hint upstream.
documents=true    Starts an origin warm job to backfill the NVMe cache.
snapshots=true    Mirrors the latest S3 snapshot body into NVMe.
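The defaulting rule described above can be sketched as a small pure function. This is only an illustration of the documented behavior, not gateway code: `resolve_steps` and its return shape are hypothetical names introduced here.

```python
from typing import Optional

# Options that switch the call into Layer orchestration (from the text above).
WARM_OPTIONS = ("turbopuffer", "documents", "snapshots", "page_size")
# The three independently toggleable steps.
STEPS = ("turbopuffer", "documents", "snapshots")


def resolve_steps(params: dict) -> Optional[dict]:
    """Return the effective on/off state per step, or None for a raw passthrough.

    Hypothetical helper that mirrors the documented rule; not the gateway's code.
    """
    # No warm option supplied: the call is forwarded upstream unchanged.
    if not any(name in params for name in WARM_OPTIONS):
        return None
    # Any warm option enables orchestration; unspecified steps default on.
    return {step: bool(params.get(step, True)) for step in STEPS}
```

Under this reading, passing only `page_size` still runs all three steps, because every step that is not explicitly set to false defaults on.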
Python:

result = await client.hint_cache_warm(
    "products",
    turbopuffer=False,
    documents=False,
    snapshots=True,
)

TypeScript:

const result = await client.hintCacheWarm("products", {
  turbopuffer: false,
  documents: false,
  snapshots: true,
});

curl:

curl "$LAYER_GATEWAY_URL/v1/namespaces/products/hint_cache_warm?turbopuffer=false&documents=false&snapshots=true" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY"

The generated Go client omits false query parameters, so it cannot turn steps off. To disable a step, call the REST endpoint directly or use the Python client.
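When calling the REST endpoint directly, the point is that `false` must appear explicitly in the query string. A minimal sketch using only the Python standard library follows; `hint_cache_warm_url` is a hypothetical helper, and the base URL in the usage note is a placeholder.

```python
from urllib.parse import quote, urlencode


def hint_cache_warm_url(base_url: str, namespace: str, **options) -> str:
    """Build a hint_cache_warm URL, keeping explicit false values in the query.

    Hypothetical helper for illustration; not part of any Layer client.
    """
    path = (
        f"{base_url.rstrip('/')}/v1/namespaces/"
        f"{quote(namespace, safe='')}/hint_cache_warm"
    )
    # Serialize booleans as lowercase "true"/"false"; skip options left as None.
    query = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in options.items()
        if value is not None
    }
    # With no options the URL has no query string, i.e. a raw passthrough.
    return f"{path}?{urlencode(query)}" if query else path
```

For example, `hint_cache_warm_url("https://gw.example", "products", turbopuffer=False, documents=False, snapshots=True)` yields the same query string as the curl call above.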

The orchestrated response reports per-step status:

{
  "namespace": "products",
  "turbopuffer": { "enabled": true, "status": "completed" },
  "documents": {
    "enabled": true,
    "status": "started",
    "job": { "id": "warm-job-uuid", "status": "running" }
  },
  "snapshots": {
    "enabled": true,
    "status": "completed",
    "key": "snapshots/products/...",
    "watermark_ms": 1715600400000,
    "sha": "..."
  }
}

If documents is enabled, the response includes a warm job; poll it at /v2/namespaces/{ns}/warm-jobs/{id}.
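A caller usually needs just the job ID from the orchestrated response. The sketch below reads it from a parsed response shaped like the example above; `warm_job_id` is a hypothetical helper, and it assumes the `job` object is absent or empty whenever the documents step did not start a job.

```python
from typing import Optional


def warm_job_id(response: dict) -> Optional[str]:
    """Return the warm job ID from an orchestrated response, if there is one.

    Hypothetical helper; assumes the response shape shown in the example above.
    """
    documents = response.get("documents") or {}
    # A disabled documents step starts no warm job, so there is nothing to poll.
    if not documents.get("enabled"):
        return None
    job = documents.get("job") or {}
    return job.get("id")
```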

Layer warm

POST /v2/namespaces/{ns}/warm creates an asynchronous job that pages through Turbopuffer, backfills Aerospike, and refreshes cache_warmed_through. Use it when bootstrapping a namespace whose data was written outside the gateway.

Python:

job = await client.warm_cache("products", page_size=1000)

Go:

job, err := client.WarmCache(ctx, "products", &hevlayer.WarmCacheParams{
    PageSize: 1000,
})

TypeScript:

const job = await client.warmCache("products", { pageSize: 1000 });

curl:

curl -X POST "$LAYER_GATEWAY_URL/v2/namespaces/products/warm?page_size=1000" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY"

The response is 202 Accepted with the warm job:

{
  "id": "warm-job-uuid",
  "namespace": "products",
  "status": "running",
  "progress": 0,
  "documents_scanned": 0,
  "created_at": "2026-05-26T10:00:00Z"
}

Poll the job by ID:

Python:

job = await client.get_warm_job("products", job.id)

Go:

job, err := client.GetWarmJob(ctx, "products", jobID)

TypeScript:

const job = await client.getWarmJob("products", jobId);

curl:

curl "$LAYER_GATEWAY_URL/v2/namespaces/products/warm-jobs/warm-job-uuid" \
  -H "Authorization: Bearer $LAYER_GATEWAY_API_KEY"
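Polling is typically wrapped in a loop with an interval and a deadline. The sketch below is client-agnostic: `fetch_job` is any callable that returns the job as a dict, for example a thin wrapper around the GET call above. One assumption to flag: the only status this page shows is "running", so the loop treats any other status as terminal. Check the actual status values before relying on that.

```python
import time
from typing import Callable


def wait_for_warm_job(
    fetch_job: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job leaves the "running" state, or raise TimeoutError.

    Hypothetical helper. ASSUMPTION: any status other than "running" is final.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") != "running":
            return job
        # Still running: give up once the deadline has passed.
        if clock() >= deadline:
            raise TimeoutError("warm job still running after timeout")
        sleep(interval_s)
```

The `sleep` and `clock` parameters are injectable so the loop can be exercised in tests without real waiting.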

Cache-cold behavior

Fetch and warm treat a cold cache differently, and the split is deliberate. Fetch is correctness-first: a cache outage must not turn into a missing document. Warm is throughput-first: warming on a cold cache would be wasted work, so the gateway reports the cold state to the caller rather than silently doing nothing.

A bare hint_cache_warm passthrough never touches the gateway cache, so it succeeds even while the cache is cold. The orchestrated form returns 503 cache_cold only when documents or snapshots is requested.
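The rule above can be checked client-side before sending a request. The sketch below inspects a request URL and reports whether a 503 cache_cold is possible. It is a hypothetical helper, and it assumes that "requested" includes steps left at their default-on state in an orchestrated call, which is one reading of the text rather than confirmed gateway behavior.

```python
from urllib.parse import parse_qs, urlsplit

# Options that switch hint_cache_warm into Layer orchestration.
WARM_OPTIONS = {"turbopuffer", "documents", "snapshots", "page_size"}


def may_return_cache_cold(url: str) -> bool:
    """Return True if this hint_cache_warm URL could yield 503 cache_cold.

    Hypothetical helper. ASSUMPTION: default-on steps count as requested.
    """
    query = parse_qs(urlsplit(url).query)
    # A bare passthrough never touches the gateway cache.
    if WARM_OPTIONS.isdisjoint(query):
        return False

    def enabled(step: str) -> bool:
        values = query.get(step)
        # Unspecified steps default on; only an explicit "false" turns one off.
        return values is None or values[-1].lower() != "false"

    return enabled("documents") or enabled("snapshots")
```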

For how the cache recovers from an outage and the signals to watch, see the failure-mode runbook.