
Operations

Failure Modes

Read

Reads route through the gateway, but a gateway outage does not take your queries dark. The Python and Go SDKs fall through to Turbopuffer direct when the gateway is unreachable, so Turbopuffer-compatible queries keep serving rather than failing, minus the document cache, search history, and Layer’s query enhancements (see Client fall-through below). Layer-only read paths (document fetch, warm jobs, pipeline and UDF status, snapshots, and search history) fail fast, because they depend on gateway-owned cache, queue, history, and consistency state.

The document cache is stateless and can scale to zero with no disruption: document fetches fall through to origin (Turbopuffer, or S3 for snapshots) on a miss or cache outage, so a cache failure degrades latency, not availability.

Write

Writes also fall through to Turbopuffer direct when the gateway is unreachable (again, see Client fall-through); the durable upstream still accepts the row, but the write skips document-cache warming and pipeline staging until the gateway returns.

Pipeline stop-writes

The primary failure mode for writes through a healthy gateway is Aerospike stop-writes during a multi-stage pipeline job. Staged documents stay warm in the cache while they still carry no vector data, and once the staged data exceeds the Aerospike drive allocation, the cache rejects further writes.

The pipeline does not stall. Each stage persists its chunk bodies to S3 before it touches the cache, and pipeline state lives in PostgreSQL, so the Aerospike write is best-effort: on stop-writes the gateway logs the skipped write and the stage still completes. Downstream chunk reads degrade to the S3 backing for as long as the cache is rejecting writes.
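The ordering described above (durable write first, cache write best-effort, reads degrading to the durable copy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the gateway's implementation: a plain dict stands in for S3, FlakyCache stands in for Aerospike, and StopWritesError, persist_chunk, and read_chunk are hypothetical names.

```python
import logging

logger = logging.getLogger("pipeline_sketch")


class StopWritesError(Exception):
    """Hypothetical error raised when the cache refuses writes."""


class FlakyCache:
    """Stand-in for the document cache; not a real Aerospike client."""

    def __init__(self, stop_writes=False):
        self.stop_writes = stop_writes
        self.data = {}

    def put(self, key, value):
        if self.stop_writes:
            raise StopWritesError(key)
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def persist_chunk(durable_store, cache, key, body):
    """Write to the durable store first, then warm the cache best-effort."""
    durable_store[key] = body  # durable write happens before the cache is touched
    try:
        cache.put(key, body)
        return True
    except StopWritesError:
        # The stage still completes; only the cache warm-up is skipped.
        logger.warning("cache write skipped for %s (best-effort)", key)
        return False


def read_chunk(durable_store, cache, key):
    """Serve from the cache when possible, else fall back to the durable store."""
    cached = cache.get(key)
    if cached is not None:
        return cached, "cache"
    return durable_store[key], "durable"


# A cache in stop-writes: the chunk is still persisted and still readable.
store, cache = {}, FlakyCache(stop_writes=True)
warmed = persist_chunk(store, cache, "chunk-1", b"body")
result = read_chunk(store, cache, "chunk-1")
```

In this sketch a rejected cache write changes only where the next read is served from, which mirrors the latency-not-availability trade-off the section describes.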

Recovery is automatic. With the Helm chart defaults, the document cache restarts on stop-writes (documentCache.autoRestartOnStopWrites: true) and clears its Aerospike backing file on pod start (documentCache.storage.resetOnStart: true); the gateway reconnects in the background and refills the cache from S3 on demand. No pipeline work is lost, because S3 and PostgreSQL are the durable recovery boundary; both must stay healthy for that guarantee to hold.

Operator signals:

  • layer_aerospike_op_duration_seconds{status="aerospike_stop_writes"} — the stop-writes condition itself, the same series the dashboard charts.
  • hevlayer_cache_cold_responses_total — reads being served from S3 backing instead of the cache while it recovers.
  • hevlayer_document_cache_cold_starts_total and hevlayer_document_cache_cold_start_seconds — the demand-triggered reconnect-and-refill cycle after the cache restarts.
  • Gateway warn logs "Aerospike chunk write failed (best-effort)" and "Aerospike chunk read failed; falling back to S3 backing".

Client fall-through

When the gateway is unreachable, the SDKs retry the call against Turbopuffer directly for operations that need no Layer state — simple vector queries, writes, and raw Turbopuffer-compatible methods (schema, metadata, namespace listing). These calls succeed without the document cache, search history, or Layer’s query enhancements, and set the perf fallback field to turbopuffer_direct. Fall-through requires Turbopuffer credentials (TURBOPUFFER_API_KEY, or WithTurbopufferAPIKey / turbopuffer_api_key); without them the original gateway error propagates unchanged.

Fall-through is on by default. Disable it with fallback_to_turbopuffer=False on AsyncHevlayer or WithFallbackToTurbopuffer(false) on the Go client. For the exact list of which operations fall through and which fail fast, see Client fall-through in the API introduction.
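The decision the SDKs make on a gateway outage can be sketched as below. This is a hedged illustration of the behaviour described in this section, not the SDK's code: GatewayUnreachable, call_with_fall_through, the operation names in FALL_THROUGH_OPS, and the "fallback" result key are all hypothetical stand-ins, and the authoritative list of eligible operations is the one in the API introduction.

```python
class GatewayUnreachable(Exception):
    """Hypothetical error for a gateway that cannot be reached."""


# Hypothetical operation names that need no Layer state.
FALL_THROUGH_OPS = {"query", "write", "schema", "metadata", "list_namespaces"}


def call_with_fall_through(op, via_gateway, via_direct, *,
                           fallback_enabled=True, direct_api_key=None):
    """Try the gateway; on an outage, retry eligible operations directly."""
    try:
        return {"result": via_gateway(op), "fallback": None}
    except GatewayUnreachable:
        eligible = op in FALL_THROUGH_OPS
        if not (fallback_enabled and eligible and direct_api_key):
            raise  # the original gateway error propagates unchanged
        return {"result": via_direct(op), "fallback": "turbopuffer_direct"}


def down_gateway(op):
    """Simulates an unreachable gateway."""
    raise GatewayUnreachable(op)


def direct(op):
    """Simulates a successful direct upstream call."""
    return f"{op}:ok"


# An eligible operation with credentials present is served directly.
served = call_with_fall_through("query", down_gateway, direct,
                                direct_api_key="example-key")
```

Calling the same helper with a Layer-only operation, with no API key, or with fallback_enabled=False re-raises the gateway error, which corresponds to the fail-fast cases above.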
