
Introduction

Layer is a set of drop-in enhancements for your favorite retrieval systems. It lets you scale your own compute across multi-stage pipelines, reason about the state of your index, observe clickstream data, and track cost.

╔════════════╗      ╔════════════╗          ╔═══ vector store ════════════════════════╗
║   layer    ║░     ║   layer    ║░         ║                                         ║░
║   client   ║◀────▶║  gateway   ║◀──API───▶║                                         ║░
║            ║░     ║            ║░         ║                                         ║░
╚════════════╝░     ╚═════╤══════╝░         ║  ┏━━━━━━━━━━━━━━┓     ┏━━━━━━━━━━━━━━┓  ║░
 ░░░░░░░░░░░░░░      ░░░░░│░░░░░░░░         ║  ┃    BM 25     ┃     ┃  KNN / ANN   ┃  ║░
                          │                 ║  ┃              ┃     ┃              ┃  ║░
╔════════════╗      ╔═════▼══════╗          ║  ┗━━━━━━━━━━━━━━┛     ┗━━━━━━━━━━━━━━┛  ║░
║   layer    ║░     ║   layer    ║░         ║                                         ║░
║ dashboard  ║◀────▶║  operator  ║◀──API───▶║                                         ║░
║            ║░     ║            ║░         ║                                         ║░
╚════════════╝░     ╚═════╤══════╝░         ╚═════════════════════════════════════════╝░
 ░░░░░░░░░░░░░░      ░░░░░│░░░░░░░░          ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
                          ▼
                   ┏━━━━━━━━━━━━━━┓
                   ┃   document   ┃
                   ┃    cache     ┃
                   ┗━━━━━━┯━━━━━━━┛
                          │
                          ▼
                   ┏━━━━━━━━━━━━━━┓
                   ┃ Object Store ┃
                   ┃ Bucket (S3)  ┃
                   ┗━━━━━━━━━━━━━━┛

You run two server components in your own cluster: a Rust gateway and a Kubernetes operator. The gateway is a transparent proxy in front of Turbopuffer. It extends native clients with fetch, scans, snapshots, and operator-facing semantics around the cache, write path, and pipelines — you swap in Layer’s drop-in client and change nothing else. It also drives the function runtime: discovering UDF work, leasing it to worker pools, retrying, and writing results back, with KEDA scaling each pool to zero between bursts.
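
The lease-and-retry cycle of the function runtime described above can be pictured with a small in-memory sketch. This is not Layer's implementation or API: `LeaseQueue`, `Task`, `run_worker`, and every parameter below are hypothetical names chosen for illustration, and a plain dictionary stands in for whatever store actually tracks work. It only shows the general pattern of discovering work, leasing it for a bounded time, retrying on failure, and writing results back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    task_id: str
    payload: str
    attempts: int = 0
    lease_expires_at: Optional[float] = None  # None means not currently leased


@dataclass
class LeaseQueue:
    """Hypothetical in-memory stand-in for a lease-based work queue."""
    lease_seconds: float = 30.0
    max_attempts: int = 3
    pending: Dict[str, Task] = field(default_factory=dict)
    results: Dict[str, str] = field(default_factory=dict)
    failed: List[str] = field(default_factory=list)

    def discover(self, task_id: str, payload: str) -> None:
        # Register newly discovered work unless it is already known.
        if task_id not in self.pending and task_id not in self.results:
            self.pending[task_id] = Task(task_id, payload)

    def lease(self, now: float) -> Optional[Task]:
        # Hand out the first task that is unleased or whose lease has expired.
        for task in self.pending.values():
            if task.lease_expires_at is None or task.lease_expires_at <= now:
                task.attempts += 1
                task.lease_expires_at = now + self.lease_seconds
                return task
        return None

    def complete(self, task_id: str, result: str) -> None:
        # Write the result back and drop the task from the pending set.
        self.results[task_id] = result
        del self.pending[task_id]

    def fail(self, task_id: str) -> None:
        # Release the lease so the task can be retried, up to max_attempts.
        task = self.pending[task_id]
        if task.attempts >= self.max_attempts:
            self.failed.append(task_id)
            del self.pending[task_id]
        else:
            task.lease_expires_at = None


def run_worker(queue: LeaseQueue, udf: Callable[[str], str], now: float) -> int:
    """Drain the queue once; return the number of leases processed."""
    processed = 0
    while (task := queue.lease(now)) is not None:
        try:
            queue.complete(task.task_id, udf(task.payload))
        except Exception:
            queue.fail(task.task_id)
        processed += 1
    return processed
```

In this sketch a worker that crashes mid-task simply never calls `complete` or `fail`, so the lease expires and another worker picks the task up. That property is what lets a worker pool be scaled down, even to zero, without losing work.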

You can call the gateway in four ways: the Python client, the Go client, the TypeScript client, or the REST API directly. The clients are generated from the same OpenAPI spec, and every endpoint page shows them side by side. Layer also ships an optional GUI dashboard. The dashboard manages cluster configuration through CRDs; all other state is persisted in object storage (S3). No durable state lives in a Layer process, so the compute tier is stateless and fully elastic.

Because indexing is bursty, especially GPU-bound work, our Terraform installs Karpenter as a cluster autoscaler to provision and scale the nodes Layer’s compute runs on. The remaining backing services are the document cache, the indexing-state store, and the metrics store. Every component Layer runs alongside is open source:

  • Karpenter — cluster autoscaler that provisions and scales nodes for Layer’s bursty, GPU-bound compute (Apache-2.0).
  • Aerospike — NVMe-backed ephemeral document cache (AGPL-3.0).
  • PostgreSQL — indexing-state store for the pipeline and embed queue (PostgreSQL License).
  • VictoriaMetrics — metrics store (Apache-2.0).

To get started, see the install guide. For more technical detail, see Concepts, Guarantees, and Tradeoffs.
