Architecture
Sp00ky operates on a “Sidecar” architecture to enable its powerful synchronization capabilities.
High Level Overview
The system is composed of three main parts, with the database at the center:
- The Database: Standard SurrealDB. The central hub.
- The Client: Connects directly to SurrealDB using the standard SurrealQL protocol.
- The Sidecar: A background service that monitors the database and performs heavy computational tasks (hashing, integrity).
graph TD
subgraph client_app [Client App]
UI[UI Components]
LocalDB[(IndexedDB)]
WASM[Sp00ky Core]
UI <-->|Live Queries| WASM
WASM <-->|Persist| LocalDB
end
subgraph backend_sys [Backend]
Sidecar[Sp00ky Sidecar]
DB[(SurrealDB)]
end
WASM <-->|"SurrealQL (WS)"| DB
Sidecar <-->|"Live Query & RPC"| DB
Why a Sidecar?
Even though the client connects directly to SurrealDB, the Sidecar is essential for the Sp00ky Protocol:
1. Incremental View Computation
The Sidecar, or SSP (Sp00ky Sidecar Processor), runs a DBSP circuit per registered query. When SurrealDB’s schema events forward row CREATE / UPDATE / DELETE notifications to the SSP’s /ingest endpoint, the circuit incrementally computes which views the change affects and writes the resulting per-user edges into _00_list_ref_user_<id> (or the legacy shared _00_list_ref). Clients then read those edges over a LIVE subscription instead of re-running the source query.
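A minimal sketch of the incremental idea, assuming a single-table view with a plain row predicate: each change is turned into Z-set style weights (-1 for the old row if it matched, +1 for the new row if it matches), and only the net effect produces an edge write. `Change`, `FilterView`, and the `("add" | "remove" | "update", view, record)` tuples are hypothetical illustrations; Sp00ky's real circuits also handle joins, subqueries, and per-user edges, none of which appear here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Change:
    """Hypothetical shape of one row change as delivered to /ingest."""
    op: str                  # "CREATE", "UPDATE" or "DELETE"
    record_id: str
    before: Optional[dict]   # row before the change (None for CREATE)
    after: Optional[dict]    # row after the change (None for DELETE)


class FilterView:
    """Incrementally maintains `SELECT * FROM t WHERE pred(row)`.

    The net weight of a change says whether the record entered (+1),
    left (-1), or stayed in/out of the view (0).
    """

    def __init__(self, view_id: str, pred: Callable[[dict], bool]):
        self.view_id = view_id
        self.pred = pred
        self.members: set = set()

    def apply(self, ch: Change) -> list:
        was_in = ch.before is not None and self.pred(ch.before)
        is_in = ch.after is not None and self.pred(ch.after)
        weight = int(is_in) - int(was_in)
        if weight > 0:
            self.members.add(ch.record_id)
            return [("add", self.view_id, ch.record_id)]
        if weight < 0:
            self.members.discard(ch.record_id)
            return [("remove", self.view_id, ch.record_id)]
        if is_in:
            # Still in the view but the row changed: bump the existing edge.
            return [("update", self.view_id, ch.record_id)]
        return []  # never in the view: nothing to write


view = FilterView("open_tasks", lambda row: row["status"] == "open")
print(view.apply(Change("CREATE", "task:1", None, {"status": "open"})))
# [('add', 'open_tasks', 'task:1')]
```

The point of the weights is that the view never re-reads the table: a change that does not flip membership either bumps an existing edge or is ignored.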
2. Backend Job Execution
The Sidecar also runs backend jobs declared in sp00ky.yml (e.g. AI agent steps, scheduled tasks). The same DBSP machinery routes the triggering row to the configured backend so a job runner can pick it up.
3. Integrity Checks
The SSP also computes per-table content hashes that the scheduler can compare against its persisted replica during bootstrap and through spky verify, so cluster nodes detect drift before serving stale data.
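One way to make such a hash comparable across nodes is to hash each row's canonical JSON and combine the digests in sorted order, so scan order does not matter. This is a hypothetical sketch (`row_digest`, `table_hash`, and `drifted_tables` are illustrative names); the hash function and row encoding Sp00ky actually uses are not specified here.

```python
import hashlib
import json


def row_digest(row: dict) -> bytes:
    """SHA-256 of a canonical JSON encoding (sorted keys, no whitespace)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def table_hash(rows) -> str:
    """Order-independent table hash: sort per-row digests, then hash them.

    Sorting makes the result independent of scan order, so a paged scan on
    one node and a full scan on another agree whenever the contents agree.
    """
    combined = hashlib.sha256()
    for digest in sorted(row_digest(row) for row in rows):
        combined.update(digest)
    return combined.hexdigest()


def drifted_tables(ssp: dict, replica: dict) -> list:
    """Names of tables whose hashes differ or that exist on only one side."""
    return sorted(t for t in set(ssp) | set(replica) if ssp.get(t) != replica.get(t))
```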
Distributed Architecture
For production deployments, Sp00ky supports a distributed architecture with multiple SSP instances coordinated by a central Scheduler:
graph TD
subgraph clients [Clients]
C1[Client 1]
C2[Client 2]
C3[Client 3]
end
subgraph backend [Backend Services]
DB[(SurrealDB)]
subgraph sched [Scheduler]
Scheduler[Scheduler Core]
Replica[(Snapshot Replica<br/>RocksDB)]
WAL[WAL]
end
subgraph ssps [SSP Pool]
SSP1[SSP 1]
SSP2[SSP 2]
SSP3[SSP 3]
end
end
C1 <-->|SurrealQL| DB
C2 <-->|SurrealQL| DB
C3 <-->|SurrealQL| DB
DB -->|Change Events| Scheduler
Scheduler -->|Persist| Replica
Scheduler -->|Append| WAL
Scheduler -->|HTTP POST /ingest| SSP1
Scheduler -->|HTTP POST /ingest| SSP2
Scheduler -->|HTTP POST /ingest| SSP3
SSP1 -.->|POST /proxy/query| Scheduler
SSP1 -->|POST /ssp/heartbeat| Scheduler
SSP2 -->|POST /ssp/heartbeat| Scheduler
SSP3 -->|POST /ssp/heartbeat| Scheduler
SSP1 <-->|Edge Updates| DB
SSP2 <-->|Edge Updates| DB
SSP3 <-->|Edge Updates| DB
Components
Scheduler (Port 9667 by default)
- Central coordinator for all SSP instances
- Maintains a persistent Snapshot Replica (RocksDB-backed embedded SurrealDB) of database state
- Writes all events to a Write-Ahead Log (WAL) for crash recovery
- Distributes data updates to all SSPs via HTTP
- Exposes proxy endpoints (/proxy/query, /proxy/signin, /proxy/use) so SSPs can self-bootstrap by querying the snapshot directly
- Manages SSP lifecycle: Bootstrapping → Replaying → Ready
- Assigns queries to SSPs using load balancing strategies
- See Scheduler API Reference
SSP (Sp00ky Sidecar Processor) (Port 8667 by default)
- Stateful service maintaining materialized views
- Executes backend functions and jobs
- Registers with scheduler on startup
- Self-bootstraps by querying the scheduler’s proxy endpoints (no chunk push needed)
- Sends periodic heartbeats with its views count for health monitoring
- See SSP API Reference
SSP Lifecycle
When an SSP starts up with scheduler integration enabled:
sequenceDiagram
participant SSP
participant Scheduler
participant Replica as Snapshot Replica (RocksDB)
Note over SSP: Startup
SSP->>Scheduler: POST /ssp/register<br/>{ssp_id, url}
Scheduler->>Scheduler: Freeze snapshot<br/>Mark SSP as "Bootstrapping"
Scheduler-->>SSP: 202 Accepted<br/>{snapshot_seq}
Note over SSP: Self-Bootstrap via Proxy
SSP->>Scheduler: POST /proxy/query<br/>(SurrealQL queries)
Scheduler->>Replica: Execute query
Replica-->>Scheduler: Results
Scheduler-->>SSP: Query results
Note over SSP: SSP loads data locally
loop Scheduler polls SSP health
Scheduler->>SSP: GET /health
alt SSP still bootstrapping
SSP-->>Scheduler: {status: "bootstrapping"}
else SSP ready
SSP-->>Scheduler: {status: "ok"}
end
end
Scheduler->>Scheduler: Mark SSP as "Replaying"<br/>Unfreeze snapshot
loop Replay buffered events (seq > snapshot_seq)
Scheduler->>SSP: POST /ingest
SSP-->>Scheduler: 200 OK
end
Scheduler->>Scheduler: Mark SSP as "Ready"
Note over SSP,Scheduler: SSP receives live updates
loop Every 5 seconds
SSP->>Scheduler: POST /ssp/heartbeat<br/>{views, cpu, memory}
alt SSP is healthy
Scheduler-->>SSP: 200 OK
else Buffer overflow
Scheduler-->>SSP: 409 Conflict<br/>(re-bootstrap needed)
else Not registered
Scheduler-->>SSP: 404 Not Found<br/>(re-registration needed)
end
end
Bootstrap Process
- Registration: SSP sends its ID and URL to the scheduler
- Freeze: Scheduler freezes the snapshot replica and marks the SSP as Bootstrapping
- Proxy Query: SSP self-bootstraps by querying the scheduler’s POST /proxy/query endpoint (executes SurrealQL against the frozen snapshot)
- Health Poll: Scheduler polls the SSP’s GET /health endpoint every ssp_poll_interval_ms (default 3s) until the SSP reports ready
- Unfreeze & Replay: Scheduler unfreezes the snapshot, marks the SSP as Replaying, and replays all buffered events with seq > snapshot_seq
- Ready: Once replay completes, the scheduler marks the SSP as Ready
- Live Updates: SSP now receives real-time updates via the /ingest endpoint
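The bootstrap sequence can be modelled as a small state machine. In this sketch, under stated simplifications, HTTP calls become method calls, events are buffered for any SSP that is not yet Ready, and a successful health poll triggers the replay of events with `seq > snapshot_seq`. The replica freeze and the proxy queries themselves are not modelled, and `LifecycleSketch` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class SspState(Enum):
    BOOTSTRAPPING = "Bootstrapping"
    REPLAYING = "Replaying"
    READY = "Ready"


@dataclass
class SspEntry:
    ssp_id: str
    url: str
    snapshot_seq: int
    state: SspState = SspState.BOOTSTRAPPING
    buffer: list = field(default_factory=list)  # (seq, event) held until Ready


class LifecycleSketch:
    """Hypothetical scheduler-side lifecycle; the replica itself is not modelled."""

    def __init__(self):
        self.seq = 0
        self.ssps: dict = {}
        self.delivered: dict = {}  # ssp_id -> [(seq, event)], stands in for POST /ingest

    def register(self, ssp_id: str, url: str) -> int:
        # POST /ssp/register: the snapshot is frozen at the current sequence number.
        self.ssps[ssp_id] = SspEntry(ssp_id, url, snapshot_seq=self.seq)
        self.delivered[ssp_id] = []
        return self.seq  # returned as snapshot_seq in the 202 Accepted body

    def on_change(self, event) -> None:
        self.seq += 1
        for entry in self.ssps.values():
            if entry.state is SspState.READY:
                self.delivered[entry.ssp_id].append((self.seq, event))
            else:
                entry.buffer.append((self.seq, event))

    def on_health(self, ssp_id: str, status: str) -> None:
        entry = self.ssps[ssp_id]
        if entry.state is not SspState.BOOTSTRAPPING or status != "ok":
            return  # still bootstrapping: keep polling
        entry.state = SspState.REPLAYING
        for seq, event in entry.buffer:
            if seq > entry.snapshot_seq:
                self.delivered[ssp_id].append((seq, event))
        entry.buffer.clear()
        entry.state = SspState.READY
```

Here the Replaying state is only transitional because replay is synchronous; in a real deployment it lasts as long as the buffered POST /ingest calls take.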
Health Monitoring
- SSPs send heartbeats every 5 seconds (configurable) with views count, CPU, and memory usage
- Scheduler marks SSPs as stale after 15 seconds without a heartbeat (configurable)
- Stale SSPs are removed from the pool
- Maximum buffer size per SSP: 10,000 messages (configurable via max_buffer_per_ssp)
- Queries are reassigned to healthy SSPs
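A hypothetical sketch of the scheduler-side bookkeeping, using the documented defaults (15 s stale timeout, 10,000-message buffer) and the heartbeat status codes from the lifecycle diagram (200, 409, 404). The reassignment rule (move each orphaned query to the surviving SSP with the fewest assigned queries) and the `buffered` counter are assumptions made for illustration.

```python
from dataclasses import dataclass

STALE_AFTER_SECS = 15.0      # documented default heartbeat timeout
MAX_BUFFER_PER_SSP = 10_000  # documented default for max_buffer_per_ssp


@dataclass
class SspHealth:
    last_heartbeat: float
    views: int = 0
    buffered: int = 0  # events queued for this SSP (hypothetical counter)


class HealthMonitor:
    def __init__(self, stale_after=STALE_AFTER_SECS, max_buffer=MAX_BUFFER_PER_SSP):
        self.stale_after = stale_after
        self.max_buffer = max_buffer
        self.pool: dict = {}         # ssp_id -> SspHealth
        self.assignments: dict = {}  # query_id -> ssp_id

    def register(self, ssp_id: str, now: float) -> None:
        self.pool[ssp_id] = SspHealth(last_heartbeat=now)

    def heartbeat(self, ssp_id: str, views: int, now: float) -> int:
        """Returns the HTTP status for POST /ssp/heartbeat."""
        health = self.pool.get(ssp_id)
        if health is None:
            return 404  # unknown SSP: it must re-register
        if health.buffered > self.max_buffer:
            return 409  # buffer overflowed: it must re-bootstrap
        health.last_heartbeat = now
        health.views = views
        return 200

    def sweep(self, now: float) -> list:
        """Drops stale SSPs and moves their queries to the least-loaded survivor."""
        stale = sorted(s for s, h in self.pool.items()
                       if now - h.last_heartbeat > self.stale_after)
        for ssp_id in stale:
            del self.pool[ssp_id]
        for query_id in sorted(q for q, s in self.assignments.items() if s in stale):
            if not self.pool:
                del self.assignments[query_id]  # nowhere to go until an SSP registers
                continue
            counts = {s: 0 for s in self.pool}
            for s in self.assignments.values():
                if s in counts:
                    counts[s] += 1
            self.assignments[query_id] = min(sorted(counts), key=counts.__getitem__)
        return stale
```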
Load Balancing
The scheduler supports multiple load balancing strategies for query assignment:
Round Robin
Distributes queries evenly across all SSPs in rotation.
Least Queries
Assigns queries to the SSP with the fewest active queries.
Least Load
Assigns queries to the SSP with the lowest combined CPU and memory usage.
Configure via the scheduler’s load_balance field in its config (default LeastQueries).
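The three strategies reduce to a choice of key over the healthy pool. This sketch is illustrative only: the enum members, the `SspStats` fields, and the treatment of CPU and memory as 0.0 to 1.0 fractions summed into one score are assumptions, since the exact weighting behind "combined" usage is not documented here.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LoadBalance(Enum):
    ROUND_ROBIN = auto()
    LEAST_QUERIES = auto()  # documented default (LeastQueries)
    LEAST_LOAD = auto()


@dataclass
class SspStats:
    ssp_id: str
    active_queries: int
    cpu: float     # utilisation as a 0.0-1.0 fraction (assumed unit)
    memory: float  # utilisation as a 0.0-1.0 fraction (assumed unit)


class Selector:
    def __init__(self, strategy: LoadBalance = LoadBalance.LEAST_QUERIES):
        self.strategy = strategy
        self._next = 0  # round-robin cursor

    def pick(self, ssps: list) -> str:
        if not ssps:
            raise RuntimeError("no healthy SSPs available")
        ordered = sorted(ssps, key=lambda s: s.ssp_id)  # deterministic tie-breaks
        if self.strategy is LoadBalance.ROUND_ROBIN:
            choice = ordered[self._next % len(ordered)]
            self._next += 1
        elif self.strategy is LoadBalance.LEAST_QUERIES:
            choice = min(ordered, key=lambda s: s.active_queries)
        else:  # LEAST_LOAD: "combined" taken here as a plain sum
            choice = min(ordered, key=lambda s: s.cpu + s.memory)
        return choice.ssp_id
```

Sorting by SSP id first keeps ties deterministic, which makes assignments reproducible in tests.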
Communication Patterns
Data Ingestion Flow
sequenceDiagram
participant DB as SurrealDB
participant Scheduler
participant SSP1
participant SSP2
DB->>Scheduler: Change event<br/>(CREATE/UPDATE/DELETE)
Scheduler->>Scheduler: Update replica
par Broadcast to ready SSPs
Scheduler->>SSP1: POST /ingest
SSP1->>SSP1: Update views
SSP1-->>Scheduler: 200 OK
and
Scheduler->>SSP2: POST /ingest
SSP2->>SSP2: Update views
SSP2-->>Scheduler: 200 OK
end
Note over SSP1,SSP2: Views updated in real-time
The SSP responds 200 to /ingest immediately and finishes the
view-fan-out work asynchronously. This matters because SurrealDB
runs the schema mutation event that called http::post inside
the originating transaction; if the SSP’s own follow-up
UPDATE _00_list_ref_user_<id> committed before that parent
transaction did, another session’s LIVE notification on the per-user
edge would race ahead of the source row’s visibility. The SSP’s
deferred task waits for the row’s _00_version to become readable
on its own connection (a clean proxy for “parent transaction is
committed”) before bumping the per-user edge.
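A minimal asyncio sketch of this acknowledge-then-defer pattern, assuming hypothetical `read_version` and `bump_edge` callbacks (standing in for reading `_00_version` on the SSP's own connection and writing the per-user edge). The poll interval, the timeout, and giving up after the timeout are assumptions, not documented SSP behavior.

```python
import asyncio
from typing import Awaitable, Callable, Optional

ReadVersion = Callable[[str], Awaitable[Optional[int]]]  # hypothetical: reads _00_version
BumpEdge = Callable[[str, int], Awaitable[None]]          # hypothetical: writes the edge


async def wait_until_visible(read_version: ReadVersion, record_id: str, version: int,
                             poll_s: float = 0.01, timeout_s: float = 5.0) -> bool:
    """Polls on the SSP's own connection until the row's version is readable."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while True:
        seen = await read_version(record_id)
        if seen is not None and seen >= version:
            return True  # the parent transaction has committed
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(poll_s)


async def fan_out(read_version: ReadVersion, bump_edge: BumpEdge,
                  record_id: str, version: int) -> None:
    if await wait_until_visible(read_version, record_id, version):
        await bump_edge(record_id, version)
    # else: a real service would log and retry rather than silently drop


def handle_ingest(record_id: str, version: int, read_version: ReadVersion,
                  bump_edge: BumpEdge, background: set) -> int:
    """Acknowledges /ingest at once; the edge write runs as a background task.

    Must be called from inside a running event loop.
    """
    task = asyncio.get_running_loop().create_task(
        fan_out(read_version, bump_edge, record_id, version))
    background.add(task)  # keep a strong reference until the task finishes
    task.add_done_callback(background.discard)
    return 200
```

Returning 200 before the edge write is what lets the originating SurrealDB transaction finish; the poll then orders the edge write strictly after that commit.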
Edge-Update Batching
The SSP does not write one transaction per record. View deltas — from
both per-ingest steps and the initial snapshot computed at query
registration — are pushed to a coalescing service that buffers them over
a short window and writes all of their _00_list_ref edges (primary
window edges and subquery child edges) in a single SurrealDB
BEGIN…COMMIT transaction. Without this, a burst of updates (a bulk
import, a sync backfill, or a fresh client registering several queries at
once) produced one transaction per record, so SurrealDB fired a LIVE
notification per record and a connecting client streamed its whole window
over seconds of serialized round-trips. Batching collapses that into a
few LIVE deliveries.
The window is set by SPKY_SSP_QUERY_UPDATE_THROTTLE_MS (default 100;
0 disables batching and flushes each update immediately). The buffer
also flushes early once it crosses an internal size cap, and flushes any
remainder on shutdown. On the client, the in-browser Stream Processor has
the analogous streamDebounceTime (default 50ms — see
Configuration), which coalesces the resulting
per-query stream updates before they reach useQuery.
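A single-threaded sketch of the coalescing buffer, with time passed in explicitly so the behavior is easy to follow: the first delta opens a window, `tick` flushes once the window has elapsed, the size cap and a zero throttle force an immediate flush, and `close` drains the remainder. `EdgeBatcher`, `max_pending` (the real cap's value is not documented here), and measuring the window from the first buffered delta are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EdgeBatcher:
    """Coalesces edge writes into one BEGIN...COMMIT per window (sketch)."""
    throttle_ms: int = 100   # mirrors SPKY_SSP_QUERY_UPDATE_THROTTLE_MS; 0 = no batching
    max_pending: int = 500   # hypothetical stand-in for the internal size cap
    execute: Callable[[str], None] = print  # sends one SurrealQL transaction
    _pending: List[str] = field(default_factory=list, init=False)
    _window_start: Optional[float] = field(default=None, init=False)

    def push(self, statements: List[str], now_ms: float) -> None:
        if not statements:
            return
        if not self._pending:
            self._window_start = now_ms  # the first delta opens the window
        self._pending.extend(statements)
        if self.throttle_ms == 0 or len(self._pending) >= self.max_pending:
            self.flush()  # batching disabled, or size cap reached: flush early

    def tick(self, now_ms: float) -> None:
        """Called periodically; flushes once the window has elapsed."""
        if self._pending and now_ms - self._window_start >= self.throttle_ms:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        body = "\n".join(self._pending)
        self.execute(f"BEGIN TRANSACTION;\n{body}\nCOMMIT TRANSACTION;")
        self._pending = []
        self._window_start = None

    def close(self) -> None:
        """Shutdown hook: flush whatever is still buffered."""
        self.flush()
```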
Bootstrap Pagination
Both the scheduler’s snapshot replica and the SSP load large tables in
bounded pages rather than one giant SELECT (which overflows the
WebSocket frame ceiling on multi-GB databases). Paging uses keyset
pagination — … ORDER BY id LIMIT n, then resuming from the last id
seen with WHERE id > <last> ORDER BY id LIMIT n — never an
OFFSET/START. The source database is live while it is paged (and the
SSP re-bootstraps from the scheduler’s replica on every restart), so an
offset scan would be unsafe: a concurrent delete behind the offset shifts
every later row up one, and the next page silently skips a row. A
skipped record never enters the circuit, so it is invisible to the
materialized views — a later delete of it emits no removal delta and the
client’s live query goes stale until a manual reload. Keyset pagination
resumes from a value rather than a row count, so deletes behind the cursor
can’t shift rows out of view (even deleting the cursor row itself is safe,
because the next page is id > <value>, not a reference to a live row).
Page size is tunable via SPKY_SSP_BOOTSTRAP_PAGE_SIZE (SSP) and
SPKY_BOOTSTRAP_PAGE_SIZE (scheduler).
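The skipped-row failure is easy to reproduce with an in-memory simulation. In this sketch (function names hypothetical, integer ids standing in for record ids), a row behind the cursor is deleted after the first page: the offset scan silently misses a row that was never deleted, while the keyset scan sees every surviving row.

```python
from typing import Callable, List, Optional


def scan_keyset(live: List[int], n: int, between_pages: Callable[[int], None]) -> List[int]:
    """Resume from the last id seen: WHERE id > last ORDER BY id LIMIT n."""
    seen: List[int] = []
    last: Optional[int] = None
    page_no = 0
    while True:
        page = [r for r in sorted(live) if last is None or r > last][:n]
        if not page:
            return seen
        seen.extend(page)
        last = page[-1]
        between_pages(page_no)  # concurrent writes land between page fetches
        page_no += 1


def scan_offset(live: List[int], n: int, between_pages: Callable[[int], None]) -> List[int]:
    """The unsafe alternative: ORDER BY id LIMIT n START offset."""
    seen: List[int] = []
    offset = 0
    page_no = 0
    while True:
        page = sorted(live)[offset:offset + n]
        if not page:
            return seen
        seen.extend(page)
        offset += len(page)
        between_pages(page_no)
        page_no += 1


def delete_after_first_page(live: List[int], victim: int) -> Callable[[int], None]:
    def hook(page_no: int) -> None:
        if page_no == 0:
            live.remove(victim)  # a delete behind the cursor
    return hook


rows = list(range(1, 11))
live = list(rows)
offset_seen = scan_offset(live, 3, delete_after_first_page(live, 2))
live = list(rows)
keyset_seen = scan_keyset(live, 3, delete_after_first_page(live, 2))
print(sorted(set(rows) - set(offset_seen)))  # [4]: skipped, though never deleted
print(sorted(set(rows) - set(keyset_seen)))  # []
```

Deleting the cursor row itself (row 3 after the first page) is equally harmless for the keyset scan, because the next page only compares against the value 3.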
Per-User _00_list_ref Tables
Each registered view stores a per-user materialized edge in
_00_list_ref_user_<auth_id>. The auth id segment is hardcoded
into the table’s PERMISSIONS FOR select WHERE $auth.id = user:<id>
rule, so a record-token client subscribing to its own dedicated
table matches at LIVE-notification time without any cross-session
permission lookup. This works around the SurrealDB v3 LIVE
permission gap, where cross-session LIVE deliveries on a
permission-gated table can be silently dropped even when the
subscriber’s expression would pass for the new row.
The mode is configured via refMode in sp00ky.yml
(or SPKY_SSP_REF_MODE on the SSP):
- dedicated (default): per-user _00_list_ref_user_<id> tables; no cross-session permission evaluation at LIVE time. The SSP pre-emptively creates the user’s table when it receives the user CREATE ingest event, so the client’s first LIVE subscription after sign-in succeeds on the first try. The liveRetryCount getter on the client returns 0 on a clean bootstrap and is exposed for regression-guard tests.
- single: legacy shared _00_list_ref with a WHERE auth_id = $auth.id rule. Same-session writes propagate; cross-session LIVE notifications hit the v3 permission gap.
When a user record is deleted upstream, the SSP receives the
user DELETE ingest event and drops the matching
_00_list_ref_user_<id> table so dedicated schema state doesn’t
accumulate over time.
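A sketch of the DDL the SSP might issue for user ingest events in dedicated mode. The statement text is an assumption based on the permission rule quoted above (the `IF NOT EXISTS` / `IF EXISTS` forms assume SurrealDB 2.x or later), and `ddl_for_user_event` is a hypothetical helper. Because the user key is spliced into both a table name and a rule, the sketch rejects keys that are not plain alphanumerics; real ids containing other characters would need proper escaping.

```python
import re
from typing import Optional

_SAFE_KEY = re.compile(r"[A-Za-z0-9_]+")


def user_ref_table(user_key: str) -> str:
    """Per-user edge table name; rejects keys that are unsafe to splice into DDL."""
    if not _SAFE_KEY.fullmatch(user_key):
        raise ValueError(f"refusing to embed user key {user_key!r} in a table name")
    return f"_00_list_ref_user_{user_key}"


def ddl_for_user_event(op: str, user_key: str, ref_mode: str = "dedicated") -> Optional[str]:
    """SurrealQL to run when a `user` CREATE/DELETE ingest event arrives."""
    if ref_mode != "dedicated":
        return None  # 'single' mode uses the shared _00_list_ref table
    table = user_ref_table(user_key)
    if op == "CREATE":
        # The auth id is hardcoded into the rule, so LIVE delivery needs no
        # cross-session permission lookup.
        return (f"DEFINE TABLE IF NOT EXISTS {table} "
                f"PERMISSIONS FOR select WHERE $auth.id = user:{user_key};")
    if op == "DELETE":
        return f"REMOVE TABLE IF EXISTS {table};"
    return None  # UPDATE events need no schema change
```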
Query Registration Flow
sequenceDiagram
participant Client
participant Scheduler
participant SSP
Client->>Scheduler: POST /query/register<br/>{query_id, view_plan}
Scheduler->>Scheduler: Select SSP<br/>(load balancing)
Scheduler->>SSP: POST /view/register<br/>{plan, metadata}
SSP->>SSP: Create view<br/>Materialize results
SSP-->>Scheduler: 200 OK + initial results
Scheduler-->>Client: {ssp_id, ssp_url}
Note over Client,SSP: Client connects directly to SSP<br/>for real-time updates
Horizontal Scaling
The distributed architecture supports horizontal scaling:
Adding SSPs
- Start a new SSP instance with SPKY_SCHEDULER_URL configured
- SSP automatically registers with the scheduler
- SSP self-bootstraps from the scheduler’s snapshot replica via the proxy endpoints
- New queries are distributed across all healthy SSPs
Removing SSPs
- Stop the SSP instance (graceful shutdown)
- Scheduler detects missing heartbeats
- SSP marked as stale and removed from pool
- Active queries reassigned to remaining SSPs
Failure Recovery
- SSP failures are detected via heartbeat timeout
- Queries automatically reassigned to healthy SSPs
- Job execution failures trigger retries on different SSPs
- Client reconnection handles SSP unavailability
Deployment Modes
Single SSP (Development)
# No scheduler needed — one SSP, one SurrealDB.
SPKY_DB_URL=http://localhost:8666
SPKY_SSP_LISTEN_ADDR=0.0.0.0:8667
SPKY_SSP_REF_MODE=dedicated # or 'single' for the legacy shared _00_list_ref
Simple setup for development and testing. The Sp00ky CLI’s spky dev wires this up automatically using values from sp00ky.yml.
Multiple SSPs (Production)
# Scheduler — host/port and DB credentials come from its YAML config plus
# a handful of optional env overrides.
SPKY_DB_WS=ws://surrealdb:8000/rpc
SPKY_DB_NS=main
SPKY_DB_NAME=app
SPKY_SCHEDULER_ID=scheduler-01
SPKY_SNAPSHOT_UPDATE_INTERVAL_SECS=10
# SSP instances
SPKY_SCHEDULER_URL=http://scheduler:9667
SPKY_SSP_ID=ssp-01
SPKY_SSP_LISTEN_ADDR=0.0.0.0:8667
SPKY_SSP_ADVERTISE_ADDR=10.100.1.30:8667
SPKY_SSP_REF_MODE=dedicated
Production setup with high availability and load distribution.
For detailed deployment instructions, see the Deployment Guide.