How It Works

Architecture & Internals of SyndrDB

1. System Overview

SyndrDB processes every request through a layered pipeline. A client connection arrives over TCP, is authenticated and bound to a session, and its command is routed through the parser, planner, and execution engine. Results flow back through the wire protocol. Cross-cutting concerns — MVCC, WAL, locking, and indexes — integrate at each layer.


2. Query Pipeline — From SQL to Results

Every SyndrQL statement follows a deterministic pipeline from raw text to executed results. The pipeline is designed for zero-allocation token scanning, modular parsing, and cost-based plan selection.

Query pipeline:

SQL String → Tokenizer → Parser → Expression AST → Query Router → Cost-Based Planner → Execution Plan → Execute → Results

Tokenizer

Character-by-character scanning produces a flat list of tokens. The tokenizer distinguishes between == (equality comparison, TOKEN_EQ) and = (assignment, TOKEN_ASSIGN), and treats * as TOKEN_MULTIPLY — context determines whether it means "all fields" or multiplication.
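As a sketch of the == vs = distinction, the scanner can peek one character ahead before emitting a token. The token names below mirror the ones mentioned above; everything else (the scan function, the TOKEN_IDENT fallback) is illustrative, not SyndrDB's actual code:

```go
package main

import "fmt"

// Token kinds; TOKEN_EQ/TOKEN_ASSIGN/TOKEN_MULTIPLY match the names in
// the text, the rest is invented for this sketch.
type TokenKind int

const (
	TOKEN_ASSIGN TokenKind = iota // "="
	TOKEN_EQ                      // "=="
	TOKEN_MULTIPLY                // "*"
	TOKEN_IDENT
)

type Token struct {
	Kind TokenKind
	Text string
}

// scan walks the input character by character, peeking one ahead to
// decide between "=" (assignment) and "==" (equality).
func scan(src string) []Token {
	var toks []Token
	for i := 0; i < len(src); i++ {
		switch c := src[i]; {
		case c == ' ':
			// skip whitespace
		case c == '=':
			if i+1 < len(src) && src[i+1] == '=' {
				toks = append(toks, Token{TOKEN_EQ, "=="})
				i++ // consume the second '='
			} else {
				toks = append(toks, Token{TOKEN_ASSIGN, "="})
			}
		case c == '*':
			toks = append(toks, Token{TOKEN_MULTIPLY, "*"})
		default:
			j := i
			for j < len(src) && src[j] != ' ' && src[j] != '=' && src[j] != '*' {
				j++
			}
			toks = append(toks, Token{TOKEN_IDENT, src[i:j]})
			i = j - 1
		}
	}
	return toks
}

func main() {
	for _, t := range scan("active == true") {
		fmt.Println(t.Kind, t.Text)
	}
}
```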

Parsers

SyndrDB uses dedicated parsers per statement type rather than a single monolithic grammar. Each parser consumes tokens and produces a typed query struct:

  • SELECT parser — handles projections, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT/OFFSET, JOINs, subqueries
  • INSERT / UPDATE / DELETE parsers — DML operations with expression-based WHERE clauses
  • Transaction parser — BEGIN, COMMIT, ROLLBACK, SAVEPOINT
  • Cursor parser — DECLARE, FETCH, CLOSE
  • Trigger parser — CREATE/DROP/ENABLE/DISABLE TRIGGER
  • DDL parsers — CREATE/DROP BUNDLE, CREATE INDEX, CREATE VIEW

Expression AST

WHERE clauses, HAVING filters, and computed expressions are represented as a tree of expression nodes:

Node Type | Purpose | Example
BinaryExpression | Two-operand comparison or logic | age >= 25 AND active == true
UnaryExpression | NOT, IS NULL, EXISTS | NOT EXISTS (...)
IdentifierExpression | Field reference | name
QualifiedIdentifierExpression | Table-qualified field | "users"."name"
LiteralExpression | Constant value | 42, "hello"
SubqueryExpression | Nested SELECT | IN (SELECT ...)

Cost Model

The query planner evaluates candidate plans using a cost model that considers:

  • CPU cost — per-row evaluation overhead (expression complexity, function calls)
  • I/O cost — pages to read from storage (full scan vs. index lookup)
  • Memory cost — intermediate result buffering (hash tables, sort buffers)
  • Selectivity estimation — HyperLogLog cardinality and histogram statistics guide row-count predictions
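A minimal sketch of how such a cost model might compare a full scan against an index lookup. The weight constants and function names here are invented for illustration; SyndrDB's actual cost constants are not documented in this section:

```go
package main

import "fmt"

// Illustrative cost weights — not SyndrDB's actual constants.
const (
	cpuPerRow = 0.01 // per-row evaluation overhead
	ioPerPage = 1.0  // cost of reading one page from storage
)

type PlanCost struct{ CPU, IO float64 }

func (c PlanCost) Total() float64 { return c.CPU + c.IO }

// fullScanCost touches every page and evaluates the predicate on every row.
func fullScanCost(rows, pages int) PlanCost {
	return PlanCost{CPU: cpuPerRow * float64(rows), IO: ioPerPage * float64(pages)}
}

// indexLookupCost reads only the pages holding matching rows, using the
// planner's selectivity estimate to predict the matching row count.
func indexLookupCost(rows, pages int, selectivity float64) PlanCost {
	matched := float64(rows) * selectivity
	pagesTouched := float64(pages) * selectivity
	return PlanCost{CPU: cpuPerRow * matched, IO: ioPerPage * pagesTouched}
}

func main() {
	full := fullScanCost(1_000_000, 4000)
	idx := indexLookupCost(1_000_000, 4000, 0.001) // ~0.1% of rows match
	fmt.Printf("full scan: %.1f, index lookup: %.1f\n", full.Total(), idx.Total())
}
```

With a selective predicate the index plan wins by orders of magnitude; with selectivity near 1.0 the full scan's sequential I/O would win instead, which is exactly the trade-off the planner weighs.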

3. Execution Engine — Nodes & Interfaces

SyndrDB's execution engine uses a node-based architecture where each operation (scan, filter, sort, aggregate) is represented as a composable node. Nodes can be stacked into execution trees where data flows from leaf nodes (scans) up through processing nodes to the root.

Three Execution Interfaces

Different query patterns need different data-flow models. SyndrDB provides three interfaces:

  • ExecutionNode — materialized results in a map
  • SliceExecutionNode — scan-optimized slice results
  • IteratorNode — Volcano-style pull-based streaming

Interface | Method | Returns | Use Case
ExecutionNode | Execute(ctx) | map[string]*Document | General queries with random access by doc ID
SliceExecutionNode | ExecuteSlice(ctx) | []*Document, []string | Full scans where map overhead is unnecessary
IteratorNode | Init(ctx), Next(), Close() | *Document, error | Streaming, cursors, memory-bounded execution

The IteratorNode uses Volcano-model semantics: Next() returns one document per call, and (nil, nil) signals end-of-data. Nodes implementing IterableNode can produce iterators via AsIterator().
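The Volcano contract above can be sketched as a small interface plus a leaf node. The sliceIterator and drain helpers are hypothetical, and Init is shown without its ctx parameter for brevity:

```go
package main

import "fmt"

// Document is a stand-in for SyndrDB's document type.
type Document struct{ ID string }

// IteratorNode mirrors the Volcano-style interface described above:
// Next returns one document per call and (nil, nil) at end-of-data.
type IteratorNode interface {
	Init() error
	Next() (*Document, error)
	Close() error
}

// sliceIterator is a minimal leaf node streaming from an in-memory slice.
type sliceIterator struct {
	docs []*Document
	pos  int
}

func (s *sliceIterator) Init() error { s.pos = 0; return nil }

func (s *sliceIterator) Next() (*Document, error) {
	if s.pos >= len(s.docs) {
		return nil, nil // (nil, nil) signals end-of-data
	}
	d := s.docs[s.pos]
	s.pos++
	return d, nil
}

func (s *sliceIterator) Close() error { return nil }

// drain pulls documents one at a time, Volcano style, until end-of-data.
func drain(it IteratorNode) ([]string, error) {
	if err := it.Init(); err != nil {
		return nil, err
	}
	defer it.Close()
	var ids []string
	for {
		doc, err := it.Next()
		if err != nil {
			return nil, err
		}
		if doc == nil {
			return ids, nil
		}
		ids = append(ids, doc.ID)
	}
}

func main() {
	it := &sliceIterator{docs: []*Document{{"a"}, {"b"}, {"c"}}}
	ids, _ := drain(it)
	fmt.Println(ids)
}
```

Because consumers only ever hold one document at a time, memory usage stays bounded regardless of result size — which is why cursors and the streaming protocol are built on this interface.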

Execution Node Types

Example execution tree for SELECT status, COUNT(*) FROM "orders" WHERE total > 100 GROUP BY status ORDER BY status LIMIT 10:

LimitNode
  └─ SortNode
       └─ AggregationNode
            └─ FilterNode
                 └─ FullScanNode
Node | Purpose | Key Optimization
FullScanNode | Sequential scan of all pages | Predicate pushdown into scanner, projection pushdown
IndexScanNode | Hash/BTree/BRIN index lookup | Falls back to full scan if index miss
BRINScanNode | Block-range skip scan | Skips entire page ranges based on min/max summaries
IndexOnlyScanNode | Answers query from index alone | Zero page reads when index covers all projected fields
BTreeOrderedScanNode | Pre-sorted range traversal | Avoids in-memory sort for ORDER BY on indexed column
FilterNode | WHERE expression evaluation | SIMD batch evaluation for simple predicates
AggregationNode | GROUP BY with hash/sort strategy | Streaming aggregation, SIMD vectorized SUM/MIN/MAX
SortNode | ORDER BY | Radix sort, parallel merge sort, SIMD-accelerated comparisons
LimitNode | LIMIT / OFFSET | Short-circuits upstream execution
DistinctNode | DISTINCT deduplication | Hash-based with pre-sorted optimization
JoinExecutionNode | Hash join | Build-side selection, predicate pushdown
CorrelatedSubqueryNode | IN/EXISTS subqueries | Hash semi/anti-join rewriting (O(N+M) vs O(N*M))

SIMD Acceleration

Performance-critical paths use SIMD (Single Instruction, Multiple Data) via the syndrdb-simd library:

  • Batch predicate evaluation — evaluates WHERE conditions on entire document batches
  • Vectorized aggregation — SUM, MIN, MAX on int64 columns processed in SIMD lanes
  • Compound predicate bitmaps — AND/OR of multiple conditions via bitmap operations
  • SIMD string operations — UPPER/LOWER with ASCII fast path
  • Accelerated sorting — SIMD-assisted comparisons for radix and parallel merge sort

4. Plan Cache — Adaptive Query Optimization

Query planning is expensive (cost estimation, index selection, join ordering). SyndrDB caches execution plans to amortize this cost across repeated queries.

Plan cache lookup:

SQL Query → xxhash → Shard [N] → Version Check → hit: serve cached plan / miss: build new plan

8-Shard LRU

The cache is divided into 8 independent shards, each with its own LRU eviction and mutex. The shard is selected by xxhash(queryText) % 8. This reduces lock contention under high concurrency — 8 concurrent planners can each hit a different shard without blocking.
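Shard selection can be sketched as follows. The standard library's FNV-1a stands in for xxhash so the example has no external dependencies; numShards mirrors the 8-shard design:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numShards = 8

// shardFor picks the plan-cache shard for a query. SyndrDB uses xxhash;
// FNV-1a from the standard library stands in here.
func shardFor(queryText string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(queryText))
	return h.Sum32() % numShards
}

func main() {
	for _, q := range []string{
		`SELECT * FROM "users"`,
		`SELECT * FROM "orders"`,
	} {
		fmt.Printf("%-24s -> shard %d\n", q, shardFor(q))
	}
}
```

Two concurrent planners collide only when their queries hash to the same shard, so each shard's LRU mutex is contended roughly 1/8th as often as a single global lock would be.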

Adaptive Generic/Custom Planning

Inspired by PostgreSQL, SyndrDB uses an adaptive strategy:

  • First 5 executions — always use a custom plan (parameter-specific)
  • After 5 executions — compare generic plan cost vs. average custom plan cost
  • If generic is cheaper — switch to generic plan (parameter-independent, reusable)
  • Periodic re-evaluation — if statistics change, reconsider the choice
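The decision rule above might look like this in code; planChoice and its fields are illustrative names, not SyndrDB's API:

```go
package main

import "fmt"

const customPlanTrials = 5 // first N executions always use a custom plan

type planChoice struct {
	executions    int
	customCostSum float64
	genericCost   float64
}

// choose implements the adaptive rule: custom plans during the trial
// window, then generic if its cost beats the average custom-plan cost.
func (p *planChoice) choose() string {
	if p.executions < customPlanTrials {
		return "custom"
	}
	avgCustom := p.customCostSum / float64(p.executions)
	if p.genericCost <= avgCustom {
		return "generic"
	}
	return "custom"
}

// recordCustom logs the cost of one custom-plan execution.
func (p *planChoice) recordCustom(cost float64) {
	p.executions++
	p.customCostSum += cost
}

func main() {
	p := &planChoice{genericCost: 90}
	for i := 0; i < 5; i++ {
		fmt.Println(p.choose()) // "custom" during the trial window
		p.recordCustom(100)
	}
	fmt.Println(p.choose()) // generic (90) beats the custom average (100)
}
```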

Lazy Invalidation

Each bundle tracks a version number that increments on schema changes, index creation/deletion, or statistics refresh. Cached plans store the version at creation time. On cache hit, if the plan's version is stale, a fresh plan is built. During rebuild, the stale plan continues serving read queries to avoid latency spikes.

Setting | Default | Description
planCacheCapacity | 1000 per shard | Max entries before LRU eviction
planCacheEnabled | true | Enable/disable plan caching

5. Storage Engine — Segments, Pages & Write Path

SyndrDB stores documents in append-only binary segment files, organized by bundle. The storage engine is designed for sequential write throughput and efficient page-level reads.

On-Disk Layout

Bundle directory structure:

database/bundleName/
├── bundle.manifest      JSON metadata
├── 000001.bnd           binary segment (BSON)
├── 000002.bnd           binary segment (BSON)
└── sorted_index.idx     page lookup index

Component | Format | Purpose
bundle.manifest | JSON | Tracks all segment files, document counts, bloom filter state
*.bnd | Binary (BSON) | Append-only segment files containing document data
sorted_index.idx | Binary | Sharded sorted index for O(log n) pageID calculation

Segment Files

Documents are serialized as BSON and appended to the current active segment file. When a segment reaches the maximum size (default 32MB), a new segment is created. Old segments are immutable — compaction merges them to reclaim space from deleted/superseded versions.

Document Pages

In memory, documents are organized into pages of approximately 4,096 documents each. Each page provides two access patterns:

  • Documents map[string]Document — keyed by document ID for random access
  • DocumentSlice []Document — flat array for scan-optimized sequential access

Pages form a linked list via NextPageID / PreviousPageID pointers.

Write Buffer

Writes use a double-buffered design for zero-contention I/O:

Double-Buffered Write Path
Writer 1
Writer 2
Writer N
↓ atomic offset reservation
Active Buffer (pwrite, no mutex)
↓ swap on flush
Back Buffer → Disk (background)

Writers atomically reserve an offset in the active buffer and write via pwrite — no mutex needed. When the buffer is flushed, the active and back buffers swap atomically. The background flusher writes the back buffer to disk without blocking new writes.
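A minimal sketch of atomic offset reservation over an in-memory buffer. The real write path targets a file via pwrite and swaps buffers on overflow; this sketch just returns failure when the buffer is full, and all names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// writeBuffer sketches atomic offset reservation: each writer claims a
// disjoint region with a single atomic add, then copies its payload in
// without taking any mutex.
type writeBuffer struct {
	buf    []byte
	offset atomic.Int64
}

// reserve claims n bytes and returns the start of the claimed region,
// or -1 if the buffer is full (a real implementation would trigger the
// active/back buffer swap here).
func (w *writeBuffer) reserve(n int) int64 {
	off := w.offset.Add(int64(n)) - int64(n)
	if off+int64(n) > int64(len(w.buf)) {
		return -1
	}
	return off
}

func (w *writeBuffer) write(p []byte) bool {
	off := w.reserve(len(p))
	if off < 0 {
		return false
	}
	copy(w.buf[off:], p) // exclusive region: no lock needed
	return true
}

func main() {
	w := &writeBuffer{buf: make([]byte, 1024)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			w.write([]byte(fmt.Sprintf("record-%d;", id)))
		}(i)
	}
	wg.Wait()
	fmt.Println("bytes used:", w.offset.Load())
}
```

Because every writer owns a non-overlapping byte range, concurrent copies never race — the only shared state is a single atomic counter.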

Setting | Default | Description
bundleFileMaxSizeMB | 32 | Segment file rotation threshold
maxLoadedDocumentPages | 500 | Max pages in memory before eviction

6. Page Cache — 64-Shard Lock-Free Design

The page cache is the most contended data structure in SyndrDB — every query touches it. Its design prioritizes lock-free reads under high concurrency.

A page request is routed via xxhash(pageKey) % 64 to one of 64 independent shards (Shard 0 … Shard 63). Each shard holds:

  • sync.Map (fast path) — lock-free atomic loads
  • Authoritative map — RWMutex-protected
  • LRU chain — eviction ordering

Read Path (Zero Contention)

Reads first attempt sync.Map.Load() which is a lock-free atomic load. Under read-heavy workloads (the common case), no locks are ever acquired. On miss, the shard's RWMutex is taken for a read lock to consult the authoritative map.

Write Path (Copy-Outside-Lock)

Writes follow a copy-outside-lock pattern: the new page state is prepared without holding any lock, then a brief write lock on the target shard updates both the authoritative map and sync.Map atomically. This minimizes the critical section to a pointer swap.

COW Snapshots for GROUP BY

GROUP BY queries need a consistent view of page data while concurrent writes may be modifying pages. The cache provides copy-on-write snapshots: an immutable []Document array is created from the page and cached with a staleness timestamp. Multiple concurrent GROUP BY queries share the same snapshot if the page hasn't changed.


7. Index System — Hash (LSM), B-Tree, BRIN

SyndrDB supports three index types, each optimized for different access patterns. All indexes support partial indexes (WHERE clause), functional expressions (LOWER, YEAR, arithmetic), and INCLUDE columns for covering queries.

Hash Index V3 (LSM Architecture)

Hash Index V3 — LSM tiers:

MemTable (in-memory, 100K max entries)
  ↓ overflow / flush
Entry Storage (256 buckets, append-only)
  ↓ compaction
Compacted Files (merged, deduplicated)

O(1) average lookup for equality queries (field == value). Uses an LSM-tree approach:

  • Write path: append entry to disk bucket → update MemTable → check compaction threshold
  • Read path: check MemTable → scan bucket files backward (newest first) → cache result
  • MVCC-aware: reads filter by CommitSequence to return only visible versions
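The read path might be sketched like this, using an in-memory stand-in for the on-disk bucket files (MVCC filtering and result caching are omitted, and all names are illustrative):

```go
package main

import "fmt"

// entry is one index posting: a key mapped to a document ID.
type entry struct{ key, docID string }

// hashIndex sketches the V3 read path: consult the in-memory MemTable
// first, then scan flushed segments from newest to oldest.
type hashIndex struct {
	memTable map[string]string
	segments [][]entry // segments[len-1] is the newest flushed segment
}

func (h *hashIndex) lookup(key string) (string, bool) {
	if id, ok := h.memTable[key]; ok {
		return id, true // freshest data always lives in the MemTable
	}
	for i := len(h.segments) - 1; i >= 0; i-- { // newest segment first
		seg := h.segments[i]
		for j := len(seg) - 1; j >= 0; j-- { // append-only: last write wins
			if seg[j].key == key {
				return seg[j].docID, true
			}
		}
	}
	return "", false
}

func main() {
	idx := &hashIndex{
		memTable: map[string]string{"alice": "doc-9"},
		segments: [][]entry{
			{{"alice", "doc-1"}, {"bob", "doc-2"}},
			{{"bob", "doc-7"}}, // newer segment supersedes the older posting
		},
	}
	fmt.Println(idx.lookup("alice")) // served from the MemTable
	fmt.Println(idx.lookup("bob"))   // newest segment wins
}
```

Scanning newest-first is what makes the append-only design correct: a stale posting in an older tier is shadowed rather than deleted, and compaction later removes it physically.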

B-Tree V2

B+ tree structure:

              Root
            /      \
      Internal    Internal
       /    \       /    \
   Leaf → Leaf → Leaf → Leaf

Linked leaf nodes enable efficient range traversal.

O(log n) lookup for range queries, ORDER BY, and unique constraints. B+ tree with linked leaf nodes for efficient range traversal:

  • Page-based storage: 8KB pages, metadata page 0, LRU page cache (1000 pages default)
  • WAL for crash recovery: separate B-tree WAL with CRC32 checksums
  • Range queries: O(log n) search to first matching leaf + O(k) sequential traversal

BRIN (Block Range INdex)

BRIN range-skip example:

Pages | min | max
1-128 | 1 | 500
129-256 | 480 | 1200
257-384 | 900 | 1500
385-512 | 1400 | 2000

Query: WHERE value BETWEEN 600 AND 1300 — skips pages 1-128 and 385-512 entirely.

One entry per ~128 pages storing min/max values, NULL tracking, and document count. Ideal for naturally ordered data (timestamps, auto-incrementing IDs). Tiny footprint: ~250 entries per 1M documents.
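The skip logic reduces to an interval-overlap test per block range; brinEntry and rangesToScan are illustrative names. Using the four ranges from the example above:

```go
package main

import "fmt"

// brinEntry summarizes one block range (~128 pages): the min and max
// values seen for the indexed field across that range.
type brinEntry struct {
	firstPage, lastPage int
	min, max            int64
}

// rangesToScan returns the block ranges whose [min, max] summary overlaps
// the query interval; all other ranges are skipped with zero page reads.
func rangesToScan(entries []brinEntry, lo, hi int64) []brinEntry {
	var keep []brinEntry
	for _, e := range entries {
		if e.max >= lo && e.min <= hi { // intervals overlap
			keep = append(keep, e)
		}
	}
	return keep
}

func main() {
	entries := []brinEntry{
		{1, 128, 1, 500},
		{129, 256, 480, 1200},
		{257, 384, 900, 1500},
		{385, 512, 1400, 2000},
	}
	// WHERE value BETWEEN 600 AND 1300
	for _, e := range rangesToScan(entries, 600, 1300) {
		fmt.Printf("scan pages %d-%d\n", e.firstPage, e.lastPage)
	}
}
```

Only pages 129-256 and 257-384 survive the filter, matching the example: the first and last ranges are pruned purely from their summaries.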

Index Comparison

Type | Best For | Complexity | Implementation
Hash V3 | Equality (field = value) | O(1) avg | LSM: MemTable + append-only buckets
B-Tree V2 | Range, ORDER BY, unique | O(log n) | B+ tree with linked leaves, page cache
BRIN | Range on ordered data | O(ranges) | Block-range min/max summaries

8. MVCC — Multi-Version Concurrency Control

Every write creates a new version of a document rather than overwriting in place. Readers see a consistent snapshot without blocking writers, and writers don't block readers.

Document Version Fields

Field | Type | Purpose
CommitSequence | uint64 | Global monotonic sequence assigned at commit
VersionSequence | uint64 | Per-document version counter (1, 2, 3...)
CreatedByTxID | uint64 | Transaction that created this version
DeletedByTxID | uint64 | Transaction that deleted this version
SupersededAt | time.Time | Timestamp when replaced by a newer version (zero = current)

Visibility Rules

A document version's visibility to a transaction's snapshot is decided by five checks (the first short-circuits the rest):

MVCC Visibility Check
1. Read-your-own-writes: if CreatedByTxID == myTxID, always visible
2. Snapshot boundary: CommitSequence <= snapshotSeq
3. Active tx exclusion: CreatedByTxID not in active transaction set
4. Not deleted: DeletedByTxID == 0 or deleted after snapshot
5. RCU grace period: superseded versions visible for 100ms window
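Rules 1-4 can be sketched as a visibility predicate (the 100ms RCU grace window from rule 5 is omitted). The documented MVCC fields keep their names; the snapshot type and its fields are illustrative:

```go
package main

import "fmt"

// docVersion carries the MVCC fields described above.
type docVersion struct {
	CommitSequence uint64
	CreatedByTxID  uint64
	DeletedByTxID  uint64
}

type snapshot struct {
	myTxID      uint64
	snapshotSeq uint64
	activeTxIDs map[uint64]bool   // transactions in flight at snapshot time
	deleteSeq   map[uint64]uint64 // commit sequence of each committed deleting tx
}

// visible applies rules 1-4 from the checklist above.
func (s snapshot) visible(v docVersion) bool {
	if v.CreatedByTxID == s.myTxID {
		return true // 1. read-your-own-writes
	}
	if v.CommitSequence > s.snapshotSeq {
		return false // 2. committed after our snapshot boundary
	}
	if s.activeTxIDs[v.CreatedByTxID] {
		return false // 3. creator was still in flight at snapshot time
	}
	if v.DeletedByTxID != 0 {
		if seq, committed := s.deleteSeq[v.DeletedByTxID]; committed && seq <= s.snapshotSeq {
			return false // 4. deleted before our snapshot
		}
	}
	return true
}

func main() {
	s := snapshot{myTxID: 42, snapshotSeq: 300,
		activeTxIDs: map[uint64]bool{}, deleteSeq: map[uint64]uint64{}}
	fmt.Println(s.visible(docVersion{CommitSequence: 250, CreatedByTxID: 1})) // v2 at seq 250
	fmt.Println(s.visible(docVersion{CommitSequence: 500, CreatedByTxID: 1})) // v3 at seq 500
}
```

This reproduces the version-chain example: a snapshot at sequence 300 sees the version committed at 250 but not the one committed at 500.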

Version Chain

Document version history:

v1 (CommitSeq 100, superseded) → v2 (CommitSeq 250, superseded) → v3 (CommitSeq 500, current)

A transaction with a snapshot at seq 300 sees v2; a transaction at seq 600 sees v3.

Dead Version Reclamation (Vacuum)

Old versions that are no longer visible to any active transaction are cleaned up by the vacuum process:

  • isDeadVersion() checks: superseded + grace period elapsed + commitSequence < oldest active snapshot
  • RemoveDeadVersionsFromPage() performs in-memory cleanup at the page level
  • Configurable via vacuumDeadRatioThreshold (default 0.3) and vacuumMaxPagesPerCycle (default 100)

HOT Updates

When an UPDATE modifies only non-indexed fields, SyndrDB skips the index update entirely (Heap-Only Tuple optimization). This avoids index maintenance overhead for common "update a status field" patterns.


9. WAL — Write-Ahead Log & Crash Recovery

The WAL guarantees durability: every state-changing operation is recorded to the log before the in-memory state is modified. On crash, the WAL is replayed to recover to the last consistent state.

WAL Entry Format

+----------+--------+------------+-------+--------+-----------+------------+--------+
| TxID     | OpType | BundleName | DocID | Before | After     | Timestamp  | CRC32  |
| (uint64) | (byte) | (string)   | (str) | (data) | (data)    | (int64)    | (4B)   |
+----------+--------+------------+-------+--------+-----------+------------+--------+

Each entry is self-describing with a CRC32 checksum for corruption detection. The Before field stores the pre-modification state for undo-based rollback.

Three Durability Modes

Mode | Behavior | Trade-off
Strict | fsync after every op | Safest, slowest
Balanced | Group commit | 10x fewer fsyncs
Performance | Async flush | Fastest, risk of loss

Group Commit (Balanced Mode)

Multiple concurrent transactions share a single fsync by batching their WAL entries into a double-buffered write pipeline:

Group Commit Flow
Tx 1
Tx 2
Tx 3
↓ append entries
Main Buffer (accumulating)
↓ swap (atomic)
Back Buffer → fsync to disk
One fsync serves all three transactions

Crash Recovery

On startup, the recovery process:

  1. Finds the last checkpoint marker in the WAL
  2. Replays all WAL entries after that checkpoint
  3. Reloads affected bundles from their segment files
  4. Rolls back any incomplete transactions

Write Coordinator

Three background goroutines manage the WAL lifecycle:

Goroutine | Purpose
WAL Writer | Drains the entry queue, writes to the log file, triggers group commit
Background Writer | Periodically flushes dirty pages from cache to segment files
Checkpointer | Writes checkpoint markers, enables WAL file rotation

Setting | Default | Description
walEnabled | true | Enable/disable WAL
durabilityMode | balanced | strict / balanced / performance
walMaxFileSizeMB | 100 | WAL file rotation threshold

10. Transaction System — ACID Guarantees

SyndrDB provides full ACID transactions with three isolation levels, undo-based rollback, and document-level write locks.

Transaction Lifecycle

Transaction flow:

BEGIN → Capture Snapshot → DML Operations → Conflict Check → COMMIT

ROLLBACK at any point undoes changes via WAL before-images.

Isolation Levels

Level | Behavior | Use Case
READ COMMITTED | Each statement sees the latest committed data | Simple read workloads, low contention
REPEATABLE READ (default) | Snapshot captured at BEGIN; all reads see the same point in time | Consistent reporting, analytics
SERIALIZABLE | SSI (Serializable Snapshot Isolation) detects read/write conflicts | Financial transactions, strict consistency

Serializable Snapshot Isolation (SSI)

SERIALIZABLE uses a technique called SSI to detect anomalies without blocking reads:

  • SIREAD locks — recorded after SELECT execution, tracking which documents were read
  • rw-antidependency tracking — when a write conflicts with another transaction's SIREAD, an edge is recorded
  • Dangerous structure detection — at COMMIT, checks for cycles in the dependency graph
  • Abort policy — the transaction that creates a dangerous structure is aborted with a serialization error

Deadlock Detection

Document-level write locks can create deadlock situations. SyndrDB detects these in real-time:

  • Wait-for graph — when a transaction blocks on a lock, an edge is added to the dependency graph
  • DFS cycle detection — runs on every new wait edge, not periodically
  • Victim selection — the youngest transaction in the cycle is aborted (least work lost)
  • Channel-based waiting — blocked transactions wait on a channel rather than polling
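The on-edge DFS might be sketched like this; waitForGraph and hasCycleFrom are illustrative names, not SyndrDB's actual types:

```go
package main

import "fmt"

// waitForGraph maps each blocked transaction to the transactions it waits on.
type waitForGraph map[uint64][]uint64

// hasCycleFrom runs a DFS starting at the transaction that just began
// waiting — mirroring "runs on every new wait edge, not periodically".
// A path that leads back to the starting transaction is a deadlock.
func (g waitForGraph) hasCycleFrom(tx uint64) bool {
	seen := map[uint64]bool{}
	var dfs func(cur uint64) bool
	dfs = func(cur uint64) bool {
		if cur == tx && len(seen) > 0 {
			return true // walked back to where we started: deadlock
		}
		if seen[cur] {
			return false // already explored this branch
		}
		seen[cur] = true
		for _, next := range g[cur] {
			if dfs(next) {
				return true
			}
		}
		return false
	}
	return dfs(tx)
}

func main() {
	g := waitForGraph{}
	g[1] = append(g[1], 2)         // tx1 waits on tx2
	g[2] = append(g[2], 3)         // tx2 waits on tx3
	fmt.Println(g.hasCycleFrom(2)) // no cycle yet
	g[3] = append(g[3], 1)         // tx3 waits on tx1: cycle 1→2→3→1
	fmt.Println(g.hasCycleFrom(3)) // deadlock detected
}
```

On detection, the victim-selection step would abort the youngest transaction in the cycle and remove its edges from the graph.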

Savepoints

Single-level savepoints allow partial rollback within a transaction:

BEGIN TRANSACTION;
  ADD DOCUMENT TO BUNDLE "orders" WITH ({...});
  SAVEPOINT "before_update";
  UPDATE DOCUMENTS IN BUNDLE "orders" (...) CONFIRMED WHERE status == "pending";
  -- Oops, wrong update
  ROLLBACK TO SAVEPOINT "before_update";
  -- orders table is restored to the savepoint state
COMMIT;

11. Concurrency Architecture — Shards, Atomics & Lock-Free Patterns

SyndrDB is designed for high-concurrency workloads. The master pattern is sharded access with lock-free reads: split data structures into independent shards, and use atomic operations for the read path so that readers never block.

Sharding Overview

Sharded subsystems:

Subsystem | Shards | Mechanism
Page Cache | 64 | RWMutex + sync.Map
Session Manager | 64 | RWMutex + sync.Map
Plan Cache | 8 | LRU per shard
Rate Limiter | 32 | Immutable whitelist

Lock-Free Patterns

Pattern | Where Used | Mechanism
atomic.Pointer | ServiceManager, BucketFileManager | Lock-free singleton access via atomic load/store
sync.Map | Page cache fast path, scanner registry, session indexes | Lock-free reads, amortized-lock writes
Copy-outside-lock | Page cache writes | Prepare new state outside critical section, brief lock for pointer swap
Double-checked locking | Manifest creation | RLock fast-path check, then Lock + re-check for initialization
Atomic offset reservation | Write buffer | Writers atomically claim a region in the buffer without any mutex
RCU (Read-Copy-Update) | Write path, reader views | Immutable snapshots published atomically, old versions reclaimed after grace period

Why Sharding Works

With 64 shards, even at 60 concurrent connections, the expected number of concurrent accesses per shard is less than 1. This virtually eliminates lock contention. The hash function (xxhash) provides uniform distribution, ensuring no hot shards under random access patterns.


12. Server & Wire Protocol

SyndrDB uses a custom TCP wire protocol designed for low-latency command execution, pipelining, and streaming of large result sets.

Connection Lifecycle

Connection Flow
TCP Accept
Parse Connection String
Authenticate
Create Session
Command Loop: Read → Parse → Execute → Send Result
Disconnect → Cleanup Session → Release Locks

Wire Protocol Format

Feature | Detail
Command Terminator | \x04 (EOT); a literal \x04 in the payload is escaped as \x04\x04
Parameter Delimiter | \x05 (ENQ) separates prepared statement parameters
Pipeline Mode | Client sends multiple commands; server responds to each with a READY\n sentinel after completion
Compression | Optional zstd compression via compress=zstd in the connection string

Connection String

syndrdb://host:port:database:user:password[:options]

Options (colon-separated key=value):
  compress=zstd        Enable zstd compression
  pipeline=true        Enable pipeline mode
  streaming=chunked    Enable streaming protocol

Streaming Protocol

For large result sets, streaming avoids materializing the entire result in memory:

Streaming Protocol (STREAM:v1)
STREAM:v1\n — header (negotiated)
CHUNK:<len>\n<data> — uncompressed chunk
ZCHUNK:<comp>:<uncomp>\n<data> — zstd compressed chunk
END:<count>,<timeMS>\n — terminator with stats

The streaming chunk size defaults to 256 documents. The execution engine pulls documents from an IteratorNode, batches them into chunks, and sends each chunk over the wire as it's produced.

Session Manager

Sessions are managed in a 64-shard storage with per-shard RWMutex. Lock-free secondary indexes (via sync.Map) allow fast lookup by username or connection ID. Each session is cryptographically bound to the client's IP address and user-agent fingerprint.

Rate Limiting & Throttling

  • Per-IP rate limiting — 32-shard design with immutable whitelist set and atomic global connection counter
  • Large query throttling — semaphore limits concurrent full scans to 15, preventing any single query pattern from starving others

Setting | Default | Description
maxConnections | 1000 | Maximum concurrent connections
streamingChunkSize | 256 | Documents per streaming chunk
maxOpenCursorsPerSession | 64 | Cursor limit per session
queryTimeoutSeconds | 300 | Maximum query execution time