Design Philosophy¶
arandu is designed around two foundations: software engineering principles that make it reliable and extensible, and cognitive science models that inform its architecture. This page covers both - the engineering decisions and the neuroscience parallels that inspired them.
Engineering Principles¶
Protocol-Based Dependency Injection¶
The SDK uses Python's typing.Protocol for all external dependencies (LLM, embeddings). No inheritance required - just implement the method signatures:
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[dict],
        temperature: float = 0,
        response_format: dict | None = None,
        max_tokens: int | None = None,
    ) -> LLMResult: ...
```
Why: Vendor lock-in kills adoption. By using structural subtyping (duck typing), any LLM provider works without inheriting from a base class. The OpenAI provider is included for convenience, but you can swap in Anthropic, local models, or custom endpoints with zero SDK changes.
Fail-Safe by Default¶
Every stage of the pipeline has fallback behavior:
| Stage | Failure | Fallback |
|---|---|---|
| Informed Extraction | LLM timeout/error | Fall back to legacy blind extraction + reconciliation |
| Extraction (legacy) | LLM timeout/error | Return empty extraction; event still logged |
| Entity Resolution | LLM fallback fails | Create new entity (prefer duplicates over lost data) |
| Reconciliation | LLM error | Default to ADD |
| Reranking | Reranker fails | Keep original ranking |
| Background jobs | Any job fails | Other jobs proceed independently |
Why: In a production AI agent, memory is a supporting system - it should never crash the main flow. A degraded response (missing some context) is always better than an error.
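One way to express this fail-safe policy is a decorator that swaps in a degraded result instead of raising. This is a sketch of the pattern, not the SDK's actual error handling:

```python
import logging

logger = logging.getLogger("memory")

def fail_safe(fallback):
    """Decorator: on any stage failure, log it and return a degraded result."""
    def wrap(fn):
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                logger.exception("%s degraded, using fallback", fn.__name__)
                return fallback
        return inner
    return wrap

@fail_safe(fallback=[])  # extraction failure -> empty extraction, pipeline continues
def extract_facts(message):
    raise TimeoutError("LLM timed out")  # simulate a provider failure
```

The same shape covers the other rows of the table: the fallback for entity resolution is "create a new entity", and for reranking it is the identity function on the original ranking.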
Composition Over Inheritance¶
The SDK has no abstract base classes, no deep class hierarchies. It's built from small, focused modules composed into pipelines:
- Write path: write/extract.py → write/entity_resolution.py → write/reconcile.py → write/upsert.py
- Read path: read/retrieval_agent.py (deterministic planner) → read/retrieval.py → read/reranker.py
Why: Each module has a single responsibility with clear inputs and outputs. You can understand, test, and replace any module independently. This follows the Unix philosophy: do one thing well.
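The composition style can be sketched in a few lines. The stage bodies below are hypothetical stand-ins, not the SDK's actual modules - the point is that each stage is an ordinary function and the pipeline is just their composition:

```python
from functools import reduce

def pipeline(*stages):
    # Compose small single-purpose callables left to right.
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)

# Hypothetical stand-ins for the write-path modules:
def extract(message):
    return {"facts": [s for s in message.split(". ") if s]}

def resolve_entities(state):
    state["entities"] = sorted({fact.split()[0] for fact in state["facts"]})
    return state

def upsert(state):
    return state  # a real module would persist here

write_path = pipeline(extract, resolve_entities, upsert)
```

Replacing any stage means passing a different function - no subclassing, no registry.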
Savepoint-Based Transaction Safety¶
Write operations use database savepoints (session.begin_nested()) so that a failure in one fact doesn't abort the entire batch:
```python
async with session.begin_nested():
    # If this fails, only this savepoint rolls back
    session.add(new_fact)
    await session.flush()
```
Why: In a pipeline that processes multiple facts per message, atomic all-or-nothing transactions are too fragile. Savepoints give per-fact atomicity while keeping the outer transaction alive.
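The effect is easy to demonstrate with plain SQL savepoints. Here sqlite3 stands in for the SDK's async SQLAlchemy session, and the one-column schema is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage the transaction manually
conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, body TEXT UNIQUE)")

facts = ["likes tea", "likes tea", "works at Acme"]  # the duplicate will fail

conn.execute("BEGIN")
for i, body in enumerate(facts):
    conn.execute(f"SAVEPOINT fact_{i}")
    try:
        conn.execute("INSERT INTO facts (body) VALUES (?)", (body,))
        conn.execute(f"RELEASE SAVEPOINT fact_{i}")
    except sqlite3.IntegrityError:
        # Only this fact rolls back; the rest of the batch survives.
        conn.execute(f"ROLLBACK TO SAVEPOINT fact_{i}")
conn.execute("COMMIT")

stored = [row[0] for row in conn.execute("SELECT body FROM facts ORDER BY id")]
```

After the commit, stored contains the two distinct facts: the failing duplicate rolled back alone while the outer transaction stayed alive.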
Neuroscience Parallels¶
The architecture of arandu draws from established models in cognitive neuroscience. Each parallel below maps a system component to its biological counterpart.
Encoding: The Write Pipeline¶
System: Message → Alias Lookup → Pre-retrieval → Informed Extraction → Resolve → Upsert
Brain: Sensory input → Orienting response → Schema activation → Encoding → Association → Storage
When you experience something, your brain doesn't record a raw video. It encodes a selective representation - extracting salient features, linking them to existing knowledge, and storing the result in a form that can be retrieved later. The write pipeline does the same:
- Alias lookup + pre-retrieval is the orienting response: the brain compares incoming stimuli against existing schemas before committing anything to long-term storage. Known information triggers repetition suppression (lower neural firing), while novel or contradictory information triggers enhanced encoding (the novelty/mismatch signal from the hippocampus). Informed extraction replicates this -- the LLM sees what's already known and only extracts what's genuinely new or changed.
- Informed extraction is perception-with-context: an LLM extracts all factual information from the raw message, guided by prior knowledge. Just as human encoding is shaped by what you already know (schema-dependent encoding), the LLM receives existing entity profiles and facts as context. The reconciliation step (ADD/UPDATE/NOOP/DELETE) handles deduplication downstream, maximizing recall of specific details.
- Entity resolution is association: linking new mentions to existing memory traces.
- Reconciliation (fallback path) is reconsolidation: updating existing memories when new information arrives.
- Upsert is storage: committing the processed trace to long-term memory.
Associative Memory: Entity Resolution¶
System: 3-phase resolution (exact → fuzzy → LLM)
Brain: Pattern completion in hippocampal-neocortical circuits
The brain doesn't store memories as isolated records - it stores them as patterns of activation across neural networks. When you encounter a partial cue ("Carol"), your brain completes the pattern to retrieve the full representation ("Carolina, my colleague from work").
Entity resolution mirrors this process:
- Exact match = direct retrieval (strong, well-established associations)
- Fuzzy match = pattern completion (partial cue activates the most similar existing pattern)
- LLM fallback = deliberate recall (conscious effort to disambiguate when automatic retrieval fails)
The fuzzy threshold (0.85) and LLM fallback range (0.50-0.85) model the brain's confidence gradient: strong matches are automatic, ambiguous matches require deliberation.
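A sketch of the three phases, using difflib's string similarity as a stand-in for the SDK's fuzzy scorer and a placeholder where the LLM would be consulted:

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85
LLM_BAND = (0.50, 0.85)  # ambiguous zone: escalate to deliberate (LLM) recall

def resolve(mention, known_entities):
    # Phase 1: exact match - direct retrieval.
    if mention in known_entities:
        return ("exact", mention)
    # Phase 2: fuzzy match - pattern completion from a partial cue.
    best, score = None, 0.0
    for name in known_entities:
        s = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        if s > score:
            best, score = name, s
    if score >= FUZZY_THRESHOLD:
        return ("fuzzy", best)
    # Phase 3: ambiguous - an LLM would disambiguate here.
    if LLM_BAND[0] <= score < LLM_BAND[1]:
        return ("llm_candidate", best)
    # No plausible match: create a new entity rather than lose data.
    return ("new", mention)
```

With known_entities = ["Carolina", "Bob"], the partial cue "Carol" scores about 0.77 against "Carolina" - below the automatic threshold but inside the LLM band, so it goes to deliberate recall rather than being silently merged or duplicated.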
Reconsolidation: Fact Reconciliation¶
System: ADD / UPDATE / NOOP / DELETE decisions
Brain: Memory reconsolidation (Nader, Schiller, & LeDoux, 2000)
When a memory is retrieved, it enters a labile state where it can be modified. This is reconsolidation - the brain's mechanism for updating memories with new information while preserving the original trace.
The reconciliation stage models this process:
- NOOP = retrieval without modification (memory confirmed, last_confirmed_at updated)
- UPDATE = reconsolidation (old memory superseded, new version created with provenance link via supersedes_fact_id)
- ADD = new encoding (no existing memory to reconsolidate)
- DELETE = active forgetting (explicit retraction, modeled by setting invalidated_at)
The fact versioning system (valid_from, valid_to, supersedes_fact_id) preserves the full history - just as the brain retains traces of original memories even after reconsolidation.
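A minimal sketch of the versioning mechanics for an UPDATE decision. The Fact shape below is illustrative; only the field names valid_from, valid_to, and supersedes_fact_id come from the SDK:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    id: int
    body: str
    valid_from: datetime
    valid_to: Optional[datetime] = None
    supersedes_fact_id: Optional[int] = None

def apply_update(store, old_id, new_body):
    """Reconsolidation: close the old version, link the new one to it."""
    now = datetime.now(timezone.utc)
    old = store[old_id]
    old.valid_to = now  # the original trace is preserved, just no longer current
    new = Fact(id=max(store) + 1, body=new_body, valid_from=now,
               supersedes_fact_id=old.id)
    store[new.id] = new
    return new
```

Note that nothing is deleted: the superseded fact keeps its full record, and the supersedes_fact_id chain lets you walk the history of any fact backwards.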
Spreading Activation: Graph Retrieval¶
System: BFS 2-hop traversal with decay factor
Brain: Spreading activation in semantic networks (Collins & Loftus, 1975)
In Collins and Loftus's model, when a concept is activated (e.g., "fire engine"), activation spreads along associative links to related concepts ("red", "truck", "emergency"), with strength decreasing as distance increases.
Graph retrieval implements this directly:
- Seed entities from the query activate the starting nodes
- Hop 1 activates direct neighbors (no pruning - all connections fire)
- Hop 2 activates second-degree connections (pruned by min_edge_strength)
- Decay factor (0.50 per hop) models the attenuation of activation over distance
- Edge strength models the associative strength between concepts (reinforced by repeated co-mention)
The query_bonus (1.5×) for entities whose names appear in the query models top-down priming - when you explicitly mention an entity, its connections are more strongly activated.
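The traversal can be sketched as follows. The decay (0.5) and query_bonus (1.5) mirror the description above; the adjacency-list graph format and the min_edge_strength default are assumptions:

```python
def spread_activation(graph, seeds, query_entities,
                      decay=0.5, min_edge_strength=0.3, query_bonus=1.5):
    """graph: {entity: [(neighbor, edge_strength), ...]}"""
    # Seed activation, with top-down priming for explicitly mentioned entities.
    activation = {s: (query_bonus if s in query_entities else 1.0) for s in seeds}
    frontier = dict(activation)
    for hop in (1, 2):
        reached = {}
        for node, act in frontier.items():
            for neighbor, strength in graph.get(node, []):
                if hop == 2 and strength < min_edge_strength:
                    continue  # hop 2 is pruned; at hop 1 all connections fire
                a = act * decay * strength  # attenuation over distance
                reached[neighbor] = max(reached.get(neighbor, 0.0), a)
        for neighbor, a in reached.items():
            if a > activation.get(neighbor, 0.0):
                activation[neighbor] = a
        frontier = reached
    return activation
```

Each hop multiplies by decay and by edge strength, so a second-degree neighbor reached through two strong edges can still outscore a first-degree neighbor reached through one weak edge - the same graded behavior the Collins and Loftus model predicts.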
Sleep-Time Compute: Background Processing¶
System: Clustering, consolidation, importance scoring, summary refresh
Brain: Memory consolidation during sleep (Diekelmann & Born, 2010)
During sleep, the brain performs critical maintenance:
- Hippocampal replay - Recent experiences are replayed in compressed form, transferring them from short-term (hippocampal) to long-term (neocortical) storage
- Synaptic homeostasis - Strongly activated synapses are maintained while weakly activated ones are pruned (Tononi & Cirelli)
- Pattern detection - The neocortex detects statistical regularities across episodes
- Gist extraction - Detailed episodic memories are compressed into semantic knowledge
The background jobs map to these processes:
| Brain process | System job | Mechanism |
|---|---|---|
| Hippocampal replay | Consolidation | Reviews recent events, detects patterns and contradictions |
| Synaptic homeostasis | Importance scoring | Scores entities by density + recency + retrieval frequency + connectivity |
| Pattern detection | Community detection | Finds groups of related entities via graph analysis |
| Gist extraction | Summary refresh + Memify | Generates compressed summaries from detailed facts |
Forgetting Curve: Vitality and Recency¶
System: Recency decay, vitality scoring, importance-based pruning
Brain: Ebbinghaus forgetting curve (1885)
Hermann Ebbinghaus demonstrated that memory retention decays exponentially over time, but each retrieval (practice) resets the curve and slows future decay. This is the spacing effect - one of the most robust findings in memory research.
arandu models this with:
- Recency decay - Exponential decay with configurable half-life (recency_half_life_days). Recent facts score higher. This models the basic forgetting curve.
- Retrieval reinforcement - Each NOOP decision (fact confirmed during write) updates last_confirmed_at, effectively "practicing" the fact and resetting its decay curve.
- Vitality scoring - Combines recency, confirmation recency (last_confirmed_at), and importance to determine how "alive" a fact is. Low-vitality facts are candidates for consolidation or pruning.
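The decay itself is a one-line formula. In this sketch the parameter name recency_half_life_days comes from the configuration above, while the 30-day default and the use of last_confirmed_at as the reset anchor are assumptions:

```python
from datetime import datetime, timezone

def recency_score(fact_time, last_confirmed_at=None, now=None,
                  recency_half_life_days=30.0):
    """Exponential forgetting curve: the score halves every half-life.

    A NOOP confirmation updates last_confirmed_at, which resets the
    clock - the retrieval-reinforcement effect described above.
    """
    now = now or datetime.now(timezone.utc)
    anchor = last_confirmed_at or fact_time
    age_days = (now - anchor).total_seconds() / 86400.0
    return 0.5 ** (age_days / recency_half_life_days)
```

A fact created 30 days ago scores 0.5; confirm it today and the score snaps back to 1.0, exactly as Ebbinghaus's practice-resets-the-curve finding suggests.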
Selective Attention: Reranking¶
System: LLM reranker on retrieval candidates
Brain: Selective attention (Broadbent, 1958; Treisman, 1964)
The brain doesn't process all sensory input equally - selective attention filters and prioritizes information based on current goals. The cocktail party effect demonstrates this: you can focus on one conversation in a noisy room by filtering out irrelevant signals.
The reranker acts as the attention filter:
- Raw retrieval signals (semantic, keyword, graph) produce a broad set of candidates - like the full sensory input
- The reranker evaluates each candidate against the query intent - like attentional selection
- Only the most relevant facts pass through to the context - like the attended signal
This is why the reranker uses an LLM (not just scoring heuristics): attention is goal-directed and requires understanding the meaning of both query and candidates.
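A sketch of the attention-filter shape, with the fail-safe behavior noted earlier (reranker failure keeps the original ranking) baked in. Here score_fn stands in for the LLM's relevance judgment:

```python
def rerank(query, candidates, score_fn, top_k=5):
    # score_fn stands in for an LLM judging each candidate against the query.
    try:
        scored = sorted(enumerate(candidates),
                        key=lambda pair: -score_fn(query, pair[1]))
        return [cand for _, cand in scored][:top_k]
    except Exception:
        return list(candidates)[:top_k]  # fail-safe: keep the original ranking
```

Because Python's sort is stable and the candidates are enumerated, ties preserve the original retrieval order - the upstream signals still break ties when the attention filter has no preference.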
Working Memory: Context Budget¶
System: Token budget with facts/patterns/events sections
Brain: Working memory (Baddeley & Hitch, 1974; Cowan, 2001)
Working memory has a strict capacity limit - Cowan estimates 4±1 items can be held in the focus of attention simultaneously. The context budget models this constraint:
- Token budget = capacity limit (you can't send infinite context to an LLM)
- Facts (80% budget) = focus of attention (the most relevant facts for the current query, as a clean bullet list)
- Patterns (20% budget) = activated long-term memory (meta-observations and trends)
- Events (overflow) = peripheral activation (recent conversation snippets for episodic context)
This tiered approach ensures the LLM receives a focused, prioritized context rather than a noisy dump of everything the system knows. The format is clean and free of internal metadata -- no timestamps on facts, no entity prefixes, no confidence scores.
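One way to sketch the budgeting: the 80/20 split comes from the section above, while the word-count tokenizer and the greedy fill are simplifications of whatever the SDK actually does:

```python
def pack_context(facts, patterns, events, budget,
                 count=lambda s: len(s.split())):  # word count stands in for a tokenizer
    def take(items, limit):
        chosen, used = [], 0
        for item in items:  # items assumed pre-sorted by relevance
            cost = count(item)
            if used + cost > limit:
                break
            chosen.append(item)
            used += cost
        return chosen, used

    fact_lines, used_f = take(facts, int(budget * 0.8))        # focus of attention
    pattern_lines, used_p = take(patterns, int(budget * 0.2))  # activated LTM
    event_lines, _ = take(events, budget - used_f - used_p)    # overflow -> episodes
    return {"facts": fact_lines, "patterns": pattern_lines, "events": event_lines}
```

Events only receive whatever the higher-priority sections leave unspent, which is what keeps episodic snippets from crowding out the facts the query actually needs.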
Summary Table¶
| System Component | Neuroscience Model | Key Reference |
|---|---|---|
| Write Pipeline | Encoding | - |
| Informed Extraction | Orienting response / Schema-dependent encoding | Sokolov (1963); Tulving & Kroll (1995) |
| Entity Profiles | Schema activation (write-time only) | Meyer & Schvaneveldt (1971) |
| Entity Resolution | Associative memory / Pattern completion | - |
| Reconciliation | Reconsolidation | Nader, Schiller, & LeDoux (2000) |
| Graph Retrieval | Spreading activation | Collins & Loftus (1975) |
| Recency Decay | Forgetting curve | Ebbinghaus (1885) |
| Background Jobs | Sleep consolidation | Diekelmann & Born (2010) |
| Importance Scoring | Synaptic homeostasis | Tononi & Cirelli (SHY) |
| Summary Refresh | Gist memory formation | - |
| Reranking | Selective attention | Broadbent (1958) |
| Context Budget | Working memory capacity | Baddeley & Hitch (1974); Cowan (2001) |
| Vitality/Reinforcement | Spacing effect | Ebbinghaus (1885) |
These are analogies, not claims
The parallels above are architectural inspirations, not scientific claims. arandu is an engineering system, not a cognitive model. The brain is vastly more complex - these parallels highlight the design intuitions, not the biological mechanisms.