Design Philosophy¶
arandu is designed around two foundations: software engineering principles that make it reliable and extensible, and cognitive science models that inform its architecture. This page covers both - the engineering decisions and the neuroscience parallels that inspired them.
Engineering Principles¶
Protocol-Based Dependency Injection¶
The SDK uses Python's typing.Protocol for all external dependencies (LLM, embeddings). No inheritance required - just implement the method signatures:
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[dict],
        temperature: float = 0,
        response_format: dict | None = None,
        max_tokens: int | None = None,
    ) -> LLMResult: ...
```
Why: Vendor lock-in kills adoption. By using structural subtyping (duck typing), any LLM provider works without inheriting from a base class. The OpenAI provider is included for convenience, but you can swap in Anthropic, local models, or custom endpoints with zero SDK changes.
Fail-Safe by Default¶
Every stage of the pipeline has fallback behavior:
| Stage | Failure | Fallback |
|---|---|---|
| Informed Extraction | LLM timeout/error | Fall back to legacy blind extraction + reconciliation |
| Extraction (legacy) | LLM timeout/error | Return empty extraction; event still logged |
| Entity Resolution | LLM fallback fails | Create new entity (prefer duplicates over lost data) |
| Reconciliation | LLM error | Default to ADD |
| Reranking | Reranker fails | Keep original ranking |
| Background jobs | Any job fails | Other jobs proceed independently |
Why: In a production AI agent, memory is a supporting system - it should never crash the main flow. A degraded response (missing some context) is always better than an error.
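One way to express this fail-safe policy is a decorator that swaps in a degraded result instead of raising. This is a sketch of the pattern, not the SDK's actual error handling:

```python
import logging

logger = logging.getLogger("memory")

def fail_safe(fallback):
    """Decorator: on any stage failure, log it and return a degraded result."""
    def wrap(fn):
        def inner(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                logger.exception("%s degraded, using fallback", fn.__name__)
                return fallback
        return inner
    return wrap

@fail_safe(fallback=[])  # extraction failure -> empty extraction, pipeline continues
def extract_facts(message):
    raise TimeoutError("LLM timed out")  # simulate a provider failure
```

The same shape covers the other rows of the table: the fallback for entity resolution is "create a new entity", and for reranking it is the identity function on the original ranking.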
Composition Over Inheritance¶
The SDK has no abstract base classes, no deep class hierarchies. It's built from small, focused modules composed into pipelines:
- Write path: write/extract.py → write/entity_resolution.py → write/reconcile.py → write/upsert.py
- Read path: read/retrieval_agent.py (deterministic planner) → read/retrieval.py → read/reranker.py
Why: Each module has a single responsibility with clear inputs and outputs. You can understand, test, and replace any module independently. This follows the Unix philosophy: do one thing well.
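The composition style can be sketched in a few lines. The stage bodies below are hypothetical stand-ins, not the SDK's actual modules - the point is that each stage is an ordinary function and the pipeline is just their composition:

```python
from functools import reduce

def pipeline(*stages):
    # Compose small single-purpose callables left to right.
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)

# Hypothetical stand-ins for the write-path modules:
def extract(message):
    return {"facts": [s for s in message.split(". ") if s]}

def resolve_entities(state):
    state["entities"] = sorted({fact.split()[0] for fact in state["facts"]})
    return state

def upsert(state):
    return state  # a real module would persist here

write_path = pipeline(extract, resolve_entities, upsert)
```

Replacing any stage means passing a different function - no subclassing, no registry.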
Savepoint-Based Transaction Safety¶
Write operations use database savepoints (session.begin_nested()) so that a failure in one fact doesn't abort the entire batch:
```python
async with session.begin_nested():
    # If this fails, only this savepoint rolls back
    session.add(new_fact)
    await session.flush()
```
Why: In a pipeline that processes multiple facts per message, atomic all-or-nothing transactions are too fragile. Savepoints give per-fact atomicity while keeping the outer transaction alive.
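The effect is easy to demonstrate with plain SQL savepoints. Here sqlite3 stands in for the SDK's async SQLAlchemy session, and the one-column schema is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage the transaction manually
conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, body TEXT UNIQUE)")

facts = ["likes tea", "likes tea", "works at Acme"]  # the duplicate will fail

conn.execute("BEGIN")
for i, body in enumerate(facts):
    conn.execute(f"SAVEPOINT fact_{i}")
    try:
        conn.execute("INSERT INTO facts (body) VALUES (?)", (body,))
        conn.execute(f"RELEASE SAVEPOINT fact_{i}")
    except sqlite3.IntegrityError:
        # Only this fact rolls back; the rest of the batch survives.
        conn.execute(f"ROLLBACK TO SAVEPOINT fact_{i}")
conn.execute("COMMIT")

stored = [row[0] for row in conn.execute("SELECT body FROM facts ORDER BY id")]
```

After the commit, stored contains the two distinct facts: the failing duplicate rolled back alone while the outer transaction stayed alive.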
Neuroscience Parallels¶
The architecture of arandu draws from established models in cognitive neuroscience. Each parallel below maps a system component to its biological counterpart.
Encoding: The Write Pipeline¶
System: Message → Alias Lookup → Pre-retrieval → Informed Extraction → Resolve → Upsert
Brain: Sensory input → Orienting response → Schema activation → Encoding → Association → Storage
When you experience something, your brain doesn't record a raw video. It encodes a selective representation - extracting salient features, linking them to existing knowledge, and storing the result in a form that can be retrieved later. The write pipeline does the same:
- Alias lookup + pre-retrieval is the orienting response: the brain compares incoming stimuli against existing schemas before committing anything to long-term storage. Known information triggers repetition suppression (lower neural firing), while novel or contradictory information triggers enhanced encoding (the novelty/mismatch signal from the hippocampus). Informed extraction replicates this -- the LLM sees what's already known and only extracts what's genuinely new or changed.
- Informed extraction is perception-with-context: an LLM extracts all factual information from the raw message, guided by prior knowledge. Just as human encoding is shaped by what you already know (schema-dependent encoding), the LLM receives existing entity profiles and facts as context. The reconciliation step (ADD/UPDATE/NOOP/DELETE) handles deduplication downstream, maximizing recall of specific details.
- Entity resolution is association: linking new mentions to existing memory traces.
- Reconciliation (fallback path) is reconsolidation: updating existing memories when new information arrives.
- Upsert is storage: committing the processed trace to long-term memory.
Associative Memory: Entity Resolution¶
System: 3-phase resolution (exact → fuzzy → LLM)
Brain: Pattern completion in hippocampal-neocortical circuits
The brain doesn't store memories as isolated records - it stores them as patterns of activation across neural networks. When you encounter a partial cue ("Carol"), your brain completes the pattern to retrieve the full representation ("Carolina, my colleague from work").
Entity resolution mirrors this process:
- Exact match = direct retrieval (strong, well-established associations)
- Fuzzy match = pattern completion (partial cue activates the most similar existing pattern)
- LLM fallback = deliberate recall (conscious effort to disambiguate when automatic retrieval fails)
The fuzzy threshold (0.85) and LLM fallback range (0.50-0.85) model the brain's confidence gradient: strong matches are automatic, ambiguous matches require deliberation.
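A sketch of the three phases, using difflib's string similarity as a stand-in for the SDK's fuzzy scorer and a placeholder where the LLM would be consulted:

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85
LLM_BAND = (0.50, 0.85)  # ambiguous zone: escalate to deliberate (LLM) recall

def resolve(mention, known_entities):
    # Phase 1: exact match - direct retrieval.
    if mention in known_entities:
        return ("exact", mention)
    # Phase 2: fuzzy match - pattern completion from a partial cue.
    best, score = None, 0.0
    for name in known_entities:
        s = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        if s > score:
            best, score = name, s
    if score >= FUZZY_THRESHOLD:
        return ("fuzzy", best)
    # Phase 3: ambiguous - an LLM would disambiguate here.
    if LLM_BAND[0] <= score < LLM_BAND[1]:
        return ("llm_candidate", best)
    # No plausible match: create a new entity rather than lose data.
    return ("new", mention)
```

With known_entities = ["Carolina", "Bob"], the partial cue "Carol" scores about 0.77 against "Carolina" - below the automatic threshold but inside the LLM band, so it goes to deliberate recall rather than being silently merged or duplicated.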
Reconsolidation: Fact Reconciliation¶
System: ADD / UPDATE / NOOP / DELETE decisions
Brain: Memory reconsolidation (Nader, Schiller, & LeDoux, 2000)
When a memory is retrieved, it enters a labile state where it can be modified. This is reconsolidation - the brain's mechanism for updating memories with new information while preserving the original trace.
The reconciliation stage models this process:
- NOOP = retrieval without modification (memory confirmed, last_confirmed_at updated)
- UPDATE = reconsolidation (old memory superseded, new version created with provenance link via supersedes_fact_id)
- ADD = new encoding (no existing memory to reconsolidate)
- DELETE = active forgetting (explicit retraction, modeled by setting invalidated_at)
The fact versioning system (valid_from, valid_to, supersedes_fact_id) preserves the full history - just as the brain retains traces of original memories even after reconsolidation.
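A minimal sketch of the versioning mechanics for an UPDATE decision. The Fact shape below is illustrative; only the field names valid_from, valid_to, and supersedes_fact_id come from the SDK:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    id: int
    body: str
    valid_from: datetime
    valid_to: Optional[datetime] = None
    supersedes_fact_id: Optional[int] = None

def apply_update(store, old_id, new_body):
    """Reconsolidation: close the old version, link the new one to it."""
    now = datetime.now(timezone.utc)
    old = store[old_id]
    old.valid_to = now  # the original trace is preserved, just no longer current
    new = Fact(id=max(store) + 1, body=new_body, valid_from=now,
               supersedes_fact_id=old.id)
    store[new.id] = new
    return new
```

Note that nothing is deleted: the superseded fact keeps its full record, and the supersedes_fact_id chain lets you walk the history of any fact backwards.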
Spreading Activation: Graph Retrieval¶
System: BFS 2-hop traversal with decay factor
Brain: Spreading activation in semantic networks (Collins & Loftus, 1975)
In Collins and Loftus's model, when a concept is activated (e.g., "fire engine"), activation spreads along associative links to related concepts ("red", "truck", "emergency"), with strength decreasing as distance increases.
Graph retrieval implements this directly:
- Seed entities from the query activate the starting nodes
- Hop 1 activates direct neighbors (no pruning - all connections fire)
- Hop 2 activates second-degree connections (pruned by min_edge_strength)
- Decay factor (0.50 per hop) models the attenuation of activation over distance
- Edge strength models the associative strength between concepts (reinforced by repeated co-mention)
The query_bonus (1.5×) for entities whose names appear in the query models top-down priming - when you explicitly mention an entity, its connections are more strongly activated.
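The traversal can be sketched as follows. The decay (0.5) and query_bonus (1.5) mirror the description above; the adjacency-list graph format and the min_edge_strength default are assumptions:

```python
def spread_activation(graph, seeds, query_entities,
                      decay=0.5, min_edge_strength=0.3, query_bonus=1.5):
    """graph: {entity: [(neighbor, edge_strength), ...]}"""
    # Seed activation, with top-down priming for explicitly mentioned entities.
    activation = {s: (query_bonus if s in query_entities else 1.0) for s in seeds}
    frontier = dict(activation)
    for hop in (1, 2):
        reached = {}
        for node, act in frontier.items():
            for neighbor, strength in graph.get(node, []):
                if hop == 2 and strength < min_edge_strength:
                    continue  # hop 2 is pruned; at hop 1 all connections fire
                a = act * decay * strength  # attenuation over distance
                reached[neighbor] = max(reached.get(neighbor, 0.0), a)
        for neighbor, a in reached.items():
            if a > activation.get(neighbor, 0.0):
                activation[neighbor] = a
        frontier = reached
    return activation
```

Each hop multiplies by decay and by edge strength, so a second-degree neighbor reached through two strong edges can still outscore a first-degree neighbor reached through one weak edge - the same graded behavior the Collins and Loftus model predicts.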
Sleep-Time Compute: Background Processing¶
System: Clustering, consolidation, importance scoring, summary refresh
Brain: Memory consolidation during sleep (Diekelmann & Born, 2010)
During sleep, the brain performs critical maintenance:
- Hippocampal replay - Recent experiences are replayed in compressed form, transferring them from short-term (hippocampal) to long-term (neocortical) storage
- Synaptic homeostasis - Strongly activated synapses are maintained while weakly activated ones are pruned (Tononi & Cirelli)
- Pattern detection - The neocortex detects statistical regularities across episodes
- Gist extraction - Detailed episodic memories are compressed into semantic knowledge
The background jobs map to these processes:
| Brain process | System job | Mechanism |
|---|---|---|
| Hippocampal replay | Consolidation | Reviews recent events, detects patterns and contradictions |
| Synaptic homeostasis | Importance scoring | Scores entities by density + recency + retrieval frequency + connectivity |
| Pattern detection | Community detection | Finds groups of related entities via graph analysis |
| Gist extraction | Summary refresh + Memify | Generates compressed summaries from detailed facts |
Forgetting Curve: Vitality and Recency¶
System: Recency decay, vitality scoring, importance-based pruning
Brain: Ebbinghaus forgetting curve (1885)
Hermann Ebbinghaus demonstrated that memory retention decays exponentially over time, but each retrieval (practice) resets the curve and slows future decay. This is the spacing effect - one of the most robust findings in memory research.
arandu models this with:
- Recency decay - Exponential decay with configurable half-life (recency_half_life_days). Recent facts score higher. This models the basic forgetting curve.
- Retrieval reinforcement - Each NOOP decision (fact confirmed during write) updates last_confirmed_at, effectively "practicing" the fact and resetting its decay curve.
- Vitality scoring - Combines recency, confirmation recency (last_confirmed_at), and importance to determine how "alive" a fact is. Low-vitality facts are candidates for consolidation or pruning.
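The decay itself is a one-line formula. In this sketch the parameter name recency_half_life_days comes from the configuration above, while the 30-day default and the use of last_confirmed_at as the reset anchor are assumptions:

```python
from datetime import datetime, timezone

def recency_score(fact_time, last_confirmed_at=None, now=None,
                  recency_half_life_days=30.0):
    """Exponential forgetting curve: the score halves every half-life.

    A NOOP confirmation updates last_confirmed_at, which resets the
    clock - the retrieval-reinforcement effect described above.
    """
    now = now or datetime.now(timezone.utc)
    anchor = last_confirmed_at or fact_time
    age_days = (now - anchor).total_seconds() / 86400.0
    return 0.5 ** (age_days / recency_half_life_days)
```

A fact created 30 days ago scores 0.5; confirm it today and the score snaps back to 1.0, exactly as Ebbinghaus's practice-resets-the-curve finding suggests.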
Selective Attention: Reranking¶
System: LLM reranker on retrieval candidates
Brain: Selective attention (Broadbent, 1958; Treisman, 1964)
The brain doesn't process all sensory input equally - selective attention filters and prioritizes information based on current goals. The cocktail party effect demonstrates this: you can focus on one conversation in a noisy room by filtering out irrelevant signals.
The reranker acts as the attention filter:
- Raw retrieval signals (semantic, keyword, graph) produce a broad set of candidates - like the full sensory input
- The reranker evaluates each candidate against the query intent - like attentional selection
- Only the most relevant facts pass through to the context - like the attended signal
This is why the reranker uses an LLM (not just scoring heuristics): attention is goal-directed and requires understanding the meaning of both query and candidates.
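A sketch of the attention-filter shape, with the fail-safe behavior noted earlier (reranker failure keeps the original ranking) baked in. Here score_fn stands in for the LLM's relevance judgment:

```python
def rerank(query, candidates, score_fn, top_k=5):
    # score_fn stands in for an LLM judging each candidate against the query.
    try:
        scored = sorted(enumerate(candidates),
                        key=lambda pair: -score_fn(query, pair[1]))
        return [cand for _, cand in scored][:top_k]
    except Exception:
        return list(candidates)[:top_k]  # fail-safe: keep the original ranking
```

Because Python's sort is stable and the candidates are enumerated, ties preserve the original retrieval order - the upstream signals still break ties when the attention filter has no preference.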
Working Memory: Context Budget¶
System: Token budget with facts/patterns/events sections
Brain: Working memory (Baddeley & Hitch, 1974; Cowan, 2001)
Working memory has a strict capacity limit - Cowan estimates 4±1 items can be held in the focus of attention simultaneously. The context budget models this constraint:
- Token budget = capacity limit (you can't send infinite context to an LLM)
- Facts (80% budget) = focus of attention (the most relevant facts for the current query, as a clean bullet list)
- Patterns (20% budget) = activated long-term memory (meta-observations and trends)
- Events (overflow) = peripheral activation (recent conversation snippets for episodic context)
This tiered approach ensures the LLM receives a focused, prioritized context rather than a noisy dump of everything the system knows. The format is clean and free of internal metadata -- no timestamps on facts, no entity prefixes, no confidence scores.
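One way to sketch the budgeting: the 80/20 split comes from the section above, while the word-count tokenizer and the greedy fill are simplifications of whatever the SDK actually does:

```python
def pack_context(facts, patterns, events, budget,
                 count=lambda s: len(s.split())):  # word count stands in for a tokenizer
    def take(items, limit):
        chosen, used = [], 0
        for item in items:  # items assumed pre-sorted by relevance
            cost = count(item)
            if used + cost > limit:
                break
            chosen.append(item)
            used += cost
        return chosen, used

    fact_lines, used_f = take(facts, int(budget * 0.8))        # focus of attention
    pattern_lines, used_p = take(patterns, int(budget * 0.2))  # activated LTM
    event_lines, _ = take(events, budget - used_f - used_p)    # overflow -> episodes
    return {"facts": fact_lines, "patterns": pattern_lines, "events": event_lines}
```

Events only receive whatever the higher-priority sections leave unspent, which is what keeps episodic snippets from crowding out the facts the query actually needs.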
Summary Table¶
| System Component | Neuroscience Model | Key Reference |
|---|---|---|
| Write Pipeline | Encoding | - |
| Informed Extraction | Orienting response / Schema-dependent encoding | Sokolov (1963); Tulving & Kroll (1995) |
| Entity Profiles | Schema activation (write-time only) | Meyer & Schvaneveldt (1971) |
| Entity Resolution | Associative memory / Pattern completion | - |
| Reconciliation | Reconsolidation | Nader, Schiller, & LeDoux (2000) |
| Graph Retrieval | Spreading activation | Collins & Loftus (1975) |
| Recency Decay | Forgetting curve | Ebbinghaus (1885) |
| Background Jobs | Sleep consolidation | Diekelmann & Born (2010) |
| Importance Scoring | Synaptic homeostasis | Tononi & Cirelli (SHY) |
| Summary Refresh | Gist memory formation | - |
| Reranking | Selective attention | Broadbent (1958) |
| Context Budget | Working memory capacity | Baddeley & Hitch (1974); Cowan (2001) |
| Vitality/Reinforcement | Spacing effect | Ebbinghaus (1885) |
These are analogies, not claims
The parallels above are architectural inspirations, not scientific claims. arandu is an engineering system, not a cognitive model. The brain is vastly more complex - these parallels highlight the design intuitions, not the biological mechanisms.