Custom Providers¶
arandu uses Python protocols for dependency injection. You can use any LLM or embedding backend by implementing two simple interfaces - no inheritance required.
Built-in Providers¶
The SDK includes two built-in providers:
| Provider | Install | LLM | Embeddings |
|---|---|---|---|
| OpenAI | `pip install arandu[openai]` | ✅ GPT-4o, GPT-4o-mini, etc. | ✅ text-embedding-3-small, etc. |
| Anthropic | `pip install arandu[anthropic]` | ✅ Claude Sonnet, Opus, Haiku | ❌ Use OpenAI for embeddings |
```python
# OpenAI (LLM + embeddings in one provider)
from arandu import MemoryClient
from arandu.providers.openai import OpenAIProvider

provider = OpenAIProvider(api_key="sk-...")
memory = MemoryClient(database_url="...", llm=provider, embeddings=provider)
```

```python
# Anthropic (Claude for LLM, OpenAI for embeddings)
from arandu import MemoryClient
from arandu.providers.anthropic import AnthropicProvider
from arandu.providers.openai import OpenAIProvider

llm = AnthropicProvider(api_key="sk-ant-...")
embeddings = OpenAIProvider(api_key="sk-...")
memory = MemoryClient(database_url="...", llm=llm, embeddings=embeddings)
```
OpenAI-Compatible Providers¶
OpenAIProvider works with any API that follows the OpenAI chat completions format. Just set base_url to point at the provider's endpoint:
```python
from arandu.providers.openai import OpenAIProvider

# DeepSeek
llm = OpenAIProvider(api_key="sk-deepseek-...", model="deepseek-chat", base_url="https://api.deepseek.com/v1")

# Groq
llm = OpenAIProvider(api_key="gsk_...", model="llama-3.3-70b-versatile", base_url="https://api.groq.com/openai/v1")

# Together AI
llm = OpenAIProvider(api_key="tog_...", model="meta-llama/Llama-3.3-70B-Instruct-Turbo", base_url="https://api.together.xyz/v1")

# Fireworks AI
llm = OpenAIProvider(api_key="fw_...", model="accounts/fireworks/models/llama-v3p3-70b-instruct", base_url="https://api.fireworks.ai/inference/v1")

# Ollama (local)
llm = OpenAIProvider(api_key="ollama", model="llama3.1", base_url="http://localhost:11434/v1")
```
This covers LLM calls only. Embeddings still require OpenAI or a custom EmbeddingProvider since most of these providers don't offer an embedding API.
If the built-in providers cover your use case, you don't need to read the rest of this page.
The Protocols¶
If you need a different provider (Ollama, LiteLLM, Groq, etc.), implement the protocols:
LLMProvider¶
```python
from typing import Protocol

from arandu.protocols import LLMResult, TokenUsage

class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[dict],
        temperature: float = 0,
        response_format: dict | None = None,
        max_tokens: int | None = None,
    ) -> LLMResult: ...
```
| Parameter | Description |
|---|---|
| `messages` | List of message dicts with `"role"` and `"content"` keys (OpenAI format) |
| `temperature` | Sampling temperature (0 = deterministic) |
| `response_format` | Optional format spec (e.g., `{"type": "json_object"}`) |
| `max_tokens` | Optional maximum tokens for the response |
| Returns | `LLMResult(text="...", usage=TokenUsage(...))` |
JSON mode support
The pipeline relies on JSON-mode responses (response_format={"type": "json_object"}).
If your backend doesn't support this natively, append a JSON instruction to the system prompt.
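One workaround is to rewrite the message list before sending it. A minimal sketch, assuming nothing beyond the OpenAI-style message format (the helper name `ensure_json_instruction` is hypothetical, not an SDK function):

```python
def ensure_json_instruction(messages: list[dict]) -> list[dict]:
    """Hypothetical helper: force JSON output on backends without native JSON mode.

    Appends a JSON-only instruction to the system message, or prepends one
    if the conversation has no system message yet.
    """
    instruction = "Respond with a single valid JSON object and nothing else."
    out = [dict(m) for m in messages]  # shallow copies so the caller's dicts stay untouched
    for m in out:
        if m.get("role") == "system":
            m["content"] = f"{m['content']}\n\n{instruction}"
            return out
    return [{"role": "system", "content": instruction}, *out]
```

Your provider's `complete()` would call this only when `response_format={"type": "json_object"}` is passed.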
EmbeddingProvider¶
```python
from typing import Protocol

class EmbeddingProvider(Protocol):
    async def embed(self, texts: list[str]) -> list[list[float]]: ...
    async def embed_one(self, text: str) -> list[float] | None: ...
```
| Method | Description |
|---|---|
| `embed(texts)` | Generate embeddings for a batch of texts. Returns one vector per input. |
| `embed_one(text)` | Generate an embedding for a single text. Returns `None` if empty/invalid. |
Embedding dimensions
The default embedding_dimensions is 1536 (OpenAI text-embedding-3-small).
If your provider uses different dimensions, set MemoryConfig(embedding_dimensions=...).
Example: Local Model Provider¶
For running with local models (e.g., via Ollama):
```python
import httpx

from arandu.protocols import LLMResult, TokenUsage


class OllamaProvider:
    """LLM + Embedding provider using a local Ollama server."""

    def __init__(
        self,
        base_url: str = "http://localhost:11434",
        model: str = "llama3.1",
        embedding_model: str = "nomic-embed-text",
    ) -> None:
        self._base_url = base_url
        self._model = model
        self._embedding_model = embedding_model
        self._client = httpx.AsyncClient(timeout=60.0)

    # -- LLMProvider --

    async def complete(
        self,
        messages: list[dict],
        temperature: float = 0,
        response_format: dict | None = None,
        max_tokens: int | None = None,
    ) -> LLMResult:
        payload: dict = {
            "model": self._model,
            "messages": messages,
            "stream": False,
            "options": {"temperature": temperature},
        }
        if response_format and response_format.get("type") == "json_object":
            payload["format"] = "json"
        response = await self._client.post(
            f"{self._base_url}/api/chat",
            json=payload,
        )
        response.raise_for_status()
        text = response.json()["message"]["content"]
        return LLMResult(text=text, usage=None)  # Ollama doesn't report usage

    # -- EmbeddingProvider --

    async def embed(self, texts: list[str]) -> list[list[float]]:
        results = []
        for text in texts:
            if not text.strip():
                continue
            response = await self._client.post(
                f"{self._base_url}/api/embed",
                json={"model": self._embedding_model, "input": text},
            )
            response.raise_for_status()
            results.append(response.json()["embeddings"][0])
        return results

    async def embed_one(self, text: str) -> list[float] | None:
        if not text or not text.strip():
            return None
        results = await self.embed([text])
        return results[0] if results else None
```
Embedding dimensions
When using local models, configure the dimensions:
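For example, `nomic-embed-text` produces 768-dimensional vectors, so a sketch would be (how the config object is passed to `MemoryClient` may differ in your SDK version, so check its signature):

```python
from arandu import MemoryConfig

config = MemoryConfig(embedding_dimensions=768)  # nomic-embed-text outputs 768-dim vectors
```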
Testing Your Provider¶
Verify your provider works before going to production:
```python
import asyncio

from arandu import MemoryClient, MemoryConfig


async def test_provider():
    provider = YourProvider(...)
    memory = MemoryClient(
        database_url="postgresql+psycopg://memory:memory@localhost/memory",
        llm=provider,
        embeddings=provider,
    )
    await memory.initialize()
    try:
        # Test write
        result = await memory.write(
            agent_id="test",
            message="Testing the provider. My name is Alice and I work at Acme.",
        )
        assert len(result.facts_added) > 0, "No facts extracted — check LLM responses"
        assert len(result.entities_resolved) > 0, "No entities resolved"
        print(f"Write OK: {len(result.facts_added)} facts, {len(result.entities_resolved)} entities")

        # Test retrieve
        context = await memory.retrieve(agent_id="test", query="who is Alice?")
        assert len(context.facts) > 0, "No facts retrieved — check embeddings"
        print(f"Retrieve OK: {len(context.facts)} facts found")
        print(f"Context: {context.context}")
    finally:
        await memory.close()

asyncio.run(test_provider())
```
Key Requirements¶
- **LLMResult** - `complete()` returns `LLMResult(text=..., usage=...)`, not `str`. If your backend doesn't report usage, pass `usage=None`.
- **JSON mode** - The pipeline sends `response_format={"type": "json_object"}` frequently. Your provider must return valid JSON when this is set.
- **Async** - Both protocols are async (`async def`). If your backend SDK is synchronous, wrap calls with `asyncio.to_thread()`.
- **Empty/error handling** - `embed_one` returns `None` for empty input. `embed` returns `[]` for empty input.
- **Timeout** - Add timeouts to your provider. The SDK sets timeouts on its side, but provider-level timeouts add safety.
- **Embedding dimensions** - Set `MemoryConfig(embedding_dimensions=N)` to match your provider's output dimensions.