Custom Providers

arandu uses Python protocols for dependency injection. You can use any LLM or embedding backend by implementing two simple interfaces - no inheritance required.

Built-in Providers

The SDK includes two built-in providers:

| Provider  | Install                         | LLM                           | Embeddings                      |
|-----------|---------------------------------|-------------------------------|---------------------------------|
| OpenAI    | `pip install arandu[openai]`    | ✅ GPT-4o, GPT-4o-mini, etc.  | ✅ text-embedding-3-small, etc. |
| Anthropic | `pip install arandu[anthropic]` | ✅ Claude Sonnet, Opus, Haiku | ❌ Use OpenAI for embeddings    |
# OpenAI (LLM + embeddings in one provider)
from arandu import MemoryClient
from arandu.providers.openai import OpenAIProvider

provider = OpenAIProvider(api_key="sk-...")
memory = MemoryClient(database_url="...", llm=provider, embeddings=provider)

# Anthropic (Claude for LLM, OpenAI for embeddings)
from arandu.providers.anthropic import AnthropicProvider
from arandu.providers.openai import OpenAIProvider
llm = AnthropicProvider(api_key="sk-ant-...")
embeddings = OpenAIProvider(api_key="sk-...")
memory = MemoryClient(database_url="...", llm=llm, embeddings=embeddings)

OpenAI-Compatible Providers

OpenAIProvider works with any API that follows the OpenAI chat completions format. Just set base_url to point at the provider's endpoint:

from arandu.providers.openai import OpenAIProvider

# DeepSeek
llm = OpenAIProvider(api_key="sk-deepseek-...", model="deepseek-chat", base_url="https://api.deepseek.com/v1")

# Groq
llm = OpenAIProvider(api_key="gsk_...", model="llama-3.3-70b-versatile", base_url="https://api.groq.com/openai/v1")

# Together AI
llm = OpenAIProvider(api_key="tog_...", model="meta-llama/Llama-3.3-70B-Instruct-Turbo", base_url="https://api.together.xyz/v1")

# Fireworks AI
llm = OpenAIProvider(api_key="fw_...", model="accounts/fireworks/models/llama-v3p3-70b-instruct", base_url="https://api.fireworks.ai/inference/v1")

# Ollama (local)
llm = OpenAIProvider(api_key="ollama", model="llama3.1", base_url="http://localhost:11434/v1")

This covers LLM calls only. Embeddings still require OpenAI or a custom EmbeddingProvider since most of these providers don't offer an embedding API.

If the built-in providers cover your use case, you don't need to read the rest of this page.


The Protocols

If you need a different provider (Ollama, LiteLLM, Groq, etc.), implement the protocols:

LLMProvider

from typing import Protocol

from arandu.protocols import LLMResult, TokenUsage

class LLMProvider(Protocol):
    async def complete(
        self,
        messages: list[dict],
        temperature: float = 0,
        response_format: dict | None = None,
        max_tokens: int | None = None,
    ) -> LLMResult: ...
| Parameter         | Description                                                              |
|-------------------|--------------------------------------------------------------------------|
| `messages`        | List of message dicts with `"role"` and `"content"` keys (OpenAI format) |
| `temperature`     | Sampling temperature (0 = deterministic)                                 |
| `response_format` | Optional format spec (e.g., `{"type": "json_object"}`)                   |
| `max_tokens`      | Optional maximum tokens for the response                                 |
| Returns           | `LLMResult(text="...", usage=TokenUsage(...))`                           |

JSON mode support

The pipeline relies on JSON-mode responses (response_format={"type": "json_object"}). If your backend doesn't support this natively, append a JSON instruction to the system prompt.

EmbeddingProvider

class EmbeddingProvider(Protocol):
    async def embed(self, texts: list[str]) -> list[list[float]]: ...
    async def embed_one(self, text: str) -> list[float] | None: ...
| Method            | Description                                                                |
|-------------------|----------------------------------------------------------------------------|
| `embed(texts)`    | Generate embeddings for a batch of texts. Returns one vector per input.    |
| `embed_one(text)` | Generate an embedding for a single text. Returns `None` if empty/invalid.  |

Embedding dimensions

The default embedding_dimensions is 1536 (OpenAI text-embedding-3-small). If your provider uses different dimensions, set MemoryConfig(embedding_dimensions=...).
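If you're not sure what dimension your provider emits, you can probe it once at startup. A sketch against the `EmbeddingProvider` protocol:

```python
import asyncio


async def detect_dimensions(embeddings) -> int:
    """Embed a probe string and measure the resulting vector length."""
    vector = await embeddings.embed_one("dimension probe")
    if vector is None:
        raise RuntimeError("embedding provider returned no vector for non-empty input")
    return len(vector)
```

Feed the result into `MemoryConfig(embedding_dimensions=...)` before initializing the client.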


Example: Local Model Provider

For running with local models (e.g., via Ollama):

import httpx
from arandu.protocols import LLMResult, TokenUsage


class OllamaProvider:
    """LLM + Embedding provider using a local Ollama server."""

    def __init__(
        self,
        base_url: str = "http://localhost:11434",
        model: str = "llama3.1",
        embedding_model: str = "nomic-embed-text",
    ) -> None:
        self._base_url = base_url
        self._model = model
        self._embedding_model = embedding_model
        self._client = httpx.AsyncClient(timeout=60.0)

    # -- LLMProvider --

    async def complete(
        self,
        messages: list[dict],
        temperature: float = 0,
        response_format: dict | None = None,
        max_tokens: int | None = None,
    ) -> LLMResult:
        payload: dict = {
            "model": self._model,
            "messages": messages,
            "stream": False,
            "options": {"temperature": temperature},
        }
        if response_format and response_format.get("type") == "json_object":
            payload["format"] = "json"

        response = await self._client.post(
            f"{self._base_url}/api/chat",
            json=payload,
        )
        response.raise_for_status()
        text = response.json()["message"]["content"]
        return LLMResult(text=text, usage=None)  # Ollama doesn't report usage

    # -- EmbeddingProvider --

    async def embed(self, texts: list[str]) -> list[list[float]]:
        if not texts:
            return []
        # /api/embed accepts a list input and returns one vector per text,
        # which keeps the output aligned with the input as the protocol requires.
        response = await self._client.post(
            f"{self._base_url}/api/embed",
            json={"model": self._embedding_model, "input": texts},
        )
        response.raise_for_status()
        return response.json()["embeddings"]

    async def embed_one(self, text: str) -> list[float] | None:
        if not text or not text.strip():
            return None
        results = await self.embed([text])
        return results[0] if results else None

Embedding dimensions

When using local models, configure the dimensions:

from arandu import MemoryConfig

config = MemoryConfig(
    embedding_dimensions=768,  # nomic-embed-text uses 768 dims
)

Testing Your Provider

Verify your provider works before going to production:

import asyncio
from arandu import MemoryClient


async def test_provider():
    provider = YourProvider(...)
    memory = MemoryClient(
        database_url="postgresql+psycopg://memory:memory@localhost/memory",
        llm=provider,
        embeddings=provider,
    )
    await memory.initialize()

    try:
        # Test write
        result = await memory.write(
            agent_id="test",
            message="Testing the provider. My name is Alice and I work at Acme.",
        )
        assert len(result.facts_added) > 0, "No facts extracted — check LLM responses"
        assert len(result.entities_resolved) > 0, "No entities resolved"
        print(f"Write OK: {len(result.facts_added)} facts, {len(result.entities_resolved)} entities")

        # Test retrieve
        context = await memory.retrieve(agent_id="test", query="who is Alice?")
        assert len(context.facts) > 0, "No facts retrieved — check embeddings"
        print(f"Retrieve OK: {len(context.facts)} facts found")
        print(f"Context: {context.context}")
    finally:
        await memory.close()


asyncio.run(test_provider())

Key Requirements

  1. LLMResult - complete() returns LLMResult(text=..., usage=...), not str. If your backend doesn't report usage, pass usage=None.

  2. JSON mode - The pipeline sends response_format={"type": "json_object"} frequently. Your provider must return valid JSON when this is set.

  3. Async - Both protocols are async (async def). If your backend SDK is synchronous, wrap calls with asyncio.to_thread().

  4. Empty/error handling - embed_one returns None for empty input. embed returns [] for empty input.

  5. Timeout - Add timeouts to your provider. The SDK sets timeouts on its side, but provider-level timeouts add safety.

  6. Embedding dimensions - Set MemoryConfig(embedding_dimensions=N) to match your provider's output dimensions.