# OpenBrain MCP Server

**High-performance vector memory for AI agents**

OpenBrain is a Model Context Protocol (MCP) server that provides AI agents with a persistent, semantic memory system. It uses local ONNX-based embeddings and PostgreSQL with pgvector for efficient similarity search.

## Features

- 🧠 **Semantic Memory**: Store and retrieve memories using vector similarity search
- 🏠 **Local Embeddings**: No external API calls - uses ONNX runtime with all-MiniLM-L6-v2
- 🐘 **PostgreSQL + pgvector**: Production-grade vector storage with HNSW indexing
- 🔌 **MCP Protocol**: Streamable HTTP plus legacy HTTP+SSE compatibility
- 🔐 **Multi-Agent Support**: Isolated memory namespaces per agent
- ♻️ **Deduplicated Ingest**: Near-duplicate facts are merged instead of stored repeatedly
- ⚡ **High Performance**: Rust implementation with async I/O

## MCP Tools

| Tool | Description |
|------|-------------|
| `store` | Store a memory with automatic embedding generation, optional TTL, and automatic deduplication |
| `batch_store` | Store 1-50 memories atomically in a single call with the same deduplication rules |
| `query` | Search memories by semantic similarity |
| `purge` | Delete memories by agent ID or time range |

## Quick Start

### Prerequisites

- Rust 1.75+
- PostgreSQL 14+ with pgvector extension
- ONNX model files (all-MiniLM-L6-v2)

### Database Setup

```sql
CREATE ROLE openbrain_svc LOGIN PASSWORD 'change-me';
CREATE DATABASE openbrain OWNER openbrain_svc;
\c openbrain
CREATE EXTENSION IF NOT EXISTS vector;
```

Use the same PostgreSQL role for the app and for migrations. Do not create the `memories` table manually as `postgres` or another owner and then run OpenBrain as `openbrain_svc`, because later `ALTER TABLE` migrations will fail with `must be owner of table memories`.

### Configuration

```bash
cp .env.example .env
# Edit .env with your database credentials
```

### Build & Run

```bash
cargo build --release
./target/release/openbrain-mcp migrate
./target/release/openbrain-mcp
```

### Database Migrations

This project uses `refinery` with embedded SQL migrations in `migrations/`.

Run pending migrations explicitly before starting or restarting the service:

```bash
./target/release/openbrain-mcp migrate
```

If you use the deploy script or CI workflow in `.gitea/deploy.sh` and `.gitea/workflows/ci-cd.yaml`, they already run this for you.

### E2E Test Modes

The end-to-end test suite supports two modes:

- Local mode: the default. Assumes the test process can manage schema setup against a local PostgreSQL instance and, for one auth-only test, spawn a local `openbrain-mcp` child process.
- Remote mode: set `OPENBRAIN_E2E_REMOTE=true` and point `OPENBRAIN_E2E_BASE_URL` at a deployed server such as `http://your-server.example.com:3100` or `https://memory.example.com`. In this mode the suite does not try to create the schema locally and skips the local-process auth smoke test.

Recommended env for VPS-backed runs:

```bash
OPENBRAIN_E2E_REMOTE=true
OPENBRAIN_E2E_BASE_URL=https://memory.example.com
OPENBRAIN__AUTH__ENABLED=true
```

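The mode switch boils down to a boolean environment check. A minimal sketch of how a harness might resolve it (the helper name and the local default URL are illustrative assumptions, not the actual test code):

```python
import os

def e2e_config(env=os.environ):
    """Resolve e2e mode and target from the environment (hypothetical helper)."""
    remote = env.get("OPENBRAIN_E2E_REMOTE", "").strip().lower() == "true"
    # Local default assumed from the server's default port (3100).
    base_url = env.get("OPENBRAIN_E2E_BASE_URL", "http://127.0.0.1:3100")
    return {"remote": remote, "base_url": base_url}

cfg = e2e_config({"OPENBRAIN_E2E_REMOTE": "true",
                  "OPENBRAIN_E2E_BASE_URL": "https://memory.example.com"})
```

Anything other than the literal string `true` (including an unset variable) leaves the suite in local mode.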
### TTL / Expiry

Transient facts can be stored with an optional `ttl` string on `store`, or on either the batch itself or individual entries for `batch_store`.

Supported units:

- `s` seconds
- `m` minutes
- `h` hours
- `d` days
- `w` weeks

Examples:

- `30s`
- `15m`
- `1h`
- `7d`

Expired memories are filtered from `query` immediately, even before the background cleanup loop deletes them physically. The cleanup interval is configured with `OPENBRAIN__TTL__CLEANUP_INTERVAL_SECONDS` and defaults to 300.
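
The unit grammar is simple enough to mirror client-side, e.g. to predict an expiry timestamp before storing. An illustrative parser sketch (not the server's actual implementation):

```python
import re

# Seconds per supported unit, matching the table above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def ttl_to_seconds(ttl: str) -> int:
    """Convert a TTL string like '7d' into a number of seconds."""
    m = re.fullmatch(r"(\d+)([smhdw])", ttl.strip())
    if not m:
        raise ValueError(f"invalid ttl: {ttl!r}")
    value, unit = m.groups()
    return int(value) * _UNIT_SECONDS[unit]
```

For example, `ttl_to_seconds("7d")` yields 604800 seconds.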

### CI Auth Keys for Remote E2E

The CI workflow uses this remote mode after `main` deploys, so e2e coverage validates the VPS deployment rather than the local runner host. It generates a random per-run e2e key, temporarily appends it to the deployed `OPENBRAIN__AUTH__API_KEYS`, runs the suite, then removes the key and restarts the service.

For live deployments, keep `OPENBRAIN__AUTH__API_KEYS` for persistent non-test access only. The server accepts a comma-separated key list, so a practical split is:

- `prod_live_key` for normal agent traffic
- `smoke_test_key` for ad hoc diagnostics

In Gitea Actions, that means:

- repo secret `OPENBRAIN__AUTH__API_KEYS=prod_live_key,smoke_test_key`

If you want prod e2e coverage without leaving a standing CI key on the server, the workflow-generated ephemeral key handles that automatically.

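The ephemeral-key step amounts to list manipulation on that comma-separated value. A sketch of the idea (illustrative only, not the actual workflow script; the `e2e_` prefix is an assumption):

```python
import secrets

def with_ephemeral_key(keys_csv: str) -> tuple[str, str]:
    """Append a random per-run key to a comma-separated API key list."""
    run_key = "e2e_" + secrets.token_hex(16)  # random, unguessable per run
    keys = [k for k in keys_csv.split(",") if k]
    return run_key, ",".join(keys + [run_key])

run_key, patched = with_ephemeral_key("prod_live_key,smoke_test_key")
```

After the suite finishes, the workflow restores the original value and restarts the service, so `run_key` never persists.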
### Deduplication on Ingest

OpenBrain checks every `store` and `batch_store` write for an existing memory in the same `agent_id` namespace whose vector similarity meets the configured dedup threshold.

Default behavior:

- deduplication is always on
- only same-agent memories are considered
- expired memories are ignored
- if a duplicate is found, the existing memory is refreshed instead of inserting a new row
- metadata is merged with new keys overriding old values
- `created_at` is updated to `now()`
- `expires_at` is preserved unless the new write supplies a fresh TTL

Configure the threshold with either:

- `OPENBRAIN__DEDUP__THRESHOLD=0.90`
- `DEDUP_THRESHOLD=0.90`

Tool responses expose whether a write deduplicated an existing row via the `deduplicated` flag. `batch_store` also returns a `status` of either `stored` or `deduplicated` per entry.

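Conceptually, the dedup check compares the new embedding against existing same-agent embeddings and reuses the closest row when similarity meets the threshold. A simplified cosine-similarity sketch (in practice the comparison runs inside PostgreSQL via pgvector, not in application code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def find_duplicate(new_vec, existing, threshold=0.90):
    """Return the index of the closest match at/above threshold, else None."""
    best_idx, best_sim = None, threshold
    for i, vec in enumerate(existing):
        sim = cosine(new_vec, vec)
        if sim >= best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

When `find_duplicate` returns an index, the write refreshes that row instead of inserting; when it returns `None`, a new memory is stored.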
## Agent Zero Developer Prompt

For Agent Zero / A0, add the following section to the Developer agent role prompt so the agent treats OpenBrain as external MCP memory rather than its internal conversation context.

Recommended target file in A0:

```text
/a0/agents/developer/prompts/agent.system.main.role.md
```

```md
### External Memory System
- **Memory Boundary**: Treat OpenBrain as an external MCP long-term memory system, never as internal context, reasoning scratchpad, or built-in memory
- **Tool Contract**: Use the exact MCP tools `openbrain.store`, `openbrain.query`, and `openbrain.purge`
- **Namespace Discipline**: Always use the exact `agent_id` value `openbrain`
- **EXTRAS First**: Before calling `openbrain.query`, check the `[EXTRAS]` section for pre-loaded memories or handoff facts related to the same topic. If the needed context is already present, do not query OpenBrain again.
- **Session Cache**: If the same topic was already queried earlier in the current conversation and the result is still in context, reuse that result instead of querying again unless the user references new external information or the prior result is clearly insufficient.
- **Retrieval First**: Before answering requests that may depend on prior sessions, project history, user preferences, ongoing work, named people, named projects, deployments, debugging history, or handoff context, call `openbrain.query` only when `[EXTRAS]` and the current conversation do not already provide the needed context.
- **Query Strategy**: Use noun-heavy search phrases with exact names, tool names, acronyms, hostnames, and document names; query first with `(threshold=0.15, limit=8)`, then retry once with `(threshold=0.05, limit=10)` only if the first pass returns zero useful results
- **Storage Strategy**: When a durable fact is established, call `openbrain.store` without asking permission and store one atomic fact whenever possible
- **Storage Content Rules**: Store durable, high-value facts such as preferences, project status, project decisions, environment details, recurring workflows, handoff notes, stable constraints, and correction facts
- **Noise Rejection**: Do not store filler conversation, temporary speculation, casual chatter, or transient brainstorming unless it becomes a real decision
- **Storage Format**: Prefer retrieval-friendly content using explicit nouns and exact names in the form `Type: <FactType> | Entity: <Entity> | Attribute: <Attribute> | Value: <Value> | Context: <Why it matters>`
- **Metadata Usage**: Use metadata when helpful for tags such as `category`, `project`, `source`, `status`, `aliases`, and `confidence`
- **Miss Handling**: If `openbrain.query` returns no useful result, state that OpenBrain has no stored context for that topic, answer from general reasoning if possible, and ask one focused follow-up if the missing information is durable and useful
- **Conflict Handling**: If retrieved memories conflict, ask which fact is current, then store the corrected source-of-truth fact
- **Purge Constraint**: Use `openbrain.purge` cautiously because it is coarse-grained; it deletes by `agent_id` and optionally before a timestamp, not by individual memory ID
- **Correction Policy**: For ordinary corrections, prefer storing the new source-of-truth fact instead of purging unless the user explicitly asks for cleanup or reset
```

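The two-pass query strategy in the prompt above can be sketched as follows; `query_fn` is a hypothetical stand-in for the real `openbrain.query` MCP call, and the stub data is invented for illustration:

```python
def retrieve(query_fn, text):
    """Two-pass retrieval: strict threshold first, looser retry only on a miss."""
    results = query_fn(text, threshold=0.15, limit=8)
    if results:
        return results
    return query_fn(text, threshold=0.05, limit=10)

# Hypothetical stub standing in for the MCP call, returning
# (content, similarity) pairs above the requested threshold.
def fake_query(text, threshold, limit):
    memories = [("User uses vim keybindings", 0.12)]
    return [(c, s) for c, s in memories if s >= threshold][:limit]

hits = retrieve(fake_query, "editor preferences")
```

Here the first pass at `0.15` misses, and the retry at `0.05` recovers the low-similarity memory; a second miss ends the search rather than looping.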
## MCP Integration

OpenBrain exposes both MCP HTTP transports:

```
Streamable HTTP Endpoint: http://localhost:3100/mcp
Legacy SSE Endpoint:      http://localhost:3100/mcp/sse
Legacy Message Endpoint:  http://localhost:3100/mcp/message
Health Check:             http://localhost:3100/mcp/health
```

Use the streamable HTTP endpoint for modern clients such as Codex. Keep the legacy SSE endpoints for older MCP clients that still use the deprecated 2024-11-05 HTTP+SSE transport.

Header roles:

- `X-Agent-ID` is the memory namespace. Keep this stable if multiple clients should share the same OpenBrain memories.
- `X-Agent-Type` is an optional client profile label for logging and config clarity, such as `agent-zero` or `codex`.

### Example: Codex Configuration

```toml
[mcp_servers.openbrain]
url = "https://memory.example.com/mcp"
http_headers = { "X-API-Key" = "YOUR_OPENBRAIN_API_KEY", "X-Agent-ID" = "openbrain", "X-Agent-Type" = "codex" }
```

### Example: Agent Zero Configuration

```json
{
  "mcpServers": {
    "openbrain": {
      "url": "https://memory.example.com/mcp/sse",
      "headers": {
        "X-API-Key": "YOUR_OPENBRAIN_API_KEY",
        "X-Agent-ID": "openbrain",
        "X-Agent-Type": "agent-zero"
      }
    }
  }
}
```

Agent Zero should keep using the legacy HTTP+SSE transport unless and until its client runtime supports streamable HTTP. Codex should use `/mcp`.

### Example: Store a Memory

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "store",
    "arguments": {
      "content": "The user prefers dark mode and uses vim keybindings",
      "agent_id": "assistant-1",
      "ttl": "7d",
      "metadata": {"source": "preferences"}
    }
  }
}
```

### Example: Query Memories

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "query",
    "arguments": {
      "query": "What are the user's editor preferences?",
      "agent_id": "assistant-1",
      "limit": 5,
      "threshold": 0.6
    }
  }
}
```

### Example: Batch Store Memories

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "batch_store",
    "arguments": {
      "agent_id": "assistant-1",
      "entries": [
        {
          "content": "The user prefers dark mode",
          "ttl": "24h",
          "metadata": {"category": "preference"}
        },
        {
          "content": "The user uses vim keybindings",
          "metadata": {"category": "preference"}
        }
      ]
    }
  }
}
```

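### Example: Purge Memories

Purge is coarse-grained: it deletes by `agent_id` and optionally before a timestamp, not by individual memory ID. A minimal call scoped to one agent (the optional before-timestamp argument is omitted here because its exact field name is not documented above):

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "purge",
    "arguments": {
      "agent_id": "assistant-1"
    }
  }
}
```

For ordinary corrections, prefer storing a new source-of-truth fact over purging.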
## Architecture

```
┌─────────────────────────────────────────────────────────┐
│                        AI Agent                         │
└─────────────────────┬───────────────────────────────────┘
                      │ MCP Protocol (Streamable HTTP / Legacy SSE)
┌─────────────────────▼───────────────────────────────────┐
│                 OpenBrain MCP Server                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │   store     │  │   query     │  │   purge     │      │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘      │
│         │                │                │             │
│  ┌──────▼────────────────▼────────────────▼──────┐      │
│  │           Embedding Engine (ONNX)             │      │
│  │           all-MiniLM-L6-v2 (384d)             │      │
│  └──────────────────────┬────────────────────────┘      │
│                         │                               │
│  ┌──────────────────────▼────────────────────────┐      │
│  │            PostgreSQL + pgvector              │      │
│  │          HNSW Index for fast search           │      │
│  └───────────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────┘
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OPENBRAIN__SERVER__HOST` | `0.0.0.0` | Server bind address |
| `OPENBRAIN__SERVER__PORT` | `3100` | Server port |
| `OPENBRAIN__DATABASE__HOST` | `localhost` | PostgreSQL host |
| `OPENBRAIN__DATABASE__PORT` | `5432` | PostgreSQL port |
| `OPENBRAIN__DATABASE__NAME` | `openbrain` | Database name |
| `OPENBRAIN__DATABASE__USER` | - | Database user |
| `OPENBRAIN__DATABASE__PASSWORD` | - | Database password |
| `OPENBRAIN__EMBEDDING__MODEL_PATH` | `models/all-MiniLM-L6-v2` | ONNX model path |
| `OPENBRAIN__AUTH__ENABLED` | `false` | Enable API key auth |
| `OPENBRAIN__AUTH__API_KEYS` | - | Comma-separated list of accepted API keys |
| `OPENBRAIN__TTL__CLEANUP_INTERVAL_SECONDS` | `300` | Interval for the TTL cleanup loop |

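The variable names follow a double-underscore nesting convention: prefix, then config section, then key. A small sketch of how such a name decomposes, assuming the usual layered-config interpretation (the helper itself is illustrative, not part of OpenBrain):

```python
def env_to_path(name: str, prefix: str = "OPENBRAIN"):
    """Split OPENBRAIN__SECTION__KEY into a lowercase config path."""
    parts = name.split("__")
    if parts[0] != prefix:
        raise ValueError(f"unexpected prefix in {name!r}")
    return [p.lower() for p in parts[1:]]

path = env_to_path("OPENBRAIN__DATABASE__HOST")
```

So `OPENBRAIN__DATABASE__HOST` addresses the `host` key of the `database` section.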
## License

MIT