Add server-side deduplication on ingest

This commit is contained in:
Agent Zero
2026-03-24 05:40:30 +00:00
parent 5d5c042dd1
commit 61d6448b44
8 changed files with 421 additions and 23 deletions

View File

@@ -11,14 +11,15 @@ OpenBrain is a Model Context Protocol (MCP) server that provides AI agents with
- 🐘 **PostgreSQL + pgvector**: Production-grade vector storage with HNSW indexing
- 🔌 **MCP Protocol**: Streamable HTTP plus legacy HTTP+SSE compatibility
- 🔐 **Multi-Agent Support**: Isolated memory namespaces per agent
- ♻️ **Deduplicated Ingest**: Near-duplicate facts are merged instead of stored repeatedly
-**High Performance**: Rust implementation with async I/O
## MCP Tools
| Tool | Description |
|------|-------------|
| `store` | Store a memory with automatic embedding generation and optional TTL for transient facts |
| `batch_store` | Store 1-50 memories atomically in a single call |
| `store` | Store a memory with automatic embedding generation, optional TTL, and automatic deduplication |
| `batch_store` | Store 1-50 memories atomically in a single call with the same deduplication rules |
| `query` | Search memories by semantic similarity |
| `purge` | Delete memories by agent ID or time range |
@@ -123,6 +124,31 @@ In Gitea Actions, that means:
If you want prod e2e coverage without leaving a standing CI key on the server, the workflow-generated ephemeral key handles that automatically.
### Deduplication on Ingest
OpenBrain checks every `store` and `batch_store` write for an existing memory in
the same `agent_id` namespace whose vector similarity meets the configured
dedup threshold.
Default behavior:
- deduplication is always on
- only same-agent memories are considered
- expired memories are ignored
- if a duplicate is found, the existing memory is refreshed instead of inserting a new row
- metadata is merged with new keys overriding old values
- `created_at` is updated to `now()`
- `expires_at` is preserved unless the new write supplies a fresh TTL
Configure the threshold with either:
- `OPENBRAIN__DEDUP__THRESHOLD=0.90`
- `DEDUP_THRESHOLD=0.90`
Tool responses expose whether a write deduplicated an existing row via the
`deduplicated` flag. `batch_store` also returns a `status` of either
`stored` or `deduplicated` per entry.
## Agent Zero Developer Prompt
For Agent Zero / A0, add the following section to the Developer agent role