Open Source · Self-Hosted · v0.1 MVP

Shared Memory.
Unified Soul.

YuYi gives every AI agent you use a shared, persistent memory layer: deployed on your own infrastructure, owned by you, and readable by every tool you trust.

Java 21 + Spring Boot · PostgreSQL + pgvector · MCP / CLI / REST
memory-server
# Write a memory from any agent
$ curl -X POST https://memory.local/v1/memories \
    -H "Authorization: Bearer mk-..." \
    -H "Content-Type: application/json" \
    -d '{
  "content": "All API responses follow {code, message, data}",
  "memoryType": "project_rule",
  "scope": { "projectId": "p_demo" }
}'

{
  "code": "CREATED",
  "data": {
    "id": "mem_a1b2c3d4",
    "memoryType": "project_rule",
    "importance": 0.9,
    "status": "active"
  }
}

# Recall context for a task from any other agent
$ curl -X POST https://memory.local/v1/recall \
    -H "Authorization: Bearer mk-..." \
    -H "Content-Type: application/json" \
    -d '{ "task": "Implement user registration", "maxTokens": 2000 }'

AI models still operate in memory isolation

01

Closed ecosystems

ChatGPT Memory and Claude Memory are still product silos. Knowledge built in one tool does not naturally flow into the next.

02

Context amnesia

Switch models, clients, or sessions, and the operating context disappears. Agents have to restart from zero over and over again.

03

No real user control

You cannot truly audit what is stored, delete one memory precisely, or migrate data on your own schedule. Your knowledge lives somewhere else.

This is not a bug in one product. Memory has been treated like an app feature, when it should be infrastructure in the same class as databases or version control.

Six principles behind every architectural decision

P1

User data sovereignty first

Memory belongs to you. Self-host it, manage it, export it fully, and delete it for real. No mandatory external account involved.

P2

Explicit control over automation

In stage one, writes happen only when you say so. Auto-write is opt-in, visible, and reversible rather than hidden behind product magic.

P3

Memory and history are not the same thing

Stable Memory stores distilled facts. History Recall stores process context. Mixing them degrades retrieval quality almost immediately.

P4

Model and host agnostic

No dependency on any vendor-native memory API. Plug in any LLM or embedding provider. If one agent dies, the core still runs.

P5

Deterministic context assembly

The recall pipeline emits a token-budgeted context block that is ready for direct injection. It is not a thin wrapper over document search.

P6

Closed loop first

Ship the smallest verifiable loop first. Layer in complexity only after the foundation has already proved it can carry real work.

Conflict resolution order:
Data Sovereignty → Explicit Control → Separation → Agnosticism → Recall Quality → Simplicity

Six layers, three pipelines, one stable API surface

Layer 1 | Client / Agent Layer | Codex · Antigravity · Claude Desktop · IDE Agent · CLI
Layer 2 | Access Adapter Layer | Skill · Tool · MCP Server · CLI · SDK
Layer 3 | Memory Orchestrator | Write Pipeline · Recall Pipeline · Control Pipeline
Layer 4 | Memory Storage Plane | Stable Memory Store · History Recall Store · Control Metadata
Layer 5 | Search & Ranking | Lexical · Vector (pgvector) · RRF Fusion · Tag & Scope Filter
Layer 6 | Storage / Infra | PostgreSQL · Redis · Object Store · TLS · Reverse Proxy

Adapters stay thin

Each adapter translates intent into a standard API call. Business logic lives server-side, so adding a new agent does not force a core rewrite.

The orchestrator handles judgment

Layer 3 decides what is worth storing, which store it belongs to (Stable Memory or History Recall), and what still fits into recall. It is a scheduler, not just a transport layer.

Search degrades gracefully

Vector and lexical retrieval fail independently. If embeddings are missing, lexical search still works. A missing zhparser extension (Chinese full-text parsing for PostgreSQL) should never crash the pipeline.
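
A minimal sketch of that behavior, assuming each retriever returns a ranked list of memory IDs. The function names, the constant k = 60, and the "skip the failed path" policy are illustrative assumptions, not YuYi's actual code. It fuses whatever rankings are available with Reciprocal Rank Fusion (RRF):

```python
from collections import defaultdict
from typing import Callable, Optional


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, then by ID so ties resolve deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    lexical: Callable[[], list[str]],
    vector: Optional[Callable[[], list[str]]],
) -> list[str]:
    """Run both retrievers; an absent or failing one is skipped, not fatal."""
    rankings = []
    for retriever in (lexical, vector):
        if retriever is None:
            continue  # e.g. embeddings are disabled
        try:
            rankings.append(retriever())
        except Exception:
            # Hypothetical policy: drop the failed path and keep the other one.
            continue
    return rrf_fuse(rankings)
```

With only one surviving ranking, the fused order is simply that ranking, so lexical-only mode needs no special case.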

Two kinds of memory. One unified system.

Stable Memory
Long-term, high-fidelity facts
preference · project_rule · decision · fact · workflow · reference · summary
  • Low write frequency, high read frequency
  • Versioned so every change stays traceable
  • State machine: active → archived / invalid → deleted
  • semantic_key deduplication prevents clutter
  • importance ≥ 0.9 never gets truncated by the token budget
vs
History Recall
Process context that is allowed to expire
task_summary · decision_trace · session_excerpt · recent_progress · incident_context · meeting_note
  • Append-first writes with low overwrite pressure
  • TTL policy support (30d / 90d / 365d / permanent)
  • Looser structure for richer freeform context
  • Lower recall weight that complements stable memory
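
The lifecycle rules above can be sketched in a few lines. This is one reading of "active → archived / invalid → deleted" (active may become archived or invalid, and either of those may become deleted), and the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed transition table; the real server may allow other paths.
ALLOWED = {
    "active": {"archived", "invalid"},
    "archived": {"deleted"},
    "invalid": {"deleted"},
    "deleted": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


def expires_at(created: datetime, ttl_policy: str) -> Optional[datetime]:
    """Map a ttlPolicy such as "30d" to an expiry time; "permanent" never expires."""
    if ttl_policy == "permanent":
        return None
    if not ttl_policy.endswith("d") or not ttl_policy[:-1].isdigit():
        raise ValueError(f"unsupported ttlPolicy: {ttl_policy}")
    return created + timedelta(days=int(ttl_policy[:-1]))


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(transition("active", "archived"), expires_at(created, "90d"))
```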

Memory Judge decides where new content belongs

A three-stage decision chain runs on every write request. Hard rules run first and cannot be overridden by the LLM. The model stays advisory, and the rule fallback is always available.

L1
Hard Guards
Sensitive content, empty payloads, and length violations are blocked deterministically before any model gets a vote.
L2
LLM Judge
The model returns structured JSON and must pass a confidence threshold. If the call fails or the score falls short, the write falls through to the rule fallback instead of stalling.
L3
Rule Fallback
Admin-managed rules fill in keyword classification, importance defaults, and semantic_key inference when the model path is unavailable.
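
The chain can be sketched as a single function. Everything concrete here is an assumption for illustration: the length limit, the confidence threshold, the sensitive markers, the keyword rules, and the shape of the model's JSON are not taken from YuYi's source.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_LENGTH = 4000            # hypothetical length limit
CONFIDENCE_THRESHOLD = 0.7   # hypothetical threshold
SENSITIVE_MARKERS = ("password=", "-----BEGIN PRIVATE KEY-----")  # illustrative

# An LLM judge takes the content and returns parsed JSON, or None on failure.
LlmJudge = Callable[[str], Optional[dict]]


@dataclass
class Verdict:
    accepted: bool
    memory_type: Optional[str]
    source: str  # "hard_guard", "llm", or "rule_fallback"


def rule_fallback(content: str) -> str:
    """Keyword classification used when the model path is unavailable."""
    lowered = content.lower()
    if "must" in lowered or "always" in lowered:
        return "project_rule"
    if "prefer" in lowered:
        return "preference"
    return "fact"


def judge(content: str, llm: Optional[LlmJudge] = None) -> Verdict:
    text = content.strip()
    # L1: deterministic guards run before any model is consulted.
    if not text or len(text) > MAX_LENGTH or any(m in text for m in SENSITIVE_MARKERS):
        return Verdict(False, None, "hard_guard")
    # L2: the model is advisory; errors and low confidence fall through.
    if llm is not None:
        try:
            result = llm(text)
        except Exception:
            result = None
        if (result and result.get("memoryType")
                and result.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD):
            return Verdict(True, result["memoryType"], "llm")
    # L3: rules always produce an answer.
    return Verdict(True, rule_fallback(text), "rule_fallback")
```

Because the guards return before the model is called, no model output can override a blocked write.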

Recall isn't search. It's context engineering.

The Recall Orchestrator runs eight steps to produce a RecallContextBlock, a compacted, deduplicated, token-budgeted block that can be injected directly into any agent context window.

1. Candidate retrieval: Hybrid lexical + vector retrieval across both storage planes
2. Scoring: Importance, recency, and hybrid relevance combine into one ranking signal
3. Sorting: Candidates are processed in descending composite order
4. semantic_key dedup: Only the strongest version survives per semantic cluster
5. Conflict detection: Memories carrying hasConflict stay surfaced instead of buried
6. Token budgeting: The prefer mode controls how much space stable memory and history can consume
7. Freshness notes: Older material gets explicit 30/90/365-day freshness warnings
8. Block assembly: The result is a RecallContextBlock ready for direct injection
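
Scoring, sorting, and semantic_key dedup from the pipeline above can be sketched together. The weights and the Candidate fields are assumptions chosen for illustration; the page does not say how the composite score is computed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights for importance, recency, and relevance.
W_IMPORTANCE, W_RECENCY, W_RELEVANCE = 0.4, 0.2, 0.4


@dataclass
class Candidate:
    id: str
    semantic_key: Optional[str]  # None for records without a key
    importance: float            # 0..1
    recency: float               # 0..1, where 1 is newest
    relevance: float             # 0..1, hybrid lexical + vector relevance


def composite(c: Candidate) -> float:
    """Fold the three signals into one ranking score."""
    return (W_IMPORTANCE * c.importance
            + W_RECENCY * c.recency
            + W_RELEVANCE * c.relevance)


def rank_and_dedup(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by composite score, then keep the best candidate per semantic_key."""
    ordered = sorted(candidates, key=composite, reverse=True)
    seen: set[str] = set()
    kept: list[Candidate] = []
    for c in ordered:
        if c.semantic_key is not None:
            if c.semantic_key in seen:
                continue  # a stronger version of this key was already kept
            seen.add(c.semantic_key)
        kept.append(c)
    return kept
```

Deduplicating after the sort means the first candidate seen for a key is always the strongest one, so no second comparison is needed.
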
Token budget by prefer mode:
Mode           Stable   History
stable_first   70%      30%
history_first  30%      70%
balanced       50%      50%
stable_only    100%     0%
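
A sketch of how the split could be applied, assuming maxTokens is divided by the percentages above and that memories with importance ≥ 0.9 are always kept. The tuple layout and function names are hypothetical:

```python
# (stable %, history %) per prefer mode, as listed in the table.
PREFER_SPLITS = {
    "stable_first": (70, 30),
    "history_first": (30, 70),
    "balanced": (50, 50),
    "stable_only": (100, 0),
}
PIN_THRESHOLD = 0.9  # importance at or above this is never truncated


def split_budget(max_tokens: int, prefer: str = "balanced") -> tuple[int, int]:
    """Return (stable_tokens, history_tokens) for a prefer mode."""
    stable_pct, _ = PREFER_SPLITS[prefer]
    stable = max_tokens * stable_pct // 100
    return stable, max_tokens - stable


def fill(items: list[tuple[str, int, float]], budget: int) -> list[str]:
    """Select ranked (id, tokens, importance) items within a budget.

    Pinned items are reserved first, so they survive even if they
    exceed the budget on their own.
    """
    used = sum(tokens for _, tokens, imp in items if imp >= PIN_THRESHOLD)
    kept = []
    for item_id, tokens, imp in items:
        if imp >= PIN_THRESHOLD:
            kept.append(item_id)
        elif used + tokens <= budget:
            kept.append(item_id)
            used += tokens
    return kept


stable_budget, history_budget = split_budget(2000, "stable_first")
print(stable_budget, history_budget)
```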

A stable, versioned REST surface

All endpoints return { code, message, data, httpStatus, timestamp }. Errors use structured MEM-* codes instead of raw HTTP status text.

POST /v1/memories · Write a stable memory
// Request body
{
  "content": "All backend APIs return {code, message, data}",
  "title": "Response format convention",
  "memoryType": "project_rule",
  "scope": { "projectId": "p_demo" },
  "semanticKey": "p_demo:rule:api_response_format",
  "importance": 0.9
}

// Response 201
{
  "code": "CREATED",
  "data": { "id": "mem_a1b2c3d4", "status": "active", "version": 1 }
}
POST /v1/history-records · Write a history record
{
  "content": "Completed user auth module. MailCodeService handles TTL.",
  "recordKind": "task_summary",
  "scope": { "projectId": "p_demo" },
  "ttlPolicy": "365d"
}
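
For agents that are not shell scripts, the same write can be made from code. This is a sketch using only the Python standard library. The base URL and the mk- bearer key follow the curl example at the top of the page, while the helper names are hypothetical:

```python
import json
import urllib.request


def build_write_request(base_url: str, api_key: str, content: str,
                        memory_type: str, project_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/memories request."""
    body = {
        "content": content,
        "memoryType": memory_type,
        "scope": {"projectId": project_id},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/memories",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON envelope.

    urllib raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_write_request(
    "https://memory.local", "mk-...",
    "All backend APIs return {code, message, data}", "project_rule", "p_demo",
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect before anything touches the network.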

Error code structure

MEM-AUTH*    Authentication & authorization
MEM-MEM*     Memory operation errors
MEM-RECALL*  Recall pipeline errors
MEM-JUDGE*   Judge / LLM provider errors
MEM-SEARCH*  Embedding & search errors
MEM-SYS*     System & validation errors
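
A client can branch on the prefix rather than on HTTP status text. The sketch below assumes that error envelopes carry a code starting with "MEM-" and that success envelopes do not. The exact suffix format (shown here as "-001") is a guess, since the page only lists prefixes:

```python
ERROR_FAMILIES = {
    "MEM-AUTH": "Authentication & authorization",
    "MEM-MEM": "Memory operation errors",
    "MEM-RECALL": "Recall pipeline errors",
    "MEM-JUDGE": "Judge / LLM provider errors",
    "MEM-SEARCH": "Embedding & search errors",
    "MEM-SYS": "System & validation errors",
}


def error_family(code: str) -> str:
    """Map a MEM-* code to its family by prefix."""
    for prefix, family in ERROR_FAMILIES.items():
        if code.startswith(prefix):
            return family
    return "Unknown"


def unwrap(envelope: dict) -> object:
    """Return `data` from a success envelope, or raise on a MEM-* error."""
    code = str(envelope.get("code", ""))
    if code.startswith("MEM-"):
        message = envelope.get("message", "")
        raise RuntimeError(f"{error_family(code)}: {message} ({code})")
    return envelope.get("data")


print(error_family("MEM-AUTH-001"))  # hypothetical code value
```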

Running in under five minutes

1

Clone the deploy repo

git clone https://github.com/plumememory/memory-deploy
cd memory-deploy
2

Configure environment

cp .env.example .env
# Set required variables:
MEMORY_BOOTSTRAP_ADMIN_USERNAME=admin
MEMORY_BOOTSTRAP_ADMIN_PASSWORD=change-me-now
POSTGRES_PASSWORD=your-db-password
3

Start the stack

# Development
docker compose -f docker-compose.dev.yml up -d

# Production (with TLS via Caddy/Nginx)
docker compose -f docker-compose.prod.yml up -d
4

Verify the MVP

bash verify-mvp.sh
# ✓ Write stable memory
# ✓ Write history record
# ✓ Task recall
# ✓ Cross-agent read
# ✓ Delete & temporary mode

What's included

  • memory-server — Java Spring Boot core
  • memory-web — React admin console
  • PostgreSQL 16 + pgvector + zhparser
  • Caddy / Nginx reverse proxy examples

Enable vector search

MEMORY_EMBEDDING_ENABLED=true
MEMORY_EMBEDDING_API_KEY=sk-...
MEMORY_EMBEDDING_MODEL=text-embedding-3-small

Enable LLM Judge

MEMORY_JUDGE_LLM_ENABLED=true
MEMORY_LLM_API_KEY=sk-...
MEMORY_LLM_MODEL=gpt-4o-mini
MEMORY_LLM_BASE_URL=https://api.openai.com/v1

Four-repo structure

Java · memory-server · Core business logic & API
TS · client-shared · CLI + MCP Server adapter
React · memory-web · Admin & debug console
🐳 · memory-deploy · Compose + proxy configs