Open Source · Self-Hosted · v0.1 MVP

Shared Memory.
Unified Soul.

YuYi gives every AI agent you use a shared, persistent memory layer: deployed on your own infrastructure, owned by you, and readable by every tool you trust.


Java 21 + Spring Boot · PostgreSQL + pgvector · MCP / CLI / REST

memory-server

# Write a memory from any agent
$ curl -X POST https://memory.local/v1/memories \
    -H "Authorization: Bearer mk-..." \
    -H "Content-Type: application/json" \
    -d '{
  "content": "All API responses follow {code, message, data}",
  "memoryType": "project_rule",
  "scope": { "projectId": "p_demo" }
}'

{
  "code": "CREATED",
  "data": {
    "id": "mem_a1b2c3d4",
    "memoryType": "project_rule",
    "importance": 0.9,
    "status": "active"
  }
}

# Recall context for a task from any other agent
$ curl -X POST https://memory.local/v1/recall \
    -H "Authorization: Bearer mk-..." \
    -H "Content-Type: application/json" \
    -d '{ "task": "Implement user registration", "maxTokens": 2000 }'
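For an agent written in Java, the same two calls can be built with the JDK's `java.net.http` client. The sketch below assumes the `https://memory.local` base URL and a placeholder API key as in the example above; the class and method names (`MemoryRequests`, `writeMemory`, `recall`) are hypothetical, and the recall body is assembled by hand rather than through a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds (but does not send) YuYi REST requests.
final class MemoryRequests {
    private final URI baseUri;
    private final String apiKey;

    MemoryRequests(String baseUrl, String apiKey) {
        this.baseUri = URI.create(baseUrl);
        this.apiKey = apiKey;
    }

    // POST /v1/memories with a caller-supplied JSON body.
    HttpRequest writeMemory(String json) {
        return post("/v1/memories", json);
    }

    // POST /v1/recall with a task description and a token budget.
    HttpRequest recall(String task, int maxTokens) {
        // Minimal escaping of backslashes and quotes only; use a JSON library in real code.
        String escaped = task.replace("\\", "\\\\").replace("\"", "\\\"");
        String json = "{\"task\": \"" + escaped + "\", \"maxTokens\": " + maxTokens + "}";
        return post("/v1/recall", json);
    }

    // Every request carries the bearer token and an explicit JSON content type.
    private HttpRequest post(String path, String json) {
        return HttpRequest.newBuilder(baseUri.resolve(path))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Sending is then a call such as `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Keeping request construction separate from sending makes it testable without a running server.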

The Problem

AI models still operate in memory isolation

Closed ecosystems

ChatGPT Memory and Claude Memory are still product silos. Knowledge built in one tool does not naturally flow into the next.

Context amnesia

Switch models, clients, or sessions, and the operating context disappears. Agents have to restart from zero over and over again.

No real user control

You cannot truly audit what is stored, delete one memory precisely, or migrate data on your own schedule. Your knowledge lives somewhere else.

This is not a bug in one product. Memory has been treated like an app feature, when it should be infrastructure in the same class as databases or version control.

Design Philosophy

Six principles behind every architectural decision

User data sovereignty first

Memory belongs to you. Self-host it, manage it, export it fully, and delete it for real. No mandatory external account involved.

Explicit control over automation

In stage one, memories are written only when you explicitly ask. Auto-write is opt-in, visible, and reversible instead of hidden in product magic.

Memory and history are not the same thing

Stable Memory stores distilled facts. History Recall stores process context. Mixing them degrades retrieval quality almost immediately.

Model and host agnostic

No dependency on any vendor-native memory API. Plug in any LLM or embedding provider. If one agent dies, the core still runs.

Deterministic context assembly

The recall pipeline emits a token-budgeted context block that is ready for direct injection. It is not a thin wrapper over document search.

Closed loop first

Ship the smallest verifiable loop first. Layer in complexity only after the foundation has already proved it can carry real work.

Conflict resolution order (highest priority first):

Data Sovereignty › Explicit Control › Separation › Agnosticism › Recall Quality › Simplicity

Architecture

Six layers, three pipelines, one stable API surface

Layer 1 · Client / Agent Layer: Codex · Antigravity · Claude Desktop · IDE Agent · CLI

↓

Layer 2 · Access Adapter Layer: Skill · Tool · MCP Server · CLI · SDK

↓

Layer 3 · Memory Orchestrator: Write Pipeline · Recall Pipeline · Control Pipeline

↓

Layer 4 · Memory Storage Plane: Stable Memory Store · History Recall Store · Control Metadata

↓

Layer 5 · Search & Ranking: Lexical · Vector (pgvector) · RRF Fusion · Tag & Scope Filter

↓

Layer 6 · Storage / Infra: PostgreSQL · Redis · Object Store · TLS · Reverse Proxy

Adapters stay thin

Each adapter translates intent into a standard API call. Business logic lives server-side, so adding a new agent does not force a core rewrite.
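A thin adapter can be pictured as a pure mapping from tool arguments to the standard request. In this sketch, `MemoryApi`, `WriteMemoryRequest` and `RememberToolAdapter` are hypothetical names rather than YuYi's actual classes; the point is that the adapter does no classification or ranking of its own.

```java
import java.util.Map;

// Hypothetical standard request that every adapter produces.
record WriteMemoryRequest(String content, String memoryType, String projectId) {}

// The single server-facing port; all business logic sits behind it, server-side.
interface MemoryApi {
    String writeMemory(WriteMemoryRequest request);
}

// A hypothetical MCP-style "remember" tool: it validates presence of content,
// maps arguments onto the standard request, and forwards it. Nothing more.
final class RememberToolAdapter {
    private final MemoryApi api;

    RememberToolAdapter(MemoryApi api) {
        this.api = api;
    }

    String handle(Map<String, String> toolArgs) {
        String content = toolArgs.get("content");
        if (content == null) {
            throw new IllegalArgumentException("missing 'content'");
        }
        // memoryType may be absent; deciding it is the orchestrator's job, not the adapter's.
        return api.writeMemory(new WriteMemoryRequest(
                content, toolArgs.get("memoryType"), toolArgs.get("projectId")));
    }
}
```

Adding a new agent then means writing another class of this size against the same `MemoryApi` port.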

The orchestrator handles judgment

Layer 3 decides what is worth storing, which storage plane it belongs in (Stable Memory or History Recall), and what still fits into recall. It is a scheduler, not just a transport layer.

Search degrades gracefully

Vector and lexical retrieval fail independently. If embeddings are missing, lexical search still works. A missing zhparser extension (used for Chinese full-text parsing) should never crash the pipeline either.
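Layer 5 lists RRF fusion; one way to combine it with graceful degradation is to fuse whatever rankings succeed and skip any retriever that throws. This is a sketch under assumptions: the constant k = 60 is the value from the original Reciprocal Rank Fusion paper, not a documented YuYi setting, and the retrievers are modeled as plain `Supplier`s returning memory ids best-first. `HybridSearch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class HybridSearch {
    // Assumed RRF constant (the commonly used default).
    static final int K = 60;

    // Each supplier returns ids best-first and may throw if its backend is unavailable.
    static List<String> search(Supplier<List<String>> lexical, Supplier<List<String>> vector) {
        Map<String, Double> scores = new HashMap<>();
        int available = 0;
        for (Supplier<List<String>> retriever : List.of(lexical, vector)) {
            List<String> ranked;
            try {
                ranked = retriever.get();
            } catch (RuntimeException e) {
                continue; // degrade: drop this retriever instead of failing the request
            }
            available++;
            for (int i = 0; i < ranked.size(); i++) {
                // RRF contribution 1 / (k + rank), with 1-based rank = i + 1
                scores.merge(ranked.get(i), 1.0 / (K + i + 1), Double::sum);
            }
        }
        if (available == 0) {
            throw new IllegalStateException("no retriever available");
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by id so output is deterministic.
        fused.sort(Comparator.comparing((String id) -> scores.get(id)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return fused;
    }
}
```

Because RRF only uses rank positions, lexical and vector scores never need to be normalized against each other, which is also what lets either side drop out cleanly.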

Core Design

Two kinds of memory. One unified system.

Stable Memory

Long-term, high-fidelity facts

preference · project_rule · decision · fact · workflow · reference · summary

Low write frequency, high read frequency
Versioned, so every change stays traceable
State machine: active → archived or invalid → deleted
semantic_key deduplication prevents duplicate clutter
Memories with importance ≥ 0.9 are never truncated by the token budget
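The lifecycle line above can be read as a small transition table. This sketch assumes only the forward edges shown are legal, so an active memory must be archived or invalidated before deletion and nothing can be restored; YuYi's real rules may differ, and `MemoryStatus` is a hypothetical name.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum MemoryStatus {
    ACTIVE, ARCHIVED, INVALID, DELETED;

    // Assumed legal transitions: active -> archived | invalid -> deleted.
    private static final Map<MemoryStatus, Set<MemoryStatus>> NEXT = Map.of(
            ACTIVE, EnumSet.of(ARCHIVED, INVALID),
            ARCHIVED, EnumSet.of(DELETED),
            INVALID, EnumSet.of(DELETED),
            DELETED, EnumSet.noneOf(MemoryStatus.class));

    boolean canMoveTo(MemoryStatus target) {
        return NEXT.get(this).contains(target);
    }

    // Returns the new status, or rejects a transition the table does not allow.
    MemoryStatus moveTo(MemoryStatus target) {
        if (!canMoveTo(target)) {
            throw new IllegalStateException(this + " -> " + target + " is not allowed");
        }
        return target;
    }
}
```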

History Recall

Process context that is allowed to expire

task_summary · decision_trace · session_excerpt · recent_progress · incident_context · meeting_note

Append-first writes with low overwrite pressure
TTL policy support (30d / 90d / 365d / permanent)
Looser structure for richer freeform context
Lower recall weight that complements stable memory
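The four TTL policies map naturally onto `java.time.Duration`. In this sketch the wire values match the `ttlPolicy` options listed above (`30d`, `90d`, `365d`, `permanent`); the enum name and the choice to count from the record's creation time, rather than from last access, are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

enum TtlPolicy {
    DAYS_30("30d", Duration.ofDays(30)),
    DAYS_90("90d", Duration.ofDays(90)),
    DAYS_365("365d", Duration.ofDays(365)),
    PERMANENT("permanent", null); // null duration means "never expires"

    private final String wireValue;
    private final Duration ttl;

    TtlPolicy(String wireValue, Duration ttl) {
        this.wireValue = wireValue;
        this.ttl = ttl;
    }

    // Parses the value sent in the ttlPolicy field.
    static TtlPolicy fromWire(String value) {
        for (TtlPolicy p : values()) {
            if (p.wireValue.equals(value)) {
                return p;
            }
        }
        throw new IllegalArgumentException("unknown ttlPolicy: " + value);
    }

    // Expiry measured from creation time (assumed); empty for permanent records.
    Optional<Instant> expiresAt(Instant createdAt) {
        return ttl == null ? Optional.empty() : Optional.of(createdAt.plus(ttl));
    }

    boolean isExpired(Instant createdAt, Instant now) {
        return expiresAt(createdAt).map(now::isAfter).orElse(false);
    }
}
```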

Memory Judge decides where new content belongs

A three-stage decision chain runs on every write request. Hard rules run first and cannot be overridden by the LLM. The model stays advisory, and the rule fallback is always available.

Hard Guards

Sensitive content, empty payloads, and length violations are blocked deterministically before any model gets a vote.

→

LLM Judge

The model returns structured JSON and must clear a confidence threshold. If the call fails or the confidence is too low, the write path keeps moving on the rule fallback.

→

Rule Fallback

Admin-managed rules fill in keyword classification, importance defaults, and semantic_key inference when the model path is unavailable.
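The three stages can be sketched as a single method whose ordering guarantees that the model never sees a payload the guards reject. Everything concrete here is hypothetical: the record and interface names, the secret-detection regex, the `fact` / 0.5 defaults, and the assumption that provider errors surface as a `RuntimeException`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

record Verdict(String memoryType, double importance, String decidedBy) {}
record LlmResult(String memoryType, double importance, double confidence) {}
record KeywordRule(String keyword, String memoryType, double importance) {} // keyword in lower case

// Hypothetical model port: empty when disabled or unparsable.
interface LlmJudge {
    Optional<LlmResult> judge(String content);
}

final class MemoryJudge {
    // Placeholder sensitivity check: flags sk-/mk- style tokens. Real detection would be broader.
    private static final Pattern SECRET = Pattern.compile("\\b(sk|mk)-[A-Za-z0-9]{16,}");

    private final LlmJudge llm;
    private final List<KeywordRule> rules; // admin-managed, first match wins
    private final double minConfidence;
    private final int maxLength;

    MemoryJudge(LlmJudge llm, List<KeywordRule> rules, double minConfidence, int maxLength) {
        this.llm = llm;
        this.rules = List.copyOf(rules);
        this.minConfidence = minConfidence;
        this.maxLength = maxLength;
    }

    Verdict decide(String content) {
        // Stage 1: hard guards, deterministic and not overridable by the model.
        if (content == null || content.isBlank()) throw new IllegalArgumentException("empty payload");
        if (content.length() > maxLength) throw new IllegalArgumentException("content too long");
        if (SECRET.matcher(content).find()) throw new IllegalArgumentException("sensitive content");

        // Stage 2: advisory LLM judge, accepted only at or above the confidence threshold.
        Optional<LlmResult> suggestion;
        try {
            suggestion = llm.judge(content);
        } catch (RuntimeException e) {
            suggestion = Optional.empty(); // provider error: fall through to the rules
        }
        if (suggestion.isPresent() && suggestion.get().confidence() >= minConfidence) {
            LlmResult r = suggestion.get();
            return new Verdict(r.memoryType(), r.importance(), "llm");
        }

        // Stage 3: rule fallback, always available.
        String lower = content.toLowerCase(Locale.ROOT);
        for (KeywordRule rule : rules) {
            if (lower.contains(rule.keyword())) {
                return new Verdict(rule.memoryType(), rule.importance(), "rule");
            }
        }
        return new Verdict("fact", 0.5, "default");
    }
}
```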

Recall Orchestrator

Recall isn't search. It's context engineering.

The Recall Orchestrator runs eight steps to produce a RecallContextBlock, a compacted, deduplicated, token-budgeted block that can be injected directly into any agent context window.

1. Candidate retrieval: hybrid lexical + vector retrieval across both storage planes
2. Scoring: importance, recency, and hybrid relevance are combined into one composite ranking signal
3. Sorting: candidates are processed in descending composite-score order
4. semantic_key dedup: only the strongest version survives for each semantic_key
5. Conflict detection: memories flagged with hasConflict stay surfaced instead of being buried
6. Token budgeting: the prefer mode controls how much of the budget stable memory and history can consume
7. Freshness notes: older material gets explicit 30/90/365-day freshness warnings
8. Block assembly: the result is a RecallContextBlock ready for direct injection
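Steps 2 to 4 can be illustrated with a weighted score and a first-wins pass over the sorted list. The weights (0.3 / 0.2 / 0.5), the 30-day recency half-life and the `Candidate` shape are assumptions for illustration; the documentation only states that importance, recency and hybrid relevance are combined into one signal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical candidate after step 1; relevance is the hybrid retrieval score in [0, 1].
record Candidate(String id, String semanticKey, double importance, double ageDays, double relevance) {}

final class RecallRanking {
    // Assumed weights and half-life, not documented YuYi values.
    static final double W_IMPORTANCE = 0.3, W_RECENCY = 0.2, W_RELEVANCE = 0.5;
    static final double HALF_LIFE_DAYS = 30.0;

    // Step 2: one composite score; recency halves every HALF_LIFE_DAYS.
    static double score(Candidate c) {
        double recency = Math.pow(0.5, c.ageDays() / HALF_LIFE_DAYS);
        return W_IMPORTANCE * c.importance() + W_RECENCY * recency + W_RELEVANCE * c.relevance();
    }

    // Steps 3 and 4: sort descending, then keep the first (strongest) candidate per semantic_key.
    // Candidates without a semantic_key are never merged.
    static List<Candidate> rankAndDedup(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(score(b), score(a)));
        Set<String> seenKeys = new HashSet<>();
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : sorted) {
            if (c.semanticKey() == null || seenKeys.add(c.semanticKey())) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

With these weights, two versions of the same rule that differ only in age resolve in favor of the newer one, which is the behavior step 4 describes.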

Token budget by prefer mode:

Mode            Stable   History
stable_first    70%      30%
history_first   30%      70%
balanced        50%      50%
stable_only     100%     0%
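The split in the table is percentage arithmetic on `maxTokens`: with `stable_first` and a 2000-token budget, stable memory gets 1400 tokens and history 600. The sketch also shows one possible reading of the rule that memories with importance ≥ 0.9 are never truncated, where such pinned items are kept even if they overrun their share. That overrun policy, the greedy fill and the type names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ranked item; tokens is its estimated cost in the context window.
record BudgetItem(String id, int tokens, double importance) {}

// Stable-memory share of maxTokens per prefer mode, following the table.
enum PreferMode {
    STABLE_FIRST(70), HISTORY_FIRST(30), BALANCED(50), STABLE_ONLY(100);

    private final int stablePercent;

    PreferMode(int stablePercent) {
        this.stablePercent = stablePercent;
    }

    int stableBudget(int maxTokens) {
        return maxTokens * stablePercent / 100;
    }

    // History receives the remainder, so integer rounding never loses tokens.
    int historyBudget(int maxTokens) {
        return maxTokens - stableBudget(maxTokens);
    }
}

final class TokenBudgeter {
    static final double PINNED_IMPORTANCE = 0.9;

    // Greedy fill in rank order. Pinned items are always kept, even past the budget (assumed policy).
    static List<BudgetItem> fit(List<BudgetItem> ranked, int budget) {
        List<BudgetItem> kept = new ArrayList<>();
        int used = 0;
        for (BudgetItem item : ranked) {
            boolean pinned = item.importance() >= PINNED_IMPORTANCE;
            if (pinned || used + item.tokens() <= budget) {
                kept.add(item);
                used += item.tokens();
            }
        }
        return kept;
    }
}
```

A recall call would then run `fit` once per plane, for example with `PreferMode.STABLE_FIRST.stableBudget(2000)` for stable memory and `historyBudget(2000)` for history.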

API Reference

A stable, versioned REST surface

All endpoints return { code, message, data, httpStatus, timestamp }. Errors use structured MEM-* codes instead of raw HTTP status text.
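A server-side view of that envelope might look like the generic record below. The factory methods, the `message` text and the rule that any `MEM-`-prefixed code marks an error are illustrative assumptions, not YuYi's actual classes.

```java
import java.time.Instant;

// Sketch of the documented envelope { code, message, data, httpStatus, timestamp }.
record ApiResponse<D>(String code, String message, D data, int httpStatus, Instant timestamp) {

    static <T> ApiResponse<T> created(T data) {
        return new ApiResponse<>("CREATED", "created", data, 201, Instant.now());
    }

    // Errors carry a structured MEM-* code and no data.
    static <T> ApiResponse<T> error(String memCode, String message, int httpStatus) {
        if (!memCode.startsWith("MEM-")) {
            throw new IllegalArgumentException("error codes must use the MEM-* namespace: " + memCode);
        }
        return new ApiResponse<>(memCode, message, null, httpStatus, Instant.now());
    }

    boolean isError() {
        return code.startsWith("MEM-");
    }
}
```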

POST /v1/memories · Write a stable memory

// Request body
{
  "content": "All backend APIs return {code, message, data}",
  "title": "Response format convention",
  "memoryType": "project_rule",
  "scope": { "projectId": "p_demo" },
  "semanticKey": "p_demo:rule:api_response_format",
  "importance": 0.9
}

// Response 201
{
  "code": "CREATED",
  "data": { "id": "mem_a1b2c3d4", "status": "active", "version": 1 }
}

POST /v1/history-records · Write a history record

{
  "content": "Completed user auth module. MailCodeService handles TTL.",
  "recordKind": "task_summary",
  "scope": { "projectId": "p_demo" },
  "ttlPolicy": "365d"
}

Error code structure

MEM-AUTH* · Authentication & authorization

MEM-MEM* · Memory operation errors

MEM-RECALL* · Recall pipeline errors

MEM-JUDGE* · Judge / LLM provider errors

MEM-SEARCH* · Embedding & search errors

MEM-SYS* · System & validation errors
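Because each family is identified by its prefix, a client can route errors without knowing every individual code. This minimal sketch assumes codes are plain strings that start with one of the prefixes listed; the format after the prefix is not documented, and `ErrorFamilies` is a hypothetical helper.

```java
import java.util.Map;

final class ErrorFamilies {
    // No prefix is a prefix of another, so iteration order does not matter.
    private static final Map<String, String> FAMILIES = Map.of(
            "MEM-AUTH", "Authentication & authorization",
            "MEM-MEM", "Memory operation errors",
            "MEM-RECALL", "Recall pipeline errors",
            "MEM-JUDGE", "Judge / LLM provider errors",
            "MEM-SEARCH", "Embedding & search errors",
            "MEM-SYS", "System & validation errors");

    static String familyOf(String code) {
        for (Map.Entry<String, String> e : FAMILIES.entrySet()) {
            if (code.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "Unknown";
    }
}
```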

Deployment

Running in under five minutes

Clone the deploy repo

git clone https://github.com/plumememory/memory-deploy
cd memory-deploy

Configure environment

cp .env.example .env
# Then set the required variables in .env:
MEMORY_BOOTSTRAP_ADMIN_USERNAME=admin
MEMORY_BOOTSTRAP_ADMIN_PASSWORD=change-me-now
POSTGRES_PASSWORD=your-db-password

Start the stack

# Development
docker compose -f docker-compose.dev.yml up -d

# Production (with TLS via Caddy/Nginx)
docker compose -f docker-compose.prod.yml up -d

Verify the MVP

bash verify-mvp.sh
# ✓ Write stable memory
# ✓ Write history record
# ✓ Task recall
# ✓ Cross-agent read
# ✓ Delete & temporary mode

What's included

memory-server — Java Spring Boot core
memory-web — React admin console
PostgreSQL 16 + pgvector + zhparser
Caddy / Nginx reverse proxy examples

Enable vector search

MEMORY_EMBEDDING_ENABLED=true
MEMORY_EMBEDDING_API_KEY=sk-...
MEMORY_EMBEDDING_MODEL=text-embedding-3-small

Enable LLM Judge

MEMORY_JUDGE_LLM_ENABLED=true
MEMORY_LLM_API_KEY=sk-...
MEMORY_LLM_MODEL=gpt-4o-mini
MEMORY_LLM_BASE_URL=https://api.openai.com/v1

Four-repo structure

memory-server (Java): core business logic & API

client-shared (TypeScript): CLI + MCP Server adapter

memory-web (React): admin & debug console

memory-deploy (🐳 Docker): Compose + proxy configs
