When someone makes an observation in a SLEDS thread from Claude, that context needs to be available in another team member's Cursor session within seconds. Building this real-time context mesh was one of our hardest engineering challenges.
The Architecture
The context mesh has three layers: ingestion (how context arrives), processing (how it's indexed and linked), and propagation (how it reaches connected tools). Each layer uses different technology optimized for its specific requirements.
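A minimal in-memory sketch of how the three layers hand off to one another. The real system uses Postgres, BullMQ, and Redis for these roles; the type and function names here are illustrative, not the actual SLEDS code.

```typescript
// Stand-ins for the three layers: a store (ingestion), an index
// (processing), and a delivery log (propagation).
type Observation = { id: string; spaceId: string; content: string };

const stored = new Map<string, Observation>();   // layer 1: persisted observations
const processed = new Set<string>();             // layer 2: indexed observation ids
const delivered: Array<[string, string]> = [];   // layer 3: [spaceId, obsId] notifications

function ingest(obs: Observation): string {
  stored.set(obs.id, obs);   // persist the raw observation
  return obs.id;
}

function process(obsId: string): void {
  processed.add(obsId);      // embedding/linking work, stubbed here
}

function propagate(obs: Observation): void {
  delivered.push([obs.spaceId, obs.id]);  // notify the space's sessions
}

// One observation flowing through all three layers in order.
const obs: Observation = { id: "o1", spaceId: "s1", content: "hello" };
process(ingest(obs));
propagate(obs);
```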
Ingestion happens primarily through MCP (Model Context Protocol) connections. When a connected AI tool generates an observation, it sends a structured payload to our API. The API validates and stores the observation, then publishes an event to Redis pub/sub.
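The validation step above can be sketched as a guard that runs before anything is stored or published. The field list here is an assumption for illustration, not the real MCP payload schema.

```typescript
// Hypothetical shape of an incoming observation payload.
type ObservationPayload = {
  spaceId: string;
  threadId: string;
  source: string;   // e.g. "claude", "cursor"
  content: string;
};

// Reject payloads with missing or empty required fields before
// storage and the Redis pub/sub publish.
function validatePayload(raw: unknown): ObservationPayload {
  const p = raw as Partial<ObservationPayload>;
  for (const field of ["spaceId", "threadId", "source", "content"] as const) {
    const value = p[field];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`missing or empty field: ${field}`);
    }
  }
  return p as ObservationPayload;
}
```

Validating before publishing means downstream workers and connected sessions only ever see well-formed events.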
Processing Pipeline
Raw observations are useful, but their real value comes from processing. Our BullMQ workers handle three critical tasks: embedding generation for semantic search, automatic thread linking based on content similarity, and summary regeneration when new observations meaningfully change a thread's context.
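A sketch of how a dispatcher might map one new observation onto those three job types. The "meaningfully changes" heuristic here (content length plus summary staleness) and the thresholds are illustrative stand-ins, not the actual SLEDS logic.

```typescript
// Hypothetical job descriptors for the BullMQ worker pool.
type Job = { name: "embed" | "link" | "summarize"; obsId: string };

function jobsFor(obsId: string, content: string, summaryAgeMs: number): Job[] {
  const jobs: Job[] = [
    { name: "embed", obsId },  // always: embedding for semantic search
    { name: "link", obsId },   // always: similarity-based thread linking
  ];
  // Regenerate the summary only when the observation is substantial
  // and the current summary is stale (both thresholds are assumptions).
  if (content.length > 200 && summaryAgeMs > 60_000) {
    jobs.push({ name: "summarize", obsId });
  }
  return jobs;
}
```

Gating the summary job this way keeps the expensive task off the hot path for small, chatty observations.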
The embedding pipeline uses a two-stage approach. First, we generate vector embeddings using our embedding model and store them in pgvector. Then we update the full-text search index using PostgreSQL's built-in tsvector. This dual approach lets us combine semantic similarity ("find threads about similar concepts") with keyword matching ("find threads mentioning this specific term").
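The two retrieval signals can be blended with a simple weighted score: cosine similarity from the vector index plus a keyword rank from full-text search. The 0.7/0.3 weighting below is an illustrative assumption, not SLEDS's actual tuning.

```typescript
// Cosine similarity between two embedding vectors, as pgvector's
// distance operators compute it (shown here in plain TypeScript).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Blend semantic similarity with a normalized keyword rank
// (e.g. ts_rank output); wSem is the semantic weight.
function hybridScore(semantic: number, keywordRank: number, wSem = 0.7): number {
  return wSem * semantic + (1 - wSem) * keywordRank;
}
```

A thread that scores well on either signal still surfaces, while one matching both ranks highest.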
Propagation
Connected tools don't poll for updates — they receive them. Each active MCP session maintains a lightweight SSE (Server-Sent Events) connection. When a Redis pub/sub event fires for a space, all connected sessions in that space receive a notification. The tool can then pull the updated context on its next interaction.
For the web dashboard, we use a WebSocket connection that provides real-time updates to the activity stream, Pulse briefings, and thread views. This means you can watch your team's AI conversations contribute context in real time.
Performance
The end-to-end latency from observation to availability is typically under two seconds. Embedding generation is the slowest step, at 200-400 ms per observation; Redis pub/sub propagation is sub-millisecond. Summary regeneration is queued and non-blocking, so it stays off the critical path.
At current scale, the system processes around 50 observations per minute per active sled, with bursts up to 200/min during high activity periods. The BullMQ worker pool auto-scales based on queue depth, and pgvector's HNSW index keeps semantic search under 50ms even as the embedding corpus grows.
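A queue-depth autoscaling policy can be as simple as a target ratio of jobs per worker, clamped to a floor and ceiling. The 25-jobs-per-worker target and the 1-10 worker bounds below are illustrative assumptions, not our production configuration.

```typescript
// Desired worker count for a given BullMQ queue depth: scale up
// proportionally, but never below min or above max workers.
function targetWorkers(
  queueDepth: number,
  perWorker = 25,  // assumed target jobs per worker
  min = 1,
  max = 10,
): number {
  const desired = Math.ceil(queueDepth / perWorker);
  return Math.min(max, Math.max(min, desired));
}
```

At the burst rate of 200 observations/min, a backlog briefly building to a few hundred jobs would pull several extra workers online, then drain back to the floor.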
Lessons Learned
The biggest lesson was that real-time doesn't mean instant-everything. Some operations (like summary regeneration) can be eventual without affecting user experience. Others (like observation storage and event propagation) need to be fast. Being deliberate about which operations need real-time performance and which can be background-processed kept the system manageable.