so-yesterday.ai

An AI-native knowledge platform — and a reference architecture for secure, traceable, agent-ready AI.

Project duration: December 2025 - ongoing

A production, hybrid-intelligence knowledge platform that curates and continuously refreshes the fast-moving frontier of AI — so people and AI agents alike can tell durable signal from hype and avoid building on approaches that are already "so yesterday". Every synthesized insight is traceable back to its sources and recorded in a cryptographically verifiable audit trail, and each user's private notes stay in a space the central system never reads.

Live at so-yesterday.ai, the platform is both a public resource and Abelium's working proving ground for a secure, three-layer architecture that we are extending toward regulated environments where AI must work over confidential documents.

The Challenge & Technical Context

In no domain does the functional lifespan of tools, frameworks, and best practices shrink faster than in AI itself. Models, agent patterns, and workflows that were state of the art a few months ago are routinely superseded; teams invest in approaches already sliding into obsolescence, while practitioners drown in informational noise. Most AI-news tools broadcast what is emerging — but leave a blind spot around what is durable versus what is becoming "so yesterday."

Two problems compound this. First, trust: in high-stakes settings, a summary is worthless unless you can trace it back to an authoritative source and prove it has not been tampered with. Second, sovereignty: organizations increasingly cannot send confidential material to external AI services at all.

so-yesterday.ai was engineered as a curated knowledge base and analytics engine that addresses all three problems (obsolescence, trust, and sovereignty) at once. It implements a three-layer information architecture:

sources of truth → a publicly synthesized knowledge index → each user's private knowledge

This is built on a hybrid AI architecture that fuses localized, data-compliant on-premise open-source models (run on-box for chat and embeddings) with optional high-reasoning frontier API models (e.g., Anthropic Claude, OpenAI GPT, Google Gemini).

It runs on a containerized Python/FastAPI backend with a hybrid keyword + semantic search engine and a React front end, in production since early 2026 with a daily synthesizing pipeline. The same architecture — agentic, provenance-bearing, auditable, and data-sovereign — is exactly what regulated organizations need, which is why so-yesterday.ai doubles as a reference implementation.

Core Structural Subpages & Specialized Technical Modules

The platform is organized as a set of functional modules — the same building blocks that make it a reference implementation for secure, agent-ready AI.

1. Autonomous AI Agents & the Cortex Engine

At the heart of the system is Cortex, an always-on layer of autonomous agents that maintain the knowledge base under human supervision rather than acting unchecked.

Continuous Maintenance Agents: Scheduled agents read the platform's own canon and community content, detect structural signals — co-occurring concepts, mis-classified tags, coverage gaps, contradictions, confidence decay — and file structured proposals into a moderator queue. Crucially, agents never publish directly: a moderator accepts or rejects each proposal, and the decision is recorded permanently. This human-in-the-loop pattern carries the five properties a governed agent layer needs — persistent state, clear ownership, defined actions, a queryable history, and a permission model.
Hybrid, Auditable Pipeline: Local open-source models perform fast, low-cost classification; heavier synthesis can be routed to frontier models. Every item flows through a configurable, fully logged pipeline of classification → synthesis → human review, so the system's behavior is reconstructable after the fact rather than opaque.
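The propose-then-moderate pattern can be illustrated with a minimal in-memory sketch. The class and field names (`ModeratorQueue`, `Proposal`, `decide`) are hypothetical and not taken from the platform's code; the sketch only shows how the five properties map onto a data structure: agents can file but never publish, only listed moderators may decide, and every decision lands in an append-only, queryable log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Proposal:
    proposal_id: int
    agent: str             # ownership: which agent filed it
    kind: str              # e.g. "retag", "coverage_gap", "contradiction"
    payload: dict
    status: Status = Status.PENDING


@dataclass
class ModeratorQueue:
    moderators: set = field(default_factory=set)      # permission model: who may decide
    proposals: dict = field(default_factory=dict)     # state (in memory here; persisted in practice)
    decision_log: list = field(default_factory=list)  # append-only, queryable history
    next_id: int = 1

    def file(self, agent: str, kind: str, payload: dict) -> int:
        """The only action open to agents: file a proposal. There is no publish method."""
        pid = self.next_id
        self.next_id += 1
        self.proposals[pid] = Proposal(pid, agent, kind, payload)
        return pid

    def decide(self, pid: int, moderator: str, accept: bool, reason: str = "") -> Proposal:
        """A human moderator accepts or rejects; the decision is recorded permanently."""
        if moderator not in self.moderators:
            raise PermissionError(f"{moderator} is not a moderator")
        proposal = self.proposals[pid]
        if proposal.status is not Status.PENDING:
            raise ValueError(f"proposal {pid} was already decided")
        proposal.status = Status.ACCEPTED if accept else Status.REJECTED
        self.decision_log.append({
            "proposal_id": pid,
            "agent": proposal.agent,
            "moderator": moderator,
            "decision": proposal.status.value,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        return proposal

    def history(self, agent: Optional[str] = None) -> list:
        """Query past decisions, optionally only those for one agent's proposals."""
        return [d for d in self.decision_log if agent is None or d["agent"] == agent]
```

For example, a tagging agent files `q.file("tag-agent", "retag", {...})`, and nothing changes in the public index until a moderator calls `decide`; a second decision on the same proposal is refused rather than silently overwriting the record.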

2. The Model Context Protocol (MCP) Integration Gateway

To make the knowledge base equally consumable by humans and by AI agents, so-yesterday.ai natively implements the open Model Context Protocol (MCP).

Standardized Ecosystem Integration: The portal runs an MCP server exposing a suite of typed tools, resources, and prompts (search, latest digests, concept and video lookups, knowledge-graph access, and moderated write actions), secured with OAuth 2.1 and PKCE. Any MCP-capable client integrates without a bespoke connector: instead of every one of N clients needing a custom connector to each of M data sources (N × M integrations), each side implements the protocol once (N + M). With 5 clients and 10 sources, that is 15 implementations instead of 50 connectors.
Open by Design: The same corpus is published as open markdown under a "file = URL" contract, alongside an agent-readable capability descriptor — so an agent can integrate simply by reading the public data repository. The platform is built for the AI ecosystem, not merely served to it.

3. Three-Layer Knowledge Architecture — Provenance & a Cryptographic Audit Trail

The defining property of the platform is that every answer can be traced to its sources, while each user's private exploration stays invisible to the central system.

Synthesis-at-Write-Time with Provenance: 400+ sources of truth (video transcripts, essays, daily digests, curated web pages, public posts) feed a public index of synthesized concepts (the Karpathy LLM Wiki pattern), each carrying a provenance field that links back to the authoritative source. An indexing pipeline converts raw material into AI-readable text and embeddings, powering hybrid keyword + semantic search in which every result links back to its origin.
Tamper-Evident Audit & Private Personal Space: Every change to the public index is written into a hash-chained, cryptographically signed audit trail with periodic checkpoints and independent public verification — the kind of end-to-end traceability that regulations such as the EU AI Act and DORA call for. Meanwhile, each user's personal notes live in their own independent space (which they can disconnect at any time without losing data); the server retains only an encrypted access token, never the note contents.
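A hash-chained log can be sketched in a few lines. This is an illustration under stated assumptions, not the platform's implementation: the class and method names are hypothetical, and HMAC-SHA256 stands in for the real signature. Genuinely public verification of signatures would require an asymmetric scheme (for example Ed25519), whereas the hash chain itself can be rechecked by anyone without a key.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so a record always hashes the same way.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditTrail:
    def __init__(self, key: bytes):
        self._key = key     # HMAC key: a stand-in for a real private signing key
        self.entries = []   # each entry: {"record", "prev", "hash", "sig"}

    def _sign(self, h: str) -> str:
        return hmac.new(self._key, h.encode("ascii"), hashlib.sha256).hexdigest()

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = entry_hash(prev, record)
        self.entries.append({"record": dict(record), "prev": prev, "hash": h, "sig": self._sign(h)})
        return h

    def checkpoint(self) -> dict:
        """Length and head hash; publishing this lets outsiders pin the log's state."""
        head = self.entries[-1]["hash"] if self.entries else GENESIS
        return {"length": len(self.entries), "head": head}

    def verify_chain(self) -> bool:
        """Needs no secret: recompute every link; any edited or reordered entry breaks it."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or entry_hash(prev, e["record"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True

    def verify_signatures(self) -> bool:
        return all(hmac.compare_digest(self._sign(e["hash"]), e["sig"]) for e in self.entries)

    def matches_checkpoint(self, cp: dict) -> bool:
        """Together with verify_chain: the log still contains the prefix a checkpoint described."""
        n = cp["length"]
        if n == 0:
            return cp["head"] == GENESIS
        return n <= len(self.entries) and self.entries[n - 1]["hash"] == cp["head"]
```

Editing any past record changes its hash and breaks every later link; recomputing the whole chain to hide the edit fails the signature check, and rewriting history before a published checkpoint contradicts the head hash outsiders already hold.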

4. Secure, Data-Sovereign Architecture & the Regulated-Sector Direction

For organizations where the destination of their data matters as much as the answer, the platform is designed to keep processing local.

Local-First Processing & Guardrails: Because chat and embeddings run on on-premise open-source models, sensitive text can be processed within the operator's own perimeter; frontier models are an optional accelerator, not a dependency. Inbound fetches are guarded against internal-network probing, all content is sanitized server- and client-side, and attachments pass a strict host allow-list — closing the common prompt-injection and exfiltration vectors at the boundary.
Extending to Regulated Environments (research direction): Abelium is adapting this same three-layer pattern for public administration, justice, finance, and healthcare — adding per-user and per-group access control tied to the client's identity provider (SI-PASS, Keycloak, Microsoft Entra ID), isolated execution with a controlled one-way egress, and a regulatory documentation pack (EU AI Act, DORA, NIS 2, BSI C5). so-yesterday.ai is the production proving ground; the regulated, access-controlled system is active research and development.
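The two boundary checks named above (guarding inbound fetches against internal-network probing, and a host allow-list for attachments) can be sketched with the standard library. The function names and the `ATTACHMENT_HOSTS` entries are hypothetical placeholders; the platform's actual rules are not published here. The resolver is injectable so the logic can be tested without DNS.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allow-list; placeholder hosts only.
ATTACHMENT_HOSTS = {"img.example.com", "cdn.example.org"}


def resolve_host(host: str) -> list:
    """Default resolver: every address the host name maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_public_ip(ip: str) -> bool:
    # is_global is False for private, loopback, link-local (e.g. cloud metadata) and reserved ranges.
    return ipaddress.ip_address(ip).is_global


def is_safe_fetch_url(url: str, resolve=resolve_host) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addrs = resolve(parts.hostname)
        # Reject if ANY resolved address points inside the network.
        return bool(addrs) and all(is_public_ip(a) for a in addrs)
    except (OSError, ValueError):
        return False


def is_allowed_attachment(url: str) -> bool:
    parts = urlsplit(url)
    return parts.scheme == "https" and (parts.hostname or "") in ATTACHMENT_HOSTS
```

A check-then-fetch sequence like this is still open to DNS rebinding (the name can resolve differently at fetch time), so a production guard would typically connect to the already-validated address rather than resolving the name a second time.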

Platform Capabilities in Production

Beyond the architecture, so-yesterday.ai ships a rich, openly accessible product — all live and free to browse:

Daily AI Digests & Essays: A daily, ranked briefing of the five developments that matter, plus long-form essays with original frameworks — analysis with technical depth, not press-release summaries.
Interactive Knowledge Graph & Hybrid Search: A live, navigable graph of synthesized concepts and their relationships, and a search that blends keyword and semantic matching with full provenance.
AI Persona Chat (multilingual): Chat with AI-transformation personas in English or Slovenian, in persona or coach mode, grounded in the canon with inline citations so answers stay accountable.
Typed Content Templates: Each video is rendered through the right lens — tech-overview, tutorial, podcast, concept, opinion-essay, or engineering-deepdive — instead of a one-size-fits-all summary.
BYOT — Bring Your Own Template/Model: Curators can plug in their own frontier model to author richer, higher-fidelity renders, with the on-box model as the default and a human moderator publishing the result — a concrete demonstration of the hybrid local/frontier design and the human-in-the-loop governance model.
The "So Yesterday Meter": An interactive, reflective gauge that turns the platform's thesis into a tool — a calibrated read on how current (versus "so yesterday") a given tool, practice, or take really is, designed to prompt reflection rather than hype.
Agent-Ready by Default: A documented REST API, the MCP server, the open data repository, and agent-aware responses make the whole corpus a reusable building block in AI-native workflows.
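The page does not say how keyword and semantic results are blended. One common technique is reciprocal rank fusion (RRF), sketched below as an assumption rather than the platform's documented method: each list contributes 1 / (k + rank) per document, and k = 60 is a conventional default. The `provenance` mapping is a hypothetical stand-in for the per-concept provenance field, so each fused hit still links back to its origin.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse several ranked id lists; documents ranked well in several lists rise to the top."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def hybrid_search(keyword_hits: list, semantic_hits: list, provenance: dict, top_n: int = 5) -> list:
    fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
    return [{"id": d, "score": s, "source": provenance.get(d)} for d, s in fused[:top_n]]
```

With keyword hits `["a", "b", "c"]` and semantic hits `["b", "d", "a"]`, document `b` ranks first because it places high in both lists, ahead of `a`, which tops only one. RRF needs no score normalization, which suits combining keyword scores and cosine similarities that live on different scales.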

Main Project Objectives

Architect a Hybrid LLM Orchestration Layer: Operate routing that balances local open-source models (for cost, throughput, and data sovereignty) with optional frontier models (for advanced synthesis), so quality scales without surrendering privacy or control.
Make Provenance and Auditability First-Class: Guarantee that every synthesized answer is traceable to authoritative sources and recorded in a tamper-evident, independently verifiable audit trail — the foundation for trustworthy AI in high-stakes settings.
Standardize Agent Access via MCP: Mature the MCP server and the open "file = URL" data contract so the knowledge base is a first-class, low-friction building block in the AI-agent ecosystem.
Extend the Secure, Data-Sovereign Substrate to Regulated Sectors: Build on the local-first architecture toward per-user/per-group access control, isolated execution, and a controlled one-way egress, validated against EU AI Act, DORA, NIS 2, and BSI C5 requirements — developed with research and pilot partners.
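The routing objective can be illustrated with a small policy function. The task kinds, the `confidential` flag, and `route` itself are hypothetical; the sketch only encodes the priorities stated on this page: chat, embeddings, and classification stay on the local models, confidential text never leaves the perimeter, frontier models are an optional accelerator for heavier synthesis, and every decision is logged with its reason.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("router")

LOCAL_KINDS = {"classify", "embed", "chat"}  # cheap, high-throughput work kept on-box


@dataclass(frozen=True)
class Task:
    kind: str            # hypothetical kinds: "classify", "embed", "chat", "synthesize"
    confidential: bool   # True if the text must not leave the operator's perimeter


def route(task: Task, frontier_enabled: bool) -> str:
    """Return "local" or "frontier"; data sovereignty takes precedence over quality."""
    if task.confidential:
        target, reason = "local", "confidential data stays on-box"
    elif task.kind in LOCAL_KINDS:
        target, reason = "local", "cheap, high-throughput task"
    elif frontier_enabled:
        target, reason = "frontier", "heavy synthesis, frontier allowed"
    else:
        target, reason = "local", "frontier disabled, local fallback"
    # Logging each decision keeps the pipeline reconstructable after the fact.
    log.info("route kind=%s confidential=%s -> %s (%s)", task.kind, task.confidential, target, reason)
    return target
```

Because the last branch falls back to the local model, switching `frontier_enabled` off degrades synthesis quality but never stops the pipeline, which is what makes frontier models an accelerator rather than a dependency.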

Expected Strategic Outcomes & Impact

Stay Ahead of AI Obsolescence: Companies, teams and individuals get a continuously maintained, framework-rich view of the AI frontier — so they retire "so yesterday" approaches early instead of investing in methods already in decline.
Trustworthy, Traceable Knowledge: Because every claim links to its source and every change is cryptographically logged, decisions made on the platform's output are defensible — a prerequisite for adoption in regulated and high-assurance environments.
Knowledge Built for Humans and Agents: The same curated corpus is exposed through a human web experience and standardized agent interfaces (MCP, open markdown), making it durable, reusable infrastructure — and a proving ground for secure AI over confidential documents.

This research initiative — spanning hybrid local/frontier AI orchestration, Model Context Protocol standardization, and secure, verifiable, data-sovereign knowledge systems — is developed by Abelium's math-first R&D team, together with research and pilot partners and modern cloud security pioneers, as an openly inspectable, AI-native exemplar: an AI product that tracks AI, where the team's own transformation is the proof of capability.

Similar projects

DIGITRUST - A digital transformation of a business with innovative digital solutions and a decentralized system
Darklens - Automating the Detection of Multiple-Lensed Galaxies. Funded under the ESA PRODEX program.
AI-FINREG - Regulation of modern FinTech services and protection of personal data