ADR-004: TF-IDF In-Memory Search

Date: 2026-06-04
Status: Accepted
Source: plans/schema-driven-pipeline

Context

Several tools need relevance ranking: search_docs (full-text search), suggest_components (rank components for a described UI), and get_implementation_guide (assemble relevant pieces for a goal). The server is a single local process that must start fast and stay dependency-light, and the corpus is the bounded set of entries in the bundled schema.

Options Considered

Option A: External search engine (e.g. a vector DB / service)

Pros: Powerful semantic search.
Cons: Heavy; network/service dependency; defeats the offline, local, zero-key runtime goal; overkill for a bounded corpus.

Option B: Embeddings computed at runtime

Pros: Semantic relevance.
Cons: Needs a model/API at runtime (keys, cost, latency); non-deterministic.

Option C: TF-IDF index built in memory at startup

Pros: Zero extra dependencies; deterministic; instant for a bounded corpus; trivially rebuildable on reindex.
Cons: Lexical, not semantic — relies on shared vocabulary.

Decision

Chosen option: Option C — build a TF-IDF index in memory from the SchemaStore at startup (src/search/search-index.ts), queried by search-engine.ts.
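To make the decision concrete, here is a minimal sketch of an in-memory TF-IDF index. It is illustrative only and is not the contents of src/search/search-index.ts or search-engine.ts: the `Entry` shape, the `tokenize`, `buildIndex`, and `search` names, the length-normalized term frequency, and the `log(1 + N / df)` weighting are all assumptions chosen for brevity.

```typescript
// Hypothetical entry shape; the real SchemaStore entries may differ.
interface Entry {
  id: string;
  text: string;
}

interface TfIdfIndex {
  // term -> (entry id -> length-normalized term frequency)
  postings: Map<string, Map<string, number>>;
  // term -> inverse document frequency
  idf: Map<string, number>;
}

// Lowercase, then split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Build the whole index in one pass over the (bounded) corpus.
function buildIndex(entries: Entry[]): TfIdfIndex {
  const postings = new Map<string, Map<string, number>>();
  for (const entry of entries) {
    const tokens = tokenize(entry.text);
    for (const token of tokens) {
      let byEntry = postings.get(token);
      if (!byEntry) {
        byEntry = new Map<string, number>();
        postings.set(token, byEntry);
      }
      // Each occurrence adds 1 / (entry length) to the term frequency.
      byEntry.set(entry.id, (byEntry.get(entry.id) ?? 0) + 1 / tokens.length);
    }
  }

  const idf = new Map<string, number>();
  for (const [term, byEntry] of postings) {
    // Assumed weighting: terms found in fewer entries score higher.
    idf.set(term, Math.log(1 + entries.length / byEntry.size));
  }
  return { postings, idf };
}

// Score each entry as the sum of tf * idf over the distinct query terms.
function search(
  index: TfIdfIndex,
  query: string,
  limit = 5,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const term of new Set(tokenize(query))) {
    const byEntry = index.postings.get(term);
    if (!byEntry) continue; // Term not in the corpus: contributes nothing.
    const weight = index.idf.get(term) ?? 0;
    for (const [id, tf] of byEntry) {
      scores.set(id, (scores.get(id) ?? 0) + tf * weight);
    }
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Break ties by id so the ranking stays deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, limit);
}
```

In this sketch a query term that never appears in the corpus simply matches nothing, which is the lexical limitation listed under the cons of Option C.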

Rationale

TF-IDF matches the constraints exactly: the corpus is small and bounded, the runtime must be deterministic and offline, and we want no additional dependencies. The index rebuilds cheaply, so reindex can swap in a fresh schema without restarting. Lexical limitations are acceptable for documentation lookup where the agent's query and the docs share vocabulary.
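The "swap in a fresh schema without restarting" point can be sketched as building the replacement index off to the side and then reassigning a single reference. This is an assumption about how a reindex could work, not a description of the actual implementation: `SearchService`, `buildTermIndex`, and `SchemaEntry` are hypothetical names, and the index is reduced to a plain term-to-ids map to keep the example short.

```typescript
// Hypothetical shapes; the real SchemaStore and index types may differ.
interface SchemaEntry {
  id: string;
  text: string;
}

// Simplified stand-in for the search index: term -> ids of entries containing it.
type TermIndex = Map<string, Set<string>>;

function buildTermIndex(entries: SchemaEntry[]): TermIndex {
  const index: TermIndex = new Map();
  for (const entry of entries) {
    const terms = entry.text.toLowerCase().split(/[^a-z0-9]+/);
    for (const term of terms) {
      if (term.length === 0) continue;
      let ids = index.get(term);
      if (!ids) {
        ids = new Set<string>();
        index.set(term, ids);
      }
      ids.add(entry.id);
    }
  }
  return index;
}

class SearchService {
  private index: TermIndex;

  constructor(entries: SchemaEntry[]) {
    this.index = buildTermIndex(entries);
  }

  // Build the new index completely, then swap the reference in one assignment,
  // so lookups see either the old index or the new one, never a partial build.
  reindex(entries: SchemaEntry[]): void {
    const next = buildTermIndex(entries);
    this.index = next;
  }

  // Return matching entry ids in sorted order for deterministic output.
  lookup(term: string): string[] {
    return [...(this.index.get(term.toLowerCase()) ?? [])].sort();
  }
}
```

Because the rebuild is a pure function of the entries, the process can keep running while the schema changes; the old index is left for garbage collection once nothing references it.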

Consequences

Positive

No runtime dependencies, no keys, deterministic results.
Fast startup and cheap reindex.

Negative

Purely lexical; no semantic/synonym matching.

Risks

Vocabulary mismatch could hurt recall — acceptable for now; a future ADR could revisit embeddings if needed.
