Semantic Search · kerkstra.dev

semantic search: cosine ranking · query · threshold · filter

how it works

Semantic search embeds a query and a corpus into the same vector space, then ranks chunks by cosine similarity. The canvas uses real MiniLM vectors generated at build time: 6 example queries searched against 12 document chunks.
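To make the ranking step concrete, here is a minimal sketch of exact cosine ranking in plain Python. The three-dimensional vectors and chunk ids are made up for illustration; real MiniLM embeddings have hundreds of dimensions, and the function names are assumptions, not the demo's code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def rank(query_vec, chunks):
    """Score every chunk against the query and sort best-first."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks.items()]
    return sorted(scored, reverse=True)

# Toy vectors standing in for real embeddings.
chunks = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.6, 0.8, 0.0],
    "c": [0.0, 0.0, 1.0],
}
ranked = rank([1.0, 0.0, 0.0], chunks)
for score, chunk_id in ranked:
    print(f"{chunk_id}: cos={score:.2f}")
```

Because cosine similarity divides by both norms, only the direction of each vector matters, not its length.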

The ranked list is the practical core of RAG. A user question is embedded, the nearest chunks are fetched, and those chunks become context for the language model. The model is not searching strings; it is receiving whatever the vector search considered nearby enough to include.
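The hand-off from retrieval to generation can be sketched as simple string assembly. The prompt wording and the `build_prompt` helper below are hypothetical; real systems vary widely in how they format context.

```python
def build_prompt(question, retrieved_chunks):
    """Join retrieved chunk texts into a context block placed before the question."""
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

prompt = build_prompt(
    "What does top-k control?",
    ["Top-k limits how many chunks pass.", "Thresholds reject low scores."],
)
print(prompt)
```

The model only ever sees the chunks passed in here, so a retrieval miss becomes a gap in the answer.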

Top-k controls how many chunks are allowed through. Raising it improves recall but adds noise and context cost. The threshold slider makes a second decision: even if a chunk is in the top-k set, it is rejected when the cosine score is too low.
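The two gates described above can be sketched as a cut followed by a filter. The `select` function and its default values are illustrative assumptions, and the scores are invented.

```python
def select(scored, top_k=5, threshold=0.36):
    """Keep the top_k highest-scoring chunks, then drop any below the threshold.

    `scored` is a list of (cosine_score, chunk_id) pairs.
    """
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
    return [(score, chunk_id) for score, chunk_id in top if score >= threshold]

scored = [(0.30, "c"), (0.90, "a"), (0.20, "d"), (0.50, "b")]
passing = select(scored, top_k=3, threshold=0.36)
print(passing)
```

With `top_k=3`, chunk "c" makes the cut but is still rejected because its score falls under the threshold. A result list can therefore be shorter than top-k, or even empty.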

Metadata filters are not a hack; they are how production retrieval keeps vector similarity grounded. A query for "bank" can land near finance and geography. Filtering by source, tenant, document type, or permission boundary narrows the candidate set before the nearest-neighbor ranking matters.
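A minimal sketch of filter-then-rank, using the "bank" example. The `Chunk` type, the metadata keys, and the two-dimensional vectors are all hypothetical stand-ins.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    vector: list
    metadata: dict = field(default_factory=dict)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def filtered_search(query_vec, chunks, required):
    """Drop chunks whose metadata does not match `required`, then rank the rest."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in required.items())
    ]
    scored = [(cosine(query_vec, c.vector), c.chunk_id) for c in candidates]
    return sorted(scored, reverse=True)

chunks = [
    Chunk("loan-terms", [0.9, 0.1], {"topic": "finance"}),
    Chunk("river-bank", [0.8, 0.2], {"topic": "geography"}),
    Chunk("erosion", [0.1, 0.9], {"topic": "geography"}),
]
query = [1.0, 0.0]  # pretend this is the embedding of "bank"

unfiltered = filtered_search(query, chunks, {})
geography = filtered_search(query, chunks, {"topic": "geography"})
print(unfiltered[0][1], geography[0][1])
```

Without a filter, the finance chunk wins on similarity alone. With the filter, it never enters the candidate set, which is also why permission boundaries belong here and not in a post-ranking step.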

For small corpora, exact cosine search is fine. At scale, approximate nearest neighbor indexes such as HNSW avoid comparing the query against every vector. That is the retrieval tradeoff: much faster search in exchange for a small chance of missing the exact best match.
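The tradeoff can be illustrated with a much simpler approximate index than HNSW. The sketch below uses random-hyperplane hashing (a basic form of locality-sensitive hashing), which is not how HNSW works but shows the same idea: inspect only part of the corpus and accept that the true best match may sit elsewhere. All names, sizes, and seeds are assumptions chosen for illustration.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def make_planes(dim, n_planes, seed=0):
    """Random hyperplanes; each contributes one bit to a vector's bucket key."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def signature(vec, planes):
    """Which side of each hyperplane the vector falls on."""
    return tuple(sum(x * p for x, p in zip(vec, plane)) >= 0.0 for plane in planes)

def build_index(vectors, planes):
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, planes), []).append(i)
    return buckets

def exact_search(query, vectors):
    """Compare against every vector: always correct, cost grows with corpus size."""
    return max(range(len(vectors)), key=lambda i: cosine(query, vectors[i]))

def approx_search(query, vectors, planes, buckets):
    """Compare only against vectors in the query's bucket; may miss the best match."""
    candidates = buckets.get(signature(query, planes), [])
    if not candidates:
        return None
    return max(candidates, key=lambda i: cosine(query, vectors[i]))

rng = random.Random(1)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
planes = make_planes(dim=8, n_planes=4)
buckets = build_index(vectors, planes)

query = vectors[3]
print(exact_search(query, vectors), approx_search(query, vectors, planes, buckets))
print("vectors compared:", len(buckets[signature(query, planes)]), "of", len(vectors))
```

A query identical to a stored vector always hashes to that vector's bucket, so this lookup succeeds. A query that lands just across a hyperplane from its true nearest neighbor would miss it, which is the "small chance" the paragraph above refers to.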

Lewis et al. 2020 · Retrieval-Augmented Generation
pgvector · vector search in Postgres