Computers don't understand words. A word embedding maps each word to a dense vector of floats. The ones on this page are real: 75 words embedded in 384 dimensions by all-MiniLM-L6-v2 at build time, projected to 2D with UMAP. Every cosine score and nearest-neighbor list you see is computed in the full 384D space. Dragging a word around the plane does not change what it is close to.
the 75 seeds · closest in-cluster pair shown with real 384D cosine
- animals 12
- cat · dog · fish · bird · horse · mouse · lion · wolf · bear · tiger · eagle · shark
- closest: lion ↔ tiger cos=0.67
- actions 13
- run · walk · jump · swim · fly · climb · sprint · leap · crawl · dash · ran · walked · jumped
- closest: jump ↔ jumped cos=0.87
- emotions 10
- happy · sad · angry · calm · excited · fearful · proud · shy · brave · gentle
- closest: happy ↔ excited cos=0.58
- colors 10
- red · blue · green · yellow · purple · orange · black · white · pink · gray
- closest: black ↔ white cos=0.76
- size 9
- big · small · tiny · huge · giant · large · vast · little · massive
- closest: small ↔ tiny cos=0.91
- food 9
- bread · rice · meat · fruit · cake · soup · pasta · salad · cheese
- closest: pasta ↔ cheese cos=0.64
- capitals 6
- paris · berlin · tokyo · london · rome · madrid
- closest: berlin ↔ london cos=0.59
- countries 6
- france · germany · japan · england · italy · spain
- closest: germany ↔ england cos=0.78
The vectors are learned, not hand-crafted. During training, words that appear in similar contexts are pushed toward nearby points. No dictionary, no labels. "Cat" and "dog" converge because they occur in interchangeable contexts; "happy" and "calm" converge for the same reason. Hover any word to see its top-5 neighbors with real cosine scores.
Cosine similarity measures the angle between vectors: cos(a, b) = a·b / (‖a‖·‖b‖). For unit-normalized vectors (like these) it reduces to a simple dot product. 1.0 means identical direction, 0 orthogonal, −1 opposite. In this space, synonyms typically score above 0.7 and unrelated words below 0.3.
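The formula can be checked in a few lines. This sketch uses NumPy and toy 3D vectors as stand-ins for the real 384D embeddings; the function names are illustrative:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """cos(a, b) = a·b / (‖a‖·‖b‖)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale v to unit length so cosine collapses to a plain dot product."""
    return v / np.linalg.norm(v)

# Toy 3D stand-ins for the real 384D vectors.
a = np.array([0.2, 0.9, 0.1])
b = np.array([0.25, 0.8, 0.15])

# For unit-normalized vectors the two computations agree exactly.
assert abs(cosine(a, b) - float(np.dot(normalize(a), normalize(b)))) < 1e-9
```

This is why models like all-MiniLM-L6-v2 emit unit-normalized vectors: downstream similarity becomes a single dot product.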
The striking property is that directions encode relationships. The vector from france to paris is roughly parallel to the vector from germany to berlin, so paris − france + germany ≈ berlin. The page demonstrates this by resolving the arithmetic against the full vocabulary and showing the actual top match with its cosine score. Nothing programmed this in; it emerges because the model learned "capital-of" as a consistent contextual shift.
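The resolution step is a short loop. This sketch assumes a hypothetical `vocab` dict of unit-normalized vectors standing in for the page's 384D embeddings; excluding the three query words from the candidates is standard practice in analogy evaluation:

```python
import numpy as np

def analogy(vocab: dict[str, np.ndarray], a: str, b: str, c: str) -> str:
    """Resolve b - a + c against the vocabulary, e.g. paris - france + germany.

    Assumes every vector in vocab is unit-normalized.
    """
    target = vocab[b] - vocab[a] + vocab[c]
    target /= np.linalg.norm(target)
    best, best_cos = None, -2.0
    for word, vec in vocab.items():
        if word in (a, b, c):               # exclude the query words themselves
            continue
        cos = float(np.dot(target, vec))    # cosine, since vectors are unit-length
        if cos > best_cos:
            best, best_cos = word, cos
    return best
```

With real embeddings the top match for paris − france + germany lands on berlin, as the page shows.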
In production, embeddings become a database column. Each 384D float32 vector is 1.5 KB (384 × 4 bytes); a million of them is ~1.5 GB before indexing. pgvector stores them in Postgres with cosine-distance, L2, and inner-product operators. Beyond a few thousand rows, approximate nearest-neighbor indexes like HNSW deliver sub-millisecond retrieval at roughly 0.98 recall. A semantic search query is one line: SELECT ... ORDER BY emb <=> $1 LIMIT 10, where <=> is cosine distance.
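What the index approximates is just an exact scan, which a vectorized top-k makes cheap at small scale. A minimal sketch, assuming unit-normalized rows; names are illustrative, not pgvector API:

```python
import numpy as np

def top_k(query: np.ndarray, emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact cosine top-k over unit-normalized rows of emb.

    This is what ORDER BY emb <=> $1 LIMIT k computes, minus the
    HNSW approximation. Requires k < len(emb).
    """
    scores = emb @ query                    # cosine == dot for unit vectors
    idx = np.argpartition(-scores, k)[:k]   # unordered top-k in O(n)
    return idx[np.argsort(-scores[idx])]    # sort only the k winners
```

At a million rows this scan is still tens of milliseconds; the HNSW index trades a little recall to cut that to sub-millisecond.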
Modern LLMs extend this with contextual embeddings: one vector per token occurrence, computed from the surrounding context. "Bank" near "river" and "bank" near "money" get different vectors. The static, word-level space shown here is the same idea in its simplest form; contextual models start from a learned static token-embedding table and refine it layer by layer.
- Mikolov et al. 2013 · Word2Vec
- Reimers & Gurevych 2019 · Sentence-BERT (MiniLM lineage)
- McInnes et al. 2018 · UMAP
- pgvector · Postgres extension