Embeddings are not just points; they also encode directions. When several word pairs share the same relationship, the displacements between their vectors tend to point the same way. This is the classic vector-arithmetic result for word analogies.
The direction view uses 3 generated analogies. Each one is resolved in the full 384-dimensional MiniLM space, then drawn in the 2D projection. For example, the page computes paris - france + germany, ranks every candidate word by cosine similarity to the result, and shows the actual top match with its cosine score.
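The analogy step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the page's actual code: the 3D vectors are invented toy values rather than real MiniLM embeddings, the helper names `cosine` and `resolve_analogy` are hypothetical, and excluding the three input words from the candidates is a common convention that the page may or may not follow.

```python
import numpy as np

# Toy 3D vectors invented for illustration (NOT real 384D MiniLM embeddings).
vocab = {
    "paris":   np.array([1.0, 0.0, 1.0]),
    "france":  np.array([1.0, 0.0, 0.0]),
    "berlin":  np.array([0.0, 1.0, 1.0]),
    "germany": np.array([0.0, 1.0, 0.0]),
    "banana":  np.array([0.5, 0.5, -1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def resolve_analogy(a, b, c, vocab):
    """Rank candidates by cosine similarity to vec(a) - vec(b) + vec(c).

    Assumption: the three input words are excluded from the candidates.
    """
    query = vocab[a] - vocab[b] + vocab[c]
    candidates = [w for w in vocab if w not in (a, b, c)]
    scored = [(w, cosine(query, vocab[w])) for w in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = resolve_analogy("paris", "france", "germany", vocab)
top_word, top_score = ranking[0]
print(top_word, round(top_score, 3))
```

With these toy values the query vector coincides with the vector for "berlin", so it ranks first. Real embeddings are noisier, which is why showing the cosine score alongside the top match is useful.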
This works because training shapes the geometry of the learned representation. The model does not store a symbolic "capital-of" rule; instead, the many training examples involving capitals and countries produce similar geometric offsets between each capital and its country.
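One way to check whether a relationship behaves like a shared direction is to compare the offsets themselves. The sketch below assumes invented toy vectors (not real model output) and a hypothetical `pairs` table; it computes each capital-minus-country offset and the cosine similarity between every pair of offsets. Values near 1 mean the offsets are nearly parallel.

```python
from itertools import combinations

import numpy as np

# Toy (capital, country) vectors invented for illustration only.
pairs = {
    "paris-france":   (np.array([1.0, 0.1, 0.9]), np.array([1.0, 0.0, 0.0])),
    "berlin-germany": (np.array([0.1, 1.0, 1.0]), np.array([0.0, 1.0, 0.0])),
    "rome-italy":     (np.array([0.5, 0.5, 1.1]), np.array([0.5, 0.6, 0.0])),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Displacement from each country to its capital.
offsets = {name: capital - country for name, (capital, country) in pairs.items()}

# Pairwise cosine similarity between offsets.
offset_similarity = {
    (a, b): cosine(offsets[a], offsets[b])
    for a, b in combinations(offsets, 2)
}

for (a, b), sim in offset_similarity.items():
    print(f"{a} vs {b}: {sim:.3f}")
```

In a real embedding space the offsets are rarely this well aligned, so the same check is better read as a rough diagnostic than as a guarantee that an analogy will resolve correctly.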
The context view shows the limit of static word-level maps. A contextual embedding represents a word occurrence, not just a spelling, so the same word can move toward different anchors depending on nearby words. This page includes 2 ambiguous surface words, each embedded within full phrases at build time.
In transformer models, those contextual vectors are produced by self-attention and feed-forward layers. Earlier tokens, nearby nouns, syntax, and long-range references all shape the final representation used by later prediction or retrieval steps.
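The effect of context can be illustrated with a heavily simplified self-attention step. This is a sketch under loud assumptions, not a real transformer: it uses a single head, identity query/key/value projections, no positional encoding, no feed-forward layer, and invented 4D toy vectors. The names `self_attention` and `contextual` are hypothetical helpers. Even so, it shows the same static vector for "bank" ending up in different places depending on its neighbor.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Single-head scaled dot-product self-attention.

    Simplification: Q, K, and V are all X itself (identity projections).
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ X

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy static (context-free) vectors invented for illustration.
static = {
    "bank":  np.array([1.0, 1.0, 0.0, 0.0]),
    "river": np.array([2.0, 0.0, 0.0, 0.0]),
    "money": np.array([0.0, 2.0, 0.0, 0.0]),
}

def contextual(word, phrase):
    """Return the attention output for `word` inside `phrase` (a list of words)."""
    X = np.stack([static[w] for w in phrase])
    return self_attention(X)[phrase.index(word)]

bank_river = contextual("bank", ["river", "bank"])
bank_money = contextual("bank", ["money", "bank"])

print(cosine(bank_river, static["river"]), cosine(bank_river, static["money"]))
print(cosine(bank_money, static["river"]), cosine(bank_money, static["money"]))
```

Here the "river" context pulls "bank" toward the river anchor and the "money" context pulls it toward the money anchor. A trained model does the same kind of mixing with learned projections across many layers and heads.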
Mikolov et al. 2013 · Linguistic regularities in vector space
Vaswani et al. 2017 · Attention Is All You Need