Imagine searching for "a sad song about missing someone" and finding exactly the right track — even though none of those words appear in the title or lyrics. Or asking a customer support system "my order hasn't shown up" and having it surface the right help article, even though the article talks about shipping delays and tracking numbers, not the specific words you typed. These are problems that keyword search was never designed to solve. Keywords match text; they don't understand what you mean.
What makes this kind of search work is a different representation of data entirely. Instead of words or metadata, you represent content as vectors — lists of numbers produced by machine learning models trained to capture meaning. The model has compressed a song, a sentence, a product description into a point in a high-dimensional space such that things with similar meanings end up geometrically close to each other. Your query gets the same treatment. Then the problem reduces to geometry: find the points in a database of millions that sit closest to yours.
This works because the models learn relationships, not just content. "Puppy" and "kitten" land near each other not because someone labeled them as similar, but because the model absorbed that pattern from enormous amounts of text. The geometry encodes what the model learned about the world.
Vectors in Practice
A 1536-dimensional vector — the size that OpenAI's
text-embedding-3-small
produces — is just a point in a very large space. You can't visualize 1536
dimensions, but you can think about it the same way you think about points on a
map. Two cities close together share climate, culture, maybe language. Two
vectors close together share meaning.
This geometric property is what makes embeddings so useful. You can encode text, images, user behavior, audio — almost anything — and then ask "what's close to this?" to get semantically relevant results. Recommendation engines, semantic search, image similarity, anomaly detection: all of these reduce to the same operation.
In two dimensions you can see it clearly: animal words cluster together, vehicle words cluster together, and the gap between the clusters is large. At 1536 dimensions the same principle holds — you just can't draw it.
The Brute Force Wall
To find similar vectors, you need a way to measure similarity. The most common approach is cosine similarity — the cosine of the angle between two vectors, regardless of their magnitude. Vectors pointing in the same direction score close to 1; vectors pointing in opposite directions score close to -1. For normalized vectors, this is equivalent to computing the dot product, which is fast to calculate on modern hardware. Euclidean distance is another option: the straight-line gap between two points in space. Both measures give you a single number that answers "how alike are these two things?", and the choice rarely matters much in practice once your vectors are normalized.
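As a quick sanity check, here is a minimal NumPy sketch showing that for unit-length vectors, cosine similarity reduces to a plain dot product. The vectors and values here are illustrative, not from any real model:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# For unit-length vectors, cosine similarity is just the dot product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

assert abs(cosine_similarity(a, b) - float(np.dot(a_unit, b_unit))) < 1e-12
print(round(cosine_similarity(a, b), 4))  # 24/25 = 0.96
```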
With that in hand, the most direct way to find the nearest neighbor to a query vector is also the most honest: compute the distance from the query to every vector in the database and return the closest one. Exact, reliable, and perfectly fine at small scale.
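The brute-force scan is a few lines of NumPy. This sketch assumes the database rows and the query are already L2-normalised, so one matrix-vector product computes every cosine similarity at once; the data is random, purely for illustration:

```python
import numpy as np

def brute_force_nearest(query: np.ndarray, db: np.ndarray) -> int:
    """Exact nearest neighbour: score every row, return the best index.
    Assumes the query and db rows are L2-normalised, so dot = cosine."""
    scores = db @ query          # one dot product per database vector: O(n * d)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)

query = db[42] + rng.normal(scale=0.01, size=64)  # a noisy copy of row 42
query /= np.linalg.norm(query)

print(brute_force_nearest(query, db))  # row 42 should win
```

Exact and simple — but the `db @ query` line is the O(n) cost the next paragraph runs into.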
The problem shows up as your dataset grows. At a hundred million vectors, you're looking at a hundred million distance calculations per query. Even if each one takes a single microsecond — which is optimistic for high-dimensional dot products — that's a hundred seconds of compute before you can return a result. More hardware helps at the margins, but you can't parallelise your way out of the fundamental O(n) cost.
The natural next step is to reach for a spatial index. B-trees and related structures work beautifully for sorted, one-dimensional data: you partition the space in half at each node, skip entire subtrees, and find your answer in O(log n) time. For a long time, researchers hoped similar tree-based approaches — k-d trees, ball trees — would transfer to higher dimensions. In practice, they don't. The deeper problem is known as the curse of dimensionality: as the number of dimensions grows, the volume of space increases so rapidly that data becomes spread out. Points that seem geometrically "nearby" stop being meaningfully closer than points that seem "far away" — every point is roughly equidistant from every other. A tree that prunes branches by reasoning about distance loses its ability to skip large portions of the search space, because no region is actually safe to ignore.
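You can watch this happen in a few lines of Python: draw random points, measure the ratio of the farthest to the nearest distance from a query, and see it collapse toward 1 as the dimensionality grows. A toy experiment, not a proof, but the trend is the whole story:

```python
import numpy as np

def spread(dim: int, n: int = 2000, seed: int = 0) -> float:
    """Ratio of farthest to nearest distance from one query to n random
    points. A ratio near 1.0 means "near" and "far" have stopped being
    distinguishable -- the condition that breaks tree-based pruning."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return float(d.max() / d.min())

for dim in (2, 32, 512):
    print(dim, round(spread(dim), 2))
```

In 2 dimensions the farthest point is orders of magnitude farther than the nearest; by 512 dimensions the ratio has shrunk toward 1.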
The Small World Idea
HNSW — Hierarchical Navigable Small World — didn't appear from nowhere. It's the result of layering a hierarchy on top of an earlier structure called NSW, Navigable Small World graphs. To understand why HNSW works, it helps to start with NSW and the idea it borrowed from an unlikely place: social networks.
The observation is that any two people on Earth are connected through a surprisingly short chain of acquaintances — six degrees of separation. What makes this possible is that social networks aren't purely local. They contain some long-range connections that bridge distant clusters and act as shortcuts across the whole graph.
NSW applies the same idea to vectors. Build a graph where each node is a vector, and each node connects to a mix of nearby neighbors and a few distant ones. When you want to find the nearest neighbor to a query, you pick an entry point and greedily hop to whichever neighbor is closest to your target. You keep hopping until no neighbor is closer than where you currently are. Because some edges span large distances, you can cross the space in roughly O(log n) hops rather than scanning every node.
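Here is that greedy traversal on a hand-built toy graph of 2-D points with one long-range shortcut. The coordinates and edges are invented for illustration:

```python
import math

# A toy graph over 2-D points: mostly local edges plus one long-range shortcut.
points = {
    "a": (0, 0), "b": (1, 0), "c": (2, 1), "d": (3, 0),
    "e": (4, 1), "f": (5, 0), "g": (6, 1), "h": (7, 0),
}
edges = {
    "a": ["b", "f"],                       # a -> f is the long-range shortcut
    "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"],
    "e": ["d", "f"], "f": ["a", "e", "g"], "g": ["f", "h"], "h": ["g"],
}

def dist(node, query):
    return math.dist(points[node], query)

def greedy_search(start, query):
    """Hop to whichever neighbour is closest to the query;
    stop when no neighbour improves on the current node."""
    current, path = start, [start]
    while True:
        best = min(edges[current], key=lambda n: dist(n, query))
        if dist(best, query) >= dist(current, query):
            return current, path
        current = best
        path.append(best)

node, path = greedy_search("a", points["h"])
print(node, path)  # h ['a', 'f', 'g', 'h']
```

Three hops via the shortcut instead of seven local steps — the small-world effect in miniature.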
The purple path shows greedy traversal: from start, always move to whichever connected node is geometrically closest to the query point. Four hops reach the destination instead of scanning all eight nodes.
Flat NSW has a problem, though. As the graph grows, the entry points become bottlenecks. Long-range connections get overwhelmed by the volume of local ones, and the quality of the greedy search degrades. You need more hops. The logarithmic behavior breaks down.
Adding the Hierarchy
The 2018 paper by Malkov and Yashunin fixes this with layers. Think of it like a road network. At the top you have motorways connecting distant cities. One layer down you have A-roads connecting towns. At the bottom you have every street connecting individual buildings. You use the motorway to get close, then switch to smaller roads to navigate the final stretch.
Insertion starts by deciding how high the new point reaches. The maximum
layer is drawn from an exponential distribution — floor(-ln(random()) * mL),
where mL is a normalisation factor tied to M (the paper recommends
mL = 1/ln(M)). This means most points land only in layer 0, a smaller
fraction reach layer 1, fewer still reach layer 2, and so on. The
distribution mirrors a skip list: the top layers stay sparse by design.
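The level draw is a one-liner. This sketch uses the mL = 1/ln(M) normalisation and tallies 100,000 draws; roughly a 1/M fraction of nodes survives to each successive layer:

```python
import math
import random
from collections import Counter

M = 16
mL = 1 / math.log(M)   # normalisation factor recommended in the paper

def random_level(rng: random.Random) -> int:
    """Draw a node's top layer: floor(-ln(U) * mL), U uniform in (0, 1]."""
    u = rng.random() or 1e-12   # guard against log(0)
    return math.floor(-math.log(u) * mL)

rng = random.Random(0)
counts = Counter(random_level(rng) for _ in range(100_000))
for level in sorted(counts):
    print(level, counts[level])
```

With M=16, expect about 93.75% of nodes at level 0, about 5.9% reaching level 1, and a vanishing tail above that — exactly the skip-list shape the text describes.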
Once the layer is decided, the algorithm descends from the top of the graph to
the insertion layer using greedy search, exactly as if it were answering a
query. From the insertion layer downward, it switches to a wider beam search
controlled by the efConstruction parameter, collecting a richer candidate set
at each level. The new node then connects to its nearest neighbours in that
candidate set — but not simply the closest ones. HNSW uses a neighbour selection
heuristic that favours diversity: it prefers neighbours that cover different
directions in the space over neighbours that are all clustered in the same
region. This keeps the graph well-connected and prevents local dead ends. All
edges are bidirectional; if an existing node already has M connections at that
layer, the weakest one gets pruned to make room.
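The diversity heuristic is easier to see in code than in prose. A simplified sketch, following the shape of Algorithm 4 in the paper but using toy 2-D points and Euclidean distance (real implementations run this on the candidate set gathered by the beam search):

```python
import math

def select_neighbours(query, candidates, m):
    """Simplified HNSW neighbour selection: take candidates in order of
    distance to the query, but keep one only if it is closer to the query
    than to every neighbour already kept. Candidates that "hide behind"
    an already-kept neighbour are skipped, which spreads the edges
    across different directions in the space."""
    selected = []
    for c in sorted(candidates, key=lambda c: math.dist(c, query)):
        if len(selected) == m:
            break
        if all(math.dist(c, query) < math.dist(c, s) for s in selected):
            selected.append(c)
    return selected

# Three candidates bunched to the east, one alone to the west.
query = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.2, 0.1), (1.1, -0.1), (-2.0, 0.0)]
print(select_neighbours(query, candidates, m=2))
# [(1.0, 0.0), (-2.0, 0.0)] -- one eastern neighbour plus the western one,
# not the two closest points, which both sit in the same cluster
```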
Search works top-down. You enter at the top layer at a fixed entry point and
greedily descend to the closest node, then drop to the next layer and repeat —
each time starting from the node found above. At layer 0 the algorithm switches
from pure greedy to a beam search: it tracks a candidate set of ef nodes,
always expanding the closest unvisited one, until no unvisited candidate is
nearer than the farthest result already collected. The upper layers navigate the
space in coarse strides; layer 0 resolves the answer with precision. It's the
same motion as zooming into a map: continent, then city, then street.
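The layer-0 beam search can be sketched in pure Python with two heaps: a min-heap of candidates to expand and a bounded max-heap of the best ef results found so far. The graph and coordinates below are toy values, not a real index:

```python
import heapq
import math

def beam_search(graph, coords, entry, query, ef):
    """Layer-0 search: keep the best ef results seen so far, always expand
    the closest unvisited candidate, stop when the nearest remaining
    candidate is farther than the worst result in the beam."""
    def d(node):
        return math.dist(coords[node], query)

    visited = {entry}
    candidates = [(d(entry), entry)]    # min-heap: closest candidate first
    results = [(-d(entry), entry)]      # max-heap: worst beam member on top

    while candidates:
        dist_c, c = heapq.heappop(candidates)
        if dist_c > -results[0][0]:
            break                       # nothing left can improve the beam
        for n in graph[c]:
            if n in visited:
                continue
            visited.add(n)
            if len(results) < ef or d(n) < -results[0][0]:
                heapq.heappush(candidates, (d(n), n))
                heapq.heappush(results, (-d(n), n))
                if len(results) > ef:
                    heapq.heappop(results)   # evict the worst result
    return sorted((-nd, n) for nd, n in results)

coords = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (2, 1), 4: (3, 1)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(beam_search(graph, coords, entry=0, query=(3, 0), ef=2))
```

With ef=1 this degenerates to the greedy hop used on the upper layers; widening ef is what buys recall at layer 0.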
Why Approximate Is Good Enough
HNSW doesn't guarantee that it finds the exact nearest neighbor. It finds an approximate one — typically achieving 95–99% recall relative to brute force, with 100–1000× less computation. For semantic search, this is an entirely acceptable trade. If the exact nearest vector would score 0.97 similarity and the one HNSW returns scores 0.96, you've lost nothing meaningful. The goal is relevance, and close enough is indistinguishable from exact.
Three parameters give you control over the trade-offs. M is the number of
bidirectional connections each node maintains per layer — more connections mean
a denser graph, better recall, and higher memory usage. efConstruction
controls how wide the candidate search is during index build; a higher value
produces a better-connected graph at the cost of slower insertions. ef (or
efSearch) is the beam width at query time — it controls how many candidates
the algorithm tracks during the final layer 0 search. Raising ef improves
recall at the cost of latency, without requiring a rebuild. A typical starting
point is M=16, efConstruction=128, and ef=64; pushing to M=32 and
ef=200 gets you into near-exact territory.
Memory is the main cost to plan for. Each node stores its vector plus adjacency
lists for every layer it appears in. For a 1536-dimensional float32 vector with
M=16, that's roughly 6 KB for the vector and around 500 bytes for the graph
structure per node — so a hundred million vectors will need roughly 650 GB
just for the index. Most production deployments use quantisation
(compressing each dimension from 32 bits to 8 or fewer) to bring that down to a
manageable size while accepting a small additional hit to recall.
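A back-of-the-envelope sizing helper makes the arithmetic concrete. The 500-byte graph overhead per node is the same rough figure used above, not a measured constant, and real numbers vary by implementation:

```python
def hnsw_index_size_gb(n: int, dim: int,
                       graph_bytes_per_node: int = 500,
                       bytes_per_dim: int = 4) -> float:
    """Rough index size: raw vectors plus a flat per-node graph overhead
    (adjacency lists, ids). 500 bytes/node is a ballpark for M=16."""
    vector_bytes = dim * bytes_per_dim
    return n * (vector_bytes + graph_bytes_per_node) / 1e9

full = hnsw_index_size_gb(100_000_000, 1536)                       # float32
quantised = hnsw_index_size_gb(100_000_000, 1536, bytes_per_dim=1)  # int8
print(round(full), round(quantised))  # ~664 GB vs ~204 GB
```

Quantising from 32 bits to 8 per dimension cuts the vector storage by 4x; the graph overhead stays, which is why the total doesn't shrink by a full factor of four.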
Deletion is the one area where HNSW is awkward. The structure doesn't support removing nodes cleanly — edges woven through multiple layers make surgical deletion expensive. Most implementations handle it with soft deletes: marking vectors as tombstoned and filtering them out of results. Periodic re-indexing cleans up the accumulated tombstones when they start to affect performance.
HNSW is why modern vector databases — Pinecone, Weaviate, pgvector, Qdrant, Milvus — can return results in milliseconds against hundreds of millions of vectors. The math is elegant: a hierarchy of small worlds, each layer a coarser map of the same territory. You enter from the sky, descend to street level, and arrive close enough to your destination that the remaining walk is trivial.

