Where Traditional Queries Fail

Picture this: your dog just had a cute litter of, say, five puppies.
BobaMilk
“Similar meaning” comes from the embedding model, not the database. The model turns your text into a long numeric vector, and related concepts end up close together because it learned those patterns from tons of examples. The “close together” part is usually just a distance metric choice, like cosine similarity or dot product.
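To make the distance-metric idea concrete, here is a minimal sketch in plain Python. The tiny 3-dimensional vectors and the names (`puppy`, `dog`, `invoice`) are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- related concepts point in similar directions.
puppy   = [0.9, 0.8, 0.1]
dog     = [0.8, 0.9, 0.2]
invoice = [0.1, 0.0, 0.9]

print(cosine_similarity(puppy, dog))      # high: related concepts
print(cosine_similarity(puppy, invoice))  # low: unrelated concepts
```

The database's job is then just finding the stored vectors with the highest similarity to your query vector, quickly.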
Swapping the embedding model can totally flip what “similar” means even when your vector DB config is unchanged.
The database is basically just doing fast nearest‑neighbor lookup (cosine/dot/etc); the meaning is whatever the model baked into those vectors during training.
Look — the “similarity metric” matters too: cosine vs dot vs L2 can reshuffle neighbors unless you normalize embeddings consistently. I’ve seen people swap models and forget the new one’s scale/normalization behavior, then blame the vector DB when results get weird.
Yeah, I’ve tripped over this exact thing — cosine and dot feel “the same” until you remember dot is basically “cosine times length,” so a few high‑norm vectors can start winning for no semantic reason. Normalizing on write (or using a DB/index that does it for you) saved me a lot of “why is this neighbor here” debugging.
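The "dot is cosine times length" effect is easy to demonstrate with a toy example (plain Python, made-up 2‑D vectors chosen to show the effect):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length so dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

query           = [1.0, 0.0]
close_direction = [0.9, 0.1]   # points almost the same way, modest norm
long_vector     = [3.0, 3.0]   # 45 degrees off, but a large norm

# Raw dot product: the long vector "wins" purely on magnitude.
print(dot(query, close_direction))  # 0.9
print(dot(query, long_vector))      # 3.0

# After normalizing on write, dot product == cosine, and direction wins.
print(dot(normalize(query), normalize(close_direction)))  # highest
print(dot(normalize(query), normalize(long_vector)))
```

With raw dot product the high‑norm vector ranks first; after normalization the vector that actually points the same way as the query does.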
I ran into this too — once we normalized, the “weird neighbor” results mostly disappeared, and the remaining odd ones were usually because the chunk was too long and mixed topics. Splitting text into smaller, more consistent chunks helped more than I expected.
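A minimal sketch of that kind of chunking, assuming plain period-separated text; `chunk_text` and `max_words` are hypothetical names, and real pipelines usually split more carefully (overlap, token counts, markdown structure):

```python
def chunk_text(text, max_words=40):
    """Group sentences into chunks of at most max_words words,
    preferring to break on sentence boundaries."""
    sentences = [s.strip() for s in text.replace("\n", " ").split(". ") if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = sentence.split()
        # Flush the current chunk if adding this sentence would overflow it.
        if current and count + len(words) > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.extend(words)
        count += len(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping each chunk on one topic means its embedding points in one clear direction instead of averaging several topics together.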
Copyright KIRUPA 2024