🔍Vector Similarity Search with FastAPI, DuckDB, and Annoy
Published: July 2023
In mid-2023, I gave a talk to my current team on implementing vector similarity search, a foundational technique behind recommendation systems, semantic search engines, and many modern AI-powered applications. The goal was to explore how to build a practical system that retrieves “similar” items based on their vector representations. The full code from the session is available in the accompanying GitHub repo.


🧠What Do We Mean by “Similarity”?
Before jumping into code, we need to clarify what “similar” means in a machine’s world. In this context, we treat entities (like movies or users) as vectors, allowing us to compute distances and similarities using mathematical operations.
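To make "distances and similarities" concrete, here is a minimal sketch in plain Python. The three movie vectors and their feature meaning (action, comedy, romance) are made up for illustration; they are not taken from the talk's dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors: [action, comedy, romance]
movie_a = [1.0, 0.0, 0.0]
movie_b = [0.9, 0.1, 0.0]
movie_c = [0.0, 0.4, 0.9]

sim_ab = cosine_similarity(movie_a, movie_b)
sim_ac = cosine_similarity(movie_a, movie_c)
dist_ab = math.dist(movie_a, movie_b)  # Euclidean distance: smaller means closer
dist_ac = math.dist(movie_a, movie_c)

print(f"a vs b: similarity={sim_ab:.3f}, distance={dist_ab:.3f}")
print(f"a vs c: similarity={sim_ac:.3f}, distance={dist_ac:.3f}")
```

Both measures agree here: movie_a is closer to movie_b than to movie_c. Cosine similarity ignores vector length and only looks at direction, which is often what you want for text-derived vectors.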
To get those vectors, we explored vectorization techniques such as:
- Bag of Words
- TF-IDF (Term Frequency – Inverse Document Frequency)
- Word2Vec
- FeatureHasher (from scikit-learn)
These methods convert text or categorical data into numerical representations that can be compared meaningfully.
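As a rough illustration of the first two techniques, here is a from-scratch sketch of Bag of Words and a basic TF-IDF weighting using only the standard library. The three descriptions are invented, and the formula is the plain `tf * log(N / df)` variant; scikit-learn's `TfidfVectorizer` applies smoothing and normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors), one vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight raw counts by log(N / document frequency) for each term."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    doc_freq = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    vectors = [[tf * weight for tf, weight in zip(vec, idf)] for vec in counts]
    return vocab, vectors


# Made-up movie descriptions, for illustration only
docs = ["space adventure with robots", "space war", "romantic comedy"]

vocab, bow = bag_of_words(docs)
_, weighted = tf_idf(docs)

print(vocab)
print(bow[0])
print([round(w, 3) for w in weighted[0]])
```

In the weighted vectors, "space" (which appears in two descriptions) counts for less than "robots" (which appears in one), which is the point of the IDF term.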
⚙️Tools We Used
- FastAPI: To expose a REST endpoint for similarity search.
- DuckDB: An in-process OLAP database (it can run fully in memory) to load and query data efficiently.
- Annoy (by Spotify): An Approximate Nearest Neighbor (ANN) library.
- scikit-learn: For vectorization.
🚀Brute-Force vs Approximate Nearest Neighbors
In the demo, we compared two approaches:
- Brute-Force Search (exact KNN)
  - Scans the entire dataset and computes the similarity to every item.
  - Exact, but slow for large datasets.
  - Time complexity: O(n) per query
- Approximate Nearest Neighbors (ANN)
  - Annoy uses tree structures to narrow each query down to a small set of candidates.
  - Sacrifices a bit of precision for large performance gains.
  - Time complexity: roughly O(log(n)) per query
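The brute-force side of that comparison can be sketched in a few lines of plain Python. This is an illustrative version, not the code from the session; the movie IDs and vectors are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, items, k):
    """Score the query against every item, then keep the k best.

    items maps an item ID to its vector. Every vector is visited once,
    which is where the O(n) cost per query comes from.
    """
    scored = (
        (cosine_similarity(query, vector), item_id)
        for item_id, vector in items.items()
    )
    best = heapq.nlargest(k, scored)
    return [(item_id, score) for score, item_id in best]


# Hypothetical catalog of movie vectors
movies = {
    "movie_a": [1.0, 0.0, 0.0],
    "movie_b": [0.9, 0.1, 0.0],
    "movie_c": [0.0, 0.4, 0.9],
    "movie_d": [0.5, 0.5, 0.0],
}

top_two = brute_force_top_k([0.8, 0.2, 0.0], movies, k=2)
for item_id, score in top_two:
    print(f"{item_id}: {score:.3f}")
```

The result is exact, but the loop touches every vector on every query, so latency grows linearly with the size of the catalog.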
Using Annoy, we built an index made of a forest of binary trees, each splitting the vector space with random hyperplanes. The index enables fast retrieval of similar movies from the dataset and is queried via a FastAPI endpoint.
🧪What We Built
We implemented a basic FastAPI endpoint where a client sends:
- A movie entity
- The number of similar items to retrieve
The server returns the top-N most similar movies using precomputed vector embeddings and Annoy’s efficient search algorithm.
🧩Challenges and Next Steps
- User-based similarity search: This could enable collaborative filtering.
- Handling large-scale indexes: Building indexes can be time-consuming, and an Annoy index cannot accept new items once it has been built, so updates mean a rebuild.
- DuckDB limitations: Although powerful, an in-memory DuckDB database does not persist across restarts and requires careful handling when working with dynamic or mutable data.
- External tools: Systems like Pinecone and Redis Vector DB provide out-of-the-box vector search capabilities. Should we leverage them?
📚References & Inspiration
- MLOps Community – Vector Search: From Basics to Production
- Annoy Explained
- FeatureHasher in scikit-learn
- Neptune.ai – Vectorization Techniques
This internal session turned out to be a great primer for anyone exploring semantic search or building recommender systems from scratch. Check out the GitHub repo for the live code and try running it locally!
