🔍Vector Similarity Search with FastAPI, DuckDB, and Annoy

Published: July 2023

In mid-2023, I gave a talk to my team on implementing vector similarity search, a foundational technique behind recommendation systems, semantic search engines, and many modern AI-powered applications. The goal was to explore how to build a practical system that retrieves “similar” items based on their vector representations. The full code from the session is available here.

🧠What Do We Mean by “Similarity”?

Before jumping into code, we need to clarify what “similar” means in a machine’s world. In this context, we treat entities (like movies or users) as vectors, allowing us to compute distances and similarities using mathematical operations.
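
As a small illustration of comparing entities as vectors, here is a sketch of cosine similarity, one common similarity measure. The post does not say which metric the demo used, so treat the choice as an assumption; the three-feature movie vectors are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical movie vectors with features [action, comedy, romance].
heat = [1.0, 0.0, 0.0]
die_hard = [1.0, 0.2, 0.0]
notting_hill = [0.0, 0.6, 1.0]

print(cosine_similarity(heat, die_hard))      # close to 1: nearly the same direction
print(cosine_similarity(heat, notting_hill))  # 0.0: no shared features
```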

To do that, we explored vectorization techniques such as:

Bag of Words
TF-IDF (Term Frequency – Inverse Document Frequency)
Word2Vec
FeatureHasher (from scikit-learn)

These methods convert text or categorical data into numerical representations that can be compared meaningfully.
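
To make two of these techniques concrete, here is a minimal scikit-learn sketch that turns text into TF-IDF vectors and categorical tags into hashed vectors. The plots, genre tags, and the choice of 16 hash columns are made up for illustration and are not taken from the session's code.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical movie descriptions, invented for this example.
plots = [
    "a detective hunts a bank robber in los angeles",
    "a cop fights terrorists in a los angeles skyscraper",
    "a bookseller falls in love with a film star",
]

# TF-IDF: one column per vocabulary term, weighted by how rare the term is.
tfidf = TfidfVectorizer(stop_words="english")
text_vectors = tfidf.fit_transform(plots)  # sparse matrix, one row per plot

# FeatureHasher: hashes categorical tokens into a fixed number of columns,
# so no vocabulary has to be stored.
genres = [["action", "crime"], ["action", "thriller"], ["comedy", "romance"]]
hasher = FeatureHasher(n_features=16, input_type="string")
genre_vectors = hasher.transform(genres)  # sparse matrix, shape (3, 16)

print(text_vectors.shape, genre_vectors.shape)
```

Both calls return sparse matrices, so they may need converting to dense arrays or lists (for example with `.toarray()`) before being handed to a library that expects plain vectors.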

⚙️Tools We Used

FastAPI: To expose a REST endpoint for similarity search.
DuckDB: An in-process OLAP database (it can run purely in memory or against a file) to load and query data efficiently.
Annoy (by Spotify): An Approximate Nearest Neighbor (ANN) library.
scikit-learn: For vectorization.

🚀Brute-Force vs Approximate Nearest Neighbors

In the demo, we compared two approaches:

Brute-Force Search (KNN)
- Scans the entire dataset and calculates similarity one-by-one.
- Accurate but inefficient for large datasets.
- Time complexity: O(n) per query, since every stored vector is compared with the query.
Approximate Nearest Neighbors (ANN)
- Annoy uses tree structures to speed up queries.
- Sacrifices a bit of precision for massive performance gains.
- Time complexity: roughly O(log(n)) per query.
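
The brute-force side of that comparison can be sketched in a few lines of NumPy. This is an illustrative version that assumes cosine similarity and random stand-in embeddings, not the code from the demo.

```python
import numpy as np

def brute_force_knn(vectors, query, k):
    """Return indices of the k rows of `vectors` most cosine-similar to `query`."""
    # Normalise so that a dot product equals cosine similarity.
    unit_rows = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    scores = unit_rows @ unit_query  # one comparison per stored vector
    # argsort orders all n scores; np.argpartition would avoid the full sort.
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 32))  # made-up embeddings
neighbours = brute_force_knn(vectors, vectors[42], k=5)
print(neighbours)  # row 42 itself ranks first
```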

Using Annoy, we built an index made of binary trees (Annoy splits the vector space with random hyperplanes) and exposed it through a FastAPI endpoint for fast retrieval of similar movies from the dataset.

🧪What We Built

We implemented a basic FastAPI endpoint where a client sends:

A movie entity
The number of similar items to retrieve

The server returns the top-N most similar movies using precomputed vector embeddings and Annoy’s efficient search algorithm.

🧩Challenges and Next Steps

User-based similarity search: This could enable collaborative filtering.
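Handling large-scale indexes: Building and updating indexes can be time-consuming. An Annoy index cannot accept new items once it has been built, so changes to the data mean a rebuild.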
DuckDB limitations: Although powerful, DuckDB runs in-process, and in its in-memory mode nothing is persisted, so it requires careful handling when working with dynamic or mutable data.
External tools: Systems like Pinecone and Redis Vector DB provide out-of-the-box vector search capabilities. Should we leverage them?

📚References & Inspiration

This internal session turned out to be a great primer for anyone exploring semantic search or building recommender systems from scratch. Check out the GitHub repo for the live code and try running it locally!