🔍Vector Similarity Search with FastAPI, DuckDB, and Annoy

Published: July 2023

In mid-2023, I gave a talk to my current team on implementing vector similarity search, a foundational technique behind recommendation systems, semantic search engines, and many modern AI-powered applications. The goal was to explore how to build a practical system that retrieves “similar” items based on their vector representations. The full code from the session is available here.

🧠What Do We Mean by “Similarity”?

Before jumping into code, we need to clarify what “similar” means in a machine’s world. In this context, we treat entities (like movies or users) as vectors, allowing us to compute distances and similarities using mathematical operations.
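
To make "similar" concrete, here is a minimal sketch of cosine similarity, one common way to compare two vectors. The three-dimensional vectors and the names `movie_a`, `movie_b`, and `movie_c` are made up for illustration and are not taken from the talk's dataset.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Hypothetical genre weights: [action, comedy, romance]
movie_a = [1.0, 0.0, 0.0]
movie_b = [1.0, 0.2, 0.0]
movie_c = [0.0, 0.6, 1.0]

print(cosine_similarity(movie_a, movie_b))  # close to 1: very similar
print(cosine_similarity(movie_a, movie_c))  # 0: nothing in common
```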

To do that, we explore vectorization techniques such as:

  • Bag of Words
  • TF-IDF (Term Frequency – Inverse Document Frequency)
  • Word2Vec
  • FeatureHasher (from scikit-learn)

These methods convert text or categorical data into numerical representations that can be compared meaningfully.
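
As a rough illustration of what TF-IDF does, the sketch below hand-rolls the textbook `tf * log(N / df)` weighting in plain Python. It is not the code from the session; in practice scikit-learn's `TfidfVectorizer` would be the usual choice, and its defaults (smoothed IDF, L2 normalization) produce different numbers. The sample descriptions are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real pipelines usually do more."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weights."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        vectors.append(
            [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Hypothetical movie descriptions
docs = [
    "space adventure with robots",
    "romantic comedy in paris",
    "space robots fight aliens",
]
vocab, vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, [round(w, 3) for w in vec])
```

Terms that appear in fewer documents (such as "with") receive a higher weight than terms shared across documents (such as "space"), which is the property that makes TF-IDF vectors useful for comparison.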

⚙️Tools We Used

  • FastAPI: To expose a REST endpoint for similarity search.
  • DuckDB: An in-process OLAP database (in-memory by default) to load and query data efficiently.
  • Annoy (by Spotify): An Approximate Nearest Neighbor (ANN) library.
  • scikit-learn: For vectorization.

🚀Brute-Force vs Approximate Nearest Neighbors

In the demo, we compared two approaches:

  1. Brute-Force Search (KNN)
    • Scans the entire dataset and calculates similarity one-by-one.
    • Accurate but inefficient for large datasets.
    • Time complexity: O(n) per query, since every stored vector is compared.
  2. Approximate Nearest Neighbors (ANN)
    • Annoy uses tree structures to speed up queries.
    • Sacrifices a bit of precision for massive performance gains.
    • Time complexity: roughly O(log(n)) per query
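
The brute-force approach from the list above can be sketched in a few lines of plain Python. This is an illustrative version rather than the demo's code; the `vectors` dictionary (item id to vector) and the cosine-distance metric are assumptions.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def brute_force_knn(query, vectors, k):
    """Exact top-k search: one distance computation per stored vector."""
    scored = (
        (cosine_distance(query, vec), item_id) for item_id, vec in vectors.items()
    )
    return [item_id for _, item_id in heapq.nsmallest(k, scored)]


# Hypothetical 2-D embeddings keyed by item id
vectors = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [-1.0, 0.0]}
print(brute_force_knn([1.0, 0.0], vectors, k=2))
```

Every query touches all n vectors, which is why this stays exact but scales linearly with the dataset.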

We used Annoy to build an index of binary trees over the movie vectors, then exposed it through a FastAPI endpoint for fast retrieval of similar movies.

🧪What We Built

We implemented a basic FastAPI endpoint where a client sends:

  • A movie entity
  • The number of similar items to retrieve

The server returns the top-N most similar movies using precomputed vector embeddings and Annoy’s efficient search algorithm.

🧩Challenges and Next Steps

  • User-based similarity search: This could enable collaborative filtering.
  • Handling large-scale indexes: Building and updating indexes can be time-consuming.
  • DuckDB limitations: Although powerful, it runs in-memory by default and requires careful handling when working with dynamic or mutable data.
  • External tools: Systems like Pinecone and Redis Vector DB provide out-of-the-box vector search capabilities. Should we leverage them?

📚References & Inspiration

This internal session turned out to be a great primer for anyone exploring semantic search or building recommender systems from scratch. Check out the GitHub repo for the live code and try running it locally!