turbopuffer benchmark demo

Late interaction keeps token-level detail, but it is not always the best first stage.

This page shows the Amazon-C4 cases where late interaction ranked the labeled item higher than dense retrieval did. It also runs live BM25 searches against the turbopuffer namespace from the larger benchmark.

20,463 unique items indexed

500 queries evaluated

9 late interaction wins over dense

live turbopuffer query

Search the indexed Amazon-C4 reviews

This live search uses BM25 because the Cloudflare edge runtime should not run BGE or ColBERT models. The late-interaction examples below are not computed live; they come from the measured benchmark.
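BM25 is a purely lexical score, so it needs no model inference at query time. The sketch below shows the textbook Okapi BM25 formula, assuming whitespace tokenization and the common defaults k1=1.2 and b=0.75. It is an illustration only, not turbopuffer's implementation, and the review snippets are hypothetical rather than taken from Amazon-C4.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    # Assumption: lowercase + whitespace tokenization, no stemming.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical review snippets, not taken from Amazon-C4.
reviews = [
    "sturdy tripod for travel photography",
    "soft cotton sheets for summer",
    "lightweight travel tripod with phone mount",
]
scores = bm25_scores("travel tripod", reviews)
```

Here the second snippet shares no term with the query and scores zero, while the two tripod snippets score above zero; the shorter one scores slightly higher because of length normalization.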

benchmark evidence

Cases where late interaction helped

These are fixed benchmark results from the full-index, 500-query run. They show where token-level matching ranked the labeled item higher than dense retrieval did.
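To make the contrast concrete, here is a minimal sketch of the two scoring rules. It assumes ColBERT-style MaxSim for late interaction and a mean-pooled single vector for the dense score; the pooling choice is an assumption for illustration, and the tiny 2-D token vectors are invented, not benchmark data.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def maxsim(query_tokens, doc_tokens):
    """Late interaction: each query token keeps its best document-token match."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def dense_score(query_tokens, doc_tokens):
    """Single-vector retrieval: pool tokens into one vector, then take cosine."""
    def pool(tokens):
        dim = len(tokens[0])
        mean = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
        return normalize(mean)
    return dot(pool(query_tokens), pool(doc_tokens))


# Invented unit-length token vectors (hypothetical, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
# Contains an exact match for both query tokens, plus an unrelated token
# that drags the pooled vector away from the query.
doc_labeled = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
# Matches neither query token exactly, but pools to the query's direction.
doc_other = [[0.8, 0.6], [0.6, 0.8]]

late = {"labeled": maxsim(query, doc_labeled), "other": maxsim(query, doc_other)}
dense = {"labeled": dense_score(query, doc_labeled), "other": dense_score(query, doc_other)}
```

In this toy setup MaxSim prefers the labeled document because every query token finds an exact match, while pooling averages those matches together with the unrelated token and the dense score prefers the other document.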