turbopuffer benchmark demo
Late interaction keeps token-level detail, but it is not always the best first stage.
This page shows the Amazon-C4 cases where late interaction ranked the labeled item higher than dense retrieval did. It also runs live BM25 searches against the turbopuffer namespace built for the larger benchmark.
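To make "token-level detail" concrete, here is a minimal sketch of the two scoring styles being compared. It is an illustration under stated assumptions, not the benchmark's code: the dense scorer mean-pools token vectors into one vector (real dense models such as BGE may pool differently), and the late-interaction scorer uses the ColBERT-style MaxSim sum. The function names and toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)

def dense_score(query_tokens, doc_tokens):
    """Single-vector score: mean-pool token vectors (an assumption), then cosine."""
    def pool(tokens):
        dim = len(tokens[0])
        return normalize([sum(t[i] for t in tokens) / len(tokens) for i in range(dim)])
    return dot(pool(query_tokens), pool(doc_tokens))

def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: each query token keeps its best-matching document token."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)

# Toy 2-d token vectors (hypothetical, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_exact = [[1.0, 0.0], [0.0, 1.0]]    # one token per query concept
doc_blurred = [[1.0, 1.0], [1.0, 1.0]]  # every token is a mix of both concepts

print(dense_score(query, doc_exact), dense_score(query, doc_blurred))
print(maxsim_score(query, doc_exact), maxsim_score(query, doc_blurred))
```

In this toy case the pooled vectors of both documents point the same way, so the dense score cannot separate them, while MaxSim scores the document with exact token matches higher. The reverse can also happen, which is why late interaction is not always the best first stage.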
unique items indexed
queries evaluated
late interaction wins over dense
live turbopuffer query
Search the indexed Amazon-C4 reviews
This search uses BM25 because the Cloudflare edge runtime is not a suitable place to run BGE or ColBERT models. The late-interaction examples below come from the fixed benchmark results, not from this live search.
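For readers unfamiliar with BM25, the following is a small, self-contained sketch of Okapi BM25 scoring. It is not turbopuffer's implementation and makes no claim about its tokenizer or parameters: whitespace tokenization, `k1=1.2`, and `b=0.75` are assumptions chosen for illustration, and the sample documents are made up.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    k1 and b are common defaults, assumed here for illustration.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

# Made-up review snippets, not rows from the Amazon-C4 index.
docs = [
    "waterproof hiking boots for wide feet",
    "wireless earbuds with long battery life",
    "hiking backpack with rain cover",
]
for doc, score in zip(docs, bm25_scores("waterproof hiking boots", docs)):
    print(f"{score:.3f}  {doc}")
```

BM25 needs only term statistics, with no model inference at query time, which is why it fits an edge runtime where embedding models do not.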
benchmark evidence
Cases where late interaction helped
These are fixed benchmark results from the full-index, 500-query run. They show where token-level matching ranked the labeled item higher than dense retrieval.
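As a rough sketch of how such cases could be picked out of a benchmark run, the snippet below compares the rank of the labeled item under each retriever and keeps the queries where late interaction ranked it higher. The record layout (`labeled_id`, `dense_ranking`, `late_ranking`) is hypothetical, and treating a missing dense result as a late-interaction win is an assumption; the benchmark's actual files and criteria may differ.

```python
def rank_of(labeled_id, ranked_ids):
    """Return the 1-based rank of labeled_id, or None if it was not retrieved."""
    try:
        return ranked_ids.index(labeled_id) + 1
    except ValueError:
        return None

def late_interaction_wins(cases):
    """Keep cases where late interaction ranked the labeled item above dense."""
    wins = []
    for case in cases:
        dense_rank = rank_of(case["labeled_id"], case["dense_ranking"])
        late_rank = rank_of(case["labeled_id"], case["late_ranking"])
        if late_rank is None:
            continue  # late interaction never retrieved the item
        if dense_rank is None or late_rank < dense_rank:
            wins.append({"query": case["query"],
                         "dense_rank": dense_rank,
                         "late_rank": late_rank})
    return wins

# Hypothetical records, not actual benchmark output.
cases = [
    {"query": "q1", "labeled_id": "A",
     "dense_ranking": ["B", "C", "A"], "late_ranking": ["A", "B", "C"]},
    {"query": "q2", "labeled_id": "D",
     "dense_ranking": ["D", "E"], "late_ranking": ["E", "D"]},
]
print(late_interaction_wins(cases))
```

Only `q1` is kept here: the labeled item moves from rank 3 under dense retrieval to rank 1 under late interaction, while `q2` goes the other way.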