Understanding Threshold and Preflight APIs


Michele Riva

Algorithms


Apr 26, 2023

With the release of Orama v1.0.0-beta.12, we introduced two important new APIs: Threshold and Preflight.

While these APIs work well independently of each other, they are designed to work together to provide a better experience for our end users.

In this blog post, we will see how they work and how to use them together to return a list of relevant results for every search.

Threshold, the problem

The threshold property controls how many results a search returns when only some of the search terms match a document.

Let’s consider the following example:

```js
import { create, insert, search } from '@orama/orama'

const db = await create({
  schema: {
    title: 'string',
  }
})

// Each document uses the "title" property declared in the schema
await insert(db, { title: 'Blue t-shirt, slim fit' })
await insert(db, { title: 'Blue t-shirt, regular fit' })
await insert(db, { title: 'Red t-shirt, slim fit' })
await insert(db, { title: 'Red t-shirt, oversize fit' })
```

As you can see, we're inserting 4 documents that share many keywords.

What happens if I search for "t-shirt"?

```js
const results = await search(db, {
  term: 't-shirt',
})

// results.count = 4
```

In that case, every single document will be returned, as they all contain the "t-shirt" keyword.

Now, what happens if I search for "regular fit"?

```js
const results = await search(db, {
  term: 'regular fit',
})

// results.count = 4
```

What! Why do I get 4 results? I only have 1 document that contains the "regular fit" keyword!

Well, Orama will position the document containing the "regular fit" keyword at the top of the results, but it will also return the other 3 documents, as they also contain the "fit" keyword.
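To see why this happens, here is a toy model of OR-style matching in plain JavaScript. It is a simplification and not Orama's implementation. The tokenize and orSearch functions are hypothetical, and ranking simply counts how many query tokens a document shares instead of using Orama's real scoring. It does reproduce the behavior described above: one shared token is enough for a match, and the document that shares both tokens ranks first.

```js
// Toy model of OR-style matching. This is NOT Orama's tokenizer or scorer;
// it only illustrates why "regular fit" matches all four documents.

function tokenize(text) {
  // Lowercase and split on anything that is not a letter, digit, or hyphen,
  // so "t-shirt" stays a single token.
  return text.toLowerCase().split(/[^a-z0-9-]+/).filter(Boolean)
}

function orSearch(docs, query) {
  const queryTokens = Array.from(new Set(tokenize(query)))
  return docs
    .map((doc) => {
      const docTokens = new Set(tokenize(doc))
      // Count how many distinct query tokens this document contains
      const matched = queryTokens.filter((token) => docTokens.has(token)).length
      return { doc, matched }
    })
    .filter((result) => result.matched > 0) // one shared token is enough to match
    .sort((a, b) => b.matched - a.matched)  // more shared tokens rank higher
}

const docs = [
  'Blue t-shirt, slim fit',
  'Blue t-shirt, regular fit',
  'Red t-shirt, slim fit',
  'Red t-shirt, oversize fit',
]

const results = orSearch(docs, 'regular fit')
console.log(results.length) // → 4
console.log(results[0].doc) // → Blue t-shirt, regular fit
```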

With long search queries, this behavior can produce a large number of results, which, depending on your index size, might not be what you want.

Imagine you have a database with 1 million documents, and you search for "red t-shirt with long sleeves and a motorbike printed on the front". That's a pretty broad search, right? Many documents are likely to share at least one of those words with the query, so it may be worth limiting the results a bit.

Threshold, the solution

The threshold property solves this problem by limiting (or expanding) the number of results returned by a search operation. It must be a number between 0 and 1, and it represents the fraction of partially matching results (documents that contain only some of the search terms) to return alongside the documents that contain all of them.

By default, Orama sets the threshold to 1. This means that all the results will be returned.

```js
const results = await search(db, {
  term: 'slim fit',
  threshold: 1 // default value
})
```

This returns all the documents containing either the "slim" keyword or the "fit" keyword. In the example above, that means all four documents, since each of them contains "fit".

But what would happen if we set the threshold to 0?

```js
const results = await search(db, {
  term: 'slim fit',
  threshold: 0,
})
```

In this case, only the documents containing both the "slim" and "fit" keywords are returned (two of the four in our example). This applies across all of a document's properties: if one keyword is found in one property and the other keyword is found in a different property, the document is still returned.

You can boost results depending on which property a keyword is found in by using the field boosting API.
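The cross-property rule for a threshold of 0 can be sketched as follows. This is a hypothetical model, not Orama's code: tokenize and isFullMatch are illustrative names, and the fourth document, with its extra description property, is made up for the example. A document counts as a full match when every query token appears in at least one of its string properties.

```js
// Hypothetical model of threshold: 0 ("every keyword must match"). Tokens from
// all string properties are pooled, so the keywords may be spread across
// different properties. Not Orama's actual implementation.

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9-]+/).filter(Boolean)
}

function isFullMatch(doc, query) {
  // Pool the tokens of every string property into a single set
  const docTokens = new Set(
    Object.values(doc)
      .filter((value) => typeof value === 'string')
      .flatMap((value) => tokenize(value))
  )
  // Every query token must appear somewhere in the document
  return tokenize(query).every((token) => docTokens.has(token))
}

const docs = [
  { title: 'Blue t-shirt, slim fit' },
  { title: 'Blue t-shirt, regular fit' },
  { title: 'Red t-shirt, slim fit' },
  // "slim" is in the title and "fit" is in the description: still a full match
  { title: 'Green t-shirt, slim cut', description: 'Stretch fit' },
]

const fullMatches = docs.filter((doc) => isFullMatch(doc, 'slim fit'))
console.log(fullMatches.length) // → 3
```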

Finally, we can set the threshold to a decimal value between 0 and 1.

Considering this example:

```js
const results = await search(db, {
  term: 'slim fit',
  threshold: 0.6,
})
```

Orama will return all the documents containing both "slim" and "fit" keywords, plus 60% of the documents containing either "slim" or "fit" keywords.
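The rule above can be modeled with a small function. This is a sketch under stated assumptions, not Orama's source: applyThreshold is a hypothetical name, the partial matches are assumed to be sorted by relevance already, and the number of partial matches kept is assumed to be rounded up with Math.ceil. Orama's actual rounding may differ.

```js
// Hypothetical model of the threshold rule, not Orama's implementation.
// Assumptions: partialMatches is already sorted by relevance, and the number
// of partial matches to keep is rounded up with Math.ceil.
function applyThreshold(fullMatches, partialMatches, threshold) {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError('threshold must be between 0 and 1')
  }
  const keep = Math.ceil(partialMatches.length * threshold)
  // Full matches always come first, followed by the kept share of partial matches
  return [...fullMatches, ...partialMatches.slice(0, keep)]
}

// "slim fit" against the four t-shirts: two full matches, two partial matches
const full = ['Blue t-shirt, slim fit', 'Red t-shirt, slim fit']
const partial = ['Blue t-shirt, regular fit', 'Red t-shirt, oversize fit']

console.log(applyThreshold(full, partial, 1).length)   // → 4
console.log(applyThreshold(full, partial, 0).length)   // → 2
console.log(applyThreshold(full, partial, 0.6).length) // → 4 (ceil(2 * 0.6) = 2)
console.log(applyThreshold(full, partial, 0.3).length) // → 3 (ceil(2 * 0.3) = 1)
```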

Preflight queries

Preflight search is an Orama feature that lets you run a preliminary query that returns only the number of results matching it. Knowing this count up front tells you whether a query will return a large number of results, which helps you decide whether to run the full search query (and compute facets, if needed) and how to configure it.

Running a preflight query is as simple as adding preflight: true to our query:

```js
const results = await search(db, {
  term: 'slim fit',
  preflight: true // return the count, but no hits
})
```

The search returns a standard Orama response object, but its hits property is an empty array.

Orama is extremely fast at searching, but it still spends a large portion of the elapsed time retrieving documents and adding them to the final results.hits array.

With a preflight request, you can retrieve facets and the total number of results very quickly, and then programmatically decide whether to run a full search query, how to enrich it, or how to set properties such as threshold.
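Here is a minimal sketch of that decision. It assumes the only goal is to skip the full search when nothing matches. searchIfAny is a hypothetical helper, and run is a stand-in for Orama's search call so the example runs on its own. It relies only on the term and preflight parameters and the count property shown in this post.

```js
// Hypothetical preflight gate. `run` stands in for Orama's search call, e.g.
// (params) => search(db, params), so this sketch runs without the library.
async function searchIfAny(run, term) {
  // Preflight: same query, but only the count is returned (hits is empty)
  const preflight = await run({ term, preflight: true })

  if (preflight.count === 0) {
    // Nothing matched, so skip the more expensive full search entirely
    return preflight
  }

  // At least one match: run the full query to retrieve the documents
  return run({ term })
}
```

With Orama, you would call it as searchIfAny((params) => search(db, params), 'slim fit').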

Preflight and threshold, the perfect match

Preflight requests are particularly useful when run right before a query with a specific threshold.

For example, let’s say you have a large database of 50,000 products. If a user searches for a very rare product, you may end up with just a few results if the threshold is set to 0 (exact match).

By running a preflight search, you will be able to programmatically set a different threshold based on the number of results returned by the preflight search.

Scenarios

  • I am searching for "slim fit", and the preflight search returns 3 documents. I may want to display more products in my storefront, so I will set the threshold to 0.5, returning the 3 results + 50% of the fuzzy-matched results.

  • I am searching for "oversize fit", and the preflight search returns 10 results. I will then set the threshold to 0.2, returning the 10 results + 20% of the fuzzy-matched results.

  • I am searching for "blue t-shirt", and the preflight search returns 100 results. 100 results are more than enough, so I will set the threshold to 0, returning only the 100 exact-matched results.
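The scenarios above can be turned into a small helper. This is a hypothetical sketch: pickThreshold and adaptiveSearch are illustrative names, the cut-offs of 10 and 100 results are arbitrary values chosen to fit the three scenarios, and running the preflight with threshold 0 (so that its count reflects exact matches only) is an assumption. run stands in for Orama's search call, for example (params) => search(db, params).

```js
// Hypothetical helper: map the preflight count of exact matches to a threshold.
// The cut-offs (10 and 100) are illustrative, loosely based on the scenarios.
function pickThreshold(exactMatches) {
  if (exactMatches >= 100) return 0   // plenty of exact matches: skip partial ones
  if (exactMatches >= 10) return 0.2  // a moderate number: add a few partial matches
  return 0.5                          // very few: widen the result set
}

// `run` stands in for (params) => search(db, params)
async function adaptiveSearch(run, term) {
  // Assumption: the preflight uses threshold 0, so its count covers exact matches only
  const preflight = await run({ term, preflight: true, threshold: 0 })
  return run({ term, threshold: pickThreshold(preflight.count) })
}
```

Keeping the mapping in a pure function makes the cut-offs easy to tune and test without touching the search code.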

Conclusion

In conclusion, the introduction of threshold and preflight APIs in Orama v1.0.0-beta.12 significantly enhances the search experience for end users. The threshold property allows you to fine-tune the number of results returned by controlling the balance between exact and fuzzy matches. Preflight queries, on the other hand, enable you to estimate the number of results a query would return before running a full search, allowing you to make informed decisions about whether to proceed with a full search or adjust the threshold.

By using threshold and preflight APIs together, you can create a more dynamic and efficient search experience for your users, ensuring they receive the most relevant results without being overwhelmed by too many matches. Whether you have a small or large dataset, these new APIs help you optimize your search performance and deliver an improved user experience.
