Understanding Threshold and Preflight APIs
Michele Riva
Algorithms
5 min read
Apr 26, 2023
With the release of Orama v1.0.0-beta.12, we introduced two important new APIs: Threshold and Preflight.
While these APIs work well independently of each other, they are designed to work together to provide a better experience for our end users.
In this blog post, we will see how they work and how to use them together to return a list of relevant results for every search.
Threshold, the problem
The threshold property is used to control how many results a search returns.
Let’s consider the following example:
As you can see, we’re inserting 4 documents with a lot of common keywords.
What happens if I search for "t-shirt"?
In that case, every single document will be returned, as they all contain the "t-shirt" keyword.
Now, what happens if I search for "regular fit"?
What! Why do I get 4 results? I only have 1 document that contains the "regular fit" keyword!
Well, Orama will position the document containing the "regular fit" keyword at the top of the results, but it will also return the other 3 documents, as they also contain the "fit" keyword.
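To make this behaviour concrete, here is a toy model of "any term matches" search. It is a sketch, not Orama's implementation: the four product titles are hypothetical, the tokenizer simply splits on whitespace, and ranking is reduced to counting matched terms.

```typescript
// Hypothetical catalog; the titles are illustrative, not the article's data.
const titles: string[] = [
  "Blue t-shirt slim fit",
  "Red t-shirt slim fit",
  "Green t-shirt regular fit",
  "White t-shirt oversize fit",
];

// Simplified tokenizer (assumption): lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((token) => token.length > 0);
}

// Number of query terms that appear in a title.
function matchedTerms(title: string, query: string): number {
  const words = new Set(tokenize(title));
  return tokenize(query).filter((term) => words.has(term)).length;
}

// Any-term search: keep titles matching at least one term, best matches first.
function searchAnyTerm(query: string): string[] {
  return titles
    .map((title) => ({ title, score: matchedTerms(title, query) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.title);
}

// All 4 titles come back, with "Green t-shirt regular fit" ranked first.
console.log(searchAnyTerm("regular fit"));
```

Every title contains "fit", so none of them is filtered out; only the ordering reflects that one title matches both terms.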
With very long search queries, this can lead to a lot of results, which, depending on your index size, might not be what you want.
Imagine you have a database with 1 million documents, and you want to search for "red t-shirt with long sleeves and a motorbike printed on the front". That’s a pretty broad search, right? It may be worth limiting the results a bit.
Threshold, the solution
The threshold property solves this problem by limiting (or maximizing) the number of results to return when performing a search operation. It must be a number between 0 and 1, and it represents the percentage of results to return.
By default, Orama sets the threshold to 1. This means that all the results will be returned.
With the default threshold, a search for "slim fit" will return all the documents containing either the "slim" keyword or the "fit" keyword. In our case, considering the example above, all the documents will be returned.
But what would happen if we set the threshold to 0?
In this case, only the document containing both "slim" and "fit" keywords will be returned. This applies to all the document properties; if a keyword is found in a property, and another keyword is found in a different property, the document will be returned.
You can boost the results depending on the property in which a keyword is found using the field boosting API.
One last thing we can try with this property is setting it to a decimal number.
Consider a search for "slim fit" with the threshold set to 0.6.
Orama will return all the documents containing both "slim" and "fit" keywords, plus 60% of the documents containing either "slim" or "fit" keywords.
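The rule described above can be sketched as a small standalone function. This is a toy model rather than Orama's code: the catalog is hypothetical, partial matches are kept in insertion order instead of by relevance, and rounding the share of partial matches to the nearest integer is an assumption.

```typescript
// Simplified tokenizer (assumption): lowercase and split on whitespace.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((token) => token.length > 0);
}

// Toy model of the threshold rule: all full matches, plus a share of partial ones.
function applyThreshold(docs: string[], query: string, threshold: number): string[] {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0 and 1");
  }
  const terms = tokenize(query);
  if (terms.length === 0) return [];

  const full: string[] = []; // documents containing every term
  const partial: string[] = []; // documents containing only some of the terms
  for (const doc of docs) {
    const words = new Set(tokenize(doc));
    const found = terms.filter((term) => words.has(term)).length;
    if (found === terms.length) full.push(doc);
    else if (found > 0) partial.push(doc);
  }

  // Assumption: the share of partial matches is rounded to the nearest integer.
  const extra = Math.round(partial.length * threshold);
  return [...full, ...partial.slice(0, extra)];
}

// Hypothetical catalog: 2 full matches for "slim fit", 5 partial matches, 1 miss.
const catalog: string[] = [
  "Blue t-shirt slim fit",
  "Red t-shirt slim fit",
  "Green t-shirt regular fit",
  "White t-shirt oversize fit",
  "Black hoodie slim cut",
  "Grey t-shirt relaxed fit",
  "Navy polo slim collar",
  "Yellow wool socks",
];

console.log(applyThreshold(catalog, "slim fit", 0).length); // 2
console.log(applyThreshold(catalog, "slim fit", 0.6).length); // 5
console.log(applyThreshold(catalog, "slim fit", 1).length); // 7
```

With a threshold of 0.6, the 2 full matches are joined by 3 of the 5 partial matches, which mirrors the "plus 60%" wording.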
Preflight queries
Preflight search is an Orama feature that allows you to run a preliminary search query that returns just the number of results that match your query. This is useful for finding out whether a query will return a large number of results before you decide to run the full search query and facets (if needed).
Running a preflight query is as simple as adding preflight: true to our query:
The results object will return a standard Orama response, but the hits property will be an empty array.
Orama is extremely fast at searching, but it still spends a large portion of the elapsed time retrieving documents and assigning them to the final results.hits array.
By using a preflight request, you will be able to retrieve facets and the total number of results very quickly, and then programmatically decide if you want to run a full search query, how to enrich it, or how to set properties such as the threshold property.
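The "count first, then decide" flow can be sketched without depending on any library by passing the search function in as a parameter. Everything named here is an assumption for illustration: the SearchParams and SearchResult shapes only mirror the fields mentioned above (preflight, count, hits), searchIfManageable and stubSearch are hypothetical helpers, and the call is kept synchronous for brevity, whereas a real search call may need to be awaited.

```typescript
// Minimal shapes modeled on the description above; field names are assumptions.
interface SearchParams {
  term: string;
  preflight?: boolean;
}
interface SearchResult {
  count: number;
  hits: string[];
}
type SearchFn = (params: SearchParams) => SearchResult;

// Run a cheap preflight first; only fetch documents when the result set is
// non-empty and no larger than maxResults. Returns null otherwise.
function searchIfManageable(
  search: SearchFn,
  term: string,
  maxResults: number,
): SearchResult | null {
  const preview = search({ term, preflight: true });
  if (preview.count === 0 || preview.count > maxResults) return null;
  return search({ term });
}

// Stand-in search over a hypothetical catalog (substring match, demo only).
const products: string[] = [
  "Blue t-shirt slim fit",
  "Red t-shirt slim fit",
  "Green t-shirt regular fit",
  "White t-shirt oversize fit",
];
const stubSearch: SearchFn = ({ term, preflight }) => {
  const needle = term.toLowerCase();
  const matches = products.filter((title) => title.toLowerCase().includes(needle));
  // A preflight response carries the count but leaves hits empty.
  return { count: matches.length, hits: preflight ? [] : matches };
};

console.log(searchIfManageable(stubSearch, "slim", 3)); // full result with 2 hits
console.log(searchIfManageable(stubSearch, "t-shirt", 3)); // null: 4 matches exceed the limit
```

Swapping stubSearch for a thin wrapper around the real search call would keep the decision logic unchanged.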
Preflight and threshold, the perfect match
Preflight requests are particularly useful in certain situations, such as when they are run right before a query with a certain threshold.
For example, let’s say you have a large database of 50,000 products. If a user searches for a very rare product, you may end up with just a few results if the threshold is set to 0 (exact match).
By running a preflight search, you will be able to programmatically set a different threshold based on the number of results returned by the preflight search.
Scenarios
I am searching for "slim fit", and the preflight search returns 3 documents. I may want to display more products in my storefront, so I will set the threshold to 0.5, returning the 3 results + 50% of the fuzzy-matched results.
I am searching for "oversize fit", and the preflight search returns 10 results. I will then set the threshold to 0.2, returning the 10 results + 20% of the fuzzy-matched results.
I am searching for "blue t-shirt", and the preflight search returns 100 results. 100 results are more than enough, so I will set the threshold to 0, returning only the 100 exact-matched results.
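The three scenarios can be folded into one small routine. This is a sketch under stated assumptions: pickThreshold, adaptiveSearch and makeStub are hypothetical helpers, the cutoffs between the three data points (3, 10 and 100 results) are invented to fill the gaps, the preflight is assumed to be run with a threshold of 0 so that it counts exact matches only, and the search function is injected and synchronous to keep the example self-contained.

```typescript
// Minimal shapes modeled on the article's description; field names are assumptions.
interface SearchParams {
  term: string;
  preflight?: boolean;
  threshold?: number;
}
interface SearchResult {
  count: number;
  hits: string[];
}
type SearchFn = (params: SearchParams) => SearchResult;

// Thresholds from the three scenarios; the cutoffs between them are assumptions.
function pickThreshold(exactCount: number): number {
  if (exactCount >= 100) return 0; // plenty of exact matches
  if (exactCount >= 10) return 0.2; // pad with a few partial matches
  return 0.5; // few exact matches: widen the search
}

// Preflight to count exact matches, then run the full search with a fitting threshold.
function adaptiveSearch(search: SearchFn, term: string): SearchResult {
  // Assumption: threshold 0 makes the preflight count exact matches only.
  const preview = search({ term, preflight: true, threshold: 0 });
  return search({ term, threshold: pickThreshold(preview.count) });
}

// Stub that pretends every query has `exactCount` exact matches and logs each call.
function makeStub(exactCount: number, calls: SearchParams[]): SearchFn {
  return (params) => {
    calls.push(params);
    return { count: exactCount, hits: [] };
  };
}

const calls: SearchParams[] = [];
adaptiveSearch(makeStub(3, calls), "slim fit");
console.log(calls[1].threshold); // 0.5
```

Keeping the cutoffs in a single function makes them easy to tune against a real catalog.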
Conclusion
In conclusion, the introduction of threshold and preflight APIs in Orama v1.0.0-beta.12 significantly enhances the search experience for end users. The threshold property allows you to fine-tune the number of results returned by controlling the balance between exact and fuzzy matches. preflight queries, on the other hand, enable you to estimate the number of results a query would return before running a full search, allowing you to make informed decisions about whether to proceed with a full search or adjust the threshold.
By using threshold and preflight APIs together, you can create a more dynamic and efficient search experience for your users, ensuring they receive the most relevant results without being overwhelmed by too many matches. Whether you have a small or large dataset, these new APIs help you optimize your search performance and deliver an improved user experience.