Announcing the Orama Secure AI Proxy
Michele Riva
Product Updates
5 min read
Jan 15, 2024
Orama 2.0 offers vector and hybrid search in both its open-source and cloud versions. To operate these search mechanisms, you need to generate text embeddings, usually with an ML model like OpenAI's text-embedding-ada-002 or S-BERT.
Running these models on a large scale can be costly. Making an API call to OpenAI (or any other hosted service) to generate embeddings from the frontend may lead to significant monthly bills if your API key is compromised or your service is exploited by malicious users.
But Orama is a full-text search engine and vector database that runs in your browser, so we need a way to call OpenAI directly from the frontend to generate text embeddings. Once you have search results, you may also want to call the OpenAI APIs again to generate a summary or a chat-style answer from a prompt.
This is why we created and released the Orama Secure AI Proxy.
How the Orama Secure AI Proxy helps you secure your API keys and prompts
The Orama Secure AI Proxy lives on the same edge network as the Orama Global Search Network, making it available in more than 300 points of presence across more than 100 countries worldwide. This reduces latency and lets it absorb massive, unpredictable traffic spikes.
This system comprises a series of components designed to ensure that your OpenAI (and soon, other services') API keys are never exposed to the browser.
It includes:
A web application firewall for analyzing and mitigating malicious traffic.
A system-level rate limiter, ensuring that your API key usage doesn't exceed a certain quota, regardless of the number of active users.
A user-level rate limiter, preventing a single user from making more than a specified number of requests per minute.
The proxy itself, a nanoservice located near your users, which makes the actual requests to OpenAI or other services on your behalf.
The Orama Secure AI Proxy is also a solution for anyone using OpenAI's GPT models to generate summaries and chat experiences on the frontend. The proxy offers end-to-end encryption for your prompts, protecting your intellectual property and keeping your prompt engineering confidential. This is particularly useful when communicating over insecure networks, or when a third party can inspect network traffic, for example by watching the network console in Google Chrome or any other browser.
The Orama Secure AI Proxy Performance
A common concern when using proxies is their potential impact on application performance.
At Orama, we specialize in edge-application development. This allows us to build high-performance, low-latency applications distributed via global CDNs. In other words, we prioritize performance and security when developing software.
Our proxy operates within a few milliseconds, as the databases, firewall, and workers are all located in the same distributed point of presence.
In our tests, we have not observed any delays of more than 20ms caused by the proxy.
Moreover, upcoming versions of the proxy will feature vector and prompt caching. This enhancement will help you save money and reduce latency on each API call.
Supported models
As of today, the Orama Secure AI Proxy supports two models for embedding generation:
openai/text-embedding-ada-002: the popular model released by OpenAI, which generates a 1,536-dimensional vector and accepts up to 8,191 tokens.
orama/gte-small: our fork of Alibaba DAMO Academy's GTE Small. It's based on BERT, generates a 384-dimensional vector, and accepts up to 512 tokens.
We recommend using orama/gte-small, as it is optimized for high-performance embedding generation. Moreover, having fewer dimensions enables faster and higher-quality text retrieval.
Currently, we support the following models for prompts and chat experiences:
openai/gpt-4
openai/gpt-4-1106-preview
openai/gpt-3.5-turbo
openai/gpt-3.5-turbo-16k
We also plan to support additional models and will announce these soon.
Orama Secure AI Proxy Pricing
We are currently offering the Orama Secure AI Proxy free of charge to all Orama Cloud users. To enable it, visit https://cloud.oramasearch.com/secure-proxy.
We will soon announce pricing plans, which will remove the few restrictions currently placed on the free plan of the Orama Secure AI Proxy. These restrictions include:
Maximum number of authorized domains: 3
Maximum number of requests per minute (user-level): 45
Maximum number of requests per minute (system-level): 500
Our future Pro and Enterprise plans will extend these limits.
How to use the Secure AI Proxy
When used with the open-source version of Orama, the Secure AI Proxy can be adopted via the official Orama plugin, which is available on npm:
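For example, assuming the plugin is published under the `@orama/plugin-secure-proxy` name (check the Orama docs for the current package name):

```shell
npm install @orama/plugin-secure-proxy
```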
Once the plugin is installed, you can simply add it to the Orama plugin list during the database initialization:
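A minimal sketch of that setup might look like the following. The exact option names (`apiKey`, `defaultProperty`, `model`) are assumptions based on the plugin's documented shape at the time of writing; consult the current plugin docs for the authoritative API.

```typescript
// Sketch: wiring the Secure AI Proxy plugin into an Orama instance.
// Package and option names are illustrative, not authoritative.
import { create } from '@orama/orama'
import { pluginSecureProxy } from '@orama/plugin-secure-proxy'

const secureProxy = pluginSecureProxy({
  apiKey: '<your-secure-proxy-public-key>',    // public key, safe to ship to the browser
  defaultProperty: 'embeddings',               // schema property to run vector search on
  model: 'openai/text-embedding-ada-002'       // embedding model served through the proxy
})

const db = await create({
  schema: {
    title: 'string',
    embeddings: 'vector[1536]' // must match the chosen model's output dimensions
  },
  plugins: [secureProxy]
})
```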
After initializing the database, there's no need to explicitly include a vector property in your search query; the secure proxy will handle this for you automatically:
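A sketch of such a query, assuming the `db` instance created above and illustrative parameter names:

```typescript
// The plugin generates the query embedding through the proxy behind
// the scenes, so a plain search call is enough.
import { search } from '@orama/orama'

const results = await search(db, {
  term: 'What is Orama?',
  mode: 'vector' // or 'hybrid' to combine full-text and vector scoring
})
```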
Since the default property for vector search was specified during the Secure AI Proxy plugin initialization, Orama knows it needs to perform vector similarity search on the embeddings property. However, you can override the default by including the vector.property property in the search query.
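An override might look like this (again a sketch against the `db` instance above; property names are illustrative):

```typescript
import { search } from '@orama/orama'

const results = await search(db, {
  term: 'serverless search engines',
  mode: 'vector',
  vector: {
    property: 'embeddings' // overrides the plugin's configured default property
  }
})
```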
Standalone APIs
We've also integrated the Secure AI Proxy APIs into the @oramacloud/client npm package, making them easy for everyone to adopt.
It is composed of two distinct APIs:
Embedding Generation
Chat Completion APIs
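The two APIs can be sketched roughly as follows. The class and method names (`OramaProxy`, `generateEmbeddings`, `chat`) are assumptions about the `@oramacloud/client` surface, not confirmed signatures; check the package documentation for the exact API.

```typescript
// Sketch of the standalone Secure AI Proxy APIs in @oramacloud/client.
// Names and signatures are illustrative assumptions.
import { OramaProxy } from '@oramacloud/client'

const proxy = new OramaProxy({
  api_key: '<your-secure-proxy-public-key>'
})

// 1. Embedding generation
const embedding = await proxy.generateEmbeddings(
  'What is Orama?',
  'openai/text-embedding-ada-002'
)

// 2. Chat completion
const answer = await proxy.chat({
  model: 'openai/gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'What does a secure AI proxy do?' }]
})
```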
In summary
Orama 2.0 introduces the Orama Secure AI Proxy, a solution designed to secure API keys and prompts when using OpenAI's GPT models. The proxy, available in over 300 points of presence worldwide, includes a web application firewall, system-level and user-level rate limiters, and end-to-end encryption for prompts. It supports two main models for embedding generation and several models for prompts and chat experiences. The Orama Secure AI Proxy is currently free for all Orama Cloud users, with future pricing plans to be announced.