OramaCore is now in beta!🎉

Michele Riva

Product Updates

3 min read

Feb 3, 2025

If you’ve been following Orama over the past few months, you’re likely familiar with the project's evolution.

Orama began as a fast, simple, and powerful JavaScript full-text and vector search engine designed to run in the browser. This was the first release.

Later, we launched Orama 2.0 using the same JavaScript-based codebase (and a little Rust/WASM), enabling highly scalable search services on Cloudflare and allowing us to offer Orama Cloud as a global service. This version powers major websites like nodejs.org, tanstack.com, and many more.

With Orama 3.0, released in October 2024, we expanded Orama’s capabilities by introducing a fully-fledged RAG pipeline, transforming it into an answer engine that integrates full-text and vector search.

Today, Orama powers a wide range of use cases, from e-commerce search and recommendations to technical product documentation, healthcare guidance, and customer support. You can see it in action on nodejs.org, tanstack.com, solidJS.org, and many more.

We learned a lot from helping implement all these RAG systems. Embeddings need to be fast and directly integrated with the search database. The generative model benefits from fine-tuning. Answer and recommendation quality requires multi-shot and panel-of-experts-style approaches, which in turn require chaining and rapid back-and-forth between the context (the database) and the LLMs. Orama was becoming slower and more expensive, which really bothered us. As engineers, we know there's always a way to optimize if we approach the problem from a different perspective.

This is exactly what we're aiming for with Orama 4.0 - powered by the newly open-source OramaCore.

OramaCore open beta

You can try it on your servers - or even on your own PC with any consumer-level GPU.

OramaCore is a fresh, easy-to-use, high-performance search and vector database with built-in LLMs. It runs directly on your machine (faster if you have an NVIDIA GPU available), eliminating expensive network calls and optimizing every step of the information retrieval process.

With OramaCore you can build ultra-fast AI applications using a single system - no more complex architectures combining mismatched components. Everything in OramaCore is built from the ground up to work seamlessly together, from the vector database and embedding generation system to the search engine and the LLM that generates AI-driven answers.

We rewrote everything in Rust, which allows us to be extremely resource-efficient. As a result, the system can run vector searches through millions of embeddings in under 2ms - including query embedding generation. And you can use any LLM available on HuggingFace.
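To give a feel for what talking to a self-hosted OramaCore instance could look like from an application, here is a minimal TypeScript sketch that sends a hybrid search request over HTTP. The base URL, route, collection name, auth header, and request/response shapes are illustrative assumptions, not the documented OramaCore API - check the official docs for the actual endpoints and parameters.

```typescript
// Minimal sketch: querying a self-hosted OramaCore instance over HTTP.
// NOTE: the URL, route, headers, and payload shapes below are illustrative
// assumptions for this post, not the documented OramaCore API.

interface SearchHit {
  id: string;
  score: number;
  document: Record<string, unknown>;
}

interface SearchResponse {
  hits: SearchHit[];
  elapsed_ms: number;
}

async function search(term: string): Promise<SearchResponse> {
  // Hypothetical endpoint for a "products" collection on a local instance.
  const res = await fetch("http://localhost:8080/v1/collections/products/search", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Hypothetical read-only API key.
      Authorization: "Bearer <read-api-key>",
    },
    body: JSON.stringify({
      term,
      mode: "hybrid", // full-text + vector, with embeddings generated server-side
      limit: 10,
    }),
  });

  if (!res.ok) {
    throw new Error(`Search failed: ${res.status} ${await res.text()}`);
  }
  return (await res.json()) as SearchResponse;
}

// Example usage:
// const results = await search("wireless headphones under $100");
// console.log(results.hits.map((h) => h.document));
```

The point of the single-system design is visible even in this sketch: one request covers query embedding, vector and full-text retrieval, and scoring, with no separate embedding service to call first.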

Our benchmarks - which we’ll be publishing in the coming weeks - clearly show OramaCore’s performance gains. It can perform vector search in approximately 30ms (the 2ms mentioned above, plus HTTP and network latency from the data center to our machine), compared to ~500ms in the current Orama Cloud. The same applies to full-text search, where OramaCore reduces latency from ~100ms in Orama Cloud to ~30ms.

Answer generation has seen an incredible speedup, reducing time to first token (TTFT) from ~5 seconds to just 1 second. While services like Anthropic and OpenAI do an excellent job optimizing high-performance LLM serving, we have a unique advantage: we can run multiple small, fine-tuned models directly on the same machine, completely eliminating network latency.

Since OramaCore is a single system that orchestrates all of its components to work together, it ships as a single Dockerfile. Just follow this guide to get started with Docker, or this one to build and run it from source!

Keep following Orama on LinkedIn to stay up to date with every new release, and don’t forget to leave a star on GitHub - it means a lot to us!
