
Vibrant Labs
W24
Building the open-source standard for evaluating LLM applications
RL environments for long-horizon AI agents
Today's fragmented, proprietary evaluation tools create significant inefficiency and confusion for developers. The world needs a standard everyone can rely on, which is why we are building Ragas as the open-source standard. We have 4k stars on GitHub, 1.3k members in our Discord community, and over 80 external contributors. We also have partnerships with key AI companies like LangChain, LlamaIndex, Arize, Weaviate, and more to help create that standard. We already process 5 million evaluations monthly for engineers at companies like AWS, Microsoft, Databricks, and Moody's, and that volume is growing 70% month over month. We are building LLM application testing and evaluation infrastructure for enterprises.
We work on benchmarking and improving the long-horizon capabilities of AI agents, building specialized environments for browser-use and computer-use agents.
The company shifted from building an open-source standard for evaluating LLM applications (evaluation tools and benchmarks) to creating specialized RL environments for benchmarking and improving long-horizon AI agents, a meaningful product pivot within the AI developer tools space.