Traditional LLM metrics don’t work for agent developers, and here’s why…
Unlike simple LLM applications, agents operate in dynamic environments and take actions that modify those environments.
Traditional metrics (like LLM-as-a-Judge) fall short because they only measure what the agent outputs (think answer correctness), not what it does.
For this reason, agentic AI applications need a new evaluation paradigm.
Picture a customer support agent handling a refund request. The correct response might be:
“𝘠𝘦𝘴, 𝘵𝘩𝘦 𝘳𝘦𝘧𝘶𝘯𝘥 𝘩𝘢𝘴 𝘣𝘦𝘦𝘯 𝘱𝘳𝘰𝘤𝘦𝘴𝘴𝘦𝘥!”
But does this guarantee the refund was processed correctly? It could just be a hallucination.
These metrics don’t capture the full picture. For many agentic applications, you can’t even define a single “correct” output: take web search, where the underlying content changes all the time.
🚀𝗔𝗴𝗲𝗻𝘁 𝗖𝗼𝗻𝘁𝗿𝗮𝗰𝘁𝘀, 𝗮 𝗻𝗲𝘄 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗽𝗮𝗿𝗮𝗱𝗶𝗴𝗺:
Inspired by formal methods, we’re introducing a new framework to measure and verify agentic systems: Agent Contracts.
Agent Contracts allow you to define:
• Module-Level Contracts: Specify the expected input-output relationship, preconditions, and postconditions of individual agent actions.
• Trace-Level Contracts: Capture the expected sequence of actions—the agent’s journey from start to finish.
Contracts are scenario-specific: they apply only when certain conditions are met, in this case when the user asks for a refund.
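To sketch what this could look like in practice, here is a minimal, hypothetical encoding of the two contract types in Python. The names (Step, ModuleContract, TraceContract) are illustrative assumptions, not our library’s actual API:

```python
from dataclasses import dataclass
from typing import Callable

# One recorded step in an agent trace: which tool/module ran,
# and with what inputs and outputs.
@dataclass
class Step:
    module: str
    inputs: dict
    outputs: dict

# Module-Level Contract: pre/postconditions over a single action.
@dataclass
class ModuleContract:
    module: str
    precondition: Callable[[Step], bool]
    postcondition: Callable[[Step], bool]

# Trace-Level Contract: the expected sequence of actions, plus the
# scenario condition under which the contract applies at all.
@dataclass
class TraceContract:
    applies_when: Callable[[list[Step]], bool]
    expected_sequence: list[str]
```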
🤖 𝗔 𝗖𝘂𝘀𝘁𝗼𝗺𝗲𝗿 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗘𝘅𝗮𝗺𝗽𝗹𝗲 𝗶𝗻 𝗔𝗰𝘁𝗶𝗼𝗻:
To make this more concrete, suppose we are developing a customer support agent and the user asks for a refund. Agent Contracts would define:
Module-Level Contract:
• Precondition: user asks for a refund
• Postcondition: the agent triggers the refund process (e.g., a database update).
Trace-Level Contract:
• The agent calls the GetOrder tool to retrieve the order details, then
• The agent uses the ProcessRefund tool with the correct order information, then
• The agent collects feedback from the customer post-refund.
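Reusing the hypothetical Step/ModuleContract/TraceContract classes sketched above, the refund scenario might be encoded and checked like this. The tool name CollectFeedback and the checking logic are assumptions for illustration, not the library’s real interface:

```python
# Scenario condition: the contract only applies if the user asked for a refund.
def is_refund_request(trace: list[Step]) -> bool:
    return any("refund" in s.inputs.get("user_message", "").lower() for s in trace)

# Trace-Level Contract for the refund journey
# (CollectFeedback is an assumed tool name).
refund_trace = TraceContract(
    applies_when=is_refund_request,
    expected_sequence=["GetOrder", "ProcessRefund", "CollectFeedback"],
)

# Module-Level Contract for the refund action itself.
refund_module = ModuleContract(
    module="ProcessRefund",
    precondition=lambda s: "order_id" in s.inputs,                 # we know which order
    postcondition=lambda s: s.outputs.get("status") == "refunded", # the update actually happened
)

def check(trace: list[Step], tc: TraceContract, mcs: list[ModuleContract]) -> bool:
    if not tc.applies_when(trace):
        return True  # contract is not relevant to this scenario
    # Trace-level check: expected actions occur in order (as a subsequence).
    modules = iter(s.module for s in trace)
    if not all(m in modules for m in tc.expected_sequence):
        return False
    # Module-level check: every matching step satisfies its pre/postconditions.
    return all(
        mc.precondition(s) and mc.postcondition(s)
        for s in trace for mc in mcs if s.module == mc.module
    )
```

One deliberate choice in this sketch: the expected sequence is checked as a subsequence rather than an exact match, which tolerates the extra intermediate steps real agent traces almost always contain.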
Think of it like this: Module contracts set the entry and exit rules for a room, while Trace contracts describe the full journey through the building.
💡𝗪𝗵𝘆 𝗧𝗵𝗶𝘀 𝗠𝗮𝘁𝘁𝗲𝗿𝘀:
By defining scenario-specific contracts, we can verify agent reliability, traceability, and correctness, even in complex, multi-step interactions.
This idea builds on what I studied during my PhD on agent reliability, tackling the challenge of evaluating AI systems in real-world settings.
🔥𝗝𝗼𝗶𝗻 𝗨𝘀 𝗶𝗻 𝘁𝗵𝗲 𝗕𝗲𝘁𝗮:
We’re building a library to enable Contract-based evaluation and observability for AI agents. If you’re working on dynamic agents and want to explore a new standard for measuring and verifying their performance, we’d love to hear from you.
#AIagents #formalmethods #LLMEvaluation #AIInnovation #AgentReliability #AIResearch #LLMOps