Was reading Shahul's excellent blog on aligning LLM judges with Human Experts, and came across Shreya's paper on EvalGen.

TL;DR:

Problem:
LLMs are increasingly used to evaluate other LLM outputs, but these LLM-based evaluators can be unreliable and require validation. Existing tools lack sufficient support for verifying the quality of LLM-generated evaluations, and users struggle to define evaluation metrics for custom tasks.

Proposed Solution:
EvalGen, a mixed-initiative interface that assists users in creating and validating LLM-based evaluations.

Workflow:
1. The LLM suggests evaluation criteria based on the prompt under test.
2. The LLM generates candidate assertions (code or LLM prompts) for each criterion.
3. Users grade a subset of LLM outputs, providing feedback.
4. EvalGen selects the assertions that best align with the user grades (see the sketch below).
5. A report card shows the alignment between the chosen assertions and the user grades.

Key Features:
- Criteria Generation: LLM-powered suggestions for evaluation criteria.
- Assertion Synthesis: LLM-generated candidate implementations (code or LLM prompts).
- Active Learning: User grades guide the selection of aligned assertions.
- Alignment Measurement: A report card showing how well the chosen assertions match user preferences.
- Mixed-Initiative: Combines automated assistance with user control.

Evaluation:
- Offline Evaluation: Compared EvalGen's algorithm with SPADE (a fully automated assertion generation tool). EvalGen achieved better alignment with fewer assertions, thanks to human input in the criteria selection stage.
- Qualitative User Study: Nine industry practitioners used EvalGen to build evaluators for LLM pipelines.

As businesses scale AI-first workflows, evals grow even more critical. Combining humans and LLMs seems like an excellent fit.
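The selection and report-card step is easy to picture in code. Below is a minimal sketch, assuming code-based assertions, binary human grades, and a simple coverage / false-failure-rate alignment score; EvalGen's actual selection algorithm and thresholds differ in the details, and the names here (GradedExample, select_assertions, etc.) are illustrative, not EvalGen's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An "assertion" here is any function that takes an LLM output (a string)
# and returns True if the output passes the criterion. In EvalGen these can
# also be LLM-prompt based; code-based checks keep the sketch self-contained.
Assertion = Callable[[str], bool]

@dataclass
class GradedExample:
    output: str        # LLM output shown to the user
    human_pass: bool   # user's thumbs-up / thumbs-down grade

def alignment_scores(assertion: Assertion, graded: List[GradedExample]) -> Dict[str, float]:
    """Score one candidate assertion against human grades.

    coverage:  fraction of human-failed outputs the assertion also fails
    ffr:       false failure rate, fraction of human-passed outputs it wrongly fails
    alignment: harmonic-mean-style combination of coverage and (1 - ffr)
    """
    failed = [g for g in graded if not g.human_pass]
    passed = [g for g in graded if g.human_pass]

    coverage = (
        sum(not assertion(g.output) for g in failed) / len(failed) if failed else 0.0
    )
    ffr = (
        sum(not assertion(g.output) for g in passed) / len(passed) if passed else 0.0
    )
    precision_like = 1.0 - ffr
    alignment = (
        2 * coverage * precision_like / (coverage + precision_like)
        if (coverage + precision_like) > 0
        else 0.0
    )
    return {"coverage": coverage, "ffr": ffr, "alignment": alignment}

def select_assertions(
    candidates: Dict[str, Assertion],
    graded: List[GradedExample],
    min_alignment: float = 0.6,
) -> Dict[str, Dict[str, float]]:
    """Keep only the candidate assertions whose alignment clears a threshold."""
    report_card = {name: alignment_scores(fn, graded) for name, fn in candidates.items()}
    return {
        name: scores
        for name, scores in report_card.items()
        if scores["alignment"] >= min_alignment
    }

# Example: two code-based candidate assertions for a "be concise" criterion.
candidates = {
    "under_100_words": lambda out: len(out.split()) < 100,
    "no_bullet_lists": lambda out: "- " not in out,
}
graded = [
    GradedExample("Short, direct answer.", human_pass=True),
    GradedExample("A very long rambling answer " * 30, human_pass=False),
]
print(select_assertions(candidates, graded))
```

Because the grading is interactive, the pool of graded examples keeps growing as the user grades more outputs, so a report card like this would be recomputed as feedback arrives rather than fixed up front.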