🎉 We're thrilled to announce the launch of Unstructured’s new Enterprise ETL Platform that automates the complex process of transforming unstructured data in any format and from any source to your GenAI stack. 🚀
🔥 Features:
- No-code UI
- VLM data transformation
- Continuous data processing on your schedule
- In-VPC deployment option
- SOC 2 Type 2, HIPAA, & GDPR compliance
- 50+ connectors
Check out our new Platform video to learn more. https://lnkd.in/esPAMfg2
👉 Contact us to get started: https://lnkd.in/entVRx7m
#WhateverItIsWeCanStructureIt
unstructured.io
Software Development
San Francisco, CA 17,586 followers
Get your data RAG-ready. #ETLforLLMs
About us
At Unstructured, we're on a mission to give organizations access to all their data. We know the world runs on documents—from research reports and memos, to quarterly filings and plans of action. And yet, 80% of this information is trapped in inaccessible formats leading to inefficient decision-making and repetitive work. Until now. Unstructured captures this unstructured data wherever it lives and transforms it into AI-friendly JSON files for companies who are eager to fold AI into their business.
- Website
- http://www.unstructured.io/
- Industry
- Software Development
- Company size
- 11-50 employees
- Headquarters
- San Francisco, CA
- Type
- Privately Held
- Founded
- 2022
- Specialties
- NLP, natural language processing, data, unstructured, LLM, Large Language Model, AI, RAG, Machine Learning, Open Source, API, Preprocessing Pipeline, Machine Learning Pipeline, Data Pipeline, artificial intelligence, and database
Locations
- Primary: San Francisco, CA, US
Updates
-
If you missed our webinar yesterday, "Build production-ready ETL for RAG in 10 minutes", here are some links to catch up!
You can watch the recording at: https://lnkd.in/gXf6sCvf
Try out Unstructured Platform with a free trial: https://lnkd.in/gWRrnBZa
Or check out the notebook in which we walked through how to set up Platform for S3 -> Pinecone, set up local RAG with Llama3.2, and evaluate the pipeline with Ragas: https://lnkd.in/gE4YgfU6
Build Production-Ready RAG Pipelines in 10 Minutes with Unstructured
https://www.youtube.com/
-
🧠 Document partitioning with Anthropic’s Claude 3.5 Sonnet is now available in Unstructured Platform!
🔍 Tackling complex documents? Unstructured Platform's VLM partitioner delivers enhanced transformation capabilities for your toughest documents - from intricate nested forms to handwritten text and degraded scans.
⚙️ When first released, Unstructured Platform featured a VLM partitioner that relies on OpenAI's GPT-4o with optimized prompts to maximize text extraction quality and ensure robust outputs. Now, you can choose between OpenAI's GPT-4o and Anthropic’s Claude 3.5 Sonnet.
💪 Through rigorous testing on real-world documents, we've observed Claude 3.5 Sonnet achieving superior accuracy on form structures and noisy documents, making it our recommended default choice.
🚀 This is just the beginning! We're actively expanding our model support to accommodate diverse use cases and more integrations are in development! Stay tuned.
-
🔧 Meet the Unstructured Platform API: Programmatic data transformation powered by our Platform.
👩‍💻 Use the REST-enabled interface for data ingestion and transformation to:
⚙️ Create source/destination connectors and pre-configured workflows (Basic/Advanced/Platinum)
⚙️ Execute workflows as part of your CI/CD
⚙️ Ensure consistent data transformation logic across environments
💪 Existing Serverless API users: Your keys are compatible!
Learn more: https://lnkd.in/ekArrjQK
-
⚡️ Last call! In just a few hours, we're showing you how to build production-grade ETL pipelines faster than you can order your morning coffee! ☕️
Join us today at 10am PST/1pm EST to learn:
- How to set up ETL with a VLM in 10 minutes flat
- Transform data from AWS S3 to Pinecone vector DB
- Get hands-on with a local Llama3.2 RAG pipeline using Ragas
👉 Grab your spot now: https://lnkd.in/gCRtmz6e
-
Don't miss our webinar tomorrow, where we walk through how to set up your production-scale ETL with a VLM in 10 minutes, transform data from Amazon Web Services (AWS) S3 to a Pinecone vector DB, and evaluate a local Llama3.2 RAG pipeline with Ragas.
Sign up today at https://lnkd.in/gCRtmz6e
-
🚀 Build a No-Code AI Assistant in Minutes with Unstructured Platform, AstraDB, and Langflow!
💡 Check out our beginner-friendly RAG tutorial and learn how to build an AI chatbot for your data, all without writing a single line of code. 🤯🤯🤯
🔑 Key Takeaways:
✅ **Tackle LLM Hallucinations**: Use RAG to ground AI in your internal knowledge.
✅ **Simplify Unstructured Data Processing**: Prepare PDFs, emails, and more with the Unstructured Platform’s no-code UI.
✅ **Leverage AstraDB**: Seamlessly store RAG-ready data in a vector database and retrieve context for your LLM as needed.
✅ **Build the app with Langflow**: Build a conversational AI assistant in minutes.
This tutorial walks you through every step, from connecting AWS S3 to AstraDB, to setting up workflows and building a fully functional chat interface using Langflow.
Let's go! https://lnkd.in/etrnfMJZ
-
🚨 Friday notebook drop: Multimodal RAG: Enhancing RAG outputs with image results
For this demo, we used the widely read The Illustrated Transformer by Jay Alammar to perform visually-enriched QnA. This blog post is famous for how well it illustrates the concepts behind the ubiquitous transformer architecture. So why shouldn't your RAG flow include these insightful images in the output?
blog: https://lnkd.in/gi6-AQB2
notebook: https://lnkd.in/gprNmzWX
Include relevant image output with your RAG results – Unstructured
unstructured.io
-
With Unstructured Platform for developers, you pay as you go, and only for the document pages that you process. A question that we often get: with all the different file types that Platform supports, how do you define “pages” for documents that don’t necessarily have pages? The answer is in our documentation, but it’s worth highlighting.
We calculate a page as follows:
🧮 For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
🧮 For .docx files that have page metadata, we calculate the number of pages based on that metadata.
🧮 For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.
🧮 For non-file data, we calculate a page as 100 KB of incoming data to be processed.
https://lnkd.in/guGPReqS
Billing
docs.unstructured.io
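To make the page-counting rules above concrete, here is a rough Python sketch of the arithmetic. It is not Unstructured's billing code. The function name `estimate_pages` and its parameters are hypothetical, and several details are assumptions the post does not state: that "100 KB" means 100 × 1024 bytes, that no rounding is applied, and that a .docx file without page metadata falls back to the size-based rule.

```python
from typing import Optional

# Assumption: "100 KB" is read as 100 * 1024 bytes; the post does not say
# whether a KB is 1000 or 1024 bytes.
BYTES_PER_PAGE = 100 * 1024

# File types where a page is a literal page, slide, or image.
PAGED_EXTENSIONS = {".pdf", ".pptx", ".tiff"}


def estimate_pages(
    extension: Optional[str],
    size_bytes: int,
    native_page_count: Optional[int] = None,
) -> float:
    """Estimate billable pages under the rules quoted in the post.

    extension: file extension such as ".pdf", or None for non-file data.
    size_bytes: size of the file or of the incoming data.
    native_page_count: pages, slides, or images counted from the file itself
        (or from .docx page metadata), when available.
    """
    ext = extension.lower() if extension else None

    if ext in PAGED_EXTENSIONS:
        # The count has to come from the document; size is not used here.
        if native_page_count is None:
            raise ValueError(f"{ext} needs a page, slide, or image count")
        return float(native_page_count)

    if ext == ".docx" and native_page_count is not None:
        # .docx with page metadata: use the metadata.
        return float(native_page_count)

    # All other file types and non-file data: size divided by 100 KB.
    # Assumption: no rounding, since the post does not specify any.
    return size_bytes / BYTES_PER_PAGE


if __name__ == "__main__":
    print(estimate_pages(".pdf", 5_000_000, native_page_count=12))
    print(estimate_pages(".html", 250 * 1024))
    print(estimate_pages(None, 100 * 1024))
```

For exact rounding and unit conventions, the Billing page in the documentation linked above is the authority.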
-
Build Production-Ready ETL Pipelines for RAG in 10 Minutes with Unstructured! Join us next Wednesday, December 18th for a hands-on technical webinar with Unstructured’s engineers, showing you how to leverage Platform with Ragas to quickly build pipelines and evaluate accuracy using the latest Llama 3.2. We will transform data in an Amazon Web Services (AWS) S3 bucket to a Pinecone vector database using no code! Sign up today at https://lnkd.in/gCRtmz6e