unstructured.io

Software Development

San Francisco, CA 17,588 followers

Get your data RAG-ready. #ETLforLLMs


Discover all 85 employees

About us

At Unstructured, we're on a mission to give organizations access to all their data. We know the world runs on documents, from research reports and memos to quarterly filings and plans of action. And yet, 80% of this information is trapped in inaccessible formats, leading to inefficient decision-making and repetitive work. Until now. Unstructured captures this unstructured data wherever it lives and transforms it into AI-friendly JSON files for companies that are eager to fold AI into their business.

Website: http://www.unstructured.io/
Industry: Software Development
Company size: 11-50 employees
Headquarters: San Francisco, CA
Type: Privately Held
Founded: 2022
Specialties: NLP, natural language processor, data, unstructured, LLM, Large Language Model, AI, RAG, Machine Learning, Open Source, API, Preprocessing Pipeline, Machine Learning Pipeline, Data Pipeline, artificial intelligence, and database

Locations

Primary

San Francisco, CA, US



Updates

unstructured.io
2w
🎉 We're thrilled to announce the launch of Unstructured’s new Enterprise ETL Platform that automates the complex process of transforming unstructured data in any format and from any source to your GenAI stack. 🚀
🔥 Features:
- No-code UI
- VLM data transformation
- Continuous data processing on your schedule
- In-VPC deployment option
- SOC 2 Type 2, HIPAA, & GDPR compliance
- 50+ connectors
Check out our new Platform video to learn more. https://lnkd.in/esPAMfg2
👉 Contact us to get started: https://lnkd.in/entVRx7m
#WhateverItIsWeCanStructureIt

unstructured.io
2d
If you missed our webinar yesterday, "Build production-ready ETL for RAG in 10 minutes", here are some links to catch up!
You can watch the recording at: https://lnkd.in/gXf6sCvf
Try out Unstructured Platform with a free trial: https://lnkd.in/gWRrnBZa
Or check out the notebook in which we walked through how to set up Platform for S3 -> Pinecone, set up local RAG with Llama3.2, and evaluate the pipeline with Ragas: https://lnkd.in/gE4YgfU6

Build Production-Ready RAG Pipelines in 10 Minutes with Unstructured

https://www.youtube.com/

unstructured.io
3d
🧠 Document partitioning with Anthropic’s Claude 3.5 Sonnet is now available in Unstructured Platform!
🔍 Tackling complex documents? Unstructured Platform's VLM partitioner delivers enhanced transformation capabilities for your toughest documents - from intricate nested forms to handwritten text and degraded scans.
⚙️ When first released, Unstructured Platform featured a VLM partitioner that relies on OpenAI's GPT-4o with optimized prompts to maximize text extraction quality and ensure robust outputs. Now, you can choose between OpenAI's GPT-4o and Anthropic’s Claude 3.5 Sonnet.
💪 Through rigorous testing on real-world documents, we've observed Claude 3.5 Sonnet achieving superior accuracy on form structures and noisy documents, making it our recommended default choice.
🚀 This is just the beginning! We're actively expanding our model support to accommodate diverse use cases, and more integrations are in development! Stay tuned.
unstructured.io
4d
🔧 Meet the Unstructured Platform API: Programmatic data transformation powered by our Platform.
👩‍💻 Use the REST-enabled interface for data ingestion and transformation to:
⚙️ Create source/destination connectors and pre-configured workflows (Basic/Advanced/Platinum)
⚙️ Execute workflows as part of your CI/CD
⚙️ Ensure consistent data transformation logic across environments
💪 Existing Serverless API users: Your keys are compatible!
Learn more: https://lnkd.in/ekArrjQK
unstructured.io
4d
⚡️ Last call! In just a few hours, we're showing you how to build production-grade ETL pipelines faster than you can order your morning coffee! ☕️
Join us today at 10am PST/1pm EST to learn:
- How to set up ETL with a VLM in 10 minutes flat
- Transform data from AWS S3 to Pinecone vector DB
- Get hands-on with a local Llama3.2 RAG pipeline using Ragas
👉 Grab your spot now: https://lnkd.in/gCRtmz6e
unstructured.io
5d
Don't miss our webinar tomorrow, where we walk through how to set up your production-scale ETL with a VLM in 10 minutes, transform data from Amazon Web Services (AWS) S3 to a Pinecone vector DB, and evaluate a local Llama3.2 RAG pipeline with Ragas. Sign up today at https://lnkd.in/gCRtmz6e
unstructured.io
6d Edited
🚀 Build a No-Code AI Assistant in Minutes with Unstructured Platform, AstraDB, and Langflow!
💡 Check out our beginner-friendly RAG tutorial and learn how to build an AI chatbot for your data, all without writing a single line of code. 🤯🤯🤯
🔑 Key Takeaways:
✅ **Tackle LLM Hallucinations**: Use RAG to ground AI in your internal knowledge.
✅ **Simplify Unstructured Data Processing**: Prepare PDFs, emails, and more with the Unstructured Platform’s no-code UI.
✅ **Leverage AstraDB**: Seamlessly store RAG-ready data in a vector database and retrieve context for your LLM as needed.
✅ **Build the app with Langflow**: Build a conversational AI assistant in minutes.
This tutorial walks you through every step, from connecting AWS S3 to AstraDB, to setting up workflows and building a fully functional chat interface using Langflow. Let's go! https://lnkd.in/etrnfMJZ

Build a No-Code RAG AI Assistant with Unstructured Platform, AstraDB, and LangFlow – Unstructured

unstructured.io

unstructured.io
1w
🚨 Friday notebook drop: Multimodal RAG: Enhancing RAG outputs with image results
For this demo, we used the widely read The Illustrated Transformer by Jay Alammar to perform visually-enriched QnA. This blog post is famous for how well it illustrates the concepts behind the ubiquitous transformer architecture. So why shouldn't your RAG flow include these insightful images in the output?
blog: https://lnkd.in/gi6-AQB2
notebook: https://lnkd.in/gprNmzWX

Include relevant image output with your RAG results – Unstructured

unstructured.io

unstructured.io
1w
With Unstructured Platform for developers, you pay as you go, and only for the document pages that you process. A question that we often get is: with all the different file types that Platform supports, how do you define “pages” for documents that don’t necessarily have pages? The answer is in our documentation, but it’s worth highlighting. We calculate a page as follows:
🧮 For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
🧮 For .docx files that have page metadata, we calculate the number of pages based on that metadata.
🧮 For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.
🧮 For non-file data, we calculate a page as 100 KB of incoming data to be processed.
https://lnkd.in/guGPReqS

Billing

docs.unstructured.io
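
The page-counting rules in the post above reduce to a small function. The following Python sketch is illustrative only and is not Unstructured's billing code: the function name `estimate_pages`, its parameters, reading "100 KB" as 100 * 1024 bytes, rounding partial pages up, and treating .docx files without page metadata as "all other file types" are all assumptions that the post does not confirm.

```python
import math
from typing import Optional

# Assumption: "100 KB" is read as 100 * 1024 bytes. The post does not say
# whether it means 100,000 or 102,400 bytes.
BYTES_PER_PAGE = 100 * 1024

# File types for which the post says a page is a page, slide, or image.
NATIVE_PAGE_TYPES = {".pdf", ".pptx", ".tiff"}


def estimate_pages(
    size_bytes: int,
    extension: Optional[str] = None,
    native_page_count: Optional[int] = None,
) -> int:
    """Estimate billable pages under the rules quoted in the post.

    extension=None stands for non-file data. native_page_count is the
    page/slide/image count, or the .docx page metadata, when it is known.
    """
    ext = extension.lower() if extension else None

    if ext in NATIVE_PAGE_TYPES:
        # Rule 1: count real pages, slides, or images.
        if native_page_count is None:
            raise ValueError(f"{ext} needs a page, slide, or image count")
        return native_page_count

    if ext == ".docx" and native_page_count is not None:
        # Rule 2: .docx files that carry page metadata use that metadata.
        return native_page_count

    # Rules 3 and 4: every other file type, and non-file data, is measured
    # in 100 KB units. Rounding up is an assumption, not stated in the post.
    return math.ceil(size_bytes / BYTES_PER_PAGE)
```

Under these assumptions, a 250 KB .txt file counts as 3 pages, while a 12-slide .pptx counts as 12 pages regardless of its size.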

unstructured.io
1w
Build Production-Ready ETL Pipelines for RAG in 10 Minutes with Unstructured! Join us next Wednesday, December 18th for a hands-on technical webinar with Unstructured’s engineers, showing you how to leverage Platform with Ragas to quickly build pipelines and evaluate accuracy using the latest Llama 3.2. We will transform data in an Amazon Web Services (AWS) S3 bucket to a Pinecone vector database using no code! Sign up today at https://lnkd.in/gCRtmz6e

Funding

unstructured.io 3 total rounds

Last Round

Series B Apr 14, 2024

US$ 40.0M

Investors

Menlo Ventures + 9 Other investors



Employees at unstructured.io

Tom Whiteaker

Co-Founder and Partner, IBM Ventures Investments

James Reid

Head of BizOps at Unstructured

John Newton

Co-Founder of Alfresco and Documentum, Board Member, Investor

Robin Vasan

Enterprise Seed / Early Stage Investor


Similar pages

Primer.ai

Contextual AI

LangChain

Cleanlab

LlamaIndex

Pinecone

Qdrant

Hebbia

Cognition

Perplexity
