Looking for some #datalakehouse goodness after the holidays? Join this webinar next month with Vinoth Chandar and Kyle Weller to learn how Onehouse can help you accelerate performance 🚀 and reduce costs (save big 💰) across Apache Hudi, #Iceberg, and #DeltaLake with ease.
Onehouse
Software Development
Menlo Park, California · 8,263 followers
The Universal Data Lakehouse
About us
Onehouse, the pioneer in open data lakehouse technology, empowers enterprises to deploy and manage a world-class data lakehouse in minutes on Apache Hudi, Apache Iceberg, and Delta Lake. Delivered as a fully managed cloud service in your VPC, Onehouse offers high-performance ingestion pipelines for minute-level freshness and optimizes tables for maximum query performance. Thanks to its truly open data architecture, Onehouse eliminates data format, table format, compute, and catalog lock-in, guarantees interoperability with virtually any warehouse or data processing engine, and ensures exceptional ELT and query performance for all your workloads. Companies worldwide rely on Onehouse to power their analytics, reporting, data science, machine learning, and GenAI use cases from a single, unified source of data. Built on Apache Hudi and Apache XTable (Incubating), Onehouse features advanced capabilities such as indexing, ACID transactions, and time travel, ensuring consistent data across all downstream query engines and tools. The platform’s unique incremental processing capabilities deliver unmatched ELT cost and performance by minimizing data movement and optimizing resource usage. With 24/7 reliability, immediate cost savings, and open access for all major tools and query engines, Onehouse's #nolockin philosophy helps you future-proof any stack.
- Website
https://onehouse.ai
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- Menlo Park, California
- Type
- Privately Held
- Founded
- 2021
Locations
-
Primary
2550 Sand Hill Rd
STE 200
Menlo Park, California 94025, US
Employees at Onehouse
-
Jim Griffin
Digital Media Consulting
-
Jerry Chen
Partner at Greylock
-
Albert Wong
Senior Solutions Architect | K8S, Application & Data Integration, Database | Open Source & Retail SME, 15x+ certifications, 3x Patent Author
-
Gaetan Castelein
CMO & Head of Sales and Partnerships @ Onehouse
Updates
-
You have open table formats. Now how about speeding those up? 🚀 Join Vinoth Chandar and Kyle Weller next month to learn how Table Optimizer from Onehouse can accelerate performance and reduce costs (save 💰) whether you are working with #ApacheHudi, #ApacheIceberg, and/or #DeltaLake. 2-10x faster queries, 20-80% infra cost reductions. See you there! https://lnkd.in/gKqExqUT
-
🎁 Early holiday gift from Onehouse: the top 5 tips for scaling Apache Spark! Tired of shuffle failures and out-of-memory errors? Spark jobs running too slow? We demystify the most important considerations for scaling your Spark workloads, including:
- On-heap vs. off-heap memory
- Spilling to disk
- Optimizing your data structures
- Choosing the right serialization technique
- Adaptive query execution
- Dynamic allocation
- … and more!
These tips come from the Onehouse team’s experience operating and taming complex, petabyte-scale Spark workloads for the largest data lakes on the planet. Read the blog post at https://lnkd.in/dVeXhn2E.
Top 5 tips for scaling Apache Spark
onehouse.ai
-
🚨 Amazon #S3 data lakes have revolutionized the way businesses ingest, store, and analyze data, offering unmatched flexibility, scalability, and cost-effectiveness. This approach enables businesses to adapt quickly and derive value from their data in ways that traditional storage systems cannot. This blog post from Po Hong provides a comprehensive guide to Amazon Web Services (AWS) S3 data lakes and recommends a lakehouse architecture, built on an open source project such as Apache Hudi, wherever improved manageability, queryability, or other desired characteristics make it worth the modest additional effort. 💪 https://lnkd.in/eBXA4kfe #Onehouse #DataLakehouse #ApacheHudi #RealTimeAnalytics #DataStreaming #DataEngineering
Amazon S3 Data Lakes: A Complete Guide
onehouse.ai
-
Is your team overburdened managing #ApacheSpark? Learn how Conductor eliminated complex #Spark management, reduced query times by more than 75%, and freed their engineering team to focus on innovation by transforming their data infrastructure with Onehouse. #dataengineering #datalakehouse https://lnkd.in/gJBeeehW
Conductor-Onehouse-Case-Study.pdf
info.onehouse.ai
-
⚒️ Are your Apache Hudi pipelines on Amazon #EMR fully optimized? 💡 Join our webinar to see how you can cut costs by up to 80% and boost performance without rewriting a single line of code. ✍️ Save your spot now!
Automated Performance Tuning for Apache Hudi™ on Amazon EMR
www.linkedin.com
-
Coming up soon! Don't miss this webinar if you are working with Apache Hudi, especially if you are building pipelines on Amazon #EMR. Accelerating data pipelines while cutting costs: get the top tips and tricks from the experts. https://lnkd.in/ePSpyybE
-
🚀 Just Launched: LakeView Insights + New Deployment Models! 🚀
Managing Apache Hudi tables just got a whole lot easier. Our new LakeView Insights brings the answers data engineers need right to your inbox! 📬 Now you can:
🔍 Get quick insights on ingestion volumes, query latency, and storage layout issues
⚡ Deploy effortlessly with Pull, Push, and the new SyncTool options (demo videos in the comments)
🔒 Keep your data private: LakeView reads only your Hudi metadata, nothing else!
If you’re running Hudi, LakeView is your go-to for streamlined monitoring and optimization. Start for free and experience the LakeView difference today! Sign up at https://lnkd.in/g7ZZ4dDT. Read the latest and see all the demos at https://lnkd.in/gSYmbxcy
LakeView: the free data lakehouse observability tool
onehouse.ai
-
Are you trying to choose a metastore or a catalog, but not sure what to pick? Read our latest comprehensive comparison article, where Kyle Weller breaks down some of the top catalogs on the market: Unity Catalog, Apache Polaris (Incubating), DataHub, Glue, Apache Gravitino, and Atlan. In this deep dive you will find head-to-head comparisons of features such as access controls, data quality, data discovery, and much more. The article explains the difference between a metastore and a business catalog, and it describes the intricate relationships between catalogs and the lakehouse open table formats Apache Hudi, Delta Lake, and Apache Iceberg.
Sneak peek at the rankings:
Data Discovery and Exploration: ✅ Best = Atlan ❌ Worst = Apache Polaris
Data Connectors: ✅ Best = DataHub ❌ Worst = Apache Polaris
Access Control: ✅ Best = Apache Polaris ❌ Worst = DataHub
Compliance: ✅ Best = DataHub ❌ Worst = Apache Gravitino
Data Lineage: ✅ Best = Databricks Unity Catalog ❌ Worst = Apache Polaris
Data Quality: ✅ Best = Glue ❌ Worst = Unity Catalog OSS
Read the blog for the full descriptions, links to docs, and the metastore rankings: https://lnkd.in/gBvA6YiK
#datacatalog #unitycatalog #datahub #apachepolaris #atlan #awsglue #apachegravitino #dataengineering #apachehudi #apacheiceberg #deltalake
Comprehensive Data Catalog Comparison
onehouse.ai
-
🌊 AWS re:Invent 2024 Recap: AI & Open Table Formats Take the Spotlight
From S3 Tables to SageMaker Studio, AWS delivered major announcements at this year’s #reInvent, reinforcing 2024 as the year of AI and open table formats. Key highlights for data teams:
✅ S3 Tables: native integration with open table formats, aiming to deliver easier, faster queries and automated table maintenance.
✅ Glue 5.0: data lineage, fine-grained access control, and version upgrades for Apache Hudi, Apache Iceberg, and Delta Lake.
✅ SageMaker Lakehouse + Studio: a unified interface for analytics and AI, centralizing tools like Redshift, Glue, and SageMaker.
We break down the benefits, drawbacks, and what these updates mean for your data lakehouse strategy. 👉 Dive deeper into our full blog: https://lnkd.in/ensbpTwt
#AWS #DataLakehouse #OpenTableFormats #AI
AWS re:Invent Recap 2024: AI & Open Table Formats
onehouse.ai