During NeurIPS in Vancouver last month, our team co-organized the inaugural workshop on video-language models. Thanks to the growing interest in video-language models across two key areas, video understanding (video-to-text) and video generation (text-to-video), the workshop brought together researchers to discuss current obstacles, foster open collaboration, and accelerate the development of video foundation models for real-world applications. 👩‍🏫 👨‍🏫

Several crucial themes resonated throughout the workshop. Dima Damen highlighted the importance of understanding fine-grained temporal dynamics in egocentric videos, while Gedas Bertasius emphasized the need to move beyond purely language-centric approaches to video understanding. Yong Jae Lee's presentation showed that even short-form video understanding remains challenging, particularly when dealing with counterfactual temporal information. Jianwei Yang's vision of multimodal agentic models and Ishan Misra's insights into video generation demonstrated how the field is evolving beyond simple understanding tasks toward more sophisticated applications. Doyup Lee's exploration of general world models highlighted the potential for video-language models to develop comprehensive world understanding.

We synthesized the key insights from our distinguished speakers' presentations in this blog post: https://lnkd.in/gv3rqEp2 👈

At Twelve Labs, these challenges resonate deeply with our experience developing video foundation models. As we move forward, we envision:
🚩 More sophisticated temporal understanding that can handle complex narratives and counterfactual reasoning
🚩 Better integration of world knowledge with visual understanding
🚩 More efficient architectures that can process longer videos while maintaining temporal coherence
🚩 Stronger bridges between academic research and industrial applications

The future of video-language models is incredibly promising!
Twelve Labs
Software Development
San Francisco, California · 9,003 followers
Help developers build programs that can see, listen, and understand the world as we do.
About us
Helping developers build programs that can see, hear, and understand the world as we do by giving them the world's most powerful video-understanding infrastructure.
- Website: http://www.twelvelabs.io
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2021
Locations
- Primary: 55 Green St, San Francisco, California 94111, US
Updates
-
🎥 ~ NEW WEBINAR ~ Just dropped! Dive into the world of multimodal embeddings with Manish Maheshwari and Hrishikesh Yadav as they give a masterclass on our newly released Embed API product.

What's inside:
🔍 How multimodal embeddings enable any-to-any retrieval
🧑‍🏫 Demo of the Embed API via SDK and Playground
🤔 Why you should empower your products with multimodal embeddings
👗 Demo of a Fashion Assistant app built with the Embed API

Watch now: https://lnkd.in/gqBCgPn5 📺 (A minimal sketch of embedding-based retrieval follows below.)

#Embeddings #Multimodal #Developers #API
A Deep Dive into Twelve Labs Embed API for Multimodal Embeddings | Multimodal Weekly 66
https://www.youtube.com/
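For readers who want a concrete sense of what "any-to-any retrieval" means here, below is a minimal Python sketch. It assumes you have already produced embedding vectors for a text query and a set of video clips (for example, via the Embed API covered in the webinar); the vector values, the names `query_vec` and `clip_vecs`, and the 1024-dimension size are illustrative placeholders, not actual SDK objects or output.

```python
# Sketch: any-to-any retrieval over multimodal embeddings.
# Placeholder vectors stand in for embeddings you would obtain from an
# embedding model; the names and dimensions here are assumptions for
# illustration only.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder embeddings: in practice these come from the embedding model,
# which maps text, image, audio, and video into a shared vector space.
rng = np.random.default_rng(0)
query_vec = rng.normal(size=1024)                                    # e.g. embedding of a text query
clip_vecs = {f"clip_{i}": rng.normal(size=1024) for i in range(5)}   # embeddings of video clips

# Rank clips by similarity to the query. Because all modalities live in the
# same space, the identical ranking step works text-to-video, video-to-video,
# image-to-video, and so on.
ranked = sorted(
    clip_vecs.items(),
    key=lambda kv: cosine_similarity(query_vec, kv[1]),
    reverse=True,
)
for name, vec in ranked:
    print(name, round(cosine_similarity(query_vec, vec), 3))
```

The design point is that retrieval direction is just a matter of which embedding you treat as the query; the similarity computation itself never changes.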
-
🎥 ~ NEW WEBINAR ~ Just dropped! Listen to Hyeongmin Lee as he shares analysis and insights from a holistic evaluation of video foundation models.

What's inside:
🔍 A literature review of image foundation models
⁉️ Limitations of existing video foundation models
📽️ Video is not just a sequence of images
🚀 Introducing Twelve Labs' TWLV-I, a SOTA video foundation model that captures motion and appearance

Watch now: https://lnkd.in/gPfNMXdk 📺

#VideoFoundationModels #Evaluation #Research
Analysis and Insights from Holistic Evaluation on Video Foundation Models | Multimodal Weekly 65
https://www.youtube.com/
-
📌 2024 Year in Review at Twelve Labs

What a year it’s been at Twelve Labs! Our journey through 2024 has been filled with meaningful moments. Here are some highlights we're grateful to share:

🌏 Global Collaboration
Our teams from San Francisco and Seoul came together at the Seoul Summit, sharing insights and building stronger connections. The synergy and exchange of ideas set a fantastic tone for the year, fueling innovation and strengthening our global operations.

🌟 Team Growth
We've been fortunate to welcome 35 new talented individuals to the Twelve Labs family this year! Each person brings unique expertise and fresh perspectives, driving us closer to our goal of transforming how the world interacts with video content.

🏆 Achievements and Innovations
2024 was a groundbreaking year for Twelve Labs, culminating in the launch of Marengo 2.7. This latest release is a testament to our team's hard work and dedication, pushing the boundaries of video AI to set new industry standards. We're thrilled by the enthusiastic reception from our partners and customers, validating the impact and value of our advanced models. Beyond that, we launched API version 1.3 and the Embed API, broadening the scope and applicability of our technology, and introduced Indexing 2.0 and Pegasus 1.1, further refining our capabilities to meet diverse enterprise needs. We’re SOC 2 compliant, too!

🔗 Community and Industry Engagement
From participating in major conferences to hosting webinars and workshops, we made a meaningful impact and contributed to critical dialogues around AI and technology in the media and entertainment sectors. NAB Show in May was a major turning point, followed by amazing moments at #IBC, SVG summits, Amazon Web Services (AWS) re:Invent, NeurIPS, and many others!

As we gear up for another great year, we are immensely grateful for our investors’ continued, robust support. And yes, we’re growing our team! Explore opportunities to be part of a team that’s setting the pace in AI advancements: https://lnkd.in/ggc-mYa8

#VideoAI #2024Recap #YearInReview
-
As #NeurIPS2024 wraps up, we at Twelve Labs are energized by an incredible week at one of the world's leading AI conferences in beautiful Vancouver. From December 10-12, our booth buzzed with conversations as attendees explored our latest breakthroughs in AI and machine learning. We were inspired by every discussion, question, and collaborative idea shared!

A standout moment was hosting the first-ever workshop on video-language models on December 14th. Together with leading researchers and industry innovators, we explored the frontiers of video AI and its transformative potential. The packed session and dynamic discussions exceeded our expectations. Thank you to everyone who joined and contributed!

Swipe through to relive some of our favorite moments. From insightful discussions at our booth to the engaging interactions during our workshop and the lively atmosphere at the happy hour, every moment was a step forward in our journey to lead in AI innovation.

#VideoAI
-
🎥 ~ NEW WEBINAR ~ Just dropped! Dive into the Twelve Labs Playground with Maninder Saini and Sue Kim as they uncover the secrets of effective VFM prompting.

What's inside:
🔍 Video Search & Generate live demo
✨ Expert prompting techniques
🚀 Hands-on Playground walkthrough

Watch now: https://lnkd.in/gUETvQ2E 📺

#TwelveLabs #VideoAI #AITutorial
Intro to Playground and Prompting | Multimodal Weekly 64
https://www.youtube.com/
-
Twelve Labs reposted this
We are thrilled to announce a significant milestone in our journey: a strategic investment of $30M from global leaders like Databricks, SK Telecom, Snowflake, HubSpot Ventures, and IQT (In-Q-Tel). This funding underscores the transformative impact and value of our advanced video understanding technology in the AI ecosystem, especially for end customers across the media and entertainment space.

🌟 With this investment, we're excited to welcome Yoon Kim as our new President and Chief Strategy Officer. Yoon brings a wealth of experience from SK Telecom and Apple, where he was integral to the development of Siri. His expertise will be invaluable as we continue to innovate and expand our technological capabilities and market reach.

🤝 "Companies like OpenAI and Google are investing heavily in general-purpose multimodal models. But these models aren’t optimized for video. Our differentiation lies in being video-first from the beginning … We believe video is deserving of our sole focus — it’s not an add-on,” said our very own Jae Lee.

🔗 This partnership with Databricks and Snowflake will enhance our offerings, enabling seamless integration with their robust vector databases and opening up new possibilities for enterprise video applications.

🌐 As we embark on this next phase of growth, we're keen to keep pushing the boundaries of what's possible in AI and video understanding. Join us on this exciting journey and see where innovation takes us next! https://lnkd.in/gU7Xvi36
-
~ New Webinar ~ The webinar recording with Siyuan Li, Son Jaewon, and Jinwoo Ahn is up! Watch here: https://lnkd.in/g4QQqz4h 📺

They discussed:
- Matching Anything By Segmenting Anything
- CNN-based Spatiotemporal Attention for Video Summarization
- Compositional Video Understanding

Enjoy!
Video summarization, Compositional video understanding, & Tracking everything | Multimodal Weekly 63
https://www.youtube.com/
-
🚀 Exciting News! Twelve Labs is heading to #NeurIPS2024 in Vancouver! 🌟

We’re happy to announce our participation as a Gold Sponsor at NeurIPS, the premier global conference for groundbreaking AI and machine learning research. Don’t miss this opportunity to connect with our team at our booth and explore these innovations firsthand.

📍 Visit Us at Our Booth:
December 10: 12 PM - 8 PM
December 11: 9 AM - 5 PM
December 12: 9 AM - 4 PM

🎉 First-Ever Video-Language Models Workshop 🎥
But that’s not all! We’re proud to host the first-ever workshop on video-language models at NeurIPS on December 14, exploring cutting-edge advancements in video AI. Join us in East Meeting Room 13, where we'll dive into the latest innovations in the field.

Register for the workshop here: https://lnkd.in/gZ5MF235 and learn more about our participation: https://lnkd.in/gZDm4wv7

#VideoAI #VideoLanguageModels #TwelveLabs