You can now control a computer with just your voice. Here’s how we did it.

We built on Replit’s template to combine Hume’s empathic voice interface (EVI) with Anthropic’s computer use API. EVI processes speech in real time, sends instructions to the agentic computer control loop, explains its actions with voice, and can even be interrupted to change course.

Try it out with our open-source demo: https://lnkd.in/e_weNThz

This works because Hume’s frontier speech-LLM, EVI 2, can generate its own language, but can also read out lines from other language models, like an actor reading a script. EVI is the only voice-to-voice model that’s both interoperable with any LLM and available today as an API.

Learn more here: https://lnkd.in/e9Kqu-YM
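For the curious, here is roughly what that wiring looks like. This is a minimal sketch, not the demo’s actual code: the computer-use tool call follows Anthropic’s published beta docs, while speak() and listen() are hypothetical placeholders standing in for EVI’s streaming speech API, and the tool-execution half of the agent loop (performing clicks, returning screenshots) is omitted.

```python
# Minimal sketch of a voice-driven computer-use loop (assumptions noted below).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def speak(text: str) -> None:
    """Hypothetical: stream `text` to EVI so it is read aloud to the user."""
    ...

def listen() -> str:
    """Hypothetical: return the user's next transcribed utterance from EVI."""
    ...

def run_turn(instruction: str) -> None:
    # Ask Claude to operate the computer, using the documented
    # computer-use beta tool (parameters per Anthropic's docs).
    response = client.beta.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=[{
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
        }],
        messages=[{"role": "user", "content": instruction}],
        betas=["computer-use-2024-10-22"],
    )
    # Narrate the model's text blocks through EVI ("like an actor reading
    # a script"); in the full loop, tool_use blocks would drive the clicks.
    for block in response.content:
        if block.type == "text":
            speak(block.text)

# Each spoken command becomes a fresh instruction, which is what lets
# an interruption redirect the agent mid-task.
while True:
    run_turn(listen())
```

The key design point is the handoff: the LLM’s text goes to EVI for narration while its tool calls drive the machine, so the same loop can be spoken to, spoken from, and interrupted.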
Excellent flow, congratulations.
Will Jackson, check this out.
Wow 🤯
That’s the way it’s meant to be 🔥