We've put AI at Meta's latest Llama 3.3 70B model to the test on the Arm compute platform. Here's what we found when evaluating its inference performance on Arm Neoverse-powered Google Axion processors. ⏬
🔵 Prompt encoding speed stays consistent across batch sizes, enhancing operational reliability
🔵 Token generation speed increases with larger batch sizes, providing scalability when serving multiple users
🔵 Text is generated at human-readable speeds, keeping interactive use cases responsive
With a much smaller model size and performance comparable to the Llama 3.1 405B model, Llama 3.3 70B has game-changing potential for bringing more accessible and efficient GenAI text generation to everyone. https://okt.to/ITh5v0
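To make the two metrics above concrete, here is a minimal sketch of how prompt-encoding latency (time to first token) and token-generation throughput can be measured. It assumes the llama-cpp-python bindings and a local GGUF build of the model; the model path and parameters are placeholders, not Arm's actual benchmark setup.

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: any local GGUF build of Llama 3.3 70B (or a smaller model for testing).
llm = Llama(model_path="llama-3.3-70b-instruct.Q4_K_M.gguf", n_ctx=4096, n_threads=64)

prompt = "Explain the difference between prompt encoding and token generation."
start = time.perf_counter()
first_token_at = None
n_tokens = 0

# Streaming lets us separate prefill (prompt encoding) from decode (generation).
for chunk in llm(prompt, max_tokens=128, stream=True):
    now = time.perf_counter()
    if first_token_at is None:
        first_token_at = now  # prompt is fully encoded once the first token arrives
    n_tokens += 1

prefill_s = first_token_at - start
decode_s = time.perf_counter() - first_token_at
print(f"prompt encoding: {prefill_s:.2f}s")
print(f"generation: {n_tokens / decode_s:.1f} tokens/s")
```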
More Relevant Posts
-
NVIDIA has just dropped a bombshell in the AI world with its new Nemotron-70B, an open-source LLM that beats both OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet.

Built on Llama 3.1, this 70B-parameter model has smashed key benchmarks like Arena Hard, AlpacaEval 2 LC, and MT-Bench, outperforming larger models with ease.

This move signals NVIDIA's push to dominate both the AI software and hardware space, and open-source AI is the perfect vehicle for that.

Nemotron-70B is already leading the generative AI race.

Jensen Huang is cooking.
-
Say no to bottlenecks and yes to scalability! Cluster Protocol's AI-enhanced #DePIN network of GPU nodes distributes processing across multiple points, ensuring each node is primed for high loads. 🚀 As demand grows, our AI algorithms optimize performance in real time, scaling seamlessly and guaranteeing top performance without the lag.
-
MediaTek's Dimensity platforms are optimized for Google's Gemini Nano with Multimodality. We're teaming up with Google to push the boundaries of generative AI on Android, so users will enjoy powerful new capabilities for text, images, and speech. For example, you can take a picture of something and instantly receive a detailed description of what's been captured. This is just the beginning of how we're making AI more intuitive and accessible. https://bit.ly/3BKHKGh
-
Join Weights & Biases and Run:ai in Boston on June 12. Discover cutting-edge insights and strategies for navigating AI in regulated environments, tackling GPU availability constraints, and handling complex datasets. Secure your spot → https://lnkd.in/gMK4hPbp
-
Are you considering entering the GPUaaS / AI infrastructure scene? Check out Ditlev B.'s take on the margins and business model. Margins are seriously slim in a lot of the typical configs, and frankly, that's one of the reasons we're here. We help you increase those margins substantially by giving you the ability to thin-provision and overcommit those very costly GPU cards. Your GPU business will be a lot more profitable with hostedAI than without it.
Dear AI provider, how's business? I've been working on a model to better understand what kind of margins service providers in the GPU space work with. Take a look at the model here: https://hosted.ai/model/. It's very much a beta, so there could be all kinds of issues, but check it out and let me know. Start by filling out the variables (I've set all of them to default values) and click compute. Let me know if you find any oddities or surprises. I was quite surprised... Thanks for your help :)
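For intuition on why overcommit moves the needle, here's a toy Python sketch of the margin math. All numbers and the linear overcommit assumption are hypothetical illustrations, not figures from the hosted.ai model.

```python
# Toy GPU-as-a-service margin model. Every number below is a made-up
# placeholder; plug in your own costs and prices.

def monthly_margin(num_gpus: int,
                   cost_per_gpu: float,      # amortized hardware + power + colo, per month
                   price_per_gpu: float,     # what a tenant pays for one "GPU" per month
                   utilization: float,       # fraction of sellable capacity actually sold
                   overcommit: float = 1.0): # 1.0 = no overcommit, 2.0 = sell each card twice
    # Thin provisioning lets you sell more virtual GPUs than physical cards,
    # assuming tenants rarely peak at the same time.
    sellable = num_gpus * overcommit
    revenue = sellable * utilization * price_per_gpu
    cost = num_gpus * cost_per_gpu  # physical cost doesn't grow with overcommit
    return revenue, cost, revenue - cost

for oc in (1.0, 1.5, 2.0):
    rev, cost, margin = monthly_margin(num_gpus=100, cost_per_gpu=1200.0,
                                       price_per_gpu=1500.0, utilization=0.7, overcommit=oc)
    print(f"overcommit {oc:.1f}x: revenue ${rev:,.0f}, cost ${cost:,.0f}, margin ${margin:,.0f}")
```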
-
I saw the LinkedIn post below from Gradio and followed the link to the webpage. As an experiment, I then used LlamaIndex to parse the blog post and return its content as markdown, then I pushed the markdown into ChatGPT and asked it to write a LinkedIn post about the announcement. All of that took more or less 5 minutes because I was doing it ad hoc; I wanted to see how fast it would be. (A rough sketch of this pipeline follows the quoted post below.)

Interesting points of note: LlamaIndex correctly identified the tables in the blog post and rendered them as tables in the final markdown file, not to mention it picked up on the parts of the text that were written in Korean.

For those who want to try out the LG models, you can find them on the LG AI Research page on Hugging Face 🤗. The 2.4B ultra-lightweight model is interesting; built for on-device use, could we be seeing an LG phone on the market soon that comes preloaded? https://lnkd.in/gqj-SHH7

Everything below was written using ChatGPT .....

🚀 Exciting News in AI Innovation! 🎉 LG AI Research has officially open-sourced EXAONE 3.5, unveiling three cutting-edge models that redefine AI performance:
2.4B Model: Perfect for on-device and low-infrastructure use.
7.8B Model: Versatile and lightweight with superior performance.
32B Model: Frontier-level model for those demanding top-tier AI.
💡 What Makes EXAONE 3.5 Stand Out?
1️⃣ Unmatched Long Context Understanding: Handles up to 32K tokens effectively, excelling in real-world applications.
2️⃣ Top Instruction Following Capabilities: Dominates global benchmarks in multiple languages, enhancing productivity.
3️⃣ Competitive in General Domains: Achieves state-of-the-art results in mathematical and coding benchmarks.
🛠️ Training Efficiency & Ethics: The models were crafted with a focus on affordability, ethical AI practices, and meticulous decontamination processes to ensure high-quality outputs. Transparency in ethical assessments ensures responsible AI deployment.
🌐 These open-source models are ready to accelerate AI research and innovation, inviting feedback from researchers worldwide. Let's shape the future of AI together!
🔗 Try EXAONE 3.5 Models Now!
🔗 Read the Full Technical Report
#AI #OpenSource #EXAONE3_5 #Innovation
🚀 Big News! LG AI Research has open-sourced three EXAONE 3.5 models!
✅ 32K-token long-context understanding 🚀 Excels in English and Korean.
✅ 2.4B, 7.8B, and 32B models: spanning on-device/low-end GPU usage to versatile frontier-level apps!
🚀 Ranked #1 in instruction following across seven benchmarks. Delivers top-tier performance in instruction following and long-context understanding!
LG AI Research's EXAONE 3.5: A Series of Large Language Models for Real-world Use Cases.
Explore Now 🔗:
👉 Try the models from the Hugging Face collection: https://lnkd.in/gPkjugkN
👉 Read the blog: https://lnkd.in/gBBxEXBn
👉 Official Gradio space for EXAONE 3.5 (2.4B, 7.8B models): https://lnkd.in/gvPb-DeJ
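Here's the rough sketch of the parse-to-post pipeline mentioned above. It's a guess at the workflow, not the author's actual code: it assumes LlamaIndex's SimpleWebPageReader for fetching the blog post and the OpenAI Python client for drafting the post; the URL and model name are placeholders.

```python
# pip install llama-index llama-index-readers-web openai
from llama_index.readers.web import SimpleWebPageReader
from openai import OpenAI

# Placeholder URL standing in for the EXAONE 3.5 announcement post.
docs = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://example.com/exaone-3.5-announcement"]
)
blog_text = docs[0].text

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever ChatGPT model you have access to
    messages=[
        {"role": "system", "content": "You write upbeat LinkedIn posts."},
        {"role": "user", "content": f"Write a LinkedIn post about this announcement:\n\n{blog_text}"},
    ],
)
print(resp.choices[0].message.content)
```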
-
Grab an API key in Google AI Studio, and get started with the Gemini API Cookbook! Explore the new GenAI model updates to Gemini and more!
Gemini 1.5 Pro updates, 1.5 Flash debut and 2 new Gemma models
google.smh.re
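As a quick illustration of getting started, here is a minimal sketch using the google-generativeai Python package; the model name follows the 1.5 Flash release mentioned above, but check the Cookbook for current model names and APIs.

```python
# pip install google-generativeai
import os
import google.generativeai as genai

# API key from Google AI Studio, exported as GOOGLE_API_KEY.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Summarize what the Gemini API Cookbook covers.")
print(response.text)
```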
-
#aiplatforms - where an "AI-first" solution is architected with intent across the stack for security, observability, scalability, resiliency, adaptability, etc. #aiops #llmops #mlops #dataops #devsecops #appdev
I'm attending the Mistral AI </summit> in San Francisco today. Some highlights from this morning:

Making AI useful is not just about powerful LLMs, but about building complex AI systems:
1. Orchestration (tool calls, function calling)
2. Multimodality (with images and documents)
3. Instruction following and guardrails enforcement
4. Strong reasoning and the capacity to identify user intent
5. API speed

Mistral AI announced Pixtral 12B, their first multimodal model (a sketch of calling it follows below):
1. Understands images and text
2. Variable image resolution; can support images of arbitrary sizes
3. Long context window: 128k
4. Open-weight, with Apache 2.0 license
5. The benchmarks are very impressive: SOTA on multimodality leaderboards

And, of course, the highlight was the presence of Jensen Huang from Nvidia. He talked about:
1. How we will interact with billions of digital humans
2. It is not going to be human-in-the-loop but machine-in-the-loop
3. The future of AI computing, and how hard it is to do inference
4. Lessons on how to build great products and strategies
5. This is JUST THE BEGINNING

Thanks, Mistral AI team, for the partnership with IBM and the invitation!
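Here is that minimal sketch of querying Pixtral 12B with an image. It assumes the mistralai Python client and Mistral's hosted API; the model identifier and multimodal message format follow Mistral's published examples, but verify against the current docs.

```python
# pip install mistralai
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Multimodal message: text plus an image URL in one user turn.
response = client.chat.complete(
    model="pixtral-12b-2409",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart in two sentences."},
                {"type": "image_url", "image_url": "https://example.com/chart.png"},  # placeholder image
            ],
        }
    ],
)
print(response.choices[0].message.content)
```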
-
Armand Ruiz talks about making AI useful and how the future of AI is more than creating powerful LLMs. It's about intelligent, multimodal systems capable of complex orchestration, reasoning, and user-intent identification. Thanks, Armand, for highlighting the need for guardrails and the importance of securing AI. I really liked NVIDIA's Jensen Huang's statement on how we will soon interact with billions of digital humans in a machine-in-the-loop future. The possibilities with AI are endless, and while we embark on this exciting futuristic journey, we need to ensure adequate Governance, Compliance, and Security for AI systems. Stay ahead and stay secure. #SecuringAI #TCS #CyberforAI #AIforCyber
-
Bring AI to life with DataStax, NVIDIA, and Wikimedia Deutschland e. V.! Catch the session replay from #AWSreInvent and learn how you can reduce AI development time by up to 60%. ⏱ Hear how two developers ingested, chunked, and vectorized one of the world's largest and most complex datasets in just three days, faster than comparable solutions. 🚀
AWS re:Invent 2024 - Bring AI to life with DataStax, NVIDIA, and Wikimedia (AIM219)
https://www.youtube.com/
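To illustrate the ingest-chunk-vectorize pattern the session covers, here is a generic sketch, not DataStax's actual pipeline; it assumes the sentence-transformers package and a toy overlap-chunking scheme.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Toy stand-in for a Wikipedia-scale corpus.
document = "Wikimedia hosts one of the largest openly licensed knowledge bases... " * 40

chunks = chunk_text(document)
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
vectors = model.encode(chunks)  # one embedding per chunk, ready for a vector store

print(f"{len(chunks)} chunks -> {vectors.shape[1]}-dim vectors")
```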
Impressive strides in AI efficiency! Leveraging Llama 3.3 on Arm Neoverse shows how scalability and performance can go hand in hand, making GenAI more accessible than ever.