🚀 Llama 3.1 405B now runs on Cerebras Inference at 969 tok/s, a new world record!

Highlights:
• 969 tokens/s, frontier AI at instant speed
• 12x faster than GPT-4o, 18x faster than Claude, 75x faster than AWS
• 128K context length with 16-bit weights
• Industry-leading time-to-first-token: 240 ms

This year we pushed Llama 3.1 8B and 70B past 2,000 tokens/s, but frontier models were still stuck at GPU speed. Not anymore. On Cerebras, Llama 3.1 405B now runs at 969 tokens/s: coding, reasoning, and RAG workflows just got 12-18x faster than closed frontier models.

Cerebras Inference for Llama 3.1 405B is in customer trials today, with general availability coming in Q1 2025, priced at $6/million tokens (input) and $12/million tokens (output).

Frontier AI now runs at instant speed on Cerebras. #Llama #Inference #AI

Read more here: https://lnkd.in/g-RGjf9Q
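To put the figures above in concrete terms, here is a back-of-the-envelope sketch (not an official Cerebras tool) that turns the quoted numbers — 969 tok/s, 240 ms time-to-first-token, $6/M input and $12/M output tokens — into end-to-end latency and cost for a single request. The function names and the example token counts are illustrative assumptions.

```python
# Back-of-the-envelope latency and cost for Llama 3.1 405B on Cerebras
# Inference, using only the figures quoted in the announcement above.

def completion_latency_s(output_tokens: int,
                         tok_per_s: float = 969.0,
                         ttft_s: float = 0.240) -> float:
    """Time to first token plus steady-state generation time."""
    return ttft_s + output_tokens / tok_per_s

def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_per_m: float = 6.0,
                     out_per_m: float = 12.0) -> float:
    """Cost at the quoted per-million-token prices."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# A 1,000-token answer arrives in about 1.27 s end to end,
# and a request with a 4,000-token prompt costs about $0.036.
print(f"{completion_latency_s(1000):.2f} s")
print(f"${request_cost_usd(4000, 1000):.3f}")
```

At these rates, latency is dominated by generation length rather than time-to-first-token once responses exceed a few hundred tokens.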
Cerebras Systems
Computer Hardware
Sunnyvale, California 42,524 followers
AI insights, faster! We're a computer systems company dedicated to accelerating deep learning.
About us
Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, functional business experts, and engineers of all types. We have come together to build a new class of computer that accelerates artificial intelligence work by three orders of magnitude beyond the current state of the art.

The CS-3 is the fastest AI computer in existence. It contains a collection of industry firsts, including the Cerebras Wafer Scale Engine (WSE-3). The WSE-3 is the largest chip ever built: it contains 4 trillion transistors and covers more than 46,225 square millimeters of silicon. In artificial intelligence work, large chips process information more quickly, producing answers in less time. As a result, models that once took months to train can now train in minutes on the Cerebras CS-3 powered by the WSE-3. Additionally, Cerebras accelerates inference of large models, enabling instant results.

Join us: https://cerebras.net/careers/
- Website
- http://www.cerebras.ai
- Industry
- Computer Hardware
- Company size
- 201-500 employees
- Headquarters
- Sunnyvale, California
- Type
- Privately Held
- Founded
- 2016
- Specialties
- artificial intelligence, deep learning, natural language processing, and inference
Updates
-
Cerebras Inference by the numbers: 🔥 Blazing fast 2,200 tok/s on Llama 3.3 70B. 🏎️ Record-breaking 969 tok/s on Llama 3.1 405B. 👋 70x faster than GPUs. What will you build in 2025 with the world's fastest inference? Try it today: chat.cerebras.ai
-
🎉 12 awards in 1 year? What an honor! Thank you, Time Inc., Fortune, Forbes, and HPCwire, for recognizing Cerebras in 2024. Let's build the future of AI together: https://lnkd.in/ge74WNtj
-
Introducing CerebrasCoder! An open-source app by Steve Krouse of Val Town that generates websites with Llama 3.3 70B. 100% free and open-source. Try it now: https://cerebrascoder.com/
-
Cerebras Systems reposted this
A groundbreaking achievement in collaboration with Sandia National Laboratories: the successful demonstration of training a 1 trillion parameter AI model on a single CS-3 system. Trillion-parameter models represent the state of the art in today's LLMs, typically requiring thousands of GPUs and dozens of hardware experts to train. By leveraging Cerebras' Wafer Scale Cluster technology, researchers at Sandia were able to initiate training on a single AI accelerator – a one-of-a-kind achievement for frontier model development. Cerebras Systems #ai #hpc
-
🏎️💨 Speed matters: we are no longer in the dial-up era of AI. "Once we got broadband, all of a sudden you had new applications, you had streaming, you had all these things that were fun, and the engagement was high. I think that's what's happening right now with AI: as you get faster, you move into the broadband era of AI inference." - Andrew Feldman Watch the full panel interview from Fortune Brainstorm AI 2024: https://lnkd.in/gWDVX53c
Thanks to Fortune and the organizers of Fortune Brainstorm for inviting me to be on their AI Chips panel. I was honored to be with AMD's CTO Mark Papermaster and Altera's CEO Sandra Rivera. What a pleasure to discuss the industry with such esteemed colleagues. #AI
-
At Cerebras, our mission is to accelerate #AI by making it faster, easier to use, and more energy efficient, which is why we have created a unique research grant for university faculty and researchers. We want you to leverage Cerebras Inference to drive forward new techniques and applications. Learn more and apply: https://lnkd.in/gZccy_VU
-
From our booth, to happy hour with E14 Fund, to Cafe Compute with Greylock and SF Compute, it's been a packed house at #NeurIPS24. Thanks to everyone who stopped by our booth to discuss how we can work together to move the frontier of ML forward!
-
Meet Memo-ry, a tool that helps make everyday activities more attainable for people with memory loss. Created by Fellow Jensen Coonradt, it uses Cerebras Inference's speed to process conversations in near real time and extract actionable tasks. Read more: https://lnkd.in/gHYF4zTt
-
Using a single Cerebras CS-3, researchers at Sandia National Laboratories demonstrated training of a 1 trillion parameter model. We're proud to see our partners achieve breakthroughs in frontier #AI. 🤝 Read more: https://lnkd.in/g6j9vtd4