Cerebras Triples its Industry-Leading Inference Performance, Setting New All Time Record

Today, Cerebras Systems, the pioneer in high performance AI compute, smashed its previous industry record for inference, delivering 2,100 tokens/second performance on Llama 3.2 70B. This is 16x faster...

Business Wire

Cerebras Inference delivers 2,100 tokens/second for Llama 3.2B 70B -- 16X performance of the fastest GPUs and 68x faster than hyperscale clouds

SUNNYVALE, Calif.: Today, Cerebras Systems, the pioneer in high performance AI compute, smashed its previous industry record for inference, delivering 2,100 tokens/second performance on Llama 3.2 70B. This is 16x faster than any known GPU solution and 68x faster than hyperscale clouds as measured by Artificial Analysis, a third-party benchmarking organization. Moreover, Cerebras Inference serves Llama 70B more than 8x faster than GPUs serve Llama 3B, delivering an aggregate 184x advantage (8x faster on models 23 x larger). By providing Instant Inference for large models, Cerebras is unlocking new AI use cases powered by real-time, higher quality responses, chain of thought reasoning, more interactions and higher user engagement.

“The world’s fastest AI inference just got faster. It takes graphics processing units an entirely new hardware generation -- two to three years- - to triple their performance. We just did it in a single software release,” said Andrew Feldman, CEO and co-founder, Cerebras. “Early adopters and AI developers are creating powerful AI use cases that were impossible to build on GPU-based solutions. Cerebras Inference is providing a new compute foundation for the next era of AI innovation.”

From global pharmaceutical giants like GlaxoSmithKline (GSK), to pioneering startups like Audivi, Tavus, Vellum and LiveKit, Cerebras is eliminating AI application latency with 60x speed-ups:

GSK: “With Cerebras’ inference speed, GSK is developing innovative AI applications, such as intelligent research agents, that will fundamentally improve the productivity of our researchers and drug discovery process,” said Kim Branson, SVP of AI and ML, GSK.
LiveKit: “When building voice AI, inference is the slowest stage in your pipeline. With Cerebras Inference, it’s now the fastest. A full pass through a pipeline consisting of cloud-based speech-to-text, 70B-parameter inference using Cerebras Inference, and text-to-speech, runs faster than just inference alone on other providers. This is a game changer for developers building voice AI that can respond with human-level speed and accuracy,” said Russ d’Sa, CEO of LiveKit.
Audivi AI: "For real-time voice interactions, every millisecond counts in creating a seamless, human-like experience. Cerebras’ fast inference capabilities empower us to deliver instant voice interactions to our customers, driving higher engagement and expected ROI,” said Seth Siegel, CEO of Audivi AI.
Tavus: “We migrated from a leading GPU solution to Cerebras and reduced our end-user latency by 75%,” said Hassan Raza, CEO of Tavus.
Vellum: “Our customers are blown away with the results! Time to completion on Cerebras is hands down faster than any other inference provider and I’m excited to see the production applications we’ll power via the Cerebras inference platform,” Akash Sharma, CEO of Vellum.

Cerebras is gathering the llama community in llamapalooza NYC, a developer event that will feature talks from meta, Hugging Face, LiveKit, Vellum, LaunchDarkly, Val.town, Haize Labs, Crew AI, Cloudflare, South Park Commons, and Slingshot.

Cerebras Inference is powered by the Cerebras CS-3 system and its industry-leading AI processor, the Wafer Scale Engine 3 (WSE-3). Unlike graphic processing units that force customers to make trade-offs between speed and capacity, the CS-3 delivers best in class per-user performance while delivering high throughput. The massive size of the WSE-3 enables many concurrent users to benefit from blistering speed. With 7,000x more memory bandwidth than the Nvidia H100, the WSE-3 solves Generative AI’s fundamental technical challenge: memory bandwidth. Developers can easily access the Cerebras Inference API, which is fully compatible with the OpenAI Chat Completions API, making migration seamless with just a few lines of code.

Cerebras Inference is available now, at a fraction of the cost of hyperscale and GPU clouds. Try Cerebras Inference today: www.cerebras.ai.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building from the ground up a new class of AI supercomputer. Our flagship product, the CS-3 system, is powered by the world’s largest and fastest AI processor, our Wafer-Scale Engine-3. CS-3s are quickly and easily clustered together to make the largest AI supercomputers in the world, and make placing models on the supercomputers dead simple by avoiding the complexity of distributed computing. Cerebras Inference, powered by Wafer-Scale Engine 3, delivers breakthrough inference speeds, empowering customers to create cutting-edge AI applications. Leading corporations, research institutions, and governments use Cerebras solutions for the development of pathbreaking proprietary models, and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on premise. For further information, visit www.cerebras.ai or follow us on LinkedIn or X.

Fonte: Business Wire

Last News

RSA at Cybertech Europe 2024

Alaa Abdul Nabi, Vice President, Sales International at RSA presents the innovations the vendor brings to Cybertech as part of a passwordless vision for…

Italian Security Awards 2024: G11 Media honours the best of Italian cybersecurity

G11 Media's SecurityOpenLab magazine rewards excellence in cybersecurity: the best vendors based on user votes

How Austria is making its AI ecosystem grow

Always keeping an European perspective, Austria has developed a thriving AI ecosystem that now can attract talents and companies from other countries

Sparkle and Telsy test Quantum Key Distribution in practice

Successfully completing a Proof of Concept implementation in Athens, the two Italian companies prove that QKD can be easily implemented also in pre-existing…

G11 Media Networks

InnovationOpenLab is a channel of BitCity, a newspaper registered at the court of Como ,
n. 21/2007 del 11/10/2007- Registration ROC n. 15698

G11 MEDIA S.R.L. Registered office Via NUOVA VALASSINA, 4 22046 MERONE (CO) - P.IVA/C.F.03062910132 Como business register n. 03062910132 - REA n. 293834 CAPITALE SOCIALE Euro 30.000 i.v.

Cerebras Triples its Industry-Leading Inference Performance, Setting New All Time Record

Related news

eGrowcery and Red Pepper Digital Announce Partnership to Deepen Retail Customer Engagement

CNM LLP Named One of the 2025 Best Places to Work in Orange County for the Sixth Time

Cadence Introduces Industry-First LPDDR6/5X 14.4Gbps Memory IP to Power Next-Generation AI Infrastructure

JEDEC Releases New LPDDR6 Standard to Enhance Mobile and AI Memory Performance

Sysdig Launches Open Source Community to Unite and Empower Millions of Cloud Security Innovators and Builders of All Levels

Vanta Named a Leader in 2025 IDC MarketScape for Worldwide Governance, Risk, and Compliance Software

Cyberstarts Launches $300 Million Employee Liquidity Fund to Power the Next Stage of Cybersecurity Startup Growth

AIRIA Announces $1.8 Million AFWERX Contract

Last News

RSA at Cybertech Europe 2024

Italian Security Awards 2024: G11 Media honours the best of Italian cybersecurity

How Austria is making its AI ecosystem grow

Sparkle and Telsy test Quantum Key Distribution in practice

Most read

NiCE Unveils 2025 International CX Excellence Award Winners, Spotlighting…

Roblox to Report Second Quarter 2025 Financial Results on July 31, 2025

LambdaTest Announces Deeper Collaboration with Appium as Strategic Partnership

ServiceNow to Announce Second Quarter 2025 Financial Results on July 23

G11 Media Networks

Cerebras Triples its Industry-Leading Inference Performance, Setting New All Time Record

Related news

Last News

Most read

Newsletter signup

G11 Media Networks