Meta Collaborates with Cerebras to Drive Fast Inference for Developers in New Llama API

Business Wire

SUNNYVALE, Calif.: Meta has teamed up with Cerebras to offer ultra-fast inference in its new Llama API, bringing together the world’s most popular open-source models, Llama, with the world’s fastest inference technology, delivered by Cerebras. This new platform unlocks groundbreaking possibilities for a massive developer audience.

Developers building on the Llama 4 Cerebras model in the API can expect generation speeds up to 18 times faster than traditional GPU-based solutions. This acceleration unlocks an entirely new generation of applications that are impossible to build on other technology. Real-time agents, low-latency conversational voice, interactive code generation, and instant multi-step reasoning, all of which require chaining multiple LLM calls, can now be completed in seconds rather than minutes.

By partnering with Meta to serve Llama models from Meta’s new API service, Cerebras gains exposure to an expanded global developer audience and deepens its business and partnership with Meta and its incredible teams.

Since launching its inference solutions in 2024, Cerebras has delivered the world’s fastest Llama inference, serving billions of tokens through its own AI infrastructure. The broad developer community now has direct access to a robust, OpenAI-class alternative for building intelligent, real-time systems — backed by Cerebras speed and scale.

“Cerebras is proud to make Llama API the fastest inference API in the world,” said Andrew Feldman, CEO and co-founder of Cerebras. “Developers building agentic and real-time apps need speed. With Cerebras on Llama API, they can build AI systems that are fundamentally out of reach for leading GPU-based inference clouds.”

Cerebras is the fastest AI inference solution as measured by the third-party benchmarking site Artificial Analysis, reaching over 2,600 tokens/sec for Llama 4 Scout, compared to ChatGPT at ~130 tokens/sec and DeepSeek at ~25 tokens/sec.
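
To put the benchmark figures in context, the following is a rough back-of-the-envelope sketch in Python of how output throughput translates into generation time for a chained workflow. The throughput values are the ones quoted above; the number of chained calls and the tokens generated per call are illustrative assumptions, not figures from Meta or Cerebras, and the calculation ignores network latency, prompt processing, and time to first token.

```python
# Output throughput (tokens/sec) as quoted above from Artificial Analysis.
THROUGHPUT_TOKENS_PER_SEC = {
    "Cerebras (Llama 4 Scout)": 2600,
    "ChatGPT": 130,
    "DeepSeek": 25,
}

# Hypothetical workload, chosen only for illustration (an assumption).
CHAINED_CALLS = 10
TOKENS_PER_CALL = 500


def generation_seconds(
    tokens_per_sec: float,
    calls: int = CHAINED_CALLS,
    tokens_per_call: int = TOKENS_PER_CALL,
) -> float:
    """Seconds spent generating output tokens across sequential calls.

    Ignores every other source of latency, so it is a lower bound.
    """
    return calls * tokens_per_call / tokens_per_sec


if __name__ == "__main__":
    for name, tps in THROUGHPUT_TOKENS_PER_SEC.items():
        print(f"{name}: {generation_seconds(tps):.1f} s")
```

Under these assumptions, a chain of 10 calls producing 500 tokens each takes roughly 2 seconds at 2,600 tokens/sec, about 38 seconds at 130 tokens/sec, and 200 seconds at 25 tokens/sec.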

Developers will be able to access the fastest Llama 4 inference by selecting Cerebras from the model options within the Llama API. This streamlined experience will make it easy to prototype, build, and scale real-time AI applications. To sign up for early access to the Llama API and to experience Cerebras speed today, visit www.cerebras.ai.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building from the ground up a new class of AI supercomputer. Our flagship product, the CS-3 system, is powered by the world’s largest and fastest commercially available AI processor, our Wafer-Scale Engine-3. CS-3s are quickly and easily clustered together to make the largest AI supercomputers in the world, and make placing models on the supercomputers dead simple by avoiding the complexity of distributed computing. Cerebras Inference delivers breakthrough inference speeds, empowering customers to create cutting-edge AI applications. Leading corporations, research institutions, and governments use Cerebras solutions for the development of pathbreaking proprietary models, and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on-premises. For further information, visit cerebras.ai or follow us on LinkedIn, X and/or Threads.

Source: Business Wire

InnovationOpenLab is a channel of BitCity, a newspaper registered at the Court of Como, n. 21/2007 of 11/10/2007 - ROC registration n. 15698

G11 MEDIA S.R.L. Registered office: Via NUOVA VALASSINA, 4, 22046 MERONE (CO) - VAT/tax code (P.IVA/C.F.) 03062910132 - Como business register n. 03062910132 - REA n. 293834 - Share capital Euro 30,000, fully paid up
