New offering empowers NVIDIA Cloud Partners and GPU Cloud Providers to rapidly launch high-margin AI services on Rafay-powered infrastructure—accelerating time-to-market and maximizing ROI
SUNNYVALE, Calif.: Today, Rafay Systems, a leader in cloud-native and AI infrastructure orchestration and management, announced general availability of the company’s Serverless Inference offering, a token-metered API for running open-source and privately trained or tuned LLMs. Many NVIDIA Cloud Partners (NCPs) and GPU Clouds are already leveraging the Rafay Platform to deliver a multi-tenant, Platform-as-a-Service experience to their customers, complete with self-service consumption of compute and AI applications. These NCPs and GPU Clouds can now deliver Serverless Inference as a turnkey service at no additional cost, enabling their customers to build and scale AI applications quickly, without taking on the cost and complexity of building automation, governance, and controls for GPU-based infrastructure.
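For illustration only: the announcement describes a token-metered inference API but does not publish its exact shape. The minimal sketch below assumes an OpenAI-compatible chat-completions endpoint; the base URL, model name, and environment variable are hypothetical placeholders, not confirmed details of Rafay's offering.

```python
# Illustrative sketch only. Endpoint URL, model name, and API key variable are
# hypothetical; assumes an OpenAI-compatible, token-metered chat-completions API.
import os
import requests

BASE_URL = "https://inference.example-gpu-cloud.com/v1"  # hypothetical provider endpoint
API_KEY = os.environ["INFERENCE_API_KEY"]                # key issued by the provider

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-8b-instruct",  # example open-source model
        "messages": [{"role": "user", "content": "Summarize our Q3 support tickets."}],
        "max_tokens": 256,
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["choices"][0]["message"]["content"])

# In a token-metered model, billing would typically be derived from a usage
# object such as data["usage"] (prompt_tokens, completion_tokens, total_tokens).
```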
The global AI inference market is expected to grow to $106 billion in 2025 and $254 billion by 2030. Rafay’s Serverless Inference empowers GPU Cloud Providers (GPU Clouds) and NCPs to tap into the booming GenAI market by removing key adoption barriers: it automates provisioning and segmentation of complex infrastructure, enables developer self-service, lets providers rapidly launch new GenAI models as a service, generates billing data for on-demand usage, and more.
“Having spent the last year experimenting with GenAI, many enterprises are now focused on building agentic AI applications that augment and enhance their business offerings. The ability to rapidly consume GenAI models through inference endpoints is key to faster development of GenAI capabilities. This is where Rafay’s NCP and GPU Cloud partners have a material advantage,” said Haseeb Budhani, CEO and co-founder of Rafay Systems.
“With our new Serverless Inference offering, available for free to NCPs and GPU Clouds, our customers and partners can now deliver an Amazon Bedrock-like service to their customers, enabling access to the latest GenAI models in a scalable, secure, and cost-effective manner. Developers and enterprises can now integrate GenAI workflows into their applications in minutes, not months, without the pain of infrastructure management. This offering advances our company’s vision to help NCPs and GPU Clouds evolve from operating GPU-as-a-Service businesses to AI-as-a-Service businesses.”
Rafay Pioneers the Shift from GPU-as-a-Service to AI-as-a-Service
By offering Serverless Inference as an on-demand capability to downstream customers, Rafay helps NCPs and GPU Clouds address a key gap in the market, providing them with the key capabilities needed to deliver inference as a turnkey, metered service.
Availability
Rafay’s Serverless Inference offering is available today to all customers and partners using the Rafay Platform to deliver multi-tenant, GPU- and CPU-based infrastructure. The company also plans to roll out fine-tuning capabilities shortly. These additions are designed to help NCPs and GPU Clouds rapidly deliver high-margin, production-ready AI services without the underlying infrastructure complexity.
To read more about the technical aspects of the capabilities, visit the blog.
To learn more about Rafay, visit www.rafay.co and follow Rafay on X and LinkedIn.
About Rafay Systems
Founded in 2017, Rafay is committed to elevating CPU- and GPU-based infrastructure to a strategic asset for enterprises and cloud service providers. Enterprises, NVIDIA Cloud Partners, and GPU Clouds leverage the company’s GPU PaaS™ (Platform-as-a-Service) stack to simplify the complexities of managing cloud and on-premises infrastructure while enabling self-service workflows for platform and DevOps teams, all within one multi-tenant offering. The Rafay Platform also helps companies improve governance capabilities, optimize costs of CPU and GPU resources, and accelerate the delivery of cloud-native and AI-powered applications. Customers such as MoneyGram and Guardant Health entrust Rafay to be the cornerstone of their modern infrastructure strategy and AI architecture. Gartner has recognized Rafay as a Cool Vendor in Container Management. GigaOm named Rafay a Leader and Outperformer in the GigaOm Radar Report for Managed Kubernetes.
To learn more about Rafay, visit www.rafay.co.
Source: Business Wire