Scenario 3: Azure API Management - Generative AI resources as backend

This reference implementation demonstrates how to provision and interact with Generative AI resources through API Management. It builds on top of the APIM baseline and additionally includes private deployments of Azure OpenAI endpoints, along with APIM policies tailored to GenAI use cases (see the GenAI Gateway capabilities below).

By the end of this deployment guide, you will have deployed private Azure OpenAI endpoints and an opinionated set of policies in APIM to manage traffic to these endpoints. You can then test the policies by sending requests to the APIM gateway, and you can modify the deployment either to include the policy fragments listed here or to add your own custom policies.
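For example, once the deployment has finished, a request like the following exercises the policies end to end. This is a minimal Python sketch: the gateway URL, API path, API version, deployment name, and the subscription-key header name are placeholders that depend on how your deployment is configured.

```python
import os
import requests

# Placeholder values - replace with the outputs of your own deployment.
APIM_GATEWAY_URL = "https://<your-apim-name>.azure-api.net"
API_PATH = "openai/deployments/<your-deployment-name>/chat/completions"
API_VERSION = "2024-02-01"
SUBSCRIPTION_KEY = os.environ["APIM_SUBSCRIPTION_KEY"]

response = requests.post(
    f"{APIM_GATEWAY_URL}/{API_PATH}",
    params={"api-version": API_VERSION},
    headers={
        # APIM subscription key; the header name depends on how the API is configured.
        "api-key": SUBSCRIPTION_KEY,
        "Content-Type": "application/json",
    },
    json={
        "messages": [{"role": "user", "content": "Say hello from behind the gateway."}],
        "max_tokens": 50,
    },
    timeout=30,
)

response.raise_for_status()
body = response.json()
print(body["choices"][0]["message"]["content"])
# The usage block is what a token-tracking policy would record per request.
print(body["usage"])
```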

Architecture

Architectural diagram showing an Azure API Management deployment in a virtual network with AOAI as backend.

Core components

  • Azure OpenAI endpoints
  • Azure Event Hub
  • Azure Private Endpoint
  • Azure Private DNS Zones

GenAI Gateway capabilities

See the GenAI Gateway section below for the capabilities included in this implementation.

Deploy the reference implementation

This reference implementation is provided with two infrastructure-as-code options. Select the deployment guide you are interested in; both deploy the same implementation.

▶️ Bicep-based deployment guide

▶️ Terraform-based deployment guide

GenAI Gateway

A "GenAI Gateway" serves as an intelligent interface/middleware that dynamically balances incoming traffic across backend resources to achieve optimizing resource utilization. In addition to load balancing, GenAI Gateway can be equipped with extra capabilities to address the challenges around billing, monitoring etc.

To read more about considerations when implementing a GenAI Gateway, see this article.

This accelerator contains APIM policies showing how to implement different GenAI Gateway capabilities in APIM, along with code to enable you to deploy the policies and see them in action.

Scenarios handled by this accelerator

This repo currently contains the policies showing how to implement these GenAI Gateway capabilities:

| Capability | Description |
| --- | --- |
| Load balancing (round-robin) | Load balance traffic across PAYG endpoints using a simple or weighted round-robin algorithm. |
| Managing spikes with PAYG | Manage spikes in traffic by routing traffic to PAYG endpoints when a PTU deployment is out of capacity (see the sketch after this table). |
| Adaptive rate limiting | Dynamically adjust rate limits applied to different workloads. |
| Tracking token usage | Record token consumption for usage tracking and attribution. |
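As a rough illustration of the "Managing spikes with PAYG" capability, the Python sketch below tries a provisioned-throughput (PTU) backend first and spills over to a pay-as-you-go (PAYG) backend when the PTU deployment is throttled. The backend URLs are hypothetical, and the 429-based capacity check is a simplification of the actual APIM policy behavior in this repo.

```python
import requests

# Hypothetical backend endpoints; real values come from your deployment.
PTU_BACKEND = "https://aoai-ptu.example/openai/deployments/gpt-4/chat/completions"
PAYG_BACKEND = "https://aoai-payg.example/openai/deployments/gpt-4/chat/completions"

def send_with_spillover(payload, headers, api_version="2024-02-01"):
    """Try the PTU backend first; spill over to PAYG when the PTU is throttled (HTTP 429)."""
    for backend in (PTU_BACKEND, PAYG_BACKEND):
        response = requests.post(
            backend,
            params={"api-version": api_version},
            headers=headers,
            json=payload,
            timeout=30,
        )
        if response.status_code != 429:
            return response  # success or a non-throttling error: return it as-is
        # 429 means this backend is out of capacity; fall through to the next one.
    return response  # both backends throttled; surface the last 429 to the caller
```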

Test/Demo setup

If you are looking for a quick way to test or demo these capabilities with a minimal, non-production APIM setup against an Azure OpenAI simulator, check out the repository linked below.

▶️ APIM GenAI Gateway Toolkit

AI Hub Gateway capabilities

Looking for a comprehensive reference implementation to provision your AI Hub Gateway? Check out the AI Hub Gateway scenario.

▶️ AI Hub Gateway