Scaling design patterns
One area we haven’t covered yet is how multiple TPM- or PTU-based Azure OpenAI (AOAI) accounts can work in unison. That is, once you have set up multiple AOAI accounts, how would you send prompts to each? And if you are sending too many prompts at once, how can you manage the error/response codes?
The answer is to use the Azure APIM service. APIs form the basis of an APIM service instance. Each API consists of a group of operations that app developers can use. Each API has a link to the backend service that provides the API, and its operations correspond to backend operations. Operations in APIM have many configuration options, with control over URL mapping, query and path parameters, request and response content, and operation response caching. We won’t cover additional features such as URL mapping and response caching in this book, but you can read more about APIM in the reference link at the end of this chapter.
Using APIM is yet another way to help organizations scale up to meet business and user requirements.
For example, you can create a “spillover” scenario in which prompts are sent first to an AOAI account deployed with PTUs. Then, if you exceed the PTU limits, requests spill over to a TPM-enabled AOAI account that uses the pay-as-you-go model.
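To make the spillover idea concrete, here is a minimal Python sketch of the routing decision only. In the architecture described in this chapter, that decision would sit in APIM rather than in application code, so treat this as an illustration of the logic and not as an APIM configuration. The names Backend, Response, and send_with_spillover, the example.invalid URLs, and the fake transport are all hypothetical; the one assumption about the service is that a throttled backend answers with HTTP 429 (Too Many Requests).

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

HTTP_TOO_MANY_REQUESTS = 429  # standard status code for throttling


@dataclass(frozen=True)
class Backend:
    name: str  # hypothetical label, e.g. "ptu" or "payg"
    url: str   # placeholder endpoint, not a real AOAI URL


@dataclass(frozen=True)
class Response:
    status: int
    body: str = ""


# A transport is any callable that sends a prompt to a backend.
Send = Callable[[Backend, str], Response]


def send_with_spillover(
    prompt: str, backends: Sequence[Backend], send: Send
) -> Tuple[Backend, Response]:
    """Try backends in order; move to the next one only on HTTP 429."""
    if not backends:
        raise ValueError("at least one backend is required")
    for backend in backends:
        response = send(backend, prompt)
        if response.status != HTTP_TOO_MANY_REQUESTS:
            return backend, response
    # Every backend was throttled: surface the last 429 to the caller.
    return backend, response


# Fake transport for illustration: the PTU backend is always over its limit.
def fake_send(backend: Backend, prompt: str) -> Response:
    if backend.name == "ptu":
        return Response(HTTP_TOO_MANY_REQUESTS, "throttled")
    return Response(200, f"echo: {prompt}")


PTU = Backend("ptu", "https://example.invalid/ptu")
PAYG = Backend("payg", "https://example.invalid/payg")

used, resp = send_with_spillover("hello", [PTU, PAYG], fake_send)
print(used.name, resp.status)
```

Listing the PTU backend first means the pre-purchased capacity is always tried before any pay-as-you-go traffic is generated, which is the point of the spillover pattern.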
The following figure shows the basic setup, but this architecture can scale and also include many other Azure cloud resources. However, for simplicity and focus, only the relevant services are depicted here:

Figure 7.5 – AOAI and APIM in a single Azure region
As in the single-region scenario, you can use APIM to queue and send prompts to any AOAI endpoint, so long as those endpoints can be reached. In a multi-region example, as shown in the following figure, we have two AOAI accounts in one region (one PTU and one TPM), and a third AOAI account in another Azure region.
Thus, a single APIM service can easily scale and support many AOAI accounts, even across multiple regions, as shown here:

Figure 7.6 – Multi-region AOAI deployment using a single APIM service
As you can see, a single APIM service can serve multiple AOAI accounts, both in the same Azure region and also in multiple regions.
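The multi-region layout can be thought of as a prioritized pool of backends. The following Python sketch models that idea under stated assumptions: each backend has a priority (lower is preferred), and a throttled backend is skipped until its wait period has elapsed. The class names, backend names, region labels, and the use of a retry delay in seconds are all hypothetical, and this is not how APIM is configured; it only illustrates the selection logic a gateway might apply.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Backend:
    name: str                      # hypothetical label
    region: str                    # illustrative region label
    priority: int                  # lower value = preferred
    throttled_until: float = 0.0   # time before which the backend is skipped


class BackendPool:
    """Pick the highest-priority backend that is not currently throttled."""

    def __init__(self, backends: Iterable[Backend]) -> None:
        self._backends = sorted(backends, key=lambda b: b.priority)
        if not self._backends:
            raise ValueError("at least one backend is required")

    def pick(self, now: float) -> Backend:
        for backend in self._backends:
            if backend.throttled_until <= now:
                return backend
        # All backends are throttled: choose the one that recovers soonest.
        return min(self._backends, key=lambda b: b.throttled_until)

    def mark_throttled(self, name: str, retry_after_s: float, now: float) -> None:
        """Record that a backend asked callers to wait retry_after_s seconds."""
        for backend in self._backends:
            if backend.name == name:
                backend.throttled_until = now + retry_after_s


# Two accounts in one region (PTU and TPM) and a third in another region.
pool = BackendPool([
    Backend("ptu-east", "eastus", priority=1),
    Backend("payg-east", "eastus", priority=2),
    Backend("payg-west", "westus", priority=3),
])

first = pool.pick(now=0.0)
pool.mark_throttled("ptu-east", retry_after_s=10.0, now=0.0)
second = pool.pick(now=1.0)
print(first.name, second.name)
```

Passing the current time in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.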
As we continue our “scaling” journey, it is a good time to mention that APIM has three production-level tiers: Basic, Standard, and Premium. With the Premium tier, you can deploy a single APIM instance in as many Azure regions as you need. When you create an APIM service, the instance has only one unit in a single Azure region (the primary region). What does this mean in practice? If you have a multi-regional AOAI deployment, are you required to also have a multi-region (Premium) SKU of APIM? No, not necessarily. As shown in the preceding multi-region architecture, a single APIM service instance can support multiple AOAI accounts across multiple regions, so long as APIM can access the AOAI endpoints in the other region(s). Having a single APIM service makes sense when the application using the service is in the same region and you do not need disaster recovery (DR).
However, as this chapter is about scaling at an enterprise level, we recommend multiple APIM service accounts to cover the DR scenario using the APIM Premium SKU.
The Premium SKU allows you to designate one region as the primary and any number of regions as secondaries. You can use one or more secondaries in different scenarios. One example is planning for DR, which is always recommended for any enterprise architecture. Note that your enterprise applications should also be designed for data resiliency using DR strategies. Another example is when monitoring of the APIM service shows extremely heavy usage and you can scale out your application(s) across regions; in that case, you may want to deploy APIM service instances across multiple regions.
For more information on how to deploy an APIM service instance to multiple Azure regions, please see How to deploy an Azure API Management service instance to multiple Azure regions: https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-deploy-multi-region