Costs, training and support
To round off this chapter on deploying ChatGPT in the cloud with architecture design and scaling strategies, we cover three additional areas associated with a scaled enterprise service: costs, training, and support.
Costs
Throughout this chapter, we discussed the many services that go into a robust, enterprise-ready cloud ChatGPT service. While our focus has been the technical aspects of architecture design and scaling strategies, cost will (and should) be discussed: it is a critical factor from an ROI perspective, and one that executives invariably weigh. Recognizing its significance, this section is dedicated to understanding the various elements that influence costs, alongside strategies for cost optimization across the different architectural layers: the Model, Data, Application, and Infrastructure Layers.
Costs vary, and they change over time for any service; that is the nature of any business, not only a technology-based solution such as ChatGPT. We won't list exact pricing here, as it will have changed by the time this book is published, if not sooner! Instead, we mention the categories to consider when pricing the solution. These vary by vendor, by how large or small your enterprise solution is, and by a dozen other factors.
Keep in mind that there is not only the pricing of the GenAI/LLM models themselves to consider, each with its versions and types, but also how quickly you want requests processed. Cost also depends on the pricing model you choose: Pay-As-You-Go or Provisioned Throughput Units (PTUs), as we described when we covered TPMs and PTUs earlier in this chapter.
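To make that trade-off concrete, the sketch below compares a per-token Pay-As-You-Go bill with a reserved-capacity, PTU-style bill for the same monthly workload. All prices and workload figures here are hypothetical, purely for illustration; consult your vendor's current price list for real numbers.

```python
# Hypothetical comparison of Pay-As-You-Go (per-token) pricing with a
# reserved-capacity (PTU-style) model. All figures below are made up
# for illustration; real vendor prices differ and change over time.

def payg_monthly_cost(tokens_per_month: int,
                      price_per_1k_tokens: float) -> float:
    """Pay-As-You-Go: you pay only for the tokens you consume."""
    return tokens_per_month / 1_000 * price_per_1k_tokens

def ptu_monthly_cost(ptus: int, price_per_ptu_per_month: float) -> float:
    """PTU-style: you pay for reserved throughput, regardless of usage."""
    return ptus * price_per_ptu_per_month

if __name__ == "__main__":
    tokens = 500_000_000  # assumed workload: 500M tokens/month
    payg = payg_monthly_cost(tokens, price_per_1k_tokens=0.002)
    ptu = ptu_monthly_cost(ptus=50, price_per_ptu_per_month=200.0)
    print(f"PAYG: ${payg:,.2f}/month")
    print(f"PTU:  ${ptu:,.2f}/month")
    # Rule of thumb: steady, high-volume traffic tends to favor reserved
    # capacity; spiky or low-volume traffic favors Pay-As-You-Go.
```

Run this comparison with your own traffic profile and current prices before committing to reserved capacity.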
Of course, there is also the cost of any ancillary services that support your enterprise-ready GenAI deployment, the costs of training and support, covered later in this section, and the cost of the staff who design, deploy, manage, and operate the robust enterprise cloud solution.
Below, we list cost considerations and some optimization best practices to help lower costs or reduce resource usage:
Model and Data Layer
- Model selection: Choose a pre-trained model that closely aligns with your task requirements. This can reduce the need for extensive fine-tuning and data collection, saving time and resources. Use the popular benchmarks discussed in Chapter 3 to shortlist models for a particular task. Consider small language models and open-source models for low-impact, internal (non-client-facing) applications and batch tasks, where quality and performance are not of the highest importance and costs can be reduced.
- Data efficiency: Utilize data augmentation techniques to create more training data from your existing dataset (see the first sketch after this list). This can help you achieve better results with less data, reducing storage and processing costs. Textbook-quality data can also help you train higher-performing models with fewer tokens. For example, Phi-2, a 2.7B-parameter model, was trained on textbook-quality synthetic datasets and outperforms models up to 25x its size on complex benchmarks.
- Early stopping: Implement early stopping during training to prevent overfitting and reduce training time. This helps you find a good model without wasting resources on unnecessary iterations (see the early-stopping sketch after this list).
- Model optimization: Prune or quantize your model to reduce its size and computational requirements. This can lead to faster training and inference, lowering cloud costs. Quantization reduces memory footprint, speeds up computation, and improves energy and network efficiency, all of which translate into lower costs (see the quantization sketch after this list).
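To illustrate the data-efficiency point, here is a minimal sketch of two simple text-augmentation operations (random deletion and random swap), in the spirit of the "Easy Data Augmentation" family of techniques. The function names and probabilities are our own illustration, not any particular library's API.

```python
import random

def random_deletion(words: list[str], p: float = 0.1) -> list[str]:
    """Drop each word with probability p to create a noisy variant."""
    if len(words) <= 1:
        return words
    kept = [w for w in words if random.random() > p]
    return kept or [random.choice(words)]  # never return an empty sentence

def random_swap(words: list[str], n_swaps: int = 1) -> list[str]:
    """Swap random pairs of words to vary word order."""
    if len(words) < 2:
        return words
    words = words.copy()
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def augment(sentence: str, n_variants: int = 3) -> list[str]:
    """Generate n_variants noisy copies of one labeled example."""
    words = sentence.split()
    ops = [random_deletion, random_swap]
    return [" ".join(random.choice(ops)(words)) for _ in range(n_variants)]

print(augment("the quick brown fox jumps over the lazy dog"))
```

Each augmented variant inherits the original example's label, multiplying your effective training set without new collection or labeling costs.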
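Early stopping is usually a single flag or callback in your training framework (for example, Keras's EarlyStopping callback exposes patience and min_delta parameters). The framework-agnostic sketch below shows the underlying logic; train_one_epoch and validate are placeholders standing in for your own training code.

```python
# Minimal, framework-agnostic early stopping: stop when validation loss
# has not meaningfully improved for `patience` consecutive epochs.

def fit(train_one_epoch, validate, max_epochs: int = 100,
        patience: int = 3, min_delta: float = 1e-4) -> float:
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_loss - min_delta:
            best_loss = val_loss            # meaningful improvement: reset
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping at epoch {epoch}; best val loss {best_loss:.4f}")
            break
    return best_loss
```

Every epoch you avoid is compute you do not pay for, so tuning patience and min_delta is a direct cost lever.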
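Finally, as one concrete example of model optimization, the sketch below applies PyTorch's dynamic quantization to a toy model and compares the serialized sizes. This is just one quantization approach under assumed defaults; APIs evolve, so check the current PyTorch documentation before relying on it.

```python
import os
import torch
import torch.nn as nn

# A toy model standing in for a much larger network.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
)

# Dynamic quantization: weights of the listed module types are stored as
# 8-bit integers and dequantized on the fly during inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Serialize the model's weights and report the file size in MB."""
    torch.save(m.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"fp32 model:      {size_mb(model):.1f} MB")
print(f"quantized model: {size_mb(quantized):.1f} MB")
```

The roughly 4x reduction in weight storage translates directly into smaller instances, lower memory bills, and faster model downloads at deployment time, though you should always validate that accuracy remains acceptable for your task.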