Hosting GenAI in the Cloud for RegTech

In his recent letter to shareholders, JP Morgan Chase CEO Jamie Dimon highlighted cloud computing and GenAI as integral components of the firm’s strategy, with 75% of its data and 70% of its applications to be hosted in a combination of public and private clouds this year.

Over the past decade, cloud computing has become mainstream. The economics of hosting data and applications in the cloud have been learned through expensive trial and error. The failures of early ‘lift and shift’ strategies have given way to a ‘cloud-first’ policy for application design and development. This approach leverages microservices and orchestration to take full advantage of the elasticity and automated resource optimisation features offered by modern cloud infrastructure, and has already been adopted by many financial technology suppliers.

So, what should professionals charged with building systems to address regulatory challenges be thinking about as they consider adopting GenAI and LLM technology in their cloud-hosted solutions?

Cloud Workloads for LLMs

Understanding workloads is key to achieving an optimal design for all cloud deployments.

What exactly is a workload, and what are the particular characteristics of GenAI and LLMs that need to be considered for cloud deployment?

In cloud computing, the term ‘workload’ refers to any set of computational tasks or applications that are managed and run on cloud infrastructure. Workloads encompass the resources and processing needed to run applications, execute data processing tasks, or manage services over cloud-based environments. This definition includes not only the execution of software and applications but also the data they process, the operations they perform, and the resources they consume, such as CPU cycles, memory usage, storage, and networking capabilities.

When a RegTech company considers integrating GenAI or Large Language Models (LLMs) into their product set, particularly for clients like global investment banks and asset managers, workloads must be carefully evaluated to ensure a fit-for-purpose, compliant, fully auditable and economically satisfactory implementation. That is a steep hill to climb.

Large Language Models like ChatGPT from OpenAI and Claude from Anthropic are extremely resource-intensive, requiring vast amounts of training data and compute power during the training phase, and only slightly less during the production, or inference, phase. Additionally, models in regulated industries need to be constantly monitored for drift and recalibrated to maintain performance.

Training an LLM involves applying a tokenization algorithm to large datasets, producing a training corpus of billions of tokens from which the model learns its parameters. BloombergGPT – released last year – is a 50-billion parameter model trained on a corpus of several hundred billion tokens of financial and general-purpose text. Millions of training cycles adjust the model’s parameters (weights) until the output converges to an acceptable level of accuracy; the easiest way to think of model training is “directed trial and error on a brute-force scale.” Specialised hardware – graphics processing units (GPUs) or tensor processing units (TPUs) – is deployed for parallel processing to reduce training time to an ‘acceptable’ level. Even on the most advanced hardware, training can run for weeks or months before convergence.
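
To make this concrete, the sketch below shows a minimal next-token-prediction training loop in PyTorch. The toy vocabulary, random token batches and tiny model are illustrative assumptions, not BloombergGPT’s actual setup; real LLM training distributes a loop like this across clusters of GPUs or TPUs for millions of steps.

    import torch
    import torch.nn as nn

    VOCAB_SIZE, EMBED_DIM, SEQ_LEN, BATCH = 1000, 64, 32, 16

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)  # token -> vector
            self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)      # vector -> next-token logits

        def forward(self, tokens):
            return self.head(self.embed(tokens))

    model = TinyLM()
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):  # real runs take millions of steps
        tokens = torch.randint(0, VOCAB_SIZE, (BATCH, SEQ_LEN + 1))  # stand-in for a tokenized corpus
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()   # 'directed trial and error': gradients nudge the weights
        optimizer.step()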

When an LLM is moved into production, resource requirements shift from training to inference workloads, fulfilling requests from users. During this phase, the dynamic capabilities of cloud computing – load balancing across multiple cloud instances, elasticity and dynamic scaling – are available to deliver a performant and cost-effective service.
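
As a simple illustration of dynamic scaling, the hypothetical helper below sizes an inference fleet to the current request rate. In practice a cloud autoscaler would apply a policy like this automatically; the function name and throughput figures are invented.

    import math

    def desired_replicas(requests_per_sec, per_replica_rps, min_r=1, max_r=20):
        # Scale the inference fleet to demand, within cost and availability bounds.
        needed = math.ceil(requests_per_sec / per_replica_rps)
        return max(min_r, min(max_r, needed))

    print(desired_replicas(450, 40))  # 450 req/s at 40 req/s per replica -> 12 replicas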

Implementing such a dynamic multi-cloud strategy for LLM workloads involves several challenges:

Orchestrating workloads across multiple clouds adds complexity. Solutions include using multi-cloud management platforms that provide a unified interface for deploying, managing, and monitoring across clouds.

Ensuring interoperability between different cloud environments is crucial. Containerization technologies like Docker and orchestration systems like Kubernetes can help standardize deployments across clouds.

Minimizing data transfer between clouds is essential to constrain costs and latency. Techniques like keeping data processing close to storage and optimizing network routes can help (a short placement sketch follows this list).

Robust, unified security policies must be implemented across all cloud platforms to protect data and comply with regulations.
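
On the data-transfer point above, a placement decision can be as simple as preferring compute in the region that already holds the data. The sketch below is purely illustrative; the region names and egress prices are invented.

    def cheapest_placement(dataset_region, compute_regions, egress_cost_per_gb):
        # Prefer compute co-located with the data: zero egress cost, lowest latency.
        if dataset_region in compute_regions:
            return dataset_region, 0.0
        # Otherwise pick the region with the cheapest transfer from the data's home.
        best = min(compute_regions, key=lambda r: egress_cost_per_gb[(dataset_region, r)])
        return best, egress_cost_per_gb[(dataset_region, best)]

    costs = {("aws-eu", "gcp-eu"): 0.02, ("aws-eu", "azure-us"): 0.08}  # $/GB, invented
    print(cheapest_placement("aws-eu", ["gcp-eu", "azure-us"], costs))  # ('gcp-eu', 0.02)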

By addressing these challenges, organizations can effectively leverage the strengths of various cloud providers to maximize the efficiency and effectiveness of LLM workloads during both training and production phases. Continuous monitoring and recalibration of production LLMs is essential to detect performance drift and incorporate new regulatory demands.
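
One widely used drift check is the population stability index (PSI), which compares the distribution of model outputs captured at validation time against live production outputs. Below is a minimal sketch assuming NumPy; the 0.2 alert threshold is a common rule of thumb, not a regulatory standard.

    import numpy as np

    def population_stability_index(expected, actual, bins=10):
        # Quantile bin edges from the reference (validation-time) scores.
        cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
        e_pct = np.histogram(expected, cuts)[0] / len(expected)
        # Clip live scores into the reference range so every value lands in a bin.
        a_pct = np.histogram(np.clip(actual, cuts[0], cuts[-1]), cuts)[0] / len(actual)
        e_pct, a_pct = np.clip(e_pct, 1e-6, None), np.clip(a_pct, 1e-6, None)  # avoid log(0)
        return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

    reference = np.random.normal(0.0, 1.0, 10_000)  # scores at validation time
    live = np.random.normal(0.5, 1.0, 10_000)       # shifted production scores
    print(population_stability_index(reference, live))  # above ~0.2 suggests recalibration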

Ethics and Explainability

Ethical use of LLMs requires that the models do not propagate or amplify biases present in training data. This demands careful curation of training datasets and the use of techniques such as fairness-aware machine learning algorithms. Moreover, ethical AI practices necessitate transparency about how models are used, particularly in contexts that could significantly impact individuals’ lives, such as financial services. Ongoing ethical reviews and adherence to guidelines proposed by AI research communities and regulatory bodies are essential steps in mitigating risks.
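
As a simple example of the kind of check a fairness-aware pipeline might run, the sketch below measures the gap in positive-outcome rates between two groups (demographic parity); the predictions, group labels and any action threshold are illustrative assumptions.

    import numpy as np

    def demographic_parity_gap(y_pred, group):
        # Absolute difference in positive-outcome rates between two groups.
        y_pred, group = np.asarray(y_pred), np.asarray(group)
        return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

    preds  = [1, 0, 1, 1, 0, 1, 0, 0]   # e.g. loan approvals from a model
    groups = [0, 0, 0, 0, 1, 1, 1, 1]   # protected attribute (illustrative)
    print(demographic_parity_gap(preds, groups))  # 0.5 -> a large gap worth investigating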

Explainability refers to the ability to trace how a model’s decisions or outputs are derived. For LLMs, this can be challenging due to their complex and often opaque nature. Implementing techniques like feature attribution methods (e.g., LIME, SHAP) helps in understanding the contribution of various inputs to the model’s decision. Auditability involves keeping detailed logs of model training and inference activities to support reviews and compliance checks. This is particularly important in regulated industries, where demonstrating compliance with laws and regulations regarding AI systems is mandatory and varies across jurisdictions. Ensuring that LLMs are both explainable and auditable supports accountability and helps in identifying and correcting any deviations from expected ethical standards.
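
By way of illustration, the sketch below applies SHAP to a tree-based surrogate model; the synthetic dataset and model are stand-ins, and in practice the attributions would be logged alongside each prediction to support the audit trail described above.

    import shap
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    explainer = shap.TreeExplainer(model)        # exact attributions for tree models
    shap_values = explainer.shap_values(X[:10])  # per-feature contribution to each prediction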

Deployment Options

A hybrid multi-cloud strategy combines public and private cloud services from different providers to avoid vendor lock-in, and is the preferred approach for most firms. How workloads are distributed across a hybrid multi-cloud environment is driven by security, regulatory and latency considerations.

Security is paramount to protect sensitive data from unauthorised disclosure and maintain compliance with data protection regulations like GDPR. Private cloud can be implemented on a third-party platform or brought in-house to maintain absolute control. JP Morgan has 32 data centres globally and spent $2 billion on new private cloud data centres in 2021. At the same time, they are targeting up to 30% of their data being deployed on public clouds.

Some Encouraging Developments

While a lot of attention remains focused on LLMs from OpenAI, Anthropic et al, developments are underway that could make the power of GenAI more accessible. ChatGPT and Claude can be thought of as large general-purpose models with hundreds of billions, or even trillions, of parameters, trained on publicly available data from the internet.

Smaller models with 20 billion parameters or fewer, trained on a highly curated data set, have proven to be equally powerful, if not better, when built for a specific use case. In February 2023, Meta released LLaMA, which included models of 7 billion and 13 billion parameters; it was very capable and was released as open-source software. Similarly, new cloud services are emerging that offer specialised features. NVIDIA NeMo is one of the latest, offering a cloud-native development platform with access to NVIDIA’s most powerful GPU technology.
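
To illustrate how accessible these smaller models have become, the sketch below loads a compact open-weights model locally with the Hugging Face transformers library; the model identifier and prompt are placeholders, and a RegTech deployment would substitute a domain-tuned model.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")  # placeholder small model
    prompt = "Summarise the key MiFID II reporting obligations:"
    print(generator(prompt, max_new_tokens=60)[0]["generated_text"])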

These developments are encouraging for use-case-specific RegTech solutions in areas like e-Comms Surveillance or Regulatory Intelligence. What’s clear, though, is that cloud computing is here to stay and RegTech developments are following a cloud-first policy. While GenAI-powered solutions create new workload demands, the pace of innovation shows no signs of slowing down.
