Generative AI Poised for Leading Role as Regulatory Data Burden Grows

Amidst the hype around Generative AI (GenAI) and Large Language Models (LLMs), practitioners are beginning to realise that these emerging technologies can make a positive impact on the collection and validation of regulatory data.

The categories and scope of regulatory data requirements have expanded considerably in response to rapid market developments and growing regulatory scrutiny. It is no longer enough to present a set of numbers. Regulators want to know the origins of the underlying data, how that data was selected, how it was vetted and the lineage back through transformations to a certified provisioning point or other auditable record. Above all, regulators want to see evidence of a robust, principles-based approach to regulatory data management.

Faced with this increasingly onerous regulatory data environment, data managers are assessing how the new breed of AI technologies can make a difference.

It’s worth considering the variety of data types that can be drawn upon to fulfill financial institutions’ regulatory reporting responsibilities. Regulatory data can be considered as any data that is reported directly in regulatory filings, or that contributes via one or more transformations to the information disclosed in them. This boils down to a number of categories:

Trade Data

Trade reporting involves providing detailed information on trading activities across the financial markets, such as equities, derivatives, fixed income securities and newer alternative and crypto markets. This data ensures transparency and helps regulators monitor market activity. Under MiFID II, ESMA requires firms to report trades to Approved Reporting Mechanisms (ARMs) within a specified timeframe. In the U.S., the SEC’s Consolidated Audit Trail (CAT) requires broker-dealers to report comprehensive trade data to facilitate market oversight and analysis. Similarly, the U.S. Commodity Futures Trading Commission (CFTC) requires firms to report swap transactions to Swap Data Repositories (SDRs), ensuring transparency in the derivatives market.

Audit Data

Audit trails are comprehensive logs that provide a traceable history of transaction status changes and changes to data, ensuring accountability and transparency. These logs are essential for regulatory investigations and compliance verification. The SEC mandates that firms maintain detailed audit trails for all transactions as part of their recordkeeping requirements. Similarly, the CFTC requires firms to maintain audit trails for all futures and options trades to ensure transparency and compliance with regulatory standards.

Customer Data

Know Your Customer (KYC) data involves collecting and verifying the identity of clients to prevent money laundering, terrorist financing, and other financial crimes. This includes personal identification information, financial status, and transaction history. The Financial Conduct Authority (FCA) mandates strict KYC procedures as part of its Anti-Money Laundering (AML) regulations. In the U.S., the Securities and Exchange Commission (SEC) requires broker-dealers to adhere to the Customer Identification Program (CIP) rules under the Patriot Act, ensuring proper identification and verification of their clients.

Risk Data

Risk data encompasses information related to an institution’s exposure to various types of risk, such as credit, market, operational, and liquidity risks. Regulators use this data to assess the resilience of financial institutions and the broader financial system. The Basel Committee on Banking Supervision (BCBS) outlines principles for effective risk data aggregation and reporting in BCBS 239. Compliance was originally targeted for 2016, and the fact that the market has yet to fully comply with BCBS 239 underscores the data challenges that remain. But more on BCBS 239 later.

Compliance Data

Compliance data includes records that demonstrate an institution’s adherence to regulatory requirements, such as AML measures, sanctions compliance, and tax reporting. This data ensures that institutions are operating within the legal and regulatory frameworks set by authorities. For example, FINRA in the U.S. requires firms to maintain comprehensive records of their compliance activities and report any suspicious activities through Suspicious Activity Reports (SARs). Similarly, ESMA requires investment firms to comply with the Market Abuse Regulation (MAR) by reporting any instances of market manipulation or insider trading.

Operational Data

Operational data pertains to information about an institution’s internal processes, governance structures, and internal audits. This data helps regulators assess the effectiveness of an institution’s internal controls and governance. The FCA’s Senior Managers and Certification Regime (SM&CR) requires firms to maintain detailed records of their governance structures and the roles and responsibilities of senior managers. The Federal Reserve also mandates that banks submit reports on their operational risk management and internal control systems as part of their regulatory filings.

Performance Data

Performance data includes financial metrics and reports such as profit and loss statements, balance sheets, and capital adequacy ratios. This data is crucial for assessing the financial health and stability of institutions. The SEC requires publicly traded companies to submit quarterly and annual financial statements as part of their regulatory filings. In Singapore, the MAS mandates that banks provide regular updates on their financial performance, including capital adequacy and liquidity coverage ratios, to ensure they maintain sufficient capital buffers.

Incident Reports

Incident reporting includes information on any incidents or breaches, such as cybersecurity incidents, fraud, or operational failures. This data is critical for regulators to understand the impact of such events and to take appropriate action. The UK Financial Conduct Authority (FCA) requires firms to report significant operational incidents, including IT failures, under its Incident Reporting Rules. FINRA requires firms to file suspicious activity reports (SARs) when incidents of financial crimes like money laundering or insider trading are suspected. The Monetary Authority of Singapore (MAS) also has stringent requirements for reporting cybersecurity incidents, ensuring that financial institutions promptly notify the regulator of any significant breaches.

Marketing, Corporate and Communications Data

Information communicated in advertising, promotional and marketing materials is subject to regulatory oversight to ensure that it is not misleading and does not make unrealistic promises or guarantees. The company’s annual report, 10-K and quarterly filings are all subject to regulatory scrutiny. This category of regulatory data includes a considerable amount of text-based information, including statements by the officers and board, auditors’ reports and statements of financial condition.

Clearly, this set of data types represents a wide spectrum of characteristics that needs to be embraced by any regulatory data management approach. Practitioners are recognising that GenAI and LLMs can be deployed differently from traditional regression, clustering and early NLP models, allowing them to address the entire regulatory data spectrum.

AI for Regulatory Data Management

Whilst AI has been used in some way at each stage of the regulatory data life cycle, from sourcing and collection through transformation, reporting and archival, GenAI and LLMs represent significant advancements over previous generations of AI, offering enhanced capabilities in several key areas.

These advancements are already delivering substantial performance improvements in several regulatory data use cases by providing more sophisticated, context-aware, and efficient solutions. The main capabilities of GenAI and LLMs that set them apart from their predecessors are contextual understanding, normalisation and transformation, and lineage and transparency.

Contextual Understanding

Unlike earlier AI models, which often struggled with context and nuance, GenAI and LLMs excel in understanding and generating content based on context. This ability allows them to perform complex tasks such as analysing and summarizing vast quantities of text-based information, generating coherent narratives, and understanding nuanced queries.

This deep contextual understanding is a crucial step up in capability for applications like natural language processing (NLP) and conversational AI, where understanding the subtleties of language is essential.

Firms are already seeing substantial improvements in productivity and efficiency in text-based use cases such as scanning for and interpreting regulatory changes, processing text-based information to streamline KYC and onboarding, and screening for AML violations and exposure to Politically Exposed Persons (PEPs).

Other use cases are content-oriented, drawing on the ability of GenAI and LLMs to generate well-formatted ‘boilerplate’ regulatory narratives for Suspicious Activity Reports (SARs), 10-Qs and similar filings. Another text-based use case is horizon scanning: detecting changes in regulatory text, translating and interpreting their impact, and highlighting any required policy changes. AI-powered language translation has reached a level of accuracy sufficient for compliance demands, with firms seeing dramatic improvements in accuracy and productivity.

Normalization and Transformation

GenAI and LLMs bring significant improvements in data normalization and transformation processes. They can be trained to accurately map and convert data between different formats, ensuring consistency and integrity across diverse datasets. This capability is essential for applications requiring standardized and aggregated data for regulatory compliance and analysis.
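
As a rough illustration of what mapping and converting data between formats can involve, the sketch below normalises trade records from two hypothetical source systems into one canonical layout. The system names, field names and date formats are assumptions made for this example; in practice an LLM might propose such mappings for human review rather than have them hard-coded.

```python
from datetime import datetime

# Hypothetical per-source field mappings (raw name -> canonical name).
# An LLM could draft these; a data steward would still approve them.
FIELD_MAP = {
    "system_a": {"TradeDt": "trade_date", "Qty": "quantity", "Px": "price", "ISIN": "isin"},
    "system_b": {"date": "trade_date", "size": "quantity", "price": "price", "isin_code": "isin"},
}

# Hypothetical date formats used by each source system.
DATE_FORMATS = {"system_a": "%d/%m/%Y", "system_b": "%Y%m%d"}


def normalise(record: dict, source: str) -> dict:
    """Rename fields to the canonical schema and standardise their types."""
    mapping = FIELD_MAP[source]
    out = {canonical: record[raw] for raw, canonical in mapping.items()}
    # Dates become ISO 8601 strings, whatever the source format was.
    parsed = datetime.strptime(out["trade_date"], DATE_FORMATS[source])
    out["trade_date"] = parsed.date().isoformat()
    out["quantity"] = int(out["quantity"])
    out["price"] = float(out["price"])
    out["isin"] = out["isin"].strip().upper()
    return out


record_a = {"TradeDt": "03/02/2024", "Qty": "100", "Px": "101.5", "ISIN": " us0378331005 "}
record_b = {"date": "20240203", "size": "100", "price": "101.5", "isin_code": "US0378331005"}

# Both sources converge on the same canonical record.
print(normalise(record_a, "system_a") == normalise(record_b, "system_b"))
```

Keeping the mapping as data rather than logic means each proposed change can be reviewed and versioned, which supports the consistency and integrity goals described above.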

Lineage and Transparency

Maintaining clear and accurate data lineage is crucial for regulatory compliance and data governance. GenAI and LLMs provide robust capabilities for tracking and documenting the history of data transformations and movements. This transparency ensures that organizations can demonstrate compliance and maintain high standards of data governance.

GenAI and LLMs can offer substantial improvements over previous AI generations by providing enhanced contextual understanding, efficient data processing, improved accuracy, advanced data normalization, and comprehensive data lineage capabilities. But these improvements come at a cost. GenAI and LLMs are resource intensive and should be deployed carefully. Use cases that reduce repetitive manual effort, such as reviewing large volumes of text, or where sampling methods can be replaced by comprehensive scans, are ripe for GenAI.

AI-Enabled BCBS 239

The Basel Committee on Banking Supervision (BCBS) has outlined principles for effective risk data aggregation and risk reporting – see BCBS 239. These principles, while initially focused on risk data, can be adapted to apply more broadly to all regulatory data. Additionally, GenAI technologies can accelerate compliance with these principles by enhancing data quality, accuracy, and management.

Principle 1: Governance

Strong governance frameworks should be established for all types of regulatory data, not just risk data. This includes setting clear policies, procedures, and accountability for data management across the organization.

GenAI can enhance governance by automating the documentation of data governance policies, ensuring consistent application across different types of data. GenAI can also assist in monitoring compliance with these policies in real time, providing alerts and recommendations when deviations occur.

Principle 2: Data Architecture and IT Infrastructure

Data architecture and IT infrastructure should support comprehensive data management capabilities, ensuring that all regulatory data is accurately captured, stored, and processed, even during times of stress or crisis.

GenAI can help design and maintain a robust data architecture by optimizing data storage and retrieval processes, reducing redundancies, and enhancing data integration from multiple sources. This ensures data availability and reliability across different regulatory requirements. BCG highlights that AI can bring efficiency and accuracy to data management tasks that traditionally required significant manual effort – see The Solution to Data Management’s GenAI Problem? More GenAI.

Principle 3: Accuracy and Integrity

All regulatory data should be accurate and reliable, pre-processed and/or aggregated largely through automated processes to minimize errors. This principle ensures that data used for compliance and reporting is dependable.

GenAI agents can automate data validation and error correction, significantly enhancing the accuracy and integrity of regulatory data. These models can detect anomalies and inconsistencies in real time, reducing the likelihood of errors in compliance reporting. GenAI can also streamline data cleaning processes by generating code for parsing, formatting, and identifying data quality issues.
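
To make the idea of generated data quality checks more concrete, here is a minimal sketch of the kind of validation code such a tool might produce. The record layout and rule set are assumptions for illustration only. The ISIN check uses the standard approach of converting letters to numbers and applying the Luhn algorithm.

```python
import re

# Two letters, nine alphanumerics, one numeric check digit.
ISIN_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{9}[0-9]$")


def isin_is_valid(isin: str) -> bool:
    """Check ISIN format and its Luhn check digit."""
    if not ISIN_PATTERN.match(isin):
        return False
    # Letters become numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def validate(record: dict) -> list:
    """Return a list of data quality issues found in one record."""
    issues = []
    for field in ("isin", "quantity", "price"):
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")
    if record.get("isin") and not isin_is_valid(record["isin"]):
        issues.append("invalid ISIN check digit or format")
    if isinstance(record.get("quantity"), (int, float)) and record["quantity"] <= 0:
        issues.append("non-positive quantity")
    if isinstance(record.get("price"), (int, float)) and record["price"] <= 0:
        issues.append("non-positive price")
    return issues


good = {"isin": "US0378331005", "quantity": 100, "price": 189.5}
bad = {"isin": "US0378331006", "quantity": -5, "price": None}
print(validate(good))
print(validate(bad))
```

Deterministic rules like these are easy to audit, so generated checks of this kind can be reviewed once and then run on every record rather than on a sample.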

Principle 4: Completeness

Regulatory data management should capture and aggregate all material data across the organization. This includes data from various business lines, legal entities, and other relevant groupings to identify and report exposures, concentrations, and emerging risks.

GenAI can enhance data completeness by automating the aggregation of data from diverse sources, ensuring no critical information is omitted. These technologies can continuously monitor data inputs to verify that all necessary data is captured and integrated accurately.
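
One simple way to verify that all necessary data has been captured is to compare what was received against an inventory of what was expected. The sketch below assumes a hypothetical inventory keyed by legal entity and business line; the entity names and record fields are illustrative only.

```python
from collections import defaultdict


def completeness_gaps(expected: dict, received: list) -> dict:
    """Return the business lines missing for each legal entity.

    expected: legal entity -> set of business lines that must report
    received: records, each with 'entity' and 'business_line' keys
    """
    seen = defaultdict(set)
    for record in received:
        seen[record["entity"]].add(record["business_line"])

    gaps = {}
    for entity, lines in expected.items():
        missing = lines - seen.get(entity, set())
        if missing:
            gaps[entity] = missing
    return gaps


# Hypothetical inventory of expected submissions.
expected = {"UK Ltd": {"rates", "fx"}, "US Inc": {"equities"}}
received = [
    {"entity": "UK Ltd", "business_line": "rates", "exposure": 1200000},
    {"entity": "US Inc", "business_line": "equities", "exposure": 800000},
]

print(completeness_gaps(expected, received))
```

An empty result indicates that every expected entity and business line contributed at least one record; it does not by itself confirm that each contribution is complete.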

Principle 5: Data Lineage and Transparency

Maintaining clear and accurate data lineage is essential for all regulatory data, enabling organizations to trace the origins, movements, and transformations of data throughout its lifecycle.

GenAI can generate detailed data lineage reports automatically, providing transparency and traceability of data processes. These reports help organizations demonstrate compliance with data governance standards to regulators. The ability to document and audit data transformations effectively ensures that organizations can respond to regulatory inquiries with confidence and clarity.
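
Whatever tool produces the lineage report, it needs a record of each transformation to draw on. The following is a minimal sketch, under assumed names, of capturing that record: each step logs a hash of its input and output so that the chain from source data to reported figure can be checked later. The step names and trade fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data) -> str:
    """Stable SHA-256 hash of any JSON-serialisable data."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def apply_with_lineage(data, step_name, transform, lineage):
    """Run one transformation and append an auditable lineage entry."""
    input_hash = fingerprint(data)  # hash before the transform runs
    result = transform(data)
    lineage.append({
        "step": step_name,
        "input_hash": input_hash,
        "output_hash": fingerprint(result),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return result


trades = [{"notional": 100, "ccy": "GBP"}, {"notional": 250, "ccy": "GBP"}]
lineage = []

large = apply_with_lineage(
    trades, "filter_large_trades",
    lambda rows: [r for r in rows if r["notional"] >= 200], lineage)
total = apply_with_lineage(
    large, "aggregate_notional",
    lambda rows: {"total_notional": sum(r["notional"] for r in rows)}, lineage)

# Each step's output hash should match the next step's input hash.
print(lineage[0]["output_hash"] == lineage[1]["input_hash"])
```

Because consecutive hashes must match, a break in the chain shows where data was altered outside the recorded transformations.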

Back to Principles

The industry has struggled to fully implement BCBS 239 and we’re well past the original 2016 target date. But GenAI and LLM technologies offer real potential for the industry to make significant progress on BCBS 239 compliance and regulatory data management in general.

By leveraging these technologies, organizations can build robust data management practices that keep pace with and anticipate changes in regulatory requirements and improve overall operational efficiency. Best practices for standards follow a principles-based approach, and so should regulatory data management if GenAI is to deliver on its potential.
