Association of Financial Markets in Europe

Many of our Member firms do not use Machine Learning (ML) directly for IRB regulatory RWA calculations.
For those firms that do use Machine Learning in an IRB context, the use cases reported include applying a Machine Learning-based component in an IRBA-approved rating system for corporate counterparties, and using a Machine Learning-based sub-model to assess the textual parts of annual reports within the rating methodology applied to Large Corporate clients.
Furthermore, ML models are often used for variable selection and for benchmarking purposes (both by development and validation functions).
A few firms also use ML-based algorithms to support or validate certain steps in the model development phase within the IRB context. As an example, one organization used a model based on a convolutional neural network (CNN) as a challenger model to a recently developed local retail unsecured probability of default (PD) IRB model.
The main reason cited by Member firms for not using ML models extensively for IRB purposes is the difficulty of demonstrating adherence to strict applicable regulatory requirements. For instance, where model re-development plans have already been agreed with competent authorities, the introduction of new ML models may not be compatible with those plans. Supervisors also often require humans in the loop when models are applied in production, which limits the ability to leverage the full potential of ML, e.g. with respect to automation.
As outlined in Sections 3.1 and 4.1 of the discussion paper, there is uncertainty about how ML models could comply with existing regulations. In the absence of certainty that ML would be accepted, committing resources to it is a questionable option, and without clear acceptance criteria or precedents it is unlikely to be feasible. For example, it is unclear whether ML would satisfy all the criteria set out in specific EU regulatory requirements such as the EBA Guidelines.
A more limited constraint on using ML in the context of IRB is data availability. Within Wholesale credit there are few default observations, making it challenging to apply ML in this context. Although data availability may be a problem in some contexts, this challenge is not specific to ML and should not forestall the opportunities that exist in using ML within the IRB context.
We would like to emphasize that the reasons for not calculating parameters using ML are current realities which could change in the future.
ML-based algorithms are used in the estimation of the parameters for the probability of default (PD), the loss given default (LGD), the exposure at default (EAD) and the credit conversion factor (CCF).
Specifically, some instances of use of ML algorithms include:
• Risk differentiation
• Decision on number of grades.
The use of ML is still very limited as explained in Q1. Member firms mainly use or would use ML for PD (ratings). However, ML offers opportunities to improve the estimation of any of these parameters. We do believe that there are opportunities in using ML for:
(1) Incorporating alternative information sources like textual data;
(2) Risk driver selection;
(3) Imputation of missing variables;
(4) Simulation of different combinations to select the most suitable tree;
(5) Data quality control;
(6) Information processing;
(7) Challenger models (performance challenge of the algorithms of traditional techniques).
If there were some regulatory assurance, firms would consider the use of ML-based algorithms for the following purposes:
• dimensionality reduction;
• interaction detection – ML-based algorithms (e.g. decision trees) are used to understand features and the differentiation between data points to assist with explainability;
• feature generation – ML is used to create transformations of raw data;
• residual modelling – upon completion of model selection, ML is used to determine whether any residual structured information remains in the data that could further improve model performance;
• missing data imputation;
• text classification – (re)classification of issues and findings raised on models and rating systems in Risk.
We would welcome clarity from the Regulators on the requirements or regulatory constraints (if any) that may apply in the aforementioned scenarios.
The use of ML models in the IRB context, whilst still developing, has been applied over the last few years to model benchmarking in order to challenge the expert-driven champion models.
Examples of ML models and algorithms used include: Gradient Boosting Machine; Random Forest; Decision Tree; One-layer Near Neighbour (NN); Unsupervised Clustering (KNN); Natural Language Processing with the Latent Dirichlet Allocation.
Other types of ML algorithms under consideration for use include:
• random forest;
• stochastic gradient boosted Trees or Decision Trees from;
• (linear) support vector machine;
• probabilistic neural networks.
This is not a definitive list of ML algorithms that would be useful in the future, and the uses will evolve over time.
Some of our Member firms do not currently use unstructured/semi-structured data for development in IRB ML models. However, one institution reported the use of textual data from annual reports in an ML-based component of an IRBA-approved rating system for corporate counterparties.
Some of our Members acknowledge the potential of a variety of data sources that can be used in ML to explain customers’ behaviours. Examples include customer comments in collections, email addresses, text files (e.g. comments, email messages), audio, video, images, financial reporting and log files. For this reason, unstructured data should be considered as an option as it may provide value to the model.
ML can also be used to classify comments into categories in Validation.
With regard to ensuring adequate data quality, unstructured data can be cleaned and subjected to controls so that the resulting data meets the quality and completeness requirements of existing IRB regulation. These controls are part of the model monitoring process and are performed periodically to mitigate any possible risks.
To ensure data quality, some firms apply minimum standards that constitute a set of data quality tests covering various dimensions irrespective of the model technique/approach and can be applied across various model categories (e.g. provision or for capital modelling purposes).
In order to mitigate the risk that ML-based approaches may introduce or recommend any decision in violation of ethical principles, some firms introduced a Model Ethics framework.
More generally, firms tend to use externally developed ML-based algorithms, particularly in cases where detailed data are accessible only via a third party or where there is no specific appetite or resources available to develop the models internally.
The validation of ML-based models/algorithms is a core function predominantly performed internally and not often outsourced, but it may be expedient to outsource in limited circumstances, for example where niche technical expertise is required or a shortage of resources becomes an impediment to timely outcomes.
In general terms, development and validation processes do not change substantially due to the use of ML. Some of the techniques and tests might be different but they will be based on the same fundamental principles as for other traditional techniques.
Firms acknowledge that internal users’ understanding and acceptance of the model and its functioning are key to ensuring that the model delivers on its purpose of supporting business decisions. This requires significantly more effort in explaining ML models than traditional models. ML-based approaches could pose an additional challenge for interpretation from a business user perspective, particularly when it comes to gaining comfort over the reasonableness of the model outputs, and especially in wholesale banking. It is noteworthy that this challenge exists in every use case of ML and is not peculiar to ML models within IRB. However, while partial explainability of the relationships or outcomes of ML models is generally acceptable in other use cases of ML, it is unclear whether this would also be acceptable in the IRB context.
One key issue for user acceptance would be the explainability, which is not a clearly defined term; especially in cases where the model view diverges from the expert view, it would be crucial for the experts to understand the methodology of the model and, in some cases, the specific factors/reasons for the model outputs.
In spite of the challenges highlighted above, there are techniques that can be used to explain and describe the results in a similar way as we do with traditional models (e.g. by describing the relevant factors involved in the decision). To address the challenge of internal user acceptance, some firms establish a model risk management framework with controls to ensure that internal user acceptance is built into the selection of the approach. Typically, a simpler, more intuitive approach may be preferred over a more complex model in order to retain a deep understanding of the model usage (e.g. use of loss rates per LTV bucket rather than a model).
Generally, the understanding of the model output and its functioning is achieved by segregation of duties across multiple control functions, depending on the role and the model use. As with the usage of traditional models, it is key to ensure that the 1st and 3rd lines of defence have the skills to understand the results of the models. Minimum standards for model use require user training, clear and current instructions for model usage, and pre-authorization for additional uses. In essence, the end-to-end effectiveness of the controls is achieved via reliance on integrated controls and by compounding the effectiveness of specific reviews along the model lifecycle steps.
Typically, a detailed understanding of the technical aspects is predominantly expected from the validation functions which would perform dedicated tests for specific uses. In addition, the model output and its reasonableness and responsiveness to inputs is evaluated by panels of experts consulted during the development phase and prior to completing the internal approval process. The implementation design also progresses starting as early as the development phase and it is approved by the model owner (or delegates) prior to receiving internal approval.
Overall, we do not consider that the introduction of ML models will increase any risk associated with the interpretation of model results. With regard to the development and validation processes, these do not change substantially.
ML is used in many areas apart from IRB models, such as AML/CFT and fraud prevention. When ML is used in those contexts, the explainability techniques that are already available are proving effective for understanding the outcomes. However, within the IRB context, financial institutions do not have sufficient clarity on supervisory expectations about the interpretability of the results, which becomes a barrier to the adoption of this technology. We believe that there is a need for the EBA to develop guidance on this issue and clarify what level of explainability meets CRR requirements.
ML models in the context of IRB are often used in the development process. Some firms use ML-based algorithms only to support or validate certain steps in the model development phase within the IRB context. Human intervention is allowed and required during the model development phase to evaluate any proposed model output and to handle instances where experts disagree with it. In application, the output of ML models is reviewed and approved by credit officers for further consideration in IRB models.
ML has also been utilised in the context of IRB for low value-added tasks, such as the preparation of information sources or the selection of variables, to make the process more efficient and to eliminate subjectivity, as every decision is statistically supported.
Specific examples include:
• challenging variable selection based on deep knowledge of the business, professional experience and/or economic intuition
• visual inspection of the proposed model results, e.g. definition of the number of grades obtained from the clustering algorithm or verifying possible intersections of the target variable across grades over time
• bucketing of grades/pools
• evaluating performance and fitness-for-purpose of challenger model
The use of ML often allows for human intervention in the development and implementation of the algorithms. Tasks such as algorithm selection, the establishment of boundary conditions, and parameterization of the algorithms (number of nodes, clustering...) are performed by humans. There are, however, also opportunities to automate the end-to-end process, which offers the advantages of simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models.
We do not foresee the use of ML posing any additional challenge to comply with data protection regulation. However, some firms face challenges not to compromise any of the following requirements based on legal uncertainties:
• CRR requires firms to collect and store all relevant data to provide effective support to their credit risk measurement and management process (CRR Article 144(1)(d)). In practice, this also includes personal data.
• CRR specifies that this dataset needs to be retained for at least five years. If the available observation period spans a longer period for any source, and this data is relevant, the longer period shall be used (CRR Article 180(1)(h)). GDPR, by contrast, does not specify retention periods for personal data but stipulates that a firm must ensure that personal data is stored for no longer than necessary for the purposes for which it was collected. That period should take into account the reasons why the firm needs to process the data as well as any legal obligations to keep the data for a fixed period of time. CRR also requires traceability of the dataset: the history, processing and location of the data under consideration can be easily traced (CRR Article 174(b)). Under GDPR, anonymized data may be kept for as long as required. By way of exception, personal data may be kept for a longer period for archiving purposes in the public interest or for reasons of scientific or historical research, provided that appropriate technical and organizational measures are put in place (such as anonymization, encryption etc.).
ML-based models have been used by some firms for the estimation of both risk differentiation and quantification. There are no specific challenges identified in the aforementioned areas.
a) Methodology (e.g. which tests to use/validation activities to perform).
• Although many firms are not currently using ML in credit risk models, we do not think that the use of ML requires a different methodology or specific steps in the development process. The steps are the same (selection of LR, selection of variables, selection of drivers, construction of decision trees etc.).
b) Traceability (e.g. how to identify the root cause for an identified issue).
• As noted in 3 above, explainability of models is key. Explainability is in turn linked to traceability, as models would need to be transparent to internal model users and developers. Traceability (e.g. how to identify the root cause for an identified issue) is a part of the validation activities. Gaining a deep understanding of how hyperparameter settings impact a model and detecting potential issues (e.g. for risk driver selection, assessing statistical dependencies, testing assumptions, evaluating human judgement, etc.) is an iterative process and typically entails replicating model output by re-running scripts (or part thereof) and challenging the model design.
There are available techniques that can be used to explain and describe the results in a similar way as is done with traditional models, but legal certainty about the sufficiency of these explanations to meet supervisory expectations will be helpful.
• We do not consider knowledge required by the validation function constitutes a challenge in that firms can address the continuous need for professional development by offering in-house training on ML where required. In addition, validation teams are structured to allow mobility across projects and countries, thereby offering the opportunity to improve validators’ experience with ML approaches and develop an appreciation for pros/cons of various methods.

• Subject to the adopted approach, validation of candidate models (or live models) may require significantly more tests and methodologies than current approaches (e.g. linear regression tests are simple, standardised and well understood, whereas a neural network needs additional and more complex validation).
• As highlighted above, the resources needed to perform the validation do not necessarily represent a challenge any more than for non-ML models. For non-ML models, the factors that are key to the successful completion of a validation activity are the planning and pre-agreed validation procedures. Additional time would be allocated if required to build a model or test the justification for using an ML algorithm. Nonetheless, ML-based models do require resources with specialist knowledge of the statistical properties of ML models and programming skills.
Some Member firms have indicated that models would be subject to all known techniques available to avoid overfitting, such as out-of-sample performance, cross-validation, k-fold testing, regularisation, hyperparameter tuning etc. Other Members have explored common approaches such as splitting data into training/testing, cross-validation, limiting the number of features (by their importance), regularisation and increasing the size of training data.
Cross validation and regularisation remain viable tests, in addition to limiting the number of features by their importance and testing model outputs by increasing the size of training data.
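By way of illustration only, the k-fold cross-validation mentioned above can be sketched as follows. This is a minimal sketch and not any Member firm's actual procedure: the function names, the toy data and the deliberately over-fitted "lookup" model are all hypothetical, and absolute error is used simply as a convenient measure. The point it shows is the gap between in-sample and out-of-sample error that these tests are designed to expose.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return (mean in-sample error, mean out-of-sample error) over k folds."""
    train_errors, test_errors = [], []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        train_errors.append(mean(abs(predict(model, xs[i]) - ys[i]) for i in train))
        test_errors.append(mean(abs(predict(model, xs[i]) - ys[i]) for i in held_out))
    return mean(train_errors), mean(test_errors)


# Hypothetical model that memorises its training data (i.e. it overfits):
# known inputs return the stored outcome, unseen inputs the average outcome.
def fit_lookup(xs, ys):
    return {"table": dict(zip(xs, ys)), "default": mean(ys)}


def predict_lookup(model, x):
    return model["table"].get(x, model["default"])


# Made-up toy data: 20 obligors, 1 = defaulted, 0 = did not default.
xs = list(range(20))
ys = [1 if x % 3 == 0 else 0 for x in xs]

train_err, test_err = cross_validate(xs, ys, fit_lookup, predict_lookup, k=5)
print(f"in-sample error: {train_err}, out-of-sample error: {test_err:.3f}")
```

A model that is perfect in-sample but clearly worse on the held-out folds, as here, is the pattern that would prompt a reduction in the number of features or stronger regularisation.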
There are also techniques available to verify that the relationship of “individual attributes to the target” remains stable over time. These continue to evolve as further work is done in ML.
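As a purely illustrative sketch of such a stability check, one simple option is to compare the observed default rate per bucket of an attribute between a reference period and a more recent period. The response does not name a specific technique, so the approach, the function names, the bucket labels and the figures below are all assumptions made for the example.

```python
from collections import defaultdict


def default_rate_by_bucket(records):
    """records: iterable of (bucket, defaulted) pairs -> {bucket: default rate}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for bucket, defaulted in records:
        totals[bucket] += 1
        defaults[bucket] += int(defaulted)
    return {bucket: defaults[bucket] / totals[bucket] for bucket in totals}


def max_rate_shift(reference, recent):
    """Largest absolute change in default rate across buckets seen in both periods."""
    ref_rates = default_rate_by_bucket(reference)
    new_rates = default_rate_by_bucket(recent)
    shared = ref_rates.keys() & new_rates.keys()
    return max(abs(new_rates[b] - ref_rates[b]) for b in shared)


# Made-up observations: (attribute bucket, 1 if the obligor defaulted else 0).
reference = ([("low_ltv", 0)] * 90 + [("low_ltv", 1)] * 10
             + [("high_ltv", 0)] * 80 + [("high_ltv", 1)] * 20)
recent = ([("low_ltv", 0)] * 88 + [("low_ltv", 1)] * 12
          + [("high_ltv", 0)] * 70 + [("high_ltv", 1)] * 30)

shift = max_rate_shift(reference, recent)
print(f"largest change in bucket default rate: {shift:.2f}")
```

Any tolerance applied to the resulting shift would be a matter for the firm's own monitoring standards; none is implied here.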
Additionally, with regard to transparency and explainability, the strategy could be for Credit professionals to check and validate the risk factors, impacts, economic links, etc. to ensure that the model follows common business sense and is not overfitted.
We wish to reiterate that monitoring and control processes for ML-based models are similar to those for models based on other techniques. The building steps do not change.
Some of the challenges regarding the development, maintenance and control of ML models in the IRB context may include:
• IT implementation, controls and governance. ML processes, cleans and monitors large amounts of data, so it requires different controls and more capability and IT expertise than traditional methods such as logistic regression. In this regard, legacy IT infrastructure is very important and could be a challenge. Some existing IT infrastructure may not fully support the implementation of ML models in credit risk (e.g. number of parameters, non-linear relationships between risk drivers and target variable, unstructured data used in ML models, computation capability). However, this challenge is not peculiar to ML models: technological problems are intrinsic to any new model implementation.
• Interpretability may also be a challenge when poor performance is identified (e.g. high override rates): in such instances it may be more difficult to attribute the poor performance to a cause, or to trace its key reason, if the model is essentially a black box.
• It is noteworthy that the challenges highlighted above are not insurmountable.
The common use of ML-based algorithms in the IRB context is to support or validate certain steps in the model development phase. Model-related changes would occur when a material and consistent deterioration of the model performance is observed. Examples include deterioration in discriminatory power or loss of accuracy.
Similarly, changes to the non-model components of a rating system (e.g. implementation platform) are driven by business requirements when efficiency gains or risk mitigation opportunities justify the change.
Depending on the use of the IRB models, some firms undertake a full review every 3 years, which includes a reassessment of the model design and changes in parameters.
Changes to IRB models are evaluated against regulatory requirements and are governed by internal policies. Given the current regulations, it does not seem feasible to have models that are updated more frequently than every few years. Even on an optimistic interpretation, re-estimating parameters in a model may qualify as “changes to the rating criteria and/or their weights or hierarchy”, which would in turn trigger a pre-notification or pre-approval.
Challenger models, for instance, should be developed at certain intervals and then compared with the champion model. Ideally, whenever the challenger model is significantly better than the champion, the challenger should become the champion. However, this is challenging under existing guidelines.
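To illustrate what such a comparison might look like in its simplest form, the sketch below contrasts the discriminatory power of a champion and a challenger using the AUC and the Gini coefficient (Gini = 2 × AUC − 1). The scores and default flags are made-up, the function names are hypothetical, and the comparison is a plain difference rather than a test of statistical significance, which a real assessment would need.

```python
def auc(scores, defaults):
    """Probability that a defaulter is scored above a non-defaulter (ties count half)."""
    bad = [s for s, d in zip(scores, defaults) if d == 1]
    good = [s for s, d in zip(scores, defaults) if d == 0]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))


def gini(scores, defaults):
    """Gini coefficient (accuracy ratio) derived from the AUC."""
    return 2 * auc(scores, defaults) - 1


# Made-up example: observed defaults and the PDs assigned by each model.
defaults = [0, 0, 0, 1, 0, 1, 1, 0]
champion = [0.10, 0.20, 0.30, 0.25, 0.40, 0.50, 0.60, 0.35]
challenger = [0.10, 0.20, 0.30, 0.45, 0.40, 0.50, 0.60, 0.35]

champion_gini = gini(champion, defaults)
challenger_gini = gini(challenger, defaults)
print(f"champion Gini: {champion_gini:.2f}, challenger Gini: {challenger_gini:.2f}")
```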
It is necessary to differentiate between scenarios where ML models are used exclusively to design the model, which then remains static, and scenarios where continued training of the ML model is allowed. We would welcome clarification from the Regulators on whether either of the aforementioned scenarios is considered more acceptable.
The use of ML models is fully effective for calculating provisions. Estimations of expected losses and regulatory capital should be equivalent due to the shortfall, and should be calculated using the same methodology. A practical example is pricing models that use both capital and provision costs, which means that both have to be calculated using the same methodology.

Apart from regulatory capital purposes, some firms use ML to develop credit acceptance models and affordability models (income and expenses estimation to assess customers’ repayment capacity), detect application fraud, generate early warning signals and create new variables for risk models (e.g. based on transactional data). In addition, some firms are investigating opportunities to develop challenger models for credit decision-making models using ML. Other firms use ML for acquisition and collection scorecards. We see many opportunities in the use of ML within the credit risk area and beyond capital estimation, e.g. in the admission process.
We foresee certain challenges in using ML in the context of the IRB models stemming from the proposed AI Act. For example, Paragraph 80 of the draft AI Act states that “limited derogations should also be envisaged in relation to the quality management system of providers and the monitoring obligation placed on users of high-risk AI systems to the extent that these apply to credit institutions regulated by Directive 2013/36/EU (i.e. defined in Article 3(1)(1) of that directive and “credit institution as defined in point (1) of Article 4(1) of Regulation (EU) No 575/2013” [i.e. the CRR]). This means that most likely, many institutions will have to comply with most/all of the High Risk section.
This poses the following issues:
• The definition of “AI” in Annex 1 includes “Statistical approaches, Bayesian estimation”, which could be far-reaching and cover almost any existing method used by Member firms. We would welcome some clarity from the EBA on where to draw the line between statistical non-AI/non-ML models and new AI/ML models to which these directives and requirements would apply.
• Annex 3 of the proposed AI Act lists as high risk, “AI systems intended to be used to evaluate the creditworthiness of natural persons or establish their credit score, with the exception of AI systems put into service by small scale providers for their own use” – which therefore classifies Retail Rating/PD models as High Risk items.
• If IRB ML models are classified as High Risk AI systems, i.e. "AI systems intended to be used for making individual risk assessments", then they will need to comply with a further list of requirements resulting from the EU AI Act, such as:
• Conformity assessment
• Reporting obligations
• Technical documentation
• Human oversight (human-machine interface tools)
• Bias and discrimination prevention
We wish to highlight this as a possible unintended consequence. We propose that when credit scoring is used as an input to IRB models, the requirements of the AI act should not apply to IRB models for the following reasons:
• The use of AI in IRB models does not adversely affect the decision-making process of the lender, and therefore has no impact on the rights of individuals
• Any potential risk of harm to consumers will already be mitigated at this early stage, given that institutions will have to comply with the requirements set out in the AI act to calculate the credit scoring.
We believe there is a need to differentiate AI applications that are used in the wider credit process from those applications that can cause harm to consumers. The former include AI applications used in the valuation of collateral, which are used as background tools in the process and do not affect a person’s access to essential services; applications used in any phase following the initial disbursement of the loan, for monitoring and internal process efficiency; and other subsequent uses of the credit scoring in other applications (e.g. capital consumption models or marketing campaigns). AI applications which do not cause consumer harm, including IRB models, should be clearly excluded from the scope of the AI Act.
Until further clarity is provided, it is very challenging to develop ML/AI models, as the goalpost or acceptance threshold is unknown. For example, the AI Act is not clear on who will be the Competent Authority. Member firms also seek some understanding of projected timelines, as this could help with strategy discussions related to the use of ML models in the IRB context.
The regulatory framework for the approval of regulatory models is demanding. We would recommend that Supervisory teams should be sized and skilled properly to face the review of these models. These types of models will require an effective supervisory process.
A limited use of an ML model for this purpose is already underway. Some initiatives have been undertaken to explore the use of ML based models for collateral valuation in selected countries.
Potential beneficial uses of ML models span several applications in the Financial Services industry. Examples include:
• Willingness versus ability to pay in collections and recoveries
• Collateral valuations
• Know-your-customer (KYC)
• Application / transactional fraud
• Cyber security
• Operations
• Marketing
• Anomalies detection
Explainability/interpretability techniques are being developed in the industry. ML benchmarks are generally basic and self-explanatory (e.g. decision trees with very few nodes). If ML was expanded, we would have to introduce standard explainability techniques that would vary based on the context and types of explanations needed and would be generally accepted also by regulators.
From a control perspective, some firms have introduced a Model Ethics framework to govern the risk that models may infringe ethical principles.
We are not concerned about how to share the information gathered with different stakeholders. Understandably, there are differing levels of technical knowledge and expertise within a firm.
In that regard, sharing information on models’ interpretability with different stakeholders does represent a challenge. However, this challenge is not peculiar to ML models, and non-technical stakeholders do not need to get into technicalities, similar to the approach adopted when using other mathematical techniques to build IRB models. We believe that, with the right involvement and training of different internal stakeholders, the knowledge gap that senior management may have regarding these models can be bridged.
Some firms have developed various approaches and tools supporting such activities.
Practical steps that some firms have adopted to bridge the knowledge gap in other areas, and which have proved effective, include:
• providing senior management with short summaries of key model characteristics. To this extent, MV reports already include model specifications and validation outcomes;
• relying on reader-friendly graphics to help committee members visualize the key elements of the model structure in presentations for committee meetings;
• providing dedicated training on the basic concepts and approaches used in ML;
• explaining the model score with segments (high, medium, low scoring) using a simple decision tree and top attributes as branches;
• shortlisting the most important variables;
• providing graphical representations of functions such as PDP/SHAP.
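For readers less familiar with the last item, a partial dependence plot (PDP) shows how the average model prediction changes as one input is varied while all other inputs are left as observed. The sketch below computes the points of such a curve in plain Python. The PD model, the feature names ("ltv", "arrears") and the portfolio rows are hypothetical and chosen only to keep the example small; SHAP values are a separate technique and are not covered here.

```python
def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row and average the predictions."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


# Hypothetical PD model standing in for any fitted model, ML-based or otherwise.
def toy_pd_model(row):
    return 0.02 + 0.10 * row["ltv"] + 0.05 * row["arrears"]


# Made-up portfolio rows.
portfolio = [
    {"ltv": 0.5, "arrears": 0},
    {"ltv": 0.9, "arrears": 1},
]

pdp_curve = partial_dependence(toy_pd_model, portfolio, "ltv", grid=[0.4, 0.8])
for ltv_value, average_pd in pdp_curve:
    print(f"ltv={ltv_value:.1f} -> average predicted PD {average_pd:.3f}")
```

Plotting the resulting points gives the kind of graphic that can be shown to a committee without requiring knowledge of the model internals.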
We would appreciate assurance from the Regulators that covering explainability and interpretability of results through a layered approach across the various stages of the life cycle would be compatible with the CRR principle. For example, Senior Management will have sight of the final conclusions, whilst the technical staff are involved with the granular details.
We welcome the introduction of principle-based recommendations. Furthermore, it would be extremely beneficial if such recommendations explicitly stated the regulators’ objectives and context, such that the interpretation and logic that institutions are expected to adopt is clear.
Additionally, the principle of regular updates does not appear to be compatible with CRR or EU 529/2014 in their current form. We would need clarification from the regulators on the conflict.
As raised previously in Q9, we believe it would be necessary to distinguish between scenarios where ML techniques are used exclusively to design the model and cases where the training of the model is allowed.
Also, we would encourage the regulator to:
• clearly differentiate what is expected to be minimum standards from recommended practices;
• set out supervisory timing for issuing an opinion regarding these new approaches;
• provide usage of interpretability techniques that are well accepted and considered as robust enough for explaining the predictions made by the models and global behaviour consistently;
• provide guidance on supervisory expectations in terms of sufficient identification and elimination of possible unfair biases.