In our institution, there is no explicit definition of “first” and “subsequent” validation; the internal framework instead clearly defines “initial” vs “ongoing” validation. The main difference is the prescription of qualitative tests in the initial validation (e.g., on variable selection or model design), while quantitative tests are performed both in initial and in ongoing validation. In case of a material model change or extension to a rating system, our framework simply foresees an initial validation. In case of a non-material model change or extension to a rating system, when no report by the independent review or validation is expected to be part of the application package, the framework in any case foresees, in the relevant ongoing validation, that qualitative tests must be performed in case of changes in the steps under assessment or in the relevant requirements. Our understanding is that the internal framework is already consistent with footnote 36.
As regards the first validation, we see constraints in fulfilling Article 94(a) related to overrides, as the Rating Desk would be required to simulate the usage of the new model before its implementation, while the previous one is still in production, to understand whether overrides are needed. This implies an unplanned effort for the Rating Desk and, depending on the size of the portfolio, the task could take several months, hindering compliance with the requirements on the recentness of data. Furthermore, Article 94(b) asks for a backward simulation of ratings for testing stability; while comparing the ratings on the performance and use test samples is quite straightforward, adding further years would be cumbersome.
As regards the subsequent validation, we see that the prescriptions of Articles 113 and 115 should be part of the review of estimates performed by the CRCU, and that the internal validation should leverage it, as per Type 1 interaction in Interaction Box 12.
The only situation where single models are applied to different UniCredit Group Legal Entities for Regulatory purposes refers to the so-called “Group-Wide Models”. These models cover customers whose nature goes beyond the domestic dimension and whose risk profile is defined independently from the Legal Entity of the Group practically managing the relationship. The development of these models is centralized, based on samples covering all the intended application perimeters; the validation is performed consistently, i.e., focusing on the entire scope of application. In case a given model suffers any gap, this is detected at the overall level. Additional analyses are foreseen at the level of specific Legal Entities, but these tests have a purely descriptive purpose, and no recommendations are generally issued. Indeed, in case a model shows good performance at the overall level, any given issue identified on local sub-portfolios is not related to a methodological deficiency. It is Validation's responsibility to evaluate whether any local gap should drive an overall model revision or recalibration.
In case a correction to the model is needed, this has to be defined and set at the overall level: an adjustment driven by a local gap would bias the results on other Legal Entities, while the alternative path of defining corrective measures at the local level would violate the rating uniqueness principle and would introduce the possibility of unwarranted arbitrage among Legal Entities (i.e., the same credit given to the same multinational customer would have different RWs depending on the Legal Entity materially disbursing the loan).
As per Interaction Box 3, the IA is expected to verify the compliance with external regulation and the IT implementation of default detection, while internal validation is expected to verify the proper implications on IRB models and the relevant RDS. An explicit split would in any case be beneficial.
Our understanding is that:
- For changes related to IRB portfolios, but for which the CRCU demonstrates that no change is needed to the rating system, internal validation is expected to verify the CRCU’s assessment and, in case it is confirmed, no further validation activity is needed.
- For changes related to IRB portfolios for which the CRCU identified the need for change to the rating system (recalibration or redevelopment), internal validation activity is expected on the classification of the change and, consequently, on the rating system.
In both cases, all process-related aspects are expected to be covered by IA as per Content box 1.
In UniCredit's opinion, the back-testing of estimates for IRB purposes shall follow a Through-the-Cycle (TTC) calibration philosophy: at each moment in time, PD estimates shall be representative of a quantification of riskiness that reflects an entire economic cycle. In other words, at every moment the quantification of the PD shall reflect the LRADR of the relevant portfolio. This is to ensure the stability of capital requirements and, consequently, to limit as much as possible any procyclical effect of the Basel framework. This assumption translates into the need for a back-testing that is, as a first and most important step, based on the entire portfolio and compares the PD estimate against the LRADR. Once it is verified that, at the validation date, the portfolio as a whole has a PD reflecting the LRADR, one can say that the overall quantification is sound and that capital requirements are not distorted by the use of the PD model (of course, limited to the back-testing). It is important to stress, in addition, that generally speaking a series of comparisons between PD estimates at a given reference date and the corresponding 1-year DRs cannot provide a complete assessment of the risk-quantification capabilities of a model: this approach would return satisfactory results only for PIT-calibrated models, which return PDs always aligned with the figures most current at that date. TTC-calibrated models, instead, will be aligned with the 1-year DR only at a few points of the time series.
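The portfolio-level step described above can be sketched as follows. This is an illustrative simplification, not UniCredit's actual methodology: the tolerance band, the data and all names are hypothetical assumptions for the example.

```python
def lradr(yearly_default_rates):
    """Long-run average default rate (LRADR) over the observed cycle."""
    return sum(yearly_default_rates) / len(yearly_default_rates)

def portfolio_backtest(portfolio_pd, yearly_default_rates, tolerance=0.10):
    """Compare the portfolio PD against the LRADR.

    Under a TTC calibration the PD should track the LRADR rather than
    the 1-year DR of any single year; `tolerance` is a hypothetical
    relative band for the comparison.
    """
    target = lradr(yearly_default_rates)
    relative_gap = (portfolio_pd - target) / target
    return {
        "lradr": target,
        "relative_gap": relative_gap,
        "within_band": abs(relative_gap) <= tolerance,
    }

# Hypothetical cycle: 1-year DRs swing around a long-run average of ~2%.
dr_series = [0.012, 0.015, 0.031, 0.028, 0.018, 0.014]
result = portfolio_backtest(portfolio_pd=0.020, yearly_default_rates=dr_series)
```

Note that in most individual years the DR sits well away from the 2% PD, yet the portfolio-level comparison against the LRADR passes, which is exactly the point made above about TTC-calibrated models.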
After this preliminary and general assessment, the Validation shall perform additional tests aimed at better understanding and evaluating the risk-quantification capabilities of the model, in particular focusing on the back-testing at rating class level. In this context, the rating philosophy of a model plays a fundamental role: in case of a PIT rating assignment, the Validation should expect the DR of a given class to be rather stable through time; in contrast, a TTC rating assignment will result in classes showing a more volatile DR. In this vein, a failure of a comparison between PD and DR at grade level will give rise to more severe findings in the assessment of a PD model featuring a PIT rating philosophy, while in case of a TTC one the misalignment is somewhat expected and should be seen as a gap only in case of systematic misalignments observed over several years, typically in the same direction. The results of the comparisons between each 1-year PD and the relevant observed DR can also provide additional insight on the rating philosophy, in addition to the analyses required by Article 47(c).
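The grade-level reading described above can be sketched as a simple decision rule. This is a hedged illustration under stated assumptions: the minimum number of years and the severity labels are hypothetical choices, not prescribed values.

```python
def systematic_misalignment(pd_estimate, yearly_drs, min_years=3):
    """True if the observed DRs sit on the same side of the PD estimate
    for at least `min_years` years (a persistent same-direction gap)."""
    above = [dr > pd_estimate for dr in yearly_drs]
    return len(above) >= min_years and (all(above) or not any(above))

def grade_finding(pd_estimate, yearly_drs, philosophy):
    """Severity of a PD-vs-DR misalignment at grade level, depending on
    the rating philosophy (illustrative labels)."""
    if philosophy == "PIT":
        # Under PIT, the grade DR should be stable and close to the PD:
        # a persistent gap is a severe finding.
        return "severe" if systematic_misalignment(pd_estimate, yearly_drs) else "ok"
    # Under TTC, single-year misalignments are expected; only a
    # systematic same-direction gap over several years is flagged.
    return "gap" if systematic_misalignment(pd_estimate, yearly_drs) else "expected"
```

With a 2% grade PD, DRs oscillating around it (e.g., 1.0%, 3.0%, 1.5%, 2.8%) would be read as expected volatility under TTC, while DRs persistently above it (e.g., 3.0%, 3.5%, 2.8%) would be flagged under either philosophy.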
As per paragraph , the validation assessment covers all items from a) to d). In particular, the modelling choices are evaluated considering the regulatory requirements, the adherence to the internal processes, the market conditions, the product characteristics, as well as all expert-based evidence reported in the documentation provided by the CRCU. In case the cut-off values or the weights (used for aggregation purposes) are obtained by means of statistical/empirical approaches, the appropriateness of the methodology adopted is investigated considering, for example, the size of the samples used and the type of target variable. In case the weights have been expertly defined, the adequacy and meaningfulness of the justifications are evaluated from a qualitative point of view. In any case, the validation function requires the weights to sum to 100% at each level of aggregation (i.e., sub-factor components, sub-factors, factors).
With reference to the performance of the slotting approach, the validation function assesses the discriminatory power by means of the Somers’ D metric at different levels of analysis (e.g., final score and factor level), using the default event occurred in the relevant observation period (0/1 flag) as target variable, compared against the score obtained at each level of assessment. Also, the representativeness of the modelling samples with respect to the application portfolio is verified by means of the PSI metric, considering the most relevant drivers.
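The two metrics named above can be sketched as follows. These are textbook formulations, shown here as a minimal illustration: the naive pairwise Somers' D is fine for the small samples typical of slotting portfolios, and the PSI is computed over pre-binned driver shares; the orientation (higher score = higher risk) is an assumption for the example.

```python
import math

def somers_d(scores, defaults):
    """Somers' D of the score w.r.t. the 0/1 default flag: excess of
    concordant over discordant (defaulted, non-defaulted) pairs, where
    a pair is concordant when the defaulted obligor has the higher
    (riskier) score. Naive O(n^2) version for small samples."""
    pairs = conc = disc = 0
    for s_def, d in zip(scores, defaults):
        if d != 1:
            continue
        for s_nd, n in zip(scores, defaults):
            if n != 0:
                continue
            pairs += 1
            if s_def > s_nd:
                conc += 1
            elif s_def < s_nd:
                disc += 1
    return (conc - disc) / pairs

def psi(expected_shares, observed_shares):
    """Population Stability Index between the development (expected)
    and application (observed) bin shares of a driver."""
    return sum((o - e) * math.log(o / e)
               for e, o in zip(expected_shares, observed_shares))
```

A PSI of zero indicates identical distributions; in common practice values above roughly 0.25 are read as a material population shift, although any threshold is a policy choice rather than part of the metric.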
On top of these quantitative assessments, the validation function performs additional qualitative/empirical analyses on an ongoing basis in order to verify potential concentration or instability phenomena among the final categories. Moreover, a potentially excessive frequency of overrides is checked, together with their impact on the final categories. In case of not fully positive results, the validation function investigates the evidence in greater depth in order to verify whether the weaknesses could be related to the weight assignment or to the aggregation methodology at sub-factor/factor levels. Finally, for the default events occurred in the relevant observation period, the validation function back-tests the appropriateness of the categories assigned when the position was performing (e.g., at the beginning of the observation period), considering the expectation that the worse categories (i.e., average and weak) should have been assigned. It is worth mentioning that all results from both quantitative tests and empirical analyses are evaluated taking into consideration the potentially low size of the samples as well as the availability of the default event within the relevant observation period.
Finally, regarding the assessment of homogeneity, the low granularity of the estimated categories (i.e., the four slots assigned by the slotting approach) compared to the high number of different combinations of factors, sub-factors and sub-factor components that could lead to the same slot attribution makes it difficult to identify a proper target phenomenon to be investigated. Indeed, the differentiation of the answers (attributed to the same factors, sub-factors or sub-factor components) potentially observed on transactions attributed to the same slot is not necessarily a symptom of issues in terms of model design (e.g., in the weight definition), even if the phenomenon could reflect significant dispersion of the transaction characteristics within a specific slot. On the other hand, it is not easy to identify an alternative element (not strictly related to the values attributed to the drivers and their combination) that could serve as a target variable for assessing the homogeneity within the slot. On the basis of these considerations, no specific analysis in terms of homogeneity is foreseen for the slotting approach criteria, and the assessment of the appropriateness of the slot attribution and weight definition mainly relies on the evaluation of potential stability and concentration issues as well as on the rank-ordering and back-testing analyses. Furthermore, the sensitivity analysis proposed within Focus Box 5 (items 2 and 3) is deemed more feasible for the assessment of the weight definition.
In UniCredit's opinion, Validation should be conducted on data not used for the purpose of development. In the context of data scarcity, it is common practice to perform OOT analyses leveraging the additional time series that becomes available between the beginning of the development and the validation activity. In case, for any given reason, the Validation assessment relies on time series overlapping (even partially) with the samples used for development, quantitative tests will consider a stricter set of thresholds.
Validation samples based on OOT observations becoming available after the beginning of development activities can be integrated with multi-year samples built by randomly sampling a given portion of the development samples, integrated with the additional year used for validation. For instance, assuming a model developed on the 2010-2020 time series and validated based on 2021 observations, an additional layer of analysis could be an (e.g.) 30% random sample extracted from the overall 2010-2021 time series. This approach could bring additional insight on model performance, of course to be read taking into consideration its overlap with the information underlying the estimates.
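The sampling scheme in the example above (development on 2010-2020, validation on 2021, plus a 30% draw from the pooled 2010-2021 series) can be sketched as follows. All names and the fixed seed are hypothetical illustration choices.

```python
import random

def multi_year_sample(observations, dev_years, oot_year, fraction=0.30, seed=42):
    """Draw a random multi-year validation sample from the pooled
    development years plus the additional OOT year.

    `observations` maps each year to its list of obligor records;
    the seed is fixed only to make the illustrative draw reproducible.
    """
    rng = random.Random(seed)
    pooled = [rec for year in list(dev_years) + [oot_year]
              for rec in observations[year]]
    k = round(fraction * len(pooled))
    return rng.sample(pooled, k)  # sampling without replacement

# Hypothetical portfolio: 100 obligor records per year, 2010-2021.
observations = {y: [f"obligor_{y}_{i}" for i in range(100)]
                for y in range(2010, 2022)}
sample = multi_year_sample(observations, dev_years=range(2010, 2021), oot_year=2021)
```

Because the draw is without replacement across the pooled series, roughly 11/12 of the sampled records overlap with the development data, which is the caveat noted above when reading the resulting performance figures.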