The proposal seems consistent with the generally adopted definitions of model change, new models, etc., and also appears to be in line with how internal models are inspected. We therefore see no particular constraints to implementing this differentiation.
In general, a deficiency in a model is a deficiency regardless of the level at which the model is applied. The management of findings should therefore preferably be unified, applying a single set of remedial actions across all entities.
Where appropriate, downstream of the validation process, some findings could be classified as not material / not relevant for specific portfolios / sub-portfolios, so as not to invalidate the assessment of a model on a given perimeter because of deficiencies that do not affect it (e.g. if a cross-country model has a finding with no material impact on the Country A portfolio, then the assessment of that model for the Country A entity should not be negative because of that finding).
In a number of engagements, we have seen that the use of dedicated Model Risk Management software tools brings considerable advantages, both for optimising and tracking the sharing of findings between validation functions (thereby avoiding duplications, inconsistencies, etc.) and for defining a structured process for sharing and applying those findings.
We believe that to avoid a proliferation of definitions and also to simplify communication with third parties, it would be preferable to use a single definition.
NA
In our opinion, a best practice of incorporating rating philosophy into back-testing analyses should include the following steps:
1) the model should be defined - a priori - as PIT or TTC or Hybrid by the credit risk control unit (CRCU)
2) the rating (PIT or TTC or Hybrid) should be validated by Internal Validation (IV)
3) IV should develop differentiated metrics (e.g. more focused on Risk Quantification for TTC models, and on Risk Differentiation for PIT models) and formally make them explicit in the institution's validation framework.
For example, for a TTC model it might be interesting to evaluate the fluctuations over time of the average scores produced, which should remain within predefined ranges. To this end, in addition to analysing score trends over time, it could be useful to test the model results after artificial perturbations of the inputs: TTC models, for example, should produce results that are not overly reactive to variations in the inputs (model variables). In this way, we believe that validation outcomes would be more consistent with the nature of the model (PIT, TTC or Hybrid).
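Purely as an illustration, a minimal sketch of such a perturbation test is given below; the scoring function score_model, the input panel X and the 5% shock size are hypothetical placeholders, and the tolerance against which the resulting shifts are judged would have to be set in the institution's validation framework.

```python
import numpy as np

def perturbation_sensitivity(score_model, X, shock=0.05, n_trials=100, seed=42):
    """Measure how reactive a rating model's average score is to small
    artificial perturbations of its input variables.

    score_model : callable returning a score per row of X (hypothetical)
    X           : table of model input variables (numeric)
    shock       : relative size of the multiplicative noise applied to the inputs
    """
    rng = np.random.default_rng(seed)
    base_mean = score_model(X).mean()
    shifts = []
    for _ in range(n_trials):
        noise = 1.0 + rng.uniform(-shock, shock, size=X.shape)
        X_pert = X * noise                      # multiplicative input shock
        shifts.append(score_model(X_pert).mean() - base_mean)
    shifts = np.asarray(shifts)
    # A TTC model would be expected to show small shifts relative to a
    # predefined tolerance; a PIT model can legitimately be more reactive.
    return {"mean_shift": shifts.mean(), "max_abs_shift": np.abs(shifts).max()}
```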
The fact that the slotting approach produces fixed regulatory risk weights poses a challenge when testing its performance, given the absence of a direct measure to compare against the "observed" outcome (in terms of both defaults and losses).
A possible solution is to "imply" the PD (LGD values are not considered in this example) by inverting the regulatory equation for the RW calculation (where the factor "K" is here set equal to the slotting regulatory values), as reported in the attachment.
In this formula:
• for institutions under the foundation approach, LGD is fixed at the F-IRB regulatory values;
• for institutions under the advanced approach, the corporate LGD model is applied (as average values at appropriate sector levels).
This approach yields an "Implied PD" (Impl_PD_K) value for each of the five slotting regulatory categories. These values can then be used to assess the discriminatory power of the slotting approach against:
• Observed historical defaults (challenging given the low default nature of the specialised lending portfolios)
• PD_EL derived from expected loss provisioning: PD_EL = EL / (LGD * EAD).
The same applies to homogeneity testing: to assess whether each slotting category contains homogeneous observations, Impl_PD_K can be compared against observed defaults and PD_EL values.
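As a rough illustration of the inversion step, the sketch below assumes the standard Basel corporate IRB capital requirement function (with maturity adjustment), takes K = RW / 12.5 for a given slotting risk weight, and solves numerically for the PD that reproduces it; the function names and the example LGD and maturity values are our own assumptions, not prescribed inputs.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def irb_corporate_k(pd, lgd, maturity=2.5):
    """Basel corporate IRB capital requirement K for a given PD, LGD and maturity."""
    r = (0.12 * (1 - np.exp(-50 * pd)) / (1 - np.exp(-50))
         + 0.24 * (1 - (1 - np.exp(-50 * pd)) / (1 - np.exp(-50))))   # asset correlation
    b = (0.11852 - 0.05478 * np.log(pd)) ** 2                          # maturity adjustment
    k_base = lgd * norm.cdf((norm.ppf(pd) + np.sqrt(r) * norm.ppf(0.999))
                            / np.sqrt(1 - r)) - pd * lgd
    return k_base * (1 + (maturity - 2.5) * b) / (1 - 1.5 * b)

def implied_pd(k_target, lgd, maturity=2.5):
    """Invert the K formula: find the PD whose capital requirement equals k_target.
    Note: K is not monotone in PD over its whole range, so the bracket may need
    adjusting (and a solution may not exist) for the highest slotting RW category."""
    return brentq(lambda pd: irb_corporate_k(pd, lgd, maturity) - k_target, 1e-6, 0.5)

# Hypothetical example: slotting category with RW = 70% -> K = 0.70 / 12.5,
# F-IRB senior unsecured LGD of 45%, 2.5-year effective maturity.
impl_pd_k = implied_pd(0.70 / 12.5, lgd=0.45)
```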
Among the alternative hypotheses proposed to evaluate the performance of models in data scarcity contexts:
1) Conduct the validation solely on an OOT or OOS sample, using data not used at all by the CRCU for model development
2) Leverage the analyses performed by the CRCU, where the CRCU has assessed the performance of the model via OOT and OOS samples only during intermediate steps, but has used the whole sample to train the final model
3) Complement the tests performed by the CRCU with in-sample tests and qualitative analysis (such as the approach mentioned above)
Solution 1 would be the ideal option from a validation point of view, but may generate further instability in the model: if the context is already one of "data scarcity", excluding observations from the development sample for validation purposes may make the situation even worse.
Solution 2 makes it possible to use all available data, with the OOT and OOS stability of the model evaluated during intermediate steps, but it relies exclusively on analyses carried out by the CRCU, which we certainly cannot consider a best practice.
Solution 3 probably represents the best practice, because it allows all available data to be used for estimating the model while integrating the analyses performed by the CRCU with those performed by the control unit: for example, in our experience in data-scarcity contexts, cross-validation analyses using leave-one-out strategies have proved useful for identifying the likely range of variation of performance metrics. Furthermore, in this case IV can re-train the model on leave-one-out samples and assess the stability of the results in terms of variable contributions, coefficient values, performance, etc. In this way it is possible to combine maximum exploitation of the available data with a rigorous and independent assessment of the stability of the estimates and of their performance.
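By way of example, a minimal sketch of such a leave-one-out exercise is shown below; it assumes a simple logistic-regression scorecard and scikit-learn, both of which are illustrative choices rather than a prescribed setup, and aggregates the out-of-fold predictions into a single held-out AUC while collecting per-fold coefficients to gauge their stability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import roc_auc_score

def leave_one_out_stability(X, y):
    """Leave-one-out re-training on numpy arrays X, y: collect out-of-fold
    predictions (held-out performance) and per-fold coefficients (stability)."""
    loo = LeaveOneOut()
    oof_pred = np.empty(len(y))
    coefs = []
    for train_idx, test_idx in loo.split(X):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train_idx], y[train_idx])
        oof_pred[test_idx] = model.predict_proba(X[test_idx])[:, 1]
        coefs.append(model.coef_.ravel())
    coefs = np.vstack(coefs)
    return {
        "loo_auc": roc_auc_score(y, oof_pred),   # AUC on held-out predictions
        "coef_mean": coefs.mean(axis=0),         # average coefficient values
        "coef_std": coefs.std(axis=0),           # coefficient variability across folds
    }
```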
A further, unmentioned alternative could be to apply solution 1) while supplementing the development sample with synthetic data generated through advanced oversampling techniques.
In this way, real OOS/OOT data could be used to validate a model estimated on both real and synthetic data. It is worth noting that this approach has the obvious drawback of using synthetic data for model training purposes.
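As a sketch of this hybrid, the snippet below uses the SMOTE implementation from imbalanced-learn as one possible "advanced oversampling technique"; both SMOTE and the logistic-regression model are illustrative assumptions, and the synthetic observations are used only for training, never for validation.

```python
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def train_with_synthetic_data(X_train, y_train, X_oot, y_oot, seed=42):
    """Augment a scarce development sample with synthetic minority-class
    observations, then validate on real out-of-time data only."""
    # With very few defaults, SMOTE's k_neighbors parameter may need to be lowered.
    X_aug, y_aug = SMOTE(random_state=seed).fit_resample(X_train, y_train)
    model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    # Validation uses only real observations: synthetic data never enters the test set.
    oot_auc = roc_auc_score(y_oot, model.predict_proba(X_oot)[:, 1])
    return model, oot_auc
```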