Ivona Cickovic and Andrea Serafino

Machine learning models are increasingly used in organisational decision-making, yet their inner workings often remain opaque. When these systems influence real-world outcomes, understanding what they predict is not enough – we also need to understand why. Explainability methods aim to illuminate this ‘black box’, and feature attribution tools that link predictions to individual inputs are especially popular. They feel intuitive but rely on strict data assumptions that rarely hold, making their outputs unreliable. The 2019 Apple Card case illustrates why this matters: despite gender not being an explicit input, women appeared to receive lower credit limits than men with similar profiles – an outcome attribution methods struggle to explain. This post examines a key assumption underpinning these tools and how it distorts explanations.
The limitations of popular explainability methods
Machine learning (ML) models are often sufficiently complex that it is hard to understand how changes in the data going in lead to changes in the predictions coming out. This has driven the development of various explainability methods that claim to see through this opacity and summarise the relationship between a model’s inputs and outputs.
Common examples include Shapley Additive Explanations (SHAP), a method that assigns each feature its average marginal contribution across all possible subsets of features; Local Interpretable Model-agnostic Explanations (LIME), which explains individual predictions by fitting a simple, interpretable model locally around the observation of interest; Partial Dependence Plots (PDPs), visual tools that show how a model’s average prediction changes as one feature varies while the effects of others are averaged out; and Permutation Feature Importance (PFI), a performance‑based approach that assesses feature relevance by randomly shuffling values and measuring the resulting loss in accuracy. However, a growing body of research has highlighted limitations in these widely used methods (eg Salih et al (2024); Bordt et al (2022); Velmurugan et al (2023); and Ragodos et al (2024)).
A major concern is that these approaches implicitly assume that model inputs – often called features in ML – are independent, an assumption that rarely holds in real‑world data sets. Although textbooks and practitioner guides (eg, Molnar (2025)) warn about the violation of these assumptions, the caveats are often ignored in practical applications. While some features in financial models may be largely independent (for example, the number of standing orders versus a mobile phone bill), many others are naturally correlated, such as loan amount and monthly repayment. When such dependencies are present, attribution methods produce distorted or misleading explanations, obscuring the true drivers of a model’s behaviour. As highlighted in earlier Bank Underground work on AI fairness, opaque or biased model behaviour can amplify yet conceal discriminatory decision patterns.
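To make the permutation idea concrete, here is a minimal sketch of permutation feature importance. It is an illustration only, not the code behind this post: the toy data, the coefficient values and the use of ordinary least squares as the model are all assumptions, and the loss is taken to be mean squared error.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with y (and with the other columns).
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((y - predict(X_perm)) ** 2) - baseline
    return scores / n_repeats

# Hypothetical toy data: three independent features with coefficients 3, 1 and 0.
rng = np.random.default_rng(1)
X = rng.standard_normal((2_000, 3))
y = X @ np.array([3.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=2_000)

# Ordinary least squares stands in for "the model" in this sketch.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pfi_scores = permutation_importance(lambda data: data @ coef, X, y)
print(pfi_scores)
```

Note that shuffling one column on its own also severs its relationship with every other column, which is where the independence assumption enters.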
A controlled experiment: independent versus correlated data
To illustrate how much this matters, we run a simple experiment using two large synthetic data sets (50,000 rows × 50 features): one with independent features (or predictors) and one in which the predictors are correlated. In both data sets, the target is a linear combination of features plus noise. For the correlated‑features data set, Chart 1 shows the pairwise correlation heatmap (with red and blue marking positive and negative relationships, respectively; darker colours indicate stronger correlations, while paler colours show weaker ones), and Chart 2 shows the distribution of absolute pairwise correlations. Together, these charts show a pattern typical of many credit‑risk or economic data sets: most feature relationships are weak – with a median absolute correlation of about 0.20 – while a smaller number exhibit stronger associations, closely mirroring what we observe in real‑world modelling (for example, Stock and Watson (2017) or Laloux et al (1999)).
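A set-up of this kind could be sketched as follows. This is a rough illustration under stated assumptions rather than the generator used for the charts: the one-factor correlation structure, the `loading_scale` parameter, the coefficient distribution and the noise level are all hypothetical choices.

```python
import numpy as np

def make_data(n_rows=50_000, n_features=50, correlated=True,
              loading_scale=0.9, seed=0):
    """Return (X, y, beta) for a linear target with optional feature correlation."""
    rng = np.random.default_rng(seed)
    if correlated:
        # Assumed one-factor structure: each feature loads on a shared latent
        # factor with a random sign and strength, so pairwise correlations
        # vary in sign and size.
        loadings = rng.uniform(-loading_scale, loading_scale, size=n_features)
        corr = np.outer(loadings, loadings)
        np.fill_diagonal(corr, 1.0)
    else:
        corr = np.eye(n_features)
    # Draw standard-normal features with the chosen correlation matrix.
    chol = np.linalg.cholesky(corr)
    X = rng.standard_normal((n_rows, n_features)) @ chol.T
    # Linear data-generating process: y = X @ beta + noise.
    beta = rng.normal(size=n_features)
    y = X @ beta + rng.normal(scale=1.0, size=n_rows)
    return X, y, beta

def mean_abs_corr(X):
    """Average absolute off-diagonal entry of the sample correlation matrix."""
    c = np.corrcoef(X, rowvar=False)
    off_diagonal = c[~np.eye(c.shape[0], dtype=bool)]
    return float(np.abs(off_diagonal).mean())

X_ind, y_ind, beta_ind = make_data(correlated=False)
X_cor, y_cor, beta_cor = make_data(correlated=True)
print(mean_abs_corr(X_ind), mean_abs_corr(X_cor))
```

Because `beta` is known, the true feature ranking is available as a benchmark for any attribution method applied to a model fitted on `X` and `y`.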
On each data set, we fitted four common models – linear regression, random forest, gradient boosting, and a neural network – and applied the four explainability methods mentioned above. We then compared the feature rankings assigned by these methods with the true rankings implied by the data‑generating process (ie, the coefficients we used to generate the synthetic data). We measured the rank agreement between the two rankings – that is, the extent to which they place features in the same order – using Spearman’s rho (ρ) as a rank-agreement coefficient. This was repeated 500 times to see how stable the results are.
Chart 1: Pairwise feature correlation heatmap
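As a small worked illustration of the rank-agreement measure, the sketch below computes Spearman’s rho as the Pearson correlation between two rank vectors. The coefficient and attribution values are made up for the example, and taking the absolute coefficient as the ‘true’ importance is an assumption of this sketch.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical values: true importance is taken as the absolute coefficient.
true_importance = [abs(c) for c in [3.0, -2.0, 1.0, 0.5, 0.0]]
estimated_importance = [2.5, 0.4, 2.9, 0.3, 0.6]
rho = spearman_rho(true_importance, estimated_importance)
print(round(rho, 3))
```

A value of one means the two rankings agree exactly, zero means no monotonic relationship, and negative values mean the estimated ranking tends to invert the true one.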

Chart 2: A representative distribution of pairwise feature correlations (absolute values)

What the results show
Explainability methods are reliable only when features are independent: their performance deteriorates sharply once features become even mildly correlated (Chart 3). The chart shows the distribution of rank-agreement coefficients between estimated and true feature-importance rankings across 500 repeated simulation runs. Each panel corresponds to an explainability method, with separate boxplots for the models used.
Blue boxplots represent simulations with independent features, while orange boxplots show results when features are correlated. Each box shows the interquartile range (the middle 50% of results), with the median indicated by the horizontal line. When features are independent, all methods recover the true ranking with high accuracy and low variability, as reflected in the narrow blue boxplots clustered near one.
By contrast, once correlation is introduced, ranking performance worsens considerably. The orange boxplots are much wider, median rank-agreement coefficients fall (often to between 0.3 and 0.8), and some runs even exhibit negative agreement, meaning genuinely important features are ranked lower than unimportant ones. In real-world settings, where only a single data set is typically observed rather than hundreds of simulations, this implies that feature importance explanations from a single model run can be highly misleading. This is especially concerning in high-stakes contexts like credit scoring, where decisions carry real consequences.
Chart 3: Boxplots of rank-agreement coefficients between true feature rankings implied by the data-generating process and rankings implied by a range of explainability methods for a set of models (across 500 simulations), for the top 10 features

To unpack what the coefficients shown in the charts mean in practice, it is helpful to consider what happens in an individual model run. In our simulations, although the data-generating process is a simple, fully known linear system, explainability methods often struggle to recover the true ordering of feature importance once features are correlated.
Two broad patterns stand out. First, even genuinely important predictors can be severely misrepresented. In many runs, features that are among the top three true drivers of the outcome are pushed far down the ranking produced by explainability methods or disappear from the top ten altogether. This illustrates how easily real drivers of a model’s behaviour can be obscured once features exhibit even mild dependence.
Second, features with little or no true importance are frequently promoted into the top ranks. This kind of mis-ranking is particularly problematic in practice. It encourages users to build interpretive narratives around variables that played no real role in producing the outcome, leading to a false sense of understanding of how the model actually works.
Where does this leave us?
This post argues that feature attribution explainability methods perform poorly in modern ML settings, where large data sets and mutually dependent features are the norm. The results presented indicate that even modest and realistic levels of feature correlation – a median absolute correlation of around 0.20 – can meaningfully reduce the accuracy and stability of common attribution methods. In our simulations, rank agreement that is close to perfect in independent settings often fell sharply once correlations were introduced, with important predictors moving down the list and low-relevance features moving up. This matters because tools such as SHAP, LIME, PDPs and permutation importance are frequently used to support model interpretation. Under realistic data conditions, however, their outputs become unreliable, making it harder to identify which features are genuinely driving a model’s behaviour. If these methods struggle to recover the top features in a clean, fully specified linear system, it raises serious questions about their suitability for explaining high-dimensional models used in real-world decisioning. Rather than clarifying model behaviour, they risk reinforcing misleading narratives, discouraging deeper investigation, and creating unwarranted confidence – ultimately setting the stage for misguided decisions.
Making feature attribution genuinely insightful would require much more structure than most ML pipelines support. That could mean introducing disciplined feature construction – explicitly mapping correlation structure, grouping variables into interpretable clusters (eg, socioeconomic status, credit behaviour, stability, demographics), and reporting explanations at the group level rather than for individual features.
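One possible way to report importance at the group level is to shuffle all the columns in a cluster together. The sketch below illustrates that idea only; it is not a method evaluated in this post, and the group names, column assignments, toy data and least-squares model are hypothetical.

```python
import numpy as np

def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean increase in MSE when all columns of a group are shuffled together."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    scores = {}
    for name, cols in groups.items():
        total = 0.0
        for _ in range(n_repeats):
            X_perm = X.copy()
            # One shared row permutation keeps within-group dependence intact
            # while breaking the group's link to y and to the other groups.
            idx = rng.permutation(X.shape[0])
            X_perm[:, cols] = X[np.ix_(idx, cols)]
            total += np.mean((y - predict(X_perm)) ** 2) - baseline
        scores[name] = total / n_repeats
    return scores

# Hypothetical toy data: columns 0 and 1 are near-duplicates, column 2 is separate.
rng = np.random.default_rng(2)
x0 = rng.standard_normal(2_000)
x1 = x0 + 0.1 * rng.standard_normal(2_000)
x2 = rng.standard_normal(2_000)
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=2_000)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
groups = {"credit_behaviour": [0, 1], "stability": [2]}  # assumed grouping
group_scores = grouped_permutation_importance(lambda data: data @ coef, X, y, groups)
print(group_scores)
```

The score then describes the cluster as a whole and makes no attempt to divide credit between its correlated members.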
While this kind of structured organisation is standard in classical statistics, many contemporary ML pipelines rely instead on large sets of raw or automatically engineered features. In such settings, models are often trained on whatever variables are available in the data set, with the expectation that the learning algorithm will discover useful structure without extensive manual grouping by domain. As a result, explicit feature grouping is not part of modern ML workflows, and with many correlated variables, even defining meaningful groups can become a research task in its own right.
It is worth noting that there are attribution methods designed to relax independence assumptions – such as Conditional SHAP and Causal SHAP – but these are very difficult to scale. Conditional SHAP requires estimating the joint feature distribution in order to compute conditional expectations; Causal SHAP needs a well-specified causal graph, which most practical ML projects do not have. Both are computationally very expensive and fragile in high dimensions. So, although these alternatives address some of the theoretical shortcomings of classical feature attribution methods, they remain largely impractical for routine ML use. This leaves a noticeable gap between what explainability methods promise in principle and what they can realistically deliver today.
Rather than treating feature attribution as the primary means of understanding a model, these findings point to a need to rethink how ML models are assessed. One way to move beyond attribution is to examine model behaviour by exploring how outputs change under structured ‘what if’ variations in inputs. A fuller exploration of this and other approaches is beyond the scope of this post.
Ivona Cickovic and Andrea Serafino work in the Bank’s Model Evaluation and Development Division.
If you want to get in touch, please email us at bankunderground@bankofengland.co.uk or leave a comment below.
Comments will only appear once approved by a moderator, and are only published where a full name is supplied. Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.
