Do you trust your mod­els? – Explain­able AI



Why we need Explain­able AI

Let us consider a simple experiment: I push a stone so that it rotates clockwise. Now I claim that after a few clockwise rotations, the stone will suddenly change direction and begin to rotate anti-clockwise. You will probably not believe me. If I then tell you that I came to this conclusion because the stone is gray, you will lose your trust in me completely. However, if I base my prediction not on the color but on the stone’s length, width, and specific shape, things might look different. Imagine that the stone has the shape of a rattleback. Once you know (or have read up on) what a rattleback is, the situation changes and you will tend to trust my prediction.

Explanations give us confidence in the predictions and decisions we make. Conversely, they can make us very suspicious of them (‘the stone will make some strange movements because it is gray’). Machine learning (ML) models can benefit from explanations in the same way: instead of simply hoping that a model delivers correct results, we can check why it predicts what it does. In some fields, this "only" increases trust in the model and thus its acceptance; in others, it is an absolute prerequisite for using the predictions at all.

This brings us to the topic of Explainable AI (XAI). XAI tries to make the predictions of ML models explainable, and a wide range of methods exists for this purpose. Some can be applied to common ML models, while others are built-in features of models specifically designed to be explainable. Although XAI is still under active development, hyperscalers like Amazon Web Services, Azure, and the Google Cloud Platform all offer services to explain ML predictions. In this blog post, we first explain some important definitions around XAI. We then look into the technical fundamentals of different XAI methods to get a feeling for what they are capable of.

Import­ant Definitions

Although some terms are sometimes used interchangeably or with slightly different meanings, we try to define some of the most important concepts of Explainable AI here.

  • Interpretability: Describes whether it is clear to humans how a model works internally and whether they understand its decision-making process. An example is a simple linear regression, where we know the weights and can interpret them. Black-box models such as deep neural networks, on the other hand, are not directly interpretable.
  • Explainability: Describes whether it is clear to humans why a model arrived at a specific decision. We do not have to know the internal decision-making process of the model. In the following section, we describe methods that provide explainability.
  • Global versus local explain­ab­il­ity: While global explain­ab­il­ity aims to explain the over­all model beha­vior (e.g. identi­fy­ing the most import­ant fea­tures), local explain­ab­il­ity focuses on explain­ing indi­vidual predictions.
  • Model-agnostic: If we can apply a method to an arbit­rary model, we call it model-agnostic.
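To make the interpretability bullet a bit more concrete, here is a minimal sketch of a linear regression whose fitted weights can be read directly. The data is synthetic and the "true" weights (2.0, -3.0, intercept 0.5) are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (assumption for illustration): y depends on two features
# with known weights 2.0 and -3.0, an intercept of 0.5, and a little noise.
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.5 + rng.normal(scale=0.1, size=200)

# Ordinary least squares; the column of ones models the intercept.
A = np.hstack([X, np.ones((200, 1))])
w1, w2, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Each weight states how much the prediction changes per unit of that feature.
print(f"w1={w1:.2f}, w2={w2:.2f}, intercept={intercept:.2f}")
```

The fitted weights land close to the values used to generate the data, so the model's internal logic is visible at a glance. A deep neural network offers no comparably readable parameters.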

Explain­able AI – Methods

To get a feeling for how explainable AI works, we discuss two common XAI methods in this section.

Shap­ley Addit­ive Explan­a­tions (SHAP):

SHAP is an explainable AI method based on Shapley values from game theory, which describe how much a feature contributes to the overall result. Let us assume a model that predicts the cancellation probability of a subscription. To keep things simple, we assume only 3 features: the age of the subscription in years (F1 = 6), the number of complaints received (F2 = 8), and the weekday on which the subscription was taken out (F3 = "Tuesday"). p(x) denotes the cancellation probability, which we calculate for every possible combination of features (the values are only examples, but we explain why each of them is realistic):

  • p({}) = 0.20: Base probability of cancellation, i.e. the cancellation ratio averaged over all customers.
  • p({F1}) = 0.14: Customers with a 6-year-old contract will have a lower cancellation probability than the average customer.
  • p({F2}) = 0.24: Customers with 8 complaints will have a higher cancellation probability than the average customer.
  • p({F3}) = 0.21: The weekday of the subscription will only add some noise to the average cancellation ratio.
  • p({F1,F2}) = 0.22: Customers with a 6-year-old contract but also 8 complaints will have a higher cancellation probability than an average customer.
  • p({F1,F3}) = 0.12: Adding the weekday of the subscription will only add some noise to p({F1}).
  • p({F2,F3}) = 0.24: Adding the weekday of the subscription will only add some noise to p({F2}); here it has no influence at all.
  • p({F1,F2,F3}) = 0.21: Prediction of our model when all features are incorporated. This is the prediction we want to explain.

In our example there are in total 6 dif­fer­ent ways of adding features:

  • {} → {F1} → {F1, F2} → {F1, F2, F3}
  • {} → {F1} → {F1, F3} → {F1, F3, F2}
  • {} → {F2} → {F2, F1} → {F2, F1, F3}
  • {} → {F2} → {F2, F3} → {F2, F3, F1}
  • {} → {F3} → {F3, F1} → {F3, F1, F2}
  • {} → {F3} → {F3, F2} → {F3, F2, F1}

Note that p({F1,F2}) is not simply p({}) plus the individual effects of F1 and F2 (0.20 - 0.06 + 0.04 = 0.18, not 0.22). Also, F1 has a different influence depending on whether we add it to the set {F2} or to {F3}. Thus, we have to calculate the influence of each feature separately for every row x of the list above, which we denote as pr=x({Feature}).
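In general form, the influence of a feature in one row is the change in p when that feature is added to the features that precede it in the row. The symbols π (the feature order of a row), Pre (the set of features preceding F_i in that order), and Δ are our own notation for this sketch; Δ_π(F_i) corresponds to what the text writes as pr=x({Fi}).

```latex
\Delta_{\pi}(F_i) = p\bigl(\mathrm{Pre}_{\pi}(F_i) \cup \{F_i\}\bigr) - p\bigl(\mathrm{Pre}_{\pi}(F_i)\bigr)

\sum_{i=1}^{3} \Delta_{\pi}(F_i) = p(\{F_1, F_2, F_3\}) - p(\{\}) = 0.21 - 0.20 = 0.01
```

The second line follows because the differences telescope within a row. It gives a handy sanity check: the three contributions of every row must add up to 0.01.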

Let us take the first row as an example. To cal­cu­late the influ­ence of adding each fea­ture to the pre­vi­ous com­bin­a­tion, we cal­cu­late the dif­fer­ence between the can­cel­la­tion prob­ab­il­ity after and before adding the fea­ture, i.e.

  • pr=1({F1}) = p({F1}) – p({}) = 0.14 – 0.20 = ‑0.06
  • pr=1({F2}) = p({F1, F2}) – p({F1}) = 0.22 – 0.14 = 0.08
  • pr=1({F3}) = p({F1, F2, F3}) – p({F1, F2}) = 0.21 – 0.22 = ‑0.01

Repeat­ing this cal­cu­la­tion for every row yields:

Row (x)   pr=x({F1})   pr=x({F2})   pr=x({F3})
1         -0.06        0.08         -0.01
2         -0.06        0.09         -0.02
3         -0.02        0.04         -0.01
4         -0.03        0.04          0.00
5         -0.09        0.09          0.01
6         -0.03        0.03          0.01

Finally, we average the influence of every feature over all rows and arrive at the following SHAP values (see footnote 1):

  • S(F1) = ‑0.048
  • S(F2) = 0.062
  • S(F3) = ‑0.003
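The averaging over all six rows can be reproduced with a short script. This is a brute-force sketch over the example's numbers, not the SHAP library; the function name `shapley_values` and the variable names are our own.

```python
from itertools import permutations

# Cancellation probabilities p(S) for every feature subset, copied from the example.
p = {
    frozenset(): 0.20,
    frozenset({"F1"}): 0.14,
    frozenset({"F2"}): 0.24,
    frozenset({"F3"}): 0.21,
    frozenset({"F1", "F2"}): 0.22,
    frozenset({"F1", "F3"}): 0.12,
    frozenset({"F2", "F3"}): 0.24,
    frozenset({"F1", "F2", "F3"}): 0.21,
}
features = ["F1", "F2", "F3"]


def shapley_values(p, features):
    """Average each feature's marginal contribution over all feature orderings."""
    orderings = list(permutations(features))
    totals = {f: 0.0 for f in features}
    for order in orderings:
        present = frozenset()
        for f in order:
            with_f = present | {f}
            # Influence of f in this row: p after adding f minus p before.
            totals[f] += p[with_f] - p[present]
            present = with_f
    return {f: totals[f] / len(orderings) for f in features}


phi = shapley_values(p, features)
for f in features:
    print(f, round(phi[f], 3))
```

`itertools.permutations` yields the orderings in the same sequence as rows 1 to 6 above. The three resulting values add up to p({F1,F2,F3}) - p({}) = 0.01, so together they account for the full difference between the prediction and the base probability.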

What can we learn from these values? Imagine that S(F3) were not close to zero but, for example, 0.12. Would we trust the model? Probably not. Just as we would not trust a person who claims that a stone will change its rotation because it is gray, we would not trust a model that predicts a high cancellation probability because the subscription was taken out on a Tuesday (unless we have very good reasons for this assumption). Since our values are reasonable instead, we tend to trust the prediction. We can also try to gain further insights from the SHAP values. In doing so, we have to be careful, as so often in data science, to distinguish between correlation and causality. Digging deeper into this topic is beyond the scope of this blog post. For those who are interested, this article shows a good example.

Since we usually handle datasets with many entries, SHAP values are commonly displayed in beeswarm or similar plots. Fig. 1 shows an example of such a plot. There, the x-position of each marker denotes the SHAP value of a specific entry, while the color displays the feature value itself. For our example, we see that the SHAP values for the subscription weekday are small regardless of the weekday on which the subscription was taken out. Also, entries with a higher subscription age tend to have lower SHAP values for this feature, and entries with a higher number of complaints tend to have higher SHAP values.

Fig­ure 1: Example beeswarm plot dis­play­ing SHAP values.

Local Inter­pretable Model-agnostic Explan­a­tions (LIME):

In the SHAP example shown above, the meaning of the features is more or less directly interpretable. This is no longer the case for a long text that the model handles as embedding vectors, or for an image that the model sees as the color channels of its pixels. To be interpretable for humans, the features must therefore be represented in another way, for example as the presence of specific words in text classification or the presence of specific segments in a picture. Let us skip the math (which you can find alongside a more in-depth explanation in this paper) and explain LIME using an image classification example. Assume that the picture shown in Fig. 2 a) is classified by our original model as "banana".

Fig­ure 2: Ori­ginal image and example seg­men­ted images with grayed-out segments.

We can split this image into different segments, which we call "super-pixels". We treat each of these super-pixels as a feature (Fy) (note that it is also possible to combine multiple super-pixels into one feature). Now, sample images are created in which different features are randomly grayed out, as in Fig. 2 b). For each of these sample images, the original model calculates the probability for the class "banana". Based on these probabilities, we can fit a linear regression model

p = Fw + b.

Here, the elements Fx,y in each row x of the matrix F denote whether the corresponding feature is active (Fx,y = 1) or grayed out (Fx,y = 0) in a specific sample image, p is the vector of probabilities returned by the original model for the sample images, w is a vector containing the weights wFy of the different features, and b is the intercept.

For fitting, the sample images are weighted by their distance from the original image. A sample image that is mostly grayed out has a larger distance and thus a lower weight than an image in which almost all segments are active. Finally, the fitted weights wFy give us the influence of the individual features on the overall prediction of our original model. In our example, the weights for features showing the bananas would be large and positive, while the weights for features showing the apples would be negative, indicating that they contradict the predicted class "banana". If we determined explanations for the class "apple" instead of "banana", it would be the other way around. For visualization, it is helpful to plot images with only the most contributing or the most contradicting features, as in Fig. 2 c) and d), respectively.
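The sampling, weighting, and fitting steps can be sketched in a few lines of NumPy. This is a simplified illustration, not the LIME library: `black_box` is a toy stand-in for the original image classifier, the segment indices for bananas and apples are invented, and the exponential distance kernel with its width of 0.5 is an assumption about how the distance is turned into a sample weight.

```python
import numpy as np

rng = np.random.default_rng(0)

n_segments = 6      # hypothetical number of super-pixels
banana = [0, 1]     # assumed: segments that show bananas
apple = [4, 5]      # assumed: segments that show apples


def black_box(masks):
    """Toy stand-in for the original model: p("banana") for each masked image.

    masks has shape (n_samples, n_segments); 1 = segment active, 0 = grayed out.
    """
    z = -1.0 + 2.0 * masks[:, banana].sum(axis=1) - 1.0 * masks[:, apple].sum(axis=1)
    return 1.0 / (1.0 + np.exp(-z))


# 1) Create sample images by randomly graying out segments (matrix F).
n_samples = 500
masks = rng.integers(0, 2, size=(n_samples, n_segments))

# 2) Ask the original model for the "banana" probability of every sample (vector p).
probs = black_box(masks)

# 3) Weight samples by their distance from the original image
#    (distance = fraction of grayed-out segments; kernel is an assumption).
distance = 1.0 - masks.mean(axis=1)
kernel_width = 0.5
sample_weight = np.exp(-(distance ** 2) / kernel_width ** 2)

# 4) Weighted least squares for p = Fw + b: scale rows by sqrt(weight).
X = np.hstack([masks, np.ones((n_samples, 1))])
sw = np.sqrt(sample_weight)[:, None]
coef, *_ = np.linalg.lstsq(X * sw, probs[:, None] * sw, rcond=None)
w, b = coef[:-1, 0], coef[-1, 0]

for i, weight in enumerate(w):
    print(f"segment {i}: weight {weight:+.3f}")
```

In this toy setup, the banana segments receive clearly positive weights, the apple segments negative ones, and the two unrelated segments weights near zero. A real application would replace `black_box` with calls to the trained classifier on the actual masked images.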

Thus, LIME can increase the trust in our models and help to identify issues in our datasets that are otherwise hard to recognize. This Google Whitepaper gives a good example of problematic model training: a model was trained to detect diseases from X-rays, but instead of relying on the areas relevant to the diseases, it arrived at its predictions based on pen marks from radiologists, which are hardly visible to humans.

Con­clu­sion

In this blog post, we dived into the topic of explainable AI by defining some important terms and gaining an understanding of what explainability means in this context. Furthermore, we learned about two common methods used to explain machine learning models. This can help to answer the question raised in the title with a "Yes, I do" and greatly improve the debuggability of machine learning models.

  1. Note that we calculated directly with probabilities in this example. In practice, one would ensure that the resulting probabilities cannot become negative, for example by working with log-odds.