Do you trust your models? – Explainable AI



Why we need Explainable AI

Let us consider a simple experiment: I push a stone so that it rotates clockwise. Now I claim that after a few clockwise rotations, the stone will suddenly change direction and begin to rotate anti-clockwise. You will probably not believe me. Furthermore, if I tell you that I came to this conclusion because the stone is gray, you will completely lose your trust in me. However, if I base my claim not on the color but on the stone's length, width, and specific shape, things might be different. Imagine that the stone has the shape of a rattleback. If you read up on (or already know) what a rattleback is, the situation changes and you will tend to trust my prediction.

Explanations clearly give us confidence in the predictions and decisions we make. Conversely, they can make us very suspicious of them ("the stone will make some strange movements because it is gray"). Machine learning (ML) models, too, can greatly benefit from explanations, as opposed to simply hoping that the model already delivers the correct results. In some fields, this would "only" increase trust in the model and thus its acceptance; in others, it is an absolute prerequisite for using the predictions at all.

This brings us to the topic of Explainable AI (XAI). XAI tries to make the predictions of ML models explainable. A wide range of different methods exists for this purpose. Some can be applied to common ML models, and others are built-in features of models specifically designed to be explainable. Even though XAI is still under active development, hyperscalers like Amazon Web Services, Azure, and the Google Cloud Platform all offer services to explain ML predictions. In this blog post, we first explain some important definitions related to XAI. We then look into the technical fundamentals of different XAI methods to get a feeling for what they are capable of.

Important Definitions

Although some of these terms are sometimes used interchangeably or with slightly different meanings, we try to define some of the most important concepts of Explainable AI.

  • Interpretability: Describes whether it is clear to humans how a model works internally and whether they understand the decision-making process. An example is a simple linear regression where we know the weights and can interpret them. On the other hand, black-box models, like deep neural networks, are not directly interpretable.
  • Explainability: Describes whether it is clear to humans why a model arrived at a specific decision. We do not have to know the internal decision-making process of the model. In the following section, we describe methods that provide explainability.
  • Global versus local explainability: While global explainability aims to explain the overall model behavior (e.g. identifying the most important features), local explainability focuses on explaining individual predictions.
  • Model-agnostic: If we can apply a method to an arbitrary model, we call it model-agnostic.
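To make the difference between the global and the local view a bit more concrete, here is a minimal sketch for the interpretable case of a linear regression. The data, the feature count, and all variable names are made up for illustration. The fitted weights give a global picture, while the products of weight and feature value explain one individual prediction.

```python
import numpy as np

# Hypothetical toy data: 5 entries with 2 features each (values are made up).
X = np.array([
    [1.0, 0.0],
    [2.0, 1.0],
    [3.0, 0.0],
    [4.0, 3.0],
    [5.0, 2.0],
])
# Targets generated from y = 0.5 - 0.1 * x0 + 0.2 * x1, without noise.
y = 0.5 - 0.1 * X[:, 0] + 0.2 * X[:, 1]

# Fit an ordinary least-squares model; the column of ones models the intercept.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
weights, intercept = coef[:2], coef[2]

# Global view: the weights describe the model's behavior for all inputs.
print("weights:", np.round(weights, 3), "intercept:", round(float(intercept), 3))

# Local view: contribution of each feature to one specific prediction.
x_single = X[3]
local_contributions = weights * x_single
prediction = intercept + local_contributions.sum()
print("contributions:", np.round(local_contributions, 3))
print("prediction:", round(float(prediction), 3))
```

For a black-box model, neither view can be read off the parameters this way, which is where the methods in the next section come in.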

Explainable AI – Methods

To get a feeling for how explainable AI works, we discuss two common XAI methods in this section.

Shapley Additive Explanations (SHAP):

SHAP is an explainable AI method based on Shapley values from game theory, which describe how much a feature contributes to the overall result. Let us assume a model that predicts the cancellation probability of a subscription. To keep things simple, we assume only 3 features: the age of the subscription in years (F1 = 6), the number of complaints received (F2 = 8), and the weekday on which the subscription was taken out (F3 = "Tuesday"). p(x) describes the probability of cancellation, which we calculate for every possible combination of features (the values in this case are only examples, but we explain why they are realistic):

Probability | Explanation
p({}) = 0.20 | Base probability of cancellation as the average cancellation ratio over all customers.
p({F1}) = 0.14 | Customers with a 6-year-old contract will have a lower cancellation probability than the average customer.
p({F2}) = 0.24 | Customers with 8 complaints will have a higher cancellation probability than the average customer.
p({F3}) = 0.21 | The weekday of the subscription will only add some noise to the average cancellation ratio.
p({F1,F2}) = 0.22 | Customers with a 6-year-old contract but also 8 complaints will have a higher cancellation probability than an average customer.
p({F1,F3}) = 0.12 | Adding the weekday of the subscription will only add some noise to p({F1}).
p({F2,F3}) = 0.24 | Adding the weekday of the subscription will only add some noise to (here: has no influence on) p({F2}).
p({F1,F2,F3}) = 0.21 | Prediction of our model when incorporating all features. This is the prediction we want to explain.

In our example, there are in total 6 different orders in which the features can be added:

  • {} → {F1} → {F1, F2} → {F1, F2, F3}
  • {} → {F1} → {F1, F3} → {F1, F3, F2}
  • {} → {F2} → {F2, F1} → {F2, F1, F3}
  • {} → {F2} → {F2, F3} → {F2, F3, F1}
  • {} → {F3} → {F3, F1} → {F3, F1, F2}
  • {} → {F3} → {F3, F2} → {F3, F2, F1}
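The six rows above are simply all permutations of the three features (3! = 6). As a small sketch, they can be enumerated with Python's standard library; the variable names are chosen for illustration only.

```python
from itertools import permutations

features = ["F1", "F2", "F3"]

# Every order in which the features can be added to the empty set.
orderings = list(permutations(features))

for order in orderings:
    # Build the chain {} -> {first} -> {first, second} -> ...
    steps = ["{}"] + [
        "{" + ", ".join(order[:k]) + "}" for k in range(1, len(order) + 1)
    ]
    print(" -> ".join(steps))
```

Since the number of orderings grows factorially with the number of features, exhaustive enumeration like this is only practical for very small feature sets.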

It is important to note that p({F1,F2}) is not simply p({}) plus the individual effects of F1 and F2: 0.20 - 0.06 + 0.04 = 0.18, whereas p({F1,F2}) = 0.22. Likewise, F1 has a different influence depending on whether we add it to the set {F2} (-0.02) or to the set {F3} (-0.09). Thus, we have to calculate the influence of each feature separately for every row x, denoted as p_{r=x}(Feature).

Let us take the first row as an example. To calculate the influence of adding each feature to the previous combination, we take the difference between the cancellation probability after and before adding the feature, i.e.

  • p_{r=1}(F1) = p({F1}) - p({}) = 0.14 - 0.20 = -0.06
  • p_{r=1}(F2) = p({F1, F2}) - p({F1}) = 0.22 - 0.14 = 0.08
  • p_{r=1}(F3) = p({F1, F2, F3}) - p({F1, F2}) = 0.21 - 0.22 = -0.01
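The same differences can be computed programmatically. The following is a minimal sketch that stores the example probabilities in a dictionary keyed by feature set; the function name `marginal_contributions` is a hypothetical helper, not part of any library.

```python
# Cancellation probabilities from the example, keyed by the set of known features.
p = {
    frozenset(): 0.20,
    frozenset({"F1"}): 0.14,
    frozenset({"F2"}): 0.24,
    frozenset({"F3"}): 0.21,
    frozenset({"F1", "F2"}): 0.22,
    frozenset({"F1", "F3"}): 0.12,
    frozenset({"F2", "F3"}): 0.24,
    frozenset({"F1", "F2", "F3"}): 0.21,
}


def marginal_contributions(order):
    """Return the change in p caused by each feature when added in the given order."""
    known = frozenset()
    contributions = {}
    for feature in order:
        extended = known | {feature}
        # Probability after adding the feature minus probability before.
        contributions[feature] = round(p[extended] - p[known], 2)
        known = extended
    return contributions


# First row: {} -> {F1} -> {F1, F2} -> {F1, F2, F3}
row_1 = marginal_contributions(("F1", "F2", "F3"))
print(row_1)
```

Calling the helper with a different order, for example `("F3", "F1", "F2")`, gives the contributions for the corresponding row.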

Repeating this calculation for every row yields:

Row (x) | p_{r=x}(F1) | p_{r=x}(F2) | p_{r=x}(F3)
1 | -0.06 | 0.08 | -0.01
2 | -0.06 | 0.09 | -0.02
3 | -0.02 | 0.04 | -0.01
4 | -0.03 | 0.04 | 0.00
5 | -0.09 | 0.09 | 0.01
6 | -0.03 | 0.03 | 0.01

Finally, we average the influence of every feature over all rows and arrive at the following SHAP values [1]:

  • S(F1) = -0.048
  • S(F2) = 0.062
  • S(F3) = -0.003

As a sanity check, the three values add up (apart from rounding) to 0.01, which is the difference between the prediction p({F1,F2,F3}) = 0.21 and the base probability p({}) = 0.20.

What can we learn from these values? Imagine that, in our case, S(F3) were not -0.003 but, for example, 0.12. Would we trust the model? Probably not. Just as we would not trust a person who claims that a stone will change its rotation because it is gray, we would not trust a model that predicts a high cancellation probability because the subscription was taken out on a Tuesday (unless we have very good reasons for this assumption). Since our values are reasonable instead, we tend to trust the prediction. We can also try to gain further insights from the SHAP values. In doing so, we have to be careful, as so often in data science, to distinguish between correlation and causality. Digging deeper into this topic is beyond the scope of this blog post. For those who are interested, this article shows a good example.
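The averaging over all rows described above can be sketched in a few lines of Python. This is an exhaustive toy implementation for the three-feature example only, not how SHAP libraries compute values in practice; the function name `shapley_values` is a hypothetical helper.

```python
from itertools import permutations

# Cancellation probabilities from the example, keyed by the set of known features.
p = {
    frozenset(): 0.20,
    frozenset({"F1"}): 0.14,
    frozenset({"F2"}): 0.24,
    frozenset({"F3"}): 0.21,
    frozenset({"F1", "F2"}): 0.22,
    frozenset({"F1", "F3"}): 0.12,
    frozenset({"F2", "F3"}): 0.24,
    frozenset({"F1", "F2", "F3"}): 0.21,
}
features = ("F1", "F2", "F3")


def shapley_values(value, features):
    """Average each feature's marginal contribution over all feature orderings."""
    orderings = list(permutations(features))
    totals = {f: 0.0 for f in features}
    for order in orderings:
        known = frozenset()
        for feature in order:
            extended = known | {feature}
            totals[feature] += value[extended] - value[known]
            known = extended
    return {f: totals[f] / len(orderings) for f in features}


values = shapley_values(p, features)
for f in features:
    print(f, round(values[f], 3))

# The contributions sum to the difference between the full prediction
# and the base probability.
total = sum(values.values())
print("sum:", round(total, 3), "expected:", round(p[frozenset(features)] - p[frozenset()], 3))
```

The final check reflects a general property of Shapley values: together, the per-feature contributions account exactly for the gap between the base value and the prediction being explained.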

As we often handle datasets with many entries, SHAP values are commonly displayed in beeswarm or similar plots. Fig. 1 shows an example of such a plot. There, the x-position of each marker denotes the SHAP value of a specific entry, while the color displays the feature value itself. For our example, we see that the SHAP values corresponding to the subscription weekday are small regardless of the weekday on which the subscription was taken out. Also, entries with a higher subscription age tend to have lower SHAP values for this feature, and entries with a higher number of complaints tend to have higher SHAP values.

Figure 1: Example beeswarm plot displaying SHAP values.

Local Interpretable Model-agnostic Explanations (LIME):

In the SHAP example shown above, the meaning of the features is more or less directly interpretable. For a long text that the model handles as embedding vectors, or for images where the model sees the color channels of the pixels, this is not the case. Therefore, to be interpretable for humans, the features must be represented in another way. This can be the presence of specific words for text classification or the presence of specific segments in a picture. Let us skip the math (which you can find alongside a more in-depth explanation in this paper) and explain LIME based on an example for image classification. Assume that the picture shown in Fig. 2 a) is classified by our original model as "banana".

[Image: original image and example images showing bananas and apples with different segments grayed out]
Figure 2: Original image and example segmented images with grayed-out segments.

We can split this image into different segments, which we call "super-pixels". We treat each of these super-pixels as a feature (F_y) (note that it is also possible to combine multiple super-pixels into one feature). Now, different images are created in which different features are randomly grayed out, as in Fig. 2 b). For each of these sample images, the original model calculates the probability for the class "banana". Based on the probabilities returned by the original model for each of the sample images, we can fit a linear regression model

p = Fw + b.

Here, the elements F_{x,y} in each row x of the matrix F denote whether the corresponding feature is active (F_{x,y} = 1) or grayed out (F_{x,y} = 0) in a specific sample image, w is a vector containing the weights w_{F_y} of the different features, and b is the intercept.

For fitting, the sample images are weighted by their distance from the original image. A sample image that is mostly grayed out has a higher distance and thus a lower weight than an image in which almost all segments are active. Finally, the resulting weights w_{F_y} give us the influence of the individual features on the overall prediction of our original model. In our example, the weights for features showing the bananas would be large and positive, while the weights for features showing the apples would be negative, indicating a contradiction to the predicted class "banana". If we determined explanations for the class "apple" instead of "banana", this would be the other way around. For visualization, it is helpful to plot images with only the most contributing features or the most contradicting ones, as in Fig. 2 c) and d), respectively.
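The procedure described above can be sketched with NumPy on a toy problem. Everything specific here is an assumption for illustration: the `black_box` function stands in for the original image classifier, the four super-pixels and their roles are made up, and the distance measure and exponential kernel are simple choices that need not match what the LIME library uses.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 4    # hypothetical number of super-pixels
n_samples = 500   # number of randomly perturbed sample images


def black_box(mask):
    """Hypothetical stand-in for the original model: returns p("banana").

    Assumption: super-pixels 0 and 1 show bananas, super-pixel 2 shows an
    apple, and super-pixel 3 shows background the model ignores.
    """
    score = -1.0 + 2.0 * mask[0] + 1.5 * mask[1] - 1.0 * mask[2]
    return 1.0 / (1.0 + np.exp(-score))


# Each row of F marks which super-pixels are active (1) or grayed out (0).
F = rng.integers(0, 2, size=(n_samples, n_features))
p = np.array([black_box(row) for row in F])

# Distance from the original image: fraction of grayed-out super-pixels.
distance = 1.0 - F.mean(axis=1)
# Exponential kernel (assumed choice; the width of 0.5 is arbitrary):
# mostly grayed-out samples get a low weight.
sample_weight = np.exp(-(distance ** 2) / 0.5 ** 2)

# Weighted least squares for p = F w + b, solved by scaling rows with sqrt(weight).
A = np.column_stack([F, np.ones(n_samples)])
sqrt_w = np.sqrt(sample_weight)
coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], p * sqrt_w, rcond=None)
w, b = coef[:-1], coef[-1]

for y, weight in enumerate(w):
    print(f"w_F{y} = {weight:+.3f}")
```

In this toy setup, the weights of the two "banana" super-pixels should come out positive and the weight of the "apple" super-pixel negative, mirroring the interpretation given in the text.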

Thus, LIME can increase trust in our models as well as help to identify issues in our datasets that are otherwise hard to recognize. This Google Whitepaper gives a good example of problematic model training. There, a model was trained to detect diseases from X-rays. But instead of relying on the areas relevant to the diseases, the model arrived at its predictions based on pen marks from radiologists, which are hardly visible to humans.

Conclusion

In this blog post, we dove into the topic of explainable AI by defining some important terms and gaining an understanding of what explainability means in this context. Furthermore, we learned about two common methods used to explain machine learning models. This can help to answer the question raised in the title with a "Yes, I do" and greatly improve the debuggability of machine learning models.

  1. Note that we worked directly with probabilities in this example. In practice, one would ensure that the resulting probabilities cannot become negative, for example by working with log-odds.