Artificial intelligence models can be considered complex black boxes that receive inputs and generate outputs. Explaining and understanding model behavior is important when inspecting potential issues such as model bias, the trustworthiness of results, and safety. Moreover, there are legal requirements, such as the GDPR in Europe. The right to explanation is a person's legal right to obtain an explanation of the logic involved in automated decision-making, such as AI techniques, when these decisions significantly affect them.

Numerous techniques have been proposed to address this problem, most of them based on linear models or decision trees/rules [2].

In this post, I introduce and summarize a recent paper I came across, "Synthesizing Pareto-Optimal Interpretations for Black-Box Models" [1], which proposes a formal approach to the problem. The details of the method and the results can be found in the paper.

**A Motivating Example**

Consider a scenario where an airplane taxis along a runway autonomously, using a neural network that takes a camera sensor as input. The plane is supposed to follow the centerline of the runway within a tolerance of 2.5 meters. A monitoring module decides when the neural network output is behaving correctly, and hence when it is trustworthy and safe to use.

For example, the monitoring module may use the weather conditions, the time of day, and the initial position of the airplane to decide whether the neural network output is reliable. We wish to reason about this black-box monitoring module, and hence need an understandable interpretation of it.

**What is Pareto-Optimal Interpretation Synthesis?**

Unlike most previous work, the goal here is not only an interpretation that explains the model's behavior with high accuracy. Its ease of explanation (interpretability) matters as well. The problem is therefore an optimization problem with two conflicting objectives: high accuracy (correctness) of the interpretation and ease of explanation. Synthesizing such interpretations is formulated as a Pareto optimization problem. Its solutions are the Pareto-optimal interpretations, for which neither objective can be improved without worsening the other.

For the motivating example above, the following models can be synthesized, each with a different level of correctness and explainability. Depending on the application, the user can choose whichever synthesized model suits it best.

For example, model (a) is very easy to explain but not very accurate. Model (b) is very accurate but harder to explain. Model (c) strikes a balance between accuracy and explainability.
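To make the trade-off concrete, here is a minimal sketch of Pareto dominance over candidate interpretations. It assumes each interpretation is scored by correctness and by an explainability score, with higher being better for both. The `Interpretation` class, the `dominates` and `pareto_front` helpers, and all scores are hypothetical illustrations. They are not taken from the paper, whose explainability measure may be defined differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    name: str
    correctness: float     # fraction of black-box outputs the interpretation matches
    explainability: float  # hypothetical score, e.g. derived from size; higher is better


def dominates(a: Interpretation, b: Interpretation) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.correctness >= b.correctness and a.explainability >= b.explainability
    strictly_better = a.correctness > b.correctness or a.explainability > b.explainability
    return no_worse and strictly_better


def pareto_front(pool: list[Interpretation]) -> list[Interpretation]:
    """Keep the interpretations that no other interpretation dominates."""
    return [x for x in pool if not any(dominates(y, x) for y in pool)]


if __name__ == "__main__":
    # Made-up scores, loosely mirroring models (a), (b), (c) plus a dominated model (d).
    pool = [
        Interpretation("a", correctness=0.70, explainability=0.95),
        Interpretation("b", correctness=0.95, explainability=0.40),
        Interpretation("c", correctness=0.85, explainability=0.70),
        Interpretation("d", correctness=0.80, explainability=0.60),
    ]
    print([x.name for x in pareto_front(pool)])  # ['a', 'b', 'c']
```

Model (d) is dropped because (c) is better on both objectives. The remaining three are mutually incomparable, which is exactly why the user has to choose among them.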

In a scenario where the trustworthiness of the model is of high importance, model (b) can be selected. Then, when an "alert" is raised based on the weather conditions, time of day, and initial position, automated control is disabled and a safe controller (for example, manual control) is used instead.
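As an illustration of this switching logic, the sketch below uses a chosen interpretation to anticipate when the monitor raises an alert, and hands control to a safe fallback in that case. The rules inside `interpretation_predicts_alert` (night, fog, a 1.5 m initial offset) are invented for illustration. They are not the interpretation synthesized in the paper. The `Conditions` fields and controller names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    weather: str             # hypothetical categories, e.g. "clear", "rain", "fog"
    hour: int                # local time of day, 0-23
    initial_offset_m: float  # initial lateral offset from the runway centerline, in meters


def interpretation_predicts_alert(c: Conditions) -> bool:
    """Hypothetical interpretation of the black-box monitor: predict an alert
    at night, in fog, or when starting far from the centerline.
    The rules and thresholds are invented for illustration only."""
    night = c.hour < 6 or c.hour >= 20
    return night or c.weather == "fog" or abs(c.initial_offset_m) > 1.5


def select_controller(c: Conditions) -> str:
    """Hand control to the safe fallback whenever an alert is predicted."""
    if interpretation_predicts_alert(c):
        return "safe_controller"  # e.g. manual control
    return "neural_network"


if __name__ == "__main__":
    print(select_controller(Conditions("clear", hour=14, initial_offset_m=0.5)))  # neural_network
    print(select_controller(Conditions("fog", hour=14, initial_offset_m=0.5)))    # safe_controller
```

Because the interpretation is a short, readable rule, an engineer can inspect exactly which conditions disable the automated controller. That is the point of choosing an explainable model.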

**Pareto-Optimal Synthesis Flow**

Without going into the details of the method, the basic flow is as follows (please refer to the paper [1] for more details):

1- Find the first solution using weighted MAX-SAT

(For a quick introduction to SAT and MAX-SAT, please refer to [3].)
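Weighted MAX-SAT asks for an assignment that satisfies every hard clause while maximizing the total weight of the satisfied soft clauses. The brute-force sketch below only illustrates that definition on a toy formula. Practical tools use dedicated MaxSAT solvers, and the paper's actual encoding of correctness and explainability is not reproduced here. The clause representation follows the common DIMACS convention of signed integers for literals. The function names are hypothetical.

```python
from itertools import product
from typing import Optional

Clause = list[int]  # DIMACS-style literals: 2 means x2, -2 means NOT x2


def satisfied(clause: Clause, assignment: dict[int, bool]) -> bool:
    """A clause holds if at least one of its literals is true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def weighted_maxsat(num_vars: int, hard: list[Clause],
                    soft: list[tuple[int, Clause]]) -> Optional[tuple[int, dict[int, bool]]]:
    """Brute-force weighted MAX-SAT over all 2^n assignments (toy sizes only).

    Returns (satisfied soft weight, assignment) for the best assignment that
    satisfies every hard clause, or None if the hard clauses are UNSAT.
    """
    best = None
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if not all(satisfied(c, assignment) for c in hard):
            continue  # hard clauses are mandatory
        weight = sum(w for w, c in soft if satisfied(c, assignment))
        if best is None or weight > best[0]:
            best = (weight, assignment)
    return best


if __name__ == "__main__":
    hard = [[1, 2]]                # x1 OR x2 must hold
    soft = [(3, [-1]), (1, [-2])]  # prefer NOT x1 (weight 3) and NOT x2 (weight 1)
    print(weighted_maxsat(2, hard, soft))       # (3, {1: False, 2: True})
    print(weighted_maxsat(1, [[1], [-1]], []))  # None (hard clauses conflict)
```

The hard clause forces at least one variable to be true, so the optimum gives up the cheaper preference (NOT x2) and keeps the heavier one. With conflicting hard clauses, the result is UNSAT.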

The solution partitions the correctness/explainability space into four regions, as shown in the following figure:

R1: there is no solution in this region (any solution here would dominate the one just found)

R2: solutions in this region are dominated by the one just found, so they are not Pareto-optimal

R3: possible Pareto-optimal solutions with higher correctness

R4: possible Pareto-optimal solutions with better explainability
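The partition can be stated precisely. The sketch below classifies a point relative to the solution (c0, e0) found in step 1. It assumes correctness `c` and an explainability score `e` are both higher-is-better. The function name `region` and the sample numbers are hypothetical.

```python
def region(c0: float, e0: float, c: float, e: float) -> str:
    """Classify a point (c, e) relative to the solution (c0, e0) found by MAX-SAT.

    c is correctness and e an explainability score; both are higher-is-better here.
    """
    if c >= c0 and e >= e0 and (c, e) != (c0, e0):
        return "R1"  # would dominate (c0, e0), which is Pareto-optimal: empty
    if c <= c0 and e <= e0:
        return "R2"  # dominated by (or equal to) (c0, e0): not Pareto-optimal
    if c > c0:
        return "R3"  # higher correctness, lower explainability
    return "R4"      # higher explainability, lower correctness


if __name__ == "__main__":
    c0, e0 = 0.8, 0.5  # made-up first solution
    for point in [(0.9, 0.6), (0.7, 0.4), (0.9, 0.4), (0.7, 0.6)]:
        print(point, region(c0, e0, *point))  # R1, R2, R3, R4 in order
```

Only R3 and R4 can contain new Pareto-optimal solutions, so the search in the next step is restricted to them.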

2- Find the next solution in region R3 or R4 (a new solution with higher correctness or better explainability), as shown in the following figure

3- Repeat step 2 until no further solution with higher correctness or better explainability exists, or until the solutions found so far are satisfactory!
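The three steps can be sketched end to end under simplifying assumptions. Candidates come from a finite, pre-scored pool of (correctness, explainability) pairs, both higher-is-better. `best_in_box` is a hypothetical stand-in for one weighted MAX-SAT call restricted to a region. The paper instead works with a symbolic encoding, so this is an illustration of the flow rather than its algorithm. The box-splitting bookkeeping, the default weights `w_c = w_e = 1.0`, and `max_solutions` are my own choices.

```python
import math
from typing import Optional

Point = tuple[float, float]  # (correctness, explainability score), both higher-is-better


def best_in_box(pool: list[Point], c_lo: float, c_hi: float,
                e_lo: float, e_hi: float,
                w_c: float = 1.0, w_e: float = 1.0) -> Optional[Point]:
    """Stand-in for one weighted MAX-SAT call restricted to a region:
    the candidate strictly inside the box with the highest weighted score,
    or None if the box is empty (the UNSAT case)."""
    inside = [p for p in pool if c_lo < p[0] < c_hi and e_lo < p[1] < e_hi]
    return max(inside, key=lambda p: w_c * p[0] + w_e * p[1], default=None)


def pareto_synthesis(pool: list[Point], max_solutions: int = 100) -> list[Point]:
    """Step 1 searches the whole space; steps 2-3 keep searching the R3/R4
    boxes left open by each solution until none remain or enough are found."""
    found: list[Point] = []
    boxes = [(-math.inf, math.inf, -math.inf, math.inf)]
    while boxes and len(found) < max_solutions:
        c_lo, c_hi, e_lo, e_hi = boxes.pop()
        p = best_in_box(pool, c_lo, c_hi, e_lo, e_hi)
        if p is None:
            continue  # no solution in this region
        found.append(p)
        boxes.append((p[0], c_hi, e_lo, p[1]))  # R3: higher correctness, lower explainability
        boxes.append((c_lo, p[0], p[1], e_hi))  # R4: higher explainability, lower correctness
    return sorted(found)


if __name__ == "__main__":
    # Made-up scores for six candidate interpretations.
    pool = [(0.70, 0.90), (0.95, 0.20), (0.85, 0.60),
            (0.80, 0.50), (0.60, 0.95), (0.50, 0.50)]
    print(pareto_synthesis(pool))
```

On this pool, the loop returns the four non-dominated points and never reports (0.80, 0.50) or (0.50, 0.50). Lowering `max_solutions` models stopping early once the solutions found are satisfactory.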

**Summary of Results**

The experiments cover three different problems: the airplane example, a bank loan predictor, and a solvability predictor for a theorem prover.

The results show that, by attempting tens of possible Pareto-optimal solutions (TNP), a handful of Pareto-optimal solutions (PO) are found.

In terms of runtime, most problems are solved within a few seconds, although some cases take significantly longer. When a problem has no solution (UNSAT), this is detected in less than a second.

References:

[1] H. Torfah, S. Shah, S. Chakraborty, S. Akshay, and S. Seshia, “Synthesizing Pareto-optimal Interpretations for Black-Box Models”, International Conference on Formal Methods in Computer-Aided Design (FMCAD), pp. 153-162, Oct. 2021 [https://arxiv.org/abs/2105.14748].

[2] F. Bodria, F. Giannotti, R. Guidotti, F. Naretto, D. Pedreschi, and S. Rinzivillo, “Benchmarking and Survey of Explanation Methods for Black Box Models”, 2021 [https://arxiv.org/abs/2102.13076].

[3] F. Bacchus, M. Järvisalo, and R. Martins, “Advances in Maximum Satisfiability”, Tutorial at ECAI 2020: 24th European Conference on Artificial Intelligence, Sep. 2020 [https://www.cs.helsinki.fi/u/mjarvisa/papers/sat-hb-vol2-maxsat.pdf].
