Integrated Gradients is an attribution technique for understanding how much each input feature contributes to a model’s prediction.
It works by integrating the model’s gradients with respect to the input features along a straight-line path from a baseline input to the actual input.
How to calculate
Mathematically, the Integrated Gradients attribution of the model’s output $F$ with respect to the $i$-th input feature can be calculated using the following equation:
\begin{equation} \text{IntegratedGrads}_i(x) = (x_i - x_i^{\prime}) \times \int_{\alpha=0}^1 \frac{\partial F\big(x^{\prime} + \alpha \times (x-x^{\prime})\big)}{\partial x_i} \, d\alpha \end{equation}
where $F$ is the model’s output function, $x$ is the actual input, $x^{\prime}$ is the baseline input, and $x_i$ and $x_i^{\prime}$ are their $i$-th features.
The integral term averages the gradient of the model’s output with respect to the $i$-th feature over every point along the straight-line path from the baseline input to the actual input; multiplying by $(x_i - x_i^{\prime})$ scales that average gradient into an attribution for the feature.
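In practice the integral is approximated numerically, for example with a Riemann sum over $m$ evenly spaced points on the path:
\begin{equation} \text{IntegratedGrads}_i^{\text{approx}}(x) = (x_i - x_i^{\prime}) \times \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F\big(x^{\prime} + \frac{k}{m} \times (x-x^{\prime})\big)}{\partial x_i} \end{equation}
where $m$ is the number of interpolation steps; a few dozen to a few hundred steps is usually enough for the approximation to stabilize.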
By computing gradients and integrating them along a path in the input space, Integrated Gradients assigns each feature a contribution to the model’s output.
These attributions can be used to diagnose model behavior, identify important features, and gain insight into how the model makes its predictions.
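As a concrete illustration, here is a minimal sketch of the calculation in JAX using the Riemann-sum approximation above. The model, its weights, the example input, and the zero baseline are hypothetical placeholders; any differentiable scalar-valued function of the input could take the model’s place.

```python
import jax
import jax.numpy as jnp

# Hypothetical model: a tiny logistic-regression-style scoring function.
# Any differentiable scalar output F(x) works here.
weights = jnp.array([0.5, -1.2, 2.0, 0.0])

def model(x):
    return jax.nn.sigmoid(jnp.dot(weights, x))

def integrated_gradients(f, x, baseline, steps=50):
    """Approximate Integrated Gradients with a Riemann sum over `steps` points."""
    # Interpolation coefficients alpha, one per step along the path.
    alphas = jnp.linspace(1.0 / steps, 1.0, steps)
    # Points on the straight-line path from the baseline to the actual input.
    path = baseline + alphas[:, None] * (x - baseline)
    # Gradient of the model output w.r.t. the input, evaluated at each path point.
    grads = jax.vmap(jax.grad(f))(path)
    # Average the gradients and scale by the input-minus-baseline difference.
    return (x - baseline) * grads.mean(axis=0)

x = jnp.array([1.0, 0.3, -0.5, 2.0])
baseline = jnp.zeros_like(x)  # a common baseline choice: the all-zeros input

attributions = integrated_gradients(model, x, baseline)
print(attributions)  # one attribution per input feature
```

Features with the largest-magnitude attributions contributed most to moving the prediction away from the baseline. A useful sanity check is that the attributions should approximately sum to $F(x) - F(x^{\prime})$; if the gap is large, increasing `steps` tightens the approximation.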