Language-Aware
Visual Explanations

A multimodal explainability framework that combines SHAP, SAM, and Vision Language Models to provide both visual and textual explanations for image classifiers.

📄 MIT License 🐍 Python 3.8+ 🔥 PyTorch 📊 SHAP 🎯 SAM 💬 VLM

Why LAVE?

Powerful, flexible, and easy-to-use explainability for any image classification model

🎯

No Training Required

Works directly with pre-trained models without any additional training. Just load and explain.

🖼️

Visual Explanations

Highlight important image regions using SHAP values combined with SAM segmentation masks.

📝

Textual Explanations

Generate human-readable natural language descriptions of model predictions using VLMs.

🔄

Flexible

Supports both custom-trained and pre-trained models. Compatible with multiple architectures.

⚡

Edge-Optimized

Includes optimizations for deployment on edge devices with FP16 support and quantization.
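The edge optimizations come down to half-precision casting and post-training quantization. A minimal PyTorch sketch of both (the toy layers are illustrative stand-ins, not LAVE's actual backbone):

```python
import torch
import torch.nn as nn

# Stand-in classifier head; LAVE's real backbones (e.g. DenseNet121) are cast the same way.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Dynamic quantization: weights stored as int8, activations quantized on the fly (CPU).
model_int8 = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# FP16: halves memory and speeds up inference on GPUs with half-precision support.
model_fp16 = model.half()

x = torch.randn(1, 512)
print(model_int8(x).shape)  # torch.Size([1, 10])
```

Dynamic quantization needs no calibration data, which makes it the cheapest option for edge deployment; FP16 is preferable when a GPU is available.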

📚

Well Documented

Comprehensive docstrings, examples, and API documentation for easy integration.

Architecture

How LAVE combines multiple AI techniques to explain model predictions

System Pipeline

📷 Input Image → 📊 SHAP (Feature Importance) → 🎯 SAM (Visual Mask) → 💬 TinyLLaVA (Text Generation) → Combined Output (Visual + Textual)

📊 SHAP

Computes feature importance using Shapley values from cooperative game theory.

  • DeepExplainer for neural networks
  • Gradient-based attribution
  • Input image → Feature importance

🎯 SAM

Meta's state-of-the-art segmentation model.

  • ViT-H based architecture
  • Point-based prompting
  • Important region → Binary mask
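The point-based prompting can be driven directly by the SHAP map: the highest-importance pixel becomes the foreground point that SAM segments around. A minimal sketch of that glue logic (pure NumPy; the `SamPredictor` call is shown only in a comment, since it needs the ViT-H checkpoint):

```python
import numpy as np

def shap_to_point(importance_map: np.ndarray) -> np.ndarray:
    """Return the (x, y) coordinate of the highest-importance pixel."""
    y, x = np.unravel_index(np.argmax(importance_map), importance_map.shape)
    return np.array([[x, y]])  # SAM expects prompt points as (x, y) pairs

importance = np.zeros((32, 32))
importance[10, 20] = 1.0          # pretend SHAP flagged this pixel
point = shap_to_point(importance)
print(point)                      # [[20 10]]

# With the checkpoint downloaded, the binary mask would come from:
#   predictor = SamPredictor(sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth"))
#   predictor.set_image(image)
#   masks, scores, _ = predictor.predict(point_coords=point, point_labels=np.array([1]))
```

Note the (x, y) vs. (row, col) swap: NumPy indexes row-first, while SAM's prompt API takes image coordinates.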

💬 TinyLLaVA

Efficient VLM for text generation.

  • 3.1B parameters
  • Image captioning
  • Masked image → Text description
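Before captioning, the binary mask is used to isolate the important region so the VLM describes only what drove the prediction. A minimal sketch of that masking step (NumPy only; the mask here is synthetic, standing in for SAM's output):

```python
import numpy as np

def apply_mask(image: np.ndarray, mask: np.ndarray, dim: float = 0.25) -> np.ndarray:
    """Keep masked pixels at full intensity and dim everything else."""
    weights = np.where(mask[..., None], 1.0, dim)   # broadcast mask over channels
    return (image * weights).astype(image.dtype)

image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                               # pretend SAM segmented this region

highlighted = apply_mask(image, mask)
print(highlighted[1, 1], highlighted[0, 0])         # [200 200 200] [50 50 50]
```

The resulting highlighted image is what gets handed to TinyLLaVA for caption generation.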

Documentation

Everything you need to get started with LAVE

🚀 Installation

Set up your development environment.

  • Clone repository
  • Create conda environment
  • Download checkpoints

🔧 Training

Train custom models with transfer learning.

  • Data preprocessing
  • Model training script
  • Evaluation & testing

🎯 Inference

Run predictions and explanations.

  • Pre-trained models
  • Custom models
  • Batch processing

🧩 Supported Models

Multiple backbone architectures.

ResNet18 · ResNet50 · VGG16 · DenseNet121 · MobileNetV2 · EfficientNet

📁 Project Structure

Repository organization.

  • train.py - Training
  • test.py - Testing
  • predict.py - Prediction
  • *_explainer.py - Explanation

🤝 Contributing

Help improve LAVE.

  • Fork repository
  • Follow coding standards
  • Submit pull request

Quick Start

Get started with LAVE in minutes

Installation

# Clone the repository
git clone https://github.com/Purushothaman-natarajan/VALE-Explainer.git
cd VALE-Explainer

# Create and activate environment
conda env create -f environment.yaml
conda activate LAVE-Explainer

# Download model checkpoints
# SAM: https://github.com/facebookresearch/segment-anything

Run the Explainer

# For pre-trained models
python pre-trained_model_explainer.py \
    --model_name densenet121 \
    --img_path path/to/image.jpg \
    --sam_checkpoint path/to/sam_vit_h.pth

# For custom-trained models
python custom_model_explainer.py \
    --model_path model.pth \
    --img_path path/to/image.jpg \
    --num_classes 10

Python API

from pre_trained_model_explainer import PyTorchExplainableWrapper

explainer = PyTorchExplainableWrapper(
    model_name_or_path="densenet121",
    sam_checkpoint="./sam_vit_h.pth",
    tiny_llava_model_path="bczhou/TinyLLaVA-3.1B"
)

explainer.run_pipeline("path/to/image.jpg")

Research

If you use LAVE in your research, please cite our paper

@article{natarajan2024vale,
  title = {VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers},
  author = {Natarajan, Purushothaman and Nambiar, Athira},
  journal = {arXiv preprint arXiv:2408.12808},
  year = {2024}
}