Language-Aware
Visual Explanations

A multimodal explainability framework that combines SHAP, SAM, and Vision Language Models to provide both visual and textual explanations for image classifiers.

📄 MIT License 🐍 Python 3.8+ 🔥 PyTorch 📊 SHAP 🎯 SAM 💬 VLM

Why LAVE?

Powerful, flexible, and easy-to-use explainability for any image classification model

🎯

No Training Required

Works directly with pre-trained models without any additional training. Just load and explain.

🖼️

Visual Explanations

Highlight important image regions using SHAP values combined with SAM segmentation masks.

📝

Textual Explanations

Generate human-readable natural language descriptions of model predictions using VLMs.

🔄

Flexible

Supports both custom-trained and pre-trained models. Compatible with multiple architectures.

⚡

Edge-Optimized

Includes optimizations for deployment on edge devices with FP16 support and quantization.
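The edge optimizations come down to half-precision casting and post-training quantization. A minimal PyTorch sketch of both (the toy layers are illustrative stand-ins, not LAVE's actual backbone):

```python
import torch
import torch.nn as nn

# Stand-in classifier head; LAVE's real backbones (e.g. DenseNet121) are cast the same way.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Dynamic quantization: weights stored as int8, activations quantized on the fly (CPU).
model_int8 = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# FP16: halves memory and speeds up inference on GPUs with half-precision support.
model_fp16 = model.half()

x = torch.randn(1, 512)
print(model_int8(x).shape)  # torch.Size([1, 10])
```

Dynamic quantization needs no calibration data, which makes it the cheapest option for edge deployment; FP16 is preferable when a GPU is available.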

📚

Well Documented

Comprehensive docstrings, examples, and API documentation for easy integration.

Architecture

How LAVE combines multiple AI techniques to explain model predictions

System Pipeline

📷 Input Image → 📊 SHAP (Feature Importance) → 🎯 SAM (Visual Mask) → 💬 TinyLLaVA (Text Generation) → Combined Output (Visual + Textual)

📊 SHAP

Computes feature importance using Shapley values from cooperative game theory.

  • DeepExplainer for neural networks
  • Gradient-based attribution
  • Input image → Feature importance

🎯 SAM

Meta's state-of-the-art segmentation model.

  • ViT-H based architecture
  • Point-based prompting
  • Important region → Binary mask
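The point-based prompting can be driven directly by the SHAP map: the highest-importance pixel becomes the foreground point that SAM segments around. A minimal sketch of that glue logic (pure NumPy; the `SamPredictor` call is shown only in a comment, since it needs the ViT-H checkpoint):

```python
import numpy as np

def shap_to_point(importance_map: np.ndarray) -> np.ndarray:
    """Return the (x, y) coordinate of the highest-importance pixel."""
    y, x = np.unravel_index(np.argmax(importance_map), importance_map.shape)
    return np.array([[x, y]])  # SAM expects prompt points as (x, y) pairs

importance = np.zeros((32, 32))
importance[10, 20] = 1.0          # pretend SHAP flagged this pixel
point = shap_to_point(importance)
print(point)                      # [[20 10]]

# With the checkpoint downloaded, the binary mask would come from:
#   predictor = SamPredictor(sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth"))
#   predictor.set_image(image)
#   masks, scores, _ = predictor.predict(point_coords=point, point_labels=np.array([1]))
```

Note the (x, y) vs. (row, col) swap: NumPy indexes row-first, while SAM's prompt API takes image coordinates.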

💬 TinyLLaVA

Efficient VLM for text generation.

  • 3.1B parameters
  • Image captioning
  • Masked image → Text description
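Before captioning, the binary mask is used to isolate the important region so the VLM describes only what drove the prediction. A minimal sketch of that masking step (NumPy only; the mask here is synthetic, standing in for SAM's output):

```python
import numpy as np

def apply_mask(image: np.ndarray, mask: np.ndarray, dim: float = 0.25) -> np.ndarray:
    """Keep masked pixels at full intensity and dim everything else."""
    weights = np.where(mask[..., None], 1.0, dim)   # broadcast mask over channels
    return (image * weights).astype(image.dtype)

image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                               # pretend SAM segmented this region

highlighted = apply_mask(image, mask)
print(highlighted[1, 1], highlighted[0, 0])         # [200 200 200] [50 50 50]
```

The resulting highlighted image is what gets handed to TinyLLaVA for caption generation.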

Documentation

Everything you need to get started with LAVE

🚀 Installation

Set up your development environment.

  • Clone repository
  • Create conda environment
  • Download checkpoints

🔧 Training

Train custom models with transfer learning.

  • Data preprocessing
  • Model training script
  • Evaluation & testing

🎯 Inference

Run predictions and explanations.

  • Pre-trained models
  • Custom models
  • Batch processing

🧩 Supported Models

Multiple backbone architectures.

ResNet18 · ResNet50 · VGG16 · DenseNet121 · MobileNetV2 · EfficientNet

📁 Project Structure

Repository organization.

  • train.py - Training
  • test.py - Testing
  • predict.py - Prediction
  • *_explainer.py - Explanation

🤝 Contributing

Help improve LAVE.

  • Fork repository
  • Follow coding standards
  • Submit pull request

Quick Start

Get started with LAVE in minutes

Installation

# Clone the repository
git clone https://github.com/Purushothaman-natarajan/VALE-Explainer.git
cd VALE-Explainer

# Create and activate environment
conda env create -f environment.yaml
conda activate LAVE-Explainer

# Download model checkpoints
# SAM: https://github.com/facebookresearch/segment-anything

Run the Explainer

# For pre-trained models
python pre-trained_model_explainer.py \
    --model_name densenet121 \
    --img_path path/to/image.jpg \
    --sam_checkpoint path/to/sam_vit_h.pth

# For custom-trained models
python custom_model_explainer.py \
    --model_path model.pth \
    --img_path path/to/image.jpg \
    --num_classes 10

Python API

from pre_trained_model_explainer import PyTorchExplainableWrapper

explainer = PyTorchExplainableWrapper(
    model_name_or_path="densenet121",
    sam_checkpoint="./sam_vit_h.pth",
    tiny_llava_model_path="bczhou/TinyLLaVA-3.1B"
)

explainer.run_pipeline("path/to/image.jpg")

Research

If you use LAVE in your research, please cite our paper

@article{natarajan2024vale,
  title = {VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers},
  author = {Natarajan, Purushothaman and Nambiar, Athira},
  journal = {arXiv preprint arXiv:2408.12808},
  year = {2024}
}