Synth-SONAR

Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting

MIT License Python 3.8+ PyTorch Stable Diffusion GPT Prompting

Why Synth-SONAR?

Powerful, flexible, and easy-to-use sonar image synthesis framework

Style Injection

Enhance diversity by blending stylistic elements from real sonar images into generated content using pre-trained diffusion models.

Dual Diffusion Models

Two-phase generation process produces coarse images first, then refines them for fine-grained details and realism.

GPT-Based Prompting

Generate high-quality text prompts using GPT models to guide the diffusion process for better text-image alignment.
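The core idea is to expand low-level scene attributes into richer captions before they reach the diffusion model. A minimal sketch of that prompt-assembly step, using a hypothetical `build_sonar_prompt` helper (not part of the repository — in the actual pipeline a GPT model performs this expansion):

```python
def build_sonar_prompt(target, seabed="sandy", sensor="side-scan sonar"):
    """Assemble a diffusion prompt from low-level scene attributes.

    Hypothetical illustration: Synth-SONAR uses GPT models to expand
    such attributes into detailed captions automatically.
    """
    return (
        f"A {sensor} image of a {target} resting on a {seabed} seabed, "
        "with acoustic shadows and grainy speckle texture"
    )

prompt = build_sonar_prompt("shipwreck", seabed="rocky")
print(prompt)
```

The resulting string is what gets passed as the `prompt` argument to the diffusion pipeline.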

VLM Enhancement

Visual Language Models provide improved captions with domain-specific language instructions for realistic textures.

Flexible Fine-Tuning

Support for both standard fine-tuning and LoRA-based techniques to adapt models with limited computational resources.

Complete Utilities

Built-in tools for metadata creation, caption generation, and style-based clustering of sonar images.
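The style-based clustering idea can be sketched as a tiny k-means over per-image feature vectors. This is illustrative only — the repository's utility may use different features and libraries:

```python
import numpy as np

def kmeans(features, k=2, iters=20, seed=0):
    """Minimal k-means for grouping per-image style feature vectors."""
    rng = np.random.default_rng(seed)
    # initialize centers from k randomly chosen images
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each image to its nearest style center
        dists = np.linalg.norm(features[:, None] - centers[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center from its current members
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# toy intensity histograms standing in for real style features
feats = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
print(kmeans(feats, k=2))
```

In practice the feature vectors would come from an image encoder or simple intensity/texture statistics rather than the toy two-dimensional histograms shown here.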

Architecture

How Synth-SONAR combines multiple AI techniques for sonar image synthesis

Synth-SONAR Architecture

Three-Phase Pipeline

  • Phase 1: Data Acquisition

    Gather publicly available sonar images, S3 simulated images, and stylized sonar images. Create low-level descriptions manually or via GPT.

  • Phase 2: Training & Coarse Generation

    Use diffusion models with GPT-generated prompts to produce coarse sonar images. Fine-tune text transformers with LoRA for better alignment.

  • Phase 3: Fine-Grained Tuning

    Apply a second diffusion model to refine coarse images. Use VLM with domain-specific instructions for enhanced realism and details.

Documentation

Everything you need to get started with Synth-SONAR

Quick Start

Get started with Synth-SONAR in minutes

Installation

# Clone the repository
git clone https://github.com/Purushothaman-natarajan/Synth-SONAR.git
cd Synth-SONAR

# Create and activate the conda environment
conda env create -f environment.yaml
conda activate Synth-SONAR

# Download Stable Diffusion weights
# from https://huggingface.co/CompVis/stable-diffusion-v1-4-original
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt

Step 1: Style Injection

# Run with default configuration
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.75 --T 1.5

# High style fidelity
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.3 --T 1.5

Step 2: Fine-Tuning

# Standard fine-tuning
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DIR="path_to_your_dataset"

accelerate launch --mixed_precision="fp16" ./text_to_image/train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --use_ema --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 --gradient_accumulation_steps=4 \
  --gradient_checkpointing --max_train_steps=5000 \
  --learning_rate=1e-05 --output_dir="sd-sonar-model"
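For the LoRA-based route mentioned under Flexible Fine-Tuning, diffusers ships a companion training script. A sketch assuming the repository mirrors the standard `train_text_to_image_lora.py` layout (script path, flags, and output directory are illustrative and may differ in your checkout):

```shell
# LoRA fine-tuning: far fewer trainable parameters, fits on smaller GPUs
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DIR="path_to_your_dataset"

accelerate launch --mixed_precision="fp16" ./text_to_image/train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 --gradient_accumulation_steps=4 \
  --max_train_steps=5000 \
  --learning_rate=1e-04 \
  --output_dir="sd-sonar-lora"
```

Note that LoRA runs typically use a higher learning rate than full fine-tuning, since only the low-rank adapter weights are updated.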

Step 3: Inference

import torch
from diffusers import StableDiffusionPipeline

model_path = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")

image = pipe(prompt="A sonar image of an underwater scene").images[0]
image.save("sonar_image.png")
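The second diffusion stage (Phase 3) refines a coarse first-stage output. One way to sketch this with diffusers' img2img pipeline — the model path, prompt, and `strength` value here are illustrative assumptions, not the repository's exact settings:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# hypothetical path to the fine-tuned second-stage model
refiner = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path_to_fine_grained_model", torch_dtype=torch.float16
)
refiner.to("cuda")

# coarse image saved by the first-stage pipeline above
coarse = Image.open("sonar_image.png")

# low strength keeps the coarse layout while adding fine-grained texture
refined = refiner(
    prompt="A sonar image of an underwater scene, fine-grained speckle texture",
    image=coarse,
    strength=0.3,
).images[0]
refined.save("sonar_image_refined.png")
```

Lower `strength` values preserve more of the coarse image's structure; higher values let the refiner deviate further from it.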

Utility: Create Metadata

python create_metadata(data_to_json).py <image_folder>

Utility: Generate Captions

# Using GPT-3.5 Turbo
python generate_captions_GPT.py <json_file>

# Using LLaMA
python generate_captions_llama.py <jsonl_file> --batch_size 8

Citation

If you use Synth-SONAR in your research, please cite our paper:

@misc{natarajan2024synthsonar,
  title={Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting},
  author={Purushothaman Natarajan and Kamal Basha and Athira Nambiar},
  year={2024},
  eprint={2410.08612},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}