Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting
Powerful, flexible, and easy-to-use sonar image synthesis framework
Enhance diversity by blending stylistic elements from real sonar images into generated content using pre-trained diffusion models.
Two-phase generation process produces coarse images first, then refines them for fine-grained details and realism.
Generate high-quality text prompts using GPT models to guide the diffusion process for better text-image alignment.
Visual Language Models provide improved captions with domain-specific language instructions for realistic textures.
Support for both standard fine-tuning and LoRA-based techniques to adapt models with limited computational resources.
Built-in tools for metadata creation, caption generation, and style-based clustering of sonar images.
How Synth-SONAR combines multiple AI techniques for sonar image synthesis
Gather publicly available sonar images, S3 simulated images, and stylized sonar images. Create low-level descriptions manually or via GPT.
Use diffusion models with GPT-generated prompts to produce coarse sonar images. Fine-tune text transformers with LoRA for better alignment.
Apply a second diffusion model to refine coarse images. Use VLM with domain-specific instructions for enhanced realism and details.
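The GPT-prompting step in the pipeline above can be approximated offline with a simple template expander. The sketch below is purely illustrative (the actual pipeline queries a GPT model; `STYLE_HINTS` and `enrich_prompt` are hypothetical names): it turns a terse, low-level description into a richer text-to-image prompt.

```python
import random

# Hypothetical stand-in for the GPT prompt-enrichment step: instead of
# querying GPT-3.5 Turbo, apply a fixed set of sonar-style templates.
STYLE_HINTS = [
    "high-frequency side-scan sonar, speckle noise, acoustic shadows",
    "multibeam sonar rendering, grainy seabed texture",
]

def enrich_prompt(low_level_desc, seed=None):
    """Expand a terse description into a detailed diffusion prompt."""
    rng = random.Random(seed)
    hint = rng.choice(STYLE_HINTS)
    return f"A sonar image of {low_level_desc}, {hint}, realistic underwater scene"

print(enrich_prompt("a shipwreck on a sandy seabed", seed=0))
```

In the real pipeline, the low-level descriptions gathered in the first phase would be sent to GPT in batches, and the returned prompts fed to the coarse diffusion stage.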
Everything you need to get started with Synth-SONAR
Set up your development environment with conda and download required model weights.
Generate styled sonar images using Style-ID with Stable Diffusion.
Fine-tune diffusion models with standard training or LoRA for custom datasets.
Run inference with fine-tuned models to generate new sonar images.
Create training metadata and generate captions using GPT or LLaMA.
Cluster sonar images by stylistic features using PCA and K-Means.
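The style-based clustering step can be sketched with scikit-learn as follows. This is an illustrative stand-in: feature extraction here is plain pixel flattening, whereas the repository's tooling would operate on stylistic features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_images(images, n_components=8, n_clusters=3, seed=0):
    """Cluster images: flatten pixels, reduce with PCA, group with K-Means.

    images: array-like of shape (N, H, W); returns one cluster label per image.
    """
    flat = np.asarray(images, dtype=np.float64).reshape(len(images), -1)
    feats = PCA(n_components=n_components, random_state=seed).fit_transform(flat)
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    return km.fit_predict(feats)

# Toy demo: 30 random 16x16 "sonar" images
rng = np.random.default_rng(0)
labels = cluster_images(rng.random((30, 16, 16)))
print(labels)
```

Swapping the pixel-flattening step for embeddings from a pre-trained vision encoder would bring this closer to true style-based clustering.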
Get started with Synth-SONAR in minutes
# Clone the repository
git clone https://github.com/Purushothaman-natarajan/Synth-SONAR.git
cd Synth-SONAR
# Create and activate conda environment
conda env create -f environment.yaml
conda activate Synth-SONAR
# Download StableDiffusion weights
# From https://huggingface.co/CompVis/stable-diffusion-v1-4-original
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt
# (create the target directory first if it does not exist)
mkdir -p models/ldm/stable-diffusion-v1/
# Run with default configuration
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.75 --T 1.5
# Higher style fidelity (a lower gamma injects the style more strongly)
python run_styleid.py --cnt data/cnt --sty data/sty --gamma 0.3 --T 1.5
# Standard Fine-Tuning
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DIR="path_to_your_dataset"
accelerate launch --mixed_precision="fp16" ./text_to_image/train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$TRAIN_DIR \
--use_ema --resolution=512 --center_crop --random_flip \
--train_batch_size=1 --gradient_accumulation_steps=4 \
--gradient_checkpointing --max_train_steps=5000 \
--learning_rate=1e-05 --output_dir="sd-sonar-model"
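For the LoRA route, a hedged sketch of the equivalent launch is shown below. It assumes the layout of the Diffusers example script `train_text_to_image_lora.py`; flag names and defaults (e.g. `--rank`, the higher learning rate typical for LoRA) follow that script and may differ in this repository.

```shell
# LoRA fine-tuning (assumes the diffusers example script is available)
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export TRAIN_DIR="path_to_your_dataset"
accelerate launch --mixed_precision="fp16" ./text_to_image/train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 --gradient_accumulation_steps=4 \
  --max_train_steps=5000 --learning_rate=1e-04 \
  --rank=4 --output_dir="sd-sonar-lora"
```

LoRA trains only low-rank adapter weights, so it fits on GPUs that cannot hold the full fine-tuning run above.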
from diffusers import StableDiffusionPipeline
import torch

# Load the fine-tuned model
model_path = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")

# Generate and save a sonar image
image = pipe(prompt="A sonar image of an underwater scene").images[0]
image.save("sonar_image.png")
# Create training metadata (quotes needed: parentheses are special in the shell)
python "create_metadata(data_to_json).py" <image_folder>
# Using GPT-3.5 Turbo
python generate_captions_GPT.py <json_file>
# Using LLaMA
python generate_captions_llama.py <jsonl_file> --batch_size 8
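The metadata step can be approximated by the sketch below, which writes a `metadata.jsonl` pairing each image file with a placeholder caption. The field names (`file_name`, `text`) follow the Hugging Face `datasets` ImageFolder convention and are an assumption; the repository's script defines the actual schema.

```python
import json
from pathlib import Path

def create_metadata(image_folder, out_file="metadata.jsonl"):
    """Write one JSON line per image: {"file_name": ..., "text": ...}.

    Returns the number of images indexed.
    """
    folder = Path(image_folder)
    exts = {".png", ".jpg", ".jpeg"}
    count = 0
    with open(folder / out_file, "w") as f:
        for img in sorted(folder.iterdir()):
            if img.suffix.lower() in exts:
                record = {"file_name": img.name, "text": "A sonar image"}
                f.write(json.dumps(record) + "\n")
                count += 1
    return count
```

The placeholder `text` field is what the GPT or LLaMA captioning scripts above would subsequently replace with generated captions.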
If you use Synth-SONAR in your research, please cite our paper:
@misc{natarajan2024synthsonar,
title={Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting},
author={Purushothaman Natarajan and Kamal Basha and Athira Nambiar},
year={2024},
eprint={2410.08612},
archivePrefix={arXiv},
primaryClass={cs.CV}
}