DL-Studio Documentation

A comprehensive guide to understanding, setting up, and using DL-Studio for end-to-end machine learning and deep learning model development.

What is DL-Studio?

Understanding the core purpose and capabilities of DL-Studio

Local-First ML Platform

DL-Studio is a local development environment for building, training, and deploying machine learning and deep learning models. It runs entirely on your machine with no cloud dependencies, ensuring complete data privacy and control.

Unified Framework

Combines traditional ML algorithms (XGBoost, Random Forest, SVM) with deep learning models (MLP, RNN, Transformer) in a single, intuitive interface for easy model comparison and selection.

Built-in Explainability

Features integrated XAI (Explainable AI) capabilities including SHAP, LIME, sensitivity analysis, and correlation matrices to understand and interpret model decisions.

Experiment Tracking

Every training run is logged with metrics, visualizations, and artifacts. Compare models side-by-side and track performance improvements over time.

End-to-End Workflow

From raw data to deployed model in five simple steps

1. Data Upload & Analysis

Upload your CSV or Excel dataset. DL-Studio automatically analyzes the data, detects feature types, identifies missing values, and provides distribution insights.

What happens:

  • Automatic data type detection (numerical, categorical)
  • Missing value identification and reporting
  • Statistical summary generation
  • Target variable selection
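The analysis steps above can be approximated with pandas; the dataframe and column names here are illustrative placeholders, not DL-Studio internals:

```python
import pandas as pd

# Hypothetical uploaded dataset; DL-Studio reads this from a CSV/Excel file.
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "city": ["NY", "LA", "NY", "SF"],
    "price": [100.0, 150.0, 120.0, 180.0],
})

# Automatic data type detection (numerical vs. categorical)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Missing value identification and reporting
missing_report = df.isna().sum()

# Statistical summary generation
summary = df.describe()
```

The target variable would then be chosen from these detected columns depending on the task.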
2. Data Preprocessing

Clean and transform your data with built-in preprocessing pipelines. Handle missing values, encode categories, and scale features automatically.

Available transformations:

  • Missing value imputation (mean, median, mode, drop)
  • Categorical encoding (one-hot, label)
  • Feature scaling (standardization, normalization)
  • Outlier detection and removal
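A sketch of such a preprocessing pipeline with scikit-learn; the column names are placeholders, and DL-Studio's actual pipeline internals may differ:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column layout
numeric_cols = ["age", "income"]
categorical_cols = ["city"]

preprocess = ColumnTransformer([
    # Missing value imputation (median) + feature scaling (standardization)
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical encoding (one-hot); unseen categories are ignored at predict time
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

df = pd.DataFrame({
    "age": [25, np.nan, 41],
    "income": [40_000, 55_000, 70_000],
    "city": ["NY", "LA", "NY"],
})
X = preprocess.fit_transform(df)  # 2 scaled numeric cols + 2 one-hot cols
```

Fitting the transformer on the training split only (and reusing it on validation/test) avoids leaking statistics across splits.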
3. Model Selection & Training

Choose from 20+ ML/DL algorithms with configurable hyperparameters. Train with automatic 80/10/10 train/val/test split.

Training features:

  • One-click model training with smart defaults
  • Real-time training progress monitoring
  • Learning curve visualization
  • Live training logs streaming
  • Configurable hyperparameters per model
4. Model Evaluation & Comparison

Evaluate models using multiple metrics across train/val/test splits. Compare performance across different algorithms to find the best fit.

Evaluation metrics and tools (per split):

  • R² Score (train, validation, test)
  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Side-by-side model comparison charts
  • Residual analysis and diagnostics
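The per-split metrics map directly onto scikit-learn; the arrays below are illustrative values, not DL-Studio output:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative predictions for one split; DL-Studio computes these for
# train, validation, and test separately.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 6.9, 9.3]

r2 = r2_score(y_true, y_pred)                 # ≈ 0.9925
mae = mean_absolute_error(y_true, y_pred)     # 0.175
mse = mean_squared_error(y_true, y_pred)      # 0.0375
```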
5. Explainability & Research Plots

Generate SHAP, LIME, and sensitivity analyses. Export paper-quality visualizations for publication and comprehensive reports.

Export options:

  • Trained model serialization (Keras format)
  • Feature importance reports (PNG)
  • SHAP/LIME explanation plots
  • Research-quality plots (correlation, residuals, distributions)
  • Complete run artifacts and logs

Supported Algorithms

Comprehensive collection of machine learning and deep learning models

Boosting Ensemble Methods

Gradient boosting algorithms that build trees sequentially, correcting errors from previous iterations. Industry-standard for tabular data performance.

XGBoost

Extreme Gradient Boosting with regularization (L1/L2). Best for structured data competitions.

LightGBM

Leaf-wise tree growth. Faster training on large datasets with similar accuracy.

CatBoost

Native categorical handling with ordered boosting. Minimal preprocessing needed.

Gradient Boosting (sklearn)

Scikit-learn implementation. Slower but reliable for smaller datasets.

Tree-Based Models

Decision tree algorithms that split data based on feature values. Easy to interpret and fast to train.

Decision Tree

Single tree. Good baseline model, prone to overfitting.

Random Forest

Ensemble of decision trees with feature bagging. Reduces overfitting.

Extra Trees

Extremely randomized trees. Faster than Random Forest, often similar performance.

Support Vector Machines

SVMs find optimal hyperplanes to separate classes or fit regression lines. Effective in high-dimensional spaces.

SVM (RBF Kernel)

Radial Basis Function. Handles non-linear relationships.

SVM (Linear)

Linear kernel. Fast for high-dimensional sparse data.

Linear Models

Simple linear models that assume linear relationships between features and target. Fast, interpretable, and good baselines.

Linear Regression

Basic linear relationship. Fastest training, assumes linearity.

Ridge Regression

L2 regularization. Prevents overfitting with many features.

Lasso Regression

L1 regularization. Feature selection by shrinking coefficients to zero.

Deep Learning Models

Neural network architectures for learning complex patterns. Require more data but can capture non-linear relationships automatically.

MLP (Multi-Layer Perceptron)

Fully connected feedforward network. Universal approximator.

CNN (1-D Convolutional)

For sequential data with local patterns. Learns spatial hierarchies.

RNN / LSTM / GRU

For sequential data. Captures temporal dependencies.

Transformer

Attention-based. State-of-the-art for NLP and sequences.

Dataset Requirements

Recommended dataset sizes and guidelines for optimal results

Algorithm Type        Minimum Samples   Recommended Samples   Features     Best For
Linear Models         50                500+                  1-100        Small Data
Decision Tree         100               1,000+                1-50         Small Data
Random Forest         200               2,000+                1-200        Medium Data
SVM                   100               5,000+                1-10,000     Medium Data
XGBoost / LightGBM    500               10,000+               1-1,000      Large Data
CatBoost              500               10,000+               1-500        Large Data
MLP                   1,000             50,000+               1-10,000     Medium-Large

Good Data Quality Signs

No missing values or minimal (< 5%) missing data. Balanced classes for classification (within 10:1 ratio). Clean labels without typos or inconsistencies. Relevant features that correlate with target.

Data Quality Red Flags

High missing rates (> 20%) need careful imputation. Heavy class imbalance requires resampling or class weights. Data leakage where future info leaks into training. Outliers that may be errors or genuine extreme values.
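The first red flag above (high missing rates) is quick to check with pandas; the column names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "ok": [1, 2, 3, 4, 5],
    "mostly_missing": [None, None, 7.0, None, None],
})

# Share of missing values per column; > 20% warrants careful imputation
missing_rate = df.isna().mean()
flagged = missing_rate[missing_rate > 0.20].index.tolist()
```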

Explainable AI (XAI) Techniques

Built-in interpretability methods for understanding model decisions

SHAP (SHapley Additive exPlanations)

Based on game theory, SHAP calculates the marginal contribution of each feature to each prediction. Provides both global feature importance (overall model) and local explanations (individual predictions).

Theoretically grounded · Local accuracy guaranteed · Missingness handled · Global + Local views

LIME (Local Interpretable Model-agnostic Explanations)

LIME explains individual predictions by approximating the model locally with an interpretable model. Perturbs input data around the point of interest and weights samples by proximity.

Model-agnostic · Human-interpretable · Works on any model · Feature perturbation

Sensitivity Analysis

Examines how model predictions change when individual features are varied while keeping others constant. Creates feature response curves showing the relationship between each feature and output.

Intuitive visualization · Detects non-linearity · No model assumption · Domain expert friendly
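A one-at-a-time sensitivity sweep can be sketched as follows (an illustration under synthetic data, not DL-Studio's implementation):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def response_curve(model, X, feature, n_points=50):
    """Vary one feature over its observed range, holding others at their median."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), n_points)
    base = np.median(X, axis=0)
    probe = np.tile(base, (n_points, 1))
    probe[:, feature] = grid
    return grid, model.predict(probe)

grid, curve = response_curve(model, X, feature=0)  # the feature response curve
```

Plotting `curve` against `grid` gives the feature response curve described above; a non-flat, non-linear shape signals a non-linear learned relationship.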

Correlation Analysis

Computes Pearson correlation coefficients between all numerical features. Essential for understanding feature relationships, detecting multicollinearity, and feature selection.

Multicollinearity detection · Feature engineering hints · Heatmap visualization · Quick insights
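The correlation computation is one pandas call; the columns here are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with "a"
    "c": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with "a"
})
corr = df.corr(method="pearson")

# Multicollinearity check: off-diagonal |r| above a threshold
mask = ~np.eye(len(corr), dtype=bool)
multicollinear = corr.where(mask).abs() > 0.95
```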

Residual Analysis

Plots residuals (actual - predicted) to diagnose model fit. Reveals patterns that the model failed to capture, outliers, and heteroscedasticity.

Model diagnostic · Outlier detection · Assumption checking · Improvement guidance
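The residual diagnostics reduce to simple array arithmetic; the values below are illustrative:

```python
import numpy as np

# residual = actual - predicted
y_true = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
y_pred = np.array([9.5, 12.4, 9.2, 14.0, 11.1])
residuals = y_true - y_pred

# A well-fit model shows residuals centered near zero with no trend vs. y_pred;
# a funnel shape suggests heteroscedasticity, large |residuals| flag outliers.
mean_resid = residuals.mean()
outliers = np.abs(residuals - mean_resid) > 2 * residuals.std()
```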

Native Feature Importance

Built-in importance scores from tree-based models (Gini/MDI for sklearn, gain-based for XGBoost/LightGBM). Quick ranking of features by their contribution to splits.

Fast computation · No extra library needed · Tree-specific accuracy · Baseline comparison
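Native importances come for free after fitting a tree model; a sketch with synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 5 * X[:, 0] + X[:, 1]  # features 2 and 3 are pure noise

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Gini/MDI importances sum to 1; sorting gives the quick feature ranking
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first
```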

Key Features

What makes DL-Studio powerful for ML development

Easy Data Upload

Drag-and-drop CSV/Excel files. Auto-detection of columns and types.

Auto-Preprocessing

Automatic handling of missing values, encoding, and scaling.

Model Comparison

Train multiple models and compare metrics side-by-side across train/val/test splits.

Research Plots

Paper-quality visualizations for publications: regression, distributions, correlation, importance.

Experiment Tracking

Every run logged with parameters, metrics, and artifacts. History sidebar for comparing runs.

Built-in XAI

SHAP, LIME, Sensitivity Analysis, Correlation Matrix, Residual Analysis.

Configurable Hyperparameters

All models have customizable parameters via the Architecture tab. Click the gear icon.

Dynamic Architecture

Visual neural network diagram updates based on selected model and hyperparameters.

Easy Export

Download trained models, plots, and complete run reports.

Studio Tabs Guide

Understanding the seven main tabs in the DL-Studio workspace

Architecture
  Select model family, configure hyperparameters via the gear icon, and preview the dynamic neural network diagram.
  Key actions: choose model, set params, see architecture preview, set benchmark mode.

Training Hub
  Monitor training with live logs, loss curves, and MAE over epochs. Shows data split row counts.
  Key actions: view learning curves, MAE charts, real-time metrics, live log stream.

Verification
  Test the model with random samples. Auto-loads random data from the dataset range; randomize to get fresh samples.
  Key actions: randomize inputs, edit values, run prediction, view target outputs.

Split Results
  Comprehensive train/val/test metrics table with R², MAE, MSE, and RMSE per split, plus quality and overfit badges.
  Key actions: sort by any metric, radar chart comparison, best-per-split cards, fit status.

Benchmark
  Side-by-side comparison across all trained algorithms, with a top-3 podium and grouped bar charts.
  Key actions: compare R² and MAE by split, full leaderboard, winner highlight.

Intelligence
  SHAP feature importance, LIME rules, sensitivity analysis curves, and correlation matrix.
  Key actions: view explanations, understand feature contributions, analyze residuals.

Research Plots
  Publication-quality charts: regression scatter, importance bar, correlation heatmap, distributions.
  Key actions: export PNG, customize colors, paper-ready visualizations.

Run Management

Track, compare, and manage your training experiments

Run History

Every training run is saved with full metadata: parameters, metrics, model artifacts, and logs. Browse past runs via the History sidebar or Run Manager panel. Load any run by ID to restore its state and make new predictions.

Delete Runs

Delete Active: Remove the currently loaded run and all its artifacts. Clear All: Remove all training history at once (with confirmation). Use the cleanup buttons in the Run Manager panel to free disk space and keep your workspace organized.

Prediction Verification

The Verification tab auto-loads a random sample from your dataset with values within each feature's actual min/max range. Click Randomize to get a fresh sample. Modify any value and click Run Prediction to see the model's output. This helps you understand how individual features affect predictions.
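The random-sample behaviour can be sketched as uniform draws within each numeric feature's observed range (an illustration, not the actual implementation):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()

def random_sample(df: pd.DataFrame) -> dict:
    """Draw one value per numeric feature, uniformly within its min/max range."""
    return {
        col: float(rng.uniform(df[col].min(), df[col].max()))
        for col in df.select_dtypes(include="number").columns
    }

# Hypothetical dataset; DL-Studio uses the uploaded file's actual ranges
df = pd.DataFrame({"age": [18, 65, 40], "income": [20e3, 90e3, 50e3]})
sample = random_sample(df)
```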

Data Split Strategy

Automatic 80/10/10 train/validation/test split for robust model evaluation

Training Set (80%)

Used to train all models including traditional ML and deep learning. The model learns patterns from this data.

Validation Set (10%)

Used for hyperparameter tuning, early stopping, and model selection. Prevents overfitting by monitoring validation loss during training.

Test Set (10%)

Held-out data for final model evaluation. Provides unbiased performance estimate. Benchmark results show metrics for all three splits.
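The 80/10/10 split can be reproduced with two chained `train_test_split` calls (a sketch, not DL-Studio's exact code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000, dtype=float)

# First carve off 20%, then split that 20% half-and-half into val/test
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42
)
# Result: 800 train / 100 validation / 100 test rows
```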

Benchmark Metrics by Split

The Benchmark tab displays R², MAE, and MSE for each model across all three splits. This helps identify overfitting (high train R², low test R²) and underfitting (low R² across all splits). Best practice: test R² should be within 5% of validation R².

Configurable Hyperparameters

Click the ⚙️ icon next to any model to configure its parameters

XGBoost / LightGBM

Trees: Number of trees (50-500)
Max Depth: Tree depth limit (3-12)
Learning Rate: Step size (0.01-0.3)
Subsample: Row sampling (0.5-1.0)
Colsample: Feature sampling (0.5-1.0)

Neural Networks (MLP)

Hidden Layers: Number of layers (1-5)
Neurons: Units per layer (8-512)
Activation: ReLU, Tanh, or Sigmoid
Dropout: Regularization rate (0-0.5)

LSTM / GRU

Units: Memory units (32-256)
Layers: Recurrent layers (1-3)
Bidirectional: Forward + backward (LSTM only)

Transformer

Attention Heads: Parallel attention (1-8)
Layers: Transformer blocks (1-4)
FFN Dimension: Feed-forward size (64-512)

When to Use DL-Studio

Ideal use cases and scenarios

Best For

Tabular data analysis with structured datasets. Quick prototyping to test multiple algorithms. Explainability requirements needing SHAP/LIME. Local development without cloud dependencies.

Not Ideal For

Very large datasets (> 1M rows) may need distributed computing. Real-time inference requiring low-latency APIs. Complex NLP/Vision requiring state-of-the-art transformers. Production pipelines needing CI/CD integration.