What is DL-Studio?
Understanding the core purpose and capabilities of DL-Studio
Local-First ML Platform
DL-Studio is a local development environment for building, training, and deploying machine learning and deep learning models. It runs entirely on your machine with no cloud dependencies, ensuring complete data privacy and control.
Unified Framework
Combines traditional ML algorithms (XGBoost, Random Forest, SVM) with deep learning models (MLP, RNN, Transformer) in a single, intuitive interface for easy model comparison and selection.
Built-in Explainability
Features integrated XAI (Explainable AI) capabilities including SHAP, LIME, sensitivity analysis, and correlation matrices to understand and interpret model decisions.
Experiment Tracking
Every training run is logged with metrics, visualizations, and artifacts. Compare models side-by-side and track performance improvements over time.
End-to-End Workflow
From raw data to deployed model in five simple steps
Data Upload & Analysis
Upload your CSV or Excel dataset. DL-Studio automatically analyzes the data, detects feature types, identifies missing values, and provides distribution insights.
What happens:
- Automatic data type detection (numerical, categorical)
- Missing value identification and reporting
- Statistical summary generation
- Target variable selection
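The upload-time analysis above can be approximated in a few lines of pandas. This is a sketch of the general technique, not DL-Studio's internal code; the column names are hypothetical:

```python
import pandas as pd

def analyze_dataset(df: pd.DataFrame) -> dict:
    """Rough equivalent of DL-Studio's upload-time analysis."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return {
        "numerical": numerical,                # detected numeric features
        "categorical": categorical,            # detected categorical features
        "missing": df.isna().sum().to_dict(),  # missing values per column
        "summary": df.describe().to_dict(),    # statistical summary
    }

# Tiny example frame with one missing value.
df = pd.DataFrame({"age": [25, 30, None], "city": ["NY", "LA", "NY"]})
report = analyze_dataset(df)
```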
Data Preprocessing
Clean and transform your data with built-in preprocessing pipelines. Handle missing values, encode categories, and scale features automatically.
Available transformations:
- Missing value imputation (mean, median, mode, drop)
- Categorical encoding (one-hot, label)
- Feature scaling (standardization, normalization)
- Outlier detection and removal
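A preprocessing pipeline covering the transformations above can be sketched with scikit-learn. The column names are hypothetical and the strategy choices (median, mode) are just one configuration; DL-Studio builds its own pipeline internally:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: impute missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: impute with the mode, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric, ["age", "income"]),  # hypothetical numeric columns
    ("cat", categorical, ["city"]),       # hypothetical categorical column
])

df = pd.DataFrame({
    "age": [25.0, None, 40.0],
    "income": [50.0, 60.0, 55.0],
    "city": ["NY", "LA", "NY"],
})
X = preprocessor.fit_transform(df)  # 2 scaled numeric + 2 one-hot columns
```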
Model Selection & Training
Choose from 20+ ML/DL algorithms with configurable hyperparameters. Train with automatic 80/10/10 train/val/test split.
Training features:
- One-click model training with smart defaults
- Real-time training progress monitoring
- Learning curve visualization
- Live training logs streaming
- Configurable hyperparameters per model
Model Evaluation & Comparison
Evaluate models using multiple metrics across train/val/test splits. Compare performance across different algorithms to find the best fit.
Evaluation metrics (per split):
- R² Score (train, validation, test)
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Side-by-side model comparison charts
- Residual analysis and diagnostics
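The per-split metrics can be reproduced with `sklearn.metrics`. A minimal sketch, with toy arrays standing in for one split's actual and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate_split(y_true, y_pred):
    """Compute the metrics DL-Studio reports for each split."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
    }

metrics = evaluate_split([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])
```

Running `evaluate_split` once per split (train, validation, test) yields the table shown in the Split Results tab.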
Explainability & Research Plots
Generate SHAP, LIME, and sensitivity analyses. Export paper-quality visualizations for publication and comprehensive reports.
Export options:
- Trained model serialization (Keras format)
- Feature importance reports (PNG)
- SHAP/LIME explanation plots
- Research-quality plots (correlation, residuals, distributions)
- Complete run artifacts and logs
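Exported models follow standard serialization conventions. As a sketch of the idea (not DL-Studio's exact export code): a scikit-learn model can be round-tripped with pickle, while deep learning models are saved in the Keras format via `model.save(...)`:

```python
import pickle
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 2.0, 4.0])

# Serialize the trained model to bytes and restore it.
# (DL-Studio saves deep learning models with Keras's model.save() instead.)
blob = pickle.dumps(model)
restored = pickle.loads(blob)
prediction = float(restored.predict([[3.0]])[0])
```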
Supported Algorithms
Comprehensive collection of machine learning and deep learning models
Boosting Ensemble Methods
Gradient boosting algorithms that build trees sequentially, each tree correcting the errors of the previous ones. The industry standard for strong performance on tabular data.
XGBoost
Extreme Gradient Boosting with regularization (L1/L2). Best for structured data competitions.
LightGBM
Leaf-wise tree growth. Faster training on large datasets with similar accuracy.
CatBoost
Native categorical handling with ordered boosting. Minimal preprocessing needed.
Gradient Boosting (sklearn)
Scikit-learn implementation. Slower but reliable for smaller datasets.
Tree-Based Models
Decision tree algorithms that split data based on feature values. Easy to interpret and fast to train.
Decision Tree
Single tree. Good baseline model, prone to overfitting.
Random Forest
Ensemble of decision trees with feature bagging. Reduces overfitting.
Extra Trees
Extremely randomized trees. Faster than Random Forest, often similar performance.
Support Vector Machines
SVMs find optimal hyperplanes to separate classes or fit regression lines. Effective in high-dimensional spaces.
SVM (RBF Kernel)
Radial Basis Function. Handles non-linear relationships.
SVM (Linear)
Linear kernel. Fast for high-dimensional sparse data.
Linear Models
Simple linear models that assume linear relationships between features and target. Fast, interpretable, and good baselines.
Linear Regression
Basic linear relationship. Fastest training, assumes linearity.
Ridge Regression
L2 regularization. Prevents overfitting with many features.
Lasso Regression
L1 regularization. Feature selection by shrinking coefficients to zero.
Deep Learning Models
Neural network architectures for learning complex patterns. Require more data but can capture non-linear relationships automatically.
MLP (Multi-Layer Perceptron)
Fully connected feedforward network. Universal approximator.
CNN (1-D Convolutional)
For sequential data with local patterns. Learns spatial hierarchies.
RNN / LSTM / GRU
For sequential data. Captures temporal dependencies.
Transformer
Attention-based. State-of-the-art for NLP and sequences.
Dataset Requirements
Recommended dataset sizes and guidelines for optimal results
| Algorithm Type | Minimum Samples | Recommended Samples | Features | Best For |
|---|---|---|---|---|
| Linear Models | 50 | 500+ | 1-100 | Small Data |
| Decision Tree | 100 | 1,000+ | 1-50 | Small Data |
| Random Forest | 200 | 2,000+ | 1-200 | Medium Data |
| SVM | 100 | 5,000+ | 1-10,000 | Medium Data |
| XGBoost / LightGBM | 500 | 10,000+ | 1-1,000 | Large Data |
| CatBoost | 500 | 10,000+ | 1-500 | Large Data |
| MLP | 1,000 | 50,000+ | 1-10,000 | Medium-Large |
Good Data Quality Signs
Little or no missing data (under 5%). Balanced classes for classification (within a 10:1 ratio). Clean, consistent labels without typos. Relevant features that correlate with the target.
Data Quality Red Flags
High missing rates (over 20%) need careful imputation. Heavy class imbalance requires resampling or class weights. Data leakage, where future information leaks into the training data. Outliers that may be errors or genuine extreme values.
Explainable AI (XAI) Techniques
Built-in interpretability methods for understanding model decisions
SHAP (SHapley Additive exPlanations)
Based on game theory, SHAP calculates the marginal contribution of each feature to each prediction. Provides both global feature importance (overall model) and local explanations (individual predictions).
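The core idea can be illustrated without the shap library: for a small number of features, Shapley values can be computed exactly by enumerating feature coalitions, with "absent" features replaced by a background reference value. For a linear model the result matches the closed form w_i(x_i − μ_i). This is an illustrative sketch of the game-theoretic definition, not how the shap package is implemented:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background, d):
    """Exact Shapley values by coalition enumeration.

    f: model mapping a feature vector to a prediction
    x: the instance to explain
    background: reference values used for features outside the coalition
    """
    def value(subset):
        # Features in `subset` keep their real value; others use the background.
        z = [x[i] if i in subset else background[i] for i in range(d)]
        return f(z)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                # Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi

# Linear model: the Shapley value of feature i is w_i * (x_i - mu_i).
w = [2.0, -1.0, 0.5]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = shapley_values(f, x=[1.0, 2.0, 3.0], background=[0.0, 0.0, 0.0], d=3)
```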
LIME (Local Interpretable Model-agnostic Explanations)
LIME explains individual predictions by approximating the model locally with an interpretable model. Perturbs input data around the point of interest and weights samples by proximity.
Sensitivity Analysis
Examines how model predictions change when individual features are varied while keeping others constant. Creates feature response curves showing the relationship between each feature and output.
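The feature response curves come from a simple procedure: sweep one feature across its observed range while pinning the others at a reference value. A minimal sketch, using the feature means as the reference (DL-Studio's exact choice of reference may differ):

```python
import numpy as np

def sensitivity_curve(model_fn, X, feature_idx, n_points=20):
    """Vary one feature over its observed range, hold others at their mean."""
    base = X.mean(axis=0)
    lo, hi = X[:, feature_idx].min(), X[:, feature_idx].max()
    grid = np.linspace(lo, hi, n_points)
    responses = []
    for v in grid:
        point = base.copy()
        point[feature_idx] = v
        responses.append(model_fn(point))
    return grid, np.array(responses)

# Toy model: the prediction depends quadratically on feature 0 only.
X = np.array([[0.0, 5.0], [1.0, 6.0], [2.0, 7.0]])
grid, resp = sensitivity_curve(lambda p: p[0] ** 2, X, feature_idx=0)
```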
Correlation Analysis
Computes Pearson correlation coefficients between all numerical features. Essential for understanding feature relationships, detecting multicollinearity, and feature selection.
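Computing the matrix itself is a one-liner with numpy (or `df.corr()` in pandas). A sketch with three toy features, two of them perfectly collinear:

```python
import numpy as np

# f1 and f2 are perfectly correlated; f3 is anti-correlated with f1.
f1 = np.array([1.0, 2.0, 3.0, 4.0])
f2 = 2.0 * f1 + 1.0   # Pearson r = +1 with f1
f3 = -f1              # Pearson r = -1 with f1

corr = np.corrcoef([f1, f2, f3])  # 3x3 Pearson correlation matrix
# |r| close to 1 between two features signals multicollinearity.
```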
Residual Analysis
Plots residuals (actual - predicted) to diagnose model fit. Reveals patterns that the model failed to capture, outliers, and heteroscedasticity.
Native Feature Importance
Built-in importance scores from tree-based models (Gini/MDI for sklearn, gain-based for XGBoost/LightGBM). Quick ranking of features by their contribution to splits.
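Tree models expose these scores directly; in scikit-learn they are the `feature_importances_` attribute. A sketch with synthetic data where only the first feature is informative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=300)  # only feature 0 matters

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
importances = model.feature_importances_  # impurity-based (MDI), sums to 1
ranked = np.argsort(importances)[::-1]    # feature indices, most important first
```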
Key Features
What makes DL-Studio powerful for ML development
Easy Data Upload
Drag-and-drop CSV/Excel files. Auto-detection of columns and types.
Auto-Preprocessing
Automatic handling of missing values, encoding, and scaling.
Model Comparison
Train multiple models and compare metrics side-by-side across train/val/test splits.
Research Plots
Paper-quality visualizations for publications: regression, distributions, correlation, importance.
Experiment Tracking
Every run logged with parameters, metrics, and artifacts. History sidebar for comparing runs.
Built-in XAI
SHAP, LIME, Sensitivity Analysis, Correlation Matrix, Residual Analysis.
Configurable Hyperparameters
All models have customizable parameters via the Architecture tab. Click the gear icon.
Dynamic Architecture
Visual neural network diagram updates based on selected model and hyperparameters.
Easy Export
Download trained models, plots, and complete run reports.
Studio Tabs Guide
Understanding the seven main tabs in the DL-Studio workspace
| Tab | Description | Key Actions |
|---|---|---|
| Architecture | Select model family, configure hyperparameters via gear icon, preview dynamic neural network diagram | Choose model, set params, see architecture preview, set benchmark mode |
| Training Hub | Monitor training with live logs, loss curves, and MAE over epochs. Shows data split row counts | View learning curves, MAE charts, real-time metrics, live log stream |
| Verification | Test model with random samples. Auto-loads random data from dataset range. Randomize to get fresh samples | Randomize inputs, edit values, run prediction, view target outputs |
| Split Results | Comprehensive Train/Val/Test metrics table. All metrics: R², MAE, MSE, RMSE per split. Quality & overfit badges | Sort by any metric, radar chart comparison, best-per-split cards, fit status |
| Benchmark | Side-by-side model comparison across all trained algorithms. Top 3 podium, grouped bar charts | Compare R², MAE by split, full leaderboard, winner highlight |
| Intelligence | SHAP feature importance, LIME rules, sensitivity analysis curves, correlation matrix | View explanations, understand feature contributions, analyze residuals |
| Research Plots | Publication-quality charts: regression scatter, importance bar, correlation heatmap, distributions | Export PNG, customize colors, paper-ready visualizations |
Run Management
Track, compare, and manage your training experiments
Run History
Every training run is saved with full metadata: parameters, metrics, model artifacts, and logs. Browse past runs via the History sidebar or Run Manager panel. Load any run by ID to restore its state and make new predictions.
Delete Runs
Delete Active: Remove the currently loaded run and all its artifacts. Clear All: Remove all training history at once (with confirmation). Use the cleanup buttons in the Run Manager panel to free disk space and keep your workspace organized.
Prediction Verification
The Verification tab auto-loads a random sample from your dataset with values within each feature's actual min/max range. Click Randomize to get a fresh sample. Modify any value and click Run Prediction to see the model's output. This helps you understand how individual features affect predictions.
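The Randomize behavior, drawing each feature uniformly within its observed min/max, can be sketched as follows (the feature names and ranges here are hypothetical):

```python
import random

def random_sample(feature_ranges, seed=None):
    """Draw one synthetic sample with each feature inside its observed range."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in feature_ranges.items()}

# Hypothetical per-feature (min, max) ranges taken from a dataset.
ranges = {"age": (18.0, 90.0), "income": (20_000.0, 250_000.0)}
sample = random_sample(ranges, seed=42)
```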
Data Split Strategy
Automatic 80/10/10 train/validation/test split for robust model evaluation
Training Set (80%)
Used to train all models including traditional ML and deep learning. The model learns patterns from this data.
Validation Set (10%)
Used for hyperparameter tuning, early stopping, and model selection. Prevents overfitting by monitoring validation loss during training.
Test Set (10%)
Held-out data for final model evaluation. Provides unbiased performance estimate. Benchmark results show metrics for all three splits.
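The 80/10/10 split can be produced with two calls to scikit-learn's `train_test_split` (a sketch of the standard technique; DL-Studio's exact seeding and shuffling may differ):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.arange(100)

# First split off 20% as a temporary holdout, then halve it into val/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)
```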
Benchmark Metrics by Split
The Benchmark tab displays R², MAE, and MSE for each model across all three splits. This helps identify overfitting (high train R², low test R²) and underfitting (low R² across all splits). Best practice: test R² should be within 5% of validation R².
Configurable Hyperparameters
Click the ⚙️ icon next to any model to configure its parameters
XGBoost / LightGBM
Trees: Number of trees (50-500)
Max Depth: Tree depth limit (3-12)
Learning Rate: Step size (0.01-0.3)
Subsample: Row sampling (0.5-1.0)
Colsample: Feature sampling (0.5-1.0)
Neural Networks (MLP)
Hidden Layers: Number of layers (1-5)
Neurons: Units per layer (8-512)
Activation: ReLU, Tanh, or Sigmoid
Dropout: Regularization rate (0-0.5)
LSTM / GRU
Units: Memory units (32-256)
Layers: Recurrent layers (1-3)
Bidirectional: Forward + backward (LSTM only)
Transformer
Attention Heads: Parallel attention (1-8)
Layers: Transformer blocks (1-4)
FFN Dimension: Feed-forward size (64-512)
When to Use DL-Studio
Ideal use cases and scenarios
Best For
Tabular data analysis with structured datasets. Quick prototyping to test multiple algorithms. Explainability requirements needing SHAP/LIME. Local development without cloud dependencies.
Not Ideal For
Very large datasets (> 1M rows) may need distributed computing. Real-time inference requiring low-latency APIs. Complex NLP/Vision requiring state-of-the-art transformers. Production pipelines needing CI/CD integration.