System Architecture
Understanding Doctane's modular pipeline design
Processing Pipeline
📄 Input: Images & PDFs
🔄 Preprocess: Orientation & Straighten
🎯 Detect: Text Boxes
📝 Recognize: Text Transcription
📦 Output: Document Object
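The stage order above can be pictured as a chain of functions, each taking the previous stage's output. The following is only a minimal sketch of that idea: the stage functions, the dictionary keys, and `run_pipeline` are hypothetical placeholders, not Doctane's actual API.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]

def run_pipeline(stages: List[Stage], data: Dict[str, Any]) -> Dict[str, Any]:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)

# Hypothetical stand-ins for the real stages; each returns a new dict.
def preprocess(page):
    return {**page, "straightened": True}

def detect(page):
    return {**page, "boxes": [(0, 0, 10, 10)]}

def recognize(page):
    return {**page, "words": ["text"] * len(page["boxes"])}

def build_output(page):
    return {"source": page["source"], "words": page["words"]}

result = run_pipeline(
    [preprocess, detect, recognize, build_output],
    {"source": "scan.png"},
)
```

Keeping each stage as a plain callable is one way to make stages swappable, for example replacing only the detection step while leaving the rest untouched.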
🖼️ Preprocessor
Handles image preprocessing including orientation detection and page straightening.
- Orientation Detection ✓
- Auto-deskew ✓
- Image Enhancement ✓
🎯 Detection Module
Segmentation-based text detection with support for straight and rotated text.
- LinkNet ✓
- DeepLabV3+ ✓
- SegFormer ✓
📝 Recognition Module
Sequence-based text recognition, including attention-based models.
- SAR ✓
- ViTSTR ✓
- CRNN ✓
🔧 Builder Module
Constructs hierarchical Document objects with structure preservation.
- Page ✓
- Block ✓
- Line/Word ✓
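To make the hierarchy concrete, here is a minimal sketch of how nested Page, Block, Line, and Word objects might be modeled. The field names (`text`, `confidence`, `bbox`, `index`) and the `text()` helper are assumptions for illustration; the real Document class may look different.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]

@dataclass
class Word:
    text: str
    confidence: float
    bbox: BBox

@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)

@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)

@dataclass
class Page:
    index: int
    blocks: List[Block] = field(default_factory=list)

@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)

    def text(self) -> str:
        """Join every line of every block of every page, one line per row."""
        return "\n".join(
            line.text
            for page in self.pages
            for block in page.blocks
            for line in block.lines
        )

doc = Document(pages=[Page(index=0, blocks=[Block(lines=[
    Line(words=[
        Word("Hello", 0.98, (10, 10, 60, 30)),
        Word("world", 0.91, (70, 10, 130, 30)),
    ]),
    Line(words=[Word("Next", 0.88, (10, 40, 50, 60))]),
])])])
```

Calling `doc.text()` walks the tree top-down, which is why the nesting order matters for preserving reading order.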
⚡ Training Module
Distributed training support with DDP for multi-GPU setups.
- DDP Support ✓
- Checkpointing ✓
- Logging ✓
🌐 Export Module
Multiple output formats for downstream processing.
- JSON ✓
- hOCR ✓
- PDF ○
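As an illustration of the two supported formats, the sketch below serializes a list of recognized words to JSON and to a single hOCR line. The word dictionaries and both function names are hypothetical; only the `ocr_line` and `ocrx_word` classes and the `bbox` and `x_wconf` properties come from the hOCR format itself.

```python
import json
from html import escape

# Hypothetical recognized words: text, pixel box, confidence in [0, 1].
words = [
    {"text": "Hello", "bbox": [10, 10, 60, 30], "confidence": 0.98},
    {"text": "R&D", "bbox": [70, 10, 130, 30], "confidence": 0.91},
]

def to_json(words):
    """Serialize the words as a JSON object, keeping non-ASCII text readable."""
    return json.dumps({"words": words}, ensure_ascii=False)

def to_hocr_line(words):
    """Render the words as one hOCR line made of ocrx_word spans."""
    spans = []
    for w in words:
        x0, y0, x1, y1 = w["bbox"]
        conf = round(w["confidence"] * 100)  # x_wconf is a percentage
        spans.append(
            f"<span class='ocrx_word' title='bbox {x0} {y0} {x1} {y1}; x_wconf {conf}'>"
            f"{escape(w['text'])}</span>"
        )
    return "<span class='ocr_line'>" + " ".join(spans) + "</span>"

json_output = to_json(words)
hocr_output = to_hocr_line(words)
```

Escaping the text matters for hOCR because it is HTML: an unescaped `&` or `<` in a transcription would produce invalid markup.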
Data Flow Architecture
Image Input
Input files (PNG, JPG, PDF) are loaded and converted to NumPy arrays
Preprocessing
OrientationPredictor detects page tilt; OrientationCorrector straightens the page if needed
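How tilt is detected is not specified here. As one possible illustration of the detect-then-straighten split, the sketch below fits a least-squares line through points along a text baseline to estimate the skew angle, then rotates the points back only when the skew exceeds a threshold. All function names and the `min_skew` parameter are assumptions, and it works on point coordinates rather than pixels to stay short.

```python
import math

def estimate_skew_degrees(points):
    """Angle of the least-squares line through the points, in degrees."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan2(sxy, sxx))

def rotate_points(points, degrees):
    """Rotate the points about the origin by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def straighten(points, min_skew=0.5):
    """Undo the estimated skew, but only if it is large enough to matter."""
    skew = estimate_skew_degrees(points)
    if abs(skew) < min_skew:
        return points, skew
    return rotate_points(points, -skew), skew

# Points along a baseline tilted by 5 degrees.
tilted = [(x, x * math.tan(math.radians(5))) for x in range(0, 100, 10)]
fixed, skew = straighten(tilted)
```

The threshold avoids resampling pages that are already nearly straight, which mirrors the "if needed" condition in the step above.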
Text Detection
DetectionPredictor runs a segmentation model to produce a binary mask of text regions
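A segmentation model typically outputs a per-pixel text probability, which is then thresholded into a binary mask. A minimal sketch of that last step follows; the `binarize` name and the 0.5 default threshold are assumptions, and plain nested lists stand in for arrays.

```python
def binarize(prob_map, threshold=0.5):
    """Mark a pixel as text (1) when its probability reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]

# Hypothetical 3x4 probability map from a detection model.
prob_map = [
    [0.05, 0.80, 0.90, 0.10],
    [0.02, 0.75, 0.60, 0.20],
    [0.01, 0.10, 0.49, 0.50],
]
mask = binarize(prob_map)
```

Raising the threshold shrinks the detected regions and lowers false positives; lowering it does the opposite.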
Crop Extraction
The geometry module extracts regions of interest (ROIs) from the contours of the detection mask
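One simple way to turn a binary mask into crops is to label connected regions and take the bounding box of each. The sketch below uses a breadth-first flood fill in place of a true contour tracer, so it only handles axis-aligned boxes; `find_boxes` and `crop` are hypothetical names, not the geometry module's functions.

```python
from collections import deque

def find_boxes(mask):
    """Return (x_min, y_min, x_max, y_max) per 4-connected region, max exclusive."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill this region while tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((c0, r0, c1 + 1, r1 + 1))
    return boxes

def crop(image, box):
    """Slice the rows and columns covered by the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
boxes = find_boxes(mask)
crops = [crop(mask, box) for box in boxes]
```

Rotated text would need an oriented box (for example a minimum-area rectangle) instead of the axis-aligned box used here.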
Text Recognition
RecognitionPredictor transcribes each crop to text using a trained model
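How a crop becomes text depends on the model. CRNN-style recognizers are commonly paired with CTC decoding, so the sketch below shows greedy CTC decoding as one example; whether Doctane's CRNN uses CTC is an assumption, and attention-based models such as SAR decode differently. The function name, charset, and scores are illustrative.

```python
BLANK = 0  # index reserved for the CTC blank symbol

def ctc_greedy_decode(scores, charset):
    """Pick the best class per time step, collapse repeats, drop blanks.

    scores:  one list of class scores per time step; class 0 is the blank.
    charset: characters for classes 1..N, so class i maps to charset[i - 1].
    """
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    chars = []
    previous = BLANK
    for index in best:
        if index != BLANK and index != previous:
            chars.append(charset[index - 1])
        previous = index
    return "".join(chars)

charset = "act"  # class 1 = "a", class 2 = "c", class 3 = "t"
scores = [
    [0.10, 0.10, 0.70, 0.10],  # c
    [0.10, 0.10, 0.60, 0.20],  # c (repeat, collapsed)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.70, 0.10, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.20, 0.10, 0.10, 0.60],  # t (repeat, collapsed)
]
text = ctc_greedy_decode(scores, charset)
```

The blank symbol is what lets the decoder emit genuine double letters: two identical classes separated by a blank are kept as two characters.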
Document Assembly
DocumentBuilder assembles the Page → Block → Line → Word hierarchy, with confidence scores
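The assembly step can be sketched as grouping words into lines by vertical position and then summarizing each line. This is only an illustration under stated assumptions: the word dictionaries, the `y_tolerance` parameter, and averaging word confidences into a line confidence are all hypothetical choices, not DocumentBuilder's documented behavior.

```python
def group_into_lines(words, y_tolerance=5):
    """Group words whose top edges lie within y_tolerance pixels of a line's first word."""
    lines = []
    for word in sorted(words, key=lambda w: (w["bbox"][1], w["bbox"][0])):
        for line in lines:
            if abs(line[0]["bbox"][1] - word["bbox"][1]) <= y_tolerance:
                line.append(word)
                break
        else:
            lines.append([word])
    # Order the words of each line left to right.
    return [sorted(line, key=lambda w: w["bbox"][0]) for line in lines]

def line_summary(line):
    """Join the words and average their confidences."""
    return {
        "text": " ".join(w["text"] for w in line),
        "confidence": sum(w["confidence"] for w in line) / len(line),
    }

# Hypothetical recognized words, deliberately out of reading order.
words = [
    {"text": "world", "bbox": (70, 12, 130, 32), "confidence": 0.90},
    {"text": "Next", "bbox": (10, 40, 50, 60), "confidence": 0.88},
    {"text": "Hello", "bbox": (10, 10, 60, 30), "confidence": 0.98},
]
lines = group_into_lines(words)
summaries = [line_summary(line) for line in lines]
```

Grouping lines into blocks and blocks into pages could follow the same pattern with larger spatial tolerances.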