System Architecture

Understanding Doctane's modular pipeline design

Images & PDFs → Orientation & Straighten → Text Boxes → Text Transcription → Document Object
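
As a rough illustration of the modular chain above, the stages could be modeled as interchangeable callables run in order. This is only a sketch: `Pipeline`, `make_stage`, and `build_pipeline` are hypothetical names invented for this example, not part of Doctane's API, and the placeholder stages only record that they ran.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage type: takes the current state and returns the next one.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Runs named stages in order, feeding each output to the next stage."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def names(self) -> List[str]:
        return [name for name, _ in self.stages]

    def run(self, state: Dict[str, Any]) -> Dict[str, Any]:
        for _name, stage in self.stages:
            state = stage(state)
        return state


def make_stage(label: str) -> Stage:
    """Placeholder stage (an assumption): appends its label to a trace list."""
    def stage(state: Dict[str, Any]) -> Dict[str, Any]:
        return {**state, "trace": state["trace"] + [label]}
    return stage


def build_pipeline() -> Pipeline:
    # Stage names mirror the diagram; real stages would wrap actual models.
    return (
        Pipeline()
        .add("orientation", make_stage("orientation"))
        .add("text_boxes", make_stage("text_boxes"))
        .add("transcription", make_stage("transcription"))
        .add("document", make_stage("document"))
    )


result = build_pipeline().run({"source": "scan.png", "trace": []})
print(result["trace"])
```

Because every stage shares one signature, a stage can be swapped or skipped without touching the others, which is the main benefit of a modular design of this kind.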

Handles image preprocessing, including orientation detection and page straightening.

Segmentation-based text detection with support for straight and rotated text.

Sequence-to-sequence text recognition with attention mechanisms.

Constructs hierarchical Document objects with structure preservation.

Distributed training support with DDP for multi-GPU setups.

Multiple output formats for downstream processing.
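
To make the hierarchical Document object and the idea of multiple output formats more concrete, here is a minimal sketch using plain dataclasses. The class names, fields, and the `to_text` and `to_json` methods are assumptions made for illustration; Doctane's actual classes and export formats may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Word:
    text: str
    confidence: float  # assumed to be a score in [0, 1]


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)

    def to_text(self) -> str:
        """Plain text: words joined by spaces, lines by newlines,
        blocks and pages by blank lines."""
        page_texts = []
        for page in self.pages:
            block_texts = [
                "\n".join(" ".join(w.text for w in line.words) for line in block.lines)
                for block in page.blocks
            ]
            page_texts.append("\n\n".join(block_texts))
        return "\n\n".join(page_texts)

    def to_json(self) -> str:
        """JSON export that keeps the full hierarchy and confidences."""
        return json.dumps(asdict(self))


doc = Document(pages=[Page(blocks=[Block(lines=[
    Line(words=[Word("Hello", 0.98), Word("world", 0.95)]),
    Line(words=[Word("Invoice", 0.91), Word("42", 0.88)]),
])])])

print(doc.to_text())
```

Keeping the hierarchy as nested data and deriving each export from it means a new output format only needs one more method, with no change to the upstream stages.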

Raw inputs (PNG, JPG, PDF) are loaded and converted to NumPy arrays.

OrientationPredictor detects page tilt; OrientationCorrector straightens the page if needed.

DetectionPredictor runs the segmentation model to produce a binary mask of text regions.

The Geometry module extracts regions of interest (ROIs) from the contours of the detection mask.

RecognitionPredictor transcribes each cropped region to text using a trained recognition model.

DocumentBuilder assembles the Page → Block → Line → Word hierarchy, with confidence scores.
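
The step from a binary detection mask to cropped regions can be sketched in a few lines. The example below is a simplified stand-in, not Doctane's Geometry module: it finds 4-connected regions with a flood fill and returns axis-aligned bounding boxes, whereas a contour-based implementation could also handle rotated text. The names `mask_to_boxes` and `crop` are hypothetical, and nested lists stand in for arrays to keep the sketch dependency-free.

```python
from collections import deque
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def mask_to_boxes(mask: List[List[int]]) -> List[Box]:
    """Return one bounding box per 4-connected region of nonzero pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over this region, tracking its extent.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the region described by an inclusive box out of a 2D image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
boxes = mask_to_boxes(mask)
print(boxes)
```

Each resulting box would then be used to crop the page image, and the crops would be passed to the recognition stage one by one or in batches.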