
System Architecture

Understanding Doctane's modular pipeline design

Processing Pipeline

📄 Input

Images & PDFs

🔄 Preprocess

Orientation & Straighten

🎯 Detect

Text Boxes

📝 Recognize

Text Transcription

📦 Output

Document Object

🖼️ Preprocessor

Handles image preprocessing including orientation detection and page straightening.

  • Orientation Detection
  • Auto-deskew
  • Image Enhancement

🎯 Detection Module

Segmentation-based text detection with support for straight and rotated text.

  • LinkNet
  • DeepLabV3+
  • SegFormer

📝 Recognition Module

Sequence-to-sequence text recognition with attention mechanisms.

  • SAR
  • ViTSTR
  • CRNN

🔧 Builder Module

Constructs hierarchical Document objects with structure preservation.

  • Page
  • Block
  • Line/Word
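The Page, Block, Line and Word levels listed above can be pictured as nested containers. The sketch below is only an illustration: the class fields, the `render` method and the `BBox` layout are assumptions, not Doctane's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical types for illustration; Doctane's real classes may differ.
BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Word:
    text: str
    confidence: float
    bbox: BBox


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    def render(self) -> str:
        # Words on a line are joined with single spaces.
        return " ".join(w.text for w in self.words)


@dataclass
class Block:
    lines: List[Line] = field(default_factory=list)

    def render(self) -> str:
        # Lines in a block are joined with newlines.
        return "\n".join(line.render() for line in self.lines)


@dataclass
class Page:
    blocks: List[Block] = field(default_factory=list)

    def render(self) -> str:
        # Blocks are separated by a blank line.
        return "\n\n".join(block.render() for block in self.blocks)


page = Page(blocks=[
    Block(lines=[
        Line(words=[
            Word("Hello", 0.98, (0.10, 0.10, 0.30, 0.15)),
            Word("world", 0.95, (0.32, 0.10, 0.50, 0.15)),
        ]),
    ]),
])
print(page.render())
```

Keeping the confidence on the `Word` level lets higher levels derive their own scores (for example, an average per line) without storing them twice.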

⚡ Training Module

Distributed training support with DDP for multi-GPU setups.

  • DDP Support
  • Checkpointing
  • Logging

🌐 Export Module

Multiple output formats for downstream processing.

  • JSON
  • hOCR
  • PDF
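As a rough idea of what the JSON format might carry, the sketch below serializes a nested result with the standard `json` module. The key names (`pages`, `blocks`, `lines`, `words`, `value`, `confidence`) and the `export_json` helper are hypothetical and not taken from Doctane's schema.

```python
import json

# Hypothetical nested result; the key names are assumptions, not Doctane's schema.
document = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {"lines": [{"words": [
                    {"value": "Invoice", "confidence": 0.97},
                    {"value": "#42", "confidence": 0.88},
                ]}]}
            ],
        }
    ]
}


def export_json(doc: dict) -> str:
    """Serialize the nested document; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(doc, ensure_ascii=False, indent=2)


serialized = export_json(document)
restored = json.loads(serialized)  # round-trip to check nothing was lost
```

A lossless round trip like this is the main property a downstream consumer would rely on.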

Data Flow Architecture

1

Image Input

Input files (PNG, JPG, PDF) are loaded and converted to NumPy arrays

2

Preprocessing

OrientationPredictor detects page tilt; OrientationCorrector straightens the page if needed

3

Text Detection

DetectionPredictor runs a segmentation model to produce a binary mask of text regions

4

Crop Extraction

The geometry module extracts regions of interest (ROIs) from the contours of the detection mask
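To make the mask-to-crop step concrete, here is a minimal pure-Python sketch. It swaps contour tracing for a simpler connected-component flood fill and yields axis-aligned boxes only, so it does not cover rotated text. The names `mask_to_boxes` and `crop` are hypothetical, and the mask is a plain list of lists rather than an array.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_boxes(mask: List[List[int]]) -> List[Box]:
    """Return one bounding box per 4-connected region of 1s in a binary mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from the first unseen text pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the inclusive box out of a row-major image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
boxes = mask_to_boxes(mask)
```

Each returned box can then be passed to `crop` to obtain the region handed to the recognition step.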

5

Text Recognition

RecognitionPredictor transcribes each crop to text using a trained model

6

Document Assembly

DocumentBuilder assembles the Page → Block → Line → Word hierarchy with confidence scores
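The six steps above can be read as a chain of functions, each consuming the previous step's output. The sketch below shows that wiring with stub stages. `run_pipeline` and its parameters are hypothetical names chosen for illustration, not Doctane's API, and the image is a plain list of lists.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def run_pipeline(
    image: Image,
    straighten: Callable[[Image], Image],
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], Tuple[str, float]],
) -> List[dict]:
    """Chain the stages in the order listed above and return word-level results."""
    page = straighten(image)  # step 2: preprocessing
    boxes = detect(page)      # steps 3-4: detection and box extraction
    words = []
    for x0, y0, x1, y1 in boxes:
        region = [row[x0:x1 + 1] for row in page[y0:y1 + 1]]
        text, confidence = recognize(region)  # step 5: recognition
        words.append({"text": text, "confidence": confidence, "box": (x0, y0, x1, y1)})
    return words  # would feed step 6: document assembly


# Stub stages, for demonstration only: no real models are involved.
image = [[0] * 8 for _ in range(4)]
result = run_pipeline(
    image,
    straighten=lambda img: img,
    detect=lambda img: [(0, 0, 3, 1), (4, 0, 7, 1)],
    recognize=lambda region: (f"{len(region[0])}x{len(region)}", 1.0),
)
```

Passing the stages in as callables mirrors the modular design: any detection or recognition model can be swapped without touching the flow itself.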