
Documentation

Comprehensive guide to Doctane's features, API, and usage.

Installation

    # From source
    git clone https://github.com/Purushothaman-natarajan/doctane.git
    cd doctane
    pip install -e .

Quick Start

    import numpy as np
    from PIL import Image

    from doctane.ocr_pipeline.ocr_predictor import OCRPredictor

    # Load the page and convert it to an RGB array
    image = Image.open("document.jpg").convert("RGB")
    np_image = np.array(image)

    # The predictor takes a list of images (one array per page)
    predictor = OCRPredictor()
    output = predictor([np_image])

    for page in output.pages:
        print(page)

⚠️ Important

Doctane is an open-source development framework. It does not host or ship pre-trained weights, so you must supply your own trained model weights.

Configuration

    detection:
      model: seg_linknet_resnet50
      input_size: [512, 512]
      batch_size: 8

    recognition:
      model: sar_resnet34
      vocab_size: 95
      batch_size: 32

    training:
      epochs: 100
      lr: 0.001
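The configuration above can be mirrored in typed Python objects so that mistakes surface before training starts. The following is a minimal sketch, not Doctane's own loader: the dataclass names and the `parse_config` helper are hypothetical, and the `RAW_CONFIG` dict stands in for whatever a YAML parser would return.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class DetectionConfig:
    model: str
    input_size: Tuple[int, ...]
    batch_size: int


@dataclass(frozen=True)
class RecognitionConfig:
    model: str
    vocab_size: int
    batch_size: int


@dataclass(frozen=True)
class TrainingConfig:
    epochs: int
    lr: float


@dataclass(frozen=True)
class PipelineConfig:
    detection: DetectionConfig
    recognition: RecognitionConfig
    training: TrainingConfig


def parse_config(raw: Dict[str, Any]) -> PipelineConfig:
    """Build a typed config from a nested dict and reject obviously bad values."""
    det, reco, train = raw["detection"], raw["recognition"], raw["training"]
    config = PipelineConfig(
        detection=DetectionConfig(
            model=str(det["model"]),
            input_size=tuple(int(v) for v in det["input_size"]),
            batch_size=int(det["batch_size"]),
        ),
        recognition=RecognitionConfig(
            model=str(reco["model"]),
            vocab_size=int(reco["vocab_size"]),
            batch_size=int(reco["batch_size"]),
        ),
        training=TrainingConfig(epochs=int(train["epochs"]), lr=float(train["lr"])),
    )

    # Sanity checks: input_size needs two entries, numeric settings must be positive.
    if len(config.detection.input_size) != 2:
        raise ValueError("detection.input_size must have exactly two entries")
    positives = {
        "detection.batch_size": config.detection.batch_size,
        "recognition.vocab_size": config.recognition.vocab_size,
        "recognition.batch_size": config.recognition.batch_size,
        "training.epochs": config.training.epochs,
        "training.lr": config.training.lr,
    }
    for name, value in positives.items():
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return config


# Same values as the configuration shown above, written as a plain dict.
RAW_CONFIG = {
    "detection": {"model": "seg_linknet_resnet50", "input_size": [512, 512], "batch_size": 8},
    "recognition": {"model": "sar_resnet34", "vocab_size": 95, "batch_size": 32},
    "training": {"epochs": 100, "lr": 0.001},
}

config = parse_config(RAW_CONFIG)
print(config.detection.model, config.detection.input_size, config.training.lr)
```

Frozen dataclasses keep the parsed configuration read-only, which makes it harder to change a setting by accident halfway through a run.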

OCRPredictor

Main class for end-to-end OCR processing.

Method                                            Description
__init__(det_predictor, reco_predictor)           Initialize with detection and recognition models
__call__(images: List[np.ndarray]) -> Document    Process a list of images and return a Document
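The constructor signature suggests that OCRPredictor composes a detection stage with a recognition stage. The sketch below illustrates that general detect-then-recognize pattern with stub models; it is not Doctane's implementation. `TwoStagePredictor`, the `Box` layout, the `crop` helper, and the return type (a list of box/text pairs per page instead of a `Document`) are all assumptions, and images are plain nested lists rather than `np.ndarray` so the example has no dependencies.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # rows of pixel values (stand-in for np.ndarray)
Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max), max exclusive

Detector = Callable[[Sequence[Image]], List[List[Box]]]
Recognizer = Callable[[Sequence[Image]], List[str]]


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by `box` out of `image`."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


class TwoStagePredictor:
    """Hypothetical detect-then-recognize pipeline (not the real OCRPredictor)."""

    def __init__(self, det_predictor: Detector, reco_predictor: Recognizer) -> None:
        self.det_predictor = det_predictor
        self.reco_predictor = reco_predictor

    def __call__(self, images: Sequence[Image]) -> List[List[Tuple[Box, str]]]:
        # Stage 1: the detector returns one list of boxes per input page.
        boxes_per_page = self.det_predictor(images)
        pages = []
        for image, boxes in zip(images, boxes_per_page):
            # Stage 2: recognize all cropped regions of this page as one batch.
            crops = [crop(image, box) for box in boxes]
            texts = self.reco_predictor(crops) if crops else []
            pages.append(list(zip(boxes, texts)))
        return pages


# Stub models so the sketch runs without trained weights.
def fake_detector(images: Sequence[Image]) -> List[List[Box]]:
    # Pretend every page contains one word in its top-left 2x2 corner.
    return [[(0, 0, 2, 2)] for _ in images]


def fake_recognizer(crops: Sequence[Image]) -> List[str]:
    # "Read" a crop by reporting its size instead of real text.
    return [f"{len(c)}x{len(c[0])}" for c in crops]


predictor = TwoStagePredictor(fake_detector, fake_recognizer)
page = [[0] * 4 for _ in range(3)]  # dummy image with 3 rows and 4 columns
result = predictor([page])
print(result)
```

Passing the two stages in as callables means either one can be swapped (for example a different detection architecture) without touching the pipeline logic.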

Detection Models

  • LinkNet - Fast and lightweight
  • DeepLabV3+ - High accuracy for complex layouts
  • SegFormer - State-of-the-art transformer-based

Recognition Models

  • SAR - Show, Attend and Read (attention-based encoder-decoder)
  • ViTSTR - Vision Transformer for STR
  • CRNN - CNN + RNN + CTC
  • MASTER - Multi-Aspect Non-local Network (self-attention based)
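The CTC step in CRNN turns per-timestep predictions into text by merging repeated labels and then dropping the blank symbol. A minimal sketch of greedy (best-path) CTC decoding follows. The function name, the toy vocabulary, and the choice of index 0 as the blank class are assumptions for illustration, not Doctane's code.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(scores: Sequence[Sequence[float]], vocab: str) -> str:
    """Decode a (timesteps x classes) score matrix with best-path CTC.

    Class 0 is the blank; class i (i >= 1) maps to vocab[i - 1].
    """
    # Pick the highest-scoring class at every timestep.
    best_path: List[int] = [
        max(range(len(step)), key=lambda i: step[i]) for step in scores
    ]

    decoded: List[str] = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when the label changes and is not the blank.
        # A blank between two identical labels keeps both, e.g. "a _ a" -> "aa".
        if label != previous and label != BLANK:
            decoded.append(vocab[label - 1])
        previous = label
    return "".join(decoded)


# Toy scores over the classes [blank, 'a', 'b', 'c'] for six timesteps.
# Best path: a, a, blank, a, b, b
scores = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.20, 0.60, 0.10],
]
print(ctc_greedy_decode(scores, "abc"))  # prints "aab"
```

Greedy decoding is the simplest option; beam search over the same scores can give better results at higher cost.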

Training

    python train/text_detection/train_detection.py \
        --config configs/detection.yaml \
        --epochs 100
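The command passes both a config file and an `--epochs` flag, which suggests that command-line values take precedence over the file. A sketch of that override pattern with `argparse` is shown below. It is not the actual `train_detection.py`: the `resolve_epochs` helper and the dict standing in for the parsed YAML are hypothetical, and whether Doctane applies the same precedence is an assumption.

```python
import argparse
from typing import Any, Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Toy training entry point")
    parser.add_argument("--config", required=True, help="Path to a YAML config file")
    # default=None lets us tell "flag omitted" apart from an explicit value.
    parser.add_argument("--epochs", type=int, default=None, help="Override training.epochs")
    return parser


def resolve_epochs(file_config: Dict[str, Any], cli_epochs: Optional[int]) -> int:
    """Return the CLI value when given, otherwise the value from the config file."""
    if cli_epochs is not None:
        return cli_epochs
    return int(file_config["training"]["epochs"])


# Stand-in for the parsed contents of the config file (hypothetical values).
file_config = {"training": {"epochs": 100, "lr": 0.001}}

# Simulate: --config configs/detection.yaml --epochs 5
args = build_parser().parse_args(["--config", "configs/detection.yaml", "--epochs", "5"])
print(resolve_epochs(file_config, args.epochs))  # prints 5
```

Using `None` as the flag's default keeps the file value authoritative whenever the flag is left out.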