Documentation

Comprehensive guide to Doctane's features, API, and usage.

Installation

# From source
git clone https://github.com/Purushothaman-natarajan/doctane.git
cd doctane
pip install -e .

Quick Start

import numpy as np
from PIL import Image
from doctane.ocr_pipeline.ocr_predictor import OCRPredictor

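# Load the page and convert it to an RGB NumPy array
image = Image.open("document.jpg").convert("RGB")
np_image = np.array(image)

# The predictor is called with a list of page images
predictor = OCRPredictor()
output = predictor([np_image])

# Print each page of the returned Document
for page in output.pages:
    print(page)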

⚠️ Important

This is an open-source development framework. You need to provide your own trained model weights. We do not host or provide pre-trained weights.

Configuration

detection:
  model: seg_linknet_resnet50
  input_size: [512, 512]
  batch_size: 8
recognition:
  model: sar_resnet34
  vocab_size: 95
  batch_size: 32
training:
  epochs: 100
  lr: 0.001
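
The configuration above can be mirrored as typed Python objects so that typos and bad values fail early. This is a minimal sketch, not part of Doctane: the class names and `config_from_dict` are hypothetical, and the defaults simply repeat the values shown above. It expects a nested dict such as the one a YAML parser would return.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class DetectionConfig:
    model: str = "seg_linknet_resnet50"
    input_size: Tuple[int, int] = (512, 512)
    batch_size: int = 8


@dataclass
class RecognitionConfig:
    model: str = "sar_resnet34"
    vocab_size: int = 95
    batch_size: int = 32


@dataclass
class TrainingConfig:
    epochs: int = 100
    lr: float = 0.001


@dataclass
class Config:
    detection: DetectionConfig = field(default_factory=DetectionConfig)
    recognition: RecognitionConfig = field(default_factory=RecognitionConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)


def config_from_dict(raw: dict) -> Config:
    """Build a Config from a nested dict; missing keys fall back to defaults.

    Hypothetical helper: unknown keys raise TypeError from the dataclass
    constructors, which catches misspelled option names.
    """
    det = dict(raw.get("detection", {}))
    if "input_size" in det:
        # YAML lists arrive as Python lists; store them as a tuple.
        det["input_size"] = tuple(det["input_size"])
    cfg = Config(
        detection=DetectionConfig(**det),
        recognition=RecognitionConfig(**raw.get("recognition", {})),
        training=TrainingConfig(**raw.get("training", {})),
    )
    if cfg.training.lr <= 0 or cfg.training.epochs <= 0:
        raise ValueError("training.lr and training.epochs must be positive")
    return cfg
```

Passing only the sections you want to change, for example `config_from_dict({"training": {"epochs": 5}})`, keeps every other value at its default.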

OCRPredictor

Main class for end-to-end OCR processing.

Method	Description
__init__(det_predictor, reco_predictor)	Initialize with a detection predictor and a recognition predictor
__call__(images: List[np.ndarray]) -> Document	Process a list of images and return a Document
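
The two-argument constructor suggests a two-stage design: a detector finds text regions and a recognizer reads each one. The sketch below illustrates that pattern only. `TwoStagePredictor` is a hypothetical stand-in, not Doctane's `OCRPredictor`; it uses nested lists in place of NumPy arrays and returns plain tuples instead of a Document.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive
Image = Sequence[Sequence[int]]  # stand-in for an np.ndarray page image


class TwoStagePredictor:
    """Hypothetical illustration of detect-then-recognize composition."""

    def __init__(
        self,
        det_predictor: Callable[[Image], List[Box]],
        reco_predictor: Callable[[Image], str],
    ) -> None:
        self.det_predictor = det_predictor
        self.reco_predictor = reco_predictor

    def __call__(self, images: List[Image]) -> List[List[Tuple[Box, str]]]:
        results = []
        for image in images:
            page = []
            # Stage 1: locate text regions on the page.
            for box in self.det_predictor(image):
                x0, y0, x1, y1 = box
                # Stage 2: crop each region and read its text.
                crop = [row[x0:x1] for row in image[y0:y1]]
                page.append((box, self.reco_predictor(crop)))
            results.append(page)
        return results
```

Keeping the two stages as separate callables means a detection model can be swapped without touching recognition, and vice versa.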

Detection Models

LinkNet - Fast and lightweight
DeepLabV3+ - High accuracy for complex layouts
SegFormer - Transformer-based segmentation architecture
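
All three architectures are segmentation networks, so their raw output is a per-pixel text probability map rather than a list of boxes. The sketch below shows one simple way such a map could be turned into bounding boxes: threshold it, then group 4-connected pixels. It is an assumption for illustration, not Doctane's post-processing; `mask_to_boxes` and the 0.5 threshold are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_boxes(
    prob_map: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Box]:
    """Return one bounding box per 4-connected region with score >= threshold."""
    height = len(prob_map)
    width = len(prob_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or prob_map[y][x] < threshold:
                continue
            # Breadth-first flood fill over one connected text region.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and not seen[ny][nx] and prob_map[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Boxes come back in row-major order of the first pixel found in each region.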

Recognition Models

SAR - Show, Attend and Read (attention-based encoder-decoder)
ViTSTR - Vision Transformer for STR
CRNN - CNN + RNN + CTC
MASTER - Multi-Aspect Non-local Network for Scene Text Recognition
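
The CTC step in the CRNN entry can be illustrated with greedy decoding: take the best class at each time step, collapse consecutive repeats, then drop blanks. This is a minimal sketch under stated assumptions, not Doctane's decoder: class 0 is taken to be the CTC blank, class i maps to `vocab[i - 1]`, and `ctc_greedy_decode` is a hypothetical name.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(logits: Sequence[Sequence[float]], vocab: str) -> str:
    """Decode per-time-step class scores into a string.

    logits holds one row of scores per time step, with len(vocab) + 1 classes.
    """
    # Best class per time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]
    chars: List[str] = []
    prev = BLANK
    for idx in best:
        # Emit only when the class changes and is not the blank.
        if idx != BLANK and idx != prev:
            chars.append(vocab[idx - 1])
        prev = idx
    return "".join(chars)
```

A blank between two identical classes is what allows doubled letters to survive the collapse step.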

Training

python train/text_detection/train_detection.py \
    --config configs/detection.yaml \
    --epochs 100
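
The command passes both a config file and `--epochs`. A common convention is that a command-line flag overrides the matching config value; whether `train_detection.py` does this is not stated here, so treat the sketch below as an assumption. `build_parser` and `resolve_epochs` are hypothetical names, and only the two flags shown above are used.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Toy stand-in for a training entry point")
    parser.add_argument("--config", required=True, help="Path to a YAML config file")
    parser.add_argument("--epochs", type=int, default=None,
                        help="If given, overrides training.epochs from the config")
    return parser


def resolve_epochs(config: dict, argv: Optional[List[str]] = None) -> int:
    """Return the epoch count, preferring the CLI flag over the config value."""
    args = build_parser().parse_args(argv)
    if args.epochs is not None:
        return args.epochs
    return config["training"]["epochs"]
```

With `argv=None`, `argparse` reads from `sys.argv`, so the same function works both in a script and in tests.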