Documentation
Comprehensive guide to Doctane's features, API, and usage.
Installation
# From source
git clone https://github.com/Purushothaman-natarajan/doctane.git
cd doctane
pip install -e .
Quick Start
import numpy as np
from PIL import Image
from doctane.ocr_pipeline.ocr_predictor import OCRPredictor

# Load the page and convert it to an RGB array of shape (H, W, 3)
image = Image.open("document.jpg").convert("RGB")
np_image = np.array(image)

# The predictor takes a list of page images and returns a Document
predictor = OCRPredictor()
output = predictor([np_image])
for page in output.pages:
    print(page)
⚠️ Important
This is an open-source development framework. You need to provide your own trained model weights. We do not host or provide pre-trained weights.
Configuration
detection:
  model: seg_linknet_resnet50
  input_size: [512, 512]
  batch_size: 8
recognition:
  model: sar_resnet34
  vocab_size: 95
  batch_size: 32
training:
  epochs: 100
  lr: 0.001
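As a rough illustration of how these settings fit together, the sketch below mirrors the configuration above as a plain Python dict and checks a few basic invariants before a run starts. The `validate_config` helper and its rules are assumptions made for this example and are not part of Doctane; in practice the file would first be read with a YAML parser.

```python
# Hypothetical helper, not part of Doctane: sanity-check a parsed config.
from typing import Any, Dict, List

# Same values as the configuration above, as they would look after parsing.
CONFIG: Dict[str, Any] = {
    "detection": {"model": "seg_linknet_resnet50", "input_size": [512, 512], "batch_size": 8},
    "recognition": {"model": "sar_resnet34", "vocab_size": 95, "batch_size": 32},
    "training": {"epochs": 100, "lr": 0.001},
}


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems: List[str] = []
    for section in ("detection", "recognition", "training"):
        if section not in cfg:
            problems.append(f"missing section: {section}")

    # input_size is expected to be a pair of positive integers.
    size = cfg.get("detection", {}).get("input_size")
    if not (isinstance(size, list) and len(size) == 2
            and all(isinstance(s, int) and s > 0 for s in size)):
        problems.append("detection.input_size must be two positive integers")

    for section in ("detection", "recognition"):
        bs = cfg.get(section, {}).get("batch_size")
        if not (isinstance(bs, int) and bs > 0):
            problems.append(f"{section}.batch_size must be a positive integer")

    train = cfg.get("training", {})
    if not (isinstance(train.get("epochs"), int) and train["epochs"] > 0):
        problems.append("training.epochs must be a positive integer")
    lr = train.get("lr")
    if not (isinstance(lr, (int, float)) and lr > 0):
        problems.append("training.lr must be a positive number")
    return problems


print(validate_config(CONFIG))
```

Returning a list of messages rather than raising on the first error lets a caller report every problem in one pass.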
OCRPredictor
Main class for end-to-end OCR processing.
| Method | Description |
|---|---|
| `__init__(det_predictor, reco_predictor)` | Initialize with detection and recognition models |
| `__call__(images: List[np.ndarray]) -> Document` | Process a list of images and return a Document |
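The table suggests a two-stage design: a detection predictor finds text regions, and a recognition predictor reads each one. The sketch below illustrates that callable-object pattern with toy stand-ins. `ToyDetector`, `ToyRecognizer`, `ToyOCRPredictor`, and the dict-based result are all hypothetical and do not reflect Doctane's actual classes or its `Document` type.

```python
# Toy illustration of a detect-then-recognize pipeline.
# Every class here is a hypothetical stand-in, not Doctane's API.
from typing import Dict, List, Tuple

GrayImage = List[List[int]]      # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


class ToyDetector:
    def __call__(self, image: GrayImage) -> List[Box]:
        # Pretend every non-empty row is one text line spanning the full width.
        width = len(image[0]) if image else 0
        return [(0, y, width, y + 1) for y, row in enumerate(image) if any(row)]


class ToyRecognizer:
    def __call__(self, crop: GrayImage) -> str:
        # Stand-in for a real model: report the crop size instead of reading text.
        return f"<{len(crop[0])}x{len(crop)} crop>"


class ToyOCRPredictor:
    def __init__(self, det_predictor: ToyDetector, reco_predictor: ToyRecognizer) -> None:
        self.det_predictor = det_predictor
        self.reco_predictor = reco_predictor

    def __call__(self, images: List[GrayImage]) -> List[Dict[str, list]]:
        pages = []
        for image in images:
            boxes = self.det_predictor(image)
            # Crop each detected box and hand it to the recognizer.
            words = [
                self.reco_predictor([row[x0:x1] for row in image[y0:y1]])
                for (x0, y0, x1, y1) in boxes
            ]
            pages.append({"boxes": boxes, "words": words})
        return pages


predictor = ToyOCRPredictor(ToyDetector(), ToyRecognizer())
result = predictor([[[0, 0, 0], [9, 9, 0], [0, 0, 0], [0, 5, 0]]])
print(result[0]["words"])
```

Passing the two stages into the constructor keeps them swappable, so a different detector can be paired with the same recognizer without touching the pipeline code.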
Detection Models
- LinkNet - Fast and lightweight
- DeepLabV3+ - High accuracy for complex layouts
- SegFormer - Transformer-based segmentation model
Recognition Models
- SAR - Show, Attend and Read
- ViTSTR - Vision Transformer for STR
- CRNN - CNN + RNN + CTC
- MASTER - Multi-Aspect Non-local Network for Scene Text Recognition
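For the CTC-based option in this list (CRNN), decoding can be sketched as: take the most likely class at each time step, merge consecutive repeats, then drop the blank symbol. The code below is a generic greedy CTC decoder written for illustration. The `ctc_greedy_decode` name, the blank index of 0, and the tiny vocabulary are assumptions, not Doctane's implementation.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(logits: Sequence[Sequence[float]], vocab: str) -> str:
    """Greedy CTC decoding for a single sequence.

    logits: per-time-step scores, shape (T, len(vocab) + 1); index 0 is the blank.
    vocab:  characters for class indices 1..len(vocab).
    """
    # Step 1: best class per time step (argmax).
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]

    chars: List[str] = []
    prev = BLANK
    for idx in best:
        # Steps 2 and 3: skip repeats of the previous class and skip blanks.
        # A blank between two equal symbols keeps them separate.
        if idx != prev and idx != BLANK:
            chars.append(vocab[idx - 1])
        prev = idx
    return "".join(chars)


def one_hot(idx: int, n: int) -> List[float]:
    """Build a score vector that puts all mass on one class."""
    return [1.0 if i == idx else 0.0 for i in range(n)]


# Frame-level path: h h _ e l _ l o o   (with "_" as the blank)
path = [1, 1, 0, 2, 3, 0, 3, 4, 4]
logits = [one_hot(i, 5) for i in path]
print(ctc_greedy_decode(logits, "helo"))  # hello
```

The blank between the two `l` frames is what allows a doubled letter to survive the repeat-merging step.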
Training
python train/text_detection/train_detection.py \
    --config configs/detection.yaml \
    --epochs 100
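The command above passes both a config file and an explicit `--epochs` value. One common convention, assumed here and not confirmed for `train_detection.py`, is that a command-line flag overrides the matching value from the file. The sketch below shows that pattern with `argparse`; `build_parser` and `apply_overrides` are hypothetical names, and a hard-coded dict stands in for the parsed YAML.

```python
import argparse
from typing import Any, Dict, Optional


def build_parser() -> argparse.ArgumentParser:
    # Only the two flags shown in the command above are modeled here.
    parser = argparse.ArgumentParser(description="Illustrative training CLI (not Doctane's)")
    parser.add_argument("--config", required=True, help="path to a YAML config file")
    parser.add_argument("--epochs", type=int, default=None, help="override training.epochs")
    return parser


def apply_overrides(cfg: Dict[str, Any], epochs: Optional[int]) -> Dict[str, Any]:
    """Return a copy of cfg where an explicit --epochs replaces training.epochs."""
    merged = {**cfg, "training": dict(cfg.get("training", {}))}
    if epochs is not None:
        merged["training"]["epochs"] = epochs
    return merged


args = build_parser().parse_args(["--config", "configs/detection.yaml", "--epochs", "20"])
file_cfg = {"training": {"epochs": 100, "lr": 0.001}}  # stands in for the parsed YAML
print(apply_overrides(file_cfg, args.epochs)["training"])
```

Defaulting `--epochs` to `None` makes it possible to tell "flag not given" apart from any real value, so the file setting is kept unless the user overrides it.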