DuwatBench is a benchmark for evaluating large multimodal models (LMMs) on Arabic calligraphy recognition. Arabic calligraphy is one of the richest visual traditions of the Arabic language, blending linguistic meaning with artistic form. DuwatBench addresses a gap in evaluation: how well modern AI systems can read stylized Arabic text.
Figure 1. Left: Proportional breakdown of calligraphic styles. Right: Proportional breakdown of textual categories.
- 1,272 samples spanning 6 classical and modern calligraphic styles
- Approximately 1,475 unique words spanning religious and cultural domains
- Bounding boxes for detection-level evaluation
- Transcriptions with style and theme labels
- Images preserving real-world visual complexity
- Multi-tier verification ensuring annotation quality
Figure 2. End-to-end pipeline for constructing DuwatBench, from data collection and manual transcription with bounding boxes to multi-tier verification and style/theme aggregation.
| Style | Samples | Share | Description |
|---|---|---|---|
| Thuluth | 706 | 55% | Ornate script used in mosque decorations |
| Diwani | 230 | 18% | Flowing Ottoman court script |
| Naskh | 110 | 9% | Standard readable script |
| Kufic | 83 | 7% | Geometric angular early Arabic script |
| Ruq'ah | 76 | 6% | Modern everyday handwriting |
| Nasta'liq | 67 | 5% | Persian-influenced flowing script |
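The per-style counts above can be checked with a few lines of arithmetic. The sketch below recomputes the total and each style's share and reports how skewed the distribution is. The style names attached to the counts are inferred from the one-line descriptions and should be read as an assumption; the counts themselves are taken from the list above.

```python
# Sample counts per style, in the order listed above.
# Assumption: style names are inferred from the descriptions
# (e.g. "Geometric angular early Arabic script" -> Kufic).
style_counts = {
    "Thuluth": 706,
    "Diwani": 230,
    "Naskh": 110,
    "Kufic": 83,
    "Ruq'ah": 76,
    "Nasta'liq": 67,
}

total = sum(style_counts.values())

# Share of the benchmark held by each style, in percent.
shares = {style: 100 * n / total for style, n in style_counts.items()}

# Ratio between the largest and smallest class, a rough imbalance indicator.
imbalance = max(style_counts.values()) / min(style_counts.values())

if __name__ == "__main__":
    print(f"total samples: {total}")
    for style, pct in shares.items():
        print(f"{style:<10} {style_counts[style]:>4}  {pct:5.1f}%")
    print(f"largest/smallest class ratio: {imbalance:.1f}")
```

Because more than half of the samples fall in a single style, aggregate scores are dominated by that style, which is one reason to read the per-style results alongside the overall leaderboard.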
| Metric | Description |
|---|---|
| CER | Character Error Rate - edit distance at character level |
| WER | Word Error Rate - edit distance at word level |
| chrF | Character n-gram F-score - partial match robustness |
| ExactMatch | Strict full-sequence accuracy |
| NLD | Normalized Levenshtein Distance - balanced error measure |
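As a rough illustration of the edit-distance metrics in the table, the sketch below implements CER, WER, NLD and ExactMatch in plain Python. The normalization choices are assumptions, since the table does not spell them out: CER and WER divide by the reference length, NLD divides by the longer of the two strings, and ExactMatch compares whitespace-stripped strings. chrF is left out; an established implementation is preferable to a hand-rolled one.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: character edits divided by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word Error Rate: word edits divided by number of reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return levenshtein(ref_words, hyp_words) / max(len(ref_words), 1)


def nld(ref, hyp):
    """Normalized Levenshtein Distance: edits divided by the longer length."""
    longest = max(len(ref), len(hyp))
    return levenshtein(ref, hyp) / longest if longest else 0.0


def exact_match(ref, hyp):
    """1.0 if the stripped strings are identical, else 0.0."""
    return float(ref.strip() == hyp.strip())


if __name__ == "__main__":
    reference = "بسم الله الرحمن"
    prediction = "بسم الله الرحيم"
    print(cer(reference, prediction), wer(reference, prediction),
          nld(reference, prediction), exact_match(reference, prediction))
```

Under these definitions CER and WER can exceed 1.0 when the prediction is much longer than the reference, while NLD stays within [0, 1].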
Open-Source Models (8)
| Model | CER ↓ | WER ↓ | chrF ↑ | ExactMatch ↑ | NLD ↓ |
|---|---|---|---|---|---|
| MBZUAI/AIN* | 0.5494 | 0.6912 | 42.67 | 0.1895 | 0.5134 |
| Gemma-3-27B-IT | 0.5556 | 0.6591 | 51.53 | 0.2398 | 0.4741 |
| Qwen2.5-VL-72B | 0.5709 | 0.7039 | 43.98 | 0.1761 | 0.5298 |
| Qwen2.5-VL-7B | 0.6453 | 0.7768 | 36.97 | 0.1211 | 0.5984 |
| InternVL3-8B | 0.7588 | 0.8822 | 21.75 | 0.0574 | 0.7132 |
| EasyOCR | 0.8538 | 0.9895 | 12.30 | 0.0031 | 0.8163 |
| TrOCR-Arabic* | 0.9728 | 0.9998 | 1.79 | 0.0000 | 0.9632 |
| LLaVA-v1.6-Mistral-7B | 0.9932 | 0.9998 | 9.16 | 0.0000 | 0.9114 |
\* Arabic-specific models
Closed-Source Models (5)
| Model | CER ↓ | WER ↓ | chrF ↑ | ExactMatch ↑ | NLD ↓ |
|---|---|---|---|---|---|
| Gemini-2.5-flash | 0.3700 | 0.4478 | 71.82 | 0.4167 | 0.3166 |
| Gemini-1.5-flash | 0.3933 | 0.5112 | 63.28 | 0.3522 | 0.3659 |
| GPT-4o | 0.4766 | 0.5692 | 56.85 | 0.3388 | 0.4245 |
| GPT-4o-mini | 0.6039 | 0.7077 | 42.67 | 0.2115 | 0.5351 |
| Claude-Sonnet-4.5 | 0.6494 | 0.7255 | 42.97 | 0.2225 | 0.5599 |
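The two leaderboards mix metrics where lower is better (CER, WER, NLD) with metrics where higher is better (chrF, ExactMatch). The sketch below copies the values from both tables and picks the best model per metric while respecting that direction. The helper name `best_model` and the data layout are illustrative choices, not part of the benchmark's tooling.

```python
# Column order used in both tables above.
METRICS = ["CER", "WER", "chrF", "ExactMatch", "NLD"]
HIGHER_IS_BETTER = {"chrF", "ExactMatch"}

# Values copied from the open-source table.
open_source = {
    "MBZUAI/AIN": (0.5494, 0.6912, 42.67, 0.1895, 0.5134),
    "Gemma-3-27B-IT": (0.5556, 0.6591, 51.53, 0.2398, 0.4741),
    "Qwen2.5-VL-72B": (0.5709, 0.7039, 43.98, 0.1761, 0.5298),
    "Qwen2.5-VL-7B": (0.6453, 0.7768, 36.97, 0.1211, 0.5984),
    "InternVL3-8B": (0.7588, 0.8822, 21.75, 0.0574, 0.7132),
    "EasyOCR": (0.8538, 0.9895, 12.30, 0.0031, 0.8163),
    "TrOCR-Arabic": (0.9728, 0.9998, 1.79, 0.0000, 0.9632),
    "LLaVA-v1.6-Mistral-7B": (0.9932, 0.9998, 9.16, 0.0000, 0.9114),
}

# Values copied from the closed-source table.
closed_source = {
    "Gemini-2.5-flash": (0.3700, 0.4478, 71.82, 0.4167, 0.3166),
    "Gemini-1.5-flash": (0.3933, 0.5112, 63.28, 0.3522, 0.3659),
    "GPT-4o": (0.4766, 0.5692, 56.85, 0.3388, 0.4245),
    "GPT-4o-mini": (0.6039, 0.7077, 42.67, 0.2115, 0.5351),
    "Claude-Sonnet-4.5": (0.6494, 0.7255, 42.97, 0.2225, 0.5599),
}


def best_model(results, metric):
    """Return the model name with the best score for the given metric."""
    idx = METRICS.index(metric)
    pick = max if metric in HIGHER_IS_BETTER else min
    return pick(results, key=lambda name: results[name][idx])


all_models = {**open_source, **closed_source}
best_overall = {m: best_model(all_models, m) for m in METRICS}
best_open = {m: best_model(open_source, m) for m in METRICS}

if __name__ == "__main__":
    for m in METRICS:
        print(f"{m:<10} overall: {best_overall[m]:<18} open-source: {best_open[m]}")
```

Among the open-source models the winner depends on the metric: the table is ordered by CER, but a different model leads on the remaining four columns.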
Word Error Rate (WER ↓) across calligraphy styles - Full Image mode
| Model | Kufic | Thuluth | Diwani | Naskh | Ruq'ah | Nasta'liq |
|---|---|---|---|---|---|---|
| Gemini-2.5-flash | 0.7067 | 0.3527 | 0.5698 | 0.4765 | 0.5817 | 0.5222 |
| Gemini-1.5-flash | 0.7212 | 0.4741 | 0.5783 | 0.4444 | 0.5445 | 0.5023 |
| GPT-4o | 0.8041 | 0.5540 | 0.6370 | 0.4189 | 0.5507 | 0.4434 |
| Gemma-3-27B-IT | 0.7802 | 0.6315 | 0.7326 | 0.5138 | 0.7571 | 0.6637 |
| MBZUAI/AIN | 0.7916 | 0.7036 | 0.7130 | 0.5367 | 0.6111 | 0.6916 |
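To read the per-style table more systematically, the following sketch finds the lowest-WER and highest-WER style for each listed model and averages WER per style. It only summarizes the five rows shown above, so the result says nothing about models that are not in the table; the variable and function names are illustrative.

```python
# Column order of the per-style WER table above.
STYLES = ["Kufic", "Thuluth", "Diwani", "Naskh", "Ruq'ah", "Nasta'liq"]

# WER values copied from the table (Full Image mode).
wer_by_model = {
    "Gemini-2.5-flash": [0.7067, 0.3527, 0.5698, 0.4765, 0.5817, 0.5222],
    "Gemini-1.5-flash": [0.7212, 0.4741, 0.5783, 0.4444, 0.5445, 0.5023],
    "GPT-4o": [0.8041, 0.5540, 0.6370, 0.4189, 0.5507, 0.4434],
    "Gemma-3-27B-IT": [0.7802, 0.6315, 0.7326, 0.5138, 0.7571, 0.6637],
    "MBZUAI/AIN": [0.7916, 0.7036, 0.7130, 0.5367, 0.6111, 0.6916],
}


def easiest_and_hardest(row):
    """Return (style with lowest WER, style with highest WER) for one model."""
    ranked = sorted(zip(row, STYLES))
    return ranked[0][1], ranked[-1][1]


per_model = {model: easiest_and_hardest(row) for model, row in wer_by_model.items()}

# Unweighted mean WER per style across the listed models.
mean_wer = {
    style: sum(row[i] for row in wer_by_model.values()) / len(wer_by_model)
    for i, style in enumerate(STYLES)
}

if __name__ == "__main__":
    for model, (easy, hard) in per_model.items():
        print(f"{model:<18} easiest: {easy:<10} hardest: {hard}")
    for style, value in sorted(mean_wer.items(), key=lambda kv: kv[1]):
        print(f"{style:<10} mean WER {value:.4f}")
```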
Gemini-2.5-flash achieves the best overall performance with 41.67% exact match accuracy and the lowest CER of 0.37.
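Models tend to perform best on Naskh and Ruq'ah scripts, likely owing to their standardized strokes and clear letterforms; in the per-style WER table, Naskh has the lowest WER for four of the five listed models.
Diwani and Thuluth (ornate scripts with dense ligatures) remain challenging for most models, and Kufic has the highest WER for every model in the per-style WER table.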
Bounding box localization improves performance across most models.
Figure 3. Qualitative results comparing open- and closed-source models on DuwatBench calligraphy samples.
@article{duwatbench2025,
  title   = {DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding},
  author  = {Patle, Shubham and Ghaboura, Sara and Tariq, Hania and Khan, Mohammad Usman and Thawakar, Omkar and Anwer, Rao Muhammad and Khan, Salman},
  journal = {arXiv preprint arXiv:2502.14865},
  year    = {2025}
}
Digital Archives: Library of Congress, NYPL Digital Collections
Community Repositories: Calligraphy Qalam, Free Islamic Calligraphy, Pinterest
Tools: MakeSense.ai, CAMeL Tools