FormLens: From Ink to Insight with Adapting Vision-Language Models for Handwritten Form Digitization

Shaon Bhattacharya, Ajoy Mondal, and C V Jawahar
CVIT, International Institute of Information Technology
To be presented at ICVGIP 2025 (Dec 17-20, IIT Mandi, Himachal Pradesh, India).

Key Contributions

Methodology

Figure 1: FormLens pipeline overview showing the end-to-end form digitization process.

Form6000 Dataset

Figure 2: Sample forms from the Form6000 dataset showing diverse layouts and handwriting styles.

Dataset Statistics

Unique form templates: 50
Participants: 650
Total images: 6,000
Mobile captures: 5,350

Characteristic                              Count
Number of unique form templates             50
Number of participants (writers)            650
Forms filled per participant                1-2
Total handwritten filled forms              650
Scanned high-resolution forms               650
Captured mobile images (7-10 per form)      5,350
Total dataset size (images)                 6,000
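
The counts in the table can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures listed above; the variable names are illustrative, and treating each filled form as having exactly one scan is an inference from the matching counts, not something the source states.

```python
# Counts copied from the Dataset Statistics table.
filled_forms = 650
scanned_forms = 650
mobile_images = 5_350
total_images = 6_000

# Assumption: the dataset consists only of scans plus mobile captures.
assert scanned_forms + mobile_images == total_images

# Average number of mobile captures per filled form; the table
# reports a range of 7-10 captures per form.
avg_mobile_per_form = mobile_images / filled_forms
assert 7 <= avg_mobile_per_form <= 10

print(f"Average mobile captures per form: {avg_mobile_per_form:.2f}")
```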

Results

Performance Comparison

WRR: word recognition rate; CRR: character recognition rate; P: precision; R: recall; F1: F1 score. All values are in %.

Method                   WRR     CRR     P       R       F1
Google Form Parser       92.14   96.38   88.90   90.45   89.67
Azure Form Recognizer    93.29   97.25   91.12   92.60   91.85
PaddleOCR                35.23   64.03   52.21   48.70   50.40
DocTR                    32.22   65.44   50.93   46.28   48.50
Donut                    65.33   70.12   66.45   67.21   66.83
Nougat                   76.13   83.39   77.90   79.15   78.52
FormLens (ours)          95.44   98.33   94.12   95.31   94.71
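
As a quick consistency check on the table, the F1 column can be recomputed from the P and R columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall, which the source does not state explicitly; the function name `f1_score` and the `rows` dictionary are illustrative only, and the numbers are copied from the table above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the Performance Comparison table.
rows = {
    "Google Form Parser": (88.90, 90.45, 89.67),
    "Azure Form Recognizer": (91.12, 92.60, 91.85),
    "PaddleOCR": (52.21, 48.70, 50.40),
    "DocTR": (50.93, 46.28, 48.50),
    "Donut": (66.45, 67.21, 66.83),
    "Nougat": (77.90, 79.15, 78.52),
    "FormLens (ours)": (94.12, 95.31, 94.71),
}

for method, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    # Small differences are expected because P and R are already rounded.
    print(f"{method:<22} computed={computed:.2f} reported={reported:.2f}")
```

Under this assumption, every reported F1 value agrees with the recomputed one to within rounding of the two-decimal inputs.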
Example 1: FormLens results showing accurate form digitization.
Example 2: FormLens results demonstrating robust performance across different form layouts.