HP Labs Technical Reports

Adaptive Thresholding for OCR: A Significant Test

Smith, Ray; Newton, Chris; Cheatle, Phil



Abstract: Although many adaptive thresholding algorithms have been published, few have been rigorously tested on a significant number of images, and few have been applied to OCR. We give the results of testing a new algorithm against three previously published algorithms. Performance for OCR applications is tested objectively by running an OCR system on the thresholded images of 350 A4 pages. The results show our new algorithm to be superior to the others tested. The measured differences would occur by chance with probability less than 1%. Of the three existing algorithms tested, only Otsu's algorithm performs better than a fixed threshold at 50% of the greyscale range. This has an important implication for other algorithms derived from the commonly used model of images: simple objects corrupted by Gaussian noise. Algorithms tested only with synthetic images derived from that model can easily fail when applied to real-world images.
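For readers unfamiliar with the two baselines named in the abstract, the sketch below contrasts a fixed threshold at 50% of the greyscale range with the textbook form of Otsu's method, which picks the threshold that maximizes the between-class variance of the histogram. It is an illustration only, not the report's new algorithm and not the authors' code. The function names, the 256-level greyscale assumption, and the flat list-of-pixels input are all assumptions made for this example.

```python
def fixed_threshold(levels=256):
    """Baseline: a threshold at 50% of the greyscale range (assumed 0..levels-1)."""
    return levels // 2


def otsu_threshold(pixels, levels=256):
    """Textbook Otsu: return t maximizing between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    `pixels` is a flat iterable of integers in 0..levels-1 (an assumption).
    """
    hist = [0] * levels
    total = 0
    for p in pixels:
        hist[p] += 1
        total += 1

    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of grey values in the dark class so far

    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # dark class still empty
        if w1 == 0:
            break           # light class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1/total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map each pixel to 1 (light) if it exceeds t, else 0 (dark)."""
    return [1 if p > t else 0 for p in pixels]
```

On a dark, low-contrast page (for example, ink at grey level 20 on paper at level 90), the fixed threshold of 128 maps every pixel to dark, while the Otsu threshold falls between the two histogram peaks and separates them. This says nothing about which method wins on real scanned pages; that question is what the report's 350-page test addresses.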
