Segmentation of Degraded Malayalam Words: Methods and Evaluation
Devendra Sachan, Shrey Dutta, T.S. Naveen, C.V. Jawahar

In most Optical Character Recognition (OCR) software, a substantial percentage of errors is caused by incorrect segmentation of degraded words. This is especially true when recognizing old books, newspapers, and historical manuscripts. In this paper, we propose multiple segmentation methods that address the problem of cuts (a single character broken into several pieces) and merges (adjacent characters fused together) in degraded words. We have created an annotated dataset of 1034 word images with pixel-level ground truth for quantitative evaluation of the methods. We compare the methods with a baseline implementation based on connected component analysis. We report substantial improvement in accuracy at both the character and the word level.

Keywords: Character Segmentation; Degradation Correction; Malayalam; Indian Language
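To make the connected-component baseline concrete, here is a minimal pure-Python sketch. It is only an illustration of the general technique and not the authors' implementation. It assumes a binarized word image given as a list of rows, with 1 marking ink, and treats each connected blob as one character candidate. The function name `connected_components`, the `connectivity` parameter, and the left-to-right ordering of bounding boxes are our own assumptions.

```python
from collections import deque

def connected_components(img, connectivity=8):
    """Label the foreground pixels (value 1) of a binary image given as a list of rows.

    Each connected blob is treated as one character candidate (hypothetical baseline).
    Returns bounding boxes (x_min, y_min, x_max, y_max) sorted left to right.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    if connectivity == 8:
        nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    else:  # 4-connectivity: only horizontal and vertical neighbours
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                # Breadth-first flood fill from an unvisited ink pixel.
                seen[y][x] = True
                queue = deque([(y, x)])
                x0, y0, x1, y1 = x, y, x, y
                while queue:
                    cy, cx = queue.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for dy, dx in nbrs:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    # Order candidates by their leftmost column, as in horizontal reading order.
    boxes.sort(key=lambda b: b[0])
    return boxes

if __name__ == "__main__":
    word = [[int(c) for c in row] for row in ["1100110", "1000010", "1100110"]]
    print(connected_components(word))  # [(0, 0, 1, 2), (4, 0, 5, 2)]
```

The sketch also shows why such a baseline is fragile on degraded text. A broken stroke splits one glyph into several components, which is a cut. Touching glyphs collapse into a single component, which is a merge. The choice between 4- and 8-connectivity alone can change the count: two diagonally touching pixels form one component under 8-connectivity but two under 4-connectivity.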

