CEERI, Pilani is developing a deep-learning-based framework that enhances degraded document images in order to improve OCR performance. The enhancement covers the factors that help OCR accuracy, such as increasing resolution, removing noise, and removing lines that touch the text.

A convolutional autoencoder with 6 layers and 16, 32, and 64 feature maps is used; the kernel size is 5×5. Since the challenge involves both denoising and super-resolution, a second network with the same specification is trained for denoising and touching-line removal. Both networks are then fine-tuned end to end so that denoising and super-resolution happen at the same time. The denoising architecture handles multiple tasks, including touching-line removal and the removal of salt-and-pepper and other types of noise. The super-resolution network is initially trained at a scale of 2X; the trained network is then applied in cascade to produce 8X super-resolved text images.
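The cascading idea can be illustrated with a minimal sketch, assuming that 8X is reached by applying the 2X stage three times (2 × 2 × 2 = 8). The function `upscale_2x_placeholder` is a hypothetical stand-in that only duplicates pixels; it is not the trained autoencoder, and the names `cascade` and `Image` are likewise introduced here for illustration.

```python
from typing import Callable, List

# A grayscale image represented as rows of pixel values (illustrative type).
Image = List[List[int]]


def upscale_2x_placeholder(img: Image) -> Image:
    """Hypothetical stand-in for the trained 2X network.

    It doubles width and height by repeating each pixel (nearest neighbour),
    which only mimics the change in size, not the learned enhancement.
    """
    out: Image = []
    for row in img:
        doubled = [p for p in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(doubled)
        out.append(list(doubled))                     # repeat the row vertically
    return out


def cascade(img: Image, stage: Callable[[Image], Image], times: int) -> Image:
    """Apply the same stage repeatedly, feeding each output into the next pass."""
    for _ in range(times):
        img = stage(img)
    return img


low_res = [[0, 255], [255, 0]]                              # 2 x 2 input
high_res = cascade(low_res, upscale_2x_placeholder, times=3)
print(len(high_res), len(high_res[0]))                      # 16 16, i.e. 8X per side
```

In a real system the placeholder would be replaced by a forward pass of the trained 2X model; the cascade loop itself would stay the same.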

Pipeline for the lower-resolution image recognition system

The final block is a cascaded implementation of all the blocks shown above. It takes images as input and passes them through the layout and linear element removal block, which produces clean images. The clean images are then super-resolved by a factor of 8 (8X) to obtain high-resolution images, which are passed to an OCR engine. Dictionary-based post-processing is also applied to the OCR output to improve the results.
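The order of the stages described above can be sketched as follows. This is only an illustration under stated assumptions: `clean`, `super_resolve_2x`, and `ocr` are hypothetical callables standing in for the trained networks and the OCR engine, three 2X passes are assumed to give 8X, and the dictionary step uses fuzzy matching from Python's standard `difflib` module, which may differ from the post-processing actually used.

```python
import difflib
from typing import Any, Callable, Sequence


def correct_word(word: str, vocabulary: Sequence[str], cutoff: float = 0.8) -> str:
    """Return the closest vocabulary entry, or the word unchanged if none is close.

    The lookup is done in lower case, so a corrected word comes back in the
    vocabulary's casing. Punctuation is not handled in this sketch.
    """
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def postprocess(text: str, vocabulary: Sequence[str]) -> str:
    """Apply dictionary correction to each whitespace-separated word."""
    return " ".join(correct_word(w, vocabulary) for w in text.split())


def run_pipeline(
    image: Any,
    clean: Callable[[Any], Any],             # hypothetical: layout and line removal
    super_resolve_2x: Callable[[Any], Any],  # hypothetical: one 2X super-resolution pass
    ocr: Callable[[Any], str],               # hypothetical: OCR engine wrapper
    vocabulary: Sequence[str],
) -> str:
    """Clean the image, super-resolve it 8X, run OCR, then correct the text."""
    enhanced = clean(image)
    for _ in range(3):                       # assumed: 2X applied three times gives 8X
        enhanced = super_resolve_2x(enhanced)
    return postprocess(ocr(enhanced), vocabulary)


vocabulary = ["document", "image", "resolution"]
print(postprocess("docurnent image xyz", vocabulary))
```

Here the common OCR confusion of "rn" for "m" in "docurnent" is close enough to "document" to be corrected, while a word with no close dictionary entry, such as "xyz", is left as it is.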