This project investigates a set of sub-problems related to the recognition and retrieval of degraded and challenging document images in Indian languages. Traditionally, the recognition problem is called OCR (optical character recognition). However, OCR systems are reliable only when the document is printed and reasonably clean. Many practically important documents in the Indian context (such as the massive collections of manuscripts available in courts, historical newspaper articles, and handwritten notes of freedom fighters) vary in print style and are affected by ageing-related noise and varying scan settings.

We focus on content-aware image processing algorithms for robust and efficient recognition and retrieval from Indian language document images. Our image processing algorithms aim to improve the quality of document images by removing noise and low-resolution artifacts through content-aware operations. We also develop recognizers for handwritten Indian language text using state-of-the-art machine learning techniques such as deep learning. In this project, we specifically work on

Some of the results and publications for this project have been added here.