IIT Kharagpur has collected a large number of Bengali handwritten documents by eminent personalities such as Satyajit Ray from the National Digital Library project, and ground truth has been generated for them to support the subsequent development of handwritten Bengali OCR.

Block diagram of the preprocessing steps and the annotation process in the MultiDIAS system.

A GUI-based ground-truth generation tool called the Multi-layered Document Image Annotation System (MultiDIAS) has been created. The system accommodates the multi-layered structure of the information in document images while providing a simple platform for pixel-level annotation in several layers. Annotation operates on the foreground pixels, which are segmented by a semi-automatic module and grouped under user supervision into elementary labeling units. These units are then labeled in four hierarchical layers (layout type, entity type, line type, word type) with user-defined labels. The output consists of four ground-truth images and an XML file containing the metadata. MultiDIAS has been tested on a complex handwritten manuscript written by the renowned film director Satyajit Ray for the movie 'Goopi Gyne Bagha Byne'. The annotated data generated with MultiDIAS can be used in a wide range of document image analysis applications.
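The layered output described above can be illustrated with a small sketch. It is not the MultiDIAS implementation: it assumes that each ground-truth image is a label map for one layer, and the `LabelingUnit` class, the label names, the numeric codes, and the XML schema are all hypothetical, chosen only to show how pixel-level units could carry one label per layer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# The four hierarchical layers named in the text.
LAYERS = ("layout", "entity", "line", "word")


@dataclass
class LabelingUnit:
    """A group of foreground pixels labeled as one unit (hypothetical structure)."""
    unit_id: int
    pixels: list  # (row, col) coordinates of foreground pixels
    labels: dict = field(default_factory=dict)  # layer name -> user-defined label


def build_ground_truth(units, height, width, label_ids):
    """Return one label map per layer; 0 marks background or unlabeled pixels."""
    maps = {layer: [[0] * width for _ in range(height)] for layer in LAYERS}
    for unit in units:
        for layer in LAYERS:
            label = unit.labels.get(layer)
            if label is None:
                continue  # this unit has no label in this layer
            value = label_ids[layer][label]
            for row, col in unit.pixels:
                maps[layer][row][col] = value
    return maps


def build_metadata(units, label_ids, image_name):
    """Build an XML tree listing label codes and units (illustrative schema only)."""
    root = ET.Element("annotation", image=image_name)
    for layer in LAYERS:
        layer_el = ET.SubElement(root, "layer", name=layer)
        for label, value in label_ids[layer].items():
            ET.SubElement(layer_el, "label", name=label, value=str(value))
    for unit in units:
        unit_el = ET.SubElement(
            root, "unit", id=str(unit.unit_id), pixels=str(len(unit.pixels))
        )
        for layer, label in unit.labels.items():
            unit_el.set(layer, label)
    return root


# Toy page of 4 x 6 pixels with two labeling units (all labels are made up).
label_ids = {
    "layout": {"text_block": 1},
    "entity": {"dialogue": 1, "note": 2},
    "line": {"line_1": 1, "line_2": 2},
    "word": {"word_1": 1, "word_2": 2},
}
units = [
    LabelingUnit(1, [(0, 0), (0, 1)],
                 {"layout": "text_block", "entity": "dialogue",
                  "line": "line_1", "word": "word_1"}),
    LabelingUnit(2, [(2, 3), (2, 4), (3, 4)],
                 {"layout": "text_block", "entity": "note",
                  "line": "line_2", "word": "word_2"}),
]
gt_maps = build_ground_truth(units, height=4, width=6, label_ids=label_ids)
metadata_xml = ET.tostring(
    build_metadata(units, label_ids, "page_001.png"), encoding="unicode"
)
```

Keeping one label map per layer means a single pixel can be, for example, part of a text block, a note, a particular line, and a particular word at the same time, which is the kind of multi-layered structure the paragraph describes.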