India is a multilingual country where Roman script is often used alongside different Indic scripts in a text document. To develop a script-specific handwritten Optical Character Recognition (OCR) system, it is therefore necessary to identify the scripts of handwritten text correctly. In this paper, we present a system that automatically separates the scripts of handwritten words in a document written in Bangla or Devanagari mixed with Roman script. In this script separation technique, we first extract the text lines and words from document pages using a script-independent Neighboring Component Analysis technique. We then design a Multi Layer Perceptron (MLP) based classifier for script separation, trained with 8 different word-level holistic features. Two equal-sized datasets, one with Bangla and Roman scripts and the other with Devanagari and Roman scripts, are prepared for system evaluation. On the respective independent text samples, word-level script identification accuracies of 99.29% and 98.43% are achieved.
Nowadays the cell phone is the most common communication device used by the general public. SMS-based communication is cheap and popular. People naturally prefer to write SMS messages in their mother language. Text input in the mother language is easier when the letters of that language are printed on the keypad. A phonetics-based Bangla mobile keypad has been proposed earlier, but that keypad is not well founded from the point of view of letter frequency and flexibility. Since it is not a feasible solution, in this paper we propose an efficient Bengali keypad for cell phones and other cellular devices. The proposed keypad is based on the frequency of the letters in the Bengali language and on the structure of human finger movements. We took these two points into account to provide a flexible and fast cell phone keypad.
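As an illustration of the frequency-based idea only, the following minimal sketch assigns the most frequent letters of a corpus to the keypad positions that need the fewest taps. It is not the layout procedure of the paper: the function names `assign_letters` and `total_taps`, the tap-count cost model, and the assumption that keys are listed from easiest to hardest to reach are all hypothetical, and the toy corpus uses Latin placeholder letters.

```python
from collections import Counter


def assign_letters(corpus, keys, taps_per_key):
    """Map each letter to a (key, tap_count) slot, most frequent letters first.

    Assumption: `keys` is ordered from easiest to hardest to reach, which is
    a stand-in for a real model of finger movement.
    """
    # Count every non-whitespace character in the corpus.
    freq = Counter(ch for ch in corpus if not ch.isspace())
    # Rank by descending frequency; ties are broken by code point so the
    # result is deterministic.
    ranked = sorted(freq, key=lambda ch: (-freq[ch], ch))
    # Slots ordered by tap count first, then by key order.
    slots = [(key, tap) for tap in range(1, taps_per_key + 1) for key in keys]
    if len(ranked) > len(slots):
        raise ValueError("not enough keypad slots for all letters")
    return dict(zip(ranked, slots))


def total_taps(corpus, layout):
    """Number of key presses needed to type the corpus with this layout."""
    return sum(layout[ch][1] for ch in corpus if not ch.isspace())


# Toy corpus with placeholder letters; a real study would use Bengali text.
corpus = "aaaabbbccd"
layout = assign_letters(corpus, keys=["2", "3"], taps_per_key=2)
print(layout)
print(total_taps(corpus, layout))
```

Under this simple cost model, placing frequent letters on single-tap positions lowers the total number of key presses; a fuller treatment would also weight the distance the thumb travels between keys.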
Since the Urdu language has more isolated letters than Arabic and Farsi, research on Urdu handwritten words is desirable. This paper presents a novel approach that uses compound features and a Support Vector Machine (SVM) for offline Urdu word recognition. Due to the cursive style of Urdu, a holistic approach to classification is applied efficiently. Compound feature sets, which comprise structural and gradient (directional) features, are extracted from each Urdu word. Experiments have been conducted on the CENPARMI Urdu Words Database, and a high recognition accuracy of 97.00% has been achieved.
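To make the notion of a gradient (directional) feature concrete, the sketch below computes a magnitude-weighted histogram of gradient directions over a grayscale word image. This is a generic illustration under stated assumptions, not the feature set used in the paper: the function name `gradient_direction_histogram`, the central-difference gradient, and the choice of 8 direction bins are all hypothetical. In a holistic recognizer, a vector of this kind would be combined with structural features and passed to an SVM.

```python
import math


def gradient_direction_histogram(image, bins=8):
    """Return a normalized histogram of gradient directions.

    `image` is a 2D list of grayscale values. Each interior pixel votes for
    the bin of its gradient direction, weighted by gradient magnitude.
    """
    rows, cols = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region, no direction to record
            # Map the direction to [0, 2*pi) and then to a bin index.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            hist[b] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy image with a single vertical edge: dark on the left, bright on the right.
edge = [[0, 0, 1, 1] for _ in range(4)]
features = gradient_direction_histogram(edge)
print(features)
```

For the toy image, all gradient energy falls into the bin for the direction pointing from the dark side to the bright side, which is the behaviour a directional feature is meant to capture.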