Recent advancements in the field of computer vision with the help of deep neural networks have led us to explore and develop many existing challenges that were once unattended due to the lack of necessary technologies. Hand Sign/Gesture Recognition is one of the significant areas where the deep neural network is making a substantial impact. In the last few years, a large number of researches has been conducted to recognize hand signs and hand gestures, which we aim to extend to our mother-tongue, Bangla (also known as Bengali). The primary goal of our work is to make an automated tool to aid the people who are unable to speak. We developed a system that automatically detects hand sign based digits and speaks out the result in Bangla language. According to the report of the World Health Organization (WHO), 15% of people in the world live with some kind of disabilities. Among them, individuals with communication impairment such as speech disabilities experience substantial barrier in social interaction. The proposed system can be invaluable to mitigate such a barrier. The core of the system is built with a deep learning model which is based on convolutional neural networks (CNN). The model classifies hand sign based digits with 92% accuracy over validation data which ensures it a highly trustworthy system. Upon classification of the digits, the resulting output is fed to the text to speech engine and the translator unit eventually which generates audio output in Bangla language. A web application to demonstrate our tool is available at http://bit.ly/signdigits2banglaspeech.
Detecting emotions from text is an extension of simple sentiment polarity detection. Instead of considering only positive or negative sentiments, emotions are conveyed using more tangible manner; thus, they can be expressed as many shades of gray. This paper manifests the results of our experimentation for fine-grained emotion analysis on Bangla text. We gathered and annotated a text corpus consisting of user comments from several Facebook groups regarding socio-economic and political issues, and we made efforts to extract the basic emotions (sadness, happiness, disgust, surprise, fear, anger) conveyed through these comments. Finally, we compared the results of the five most popular classical machine learning techniques namely Naive Bayes, Decision Tree, k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and K-Means Clustering with several combinations of features. Our best model (SVM with a non-linear radial-basis function (RBF) kernel) achieved an overall average accuracy score of 52.98% and an F1 score (macro) of 0.3324
Visual Genome is a dataset connecting structured image information with English language. We present ``Hindi Visual Genome'', a multimodal dataset consisting of text and images suitable for English-Hindi multimodal machine translation task and multimodal research. We have selected short English segments (captions) from Visual Genome along with associated images and automatically translated them to Hindi with manual post-editing which took the associated images into account. We prepared a set of 31525 segments, accompanied by a challenge test set of 1400 segments. This challenge test set was created by searching for (particularly) ambiguous English words based on the embedding similarity and manually selecting those where the image helps to resolve the ambiguity. Our dataset is the first for multimodal English-Hindi machine translation, freely available for non-commercial research purposes. Our Hindi version of Visual Genome also allows to create Hindi image labelers or other practical tools. Hindi Visual Genome also serves in Workshop on Asian Translation (WAT) 2019 Multi-Modal Translation Task.
A word having multiple senses in a text introduces the lexical semantic task to find out which particular sense is appropriate for the given context. One such task is Word sense disambiguation which refers to the identification of the most appropriate meaning of the polysemous word in a given context using computational algorithms. The language processing research in Hindi, the official language of India, and other Indian languages is restricted by unavailability of the standard corpus. For Hindi word sense disambiguation also, the large corpus is not available. In this work, we prepared the text containing new senses of certain words leading to the enrichment of the sense-tagged Hindi corpus of sixty polysemous words. Furthermore, we analyzed two novel lexical associations for Hindi word sense disambiguation based on the contextual features of the polysemous word. The evaluation of these methods is carried out over learning algorithms and favorable results are achieved.
Automatic question generation (QG) is a challenging problem in natural language understanding. QG systems are typically built assuming access to a large number of training instances where each instance is a question and its corresponding answer. For a new language, such training instances are hard to obtain making the QG problem even more challenging. Using this as our motivation, we study the reuse of an available large QG dataset in a secondary language (e.g. English) to learn a QG model for a primary language (e.g. Hindi) of interest. For the primary language, we assume access to a large amount of monolingual text but only a small QG dataset. We propose a cross-lingual QG model which uses the following training regime: (i) Unsupervised pretraining of language models in both primary and secondary languages and (ii) joint supervised training for QG in both languages. We demonstrate the efficacy of our proposed approach using two different primary languages, Hindi and Chinese. We also create and release a new question answering dataset for Hindi consisting of 6555 sentences.
Sentiment Analysis and other semantic tasks are commonly used for social media textual analysis to gauge public opinion and make sense from the noise on social media. The language used on social media not only commonly diverges from the formal language, but is compounded by codemixing between languages, especially in large multilingual societies like India. Traditional methods for learning semantic NLP tasks have long relied on end to end task specific training, requiring expensive data creation process, even more so for deep learning methods. This challenge is even more severe for resource scarce texts like codemixed language pairs, with lack of well learnt representations as model priors, and task specific datasets can be few and small in quantities to efficiently exploit recent deep learning approaches. To address above challenges, we introduce curriculum learning strategies for semantic tasks in code-mixed Hindi-English (Hi-En) texts, and investigate various training strategies for enhancing model performance. Our method outperforms the state of the art methods for Hi-En codemixed sentiment analysis by 3.31% accuracy, and also shows better model robustness in terms of convergence, and variance in test performance.
Multilingual writers and speakers often alternate between two languages in a single discourse, a practice called "code-switching". Existing sentiment detection methods are usually trained on sentiment-labeled monolingual text. Manually labeled code-switched text, especially involving minority languages, is extremely rare. Consequently, the best monolingual methods perform relatively poorly on code-switched text. We present an effective technique for synthesizing labeled code-switched text from labeled monolingual text, which is more readily available. The idea is to replace carefully selected subtrees of constituency parses of sentences in the resource-rich language with suitable token spans selected from automatic translations to the resource-poor language. By augmenting scarce human-labeled code-switched text with plentiful synthetic code-switched text, we achieve significant improvements in sentiment labeling accuracy (1.5%, 5.11%, 7.20%) for three different language pairs (English-Hindi, English-Spanish and English-Bengali). We also get significant gains for hate speech detection: 4% improvement using only synthetic text and 6% if augmented with real text.
This paper attempt to study the effectiveness of text representation schemes on two tasks namely: User Aggression and Fact Detection from the social media contents. In User Aggression detection, The aim is to identify the level of aggression from the contents generated in the Social media and written in the English, Devanagari Hindi and Romanized Hindi. Aggression levels are categorized into three predefined classes namely: `Non-aggressive`, `Overtly Aggressive`, and `Covertly Aggressive`. During the disaster-related incident, Social media like, Twitter is flooded with millions of posts. In such emergency situations, identification of factual posts is important for organizations involved in the relief operation. We anticipated this problem as a combination of classification and Ranking problem. This paper presents a comparison of various text representation scheme based on BoW techniques, distributed word/sentence representation, transfer learning on classifiers. Weighted $F_1$ score is used as a primary evaluation metric. Results show that text representation using BoW performs better than word embedding on machine learning classifiers. While pre-trained Word embedding techniques perform better on classifiers based on deep neural net. Recent transfer learning model like ELMO, ULMFiT are fine-tuned for the Aggression classification task. However, results are not at par with pre-trained word embedding model. Overall, word embedding using fastText produce best weighted $F_1$-score than Word2Vec and Glove. Results are further improved using pre-trained vector model. Statistical significance tests are employed to ensure the significance of the classification results. In the case of lexically different test Dataset, other than training Dataset, deep neural models are more robust and perform substantially better than machine learning classifiers.
Reading comprehension is a complex process that stems from the development of decoding and understanding the written form of a language. Reading development largely depends on the typological and orthographical features of a language. Hence, research investigating the impact of different writing systems on reading processes and acquisition is needed to inform reading models and teaching practices across language/learning contexts. Malayalam is a prominent Indic language, but has hardly been studied in reading research. Therefore, to stimulate such research, the present chapter explains the orthographical features of Malayalam, considering these in terms of cross-linguistic factors that are important for reading acquisition. The chapter then presents a review of the relevant studies in reading, focusing on akshara orthographies and those recognising metalinguistic awareness as an aspect of successful reading acquisition, particularly in multilingual contexts. The chapter ends by arguing that phoneme-based instructional strategies should be usefully applied to Malayalam, despite its akshara characteristics.
This paper presents a novel approach to generate synthetic dataset for handwritten word recognition systems. It is difficult to recognize handwritten scripts for which sufficient training data is not readily available or it may be expensive to collect such data. Hence, it becomes hard to train recognition systems owing to lack of proper dataset. To overcome such problems, synthetic data could be used to create or expand the existing training dataset to improve recognition performance. Any available digital data from online newspaper and such sources can be used to generate synthetic data. In this paper, we propose to add distortion/deformation to digital data in such a way that the underlying pattern is preserved, so that the image so produced bears a close similarity to actual handwritten samples. The images thus produced can be used independently to train the system or be combined with natural handwritten data to augment the original dataset and improve the recognition system. We experimented using synthetic data to improve the recognition accuracy of isolated characters and words. The framework is tested on 2 Indic scripts - Devanagari (Hindi) and Bengali (Bangla), for numeral, character and word recognition. We have obtained encouraging results from the experiment. Finally, the experiment with Latin text verifies the utility of the approach.