Machine translation is the process of translating text from one language to another. In this paper, Statistical Machine Translation is done on Assamese and English language by taking their respective parallel corpus. A statistical phrase based translation toolkit Moses is used here. To develop the language model and to align the words we used two another tools IRSTLM, GIZA respectively. BLEU score is used to check our translation system performance, how good it is. A difference in BLEU scores is obtained while translating sentences from Assamese to English and vice-versa. Since Indian languages are morphologically very rich hence translation is relatively harder from English to Assamese resulting in a low BLEU score. A statistical transliteration system is also introduced with our translation system to deal basically with proper nouns, OOV (out of vocabulary) words which are not present in our corpus.
Extraction and recognition of Bangla text from video frame images is challenging due to complex color background, low-resolution etc. In this paper, we propose an algorithm for extraction and recognition of Bangla text form such video frames with complex background. Here, a two-step approach has been proposed. First, the text line is segmented into words using information based on line contours. First order gradient value of the text blocks are used to find the word gap. Next, a local binarization technique is applied on each word and text line is reconstructed using those words. Secondly, this binarized text block is sent to OCR for recognition purpose.
This paper explores the use of machine learning approaches, or more specifically, four supervised learning Methods, namely Decision Tree(C 4.5), K-Nearest Neighbour (KNN), Na\"ive Bays (NB), and Support Vector Machine (SVM) for categorization of Bangla web documents. This is a task of automatically sorting a set of documents into categories from a predefined set. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been conducted on Bangla language text categorization. Hence, we attempt to analyze the efficiency of those four methods for categorization of Bangla documents. In order to validate, Bangla corpus from various websites has been developed and used as examples for the experiment. For Bangla, empirical results support that all four methods produce satisfactory performance with SVM attaining good result in terms of high dimensional and relatively noisy document feature vectors.
Nowadays peoples are actively involved in giving comments and reviews on social networking websites and other websites like shopping websites, news websites etc. large number of people everyday share their opinion on the web, results is a large number of user data is collected .users also find it trivial task to read all the reviews and then reached into the decision. It would be better if these reviews are classified into some category so that the user finds it easier to read. Opinion Mining or Sentiment Analysis is a natural language processing task that mines information from various text forms such as reviews, news, and blogs and classify them on the basis of their polarity as positive, negative or neutral. But, from the last few years, user content in Hindi language is also increasing at a rapid rate on the Web. So it is very important to perform opinion mining in Hindi language as well. In this paper a Hindi language opinion mining system is proposed. The system classifies the reviews as positive, negative and neutral for Hindi language. Negation is also handled in the proposed system. Experimental results using reviews of movies show the effectiveness of the system
Opinions are very important in the life of human beings. These Opinions helped the humans to carry out the decisions. As the impact of the Web is increasing day by day, Web documents can be seen as a new source of opinion for human beings. Web contains a huge amount of information generated by the users through blogs, forum entries, and social networking websites and so on To analyze this large amount of information it is required to develop a method that automatically classifies the information available on the Web. This domain is called Sentiment Analysis and Opinion Mining. Opinion Mining or Sentiment Analysis is a natural language processing task that mine information from various text forms such as reviews, news, and blogs and classify them on the basis of their polarity as positive, negative or neutral. But, from the last few years, enormous increase has been seen in Hindi language on the Web. Research in opinion mining mostly carried out in English language but it is very important to perform the opinion mining in Hindi language also as large amount of information in Hindi is also available on the Web. This paper gives an overview of the work that has been done Hindi language.
The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text to intelligible and natural sounding speech. However, for a particular language like Hindi, which is a highly confusing language (due to very close spellings), it is not an easy task to identify errors/mistakes in input text and an incorrect text degrade the quality of output speech hence this paper is a contribution to the development of high quality speech synthesis with the involvement of Spellchecker which generates spell suggestions for misspelled words automatically. Involvement of spellchecker would increase the efficiency of speech synthesis by providing spell suggestions for incorrect input text. Furthermore, we have provided the comparative study for evaluating the resultant effect on to phonetic text by adding spellchecker on to input text.