Indian Language Benchmark Portal

3 results

JU_KS@SAIL_CodeMixed-2017: Sentiment Analysis for Indian Code Mixed Social Media Texts
Kamal Sarkar

This paper reports on our work in the NLP Tool Contest @ICON-2017 shared task on Sentiment Analysis for Indian Languages (SAIL) (code mixed). Our system uses a machine learning algorithm, Multinomial Naïve Bayes, trained on n-gram and SentiWordnet features. We used a small SentiWordnet for English and a small SentiWordnet for Bengali, but none for Hindi. We tested the system on the Hindi-English and Bengali-English code-mixed social media data sets released for the contest. Its performance is very close to that of the best system that participated in the contest. For both the Bengali-English and Hindi-English runs, our system was ranked 3rd among all submitted runs and was awarded the 3rd prize in the contest.
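
To make the approach in this abstract concrete, the following is a minimal sketch of a Multinomial Naïve Bayes classifier over word n-gram counts plus lexicon polarity counts. It is not the authors' system: the toy lexicon `TOY_LEXICON`, the feature naming, the add-one smoothing, and the four training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a SentiWordnet-style resource.
TOY_LEXICON = {
    "good": "pos", "awesome": "pos", "bhalo": "pos",
    "bad": "neg", "boring": "neg", "kharap": "neg",
}


def extract_features(text, max_n=2):
    """Count word n-grams (n = 1..max_n) and lexicon polarity hits."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats["ng=" + " ".join(tokens[i:i + n])] += 1
    for tok in tokens:
        polarity = TOY_LEXICON.get(tok)
        if polarity:
            feats["lex=" + polarity] += 1
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.class_counts}
        return self

    def predict(self, text):
        feats = extract_features(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            # Log prior plus the count-weighted log likelihood of each feature.
            score = math.log(self.class_counts[c] / n_docs)
            for f, count in feats.items():
                if f not in self.vocab:
                    continue  # features never seen in training are ignored
                p = ((self.feature_counts[c][f] + self.alpha)
                     / (self.totals[c] + self.alpha * v))
                score += count * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


# Toy code-mixed training data, invented for this sketch.
train_texts = ["movie ta khub bhalo", "this film is awesome",
               "gaan ta kharap", "so boring and bad"]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("khub bhalo movie"))
```

Treating lexicon hits as extra count features keeps everything inside the multinomial model; a real system would presumably use the released contest data and a full sentiment lexicon rather than these placeholders.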

Part-of-Speech Tagging for Code-mixed Indian Social Media Text at ICON 2015
Kamal Sarkar

This paper discusses the experiments we carried out at Jadavpur University as part of our participation in the ICON 2015 task: POS Tagging for Code-mixed Indian Social Media Text. The tool we developed for the task is based on a trigram Hidden Markov Model that uses information from a dictionary, as well as some other word-level features, to enhance the observation probabilities of both known and unknown tokens. We submitted runs for the Bengali-English, Hindi-English and Tamil-English language pairs. Our system was trained and tested on the datasets released for the shared task. In constrained mode, our system obtains an average overall accuracy (averaged over all three language pairs) of 75.60%, which is very close to the two participating systems ranked higher than ours (76.79% for IIITH and 75.79% for AMRITA_CEN). In unconstrained mode, our system obtains an average overall accuracy of 70.65%, which is also close to the system with the highest average overall accuracy (72.85% for AMRITA_CEN).
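
As a rough illustration of the trigram Hidden Markov Model idea in this abstract, the sketch below trains tag-trigram and emission counts and decodes with Viterbi over tag pairs. It is not the authors' tool: add-one smoothing is swapped in here for the dictionary and word-level features the abstract mentions, and the tag set and training sentences are invented for the example.

```python
import math
from collections import Counter

START, STOP = "<s>", "</s>"


def train(tagged_sentences):
    """Collect tag-trigram, context, emission, and tag counts."""
    tri, ctx, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
            ctx[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sent:
            emit[(tag, word.lower())] += 1
            tag_count[tag] += 1
            vocab.add(word.lower())
    return tri, ctx, emit, tag_count, vocab


def viterbi(words, model):
    """Return the most probable tag sequence under the trigram HMM."""
    if not words:
        return []
    tri, ctx, emit, tag_count, vocab = model
    tags = sorted(tag_count)
    n_out = len(tags) + 1  # possible next symbols: every tag plus STOP

    def log_q(u, v, t):  # add-one smoothed P(t | u, v)
        return math.log((tri[(u, v, t)] + 1) / (ctx[(u, v)] + n_out))

    def log_e(t, word):  # add-one smoothed P(word | t), one slot for unknowns
        return math.log((emit[(t, word.lower())] + 1)
                        / (tag_count[t] + len(vocab) + 1))

    # pi maps the last two tags to the best log probability so far.
    pi = {(START, START): 0.0}
    back = []
    for word in words:
        new_pi, bp = {}, {}
        for (u, v), score in pi.items():
            for t in tags:
                s = score + log_q(u, v, t) + log_e(t, word)
                if s > new_pi.get((v, t), -math.inf):
                    new_pi[(v, t)] = s
                    bp[(v, t)] = u
        pi = new_pi
        back.append(bp)

    # Pick the best final tag pair, including the transition to STOP.
    u, v = max(pi, key=lambda st: pi[st] + log_q(st[0], st[1], STOP))
    seq = [u, v]
    for k in range(len(words) - 1, 1, -1):
        seq.insert(0, back[k][(seq[0], seq[1])])
    return seq[-len(words):]


# Toy code-mixed training data with a hypothetical three-tag set.
toy_sentences = [
    [("ami", "PRON"), ("love", "VERB"), ("cricket", "NOUN")],
    [("tumi", "PRON"), ("like", "VERB"), ("music", "NOUN")],
    [("ami", "PRON"), ("like", "VERB"), ("cricket", "NOUN")],
]
hmm = train(toy_sentences)
print(viterbi(["tumi", "love", "music"], hmm))
```

Keeping the Viterbi state as a pair of tags is what makes the model trigram rather than bigram; the smoothed emission term is the place where dictionary or word-shape evidence for unknown tokens could be folded in.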

A CRF Based POS Tagger for Code-mixed Indian Social Media Text
Kamal Sarkar

In this work, we describe a conditional random fields (CRF) based system for Part-of-Speech (POS) tagging of code-mixed Indian social media text, developed as part of our participation in the tool contest on POS tagging for code-mixed Indian social media text, held in conjunction with the 2016 International Conference on Natural Language Processing, IIT (BHU), India. We participated only in the constrained mode contest, for all three language pairs: Bengali-English, Hindi-English and Telugu-English. Our system achieves an overall average F1 score of 79.99, the highest overall average F1 score among all 16 systems that participated in the constrained mode contest.
