Please use this identifier to cite or link to this item: http://hdl.handle.net/10263/7272
Title: Classification of Micro-Blog Texts
Authors: Sen, Bihan
Keywords: Text Classification
Classification Algorithms
Issue Date: Jul-2019
Publisher: Indian Statistical Institute, Kolkata
Citation: 45p.
Series/Report no.: Dissertation;;2019-23
Abstract: Classification of micro-blog texts is a common task in sentiment analysis, user opinion mining, product review analysis, crisis management, identifying the propagation of offensive and hate speech across social media, and restricting the spread of fake news and rumors. In this dissertation, we consider two problems from this domain: (i) classification of tweets during crisis scenarios such as natural disasters and terrorist attacks, and (ii) identification of offensive tweets. We tried both statistical and deep learning approaches. Datasets from the TREC-IS 2018 and 2019 tasks, and OLID from the OffenseEval workshop, were used for our experiments. The first task is formulated as a multi-label classification task, while the second is a binary classification problem. Our results suggest that preprocessing of social media text is crucial for classification. We also conclude that deep learning approaches do not always outperform traditional learning methods. We also participated in the TREC-IS 2019A task. Out of all 34 submissions from across the world, one of our submissions achieved the highest macro-averaged F1 score on this task (0.1969), outperforming the second-highest score (0.1556) by a substantial margin.
Description: Dissertation under the supervision of Dr. Mandar Mitra
URI: http://hdl.handle.net/10263/7272
Appears in Collections: Dissertations - M Tech (CS)

Files in This Item:
File: ClassificationMBlog_Bihan.pdf
Size: 434.6 kB
Format: Adobe PDF

