Abstract:
Classification of micro-blog texts is a common task in sentiment
analysis, user opinion mining, product review analysis, crisis management,
identifying offensive and hateful speech propagating across social
media, and restricting the spread of fake news and rumors. In this
dissertation, we consider two problems from this domain: (i)
classification of tweets during crisis scenarios such as natural
disasters and terrorist attacks, and (ii) identification of offensive tweets.
We explored both statistical and deep learning approaches. Datasets from
the TREC-IS 2018 and 2019 tasks, and OLID from the OffensEval workshop,
were used for our experiments. The first task is formulated as a
multi-label classification problem, while the second is a binary
classification problem. Our results suggest that preprocessing of social
media text is crucial for classification. We also conclude that deep
learning approaches do not always outperform traditional learning approaches.
We also participated actively in the TREC-IS 2019A task. Of the
34 submissions from teams around the world, one of our
submissions achieved the highest macro-averaged F-1 score on this
task (0.1969) and outperformed the second highest score (0.1556) by
a substantial margin.