Abstract:
The work in this dissertation addresses the Background Linking task of the TREC News
Track. Given a news article, the task is to suggest other news articles that provide
context and background for it. Context and background are, of course, highly subjective
notions. Here they are measured by comparing the documents retrieved by a system with a
set of documents already judged relevant by a panel of experts. All experiments use the
Washington Post dataset, a collection of 591,537 news articles published in The
Washington Post from 2012 to 2017.
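The comparison between a system's retrieved documents and the expert-judged relevant set can be illustrated with a simple metric. The sketch below computes binary precision at k. This is an assumption for illustration only: the abstract does not name the evaluation measure, the class and method names are hypothetical, and the document IDs are made up.

```java
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch: scores a ranked list of system-retrieved document IDs
 * against a set of IDs judged relevant, using binary precision at k.
 * Not necessarily the official TREC News Track measure.
 */
public class PrecisionAtK {

    // Fraction of the top-k retrieved documents that appear in the relevant set.
    // If fewer than k documents were retrieved, missing positions count as non-relevant.
    public static double precisionAtK(List<String> retrieved, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        int hits = 0;
        int limit = Math.min(k, retrieved.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(retrieved.get(i))) {
                hits++;
            }
        }
        return (double) hits / k;
    }

    public static void main(String[] args) {
        // Hypothetical document IDs, not drawn from the Washington Post collection.
        List<String> retrieved = List.of("d7", "d2", "d9", "d4", "d1");
        Set<String> relevant = Set.of("d2", "d4", "d8");
        System.out.println(precisionAtK(retrieved, relevant, 5)); // prints 0.4
    }
}
```

Here two of the top five retrieved documents (d2 and d4) are in the relevant set, giving 2/5 = 0.4. A graded measure would additionally reward placing the most useful background articles first.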
In this dissertation we explore six methods for this task. These techniques are
based on standard Information Retrieval and Natural Language Processing methods. We
compare them with each other and against the best-performing methods. We use Java as
the main programming language for data parsing, indexing and searching; Python is
used for data exploration in a few limited cases.
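The abstract does not say which six methods were used. As one illustration of a standard Information Retrieval technique that fits this task, the sketch below ranks candidate articles by the cosine similarity of their TF-IDF vectors to the query article. The naive tokenizer, the idf variant ln(N/df), the class and method names, and the toy articles are all assumptions, not the dissertation's implementation. A real system would search an inverted index over the full collection rather than scoring every candidate in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: TF-IDF cosine ranking of candidate background articles. */
public class TfIdfBackgroundLinker {

    // Lowercase and split on anything that is not a letter or digit (deliberately naive).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Raw term frequencies for one document.
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(text)) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // TF-IDF weights with idf = ln(N / df); df is counted over the candidate set only (assumption).
    static Map<String, Double> tfIdf(Map<String, Integer> tf, Map<String, Integer> df, int n) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            if (d == 0) continue; // a term unseen in the collection contributes nothing
            w.put(e.getKey(), e.getValue() * Math.log((double) n / d));
        }
        return w;
    }

    // Cosine similarity of two sparse vectors; 0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Candidate indices ordered by decreasing similarity to the query article (stable on ties).
    public static List<Integer> rank(String queryArticle, List<String> candidates) {
        int n = candidates.size();
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String doc : candidates) {
            Map<String, Integer> tf = termFreqs(doc);
            tfs.add(tf);
            for (String term : tf.keySet()) df.merge(term, 1, Integer::sum);
        }
        Map<String, Double> q = tfIdf(termFreqs(queryArticle), df, n);
        double[] scores = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            scores[i] = cosine(q, tfIdf(tfs.get(i), df, n));
            order.add(i);
        }
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return order;
    }

    public static void main(String[] args) {
        // Toy articles, invented for illustration.
        List<String> candidates = List.of(
                "council approves transit budget after long debate",
                "local team wins championship game",
                "transit ridership grows as budget debate continues");
        System.out.println(rank("city council votes on new transit budget", candidates)); // prints [0, 2, 1]
    }
}
```

The first candidate shares the rare term "council" with the query and so outranks the third, which shares only the more common terms "transit" and "budget"; the sports article shares nothing and ranks last.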
Description:
Dissertation under the supervision of Dr. Mandar Mitra, Indian Statistical Institute, Kolkata,