Please use this identifier to cite or link to this item: http://hdl.handle.net/10263/7173
Title: Part 1. An Explainer for Information Retrieval Research
Other Titles: Part 2. Open Domain Complex Question Answering
Authors: Saha, Sourav
Keywords: I-REX; Open Domain Complex Question Answering
Issue Date: Jul-2020
Publisher: Indian Statistical Institute, Kolkata
Citation: 61p.
Series/Report no.: Dissertation;;2020-19
Abstract: This thesis is organised in two parts. The first part addresses explainability in Information Retrieval (IR) research, with a focus on the performance of IR models. We present I-REX, a toolkit that illustrates the performance and explainability of IR systems. It is an interactive interface built on top of Lucene that gives a white-box view of any proposed method. It is implemented as both a web-based and a shell-based interface, providing intuitive explanations of the behaviour and performance of IR systems. Baseline retrieval models such as LM, BM25 and DFR, together with a set of well-defined features, enable debugging the performance of retrieval experiments such as ad-hoc IR or query expansion. The second part addresses open-domain complex factoid Question Answering (QA). Creating annotated data for QA requires substantial resources and is very time-consuming, and the available datasets are often domain-specific and, in most cases, created for specific languages. We therefore focus on answering questions in an unsupervised way. As benchmark data we used the dataset provided by Lu et al. (Quest) [26], which focuses on complex questions that cannot be answered directly from knowledge graphs (KGs). Our architecture combines corpus signals across multiple documents with the traditional QA pipeline to answer complex questions. We propose a set of modified evaluation protocols to overcome some serious pitfalls in the evaluation measure used in Quest. We also compare the performance of our architecture with another neural benchmark model, DrQA [11]. Experiments on this benchmark dataset show that our model significantly outperforms both Quest and DrQA. We find this very encouraging, since DrQA is trained on SQuAD [32], TREC Questions [4], WebQuestions [5] and WikiMovies [30], whereas our proposed method is unsupervised.
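BM25 is one of the baseline retrieval models named above. For readers unfamiliar with it, the following is a minimal sketch of textbook Okapi BM25 scoring over pre-tokenised documents. It is not taken from I-REX. The function name `bm25_scores` and the defaults `k1=1.2`, `b=0.75` are illustrative assumptions. It uses the non-negative IDF form log(1 + (N - n + 0.5) / (n + 0.5)), which Lucene's BM25Similarity also uses, but Lucene differs in other details, so its scores will not match these exactly.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`
    with Okapi BM25 (hypothetical helper; k1 and b are illustrative defaults)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: longer-than-average documents are penalised.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["complex", "question", "answering"],
    ["information", "retrieval"],
    ["question", "retrieval", "question"],
]
print(bm25_scores(["question", "retrieval"], docs))
```

In this toy collection the third document matches both query terms and ranks first. The second document outranks the first because it is shorter, even though each matches one query term once.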
Description: Dissertation under the supervision of Mandar Mitra, Indian Statistical Institute, Kolkata
URI: http://hdl.handle.net/10263/7173
Appears in Collections:Dissertations - M Tech (CS)

Files in This Item:
File | Description | Size | Format
mtech_thesis_sourav.pdf | | 1.06 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.