Please use this identifier to cite or link to this item: http://hdl.handle.net/10263/7382
Title: Visual Question Answering
Authors: Borana, Tarun
Keywords: Long short-term memory
Neural Network Architectures
Issue Date: Jul-2022
Publisher: Indian Statistical Institute, Kolkata
Citation: 28p.
Series/Report no.: Dissertation;2022-7
Abstract: In recent years, tremendous progress has been made in object detection, computer vision, and natural language processing. Artificial intelligence (AI) systems such as question-answering models give machines comprehension capabilities through natural language processing: such a machine can respond to natural language queries about unstructured text. Visual question answering (VQA) combines natural language processing with computer vision. The purpose of a VQA system is to answer natural language questions about images. A number of VQA systems based on learning algorithms and deep learning architectures have been introduced. This project introduces a VQA system that uses a deep convolutional neural network (CNN) to extract features from the image, while long short-term memory (LSTM) networks are used with word embeddings to represent the question text. Only questions whose answer type is yes or no are considered. The system performs the reasoning and natural language understanding needed to interpret the question correctly and give the appropriate answer, yes or no. Different architectures are introduced to combine the image and language models.
Description: Dissertation under the supervision of Dr. Ujjwal Bhattacharya
URI: http://hdl.handle.net/10263/7382
Appears in Collections: Dissertations - M Tech (CS)
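
The abstract describes combining CNN image features with an LSTM question representation to produce a yes or no answer, and mentions that different architectures are used for the combination. The dissertation's actual architectures are not given in this record, so the following is only a minimal sketch of two common fusion choices (concatenation and element-wise product) followed by a logistic yes/no decision. All function names, the feature vectors, and the weights are hypothetical and stand in for the outputs of a trained CNN, LSTM, and classifier.

```python
import math


def fuse_concat(image_feat, question_feat):
    """Fuse by concatenation: the result has len(image) + len(question) entries."""
    return list(image_feat) + list(question_feat)


def fuse_product(image_feat, question_feat):
    """Fuse by element-wise product: both vectors must have the same length."""
    if len(image_feat) != len(question_feat):
        raise ValueError("element-wise fusion needs equal-length vectors")
    return [i * q for i, q in zip(image_feat, question_feat)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_yes_no(fused, weights, bias=0.0):
    """Linear layer plus sigmoid over the fused vector (hypothetical classifier).

    Returns the predicted label and the estimated probability of "yes".
    """
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    prob_yes = sigmoid(logit)
    return ("yes" if prob_yes >= 0.5 else "no"), prob_yes


# Toy vectors standing in for CNN image features and an LSTM question encoding.
image_feat = [1.0, 2.0]
question_feat = [3.0, 4.0]
label, prob = predict_yes_no(fuse_product(image_feat, question_feat), [1.0, -1.0])
```

Concatenation keeps both representations intact and lets the classifier learn the interaction, while the element-wise product forces the two modalities into a shared dimension. Which of these, if either, matches the dissertation is an assumption.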

Files in This Item:
File: Dissertation_tarun_borana -7.pdf
Description: Dissertation
Size: 2.33 MB
Format: Adobe PDF