Visual Question Answering

Borana, Tarun

Date: 2022-07

Abstract:

In recent years, tremendous progress has been made in object detection, computer vision, and natural language processing. Artificial intelligence (AI) systems such as question-answering models use natural language processing to give machines language-comprehension capabilities: such a machine can answer natural-language queries about an unstructured text. Visual question answering (VQA) combines natural language processing with computer vision. Its purpose is to build a system capable of answering natural-language questions about images. A number of VQA systems based on learning algorithms and deep-learning architectures have been introduced. This project introduces a VQA system that uses a deep convolutional neural network (CNN) to extract features from the image and a Long Short-Term Memory (LSTM) network to encode the question text from its word embeddings. In this project, we consider only questions whose answer type is yes or no. Our system performs the reasoning and natural language understanding required to interpret each question and predict the appropriate answer, yes or no. Different architectures are introduced to combine the image and language models.
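The abstract describes combining CNN image features with an LSTM encoding of the question and predicting a yes/no answer. Below is a minimal, hypothetical sketch of that final combination stage in plain Python. It assumes both feature vectors have already been computed by the CNN and the LSTM, and it is not the project's implementation. The function names (`fuse`, `predict_yes_probability`, `answer`), the two fusion modes (element-wise product and concatenation), and the logistic-regression output head are illustrative assumptions. In a real system the weights would be learned during training.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse(image_vec: Sequence[float], question_vec: Sequence[float],
         mode: str = "product") -> Vector:
    """Combine an image feature vector and a question vector (hypothetical helper).

    "product": element-wise (Hadamard) product; both vectors must have equal length.
    "concat":  concatenation; the two lengths may differ.
    """
    if mode == "product":
        if len(image_vec) != len(question_vec):
            raise ValueError("product fusion needs vectors of equal length")
        return [a * b for a, b in zip(image_vec, question_vec)]
    if mode == "concat":
        return list(image_vec) + list(question_vec)
    raise ValueError(f"unknown fusion mode: {mode}")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_yes_probability(joint: Sequence[float], weights: Sequence[float],
                            bias: float) -> float:
    """Logistic-regression head: probability that the answer is "yes"."""
    if len(joint) != len(weights):
        raise ValueError("weight vector length must match joint vector length")
    z = sum(w * x for w, x in zip(weights, joint)) + bias
    return sigmoid(z)


def answer(image_vec: Sequence[float], question_vec: Sequence[float],
           weights: Sequence[float], bias: float,
           mode: str = "product", threshold: float = 0.5) -> str:
    """Fuse the two modalities, score the joint vector, and threshold to yes/no."""
    joint = fuse(image_vec, question_vec, mode)
    p_yes = predict_yes_probability(joint, weights, bias)
    return "yes" if p_yes >= threshold else "no"


if __name__ == "__main__":
    img = [0.5, 1.0, 0.0]   # stand-in for CNN image features
    q = [1.0, 0.5, 2.0]     # stand-in for the LSTM question encoding
    print(answer(img, q, weights=[1.0, 1.0, 1.0], bias=-0.5))                # product fusion
    print(answer(img, q, weights=[-1.0] * 6, bias=0.0, mode="concat"))      # concat fusion
```

The two modes illustrate one design choice behind "different architectures": element-wise product fusion requires the image and question vectors to be projected to the same dimension, while concatenation accepts vectors of different sizes at the cost of a larger classifier input.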