Date of Award

Winter 1-1-2019

Degree Type

Dissertation

Degree Name

Master of Science in Software Engineering

Department

Electrical Engineering and Computer Science

First Advisor

Dr. Sunnie Chung

Second Advisor

Dr. Yongjian Fu

Third Advisor

Dr. Wenbing Zhao

Abstract

With advances in internet computing and the great success of social media websites, the internet has exploded with a huge number of digital images. Searching for appropriate images directly through search engines and the web is now commonplace. However, automatically finding images relevant to the content of a textual query remains a very challenging task. Visual Question Answering (VQA) has emerged as a significant multidisciplinary research problem, combining methodologies from areas such as natural language processing, image recognition, and knowledge representation. The main challenges in developing a VQA system are the scalability of the solution and the need to handle visual object features and natural language questions simultaneously. Prior work has developed VQA models by extracting and combining image features using a Convolutional Neural Network (CNN) and textual features using a Recurrent Neural Network (RNN). This thesis explores methodologies for building a VQA system that can automatically answer a question about an image presented to it. The system uses a deep Residual Network (ResNet), an advanced CNN model, for image identification, and a Long Short-Term Memory (LSTM) network, an advanced form of RNN used in Natural Language Processing (NLP), to analyze the user-provided question. Finally, the features from the image and the question are combined to indicate an attention area of the image on which the deep residual network focuses to identify objects, and the system produces an answer in text. When evaluated on the well-known, challenging COCO and VQA 1.0 datasets, the system achieves an accuracy of 59%, a 12% increase over a baseline model without the attention-based technique, and its results are comparable to other state-of-the-art attention-based approaches in the literature. The quality and accuracy of the method used in this research are compared and analyzed.
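
The pipeline the abstract describes (a ResNet image encoder, an LSTM question encoder, and question-guided soft attention over image regions feeding an answer classifier) can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the thesis's exact configuration: the layer sizes, vocabulary size, single-glimpse attention, and the AttentionVQA class name are all assumptions made for illustration.

import torch
import torch.nn as nn
import torchvision.models as models

class AttentionVQA(nn.Module):
    """Illustrative sketch of a ResNet + LSTM + soft-attention VQA model.
    Hyperparameters and layer choices are assumptions, not the thesis's."""

    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=1024,
                 num_answers=1000):
        super().__init__()
        # ResNet backbone; keep the final convolutional feature map
        # (B, 2048, 14, 14 for a 448x448 input) rather than pooled features.
        resnet = models.resnet152(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-2])
        # LSTM encoder for the tokenized question.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # 1x1 conv scores each image region against the question vector.
        self.att_conv = nn.Conv2d(2048 + hidden_dim, 1, kernel_size=1)
        self.classifier = nn.Linear(2048 + hidden_dim, num_answers)

    def forward(self, image, question_tokens):
        v = self.cnn(image)                       # (B, 2048, H, W)
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                                 # (B, hidden_dim)
        # Tile the question vector over the spatial grid and score regions.
        q_map = q[:, :, None, None].expand(-1, -1, v.size(2), v.size(3))
        scores = self.att_conv(torch.cat([v, q_map], dim=1))   # (B, 1, H, W)
        alpha = torch.softmax(scores.flatten(2), dim=2).view_as(scores)
        # Attention-weighted image vector, fused with the question vector.
        v_att = (alpha * v).sum(dim=(2, 3))       # (B, 2048)
        return self.classifier(torch.cat([v_att, q], dim=1))   # answer logits

A forward pass with a batch of images of shape (B, 3, 448, 448) and integer question tokens of shape (B, T) yields (B, num_answers) logits, and the highest-scoring class is the predicted textual answer, which is how the attention step lets the classifier focus on the image region relevant to the question.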
