dc.contributor.author |
Basu, Spandan |
|
dc.date.accessioned |
2021-08-04T05:52:22Z |
|
dc.date.available |
2021-08-04T05:52:22Z |
|
dc.date.issued |
2020-07 |
|
dc.identifier.citation |
77p. |
en_US |
dc.identifier.uri |
http://hdl.handle.net/10263/7183 |
|
dc.description |
Dissertation under the supervision of Dipti Prasad Mukherjee, Professor, ECSU |
en_US |
dc.description.abstract |
This research work aims to design a method for predicting the view count of a
video through deep neural network based analysis of subjective video attributes.
With more and more companies turning to online video content influencers to
capture the millennial audience, getting people to watch videos on online
platforms is becoming increasingly lucrative. We address this problem by
building a model of our own. Our model takes four subjective video attributes as
input and predicts the probable view count of the video as output. The
attributes are the thumbnail image, the title caption, the audio associated with
the video, and the video itself. We preprocess each attribute separately to
obtain feature vectors. Our model contains four branches to deal with these
attributes. We pass the feature vectors of each component to the respective
branch of the model to capture the salient features of the thumbnail image using
a pre-trained CNN architecture, AlexNet; the sentiment features of the title
caption using a Sentiment Intensity Analyzer; the temporal features of the audio
waveform using an LSTM; and both the temporal and salient features of the video
using a Convolutional LSTM. Since, on most online platforms, a user clicks a
video based on its title and thumbnail, the model first generates a
click-affinity feature depicting the affinity of the user to click the video.
Having clicked, the user decides whether to keep watching based on the audio and
the video itself, so the view count is predicted by combining the click-affinity
feature with the temporal feature of the audio waveform and the spatio-temporal
feature of the video using a regressor network called the viral-video-prediction
network. A loss function designed from these regression values is used to train
the last two stages of the pipeline. We obtain a test accuracy as high as
95.89%. |
en_US |
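The two-stage architecture outlined in the abstract can be sketched roughly as
follows. This is not the thesis code: the per-branch extractors (AlexNet, the
Sentiment Intensity Analyzer, the LSTM, and the Convolutional LSTM) are stubbed
out as fixed-size vectors, and the function names `click_affinity` and
`predict_views` are illustrative assumptions, not names from the dissertation.

```python
# Hedged sketch of the abstract's pipeline: stage 1 fuses thumbnail and title
# features into a click-affinity score; stage 2 regresses a view count from
# that score plus audio and video features. All extractors are stand-ins.
import random


def extract_features(attribute, dim=8):
    """Stand-in for a branch's feature extractor; returns a deterministic
    pseudo-random feature vector keyed on the attribute name."""
    random.seed(sum(ord(c) for c in attribute))
    return [random.random() for _ in range(dim)]


def click_affinity(thumb_feat, title_feat):
    """Stage 1: combine thumbnail and title features into one affinity score
    (here, simply the mean of the concatenated features)."""
    fused = thumb_feat + title_feat
    return sum(fused) / len(fused)


def predict_views(affinity, audio_feat, video_feat, scale=1000.0):
    """Stage 2: regress a view count from click affinity together with the
    audio and video features (a linear toy regressor, not the thesis model)."""
    joint = audio_feat + video_feat
    retention = sum(joint) / len(joint)
    return scale * affinity * retention


thumb = extract_features("thumbnail.png")
title = extract_features("Catchy Title!")
audio = extract_features("audio.wav")
video = extract_features("clip.mp4")

affinity = click_affinity(thumb, title)
views = predict_views(affinity, audio, video)
```

The point of the two stages mirrors the abstract: only the thumbnail and title
influence whether a user clicks, while the audio and video determine whether
the user keeps watching once clicked.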
dc.language.iso |
en |
en_US |
dc.publisher |
Indian Statistical Institute, Kolkata |
en_US |
dc.relation.ispartofseries |
Dissertation;;2020-29 |
|
dc.subject |
AlexNet |
en_US |
dc.subject |
deep neural network |
en_US |
dc.subject |
Sentiment Intensity Analyzer |
en_US |
dc.subject |
LSTM |
en_US |
dc.subject |
Convolutional LSTM |
en_US |
dc.title |
View Count Prediction of a Video through Deep Neural Network based Analysis of Subjective Video Attributes |
en_US |
dc.type |
Other |
en_US |