dc.description.abstract |
Phishing is a form of cybercrime carried out over the Internet. Most
phishing websites try to look like legitimate websites: their web content and
URL features mimic those of legitimate sites. Because new techniques keep emerging, detecting
and analyzing these malicious URLs is very costly because of their complexity. Traditionally,
blacklisting and whitelisting have been used for detection, but these techniques are not
well suited to real-time use. To address this, recent years have witnessed several efforts to perform
malicious URL detection using machine learning. The most popular and scalable
approaches use lexical properties of the URL string, extracting bag-of-words-like
features and then applying machine learning models such as SVMs and Random Forests.
Various machine learning and deep learning techniques are used to improve
generalization in malicious URL detection. These approaches suffer from several limitations:
(i) inability to effectively capture semantic meaning and sequential patterns in URL
strings; (ii) the need for substantial manual feature engineering; and (iii) inability to
handle unseen features and generalize to test data.
To address these limitations, this dissertation focuses on building a real-time,
language-independent phishing detection model by analyzing the anatomy
of URLs using deep learning techniques. To achieve this, we first try to find
static and dynamic features manually, drawing on previous work. After obtaining the
feature-valued data set, we use a CNN to find the lexical features of the URL; the CNN
takes both the characters and the words of the URL string to learn the URL embedding. After
that, we merge the manually selected features with the features learned by the CNN
and feed them to a Bi-LSTM model, which preserves the sequence information of the URL. A hybrid
model of a CNN (convolutional neural network) and a bidirectional LSTM (Long
Short-Term Memory) is used to achieve this goal. Our model analyzes the URL without
accessing the web content of the website, which eliminates the associated time latency. |
en_US |