Feature Extraction And Detection of Malicious URLs Using Deep Learning Approach

Kushwaha, Rajni

dc.contributor.author	Kushwaha, Rajni
dc.date.accessioned	2022-02-03T08:00:09Z
dc.date.available	2022-02-03T08:00:09Z
dc.date.issued	2019-07
dc.identifier.citation	28p.	en_US
dc.identifier.uri	http://hdl.handle.net/10263/7268
dc.description	Dissertation under the supervision of Dr. K.S Ray	en_US
dc.description.abstract	Phishing is a form of cyber attack carried out over the internet. Most phishing websites try to look similar to legitimate websites, and their web content and URL features mimic those of legitimate URLs. As new techniques emerge, detecting and analyzing these malicious URLs becomes very costly because of their complexity. Traditionally, blacklisting and whitelisting have been used for detection, but these techniques are not well suited to real-time use. To address this, recent years have witnessed several efforts to perform malicious URL detection using machine learning. The most popular and scalable approaches use lexical properties of the URL string by extracting bag-of-words-like features and then applying machine learning models such as SVMs, Random Forest, etc. Various machine learning and deep learning techniques are used to improve generalization in malicious URL detection. These approaches suffer from several limitations: (i) inability to effectively capture semantic meaning and sequential patterns in URL strings; (ii) the need for substantial manual feature engineering; and (iii) inability to handle unseen features and generalize to test data. To address these limitations, this dissertation focuses on building a real-time, language-independent phishing detection model that analyzes the anatomy of URLs using deep learning techniques. To achieve this, we first identify static and dynamic features manually, drawing on previous work. After obtaining the feature-valued data set, we extract lexical features of the URL using a CNN that takes both the characters and the words of the URL string as input to learn a URL embedding. We then merge the manually selected features with the features learned by the CNN and feed them to a Bi-LSTM model, which preserves the sequence information of the URL. A hybrid model of a CNN (convolutional neural network) and a bidirectional LSTM (long short-term memory) network is used to achieve this goal.
Our model analyzes the URL without accessing the web content of the website, which avoids the latency of retrieving it.	en_US
dc.language.iso	en	en_US
dc.publisher	Indian Statistical Institute, Kolkata	en_US
dc.relation.ispartofseries	Dissertation;;2019-20
dc.subject	Malicious URL	en_US
dc.subject	Feature Extraction	en_US
dc.title	Feature Extraction And Detection of Malicious URLs Using Deep Learning Approach	en_US
dc.type	Other	en_US
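
The abstract describes two kinds of input derived from the URL string alone: manually chosen lexical features, and character-level and word-level token sequences that a CNN could embed. The sketch below illustrates that preprocessing step only, using the Python standard library. The function names, the particular features, and the `max_len` cutoff are assumptions made for illustration and are not taken from the dissertation; the CNN and Bi-LSTM stages are not shown.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute a few hand-picked lexical features from the URL string.

    The feature set is hypothetical; the dissertation's own list is not
    given in the abstract.
    """
    # urlsplit only fills in the host when a scheme separator is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        # Rough check for a dotted-quad host; octet ranges are not validated.
        "host_is_ipv4": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
    }


def char_tokens(url: str, max_len: int = 200) -> list:
    """Character-level sequence, truncated to an assumed maximum length."""
    return list(url[:max_len])


def word_tokens(url: str) -> list:
    """Word-level sequence: split on every non-alphanumeric character."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", url) if w]
```

Because everything here is computed from the string itself, no request is made to the website, which matches the abstract's point about avoiding retrieval latency. In a full pipeline the two token sequences would be mapped to integer ids and passed to embedding layers, while the feature dictionary would be merged with the learned representation.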