Please use this identifier to cite or link to this item:
http://hdl.handle.net/10263/7268
Title: | Feature Extraction And Detection of Malicious URLs Using Deep Learning Approach |
Authors: | Kushwaha, Rajni |
Keywords: | Malicious URL Feature Extraction |
Issue Date: | Jul-2019 |
Publisher: | Indian Statistical Institute, Kolkata |
Citation: | 28p. |
Series/Report no.: | Dissertation; 2019-20 |
Abstract: | Phishing is one of the most common forms of cybercrime on the internet. Most phishing websites try to look similar to legitimate websites: their web content and URL features mimic those of legitimate sites. Because new techniques keep emerging, detecting and analyzing these malicious URLs is very costly due to their complexity. Traditionally, blacklisting and whitelisting have been used for detection, but these techniques are not well suited to real-time detection. To address this, recent years have seen several efforts to perform malicious URL detection using machine learning. The most popular and scalable approaches extract bag-of-words-like features from the lexical properties of the URL string and then apply machine learning models such as SVMs and random forests. Various machine learning and deep learning techniques have been used to improve generalization in malicious URL detection. These approaches suffer from several limitations: (i) inability to effectively capture semantic meaning and sequential patterns in URL strings; (ii) the need for substantial manual feature engineering; and (iii) inability to handle unseen features and generalize to test data. To address these limitations, this dissertation focuses on building a real-time, language-independent phishing detection model by analyzing the anatomy of URLs with deep learning techniques. To achieve this, we first identify static and dynamic features manually, based on previous work. Using the resulting feature-valued dataset, we apply a CNN to both the characters and the words of the URL string to learn a URL embedding that captures its lexical features. We then merge the manually selected features with the features learned by the CNN and feed them to a Bi-LSTM model, which preserves the sequential information of the URL. A hybrid model combining a CNN (convolutional neural network) and a bidirectional LSTM (long short-term memory) network is used to achieve this goal. Our model analyzes the URL without accessing the web content of the website, which eliminates the latency of fetching that content. |
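The abstract describes two kinds of model input: manually selected lexical features, and character- and word-level views of the URL string from which the CNN learns an embedding. The Python sketch below shows one possible way to prepare such inputs using only the standard library. The specific feature list, the tokenization rule (splitting words on non-alphanumeric characters), the padding/unknown-id scheme, and every function name (`lexical_features`, `build_vocab`, `encode`, `prepare`, etc.) are illustrative assumptions, not the dissertation's actual pipeline; the dynamic features, the CNN, and the Bi-LSTM layers are not shown.

```python
import re
from collections import Counter
from urllib.parse import urlparse

IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def lexical_features(url):
    """Hypothetical hand-crafted lexical features (not the dissertation's exact list)."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""  # hostname is the part after any '@'
    return [
        float(len(url)),                           # total URL length
        float(len(host)),                          # hostname length
        float(host.count(".")),                    # dots in hostname (subdomain depth)
        float(url.count("-")),                     # hyphens, common in look-alike domains
        float("@" in url),                         # '@' can hide the real host
        float(IPV4.fullmatch(host) is not None),   # raw IPv4 address used as host
        float(parsed.scheme == "https"),           # HTTPS scheme
        float(sum(c.isdigit() for c in url)),      # number of digits
    ]

def char_tokens(url):
    """Character-level view of the URL."""
    return list(url)

def word_tokens(url):
    """Word-level view: split on delimiters such as '/', '.', '-', '?', '='."""
    return [w for w in re.split(r"[^A-Za-z0-9]+", url) if w]

def build_vocab(token_lists, max_size=None):
    """Map tokens to ids by frequency; 0 is padding, 1 is reserved for unseen tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, _ in counts.most_common(max_size):
        vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, max_len):
    """Truncate or right-pad a token sequence to exactly max_len integer ids."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

def prepare(url, char_vocab, word_vocab, max_chars=200, max_words=30):
    """Bundle the three inputs a hybrid CNN/Bi-LSTM model could consume."""
    return {
        "features": lexical_features(url),
        "char_ids": encode(char_tokens(url), char_vocab, max_chars),
        "word_ids": encode(word_tokens(url), word_vocab, max_words),
    }

if __name__ == "__main__":
    train = ["https://www.example.com/login", "http://192.168.0.1/paypal-verify"]
    char_vocab = build_vocab(char_tokens(u) for u in train)
    word_vocab = build_vocab(word_tokens(u) for u in train)
    sample = prepare("http://secure-paypal.example.net@10.0.0.5/update",
                     char_vocab, word_vocab)
    print(sample["features"])
    print(sample["word_ids"][:8])
```

Reserving id 0 for padding gives every URL a fixed-length sequence for an embedding layer, and mapping unseen characters or words to id 1 lets the encoder accept tokens absent from training data; in a full model, the `features` vector would be concatenated with the learned CNN representation before the Bi-LSTM stage.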
Description: | Dissertation under the supervision of Dr. K. S. Ray |
URI: | http://hdl.handle.net/10263/7268 |
Appears in Collections: | Dissertations - M Tech (CS) |
Files in This Item:
File | Description | Size | Format
---|---|---|---
report.pdf | | 2.52 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.