Transfer Learning for Customized URL Filtering - URL Classification with CNN/LSTM Hybrid Model

Authors

Overview

This repository contains a URL classification system built on neural networks. It classifies URLs into categories such as Benign, Malware, Phishing, Spam, and Defacement. The project is based on an original implementation described in a Medium post by Aaditya Jain, Anirudh Bhaskar, Srikanth, and Rohith Ramakrishnan; we modified both the feature extraction and the classification model.

Set-Up

Pre-requisites

Before running this project, make sure you have the following installed:

conda
git

Installation

Clone the repository and create a Python environment with the required packages:


git clone https://github.com/gitzaan/Transfer-Learning-for-Customizable-Web-Filtering
cd Transfer-Learning-for-Customizable-Web-Filtering
conda create -n pyenv python=3.8.5
conda activate pyenv
pip install -r Requirements.txt

Feature Extraction

From the repository root, run the feature extraction script:

cd Scripts/
python FeatureExtraction.py

The extracted features are explained and visualized in the DataProcessing.ipynb notebook.

Data Description via Extracted Features

The project extracts features from each URL and groups them into several categories, including characteristics of the URL string, domain features, and page features. For the full list of features, refer to the original Medium post.
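As an illustration of the first group, a handful of URL-string (lexical) features can be computed with the Python standard library alone. This is a minimal sketch, not the code in FeatureExtraction.py: the feature names and the helper functions `url_string_features` and `_is_ip_address` are hypothetical examples, and the project's actual feature set is the one listed in the Medium post. Domain and page features would typically need network lookups or a page fetch and are not shown.

```python
import ipaddress
from urllib.parse import urlparse


def _is_ip_address(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_string_features(url: str) -> dict:
    """Compute a few illustrative (hypothetical) lexical features from a raw URL."""
    # Without a scheme, urlparse puts the host into .path, so add one for parsing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Count non-empty path segments, e.g. "/account/verify" -> 2.
        "path_depth": sum(1 for part in parsed.path.split("/") if part),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "is_ip_host": int(_is_ip_address(host)),
    }


print(url_string_features("https://secure-login.example.com/account/verify?id=123"))
```

Each feature is a plain number, so the resulting dictionaries can be stacked into the tabular feature matrix that the scaler and model consume.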

Model

In our modified version of the project, we use a hybrid model that combines a Convolutional Neural Network (CNN) with LSTM layers for classification. The architecture includes convolutional layers, batch normalization, max-pooling, LSTM layers, dropout layers, and fully connected layers, and the model is trained with the Adam optimizer.
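The layer stack described above can be sketched as a shape trace that follows one sample from the reshaped feature vector to the class scores. This is a stdlib-only illustration, not the architecture in modelTrain.py: the feature count (20), filter count, kernel size, pool size, unit counts, and the helper names are all hypothetical, and it assumes Keras-style "valid" padding, pooling stride equal to the pool size, an LSTM that returns only its final hidden state, and five output classes (one per category named in the Overview).

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def conv1d(shape: Shape, filters: int, kernel_size: int) -> Shape:
    """Conv1D, 'valid' padding, stride 1: length shrinks by kernel_size - 1."""
    length, _channels = shape
    out_length = length - kernel_size + 1
    if out_length < 1:
        raise ValueError("kernel_size is larger than the input length")
    return (out_length, filters)


def max_pool1d(shape: Shape, pool_size: int) -> Shape:
    """MaxPooling1D, 'valid' padding, stride == pool_size."""
    length, channels = shape
    return (length // pool_size, channels)


def lstm(shape: Shape, units: int) -> Shape:
    """LSTM that keeps only its final hidden state (no full sequence)."""
    return (units,)


def dense(shape: Shape, units: int) -> Shape:
    """Fully connected layer: replaces the last dimension with `units`."""
    return shape[:-1] + (units,)


def trace_cnn_lstm(n_features: int, n_classes: int) -> List[Tuple[str, Shape]]:
    """Per-sample output shapes through a hypothetical CNN/LSTM stack."""
    shape: Shape = (n_features, 1)  # one feature per time step, one channel
    steps = [("input", shape)]
    shape = conv1d(shape, filters=64, kernel_size=3)
    steps.append(("conv1d", shape))
    steps.append(("batch_norm", shape))  # normalization keeps the shape
    shape = max_pool1d(shape, pool_size=2)
    steps.append(("max_pool1d", shape))
    steps.append(("dropout", shape))  # dropout keeps the shape
    shape = lstm(shape, units=64)
    steps.append(("lstm", shape))
    shape = dense(shape, units=32)
    steps.append(("dense_relu", shape))
    shape = dense(shape, units=n_classes)
    steps.append(("dense_softmax", shape))
    return steps


for name, shape in trace_cnn_lstm(n_features=20, n_classes=5):
    print(f"{name:>14}: {shape}")
```

Tracing shapes this way makes it easy to check that a chosen kernel and pool size still leave a non-empty sequence for the LSTM when the number of extracted features is small.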

To train the model, run the following from the repository root:

cd Scripts/
python modelTrain.py
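For reference, the standard Adam update (Kingma and Ba) for a parameter θ with gradient g_t at step t is shown below. The learning rate α, decay rates β1 and β2, and ε used by modelTrain.py are not stated in this README, so no particular values are implied.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
```

The bias-corrected moments \hat m_t and \hat v_t compensate for m and v being initialized at zero, which otherwise shrinks the first few updates.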

Model Evaluation

After training, we evaluate the model with metrics such as accuracy and generate a classification report that breaks down its performance for each category.
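The metrics behind such a report can be sketched with the standard library: accuracy over all predictions, plus precision, recall, F1-score, and support for each class. This is a minimal illustration with hypothetical helper names (`accuracy`, `per_class_report`) and made-up labels; it is not taken from the project's evaluation code. scikit-learn's `sklearn.metrics.classification_report` produces a similar per-class table and also adds averaged rows.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, F1, and support for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label really occurs (support)
    report = {}
    for label in labels:
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual[label]}
    return report


y_true = ["Benign", "Phishing", "Benign", "Malware"]
y_pred = ["Benign", "Benign", "Benign", "Malware"]
print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
for label, row in per_class_report(y_true, y_pred).items():
    print(label, {k: round(v, 2) for k, v in row.items()})
```

Per-class rows matter here because URL datasets are often imbalanced: a model can reach high overall accuracy while still missing most samples of a rare class, which shows up as low recall for that class.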

Making Predictions

Predicting the category of a new URL involves the following steps:

Load the LabelEncoder and StandardScaler from the saved files.
Load the pre-trained model.
Featurize the URL and prepare the features for prediction.
Standardize the features and reshape them to match the model's input shape.
Make predictions and convert numerical predictions to class labels.
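The standardize, reshape, predict, and decode steps above can be sketched without any ML libraries by passing the scaler parameters and the model in explicitly. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `model` is any callable that takes a batch shaped (1, n_features, 1) and returns one row of class scores, and the file-loading step is omitted because the README does not say how the encoder, scaler, and model are saved. The scaling formula (x - mean) / scale is the one scikit-learn's StandardScaler applies, and the alphabetical class list mirrors how scikit-learn's LabelEncoder orders its `classes_`.

```python
from typing import Callable, List, Sequence

Batch = List[List[List[float]]]


def standardize(features: Sequence[float], mean: Sequence[float],
                scale: Sequence[float]) -> List[float]:
    """Apply (x - mean) / scale per feature, as StandardScaler.transform does."""
    if not (len(features) == len(mean) == len(scale)):
        raise ValueError("feature vector and scaler parameters differ in length")
    return [(x - m) / s for x, m, s in zip(features, mean, scale)]


def reshape_for_model(features: Sequence[float]) -> Batch:
    """(n_features,) -> (1, n_features, 1): a batch of one sample, one channel."""
    return [[[x] for x in features]]


def predict_label(features: Sequence[float], mean: Sequence[float],
                  scale: Sequence[float],
                  model: Callable[[Batch], List[List[float]]],
                  classes: Sequence[str]) -> str:
    """Standardize, reshape, score, and map the best index back to a class name."""
    batch = reshape_for_model(standardize(features, mean, scale))
    scores = model(batch)[0]  # scores for the single sample in the batch
    best = max(range(len(scores)), key=scores.__getitem__)  # argmax
    return classes[best]  # inverse of the label encoding


# Hypothetical stand-in for the trained network: fixed scores, for illustration.
def fake_model(batch: Batch) -> List[List[float]]:
    return [[0.10, 0.05, 0.05, 0.70, 0.10]]


classes = ["Benign", "Defacement", "Malware", "Phishing", "Spam"]
print(predict_label([2.0, 10.0], [1.0, 4.0], [0.5, 2.0], fake_model, classes))
```

With a real Keras model, the stand-in would be replaced by the loaded model's `predict` applied to a NumPy array of the same shape; the argmax-and-lookup step stays the same.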

To classify a URL from the command line, run the following from the repository root:

cd Scripts/
python predict_args.py -i <url>

License

This project is licensed under the MIT License - see the LICENSE file for details.
