aly-abbas11/SecureScan-AI


NEURAL CORE — ACTIVE
[ OK ] loading codebert encoder ................ 125M params
[ OK ] initializing bilstm layers .............. 2x256 hidden
[ OK ] compiling mlp classifier ................ 512→256→128→2
[ OK ] mounting vulnerability database ......... 600K+ CVE samples
[ OK ] v2.4.0 engine active




0.9252 WEIGHTED F1 · 600K+ CVE SAMPLES · 42ms GPU INFERENCE · 125M CODEBERT PARAMS

The Problem SecureScan AI Solves

68% of C++ vulnerabilities survive traditional linting tools.
Standard tools catch syntax. SecureScan AI catches intent.

Security tools like cppcheck and clang-tidy catch what the compiler sees. They miss what the attacker sees — unchecked buffer boundaries, unsafe memory flows, logic bombs buried in runtime behavior. SecureScan AI was built to close that gap.

It reads your C/C++ function the same way a security researcher would — understanding context, data flow, and behavioral patterns — then delivers a verdict in 42ms with a confidence score and CWE classification.


Live Application

Production: securescan-ai.vercel.app

Paste any C/C++ function. Click Start Analyzer. Get results across 8 forensic layers — Input Sanitization, Buffer Integrity, Heap Validation, Logical Flows, Auth Bypass, Encryption Key, External Leak, and AI Inference — in under a second.


Architecture

THREE-STAGE NEURAL PIPELINE: SECURESCAN AI

INPUT      Raw C / C++ function string

STAGE 01   CodeBERT Encoder (microsoft/codebert-base)
           12 transformer layers
           768-dim contextual embeddings
           First 6 layers frozen
           Output → sequence of 768-dimensional token vectors encoding code semantics

STAGE 02   Bidirectional LSTM (sequential flow modeling)
           2 stacked BiLSTM layers
           256 hidden per direction → 512-dim
           Dropout p=0.3

STAGE 03   MLP Classifier (binary decision head)
           512 → 256 → 128 → 2
           ReLU + BatchNorm + Dropout
           Softmax output

OUTPUT     VULNERABLE + CWE + CONFIDENCE
           NON-VULNERABLE + CONFIDENCE
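To put the dimensions above in perspective, here is a rough parameter count for the two trainable heads (Stage 02 and Stage 03). It is a back-of-the-envelope sketch, not the repository's code: it assumes PyTorch's `nn.LSTM` parameter layout (input weights, recurrent weights, and two bias vectors for each of the four gates) and one BatchNorm scale/shift pair after each hidden linear layer. The function names are hypothetical.

```python
# Rough parameter count for the BiLSTM and MLP stages, from the stated dimensions.
# Assumptions: PyTorch nn.LSTM layout (weight_ih, weight_hh, bias_ih, bias_hh)
# and BatchNorm (scale + shift) after each hidden Linear layer of the MLP.

def lstm_params(input_size: int, hidden: int, layers: int, bidirectional: bool = True) -> int:
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(layers):
        # Layers after the first consume the concatenated outputs of both directions.
        in_size = input_size if layer == 0 else hidden * directions
        # Four gates, each with input weights, recurrent weights, and two biases.
        per_direction = 4 * (in_size * hidden + hidden * hidden + 2 * hidden)
        total += per_direction * directions
    return total

def mlp_params(sizes: list[int], batchnorm: bool = True) -> int:
    total = 0
    for i, (fan_in, fan_out) in enumerate(zip(sizes, sizes[1:])):
        total += fan_in * fan_out + fan_out      # Linear weights + bias
        is_hidden = i < len(sizes) - 2           # the last Linear is the output layer
        if batchnorm and is_hidden:
            total += 2 * fan_out                 # BatchNorm scale + shift
    return total

bilstm = lstm_params(input_size=768, hidden=256, layers=2)
mlp = mlp_params([512, 256, 128, 2])
print(f"BiLSTM: {bilstm:,}")   # BiLSTM: 3,678,208
print(f"MLP:    {mlp:,}")      # MLP:    165,250
```

Under these assumptions the two heads add roughly 3.8M parameters on top of the 125M-parameter encoder.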

The 8 Forensic Layers

LAYER 01   Input Sanitization   Detects unchecked user input flows
LAYER 02   Buffer Integrity     Buffer overflow, stack smashing
LAYER 03   Heap Validation      Use-after-free, double-free bugs
LAYER 04   Logical Flows        Race conditions, logic bombs
LAYER 05   Auth Bypass          Privilege escalation, access violations
LAYER 06   Encryption Key       Hardcoded secrets, weak crypto
LAYER 07   External Leak        Data exfiltration, memory leaks
LAYER 08   AI Inference         CodeBERT+BiLSTM deep vulnerability scan

Results

MODEL COMPARISON: F1 SCORE

Baseline MLP                             0.7346
CodeBERT only                            0.8612
BiLSTM+MLP                               0.7950
Full Model (CodeBERT + BiLSTM + MLP)     0.9252   BEST (OURS)

Ablation Study

Configuration                            F1 Score   Delta (F1 points)
Full model (CodeBERT + BiLSTM + MLP)     0.9252     reference
Without BiLSTM                           0.8832     -4.2
Without CodeBERT (GloVe embeddings)      0.7950     -13.0
Without Dropout                          0.8910     -3.4
Unidirectional LSTM                      0.9056     -2.0
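A minimal sketch for recomputing the change from the F1 column, in absolute F1 points and, for comparison, as a percentage of the reference score. The F1 values are copied from the table; the helper names are hypothetical.

```python
# Recompute ablation deltas from the reported F1 scores.
REFERENCE_F1 = 0.9252  # full model

ablations = {
    "Without BiLSTM": 0.8832,
    "Without CodeBERT (GloVe embeddings)": 0.7950,
    "Without Dropout": 0.8910,
    "Unidirectional LSTM": 0.9056,
}

def delta_points(f1: float, reference: float = REFERENCE_F1) -> float:
    """Absolute change in F1, expressed in points (F1 x 100)."""
    return round((f1 - reference) * 100, 1)

def delta_relative(f1: float, reference: float = REFERENCE_F1) -> float:
    """Change in F1 as a percentage of the reference score."""
    return round((f1 - reference) / reference * 100, 1)

for name, f1 in ablations.items():
    print(f"{name:<38} {f1:.4f}  {delta_points(f1):+.1f} pts  {delta_relative(f1):+.1f}% rel")
```

The two measures diverge more as the drop grows, so it helps to state which one a results table reports.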

Datasets

Dataset      Language         Samples     Source
BigVul       C / C++          ~188,000    Real CVE + NVD entries
DiverseVul   C / C++          ~319,000    Diverse CVE coverage
FormAI       Multi-language   ~246,000    AI-generated + labeled

Total: 600,000+ labeled vulnerability samples


Development Timeline

PHASE 1   SRS + EDA          Definition
PHASE 2   MLP Baseline       0.7346
PHASE 3   CodeBERT+BiLSTM    92.68%
PHASE 4   VAE + HPO          94.82%
PHASE 5   Evaluation         F1: 0.9252
PHASE 6   Deployment         LIVE

Installation

git clone https://github.com/aly-abbas11/SecureScan-AI.git
cd SecureScan-AI

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

pip install -r requirements.txt

Run Training

python src/training/train.py

Run Inference

from src.models.securescan_model import SecureScanModel
from transformers import AutoTokenizer
import torch

# If the constructor does not load trained weights itself, load your
# checkpoint before relying on the prediction.
model     = SecureScanModel()
model.eval()  # disable Dropout and use BatchNorm running statistics
tokenizer = AutoTokenizer.from_pretrained('microsoft/codebert-base')
code      = "char buf[10]; strcpy(buf, user_input);"
inputs    = tokenizer(code, return_tensors='pt', truncation=True, max_length=512)

with torch.no_grad():
    logits     = model(inputs['input_ids'], inputs['attention_mask'])
    # Class index 1 = vulnerable, 0 = safe
    prediction = 'Vulnerable' if logits.argmax(dim=-1).item() == 1 else 'Safe'

print(f"Result: {prediction}")
# Expected with trained weights: Result: Vulnerable
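The verdict is described as coming with a confidence score. One plausible way to derive it, sketched here in plain Python, is to apply softmax to the two logits and report the probability of the winning class. This is an illustration under assumptions, not the repository's implementation: the `verdict` helper and the example logits are hypothetical, and index 1 is taken to mean "vulnerable", as in the snippet above.

```python
import math

LABELS = ("Safe", "Vulnerable")  # index 1 = vulnerable (assumed, as in the inference snippet)

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def verdict(logits):
    """Return (label, confidence) for a pair of class logits."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]

# Hypothetical logits for illustration only.
label, confidence = verdict([-1.2, 2.3])
print(f"{label} (confidence {confidence:.1%})")
```

With a PyTorch tensor of shape (1, 2), `logits.squeeze(0).tolist()` would supply the list this helper expects.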

Component Reference

Component       Config
Base Encoder    microsoft/codebert-base, 125M params
Frozen Layers   First 6 of 12 transformer layers
BiLSTM          2 layers, hidden 256, bidirectional → 512-dim
MLP             512 → 256 → 128 → 2, ReLU + BatchNorm
Dropout         p = 0.3
Optimizer       AdamW, lr=8.57e-5, linear warmup
Loss            BCE with 16:1 class balancing weights
Inference       42ms GPU · 380ms CPU
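The Optimizer and Loss rows can be illustrated with two small plain-Python functions. This is a sketch under stated assumptions, not the training script: the warmup length is hypothetical (the table does not give one), the rate is held constant after warmup because no decay schedule is specified, and the 16:1 weight is assumed to upweight the vulnerable class, in the spirit of the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.

```python
import math

BASE_LR = 8.57e-5      # from the table
WARMUP_STEPS = 1000    # hypothetical; not stated in the table
POS_WEIGHT = 16.0      # 16:1 weighting, assumed to favour the vulnerable class

def warmup_lr(step: int, base_lr: float = BASE_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Ramp linearly from 0 to base_lr over warmup_steps, then hold constant."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, step / warmup_steps)

def weighted_bce(p_vulnerable: float, target: int,
                 pos_weight: float = POS_WEIGHT, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one sample, with positives scaled by pos_weight."""
    p = min(max(p_vulnerable, eps), 1.0 - eps)   # clamp to avoid log(0)
    if target == 1:
        return -pos_weight * math.log(p)
    return -math.log(1.0 - p)

print(f"lr at step 250:  {warmup_lr(250):.3e}")
print(f"lr at step 5000: {warmup_lr(5000):.3e}")
print(f"missed vulnerable sample (p=0.1): {weighted_bce(0.1, 1):.2f}")
print(f"false alarm on safe sample (p=0.9): {weighted_bce(0.9, 0):.2f}")
```

With this weighting, a missed vulnerable sample costs 16 times as much as an equally confident false alarm, which pushes the model toward recall on the minority class.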

Citation

@misc{securescan-ai,
  title        = {SecureScan AI — C/C++ Vulnerability Detection},
  author       = {Shah, Ali Abbas and Tanveer, Salman and Ali, Hammad},
  year         = {2026},
  howpublished = {\url{https://github.com/aly-abbas11/SecureScan-AI}},
  note         = {AI335L Deep Learning Lab, Air University Lahore}
}

License

MIT — see LICENSE


[ OK ] scan complete
[ OK ] vulnerabilities reported
[ OK ] neural core standing by
       SECURESCAN AI — ALWAYS WATCHING.

AI335L Deep Learning Lab — Air University Lahore, Spring 2026

About

AI-powered C/C++ vulnerability detection using CodeBERT + BiLSTM + MLP. F1: 0.9252. FastAPI backend + React frontend. Live at securescan-ai.vercel.app
