GitHub - aly-abbas11/SecureScan-AI: AI-powered C/C++ vulnerability detection using CodeBERT + BiLSTM + MLP. F1: 0.9252. FastAPI backend + React frontend. Live at securescan-ai.vercel.app

NEURAL CORE — ACTIVE
[ OK ] loading codebert encoder ................ 125M params
[ OK ] initializing bilstm layers .............. 2x256 hidden
[ OK ] compiling mlp classifier ................ 512→256→128→2
[ OK ] mounting vulnerability database ......... 600K+ CVE samples
[ OK ] v2.4.0 engine active

0.9252 weighted F1 · 600K+ CVE samples · 42ms GPU inference · 125M CodeBERT params

The Problem SecureScan AI Solves

68% of C++ vulnerabilities survive traditional linting tools.
Standard tools catch syntax. SecureScan AI catches intent.

Security tools like cppcheck and clang-tidy catch what the compiler sees. They miss what the attacker sees — unchecked buffer boundaries, unsafe memory flows, logic bombs buried in runtime behavior. SecureScan AI was built to close that gap.

It reads your C/C++ function the way a security researcher would, tracking context, data flow, and behavioral patterns, then delivers a verdict with a confidence score and CWE classification in 42ms on a GPU (380ms on CPU).

Live Application

Production: securescan-ai.vercel.app

Paste any C/C++ function. Click Start Analyzer. Get results across 8 forensic layers — Input Sanitization, Buffer Integrity, Heap Validation, Logical Flows, Auth Bypass, Encryption Key, External Leak, and AI Inference — in under a second.

Architecture

THREE-STAGE NEURAL PIPELINE

INPUT: raw C / C++ function string

STAGE 01: CodeBERT Encoder (microsoft/codebert-base)
12 transformer layers, 768-dim contextual embeddings, first 6 layers frozen.
Output: a sequence of 768-dimensional token vectors encoding code semantics.

STAGE 02: Bidirectional LSTM (sequential flow modeling)
2 stacked BiLSTM layers, 256 hidden units per direction → 512-dim, dropout p=0.3.

STAGE 03: MLP Classifier (binary decision head)
512 → 256 → 128 → 2, ReLU + BatchNorm + Dropout, softmax output.

OUTPUT: VULNERABLE + CWE + CONFIDENCE, or NON-VULNERABLE + CONFIDENCE
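Stages 02 and 03 can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the repository's `SecureScanModel`: the class name `SequenceHead` is hypothetical, the CodeBERT encoder is replaced by a random tensor so nothing has to be downloaded, and the masked mean pooling that reduces the BiLSTM sequence to one 512-dim vector is an assumption, since the README does not say how that reduction is done.

```python
import torch
from torch import nn


class SequenceHead(nn.Module):
    """Hypothetical BiLSTM + MLP head over 768-dim token embeddings."""

    def __init__(self, embed_dim=768, hidden=256, dropout=0.3):
        super().__init__()
        # Stage 02: 2 stacked bidirectional LSTM layers, 256 hidden per direction.
        self.bilstm = nn.LSTM(
            input_size=embed_dim,
            hidden_size=hidden,
            num_layers=2,
            batch_first=True,
            bidirectional=True,
            dropout=dropout,
        )
        # Stage 03: 512 -> 256 -> 128 -> 2 with ReLU, BatchNorm and Dropout.
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 2),
        )

    def forward(self, token_embeddings, attention_mask):
        out, _ = self.bilstm(token_embeddings)            # (batch, tokens, 512)
        mask = attention_mask.unsqueeze(-1).to(out.dtype)  # (batch, tokens, 1)
        # Assumption: average over non-padded tokens to get one vector per function.
        pooled = (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.mlp(pooled)                            # (batch, 2) logits


head = SequenceHead().eval()
tokens = torch.randn(3, 16, 768)            # stand-in for CodeBERT output
mask = torch.ones(3, 16, dtype=torch.long)  # no padding in this toy batch
with torch.no_grad():
    probs = torch.softmax(head(tokens, mask), dim=-1)
print(probs.shape)  # torch.Size([3, 2])
```

In a full pipeline the `tokens` tensor would come from the encoder's last hidden state rather than `torch.randn`.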

The 8 Forensic Layers

Layer	Name	Detects
01	Input Sanitization	Unchecked user input flows
02	Buffer Integrity	Buffer overflow, stack smashing
03	Heap Validation	Use-after-free, double-free bugs
04	Logical Flows	Race conditions, logic bombs
05	Auth Bypass	Privilege escalation, access violations
06	Encryption Key	Hardcoded secrets, weak crypto
07	External Leak	Data exfiltration, memory leaks
08	AI Inference	CodeBERT+BiLSTM deep vulnerability scan

Results

Model comparison (F1 score)

Model	F1 Score
Baseline MLP	0.7346
CodeBERT only	0.8612
BiLSTM+MLP	0.7950
Full model (ours, best): CodeBERT + BiLSTM + MLP	0.9252

Ablation Study

Configuration	F1 Score	Delta (F1 points)
Full model: CodeBERT + BiLSTM + MLP	0.9252	reference
Without BiLSTM	0.8832	-4.2
Without CodeBERT (GloVe embeddings)	0.7950	-13.0
Without Dropout	0.8910	-3.4
Unidirectional LSTM	0.9056	-2.0
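The Delta column can be checked against the F1 column. The sketch below recomputes each ablation's difference from the full model in F1 points (F1 difference times 100), using only the F1 values listed in the table. The function name `delta_points` is illustrative.

```python
REFERENCE_F1 = 0.9252  # full model: CodeBERT + BiLSTM + MLP

ablation_f1 = {
    "Without BiLSTM": 0.8832,
    "Without CodeBERT (GloVe embeddings)": 0.7950,
    "Without Dropout": 0.8910,
    "Unidirectional LSTM": 0.9056,
}


def delta_points(f1, reference=REFERENCE_F1):
    """Absolute change versus the reference, in F1 points (F1 x 100)."""
    return round((f1 - reference) * 100, 1)


deltas = {name: delta_points(f1) for name, f1 in ablation_f1.items()}
for name, delta in deltas.items():
    print(f"{name:<38}{delta:+.1f}")
```

These are absolute differences, not relative percentages. Dividing by the reference F1 instead would give slightly larger magnitudes.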

Datasets

Dataset	Language	Samples	Source
BigVul	C / C++	~188,000	Real CVE + NVD entries
DiverseVul	C / C++	~319,000	Diverse CVE coverage
FormAI	Multi-language	~246,000	AI-generated + labeled

Total: 600,000+ labeled vulnerability samples
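As a rough check on the total, the sketch below sums the approximate per-dataset counts from the table and prints each dataset's share. The counts are rounded, and the README does not say whether any filtering or deduplication happens before training, so the result is only an estimate.

```python
# Approximate sample counts copied from the Datasets table.
datasets = {"BigVul": 188_000, "DiverseVul": 319_000, "FormAI": 246_000}

total = sum(datasets.values())
shares = {name: count / total for name, count in datasets.items()}

for name, count in datasets.items():
    print(f"{name:<11}{count:>8,}  {shares[name]:.1%}")
print(f"{'Total':<11}{total:>8,}")
```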

Development Timeline

Phase	Milestone	Result
1	SRS + EDA	Definition
2	MLP Baseline	0.7346
3	CodeBERT+BiLSTM	92.68%
4	VAE + HPO	94.82%
5	Evaluation	F1: 0.9252
6	Deployment	LIVE

Installation

git clone https://github.com/aly-abbas11/SecureScan-AI.git
cd SecureScan-AI

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate

pip install -r requirements.txt

Run Training

python src/training/train.py

Run Inference

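from src.models.securescan_model import SecureScanModel
from transformers import AutoTokenizer
import torch

model     = SecureScanModel()
# If SecureScanModel() does not load trained weights itself, load them here;
# this README does not give the checkpoint path.
model.eval()  # disable Dropout and use BatchNorm running statistics for a single sample
tokenizer = AutoTokenizer.from_pretrained('microsoft/codebert-base')
code      = "char buf[10]; strcpy(buf, user_input);"
inputs    = tokenizer(code, return_tensors='pt', truncation=True, max_length=512)

with torch.no_grad():
    logits     = model(inputs['input_ids'], inputs['attention_mask'])
    # Class index 1 is treated as vulnerable, index 0 as safe
    prediction = 'Vulnerable' if logits.argmax(dim=-1).item() == 1 else 'Safe'

print(f"Result: {prediction}")
# With trained weights, the expected result is: Vulnerable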
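The snippet above prints only the label, while the tool also reports a confidence score. One common way to get such a score from the two output logits is to apply softmax and take the probability of the winning class. The pure-Python sketch below assumes that approach, and the logit values in it are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verdict(logits, labels=("Safe", "Vulnerable")):
    """Return (label, confidence); index 1 means vulnerable, as in the snippet above."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = verdict([-1.2, 2.3])  # hypothetical logits, not real model output
print(f"Result: {label} ({confidence:.1%} confidence)")
```

With a PyTorch tensor the same step is `torch.softmax(logits, dim=-1)`.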

Component Reference

Component	Config
Base Encoder	`microsoft/codebert-base` — 125M params
Frozen Layers	First 6 of 12 transformer layers
BiLSTM	2 layers, hidden 256, bidirectional → 512-dim
MLP	512 → 256 → 128 → 2, ReLU + BatchNorm
Dropout	p = 0.3
Optimizer	AdamW lr=8.57e-5, linear warmup
Loss	BCE with 16:1 class balancing weights
Inference	42ms GPU · 380ms CPU
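Two rows of the table, the class-balanced loss and the warmup schedule, are easier to follow with numbers. The sketch below is a pure-Python illustration under several assumptions: the 16:1 weight is applied to the vulnerable (positive) class, the learning rate stays constant after warmup, and `warmup_steps=1000` is a made-up value. The README does not specify any of these.

```python
import math


def weighted_bce(p_vulnerable, label, pos_weight=16.0):
    """Binary cross-entropy for one sample; positive-class errors count pos_weight times."""
    eps = 1e-12
    p = min(max(p_vulnerable, eps), 1.0 - eps)  # keep log() finite
    return -(pos_weight * label * math.log(p) + (1 - label) * math.log(1.0 - p))


def warmup_lr(step, base_lr=8.57e-5, warmup_steps=1000):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant (assumed)."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


# The same uncertain prediction costs 16x more when the sample is actually vulnerable.
missed_vulnerable = weighted_bce(0.5, label=1)
false_alarm = weighted_bce(0.5, label=0)
print(f"loss ratio: {missed_vulnerable / false_alarm:.0f}")

for step in (0, 250, 500, 1000, 5000):
    print(step, f"{warmup_lr(step):.3e}")
```

In PyTorch the weighting corresponds to the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`, or to the `weight` argument of `torch.nn.CrossEntropyLoss` for a two-logit head.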

Citation

@misc{securescan-ai,
  title        = {SecureScan AI — C/C++ Vulnerability Detection},
  author       = {Shah, Ali Abbas and Tanveer, Salman and Ali, Hammad},
  year         = {2026},
  howpublished = {\url{https://github.com/aly-abbas11/SecureScan-AI}},
  note         = {AI335L Deep Learning Lab, Air University Lahore}
}

License

MIT — see LICENSE

[ OK ] scan complete
[ OK ] vulnerabilities reported
[ OK ] neural core standing by
       SECURESCAN AI — ALWAYS WATCHING.

AI335L Deep Learning Lab, Air University Lahore, Spring 2026

