Real-Time Phishing Website Detection Using Lexical URL Features with Weighted Soft Voting Ensemble

Thiyagarajan A; Kaniska Devi B; Manisha T; Harisha S

doi:10.63363/aijfr.2026.v07i03.5511

Author(s)	Prof. Dr. Thiyagarajan A, Ms. Kaniska Devi B, Ms. Manisha T, Ms. Harisha S
Country	India
Abstract	Phishing attacks remain one of the most financially damaging threats in modern cybersecurity. Conventional blacklist-based defences prove insufficient against zero-day phishing URLs not yet logged by threat intelligence services. This work investigates whether a machine learning framework operating exclusively on lexical features extracted from raw URL strings can deliver high-accuracy phishing detection without accessing, downloading, or rendering the target webpage. Seventeen lexical features are extracted and organized into five conceptual groups: length and structure, special characters, Shannon entropy, typosquatting indicators, and suspicious keyword patterns. Two ensemble classifiers, Random Forest (RF) and XGBoost, are individually trained on two benchmark datasets, and their outputs are fused through a Weighted Soft Voting algorithm that assigns calibrated, confidence-based weights to each model. Experiments on the UCI Phishing Websites Dataset (11,055 instances) and the Kaggle Phishing URL Detection Dataset yield training-phase accuracies of 99.34%, 99.51%, and 99.40% for RF, XGBoost, and the ensemble, respectively. The brand edit distance feature, a novel typosquatting detection measure, proves the single most discriminative lexical feature. A graded three-tier risk scoring mechanism (Low / Medium / High) provides actionable outputs beyond binary classification, and sub-millisecond inference confirms practical suitability for real-time browser or network gateway deployment.
Keywords	phishing detection; lexical URL features; Random Forest; XGBoost; Weighted Soft Voting; typosquatting; ensemble learning; real-time classification; cybersecurity; machine learning.
Field	Computer Applications
Published In	Volume 7, Issue 3, May-June 2026
Published On	2026-05-15
DOI	https://doi.org/10.63363/aijfr.2026.v07i03.5511
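
The abstract names several computable ingredients: Shannon entropy, a brand edit distance for typosquatting, weighted soft voting, and a three-tier risk score. The following is a minimal Python sketch of how such pieces might fit together, not the authors' implementation. The brand list, the model weights, and the tier thresholds are hypothetical placeholders, and the abstract does not specify how any of them are actually chosen or calibrated.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of a string."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical brand list; the paper's actual list is not given in the abstract.
BRANDS = ("paypal", "google", "amazon")

def min_brand_edit_distance(domain_label: str) -> int:
    """Smallest edit distance from a domain label to any listed brand."""
    label = domain_label.lower()
    return min(edit_distance(label, brand) for brand in BRANDS)

def weighted_soft_vote(probs, weights):
    """Weighted average of per-model phishing probabilities."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)

def risk_tier(p: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a fused probability to a tier; thresholds here are assumed."""
    if p < low:
        return "Low"
    if p < high:
        return "Medium"
    return "High"
```

For instance, a label such as `paypa1` sits one substitution away from `paypal`, which is the kind of near-miss a brand edit distance feature is meant to flag. In a full pipeline the per-model probabilities passed to `weighted_soft_vote` would come from the trained RF and XGBoost classifiers, which this sketch leaves out.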

E-ISSN 3048-7641


All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.


Advanced International Journal for Research

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
