Najmul Hasan

B.S. Computer Science · University of North Carolina at Pembroke

Hi 👋! I work at the intersection of AI safety & alignment and natural language processing. Read my latest work, DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention. I was advised by Dr. Prashanth BusiReddyGari, previously worked with Dr. Shaohu Zhang, and was an AI Safety Research Fellow at Algoverse.

I also founded and run UNC Pembroke's AI student organization, AI@UNCP, and have organized HackUNCP 2025 and HackUNCP 2026.

News

2026

Jun
Reviewer for the MusIML Workshop at ICML 2026
Reviewed submissions for the Muslims in ML (MusIML) Workshop at the International Conference on Machine Learning (ICML 2026).
May
Reviewer for the GenBio Workshop at ICML 2026
Reviewed submissions for the Generative and Agentic AI for Biology (GenBio) Workshop at the International Conference on Machine Learning (ICML 2026).
May
Graduated from UNC Pembroke
Graduated from the University of North Carolina at Pembroke with a B.S. in Computer Science and minors in Mathematics and Physics. Completed the honors curriculum as a member of the Esther G. Maynor Honors College.
Apr
Released preprint: CRC-Screen for DNA-synthesis hazard screening
Released "CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift" on arXiv. A conformal-risk-control screener that fuses sequence similarity, an LLM judge panel, and embedding similarity with certified false-negative-rate bounds.
Apr
Presented at PURC Symposium 2026
Presented research on stress-testing LLMs across adversarial attacks, prompt injection, and non-English languages at the PURC Symposium 2026 at UNC Pembroke.
Feb
Lead Organizer for HackUNCP 2026
Led HackUNCP 2026 (February 21-22) for the second year. Grateful for everyone involved and every participant who showed up and built something.

Research

DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

Najmul Hasan, Prashanth BusiReddyGari

We present DPBench, a benchmark for evaluating coordination in multi-agent systems built from large language models. Existing benchmarks measure task-level success under a fixed protocol; the structural conditions under which coordination succeeds or fails at all have not been characterised. DPBench adapts the Dining Philosophers problem into a controlled testbed where the action protocol, the communication structure, the prompting strategy, and the group size each vary independently. We evaluate five frontier LLMs (GPT-5.2, Claude Opus 4.5, Grok 4.1, Gemini 2.5 Flash, Llama 4 Maverick) against a uniform-random baseline. Under simultaneous action at N=5 with the default prompt, deadlock ranges from 25.0% (95% Wilson CI [11.2, 46.9]) for GPT-5.2 to 90.0% [74.4, 96.5] for Gemini 2.5 Flash; sequential action is solved by three of the five LLMs plus the random baseline. Holding the model fixed at Gemini 2.5 Flash, three protocol variables drive deadlock from 90% to a 0% point estimate (Wilson upper bound 16.1% at n=20): three rounds of pre-commitment communication (vs. single-round 86.7%), a prompt encoding a classical concurrency primitive (0.0% for resource-ordering and symmetry-breaking, against 100% for the minimal prompt), or doubling the group from N=5 to N=10 (90.0% to 10.0%). Single-round messaging and memory of past timesteps do not change the rate at the sample size we ran. On the model that fails most, whether it coordinates or deadlocks is determined by the protocol, not by raw capability.

arXiv 2026Read more

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

Najmul Hasan

DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: k-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies E[FNR] ≤ α. Across ten leave-one-taxonomic-family-out folds at α = 0.05 on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack 1/(n_cal + 1) caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade α = 10⁻³ requires an 18× larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms.

arXiv 2026Read more

Honeypot Protocol

Najmul Hasan

Trusted monitoring, the standard defense in AI control, is vulnerable to adaptive attacks, collusion, and strategic attack selection. All of these exploit the fact that monitoring is passive: it observes model behavior but never probes whether the model would behave differently under different perceived conditions. We introduce the honeypot protocol, which tests for context-dependent behavior by varying only the system prompt across three conditions (evaluation, synthetic deployment, explicit no-monitoring) while holding the task, environment, and scoring identical. We evaluate Claude Opus 4.6 in BashArena across all three conditions in both honest and attack modes. The model achieved 100% main task success and triggered zero side tasks uniformly across conditions, providing a baseline for future comparisons with stronger attack policies and additional models.

AI Control Hackathon 2026Read more

Time-Complexity Characterization of the NIST Lightweight Cryptography Finalists

Najmul Hasan, Prashanth BusiReddyGari

Lightweight cryptography is becoming essential as emerging technologies in digital identity systems and Internet of Things verification continue to demand strong cryptographic assurance on devices with limited processing power, memory, and energy resources. As these technologies move into routine use, they demand cryptographic primitives that maintain strong security and deliver predictable performance through clear theoretical models of time complexity. Although NIST's lightweight cryptography project provides empirical evaluations of the ten finalist algorithms, a unified theoretical understanding of their time-complexity behavior remains absent. This work introduces a symbolic model that decomposes each scheme into initialization, data-processing, and finalization phases, enabling formal time-complexity derivation for all ten finalists. The results clarify how design parameters shape computational scaling on constrained mobile and embedded environments. The framework provides a foundation needed to distinguish algorithmic efficiency and guides the choice of primitives capable of supporting security systems in constrained environments.

IEEE CCWC 2026Read more

Phishing Email Detection Using Large Language Models

Najmul Hasan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang

Email phishing is one of the most prevalent and globally consequential vectors of cyber intrusion. As systems increasingly deploy Large Language Models (LLMs) applications, these systems face evolving phishing email threats that exploit their fundamental architectures. Current LLMs require substantial hardening before deployment in email security systems, particularly against coordinated multi-vector attacks that exploit architectural vulnerabilities. This paper proposes LLM-PEA, an LLM-based framework to detect phishing email attacks across multiple attack vectors, including prompt injection, text refinement, and multilingual attacks. We evaluate three frontier LLMs (e.g., GPT-4o, Claude Sonnet 4, and Grok-3) and comprehensive prompting design to assess their feasibility, robustness, and limitations against phishing email attacks. Our empirical analysis reveals that LLMs can detect the phishing email over 90% accuracy while we also highlight that LLM-based phishing email detection systems could be exploited by adversarial attack, prompt injection, and multilingual attacks. Our findings provide critical insights for LLM-based phishing detection in real-world settings where attackers exploit multiple vulnerabilities in combination.

arXiv 2025Read more

Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

Najmul Hasan, Prashanth BusiReddyGari

The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited, lacking future-proof mechanisms for security, trust, or resilience against fraud and abuse, despite the introduction of reactive protections like HTTPS during the cybersecurity era. In the current AI-first threatscape, deceptive URLs have reached unprecedented sophistication due to the widespread use of generative AI by cybercriminals and the AI-vs-AI arms race to produce context-aware phishing websites and URLs that are virtually indistinguishable to both users and traditional detection tools. Although AI-generated phishing accounted for a small fraction of filter-bypassing attacks in 2024, phishing volume has escalated over 4,000% since 2022, with nearly 50% more attacks evading detection. At the rate the threatscape is escalating, and phishing tactics are emerging faster than labeled data can be produced, zero-shot and few-shot learning with large language models (LLMs) offers a timely and adaptable solution, enabling generalization with minimal supervision. Given the critical importance of phishing URL detection in large-scale cybersecurity defense systems, we present a comprehensive benchmark of LLMs under a unified zero-shot and few-shot prompting framework and reveal operational trade-offs. Our evaluation uses a balanced dataset with consistent prompts, offering detailed analysis of performance, generalization, and model efficacy, quantified by accuracy, precision, recall, F1 score, AUROC, and AUPRC, to reflect both classification quality and practical utility in threat detection settings. We conclude few-shot prompting improves performance across multiple LLMs.

LAW @ NeurIPS 2025Read more

Experience

UNC Pembroke

Pembroke, NC

Undergraduate Research Assistant

May 2024 – May 2026 · Advised by Dr. Prashanth BusiReddyGari

SOC Analyst

Jul 2023 – May 2026

Undergraduate Research Assistant

Sept 2023 – Dec 2025 · Advised by Dr. Shaohu Zhang

Algoverse

Remote

AI Safety Research Fellow

Feb 2026 – Apr 2026

Pembroke Undergraduate Research and Creativity (PURC) Center

Pembroke, NC

Research Assistant

May 2025 – Jun 2025 · Advised by Dr. Prashanth BusiReddyGari and Dr. Shaohu Zhang

Research Assistant

Jan 2024 – Apr 2024 · Advised by Dr. Shaohu Zhang

Emerging Technology Institute

Pembroke, NC

Programming Intern

Jan 2024 – Apr 2024

Education

University of North Carolina at Pembroke

Pembroke, NC

Bachelor of Science in Computer Science

Minors in Mathematics and Physics

Esther G. Maynor Honors College

Jan 2023 – May 2026

Awards

Undergraduate Research Fellowship – Summer (URFS)

Pembroke Undergraduate Research and Creativity Center

Supported research on multilingual phishing email detection using large language models; findings presented at PURC Symposium 2026.

Summer 2025

Semester-Long Undergraduate Research Fellowship (SURF)

Pembroke Undergraduate Research and Creativity Center

Supported cross-linguistic speech emotion recognition research; findings presented at PURC Symposium 2024.

Spring 2024

Honors Scholar Fellowship (HSF)

Esther G. Maynor Honors College, UNC Pembroke

Awarded upon admission as part of the academic offer.

Spring 2023

Service

Leadership & Community

Founder & President, AI@UNCP

AI student organization, UNC Pembroke

Founded the AI student organization at UNC Pembroke and led it across three elected terms, running programming contests and hackathons, and hosting a guest speaker talk, to grow AI engagement on campus.

Sept 2023 – May 2026

Lead Organizer, HackUNCP 2025 & 2026

UNC Pembroke

Organized and led HackUNCP 2025, the first official hackathon at UNC Pembroke, and HackUNCP 2026.

2025 – 2026

Peer Review

Reviewer, Generative and Agentic AI for Biology (GenBio) Workshop

ICML 2026

2026

Reviewer, Muslims in ML (MusIML) Workshop

ICML 2026

2026

Blog

SAGE: When One AI Isn't Enough

December 19, 2025

Ask a language model a question and you get one answer. One perspective. One viewpoint. That works fine for simple queries. But complex problems like architecture decisions, security reviews, or research questions benefit from multiple angles. A single model, no matter how capable, has blind spots. SAGE takes a different approach. Instead of relying on one model's response, it puts together a team of specialized agents.

Najmul Hasan

News

2026

Reviewer for the MusIML Workshop at ICML 2026

Reviewer for the GenBio Workshop at ICML 2026

Graduated from UNC Pembroke

Released preprint: CRC-Screen for DNA-synthesis hazard screening

Presented at PURC Symposium 2026

Lead Organizer for HackUNCP 2026

Research

DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

Honeypot Protocol

Time-Complexity Characterization of the NIST Lightweight Cryptography Finalists

Phishing Email Detection Using Large Language Models

Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

Experience

UNC Pembroke

Algoverse

Pembroke Undergraduate Research and Creativity (PURC) Center

Emerging Technology Institute

Education

University of North Carolina at Pembroke

Awards

Undergraduate Research Fellowship – Summer (URFS)

Semester-Long Undergraduate Research Fellowship (SURF)

Honors Scholar Fellowship (HSF)

Service

Leadership & Community

Founder & President, AI@UNCP

Lead Organizer, HackUNCP 2025 & 2026

Peer Review

Reviewer, Generative and Agentic AI for Biology (GenBio) Workshop

Reviewer, Muslims in ML (MusIML) Workshop

Blog

SAGE: When One AI Isn't Enough