Najmul Hasan

Najmul Hasan

B.S. Computer Science · University of North Carolina at Pembroke

Hi 👋! I work at the intersection of AI safety & alignment and natural language processing. Read my latest work, DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention. I was advised by Dr. Prashanth BusiReddyGari, previously worked with Dr. Shaohu Zhang, and was an AI Safety Research Fellow at Algoverse.

I also founded and run UNC Pembroke's AI student organization, AI@UNCP, and have organized HackUNCP 2025 and HackUNCP 2026.

News

2026

  • Jun

    Reviewer for the MusIML Workshop at ICML 2026

    Reviewed submissions for the Muslims in ML (MusIML) Workshop at the International Conference on Machine Learning (ICML 2026).

  • May

    Reviewer for the GenBio Workshop at ICML 2026

    Reviewed submissions for the Generative and Agentic AI for Biology (GenBio) Workshop at the International Conference on Machine Learning (ICML 2026).

  • May

    Graduated from UNC Pembroke

    Graduated from the University of North Carolina at Pembroke with a B.S. in Computer Science and minors in Mathematics and Physics. Completed the honors curriculum as a member of the Esther G. Maynor Honors College.

  • Apr

    Released preprint: CRC-Screen for DNA-synthesis hazard screening

    Released "CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift" on arXiv. A conformal-risk-control screener that fuses sequence similarity, an LLM judge panel, and embedding similarity with certified false-negative-rate bounds.

  • Apr

    Presented at PURC Symposium 2026

    Presented research on stress-testing LLMs across adversarial attacks, prompt injection, and non-English languages at the PURC Symposium 2026 at UNC Pembroke.

  • Feb

    Lead Organizer for HackUNCP 2026

    Led HackUNCP 2026 (February 21-22) for the second year. Grateful for everyone involved and every participant who showed up and built something.

Research

DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

Najmul Hasan, Prashanth BusiReddyGari

We present DPBench, a benchmark for evaluating coordination in multi-agent systems built from large language models. Existing benchmarks measure task-level success under a fixed protocol; the structural conditions under which coordination succeeds or fails at all have not been characterised. DPBench adapts the Dining Philosophers problem into a controlled testbed where the action protocol, the communication structure, the prompting strategy, and the group size each vary independently. We evaluate five frontier LLMs (GPT-5.2, Claude Opus 4.5, Grok 4.1, Gemini 2.5 Flash, Llama 4 Maverick) against a uniform-random baseline. Under simultaneous action at N=5 with the default prompt, deadlock ranges from 25.0% (95% Wilson CI [11.2, 46.9]) for GPT-5.2 to 90.0% [74.4, 96.5] for Gemini 2.5 Flash; sequential action is solved by three of the five LLMs plus the random baseline. Holding the model fixed at Gemini 2.5 Flash, three protocol variables drive deadlock from 90% to a 0% point estimate (Wilson upper bound 16.1% at n=20): three rounds of pre-commitment communication (vs. single-round 86.7%), a prompt encoding a classical concurrency primitive (0.0% for resource-ordering and symmetry-breaking, against 100% for the minimal prompt), or doubling the group from N=5 to N=10 (90.0% to 10.0%). Single-round messaging and memory of past timesteps do not change the rate at the sample size we ran. On the model that fails most, whether it coordinates or deadlocks is determined by the protocol, not by raw capability.

arXiv 2026Read more
CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

Najmul Hasan

DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: k-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies E[FNR] ≤ α. Across ten leave-one-taxonomic-family-out folds at α = 0.05 on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack 1/(n_cal + 1) caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade α = 10⁻³ requires an 18× larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms.

arXiv 2026Read more
Honeypot Protocol

Honeypot Protocol

Najmul Hasan

Trusted monitoring, the standard defense in AI control, is vulnerable to adaptive attacks, collusion, and strategic attack selection. All of these exploit the fact that monitoring is passive: it observes model behavior but never probes whether the model would behave differently under different perceived conditions. We introduce the honeypot protocol, which tests for context-dependent behavior by varying only the system prompt across three conditions (evaluation, synthetic deployment, explicit no-monitoring) while holding the task, environment, and scoring identical. We evaluate Claude Opus 4.6 in BashArena across all three conditions in both honest and attack modes. The model achieved 100% main task success and triggered zero side tasks uniformly across conditions, providing a baseline for future comparisons with stronger attack policies and additional models.

AI Control Hackathon 2026Read more
Time-Complexity Characterization of the NIST Lightweight Cryptography Finalists

Time-Complexity Characterization of the NIST Lightweight Cryptography Finalists

Najmul Hasan, Prashanth BusiReddyGari

Lightweight cryptography is becoming essential as emerging technologies in digital identity systems and Internet of Things verification continue to demand strong cryptographic assurance on devices with limited processing power, memory, and energy resources. As these technologies move into routine use, they demand cryptographic primitives that maintain strong security and deliver predictable performance through clear theoretical models of time complexity. Although NIST's lightweight cryptography project provides empirical evaluations of the ten finalist algorithms, a unified theoretical understanding of their time-complexity behavior remains absent. This work introduces a symbolic model that decomposes each scheme into initialization, data-processing, and finalization phases, enabling formal time-complexity derivation for all ten finalists. The results clarify how design parameters shape computational scaling on constrained mobile and embedded environments. The framework provides a foundation needed to distinguish algorithmic efficiency and guides the choice of primitives capable of supporting security systems in constrained environments.

IEEE CCWC 2026Read more
Phishing Email Detection Using Large Language Models

Phishing Email Detection Using Large Language Models

Najmul Hasan, Prashanth BusiReddyGari, Haitao Zhao, Yihao Ren, Jinsheng Xu, Shaohu Zhang

Email phishing is one of the most prevalent and globally consequential vectors of cyber intrusion. As systems increasingly deploy Large Language Models (LLMs) applications, these systems face evolving phishing email threats that exploit their fundamental architectures. Current LLMs require substantial hardening before deployment in email security systems, particularly against coordinated multi-vector attacks that exploit architectural vulnerabilities. This paper proposes LLM-PEA, an LLM-based framework to detect phishing email attacks across multiple attack vectors, including prompt injection, text refinement, and multilingual attacks. We evaluate three frontier LLMs (e.g., GPT-4o, Claude Sonnet 4, and Grok-3) and comprehensive prompting design to assess their feasibility, robustness, and limitations against phishing email attacks. Our empirical analysis reveals that LLMs can detect the phishing email over 90% accuracy while we also highlight that LLM-based phishing email detection systems could be exploited by adversarial attack, prompt injection, and multilingual attacks. Our findings provide critical insights for LLM-based phishing detection in real-world settings where attackers exploit multiple vulnerabilities in combination.

arXiv 2025Read more
Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

Najmul Hasan, Prashanth BusiReddyGari

The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited, lacking future-proof mechanisms for security, trust, or resilience against fraud and abuse, despite the introduction of reactive protections like HTTPS during the cybersecurity era. In the current AI-first threatscape, deceptive URLs have reached unprecedented sophistication due to the widespread use of generative AI by cybercriminals and the AI-vs-AI arms race to produce context-aware phishing websites and URLs that are virtually indistinguishable to both users and traditional detection tools. Although AI-generated phishing accounted for a small fraction of filter-bypassing attacks in 2024, phishing volume has escalated over 4,000% since 2022, with nearly 50% more attacks evading detection. At the rate the threatscape is escalating, and phishing tactics are emerging faster than labeled data can be produced, zero-shot and few-shot learning with large language models (LLMs) offers a timely and adaptable solution, enabling generalization with minimal supervision. Given the critical importance of phishing URL detection in large-scale cybersecurity defense systems, we present a comprehensive benchmark of LLMs under a unified zero-shot and few-shot prompting framework and reveal operational trade-offs. Our evaluation uses a balanced dataset with consistent prompts, offering detailed analysis of performance, generalization, and model efficacy, quantified by accuracy, precision, recall, F1 score, AUROC, and AUPRC, to reflect both classification quality and practical utility in threat detection settings. We conclude few-shot prompting improves performance across multiple LLMs.

LAW @ NeurIPS 2025Read more

Experience

UNC Pembroke

Pembroke, NC
Undergraduate Research Assistant
May 2024 – May 2026 · Advised by Dr. Prashanth BusiReddyGari
SOC Analyst
Jul 2023 – May 2026
Undergraduate Research Assistant
Sept 2023 – Dec 2025 · Advised by Dr. Shaohu Zhang

Algoverse

Remote
AI Safety Research Fellow
Feb 2026 – Apr 2026

Pembroke Undergraduate Research and Creativity (PURC) Center

Pembroke, NC
Research Assistant
May 2025 – Jun 2025 · Advised by Dr. Prashanth BusiReddyGari and Dr. Shaohu Zhang
Research Assistant
Jan 2024 – Apr 2024 · Advised by Dr. Shaohu Zhang

Emerging Technology Institute

Pembroke, NC
Programming Intern
Jan 2024 – Apr 2024

Education

University of North Carolina at Pembroke

Pembroke, NC
Bachelor of Science in Computer Science
Minors in Mathematics and Physics
Esther G. Maynor Honors College
Jan 2023May 2026

Awards

Undergraduate Research Fellowship – Summer (URFS)

Pembroke Undergraduate Research and Creativity Center

Supported research on multilingual phishing email detection using large language models; findings presented at PURC Symposium 2026.

Summer 2025

Semester-Long Undergraduate Research Fellowship (SURF)

Pembroke Undergraduate Research and Creativity Center

Supported cross-linguistic speech emotion recognition research; findings presented at PURC Symposium 2024.

Spring 2024

Honors Scholar Fellowship (HSF)

Esther G. Maynor Honors College, UNC Pembroke

Awarded upon admission as part of the academic offer.

Spring 2023

Service

Leadership & Community

Founder & President, AI@UNCP

AI student organization, UNC Pembroke

Founded the AI student organization at UNC Pembroke and led it across three elected terms, running programming contests and hackathons, and hosting a guest speaker talk, to grow AI engagement on campus.

Sept 2023 – May 2026

Lead Organizer, HackUNCP 2025 & 2026

UNC Pembroke

Organized and led HackUNCP 2025, the first official hackathon at UNC Pembroke, and HackUNCP 2026.

2025 – 2026

Peer Review

Reviewer, Generative and Agentic AI for Biology (GenBio) Workshop

ICML 2026
2026

Reviewer, Muslims in ML (MusIML) Workshop

ICML 2026
2026

Blog

SAGE: When One AI Isn't Enough

SAGE: When One AI Isn't Enough

December 19, 2025

Ask a language model a question and you get one answer. One perspective. One viewpoint. That works fine for simple queries. But complex problems like architecture decisions, security reviews, or research questions benefit from multiple angles. A single model, no matter how capable, has blind spots. SAGE takes a different approach. Instead of relying on one model's response, it puts together a team of specialized agents.

Read more