Part I. Course setup and evaluation basics

Week 1 Wed, Jan 14
Course overview; seminar format; framing trustworthiness and adversarial risks in foundation models.

1. TrustLLM: Trustworthiness in Large Language Models (ICML 2024)
2. On the Trustworthiness of Generative Foundation Models
3. Mitigating Hallucinations via Causal Reasoning (AAAI 2026)
4. Treble Counterfactual VLMs: A Causal Approach to Hallucination (EMNLP 2025 Findings)
5. StealthRank: LLM Ranking Manipulation
6. Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers

Sign-up
Week 2 Wed, Jan 21
Student paper presentations.
1. Chufan Shi: A Comprehensive Study of Jailbreak Attack versus Defense
2. Feiyu Zhu: Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
3. Ruiteng Li: MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
4. Bingxin Xu: Composite Backdoor Attacks Against Large Language Models

Week 3 Wed, Jan 28
Student paper presentations.
1. Tiannuo Yang: Prompt-Guided Internal States for Hallucination Detection (PRISM)
2. Ryuichi Lun: Red-teaming LLM Multi-Agent Systems via Communication Attacks
3. Valliammai Valliappan: RAG Security and Privacy: Formalizing the Threat Model and Attack Surface
4. Jayavibhav Niranjan Kogundi: DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
5. Tsung-Jui Wu: One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation

Part II. Attacks

Week 4 Wed, Feb 4
Student paper presentations.
1. Tiannuo Yang: Mitigating Hallucination in Multimodal LLMs with Layer Contrast Decoding
2. Ryuichi Lun: AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
3. Jiate Li: MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
4. Valliammai Valliappan: Machine Learning Models Have a Supply Chain Problem

One-page project proposal due, listing all team members' names (by 11:59 PM PST)

Week 5 Wed, Feb 11
Student paper presentations.
1. Jiate Li: Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
2. Faith Baca: SoK: The Privacy Paradox of Large Language Models
3. Shreyas Kolte: IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models

Week 6 Wed, Feb 18
Student paper presentations.
1. Tsung-Jui Wu: A Controllable Adversarial Attack against Diffusion Models
2. Mitesh Adake: JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
3. Aryan Bhusari: Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models
4. Shicheng Wen: Instructional Fingerprinting of Large Language Models
5. Ashutosh Chaubey: Improving Adversarial Robustness in Vision-Language Models

Week 7 Wed, Feb 25
Student paper presentations.
1. Faith Baca: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
2. Ruth-Ann Armstrong: Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
3. Xiaoqin Feng: Breaking Agents: Compromising Autonomous LLM Agents
4. Li Li: Hallucination Detection in LLMs Using Spectral Features
5. Li Li: Beyond Text: Multimodal Jailbreaking of Vision-Language Models

Week 8 Wed, Mar 4
Student paper presentations.
1. Raja Kumar: Chain-of-Intention Reasoning Elicits Defense in Multimodal LLMs
2. Anagha Shyama Prakash: A Comprehensive Study of Jailbreak Attack versus Defense
3. Xiaoqin Feng: Weak-to-Strong Jailbreaking on Large Language Models
4. Jae Won Choi: AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
5. Junhan Wu: Diffusion-Driven Prompt Manipulation for LLM Jailbreak

Part III. Defenses and verification

Week 9 Wed, Mar 11
Student paper presentations.
1. Mitesh Adake: Adversarial Reasoning at Jailbreaking Time
2. Aryan Bhusari: AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models
3. Alexander Romanus: Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring
4. Shicheng Wen: GhostPrompt: Jailbreaking Text-to-image Generative Models
5. Ramya Krishnan: Is Poisoning a Real Threat to LLM Alignment?

Week 10 Wed, Mar 18
NO CLASS — Spring Break

Week 11 Wed, Mar 25
Student paper presentations.
1. Ojas Nimase: Searching for Privacy Risks in LLM Agents via Simulation
2. Anagha Shyama Prakash: Defenses Against Prompt Attacks Learn Surface Heuristics
3. Ojas Nimase: Paper number/name TBD
4. Alexander Romanus: Understanding and Rectifying Safety Perception Distortion in VLMs
5. Xu Wang: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Mid-project slides due (TBD)

Week 12 Wed, Apr 1
Student paper presentations.
1. Ashutosh Chaubey: Paper number/name TBD
2. Xinyuan Li: Paper number/name TBD
3. Tianyi Zhang: RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback
4. Jae Won Choi: Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
5. Junhan Wu: BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models

Part IV. Paper discussions and student projects

Week 13 Wed, Apr 8
Student paper presentations.
1. Faezeh Dehghan: Safety Alignment Should Be Made More Than Just Superficial
2. Ruth-Ann Armstrong: HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models
3. Tianyi Zhang: On the Vulnerability of Applying Retrieval-Augmented Generation
4. Ramya Krishnan: Memory Injection Attacks on LLM Agents via Query-Only
5. Jayavibhav Niranjan Kogundi: Reasoning Model Unlearning: Forgetting Traces, Not Just Answers

Week 14 Wed, Apr 15
Student paper presentations.
1. Shreyas Kolte: Training LLMs for Honesty via Confessions
2. Yi Nian: On the Vulnerability of Safety Alignment in Open-Access LLMs
3. Xu Wang: Trading Inference-Time Compute for Adversarial Robustness
4. Jingzhen Wang: Paper number/name TBD
5. Jingzhen Wang: Paper number/name TBD

Week 15 Wed, Apr 22
Student paper presentations.
1. Qi Pan: NeuroGenPoisoning: Neuron-Guided Attacks on Retrieval-Augmented Generation
2. Yi Nian: JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
3. Faezeh Dehghan: Paper number/name TBD
4. Xinyuan Li: Paper number/name TBD
5. Tim Aris: Inference-Time Reward Hacking in Large Language Models

Week 16 Wed, Apr 29
Final Project Presentations: in-class project presentations.

Presentation sign-up sheet
Final Sun, May 3
Final project report due (no in-class final exam).
Due Sun, May 3, 2026, 11:59 PM PST (Gradescope)