Schedule

The schedule is subject to change, and this website is under construction; please check back frequently.

Discussion: Piazza

Date | Topic / In-class plan | Readings (placeholders) | Logistics
Part I. Course setup and evaluation basics
Week 1
Wed, Jan 14
Course overview; seminar format; framing trustworthiness and adversarial risks in foundation models.
1. TrustLLM: Trustworthiness in Large Language Models (ICML 2024)
2. On the Trustworthiness of Generative Foundation Models
3. Mitigating Hallucinations via Causal Reasoning (AAAI 2026)
4. Treble Counterfactual VLMs: A Causal Approach to Hallucination (EMNLP 2025 Findings)
5. StealthRank: LLM Ranking Manipulation
6. Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers
Sign-up
Week 2
Wed, Jan 21
Backdoor and jailbreak attacks against foundation models (LLMs and VLMs).
1. Composite Backdoor Attacks Against Large Language Models (NAACL 2024 Findings)
2. Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning (EMNLP 2024)
3. A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models (ACL 2024 Findings)
4. MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models (AAAI 2025)
5. Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
Week 3
Wed, Jan 28
Attacker capabilities and system surfaces: prompts, tools, retrieval, agents, training pipeline.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Part II. Attacks
Week 4
Wed, Feb 4
Prompt injection and tool-use attacks; agent security basics.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
One-page project proposal due, listing the names of all team members (by 11:59 pm PST)
Week 5
Wed, Feb 11
Jailbreaks and refusal bypass; attack evaluation and transfer.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Week 6
Wed, Feb 18
Data extraction and privacy leakage; memorization and training data exposure.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Week 7
Wed, Feb 25
Training-time attacks: poisoning, backdoors, and model supply-chain issues.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Week 8
Wed, Mar 4
RAG attacks: retrieval manipulation, content injection, and ranking effects.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Part III. Defenses and verification
Week 9
Wed, Mar 11
Defense patterns: input filtering, sandboxing, tool policies, and guardrails.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Week 10
Wed, Mar 18
NO CLASS: Spring Break
None
Week 11
Wed, Mar 25
Mid-Project Presentations
Short midterm updates (e.g., 8–10 minutes per team) + Q&A.
None
Mid-project slides due (TBD)
Week 12
Wed, Apr 1
Detection and monitoring in production: telemetry, anomaly signals, and incident response.
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Part IV. Paper discussions and student projects
Week 13
Wed, Apr 8
Paper discussion (placeholder).
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Week 14
Wed, Apr 15
Paper discussion (placeholder).
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Week 15
Wed, Apr 22
Paper discussion (placeholder).
Paper 1 (TBD): Title TBD
Paper 2 (TBD): Title TBD
Week 16
Wed, Apr 29
Final Project Presentations
In-class project presentations.
None
Final report due (TBD)
Final
TBD
Final project report due (no in-class final exam).
None
Deadline (TBD)