This schedule is subject to change. The website is under construction, so please check back frequently.
Discussion: Piazza
| Date | Topic / In-class plan | Readings (placeholders) | Logistics |
|---|---|---|---|
| Part I. Course setup and evaluation basics | |||
| Week 1 Wed, Jan 14 | Course overview; seminar format; framing trustworthiness and adversarial risks in foundation models. | 1. TrustLLM: Trustworthiness in Large Language Models (ICML 2024); 2. On the Trustworthiness of Generative Foundation Models; 3. Mitigating Hallucinations via Causal Reasoning (AAAI 2026); 4. Treble Counterfactual VLMs: A Causal Approach to Hallucination (EMNLP 2025 Findings); 5. StealthRank: LLM Ranking Manipulation; 6. Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers | Sign-up |
| Week 2 Wed, Jan 21 | Backdoor and jailbreak attacks against foundation models (LLMs and VLMs). | 1. Composite Backdoor Attacks Against Large Language Models (NAACL 2024 Findings); 2. Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning (EMNLP 2024); 3. A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models (ACL 2024 Findings); 4. MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models (AAAI 2025); 5. Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs | |
| Week 3 Wed, Jan 28 | Attacker capabilities and attack surfaces: prompts, tools, retrieval, agents, and the training pipeline. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Part II. Attacks | |||
| Week 4 Wed, Feb 4 | Prompt injection and tool-use attacks; agent security basics. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | One-page project proposal due, listing the names of all team members (by 23:59 PST) |
| Week 5 Wed, Feb 11 | Jailbreaks and refusal bypass; attack evaluation and transfer. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Week 6 Wed, Feb 18 | Data extraction and privacy leakage; memorization and training-data exposure. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Week 7 Wed, Feb 25 | Training-time attacks: poisoning, backdoors, and model supply-chain issues. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Week 8 Wed, Mar 4 | Retrieval-augmented generation (RAG) attacks: retrieval manipulation, content injection, and ranking effects. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Part III. Defenses and verification | |||
| Week 9 Wed, Mar 11 | Defense patterns: input filtering, sandboxing, tool policies, and guardrails. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Week 10 Wed, Mar 18 | No class: Spring Break | None | |
| Week 11 Wed, Mar 25 | Midterm project presentations: short progress updates (e.g., 8–10 minutes per team) followed by Q&A. | None | Midterm project slides due (TBD) |
| Week 12 Wed, Apr 1 | Detection and monitoring in production: telemetry, anomaly signals, and incident response. | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Part IV. Paper discussions and student projects | |||
| Week 13 Wed, Apr 8 | Paper discussion (placeholder). | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Week 14 Wed, Apr 15 | Paper discussion (placeholder). | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Week 15 Wed, Apr 22 | Paper discussion (placeholder). | Paper 1 (TBD): Title TBD; Paper 2 (TBD): Title TBD | |
| Week 16 Wed, Apr 29 | Final project presentations (in class). | None | Final report due (TBD) |
| Final (date TBD) | Final project report due (no in-class final exam). | None | Deadline TBD |