LLM-Driven Research Engineering

End-to-end anomaly detection from one command.

AD-AGENT turns natural-language requests into runnable anomaly detection pipelines across PyOD, PyGOD, and TSB-AD — with automated review, sandbox execution, and evaluation.

Supported by NSF POSE Phase II OpenAD (Award #2346158)

3 modalities

Tabular, graph, and time-series anomaly detection

Multi-agent

Processor, Selector, InfoMiner, CodeGenerator, Reviewer, Evaluator, Optimizer

Automation

Parse, select, codegen, test, and run

What It Does

Running anomaly detection across different data modalities usually means switching libraries, re-learning APIs, and wiring up evaluation code by hand. AD-AGENT collapses that loop: a prompt describing the task you want to solve — "Run IForest on cardio.mat", "Detect anomalies in my graph data", or "Try all PyOD models on this dataset" — is turned into a working script, executed inside a secure sandbox, and evaluated end-to-end.

Core Features

Natural-language Interface

Write commands like: Run IForest on ./data/glass_train.mat and ./data/glass_test.mat.

Cross-library Support

Works across pyod (tabular), pygod (graph), and tsb_ad (time-series) — one interface for multiple data modalities.

Self-checking Pipeline

Generated code is reviewed on synthetic data before real execution.

Pipeline API

Call api.pipeline stages directly from Python to embed AD-AGENT in notebooks, scripts, or larger workflows.
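
A minimal embedding sketch (the dataset path and the IForest choice are illustrative; full signatures are in the API Reference below):

from api.pipeline import (
    check_dataset_exists,
    run_info_miner,
    run_codegenerator_reviewer_loop,
    run_evaluator,
)

train = "./data/pyod_data/cardio.mat"  # illustrative dataset path
check_dataset_exists(train)

# Fetch algorithm docs, then alternate code generation and synthetic review
doc = run_info_miner(algorithm="IForest", package_name="pyod")["algorithm_doc"]
state = run_codegenerator_reviewer_loop(
    tool="IForest",
    data_path_train=train,
    algorithm_doc=doc,
    package_name="pyod",
)

# Evaluate the reviewed code on the real data inside the sandbox
state = run_evaluator(code_quality=state["code_quality"], tool="IForest")
print(state["code_quality"].auroc, state["code_quality"].auprc)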

Automatic Model Suggestion

When no algorithm is specified, the selector agent recommends competitive candidates based on data modality and shape.

Secure Sandbox Execution

Generated code runs inside an isolated sandbox — Modal (remote, default) or Docker (local) — never on the host process.

Workflow

  1. Processor extracts algorithms, datasets, and parameters from the user command.
  2. Selector infers the data modality and selects the AD library and tools.
  3. InfoMiner queries authoritative docs for model usage details.
  4. CodeGenerator creates runnable scripts and revises them on errors.
  5. Reviewer tests on synthetic data; re-runs CodeGenerator on failure (up to 4 cycles).
  6. Evaluator executes the reviewed code on real data inside the sandbox and records AUROC/AUPRC.
  7. Optimizer (optional, -o) tunes hyper-parameters with LLM guidance and re-evaluates.

Quickstart

git clone git@github.com:USC-FORTIS/AD-AGENT.git
cd AD-AGENT
python -m venv .venv

# macOS / Linux
source .venv/bin/activate

# Windows
.venv\Scripts\activate

pip install -r requirements.txt
export OPENAI_API_KEY=your-api-key-here   # or set in src/config/config.py
python main.py

Then type a natural-language request, for example:

Run IForest on ./data/pyod_data/cardio.mat
Run DOMINANT on ./data/pygod_data/books.pt
Run IForest on ./data/SMAP/SMAP_train.npy
Run all on ./data/pyod_data/cardio.mat

Parallel: python main.py -p  |  Optimizer: python main.py -o  |  Sandbox: python main.py --sandbox docker

Sandbox Execution

AD-AGENT runs its agent workflow on the host machine and executes generated model scripts inside an isolated sandbox backend. This keeps the orchestration layer lightweight while moving package-heavy model execution into containers. Workflow logs ([main], [selector], [reviewer] …) come from the host; generated-script output is streamed back from the sandbox into the same terminal.

Modal (default)

Remote execution in a managed cloud sandbox. No local Docker required.

Prerequisite — install and authenticate once:

pip install modal
modal setup

Paths inside Modal: /workspace (script), /data (datasets).

Docker

Local container execution. Requires Docker Desktop to be running. Useful for offline or air-gapped environments.

Resource limits applied to every container:

--memory=4g  --cpus=2  --rm

Dataset files are bind-mounted read-only at the requested in-container paths.
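
As a rough illustration only (not the project's actual backend code), each run is equivalent to a resource-limited docker run; the host path and script name below are hypothetical:

import subprocess

# Sketch: auto-removed container with CPU/memory caps and a read-only
# bind mount of the dataset at the path the generated script expects.
subprocess.run([
    "docker", "run", "--rm", "--memory=4g", "--cpus=2",
    "-v", "/abs/path/cardio.mat:/data/pyod_data/cardio.mat:ro",  # hypothetical mount
    "adagent-pyod:latest",
    "python", "/workspace/generated_script.py",  # hypothetical script location
], check=True)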

Quick Start

python main.py --sandbox docker
python main.py --sandbox modal

# with debug retention (Modal only — sandboxes kept for post-run inspection)
ADAGENT_SANDBOX_DEBUG=1 python main.py --sandbox modal

Configuration

Sandbox mode is resolved in this priority order (sketched in code after the list):

  1. --sandbox flag passed to main.py
  2. Environment variable ADAGENT_SANDBOX
  3. Legacy OPENAD_SANDBOX (backward compatibility)
  4. src/config/settings.yaml if present
  5. Default: modal
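
A minimal sketch of that resolution order (the settings.yaml key name is an assumption):

import os

def resolve_sandbox_mode(cli_flag=None, settings=None):
    # Mirrors the priority list above; not the actual implementation.
    return (
        cli_flag                              # 1. --sandbox flag
        or os.environ.get("ADAGENT_SANDBOX")  # 2. current env var
        or os.environ.get("OPENAD_SANDBOX")   # 3. legacy env var
        or (settings or {}).get("sandbox")    # 4. settings.yaml (key assumed)
        or "modal"                            # 5. default
    )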

Additional environment variables:

ADAGENT_SANDBOX_DEBUG=1        # retain Modal sandboxes after run for inspection
ADAGENT_MODAL_APP_NAME=...     # override Modal app name  (default: adagent-sandbox)
ADAGENT_MODAL_VOLUME_NAME=...  # override Modal volume name (default: adagent-data)

Supported Package Images

Docker image tags

adagent-pyod:latest
adagent-pygod:latest
adagent-tsb-ad:latest

Modal image definitions

PYOD_IMAGE
PYGOD_IMAGE
TSB_AD_IMAGE

pygod requires PyG wheel deps (pyg_lib, torch_sparse, torch_scatter).

API Reference

api.pipeline module

Step-by-step pipeline functions for anomaly detection. Each function can be called individually (pass explicit arguments) or inside a graph (pass a FullToolState).
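
Both calling styles, sketched with illustrative values:

from api.pipeline import build_state, run_selector, run_info_miner

# Style 1: explicit arguments; each stage returns an updated state dict.
state = run_selector(
    algorithm=["IForest"],
    dataset_train="./data/pyod_data/cardio.mat",
)
print(state["package_name"], state["feature_dim"])

# Style 2: graph style; thread one FullToolState through the nodes.
state = build_state()
state["current_tool"] = "IForest"
state["package_name"] = "pyod"
state = run_info_miner(state=state)
print(state["algorithm_doc"][:100])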

class api.pipeline.FullToolState

Bases: TypedDict

Shared state dictionary passed between all pipeline nodes.

messages Sequence[Any]
Accumulated message history across pipeline stages.
current_tool str
Name of the algorithm currently being processed.
input_parameters dict
User-supplied hyper-parameters for the algorithm.
data_path_train str
Path to the training dataset file.
data_path_test str
Path to the testing dataset file.
package_name str
AD library in use: "pyod", "pygod", or "tsb_ad".
code_quality CodeQuality | None
Evaluation result attached to the current candidate code.
should_rerun bool
Flag indicating whether the code generation step should retry.
experiment_config dict | None
Structured experiment configuration produced by the Processor.
algorithm_doc str | None
Documentation string fetched by the InfoMiner for the current tool.
feature_dim int | None
Feature dimensionality of the dataset, inferred by the Selector.
metadata dict | None
Dataset metadata (e.g. num_samples, has_labels) inferred by the Selector.
results List[Tuple[str, Any]] | None
Collected (tool, final_state) pairs after all tools are processed.
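
For orientation, a skeleton reconstructed from the field list above; the real class also carries agent instances (e.g. agent_selector) and may differ in detail:

from typing import Any, List, Optional, Sequence, Tuple, TypedDict

class FullToolState(TypedDict):
    messages: Sequence[Any]
    current_tool: str
    input_parameters: dict
    data_path_train: str
    data_path_test: str
    package_name: str                      # "pyod", "pygod", or "tsb_ad"
    code_quality: Optional["CodeQuality"]  # forward ref; defined in the package
    should_rerun: bool
    experiment_config: Optional[dict]
    algorithm_doc: Optional[str]
    feature_dim: Optional[int]
    metadata: Optional[dict]
    results: Optional[List[Tuple[str, Any]]]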

api.pipeline.build_state()

Create a default FullToolState with all agent instances initialised.

state dict
Fully initialised pipeline state with default values for every key.

api.pipeline.run_processor(state=None)

Launch the interactive chatbot to collect algorithm, dataset, and parameter information from the user and populate experiment_config.

state FullToolState, optional
Existing pipeline state to reuse. A new default state is created when None.
state dict
Updated pipeline state with experiment_config populated from user input.

api.pipeline.run_selector(algorithm=None, dataset_train=None, dataset_test=None, parameters=None, state=None)

Resolve the AD library and tool list from experiment configuration or explicit arguments. Infers package_name, feature_dim, and dataset metadata.

algorithm list[str], optional
Algorithm names to run. Pass "all" to use every available algorithm for the detected library, or None to let the agent decide.
dataset_train str, optional
Path to the training dataset. Required when state is None.
dataset_test str, optional
Path to the testing dataset.
parameters dict, optional
Algorithm hyper-parameters to forward to the generator.
state FullToolState, optional
Existing pipeline state whose experiment_config is used when provided.
state dict
Updated state with agent_selector, package_name, feature_dim, and metadata set.

api.pipeline.run_info_miner(algorithm=None, package_name=None, state=None)

Query authoritative documentation for an algorithm and store the result in algorithm_doc. Results are cached to disk to avoid redundant API calls.

algorithm str, optional
Algorithm name to look up. Required when state is None.
package_name str, optional
Package to query ("pyod", "pygod", "tsb_ad"). Required when state is None.
state FullToolState, optional
Existing pipeline state containing current_tool and package_name.
state dict
Updated state with algorithm_doc set to the retrieved documentation string.

api.pipeline.run_code_generator(tool=None, data_path_train=None, algorithm_doc=None, package_name=None, data_path_test=None, input_parameters=None, code_quality=None, metadata=None, state=None)

Generate an initial runnable script for the algorithm, or revise an existing one when a previous CodeQuality with errors is supplied.

tool str, optional
Algorithm name. Required when state is None.
data_path_train str, optional
Path to the training dataset. Required when state is None.
algorithm_doc str, optional
Documentation string for the algorithm (from run_info_miner). Required when state is None.
package_name str, optional
AD library name. Required when state is None.
data_path_test str, optional
Path to the testing dataset.
input_parameters dict, optional
Hyper-parameters to embed in the generated code.
code_quality CodeQuality, optional
Previous CodeQuality with error_message set, triggering a revision pass instead of fresh generation.
metadata dict, optional
Dataset metadata to pass to the generator for data-shape-aware code.
state FullToolState, optional
Existing pipeline state. All of the above are read from state when provided.
state dict
Updated state with code_quality.code containing the generated or revised script.

api.pipeline.run_reviewer(code_quality=None, tool=None, state=None)

Execute the generated code against synthetic data to catch runtime errors before real-data evaluation. Updates code_quality.error_message and increments review_count on failure.

code_quality CodeQuality, optional
Code to review. Required when state is None.
tool str, optional
Algorithm name. Required when state is None.
state FullToolState, optional
Existing pipeline state.
state dict
Updated state with code_quality.error_message set (empty string on success).
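
For example, one manual generate-and-review cycle (the convenience loop documented next automates this):

from api.pipeline import run_info_miner, run_code_generator, run_reviewer

doc = run_info_miner(algorithm="IForest", package_name="pyod")["algorithm_doc"]

state = run_code_generator(
    tool="IForest",
    data_path_train="./data/pyod_data/cardio.mat",  # illustrative path
    algorithm_doc=doc,
    package_name="pyod",
)
state = run_reviewer(code_quality=state["code_quality"], tool="IForest")

if state["code_quality"].error_message:
    # Passing the failing CodeQuality back triggers a revision pass
    # instead of fresh generation.
    state = run_code_generator(
        tool="IForest",
        data_path_train="./data/pyod_data/cardio.mat",
        algorithm_doc=doc,
        package_name="pyod",
        code_quality=state["code_quality"],
    )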

api.pipeline.run_codegenerator_reviewer_loop(tool, data_path_train, algorithm_doc=None, package_name=None, data_path_test=None, input_parameters=None, max_reviews=2)

Convenience loop that alternates code generation and synthetic review until the code passes or max_reviews is reached.

tool str
Algorithm name.
data_path_train str
Path to the training dataset.
algorithm_doc str, optional
Documentation string for the algorithm.
package_name str, optional
AD library name.
data_path_test str, optional
Path to the testing dataset.
input_parameters dict, optional
Hyper-parameters to embed in the generated code.
max_reviews int, optional (default=2)
Maximum number of review–revision cycles before exiting the loop.
state dict
Final state after the last review cycle, containing code_quality.

api.pipeline.run_evaluator(code_quality=None, tool=None, state=None)

Execute the reviewed code on real training and testing data inside the configured sandbox and compute AUROC / AUPRC metrics.

code_quality CodeQuality, optional
Code to evaluate. Required when state is None.
tool str, optional
Algorithm name. Required when state is None.
state FullToolState, optional
Existing pipeline state.
state dict
Updated state with code_quality.auroc and code_quality.auprc populated.

api.pipeline.run_optimizer(code_quality=None, algorithm_doc=None, state=None)

Use an LLM to propose improved hyper-parameters and re-evaluate. Only active when the -o flag is passed on the command line. Returns the original code_quality unchanged otherwise.

code_quality CodeQuality, optional
Baseline result to optimise. Required when state is None.
algorithm_doc str, optional
Documentation string for the algorithm. Required when state is None.
state FullToolState, optional
Existing pipeline state.
state dict
Updated state with tuned code_quality after up to 8 optimisation steps.

api.pipeline.run_evaluator_optimizer_loop(cq, tool, algorithm_doc=None, optimizer_cycles=1)

Run an initial evaluation pass and then alternate optimizer and evaluator for optimizer_cycles iterations. Exits early if any step returns an error.

cq CodeQuality
Initial code quality object with reviewed code ready for evaluation.
tool str
Algorithm name.
algorithm_doc str, optional
Documentation string for the algorithm.
optimizer_cycles int, optional (default=1)
Number of optimise-then-evaluate cycles after the initial evaluation.
state dict
Final state with the best code_quality achieved across all cycles.
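
Usage sketch, continuing the same illustrative IForest setup (the optimizer is a no-op unless -o was passed):

from api.pipeline import (
    run_info_miner,
    run_codegenerator_reviewer_loop,
    run_evaluator_optimizer_loop,
)

doc = run_info_miner(algorithm="IForest", package_name="pyod")["algorithm_doc"]
reviewed = run_codegenerator_reviewer_loop(
    tool="IForest",
    data_path_train="./data/pyod_data/cardio.mat",
    algorithm_doc=doc,
    package_name="pyod",
)

# Initial evaluation, then one optimise-then-evaluate cycle.
state = run_evaluator_optimizer_loop(
    reviewed["code_quality"],
    tool="IForest",
    algorithm_doc=doc,
    optimizer_cycles=1,
)
print(state["code_quality"].auroc, state["code_quality"].auprc)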

api.pipeline.check_dataset_exists(dataset_train, dataset_test=None)

Validate that dataset files exist on disk before the pipeline starts.

dataset_train str
Path to the training dataset.
dataset_test str, optional
Path to the testing dataset.
FileNotFoundError
If either dataset path does not exist on the filesystem.

api.pipeline.log_local(stage, message, tool=None)

Print a formatted stage-tagged log line. Inserts a blank separator line when the stage or tool context changes.

stage str
Pipeline stage label, e.g. "code_generator", "reviewer".
message str
Log message to print.
tool str, optional
Algorithm/tool name appended to the prefix as [stage][tool].

Citation

If this project helps your work, cite the paper:

@inproceedings{yang2025ad,
  title={AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection},
  author={Yang, Tiankai and Liu, Junjun and Siu, Michael and Wang, Jiahang
          and Qian, Zhuangzhuang and Song, Chanjuan and Cheng, Cheng
          and Hu, Xiyang and Zhao, Yue},
  booktitle={Proceedings of the 14th International Joint Conference on
             Natural Language Processing and the 4th Conference of the
             Asia-Pacific Chapter of the Association for Computational Linguistics},
  pages={191--205},
  year={2025}
}

Support

This project is supported by the U.S. National Science Foundation (NSF), TIP POSE program: NSF POSE: Phase II: OpenAD: An Integrated Open-Source Ecosystem for Anomaly Detection.

Award ID: 2346158 | Status: Active | Period: Jun 15, 2024 - May 31, 2027

Lead institution: University of Illinois at Chicago. Partners: Illinois Institute of Technology, Lehigh University, University of Southern California.

NSF Program Director: Florence Rabanal.