Benchmark Job

class anomalib.pipelines.benchmark.job.BenchmarkJob(accelerator, model, datamodule, seed, flat_cfg)

Bases: Job

Benchmarking job for evaluating anomaly detection models.

This class implements a benchmarking job that evaluates model performance by training and testing on a given dataset. It collects performance metrics (e.g. accuracy and F1-score) together with timing information.

Parameters:
  • accelerator (str) – Type of accelerator to use for computation (e.g. "cpu", "gpu").

  • model (AnomalibModule) – Anomaly detection model instance to benchmark.

  • datamodule (AnomalibDataModule) – Data module providing the dataset.

  • seed (int) – Random seed for reproducibility.

  • flat_cfg (dict) – Flattened configuration dictionary with dotted keys.
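flat_cfg uses dotted keys such as "model.name". As a minimal sketch, assuming the configuration starts out as a nested dictionary, such keys can be produced by a recursive flatten. The helper flatten_config is hypothetical and not part of the anomalib API.

```python
from __future__ import annotations

from typing import Any


def flatten_config(cfg: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested dict into one level with dotted keys (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in cfg.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted prefix
            flat.update(flatten_config(value, dotted))
        else:
            flat[dotted] = value
    return flat


nested = {"model": {"name": "padim"}, "data": {"category": "bottle"}, "seed": 42}
print(flatten_config(nested))
# {'model.name': 'padim', 'data.category': 'bottle', 'seed': 42}
```

Non-dict values are kept as leaves, so lists or other objects in the configuration are stored unchanged under their dotted key.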

Example

>>> from anomalib.data import MVTecAD
>>> from anomalib.models import Padim
>>> from anomalib.pipelines.benchmark.job import BenchmarkJob
>>> # Initialize model, datamodule and job
>>> model = Padim()
>>> datamodule = MVTecAD(category="bottle")
>>> job = BenchmarkJob(
...     accelerator="gpu",
...     model=model,
...     datamodule=datamodule,
...     seed=42,
...     flat_cfg={"model.name": "padim"}
... )
>>> # Run the benchmark job
>>> results = job.run()

The job trains and then tests the model, collecting performance metrics (e.g. accuracy, F1-score) together with the job, fit, and test durations. Results are returned as a single dictionary so that runs can be compared across different model-dataset combinations.

static collect(results)

Collect and aggregate results from multiple benchmark runs.

Parameters:

results (list[dict[str, Any]]) – List of result dictionaries from individual benchmark runs.

Returns:

DataFrame containing the aggregated results, with each row representing one benchmark run.

Return type:

pd.DataFrame
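Aggregation amounts to pivoting a list of per-run dictionaries (rows) into columns. A stdlib-only sketch, assuming every run reports the same keys; the helper collect_columns is hypothetical, and the config values are illustrative.

```python
from __future__ import annotations

from typing import Any


def collect_columns(results: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot per-run result dicts into column lists (hypothetical helper).

    Assumes every dict has the same keys as the first one.
    """
    if not results:
        return {}
    columns: dict[str, list[Any]] = {key: [] for key in results[0]}
    for result in results:
        for key in columns:
            columns[key].append(result[key])
    return columns


runs = [
    {"model.name": "padim", "seed": 42},
    {"model.name": "patchcore", "seed": 42},
]
print(collect_columns(runs))
# {'model.name': ['padim', 'patchcore'], 'seed': [42, 42]}
```

With pandas installed, pd.DataFrame(collect_columns(runs)), or simply pd.DataFrame(runs), yields one row per run.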

run(task_id=None)

Run the benchmark job.

This method executes the full benchmarking pipeline, training and then testing the model. It records the duration of the whole job, the fit stage, and the test stage, and collects the performance metrics reported during testing.
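The stage timing described above can be sketched with time.perf_counter, assuming fit and test are plain callables. The stand-in callables, the placeholder metric name f1_score, and the duration key names are illustrative assumptions; the actual job drives an anomalib engine, and its key names may differ.

```python
from __future__ import annotations

import time
from typing import Any, Callable


def timed_fit_and_test(
    fit: Callable[[], None],
    test: Callable[[], dict[str, Any]],
) -> tuple[dict[str, float], dict[str, Any]]:
    """Run fit then test, returning stage durations in seconds and the test metrics."""
    job_start = time.perf_counter()
    fit_start = time.perf_counter()
    fit()
    test_start = time.perf_counter()
    metrics = test()
    job_end = time.perf_counter()
    durations = {
        "job_duration": job_end - job_start,      # whole job
        "fit_duration": test_start - fit_start,   # training stage only
        "test_duration": job_end - test_start,    # evaluation stage only
    }
    return durations, metrics


# Hypothetical stand-ins for training and evaluation
durations, metrics = timed_fit_and_test(
    fit=lambda: time.sleep(0.01),
    test=lambda: {"f1_score": 0.9},  # placeholder metric, not a real result
)
print(sorted(durations))  # ['fit_duration', 'job_duration', 'test_duration']
```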

Parameters:

task_id (int | None, optional) – ID of the task when running in distributed mode. When provided, the job runs on the device whose index matches this ID. Defaults to None.
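A minimal sketch of how task_id might map to a device selection, assuming a Lightning-style devices argument where "auto" lets the trainer choose and a list of integers pins specific device indices. The helper select_devices is hypothetical.

```python
from __future__ import annotations


def select_devices(task_id: int | None = None) -> str | list[int]:
    """Return a Lightning-style ``devices`` value for a benchmark task (hypothetical helper)."""
    if task_id is None:
        # No task assigned: let the trainer pick devices automatically
        return "auto"
    # Pin the job to the device whose index matches the task ID
    return [task_id]


print(select_devices())   # auto
print(select_devices(1))  # [1]
```

Checking "is None" rather than truthiness matters here, because task ID 0 is a valid device index.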

Returns:

Dictionary containing benchmark results including:
  • Timing information (job, fit and test duration)

  • Model configuration

  • Performance metrics from testing

Return type:

dict[str, Any]
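The returned dictionary combines timing information, the model configuration, and the test metrics. A minimal sketch of that merge, assuming each part is already a flat dict. The helper build_result, the key names, and all values are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from typing import Any


def build_result(
    accelerator: str,
    durations: dict[str, float],
    flat_cfg: dict[str, Any],
    metrics: dict[str, Any],
) -> dict[str, Any]:
    """Merge one run's pieces into a single flat result row (hypothetical helper)."""
    # Later mappings overwrite earlier ones if keys collide
    return {"accelerator": accelerator, **durations, **flat_cfg, **metrics}


row = build_result(
    accelerator="cpu",
    durations={"job_duration": 12.5, "fit_duration": 8.0, "test_duration": 4.0},
    flat_cfg={"model.name": "padim", "seed": 42},
    metrics={"f1_score": 0.91},  # made-up value for illustration
)
print(row["model.name"], row["f1_score"])  # padim 0.91
```

Keeping every run as one flat dictionary is what allows a list of such rows to be turned directly into a table.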

static save(result)

Save benchmark results to a CSV file.

The results are saved in the runs/benchmark/YYYY-MM-DD-HH_MM_SS directory. The method also prints a tabular view of the results.

Parameters:

result (pd.DataFrame) – DataFrame containing benchmark results to save.

Return type:

None
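A stdlib-only sketch of the save location described above. It assumes a results.csv file name, which this reference does not state, and it writes under a caller-supplied base directory rather than the current working directory. save_rows is hypothetical. The real method takes a pd.DataFrame, for which DataFrame.to_csv(path, index=False) would replace the csv writer. The tabular printout is omitted.

```python
from __future__ import annotations

import csv
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any


def save_rows(rows: list[dict[str, Any]], base_dir: Path) -> Path:
    """Write rows to base_dir/runs/benchmark/<timestamp>/results.csv (hypothetical)."""
    timestamp = datetime.now().strftime("%Y-%m-%d-%H_%M_%S")  # YYYY-MM-DD-HH_MM_SS
    file_path = base_dir / "runs" / "benchmark" / timestamp / "results.csv"
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with file_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return file_path


# Example: write one row into a throwaway directory and read it back
with tempfile.TemporaryDirectory() as tmp:
    saved = save_rows([{"model.name": "padim", "seed": 42}], Path(tmp))
    saved_relative = saved.relative_to(tmp)
    saved_lines = saved.read_text().splitlines()
print(saved_lines)  # ['model.name,seed', 'padim,42']
```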