BMAD Datamodule#
BMAD Data Module.
This module provides a PyTorch Lightning DataModule for the BMAD (Benchmarks for Medical Anomaly Detection) dataset. If the dataset is not available locally, it will be downloaded and prepared automatically.
BMAD is a standardized benchmark comprising six reorganized public medical-imaging datasets from five domains: brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology. Within these datasets, three support pixel-level anomaly localization, while the remaining three are for sample-level anomaly detection only :contentReference[oaicite:0]{index=0}.
Example
Create a BMAD datamodule:
>>> from anomalib.data import BMAD
>>> datamodule = BMAD(
... root="./datasets/BMAD",
... dataset="Brain", # options: "Brain", "Chest", "Histopathology", "Liver", "Retina_OCT2017",
"Retina_RESC"
... )
Notes
The dataset will be automatically downloaded and reorganized upon first usage. Directory structure after preparation may look like:
datasets/
└── BMAD/
├── Brain/
│ ├── train/
│ │ └── good/
│ ├── valid/
│ │ ├── good/
│ │ └── Ungood/ (if applicable with masks for localization)
│ └── test/
│ ├── good/
│ └── Ungood/
├── Liver/
├── Retina_OCT2017/
├── Retina_RESC/
├── Chest/
└── Histopathology/
- License:
BMAD dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). https://creativecommons.org/licenses/by-nc-sa/4.0/
- Reference:
Jinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang, Xingyu Li: “BMAD: Benchmarks for Medical Anomaly Detection,” arXiv preprint arXiv:2306.11876, 2023. DOI: 10.48550/arXiv.2306.11876 https://arxiv.org/abs/2306.11876
- class anomalib.data.datamodules.image.bmad.BMAD(root='./datasets/BMAD', category='Brain', train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, test_split_mode=TestSplitMode.FROM_DIR, test_split_ratio=0.2, val_split_mode=ValSplitMode.FROM_DIR, val_split_ratio=None, seed=None)#
Bases:
AnomalibDataModuleBMAD Datamodule.
- Parameters:
root (Path | str) – Path to the root of the dataset. Defaults to
"./datasets/BMAD".category (str) – Category of the BMAD dataset (e.g.
"Brain","Liver","Retina_OCT2017","Retina_RESC","Chest", or"Histopathology"). Defaults to"Brain".train_batch_size (int, optional) – Training batch size. Defaults to
32.eval_batch_size (int, optional) – Test batch size. Defaults to
32.num_workers (int, optional) – Number of workers. Defaults to
8.train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to
None.val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to
None.test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to
None.augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided.
test_split_mode (TestSplitMode | str) – Method to create test set. Defaults to
TestSplitMode.FROM_DIR.test_split_ratio (float) – Fraction of data to use for testing. Defaults to
0.2.val_split_mode (ValSplitMode | str) – Method to create validation set. Defaults to
ValSplitMode.SAME_AS_TEST.val_split_ratio (float) – Fraction of data to use for validation. Defaults to
0.5.seed (int | None, optional) – Seed for reproducibility. Defaults to
None.
Example
Create BMAD datamodule with default settings:
>>> datamodule = BMAD() >>> datamodule.setup() >>> i, data = next(enumerate(datamodule.train_dataloader())) >>> data.image.shape torch.Size([32, 3, 240, 240])
Change the category:
>>> datamodule = BMAD(category="Liver")
Use Retina_RESC:
>>> datamodule = BMAD(category="Retina_RESC")
Create validation set from test data:
>>> datamodule = BMAD( ... val_split_mode=ValSplitMode.FROM_TEST, ... val_split_ratio=0.1 ... )
- prepare_data()#
Download the dataset if not available.
This method checks if the specified dataset is available in the file system. If not, it downloads and extracts the dataset into the appropriate directory.
- Return type:
Example
Assume the dataset is not available on the file system:
>>> datamodule = BMAD( ... root="./datasets/BMAD", ... category="Brain" ... ) >>> datamodule.prepare_data()
Directory structure after download:
datasets/ └── BMAD/ ├── Brain/ ├── Liver/ └── ...