ShanghaiTech Datamodule

ShanghaiTech Campus Data Module.

This module provides a PyTorch Lightning DataModule for the ShanghaiTech Campus dataset. The dataset must be downloaded manually from the official project page, because the original automated mirror is no longer available. During preparation, the video files are converted to a format that PyAV can read.

Example

Create a ShanghaiTech datamodule:

>>> from anomalib.data import ShanghaiTech
>>> datamodule = ShanghaiTech(
...     root="./datasets/shanghaitech",
...     scene=1,
...     clip_length_in_frames=2,
...     frames_between_clips=1,
... )
>>> datamodule.setup()
>>> i, data = next(enumerate(datamodule.train_dataloader()))
>>> data.keys()
dict_keys(['image', 'video_path', 'frames', 'label'])

Notes

The directory structure after preparation will be:

root/
├── testing/
│   ├── frames/
│   ├── test_frame_mask/
│   └── test_pixel_mask/
└── training/
    ├── frames/
    ├── converted_videos/
    └── videos/
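
As a quick sanity check on the layout above, the following minimal sketch lists which of the expected sub-directories are absent under a given root. The helper name `missing_subdirs` and the constant `EXPECTED_SUBDIRS` are hypothetical and not part of anomalib; only the directory names come from the tree shown here.

```python
from __future__ import annotations

from pathlib import Path

# Sub-directories copied from the tree above. The names of this constant and
# of the helper below are illustrative assumptions, not anomalib API.
EXPECTED_SUBDIRS = (
    "testing/frames",
    "testing/test_frame_mask",
    "testing/test_pixel_mask",
    "training/frames",
    "training/converted_videos",
    "training/videos",
)


def missing_subdirs(root: str | Path) -> list[str]:
    """Return the expected sub-directories that do not exist under ``root``."""
    root = Path(root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

An empty list means every directory in the tree is present; it says nothing about whether the files inside are complete.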

License: ShanghaiTech Campus Dataset is released under the BSD 2-Clause License.
Reference: Liu, W., Luo, W., Lian, D., & Gao, S. (2018). Future frame prediction for anomaly detection – a new baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6536-6545).

class anomalib.data.datamodules.video.shanghaitech.ShanghaiTech(root='./datasets/shanghaitech', scene=1, clip_length_in_frames=2, frames_between_clips=1, target_frame=VideoTargetFrame.LAST, train_batch_size=32, eval_batch_size=32, num_workers=8, train_augmentations=None, val_augmentations=None, test_augmentations=None, augmentations=None, val_split_mode=ValSplitMode.SAME_AS_TEST, val_split_ratio=0.5, seed=None)

Bases: AnomalibVideoDataModule

ShanghaiTech DataModule class.

Parameters:

root (Path | str | None) – Path to the root directory of the dataset. Defaults to "./datasets/shanghaitech".
scene (int) – Scene index in range [1, 13]. Defaults to 1.
clip_length_in_frames (int) – Number of frames in each video clip. Defaults to 2.
frames_between_clips (int) – Number of frames between consecutive clips. Defaults to 1.
target_frame (VideoTargetFrame) – Specifies which frame in the clip should be used for ground truth. Defaults to VideoTargetFrame.LAST.
train_batch_size (int) – Training batch size. Defaults to 32.
eval_batch_size (int) – Batch size used for evaluation (validation and test). Defaults to 32.
num_workers (int) – Number of workers for data loading. Defaults to 8.
train_augmentations (Transform | None) – Augmentations to apply to the training images. Defaults to None.
val_augmentations (Transform | None) – Augmentations to apply to the validation images. Defaults to None.
test_augmentations (Transform | None) – Augmentations to apply to the test images. Defaults to None.
augmentations (Transform | None) – General augmentations to apply if stage-specific augmentations are not provided. Defaults to None.
val_split_mode (ValSplitMode) – Setting that determines how the validation subset is obtained. Defaults to ValSplitMode.SAME_AS_TEST.
val_split_ratio (float) – Fraction of train or test images that will be reserved for validation. Defaults to 0.5.
seed (int | None) – Random seed for reproducibility. Defaults to None.
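
To make the interaction between clip_length_in_frames, frames_between_clips and target_frame concrete, here is a minimal pure-Python sketch. It assumes that frames_between_clips is the stride between the start frames of consecutive clips and that only full-length clips are kept; this is an assumption about the indexing convention, not a statement of the anomalib implementation. Both function names are hypothetical.

```python
def clip_start_indices(num_frames, clip_length_in_frames=2, frames_between_clips=1):
    """Start index of every full clip, stepping by ``frames_between_clips``.

    Assumption: clips that would run past the end of the video are dropped.
    """
    last_start = num_frames - clip_length_in_frames
    return list(range(0, last_start + 1, frames_between_clips))


def last_target_frame(start, clip_length_in_frames=2):
    """Index of the final frame of a clip (the ``VideoTargetFrame.LAST`` case)."""
    return start + clip_length_in_frames - 1


starts = clip_start_indices(num_frames=6)
print(starts)  # [0, 1, 2, 3, 4]
print([last_target_frame(s) for s in starts])  # [1, 2, 3, 4, 5]
```

With the default values, a 6-frame video yields five overlapping two-frame clips under this convention, and the ground truth for each clip is taken from its second frame.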

prepare_data()

Verify that the dataset is available and convert video files.

The dataset must be downloaded manually. If it is not found, a RuntimeError is raised with instructions pointing to the official project page.

Raises: RuntimeError – If the dataset is missing, with instructions for downloading it manually.
Return type: None
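
The failure mode described above can be handled with an ordinary try/except. The sketch below uses a hypothetical stand-in, `check_dataset_present`, rather than the real prepare_data(). It assumes the check is simply whether the root directory exists, and the message text is invented for illustration; the actual check and wording in anomalib may differ.

```python
from pathlib import Path


def check_dataset_present(root):
    """Hypothetical stand-in for the availability check: raise if ``root`` is absent."""
    root = Path(root)
    if not root.is_dir():
        # Illustrative message only; not the text produced by anomalib.
        msg = (
            f"ShanghaiTech dataset not found at {root}. "
            "Download it manually from the official project page."
        )
        raise RuntimeError(msg)


def try_prepare(root):
    """Return None on success, or the error message if the dataset is missing."""
    try:
        check_dataset_present(root)
    except RuntimeError as err:
        return str(err)
    return None
```

In real use, the same pattern would wrap the call to datamodule.prepare_data(), so that the download instructions carried by the exception can be shown to the user.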

anomalib.data.datamodules.video.shanghaitech.get_download_instructions(root_path)

Get manual download instructions for the ShanghaiTech Campus dataset.

The automated mirror originally hosted by the dataset authors is no longer reachable, so the dataset must be downloaded manually from the official project page.

Parameters: root_path (Path) – Path where the dataset should be placed.
Returns: Formatted download instructions.
Return type: str
