datasets package¶

Submodules¶

datasets.bidimensional_dataset module¶

Load a dataset of bidimensional points by specifying the folder where it is located.

class datasets.bidimensional_dataset.Bidimensional(path, transform=None, target_transform=None)[source]¶

Bases: torch.utils.data.dataset.Dataset

This class loads the data.csv file and prepares it as a dataset.

datasets.bidimensional_dataset.load_dataset(dataset_folder)[source]¶

Loads the dataset from the file system and provides the dataset splits for train, validation, and test.

The dataset is expected to have the following structure, where ‘dataset_folder’ points to the root of the three folders train/val/test.

Example:

dataset_folder = “~/../../data/bd_xor”

which contains the split sub-folders as follows:

‘dataset_folder’/train
‘dataset_folder’/val
‘dataset_folder’/test

Parameters

dataset_folder (string) – Path to the dataset on the file system

Returns

train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits
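The expected layout can be checked before loading. The following is a minimal sketch, not part of the datasets package: the helper name find_missing_splits is hypothetical, and only the split names train, val and test come from the description above.

```python
from pathlib import Path
import tempfile

# Split names taken from the folder structure described above.
EXPECTED_SPLITS = ("train", "val", "test")


def find_missing_splits(dataset_folder):
    """Return the expected split sub-folders that do not exist (hypothetical helper)."""
    root = Path(dataset_folder).expanduser()
    return [name for name in EXPECTED_SPLITS if not (root / name).is_dir()]


# Demo on a throwaway directory that only has two of the three splits.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train").mkdir()
    (Path(tmp) / "val").mkdir()
    missing = find_missing_splits(tmp)

print(missing)
```

An empty list means all three split folders are present.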

datasets.image_folder_dataset module¶

Load a dataset of images by specifying the folder where it is located.

class datasets.image_folder_dataset.ImageFolderApply(path, transform=None, target_transform=None, classify=False)[source]¶

Bases: torch.utils.data.dataset.Dataset

TODO fill me

class datasets.image_folder_dataset.ImageFolderInMemory(path, transform=None, target_transform=None, workers=1)[source]¶

Bases: torch.utils.data.dataset.Dataset

This class loads the data provided and stores it entirely in memory as a dataset.

It uses torchvision.datasets.ImageFolder() to create a dataset. Afterward, all images are sequentially stored in memory for faster use when paired with dataloaders. It is the responsibility of the user to ensure that the dataset actually fits in memory.
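The idea of reading everything once and serving later requests from memory can be illustrated without torch. This is a simplified sketch under stated assumptions, not the actual ImageFolderInMemory implementation: InMemoryDataset and CountingSource are hypothetical names, and the tuples stand in for decoded images and their labels.

```python
class InMemoryDataset:
    """Eagerly copies every sample of a lazy dataset into a list."""

    def __init__(self, lazy_dataset):
        # One sequential pass over the source; later lookups never touch it.
        self.samples = [lazy_dataset[i] for i in range(len(lazy_dataset))]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


class CountingSource:
    """Stand-in for a disk-backed dataset; counts how often it is read."""

    def __init__(self, items):
        self.items = items
        self.reads = 0

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        self.reads += 1
        return self.items[index]


source = CountingSource([("img_a", 0), ("img_b", 1), ("img_c", 0)])
cached = InMemoryDataset(source)
first = cached[0]
again = cached[0]  # served from memory, the source is not read again
print(source.reads)
```

The read counter stays at one read per sample no matter how often the cached dataset is indexed, which is the trade of memory for speed described above.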

datasets.image_folder_dataset.load_dataset(dataset_folder, in_memory=False, workers=1)[source]¶

Loads the dataset from the file system and provides the dataset splits for train, validation, and test.

The dataset is expected to have the following structure, where ‘dataset_folder’ points to the root of the three folders train/val/test.

Example:

dataset_folder = “~/../../data/cifar”

which contains the split sub-folders as follows:

‘dataset_folder’/train
‘dataset_folder’/val
‘dataset_folder’/test

Each of the three splits (train, val, test) should contain one sub-folder per class, named after the class. File names can be arbitrary, i.e. they do not have to follow a pattern such as 0-* for class 0 of MNIST.

Example:

train/dog/whatever.png
train/dog/you.png
train/dog/like.png

train/cat/123.png
train/cat/nsdf3.png
train/cat/asd932_.png

train/“class_name”/*.png
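A one-folder-per-class layout like this is typically turned into integer labels by sorting the class folder names, which is how torchvision.datasets.ImageFolder assigns class indices. The sketch below imitates only that indexing step with the standard library; index_split is a hypothetical helper, and no images are decoded.

```python
from pathlib import Path
import tempfile


def index_split(split_folder):
    """Map each class sub-folder to an integer and list (file name, label) pairs."""
    split = Path(split_folder)
    # Class indices follow the alphabetical order of the folder names.
    classes = sorted(p.name for p in split.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [
        (f.name, class_to_idx[c])
        for c in classes
        for f in sorted((split / c).iterdir())
        if f.is_file()
    ]
    return class_to_idx, samples


# Build a tiny throwaway "train" split with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train"
    for rel in ("dog/you.png", "dog/like.png", "cat/123.png"):
        target = train / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    class_to_idx, samples = index_split(train)

print(class_to_idx)  # {'cat': 0, 'dog': 1}
print(samples)       # [('123.png', 0), ('like.png', 1), ('you.png', 1)]
```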

Parameters

dataset_folder (string) – Path to the dataset on the file system
in_memory (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders

Returns

train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits

datasets.image_folder_triplet module¶

Load a dataset of images by specifying the folder where it is located and prepare it for triplet similarity matching training.

class datasets.image_folder_triplet.ImageFolderTriplet(path, train=None, num_triplets=None, in_memory=None, transform=None, target_transform=None, workers=None)[source]¶

Bases: torch.utils.data.dataset.Dataset

This class loads the data provided and stores it entirely in memory as a dataset. Additionally, triplets will be generated in the format of [a, p, n] and their file names stored in memory.

generate_triplets()[source]¶

Generate triplets for training. Triplets have the format [anchor, positive, negative].
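One common way to build [anchor, positive, negative] triplets is to pick two distinct samples of one class plus one sample of a different class. The sketch below shows that sampling scheme on sample indices; it is an assumption about the general technique, not the code behind generate_triplets(), and make_triplets with its seed parameter is hypothetical.

```python
import random


def make_triplets(labels, num_triplets, seed=0):
    """Return index triplets [anchor, positive, negative] sampled from labels."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # An anchor class needs at least two samples to supply a distinct positive.
    anchor_classes = [c for c, idxs in by_class.items() if len(idxs) >= 2]
    if not anchor_classes or len(by_class) < 2:
        raise ValueError("need two classes, one of them with two or more samples")

    triplets = []
    for _ in range(num_triplets):
        pos_class = rng.choice(anchor_classes)
        neg_class = rng.choice([c for c in by_class if c != pos_class])
        anchor, positive = rng.sample(by_class[pos_class], 2)
        negative = rng.choice(by_class[neg_class])
        triplets.append([anchor, positive, negative])
    return triplets


labels = [0, 0, 1, 1, 2]  # class label of each sample, by index
triplets = make_triplets(labels, num_triplets=4)
for anchor, positive, negative in triplets:
    print(anchor, positive, negative)
```

In every triplet the anchor and positive share a label while the negative does not, which is the property a triplet loss relies on.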

datasets.image_folder_triplet.load_dataset(dataset_folder, num_triplets=None, in_memory=False, workers=1, only_evaluate=False)[source]¶

Loads the dataset from the file system and provides the dataset splits for train, validation, and test.

The dataset is expected to have the same structure as described in image_folder_dataset.load_dataset().

Parameters

dataset_folder (string) – Path to the dataset on the file system
num_triplets (int) – Number of triplets [a, p, n] to generate on dataset creation
in_memory (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders
only_evaluate (boolean) – If True, only the test set is loaded.

Returns

train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits

datasets.multi_label_image_folder_dataset module¶

Load a dataset of images by specifying the folder where it is located.

class datasets.multi_label_image_folder_dataset.MultiLabelImageFolder(path, transform=None, target_transform=None, workers=1)[source]¶

Bases: torch.utils.data.dataset.Dataset

This class loads the multi-label image data provided.

datasets.multi_label_image_folder_dataset.load_dataset(dataset_folder, in_memory=False, workers=1)[source]¶

Loads the dataset from the file system and provides the dataset splits for train, validation, and test.

The dataset is expected to have the following structure, where ‘dataset_folder’ points to the root of the three folders train/val/test.

Example:

dataset_folder = “~/../../data/dataset_folder”

which contains the split sub-folders as follows:

‘dataset_folder’/train
‘dataset_folder’/val
‘dataset_folder’/test

Each of the three splits (train, val, test) should contain a folder called ‘images’ containing all of the images (the file names of the images can be arbitrary). The split folder should also contain a csv file called ‘labels.csv’ formatted as follows:

filename,class_0,class_1,…,class_n
images/img_1.png,1,-1,-1,…,1

where the filename is the relative path to the image file from the split folder, and 1/-1 indicates the presence/absence of a particular label.

Example:

train/images/whatever.png
train/images/you.png
train/images/like.png
train/labels.csv

and the labels.csv would contain:

filename,cat,dog,elephant
images/whatever.png,1,1,-1
images/you.png,1,-1,-1
images/like.png,-1,1,1
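A file in this format can be read with the standard csv module. The following is a minimal sketch assuming the header and 1/-1 encoding described above; read_labels is a hypothetical helper, the file names are illustrative, and the package's own parsing may differ.

```python
import csv
import io

# Illustrative file content in the labels.csv format described above.
LABELS_CSV = """filename,cat,dog,elephant
images/img_1.png,1,1,-1
images/img_2.png,1,-1,-1
images/img_3.png,-1,1,1
"""


def read_labels(csv_text):
    """Return the class names and a list of (filename, label vector) pairs."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    class_names = header[1:]  # the first column holds the file name
    rows = [(row[0], [int(v) for v in row[1:]]) for row in reader if row]
    return class_names, rows


class_names, rows = read_labels(LABELS_CSV)
for filename, flags in rows:
    # A value of 1 marks a label as present, -1 as absent.
    present = [name for name, flag in zip(class_names, flags) if flag == 1]
    print(filename, present)
```

For the first row this yields the labels cat and dog, since the elephant column holds -1.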

Parameters

dataset_folder (string) – Path to the dataset on the file system
in_memory (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders

Returns

train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits

Module contents¶