datasets package
Submodules
datasets.bidimensional_dataset module
Load a dataset of bidimensional points by specifying the folder where it is located.
class datasets.bidimensional_dataset.Bidimensional(path, transform=None, target_transform=None)
Bases: torch.utils.data.dataset.Dataset
This class loads the data.csv file and prepares it as a dataset.
datasets.bidimensional_dataset.load_dataset(dataset_folder)
Loads the dataset from the file system and provides the dataset splits for train, validation and test.
The dataset is expected to have the following structure, where ‘dataset_folder’ points to the root of the three folders train, val and test.
Example:
dataset_folder = “~/../../data/bd_xor”
which contains the split sub-folders as follows:
    ‘dataset_folder’/train
    ‘dataset_folder’/val
    ‘dataset_folder’/test
- Parameters
dataset_folder (string) – Path to the dataset on the file system
- Returns
train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits
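The folder structure described above can be verified before the dataset is loaded. The following is a minimal sketch using only the standard library; the helper name find_splits is hypothetical and is not part of the datasets package.

```python
import os
import tempfile

SPLITS = ("train", "val", "test")


def find_splits(dataset_folder):
    """Return the paths of the train/val/test sub-folders, or raise if one is missing."""
    root = os.path.expanduser(dataset_folder)  # resolve a leading '~'
    paths = {}
    for split in SPLITS:
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            raise FileNotFoundError("missing split folder: " + split_dir)
        paths[split] = split_dir
    return paths


# Demonstration on a throw-away folder that mimics the expected structure.
demo_root = tempfile.mkdtemp()
for split in SPLITS:
    os.makedirs(os.path.join(demo_root, split))
demo_paths = find_splits(demo_root)
```

A check like this fails early with a readable message instead of an error raised from deep inside the loading code.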
datasets.image_folder_dataset module
Load a dataset of images by specifying the folder where it is located.
class datasets.image_folder_dataset.ImageFolderApply(path, transform=None, target_transform=None, classify=False)
Bases: torch.utils.data.dataset.Dataset
TODO fill me
class datasets.image_folder_dataset.ImageFolderInMemory(path, transform=None, target_transform=None, workers=1)
Bases: torch.utils.data.dataset.Dataset
This class loads the data provided and stores it entirely in memory as a dataset.
It makes use of torchvision.datasets.ImageFolder() to create a dataset. Afterwards, all images are sequentially stored in memory for faster use when paired with dataloaders. It is the user's responsibility to ensure that the dataset actually fits in memory.
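The idea of trading memory for access speed can be illustrated without torch. The following is a simplified sketch, not the implementation of ImageFolderInMemory: the class InMemoryFiles is hypothetical and stores raw file bytes rather than decoded images.

```python
import os
import tempfile


class InMemoryFiles:
    """Hypothetical stand-in: reads every file once and serves it from RAM."""

    def __init__(self, paths):
        self.data = []
        for path in paths:
            with open(path, "rb") as f:
                self.data.append(f.read())  # eager load, paid once at construction

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]  # no disk access here


# Demonstration with two small throw-away files.
demo_dir = tempfile.mkdtemp()
demo_paths = []
for name, payload in (("a.bin", b"first"), ("b.bin", b"second")):
    path = os.path.join(demo_dir, name)
    with open(path, "wb") as f:
        f.write(payload)
    demo_paths.append(path)

cached = InMemoryFiles(demo_paths)
for path in demo_paths:
    os.remove(path)  # the cached copies no longer depend on the disk
```

Note that decoded images usually take far more memory than the compressed files on disk, so the on-disk size of a dataset is only a lower bound on what has to fit in RAM.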
datasets.image_folder_dataset.load_dataset(dataset_folder, in_memory=False, workers=1)
Loads the dataset from the file system and provides the dataset splits for train, validation and test.
The dataset is expected to have the following structure, where ‘dataset_folder’ points to the root of the three folders train, val and test.
Example:
dataset_folder = “~/../../data/cifar”
which contains the split sub-folders as follows:
    ‘dataset_folder’/train
    ‘dataset_folder’/val
    ‘dataset_folder’/test
Each of the three splits (train, val, test) should contain one folder per class, named after the class. The file names can be arbitrary, i.e. they do not have to be 0-* for class 0 of MNIST.
Example:
    train/dog/whatever.png
    train/dog/you.png
    train/cat/like.png
- Parameters
dataset_folder (string) – Path to the dataset on the file system
in_memory (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders
- Returns
train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits
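How a split with one folder per class maps to labelled samples can be sketched as follows. This is an illustration in plain Python, not the code used by this module: the helper list_samples is hypothetical, and it assigns class indices in alphabetical order of the folder names, which is the convention torchvision.datasets.ImageFolder follows.

```python
import os
import tempfile


def list_samples(split_dir):
    """Return (classes, samples); samples is a list of (file_path, class_index)."""
    classes = sorted(
        d for d in os.listdir(split_dir) if os.path.isdir(os.path.join(split_dir, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        class_dir = os.path.join(split_dir, name)
        for file_name in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, file_name), class_to_idx[name]))
    return classes, samples


# Demonstration: build a tiny 'train' split with empty placeholder files.
train_dir = os.path.join(tempfile.mkdtemp(), "train")
for rel_path in ("cat/a.png", "dog/b.png", "dog/c.png"):
    full_path = os.path.join(train_dir, *rel_path.split("/"))
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    open(full_path, "wb").close()

classes, samples = list_samples(train_dir)
```

Only the folder name determines the label, which is why the file names themselves can be arbitrary.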
datasets.image_folder_triplet module
Load a dataset of images by specifying the folder where it is located and prepare it for triplet similarity matching training.
class datasets.image_folder_triplet.ImageFolderTriplet(path, train=None, num_triplets=None, in_memory=None, transform=None, target_transform=None, workers=None)
Bases: torch.utils.data.dataset.Dataset
This class loads the data provided and stores it entirely in memory as a dataset. Additionally, triplets will be generated in the format of [a, p, n] and their file names stored in memory.
datasets.image_folder_triplet.load_dataset(dataset_folder, num_triplets=None, in_memory=False, workers=1, only_evaluate=False)
Loads the dataset from the file system and provides the dataset splits for train, validation and test.
The dataset is expected to follow the structure described in image_folder_dataset.load_dataset().
- Parameters
dataset_folder (string) – Path to the dataset on the file system
num_triplets (int) – Number of triplets [a, p, n] to generate on dataset creation
in_memory (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders
only_evaluate (boolean) – If True, only the test set is loaded.
- Returns
train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits
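Triplet generation from class labels can be sketched as follows. This is a minimal illustration, not the sampling strategy implemented by ImageFolderTriplet: it assumes that [a, p, n] stands for anchor, positive (same class as the anchor) and negative (a different class), and the function generate_triplets and its seed parameter are hypothetical.

```python
import random


def generate_triplets(labels, num_triplets, seed=0):
    """Sample [a, p, n] index triplets: a and p share a class, n belongs to another."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    # A class can provide anchor and positive only if it has at least two samples.
    anchor_classes = [c for c, indices in by_class.items() if len(indices) >= 2]
    if not anchor_classes or len(by_class) < 2:
        raise ValueError("need two classes, one of them with at least two samples")

    triplets = []
    for _ in range(num_triplets):
        pos_class = rng.choice(anchor_classes)
        neg_class = rng.choice([c for c in by_class if c != pos_class])
        a, p = rng.sample(by_class[pos_class], 2)  # two distinct samples
        n = rng.choice(by_class[neg_class])
        triplets.append([a, p, n])
    return triplets


labels = [0, 0, 1, 1, 2]  # class label of each sample, by index
triplets = generate_triplets(labels, num_triplets=10)
```

Each triplet holds indices into the sample list, so storing triplets costs little memory regardless of whether the images themselves are kept in memory.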
datasets.multi_label_image_folder_dataset module
Load a dataset of images by specifying the folder where it is located.
class datasets.multi_label_image_folder_dataset.MultiLabelImageFolder(path, transform=None, target_transform=None, workers=1)
Bases: torch.utils.data.dataset.Dataset
This class loads the multi-label image data provided.
datasets.multi_label_image_folder_dataset.load_dataset(dataset_folder, in_memory=False, workers=1)
Loads the dataset from the file system and provides the dataset splits for train, validation and test.
The dataset is expected to have the following structure, where ‘dataset_folder’ points to the root of the three folders train, val and test.
Example:
dataset_folder = “~/../../data/dataset_folder”
which contains the split sub-folders as follows:
    ‘dataset_folder’/train
    ‘dataset_folder’/val
    ‘dataset_folder’/test
Each of the three splits (train, val, test) should contain a folder called ‘images’ containing all of the images (the file names of the images can be arbitrary). The split folder should also contain a csv file called ‘labels.csv’ formatted as follows:
    filename,class_0,class_1,…,class_n
    images/img_1.png,1,-1,-1,…,1
where filename is the path to the image file relative to the split folder, and 1/-1 indicates the presence/absence of a particular label.
Example:
    train/images/whatever.png
    train/images/you.png
    train/images/like.png
    train/labels.csv
and the labels.csv would contain:
    filename,cat,dog,elephant
    images/whatever.png,1,1,-1
    images/you.png,1,-1,-1
    images/like.png,-1,1,1
- Parameters
dataset_folder (string) – Path to the dataset on the file system
in_memory (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders
- Returns
train_ds (data.Dataset)
val_ds (data.Dataset)
test_ds (data.Dataset) – Train, validation and test splits
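A labels.csv in the format described above can be read with the standard csv module. The following is a minimal sketch and not the parser used by MultiLabelImageFolder; the function parse_labels is hypothetical, and the file content is supplied inline so that the example runs on its own.

```python
import csv
import io

# Inline stand-in for the content of a split's labels.csv.
LABELS_CSV = """filename,cat,dog,elephant
images/whatever.png,1,1,-1
images/you.png,1,-1,-1
images/like.png,-1,1,1
"""


def parse_labels(csv_file):
    """Return (class_names, samples); samples is a list of (filename, label_vector)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    class_names = header[1:]  # the first column is the file name
    samples = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        samples.append((row[0], [int(value) for value in row[1:]]))
    return class_names, samples


class_names, samples = parse_labels(io.StringIO(LABELS_CSV))
```

For a real split, the same function could be called on a file opened with open(path, newline=""), with each filename joined to the split folder to obtain the image path.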