template package

Subpackages

template.runner package
- Subpackages
- Module contents

Submodules

template.CL_arguments module

template.CL_arguments.parse_arguments(args=None)

Argument parser.
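The `args=None` default matches the usual argparse convention: with `None` the parser reads `sys.argv[1:]`, while an explicit list is parsed as if it were the command line, which is convenient in tests. The following is a minimal sketch with a stand-in parser. The option names `--experiment-name` and `--epochs` are assumptions made for illustration and are not necessarily DeepDIVA's actual flags, and the real function may return something different.

```python
import argparse


def parse_arguments(args=None):
    """Stand-in parser illustrating the ``args=None`` convention."""
    parser = argparse.ArgumentParser(description="Illustrative parser (not DeepDIVA's)")
    # Hypothetical options, chosen only for this example.
    parser.add_argument("--experiment-name", type=str, default="demo")
    parser.add_argument("--epochs", type=int, default=5)
    # With args=None argparse falls back to sys.argv[1:];
    # an explicit list is parsed as if it were the command line.
    return parser.parse_args(args)


parsed = parse_arguments(["--experiment-name", "svhn_test", "--epochs", "3"])
print(parsed.experiment_name, parsed.epochs)
```

Passing an empty list yields the defaults, which makes the parser easy to exercise without touching the real command line.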

template.RunMe module

This file is the main entry point of DeepDIVA.

We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results.

It is completely open source and accessible as a Web Service through DIVAService.

>> Official website: https://diva-dia.github.io/DeepDIVAweb/
>> GitHub repo: https://github.com/DIVA-DIA/DeepDIVA
>> Tutorials: https://diva-dia.github.io/DeepDIVAweb/articles.html

authors: Michele Alberti and Vinaychandran Pondenkandath (equal contribution)

class template.RunMe.RunMe

Bases: object

This class is used as the entry point of DeepDIVA. There are four main scenarios for using the framework:

Single run: (classic) run an experiment once with the parameters specified on the command line. This is the typical usage scenario.

Multi run: run an experiment multiple times. It runs the single-run scenario repeatedly and aggregates the results. This is particularly useful to counter the effects of randomness.

Optimize with SigOpt: start a hyper-parameter optimization search with the aid of SigOpt (a state-of-the-art Bayesian optimization tool). For more information on how to use it, see the tutorial page at https://diva-dia.github.io/DeepDIVAweb/articles.html

Optimize manually: start a grid-like hyper-parameter optimization with the boundaries for the values specified by the user in a provided file. This is much less efficient than using SigOpt, but it does not rely on any commercial solution.
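To make the "grid-like" search concrete, here is a minimal sketch that tries every combination of user-listed values and keeps the best score. The function `grid_search`, the search space, and the toy objective are all hypothetical; they do not reproduce DeepDIVA's implementation or its file format for the boundaries.

```python
import itertools


def grid_search(search_space, run_experiment):
    """Try every combination of the listed values and keep the best score."""
    names = sorted(search_space)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        params = dict(zip(names, values))
        score = run_experiment(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; in the "optimize manually" scenario the
# boundaries would come from the user-provided file.
space = {"lr": [0.001, 0.01, 0.1], "momentum": [0.0, 0.9]}


def toy_experiment(lr, momentum):
    # Stand-in for a full training run: peaks at lr=0.01, momentum=0.9.
    return -abs(lr - 0.01) - abs(momentum - 0.9)


best_params, best_score = grid_search(space, toy_experiment)
print(best_params)
```

The number of runs grows as the product of the list lengths (here 3 x 2 = 6), which is why an exhaustive grid is much less efficient than a guided search.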

main(args=None)

Select the use case based on the command-line arguments and delegate execution to the most appropriate subroutine.

Returns

train_scores (ndarray[floats] of size (1, epochs) or None) – Score values for train split
val_scores (ndarray[floats] of size (1, `epochs`+1) or None) – Score values for validation split
test_scores (float or None) – Score value for test split
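The delegation described for `main` could look roughly like the sketch below. Everything here is a stand-in: the attribute names `sig_opt`, `hyper_param_optim` and `multi_run` are assumed for illustration, `single_run` only produces random scores with the documented sizes (`epochs` train values, `epochs + 1` validation values), and the multi-run branch simply averages the test scores.

```python
import random
import statistics
from types import SimpleNamespace


def single_run(seed, epochs=3):
    """Stand-in for one experiment; returns (train, val, test) scores."""
    rng = random.Random(seed)
    train = [rng.random() for _ in range(epochs)]
    val = [rng.random() for _ in range(epochs + 1)]  # documented size: epochs + 1
    return train, val, rng.random()


def multi_run(n_runs, epochs=3):
    """Repeat the single run and aggregate the test scores."""
    test_scores = [single_run(seed, epochs)[2] for seed in range(n_runs)]
    return statistics.mean(test_scores), statistics.pstdev(test_scores)


def main(args):
    # Attribute names are assumptions made for this sketch.
    if args.sig_opt:
        raise NotImplementedError("SigOpt search is out of scope for this sketch")
    if args.hyper_param_optim:
        raise NotImplementedError("manual search is out of scope for this sketch")
    if args.multi_run is not None:
        return multi_run(args.multi_run)
    return single_run(seed=0)


args = SimpleNamespace(sig_opt=False, hyper_param_optim=False, multi_run=None)
train_scores, val_scores, test_score = main(args)

multi_args = SimpleNamespace(sig_opt=False, hyper_param_optim=False, multi_run=4)
mean_score, std_score = main(multi_args)
print(len(train_scores), len(val_scores))
```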

parser = None

template.setup module

template.setup.copy_code(output_folder)

Makes a tar file of the DeepDIVA code as it exists at runtime.

Parameters: output_folder (str) – Path to output directory
Return type: None
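Snapshotting the code alongside the results means an experiment can later be matched to the exact sources that produced it. A minimal sketch of the idea using the standard `tarfile` module follows; the helper name `archive_code`, the archive name, and the demo directory layout are assumptions, not DeepDIVA's actual behaviour.

```python
import os
import tarfile
import tempfile


def archive_code(source_folder, output_folder, name="code.tar.gz"):
    """Pack ``source_folder`` into a gzip-compressed tar inside ``output_folder``."""
    os.makedirs(output_folder, exist_ok=True)
    archive_path = os.path.join(output_folder, name)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name.
        tar.add(source_folder, arcname=os.path.basename(source_folder))
    return archive_path


# Demo on a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.makedirs(src)
    with open(os.path.join(src, "RunMe.py"), "w") as fh:
        fh.write("print('hello')\n")
    archive = archive_code(src, os.path.join(tmp, "output"))
    with tarfile.open(archive) as tar:
        members = tar.getnames()
    print(members)
```

In this sketch the output folder lives outside the archived folder, so the archive never ends up containing itself.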

template.setup.set_up_dataloaders(model_expected_input_size, dataset_folder, batch_size, workers, disable_dataset_integrity, enable_deep_dataset_integrity, inmem=False, **kwargs)

Set up the dataloaders for the specified datasets.

Parameters

model_expected_input_size (tuple) – Specify the height and width that the model expects.
dataset_folder (string) – Path string that points to the three folders train/val/test. Example: ~/../../data/svhn
batch_size (int) – Number of datapoints to process at once
workers (int) – Number of workers to use for the dataloaders
inmem (boolean) – Flag: if False, the dataset is loaded in an online fashion, i.e. only file names are stored and images are loaded on demand. This is slower than storing everything in memory.

Returns

train_loader (torch.utils.data.DataLoader)
val_loader (torch.utils.data.DataLoader)
test_loader (torch.utils.data.DataLoader) – Dataloaders for train, val and test.
int – Number of classes for the model.
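The trade-off behind the `inmem` flag can be illustrated without any deep-learning dependency. The sketch below uses a hypothetical `FileDataset` class (not part of DeepDIVA) that either reads every file once up front or keeps only the paths and reads from disk on each access.

```python
import os
import tempfile


class FileDataset:
    """Toy dataset contrasting in-memory and on-demand loading."""

    def __init__(self, paths, inmem=False):
        self.paths = list(paths)
        self.inmem = inmem
        # inmem=True: read every file once, up front.
        self.cache = [self._read(p) for p in self.paths] if inmem else None

    @staticmethod
    def _read(path):
        with open(path, "rb") as fh:
            return fh.read()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # inmem=False: touch the disk on every access.
        return self.cache[index] if self.inmem else self._read(self.paths[index])


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"img_{i}.bin")
        with open(path, "wb") as fh:
            fh.write(bytes([i]) * 4)
        paths.append(path)
    lazy = FileDataset(paths, inmem=False)
    eager = FileDataset(paths, inmem=True)
    same = all(lazy[i] == eager[i] for i in range(len(lazy)))
    print(same)
```

Both variants return identical items; the in-memory one pays the I/O cost once at construction and needs enough RAM for the whole dataset, while the on-demand one pays it on every access.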

template.setup.set_up_env(gpu_id, seed, multi_run, no_cuda, **kwargs)

Set up the execution environment.

Parameters

gpu_id (string) – Specify the GPUs to be used
seed (int) – Seed applied to all random number generators for a deterministic run
multi_run (int) – Number of runs over the same code to produce a mean-variance graph.
no_cuda (bool) – Specify whether to use the GPU or not

Return type: None
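A rough idea of what such an environment setup involves is sketched below with the standard library only. The helper name `set_up_env_sketch` is hypothetical; `CUDA_VISIBLE_DEVICES` is the standard environment variable that restricts which GPUs a CUDA process can see. A real setup would presumably also seed NumPy and PyTorch, which is left out here to keep the example dependency-free.

```python
import os
import random


def set_up_env_sketch(gpu_id, seed, no_cuda):
    """Restrict visible GPUs and seed the RNG for a repeatable run."""
    if not no_cuda and gpu_id is not None:
        # Standard CUDA variable limiting the GPUs visible to this process.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    if seed is not None:
        random.seed(seed)
        # A real setup would also seed numpy and torch here
        # (e.g. numpy.random.seed, torch.manual_seed); omitted on purpose.


set_up_env_sketch(gpu_id="0", seed=42, no_cuda=True)
first = [random.random() for _ in range(3)]
set_up_env_sketch(gpu_id="0", seed=42, no_cuda=True)
second = [random.random() for _ in range(3)]
print(first == second)
```

Re-seeding with the same value reproduces the same sequence of random numbers, which is the property a deterministic run relies on.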

template.setup.set_up_logging(parser, experiment_name, output_folder, quiet, args_dict, debug, **kwargs)

Set up a logger for the experiment

Parameters

parser (parser) – The argument parser
experiment_name (string) – Name of the experiment. If not specified, it is taken from the command line.
output_folder (string) – Path to where all experiment logs are stored.
quiet (bool) – Specify whether to print log to console or only to text file
debug (bool) – Specify the logging level
args_dict (dict) – Contains the entire argument dictionary specified via command line.

Returns

log_folder (String) – The final logging folder tree
writer (tensorboardX.writer.SummaryWriter) – The tensorboard writer object. Used to log values on file for the tensorboard visualization.
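How `quiet` and `debug` might shape the logger can be sketched with the standard `logging` module. This is an illustration only: the helper name, the folder layout, and the file name `logs.txt` are assumptions, and the sketch returns the logger where the documented function returns a tensorboard writer.

```python
import logging
import os
import sys
import tempfile


def set_up_logging_sketch(experiment_name, output_folder, quiet, debug):
    """Create the log folder and attach file (and optionally console) handlers."""
    log_folder = os.path.join(output_folder, experiment_name)
    os.makedirs(log_folder, exist_ok=True)

    logger = logging.getLogger(experiment_name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    file_handler = logging.FileHandler(os.path.join(log_folder, "logs.txt"))
    logger.addHandler(file_handler)
    if not quiet:
        # Console output is only attached when quiet is False.
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return log_folder, logger


tmp = tempfile.mkdtemp()
log_folder, logger = set_up_logging_sketch("demo", tmp, quiet=True, debug=False)
logger.info("experiment started")
logger.debug("hidden unless debug=True")
for handler in logger.handlers:
    handler.flush()
with open(os.path.join(log_folder, "logs.txt")) as fh:
    content = fh.read()
print(content)
```

With `quiet=True` nothing is echoed to the console by the logger itself, and with `debug=False` the debug message is filtered out before it reaches the file.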

template.setup.set_up_model(output_channels, model_name, pretrained, no_cuda, resume, load_model, disable_databalancing, dataset_folder, inmem, workers, optimizer_name=None, criterion_name=None, num_classes=None, ablate=False, **kwargs)

Instantiate model, optimizer, criterion. Load a pretrained model or resume from a checkpoint.

Parameters

output_channels (int) – Specify shape of final layer of network. Only used if num_classes is not specified.
model_name (string) – Name of the model
pretrained (bool) – Specify whether to load a pretrained model or not
optimizer_name (string) – Name of the optimizer
criterion_name (string) – Name of the criterion
no_cuda (bool) – Specify whether to use the GPU or not
resume (string) – Path to a saved checkpoint
load_model (string) – Path to a saved model
start_epoch (int) – Epoch from which to resume training. If not resuming a previous experiment, the value is 0
disable_databalancing (boolean) – If True the criterion will not be fed with the class frequencies. Use with care.
dataset_folder (String) – Location of the dataset on the file system
inmem (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders
num_classes (int) – Number of classes for the model
ablate (boolean) – If True, remove the final layer of the given model.

Returns

model (nn.Module) – The actual model
criterion (nn.loss) – The criterion for the network
optimizer (torch.optim) – The optimizer for the model
best_value (float) – Specifies the former best value obtained by the model. Relevant only if you are resuming training.

template.test_RunMe module

Warning: LONG RUNTIME TESTS!

This test suite is designed to verify that the main components of the framework are not broken. It is expected that smaller components or sub-parts are tested individually.

Since, as we all know, this will probably never happen, we will at least verify that the overall features are correct and fully functional. These tests take a long time to run and are not supposed to be run frequently. Nevertheless, it is important that the main functions can be tested before a PR or a push to the master branch.

Please keep the list of these tests up to date as soon as you add new features.

template.test_RunMe.test_one()

Verify the sizes of the values returned by execute.
Image classification with default parameters.
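A size check of this kind can be sketched as follows. The function `fake_execute` is a stand-in that returns dummy scores with the sizes documented for `main` (`epochs` train values, `epochs + 1` validation values, a single test value); the real test runs an actual experiment, which is why it takes so long.

```python
def fake_execute(epochs):
    """Stand-in for a real run; returns dummy scores of the documented sizes."""
    train_scores = [0.0] * epochs
    val_scores = [0.0] * (epochs + 1)
    test_score = 0.0
    return train_scores, val_scores, test_score


def test_return_sizes():
    epochs = 5
    train_scores, val_scores, test_score = fake_execute(epochs)
    assert len(train_scores) == epochs
    assert len(val_scores) == epochs + 1
    assert isinstance(test_score, float)


test_return_sizes()
```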
