template package

Subpackages

Submodules

template.CL_arguments module

template.CL_arguments.parse_arguments(args=None)[source]

Argument Parser

template.RunMe module

This file is the main entry point of DeepDIVA.

We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results.

It is completely open source and accessible as a Web Service through DIVAService.

>> Official website: https://diva-dia.github.io/DeepDIVAweb/
>> GitHub repo: https://github.com/DIVA-DIA/DeepDIVA
>> Tutorials: https://diva-dia.github.io/DeepDIVAweb/articles.html

authors: Michele Alberti and Vinaychandran Pondenkandath (equal contribution)

class template.RunMe.RunMe[source]

Bases: object

This class is used as the entry point of DeepDIVA. There are four main scenarios for using the framework:

  • Single run: (classic) run an experiment once with the parameters given on the command line. This is the typical usage scenario.

  • Multi run: run an experiment multiple times. It repeats the single run scenario and aggregates the results, which is particularly useful to counter the effects of randomness.

  • Optimize with SigOpt: start a hyper-parameter optimization search with the aid of SigOpt (a state-of-the-art Bayesian optimization tool). For more information on how to use it, see the tutorial page: https://diva-dia.github.io/DeepDIVAweb/articles.html

  • Optimize manually: start a grid-like hyper-parameter optimization with the boundaries for the values specified by the user in a provided file. This is much less efficient than using SigOpt, but it does not rely on any commercial solution.
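The grid-like search in the last scenario can be illustrated with a minimal, self-contained sketch. It is not DeepDIVA's implementation: the function name `grid_points`, the `steps` parameter, and the shape of the `boundaries` dictionary are assumptions made for illustration, and the format of the user-provided file is not specified here.

```python
import itertools


def grid_points(boundaries, steps=3):
    """Yield one dict per grid point (hypothetical helper, not a DeepDIVA API).

    `boundaries` maps a hyper-parameter name to an inclusive (low, high) pair.
    """
    names = sorted(boundaries)
    axes = []
    for name in names:
        low, high = boundaries[name]
        if steps == 1:
            axes.append([low])
        else:
            stride = (high - low) / (steps - 1)
            axes.append([low + i * stride for i in range(steps)])
    # The Cartesian product of all axes enumerates every combination.
    for values in itertools.product(*axes):
        yield dict(zip(names, values))


points = list(grid_points({"lr": (0.01, 0.1), "momentum": (0.5, 0.9)}, steps=3))
```

The number of grid points grows as `steps` raised to the number of hyper-parameters, which is why an exhaustive grid is much less efficient than a guided search.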

main(args=None)[source]

Select the use case based on the command line arguments and delegate execution to the most appropriate sub-routine.

Returns

  • train_scores (ndarray[floats] of size (1, epochs) or None) – Score values for train split

  • val_scores (ndarray[floats] of size (1, `epochs`+1) or None) – Score values for validation split

  • test_scores (float or None) – Score value for test split
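How a multi run might aggregate the three return values can be sketched as follows. This is only an illustration under stated assumptions: `single_run` is a hypothetical stand-in that produces random scores with the documented lengths (`epochs` train scores, `epochs` + 1 validation scores, one test score), plain lists replace the ndarrays, and the real aggregation in DeepDIVA may differ.

```python
import random
import statistics


def single_run(seed, epochs):
    """Hypothetical stand-in for one experiment; returns (train, val, test)."""
    rng = random.Random(seed)
    train = [rng.random() for _ in range(epochs)]
    val = [rng.random() for _ in range(epochs + 1)]  # documented size: epochs + 1
    test = rng.random()
    return train, val, test


def multi_run(runs, epochs):
    """Repeat the single run with different seeds and aggregate per epoch."""
    results = [single_run(seed, epochs) for seed in range(runs)]
    train_runs, val_runs, test_runs = zip(*results)
    return {
        # zip(*runs) groups the scores of all runs for the same epoch.
        "train_mean": [statistics.mean(col) for col in zip(*train_runs)],
        "val_mean": [statistics.mean(col) for col in zip(*val_runs)],
        "test_mean": statistics.mean(test_runs),
        "test_variance": statistics.pvariance(test_runs),
    }


summary = multi_run(runs=3, epochs=5)
```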

parser = None

template.setup module

template.setup.copy_code(output_folder)[source]

Makes a tar file of the DeepDIVA code as it exists at runtime.

Parameters

output_folder (str) – Path to output directory

Returns

Return type

None
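Archiving a source tree next to the experiment output can be sketched with the standard `tarfile` module. The helper name `archive_source`, its parameters, and the archive file name are assumptions for illustration; which files `copy_code` actually includes is not stated here.

```python
import os
import tarfile
import tempfile


def archive_source(source_folder, output_folder, name="code.tar.gz"):
    """Write a gzip-compressed tar of source_folder into output_folder."""
    os.makedirs(output_folder, exist_ok=True)
    archive_path = os.path.join(output_folder, name)
    top_level = os.path.basename(os.path.normpath(source_folder))
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the source folder.
        tar.add(source_folder, arcname=top_level)
    return archive_path


# Small demonstration on a temporary project folder.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.makedirs(src)
    with open(os.path.join(src, "main.py"), "w") as fh:
        fh.write("print('hello')\n")
    path = archive_source(src, os.path.join(tmp, "out"))
    with tarfile.open(path) as tar:
        members = sorted(tar.getnames())
```

Storing a snapshot of the code with each run makes it possible to see later exactly which version produced a given result.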

template.setup.set_up_dataloaders(model_expected_input_size, dataset_folder, batch_size, workers, disable_dataset_integrity, enable_deep_dataset_integrity, inmem=False, **kwargs)[source]

Set up the dataloaders for the specified datasets.

Parameters
  • model_expected_input_size (tuple) – Specify the height and width that the model expects.

  • dataset_folder (string) – Path string that points to the three folders train/val/test. Example: ~/../../data/svhn

  • batch_size (int) – Number of datapoints to process at once

  • workers (int) – Number of workers to use for the dataloaders

  • inmem (boolean) – Flag: if False, the dataset is loaded in an online fashion i.e. only file names are stored and images are loaded on demand. This is slower than storing everything in memory.

Returns

  • train_loader (torch.utils.data.DataLoader)

  • val_loader (torch.utils.data.DataLoader)

  • test_loader (torch.utils.data.DataLoader) – Dataloaders for train, val and test.

  • int – Number of classes for the model.
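The trade-off behind the `inmem` flag can be shown with a minimal sketch in plain Python. The class `FileDataset` is hypothetical and reads raw bytes instead of decoding images; it only illustrates the difference between keeping file names and keeping the loaded data.

```python
import os
import tempfile


class FileDataset:
    """Hypothetical dataset that loads files eagerly (inmem) or on demand."""

    def __init__(self, folder, inmem=False):
        self.paths = sorted(os.path.join(folder, n) for n in os.listdir(folder))
        self.inmem = inmem
        # With inmem=True every file is read once, up front.
        self.cache = [self._read(p) for p in self.paths] if inmem else None

    @staticmethod
    def _read(path):
        with open(path, "rb") as fh:
            return fh.read()

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        if self.inmem:
            return self.cache[index]
        # With inmem=False only the path is stored; the file is read per access.
        return self._read(self.paths[index])


with tempfile.TemporaryDirectory() as folder:
    for i in range(3):
        with open(os.path.join(folder, f"img_{i}.bin"), "wb") as fh:
            fh.write(f"sample {i}".encode())
    eager = FileDataset(folder, inmem=True)
    lazy = FileDataset(folder, inmem=False)
    same = all(eager[i] == lazy[i] for i in range(len(lazy)))
```

Both variants return the same items; the eager one pays the disk cost once and needs enough memory for the whole dataset, while the lazy one pays it on every access.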

template.setup.set_up_env(gpu_id, seed, multi_run, no_cuda, **kwargs)[source]

Set up the execution environment.

Parameters
  • gpu_id (string) – Specify the GPUs to be used

  • seed (int) – Seed for all possible sources of randomness, for a deterministic run

  • multi_run (int) – Number of runs over the same code to produce a mean-variance graph.

  • no_cuda (bool) – Specify whether to use the GPU or not

Returns

Return type

None
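A minimal sketch of this kind of environment setup, using only the standard library, is shown below. The function `prepare_environment` is hypothetical, and selecting GPUs through the `CUDA_VISIBLE_DEVICES` environment variable is an assumption about how `gpu_id` and `no_cuda` could be honoured, not a description of what `set_up_env` does.

```python
import os
import random


def prepare_environment(gpu_id=None, seed=None, no_cuda=False):
    """Hypothetical helper: restrict visible GPUs and seed the Python RNG."""
    if no_cuda:
        # An empty value hides all CUDA devices from the process.
        os.environ["CUDA_VISIBLE_DEVICES"] = ""
    elif gpu_id is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    if seed is not None:
        random.seed(seed)
        # With NumPy and PyTorch installed, one would typically also call
        # np.random.seed(seed), torch.manual_seed(seed) and
        # torch.cuda.manual_seed_all(seed).


prepare_environment(gpu_id="0", seed=42)
first = random.random()
prepare_environment(gpu_id="0", seed=42)
second = random.random()
```

Seeding twice with the same value yields the same random draw, which is the property a deterministic run relies on.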

template.setup.set_up_logging(parser, experiment_name, output_folder, quiet, args_dict, debug, **kwargs)[source]

Set up a logger for the experiment

Parameters
  • parser (parser) – The argument parser

  • experiment_name (string) – Name of the experiment. If not specified, it is taken from the command line.

  • output_folder (string) – Path to where all experiment logs are stored.

  • quiet (bool) – Specify whether to print the log to the console or only to a text file

  • debug (bool) – Specify the logging level

  • args_dict (dict) – Contains the entire argument dictionary specified via command line.

Returns

  • log_folder (String) – The final logging folder tree

  • writer (tensorboardX.writer.SummaryWriter) – The tensorboard writer object. Used to log values on file for the tensorboard visualization.
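The interplay of `quiet` and `debug` can be sketched with the standard `logging` module. The helper `build_logger`, the logger name, and the file name `logs.txt` are assumptions for illustration; the tensorboard writer is left out.

```python
import logging
import os
import sys
import tempfile


def build_logger(log_folder, quiet=False, debug=False):
    """Hypothetical helper: always log to a file, optionally to the console."""
    os.makedirs(log_folder, exist_ok=True)
    logger = logging.getLogger("experiment_sketch")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    file_handler = logging.FileHandler(os.path.join(log_folder, "logs.txt"))
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    if not quiet:
        console = logging.StreamHandler(sys.stdout)
        console.setFormatter(formatter)
        logger.addHandler(console)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    log_folder = os.path.join(tmp, "my_experiment")
    logger = build_logger(log_folder, quiet=True, debug=False)
    logger.debug("hidden at INFO level")
    logger.info("visible")
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)
    with open(os.path.join(log_folder, "logs.txt")) as fh:
        contents = fh.read()
```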

template.setup.set_up_model(output_channels, model_name, pretrained, no_cuda, resume, load_model, disable_databalancing, dataset_folder, inmem, workers, optimizer_name=None, criterion_name=None, num_classes=None, ablate=False, **kwargs)[source]

Instantiate model, optimizer, criterion. Load a pretrained model or resume from a checkpoint.

Parameters
  • output_channels (int) – Specify shape of final layer of network. Only used if num_classes is not specified.

  • model_name (string) – Name of the model

  • pretrained (bool) – Specify whether to load a pretrained model or not

  • optimizer_name (string) – Name of the optimizer

  • criterion_name (string) – Name of the criterion

  • no_cuda (bool) – Specify whether to use the GPU or not

  • resume (string) – Path to a saved checkpoint

  • load_model (string) – Path to a saved model

  • start_epoch (int) – Epoch from which to resume training. If not resuming a previous experiment, the value is 0

  • disable_databalancing (boolean) – If True the criterion will not be fed with the class frequencies. Use with care.

  • dataset_folder (String) – Location of the dataset on the file system

  • inmem (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.

  • workers (int) – Number of workers to use for the dataloaders

  • num_classes (int) – Number of classes for the model

  • ablate (boolean) – If True, remove the final layer of the given model.

Returns

  • model (nn.Module) – The actual model

  • criterion (nn.loss) – The criterion for the network

  • optimizer (torch.optim) – The optimizer for the model

  • best_value (float) – Specifies the former best value obtained by the model. Relevant only if you are resuming training.
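What "feeding the criterion with the class frequencies" could mean is sketched below. The inverse-frequency weighting and the helper `class_weights` are assumptions; the exact formula used when `disable_databalancing` is False is not stated here.

```python
from collections import Counter


def class_weights(labels, num_classes):
    """Hypothetical helper: normalised inverse-frequency weight per class."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for cls in range(num_classes):
        freq = counts.get(cls, 0) / total
        # Rare classes get large weights; absent classes get zero.
        weights.append(1.0 / freq if freq > 0 else 0.0)
    norm = sum(weights)
    return [w / norm for w in weights]


# Three samples of class 0 and one of class 1.
weights = class_weights([0, 0, 0, 1], num_classes=2)
```

In PyTorch, such a list could be converted to a tensor and passed as the `weight` argument of a loss such as `torch.nn.CrossEntropyLoss`, so that errors on under-represented classes count more.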

template.test_RunMe module

Warning: LONG RUNTIME TESTS!

This test suite is designed to verify that the main components of the framework are not broken. It is expected that smaller components or sub-parts are tested individually.

Since, as we all know, this will probably never happen, we at least verify that the overall features are correct and fully functional. These tests take a long time to run and are not supposed to be run frequently. Nevertheless, it is important that the main functions can be tested before a PR or a push to the master branch.

Please update the list of these tests whenever you add new features.

template.test_RunMe.test_one()[source]
  • Verify the sizes of the return of execute

  • Image classification with default parameters

Module contents