template package¶
Subpackages¶
- template.runner package
- Subpackages
- template.runner.apply_model package
- template.runner.bidimensional package
- template.runner.image_classification package
- template.runner.multi_label_image_classification package
- Submodules
- template.runner.multi_label_image_classification.evaluate module
- template.runner.multi_label_image_classification.multi_label_image_classification module
- template.runner.multi_label_image_classification.setup module
- template.runner.multi_label_image_classification.train module
- Module contents
- template.runner.process_activation package
- template.runner.triplet package
- Module contents
- Subpackages
Submodules¶
template.CL_arguments module¶
template.RunMe module¶
This file is the main entry point of DeepDIVA.
We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results.
It is completely open source and accessible as a Web Service through DIVAService
>> Official website: https://diva-dia.github.io/DeepDIVAweb/
>> GitHub repo: https://github.com/DIVA-DIA/DeepDIVA
>> Tutorials: https://diva-dia.github.io/DeepDIVAweb/articles.html
authors: Michele Alberti and Vinaychandran Pondenkandath (equal contribution)
-
class
template.RunMe.
RunMe
[source]¶ Bases:
object
This class is used as the entry point of DeepDIVA. There are four main scenarios for using the framework:
- Single run: (classic) run an experiment once with the parameters specified on the
command line. This is the typical usage scenario.
- Multi run: run an experiment multiple times. It basically repeats the single-run
scenario and aggregates the results. This is particularly useful to counter the effects of randomness.
- Optimize with SigOpt: start a hyper-parameter optimization search with the aid
of SigOpt (a state-of-the-art Bayesian optimization tool). For more info on how to use it, see the tutorial page at: https://diva-dia.github.io/DeepDIVAweb/articles.html
- Optimize manually: start a grid-like hyper-parameter optimization with the
boundaries for the values specified by the user in a provided file. This is much less efficient than using SigOpt, but it does not rely on any commercial solution.
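The scenario selection described above can be sketched as a small dispatcher. This is an illustrative toy, not DeepDIVA's actual parser (see the `CL_arguments` module for the real one); the flag names `--multi-run`, `--sig-opt`, and `--hyper-param-optim` are assumptions made for this sketch.

```python
import argparse

def select_scenario(argv):
    """Toy dispatcher mirroring the four RunMe scenarios.

    The flag names below are assumptions for illustration only;
    the real CLI is defined in template.CL_arguments.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('--multi-run', type=int, default=None,
                        help='repeat the experiment N times')
    parser.add_argument('--sig-opt', type=str, default=None,
                        help='path to a SigOpt parameter file')
    parser.add_argument('--hyper-param-optim', type=str, default=None,
                        help='path to a manual grid-search boundaries file')
    args, _ = parser.parse_known_args(argv)

    # Optimization scenarios take precedence over plain (multi-)runs.
    if args.sig_opt is not None:
        return 'sigopt-optimization'
    if args.hyper_param_optim is not None:
        return 'manual-optimization'
    if args.multi_run is not None:
        return 'multi-run'
    return 'single-run'
```

For example, `select_scenario(['--multi-run', '5'])` falls into the multi-run branch, while an empty argument list yields the classic single run.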
-
main
(args=None)[source]¶ Select the use case based on the command line arguments and delegate the execution to the most appropriate sub-routine.
- Returns
train_scores (ndarray[floats] of size (1, epochs) or None) – Score values for train split
val_scores (ndarray[floats] of size (1, `epochs`+1) or None) – Score values for validation split
test_scores (float or None) – Score value for test split
-
parser
= None¶
template.setup module¶
-
template.setup.
copy_code
(output_folder)[source]¶ Makes a tar file of the DeepDIVA code base as it exists at runtime.
- Parameters
output_folder (str) – Path to output directory
- Returns
- Return type
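The behavior of `copy_code` can be approximated with the standard `tarfile` module. A minimal sketch, assuming the archive name and the excluded folders (`__pycache__`, `.git`) — the real function may name and filter the archive differently:

```python
import os
import tarfile

def copy_code(src_folder, output_folder, archive_name='DeepDIVA.tar.gz'):
    """Archive the code base at src_folder into output_folder.

    Minimal sketch: the archive name and the exclusion of
    '__pycache__' and '.git' are assumptions, not DeepDIVA's
    actual behavior.
    """
    os.makedirs(output_folder, exist_ok=True)
    archive_path = os.path.join(output_folder, archive_name)

    def _skip(tarinfo):
        parts = tarinfo.name.split('/')
        if '__pycache__' in parts or '.git' in parts:
            return None  # drop this member from the archive
        return tarinfo

    with tarfile.open(archive_path, 'w:gz') as tar:
        tar.add(src_folder, arcname=os.path.basename(src_folder), filter=_skip)
    return archive_path
```

Snapshotting the code alongside the experiment output is what makes a run reproducible later, even after the working copy has changed.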
-
template.setup.
set_up_dataloaders
(model_expected_input_size, dataset_folder, batch_size, workers, disable_dataset_integrity, enable_deep_dataset_integrity, inmem=False, **kwargs)[source]¶ Set up the dataloaders for the specified datasets.
- Parameters
model_expected_input_size (tuple) – Specify the height and width that the model expects.
dataset_folder (string) – Path string that points to the three folders train/val/test. Example: ~/../../data/svhn
batch_size (int) – Number of datapoints to process at once
workers (int) – Number of workers to use for the dataloaders
inmem (boolean) – Flag: if False, the dataset is loaded in an online fashion i.e. only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
- Returns
train_loader (torch.utils.data.DataLoader)
val_loader (torch.utils.data.DataLoader)
test_loader (torch.utils.data.DataLoader) – Dataloaders for train, val and test.
int – Number of classes for the model.
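The folder layout that `set_up_dataloaders` expects can be checked with a few lines of standard-library code. This helper is illustrative (not part of DeepDIVA) and assumes an ImageFolder-style dataset where each class is a subfolder of `train/`:

```python
import os

def check_dataset_layout(dataset_folder):
    """Verify the train/val/test layout and infer the number of
    classes from the train subfolders.

    Illustrative helper, not DeepDIVA's actual integrity check.
    """
    for split in ('train', 'val', 'test'):
        path = os.path.join(dataset_folder, split)
        if not os.path.isdir(path):
            raise FileNotFoundError(f'missing split folder: {path}')

    # ImageFolder-style convention: one subfolder per class.
    train = os.path.join(dataset_folder, 'train')
    classes = sorted(d for d in os.listdir(train)
                     if os.path.isdir(os.path.join(train, d)))
    return len(classes)
```

The class count inferred this way corresponds to the `int` returned alongside the three dataloaders.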
-
template.setup.
set_up_env
(gpu_id, seed, multi_run, no_cuda, **kwargs)[source]¶ Set up the execution environment.
- Parameters
gpu_id (string) – Specify the GPUs to be used
seed (int) – Seed all possible seeds for deterministic run
multi_run (int) – Number of runs over the same code to produce mean-variance graph.
no_cuda (bool) – Specify whether to use the GPU or not
- Returns
- Return type
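What `set_up_env` does can be sketched in two steps: pin the visible GPUs and seed every source of randomness for a deterministic run. The sketch below uses only the standard library; the PyTorch-specific seeding the real function performs is indicated in comments:

```python
import os
import random

def set_up_env(gpu_id, seed):
    """Sketch of the environment setup: pin visible GPUs and seed
    all RNGs for a deterministic run.

    The torch calls are shown as comments; the real function seeds
    torch and CUDA as well.
    """
    if gpu_id is not None:
        # Restrict which GPUs this process can see.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
    if seed is not None:
        random.seed(seed)
        # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
        # torch.backends.cudnn.deterministic = True
```

Seeding every RNG the same way is what allows a multi-run experiment to attribute score variance to randomness rather than to environment differences.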
-
template.setup.
set_up_logging
(parser, experiment_name, output_folder, quiet, args_dict, debug, **kwargs)[source]¶ Set up a logger for the experiment
- Parameters
parser (parser) – The argument parser
experiment_name (string) – Name of the experiment. If not specified, it is accepted from the command line.
output_folder (string) – Path to where all experiment logs are stored.
quiet (bool) – Specify whether to print log to console or only to text file
debug (bool) – Specify the logging level
args_dict (dict) – Contains the entire argument dictionary specified via command line.
- Returns
log_folder (String) – The final logging folder tree
writer (tensorboardX.writer.SummaryWriter) – The tensorboard writer object. Used to log values on file for the tensorboard visualization.
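The log-folder tree and logger returned by `set_up_logging` can be sketched with the standard `logging` module. The folder naming scheme (experiment name plus timestamp) is an assumption here, and the tensorboardX `SummaryWriter` the real function also creates is only indicated in a comment:

```python
import logging
import os
import time

def set_up_logging(experiment_name, output_folder, quiet=False):
    """Create a per-run log folder and a logger writing to logs.txt.

    The experiment/timestamp folder scheme is an assumption; the
    real function also creates a tensorboardX SummaryWriter in the
    same folder.
    """
    timestamp = time.strftime('%d-%m-%y-%Hh-%Mm-%Ss')
    log_folder = os.path.join(output_folder, experiment_name, timestamp)
    os.makedirs(log_folder, exist_ok=True)

    logger = logging.getLogger(experiment_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.FileHandler(os.path.join(log_folder, 'logs.txt')))
    if not quiet:
        # Mirror the log to the console unless --quiet was given.
        logger.addHandler(logging.StreamHandler())
    # writer = tensorboardX.SummaryWriter(log_dir=log_folder)
    return log_folder, logger
```

Giving every run its own timestamped folder keeps logs, checkpoints, and tensorboard events from different runs of the same experiment cleanly separated.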
-
template.setup.
set_up_model
(output_channels, model_name, pretrained, no_cuda, resume, load_model, disable_databalancing, dataset_folder, inmem, workers, optimizer_name=None, criterion_name=None, num_classes=None, ablate=False, **kwargs)[source]¶ Instantiate model, optimizer, criterion. Load a pretrained model or resume from a checkpoint.
- Parameters
output_channels (int) – Specify the shape of the final layer of the network. Only used if num_classes is not specified.
model_name (string) – Name of the model
pretrained (bool) – Specify whether to load a pretrained model or not
optimizer_name (string) – Name of the optimizer
criterion_name (string) – Name of the criterion
no_cuda (bool) – Specify whether to use the GPU or not
resume (string) – Path to a saved checkpoint
load_model (string) – Path to a saved model
start_epoch (int) – Epoch from which to resume training. If not resuming a previous experiment, the value is 0.
disable_databalancing (boolean) – If True the criterion will not be fed with the class frequencies. Use with care.
dataset_folder (String) – Location of the dataset on the file system
inmem (boolean) – Load the whole dataset in memory. If False, only file names are stored and images are loaded on demand. This is slower than storing everything in memory.
workers (int) – Number of workers to use for the dataloaders
num_classes (int) – Number of classes for the model
ablate (boolean) – If True, remove the final layer of the given model.
- Returns
model (nn.Module) – The actual model
criterion (nn.loss) – The criterion for the network
optimizer (torch.optim) – The optimizer for the model
best_value (float) – Specifies the former best value obtained by the model. Relevant only if you are resuming training.
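The resume logic behind the `resume` parameter and the `best_value` return can be sketched as a checkpoint save/restore pair. The key names (`epoch`, `best_value`, `state_dict`, `optimizer`) are assumptions modeled on common PyTorch checkpoints, and `pickle` stands in for `torch.save`/`torch.load` so the sketch stays standard-library only:

```python
import pickle

def save_checkpoint(path, epoch, best_value, model_state, optimizer_state):
    """Persist everything needed to resume a run.

    Key names are an assumption modeled on common PyTorch practice;
    real code would use torch.save instead of pickle.
    """
    checkpoint = {
        'epoch': epoch,
        'best_value': best_value,
        'state_dict': model_state,
        'optimizer': optimizer_state,
    }
    with open(path, 'wb') as f:
        pickle.dump(checkpoint, f)

def resume_checkpoint(path):
    """Return (start_epoch, best_value, model_state, optimizer_state).

    Training resumes at the epoch after the one saved.
    """
    with open(path, 'rb') as f:
        checkpoint = pickle.load(f)
    return (checkpoint['epoch'] + 1, checkpoint['best_value'],
            checkpoint['state_dict'], checkpoint['optimizer'])
```

Carrying `best_value` through the checkpoint is what lets a resumed run decide whether a new epoch actually improved on the earlier training.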
template.test_RunMe module¶
Warning: LONG RUNTIME TESTS!
This test suite is designed to verify that the main components of the framework are not broken. It is expected that smaller components or sub-parts are tested individually.
As we all know, this will probably never happen, so we will at least verify that the overall features are correct and fully functional. These tests take a long time to run and are not supposed to be run frequently. Nevertheless, it is important that the main functions can be tested before a PR or a push on the master branch.
Please keep the list of these tests up to date as you add new features.