Neural network
The neural network model is an abstract model that can be used to create any neural network model.
It is then up to the user to specify the architecture of the neural network model.
While the args
in the model
section vary according to the model, the
args
in the training
section are the same for all neural network models.
training:
enabled: true
args:
epochs: 50
optimizer:
name: SGD
args:
lr: 0.0005
loss:
name: BCEWithLogitsLoss
callbacks:
- name: callback1
args:
arg1: value1
arg2: value2
...
- ...
The epochs
argument specifies the number of epochs to train the model for.
The optimizer
argument specifies the optimizer to use. Its most basic configuration
has two keys: name
, which contains the name of the optimizer, and args
,
which contains the arguments that will be passed when creating the optimizer.
There is a third argument, which is useful when fine tuning a pretrained model:
load_from_checkpoint
(default true
), which specifies whether to load the
optimizer state from the checkpoint or not.
By default, the optimizer state is loaded from the checkpoint, so that the
optimizer can continue from where it left off.
If you fine tune with a different optimizer than the original one, this argument
is automatically set to false
and it is not necessary to specify it.
If you fine tune with the same optimizer, but you want to change its hyperparameters
(e.g. the learning rate), you must set this argument to false
and specify the
new hyperparameters in the configuration file.
Warning
If you fine tune with the same optimizer, but you want to change its
hyperparameters (e.g. the learning rate) and you forget to set
load_from_checkpoint
to false
, the optimizer will continue from where
it left off, and the new hyperparameters will be ignored, leading to unexpected
results during training and validation.
The loss
argument specifies the loss function to use. It has two keys:
name
, which contains the name of the loss function, and args
, which
contains the arguments that will be passed when creating the loss function.
The callbacks
argument specifies the callbacks to use. It is a list of
callbacks, each of which has a name
and an optional args
field.
Available optimizers
There are some optimizers already available in Revelio:
- SGD
- Adam
In both optimizers the args
field allows the same fields as the
corresponding PyTorch optimizer; therefore, at least the lr
field must be
specified.
Available loss functions
There are some loss functions already available in Revelio:
- BCEWithLogitsLoss
- BCELoss
In both loss functions the args
field allows the same fields as the
corresponding PyTorch loss function.
Available callbacks
There are some callbacks already available in Revelio:
EarlyStopping
ModelCheckpoint
TensorBoard
The details of each callback are described in the following sections.
EarlyStopping
The EarlyStopping
callback stops the training when a monitored quantity has
stopped improving.
callbacks:
- name: EarlyStopping
args:
monitor: val_loss
min_delta: 0.001
patience: 5
direction: min
restore_best_weights: true
The EarlyStopping
callback has the following arguments:
monitor
(optional string, default:val_loss
): the quantity to be monitored. It can be eitherloss
or any of the metrics specified in themetrics
section of the configuration file. If you want to monitor a validation-time metric, prepend the metric name withval_
.min_delta
(optional float, default:0.0
): the minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less thanmin_delta
, will count as no improvement.patience
(optional int, default:0
): the number of epochs with no improvement after which training will be stopped.direction
(optional string, default:min
): whether to monitor the quantity in an increasing or decreasing way. It can be eithermin
ormax
.restore_best_weights
(optional bool, default:false
): whether to restore model weights from the epoch with the best value of the monitored quantity. Iffalse
, the model weights obtained at the last step of training are used.
ModelCheckpoint
The ModelCheckpoint
callback saves the model after every epoch.
callbacks:
- name: ModelCheckpoint
args:
file_path: /path/to/checkpoint.pt
monitor: val_loss
min_delta: 0.001
direction: min
save_best_only: true
The ModelCheckpoint
callback has the following arguments:
file_path
(string): the path to the file where the model will be saved.monitor
(optional string, default:val_loss
): the quantity to be monitored. It can be eitherloss
or any of the metrics specified in themetrics
section of the configuration file. If you want to monitor a validation-time metric, prepend the metric name withval_
.min_delta
(optional float, default:0.0
): the minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less thanmin_delta
, will count as no improvement.direction
(optional string, default:min
): whether to monitor the quantity in an increasing or decreasing way. It can be eithermin
ormax
.save_best_only
(optional bool, default:false
): whether to save the model only if the monitored quantity has improved.
The file_path
argument can contain the following placeholders, wrapped around curly
braces:
{epoch}
: the epoch number.{now}
: the current date and time.{metric}
: the value of the monitored metric (e.g.{val_loss:.4f}
). Formatting can be specified after the metric name, as in the example. For more information on formatting, see the Format String Syntax on the Python documentation.
TensorBoard
The TensorBoard
callback logs the training metrics to TensorBoard.
callbacks:
- name: TensorBoard
args:
log_dir: /path/to/logs
profile: false
The TensorBoard
callback has the following arguments:
log_dir
(string): the path to the directory where the logs will be saved.profile
(optional bool, default:False
): whether to profile the training process.
The TensorBoard callback reports all the metrics specified in the metrics
section of the configuration file, as well as the loss.
For each metric, the callback reports the value of the metric at the end of
each epoch (with the epoch_
prefix),
as well as the value of the metric at the end of each batch,
both for training and validation.
For example, if the metrics
section of the configuration file contains
accuracy
, the callback will report the following metrics:
epoch_accuracy
epoch_val_accuracy
accuracy
val_accuracy
Also, at the very first batch of the first epoch, the callback reports the images in the training and validation set, as well as the corresponding labels.