
Hyperparameter tuning

We provide easy distributed hyperparameter tuning using Ray Tune. Let's walk through the examples/mnist_tune.py script for a concrete example.

This script is a modified version of examples/mnist.py, presented in Getting started, adapted to perform a distributed hyperparameter tuning run using Ray Tune and the SLP utilities.

First we refactor the model creation and training into a function, so that each worker is able to instantiate and train a model.

from slp.plbind.trainer import make_trainer_for_ray_tune
...
def train_mnist(config, train=None, val=None):
    # Convert dictionary to omegaconf dictconfig object
    config = OmegaConf.create(config)

    # Create data module
    ldm = PLDataModuleFromDatasets(
        train, val=val, seed=config.seed, no_test_set=True, **config.data
    )

    # Create model, optimizer, criterion, scheduler
    model = Net(**config.model)

    optimizer = getattr(optim, config.optimizer)(model.parameters(), **config.optim)
    criterion = nn.CrossEntropyLoss()

    lr_scheduler = None

    if config.lr_scheduler:
        lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(
            optimizer, **config.lr_schedule
        )

    # Wrap in PLModule, & configure metrics
    lm = PLModule(
        model,
        optimizer,
        criterion,
        lr_scheduler=lr_scheduler,
        metrics={
            "acc": FromLogits(pl.metrics.classification.Accuracy())
        },  # Will log train_acc and val_acc
        hparams=config,
    )

    # Map Lightning metrics to Ray Tune metrics.
    metrics_map = {"accuracy": "val_acc", "validation_loss": "val_loss"}
    assert (
        config["tune"]["metric"] in metrics_map.keys()
    ), "Metrics mapping should contain the metric you are trying to optimize"
    # Train model
    trainer = make_trainer_for_ray_tune(metrics_map=metrics_map, **config.trainer)

    trainer.fit(lm, datamodule=ldm)

This function is pretty similar to the examples/mnist.py code, but there are some important notes to keep in mind here.

  • The function accepts a config dictionary as a positional argument and the train and validation datasets as keyword arguments. This is important, because run_tuning expects the input training function to have this signature.
  • We convert the input dict to an omegaconf.DictConfig object for convenience. Unfortunately, Ray Tune cannot pass OmegaConf configuration objects around, so we need to convert back and forth.
  • Take note of the metrics_map. This mapping renames the metrics aggregated by our PyTorch Lightning module so that they are logged for Ray Tune. One of the keys of this dict will be chosen as the metric to optimize.
  • We use make_trainer_for_ray_tune to create a trainer that is configured specifically for a tuning run; a sketch of how the metrics mapping could be wired is shown right after this list.
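
The exact implementation of make_trainer_for_ray_tune lives in SLP, but the core idea is to report the mapped metrics back to Ray Tune at the end of every validation epoch. Here is a minimal, hypothetical sketch assuming Ray's PyTorch Lightning integration callback; the real helper may configure additional callbacks and logging.

# Hypothetical sketch, not the actual SLP implementation.
import pytorch_lightning as pl
from ray.tune.integration.pytorch_lightning import TuneReportCallback


def make_trainer_for_ray_tune_sketch(metrics_map=None, **trainer_kwargs):
    # Keys of metrics_map are the names reported to Ray Tune; values are the
    # metric names logged by the LightningModule, e.g. {"accuracy": "val_acc"}.
    report_callback = TuneReportCallback(metrics_map, on="validation_end")

    return pl.Trainer(callbacks=[report_callback], **trainer_kwargs)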

Next we define the configure_search_space function, which overrides entries in the configuration with search spaces from which Ray Tune will sample hyperparameter values.

def configure_search_space(config):
    config["model"] = {
        "intermediate_hidden": tune.choice([16, 32, 64, 100, 128, 256, 300, 512])
    }
    config["optimizer"] = tune.choice(["SGD", "Adam", "AdamW"])
    config["optim"]["lr"] = tune.loguniform(1e-4, 1e-1)
    config["optim"]["weight_decay"] = tune.loguniform(1e-4, 1e-1)
    config["data"]["batch_size"] = tune.choice([16, 32, 64, 128])

    return config

As you can see, we are going to tune the learning rate, weight decay, optimizer, batch size and the hidden size of our model.

Note: I considered abstracting this into a configuration file, but I don't have any use case for this kind of abstraction, and the simplicity and flexibility we get from keeping this in code are more important.
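
For each trial, Ray Tune replaces the search-space objects above with concrete sampled values before calling train_mnist, so the training function receives a plain Python dictionary. An illustrative example of one sampled configuration (the values shown here are hypothetical):

# Hypothetical sampled configuration for a single trial (illustrative values only).
sampled_config = {
    "model": {"intermediate_hidden": 128},
    "optimizer": "AdamW",
    "optim": {"lr": 3.2e-3, "weight_decay": 1.5e-4},
    "data": {"batch_size": 64},
    # ...remaining entries from the original configuration are passed through unchanged
}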

Finally we parse the config file and the CLI arguments and spawn the hyperparameter tuning in the main function:

from slp.util.tuning import run_tuning
...
if __name__ == "__main__":
    config = ...
    train, val, _ = get_data()
    best_config = run_tuning(
        config,
        "configs/best.mnist.tune.yml",
        train_mnist,
        configure_search_space,
        train,
        val,
    )

The run_tuning function accepts:

  • The parsed configuration as an omegaconf.DictConfig object
  • A path to save the best trial configuration as a yaml file
  • The train_mnist function
  • The configure_search_space function
  • The train dataset
  • The validation dataset

Note that we create the train and validation splits by hand, so that each trial runs on the same validation set.
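
get_data is expected to return fixed train / validation / test splits. A minimal sketch of such a helper, assuming torchvision's MNIST dataset and a seeded random_split (the actual helper in the examples may differ):

# Hypothetical get_data helper: deterministic splits so every trial sees the same data.
import torch
from torchvision.datasets import MNIST
from torchvision.transforms import Compose, Normalize, ToTensor


def get_data(seed=42):
    transform = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))])
    full_train = MNIST("data/mnist", train=True, download=True, transform=transform)
    test = MNIST("data/mnist", train=False, download=True, transform=transform)
    train, val = torch.utils.data.random_split(
        full_train, [55000, 5000], generator=torch.Generator().manual_seed(seed)
    )

    return train, val, test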

The script can be called with the following arguments:

python examples/mnist_tune.py --num-trials 1000 --cpus_per_trial 1 --gpus_per_trial 0.12 --tune-metric accuracy --tune-mode max --epochs 20

This will spawn 1000 trials over as many GPUs as are available (on our server or in our cluster). The --gpus_per_trial argument can be a floating point number; in this case we pack 7-8 experiments per GPU (RTX 2080Ti).

Note that the --tune-metric argument must correspond to one of the keys in the metrics_map dictionary. Here we run the tuning to optimize for validation accuracy.
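
These command line flags end up in the tune section of the parsed configuration, which run_tuning reads (see cfg["tune"]["num_trials"], cfg["tune"]["metric"] and friends in the source below). A roughly equivalent configuration-file snippet, assuming this nesting, would be:

# Hypothetical tune section of the configuration file
# (key names taken from run_tuning's source; exact nesting may differ).
tune:
  num_trials: 1000
  cpus_per_trial: 1
  gpus_per_trial: 0.12
  metric: accuracy
  mode: max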

After the run finishes, we can use the configs/best.mnist.tune.yml configuration file to train and evaluate the best model on the test set:

python examples/mnist.py --config configs/best.mnist.tune.yml

Note: mnist_tune.py can accept any configuration file and command line argument that mnist.py can accept. Run python mnist_tune.py --help for more information.

run_tuning(config, output_config_file, train_fn, config_fn, train=None, val=None)

Run distributed hyperparameter tuning using ray tune

Uses Optuna TPE search algorithm and ASHA pruning strategy

Parameters:

  • config (DictConfig): The parsed configuration. Required.
  • output_config_file (str): Path to save the optimal configuration that yields the best result. Required.
  • train_fn (Callable[[Dict[str, Any], Any, Any], None]): Train function that takes the configuration as a python dict, the train dataset and the validation dataset, and fits the model. This function is used to create the trainable that will run when calling ray.tune.run. Required.
  • config_fn (Callable[[Dict[str, Any]], Dict[str, Any]]): Configuration function that constructs the search space by overriding entries in the input configuration. Required.
  • train (Any): Torch dataset or corpus that will be used for training. Default: None.
  • val (Any): Torch dataset or corpus that will be used for validation. Default: None.

Returns:

  • Dict[str, Any]: The configuration for the best trial.

Examples:

>>> # Make search space
>>> def configure_search_space(config):
>>>     config["optimizer"] = tune.choice(["SGD", "Adam", "AdamW"])
>>>     config["optim"]["lr"] = tune.loguniform(1e-4, 1e-1)
>>>     config["optim"]["weight_decay"] = tune.loguniform(1e-4, 1e-1)
>>>     config["data"]["batch_size"] = tune.choice([16, 32, 64, 128])
>>>     return config
>>> # Training function.
>>> def train_fn(config, train=None, val=None):
>>>     config = OmegaConf.create(config) # convert dict from ray tune to DictConfig
>>>     ldm = PLDataModuleFromDatasets(train, val=val, seed=config.seed, no_test_set=True, **config.data)
>>>     model = Net(**config.model)
>>>     optimizer = getattr(optim, config.optimizer)(model.parameters(), **config.optim)
>>>     criterion = nn.CrossEntropyLoss()
>>>     lm = PLModule(
>>>         model, optimizer, criterion,
>>>         hparams=config,
>>>         metrics={"acc": FromLogits(pl.metrics.classification.Accuracy())}, # Logs train_acc and val_acc
>>>     )
>>>     metrics_map = {"accuracy": "val_acc", "validation_loss": "val_loss"}  # map metrics from pl to ray tune
>>>     trainer = make_trainer_for_ray_tune(metrics_map=metrics_map, **config.trainer)
>>>     trainer.fit(lm, datamodule=ldm)
>>> # Run optimization
>>> if __name__ == "__main__":
>>>     config, train_dataset, val_dataset = ...
>>>     best_config = run_tuning(
>>>         config,
>>>         "configs/best.tuning.config.yml",
>>>         train_fn,
>>>         configure_search_space,
>>>         train_dataset,
>>>         val_dataset,
>>>     )
Source code in slp/util/tuning.py
def run_tuning(
    config: DictConfig,
    output_config_file: str,
    train_fn: Callable[[Dict[str, Any], Any, Any], None],
    config_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
    train: Any = None,
    val: Any = None,
):
    """Run distributed hyperparameter tuning using ray tune

    Uses Optuna TPE search algorithm and ASHA pruning strategy

    Args:
        config (omegaconf.DictConfig): The parsed configuration
        output_config_file (str): Path to save the optimal configuration that yields the best
            result
        train_fn (Callable[[Dict[str, Any], Any, Any], None]): Train function that takes the
            configuration as a python dict, train dataset and validation dataset and fits the
            model. This function is used to create the trainable that will run when calling
            ray.tune.run
        config_fn (Callable[[Dict[str, Any]], Dict[str, Any]]): Configuration function that
            constructs the search space by overriding entries in the input configuration
        train (Dataset): Torch dataset or corpus that will be used for training
        val (Dataset): Torch dataset or corpus that will be used for validation

    Returns:
        Dict[str, Any]: The configuration for the best trial

    Examples:
        >>> # Make search space
        >>> def configure_search_space(config):
        >>>     config["optimizer"] = tune.choice(["SGD", "Adam", "AdamW"])
        >>>     config["optim"]["lr"] = tune.loguniform(1e-4, 1e-1)
        >>>     config["optim"]["weight_decay"] = tune.loguniform(1e-4, 1e-1)
        >>>     config["data"]["batch_size"] = tune.choice([16, 32, 64, 128])
        >>>     return config
        >>> # Training function.
        >>> def train_fn(config, train=None, val=None):
        >>>     config = OmegaConf.create(config) # convert dict from ray tune to DictConfig
        >>>     ldm = PLDataModuleFromDatasets(train, val=val, seed=config.seed, no_test_set=True, **config.data)
        >>>     model = Net(**config.model)
        >>>     optimizer = getattr(optim, config.optimizer)(model.parameters(), **config.optim)
        >>>     criterion = nn.CrossEntropyLoss()
        >>>     lm = PLModule(
        >>>         model, optimizer, criterion,
        >>>         hparams=config,
        >>>         metrics={"acc": FromLogits(pl.metrics.classification.Accuracy())}, # Logs train_acc and val_acc
        >>>     )
        >>>     metrics_map = {"accuracy": "val_acc", "validation_loss": "val_loss"}  # map metrics from pl to ray tune
        >>>     trainer = make_trainer_for_ray_tune(metrics_map=metrics_map, **config.trainer)
        >>>     trainer.fit(lm, datamodule=ldm)
        >>> # Run optimization
        >>> if __name__ == "__main__":
        >>>     config, train_dataset, val_dataset = ...
        >>>     best_config = run_tuning(
        >>>         config,
        >>>         "configs/best.tuning.config.yml",
        >>>         train_fn,
        >>>         configure_search_space,
        >>>         train_dataset,
        >>>         val_dataset,
        >>>     )
    """
    config = _extract_wandb_config(config)
    cfg = config_fn(cast(Dict[str, Any], OmegaConf.to_container(config)))
    cfg["trainer"]["gpus"] = math.ceil(cfg["tune"]["gpus_per_trial"])
    trainable = tune.with_parameters(train_fn, train=train, val=val)
    metric, mode = cfg["tune"]["metric"], cfg["tune"]["mode"]

    analysis = tune.run(
        trainable,
        loggers=[
            WandbLogger
        ],  # WandbLogger logs experiment configurations and metrics reported via tune.report() to W&B Dashboard
        resources_per_trial={
            "cpu": cfg["tune"]["cpus_per_trial"],
            "gpu": cfg["tune"]["gpus_per_trial"],
        },
        config=cfg,
        max_failures=10,
        num_samples=cfg["tune"]["num_trials"],
        search_alg=OptunaSearch(metric=metric, mode=mode),
        metric=metric,
        mode=mode,
        # scheduler=tune.schedulers.ASHAScheduler(metric=metric, mode=mode, reduction_factor=2),
        name=f"{cfg['trainer']['experiment_name']}-tuning",
    )
    best_config = analysis.get_best_config(metric, mode)
    best_result = analysis.get_best_trial(metric=metric, mode=mode).last_result
    logger.info(f"Best hyperparameters found were: {best_config}")
    logger.info(f"Best score: {best_result[metric]}")

    best_config["tune"]["result"] = best_result

    yaml_dump(best_config, output_config_file)

    return best_config