
API reference

generate_example_config(parser, output_file, args=None)

generate_example_config Generate an example YAML configuration file from a parser's defaults

Parses the default values of the provided argument parser (including options left as None) and saves them to output_file. The resulting file is a ready-made template: edit it and feed it back through parse_config, where values merge with the following precedence

  • default CLI arg values < config file values < user-provided CLI args

E.g.:

  • if you don't include a value in your configuration, it takes the default value from the argparse arguments
  • if you provide a CLI arg (e.g. run the script with --bsz 64), it overrides the value in the config file

Note: an extended OmegaConf instance is used to achieve this (see slp.config.omegaconf.OmegaConf)

Parameters:

    parser (ArgumentParser, required): The argument parser you want to use
    output_file (str, required): Configuration file name or file descriptor to save the example configuration
    args (Optional[List[str]], default None): Optional sys.argv-style args. Use only for testing. Defaults to sys.argv[1:]
Source code in slp/config/config_parser.py
def generate_example_config(
    parser: argparse.ArgumentParser,
    output_file: str,
    args: Optional[List[str]] = None,
) -> None:
    """parse_config Parse a provided YAML config file and command line args and merge them

    During experimentation we want ideally to have a configuration file with the model and training configuration,
    but also be able to run quick experiments using command line args.
    This function allows you to double dip, by overriding values in a YAML config file through user provided command line arguments.

    The precedence for merging is as follows
       * default cli args values < config file values < user provided cli args

    E.g.:

       * if you don't include a value in your configuration it will take the default value from the argparse arguments
       * if you provide a cli arg (e.g. run the script with --bsz 64) it will override the value in the config file

    Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)

    Args:
        parser (argparse.ArgumentParser): The argument parser you want to use
        output_file (Union[str, IO]): Configuration file name or file descriptor to save example configuration
        args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
            Use this only for testing. By default it uses sys.argv[1:]
    """
    config = parse_config(parser, None, include_none=True)
    OmegaConf.save(config, output_file)
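
A minimal usage sketch (the parser setup mirrors the make_cli_parser example below; the output filename example_config.yaml is illustrative, not part of the API). Running this dumps the parser's default values to a YAML file that you can edit and later pass back through parse_config.

>>> import argparse
>>> from slp.config.config_parser import generate_example_config, make_cli_parser
>>> from slp.plbind.dm import PLDataModuleFromDatasets
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> parser = make_cli_parser(parser, PLDataModuleFromDatasets)
>>> generate_example_config(parser, "example_config.yaml")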

make_cli_parser(parser, datamodule_cls)

make_cli_parser Augment an argument parser for slp with the default arguments

Default arguments for training, logging, optimization etc. are added to the input parser. If you use make_cli_parser, the following command line arguments will be included:

!!! usage "my_script.py [-h] [--hidden MODEL.INTERMEDIATE_HIDDEN]"
                                [--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}]
                                [--lr OPTIM.LR] [--weight-decay OPTIM.WEIGHT_DECAY]
                                [--lr-scheduler] [--lr-factor LR_SCHEDULE.FACTOR]
                                [--lr-patience LR_SCHEDULE.PATIENCE]
                                [--lr-cooldown LR_SCHEDULE.COOLDOWN]
                                [--min-lr LR_SCHEDULE.MIN_LR] [--seed SEED] [--config CONFIG]
                                [--experiment-name TRAINER.EXPERIMENT_NAME]
                                [--run-id TRAINER.RUN_ID]
                                [--experiment-group TRAINER.EXPERIMENT_GROUP]
                                [--experiments-folder TRAINER.EXPERIMENTS_FOLDER]
                                [--save-top-k TRAINER.SAVE_TOP_K]
                                [--patience TRAINER.PATIENCE]
                                [--wandb-project TRAINER.WANDB_PROJECT]
                                [--tags [TRAINER.TAGS [TRAINER.TAGS ...]]]
                                [--stochastic_weight_avg] [--gpus TRAINER.GPUS]
                                [--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH]
                                [--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL]
                                [--epochs TRAINER.MAX_EPOCHS] [--steps TRAINER.MAX_STEPS]
                                [--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS] [--debug]
                                [--offline] [--early-stop-on TRAINER.EARLY_STOP_ON]
                                [--early-stop-mode {min,max}] [--num-trials TUNE.NUM_TRIALS]
                                [--gpus-per-trial TUNE.GPUS_PER_TRIAL]
                                [--cpus-per-trial TUNE.CPUS_PER_TRIAL]
                                [--tune-metric TUNE.METRIC] [--tune-mode {max,min}]
                                [--val-percent DATA.VAL_PERCENT]
                                [--test-percent DATA.TEST_PERCENT] [--bsz DATA.BATCH_SIZE]
                                [--bsz-eval DATA.BATCH_SIZE_EVAL]
                                [--num-workers DATA.NUM_WORKERS] [--no-pin-memory]
                                [--drop-last] [--no-shuffle-eval]

optional arguments:
  -h, --help            show this help message and exit
  --hidden MODEL.INTERMEDIATE_HIDDEN
                                                Intermediate hidden layers for linear module
  --optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}
                                                Which optimizer to use
  --lr OPTIM.LR         Learning rate
  --weight-decay OPTIM.WEIGHT_DECAY
                                                Weight decay coefficient
  --lr-scheduler        Use learning rate scheduling. Currently only
                                                ReduceLROnPlateau is supported out of the box
  --lr-factor LR_SCHEDULE.FACTOR
                                                Multiplicative factor by which LR is reduced. Used if
                                                --lr-scheduler is provided.
  --lr-patience LR_SCHEDULE.PATIENCE
                                                Number of epochs with no improvement after which
                                                learning rate will be reduced. Used if --lr-scheduler
                                                is provided.
  --lr-cooldown LR_SCHEDULE.COOLDOWN
                                                Number of epochs to wait before resuming normal
                                                operation after lr has been reduced. Used if --lr-
                                                scheduler is provided.
  --min-lr LR_SCHEDULE.MIN_LR
                                                Minimum lr for LR scheduling. Used if --lr-scheduler
                                                is provided.
  --seed SEED           Seed for reproducibility
  --config CONFIG       Path to YAML configuration file
  --experiment-name TRAINER.EXPERIMENT_NAME
                                                Name of the running experiment
  --run-id TRAINER.RUN_ID
                                                Unique identifier for the current run. If not provided
                                                it is inferred from datetime.now()
  --experiment-group TRAINER.EXPERIMENT_GROUP
                                                Group of current experiment. Useful when evaluating
                                                for different seeds / cross-validation etc.
  --experiments-folder TRAINER.EXPERIMENTS_FOLDER
                                                Top-level folder where experiment results &
                                                checkpoints are saved
  --save-top-k TRAINER.SAVE_TOP_K
                                                Save checkpoints for top k models
  --patience TRAINER.PATIENCE
                                                Number of epochs to wait before early stopping
  --wandb-project TRAINER.WANDB_PROJECT
                                                Wandb project under which results are saved
  --tags [TRAINER.TAGS [TRAINER.TAGS ...]]
                                                Tags for current run to make results searchable.
  --stochastic_weight_avg
                                                Use Stochastic weight averaging.
  --gpus TRAINER.GPUS   Number of GPUs to use
  --val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH
                                                Run validation every n epochs
  --clip-grad-norm TRAINER.GRADIENT_CLIP_VAL
                                                Clip gradients with ||grad(w)|| >= args.clip_grad_norm
  --epochs TRAINER.MAX_EPOCHS
                                                Maximum number of training epochs
  --steps TRAINER.MAX_STEPS
                                                Maximum number of training steps
  --tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS
                                                Truncated Back-propagation-through-time steps.
  --debug               If true, we run a full run on a small subset of the
                                                input data and overfit 10 training batches
  --offline             If true, forces offline execution of wandb logger
  --early-stop-on TRAINER.EARLY_STOP_ON
                                                Metric for early stopping
  --early-stop-mode {min,max}
                                                Minimize or maximize early stopping metric
  --num-trials TUNE.NUM_TRIALS
                                                Number of trials to run for hyperparameter tuning
  --gpus-per-trial TUNE.GPUS_PER_TRIAL
                                                How many gpus to use for each trial. If gpus_per_trial
                                                < 1 multiple trials are packed in the same gpu
  --cpus-per-trial TUNE.CPUS_PER_TRIAL
                                                How many cpus to use for each trial.
  --tune-metric TUNE.METRIC
                                                Tune this metric. Need to be one of the keys of
                                                metrics_map passed into make_trainer_for_ray_tune.
  --tune-mode {max,min}
                                                Maximize or minimize metric
  --val-percent DATA.VAL_PERCENT
                                                Percent of validation data to be randomly split from
                                                the training set, if no validation set is provided
  --test-percent DATA.TEST_PERCENT
                                                Percent of test data to be randomly split from the
                                                training set, if no test set is provided
  --bsz DATA.BATCH_SIZE
                                                Training batch size
  --bsz-eval DATA.BATCH_SIZE_EVAL
                                                Evaluation batch size
  --num-workers DATA.NUM_WORKERS
                                                Number of workers to be used in the DataLoader
  --no-pin-memory       Don't pin data to GPU memory when transferring
  --drop-last           Drop last incomplete batch
  --no-shuffle-eval     Don't shuffle val & test sets

Parameters:

    parser (ArgumentParser, required): A parent argument parser to be augmented
    datamodule_cls (LightningDataModule, required): A data module class that injects arguments through the add_argparse_args method

Returns:

    argparse.ArgumentParser: The augmented command line parser

Examples:

>>> import argparse
>>> from slp.plbind.dm import PLDataModuleFromDatasets
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int)  # Create parser with model arguments and anything else you need
>>> parser = make_cli_parser(parser, PLDataModuleFromDatasets)
>>> args = parser.parse_args(args=["--bsz", "64", "--lr", "0.01"])
>>> args.data.batch_size
64
>>> args.optim.lr
0.01
Source code in slp/config/config_parser.py
def make_cli_parser(
    parser: argparse.ArgumentParser, datamodule_cls: pl.LightningDataModule
) -> argparse.ArgumentParser:
    """make_cli_parser Augment an argument parser for slp with the default arguments

    Default arguments for training, logging, optimization etc. are added to the input {parser}.
    If you use make_cli_parser, the following command line arguments will be included

        usage: my_script.py [-h] [--hidden MODEL.INTERMEDIATE_HIDDEN]
                                        [--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}]
                                        [--lr OPTIM.LR] [--weight-decay OPTIM.WEIGHT_DECAY]
                                        [--lr-scheduler] [--lr-factor LR_SCHEDULE.FACTOR]
                                        [--lr-patience LR_SCHEDULE.PATIENCE]
                                        [--lr-cooldown LR_SCHEDULE.COOLDOWN]
                                        [--min-lr LR_SCHEDULE.MIN_LR] [--seed SEED] [--config CONFIG]
                                        [--experiment-name TRAINER.EXPERIMENT_NAME]
                                        [--run-id TRAINER.RUN_ID]
                                        [--experiment-group TRAINER.EXPERIMENT_GROUP]
                                        [--experiments-folder TRAINER.EXPERIMENTS_FOLDER]
                                        [--save-top-k TRAINER.SAVE_TOP_K]
                                        [--patience TRAINER.PATIENCE]
                                        [--wandb-project TRAINER.WANDB_PROJECT]
                                        [--tags [TRAINER.TAGS [TRAINER.TAGS ...]]]
                                        [--stochastic_weight_avg] [--gpus TRAINER.GPUS]
                                        [--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH]
                                        [--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL]
                                        [--epochs TRAINER.MAX_EPOCHS] [--steps TRAINER.MAX_STEPS]
                                        [--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS] [--debug]
                                        [--offline] [--early-stop-on TRAINER.EARLY_STOP_ON]
                                        [--early-stop-mode {min,max}] [--num-trials TUNE.NUM_TRIALS]
                                        [--gpus-per-trial TUNE.GPUS_PER_TRIAL]
                                        [--cpus-per-trial TUNE.CPUS_PER_TRIAL]
                                        [--tune-metric TUNE.METRIC] [--tune-mode {max,min}]
                                        [--val-percent DATA.VAL_PERCENT]
                                        [--test-percent DATA.TEST_PERCENT] [--bsz DATA.BATCH_SIZE]
                                        [--bsz-eval DATA.BATCH_SIZE_EVAL]
                                        [--num-workers DATA.NUM_WORKERS] [--no-pin-memory]
                                        [--drop-last] [--no-shuffle-eval]

        optional arguments:
          -h, --help            show this help message and exit
          --hidden MODEL.INTERMEDIATE_HIDDEN
                                                        Intermediate hidden layers for linear module
          --optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}
                                                        Which optimizer to use
          --lr OPTIM.LR         Learning rate
          --weight-decay OPTIM.WEIGHT_DECAY
                                                        Weight decay coefficient
          --lr-scheduler        Use learning rate scheduling. Currently only
                                                        ReduceLROnPlateau is supported out of the box
          --lr-factor LR_SCHEDULE.FACTOR
                                                        Multiplicative factor by which LR is reduced. Used if
                                                        --lr-scheduler is provided.
          --lr-patience LR_SCHEDULE.PATIENCE
                                                        Number of epochs with no improvement after which
                                                        learning rate will be reduced. Used if --lr-scheduler
                                                        is provided.
          --lr-cooldown LR_SCHEDULE.COOLDOWN
                                                        Number of epochs to wait before resuming normal
                                                        operation after lr has been reduced. Used if --lr-
                                                        scheduler is provided.
          --min-lr LR_SCHEDULE.MIN_LR
                                                        Minimum lr for LR scheduling. Used if --lr-scheduler
                                                        is provided.
          --seed SEED           Seed for reproducibility
          --config CONFIG       Path to YAML configuration file
          --experiment-name TRAINER.EXPERIMENT_NAME
                                                        Name of the running experiment
          --run-id TRAINER.RUN_ID
                                                        Unique identifier for the current run. If not provided
                                                        it is inferred from datetime.now()
          --experiment-group TRAINER.EXPERIMENT_GROUP
                                                        Group of current experiment. Useful when evaluating
                                                        for different seeds / cross-validation etc.
          --experiments-folder TRAINER.EXPERIMENTS_FOLDER
                                                        Top-level folder where experiment results &
                                                        checkpoints are saved
          --save-top-k TRAINER.SAVE_TOP_K
                                                        Save checkpoints for top k models
          --patience TRAINER.PATIENCE
                                                        Number of epochs to wait before early stopping
          --wandb-project TRAINER.WANDB_PROJECT
                                                        Wandb project under which results are saved
          --tags [TRAINER.TAGS [TRAINER.TAGS ...]]
                                                        Tags for current run to make results searchable.
          --stochastic_weight_avg
                                                        Use Stochastic weight averaging.
          --gpus TRAINER.GPUS   Number of GPUs to use
          --val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH
                                                        Run validation every n epochs
          --clip-grad-norm TRAINER.GRADIENT_CLIP_VAL
                                                        Clip gradients with ||grad(w)|| >= args.clip_grad_norm
          --epochs TRAINER.MAX_EPOCHS
                                                        Maximum number of training epochs
          --steps TRAINER.MAX_STEPS
                                                        Maximum number of training steps
          --tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS
                                                        Truncated Back-propagation-through-time steps.
          --debug               If true, we run a full run on a small subset of the
                                                        input data and overfit 10 training batches
          --offline             If true, forces offline execution of wandb logger
          --early-stop-on TRAINER.EARLY_STOP_ON
                                                        Metric for early stopping
          --early-stop-mode {min,max}
                                                        Minimize or maximize early stopping metric
          --num-trials TUNE.NUM_TRIALS
                                                        Number of trials to run for hyperparameter tuning
          --gpus-per-trial TUNE.GPUS_PER_TRIAL
                                                        How many gpus to use for each trial. If gpus_per_trial
                                                        < 1 multiple trials are packed in the same gpu
          --cpus-per-trial TUNE.CPUS_PER_TRIAL
                                                        How many cpus to use for each trial.
          --tune-metric TUNE.METRIC
                                                        Tune this metric. Need to be one of the keys of
                                                        metrics_map passed into make_trainer_for_ray_tune.
          --tune-mode {max,min}
                                                        Maximize or minimize metric
          --val-percent DATA.VAL_PERCENT
                                                        Percent of validation data to be randomly split from
                                                        the training set, if no validation set is provided
          --test-percent DATA.TEST_PERCENT
                                                        Percent of test data to be randomly split from the
                                                        training set, if no test set is provided
          --bsz DATA.BATCH_SIZE
                                                        Training batch size
          --bsz-eval DATA.BATCH_SIZE_EVAL
                                                        Evaluation batch size
          --num-workers DATA.NUM_WORKERS
                                                        Number of workers to be used in the DataLoader
          --no-pin-memory       Don't pin data to GPU memory when transferring
          --drop-last           Drop last incomplete batch
          --no-shuffle-eval     Don't shuffle val & test sets

    Args:
        parser (argparse.ArgumentParser): A parent argument to be augmented
        datamodule_cls (pytorch_lightning.LightningDataModule): A data module class that injects arguments through the add_argparse_args method

    Returns:
        argparse.ArgumentParser: The augmented command line parser

    Examples:
        >>> import argparse
        >>> from slp.plbind.dm import PLDataModuleFromDatasets
        >>> parser = argparse.ArgumentParser("My cool model")
        >>> parser.add_argument("--hidden", dest="model.hidden", type=int)  # Create parser with model arguments and anything else you need
        >>> parser = make_cli_parser(parser, PLDataModuleFromDatasets)
        >>> args = parser.parse_args(args=["--bsz", "64", "--lr", "0.01"])
        >>> args.data.batch_size
        64
        >>> args.optim.lr
        0.01
    """
    parser = add_optimizer_args(parser)
    parser = add_trainer_args(parser)
    parser = add_tune_args(parser)
    parser = datamodule_cls.add_argparse_args(parser)

    return parser

parse_config(parser, config_file, args=None, include_none=False)

parse_config Parse a provided YAML config file and command line args and merge them

During experimentation we ideally want a configuration file holding the model and training configuration, but also the ability to run quick experiments from the command line. This function lets you do both, by overriding values in a YAML config file with user-provided command line arguments.

The precedence for merging is as follows:

  • default CLI arg values < config file values < user-provided CLI args

E.g.:

  • if you don't include a value in your configuration, it takes the default value from the argparse arguments
  • if you provide a CLI arg (e.g. run the script with --bsz 64), it overrides the value in the config file

Note: an extended OmegaConf instance is used to achieve this (see slp.config.omegaconf.OmegaConf)

Parameters:

    parser (ArgumentParser, required): The argument parser you want to use
    config_file (Union[str, IO], required): Configuration file name or file descriptor
    args (Optional[List[str]], default None): Optional sys.argv-style args. Use only for testing. Defaults to sys.argv[1:]

Returns:

    Union[ListConfig, DictConfig]: The parsed configuration, typically an OmegaConf DictConfig object

Examples:

>>> import argparse
>>> import io
>>> from slp.config.config_parser import parse_config
>>> mock_config_file = io.StringIO('''
model:
  hidden: 100
''')
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 100}}
>>> type(cfg)
<class 'omegaconf.dictconfig.DictConfig'>
>>> cfg = parse_config(parser, mock_config_file, args=["--hidden", "200"])
{'model': {'hidden': 200}}
>>> mock_config_file = io.StringIO('''
random_value: hello
''')
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 20}, 'random_value': 'hello'}
Source code in slp/config/config_parser.py
def parse_config(
    parser: argparse.ArgumentParser,
    config_file: Optional[Union[str, IO]],
    args: Optional[List[str]] = None,
    include_none: bool = False,
) -> Union[ListConfig, DictConfig]:
    """parse_config Parse a provided YAML config file and command line args and merge them

    During experimentation we want ideally to have a configuration file with the model and training configuration,
    but also be able to run quick experiments using command line args.
    This function allows you to double dip, by overriding values in a YAML config file through user provided command line arguments.

    The precedence for merging is as follows
       * default cli args values < config file values < user provided cli args

    E.g.:

       * if you don't include a value in your configuration it will take the default value from the argparse arguments
       * if you provide a cli arg (e.g. run the script with --bsz 64) it will override the value in the config file

    Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)

    Args:
        parser (argparse.ArgumentParser): The argument parser you want to use
        config_file (Union[str, IO]): Configuration file name or file descriptor
        args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
            Use this only for testing. By default it uses sys.argv[1:]

    Returns:
        OmegaConf.DictConfig: The parsed configuration as an OmegaConf DictConfig object

    Examples:
        >>> import io
        >>> from slp.config.config_parser import parse_config
        >>> mock_config_file = io.StringIO('''
        model:
          hidden: 100
        ''')
        >>> parser = argparse.ArgumentParser("My cool model")
        >>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
        >>> cfg = parse_config(parser, mock_config_file)
        {'model': {'hidden': 100}}
        >>> type(cfg)
        <class 'omegaconf.dictconfig.DictConfig'>
        >>> cfg = parse_config(parser, mock_config_file, args=["--hidden", "200"])
        {'model': {'hidden': 200}}
        >>> mock_config_file = io.StringIO('''
        random_value: hello
        ''')
        >>> cfg = parse_config(parser, mock_config_file)
        {'model': {'hidden': 20}, 'random_value': 'hello'}
    """
    # Merge Configurations Precedence: default kwarg values < default argparse values < config file values < user provided CLI args values

    if config_file is not None:
        dict_config = OmegaConf.from_yaml(config_file)  # type: ignore
    else:
        dict_config = OmegaConf.create({})

    user_cli, default_cli = OmegaConf.from_argparse(parser, include_none=include_none)
    config = OmegaConf.merge(default_cli, dict_config, user_cli)

    logger.info("Running with the following configuration")
    logger.info(f"\n{OmegaConf.to_yaml(config)}")

    return config
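
To tie the helpers together, here is a hedged end-to-end sketch; my_experiment.py, config.yaml and the option values are illustrative, and the merge behaviour in the comments follows the precedence rules above.

# my_experiment.py (illustrative)
import argparse

from slp.config.config_parser import make_cli_parser, parse_config
from slp.plbind.dm import PLDataModuleFromDatasets

parser = argparse.ArgumentParser("My cool model")
parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
parser = make_cli_parser(parser, PLDataModuleFromDatasets)

# Invoked as: python my_experiment.py --lr 0.01
config = parse_config(parser, "config.yaml")
# config.model.hidden -> value from config.yaml if set there, else the argparse default (20)
# config.optim.lr     -> 0.01, because user-provided CLI args win over the config file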

SPECIAL_TOKENS

SPECIAL_TOKENS Special Tokens for NLP applications

Default special tokens values and indices (compatible with BERT):

* [PAD]: 0
* [MASK]: 1
* [UNK]: 2
* [BOS]: 3
* [EOS]: 4
* [CLS]: 5
* [SEP]: 6
* [PAUSE]: 7
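
A short sketch of how SPECIAL_TOKENS is consumed elsewhere in slp (see EmbeddingsLoader.load below): it behaves as an enum whose member .value is the token string, while the numbers listed above are the default indices. The printed value is inferred from that usage, not taken from the enum definition itself.

>>> from slp.config.nlp import SPECIAL_TOKENS
>>> SPECIAL_TOKENS.PAD.value  # token string, e.g. used as the padding row of an embeddings matrix
'[PAD]'
>>> for token in SPECIAL_TOKENS:  # iterate over all special tokens, as EmbeddingsLoader.load does
...     print(token.value)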

OmegaConfExtended

OmegaConfExtended Extended OmegaConf class, to include argparse style CLI arguments

Unfortunately the original authors are not interested in providing integration with argparse (https://github.com/omry/omegaconf/issues/569), so we have to get by with this extension

from_argparse(parser, args=None, include_none=False) staticmethod

from_argparse Static method to convert argparse arguments into OmegaConf DictConfig objects

We parse the command line arguments and separate the user provided values and the default values. This is useful for merging with a config file.

Parameters:

    parser (ArgumentParser, required): Parser for argparse arguments
    args (Optional[List[str]], default None): Optional sys.argv-style args. Use only for testing. Defaults to sys.argv[1:]

Returns:

    Tuple[omegaconf.DictConfig, omegaconf.DictConfig]: (user-provided CLI args, default CLI args) as a tuple of omegaconf.DictConfigs

Examples:

>>> import argparse
>>> from slp.config.omegaconf import OmegaConfExtended
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser, args=["--hidden", "100"])
>>> user_provided_args
{'model': {'hidden': 100}}
>>> default_args
{}
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser)
>>> user_provided_args
{}
>>> default_args
{'model': {'hidden': 20}}
Source code in slp/config/omegaconf.py
@staticmethod
def from_argparse(
    parser: argparse.ArgumentParser,
    args: Optional[List[str]] = None,
    include_none: bool = False,
) -> Tuple[DictConfig, DictConfig]:
    """from_argparse Static method to convert argparse arguments into OmegaConf DictConfig objects

    We parse the command line arguments and separate the user provided values and the default values.
    This is useful for merging with a config file.

    Args:
        parser (argparse.ArgumentParser): Parser for argparse arguments
        args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
            Use this only for testing. By default it uses sys.argv[1:]
    Returns:
        Tuple[omegaconf.DictConfig, omegaconf.DictConfig]: (user provided cli args, default cli args) as a tuple of omegaconf.DictConfigs

    Examples:
        >>> import argparse
        >>> from slp.config.omegaconf import OmegaConfExtended
        >>> parser = argparse.ArgumentParser("My cool model")
        >>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
        >>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser, args=["--hidden", "100"])
        >>> user_provided_args
        {'model': {'hidden': 100}}
        >>> default_args
        {}
        >>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser)
        >>> user_provided_args
        {}
        >>> default_args
        {'model': {'hidden': 20}}
    """
    dest_to_arg = {v.dest: k for k, v in parser._option_string_actions.items()}

    all_args = vars(parser.parse_args(args=args))
    provided_args = {}
    default_args = {}

    for k, v in all_args.items():
        if dest_to_arg[k] in sys.argv:
            provided_args[k] = v
        else:
            default_args[k] = v

    provided = OmegaConf.create(_nest(provided_args, include_none=include_none))
    defaults = OmegaConf.create(_nest(default_args, include_none=include_none))

    return provided, defaults

from_yaml(file_) staticmethod

Alias for OmegaConf.load. OmegaConf.from_yaml was removed at some point; this brings it back.

Parameters:

    file_ (Union[str, pathlib.Path, IO[Any]], required): File to load or file descriptor

Returns:

    Union[DictConfig, ListConfig]: The loaded configuration

Source code in slp/config/omegaconf.py
@staticmethod
def from_yaml(
    file_: Union[str, pathlib.Path, IO[Any]]
) -> Union[DictConfig, ListConfig]:
    """Alias for OmegaConf.load
    OmegaConf.from_yaml got removed at some point. Bring it back

    Args:
        file_ (Union[str, pathlib.Path, IO[Any]]): file to load or file descriptor

    Returns:
        Union[DictConfig, ListConfig]: The loaded configuration

    """
    return OmegaConfExtended.load(file_)
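
A brief sketch: since from_yaml delegates to OmegaConf.load, it accepts a path as well as an open file object (the inline YAML below is illustrative).

>>> import io
>>> from slp.config.omegaconf import OmegaConfExtended
>>> cfg = OmegaConfExtended.from_yaml(io.StringIO("model:\n  hidden: 100\n"))
>>> cfg.model.hidden
100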

MultimodalSequenceClassificationCollator

__call__(self, batch) special

Call collate function

Parameters:

    batch (List[Dict[str, torch.Tensor]], required): Batch of samples. It expects a list of dictionaries mapping modalities to torch tensors

Returns:

    Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]]: tuple of (dict of batched modality tensors, labels, dict of modality sequence lengths)

Source code in slp/data/collators.py
def __call__(
    self, batch: List[Dict[str, torch.Tensor]]
) -> Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]]:
    """Call collate function

    Args:
        batch (List[Dict[str, torch.Tensor]]): Batch of samples.
            It expects a list of dictionaries from modalities to torch tensors

    Returns:
        Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]]: tuple of
            (dict batched modality tensors, labels, dict of modality sequence lengths)
    """
    inputs = {}
    lengths = {}

    for m in self.modalities:
        seq = self.extract_sequence(batch, m)
        lengths[m] = torch.tensor([s.size(0) for s in seq], device=self.device)

        if self.max_length > 0:
            lengths[m] = torch.clamp(lengths[m], min=0, max=self.max_length)

        inputs[m] = pad_sequence(
            seq,
            batch_first=True,
            padding_value=self.pad_indx,
            max_length=self.max_length,
        ).to(self.device)

    targets: List[Label] = [b[self.label_key] for b in batch]

    # Pad and convert to tensor
    ttargets: torch.Tensor = mktensor(
        targets, device=self.device, dtype=self.label_dtype
    )

    return inputs, ttargets.to(self.device), lengths
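
An illustrative sketch of what the collator produces for a toy batch, using the default modalities and label key described in __init__ below. The feature dimensions (300/74/35) are made up, and the shapes assume the default max_length=-1 pads each modality to the longest sequence in the batch.

>>> import torch
>>> from slp.data.collators import MultimodalSequenceClassificationCollator
>>> collate_fn = MultimodalSequenceClassificationCollator()
>>> batch = [
...     {"text": torch.rand(5, 300), "audio": torch.rand(7, 74), "visual": torch.rand(7, 35), "label": 1},
...     {"text": torch.rand(3, 300), "audio": torch.rand(4, 74), "visual": torch.rand(4, 35), "label": 0},
... ]
>>> inputs, targets, lengths = collate_fn(batch)
>>> inputs["text"].shape  # padded to the longest text sequence in the batch
torch.Size([2, 5, 300])
>>> lengths["text"]
tensor([5, 3])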

__init__(self, pad_indx=0, modalities={'audio', 'visual', 'text'}, label_key='label', max_length=-1, label_dtype=torch.float32, device='cpu') special

Collate function for sequence classification tasks

  • Perform padding
  • Calculate sequence lengths

Parameters:

    pad_indx (int, default 0): Pad token index
    modalities (Set, default {'audio', 'visual', 'text'}): Which modalities are included in the batch dict
    max_length (int, default -1): Pad sequences to a fixed maximum length
    label_key (str, default 'label'): Key used to access the label in the batch dict
    label_dtype (torch.dtype, default torch.float32): dtype of the returned label tensor
    device (str, default 'cpu'): Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion

Examples:

>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=MultimodalSequenceClassificationCollator())
Source code in slp/data/collators.py
def __init__(
    self,
    pad_indx=0,
    modalities={"visual", "text", "audio"},
    label_key="label",
    max_length=-1,
    label_dtype=torch.float,
    device="cpu",
):
    """Collate function for sequence classification tasks

    * Perform padding
    * Calculate sequence lengths

    Args:
        pad_indx (int): Pad token index. Defaults to 0.
        modalities (Set): Which modalities are included in the batch dict
        max_length (int): Pad sequences to a fixed maximum length
        label_key (str): String to access the label in the batch dict
        device (str): device of returned tensors. Leave this as "cpu".
            The LightningModule will handle the Conversion.

    Examples:
        >>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=MultimodalSequenceClassificationCollator())
    """
    self.pad_indx = pad_indx
    self.device = device
    self.max_length = max_length
    self.label_key = label_key
    self.modalities = modalities
    self.label_dtype = label_dtype

Seq2SeqCollator

__call__(self, batch) special

Call collate function

Parameters:

    batch (List[Tuple[torch.Tensor, torch.Tensor]], required): Batch of samples. It expects a list of (source, target) tuples, where each source and target is a sequence of features or ids

Returns:

    Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]: tuple of batched tensors (inputs, labels, lengths_inputs, lengths_targets)

Source code in slp/data/collators.py
def __call__(
    self, batch: List[Tuple[torch.Tensor, torch.Tensor]]
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Call collate function

    Args:
        batch (List[Tuple[torch.Tensor, torch.Tensor]]): Batch of samples.
            It expects a list of tuples (source, target)
            Each source and target are a sequences of features or ids.

    Returns:
        Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]: Returns tuple of batched tensors
            (inputs, labels, lengths_inputs, lengths_targets)
    """
    inputs: List[torch.Tensor] = [b[0] for b in batch]
    targets: List[torch.Tensor] = [b[1] for b in batch]
    lengths_inputs = torch.tensor([s.size(0) for s in inputs], device=self.device)
    lengths_targets = torch.tensor([s.size(0) for s in targets], device=self.device)

    if self.max_length > 0:
        lengths_inputs = torch.clamp(lengths_inputs, min=0, max=self.max_length)
        lengths_targets = torch.clamp(lengths_targets, min=0, max=self.max_length)

    inputs_padded: torch.Tensor = pad_sequence(
        inputs,
        batch_first=True,
        padding_value=self.pad_indx,
        max_length=self.max_length,
    ).to(self.device)

    targets_padded: torch.Tensor = pad_sequence(
        targets,
        batch_first=True,
        padding_value=self.pad_indx,
        max_length=self.max_length,
    ).to(self.device)

    return inputs_padded, targets_padded, lengths_inputs, lengths_targets

__init__(self, pad_indx=0, max_length=-1, device='cpu') special

Collate function for seq2seq tasks

  • Perform padding
  • Calculate sequence lengths

Parameters:

    pad_indx (int, default 0): Pad token index
    max_length (int, default -1): Pad sequences to a fixed maximum length
    device (str, default 'cpu'): Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion

Examples:

>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=Seq2SeqCollator())
Source code in slp/data/collators.py
def __init__(self, pad_indx=0, max_length=-1, device="cpu"):
    """Collate function for seq2seq tasks

    * Perform padding
    * Calculate sequence lengths

    Args:
        pad_indx (int): Pad token index. Defaults to 0.
        max_length (int): Pad sequences to a fixed maximum length
        device (str): device of returned tensors. Leave this as "cpu".
            The LightningModule will handle the Conversion.

    Examples:
        >>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=Seq2SeqCollator())
    """
    self.pad_indx = pad_indx
    self.max_length = max_length
    self.device = device

SequenceClassificationCollator

__call__(self, batch) special

Call collate function

Parameters:

    batch (List[Tuple[torch.Tensor, Label]], required): Batch of samples, where Label is Union[numpy.ndarray, torch.Tensor, List, int]. It expects a list of (inputs, label) tuples

Returns:

    Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: tuple of batched tensors (inputs, labels, lengths)

Source code in slp/data/collators.py
def __call__(
    self, batch: List[Tuple[torch.Tensor, Label]]
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Call collate function

    Args:
        batch (List[Tuple[torch.Tensor, slp.util.types.Label]]): Batch of samples.
            It expects a list of tuples (inputs, label).

    Returns:
        Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Returns tuple of batched tensors (inputs, labels, lengths)
    """
    inputs: List[torch.Tensor] = [b[0] for b in batch]
    targets: List[Label] = [b[1] for b in batch]
    #  targets: List[torch.tensor] = map(list, zip(*batch))
    lengths = torch.tensor([s.size(0) for s in inputs], device=self.device)

    if self.max_length > 0:
        lengths = torch.clamp(lengths, min=0, max=self.max_length)
    # Pad and convert to tensor
    inputs_padded: torch.Tensor = pad_sequence(
        inputs,
        batch_first=True,
        padding_value=self.pad_indx,
        max_length=self.max_length,
    ).to(self.device)

    ttargets: torch.Tensor = mktensor(targets, device=self.device, dtype=torch.long)

    return inputs_padded, ttargets.to(self.device), lengths
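
A small illustrative sketch of the collator on a toy batch of token-id sequences. The outputs follow from the padding logic above (pad_indx=0, batch_first=True), assuming the default max_length=-1 pads to the longest sequence.

>>> import torch
>>> from slp.data.collators import SequenceClassificationCollator
>>> collate_fn = SequenceClassificationCollator()
>>> batch = [(torch.tensor([1, 2, 3, 4]), 0), (torch.tensor([5, 6]), 1)]
>>> inputs, targets, lengths = collate_fn(batch)
>>> inputs  # padded with pad_indx (0) up to the longest sequence
tensor([[1, 2, 3, 4],
        [5, 6, 0, 0]])
>>> targets
tensor([0, 1])
>>> lengths
tensor([4, 2])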

__init__(self, pad_indx=0, max_length=-1, device='cpu') special

Collate function for sequence classification tasks

  • Perform padding
  • Calculate sequence lengths

Parameters:

    pad_indx (int, default 0): Pad token index
    max_length (int, default -1): Pad sequences to a fixed maximum length
    device (str, default 'cpu'): Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion

Examples:

>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=SequenceClassificationCollator())
Source code in slp/data/collators.py
def __init__(self, pad_indx=0, max_length=-1, device="cpu"):
    """Collate function for sequence classification tasks

    * Perform padding
    * Calculate sequence lengths

    Args:
        pad_indx (int): Pad token index. Defaults to 0.
        max_length (int): Pad sequences to a fixed maximum length
        device (str): device of returned tensors. Leave this as "cpu".
            The LightningModule will handle the Conversion.

    Examples:
        >>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=SequenceClassificationCollator())
    """
    self.pad_indx = pad_indx
    self.device = device
    self.max_length = max_length

EmbeddingsLoader

__init__(self, embeddings_file, dim, vocab=None, extra_tokens=None) special

Load word embeddings in text format

Parameters:

    embeddings_file (str, required): File where embeddings are stored (e.g. glove.6B.50d.txt)
    dim (int, required): Dimensionality of embeddings
    vocab (Optional[Dict[str, int]], default None): Load only embeddings in vocab
    extra_tokens (Optional[slp.config.nlp.SPECIAL_TOKENS], default None): Create random embeddings for these special tokens
Source code in slp/data/corpus.py
def __init__(
    self,
    embeddings_file: str,
    dim: int,
    vocab: Optional[Dict[str, int]] = None,
    extra_tokens: Optional[SPECIAL_TOKENS] = None,
) -> None:
    """Load word embeddings in text format

    Args:
        embeddings_file (str): File where embeddings are stored (e.g. glove.6B.50d.txt)
        dim (int): Dimensionality of embeddings
        vocab (Optional[Dict[str, int]]): Load only embeddings in vocab. Defaults to None.
        extra_tokens (Optional[slp.config.nlp.SPECIAL_TOKENS]): Create random embeddings for these special tokens.
            Defaults to None.
    """
    self.embeddings_file = embeddings_file
    self.vocab = vocab
    self.cache_ = self._get_cache_name()
    self.dim_ = dim
    self.extra_tokens = extra_tokens

__repr__(self) special

String representation of class

Source code in slp/data/corpus.py
def __repr__(self):
    """String representation of class"""

    return f"{self.__class__.__name__}({self.embeddings_file}, {self.dim_})"

augment_embeddings(self, word2idx, idx2word, embeddings, token, emb=None)

Create a random embedding for a special token and append it to the embeddings array

Parameters:

    word2idx (Dict[str, int], required): Current word2idx map
    idx2word (Dict[int, str], required): Current idx2word map
    embeddings (List[numpy.ndarray], required): Embeddings array as list of embeddings
    token (str, required): The special token (e.g. [PAD])
    emb (Optional[numpy.ndarray], default None): Optional value for the embedding to be appended. Defaults to None, where a random embedding is created

Returns:

    Tuple[Dict[str, int], Dict[int, str], List[np.ndarray]]: (word2idx, idx2word, embeddings) tuple

Source code in slp/data/corpus.py
def augment_embeddings(
    self,
    word2idx: Dict[str, int],
    idx2word: Dict[int, str],
    embeddings: List[np.ndarray],
    token: str,
    emb: Optional[np.ndarray] = None,
) -> Tuple[Dict[str, int], Dict[int, str], List[np.ndarray]]:
    """Create a random embedding for a special token and append it to the embeddings array

    Args:
        word2idx (Dict[str, int]): Current word2idx map
        idx2word (Dict[int, str]): Current idx2word map
        embeddings (List[np.ndarray]): Embeddings array as list of embeddings
        token (str): The special token (e.g. [PAD])
        emb (Optional[np.ndarray]): Optional value for the embedding to be appended.
            Defaults to None, where a random embedding is created.

    Returns:
        Tuple[Dict[str, int], Dict[int, str], List[np.ndarray]]: (word2idx, idx2word, embeddings) tuple
    """
    word2idx[token] = len(embeddings)
    idx2word[len(embeddings)] = token

    if emb is None:
        emb = np.random.uniform(low=-0.05, high=0.05, size=self.dim_)
    embeddings.append(emb)

    return word2idx, idx2word, embeddings
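
For illustration, this is essentially how load() below bootstraps the vocabulary with a zero [PAD] embedding before reading the file. The embeddings path is a placeholder; only the helper itself is exercised here.

>>> import numpy as np
>>> from slp.data.corpus import EmbeddingsLoader
>>> loader = EmbeddingsLoader("glove.6B.50d.txt", 50)  # path is illustrative
>>> word2idx, idx2word, embeddings = loader.augment_embeddings({}, {}, [], "[PAD]", emb=np.zeros(50))
>>> word2idx
{'[PAD]': 0}
>>> len(embeddings)
1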

in_accepted_vocab(self, word)

Check if word exists in given vocabulary

Parameters:

    word (str, required): Word from the embeddings file

Returns:

    bool: True if the word is in the accepted vocabulary (always True when no vocabulary was given)

Source code in slp/data/corpus.py
def in_accepted_vocab(self, word: str) -> bool:
    """Check if word exists in given vocabulary

    Args:
        word (str): word from embeddings file

    Returns:
        bool: Word exists
    """

    return True if self.vocab is None else word in self.vocab

load(self)

Read the word vectors from a text file

  • Read embeddings
  • Filter with given vocabulary
  • Augment with special tokens

Returns:

Type Description
Tuple[Dict[str, int], Dict[int, str], numpy.ndarray]

types.Embeddings: (word2idx, idx2word, embeddings) tuple

Source code in slp/data/corpus.py
@system.timethis(method=True)
def load(self) -> types.Embeddings:
    """Read the word vectors from a text file

    * Read embeddings
    * Filter with given vocabulary
    * Augment with special tokens

    Returns:
        types.Embeddings: (word2idx, idx2word, embeddings) tuple
    """
    # in order to avoid this time consuming operation, cache the results
    try:
        cache = self._load_cache()
        logger.info("Loaded word embeddings from cache.")

        return cache
    except OSError:
        logger.warning(f"Didn't find embeddings cache file {self.embeddings_file}")
        logger.warning("Loading embeddings from file.")

    # create the necessary dictionaries and the word embeddings matrix

    if not os.path.exists(self.embeddings_file):
        logger.critical(f"{self.embeddings_file} not found!")
        raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), self.embeddings_file)

    logger.info(f"Indexing file {self.embeddings_file} ...")

    # create the 2D array, which will be used for initializing
    # the Embedding layer of a NN.
    # We reserve the first row (idx=0), as the word embedding,
    # which will be used for zero padding (word with id = 0).

    if self.extra_tokens is not None:
        word2idx, idx2word, embeddings = self.augment_embeddings(
            {},
            {},
            [],
            self.extra_tokens.PAD.value,  # type: ignore
            emb=np.zeros(self.dim_),
        )

        for token in self.extra_tokens:  # type: ignore
            logger.debug(f"Adding token {token.value} to embeddings matrix")

            if token == self.extra_tokens.PAD:
                continue
            word2idx, idx2word, embeddings = self.augment_embeddings(
                word2idx, idx2word, embeddings, token.value
            )
    else:
        word2idx, idx2word, embeddings = self.augment_embeddings(
            {}, {}, [], "[PAD]", emb=np.zeros(self.dim_)
        )
    # read file, line by line
    with open(self.embeddings_file, "r") as f:
        num_lines = sum(1 for line in f)

    with open(self.embeddings_file, "r") as f:
        index = len(embeddings)

        for line in tqdm(
            f, total=num_lines, desc="Loading word embeddings...", leave=False
        ):
            # skip the first row if it is a header

            if len(line.split()) < self.dim_:
                continue

            values = line.rstrip().split(" ")
            word = values[0]

            if word in word2idx:
                continue

            if not self.in_accepted_vocab(word):
                continue

            vector = np.asarray(values[1:], dtype=np.float32)
            idx2word[index] = word
            word2idx[word] = index
            embeddings.append(vector)
            index += 1

    logger.info(f"Loaded {len(embeddings)} word vectors.")
    embeddings_out = np.array(embeddings, dtype="float32")

    # write the data to a cache file
    self._dump_cache((word2idx, idx2word, embeddings_out))

    return word2idx, idx2word, embeddings_out
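
A hedged end-to-end sketch; glove.6B.50d.txt is an assumed local file, and the return value is the (word2idx, idx2word, embeddings) tuple documented above.

>>> from slp.config.nlp import SPECIAL_TOKENS
>>> from slp.data.corpus import EmbeddingsLoader
>>> loader = EmbeddingsLoader("glove.6B.50d.txt", 50, extra_tokens=SPECIAL_TOKENS)
>>> word2idx, idx2word, embeddings = loader.load()
>>> embeddings.shape[1]  # one row per word (special tokens first), dim columns
50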

HfCorpus

embeddings: None property readonly

Unused. Defined for compatibility

frequencies: Dict[str, int] property readonly

Retrieve wordpiece occurrence counts

Returns:

    Dict[str, int]: wordpiece occurrence counts

idx2word: None property readonly

Unused. Defined for compatibility

indices: List[List[int]] property readonly

Retrieve corpus as token indices

Returns:

Type Description
List[List[int]]

List[List[int]]: Token indices for corpus

raw: List[str] property readonly

Retrieve raw corpus

Returns:

Type Description
List[str]

List[str]: Raw Corpus

tokenized: List[List[str]] property readonly

Retrieve tokenized corpus

Returns:

Type Description
List[List[str]]

List[List[str]]: tokenized corpus

vocab: Set[str] property readonly

Retrieve set of words in vocabulary

Returns:

Type Description
Set[str]

Set[str]: set of words in vocabulary

vocab_size: int property readonly

Retrieve vocabulary size

Returns:

Type Description
int

int: Vocabulary size

word2idx: None property readonly

Unused. Defined for compatibility

__getitem__(self, idx) special

Get ith element in corpus as token indices

Parameters:

    idx (int, required): Index in corpus

Returns:

    List[int]: List of token indices for the sentence

Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
    """Get ith element in corpus as token indices

    Args:
        idx (List[int]): index in corpus

    Returns:
        List[int]: List of token indices for sentence
    """
    out: List[int] = (
        self.corpus_indices_[idx]
        if self.max_length <= 0
        else self.corpus_indices_[idx][: self.max_length]
    )

    return out

__init__(self, corpus, lower=True, tokenizer_model='bert-base-uncased', add_special_tokens=True, special_tokens=<enum 'SPECIAL_TOKENS'>, max_length=-1, **kwargs) special

Process a corpus using Hugging Face tokenizers

Select one of the Hugging Face tokenizers and process the corpus

Parameters:

    corpus (List[str], required): List of sentences
    lower (bool, default True): Convert strings to lower case
    tokenizer_model (str, default 'bert-base-uncased'): Hugging Face model to use
    add_special_tokens (bool, default True): Add special tokens to each sentence during tokenization
    special_tokens (Optional[slp.config.nlp.SPECIAL_TOKENS], default SPECIAL_TOKENS): Special tokens to include in the vocabulary
    max_length (int, default -1): Crop sequences above this length. Defaults to -1, where sequences are left unaltered
Source code in slp/data/corpus.py
def __init__(
    self,
    corpus: List[str],
    lower: bool = True,
    tokenizer_model: str = "bert-base-uncased",
    add_special_tokens: bool = True,
    special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS,  # type: ignore
    max_length: int = -1,
    **kwargs,
):
    """Process a corpus using hugging face tokenizers

    Select one of hugging face tokenizers and process corpus

    Args:
        corpus (List[str]): List of sentences
        lower (bool): Convert strings to lower case. Defaults to True.
        tokenizer_model (str): Hugging face model to use. Defaults to "bert-base-uncased".
        add_special_tokens (bool): Add special tokens in sentence during tokenization. Defaults to True.
        special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary.
             Defaults to slp.config.nlp.SPECIAL_TOKENS.
        max_length (int): Crop sequences above this length. Defaults to -1 where sequences are left unaltered.
    """
    self.corpus_ = corpus
    self.max_length = max_length

    logger.info(
        f"Tokenizing corpus using hugging face tokenizer from {tokenizer_model}"
    )

    self.tokenizer = HuggingFaceTokenizer(
        lower=lower, model=tokenizer_model, add_special_tokens=add_special_tokens
    )

    self.corpus_indices_ = [
        self.tokenizer(s)
        for s in tqdm(
            self.corpus_, desc="Converting tokens to indices...", leave=False
        )
    ]

    self.tokenized_corpus_ = [
        self.tokenizer.detokenize(s)
        for s in tqdm(
            self.corpus_indices_,
            desc="Mapping indices to tokens...",
            leave=False,
        )
    ]

    self.vocab_ = create_vocab(
        self.tokenized_corpus_,
        vocab_size=-1,
        special_tokens=special_tokens,
    )
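
A brief usage sketch (the sentences are illustrative; the tokenizer model is downloaded on first use):

>>> from slp.data.corpus import HfCorpus
>>> corpus = HfCorpus(["The cat sat on the mat.", "Another short sentence."])
>>> len(corpus)
2
>>> corpus[0]            # wordpiece ids for the first sentence
>>> corpus.tokenized[0]  # the corresponding wordpieces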

__len__(self) special

Number of samples in corpus

Returns:

Type Description
int

int: Corpus length

Source code in slp/data/corpus.py
def __len__(self) -> int:
    """Number of samples in corpus

    Returns:
        int: Corpus length
    """

    return len(self.corpus_indices_)

TokenizedCorpus

embeddings: None property readonly

Unused. Kept for compatibility

frequencies: Dict[str, int] property readonly

Retrieve wordpiece occurrence counts

Returns:

    Dict[str, int]: wordpiece occurrence counts

idx2word: Dict[int, str] property readonly

Retrieve idx2word mapping

Returns:

    Dict[int, str]: idx2word mapping

indices: Union[List[int], List[List[int]]] property readonly

Retrieve corpus as token indices

Returns:

Type Description
Union[List[int], List[List[int]]]

List[List[int]]: Token indices for corpus

raw: Union[List[str], List[List[str]]] property readonly

Retrieve raw corpus

Returns:

Type Description
Union[List[str], List[List[str]]]

List[str]: Raw Corpus

tokenized: Union[List[str], List[List[str]]] property readonly

Retrieve tokenized corpus

Returns:

Type Description
Union[List[str], List[List[str]]]

List[List[str]]: Tokenized corpus

vocab: Set[str] property readonly

Retrieve set of words in vocabulary

Returns:

Type Description
Set[str]

Set[str]: set of words in vocabulary

vocab_size: int property readonly

Retrieve vocabulary size

Returns:

Type Description
int

int: Vocabulary size

word2idx: Dict[str, int] property readonly

Retrieve word2idx mapping

Returns:

Type Description
Dict[str, int]

Dict[str, int]: word2idx mapping

__getitem__(self, idx) special

Get ith element in corpus as token indices

Parameters:

    idx (int, required): Index in corpus

Returns:

    List[int]: List of token indices for the sentence

Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
    """Get ith element in corpus as token indices

    Args:
        idx (List[int]): index in corpus

    Returns:
        List[int]: List of token indices for sentence
    """
    out: List[int] = (
        self.corpus_indices_[idx]
        if self.max_length <= 0
        else self.corpus_indices_[idx][: self.max_length]
    )

    return out

__init__(self, corpus, word2idx=None, special_tokens=<enum 'SPECIAL_TOKENS'>, max_length=-1, **kwargs) special

Wrap a corpus that's already tokenized

Parameters:

Name Type Description Default
corpus Union[List[str], List[List[str]]]

List of tokens or List of lists of tokens

required
word2idx Dict[str, int]

Token to index mapping. Defaults to None.

None
special_tokens Optional[slp.config.nlp.SPECIAL_TOKENS]

Special Tokens. Defaults to SPECIAL_TOKENS.

<enum 'SPECIAL_TOKENS'>
Source code in slp/data/corpus.py
def __init__(
    self,
    corpus: Union[List[str], List[List[str]]],
    word2idx: Dict[str, int] = None,
    special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS,  # type: ignore
    max_length: int = -1,
    **kwargs,
):
    """Wrap a corpus that's already tokenized

    Args:
        corpus (Union[List[str], List[List[str]]]): List of tokens or List of lists of tokens
        word2idx (Dict[str, int], optional): Token to index mapping. Defaults to None.
        special_tokens (Optional[SPECIAL_TOKENS], optional): Special Tokens. Defaults to SPECIAL_TOKENS.
    """
    self.corpus_ = corpus
    self.tokenized_corpus_ = corpus
    self.max_length = max_length

    self.vocab_ = create_vocab(
        self.tokenized_corpus_,
        vocab_size=-1,
        special_tokens=special_tokens,
    )

    if word2idx is not None:
        logger.info("Converting tokens to ids using word2idx.")
        self.word2idx_ = word2idx
    else:
        logger.info(
            "No word2idx provided. Will convert tokens to ids using an iterative counter."
        )
        self.word2idx_ = dict(zip(self.vocab_.keys(), itertools.count()))

    self.idx2word_ = {v: k for k, v in self.word2idx_.items()}

    self.to_token_ids = ToTokenIds(
        self.word2idx_,
        specials=SPECIAL_TOKENS,  # type: ignore
    )

    if isinstance(self.tokenized_corpus_[0], list):
        self.corpus_indices_ = [
            self.to_token_ids(s)
            for s in tqdm(
                self.tokenized_corpus_,
                desc="Converting tokens to token ids...",
                leave=False,
            )
        ]
    else:
        self.corpus_indices_ = self.to_token_ids(self.tokenized_corpus_)  # type: ignore

__len__(self) special

Number of samples in corpus

Returns:

Type Description
int

int: Corpus length

Source code in slp/data/corpus.py
def __len__(self) -> int:
    """Number of samples in corpus

    Returns:
        int: Corpus length
    """

    return len(self.corpus_indices_)
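
A short sketch of the two modes described in __init__: building word2idx with an iterative counter, and reusing a word2idx mapping from another split (token lists are illustrative):

from slp.data.corpus import TokenizedCorpus

train = TokenizedCorpus([["the", "cat", "sat"], ["a", "dog", "ran"]])
dev = TokenizedCorpus([["the", "dog", "sat"]], word2idx=train.word2idx)

print(dev[0])  # token ids for the first dev sentence, produced with the shared word2idx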

WordCorpus

embeddings: ndarray property readonly

Retrieve embeddings array

Returns:

Type Description
ndarray

np.ndarray: Array of pretrained word embeddings

frequencies: Dict[str, int] property readonly

Retrieve word occurrence counts

Returns:

Type Description
Dict[str, int]

Dict[str, int]: word occurrence counts

idx2word: Dict[int, str] property readonly

Retrieve idx2word mapping

Returns:

Type Description
Dict[int, str]

Dict[int, str]: idx2word mapping

indices: List[List[int]] property readonly

Retrieve corpus as token indices

Returns:

Type Description
List[List[int]]

List[List[int]]: Token indices for corpus

raw: List[str] property readonly

Retrieve raw corpus

Returns:

Type Description
List[str]

List[str]: Raw Corpus

tokenized: List[List[str]] property readonly

Retrieve tokenized corpus

Returns:

Type Description
List[List[str]]

List[List[str]]: Tokenized corpus

vocab: Set[str] property readonly

Retrieve set of words in vocabulary

Returns:

Type Description
Set[str]

Set[str]: set of words in vocabulary

vocab_size: int property readonly

Retrieve vocabulary size for corpus

Returns:

Type Description
int

int: vocabulary size

word2idx: Dict[str, int] property readonly

Retrieve word2idx mapping

Returns:

Type Description
Dict[str, int]

Dict[str, int]: word2idx mapping

__getitem__(self, idx) special

Get ith element in corpus as token indices

Parameters:

Name Type Description Default
idx List[int]

index in corpus

required

Returns:

Type Description
List[int]

List[int]: List of token indices for sentence

Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
    """Get ith element in corpus as token indices

    Args:
        idx (List[int]): index in corpus

    Returns:
        List[int]: List of token indices for sentence
    """
    out: List[int] = (
        self.corpus_indices_[idx]
        if self.max_length <= 0
        else self.corpus_indices_[idx][: self.max_length]
    )

    return out

__init__(self, corpus, limit_vocab_size=30000, word2idx=None, idx2word=None, embeddings=None, embeddings_file=None, embeddings_dim=300, lower=True, special_tokens=<enum 'SPECIAL_TOKENS'>, prepend_bos=False, append_eos=False, lang='en_core_web_md', max_length=-1, **kwargs) special

Load corpus embeddings, tokenize into words using spacy and convert to ids

This class handles the preprocessing of a raw corpus. It performs:

  • Tokenization into words (spacy)
  • Loading of pretrained word embeddings
  • Calculation of word frequencies / corpus statistics
  • Conversion to token ids

You can either:

  • Pass an embeddings file to load pretrained embeddings and create the word2idx mapping
  • Pass an already loaded embeddings array and word2idx. This is useful for the dev / test splits where we want to reuse the train split embeddings / word2idx.

Parameters:

Name Type Description Default
corpus List[str]

Corpus as a list of sentences

required
limit_vocab_size int

Upper bound for number of most frequent tokens to keep. Defaults to 30000.

30000
word2idx Optional[Dict[str, int]]

Mapping of word to indices. Defaults to None.

None
idx2word Optional[Dict[int, str]]

Mapping of indices to words. Defaults to None.

None
embeddings Optional[numpy.ndarray]

Embeddings array. Defaults to None.

None
embeddings_file Optional[str]

Embeddings file to read. Defaults to None.

None
embeddings_dim int

Dimension of embeddings. Defaults to 300.

300
lower bool

Convert strings to lower case. Defaults to True.

True
special_tokens Optional[slp.config.nlp.SPECIAL_TOKENS]

Special tokens to include in the vocabulary. Defaults to slp.config.nlp.SPECIAL_TOKENS.

<enum 'SPECIAL_TOKENS'>
prepend_bos bool

Prepend Beginning of Sequence token for seq2seq tasks. Defaults to False.

False
append_eos bool

Append End of Sequence token for seq2seq tasks. Defaults to False.

False
lang str

Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_md".

'en_core_web_md'
max_length int

Crop sequences above this length. Defaults to -1 where sequences are left unaltered.

-1
Source code in slp/data/corpus.py
def __init__(
    self,
    corpus: List[str],
    limit_vocab_size: int = 30000,
    word2idx: Optional[Dict[str, int]] = None,
    idx2word: Optional[Dict[int, str]] = None,
    embeddings: Optional[np.ndarray] = None,
    embeddings_file: Optional[str] = None,
    embeddings_dim: int = 300,
    lower: bool = True,
    special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS,  # type: ignore
    prepend_bos: bool = False,
    append_eos: bool = False,
    lang: str = "en_core_web_md",
    max_length: int = -1,
    **kwargs,
):
    """Load corpus embeddings, tokenize in words using spacy and convert to ids

    This class handles the handling of a raw corpus. It handles:

    * Tokenization into words (spacy)
    * Loading of pretrained word embedding
    * Calculation of word frequencies / corpus statistics
    * Conversion to token ids

    You can pass either:

    * Pass an embeddings file to load pretrained embeddings and create the word2idx mapping
    * Pass already loaded embeddings array and word2idx. This is useful for the dev / test splits
      where we want to pass the train split embeddings / word2idx.

    Args:
        corpus (List[str]): Corpus as a list of sentences
        limit_vocab_size (int): Upper bound for number of most frequent tokens to keep. Defaults to 30000.
        word2idx (Optional[Dict[str, int]]): Mapping of word to indices. Defaults to None.
        idx2word (Optional[Dict[int, str]]): Mapping of indices to words. Defaults to None.
        embeddings (Optional[np.ndarray]): Embeddings array. Defaults to None.
        embeddings_file (Optional[str]): Embeddings file to read. Defaults to None.
        embeddings_dim (int): Dimension of embeddings. Defaults to 300.
        lower (bool): Convert strings to lower case. Defaults to True.
        special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary.
             Defaults to slp.config.nlp.SPECIAL_TOKENS.
        prepend_bos (bool): Prepend Beginning of Sequence token for seq2seq tasks. Defaults to False.
        append_eos (bool): Append End of Sequence token for seq2seq tasks. Defaults to False.
        lang (str): Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_md".
        max_length (int): Crop sequences above this length. Defaults to -1 where sequences are left unaltered.
    """
    # FIXME: Extract super class to avoid repetition
    self.corpus_ = corpus
    self.max_length = max_length
    self.tokenizer = SpacyTokenizer(
        lower=lower,
        prepend_bos=prepend_bos,
        append_eos=append_eos,
        specials=special_tokens,
        lang=lang,
    )

    logger.info(f"Tokenizing corpus using spacy {lang}")

    self.tokenized_corpus_ = [
        self.tokenizer(s)
        for s in tqdm(self.corpus_, desc="Tokenizing corpus...", leave=False)
    ]

    self.vocab_ = create_vocab(
        self.tokenized_corpus_,
        vocab_size=limit_vocab_size if word2idx is None else -1,
        special_tokens=special_tokens,
    )

    self.word2idx_, self.idx2word_, self.embeddings_ = None, None, None
    # self.corpus_indices_ = self.tokenized_corpus_

    if word2idx is not None:
        logger.info("Word2idx was already provided. Going to used it.")

    if embeddings_file is not None and word2idx is None:
        logger.info(
            f"Going to load {len(self.vocab_)} embeddings from {embeddings_file}"
        )
        loader = EmbeddingsLoader(
            embeddings_file,
            embeddings_dim,
            vocab=self.vocab_,
            extra_tokens=special_tokens,
        )
        word2idx, idx2word, embeddings = loader.load()

    if embeddings is not None:
        self.embeddings_ = embeddings

    if idx2word is not None:
        self.idx2word_ = idx2word

    if word2idx is not None:
        self.word2idx_ = word2idx

        logger.info("Converting tokens to ids using word2idx.")
        self.to_token_ids = ToTokenIds(
            self.word2idx_,
            specials=SPECIAL_TOKENS,  # type: ignore
        )

        self.corpus_indices_ = [
            self.to_token_ids(s)
            for s in tqdm(
                self.tokenized_corpus_,
                desc="Converting tokens to token ids...",
                leave=False,
            )
        ]

        logger.info("Filtering corpus vocabulary.")

        updated_vocab = {}

        for k, v in self.vocab_.items():
            if k in self.word2idx_:
                updated_vocab[k] = v

        logger.info(
            f"Out of {len(self.vocab_)} tokens {len(self.vocab_) - len(updated_vocab)} were not found in the pretrained embeddings."
        )

        self.vocab_ = updated_vocab

__len__(self) special

Number of samples in corpus

Returns:

Type Description
int

int: Corpus length

Source code in slp/data/corpus.py
def __len__(self) -> int:
    """Number of samples in corpus

    Returns:
        int: Corpus length
    """

    return len(self.corpus_indices_)
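
A hedged sketch of the two construction modes described in __init__; the embeddings path and sentence lists are placeholders:

from slp.data.corpus import WordCorpus

train_sentences = ["In a galaxy far far away", "The cat sat on the mat"]
dev_sentences = ["The dog sat on the mat"]

train = WordCorpus(
    train_sentences,
    embeddings_file="glove.6B.300d.txt",  # placeholder path to a pretrained embeddings file
    embeddings_dim=300,
    limit_vocab_size=30000,
)

# Dev / test splits reuse the train split mappings and embeddings
dev = WordCorpus(
    dev_sentences,
    word2idx=train.word2idx,
    idx2word=train.idx2word,
    embeddings=train.embeddings,
)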

create_vocab(corpus, vocab_size=-1, special_tokens=None)

Create the vocabulary based on tokenized input corpus

  • Injects special tokens in the vocabulary
  • Calculates the occurrence count for each token
  • Limits vocabulary to vocab_size most common tokens

Parameters:

Name Type Description Default
corpus Union[List[str], List[List[str]]]

The tokenized corpus as a list of sentences or a list of tokenized sentences

required
vocab_size int

Limit vocabulary to vocab_size most common tokens. Defaults to -1 which keeps all tokens.

-1
special_tokens Optional[slp.config.nlp.SPECIAL_TOKENS]

Special tokens to include in the vocabulary. Defaults to None.

None

Returns:

Type Description
Dict[str, int]

Dict[str, int]: Dictionary of all accepted tokens and their corresponding occurrence counts

Examples:

>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"])
{'far': 2, 'away': 1, 'galaxy': 1, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3)
{'far': 2, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3, special_tokens=slp.config.nlp.SPECIAL_TOKENS)
{'[PAD]': 0, '[MASK]': 0, '[UNK]': 0, '[BOS]': 0, '[EOS]': 0, '[CLS]': 0, '[SEP]': 0, 'far': 2, 'a': 1, 'in': 1}
Source code in slp/data/corpus.py
def create_vocab(
    corpus: Union[List[str], List[List[str]]],
    vocab_size: int = -1,
    special_tokens: Optional[SPECIAL_TOKENS] = None,
) -> Dict[str, int]:
    """Create the vocabulary based on tokenized input corpus

    * Injects special tokens in the vocabulary
    * Calculates the occurrence count for each token
    * Limits vocabulary to vocab_size most common tokens

    Args:
        corpus (Union[List[str], List[List[str]]]): The tokenized corpus as a list of sentences or a list of tokenized sentences
        vocab_size (int): Limit vocabulary to vocab_size most common tokens.
            Defaults to -1 which keeps all tokens.
        special_tokens Optional[SPECIAL_TOKENS]: Special tokens to include in the vocabulary. Defaults to None.

    Returns:
        Dict[str, int]: Dictionary of all accepted tokens and their corresponding occurrence counts

    Examples:
        >>> create_vocab(["in", "a", "galaxy", "far", "far", "away"])
        {'far': 2, 'away': 1, 'galaxy': 1, 'a': 1, 'in': 1}
        >>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3)
        {'far': 2, 'a': 1, 'in': 1}
        >>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3, special_tokens=slp.config.nlp.SPECIAL_TOKENS)
        {'[PAD]': 0, '[MASK]': 0, '[UNK]': 0, '[BOS]': 0, '[EOS]': 0, '[CLS]': 0, '[SEP]': 0, 'far': 2, 'a': 1, 'in': 1}
    """

    if isinstance(corpus[0], list):
        corpus = list(itertools.chain.from_iterable(corpus))
    freq = Counter(corpus)

    if special_tokens is None:
        extra_tokens = []
    else:
        extra_tokens = special_tokens.to_list()

    if vocab_size < 0:
        vocab_size = len(freq)
    take = min(vocab_size, len(freq))
    logger.info(f"Keeping {vocab_size} most common tokens out of {len(freq)}")

    def take0(x: Tuple[Any, Any]) -> Any:
        """Take first tuple element"""

        return x[0]

    common_words = list(map(take0, freq.most_common(take)))
    common_words = list(set(common_words) - set(extra_tokens))
    words = extra_tokens + common_words

    if len(words) > vocab_size:
        words = words[: vocab_size + len(extra_tokens)]

    def token_freq(t):
        """Token frequeny"""

        return 0 if t in extra_tokens else freq[t]

    vocab = dict(zip(words, map(token_freq, words)))
    logger.info(f"Vocabulary created with {len(vocab)} tokens.")
    logger.info(f"The 10 most common tokens are:\n{freq.most_common(10)}")

    return vocab

CorpusDataset

__getitem__(self, idx) special

Get a sample and its label from the corpus

Parameters:

Name Type Description Default
idx int

Sample position in the corpus

required

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor]

(processed sentence, label)

Source code in slp/data/datasets.py
def __getitem__(self, idx):
    """Get a source and target token from the corpus

    Args:
        idx (int): Sample position in the corpus

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: (processed sentence, label)
    """
    text, target = self.corpus[idx], self.labels[idx]
    if self.label_encoder is not None:
        target = self.label_encoder.transform([target])[0]
    for t in self.transforms:
        text = t(text)
    return text, target

__init__(self, corpus, labels) special

Labeled corpus dataset

Parameters:

Name Type Description Default
corpus WordCorpus, HfCorpus etc..

Input corpus

required
labels List[Any]

Labels for examples

required
Source code in slp/data/datasets.py
def __init__(self, corpus, labels):
    """Labeled corpus dataset

    Args:
        corpus (WordCorpus, HfCorpus etc..): Input corpus
        labels (List[Any]): Labels for examples
    """
    self.corpus = corpus
    self.labels = labels
    assert len(self.labels) == len(self.corpus), "Incompatible labels and corpus"
    self.transforms = []
    self.label_encoder = None
    if isinstance(self.labels[0], str):
        self.label_encoder = LabelEncoder().fit(self.labels)

__len__(self) special

Length of corpus

Returns:

Type Description
int

Corpus Length

Source code in slp/data/datasets.py
def __len__(self):
    """Length of corpus

    Returns:
        int: Corpus Length
    """
    return len(self.corpus)

map(self, t)

Append a transform to self.transforms, in order to be applied to the data

Parameters:

Name Type Description Default
t Callable[[str], Any]

Transform of input token

required

Returns:

Type Description
CorpusDataset

self

Source code in slp/data/datasets.py
def map(self, t):
    """Append a transform to self.transforms, in order to be applied to the data

    Args:
        t (Callable[[str], Any]): Transform of input token

    Returns:
        CorpusDataset: self
    """
    self.transforms.append(t)
    return self
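
A minimal sketch pairing a TokenizedCorpus (documented above) with labels and the ToTensor transform documented later in this page; data values are illustrative:

from slp.data.corpus import TokenizedCorpus
from slp.data.datasets import CorpusDataset
from slp.data.transforms import ToTensor

corpus = TokenizedCorpus([["the", "cat", "sat"], ["a", "dog", "ran"]])
labels = ["animal", "animal"]

dataset = CorpusDataset(corpus, labels).map(ToTensor())

text, label = dataset[0]  # text: LongTensor of token ids, label: encoded integer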

CorpusLMDataset

__getitem__(self, idx) special

Get a source and target token from the corpus

Parameters:

Name Type Description Default
idx int

Token position

required

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor]

source=corpus[idx], target=corpus[idx+1]

Source code in slp/data/datasets.py
def __getitem__(self, idx):
    """Get a source and target token from the corpus

    Args:
        idx (int): Token position

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: source=corpus[idx], target=corpus[idx+1]
    """
    src, tgt = self.source[idx], self.target[idx]
    for t in self.transforms:
        src = t(src)
        tgt = t(tgt)
    return src, tgt

__init__(self, corpus) special

Wraps a tokenized dataset which is provided as a list of tokens

Targets = source shifted one token to the left (next token prediction)

Parameters:

Name Type Description Default
corpus List[str] or WordCorpus

List of tokens

required
Source code in slp/data/datasets.py
def __init__(self, corpus):
    """Wraps a tokenized dataset which is provided as a list of tokens

    Targets = source shifted one token to the left (next token prediction)

    Args:
        corpus (List[str] or WordCorpus): List of tokens
    """
    self.source = corpus[:-1]
    self.target = corpus[1:]
    self.transforms = []

__len__(self) special

Length of corpus

Returns:

Type Description
int

Corpus Length

Source code in slp/data/datasets.py
def __len__(self):
    """Length of corpus

    Returns:
        int: Corpus Length
    """
    return int(len(self.source))

map(self, t)

Append a transform to self.transforms, in order to be applied to the data

Parameters:

Name Type Description Default
t Callable[[str], Any]

Transform of input token

required

Returns:

Type Description
CorpusLMDataset

self

Source code in slp/data/datasets.py
def map(self, t):
    """Append a transform to self.transforms, in order to be applied to the data

    Args:
        t (Callable[[str], Any]): Transform of input token

    Returns:
        CorpusLMDataset: self
    """
    self.transforms.append(t)
    return self
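
A minimal sketch of the source/target shift; the token list is illustrative:

from slp.data.datasets import CorpusLMDataset

tokens = ["in", "a", "galaxy", "far", "far", "away"]
lm_dataset = CorpusLMDataset(tokens)

src, tgt = lm_dataset[0]  # src == "in", tgt == "a": predict the next token
print(len(lm_dataset))    # 5 (source, target) pairs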

HuggingFaceTokenizer

__call__(self, x) special

Call to tokenize function

Parameters:

Name Type Description Default
x str

Input string

required

Returns:

Type Description
List[int]

List[int]: List of token ids

Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[int]:
    """Call to tokenize function

    Args:
        x (str): Input string

    Returns:
        List[int]: List of token ids
    """
    out: List[int] = self.tokenizer.encode(
        x, add_special_tokens=self.add_special_tokens, max_length=65536
    )
    return out

__init__(self, lower=True, model='bert-base-uncased', add_special_tokens=True) special

Apply one of huggingface tokenizers to a string

Parameters:

Name Type Description Default
lower bool

Lowercase string. Defaults to True.

True
model str

Select transformer model. Defaults to "bert-base-uncased".

'bert-base-uncased'
add_special_tokens bool

Insert special tokens to tokenized string. Defaults to True.

True
Source code in slp/data/transforms.py
def __init__(
    self,
    lower: bool = True,
    model: str = "bert-base-uncased",
    add_special_tokens: bool = True,
):
    """Apply one of huggingface tokenizers to a string

    Args:
        lower (bool): Lowercase string. Defaults to True.
        model (str): Select transformer model. Defaults to "bert-base-uncased".
        add_special_tokens (bool): Insert special tokens to tokenized string. Defaults to True.
    """
    self.tokenizer = AutoTokenizer.from_pretrained(model, do_lower_case=lower)
    self.vocab_size = len(self.tokenizer.vocab)
    self.add_special_tokens = add_special_tokens

detokenize(self, x)

Convert list of token ids to list of tokens

Parameters:

Name Type Description Default
x List[int]

List of token ids

required

Returns:

Type Description
List[str]

List[str]: List of tokens

Source code in slp/data/transforms.py
def detokenize(self, x: List[int]) -> List[str]:
    """Convert list of token ids to list of tokens

    Args:
        x (List[int]): List of token ids

    Returns:
        List[str]: List of tokens
    """
    out: List[str] = self.tokenizer.convert_ids_to_tokens(x)
    return out
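
A round-trip sketch; the pretrained tokenizer is downloaded from the huggingface hub on first use:

from slp.data.transforms import HuggingFaceTokenizer

tokenizer = HuggingFaceTokenizer(model="bert-base-uncased")
ids = tokenizer("In a galaxy far far away")  # List[int], with [CLS] / [SEP] ids added
tokens = tokenizer.detokenize(ids)           # e.g. ['[CLS]', 'in', 'a', 'galaxy', ...]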

ReplaceUnknownToken

__call__(self, x) special

Convert <unk> in list of tokens to [UNK]

Parameters:

Name Type Description Default
x List[str]

List of tokens

required

Returns:

Type Description
List[str]

List[str]: List of tokens

Source code in slp/data/transforms.py
def __call__(self, x: List[str]) -> List[str]:
    """Convert <unk> in list of tokens to [UNK]

    Args:
        x (List[str]): List of tokens

    Returns:
        List[str]: List of tokens
    """
    return [w if w != self.old_unk else self.new_unk for w in x]

__init__(self, old_unk='<unk>', new_unk='[UNK]') special

Replace existing unknown tokens in the vocab with [UNK]. Useful for wikitext

Parameters:

Name Type Description Default
old_unk str

Unk token in corpus. Defaults to "<unk>".

'<unk>'
new_unk str

Desired unk value. Defaults to SPECIAL_TOKENS.UNK.value.

'[UNK]'
Source code in slp/data/transforms.py
def __init__(
    self,
    old_unk: str = "<unk>",
    new_unk: str = SPECIAL_TOKENS.UNK.value,  # type: ignore
):
    """Replace existing unknown tokens in the vocab to [UNK]. Useful for wikitext

    Args:
        old_unk (str): Unk token in corpus. Defaults to "<unk>".
        new_unk (str): Desired unk value. Defaults to SPECIAL_TOKENS.UNK.value.
    """
    self.old_unk = old_unk
    self.new_unk = new_unk
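
Minimal usage, e.g. when reading a wikitext-style corpus:

from slp.data.transforms import ReplaceUnknownToken

replace_unk = ReplaceUnknownToken()
print(replace_unk(["the", "<unk>", "sat"]))  # ['the', '[UNK]', 'sat']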

SentencepieceTokenizer

__call__(self, x) special

Call to tokenize function

Parameters:

Name Type Description Default
x str

Input string

required

Returns:

Type Description
List[int]

List[int]: List of tokens ids

Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[int]:
    """Call to tokenize function

    Args:
        x (str): Input string

    Returns:
        List[int]: List of tokens ids
    """
    if self.lower:
        x = x.lower()
    ids: List[int] = self.pre_id + self.tokenizer.encode_as_ids(x) + self.post_id
    return ids

__init__(self, lower=True, model=None, prepend_bos=False, append_eos=False, specials=<enum 'SPECIAL_TOKENS'>) special

Tokenize sentence using pretrained sentencepiece model

Parameters:

Name Type Description Default
lower bool

Lowercase string. Defaults to True.

True
model Optional[Any]

Sentencepiece model. Defaults to None.

None
prepend_bos bool

Prepend BOS for seq2seq. Defaults to False.

False
append_eos bool

Append EOS for seq2seq. Defaults to False.

False
specials Optional[slp.config.nlp.SPECIAL_TOKENS]

Special tokens. Defaults to SPECIAL_TOKENS.

<enum 'SPECIAL_TOKENS'>
Source code in slp/data/transforms.py
def __init__(
    self,
    lower: bool = True,
    model: Optional[Any] = None,
    prepend_bos: bool = False,
    append_eos: bool = False,
    specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS,  # type: ignore
):
    """Tokenize sentence using pretrained sentencepiece model

    Args:
        lower (bool): Lowercase string. Defaults to True.
        model (Optional[Any]): Sentencepiece model. Defaults to None.
        prepend_bos (bool): Prepend BOS for seq2seq. Defaults to False.
        append_eos (bool): Append EOS for seq2seq. Defaults to False.
        specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
    """
    self.tokenizer = spm.SentencePieceProcessor()
    self.tokenizer.Load(model)
    self.specials = specials
    self.lower = lower
    self.vocab_size = self.tokenizer.get_piece_size()
    self.pre_id = []
    self.post_id = []
    if prepend_bos:
        self.pre_id.append(self.tokenizer.piece_to_id(self.specials.BOS.value))  # type: ignore
    if append_eos:
        self.post_id.append(self.tokenizer.piece_to_id(self.specials.EOS.value))  # type: ignore
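
A hedged sketch; "spm.model" is a placeholder for a pretrained sentencepiece model file trained with the project's special tokens:

from slp.data.transforms import SentencepieceTokenizer

tokenizer = SentencepieceTokenizer(
    model="spm.model",  # placeholder: path to a trained sentencepiece model
    prepend_bos=True,
    append_eos=True,
)
ids = tokenizer("In a galaxy far far away")  # List[int], wrapped in BOS / EOS ids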

SpacyTokenizer

__call__(self, x) special

Call to tokenize function

Parameters:

Name Type Description Default
x str

Input string

required

Returns:

Type Description
List[str]

List[str]: List of tokens

Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[str]:
    """Call to tokenize function

    Args:
        x (str): Input string

    Returns:
        List[str]: List of tokens
    """
    if self.lower:
        x = x.lower()
    out: List[str] = (
        self.pre_id + [y.text for y in self.nlp.tokenizer(x)] + self.post_id
    )
    return out

__init__(self, lower=True, prepend_bos=False, append_eos=False, specials=<enum 'SPECIAL_TOKENS'>, lang='en_core_web_sm') special

Apply spacy tokenizer to str

Parameters:

Name Type Description Default
lower bool

Lowercase string. Defaults to True.

True
prepend_bos bool

Prepend BOS for seq2seq. Defaults to False.

False
append_eos bool

Append EOS for seq2seq. Defaults to False.

False
specials Optional[slp.config.nlp.SPECIAL_TOKENS]

Special tokens. Defaults to SPECIAL_TOKENS.

<enum 'SPECIAL_TOKENS'>
lang str

Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".

'en_core_web_sm'
Source code in slp/data/transforms.py
def __init__(
    self,
    lower: bool = True,
    prepend_bos: bool = False,
    append_eos: bool = False,
    specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS,  # type: ignore
    lang: str = "en_core_web_sm",
):
    """Apply spacy tokenizer to str

    Args:
        lower (bool): Lowercase string. Defaults to True.
        prepend_bos (bool): Prepend BOS for seq2seq. Defaults to False.
        append_eos (bool): Append EOS for seq2seq. Defaults to False.
        specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
        lang (str): Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".
    """
    self.lower = lower
    self.specials = SPECIAL_TOKENS
    self.lang = lang
    self.pre_id = []
    self.post_id = []
    if prepend_bos:
        self.pre_id.append(self.specials.BOS.value)
    if append_eos:
        self.post_id.append(self.specials.EOS.value)
    self.nlp = self.get_nlp(name=lang, specials=specials)

get_nlp(self, name='en_core_web_sm', specials=<enum 'SPECIAL_TOKENS'>)

Get spacy nlp object for given lang and add SPECIAL_TOKENS

Parameters:

Name Type Description Default
name str

Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".

'en_core_web_sm'
specials Optional[slp.config.nlp.SPECIAL_TOKENS]

Special tokens. Defaults to SPECIAL_TOKENS.

<enum 'SPECIAL_TOKENS'>

Returns:

Type Description
Language

spacy.Language: spacy text-processing pipeline

Source code in slp/data/transforms.py
def get_nlp(
    self,
    name: str = "en_core_web_sm",
    specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS,  # type: ignore
) -> spacy.Language:
    """Get spacy nlp object for given lang and add SPECIAL_TOKENS

    Args:
        name (str): Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".
        specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.

    Returns:
        spacy.Language: spacy text-processing pipeline
    """
    nlp = spacy.load(name)
    if specials is not None:
        for token in specials.to_list():
            nlp.tokenizer.add_special_case(token, [{ORTH: token}])
    return nlp
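
Usage sketch; requires the selected spacy model to be installed (e.g. python -m spacy download en_core_web_sm):

from slp.data.transforms import SpacyTokenizer

tokenizer = SpacyTokenizer(lang="en_core_web_sm", prepend_bos=True, append_eos=True)
print(tokenizer("In a galaxy far far away"))
# ['[BOS]', 'in', 'a', 'galaxy', 'far', 'far', 'away', '[EOS]']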

ToTensor

__call__(self, x) special

Convert list of tokens or list of features to tensor

Parameters:

Name Type Description Default
x List[Any]

List of tokens or features

required

Returns:

Type Description
Tensor

torch.Tensor: Resulting tensor

Source code in slp/data/transforms.py
def __call__(self, x: List[Any]) -> torch.Tensor:
    """Convert list of tokens or list of features to tensor

    Args:
        x (List[Any]): List of tokens or features

    Returns:
        torch.Tensor: Resulting tensor
    """
    return mktensor(x, device=self.device, dtype=self.dtype)

__init__(self, device='cpu', dtype=torch.int64) special

To tensor converter

Parameters:

Name Type Description Default
device str

Device to map the tensor. Defaults to "cpu".

'cpu'
dtype dtype

Type of resulting tensor. Defaults to torch.long.

torch.int64
Source code in slp/data/transforms.py
def __init__(self, device: str = "cpu", dtype: torch.dtype = torch.long):
    """To tensor convertor

    Args:
        device (str): Device to map the tensor. Defaults to "cpu".
        dtype (torch.dtype): Type of resulting tensor. Defaults to torch.long.
    """
    self.device = device
    self.dtype = dtype

ToTokenIds

__call__(self, x) special

Convert list of tokens to list of token ids

Parameters:

Name Type Description Default
x List[str]

List of tokens

required

Returns:

Type Description
List[int]

List[int]: List of token ids

Source code in slp/data/transforms.py
def __call__(self, x: List[str]) -> List[int]:
    """Convert list of tokens to list of token ids

    Args:
        x (List[str]): List of tokens

    Returns:
        List[int]: List of token ids
    """
    return [
        self.word2idx[w] if w in self.word2idx else self.word2idx[self.unk_value]
        for w in x
    ]

__init__(self, word2idx, specials=<enum 'SPECIAL_TOKENS'>) special

Convert List of tokens to list of token ids

Parameters:

Name Type Description Default
word2idx Dict[str, int]

Word to index mapping

required
specials Optional[slp.config.nlp.SPECIAL_TOKENS]

Special tokens. Defaults to SPECIAL_TOKENS.

<enum 'SPECIAL_TOKENS'>
Source code in slp/data/transforms.py
def __init__(
    self,
    word2idx: Dict[str, int],
    specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS,  # type: ignore
):
    """Convert List of tokens to list of token ids

    Args:
        word2idx (Dict[str, int]): Word to index mapping
        specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
    """
    self.word2idx = word2idx
    self.unk_value = specials.UNK.value if specials is not None else "[UNK]"  # type: ignore
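
Chaining ToTokenIds with the ToTensor transform documented above; the word2idx mapping is illustrative and out-of-vocabulary words map to the [UNK] index:

import torch

from slp.data.transforms import ToTensor, ToTokenIds

word2idx = {"[UNK]": 0, "the": 1, "cat": 2, "sat": 3}
to_ids = ToTokenIds(word2idx)
to_tensor = ToTensor(dtype=torch.long)

print(to_tensor(to_ids(["the", "dog", "sat"])))  # tensor([1, 0, 3]); "dog" -> [UNK]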

Attention

__init__(self, attention_size=512, input_size=None, dropout=0.1) special

Single-Headed Dot-product attention module

Parameters:

Name Type Description Default
attention_size int

Number of hidden features. Defaults to 512.

512
input_size Optional[int]

Input features. Defaults to None. If None input_size is set to attention_size.

None
dropout float

Drop probability. Defaults to 0.1.

0.1
Source code in slp/modules/attention.py
def __init__(
    self,
    attention_size: int = 512,
    input_size: Optional[int] = None,
    dropout: float = 0.1,
):
    """Single-Headed Dot-product attention module

    Args:
        attention_size (int): Number of hidden features. Defaults to 512.
        input_size (Optional[int]): Input features. Defaults to None.
            If None input_size is set to attention_size.
        dropout (float): Drop probability. Defaults to 0.1.
    """
    super(Attention, self).__init__()

    if input_size is None:
        input_size = attention_size
    self.dk = input_size
    self.k = nn.Linear(input_size, attention_size, bias=False)
    self.q = nn.Linear(input_size, attention_size, bias=False)
    self.v = nn.Linear(input_size, attention_size, bias=False)
    self.dropout = dropout
    reset_parameters(self.named_parameters())

forward(self, keys, queries=None, attention_mask=None)

Single-head scaled dot-product attention forward pass

Outputs the values, where features for each sequence element are weighted by their respective attention scores

\[a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V\]
  • B: Batch size
  • L: Keys Sequence length
  • M: Queries Sequence length
  • H: Number of heads
  • A: Feature dimension

Parameters:

Name Type Description Default
keys Tensor

[B, L, D] Keys tensor

required
queries Optional[torch.Tensor]

Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.

None
attention_mask Optional[torch.Tensor]

Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.

None

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor]

Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])

Source code in slp/modules/attention.py
def forward(
    self,
    keys: torch.Tensor,
    queries: Optional[torch.Tensor] = None,
    attention_mask: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
    r"""Single-head scaled dot-product attention forward pass

    Outputs the values, where features for each sequence element are weighted by their respective attention scores

    $$a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V$$

    * B: Batch size
    * L: Keys Sequence length
    * M: Queries Sequence length
    * H: Number of heads
    * A: Feature dimension

    Args:
        keys (torch.Tensor): [B, L, D] Keys tensor
        queries (Optional[torch.Tensor]): Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.
        attention_mask (Optional[torch.Tensor]): Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])
    """
    if attention_mask is not None:
        if len(list(attention_mask.size())) == 2:
            attention_mask = attention_mask.unsqueeze(1)

    if queries is None:
        queries = keys

    values = keys

    k = self.k(keys)  # (B, L, A)
    q = self.q(queries)
    v = self.v(values)

    # weights => (B, L, L)
    out, scores = attention(
        k,
        q,
        v,
        self.dk,
        attention_mask=attention_mask,
        dropout=self.dropout,
        training=self.training,
    )

    return out, scores
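
A shape-level sketch with random inputs:

import torch

from slp.modules.attention import Attention

att = Attention(attention_size=512, dropout=0.1)
keys = torch.rand(8, 20, 512)                 # (B, L, D)
mask = torch.ones(8, 20)                      # (B, L) zero-one pad mask
out, scores = att(keys, attention_mask=mask)  # out: (8, 20, 512), scores: (8, 20, 20)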

MultiheadAttention

__init__(self, attention_size=512, num_heads=8, input_size=None, dropout=0.1, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None) special

Multi-Headed Dot-product attention module

Parameters:

Name Type Description Default
attention_size int

Number of hidden features. Defaults to 512.

512
num_heads int

Number of attention heads

8
input_size Optional[int]

Input features. Defaults to None. If None input_size is set to attention_size.

None
dropout float

Drop probability. Defaults to 0.1.

0.1
nystrom bool

Use nystrom method for attention calculation. Defaults to False.

False
num_landmarks int

Number of landmark points for nystrom attention. Defaults to 64.

64
inverse_iterations int

Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.

6
kernel_size Optional[int]

Use residual convolution in the output. Defaults to None.

None
Source code in slp/modules/attention.py
def __init__(
    self,
    attention_size: int = 512,
    num_heads: int = 8,
    input_size: Optional[int] = None,
    dropout: float = 0.1,
    nystrom: bool = False,
    num_landmarks: int = 64,
    inverse_iterations: int = 6,
    kernel_size: Optional[int] = None,
):
    """Multi-Headed Dot-product attention module

    Args:
        attention_size (int): Number of hidden features. Defaults to 512.
        num_heads (int): Number of attention heads
        input_size (Optional[int]): Input features. Defaults to None.
            If None input_size is set to attention_size.
        dropout (float): Drop probability. Defaults to 0.1.
        nystrom (bool, optional): Use nystrom method for attention calculation. Defaults to False.
        num_landmarks (int, optional): Number of landmark points for nystrom attention. Defaults to 64.
        inverse_iterations (int, optional): Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.
        kernel_size (Optional[int], optional): Use residual convolution in the output. Defaults to None.
    """
    super(MultiheadAttention, self).__init__()

    if input_size is None:
        input_size = attention_size
    self.inverse_iterations = inverse_iterations
    self.num_landmarks = num_landmarks
    self.nystrom = nystrom
    self.num_heads = num_heads
    self.head_size = int(attention_size / num_heads)
    self.dk = self.head_size
    self.attention_size = attention_size
    self.k = nn.Linear(input_size, attention_size, bias=False)
    self.q = nn.Linear(input_size, attention_size, bias=False)
    self.v = nn.Linear(input_size, attention_size, bias=False)
    self.output = nn.Linear(attention_size, attention_size)
    self.dropout = dropout

    self.conv = None

    if kernel_size is not None:
        self.conv = nn.Conv2d(
            in_channels=self.num_heads,
            out_channels=self.num_heads,
            kernel_size=(kernel_size, 1),
            padding=(kernel_size // 2, 0),
            bias=False,
            groups=self.num_heads,
        )

    reset_parameters(self.named_parameters())

forward(self, keys, queries=None, attention_mask=None)

Multi-head scaled dot-product attention forward pass

Outputs the values, where features for each sequence element are weighted by their respective attention scores

Each head performs dot-product attention

\[a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H\]

The outputs of multiple heads are concatenated and passed through a feedforward layer.

\[a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b\]
  • B: Batch size
  • L: Keys Sequence length
  • M: Queries Sequence length
  • H: Number of heads
  • A: Feature dimension

Parameters:

Name Type Description Default
keys torch.Tensor

[B, L, D] Keys tensor

required
queries Optional[torch.Tensor]

Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.

None
attention_mask Optional[torch.Tensor]

Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.

None

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor]

(Reweighted values [B, L, D], attention scores [B, H, M, L])

Source code in slp/modules/attention.py
def forward(self, keys, queries=None, attention_mask=None):
    r"""Multi-head scaled dot-product attention forward pass

    Outputs the values, where features for each sequence element are weighted by their respective attention scores

    Each head performs dot-product attention

    $$a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H$$

    The outputs of multiple heads are concatenated and passed through a feedforward layer.

    $$a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b$$


    * B: Batch size
    * L: Keys Sequence length
    * M: Queries Sequence length
    * H: Number of heads
    * A: Feature dimension


    Args:
        keys (torch.Tensor): [B, L, D] Keys tensor
        queries (Optional[torch.Tensor]): Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.
        attention_mask (Optional[torch.Tensor]): Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, H, M, L])
    """
    _, seq_length, _ = keys.size()

    if attention_mask is not None:
        if attention_mask.ndim == 2:
            attention_mask = attention_mask.unsqueeze(1)
        attention_mask = attention_mask.unsqueeze(1)

    if self.nystrom:
        keys, attention_mask = pad_for_nystrom(
            keys, self.num_landmarks, attention_mask=attention_mask
        )

    if queries is None:
        queries = keys

    values = keys

    k = self.k(keys)
    q = self.q(queries)
    v = self.v(values)
    k = split_heads(k, self.num_heads)
    q = split_heads(q, self.num_heads)
    v = split_heads(v, self.num_heads)

    if self.nystrom:
        # out = (B, H, L, A/H)
        # scores = Tuple
        out, scores = nystrom_attention(
            k,
            q,
            v,
            self.dk,
            self.num_landmarks,
            attention_mask=attention_mask,
            inverse_iterations=self.inverse_iterations,
            dropout=self.dropout,
            training=self.training,
        )
    else:
        # out => (B, H, L, A/H)
        # scores => (B, H, L, L)
        out, scores = attention(
            k,
            q,
            v,
            self.dk,
            attention_mask=attention_mask,
            dropout=self.dropout,
            training=self.training,
        )

    if self.conv is not None:
        if attention_mask is None or attention_mask.ndim > 2:
            out += self.conv(v)
        else:
            attention_mask = attention_mask.squeeze()
            out += self.conv(v * attention_mask[:, None, :, None])

    # out => (B, H, L, A/H)
    out = merge_heads(out)
    if out.size(1) != seq_length:
        out = out[:, :seq_length, :]
    out = self.output(out)

    return out, scores
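
A self-attention shape sketch (queries default to the keys):

import torch

from slp.modules.attention import MultiheadAttention

mha = MultiheadAttention(attention_size=512, num_heads=8)
x = torch.rand(4, 20, 512)   # (B, L, D)
out, scores = mha(x)         # out: (4, 20, 512), scores: (4, 8, 20, 20)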

MultiheadSelfAttention

__init__(self, attention_size=512, num_heads=8, input_size=None, dropout=0.1, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None) special

Multi-Headed Dot-product attention module

Parameters:

Name Type Description Default
attention_size int

Number of hidden features. Defaults to 512.

512
num_heads int

Number of attention heads

8
input_size Optional[int]

Input features. Defaults to None. If None input_size is set to attention_size.

None
dropout float

Drop probability. Defaults to 0.1.

0.1
Source code in slp/modules/attention.py
def __init__(
    self,
    attention_size: int = 512,
    num_heads: int = 8,
    input_size: Optional[int] = None,
    dropout: float = 0.1,
    nystrom: bool = False,
    num_landmarks: int = 64,
    inverse_iterations: int = 6,
    kernel_size: Optional[int] = None,
):
    """Multi-Headed Dot-product attention module

    Args:
        attention_size (int): Number of hidden features. Defaults to 512.
        num_heads (int): Number of attention heads
        input_size (Optional[int]): Input features. Defaults to None.
            If None input_size is set to attention_size.
        dropout (float): Drop probability. Defaults to 0.1.
    """
    super(MultiheadSelfAttention, self).__init__()

    if input_size is None:
        input_size = attention_size
    self.inverse_iterations = inverse_iterations
    self.num_landmarks = num_landmarks
    self.nystrom = nystrom
    self.num_heads = num_heads
    self.head_size = int(attention_size / num_heads)
    self.dk = self.head_size
    self.attention_size = attention_size
    self.kqv = nn.Linear(input_size, 3 * attention_size, bias=False)
    self.output = nn.Linear(attention_size, attention_size)
    self.dropout = dropout

    self.conv = None

    if kernel_size is not None:
        self.conv = nn.Conv2d(
            in_channels=self.num_heads,
            out_channels=self.num_heads,
            kernel_size=(kernel_size, 1),
            padding=(kernel_size // 2, 0),
            bias=False,
            groups=self.num_heads,
        )

    reset_parameters(self.named_parameters())

forward(self, x, attention_mask=None)

Multi-head scaled dot-product attention forward pass

Outputs the values, where features for each sequence element are weighted by their respective attention scores

Each head performs dot-product attention

\[a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H\]

The outputs of multiple heads are concatenated and passed through a feedforward layer.

\[a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b\]
  • B: Batch size
  • L: Keys Sequence length
  • M: Queries Sequence length
  • H: Number of heads
  • A: Feature dimension

Parameters:

Name Type Description Default
x torch.Tensor

[B, L, D] Keys tensor

required
attention_mask Optional[torch.Tensor]

Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.

None

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor]

(Reweighted values [B, L, D], attention scores [B, H, M, L])

Source code in slp/modules/attention.py
def forward(self, x, attention_mask=None):
    r"""Multi-head scaled dot-product attention forward pass

    Outputs the values, where features for each sequence element are weighted by their respective attention scores

    Each head performs dot-product attention

    $$a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H$$

    The outputs of multiple heads are concatenated and passed through a feedforward layer.

    $$a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b$$


    * B: Batch size
    * L: Keys Sequence length
    * M: Queries Sequence length
    * H: Number of heads
    * A: Feature dimension


    Args:
        x (torch.Tensor): [B, L, D] Keys tensor
        attention_mask (Optional[torch.Tensor]): Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, H, M, L])
    """
    _, seq_length, _ = x.size()

    if attention_mask is not None:
        if attention_mask.ndim == 2:
            attention_mask = attention_mask.unsqueeze(1)
        attention_mask = attention_mask.unsqueeze(1)

    if self.nystrom:
        x, attention_mask = pad_for_nystrom(
            x, self.num_landmarks, attention_mask=attention_mask
        )

    k, q, v = self.kqv(x).chunk(3, dim=-1)
    k = split_heads(k, self.num_heads)
    q = split_heads(q, self.num_heads)
    v = split_heads(v, self.num_heads)

    if self.nystrom:
        # out = (B, H, L, A/H)
        # scores = Tuple
        out, scores = nystrom_attention(
            k,
            q,
            v,
            self.dk,
            self.num_landmarks,
            attention_mask=attention_mask,
            inverse_iterations=self.inverse_iterations,
            dropout=self.dropout,
            training=self.training,
        )
    else:
        # out => (B, H, L, A/H)
        # scores => (B, H, L, L)
        out, scores = attention(
            k,
            q,
            v,
            self.dk,
            attention_mask=attention_mask,
            dropout=self.dropout,
            training=self.training,
        )

    if self.conv is not None:
        if attention_mask is None or attention_mask.ndim > 2:
            out = out + self.conv(v)
        else:
            attention_mask = attention_mask.squeeze()
            out = out + self.conv(v * attention_mask[:, None, :, None])

    # out => (B, H, L, A/H)
    out = merge_heads(out)
    if out.size(1) != seq_length:
        out = out[:, -seq_length:, :]
    out = self.output(out)

    return out, scores

MultiheadTwowayAttention

__init__(self, attention_size=512, input_size=None, dropout=0.1, num_heads=8, residual=True, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None) special

Multihead twoway attention for multimodal fusion

This module performs two way attention for two input modality feature sequences. If att is the MultiheadAttention operation and x, y the input modality sequences, the operation is summarized as

\[out = (att(x \rightarrow y), att(y \rightarrow x))\]

If residual is True then a Vilbert-like residual connection is applied

\[out = (att(x \rightarrow y) + x, att(y \rightarrow x) + y)\]

Parameters:

Name Type Description Default
attention_size int

Number of hidden features. Defaults to 512.

512
num_heads int

Number of attention heads

8
input_size Optional[int]

Input features. Defaults to None. If None input_size is set to attention_size.

None
dropout float

Drop probability. Defaults to 0.1.

0.1
nystrom bool

Use nystrom method for attention calculation. Defaults to False.

False
num_landmarks int

Number of landmark points for nystrom attention. Defaults to 64.

64
inverse_iterations int

Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.

6
kernel_size Optional[int]

Use residual convolution in the output. Defaults to None.

None
residual bool

Use vilbert-like residual connections for fusion. Defaults to True.

True
Source code in slp/modules/attention.py
def __init__(
    self,
    attention_size: int = 512,
    input_size: Optional[int] = None,
    dropout: float = 0.1,
    num_heads: int = 8,
    residual: bool = True,
    nystrom: bool = False,
    num_landmarks: int = 64,
    inverse_iterations: int = 6,
    kernel_size: Optional[int] = None,
):
    r"""Multihead twoway attention for multimodal fusion

    This module performs two way attention for two input modality feature sequences.
    If att is the MultiheadAttention operation and x, y the input modality sequences,
    the operation is summarized as

    $$out = (att(x \rightarrow y), att(y \rightarrow x))$$

    If residual is True then a Vilbert-like residual connection is applied

    $$out = (att(x \rightarrow y) + x, att(y \rightarrow x) + y)$$


    Args:
        attention_size (int): Number of hidden features. Defaults to 512.
        num_heads (int): Number of attention heads
        input_size (Optional[int]): Input features. Defaults to None.
            If None input_size is set to attention_size.
        dropout (float): Drop probability. Defaults to 0.1.
        nystrom (bool, optional): Use nystrom method for attention calculation. Defaults to False.
        num_landmarks (int, optional): Number of landmark points for nystrom attention. Defaults to 64.
        inverse_iterations (int, optional): Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.
        kernel_size (Optional[int], optional): Use residual convolution in the output. Defaults to None.
        residual (bool, optional): Use vilbert-like residual connections for fusion. Defaults to True.
    """
    super(MultiheadTwowayAttention, self).__init__()

    self.xy = MultiheadAttention(
        attention_size=attention_size,
        input_size=input_size,
        dropout=dropout,
        num_heads=num_heads,
        nystrom=nystrom,
        num_landmarks=num_landmarks,
        inverse_iterations=inverse_iterations,
        kernel_size=kernel_size,
    )
    self.yx = MultiheadAttention(
        attention_size=attention_size,
        input_size=input_size,
        dropout=dropout,
        num_heads=num_heads,
        nystrom=nystrom,
        num_landmarks=num_landmarks,
        inverse_iterations=inverse_iterations,
        kernel_size=kernel_size,
    )
    self.residual = residual

forward(self, mod1, mod2, attention_mask=None)

mod1 : (B, L, D) first modality feature sequence mod2 : (B, L, D) second modality feature sequence

Source code in slp/modules/attention.py
def forward(self, mod1, mod2, attention_mask=None):
    """
    mod1 : (B, L, D) first modality feature sequence
    mod2 : (B, L, D) second modality feature sequence
    """
    out_mod1, _ = self.xy(mod1, queries=mod2, attention_mask=attention_mask)
    out_mod2, _ = self.yx(mod2, queries=mod1, attention_mask=attention_mask)

    if not self.residual:
        return out_mod1, out_mod2
    else:
        # vilbert cross residual

        # v + attention(v->a)
        # a + attention(a->v)
        out_mod1 += mod2
        out_mod2 += mod1

        return out_mod1, out_mod2
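
A fusion sketch for two modality sequences of equal length and dimension (e.g. text and audio features); the tensors are random placeholders:

import torch

from slp.modules.attention import MultiheadTwowayAttention

fuse = MultiheadTwowayAttention(attention_size=256, num_heads=4, residual=True)
text = torch.rand(2, 30, 256)    # (B, L, D)
audio = torch.rand(2, 30, 256)   # (B, L, D)
fused_text, fused_audio = fuse(text, audio)  # both (2, 30, 256)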

SelfAttention

__init__(self, attention_size=512, input_size=None, dropout=0.1) special

Single-Headed Dot-product self attention module

Parameters:

Name Type Description Default
attention_size int

Number of hidden features. Defaults to 512.

512
input_size Optional[int]

Input features. Defaults to None. If None input_size is set to attention_size.

None
dropout float

Drop probability. Defaults to 0.1.

0.1
Source code in slp/modules/attention.py
def __init__(
    self,
    attention_size: int = 512,
    input_size: Optional[int] = None,
    dropout: float = 0.1,
):
    """Single-Headed Dot-product self attention module

    Args:
        attention_size (int): Number of hidden features. Defaults to 512.
        input_size (Optional[int]): Input features. Defaults to None.
            If None input_size is set to attention_size.
        dropout (float): Drop probability. Defaults to 0.1.
    """
    super(SelfAttention, self).__init__()

    if input_size is None:
        input_size = attention_size
    self.dk = input_size
    self.kqv = nn.Linear(input_size, 3 * attention_size, bias=False)
    self.dropout = dropout
    reset_parameters(self.named_parameters())

forward(self, x, attention_mask=None)

Single-head scaled dot-product attention forward pass

Outputs the values, where features for each sequence element are weighted by their respective attention scores

\[a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V\]
  • B: Batch size
  • L: Keys Sequence length
  • M: Queries Sequence length
  • H: Number of heads
  • A: Feature dimension

Parameters:

Name Type Description Default
x Tensor

[B, L, D] Input tensor

required
attention_mask Optional[torch.Tensor]

Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.

None

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor]

Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])

Source code in slp/modules/attention.py
def forward(
    self,
    x: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
    r"""Single-head scaled dot-product attention forward pass

    Outputs the values, where features for each sequence element are weighted by their respective attention scores

    $$a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V$$

    * B: Batch size
    * L: Keys Sequence length
    * M: Queries Sequence length
    * H: Number of heads
    * A: Feature dimension

    Args:
        x (torch.Tensor): [B, L, D] Input tensor
        attention_mask (Optional[torch.Tensor]): Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])
    """
    if attention_mask is not None:
        if len(list(attention_mask.size())) == 2:
            attention_mask = attention_mask.unsqueeze(1)

    k, q, v = self.kqv(x).chunk(3, dim=-1)  # (B, L, A)

    # weights => (B, L, L)
    out, scores = attention(
        k,
        q,
        v,
        self.dk,
        attention_mask=attention_mask,
        dropout=self.dropout,
        training=self.training,
    )

    return out, scores

attention(k, q, v, dk, attention_mask=None, dropout=0.2, training=True)

Reweight values using scaled dot product attention

\[s = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) V\]
  • B: Batch size
  • L: Keys Sequence length
  • M: Queries Sequence length
  • H: Number of heads
  • A: Feature dimension

Parameters:

Name Type Description Default
k Tensor

Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor

required
q Tensor

Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor

required
v Tensor

Single head [B, L, A] or multi-head [B, H, L, A/H] Values tensor

required
dk int

Model dimension

required
attention_mask Optional[torch.Tensor]

Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None.

None
dropout float

Drop probability. Defaults to 0.2.

0.2
training bool

Is module in training phase? Defaults to True.

True

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor]

Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, M, A] or [B, H, M, A/H], attention scores [B, M, L] or [B, H, M, L])

Source code in slp/modules/attention.py
def attention(
    k: torch.Tensor,
    q: torch.Tensor,
    v: torch.Tensor,
    dk: int,
    attention_mask: Optional[torch.Tensor] = None,
    dropout: float = 0.2,
    training: bool = True,
):
    r"""Reweight values using scaled dot product attention

    $$s = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) V$$

    * B: Batch size
    * L: Keys Sequence length
    * M: Queries Sequence length
    * H: Number of heads
    * A: Feature dimension

    Args:
        k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
        q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
        v (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor
        dk (int): Model dimension
        attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
            tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
            preserved. Defaults to None.
        dropout (float): Drop probability. Defaults to 0.2.
        training (bool): Is module in training phase? Defaults to True.

    Returns:
        Tuple[torch.Tensor, torch.Tensor]: (Reweighted values, [B, M, L] or [B, H, M, L] attention scores)
    """

    scores = attention_scores(
        k, q, dk, attention_mask=attention_mask, dropout=dropout, training=training
    )
    out = torch.matmul(scores, v)

    return out, scores
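
A functional usage sketch with random single-head tensors (shapes are illustrative; dropout is disabled to keep the call deterministic):

import torch

from slp.modules.attention import attention

B, L, A = 4, 10, 32
k, q, v = torch.rand(B, L, A), torch.rand(B, L, A), torch.rand(B, L, A)
out, scores = attention(k, q, v, A, dropout=0.0, training=False)
print(out.shape, scores.shape)  # torch.Size([4, 10, 32]) torch.Size([4, 10, 10])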

attention_scores(k, q, dk, attention_mask=None, dropout=0.2, training=True)

Calculate attention scores for scaled dot product attention

\[s = softmax(\frac{Q \cdot K^T}{\sqrt{d}})\]
  • B: Batch size
  • L: Keys Sequence length
  • M: Queries Sequence length
  • H: Number of heads
  • A: Feature dimension

Parameters:

Name Type Description Default
k Tensor

Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor

required
q Tensor

Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor

required
dk int

Model dimension

required
attention_mask Optional[torch.Tensor]

Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None.

None
dropout float

Drop probability. Defaults to 0.2.

0.2
training bool

Is module in training phase? Defaults to True.

True

Returns:

Type Description
Tensor

torch.Tensor: [B, M, L] or [B, H, M, L] attention scores

Source code in slp/modules/attention.py
def attention_scores(
    k: torch.Tensor,
    q: torch.Tensor,
    dk: int,
    attention_mask: Optional[torch.Tensor] = None,
    dropout: float = 0.2,
    training: bool = True,
) -> torch.Tensor:
    r"""Calculate attention scores for scaled dot product attention

    $$s = softmax(\frac{Q \cdot K^T}{\sqrt{d}})$$

    * B: Batch size
    * L: Keys Sequence length
    * M: Queries Sequence length
    * H: Number of heads
    * A: Feature dimension

    Args:
        k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
        q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
        dk (int): Model dimension
        attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
            tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
            preserved. Defaults to None.
        dropout (float): Drop probability. Defaults to 0.2.
        training (bool): Is module in training phase? Defaults to True.

    Returns:
        torch.Tensor: [B, M, L] or [B, H, M, L] attention scores
    """
    scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(dk)

    if attention_mask is not None:
        scores = scores + ((1 - attention_mask) * -1e5)
    scores = F.softmax(scores, dim=-1)
    scores = F.dropout(scores, p=dropout, training=training)

    return scores
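
The effect of the mask can be checked directly: masked key positions receive a large negative additive penalty before the softmax, so they get (almost) no probability mass. A small sketch with illustrative shapes:

import torch

from slp.modules.attention import attention_scores

k, q = torch.rand(2, 6, 16), torch.rand(2, 6, 16)
mask = torch.ones(2, 1, 6)
mask[:, :, 4:] = 0  # mask out the last two key positions
scores = attention_scores(k, q, 16, attention_mask=mask, dropout=0.0)
print(scores[..., 4:].abs().max())  # ~0: masked keys receive (almost) no attention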

merge_heads(x)

Merge multiple attention heads into output tensor

(Batch size, Heads, Lengths, Attention size / Heads) => (Batch size, Length, Attention size)

Parameters:

Name Type Description Default
x Tensor

[B, H, L, A/H] multi-head tensor

required

Returns:

Type Description
Tensor

torch.Tensor: [B, L, A] merged / reshaped tensor

Source code in slp/modules/attention.py
def merge_heads(x: torch.Tensor) -> torch.Tensor:
    """Merge multiple attention heads into output tensor

    (Batch size, Heads, Lengths, Attention size / Heads) => (Batch size, Length, Attention size)

    Args:
        x (torch.Tensor): [B, H, L, A/H] multi-head tensor

    Returns:
        torch.Tensor:  [B, L, A] merged / reshaped tensor
    """
    batch_size, _, max_length, _ = x.size()
    # x => (B, L, H, A/H)
    x = x.permute(0, 2, 1, 3).contiguous()

    return x.view(batch_size, max_length, -1)

nystrom_attention(k, q, v, dk, num_landmarks, attention_mask=None, inverse_iterations=6, dropout=0.2, training=True)

Calculate attention using nystrom approximation

Implementation heavily based on: https://github.com/lucidrains/nystrom-attention

Reference: https://arxiv.org/abs/2102.03902

  • B: Batch size
  • L: Keys Sequence length
  • M: Queries Sequence length
  • H: Number of heads
  • A: Feature dimension

Parameters:

Name Type Description Default
k Tensor

Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor

required
q Tensor

Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor

required
v Tensor

Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor

required
dk int

Model dimension

required
num_landmarks int

Number of landmark points

required
attention_mask Optional[torch.Tensor]

Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None.

None
inverse_iterations int

Number of iterations for Moore Penrose iterative inverse approximation

6
dropout float

Drop probability. Defaults to 0.2.

0.2
training bool

Is module in training phase? Defaults to True.

True

Returns:

Type Description
Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]

(Reweighted values [B, H, M, A/H], intermediate Nystrom attention matrices (scores_1, scores_2, scores_3))

Source code in slp/modules/attention.py
def nystrom_attention(
    k: torch.Tensor,
    q: torch.Tensor,
    v: torch.Tensor,
    dk: int,
    num_landmarks: int,
    attention_mask: Optional[torch.Tensor] = None,
    inverse_iterations: int = 6,
    dropout: float = 0.2,
    training: bool = True,
):
    """Calculate attention using nystrom approximation

    Implementation heavily based on: https://github.com/lucidrains/nystrom-attention

    Reference: https://arxiv.org/abs/2102.03902
    * B: Batch size
    * L: Keys Sequence length
    * M: Queries Sequence length
    * H: Number of heads
    * A: Feature dimension

    Args:
        k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
        q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
        v (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor
        dk (int): Model dimension
        num_landmarks (int): Number of landmark points
        attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
            tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
            preserved. Defaults to None.
        inverse_iterations (int): Number of iterations for Moore Penrose iterative inverse
            approximation
        dropout (float): Drop probability. Defaults to 0.2.
        training (bool): Is module in training phase? Defaults to True.

    Returns:
        Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]: (Reweighted values, intermediate Nystrom attention matrices (scores_1, scores_2, scores_3))
    """
    _, num_heads, seq_length, head_size = k.size()

    masked_mean_denom = seq_length // num_landmarks
    # Default to unmasked attention when no attention_mask is given
    scores_1_mask = scores_2_mask = scores_3_mask = None

    if attention_mask is not None:
        attention_mask = attention_mask.unsqueeze(1)
        masked_mean_denom = (
            attention_mask.reshape(-1, 1, num_landmarks, seq_length // num_landmarks).sum(-1) + 1e-8  # type: ignore
        )  # (B, 1, Landmarks)
        mask_landmarks = (masked_mean_denom > 0).type(torch.float)  # type: ignore
        masked_mean_denom = masked_mean_denom[..., None]  # type: ignore
        attention_mask = attention_mask.unsqueeze(-1)
        q = q * attention_mask  # (B, H, L, A/H)
        k = k * attention_mask  # (B, H, L, A/H)
        v = v * attention_mask  # (B, H, L, A/H)

        scores_1_mask = attention_mask * mask_landmarks[..., None, :]
        scores_2_mask = mask_landmarks[..., None] * mask_landmarks[..., None, :]
        scores_3_mask = scores_1_mask.transpose(-1, -2)

    q = q / math.sqrt(dk)

    q_landmarks = q.reshape(
        q.size(0),  # batch_size
        q.size(1),  # num_heads
        num_landmarks,  # landmarks
        seq_length // num_landmarks,  # reduced length
        q.size(-1),  # head_size
    ).sum(
        dim=-2
    )  # (B, H, Landmarks, A/H)

    k_landmarks = k.reshape(
        k.size(0),  # batch_size
        k.size(1),  # num_heads
        num_landmarks,  # landmarks
        seq_length // num_landmarks,  # reduced length
        k.size(-1),  # head size
    ).sum(
        dim=-2
    )  # (B, H, Landmarks, A/H)

    k_landmarks = k_landmarks / masked_mean_denom
    q_landmarks = q_landmarks / masked_mean_denom

    scores_1 = attention_scores(
        k_landmarks,
        q,
        1,  # We have already accounted for dk
        attention_mask=scores_1_mask,
        dropout=dropout,
        training=training,
    )

    scores_2 = attention_scores(
        k_landmarks,
        q_landmarks,
        1,  # We have already accounted for dk
        attention_mask=scores_2_mask,
        dropout=dropout,
        training=training,
    )

    scores_3 = attention_scores(
        k,
        q_landmarks,
        1,  # We have already accounted for dk
        attention_mask=scores_3_mask,
        dropout=dropout,
        training=training,
    )

    z_star = moore_penrose_pinv(scores_2, num_iter=inverse_iterations)
    out = (scores_1 @ z_star) @ (scores_3 @ v)

    return out, (scores_1, scores_2, scores_3)
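
A sketch of a multi-head call, assuming the split_heads and merge_heads helpers documented below; the sequence length is chosen as a multiple of num_landmarks (see pad_for_nystrom below) and all sizes are illustrative:

import torch

from slp.modules.attention import merge_heads, nystrom_attention, split_heads

B, L, A, H = 2, 64, 128, 4                         # L is a multiple of num_landmarks
x = torch.rand(B, L, A)
k, q, v = (split_heads(t, H) for t in (x, x, x))   # (B, H, L, A/H) each
mask = torch.ones(B, L)                            # 1 = keep, 0 = pad
out, _ = nystrom_attention(
    k, q, v, A // H, 32, attention_mask=mask, dropout=0.0, training=False
)
out = merge_heads(out)                             # back to (B, L, A)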

pad_for_nystrom(x, num_landmarks, attention_mask=None)

Pad inputs and attention_mask to perform Nystrom Attention

Pad to nearest multiple of num_landmarks

Parameters:

Name Type Description Default
x Tensor

[B, L, A] Input tensor

required
num_landmarks int

Number of landmark points

required
attention_mask Optional[torch.Tensor]

[B, L] Padding mask

None

Returns:

Type Description
Tuple[torch.Tensor, Optional[torch.Tensor]]

Tuple[torch.Tensor, Optional[torch.Tensor]]: Padded inputs and attention_mask

Source code in slp/modules/attention.py
def pad_for_nystrom(
    x: torch.Tensor, num_landmarks: int, attention_mask: Optional[torch.Tensor] = None
) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    """Pad inputs and attention_mask to perform Nystrom Attention

    Pad to nearest multiple of num_landmarks

    Args:
        x (torch.Tensor): [B, L, A] Input tensor
        num_landmarks (int): Number of landmark points
        attention_mask (Optional[torch.Tensor]): [B, L] Padding mask

    Returns:
        Tuple[torch.Tensor, Optional[torch.Tensor]]: Padded inputs and attention_mask
    """
    if attention_mask is not None:
        attention_mask = attention_mask.squeeze()

    _, seq_length, _ = x.size()

    _, remainder = (
        math.ceil(seq_length / num_landmarks),
        seq_length % num_landmarks,
    )

    if remainder > 0:
        padding = num_landmarks - remainder
        x = F.pad(x, (0, 0, padding, 0), value=0)

        if attention_mask is not None:
            attention_mask = F.pad(attention_mask, (padding, 0))

    return x, attention_mask
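
A usage sketch with illustrative shapes, showing how a sequence that is not a multiple of num_landmarks gets padded:

import torch

from slp.modules.attention import pad_for_nystrom

x = torch.rand(2, 50, 64)    # 50 is not a multiple of num_landmarks
mask = torch.ones(2, 50)
x_pad, mask_pad = pad_for_nystrom(x, num_landmarks=32, attention_mask=mask)
print(x_pad.shape, mask_pad.shape)  # torch.Size([2, 64, 64]) torch.Size([2, 64])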

reset_parameters(named_parameters)

Initialize parameters in the transformer model.

Source code in slp/modules/attention.py
def reset_parameters(named_parameters):
    """Initialize parameters in the transformer model."""

    for name, p in named_parameters:
        if "weight" in name:
            nn.init.xavier_normal_(p)

        if "bias" in name:
            nn.init.constant_(p, 0.0)

split_heads(x, num_heads)

Split input tensor into multiple attention heads

(Batch size, Length, Attention size) => (Batch size, Heads, Lengths, Attention size / Heads)

Parameters:

Name Type Description Default
x Tensor

[B, L, A] input tensor

required
num_heads int

number of heads

required

Returns:

Type Description
Tensor

torch.Tensor: [B, H, L, A/H] Split / reshaped tensor

Source code in slp/modules/attention.py
def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Split input tensor into multiple attention heads

    (Batch size, Length, Attention size) => (Batch size, Heads, Lengths, Attention size / Heads)

    Args:
        x (torch.Tensor): [B, L, A] input tensor
        num_heads (int): number of heads

    Returns:
        torch.Tensor: [B, H, L, A/H] Splitted / reshaped tensor
    """
    batch_size, max_length, attention_size = x.size()
    head_size = int(attention_size / num_heads)

    return x.view(batch_size, max_length, num_heads, head_size).permute(0, 2, 1, 3)
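
split_heads and merge_heads are exact inverses of each other, which can be verified with a quick round trip (illustrative shapes):

import torch

from slp.modules.attention import merge_heads, split_heads

x = torch.rand(4, 10, 64)              # (B, L, A)
heads = split_heads(x, num_heads=8)    # (B, H, L, A/H) = (4, 8, 10, 8)
restored = merge_heads(heads)          # back to (B, L, A)
assert torch.equal(restored, x)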

Classifier

__init__(self, encoder, encoded_features, num_classes, dropout=0.2) special

Classifier wrapper module

Stores a Neural Network encoder and adds a classification layer on top.

Parameters:

Name Type Description Default
encoder Module

Encoder module that maps the inputs to feature vectors

required
encoded_features int

Number of output features produced by the encoder

required
num_classes int

Number of target classes

required
dropout float

Drop probability

0.2
Source code in slp/modules/classifier.py
def __init__(
    self,
    encoder: nn.Module,
    encoded_features: int,
    num_classes: int,
    dropout: float = 0.2,
):
    """Classifier wrapper module

    Stores a Neural Network encoder and adds a classification layer on top.

    Args:
        encoder (nn.Module): [description]
        encoded_features (int): [description]
        num_classes (int): [description]
        dropout (float): Drop probability
    """
    super(Classifier, self).__init__()
    self.encoder = encoder
    self.drop = nn.Dropout(dropout)
    self.clf = nn.Linear(encoded_features, num_classes)

forward(self, *args, **kwargs)

Encode inputs using the encoder network and perform classification

Returns:

Type Description
Tensor

torch.Tensor: [B, *, num_classes] Logits tensor

Source code in slp/modules/classifier.py
def forward(self, *args, **kwargs) -> torch.Tensor:
    """Encode inputs using the encoder network and perform classification

    Returns:
        torch.Tensor: [B, *, num_classes] Logits tensor
    """
    encoded: torch.Tensor = self.encoder(*args, **kwargs)  # type: ignore
    out: torch.Tensor = self.drop(encoded)
    out = self.clf(out)

    return out
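
Any encoder module that maps its inputs to a fixed-size feature vector can be wrapped. A minimal sketch using a toy nn.Sequential encoder (feature sizes are illustrative):

import torch
import torch.nn as nn

from slp.modules.classifier import Classifier

encoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU())
clf = Classifier(encoder, encoded_features=128, num_classes=5, dropout=0.2)
logits = clf(torch.rand(16, 32))  # (16, 5)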

MOSEITextClassifier

forward(self, x, lengths)

Select the text modality from the input and length dictionaries and delegate to the parent class forward pass for encoding and classification

Returns:

Type Description
torch.Tensor

[B, *, num_classes] Logits tensor

Source code in slp/modules/classifier.py
def forward(self, x, lengths):
    x = x["text"]
    lengths = lengths["text"]

    return super().forward(x, lengths)

RNNLateFusionClassifier

forward(self, inputs, lengths)

Encode each modality with its RNN encoder, optionally apply multimodal dropout, concatenate the encoded modalities, apply dropout and classify the fused representation.

Source code in slp/modules/classifier.py
def forward(self, inputs, lengths):
    encoded = [
        self.modality_encoders[m](inputs[m], lengths[m]) for m in self.modalities
    ]
    if self.mmdrop is not None:
        encoded = self.mmdrop(*encoded)
    fused = torch.cat(encoded, dim=-1)
    fused = self.drop(fused)
    out = self.clf(fused)

    return out

TransformerLateFusionClassifier

forward(self, inputs, attention_masks=None)

Encode each modality with its transformer encoder (using the corresponding attention mask, if provided), optionally apply multimodal dropout, concatenate the encoded modalities, optionally apply modality dropout and classify the fused representation.

Source code in slp/modules/classifier.py
def forward(self, inputs, attention_masks=None):
    if attention_masks is None:
        attention_masks = dict(
            zip(self.modalities, [None for _ in self.modalities])
        )

    encoded = [
        self.modality_encoders[m](inputs[m], attention_mask=attention_masks[m])
        for m in self.modalities
    ]

    if self.mmdrop is not None:
        encoded = self.mmdrop(*encoded)
    fused = torch.cat(encoded, dim=-1)
    if self.modality_drop is not None:
        fused = self.modality_drop(fused)

    out = self.clf(fused)

    return out

Embed

__init__(self, num_embeddings, embedding_dim, embeddings=None, noise=0.0, dropout=0.0, scale=1.0, trainable=False) special

Define the embedding layer of the model and perform the necessary initializations (optionally from a pretrained embedding matrix)

Parameters:

Name Type Description Default
num_embeddings int

Total number of embeddings.

required
embedding_dim int

Embedding dimension.

required
embeddings Optional[numpy.ndarray]

the 2D ndarray with the word vectors.

None
noise float

Optional additive noise. Defaults to 0.0.

0.0
dropout float

Embedding dropout probability. Defaults to 0.0.

0.0
scale float

Scale word embeddings by a constant. Defaults to 1.0.

1.0
trainable bool

Finetune embeddings. Defaults to False

False
Source code in slp/modules/embed.py
def __init__(
    self,
    num_embeddings: int,
    embedding_dim: int,
    embeddings: Optional[np.ndarray] = None,
    noise: float = 0.0,
    dropout: float = 0.0,
    scale: float = 1.0,
    trainable: bool = False,
):
    """
    Define the layer of the model and perform the initializations
    of the layers (wherever it is necessary)

    Args:
        num_embeddings (int): Total number of embeddings.
        embedding_dim (int): Embedding dimension.
        embeddings (numpy.ndarray): the 2D ndarray with the word vectors.
        noise (float): Optional additive noise. Defaults to 0.0.
        dropout (float): Embedding dropout probability. Defaults to 0.0.
        scale (float): Scale word embeddings by a constant. Defaults to 1.0.
        trainable (bool): Finetune embeddings. Defaults to False
    """
    super(Embed, self).__init__()
    self.scale = scale  # scale embeddings by value. Needed for transformer
    # define the embedding layer, with the corresponding dimensions
    self.embedding = nn.Embedding(
        num_embeddings=num_embeddings, embedding_dim=embedding_dim
    )

    if embeddings is not None:
        logger.info("Initializing Embedding layer with pre-trained weights.")
        if trainable:
            logger.info("Embeddings are going to be finetuned")
        else:
            logger.info("Embeddings are frozen")
        self.init_embeddings(embeddings, trainable)

    # the dropout "layer" for the word embeddings
    self.dropout = nn.Dropout(dropout)

    # the gaussian noise "layer" for the word embeddings
    self.noise = GaussianNoise(noise)

forward(self, x)

Embed input tokens

Assign embedding that corresponds to each token. Optionally add Gaussian noise and embedding dropout and scale embeddings by a constant.

Parameters:

Name Type Description Default
x Tensor

[B, L] Input token ids.

required

Returns:

Type Description
Tensor

torch.Tensor: [B, L, E] Embedded tokens.

Source code in slp/modules/embed.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Embed input tokens

    Assign embedding that corresponds to each token.
    Optionally add Gaussian noise and embedding dropout and scale embeddings by a constant.

    Args:
        x (torch.Tensor): [B, L] Input token ids.

    Returns:
        (torch.Tensor) -> [B, L, E] Embedded tokens.
    """
    embeddings = self.embedding(x)

    if self.noise.stddev > 0:
        embeddings = self.noise(embeddings)

    if self.dropout.p > 0:
        embeddings = self.dropout(embeddings)

    return embeddings * self.scale  # type: ignore

init_embeddings(self, weights, trainable)

Initialize embeddings matrix with pretrained embeddings

Parameters:

Name Type Description Default
weights ndarray

pretrained embeddings

required
trainable bool

Finetune embeddings?

required
Source code in slp/modules/embed.py
def init_embeddings(self, weights: np.ndarray, trainable: bool):
    """Initialize embeddings matrix with pretrained embeddings

    Args:
        weights (np.ndarray): pretrained embeddings
        trainable (bool): Finetune embeddings?
    """
    self.embedding.weight = nn.Parameter(
        torch.from_numpy(weights), requires_grad=trainable
    )
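
A usage sketch with a randomly generated stand-in for a pretrained embedding matrix (in practice this would be e.g. loaded GloVe vectors); vocabulary and dimension sizes are illustrative:

import numpy as np
import torch

from slp.modules.embed import Embed

pretrained = np.random.rand(1000, 300).astype(np.float32)  # stand-in for real word vectors
embed = Embed(
    num_embeddings=1000,
    embedding_dim=300,
    embeddings=pretrained,
    dropout=0.1,
    trainable=False,  # keep the pretrained embeddings frozen
)
tokens = torch.randint(0, 1000, (8, 20))  # (B, L) token ids
vectors = embed(tokens)                   # (B, L, 300)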

PositionalEncoding

__init__(self, embedding_dim=512, max_len=5000) special

Inject some information about the relative or absolute position of the tokens in the sequence.

The positional encodings have the same dimension as the embeddings, so that the two can be summed. Here, we use sine and cosine functions of different frequencies.

PE for even positions:

\[\text{PosEncoder}(pos, 2i) = sin(\frac{pos}{10000^{\frac{2i}{d}}})\]

PE for odd positions:

\[\text{PosEncoder}(pos, 2i+1) = cos(\frac{pos}{10000^{\frac{2i}{d}}})\]

where \(pos\) is the word position and \(i\) is the embedding idx

Implementation modified from pytorch/examples/word_language_model.py

Parameters:

Name Type Description Default
embedding_dim int

Embedding / model dimension. Defaults to 512.

512
max_len int

Maximum sequence length that can be encoded. Defaults to 5000.

5000
Source code in slp/modules/embed.py
def __init__(self, embedding_dim: int = 512, max_len: int = 5000):
    r"""Inject some information about the relative or absolute position of the tokens in the sequence.

    The positional encodings have the same dimension as
    the embeddings, so that the two can be summed. Here, we use sine and cosine
    functions of different frequencies.

    PE for even positions:

    $$\text{PosEncoder}(pos, 2i) = sin(\frac{pos}{10000^{\frac{2i}{d}}})$$

    PE for odd positions:

    $$\text{PosEncoder}(pos, 2i+1) = cos(\frac{pos}{10000^{\frac{2i}{d}}})$$

    where $pos$ is the word position and $i$ is the embedding idx

    Implementation modified from pytorch/examples/word_language_model.py

    Args:
        embedding_dim (int): Embedding / model dimension. Defaults to 512.
        max_len (int): Maximum sequence length that can be encoded. Defaults to 5000.
    """
    super(PositionalEncoding, self).__init__()
    pe = torch.zeros(max_len, embedding_dim)
    position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, embedding_dim, 2).float()
        * (-math.log(10000.0) / embedding_dim)
    )
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    pe = pe.unsqueeze(0)
    self.register_buffer("pe", pe)

forward(self, x)

Calculate positional embeddings for input and add them to input tensor

\[out = x + PosEmbed(x)\]

x is assumed to be batch first

Parameters:

Name Type Description Default
x Tensor

[B, L, D] input embeddings

required

Returns:

Type Description
Tensor

torch.Tensor: Embeddings + positional embeddings

Source code in slp/modules/embed.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Calculate positional embeddings for input and add them to input tensor

    $$out = x + PosEmbed(x)$$

    x is assumed to be batch first

    Args:
        x (torch.Tensor): [B, L, D] input embeddings

    Returns:
        torch.Tensor: Embeddings + positional embeddings
    """
    x = x + self.pe[:, : x.size(1), :]  # type: ignore
    return x
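
A usage sketch chaining the positional encoding after an embedding layer (batch-first input, illustrative shapes):

import torch

from slp.modules.embed import PositionalEncoding

pe = PositionalEncoding(embedding_dim=300, max_len=512)
x = torch.rand(8, 20, 300)  # (B, L, D) embedded tokens
out = pe(x)                 # (B, L, D): embeddings + positional encodings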

PositionwiseFF

__init__(self, d_model, d_ff, dropout=0.1, gelu=False) special

Transformer Position-wise feed-forward layer

Linear -> ReLU (or GELU) -> Dropout -> Linear

Parameters:

Name Type Description Default
d_model int

Model dimension

required
d_ff int

Hidden dimension

required
dropout float

Dropout probability. Defaults to 0.1.

0.1
gelu bool

Use GELU activation instead of ReLU. Defaults to False.

False
Source code in slp/modules/feedforward.py
def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1, gelu=False):
    """Transformer Position-wise feed-forward layer

    Linear -> ReLU (or GELU) -> Dropout -> Linear

    Args:
        d_model (int): Model dimension
        d_ff (int): Hidden dimension
        dropout (float): Dropout probability. Defaults to 0.1.
    """
    super(PositionwiseFF, self).__init__()
    self.ff1 = nn.Linear(d_model, d_ff)
    self.ff2 = nn.Linear(d_ff, d_model)
    self.drop = nn.Dropout(dropout)
    self.activation = nn.ReLU() if not gelu else nn.GELU()

forward(self, x)

Position-wise FF forward pass

\[out = W_2 \cdot max(0, W_1 \cdot x + b_1) + b_2\]

[B, *, D] -> [B, *, H] -> [B, *, D]

  • B: Batch size
  • D: Model dim
  • H: Hidden size > Model dim (Usually \(H = 2D\))

Parameters:

Name Type Description Default
x Tensor

[B, *, D] Input features

required

Returns:

Type Description
Tensor

torch.Tensor: [B, *, D] Output features

Source code in slp/modules/feedforward.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    r"""Position-wise FF forward pass

    $$out = W_2 \cdot max(0, W_1 \cdot x + b_1) + b_2$$

    [B, *, D] -> [B, *, H] -> [B, *, D]

    * B: Batch size
    * D: Model dim
    * H: Hidden size > Model dim (Usually $H = 2D$)

    Args:
        x (torch.Tensor): [B, *, D] Input features

    Returns:
        torch.Tensor: [B, *, D] Output features
    """
    out: torch.Tensor = self.ff2(self.drop(self.activation(self.ff1(x))))
    return out
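
A usage sketch showing that the model dimension is preserved (illustrative sizes; gelu=True swaps ReLU for GELU):

import torch

from slp.modules.feedforward import PositionwiseFF

ff = PositionwiseFF(d_model=256, d_ff=512, dropout=0.1, gelu=True)
x = torch.rand(8, 20, 256)
out = ff(x)  # (8, 20, 256)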

TwoLayer

forward(self, x)

Two-layer feed-forward pass: Linear -> Dropout -> activation -> Linear -> Dropout, with an optional residual connection.

Source code in slp/modules/feedforward.py
def forward(self, x):
    out = self.l1(x)
    out = self.drop(out)
    out = self.act(out)
    out = self.l2(out)
    out = self.drop(out)

    if self.residual:
        out = x + out

    return out

LayerNormTf

__init__(self, hidden_size, eps=1e-12) special

Construct a layernorm module in the TF style (epsilon inside the square root). Link: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L234

Source code in slp/modules/norm.py
def __init__(self, hidden_size: int, eps: float = 1e-12):
    """Construct a layernorm module in the TF style (epsilon inside the square root).
    Link: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L234
    """
    super(LayerNormTf, self).__init__()
    self.weight = nn.Parameter(torch.ones(hidden_size))
    self.bias = nn.Parameter(torch.zeros(hidden_size))
    self.variance_epsilon = eps

forward(self, x)

Calculate Layernorm the tf way

Source code in slp/modules/norm.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Calculate Layernorm the tf way"""
    u = x.mean(-1, keepdim=True)
    s = (x - u).pow(2).mean(-1, keepdim=True)
    x = (x - u) / torch.sqrt(s + self.variance_epsilon)

    return self.weight * x + self.bias

ScaleNorm

forward(self, x)

Scale the input features by the learnable gain g divided by their norm (clamped at eps for numerical stability).

Source code in slp/modules/norm.py
def forward(self, x: torch.Tensor):
    scaled_norm = self.g / safe_norm(x, dim=-1, keepdim=True).clamp(min=self.eps)

    return scaled_norm * x

GaussianNoise

__init__(self, stddev, mean=0.0) special

Additive Gaussian Noise layer

Parameters:

Name Type Description Default
stddev float

the standard deviation of the distribution

required
mean float

the mean of the distribution

0.0
Source code in slp/modules/regularization.py
def __init__(self, stddev: float, mean: float = 0.0):
    """Additive Gaussian Noise layer

    Args:
        stddev (float): the standard deviation of the distribution
        mean (float): the mean of the distribution
    """
    super().__init__()
    self.stddev = stddev
    self.mean = mean

__repr__(self) special

String representation of class

Source code in slp/modules/regularization.py
def __repr__(self):
    """String representation of class"""
    return "{} (mean={}, stddev={})".format(
        self.__class__.__name__, str(self.mean), str(self.stddev)
    )

forward(self, x)

Gaussian noise forward pass

Parameters:

Name Type Description Default
x Tensor

Input features.

required

Returns:

Type Description
Tensor

torch.Tensor: Input with additive Gaussian noise applied during training; the input unchanged during evaluation.

Source code in slp/modules/regularization.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Gaussian noise forward pass

    Args:
        x (torch.Tensor): Input features.

    Returns:
        torch.Tensor: Noisy input during training; the input unchanged during evaluation.
    """
    if self.training:
        noise = Variable(x.data.new(x.size()).normal_(self.mean, self.stddev))
        return x + noise
    return x
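
The noise is only applied in training mode, which can be checked by toggling train() / eval() (the stddev value is illustrative):

import torch

from slp.modules.regularization import GaussianNoise

noise = GaussianNoise(stddev=0.1)
x = torch.rand(4, 10)
noise.train()
print(torch.allclose(noise(x), x))  # False (almost surely): noise is added
noise.eval()
print(torch.allclose(noise(x), x))  # True: input passes through unchanged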

AttentiveRNN

__init__(self, input_size, hidden_size=256, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.1, rnn_type='lstm', packed_sequence=True, attention=False, max_length=-1, num_heads=1, nystrom=True, num_landmarks=32, kernel_size=33, inverse_iterations=6, return_hidden=False) special

RNN encoder with an optional attention mechanism

Scaled dot-product attention (single-head or multi-head, optionally with the Nystrom approximation) can be applied over the RNN hidden states

Parameters:

Name Type Description Default
input_size int

Input features dimension

required
hidden_size int

Hidden features

256
batch_first bool

Use batch first representation type. Defaults to True.

True
layers int

Number of RNN layers. Defaults to 1.

1
bidirectional bool

Use bidirectional RNNs. Defaults to False.

False
merge_bi str

How bidirectional states are merged. Defaults to "cat".

'cat'
dropout float

Dropout probability. Defaults to 0.1.

0.1
rnn_type str

lstm or gru. Defaults to "lstm".

'lstm'
packed_sequence bool

Use packed sequences. Defaults to True.

True
max_length int

Maximum sequence length for fixed length padding. If -1 takes the largest sequence length in this batch

-1
attention bool

Use attention mechanism. Defaults to False

False
num_heads int

Number of attention heads. If 1 uses single headed attention

1
nystrom bool

Use nystrom approximation for multihead attention

True
num_landmarks int

Number of landmark sequence elements for nystrom attention

32
kernel_size Optional[int]

Kernel size for multihead attention output residual convolution

33
inverse_iterations int

Number of iterations for moore-penrose inverse approximation in nystrom attention. 6 is a good value

6
return_hidden bool

Return all hidden states. Defaults to False.

False
Source code in slp/modules/rnn.py
def __init__(
    self,
    input_size: int,
    hidden_size: int = 256,
    batch_first: bool = True,
    layers: int = 1,
    bidirectional: bool = False,
    merge_bi: str = "cat",
    dropout: float = 0.1,
    rnn_type: str = "lstm",
    packed_sequence: bool = True,
    attention: bool = False,
    max_length: int = -1,
    num_heads: int = 1,
    nystrom: bool = True,
    num_landmarks: int = 32,
    kernel_size: Optional[int] = 33,
    inverse_iterations: int = 6,
    return_hidden: bool = False,
):
    """RNN with embedding layer and optional attention mechanism

    Single-headed scaled dot-product attention is used as an attention mechanism

    Args:
        input_size (int): Input features dimension
        hidden_size (int): Hidden features
        batch_first (bool): Use batch first representation type. Defaults to True.
        layers (int): Number of RNN layers. Defaults to 1.
        bidirectional (bool): Use bidirectional RNNs. Defaults to False.
        merge_bi (str): How bidirectional states are merged. Defaults to "cat".
        dropout (float): Dropout probability. Defaults to 0.0.
        rnn_type (str): lstm or gru. Defaults to "lstm".
        packed_sequence (bool): Use packed sequences. Defaults to True.
        max_length (int): Maximum sequence length for fixed length padding. If -1 takes the
            largest sequence length in this batch
        attention (bool): Use attention mechanism. Defaults to False
        num_heads (int): Number of attention heads. If 1 uses single headed attention
        nystrom (bool): Use nystrom approximation for multihead attention
        num_landmarks (int): Number of landmark sequence elements for nystrom attention
        kernel_size (int): Kernel size for multihead attention output residual convolution
        inverse_iterations (int): Number of iterations for moore-penrose inverse approximation
            in nystrom attention. 6 is a good value
        return_hidden (bool): Return all hidden states. Defaults to False.
    """
    super(AttentiveRNN, self).__init__()
    self.rnn = RNN(
        input_size,  # type: ignore
        hidden_size,
        batch_first=batch_first,
        layers=layers,
        merge_bi=merge_bi,
        bidirectional=bidirectional,
        dropout=dropout,
        rnn_type=rnn_type,
        packed_sequence=packed_sequence,
        max_length=max_length,
    )
    self.out_size = (
        hidden_size
        if not (bidirectional and merge_bi == "cat")
        else 2 * hidden_size
    )
    self.batch_first = batch_first
    self.return_hidden = return_hidden

    self.attention = None

    if attention:
        if num_heads == 1:
            self.attention = Attention(
                attention_size=self.out_size, dropout=dropout
            )
        else:
            self.attention = MultiheadAttention(  # type: ignore
                attention_size=self.out_size,
                num_heads=num_heads,
                kernel_size=kernel_size,
                nystrom=nystrom,
                num_landmarks=num_landmarks,
                inverse_iterations=inverse_iterations,
                dropout=dropout,
            )

forward(self, x, lengths)

Attentive RNN forward pass

If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights Else the output is the last hidden state of the RNN.

Parameters:

Name Type Description Default
x Tensor

[B, L, D] Input features

required
lengths Tensor

[B] Original sequence lengths

required

Returns:

Type Description
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]

Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]: If return_hidden is False, a [B, H] or [B, 2*H] tensor of output features to be used for classification. If return_hidden is True, the same output features together with a tensor of all the hidden states.

Source code in slp/modules/rnn.py
def forward(
    self, x: torch.Tensor, lengths: torch.Tensor
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
    """Attentive RNN forward pass

    If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights
    Else the output is the last hidden state of the RNN.

    Args:
        x (torch.Tensor): [B, L] Input token ids
        lengths (torch.Tensor): [B] Original sequence lengths

    Returns:
        Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
            if return_hidden == False: Returns a tensor [B, H] or [B, 2*H] of output features to be used for classification
            if return_hidden == True: Returns a tensor [B, H] or [B, 2*H] of output features to
                be used for classification, and a tensor of all the hidden states
    """
    states, last_hidden, _ = self.rnn(x, lengths)

    out: torch.Tensor = last_hidden

    if self.attention is not None:
        states, _ = self.attention(
            states,
            attention_mask=pad_mask(
                lengths,
                max_length=states.size(1) if self.batch_first else states.size(0),
            ),
        )
        out = states.mean(dim=1)

    if self.return_hidden:
        return out, states
    else:
        return out
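
A usage sketch of a bidirectional encoder with single-head attention over the hidden states (shapes and lengths are illustrative; lengths are kept sorted to play safe with packed sequences):

import torch

from slp.modules.rnn import AttentiveRNN

encoder = AttentiveRNN(
    input_size=300,
    hidden_size=256,
    bidirectional=True,
    merge_bi="cat",
    attention=True,
)
x = torch.rand(8, 20, 300)            # (B, L, D) input features
lengths = torch.arange(20, 12, -1)    # (B,) original sequence lengths
out = encoder(x, lengths)             # (B, 2 * 256)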

RNN

out_size: int property readonly

RNN output features size

Returns:

Type Description
int

int: RNN output features size

__init__(self, input_size, hidden_size, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.0, rnn_type='lstm', packed_sequence=True, max_length=-1) special

LSTM - GRU wrapper with packed sequence support and handling for bidirectional / last output states

It is recommended to run with batch_first=True because the rest of the code is built with this assumption

Parameters:

Name Type Description Default
input_size int

Input features.

required
hidden_size int

Hidden features.

required
batch_first bool

Use batch first representation type. Defaults to True.

True
layers int

Number of RNN layers. Defaults to 1.

1
bidirectional bool

Use bidirectional RNNs. Defaults to False.

False
merge_bi str

How bidirectional states are merged. Defaults to "cat".

'cat'
dropout float

Dropout probability. Defaults to 0.0.

0.0
rnn_type str

lstm or gru. Defaults to "lstm".

'lstm'
packed_sequence bool

Use packed sequences. Defaults to True.

True
Source code in slp/modules/rnn.py
def __init__(
    self,
    input_size: int,
    hidden_size: int,
    batch_first: bool = True,
    layers: int = 1,
    bidirectional: bool = False,
    merge_bi: str = "cat",
    dropout: float = 0.0,
    rnn_type: str = "lstm",
    packed_sequence: bool = True,
    max_length: int = -1,
):
    """LSTM - GRU wrapper with packed sequence support and handling for bidirectional / last output states

    It is recommended to run with batch_first=True because the rest of the code is built with this assumption

    Args:
        input_size (int): Input features.
        hidden_size (int): Hidden features.
        batch_first (bool): Use batch first representation type. Defaults to True.
        layers (int): Number of RNN layers. Defaults to 1.
        bidirectional (bool): Use bidirectional RNNs. Defaults to False.
        merge_bi (str): How bidirectional states are merged. Defaults to "cat".
        dropout (float): Dropout probability. Defaults to 0.0.
        rnn_type (str): lstm or gru. Defaults to "lstm".
        packed_sequence (bool): Use packed sequences. Defaults to True.
    """
    super(RNN, self).__init__()
    self.bidirectional = bidirectional
    self.hidden_size = hidden_size
    self.batch_first = batch_first
    self.merge_bi = merge_bi
    self.rnn_type = rnn_type.lower()

    if not batch_first:
        logger.warning(
            "You are running RNN with batch_first=False. Make sure this is really what you want"
        )

    if not packed_sequence:
        logger.warning(
            "You have set packed_sequence=False. Running with packed_sequence=True will be much faster"
        )

    rnn_cls = nn.LSTM if self.rnn_type == "lstm" else nn.GRU
    self.rnn = rnn_cls(
        input_size,
        hidden_size,
        batch_first=batch_first,
        num_layers=layers,
        bidirectional=bidirectional,
    )
    self.drop = nn.Dropout(dropout)
    self.packed_sequence = packed_sequence

    if packed_sequence:
        self.pack = PackSequence(batch_first=batch_first)
        self.unpack = PadPackedSequence(
            batch_first=batch_first, max_length=max_length
        )

forward(self, x, lengths)

RNN forward pass

Parameters:

Name Type Description Default
x Tensor

[B, L, D] Input features

required
lengths Tensor

[B] Original sequence lengths

required

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor, Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]]

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: ( merged forward and backward states [B, L, H] or [B, L, 2H], merged last forward and backward state [B, H] or [B, 2H], hidden states tuple of [num_layers * num_directions, B, H] for LSTM or tensor [num_layers * num_directions, B, H] for GRU )

Source code in slp/modules/rnn.py
def forward(
    self, x: torch.Tensor, lengths: torch.Tensor
) -> Tuple[
    torch.Tensor,
    torch.Tensor,
    Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
]:
    """RNN forward pass

    Args:
        x (torch.Tensor): [B, L, D] Input features
        lengths (torch.Tensor): [B] Original sequence lengths

    Returns:
        Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: (
            merged forward and backward states [B, L, H] or [B, L, 2*H],
            merged last forward and backward state [B, H] or [B, 2*H],
            hidden states tuple of [num_layers * num_directions, B, H] for LSTM or tensor [num_layers * num_directions, B, H] for GRU
        )
    """
    self.rnn.flatten_parameters()

    if self.packed_sequence:
        # Latest pytorch allows only cpu tensors for packed sequence
        lengths = lengths.to("cpu")
        x, lengths = self.pack(x, lengths)
    out, hidden = self.rnn(x)

    if self.packed_sequence:
        out = self.unpack(out, lengths)
    out = self.drop(out)
    lengths = lengths.to(out.device)

    out, last_timestep = self._final_output(out, lengths)

    return out, last_timestep, hidden
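
A usage sketch of the wrapper itself (illustrative shapes; lengths sorted in decreasing order to play safe with packed sequences):

import torch

from slp.modules.rnn import RNN

rnn = RNN(input_size=300, hidden_size=256, bidirectional=True, merge_bi="cat")
x = torch.rand(8, 20, 300)            # (B, L, D) input features
lengths = torch.arange(20, 12, -1)    # (B,) original sequence lengths
states, last, hidden = rnn(x, lengths)
print(states.shape, last.shape)       # torch.Size([8, 20, 512]) torch.Size([8, 512])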

TokenRNN

__init__(self, hidden_size=256, vocab_size=None, embeddings_dim=None, embeddings=None, embeddings_dropout=0.0, finetune_embeddings=False, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.1, rnn_type='lstm', packed_sequence=True, attention=False, max_length=-1, num_heads=1, nystrom=True, num_landmarks=32, kernel_size=33, inverse_iterations=6, return_hidden=False) special

RNN with embedding layer and optional attention mechanism

Scaled dot-product attention (single-head or multi-head) can be used as an attention mechanism

Parameters:

Name Type Description Default
hidden_size int

Hidden features

256
vocab_size Optional[int]

Vocabulary size. Defaults to None.

None
embeddings_dim Optional[int]

Embedding dimension. Defaults to None.

None
embeddings Optional[numpy.ndarray]

Embedding matrix. Defaults to None.

None
embeddings_dropout float

Embedding dropout probability. Defaults to 0.0.

0.0
finetune_embeddings bool

Finetune embeddings? Defaults to False.

False
batch_first bool

Use batch first representation type. Defaults to True.

True
layers int

Number of RNN layers. Defaults to 1.

1
bidirectional bool

Use bidirectional RNNs. Defaults to False.

False
merge_bi str

How bidirectional states are merged. Defaults to "cat".

'cat'
dropout float

Dropout probability. Defaults to 0.1.

0.1
rnn_type str

lstm or gru. Defaults to "lstm".

'lstm'
packed_sequence bool

Use packed sequences. Defaults to True.

True
max_length int

Maximum sequence length for fixed length padding. If -1 takes the largest sequence length in this batch

-1
attention bool

Use attention mechanism. Defaults to False

False
num_heads int

Number of attention heads. If 1 uses single headed attention

1
nystrom bool

Use nystrom approximation for multihead attention

True
num_landmarks int

Number of landmark sequence elements for nystrom attention

32
kernel_size Optional[int]

Kernel size for multihead attention output residual convolution

33
inverse_iterations int

Number of iterations for moore-penrose inverse approximation in nystrom attention. 6 is a good value

6
Source code in slp/modules/rnn.py
def __init__(
    self,
    hidden_size: int = 256,
    vocab_size: Optional[int] = None,
    embeddings_dim: Optional[int] = None,
    embeddings: Optional[np.ndarray] = None,
    embeddings_dropout: float = 0.0,
    finetune_embeddings: bool = False,
    batch_first: bool = True,
    layers: int = 1,
    bidirectional: bool = False,
    merge_bi: str = "cat",
    dropout: float = 0.1,
    rnn_type: str = "lstm",
    packed_sequence: bool = True,
    attention: bool = False,
    max_length: int = -1,
    num_heads: int = 1,
    nystrom: bool = True,
    num_landmarks: int = 32,
    kernel_size: Optional[int] = 33,
    inverse_iterations: int = 6,
    return_hidden=False,
):
    """RNN with embedding layer and optional attention mechanism

    Single-headed scaled dot-product attention is used as an attention mechanism

    Args:
        hidden_size (int): Hidden features
        vocab_size (Optional[int]): Vocabulary size. Defaults to None.
        embeddings_dim (Optional[int]): Embedding dimension. Defaults to None.
        embeddings (Optional[np.ndarray]): Embedding matrix. Defaults to None.
        embeddings_dropout (float): Embedding dropout probability. Defaults to 0.0.
        finetune_embeddings (bool): Finetune embeddings? Defaults to False.
        batch_first (bool): Use batch first representation type. Defaults to True.
        layers (int): Number of RNN layers. Defaults to 1.
        bidirectional (bool): Use bidirectional RNNs. Defaults to False.
        merge_bi (str): How bidirectional states are merged. Defaults to "cat".
        dropout (float): Dropout probability. Defaults to 0.0.
        rnn_type (str): lstm or gru. Defaults to "lstm".
        packed_sequence (bool): Use packed sequences. Defaults to True.
        max_length (int): Maximum sequence length for fixed length padding. If -1 takes the
            largest sequence length in this batch
        attention (bool): Use attention mechanism. Defaults to False
        num_heads (int): Number of attention heads. If 1 uses single headed attention
        nystrom (bool): Use nystrom approximation for multihead attention
        num_landmarks (int): Number of landmark sequence elements for nystrom attention
        kernel_size (int): Kernel size for multihead attention output residual convolution
        inverse_iterations (int): Number of iterations for moore-penrose inverse approximation
            in nystrom attention. 6 is a good value
    """
    super(TokenRNN, self).__init__()

    if embeddings is None:
        finetune_embeddings = True
        assert (
            vocab_size is not None
        ), "You should either pass an embeddings matrix or vocab size"
        assert (
            embeddings_dim is not None
        ), "You should either pass an embeddings matrix or embeddings_dim"
    else:
        vocab_size = embeddings.shape[0]
        embeddings_dim = embeddings.shape[1]

    self.embed = Embed(
        vocab_size,  # type: ignore
        embeddings_dim,  # type: ignore
        embeddings=embeddings,
        dropout=embeddings_dropout,
        scale=hidden_size ** 0.5,
        trainable=finetune_embeddings,
    )
    self.encoder = AttentiveRNN(
        embeddings_dim,  # type: ignore
        hidden_size,
        batch_first=batch_first,
        layers=layers,
        bidirectional=bidirectional,
        merge_bi=merge_bi,
        dropout=dropout,
        rnn_type=rnn_type,
        packed_sequence=packed_sequence,
        attention=attention,
        max_length=max_length,
        num_heads=num_heads,
        nystrom=nystrom,
        num_landmarks=num_landmarks,
        kernel_size=kernel_size,
        inverse_iterations=inverse_iterations,
        return_hidden=return_hidden,
    )

    self.out_size = self.encoder.out_size

forward(self, x, lengths)

Token RNN forward pass

If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights Else the output is the last hidden state of the RNN.

Parameters:

Name Type Description Default
x Tensor

[B, L] Input token ids

required
lengths Tensor

[B] Original sequence lengths

required

Returns:

Type Description
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]

torch.Tensor: [B, H] or [B, 2*H] Output features to be used for classification

Source code in slp/modules/rnn.py
def forward(
    self, x: torch.Tensor, lengths: torch.Tensor
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
    """Token RNN forward pass

    If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights
    Else the output is the last hidden state of the RNN.

    Args:
        x (torch.Tensor): [B, L] Input token ids
        lengths (torch.Tensor): [B] Original sequence lengths

    Returns:
        torch.Tensor: [B, H] or [B, 2*H] Output features to be used for classification
    """
    x = self.embed(x)
    out = self.encoder(x, lengths)

    return out  # type: ignore
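
A usage sketch that trains embeddings from scratch (no pretrained matrix is passed, so vocab_size and embeddings_dim must be given); sizes are illustrative:

import torch

from slp.modules.rnn import TokenRNN

model = TokenRNN(
    hidden_size=256,
    vocab_size=10000,
    embeddings_dim=300,
    bidirectional=True,
    attention=True,
)
tokens = torch.randint(0, 10000, (8, 20))  # (B, L) token ids
lengths = torch.arange(20, 12, -1)         # (B,) original sequence lengths
out = model(tokens, lengths)               # (B, 2 * 256)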

Decoder

forward(self, target, encoded, source_mask=None, target_mask=None)

Pass the target sequence and the encoder outputs through the stack of decoder layers.

Source code in slp/modules/transformer.py
def forward(self, target, encoded, source_mask=None, target_mask=None):

    for l in self.decoder:
        target = l(
            target, encoded, source_mask=source_mask, target_mask=target_mask
        )

    return target

DecoderLayer

forward(self, targets, encoded, source_mask=None, target_mask=None)

Apply self-attention over the targets, fuse them with the encoder outputs through cross-attention, then apply the position-wise feed-forward sublayer.

Source code in slp/modules/transformer.py
def forward(self, targets, encoded, source_mask=None, target_mask=None):
    targets = self.in_layer(targets, attention_mask=target_mask)
    out = self.fuse_layer(encoded, targets, attention_mask=source_mask)
    out = self.out_layer(out)

    return out

Encoder

forward(self, x, attention_mask=None)

Pass the input through the stack of encoder layers.

Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
    for layer in self.encoder:
        x = layer(x, attention_mask=attention_mask)

    return x

EncoderDecoder

forward(self, source, target, source_mask=None, target_mask=None)

Encode the source sequence and decode the target sequence conditioned on the encoded source.

Source code in slp/modules/transformer.py
def forward(self, source, target, source_mask=None, target_mask=None):
    encoded = self.encoder(source, attention_mask=source_mask)
    decoded = self.decoder(
        target, encoded, source_mask=source_mask, target_mask=target_mask
    )

    return decoded

EncoderLayer

forward(self, x, attention_mask=None)

Apply the self-attention sublayer followed by the position-wise feed-forward sublayer.

Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
    out = self.l1(x, attention_mask=attention_mask)
    out = self.l2(out)

    return out

Sublayer1

forward(self, x, attention_mask=None)

Dispatch to the pre-norm or post-norm variant of this sublayer, depending on self.prenorm.

Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
    return (
        self._prenorm(x, attention_mask=attention_mask)
        if self.prenorm
        else self._postnorm(x, attention_mask=attention_mask)
    )

Sublayer2

forward(self, x)

Dispatch to the pre-norm or post-norm variant of this sublayer, depending on self.prenorm.

Source code in slp/modules/transformer.py
def forward(self, x):
    return self._prenorm(x) if self.prenorm else self._postnorm(x)

Sublayer3

forward(self, x, y, attention_mask=None)

Dispatch to the pre-norm or post-norm variant of this two-input sublayer, depending on self.prenorm.

Source code in slp/modules/transformer.py
def forward(self, x, y, attention_mask=None):
    return (
        self._prenorm(x, y, attention_mask=attention_mask)
        if self.prenorm
        else self._postnorm(x, y, attention_mask=attention_mask)
    )

Transformer

forward(self, source, target, source_mask=None, target_mask=None)

Embed the source and target sequences, add positional encodings, run the encoder-decoder transformer block, apply dropout and project to the output predictions.

Source code in slp/modules/transformer.py
def forward(self, source, target, source_mask=None, target_mask=None):
    source = self.embed(source)
    target = self.embed(target)
    # Adding embeddings + pos embeddings
    # is done in PositionalEncoding class
    source = self.pe(source)
    target = self.pe(target)
    out = self.transformer_block(
        source, target, source_mask=source_mask, target_mask=target_mask
    )
    out = self.drop(out)
    out = self.predict(out)

    return out

TransformerSequenceEncoder

forward(self, x, attention_mask=None)

Optionally normalize the input features, embed them, add positional encodings, run the transformer block and mean-pool over the sequence dimension.

Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
    if self.feature_norm:
        x = self.feature_norm(x)

    x = self.embed(x)
    x = self.pe(x)
    out = self.transformer_block(x, attention_mask=attention_mask).mean(dim=1)

    return out

TransformerTokenSequenceEncoder

forward(self, x, attention_mask=None)

Embed the input token ids, add positional encodings, run the transformer block and mean-pool over the sequence dimension.

Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
    x = self.embed(x)
    x = self.pe(x)
    out = self.transformer_block(x, attention_mask=attention_mask).mean(dim=1)

    return out

reset_parameters(named_parameters, gain=1.0)

Initialize parameters in the transformer model.

Source code in slp/modules/transformer.py
def reset_parameters(named_parameters, gain=1.0):
    """Initialize parameters in the transformer model."""

    for name, p in named_parameters:
        if p.dim() > 1:
            if "weight" in name:
                nn.init.xavier_normal_(p, gain=gain)

            if "bias" in name:
                nn.init.constant_(p, 0.0)
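
A minimal usage sketch (hedged: the module below is illustrative, and the import path is assumed from the source location slp/modules/transformer.py; any nn.Module works since only named_parameters() is consumed):

from torch import nn
from slp.modules.transformer import reset_parameters  # assumed import path

model = nn.Linear(128, 64)  # illustrative module
reset_parameters(model.named_parameters(), gain=1.0)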

PLDataModuleFromCorpus

embeddings: Optional[numpy.ndarray] property readonly

Embeddings matrix

Returns:

Type Description
Optional[numpy.ndarray]

Optional[np.ndarray]: Embeddings matrix

vocab_size: int property readonly

Number of tokens in the vocabulary

Returns:

Type Description
int

int: Number of tokens in the vocabulary

__init__(self, train, train_labels=None, val=None, val_labels=None, test=None, test_labels=None, val_percent=0.2, test_percent=0.2, batch_size=64, batch_size_eval=None, seed=None, num_workers=1, pin_memory=True, drop_last=False, shuffle_eval=False, sampler_train=None, sampler_val=None, sampler_test=None, batch_sampler_train=None, batch_sampler_val=None, batch_sampler_test=None, collate_fn=None, language_model=False, tokenizer='spacy', no_test_set=False, **corpus_args) special

Wrap raw corpus in a LightningDataModule

  • This handles the selection of the appropriate corpus class based on the tokenizer argument.
  • If language_model=True it uses the appropriate dataset from slp.data.datasets.
  • Uses the PLDataModuleFromDatasets to split the val and test sets if not provided

Parameters:

Name Type Description Default
train List

Raw train corpus

required
train_labels Optional[List]

Train labels. Defaults to None.

None
val Optional[List]

Raw validation corpus. Defaults to None.

None
val_labels Optional[List]

Validation labels. Defaults to None.

None
test Optional[List]

Raw test corpus. Defaults to None.

None
test_labels Optional[List]

Test labels. Defaults to None.

None
val_percent float

Percent of train to be used for validation if no validation set is given. Defaults to 0.2.

0.2
test_percent float

Percent of train to be used for test set if no test set is given. Defaults to 0.2.

0.2
batch_size int

Training batch size. Defaults to 64.

64
batch_size_eval int

Validation and test batch size. Defaults to None.

None
seed int

Seed for deterministic run. Defaults to None.

None
num_workers int

Number of workers in the DataLoader. Defaults to 1.

1
pin_memory bool

Pin tensors to GPU memory. Defaults to True.

True
drop_last bool

Drop last incomplete batch. Defaults to False.

False
sampler_train Sampler

Sampler for train loader. Defaults to None.

None
sampler_val Sampler

Sampler for validation loader. Defaults to None.

None
sampler_test Sampler

Sampler for test loader. Defaults to None.

None
batch_sampler_train BatchSampler

Batch sampler for train loader. Defaults to None.

None
batch_sampler_val BatchSampler

Batch sampler for validation loader. Defaults to None.

None
batch_sampler_test BatchSampler

Batch sampler for test loader. Defaults to None.

None
shuffle_eval bool

Shuffle validation and test dataloaders. Defaults to False.

False
collate_fn Optional[Callable[..., Any]]

Collator function. Defaults to None.

None
language_model bool

Use corpus for Language Modeling. Defaults to False.

False
tokenizer str

Select one of the cls.accepted_tokenizers. Defaults to "spacy".

'spacy'
no_test_set bool

Do not create test set. Useful for tuning

False
**corpus_args kwargs

Extra arguments to be passed to the corpus. See slp/data/corpus.py

{}

Exceptions:

Type Description
ValueError

[description]

ValueError

[description]

Source code in slp/plbind/dm.py
def __init__(
    self,
    train: List,
    train_labels: Optional[List] = None,
    val: Optional[List] = None,
    val_labels: Optional[List] = None,
    test: Optional[List] = None,
    test_labels: Optional[List] = None,
    val_percent: float = 0.2,
    test_percent: float = 0.2,
    batch_size: int = 64,
    batch_size_eval: int = None,
    seed: int = None,
    num_workers: int = 1,
    pin_memory: bool = True,
    drop_last: bool = False,
    shuffle_eval: bool = False,
    sampler_train: Sampler = None,
    sampler_val: Sampler = None,
    sampler_test: Sampler = None,
    batch_sampler_train: BatchSampler = None,
    batch_sampler_val: BatchSampler = None,
    batch_sampler_test: BatchSampler = None,
    collate_fn: Optional[Callable[..., Any]] = None,
    language_model: bool = False,
    tokenizer: str = "spacy",
    no_test_set: bool = False,
    **corpus_args,
):
    """Wrap raw corpus in a LightningDataModule

    * This handles the selection of the appropriate corpus class based on the tokenizer argument.
    * If language_model=True it uses the appropriate dataset from slp.data.datasets.
    * Uses the PLDataModuleFromDatasets to split the val and test sets if not provided

    Args:
        train (List): Raw train corpus
        train_labels (Optional[List]): Train labels. Defaults to None.
        val (Optional[List]): Raw validation corpus. Defaults to None.
        val_labels (Optional[List]): Validation labels. Defaults to None.
        test (Optional[List]): Raw test corpus. Defaults to None.
        test_labels (Optional[List]): Test labels. Defaults to None.
        val_percent (float): Percent of train to be used for validation if no validation set is given. Defaults to 0.2.
        test_percent (float): Percent of train to be used for test set if no test set is given. Defaults to 0.2.
        batch_size (int): Training batch size. Defaults to 64.
        batch_size_eval (Optional[int]): Validation and test batch size. Defaults to None.
        seed (Optional[int]): Seed for deterministic run. Defaults to None.
        num_workers (int): Number of workers in the DataLoader. Defaults to 1.
        pin_memory (bool): Pin tensors to GPU memory. Defaults to True.
        drop_last (bool): Drop last incomplete batch. Defaults to False.
        sampler_train (Sampler): Sampler for train loader. Defaults to None.
        sampler_val (Sampler): Sampler for validation loader. Defaults to None.
        sampler_test (Sampler): Sampler for test loader. Defaults to None.
        batch_sampler_train (BatchSampler): Batch sampler for train loader. Defaults to None.
        batch_sampler_val (BatchSampler): Batch sampler for validation loader. Defaults to None.
        batch_sampler_test (BatchSampler): Batch sampler for test loader. Defaults to None.
        shuffle_eval (bool): Shuffle validation and test dataloaders. Defaults to False.
        collate_fn (Callable[..., Any]): Collator function. Defaults to None.
        language_model (bool): Use corpus for Language Modeling. Defaults to False.
        tokenizer (str): Select one of the cls.accepted_tokenizers. Defaults to "spacy".
        no_test_set (bool): Do not create test set. Useful for tuning
        **corpus_args (kwargs): Extra arguments to be passed to the corpus. See
            slp/data/corpus.py
    Raises:
        ValueError: [description]
        ValueError: [description]
    """
    self.language_model = language_model
    self.tokenizer = tokenizer
    self.corpus_args = corpus_args

    train_data, val_data, test_data = self._zip_corpus_and_labels(
        train, val, test, train_labels, val_labels, test_labels
    )

    self.no_test_set = no_test_set
    super(PLDataModuleFromCorpus, self).__init__(
        train_data,  # type: ignore
        val=val_data,  # type: ignore
        test=test_data,  # type: ignore
        val_percent=val_percent,
        test_percent=test_percent,
        batch_size=batch_size,
        batch_size_eval=batch_size_eval,
        seed=seed,
        num_workers=num_workers,
        pin_memory=pin_memory,
        drop_last=drop_last,
        shuffle_eval=shuffle_eval,
        sampler_train=sampler_train,
        sampler_val=sampler_val,
        sampler_test=sampler_test,
        batch_sampler_train=batch_sampler_train,
        batch_sampler_val=batch_sampler_val,
        batch_sampler_test=batch_sampler_test,
        collate_fn=collate_fn,
        no_test_set=no_test_set,
    )
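
A minimal usage sketch (hedged: the corpus contents and argument values are illustrative, the import path is assumed from the source location slp/plbind/dm.py, and the default spacy tokenizer may require a downloaded spacy model such as en_core_web_md):

from slp.plbind.dm import PLDataModuleFromCorpus  # assumed import path

train_corpus = ["the cat sat on the mat", "dogs chase cats"]  # illustrative raw corpus
train_labels = [0, 1]

ldm = PLDataModuleFromCorpus(
    train_corpus,
    train_labels=train_labels,
    batch_size=32,
    tokenizer="spacy",
    no_test_set=True,
)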

add_argparse_args(parent_parser) classmethod

Augment input parser with arguments for data loading and corpus processing

Parameters:

Name Type Description Default
parent_parser argparse.ArgumentParser

Parser created by the user

required

Returns:

Type Description
argparse.ArgumentParser

Augmented parser

Source code in slp/plbind/dm.py
@classmethod
def add_argparse_args(cls, parent_parser):
    """Augment input parser with arguments for data loading and corpus processing

    Args:
        parent_parser (argparse.ArgumentParser): Parser created by the user

    Returns:
        argparse.ArgumentParser: Augmented parser
    """
    parser = super(PLDataModuleFromCorpus, cls).add_argparse_args(parent_parser)
    parser.add_argument(
        "--tokenizer",
        dest="data.tokenizer",
        type=str.lower,
        # Corpus can already be tokenized, you can use spacy for word tokenization or any tokenizer from hugging face
        choices=cls.accepted_tokenizers,
        default="spacy",
        help="Token type. The tokenization will happen at this level.",
    )

    # Only when tokenizer == spacy
    parser.add_argument(
        "--limit-vocab",
        dest="data.limit_vocab_size",
        type=int,
        default=-1,
        help="Limit vocab size. -1 means use the whole vocab. Applicable only when --tokenizer=spacy",
    )

    parser.add_argument(
        "--embeddings-file",
        dest="data.embeddings_file",
        type=dir_path,
        default=None,
        help="Path to file with pretrained embeddings. Applicable only when --tokenizer=spacy",
    )

    parser.add_argument(
        "--embeddings-dim",
        dest="data.embeddings_dim",
        type=int,
        default=50,
        help="Embedding dim of pretrained embeddings. Applicable only when --tokenizer=spacy",
    )

    parser.add_argument(
        "--lang",
        dest="data.lang",
        type=str,
        default="en_core_web_md",
        help="Language for spacy tokenizer, e.g. en_core_web_md. Applicable only when --tokenizer=spacy",
    )

    parser.add_argument(
        "--no-add-specials",
        dest="data.add_special_tokens",
        action="store_false",
        help="Do not add special tokens for hugging face tokenizers",
    )

    # Generic args
    parser.add_argument(
        "--lower",
        dest="data.lower",
        action="store_true",
        help="Convert to lowercase.",
    )

    parser.add_argument(
        "--prepend-bos",
        dest="data.prepend_bos",
        action="store_true",
        help="Prepend [BOS] token",
    )

    parser.add_argument(
        "--append-eos",
        dest="data.append_eos",
        action="store_true",
        help="Append [EOS] token",
    )

    parser.add_argument(
        "--max-sentence-length",
        dest="data.max_len",
        type=int,
        default=-1,
        help="Maximum allowed sentence length. -1 means use the whole sentence",
    )

    return parser
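
A hedged usage sketch showing how the augmented parser is used; note the dotted dest names (e.g. data.tokenizer), which are read back with getattr. The import path is assumed from the source location slp/plbind/dm.py:

import argparse
from slp.plbind.dm import PLDataModuleFromCorpus  # assumed import path

parser = argparse.ArgumentParser(description="my experiment")
parser = PLDataModuleFromCorpus.add_argparse_args(parser)

args = parser.parse_args(["--tokenizer", "spacy", "--bsz", "64", "--lower"])
print(getattr(args, "data.tokenizer"), getattr(args, "data.batch_size"))  # spacy 64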

PLDataModuleFromDatasets

__init__(self, train, val=None, test=None, val_percent=0.2, test_percent=0.2, batch_size=1, batch_size_eval=None, seed=None, num_workers=1, pin_memory=True, drop_last=False, sampler_train=None, sampler_val=None, sampler_test=None, batch_sampler_train=None, batch_sampler_val=None, batch_sampler_test=None, shuffle_eval=False, collate_fn=None, no_test_set=False) special

LightningDataModule wrapper for generic torch.utils.data.Dataset

If val or test Datasets are not provided, this class will split val_percent and test_percent of the train set respectively to create them

Parameters:

Name Type Description Default
train Dataset

Train set

required
val Dataset

Validation set. Defaults to None.

None
test Dataset

Test set. Defaults to None.

None
val_percent float

Percent of train to be used for validation if no validation set is given. Defaults to 0.2.

0.2
test_percent float

Percent of train to be used for test set if no test set is given. Defaults to 0.2.

0.2
batch_size int

Training batch size. Defaults to 1.

1
batch_size_eval Optional[int]

Validation and test batch size. Defaults to None.

None
seed Optional[int]

Seed for deterministic run. Defaults to None.

None
num_workers int

Number of workers in the DataLoader. Defaults to 1.

1
pin_memory bool

Pin tensors to GPU memory. Defaults to True.

True
drop_last bool

Drop last incomplete batch. Defaults to False.

False
sampler_train Sampler

Sampler for train loader. Defaults to None.

None
sampler_val Sampler

Sampler for validation loader. Defaults to None.

None
sampler_test Sampler

Sampler for test loader. Defaults to None.

None
batch_sampler_train BatchSampler

Batch sampler for train loader. Defaults to None.

None
batch_sampler_val BatchSampler

Batch sampler for validation loader. Defaults to None.

None
batch_sampler_test BatchSampler

Batch sampler for test loader. Defaults to None.

None
shuffle_eval bool

Shuffle validation and test dataloaders. Defaults to False.

False
collate_fn Optional[Callable[..., Any]]

Collator function. Defaults to None.

None
no_test_set bool

Do not create test set. Useful for tuning

False

Exceptions:

Type Description
ValueError

If both mutually exclusive sampler_train and batch_sampler_train are provided

ValueError

If both mutually exclusive sampler_val and batch_sampler_val are provided

ValueError

If both mutually exclusive sampler_test and batch_sampler_test are provided

Source code in slp/plbind/dm.py
def __init__(
    self,
    train: Dataset,
    val: Dataset = None,
    test: Dataset = None,
    val_percent: float = 0.2,
    test_percent: float = 0.2,
    batch_size: int = 1,
    batch_size_eval: Optional[int] = None,
    seed: Optional[int] = None,
    num_workers: int = 1,
    pin_memory: bool = True,
    drop_last: bool = False,
    sampler_train: Sampler = None,
    sampler_val: Sampler = None,
    sampler_test: Sampler = None,
    batch_sampler_train: BatchSampler = None,
    batch_sampler_val: BatchSampler = None,
    batch_sampler_test: BatchSampler = None,
    shuffle_eval: bool = False,
    collate_fn: Optional[Callable[..., Any]] = None,
    no_test_set: bool = False,
):
    """LightningDataModule wrapper for generic torch.utils.data.Dataset

    If val or test Datasets are not provided, this class will split
    val_percent and test_percent of the train set respectively to create them

    Args:
        train (Dataset): Train set
        val (Dataset): Validation set. Defaults to None.
        test (Dataset): Test set. Defaults to None.
        val_percent (float): Percent of train to be used for validation if no validation set is given. Defaults to 0.2.
        test_percent (float): Percent of train to be used for test set if no test set is given. Defaults to 0.2.
        batch_size (int): Training batch size. Defaults to 1.
        batch_size_eval (Optional[int]): Validation and test batch size. Defaults to None.
        seed (Optional[int]): Seed for deterministic run. Defaults to None.
        num_workers (int): Number of workers in the DataLoader. Defaults to 1.
        pin_memory (bool): Pin tensors to GPU memory. Defaults to True.
        drop_last (bool): Drop last incomplete batch. Defaults to False.
        sampler_train (Sampler): Sampler for train loader. Defaults to None.
        sampler_val (Sampler): Sampler for validation loader. Defaults to None.
        sampler_test (Sampler): Sampler for test loader. Defaults to None.
        batch_sampler_train (BatchSampler): Batch sampler for train loader. Defaults to None.
        batch_sampler_val (BatchSampler): Batch sampler for validation loader. Defaults to None.
        batch_sampler_test (BatchSampler): Batch sampler for test loader. Defaults to None.
        shuffle_eval (bool): Shuffle validation and test dataloaders. Defaults to False.
        collate_fn (Callable[..., Any]): Collator function. Defaults to None.
        no_test_set (bool): Do not create test set. Useful for tuning

    Raises:
        ValueError: If both mutually exclusive sampler_train and batch_sampler_train are provided
        ValueError: If both mutually exclusive sampler_val and batch_sampler_val are provided
        ValueError: If both mutually exclusive sampler_test and batch_sampler_test are provided
    """
    super(PLDataModuleFromDatasets, self).__init__()
    self.setup_has_run = False
    if batch_sampler_train is not None and sampler_train is not None:
        raise ValueError(
            "You provided both a sampler and a batch sampler for the train set. These are mutually exclusive"
        )

    if batch_sampler_val is not None and sampler_val is not None:
        raise ValueError(
            "You provided both a sampler and a batch sampler for the validation set. These are mutually exclusive"
        )

    if batch_sampler_test is not None and sampler_test is not None:
        raise ValueError(
            "You provided both a sampler and a batch sampler for the test set. These are mutually exclusive"
        )
    self.val_percent = val_percent
    self.test_percent = test_percent
    self.sampler_train = sampler_train
    self.sampler_val = sampler_val
    self.sampler_test = sampler_test
    self.batch_sampler_train = batch_sampler_train
    self.batch_sampler_val = batch_sampler_val
    self.batch_sampler_test = batch_sampler_test
    self.num_workers = num_workers
    self.pin_memory = pin_memory
    self.drop_last = drop_last

    self.shuffle_eval = shuffle_eval
    self.collate_fn = collate_fn

    self.batch_size = batch_size
    self.seed = seed

    if batch_size_eval is None:
        batch_size_eval = self.batch_size

    self.no_test_set = no_test_set
    self.batch_size_eval = batch_size_eval
    self.train = train
    self.val = val
    self.test = test

add_argparse_args(parent_parser) classmethod

Augment input parser with arguments for data loading

Parameters:

Name Type Description Default
parent_parser ArgumentParser

Parser created by the user

required

Returns:

Type Description
ArgumentParser

argparse.ArgumentParser: Augmented parser

Source code in slp/plbind/dm.py
@classmethod
def add_argparse_args(
    cls, parent_parser: argparse.ArgumentParser
) -> argparse.ArgumentParser:
    """Augment input parser with arguments for data loading

    Args:
        parent_parser (argparse.ArgumentParser): Parser created by the user

    Returns:
        argparse.ArgumentParser: Augmented parser
    """
    parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
    parser.add_argument(
        "--val-percent",
        dest="data.val_percent",
        type=float,
        default=0.2,
        help="Percent of validation data to be randomly split from the training set, if no validation set is provided",
    )

    parser.add_argument(
        "--test-percent",
        dest="data.test_percent",
        type=float,
        default=0.2,
        help="Percent of test data to be randomly split from the training set, if no test set is provided",
    )

    parser.add_argument(
        "--bsz",
        dest="data.batch_size",
        type=int,
        default=32,
        help="Training batch size",
    )

    parser.add_argument(
        "--bsz-eval",
        dest="data.batch_size_eval",
        type=int,
        default=32,
        help="Evaluation batch size",
    )

    parser.add_argument(
        "--num-workers",
        dest="data.num_workers",
        type=int,
        default=1,
        help="Number of workers to be used in the DataLoader",
    )

    parser.add_argument(
        "--no-pin-memory",
        dest="data.pin_memory",
        action="store_false",
        help="Don't pin data to GPU memory when transferring",
    )

    parser.add_argument(
        "--drop-last",
        dest="data.drop_last",
        action="store_true",
        help="Drop last incomplete batch",
    )

    parser.add_argument(
        "--no-shuffle-eval",
        dest="data.shuffle_eval",
        action="store_false",
        help="Don't shuffle val & test sets",
    )

    return parser

prepare_data(self)

Use this to download and prepare data.

.. warning:: DO NOT set state to the model (use setup instead) since this is NOT called on every GPU in DDP/TPU

Example::

def prepare_data(self):
    # good
    download_data()
    tokenize()
    etc()

    # bad
    self.split = data_split
    self.some_state = some_other_state()

In DDP prepare_data can be called in two ways (using Trainer(prepare_data_per_node)):

  1. Once per node. This is the default and is only called on LOCAL_RANK=0.
  2. Once in total. Only called on GLOBAL_RANK=0.

Example::

# DEFAULT
# called once per node on LOCAL_RANK=0 of that node
Trainer(prepare_data_per_node=True)

# call on GLOBAL_RANK=0 (great for shared file systems)
Trainer(prepare_data_per_node=False)

This is called before requesting the dataloaders:

.. code-block:: python

model.prepare_data()
    if ddp/tpu: init()
model.setup(stage)
model.train_dataloader()
model.val_dataloader()
model.test_dataloader()
Source code in slp/plbind/dm.py
def prepare_data(self):
    return None

test_dataloader(self)

Configure test DataLoader

Returns:

Type Description
DataLoader

Pytorch DataLoader for test set

Source code in slp/plbind/dm.py
def test_dataloader(self):
    """Configure test DataLoader

    Returns:
        DataLoader: Pytorch DataLoader for test set
    """

    return DataLoader(
        self.test,
        batch_size=self.batch_size_eval if self.batch_sampler_test is None else 1,
        num_workers=self.num_workers,
        pin_memory=self.pin_memory,
        drop_last=self.drop_last and (self.batch_sampler_test is None),
        sampler=self.sampler_test,
        batch_sampler=self.batch_sampler_test,
        shuffle=(
            self.shuffle_eval
            and (self.batch_sampler_test is None)
            and (self.sampler_test is None)
        ),
        collate_fn=self.collate_fn,
    )

train_dataloader(self)

Configure train DataLoader

Returns:

Type Description
DataLoader

DataLoader: Pytorch DataLoader for train set

Source code in slp/plbind/dm.py
def train_dataloader(self) -> DataLoader:
    """Configure train DataLoader

    Returns:
        DataLoader: Pytorch DataLoader for train set
    """

    return DataLoader(
        self.train,
        batch_size=self.batch_size if self.batch_sampler_train is None else 1,
        num_workers=self.num_workers,
        pin_memory=self.pin_memory,
        drop_last=self.drop_last and (self.batch_sampler_train is None),
        sampler=self.sampler_train,
        batch_sampler=self.batch_sampler_train,
        shuffle=(self.batch_sampler_train is None) and (self.sampler_train is None),
        collate_fn=self.collate_fn,
    )

val_dataloader(self)

Configure validation DataLoader

Returns:

Type Description
DataLoader

Pytorch DataLoader for validation set

Source code in slp/plbind/dm.py
def val_dataloader(self):
    """Configure validation DataLoader

    Returns:
        DataLoader: Pytorch DataLoader for validation set
    """
    val = DataLoader(
        self.val,
        batch_size=self.batch_size_eval if self.batch_sampler_val is None else 1,
        num_workers=self.num_workers,
        pin_memory=self.pin_memory,
        drop_last=self.drop_last and (self.batch_sampler_val is None),
        sampler=self.sampler_val,
        batch_sampler=self.batch_sampler_val,
        shuffle=(
            self.shuffle_eval
            and (self.batch_sampler_val is None)
            and (self.sampler_val is None)
        ),
        collate_fn=self.collate_fn,
    )

    return val

split_data(dataset, test_size, seed)

Train-test split of dataset.

Dataset can be either a torch.utils.data.Dataset or a list

Parameters:

Name Type Description Default
dataset Union[Dataset, List]

Input dataset

required
test_size float

Fraction of the dataset to hold out as the test set (e.g. 0.2).

required
seed int

Optional seed for a deterministic split. Pass None for a random split.

required

Returns:

Type Description
Tuple[Union[Dataset, List], Union[Dataset, List]]

(train set, test set)

Source code in slp/plbind/dm.py
def split_data(dataset, test_size, seed):
    """Train-test split of dataset.

    Dataset can be either a torch.utils.data.Dataset or a list

    Args:
        dataset (Union[Dataset, List]): Input dataset
        test_size (float): Size of the test set. Defaults to 0.2.
        seed (int): Optional seed for deterministic run. Defaults to None.

    Returns:
        Tuple[Union[Dataset, List], Union[Dataset, List]]: (train set, test set)
    """
    train, test = None, None

    if isinstance(dataset, torch.utils.data.Dataset):
        test_len = int(test_size * len(dataset))
        train_len = len(dataset) - test_len

        seed_generator = None

        if seed is not None:
            seed_generator = torch.Generator().manual_seed(seed)

        train, test = random_split(
            dataset, [train_len, test_len], generator=seed_generator
        )

    else:

        train, test = train_test_split(dataset, test_size=test_size, random_state=seed)

    return train, test
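
A short usage sketch (hedged: the inputs are illustrative; the import path is assumed from the source location slp/plbind/dm.py):

import torch
from torch.utils.data import TensorDataset
from slp.plbind.dm import split_data  # assumed import path

# List input: sklearn's train_test_split is used (see source above)
train, test = split_data(list(range(10)), test_size=0.2, seed=42)

# Dataset input: torch.utils.data.random_split is used (see source above)
dataset = TensorDataset(torch.arange(10))
train_ds, test_ds = split_data(dataset, test_size=0.2, seed=42)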

FixedWandbLogger

__init__(self, name=None, save_dir=None, offline=False, id=None, anonymous=False, version=None, project=None, log_model=False, experiment=None, prefix='', sync_step=True, checkpoint_dir=None, **kwargs) special

Wandb logger fix to save checkpoints in wandb

Accepts an additional checkpoint_dir argument, pointing to the real checkpoint directory

Parameters:

Name Type Description Default
name Optional[str]

Display name for the run. Defaults to None.

None
save_dir Optional[str]

Path where data is saved. Defaults to None.

None
offline Optional[bool]

Run offline (data can be streamed later to wandb servers). Defaults to False.

False
id Optional[str]

Sets the version, mainly used to resume a previous run. Defaults to None.

None
anonymous Optional[bool]

Enables or explicitly disables anonymous logging. Defaults to False.

False
version Optional[str]

Sets the version, mainly used to resume a previous run. Defaults to None.

None
project Optional[str]

The name of the project to which this run will belong. Defaults to None.

None
log_model Optional[bool]

Save checkpoints in wandb dir to upload on W&B servers. Defaults to False.

False
experiment Run

WandB experiment object. Defaults to None.

None
prefix Optional[str]

A string to put at the beginning of metric keys. Defaults to "".

''
sync_step Optional[bool]

Sync Trainer step with wandb step. Defaults to True.

True
checkpoint_dir Optional[str]

Real checkpoint dir. Defaults to None.

None
Source code in slp/plbind/helpers.py
def __init__(
    self,
    name: Optional[str] = None,
    save_dir: Optional[str] = None,
    offline: Optional[bool] = False,
    id: Optional[str] = None,
    anonymous: Optional[bool] = False,
    version: Optional[str] = None,
    project: Optional[str] = None,
    log_model: Optional[bool] = False,
    experiment: wandb.sdk.wandb_run.Run = None,
    prefix: Optional[str] = "",
    sync_step: Optional[bool] = True,
    checkpoint_dir: Optional[str] = None,
    **kwargs,
):
    """Wandb logger fix to save checkpoints in wandb

    Accepts an additional checkpoint_dir argument, pointing to the real checkpoint directory

    Args:
        name (Optional[str]): Display name for the run. Defaults to None.
        save_dir (Optional[str]): Path where data is saved. Defaults to None.
        offline (Optional[bool]): Run offline (data can be streamed later to wandb servers). Defaults to False.
        id (Optional[str]): Sets the version, mainly used to resume a previous run. Defaults to None.
        anonymous (Optional[bool]): Enables or explicitly disables anonymous logging. Defaults to False.
        version (Optional[str]): Sets the version, mainly used to resume a previous run. Defaults to None.
        project (Optional[str]): The name of the project to which this run will belong. Defaults to None.
        log_model (Optional[bool]): Save checkpoints in wandb dir to upload on W&B servers. Defaults to False.
        experiment ([type]): WandB experiment object. Defaults to None.
        prefix (Optional[str]): A string to put at the beginning of metric keys. Defaults to "".
        sync_step (Optional[bool]): Sync Trainer step with wandb step. Defaults to True.
        checkpoint_dir (Optional[str]): Real checkpoint dir. Defaults to None.
    """
    self._checkpoint_dir = checkpoint_dir
    super(FixedWandbLogger, self).__init__(
        name=name,
        save_dir=save_dir,
        offline=offline,
        id=id,
        anonymous=anonymous,
        version=version,
        project=project,
        log_model=log_model,
        experiment=experiment,
        prefix=prefix,
        sync_step=sync_step,
        **kwargs,
    )
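
A hedged construction sketch (the run name, project and checkpoint directory are illustrative; the import path is assumed from the source location slp/plbind/helpers.py):

from slp.plbind.helpers import FixedWandbLogger  # assumed import path

wandb_logger = FixedWandbLogger(
    name="my-run",                             # illustrative run name
    project="my-project",                      # illustrative wandb project
    log_model=True,
    checkpoint_dir="experiments/checkpoints",  # directory where checkpoints are actually written
)
# Pass it to the trainer as usual, e.g. pl.Trainer(logger=wandb_logger, ...)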

finalize(self, status)

Determine where checkpoints are saved and upload to wandb servers

Parameters:

Name Type Description Default
status str

Experiment status

required
Source code in slp/plbind/helpers.py
@rank_zero_only
def finalize(self, status: str) -> None:
    """Determine where checkpoints are saved and upload to wandb servers

    Args:
        status (str): Experiment status
    """
    # offset future training logged on same W&B run

    if self._experiment is not None:
        self._step_offset = self._experiment.step

    checkpoint_dir = (
        self._checkpoint_dir if self._checkpoint_dir is not None else self.save_dir
    )

    if checkpoint_dir is None:
        logger.warning(
            "Invalid checkpoint dir. Checkpoints will not be uploaded to Wandb."
        )
        logger.info(
            "You can manually upload your checkpoints through the CLI interface."
        )

    else:
        # upload all checkpoints from saving dir

        if self._log_model:
            wandb.save(os.path.join(checkpoint_dir, "*.ckpt"))

FromLogits

__init__(self, metric) special

Wrap pytorch lighting metric to accept logits input

Parameters:

Name Type Description Default
metric Metric

The metric to wrap, e.g. pl.metrics.Accuracy

required
Source code in slp/plbind/helpers.py
def __init__(self, metric: pl.metrics.Metric):
    """Wrap pytorch lighting metric to accept logits input

    Args:
        metric (pl.metrics.Metric): The metric to wrap, e.g. pl.metrics.Accuracy
    """
    super(FromLogits, self).__init__(
        compute_on_step=metric.compute_on_step,
        dist_sync_on_step=metric.dist_sync_on_step,
        process_group=metric.process_group,
        dist_sync_fn=metric.dist_sync_fn,
    )
    self.metric = metric

compute(self)

Compute metric

Returns:

Type Description
Tensor

torch.Tensor: metric value

Source code in slp/plbind/helpers.py
def compute(self) -> torch.Tensor:
    """Compute metric

    Returns:
        torch.Tensor: metric value
    """
    return self.metric.compute()  # type: ignore

update(self, preds, target)

Update underlying metric

Calculate softmax under the hood and pass probs to the underlying metric

Parameters:

Name Type Description Default
preds Tensor

[B, *, num_classes] Logits

required
target Tensor

[B, *] Ground truths

required
Source code in slp/plbind/helpers.py
def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:  # type: ignore
    """Update underlying metric

    Calculate softmax under the hood and pass probs to the underlying metric

    Args:
        preds (torch.Tensor): [B, *, num_classes] Logits
        target (torch.Tensor): [B, *] Ground truths
    """
    preds = F.softmax(preds, dim=-1)
    self.metric.update(preds, target)  # type: ignore
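
A short usage sketch (hedged: assumes a pytorch-lightning version that still exposes pl.metrics, as used throughout this module; the import path is assumed from the source location slp/plbind/helpers.py):

import torch
import pytorch_lightning as pl
from slp.plbind.helpers import FromLogits  # assumed import path

accuracy = FromLogits(pl.metrics.Accuracy())  # wrap a metric so it accepts raw logits

logits = torch.randn(4, 3)               # [batch_size, num_classes] raw model outputs
targets = torch.tensor([0, 2, 1, 0])
accuracy.update(logits, targets)         # softmax is applied internally
print(accuracy.compute())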

AutoEncoderPLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False) special

Pass arguments through to base class

Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    calculate_perplexity=False,
):
    """Pass arguments through to base class"""
    super(AutoEncoderPLModule, self).__init__(
        model,
        optimizer,
        criterion,
        predictor_cls=_AutoEncoder,
        lr_scheduler=lr_scheduler,
        hparams=hparams,
        metrics=metrics,
        calculate_perplexity=calculate_perplexity,
    )

BertPLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False) special

Pass arguments through to base class

Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    calculate_perplexity=False,
):
    """Pass arguments through to base class"""
    super(BertPLModule, self).__init__(
        model,
        optimizer,
        criterion,
        predictor_cls=_BertSequenceClassification,
        lr_scheduler=lr_scheduler,
        hparams=hparams,
        metrics=metrics,
        calculate_perplexity=calculate_perplexity,
    )

MultimodalTransformerClassificationPLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False) special

Pass arguments through to base class

Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    calculate_perplexity=False,
):
    """Pass arguments through to base class"""
    super(MultimodalTransformerClassificationPLModule, self).__init__(
        model,
        optimizer,
        criterion,
        predictor_cls=_MultimodalTransformerClassification,
        lr_scheduler=lr_scheduler,
        hparams=hparams,
        metrics=metrics,
        calculate_perplexity=calculate_perplexity,
    )

PLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False) special

Pass arguments through to base class

Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    calculate_perplexity=False,
):
    """Pass arguments through to base class"""
    super(PLModule, self).__init__(
        model,
        optimizer,
        criterion,
        predictor_cls=_Classification,
        lr_scheduler=lr_scheduler,
        hparams=hparams,
        metrics=metrics,
        calculate_perplexity=calculate_perplexity,
    )
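
A minimal assembly sketch (hedged: the model, optimizer and metric are illustrative; import paths are assumed from the source locations slp/plbind/module.py and slp/plbind/helpers.py):

import pytorch_lightning as pl
import torch.nn as nn
from torch.optim import Adam
from slp.plbind.helpers import FromLogits  # assumed import path
from slp.plbind.module import PLModule     # assumed import path

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))  # illustrative classifier
optimizer = Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

pl_module = PLModule(
    model,
    optimizer,
    criterion,
    metrics={"acc": FromLogits(pl.metrics.Accuracy())},
)
# trainer.fit(pl_module, datamodule=ldm)  # with a Trainer and LightningDataModule set up elsewhere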

RnnPLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False) special

Pass arguments through to base class

Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    calculate_perplexity=False,
):
    """Pass arguments through to base class"""
    super(RnnPLModule, self).__init__(
        model,
        optimizer,
        criterion,
        predictor_cls=_RnnClassification,
        lr_scheduler=lr_scheduler,
        hparams=hparams,
        metrics=metrics,
        calculate_perplexity=calculate_perplexity,
    )

SimplePLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, predictor_cls=<class 'slp.plbind.module._Classification'>, calculate_perplexity=False) special

Wraps a (model, optimizer, criterion, lr_scheduler) tuple in a LightningModule

Handles the boilerplate for metrics calculation and logging and defines the train_step / val_step / test_step with use of the predictor helper classes (e.g. _Classification, _RnnClassification)

Parameters:

Name Type Description Default
model Module

Module to use for prediction

required
optimizer Union[torch.optim.optimizer.Optimizer, List[torch.optim.optimizer.Optimizer]]

Optimizers to use for training

required
criterion Union[torch.nn.modules.module.Module, Callable]

Task loss

required
lr_scheduler Union[torch.optim.lr_scheduler._LRScheduler, List[torch.optim.lr_scheduler._LRScheduler]]

Learning rate scheduler. Defaults to None.

None
hparams Union[omegaconf.dictconfig.DictConfig, Dict[str, Any], argparse.Namespace]

Hyperparameter values. This ensures they are logged with trainer.loggers. Defaults to None.

None
metrics Optional[Dict[str, pytorch_lightning.metrics.metric.Metric]]

Metrics to track. Defaults to None.

None
predictor_cls [type]

Class that defines a parse_batch and a get_predictions_and_targets method. Defaults to _Classification.

<class 'slp.plbind.module._Classification'>
calculate_perplexity bool

Whether to calculate perplexity. Would be cleaner as a metric, but this is more efficient. Defaults to False.

False
Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    predictor_cls=_Classification,
    calculate_perplexity: bool = False,  # for LM. Dirty but much more efficient
):
    """Wraps a (model, optimizer, criterion, lr_scheduler) tuple in a LightningModule

    Handles the boilerplate for metrics calculation and logging and defines the train_step / val_step / test_step
    with use of the predictor helper classes (e.g. _Classification, _RnnClassification)

    Args:
        model (nn.Module): Module to use for prediction
        optimizer (Union[Optimizer, List[Optimizer]]): Optimizers to use for training
        criterion (LossType): Task loss
        lr_scheduler (Union[_LRScheduler, List[_LRScheduler]], optional): Learning rate scheduler. Defaults to None.
        hparams (Configuration, optional): Hyperparameter values. This ensures they are logged with trainer.loggers. Defaults to None.
        metrics (Optional[Dict[str, pl.metrics.Metric]], optional): Metrics to track. Defaults to None.
        predictor_cls ([type], optional): Class that defines a parse_batch and a
                get_predictions_and_targets method. Defaults to _Classification.
        calculate_perplexity (bool, optional): Whether to calculate perplexity.
                Would be cleaner as a metric, but this is more efficient. Defaults to False.
    """
    super(SimplePLModule, self).__init__()
    self.calculate_perplexity = calculate_perplexity
    self.model = model
    self.optimizer = optimizer
    self.lr_scheduler = lr_scheduler
    self.criterion = criterion

    if metrics is not None:
        self.train_metrics = nn.ModuleDict(metrics)
        self.val_metrics = nn.ModuleDict({k: v.clone() for k, v in metrics.items()})
        self.test_metrics = nn.ModuleDict(
            {k: v.clone() for k, v in metrics.items()}
        )
    else:
        self.train_metrics = nn.ModuleDict(modules=None)
        self.val_metrics = nn.ModuleDict(modules=None)
        self.test_metrics = nn.ModuleDict(modules=None)
    self.predictor = predictor_cls()

    if hparams is not None:
        if isinstance(hparams, Namespace):
            dict_params = vars(hparams)
        elif isinstance(hparams, DictConfig):
            dict_params = cast(Dict[str, Any], OmegaConf.to_container(hparams))
        else:
            dict_params = hparams
        # self.hparams = dict_params
        self.save_hyperparameters(dict_params)

aggregate_epoch_metrics(self, outputs, mode='Training')

Aggregate metrics over a whole epoch

Parameters:

Name Type Description Default
outputs List[Dict[str, torch.Tensor]]

Aggregated outputs from train_step, validation_step or test_step

required
mode str

"Training", "Validation" or "Testing". Defaults to "Training".

'Training'
Source code in slp/plbind/module.py
def aggregate_epoch_metrics(self, outputs, mode="Training"):
    """Aggregate metrics over a whole epoch

    Args:
        outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from train_step, validation_step or test_step
        mode (str, optional): "Training", "Validation" or "Testing". Defaults to "Training".
    """

    def fmt(name):
        """Format metric name"""

        return f"{name}" if name != "loss" else "train_loss"

    keys = list(outputs[0].keys())
    aggregated = {fmt(k): torch.stack([x[k] for x in outputs]).mean() for k in keys}
    aggregated["epoch"] = self.current_epoch + 1

    self.log_dict(aggregated, logger=True, prog_bar=False, on_epoch=True)

    return aggregated

configure_optimizers(self)

Return optimizers and learning rate schedulers

Returns:

Type Description
Tuple[List[Optimizer], List[_LRScheduler]]

(optimizers, lr_schedulers)

Source code in slp/plbind/module.py
def configure_optimizers(self):
    """Return optimizers and learning rate schedulers

    Returns:
        Tuple[List[Optimizer], List[_LRScheduler]]: (optimizers, lr_schedulers)
    """

    if self.lr_scheduler is not None:
        scheduler = {
            "scheduler": self.lr_scheduler,
            "interval": "epoch",
            "monitor": "val_loss",
        }

        return [self.optimizer], [scheduler]

    return self.optimizer

forward(self, *args, **kwargs)

Call wrapped module forward

Source code in slp/plbind/module.py
def forward(self, *args, **kwargs):
    """Call wrapped module forward"""

    return self.model(*args, **kwargs)

log_to_console(self, metrics, mode='Training')

Log metrics to console

Parameters:

Name Type Description Default
metrics Dict[str, torch.Tensor]

Computed metrics

required
mode str

"Training", "Validation" or "Testing". Defaults to "Training".

'Training'
Source code in slp/plbind/module.py
def log_to_console(self, metrics, mode="Training"):
    """Log metrics to console

    Args:
        metrics (Dict[str, torch.Tensor]): Computed metrics
        mode (str, optional): "Training", "Validation" or "Testing". Defaults to "Training".
    """
    logger.info("Epoch {} {} results".format(self.current_epoch + 1, mode))
    print_separator(symbol="-", n=50, print_fn=logger.info)

    for name, value in metrics.items():
        if name == "epoch":
            continue
        logger.info("{:<15} {:<15}".format(name, value))

    print_separator(symbol="%", n=50, print_fn=logger.info)

test_epoch_end(self, outputs)

Aggregate metrics of a test epoch

Parameters:

Name Type Description Default
outputs List[Dict[str, torch.Tensor]]

Aggregated outputs from test_step

required
Source code in slp/plbind/module.py
def test_epoch_end(self, outputs):
    """Aggregate metrics of a test epoch

    Args:
        outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from test_step
    """
    outputs = self.aggregate_epoch_metrics(outputs, mode="Test")
    self.log_to_console(outputs, mode="Test")

test_step(self, batch, batch_idx)

Compute loss for a single test step and log metrics to loggers

Parameters:

Name Type Description Default
batch Tuple[torch.Tensor, ...]

Input batch

required
batch_idx int

Index of batch

required

Returns:

Type Description
Dict[str, torch.Tensor]

computed metrics

Source code in slp/plbind/module.py
def test_step(self, batch, batch_idx):
    """Compute loss for a single test step and log metrics to loggers

    Args:
        batch (Tuple[torch.Tensor, ...]): Input batch
        batch_idx (int): Index of batch

    Returns:
        Dict[str, torch.Tensor]: computed metrics
    """
    y_hat, targets = self.predictor.get_predictions_and_targets(self, batch)
    loss = self.criterion(y_hat, targets)
    metrics = self._compute_metrics(
        self.test_metrics, loss, y_hat, targets, mode="test"
    )

    return metrics

training_epoch_end(self, outputs)

Aggregate metrics of a training epoch

Parameters:

Name Type Description Default
outputs List[Dict[str, torch.Tensor]]

Aggregated outputs from train_step

required
Source code in slp/plbind/module.py
def training_epoch_end(self, outputs):
    """Aggregate metrics of a training epoch

    Args:
        outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from train_step
    """
    outputs = self.aggregate_epoch_metrics(outputs, mode="Training")
    self.log_to_console(outputs, mode="Training")

training_step(self, batch, batch_idx)

Compute loss for a single training step and log metrics to loggers

Parameters:

Name Type Description Default
batch Tuple[torch.Tensor, ...]

Input batch

required
batch_idx int

Index of batch

required

Returns:

Type Description
Dict[str, torch.Tensor]

computed metrics

Source code in slp/plbind/module.py
def training_step(self, batch, batch_idx):
    """Compute loss for a single training step and log metrics to loggers

    Args:
        batch (Tuple[torch.Tensor, ...]): Input batch
        batch_idx (int): Index of batch

    Returns:
        Dict[str, torch.Tensor]: computed metrics
    """
    y_hat, targets = self.predictor.get_predictions_and_targets(self.model, batch)
    loss = self.criterion(y_hat, targets)
    metrics = self._compute_metrics(
        self.train_metrics, loss, y_hat, targets, mode="train"
    )

    self.log_dict(
        metrics,
        on_step=True,
        on_epoch=False,
        logger=True,
        prog_bar=False,
    )

    metrics["loss"] = loss

    return metrics

validation_epoch_end(self, outputs)

Aggregate metrics of a validation epoch

Parameters:

Name Type Description Default
outputs List[Dict[str, torch.Tensor]]

Aggregated outputs from validation_step

required
Source code in slp/plbind/module.py
def validation_epoch_end(self, outputs):
    """Aggregate metrics of a validation epoch

    Args:
        outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from validation_step
    """
    outputs = self.aggregate_epoch_metrics(outputs, mode="Validation")

    if torch.isnan(outputs["val_loss"]) or torch.isinf(outputs["val_loss"]):
        outputs["val_loss"] = 1000000

    outputs["best_score"] = min(
        outputs[self.trainer.early_stopping_callback.monitor].detach().cpu(),
        self.trainer.early_stopping_callback.best_score.detach().cpu(),
    )
    self.log_to_console(outputs, mode="Validation")

validation_step(self, batch, batch_idx)

Compute loss for a single validation step and log metrics to loggers

Parameters:

Name Type Description Default
batch Tuple[torch.Tensor, ...]

Input batch

required
batch_idx int

Index of batch

required

Returns:

Type Description
Dict[str, torch.Tensor]

computed metrics

Source code in slp/plbind/module.py
def validation_step(self, batch, batch_idx):
    """Compute loss for a single validation step and log metrics to loggers

    Args:
        batch (Tuple[torch.Tensor, ...]): Input batch
        batch_idx (int): Index of batch

    Returns:
        Dict[str, torch.Tensor]: computed metrics
    """
    y_hat, targets = self.predictor.get_predictions_and_targets(self, batch)
    loss = self.criterion(y_hat, targets)
    metrics = self._compute_metrics(
        self.val_metrics, loss, y_hat, targets, mode="val"
    )

    metrics[
        "best_score"
    ] = self.trainer.early_stopping_callback.best_score.detach().cpu()

    return metrics

TransformerClassificationPLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False) special

Pass arguments through to base class

Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    calculate_perplexity=False,
):
    """Pass arguments through to base class"""
    super(TransformerClassificationPLModule, self).__init__(
        model,
        optimizer,
        criterion,
        predictor_cls=_TransformerClassification,
        lr_scheduler=lr_scheduler,
        hparams=hparams,
        metrics=metrics,
        calculate_perplexity=calculate_perplexity,
    )

TransformerPLModule

__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False) special

Pass arguments through to base class

Source code in slp/plbind/module.py
def __init__(
    self,
    model: nn.Module,
    optimizer: Union[Optimizer, List[Optimizer]],
    criterion: LossType,
    lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
    hparams: Configuration = None,
    metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
    calculate_perplexity=False,
):
    """Pass arguments through to base class"""
    super(TransformerPLModule, self).__init__(
        model,
        optimizer,
        criterion,
        predictor_cls=_Transformer,
        lr_scheduler=lr_scheduler,
        hparams=hparams,
        metrics=metrics,
        calculate_perplexity=calculate_perplexity,
    )

add_optimizer_args(parent_parser)

Augment parser with optimizer arguments

Parameters:

Name Type Description Default
parent_parser ArgumentParser

Parser created by the user

required

Returns:

Type Description
ArgumentParser

argparse.ArgumentParser: Augmented parser

Source code in slp/plbind/trainer.py
def add_optimizer_args(
    parent_parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
    """Augment parser with optimizer arguments

    Args:
        parent_parser (argparse.ArgumentParser): Parser created by the user

    Returns:
        argparse.ArgumentParser: Augmented parser
    """
    parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
    parser.add_argument(
        "--optimizer",
        dest="optimizer",
        type=str,
        choices=[
            "Adam",
            "AdamW",
            "SGD",
            "Adadelta",
            "Adagrad",
            "Adamax",
            "ASGD",
            "RMSprop",
        ],
        default="Adam",
        help="Which optimizer to use",
    )

    parser.add_argument(
        "--lr",
        dest="optim.lr",
        type=float,
        default=1e-3,
        help="Learning rate",
    )

    parser.add_argument(
        "--weight-decay",
        dest="optim.weight_decay",
        type=float,
        default=0,
        help="Learning rate",
    )

    parser.add_argument(
        "--lr-scheduler",
        dest="lr_scheduler",
        action="store_true",
        # type=str,
        # choices=["ReduceLROnPlateau"],
        help="Use learning rate scheduling. Currently only ReduceLROnPlateau is supported out of the box",
    )

    parser.add_argument(
        "--lr-factor",
        dest="lr_schedule.factor",
        type=float,
        default=0.1,
        help="Multiplicative factor by which LR is reduced. Used if --lr-scheduler is provided.",
    )

    parser.add_argument(
        "--lr-patience",
        dest="lr_schedule.patience",
        type=int,
        default=10,
        help="Number of epochs with no improvement after which learning rate will be reduced. Used if --lr-scheduler is provided.",
    )

    parser.add_argument(
        "--lr-cooldown",
        dest="lr_schedule.cooldown",
        type=int,
        default=0,
        help="Number of epochs to wait before resuming normal operation after lr has been reduced. Used if --lr-scheduler is provided.",
    )

    parser.add_argument(
        "--min-lr",
        dest="lr_schedule.min_lr",
        type=float,
        default=0,
        help="Minimum lr for LR scheduling. Used if --lr-scheduler is provided.",
    )

    return parser

add_trainer_args(parent_parser)

Augment parser with trainer arguments

Parameters:

Name Type Description Default
parent_parser ArgumentParser

Parser created by the user

required

Returns:

Type Description
ArgumentParser

argparse.ArgumentParser: Augmented parser

Source code in slp/plbind/trainer.py
def add_trainer_args(parent_parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Augment parser with trainer arguments

    Args:
        parent_parser (argparse.ArgumentParser): Parser created by the user

    Returns:
        argparse.ArgumentParser: Augmented parser
    """
    parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
    parser.add_argument(
        "--seed",
        dest="seed",
        type=int,
        default=None,
        help="Seed for reproducibility",
    )

    parser.add_argument(
        "--config",
        dest="config",
        type=str,  # dir_path,
        default=None,
        help="Path to YAML configuration file",
    )

    parser.add_argument(
        "--experiment-name",
        dest="trainer.experiment_name",
        type=str,
        default="experiment",
        help="Name of the running experiment",
    )

    parser.add_argument(
        "--run-id",
        dest="trainer.run_id",
        type=str,
        default=None,
        help="Unique identifier for the current run. If not provided it is inferred from datetime.now()",
    )

    parser.add_argument(
        "--experiment-group",
        dest="trainer.experiment_group",
        type=str,
        default=None,
        help="Group of current experiment. Useful when evaluating for different seeds / cross-validation etc.",
    )

    parser.add_argument(
        "--experiments-folder",
        dest="trainer.experiments_folder",
        type=str,
        default="experiments",
        help="Top-level folder where experiment results & checkpoints are saved",
    )

    parser.add_argument(
        "--save-top-k",
        dest="trainer.save_top_k",
        type=int,
        default=3,
        help="Save checkpoints for top k models",
    )

    parser.add_argument(
        "--patience",
        dest="trainer.patience",
        type=int,
        default=3,
        help="Number of epochs to wait before early stopping",
    )

    parser.add_argument(
        "--wandb-project",
        dest="trainer.wandb_project",
        type=str,
        default=None,
        help="Wandb project under which results are saved",
    )

    parser.add_argument(
        "--tags",
        dest="trainer.tags",
        type=str,
        nargs="*",
        default=[],
        help="Tags for current run to make results searchable.",
    )

    parser.add_argument(
        "--stochastic_weight_avg",
        dest="trainer.stochastic_weight_avg",
        action="store_true",
        help="Use Stochastic weight averaging.",
    )

    parser.add_argument(
        "--gpus", dest="trainer.gpus", type=int, default=0, help="Number of GPUs to use"
    )

    parser.add_argument(
        "--val-interval",
        dest="trainer.check_val_every_n_epoch",
        type=int,
        default=1,
        help="Run validation every n epochs",
    )

    parser.add_argument(
        "--clip-grad-norm",
        dest="trainer.gradient_clip_val",
        type=float,
        default=0,
        help="Clip gradients with ||grad(w)|| >= args.clip_grad_norm",
    )

    parser.add_argument(
        "--epochs",
        dest="trainer.max_epochs",
        type=int,
        default=100,
        help="Maximum number of training epochs",
    )

    parser.add_argument(
        "--num-nodes",
        dest="trainer.num_nodes",
        type=int,
        default=1,
        help="Number of nodes to run",
    )

    parser.add_argument(
        "--steps",
        dest="trainer.max_steps",
        type=int,
        default=None,
        help="Maximum number of training steps",
    )

    parser.add_argument(
        "--tbtt_steps",
        dest="trainer.truncated_bptt_steps",
        type=int,
        default=None,
        help="Truncated Back-propagation-through-time steps.",
    )

    parser.add_argument(
        "--debug",
        dest="debug",
        action="store_true",
        help="If true, we run a full run on a small subset of the input data and overfit 10 training batches",
    )

    parser.add_argument(
        "--offline",
        dest="trainer.force_wandb_offline",
        action="store_true",
        help="If true, forces offline execution of wandb logger",
    )

    parser.add_argument(
        "--early-stop-on",
        dest="trainer.early_stop_on",
        type=str,
        default="val_loss",
        help="Metric for early stopping",
    )

    parser.add_argument(
        "--early-stop-mode",
        dest="trainer.early_stop_mode",
        type=str,
        choices=["min", "max"],
        default="min",
        help="Minimize or maximize early stopping metric",
    )

    return parser
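
A minimal usage sketch. MyDataModule is a hypothetical LightningDataModule class standing in for your own datamodule; the flags passed to parse_args are the --epochs and --gpus arguments added above.

>>> import argparse
>>> parser = make_cli_parser(argparse.ArgumentParser("My experiment"), MyDataModule)
>>> args = parser.parse_args(["--epochs", "20", "--gpus", "1"])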

make_trainer(experiment_name='experiment', experiment_description=None, run_id=None, experiment_group=None, experiments_folder='experiments', save_top_k=3, patience=3, wandb_project=None, wandb_user=None, force_wandb_offline=False, tags=None, stochastic_weight_avg=False, auto_scale_batch_size=False, gpus=0, check_val_every_n_epoch=1, gradient_clip_val=0, precision=32, num_nodes=1, max_epochs=100, max_steps=None, truncated_bptt_steps=None, fast_dev_run=None, overfit_batches=None, terminate_on_nan=False, profiler='simple', early_stop_on='val_loss', early_stop_mode='min')

Configure trainer with preferred defaults

  • Experiment folder and run_id configured (based on datetime.now())
  • Wandb and CSV loggers run by default
  • Wandb configured to save code and checkpoints
  • Wandb configured in online mode except if no internet connection is available
  • Early stopping on best validation loss is configured by default
  • Checkpointing on best validation loss is configured by default

Parameters:

Name Type Description Default
experiment_name str

Experiment name. Defaults to "experiment".

'experiment'
experiment_description Optional[str]

Detailed description of the experiment. Defaults to None.

None
run_id Optional[str]

Unique run_id. Defaults to datetime.now(). Defaults to None.

None
experiment_group Optional[str]

Group experiments over multiple runs. Defaults to None.

None
experiments_folder str

Folder to save outputs. Defaults to "experiments".

'experiments'
save_top_k int

Save top k checkpoints. Defaults to 3.

3
patience int

Patience for early stopping. Defaults to 3.

3
wandb_project Optional[str]

Wandb project to save the experiment. Defaults to None.

None
wandb_user Optional[str]

Wandb username. Defaults to None.

None
force_wandb_offline bool

Force offline execution of wandb

False
tags Optional[Sequence]

Additional tags to attach to the experiment. Defaults to None.

None
stochastic_weight_avg bool

Use stochastic weight averaging. Defaults to False.

False
auto_scale_batch_size bool

Find optimal batch size for the available resources when running trainer.tune(). Defaults to False.

False
gpus int

number of GPUs to use. Defaults to 0.

0
check_val_every_n_epoch int

Run validation every n epochs. Defaults to 1.

1
gradient_clip_val float

Clip gradient norm value. Defaults to 0 (no clipping).

0
precision int

Floating point precision. Defaults to 32.

32
num_nodes int

Number of nodes to run on

1
max_epochs Optional[int]

Maximum number of epochs for training. Defaults to 100.

100
max_steps Optional[int]

Maximum number of steps for training. Defaults to None.

None
truncated_bptt_steps Optional[int]

Truncated backpropagation through time performs backprop every k steps of a much longer sequence. Defaults to None.

None
fast_dev_run Optional[int]

Run training on a small number of batches for debugging. Defaults to None.

None
overfit_batches Optional[int]

Try to overfit a small number of batches for debugging. Defaults to None.

None
terminate_on_nan bool

Terminate on NaN gradients. Warning this makes training slow. Defaults to False.

False
profiler Union[pytorch_lightning.profiler.profilers.BaseProfiler, bool, str]

Use profiler to track execution times of each function

'simple'
early_stop_on str

metric for early stopping

'val_loss'
early_stop_mode str

"min" or "max"

'min'

Returns:

Type Description
Trainer

pl.Trainer: Configured trainer

Source code in slp/plbind/trainer.py
def make_trainer(
    experiment_name: str = "experiment",
    experiment_description: Optional[str] = None,
    run_id: Optional[str] = None,
    experiment_group: Optional[str] = None,
    experiments_folder: str = "experiments",
    save_top_k: int = 3,
    patience: int = 3,
    wandb_project: Optional[str] = None,
    wandb_user: Optional[str] = None,
    force_wandb_offline: bool = False,
    tags: Optional[Sequence] = None,
    stochastic_weight_avg: bool = False,
    auto_scale_batch_size: bool = False,
    gpus: int = 0,
    check_val_every_n_epoch: int = 1,
    gradient_clip_val: float = 0,
    precision: int = 32,
    num_nodes: int = 1,
    max_epochs: Optional[int] = 100,
    max_steps: Optional[int] = None,
    truncated_bptt_steps: Optional[int] = None,
    fast_dev_run: Optional[int] = None,
    overfit_batches: Optional[int] = None,
    terminate_on_nan: bool = False,  # Be careful this makes training very slow for large models
    profiler: Optional[Union[pl.profiler.BaseProfiler, bool, str]] = "simple",
    early_stop_on: str = "val_loss",
    early_stop_mode: str = "min",
) -> pl.Trainer:
    """Configure trainer with preferred defaults

    * Experiment folder and run_id configured (based on datetime.now())
    * Wandb and CSV loggers run by default
    * Wandb configured to save code and checkpoints
    * Wandb configured in online mode except if no internet connection is available
    * Early stopping on best validation loss is configured by default
    * Checkpointing on best validation loss is configured by default

    Args:
        experiment_name (str, optional): Experiment name. Defaults to "experiment".
        experiment_description (Optional[str], optional): Detailed description of the experiment. Defaults to None.
        run_id (Optional[str], optional): Unique run_id. Defaults to datetime.now(). Defaults to None.
        experiment_group (Optional[str], optional): Group experiments over multiple runs. Defaults to None.
        experiments_folder (str, optional): Folder to save outputs. Defaults to "experiments".
        save_top_k (int, optional): Save top k checkpoints. Defaults to 3.
        patience (int, optional): Patience for early stopping. Defaults to 3.
        wandb_project (Optional[str], optional): Wandb project to save the experiment. Defaults to None.
        wandb_user (Optional[str], optional): Wandb username. Defaults to None.
        force_wandb_offline (bool): Force offline execution of wandb
        tags (Optional[Sequence], optional): Additional tags to attach to the experiment. Defaults to None.
        stochastic_weight_avg (bool, optional): Use stochastic weight averaging. Defaults to False.
        auto_scale_batch_size (bool, optional): Find optimal batch size for the available resources when running
                trainer.tune(). Defaults to False.
        gpus (int, optional): number of GPUs to use. Defaults to 0.
        check_val_every_n_epoch (int, optional): Run validation every n epochs. Defaults to 1.
        gradient_clip_val (float, optional): Clip gradient norm value. Defaults to 0 (no clipping).
        precision (int, optional): Floating point precision. Defaults to 32.
        num_nodes (int): Number of nodes to run on
        max_epochs (Optional[int], optional): Maximum number of epochs for training. Defaults to 100.
        max_steps (Optional[int], optional): Maximum number of steps for training. Defaults to None.
        truncated_bptt_steps (Optional[int], optional): Truncated backpropagation through time performs backprop
                every k steps of a much longer sequence. Defaults to None.
        fast_dev_run (Optional[int], optional): Run training on a small number of  batches for debugging. Defaults to None.
        overfit_batches (Optional[int], optional): Try to overfit a small number of batches for debugging. Defaults to None.
        terminate_on_nan (bool, optional): Terminate on NaN gradients. Warning this makes training slow. Defaults to False.
        profiler (Optional[Union[pl.profiler.BaseProfiler, bool, str]]): Use profiler to track execution times of each function
        early_stop_on (str): metric for early stopping
        early_stop_mode (str): "min" or "max"

    Returns:
        pl.Trainer: Configured trainer
    """

    if overfit_batches is not None:
        trainer = pl.Trainer(overfit_batches=overfit_batches, gpus=gpus)

        return trainer

    if fast_dev_run is not None:
        trainer = pl.Trainer(fast_dev_run=fast_dev_run, gpus=gpus)

        return trainer

    logging_dir = os.path.join(experiments_folder, experiment_name)
    safe_mkdirs(logging_dir)

    run_id = run_id if run_id is not None else date_fname()

    if run_id in os.listdir(logging_dir):
        logger.warning(
            f"The run id you provided {run_id} already exists in {logging_dir}"
        )
        run_id = date_fname()
        logger.info(f"Setting run_id={run_id}")

    checkpoint_dir = os.path.join(logging_dir, run_id, "checkpoints")

    logger.info(f"Logs will be saved in {logging_dir}")
    logger.info(f"Logs will be saved in {checkpoint_dir}")

    if wandb_project is None:
        wandb_project = experiment_name

    connected = has_internet_connection()
    offline_run = force_wandb_offline or not connected

    loggers = [
        pl.loggers.CSVLogger(logging_dir, name="csv_logs", version=run_id),
        FixedWandbLogger(  # type: ignore
            name=experiment_name,
            project=wandb_project,
            anonymous=False,
            save_dir=logging_dir,
            version=run_id,
            save_code=True,
            checkpoint_dir=checkpoint_dir,
            offline=offline_run,
            log_model=not offline_run,
            entity=wandb_user,
            group=experiment_group,
            notes=experiment_description,
            tags=tags,
        ),
    ]

    if gpus > 1:
        del loggers[
            1
        ]  # https://github.com/PyTorchLightning/pytorch-lightning/issues/6106

    logger.info("Configured wandb and CSV loggers.")
    logger.info(
        f"Wandb configured to run {experiment_name}/{run_id} in project {wandb_project}"
    )

    if connected:
        logger.info("Results will be stored online.")
    else:
        logger.info("Results will be stored offline due to bad internet connection.")
        logger.info(
            f"If you want to upload your results later run\n\t wandb sync {logging_dir}/wandb/run-{run_id}"
        )

    if experiment_description is not None:
        logger.info(
            f"Experiment verbose description:\n{experiment_description}\n\nTags:{'n/a' if tags is None else tags}"
        )

    callbacks = [
        EarlyStoppingWithLogs(
            monitor=early_stop_on,
            mode=early_stop_mode,
            patience=patience,
            verbose=True,
        ),
        pl.callbacks.ModelCheckpoint(
            dirpath=checkpoint_dir,
            filename="{epoch}-{val_loss:.2f}",
            monitor=early_stop_on,
            save_top_k=save_top_k,
            mode=early_stop_mode,
        ),
        pl.callbacks.LearningRateMonitor(logging_interval="step"),
    ]

    logger.info("Configured Early stopping and Model checkpointing to track val_loss")

    trainer = pl.Trainer(
        default_root_dir=logging_dir,
        gpus=gpus,
        max_epochs=max_epochs,
        max_steps=max_steps,
        callbacks=callbacks,
        logger=loggers,
        check_val_every_n_epoch=check_val_every_n_epoch,
        gradient_clip_val=gradient_clip_val,
        auto_scale_batch_size=auto_scale_batch_size,
        stochastic_weight_avg=stochastic_weight_avg,
        precision=precision,
        truncated_bptt_steps=truncated_bptt_steps,
        terminate_on_nan=terminate_on_nan,
        progress_bar_refresh_rate=10,
        profiler=profiler,
        num_nodes=num_nodes,
    )

    return trainer
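
A short usage sketch with a few defaults overridden; model and ldm are placeholders for your LightningModule and datamodule.

>>> trainer = make_trainer(
...     experiment_name="mnist-mlp",
...     max_epochs=20,
...     gpus=1,
...     early_stop_on="val_loss",
...     early_stop_mode="min",
... )
>>> trainer.fit(model, datamodule=ldm)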

make_trainer_for_ray_tune(patience=3, stochastic_weight_avg=False, gpus=0, gradient_clip_val=0, precision=32, max_epochs=100, max_steps=None, truncated_bptt_steps=None, terminate_on_nan=False, early_stop_on='val_loss', early_stop_mode='min', metrics_map=None, **extra_kwargs)

Configure trainer with preferred defaults

  • Early stopping on best validation loss is configured by default
  • Ray tune callback configured

Parameters:

Name Type Description Default
patience int

Patience for early stopping. Defaults to 3.

3
stochastic_weight_avg bool

Use stochastic weight averaging. Defaults to False.

False
gpus int

number of GPUs to use. Defaults to 0.

0
gradient_clip_val float

Clip gradient norm value. Defaults to 0 (no clipping).

0
precision int

Floating point precision. Defaults to 32.

32
max_epochs Optional[int]

Maximum number of epochs for training. Defaults to 100.

100
max_steps Optional[int]

Maximum number of steps for training. Defaults to None.

None
truncated_bptt_steps Optional[int]

Truncated backpropagation through time performs backprop every k steps of a much longer sequence. Defaults to None.

None
terminate_on_nan bool

Terminate on NaN gradients. Warning this makes training slow. Defaults to False.

False
early_stop_on str

metric for early stopping

'val_loss'
early_stop_mode str

"min" or "max"

'min'
metrics_map Optional[Dict[str, str]]

The mapping from pytorch lightning logged metrics to ray tune metrics. The --tune-metric argument should be one of the keys of this mapping

None
extra_kwargs kwargs

Ignored. We use it so that we are able to pass the same config object as in make_trainer

{}

Returns:

Type Description
Trainer

pl.Trainer: Configured trainer

Source code in slp/plbind/trainer.py
def make_trainer_for_ray_tune(
    patience: int = 3,
    stochastic_weight_avg: bool = False,
    gpus: int = 0,
    gradient_clip_val: float = 0,
    precision: int = 32,
    max_epochs: Optional[int] = 100,
    max_steps: Optional[int] = None,
    truncated_bptt_steps: Optional[int] = None,
    terminate_on_nan: bool = False,  # Be careful this makes training very slow for large models
    early_stop_on: str = "val_loss",
    early_stop_mode: str = "min",
    metrics_map: Optional[Dict[str, str]] = None,
    **extra_kwargs,
) -> pl.Trainer:
    """Configure trainer with preferred defaults

    * Early stopping on best validation loss is configured by default
    * Ray tune callback configured

    Args:
        patience (int, optional): Patience for early stopping. Defaults to 3.
        stochastic_weight_avg (bool, optional): Use stochastic weight averaging. Defaults to False.
        gpus (int, optional): number of GPUs to use. Defaults to 0.
        gradient_clip_val (float, optional): Clip gradient norm value. Defaults to 0 (no clipping).
        precision (int, optional): Floating point precision. Defaults to 32.
        max_epochs (Optional[int], optional): Maximum number of epochs for training. Defaults to 100.
        max_steps (Optional[int], optional): Maximum number of steps for training. Defaults to None.
        truncated_bptt_steps (Optional[int], optional): Truncated backpropagation through time performs backprop
                every k steps of a much longer sequence. Defaults to None.
        terminate_on_nan (bool, optional): Terminate on NaN gradients. Warning this makes training slow. Defaults to False.
        early_stop_on (str): metric for early stopping
        early_stop_mode (str): "min" or "max"
        metrics_map (Optional[Dict[str, str]]): The mapping from pytorch lightning logged metrics
            to ray tune metrics. The --tune-metric argument should be one of the keys of this
            mapping
        extra_kwargs (kwargs): Ignored. We use it so that we are able to pass the same config
            object as in make_trainer
    Returns:
        pl.Trainer: Configured trainer
    """

    if metrics_map is None:
        raise ValueError("Need to pass metrics for TuneReportCallback")

    callbacks = [
        EarlyStoppingWithLogs(
            monitor=early_stop_on,
            mode=early_stop_mode,
            patience=patience,
            verbose=True,
        ),
        TuneReportCallback(metrics_map, on="validation_end"),
        pl.callbacks.LearningRateMonitor(logging_interval="step"),
    ]

    logger.info("Configured Early stopping to track val_loss")

    trainer = pl.Trainer(
        gpus=gpus,
        max_epochs=max_epochs,
        max_steps=max_steps,
        callbacks=callbacks,
        logger=[],
        check_val_every_n_epoch=1,
        gradient_clip_val=gradient_clip_val,
        stochastic_weight_avg=stochastic_weight_avg,
        precision=precision,
        truncated_bptt_steps=truncated_bptt_steps,
        terminate_on_nan=terminate_on_nan,
        progress_bar_refresh_rate=0,
        num_sanity_val_steps=0,
        auto_scale_batch_size=False,
    )

    return trainer
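
A hedged sketch of how this is typically called inside a Ray Tune trainable; the identity metrics_map below assumes that val_loss is logged by your LightningModule.

>>> trainer = make_trainer_for_ray_tune(
...     max_epochs=20,
...     metrics_map={"val_loss": "val_loss"},
... )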

watch_model(trainer, model)

If wandb logger is configured track gradient and weight norms

Parameters:

Name Type Description Default
trainer Trainer

Trainer

required
model Module

Module to watch

required
Source code in slp/plbind/trainer.py
def watch_model(trainer: pl.Trainer, model: nn.Module) -> None:
    """If wandb logger is configured track gradient and weight norms

    Args:
        trainer (pl.Trainer): Trainer
        model (nn.Module): Module to watch
    """

    if trainer.num_gpus > 1:
        return

    if isinstance(trainer.logger.experiment, list):
        for log in trainer.logger.experiment:
            try:
                log.watch(model, log="all")
                logger.info("Tracking model weights & gradients in wandb.")

                break
            except:
                pass
    else:
        try:
            trainer.logger.experiment.watch(model, log="all")
            logger.info("Tracking model weights & gradients in wandb.")
        except:
            pass
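
A minimal call-order sketch; model and ldm are placeholders for your LightningModule and datamodule.

>>> trainer = make_trainer(experiment_name="demo")
>>> watch_model(trainer, model)
>>> trainer.fit(model, datamodule=ldm)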

configure_logging(logfile_prefix=None)

configure_logging Configure loguru to intercept logging module logs and tqdm.write calls, and write to a logfile

We use loguru for stdout/stderr logging in this project. This function configures loguru to intercept logs from other modules that use the default python logging module. It also configures loguru so that it plays well with writes to the tqdm progress bars. If a logfile_prefix is provided, loguru will also write all logs into a logfile with a unique name constructed using logfile_prefix and datetime.now().

Parameters:

Name Type Description Default
logfile_prefix Optional[str]

Optional prefix to file where logs will be written.

None

Returns:

Type Description
Optional[str]

str: The logfile where logs are written

Examples:

>>> configure_logging("logs/my-cool-experiment)
logs/my-cool-experiment.20210228-211832.log
Source code in slp/util/log.py
def configure_logging(logfile_prefix: Optional[str] = None) -> Optional[str]:
    """configure_logging Configure loguru to intercept logging module logs, tqdm.writes and write to a logfile

    We use logure for stdout/stderr logging in this project.
    This function configures loguru to intercept logs from other modules that use the default python logging module.
    It also configures loguru so that it plays well with writes in the tqdm progress bars
    If a logfile_prefix is provided, loguru will also write all logs into a logfile with a unique name constructed using
    logfile_prefix and datetime.now()

    Args:
        logfile_prefix (Optional[str]): Optional prefix to file where logs will be written.

    Returns:
        str: The logfile where logs are written

    Examples:
        >>> configure_logging("logs/my-cool-experiment)
        logs/my-cool-experiment.20210228-211832.log
    """

    class InterceptHandler(logging.Handler):
        def emit(self, record):
            """Intercept standard logging logs in loguru. Should test this for distributed pytorch lightning"""
            # Get corresponding Loguru level if it exists
            try:
                level = logger.level(record.levelname).name
            except ValueError:
                level = record.levelno

            # Find caller from where originated the logged message
            frame, depth = logging.currentframe(), 2
            while frame.f_code.co_filename == logging.__file__:
                frame = frame.f_back
                depth += 1

            logger.opt(depth=depth, exception=record.exc_info).log(
                level, record.getMessage()
            )

    logger.info("Intercepting standard logging logs in loguru")

    # Make loguru play well with tqdm
    logger.remove()

    def tqdm_write(msg: str) -> Any:
        """Loguru wrapper for tqdm.write"""
        return tqdm.write(msg, end="")

    logger.add(tqdm_write, colorize=True)

    logging.basicConfig(handlers=[InterceptHandler()], level=logging.INFO)

    logfile = None
    if logfile_prefix is not None:
        logfile = log_to_file(logfile_prefix)
        logger.info(f"Log file will be saved in {logfile}")

    return logfile

log_to_file(fname_prefix)

log_to_file Configure loguru to log to a logfile

Parameters:

Name Type Description Default
fname_prefix Optional[str]

Optional prefix to file where logs will be written.

required

Returns:

Type Description
str

str: The logfile where logs are written

Source code in slp/util/log.py
def log_to_file(fname_prefix: Optional[str]) -> str:
    """log_to_file Configure loguru to log to a logfile

    Args:
        fname_prefix (Optional[str]): Optional prefix to file where logs will be written.

    Returns:
        str: The logfile where logs are written
    """
    logfile = f"{fname_prefix}.{date_fname()}.log"
    logger.add(
        logfile,
        colorize=False,
        level="DEBUG",
        enqueue=True,
    )
    return logfile

NoOp

forward(self, x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

!!! note Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

Source code in slp/util/pytorch.py
def forward(self, x):
    return x

PackSequence

__init__(self, batch_first=True) special

Wrap sequence packing in nn.Module

Parameters:

Name Type Description Default
batch_first bool

Use batch first representation. Defaults to True.

True
Source code in slp/util/pytorch.py
def __init__(self, batch_first: bool = True):
    """Wrap sequence packing in nn.Module

    Args:
        batch_first (bool, optional): Use batch first representation. Defaults to True.
    """
    super(PackSequence, self).__init__()
    self.batch_first = batch_first

forward(self, x, lengths)

Pack a padded sequence and sort lengths

Parameters:

Name Type Description Default
x Tensor

Padded tensor

required
lengths Tensor

Original lengths before padding

required

Returns:

Type Description
Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]

Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]: (packed sequence, sorted lengths)

Source code in slp/util/pytorch.py
def forward(
    self, x: torch.Tensor, lengths: torch.Tensor
) -> Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]:
    """Pack a padded sequence and sort lengths

    Args:
        x (torch.Tensor): Padded tensor
        lengths (torch.Tensor): Original lengths before padding

    Returns:
        Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]: (packed sequence, sorted lengths)
    """
    out: torch.nn.utils.rnn.PackedSequence = pack_padded_sequence(
        x, lengths, batch_first=self.batch_first, enforce_sorted=False
    )
    lengths = lengths[out.sorted_indices]

    return out, lengths

PadPackedSequence

__init__(self, batch_first=True, max_length=-1) special

Wrap sequence padding in nn.Module

Parameters:

Name Type Description Default
batch_first bool

Use batch first representation. Defaults to True.

True
max_length int

Pad to a fixed maximum length if > 0, otherwise pad to the longest sequence in the batch. Defaults to -1.

-1
Source code in slp/util/pytorch.py
def __init__(self, batch_first: bool = True, max_length: int = -1):
    """Wrap sequence padding in nn.Module

    Args:
        batch_first (bool, optional): Use batch first representation. Defaults to True.
    """
    super(PadPackedSequence, self).__init__()
    self.batch_first = batch_first
    self.max_length = max_length if max_length > 0 else None

forward(self, x, lengths)

Convert packed sequence to padded sequence

Parameters:

Name Type Description Default
x PackedSequence

Packed sequence

required
lengths Tensor

Sorted original sequence lengths

required

Returns:

Type Description
Tensor

torch.Tensor: Padded sequence

Source code in slp/util/pytorch.py
def forward(
    self, x: torch.nn.utils.rnn.PackedSequence, lengths: torch.Tensor
) -> torch.Tensor:
    """Convert packed sequence to padded sequence

    Args:
        x (torch.nn.utils.rnn.PackedSequence): Packed sequence
        lengths (torch.Tensor): Sorted original sequence lengths

    Returns:
        torch.Tensor: Padded sequence
    """
    out, _ = pad_packed_sequence(
        x, batch_first=self.batch_first, total_length=self.max_length  # type: ignore
    )

    return out  # type: ignore
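
A minimal round-trip sketch combining PackSequence and PadPackedSequence on a random padded batch of shape (B, T, D).

>>> pack = PackSequence(batch_first=True)
>>> unpack = PadPackedSequence(batch_first=True)
>>> x = torch.rand(4, 10, 8)
>>> lengths = torch.tensor([10, 7, 5, 3])
>>> packed, sorted_lengths = pack(x, lengths)
>>> unpack(packed, sorted_lengths).shape
torch.Size([4, 10, 8])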

from_checkpoint(checkpoint_file, obj, map_location='cpu', dataparallel=False)

Load model or optimizer from saved state_dict

Parameters:

Name Type Description Default
checkpoint_file Optional[str]

File containing the state dict

required
obj Union[torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer]

Module or optimizer instance to load the checkpoint

required
map_location Union[torch.device, str]

Where to load. Defaults to "cpu".

'cpu'
dataparallel bool

If data parallel remove leading "module." from statedict keys. Defaults to False.

False

Returns:

Type Description
Union[torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer]

types.ModuleOrOptimizer: Loaded module or optimizer

Source code in slp/util/pytorch.py
def from_checkpoint(
    checkpoint_file: Optional[str],
    obj: types.ModuleOrOptimizer,
    map_location: Optional[types.Device] = "cpu",
    dataparallel: bool = False,
) -> types.ModuleOrOptimizer:
    """Load model or optimizer from saved state_dict

    Args:
        checkpoint_file (Optional[str]): File containing the state dict
        obj (types.ModuleOrOptimizer): Module or optimizer instance to load the checkpoint
        map_location (Optional[types.Device], optional): Where to load. Defaults to "cpu".
        dataparallel (bool, optional): If data parallel remove leading "module." from statedict keys. Defaults to False.

    Returns:
        types.ModuleOrOptimizer: Loaded module or optimizer
    """

    if checkpoint_file is None:
        return obj

    if not system.is_file(checkpoint_file):
        logger.warning(
            f"The checkpoint {checkpoint_file} you are trying to load "
            "does not exist. Continuing without loading..."
        )

        return obj

    state_dict = torch.load(checkpoint_file, map_location=map_location)

    if dataparallel:
        state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
    obj.load_state_dict(state_dict)

    return obj
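
A hedged usage sketch; MyModel and my_model.ckpt are hypothetical stand-ins for your module class and a state_dict checkpoint saved with torch.save.

>>> model = MyModel()
>>> model = from_checkpoint("my_model.ckpt", model, map_location="cpu")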

mktensor(data, dtype=torch.float32, device='cpu', requires_grad=False, copy_tensor=True)

Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set. This can copy data or make the operation in place.

Parameters:

Name Type Description Default
data Union[numpy.ndarray, torch.Tensor, List[~T]]

(list, np.ndarray, torch.Tensor): Data to be converted to torch tensor.

required
dtype dtype

(torch.dtype): The type of the tensor elements (Default value = torch.float)

torch.float32
device Union[torch.device, str]

(torch.device, str): Device where the tensor should be (Default value = 'cpu')

'cpu'
requires_grad bool

(bool): Trainable tensor or not? (Default value = False)

False
copy_tensor bool

(bool): If false creates the tensor inplace else makes a copy (Default value = True)

True

Returns:

Type Description
Tensor

(torch.Tensor): A tensor of appropriate dtype, device and requires_grad containing data

Source code in slp/util/pytorch.py
def mktensor(
    data: types.NdTensor,
    dtype: torch.dtype = torch.float,
    device: types.Device = "cpu",
    requires_grad: bool = False,
    copy_tensor: bool = True,
) -> torch.Tensor:
    """Convert a list or numpy array to torch tensor. If a torch tensor
        is passed it is cast to  dtype, device and the requires_grad flag is
        set. This can copy data or make the operation in place.

    Args:
        data: (list, np.ndarray, torch.Tensor): Data to be converted to
            torch tensor.
        dtype: (torch.dtype): The type of the tensor elements
            (Default value = torch.float)
        device: (torch.device, str): Device where the tensor should be
            (Default value = 'cpu')
        requires_grad: (bool): Trainable tensor or not? (Default value = False)
        copy_tensor: (bool): If false creates the tensor inplace else makes a copy
            (Default value = True)

    Returns:
        (torch.Tensor): A tensor of appropriate dtype, device and
            requires_grad containing data

    """
    tensor_factory = t if copy_tensor else t_

    return tensor_factory(data, dtype=dtype, device=device, requires_grad=requires_grad)
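
A small sketch converting a numpy array while controlling the dtype.

>>> import numpy as np
>>> mktensor(np.array([1, 2, 3]), dtype=torch.long)
tensor([1, 2, 3])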

moore_penrose_pinv(x, num_iter=6)

Calculate approximate Moore-Penrose pseudoinverse, via iterative method

  • Method is described in (Razavi et al 2014) https://www.hindawi.com/journals/aaa/2014/563787/
  • Implementation modified from lucidrains https://github.com/lucidrains/nystrom-attention/blob/main/nystrom_attention/nystrom_attention.py#L13

Parameters:

Name Type Description Default
x torch.Tensor

(*, M, M) The square tensors to invert. Dimension * can be any number of additional dimensions, e.g. (batch_size, num_heads, M, M)

required
num_iter int

Number of iterations to run for approximation (6 is good enough usually)

6

Returns:

Type Description
(torch.Tensor)

(*, M, M) The approximate Moore-Penrose pseudoinverse of x

Source code in slp/util/pytorch.py
def moore_penrose_pinv(x, num_iter=6):
    """Calculate approximate Moore-Penrose pseudoinverse, via iterative method

    * Method is described in (Razavi et al 2014) https://www.hindawi.com/journals/aaa/2014/563787/
    * Implementation modified from lucidrains https://github.com/lucidrains/nystrom-attention/blob/main/nystrom_attention/nystrom_attention.py#L13

    Args:
        x (torch.Tensor): (*, M, M) The square tensors to invert.
            Dimension * can be any number of additional dimensions, e.g. (batch_size, num_heads, M, M)
        num_iter (int): Number of iterations to run for approximation (6 is good enough usually)
    Returns:
        (torch.Tensor): (*, M, M) The approximate Moore-Penrose pseudoinverse of x
    """
    abs_x = torch.abs(x)
    col = abs_x.sum(dim=-1)
    row = abs_x.sum(dim=-2)
    z = x.transpose(-1, -2).contiguous()
    z = z / (torch.max(col) * torch.max(row))

    I = torch.eye(x.shape[-1], device=x.device).unsqueeze(0)

    for _ in range(num_iter):
        xz = x @ z
        z = 0.25 * z @ (13 * I - (xz @ (15 * I - (xz @ (7 * I - xz)))))

    return z
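
A quick sanity-check sketch on a trivially invertible input (2 * identity), whose exact pseudoinverse is 0.5 * identity.

>>> x = 2.0 * torch.eye(8).unsqueeze(0)
>>> x_pinv = moore_penrose_pinv(x, num_iter=6)
>>> torch.allclose(x @ x_pinv, torch.eye(8).unsqueeze(0), atol=1e-4)
True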

pad_mask(lengths, max_length=None)

Generate mask for padded tokens

Parameters:

Name Type Description Default
lengths Tensor

Original sequence lengths before padding

required
max_length Union[torch.Tensor, int]

Maximum sequence length. Defaults to None.

None

Returns:

Type Description
Tensor

torch.Tensor: padding mask

Source code in slp/util/pytorch.py
def pad_mask(
    lengths: torch.Tensor, max_length: Optional[Union[torch.Tensor, int]] = None
) -> torch.Tensor:
    """Generate mask for padded tokens

    Args:
        lengths (torch.Tensor): Original sequence lengths before padding
        max_length (Optional[Union[torch.Tensor, int]], optional): Maximum sequence length. Defaults to None.

    Returns:
        torch.Tensor: padding mask
    """

    if max_length is None or max_length < 0:
        max_length = cast(int, torch.max(lengths).item())
    max_length = cast(int, max_length)
    idx = torch.arange(0, max_length, device=lengths.device).unsqueeze(0)
    mask: torch.Tensor = (idx < lengths.unsqueeze(1)).float()

    return mask
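
A small sketch for a batch of two sequences with lengths 3 and 2.

>>> pad_mask(torch.tensor([3, 2]))
tensor([[1., 1., 1.],
        [1., 1., 0.]])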

pad_sequence(sequences, batch_first=False, padding_value=0.0, max_length=-1)

Pad a list of variable length Tensors with padding_value

pad_sequence stacks a list of Tensors along a new dimension, and pads them to equal length. For example, if the input is a list of sequences with size L x *, the output has size T x B x * if batch_first is False, and B x T x * otherwise.

B is batch size. It is equal to the number of elements in sequences. T is length of the longest sequence. L is length of the sequence. * is any number of trailing dimensions, including none.

Examples:

>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])

!!! note This function returns a Tensor of size T x B x * or B x T x * where T is the length of the longest sequence. This function assumes trailing dimensions and type of all the Tensors in sequences are same.

Note:
This implementation is modified from torch.nn.utils.rnn.pad_sequence, to accept a
max_length argument for fixed length padding

Parameters:

Name Type Description Default
sequences List[torch.Tensor]

list of variable length sequences.

required
batch_first bool

output will be in B x T x * if True, or in T x B x * otherwise

False
padding_value Union[float, int]

value for padded elements. Default: 0.

0.0
max_length int

If max length is > 0 then this function will pad to a fixed maximum length. If any sequence is longer than max_length, it will be trimmed.

-1

Returns:

Type Description
Tensor of size ``T x B x *`` if

attr:batch_first is False. Tensor of size B x T x * otherwise

Source code in slp/util/pytorch.py
def pad_sequence(
    sequences: List[torch.Tensor],
    batch_first: bool = False,
    padding_value: Union[float, int] = 0.0,
    max_length: int = -1,
):
    r"""Pad a list of variable length Tensors with ``padding_value``

    ``pad_sequence`` stacks a list of Tensors along a new dimension,
    and pads them to equal length. For example, if the input is a list of
    sequences with size ``L x *``, the output has size ``T x B x *`` if batch_first
    is False, and ``B x T x *`` otherwise.

    `B` is batch size. It is equal to the number of elements in ``sequences``.
    `T` is length of the longest sequence.
    `L` is length of the sequence.
    `*` is any number of trailing dimensions, including none.

    Example:
        >>> from torch.nn.utils.rnn import pad_sequence
        >>> a = torch.ones(25, 300)
        >>> b = torch.ones(22, 300)
        >>> c = torch.ones(15, 300)
        >>> pad_sequence([a, b, c]).size()
        torch.Size([25, 3, 300])

    Note:
        This function returns a Tensor of size ``T x B x *`` or ``B x T x *``
        where `T` is the length of the longest sequence. This function assumes
        trailing dimensions and type of all the Tensors in sequences are same.

        Note:
        This implementation is modified from torch.nn.utils.rnn.pad_sequence, to accept a
        max_length argument for fixed length padding

    Args:
        sequences (list[Tensor]): list of variable length sequences.
        batch_first (bool, optional): output will be in ``B x T x *`` if True, or in
            ``T x B x *`` otherwise
        padding_value (float, optional): value for padded elements. Default: 0.
        max_length (int): If max length is > 0 then this function will pad to a fixed maximum
            length. If any sequence is longer than max_length, it will be trimmed.
    Returns:
        Tensor of size ``T x B x *`` if :attr:`batch_first` is ``False``.
        Tensor of size ``B x T x *`` otherwise
    """

    # assuming trailing dimensions and type of all the Tensors
    # in sequences are same and fetching those from sequences[0]
    max_size = sequences[0].size()
    trailing_dims = max_size[1:]
    if max_length < 0:
        max_len = max([s.size(0) for s in sequences])
    else:
        max_len = max_length
    if batch_first:
        out_dims = (len(sequences), max_len) + trailing_dims
    else:
        out_dims = (max_len, len(sequences)) + trailing_dims

    out_tensor = sequences[0].new_full(out_dims, padding_value)
    for i, tensor in enumerate(sequences):
        length = tensor.size(0)
        # use index notation to prevent duplicate references to the tensor
        if batch_first:
            out_tensor[i, : min(length, max_len), ...] = tensor[
                : min(length, max_len), ...
            ]
        else:
            out_tensor[: min(length, max_len), i, ...] = tensor[
                : min(length, max_len), ...
            ]

    return out_tensor

repeat_layer(l, times)

Clone a layer multiple times

Parameters:

Name Type Description Default
l Module

nn.Module to stack

required
times int

Times to clone

required

Returns:

Type Description
List[torch.nn.modules.module.Module]

List[nn.Module]: List of identical clones of input layer

Source code in slp/util/pytorch.py
def repeat_layer(l: nn.Module, times: int) -> List[nn.Module]:
    """Clone a layer multiple times

    Args:
        l (nn.Module): nn.Module to stack
        times (int): Times to clone

    Returns:
        List[nn.Module]: List of identical clones of input layer
    """

    return [l] + [copy.deepcopy(l) for _ in range(times - 1)]
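
A short sketch wrapping the clones in an nn.ModuleList so they are registered as submodules.

>>> layers = nn.ModuleList(repeat_layer(nn.Linear(16, 16), 3))
>>> len(layers)
3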

rotate_tensor(l, n=1)

Rotate tensor by n positions to the right

Parameters:

Name Type Description Default
l Tensor

input tensor

required
n int

positions to rotate. Defaults to 1.

1

Returns:

Type Description
Tensor

torch.Tensor: rotated tensor

Source code in slp/util/pytorch.py
def rotate_tensor(l: torch.Tensor, n: int = 1) -> torch.Tensor:
    """Roate tensor by n positions to the right

    Args:
        l (torch.Tensor): input tensor
        n (int, optional): positions to rotate. Defaults to 1.

    Returns:
        torch.Tensor: rotated tensor
    """

    return torch.cat((l[n:], l[:n]))
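
A small sketch rotating a 1D tensor by one position.

>>> rotate_tensor(torch.tensor([1, 2, 3, 4]), n=1)
tensor([2, 3, 4, 1])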

shift_tensor(l, n=1)

Shift tensor by n positions

Parameters:

Name Type Description Default
l Tensor

input tensor

required
n int

positions to shift. Defaults to 1.

1

Returns:

Type Description
Tensor

torch.Tensor: shifted tensor

Source code in slp/util/pytorch.py
def shift_tensor(l: torch.Tensor, n: int = 1) -> torch.Tensor:
    """Shift tensor by n positions

    Args:
        l (torch.Tensor): input tensor
        n (int, optional): positions to shift. Defaults to 1.

    Returns:
        torch.Tensor: shifted tensor
    """
    out = rotate_tensor(l, n=n)
    out[-n:] = 0

    return out
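
A small sketch; compared to rotate_tensor, the last n positions are zeroed out.

>>> shift_tensor(torch.tensor([1, 2, 3, 4]), n=1)
tensor([2, 3, 4, 0])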

sort_sequences(inputs, lengths)

Sort sequences according to lengths (descending)

Parameters:

Name Type Description Default
inputs Tensor

input sequences, size [B, T, D]

required
lengths Tensor

length of each sequence, size [B]

required

Returns:

Type Description
Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.Tensor]]

Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.tensor]]: (sorted inputs, sorted lengths, function to revert inputs and lengths to unsorted state)

Source code in slp/util/pytorch.py
def sort_sequences(
    inputs: torch.Tensor, lengths: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.Tensor]]:
    """Sort sequences according to lengths (descending)

    Args:
        inputs (torch.Tensor): input sequences, size [B, T, D]
        lengths (torch.Tensor): length of each sequence, size [B]

    Returns:
        Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.tensor]]:
            (sorted inputs, sorted lengths, function to revert inputs and lengths to unsorted state)
    """
    lengths_sorted, sorted_idx = lengths.sort(descending=True)
    _, unsorted_idx = sorted_idx.sort()

    def unsort(tt: torch.Tensor) -> torch.Tensor:
        """Restore original unsorted sequence"""

        return tt[unsorted_idx]

    return inputs[sorted_idx], lengths_sorted, unsort
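
A small sketch showing that the returned closure restores the original order.

>>> x = torch.rand(3, 5, 2)
>>> lengths = torch.tensor([2, 5, 3])
>>> x_sorted, lengths_sorted, unsort = sort_sequences(x, lengths)
>>> lengths_sorted
tensor([5, 3, 2])
>>> torch.equal(unsort(x_sorted), x)
True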

subsequent_mask(max_length)

Generate subsequent (lower triangular) mask for transformer autoregressive tasks

Parameters:

Name Type Description Default
max_length int

Maximum sequence length

required

Returns:

Type Description
Tensor

torch.Tensor: The subsequent mask

Source code in slp/util/pytorch.py
def subsequent_mask(max_length: int) -> torch.Tensor:
    """Generate subsequent (lower triangular) mask for transformer autoregressive tasks

    Args:
        max_length (int): Maximum sequence length

    Returns:
        torch.Tensor: The subsequent mask
    """
    mask = torch.ones(max_length, max_length)
    # Ignore typecheck because pytorch types are incomplete

    return mask.triu().t().unsqueeze(0).contiguous()  # type: ignore
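
A small sketch for a maximum length of 3; each position can only attend to itself and the past.

>>> subsequent_mask(3)
tensor([[[1., 0., 0.],
         [1., 1., 0.],
         [1., 1., 1.]]])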

t(data, dtype=torch.float32, device='cpu', requires_grad=False)

Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set. This always copies data.

Parameters:

Name Type Description Default
data Union[numpy.ndarray, torch.Tensor, List[~T]]

(list, np.ndarray, torch.Tensor): Data to be converted to torch tensor.

required
dtype dtype

(torch.dtype): The type of the tensor elements (Default value = torch.float)

torch.float32
device Union[torch.device, str]

(torch.device, str): Device where the tensor should be (Default value = 'cpu')

'cpu'
requires_grad bool

(bool): Trainable tensor or not? (Default value = False)

False

Returns:

Type Description
Tensor

(torch.Tensor): A tensor of appropriate dtype, device and requires_grad containing data

Source code in slp/util/pytorch.py
def t(
    data: types.NdTensor,
    dtype: torch.dtype = torch.float,
    device: types.Device = "cpu",
    requires_grad: bool = False,
) -> torch.Tensor:
    """Convert a list or numpy array to torch tensor. If a torch tensor
    is passed it is cast to  dtype, device and the requires_grad flag is
    set. This always copies data.

    Args:
        data: (list, np.ndarray, torch.Tensor): Data to be converted to
            torch tensor.
        dtype: (torch.dtype): The type of the tensor elements
            (Default value = torch.float)
        device: (torch.device, str): Device where the tensor should be
            (Default value = 'cpu')
        requires_grad: (bool): Trainable tensor or not? (Default value = False)

    Returns:
        (torch.Tensor): A tensor of appropriate dtype, device and
            requires_grad containing data

    """
    tt = torch.tensor(data, dtype=dtype, device=device, requires_grad=requires_grad)

    return tt

t_(data, dtype=torch.float32, device='cpu', requires_grad=False)

Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set IN PLACE.

Parameters:

Name Type Description Default
data Union[numpy.ndarray, torch.Tensor, List[~T]]

(list, np.ndarray, torch.Tensor): Data to be converted to torch tensor.

required
dtype dtype

(torch.dtype): The type of the tensor elements (Default value = torch.float)

torch.float32
device Union[torch.device, str]

(torch.device, str): Device where the tensor should be (Default value = 'cpu')

'cpu'
requires_grad bool

(bool): Trainable tensor or not? (Default value = False)

False

Returns:

Type Description
Tensor

(torch.Tensor): A tensor of appropriate dtype, device and requires_grad containing data

Source code in slp/util/pytorch.py
def t_(
    data: types.NdTensor,
    dtype: torch.dtype = torch.float,
    device: Optional[types.Device] = "cpu",
    requires_grad: bool = False,
) -> torch.Tensor:
    """Convert a list or numpy array to torch tensor. If a torch tensor
    is passed it is cast to  dtype, device and the requires_grad flag is
    set IN PLACE.

    Args:
        data: (list, np.ndarray, torch.Tensor): Data to be converted to
            torch tensor.
        dtype: (torch.dtype): The type of the tensor elements
            (Default value = torch.float)
        device: (torch.device, str): Device where the tensor should be
            (Default value = 'cpu')
        requires_grad: (bool): Trainable tensor or not? (Default value = False)

    Returns:
        (torch.Tensor): A tensor of appropriate dtype, device and
            requires_grad containing data

    """

    if isinstance(device, str):
        device = torch.device(device)

    tt = torch.as_tensor(data, dtype=dtype, device=device).requires_grad_(requires_grad)

    return tt

to_device(tt, device='cpu', non_blocking=False)

Send a tensor to a device

Parameters:

Name Type Description Default
tt Tensor

input tensor

required
device Union[torch.device, str]

Output device. Defaults to "cpu".

'cpu'
non_blocking bool

Use blocking or non-blocking memory transfer. Defaults to False.

False

Returns:

Type Description
Tensor

torch.Tensor: Tensor in the desired device

Source code in slp/util/pytorch.py
def to_device(
    tt: torch.Tensor, device: Optional[types.Device] = "cpu", non_blocking: bool = False
) -> torch.Tensor:
    """Send a tensor to a device

    Args:
        tt (torch.Tensor): input tensor
        device (Optional[types.Device], optional): Output device. Defaults to "cpu".
        non_blocking (bool, optional): Use blocking or non-blocking memory transfer. Defaults to False.

    Returns:
        torch.Tensor: Tensor in the desired device
    """

    return tt.to(device, non_blocking=non_blocking)

date_fname()

date_fname Generate a filename based on datetime.now().

If multiple calls are made within the same second, the filename will not be unique. We could add milliseconds etc. to the fname but that would hinder readability. For practical purposes, e.g. unique logs between different experiments, this should be enough. Either way, if we need a truly unique descriptor, there is the uuid module.

Returns:

Type Description
str

str: A filename, e.g. 20210228-211832

Source code in slp/util/system.py
def date_fname() -> str:
    """date_fname Generate a filename based on datetime.now().

    If multiple calls are made within the same second, the filename will not be unique.
    We could add milliseconds etc. in the fname but that would hinder readability.
    For practical purposes e.g. unique logs between different experiments this should be enough.
    Either way if we need a truly unique descriptor, there is the uuid module.

    Returns:
        str: A filename, e.g. 20210228-211832
    """
    return datetime.now().strftime("%Y%m%d-%H%M%S")

download_url(url, dest_path)

download_url Download a file to a destination path given a URL

Parameters:

Name Type Description Default
url str

A url pointing to the file we want to download

required
dest_path str

The destination path to write the file

required

Returns:

Type Description
str

(str): The filename where the downloaded file is written

Source code in slp/util/system.py
def download_url(url: str, dest_path: str) -> str:
    """download_url Download a file to a destination path given a URL

    Args:
        url (str): A url pointing to the file we want to download
        dest_path (str): The destination path to write the file

    Returns:
        (str): The filename where the downloaded file is written
    """
    name = url.rsplit("/")[-1]
    dest = os.path.join(dest_path, name)
    safe_mkdirs(dest_path)
    response = urllib.request.urlopen(url)
    with open(dest, "wb") as fd:
        shutil.copyfileobj(response, fd)
    return dest
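
A hedged sketch with a placeholder URL; the file name is taken from the last path component of the URL.

>>> download_url("https://example.com/checkpoints/model.pt", "cache/")
'cache/model.pt'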

has_internet_connection(timeout=3)

has_internet_connection Check if you are connected to the internet

Check if internet connection exists by pinging Google DNS server

Host: 8.8.8.8 (google-public-dns-a.google.com) OpenPort: 53/tcp Service: domain (DNS/TCP)

Parameters:

Name Type Description Default
timeout int

Seconds to wait before giving up

3

Returns:

Type Description
bool

bool: True if connection is established, False if we are not connected to the internet

Source code in slp/util/system.py
def has_internet_connection(timeout: int = 3) -> bool:
    """has_internet_connection Check if you are connected to the internet

    Check if internet connection exists by pinging Google DNS server

    Host: 8.8.8.8 (google-public-dns-a.google.com)
    OpenPort: 53/tcp
    Service: domain (DNS/TCP)

    Args:
        timeout (int): Seconds to wait before giving up

    Returns:
        bool: True if connection is established, False if we are not connected to the internet
    """
    host, port = "8.8.8.8", 53
    try:
        socket.setdefaulttimeout(timeout)
        socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port))
        return True
    except socket.error as ex:
        print(ex)
        return False

is_file(inp)

is_file Check if the provided string is valid file in the system path

Parameters:

Name Type Description Default
inp Optional[str]

A potential file or None

required

Returns:

Type Description
Union[validators.utils.ValidationFailure, bool]

types.ValidationResult: True if a valid file is provided, False otherwise

Examples:

>>> is_file("/bin/bash")
True
>>> is_file("/supercalifragilisticexpialidocious")  # This does not exist. I hope...
False
Source code in slp/util/system.py
def is_file(inp: Optional[str]) -> types.ValidationResult:
    """is_file Check if the provided string is valid file in the system path

    Args:
        inp (Optional[str]): A potential file or None

    Returns:
        types.ValidationResult: True if a valid file is provided, False otherwise

    Examples:
        >>> is_file("/bin/bash")
        True
        >>> is_file("/supercalifragilisticexpialidocious")  # This does not exist. I hope...
        False
    """
    if not inp:
        return False
    return os.path.isfile(inp)

is_subpath(child, parent)

is_subpath Check if child path is a subpath of parent

Parameters:

Name Type Description Default
child str

Child path

required
parent str

parent path

required

Returns:

Type Description
bool

bool: True if child is a subpath of parent, false if not

Examples:

>>> is_subpath("/usr/bin/Xorg", "/usr")
True
Source code in slp/util/system.py
def is_subpath(child: str, parent: str) -> bool:
    """is_subpath Check if child path is a subpath of parent

    Args:
        child (str): Child path
        parent (str): parent path

    Returns:
        bool: True if child is a subpath of parent, false if not

    Examples:
        >>> is_subpath("/usr/bin/Xorg", "/usr")
        True
    """
    parent = os.path.abspath(parent)
    child = os.path.abspath(child)
    return cast(
        bool, os.path.commonpath([parent]) == os.path.commonpath([parent, child])
    )

is_url(inp)

is_url Check if the provided string is a URL

Parameters:

Name Type Description Default
inp Optional[str]

A potential link or None

required

Returns:

Type Description
Union[validators.utils.ValidationFailure, bool]

types.ValidationResult: True if a valid url is provided, False if the string is not a url

Examples:

>>> is_url("Hello World")
ValidationFailure(func=url, args={'value': 'Hello World', 'public': False})
>>> is_url("http://google.com")
True
Source code in slp/util/system.py
def is_url(inp: Optional[str]) -> types.ValidationResult:
    """is_url Check if the provided string is a URL

    Args:
        inp (Optional[str]): A potential link or None

    Returns:
        types.ValidationResult: True if a valid url is provided, False if the string is not a url

    Examples:
        >>> is_url("Hello World")
        ValidationFailure(func=url, args={'value': 'Hello World', 'public': False})
        >>> is_url("http://google.com")
        True
    """
    if not inp:
        return False
    return validators.url(inp)

json_dump(data, fname)

json_dump Save dict to a json file

Parameters:

Name Type Description Default
data Dict[~K, ~V]

Dict to save

required
fname str

Output json file

required
Source code in slp/util/system.py
def json_dump(data: types.GenericDict, fname: str) -> None:
    """json_dump Save dict to a json file

    Args:
        data (types.GenericDict): Dict to save
        fname (str): Output json file
    """
    with open(fname, "w") as fd:
        json.dump(data, fd)

json_load(fname)

json_load Load dict from a json file

Parameters:

Name Type Description Default
fname str

Json file to load

required

Returns:

Type Description
Dict[~K, ~V]

types.GenericDict: Dict of loaded data

Source code in slp/util/system.py
def json_load(fname: str) -> types.GenericDict:
    """json_load Load dict from a json file

    Args:
        fname (str): Json file to load

    Returns:
        types.GenericDict: Dict of loaded data
    """
    with open(fname, "r") as fd:
        data = json.load(fd)
    return cast(types.GenericDict, data)
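
A small round-trip sketch for json_dump and json_load.

>>> json_dump({"lr": 0.001, "epochs": 20}, "config.json")
>>> json_load("config.json")
{'lr': 0.001, 'epochs': 20}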

pickle_dump(data, fname)

pickle_dump Save data to pickle file

Parameters:

Name Type Description Default
data Any

Data to save

required
fname str

Output pickle file

required
Source code in slp/util/system.py
def pickle_dump(data: Any, fname: str) -> None:
    """pickle_dump Save data to pickle file

    Args:
        data (Any): Data to save
        fname (str): Output pickle file
    """
    with open(fname, "wb") as fd:
        pickle.dump(data, fd)

pickle_load(fname)

pickle_load Load data from pickle file

Parameters:

Name Type Description Default
fname str

file name of pickle file

required

Returns:

Type Description
Any

Any: Loaded data

Source code in slp/util/system.py
def pickle_load(fname: str) -> Any:
    """pickle_load Load data from pickle file

    Args:
        fname (str): file name of pickle file

    Returns:
        Any: Loaded data
    """
    with open(fname, "rb") as fd:
        data = pickle.load(fd)
    return data

print_separator(symbol='*', n=10, print_fn=<built-in function print>)

print_separator Print a repeated symbol as a separator


Parameters:

Name Type Description Default
symbol str

Symbol to print

'*'
n int

Number of times to print the symbol

10
print_fn Callable[[str], NoneType]

Print function to use, e.g. print or logger.info

<built-in function print>

Examples:

>>> print_separator(symbol="-", n=2)
--
Source code in slp/util/system.py
def print_separator(
    symbol: str = "*", n: int = 10, print_fn: Callable[[str], None] = print
):
    """print_separator Print a repeated symbol as a separator

    *********************************************************

    Args:
        symbol (str): Symbol to print
        n (int): Number of times to print the symbol
        print_fn (Callable[[str], None]): Print function to use, e.g. print or logger.info

    Examples:
        >>> print_separator(symbol="-", n=2)
        --
    """
    print_fn(symbol * n)

read_wav(wav_sample)

read_wav Reads a wav clip into a string and returns the hex string.

Parameters:

Name Type Description Default
wav_sample str

Path to wav file

required

Returns:

Type Description
str

A hex string with the audio information.

Source code in slp/util/system.py
def read_wav(wav_sample: str) -> str:
    """read_wav Reads a wav clip into a string and returns the hex string.

    Args:
        wav_sample (str): Path to wav file

    Returns:
        A hex string with the audio information.
    """
    with open(wav_sample, "r") as wav_fd:
        clip = wav_fd.read()
    return clip

run_cmd(command)

run_cmd Run given shell command

Parameters:

Name Type Description Default
command str

Shell command to run

required

Returns:

Type Description
Tuple[int, str]

(int, str): Status code, stdout of shell command

Examples:

>>> run_cmd("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')

Source code in slp/util/system.py
def run_cmd(command: str) -> Tuple[int, str]:
    """run_cmd Run given shell command

    Args:
        command (str): Shell command to run

    Returns:
        (int, str): Status code, stdout of shell command

    Examples:
        >>> run_cmd("ls /")
        (0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
    """
    command = f'{os.getenv("SHELL")} -c "{command}"'
    pipe = subprocess.Popen(
        command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )

    stdout = ""
    if pipe.stdout is not None:
        stdout = "".join(
            [line.decode("utf-8") for line in iter(pipe.stdout.readline, b"")]
        )
        pipe.stdout.close()
    returncode = pipe.wait()
    return returncode, stdout

run_cmd_silent(command)

run_cmd_silent Run command without printing to console

Parameters:

Name Type Description Default
command str

Shell command to run

required

Returns:

Type Description
Tuple[int, str]

(int, str): Status code, stdout of shell command

Examples:

>>> run_cmd_silent("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')

Source code in slp/util/system.py
def run_cmd_silent(command: str) -> Tuple[int, str]:
    """run_cmd_silent Run command without printing to console

    Args:
        command (str): Shell command to run

    Returns:
        (int, str): Status code, stdout of shell command

    Examples:
        >>> run_cmd_silent("ls /")
        (0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
    """
    return cast(Tuple[int, str], suppress_print(run_cmd)(command))
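
Same return signature as run_cmd. A short sketch (arbitrary command, POSIX shell assumed); the only behavioral difference is that anything written to the Python process's stdout while the command runs is discarded:

from slp.util.system import run_cmd_silent

returncode, stdout = run_cmd_silent("uname -a")
print(returncode)        # 0 on success
print(stdout.strip())    # the captured output is still returned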

safe_mkdirs(path)

Makes recursively all the directories in input path

Utility function similar to mkdir -p. Makes directories recursively, if given path does not exist

Parameters:

Name Type Description Default
path str

Path to mkdir -p

required

Examples:

>>> safe_mkdirs("super/cali/fragi/listic/expi/ali/docious")
Source code in slp/util/system.py
def safe_mkdirs(path: str) -> None:
    """Makes recursively all the directories in input path

    Utility function similar to mkdir -p. Makes directories recursively, if given path does not exist

    Args:
        path (str): Path to mkdir -p

    Examples:
        >>> safe_mkdirs("super/cali/fragi/listic/expi/ali/docious")
    """
    if not os.path.exists(path):
        try:
            os.makedirs(path)
        except Exception as e:
            logger.warning(e)
            raise IOError((f"Failed to create recursive directories: {path}"))

suppress_print(func)

suppress_print Decorator to suppress stdout of the decorated function

Examples:

>>> @slp.util.system.suppress_print
... def very_verbose_function(...): ...
Source code in slp/util/system.py
def suppress_print(func: Callable) -> Callable:
    """suppress_print Decorator to supress stdout of decorated function

    Examples:
        >>> @slp.util.system.timethis
        >>> def very_verbose_function(...): ...
    """

    def func_wrapper(*args: types.T, **kwargs: types.T):
        """Inner function for decorator closure"""
        with open("/dev/null", "w") as sys.stdout:
            ret = func(*args, **kwargs)
        sys.stdout = sys.__stdout__
        return ret

    return cast(Callable, func_wrapper)
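
A usage sketch. This assumes a POSIX system, since the decorator redirects sys.stdout to /dev/null for the duration of the call:

from slp.util.system import suppress_print

@suppress_print
def very_verbose_function() -> int:
    print("this never reaches the console")
    return 42

assert very_verbose_function() == 42  # return value is unaffected, output is discarded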

timethis(method=False)

Decorator to measure the time it takes for a function to complete

Examples:

>>> @slp.util.system.timethis()
... def time_consuming_function(...): ...
Source code in slp/util/system.py
def timethis(method=False) -> Callable:
    """Decorator to measure the time it takes for a function to complete

    Examples:
        >>> @slp.util.system.timethis()
        ... def time_consuming_function(...): ...
    """

    def timethis_inner(func: Callable) -> Callable:
        """Inner function for decorator closure"""

        @functools.wraps(func)
        def timed(*args: types.T, **kwargs: types.T):
            """Inner function for decorator closure"""

            ts = time.time()
            result = func(*args, **kwargs)
            te = time.time()
            elapsed = f"{te - ts}"
            if method:
                logger.info(
                    "BENCHMARK: {cls}.{f}(*{a}, **{kw}) took: {t} sec".format(
                        f=func.__name__, cls=args[0], a=args[1:], kw=kwargs, t=elapsed
                    )
                )
            else:
                logger.info(
                    "BENCHMARK: {f}(*{a}, **{kw}) took: {t} sec".format(
                        f=func.__name__, a=args, kw=kwargs, t=elapsed
                    )
                )
            return result

        return cast(Callable, timed)

    return timethis_inner
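
A usage sketch. timethis is a decorator factory, so it is applied with parentheses; pass method=True when decorating an instance method so that the receiving object is reported separately in the log line:

import time

from slp.util.system import timethis

@timethis()  # decorator factory: note the call
def slow_add(a: int, b: int) -> int:
    time.sleep(0.1)
    return a + b

slow_add(1, 2)  # logs e.g. "BENCHMARK: slow_add(*(1, 2), **{}) took: 0.1... sec"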

write_wav(byte_str, wav_file)

write_wav Write a hex string into a wav file

Parameters:

Name Type Description Default
byte_str str

The hex string containing the audio data

required
wav_file str

The output wav file

required
Source code in slp/util/system.py
def write_wav(byte_str: str, wav_file: str) -> None:
    """write_wav Write a hex string into a wav file

    Args:
        byte_str (str): The hex string containing the audio data
        wav_file (str): The output wav file
    """
    with open(wav_file, "w") as fd:
        fd.write(byte_str)
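
A round-trip sketch pairing write_wav with read_wav. Both helpers use text-mode I/O, so they operate on clips that are already stored as text (e.g. a hex string) rather than raw binary wav data; the file names below are arbitrary:

from slp.util.system import read_wav, write_wav

clip = read_wav("input_clip.txt")   # hypothetical text-encoded clip
write_wav(clip, "copy_clip.txt")    # writes the same string back out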

yaml_dump(data, fname)

yaml_dump Save dict to a yaml file

Parameters:

Name Type Description Default
data Dict[~K, ~V]

Dict to save

required
fname str

Output YAML file

required
Source code in slp/util/system.py
def yaml_dump(data: types.GenericDict, fname: str) -> None:
    """yaml_dump Save dict to a yaml file

    Args:
        data (types.GenericDict): Dict to save
        fname (str): Output YAML file
    """
    with open(fname, "w") as fd:
        yaml.dump(data, fd)
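
A short sketch (the config contents and the file name are arbitrary):

from slp.util.system import yaml_dump

config = {"model": {"hidden": 256}, "optim": {"lr": 0.001}}
yaml_dump(config, "config.yaml")  # writes the dict as YAML to config.yaml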

yaml_load(fname)

yaml_load Load dict from a yaml file

Parameters:

Name Type Description Default
fname str

YAML file to load

required

Returns:

Type Description
Dict[~K, ~V]

types.GenericDict: Dict of loaded data

Source code in slp/util/system.py
def yaml_load(fname: str) -> types.GenericDict:
    """yaml_load Load dict from a yaml file

    Args:
        fname (str): YAML file to load

    Returns:
        types.GenericDict: Dict of loaded data
    """
    with open(fname, "r") as fd:
        data = yaml.load(fd, Loader=yaml.SafeLoader)  # explicit Loader is required by recent PyYAML
    return cast(types.GenericDict, data)
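
A round-trip sketch with yaml_dump (the file name is arbitrary):

from slp.util.system import yaml_dump, yaml_load

yaml_dump({"optim": {"lr": 0.001}}, "config.yaml")
cfg = yaml_load("config.yaml")
print(cfg["optim"]["lr"])  # 0.001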

dir_path(path)

dir_path Type to use when parsing a path in argparse arguments

Parameters:

Name Type Description Default
path str

User provided path

required

Exceptions:

Type Description
argparse.ArgumentTypeError

Path does not exist, so argparse fails

Returns:

Type Description
str

User provided path

Examples:

>>> from slp.util.types import dir_path
>>> import argparse
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--config", type=dir_path)
>>> parser.parse_args(args=["--config", "my_random_config_that_does_not_exist.yaml"])
Traceback (most recent call last):
argparse.ArgumentTypeError: User provided path 'my_random_config_that_does_not_exist.yaml' does not exist
Source code in slp/util/types.py
def dir_path(path):
    """dir_path Type to use when parsing a path in argparse arguments


    Args:
        path (str): User provided path

    Raises:
        argparse.ArgumentTypeError: Path does not exist, so argparse fails

    Returns:
        str: User provided path

    Examples:
        >>> from slp.util.types import dir_path
        >>> import argparse
        >>> parser = argparse.ArgumentParser("My cool model")
        >>> parser.add_argument("--config", type=dir_path)
        >>> parser.parse_args(args=["--config", "my_random_config_that_does_not_exist.yaml"])
        Traceback (most recent call last):
        argparse.ArgumentTypeError: User provided path 'my_random_config_that_does_not_exist.yaml' does not exist

    """

    if os.path.isdir(path):
        return path

    raise argparse.ArgumentTypeError(f"User provided path '{path}' does not exist")
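
The docstring example above shows the failure path; the sketch below shows the success path with an existing directory (the argument name --logdir is arbitrary). Note that dir_path validates with os.path.isdir, so it accepts directories only, not files:

import argparse

from slp.util.types import dir_path

parser = argparse.ArgumentParser("My cool model")
parser.add_argument("--logdir", type=dir_path)        # hypothetical argument
args = parser.parse_args(args=["--logdir", "/tmp"])   # an existing directory passes validation
print(args.logdir)  # /tmp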