API reference
generate_example_config(parser, output_file, args=None)
generate_example_config Generate an example YAML configuration file from the parser defaults
During experimentation it is useful to start from a configuration file that contains every configurable value. This function parses the provided argument parser with include_none=True, so that all configurable values (including unset ones) appear in the output, and saves the resulting configuration to output_file. The generated file can then be edited and passed back to your script through the --config argument.
Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | The argument parser you want to use | required
output_file | str | Configuration file name or file descriptor to save example configuration | required
args | Optional[List[str]] | Optional sys.argv-style args, used instead of sys.argv[1:]. Intended for testing only. | None
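Examples:

A minimal usage sketch; the output file name and the model argument are illustrative:

>>> import argparse
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> generate_example_config(parser, "example_config.yaml")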
Source code in slp/config/config_parser.py
def generate_example_config(
parser: argparse.ArgumentParser,
output_file: str,
args: Optional[List[str]] = None,
) -> None:
"""parse_config Parse a provided YAML config file and command line args and merge them
During experimentation we want ideally to have a configuration file with the model and training configuration,
but also be able to run quick experiments using command line args.
This function allows you to double dip, by overriding values in a YAML config file through user provided command line arguments.
The precedence for merging is as follows
* default cli args values < config file values < user provided cli args
E.g.:
* if you don't include a value in your configuration it will take the default value from the argparse arguments
* if you provide a cli arg (e.g. run the script with --bsz 64) it will override the value in the config file
Note we use an extended OmegaConf istance to achieve this (see slp.config.omegaconf.OmegaConf)
Args:
parser (argparse.ArgumentParser): The argument parser you want to use
output_file (Union[str, IO]): Configuration file name or file descriptor to save example configuration
args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
Use this only for testing. By default it uses sys.argv[1:]
"""
config = parse_config(parser, None, include_none=True)
OmegaConf.save(config, output_file)
make_cli_parser(parser, datamodule_cls)
make_cli_parser Augment an argument parser for slp with the default arguments
Default arguments for training, logging, optimization etc. are added to the input parser. If you use make_cli_parser, the following command line arguments will be included:
usage: my_script.py [-h] [--hidden MODEL.INTERMEDIATE_HIDDEN]
[--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}]
[--lr OPTIM.LR] [--weight-decay OPTIM.WEIGHT_DECAY]
[--lr-scheduler] [--lr-factor LR_SCHEDULE.FACTOR]
[--lr-patience LR_SCHEDULE.PATIENCE]
[--lr-cooldown LR_SCHEDULE.COOLDOWN]
[--min-lr LR_SCHEDULE.MIN_LR] [--seed SEED] [--config CONFIG]
[--experiment-name TRAINER.EXPERIMENT_NAME]
[--run-id TRAINER.RUN_ID]
[--experiment-group TRAINER.EXPERIMENT_GROUP]
[--experiments-folder TRAINER.EXPERIMENTS_FOLDER]
[--save-top-k TRAINER.SAVE_TOP_K]
[--patience TRAINER.PATIENCE]
[--wandb-project TRAINER.WANDB_PROJECT]
[--tags [TRAINER.TAGS [TRAINER.TAGS ...]]]
[--stochastic_weight_avg] [--gpus TRAINER.GPUS]
[--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH]
[--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL]
[--epochs TRAINER.MAX_EPOCHS] [--steps TRAINER.MAX_STEPS]
[--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS] [--debug]
[--offline] [--early-stop-on TRAINER.EARLY_STOP_ON]
[--early-stop-mode {min,max}] [--num-trials TUNE.NUM_TRIALS]
[--gpus-per-trial TUNE.GPUS_PER_TRIAL]
[--cpus-per-trial TUNE.CPUS_PER_TRIAL]
[--tune-metric TUNE.METRIC] [--tune-mode {max,min}]
[--val-percent DATA.VAL_PERCENT]
[--test-percent DATA.TEST_PERCENT] [--bsz DATA.BATCH_SIZE]
[--bsz-eval DATA.BATCH_SIZE_EVAL]
[--num-workers DATA.NUM_WORKERS] [--no-pin-memory]
[--drop-last] [--no-shuffle-eval]
optional arguments:
-h, --help show this help message and exit
--hidden MODEL.INTERMEDIATE_HIDDEN
Intermediate hidden layers for linear module
--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}
Which optimizer to use
--lr OPTIM.LR Learning rate
--weight-decay OPTIM.WEIGHT_DECAY
Weight decay
--lr-scheduler Use learning rate scheduling. Currently only
ReduceLROnPlateau is supported out of the box
--lr-factor LR_SCHEDULE.FACTOR
Multiplicative factor by which LR is reduced. Used if
--lr-scheduler is provided.
--lr-patience LR_SCHEDULE.PATIENCE
Number of epochs with no improvement after which
learning rate will be reduced. Used if --lr-scheduler
is provided.
--lr-cooldown LR_SCHEDULE.COOLDOWN
Number of epochs to wait before resuming normal
operation after lr has been reduced. Used if --lr-
scheduler is provided.
--min-lr LR_SCHEDULE.MIN_LR
Minimum lr for LR scheduling. Used if --lr-scheduler
is provided.
--seed SEED Seed for reproducibility
--config CONFIG Path to YAML configuration file
--experiment-name TRAINER.EXPERIMENT_NAME
Name of the running experiment
--run-id TRAINER.RUN_ID
Unique identifier for the current run. If not provided
it is inferred from datetime.now()
--experiment-group TRAINER.EXPERIMENT_GROUP
Group of current experiment. Useful when evaluating
for different seeds / cross-validation etc.
--experiments-folder TRAINER.EXPERIMENTS_FOLDER
Top-level folder where experiment results &
checkpoints are saved
--save-top-k TRAINER.SAVE_TOP_K
Save checkpoints for top k models
--patience TRAINER.PATIENCE
Number of epochs to wait before early stopping
--wandb-project TRAINER.WANDB_PROJECT
Wandb project under which results are saved
--tags [TRAINER.TAGS [TRAINER.TAGS ...]]
Tags for current run to make results searchable.
--stochastic_weight_avg
Use Stochastic weight averaging.
--gpus TRAINER.GPUS Number of GPUs to use
--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH
Run validation every n epochs
--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL
Clip gradients with ||grad(w)|| >= args.clip_grad_norm
--epochs TRAINER.MAX_EPOCHS
Maximum number of training epochs
--steps TRAINER.MAX_STEPS
Maximum number of training steps
--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS
Truncated Back-propagation-through-time steps.
--debug If true, we run a full run on a small subset of the
input data and overfit 10 training batches
--offline If true, forces offline execution of wandb logger
--early-stop-on TRAINER.EARLY_STOP_ON
Metric for early stopping
--early-stop-mode {min,max}
Minimize or maximize early stopping metric
--num-trials TUNE.NUM_TRIALS
Number of trials to run for hyperparameter tuning
--gpus-per-trial TUNE.GPUS_PER_TRIAL
How many gpus to use for each trial. If gpus_per_trial
< 1 multiple trials are packed in the same gpu
--cpus-per-trial TUNE.CPUS_PER_TRIAL
How many cpus to use for each trial.
--tune-metric TUNE.METRIC
Tune this metric. Need to be one of the keys of
metrics_map passed into make_trainer_for_ray_tune.
--tune-mode {max,min}
Maximize or minimize metric
--val-percent DATA.VAL_PERCENT
Percent of validation data to be randomly split from
the training set, if no validation set is provided
--test-percent DATA.TEST_PERCENT
Percent of test data to be randomly split from the
training set, if no test set is provided
--bsz DATA.BATCH_SIZE
Training batch size
--bsz-eval DATA.BATCH_SIZE_EVAL
Evaluation batch size
--num-workers DATA.NUM_WORKERS
Number of workers to be used in the DataLoader
--no-pin-memory Don't pin data to GPU memory when transferring
--drop-last Drop last incomplete batch
--no-shuffle-eval Don't shuffle val & test sets
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | A parent argument parser to be augmented | required
datamodule_cls | LightningDataModule | A data module class that injects arguments through the add_argparse_args method | required

Returns:

Type | Description
---|---
ArgumentParser | argparse.ArgumentParser: The augmented command line parser
Examples:
>>> import argparse
>>> from slp.plbind.dm import PLDataModuleFromDatasets
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int) # Create parser with model arguments and anything else you need
>>> parser = make_cli_parser(parser, PLDataModuleFromDatasets)
>>> args = parser.parse_args(args=["--bsz", "64", "--lr", "0.01"])
>>> args.data.batch_size
64
>>> args.optim.lr
0.01
Source code in slp/config/config_parser.py
def make_cli_parser(
parser: argparse.ArgumentParser, datamodule_cls: pl.LightningDataModule
) -> argparse.ArgumentParser:
"""make_cli_parser Augment an argument parser for slp with the default arguments
Default arguments for training, logging, optimization etc. are added to the input {parser}.
If you use make_cli_parser, the following command line arguments will be included
usage: my_script.py [-h] [--hidden MODEL.INTERMEDIATE_HIDDEN]
[--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}]
[--lr OPTIM.LR] [--weight-decay OPTIM.WEIGHT_DECAY]
[--lr-scheduler] [--lr-factor LR_SCHEDULE.FACTOR]
[--lr-patience LR_SCHEDULE.PATIENCE]
[--lr-cooldown LR_SCHEDULE.COOLDOWN]
[--min-lr LR_SCHEDULE.MIN_LR] [--seed SEED] [--config CONFIG]
[--experiment-name TRAINER.EXPERIMENT_NAME]
[--run-id TRAINER.RUN_ID]
[--experiment-group TRAINER.EXPERIMENT_GROUP]
[--experiments-folder TRAINER.EXPERIMENTS_FOLDER]
[--save-top-k TRAINER.SAVE_TOP_K]
[--patience TRAINER.PATIENCE]
[--wandb-project TRAINER.WANDB_PROJECT]
[--tags [TRAINER.TAGS [TRAINER.TAGS ...]]]
[--stochastic_weight_avg] [--gpus TRAINER.GPUS]
[--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH]
[--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL]
[--epochs TRAINER.MAX_EPOCHS] [--steps TRAINER.MAX_STEPS]
[--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS] [--debug]
[--offline] [--early-stop-on TRAINER.EARLY_STOP_ON]
[--early-stop-mode {min,max}] [--num-trials TUNE.NUM_TRIALS]
[--gpus-per-trial TUNE.GPUS_PER_TRIAL]
[--cpus-per-trial TUNE.CPUS_PER_TRIAL]
[--tune-metric TUNE.METRIC] [--tune-mode {max,min}]
[--val-percent DATA.VAL_PERCENT]
[--test-percent DATA.TEST_PERCENT] [--bsz DATA.BATCH_SIZE]
[--bsz-eval DATA.BATCH_SIZE_EVAL]
[--num-workers DATA.NUM_WORKERS] [--no-pin-memory]
[--drop-last] [--no-shuffle-eval]
optional arguments:
-h, --help show this help message and exit
--hidden MODEL.INTERMEDIATE_HIDDEN
Intermediate hidden layers for linear module
--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}
Which optimizer to use
--lr OPTIM.LR Learning rate
--weight-decay OPTIM.WEIGHT_DECAY
Weight decay
--lr-scheduler Use learning rate scheduling. Currently only
ReduceLROnPlateau is supported out of the box
--lr-factor LR_SCHEDULE.FACTOR
Multiplicative factor by which LR is reduced. Used if
--lr-scheduler is provided.
--lr-patience LR_SCHEDULE.PATIENCE
Number of epochs with no improvement after which
learning rate will be reduced. Used if --lr-scheduler
is provided.
--lr-cooldown LR_SCHEDULE.COOLDOWN
Number of epochs to wait before resuming normal
operation after lr has been reduced. Used if --lr-
scheduler is provided.
--min-lr LR_SCHEDULE.MIN_LR
Minimum lr for LR scheduling. Used if --lr-scheduler
is provided.
--seed SEED Seed for reproducibility
--config CONFIG Path to YAML configuration file
--experiment-name TRAINER.EXPERIMENT_NAME
Name of the running experiment
--run-id TRAINER.RUN_ID
Unique identifier for the current run. If not provided
it is inferred from datetime.now()
--experiment-group TRAINER.EXPERIMENT_GROUP
Group of current experiment. Useful when evaluating
for different seeds / cross-validation etc.
--experiments-folder TRAINER.EXPERIMENTS_FOLDER
Top-level folder where experiment results &
checkpoints are saved
--save-top-k TRAINER.SAVE_TOP_K
Save checkpoints for top k models
--patience TRAINER.PATIENCE
Number of epochs to wait before early stopping
--wandb-project TRAINER.WANDB_PROJECT
Wandb project under which results are saved
--tags [TRAINER.TAGS [TRAINER.TAGS ...]]
Tags for current run to make results searchable.
--stochastic_weight_avg
Use Stochastic weight averaging.
--gpus TRAINER.GPUS Number of GPUs to use
--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH
Run validation every n epochs
--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL
Clip gradients with ||grad(w)|| >= args.clip_grad_norm
--epochs TRAINER.MAX_EPOCHS
Maximum number of training epochs
--steps TRAINER.MAX_STEPS
Maximum number of training steps
--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS
Truncated Back-propagation-through-time steps.
--debug If true, we run a full run on a small subset of the
input data and overfit 10 training batches
--offline If true, forces offline execution of wandb logger
--early-stop-on TRAINER.EARLY_STOP_ON
Metric for early stopping
--early-stop-mode {min,max}
Minimize or maximize early stopping metric
--num-trials TUNE.NUM_TRIALS
Number of trials to run for hyperparameter tuning
--gpus-per-trial TUNE.GPUS_PER_TRIAL
How many gpus to use for each trial. If gpus_per_trial
< 1 multiple trials are packed in the same gpu
--cpus-per-trial TUNE.CPUS_PER_TRIAL
How many cpus to use for each trial.
--tune-metric TUNE.METRIC
Tune this metric. Need to be one of the keys of
metrics_map passed into make_trainer_for_ray_tune.
--tune-mode {max,min}
Maximize or minimize metric
--val-percent DATA.VAL_PERCENT
Percent of validation data to be randomly split from
the training set, if no validation set is provided
--test-percent DATA.TEST_PERCENT
Percent of test data to be randomly split from the
training set, if no test set is provided
--bsz DATA.BATCH_SIZE
Training batch size
--bsz-eval DATA.BATCH_SIZE_EVAL
Evaluation batch size
--num-workers DATA.NUM_WORKERS
Number of workers to be used in the DataLoader
--no-pin-memory Don't pin data to GPU memory when transferring
--drop-last Drop last incomplete batch
--no-shuffle-eval Don't shuffle val & test sets
Args:
parser (argparse.ArgumentParser): A parent argument to be augmented
datamodule_cls (pytorch_lightning.LightningDataModule): A data module class that injects arguments through the add_argparse_args method
Returns:
argparse.ArgumentParser: The augmented command line parser
Examples:
>>> import argparse
>>> from slp.plbind.dm import PLDataModuleFromDatasets
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int) # Create parser with model arguments and anything else you need
>>> parser = make_cli_parser(parser, PLDataModuleFromDatasets)
>>> args = parser.parse_args(args=["--bsz", "64", "--lr", "0.01"])
>>> args.data.batch_size
64
>>> args.optim.lr
0.01
"""
parser = add_optimizer_args(parser)
parser = add_trainer_args(parser)
parser = add_tune_args(parser)
parser = datamodule_cls.add_argparse_args(parser)
return parser
parse_config(parser, config_file, args=None, include_none=False)
parse_config Parse a provided YAML config file and command line args and merge them
During experimentation we ideally want a configuration file with the model and training configuration, but we also want to be able to run quick experiments using command line args. This function lets you have both, by overriding values in a YAML config file with user-provided command line arguments.
The precedence for merging is as follows:
- default CLI arg values < config file values < user-provided CLI args
E.g.:
- if you don't include a value in your configuration, it takes the default value from the argparse arguments
- if you provide a CLI arg (e.g. run the script with --bsz 64), it overrides the value in the config file
Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | The argument parser you want to use | required
config_file | Union[str, IO] | Configuration file name or file descriptor | required
args | Optional[List[str]] | Optional sys.argv-style args, used instead of sys.argv[1:]. Intended for testing only. | None
include_none | bool | If True, keep entries whose value is None in the resulting configuration (used by generate_example_config) | False

Returns:

Type | Description
---|---
Union[omegaconf.listconfig.ListConfig, omegaconf.dictconfig.DictConfig] | OmegaConf.DictConfig: The parsed configuration as an OmegaConf DictConfig object
Examples:
>>> import io
>>> import argparse
>>> from slp.config.config_parser import parse_config
>>> mock_config_file = io.StringIO('''
model:
hidden: 100
''')
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 100}}
>>> type(cfg)
<class 'omegaconf.dictconfig.DictConfig'>
>>> cfg = parse_config(parser, mock_config_file, args=["--hidden", "200"])
{'model': {'hidden': 200}}
>>> mock_config_file = io.StringIO('''
random_value: hello
''')
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 20}, 'random_value': 'hello'}
Source code in slp/config/config_parser.py
def parse_config(
parser: argparse.ArgumentParser,
config_file: Optional[Union[str, IO]],
args: Optional[List[str]] = None,
include_none: bool = False,
) -> Union[ListConfig, DictConfig]:
"""parse_config Parse a provided YAML config file and command line args and merge them
During experimentation we want ideally to have a configuration file with the model and training configuration,
but also be able to run quick experiments using command line args.
This function allows you to double dip, by overriding values in a YAML config file through user provided command line arguments.
The precedence for merging is as follows
* default cli args values < config file values < user provided cli args
E.g.:
* if you don't include a value in your configuration it will take the default value from the argparse arguments
* if you provide a cli arg (e.g. run the script with --bsz 64) it will override the value in the config file
Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)
Args:
parser (argparse.ArgumentParser): The argument parser you want to use
config_file (Union[str, IO]): Configuration file name or file descriptor
args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
Use this only for testing. By default it uses sys.argv[1:]
Returns:
OmegaConf.DictConfig: The parsed configuration as an OmegaConf DictConfig object
Examples:
>>> import io
>>> from slp.config.config_parser import parse_config
>>> mock_config_file = io.StringIO('''
model:
hidden: 100
''')
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 100}}
>>> type(cfg)
<class 'omegaconf.dictconfig.DictConfig'>
>>> cfg = parse_config(parser, mock_config_file, args=["--hidden", "200"])
{'model': {'hidden': 200}}
>>> mock_config_file = io.StringIO('''
random_value: hello
''')
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 20}, 'random_value': 'hello'}
"""
# Merge Configurations Precedence: default kwarg values < default argparse values < config file values < user provided CLI args values
if config_file is not None:
dict_config = OmegaConf.from_yaml(config_file) # type: ignore
else:
dict_config = OmegaConf.create({})
user_cli, default_cli = OmegaConf.from_argparse(parser, include_none=include_none)
config = OmegaConf.merge(default_cli, dict_config, user_cli)
logger.info("Running with the following configuration")
logger.info(f"\n{OmegaConf.to_yaml(config)}")
return config
SPECIAL_TOKENS
SPECIAL_TOKENS Special tokens for NLP applications
Default special token values and indices (compatible with BERT):
* [PAD]: 0
* [MASK]: 1
* [UNK]: 2
* [BOS]: 3
* [EOS]: 4
* [CLS]: 5
* [SEP]: 6
* [PAUSE]: 7
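A sketch of typical access patterns, assuming SPECIAL_TOKENS is an Enum whose member values are the token strings (as usages like SPECIAL_TOKENS.PAD.value elsewhere in this reference suggest):

>>> from slp.config.nlp import SPECIAL_TOKENS
>>> SPECIAL_TOKENS.PAD.value
'[PAD]'
>>> [t.value for t in SPECIAL_TOKENS]
['[PAD]', '[MASK]', '[UNK]', '[BOS]', '[EOS]', '[CLS]', '[SEP]', '[PAUSE]']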
OmegaConfExtended
OmegaConfExtended Extended OmegaConf class, to include argparse-style CLI arguments
Unfortunately, the original authors are not interested in providing integration with argparse (https://github.com/omry/omegaconf/issues/569), so we get by with this extension.
from_argparse(parser, args=None, include_none=False)
staticmethod
from_argparse Static method to convert argparse arguments into OmegaConf DictConfig objects
We parse the command line arguments and separate the user provided values and the default values. This is useful for merging with a config file.
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | Parser for argparse arguments | required
args | Optional[List[str]] | Optional sys.argv-style args, used instead of sys.argv[1:]. Intended for testing only. | None
include_none | bool | If True, keep entries whose value is None in the resulting configs | False

Returns:

Type | Description
---|---
Tuple[omegaconf.dictconfig.DictConfig, omegaconf.dictconfig.DictConfig] | Tuple[omegaconf.DictConfig, omegaconf.DictConfig]: (user provided cli args, default cli args) as a tuple of omegaconf.DictConfigs
Examples:
>>> import argparse
>>> from slp.config.omegaconf import OmegaConfExtended
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser, args=["--hidden", "100"])
>>> user_provided_args
{'model': {'hidden': 100}}
>>> default_args
{}
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser)
>>> user_provided_args
{}
>>> default_args
{'model': {'hidden': 20}}
Source code in slp/config/omegaconf.py
@staticmethod
def from_argparse(
parser: argparse.ArgumentParser,
args: Optional[List[str]] = None,
include_none: bool = False,
) -> Tuple[DictConfig, DictConfig]:
"""from_argparse Static method to convert argparse arguments into OmegaConf DictConfig objects
We parse the command line arguments and separate the user provided values and the default values.
This is useful for merging with a config file.
Args:
parser (argparse.ArgumentParser): Parser for argparse arguments
args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
Use this only for testing. By default it uses sys.argv[1:]
Returns:
Tuple[omegaconf.DictConfig, omegaconf.DictConfig]: (user provided cli args, default cli args) as a tuple of omegaconf.DictConfigs
Examples:
>>> import argparse
>>> from slp.config.omegaconf import OmegaConfExtended
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser, args=["--hidden", "100"])
>>> user_provided_args
{'model': {'hidden': 100}}
>>> default_args
{}
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser)
>>> user_provided_args
{}
>>> default_args
{'model': {'hidden': 20}}
"""
dest_to_arg = {v.dest: k for k, v in parser._option_string_actions.items()}
all_args = vars(parser.parse_args(args=args))
provided_args = {}
default_args = {}
for k, v in all_args.items():
if dest_to_arg[k] in sys.argv:
provided_args[k] = v
else:
default_args[k] = v
provided = OmegaConf.create(_nest(provided_args, include_none=include_none))
defaults = OmegaConf.create(_nest(default_args, include_none=include_none))
return provided, defaults
from_yaml(file_)
staticmethod
Alias for OmegaConf.load. OmegaConf.from_yaml was removed from omegaconf at some point; this brings it back.
Parameters:

Name | Type | Description | Default
---|---|---|---
file_ | Union[str, pathlib.Path, IO[Any]] | File to load or file descriptor | required

Returns:

Type | Description
---|---
Union[omegaconf.dictconfig.DictConfig, omegaconf.listconfig.ListConfig] | Union[DictConfig, ListConfig]: The loaded configuration
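Examples:

A minimal sketch; the file name is hypothetical:

>>> from slp.config.omegaconf import OmegaConfExtended
>>> cfg = OmegaConfExtended.from_yaml("my_config.yaml")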
Source code in slp/config/omegaconf.py
@staticmethod
def from_yaml(
file_: Union[str, pathlib.Path, IO[Any]]
) -> Union[DictConfig, ListConfig]:
"""Alias for OmegaConf.load
OmegaConf.from_yaml got removed at some point. Bring it back
Args:
file_ (Union[str, pathlib.Path, IO[Any]]): file to load or file descriptor
Returns:
Union[DictConfig, ListConfig]: The loaded configuration
"""
return OmegaConfExtended.load(file_)
MultimodalSequenceClassificationCollator
__call__(self, batch)
special
Call collate function
Parameters:

Name | Type | Description | Default
---|---|---|---
batch | List[Dict[str, torch.Tensor]] | Batch of samples. It expects a list of dictionaries from modalities to torch tensors | required

Returns:

Type | Description
---|---
Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]] | Tuple of (dict of batched modality tensors, labels, dict of modality sequence lengths)
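Examples:

A minimal sketch of the expected input and output under default settings; the modality feature dimensions below are made up:

>>> import torch
>>> collate_fn = MultimodalSequenceClassificationCollator()
>>> batch = [
...     {"text": torch.rand(5, 300), "audio": torch.rand(5, 74), "visual": torch.rand(5, 35), "label": 1},
...     {"text": torch.rand(3, 300), "audio": torch.rand(3, 74), "visual": torch.rand(3, 35), "label": 0},
... ]
>>> inputs, labels, lengths = collate_fn(batch)
>>> inputs["text"].shape  # (batch_size, max_seq_len, feature_dim)
torch.Size([2, 5, 300])
>>> lengths["text"]
tensor([5, 3])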
Source code in slp/data/collators.py
def __call__(
self, batch: List[Dict[str, torch.Tensor]]
) -> Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]]:
"""Call collate function
Args:
batch (List[Dict[str, torch.Tensor]]): Batch of samples.
It expects a list of dictionaries from modalities to torch tensors
Returns:
Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]]: tuple of
(dict batched modality tensors, labels, dict of modality sequence lengths)
"""
inputs = {}
lengths = {}
for m in self.modalities:
seq = self.extract_sequence(batch, m)
lengths[m] = torch.tensor([s.size(0) for s in seq], device=self.device)
if self.max_length > 0:
lengths[m] = torch.clamp(lengths[m], min=0, max=self.max_length)
inputs[m] = pad_sequence(
seq,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
targets: List[Label] = [b[self.label_key] for b in batch]
# Pad and convert to tensor
ttargets: torch.Tensor = mktensor(
targets, device=self.device, dtype=self.label_dtype
)
return inputs, ttargets.to(self.device), lengths
__init__(self, pad_indx=0, modalities={'audio', 'visual', 'text'}, label_key='label', max_length=-1, label_dtype=torch.float32, device='cpu')
special
Collate function for sequence classification tasks
- Perform padding
- Calculate sequence lengths
Parameters:

Name | Type | Description | Default
---|---|---|---
pad_indx | int | Pad token index. Defaults to 0. | 0
modalities | Set | Which modalities are included in the batch dict | {'audio', 'visual', 'text'}
max_length | int | Pad sequences to a fixed maximum length | -1
label_key | str | String to access the label in the batch dict | 'label'
label_dtype | torch.dtype | dtype of the returned label tensor | torch.float32
device | str | Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion. | 'cpu'
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=MultimodalSequenceClassificationCollator())
Source code in slp/data/collators.py
def __init__(
self,
pad_indx=0,
modalities={"visual", "text", "audio"},
label_key="label",
max_length=-1,
label_dtype=torch.float,
device="cpu",
):
"""Collate function for sequence classification tasks
* Perform padding
* Calculate sequence lengths
Args:
pad_indx (int): Pad token index. Defaults to 0.
modalities (Set): Which modalities are included in the batch dict
max_length (int): Pad sequences to a fixed maximum length
label_key (str): String to access the label in the batch dict
device (str): device of returned tensors. Leave this as "cpu".
The LightningModule will handle the Conversion.
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=MultimodalSequenceClassificationCollator())
"""
self.pad_indx = pad_indx
self.device = device
self.max_length = max_length
self.label_key = label_key
self.modalities = modalities
self.label_dtype = label_dtype
Seq2SeqCollator
__call__(self, batch)
special
Call collate function
Parameters:

Name | Type | Description | Default
---|---|---|---
batch | List[Tuple[torch.Tensor, torch.Tensor]] | Batch of samples. It expects a list of tuples (source, target), where each source and target is a sequence of features or ids. | required

Returns:

Type | Description
---|---
Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor] | Tuple of batched tensors (inputs, labels, lengths_inputs, lengths_targets)
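Examples:

A minimal sketch under default settings (pad_indx=0):

>>> import torch
>>> collate_fn = Seq2SeqCollator()
>>> batch = [
...     (torch.tensor([1, 2, 3]), torch.tensor([4, 5])),
...     (torch.tensor([6]), torch.tensor([7, 8, 9])),
... ]
>>> inputs, targets, lengths_inputs, lengths_targets = collate_fn(batch)
>>> inputs
tensor([[1, 2, 3],
        [6, 0, 0]])
>>> lengths_targets
tensor([2, 3])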
Source code in slp/data/collators.py
def __call__(
self, batch: List[Tuple[torch.Tensor, torch.Tensor]]
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
"""Call collate function
Args:
batch (List[Tuple[torch.Tensor, torch.Tensor]]): Batch of samples.
It expects a list of tuples (source, target)
Each source and target are a sequences of features or ids.
Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]: Returns tuple of batched tensors
(inputs, labels, lengths_inputs, lengths_targets)
"""
inputs: List[torch.Tensor] = [b[0] for b in batch]
targets: List[torch.Tensor] = [b[1] for b in batch]
lengths_inputs = torch.tensor([s.size(0) for s in inputs], device=self.device)
lengths_targets = torch.tensor([s.size(0) for s in targets], device=self.device)
if self.max_length > 0:
lengths_inputs = torch.clamp(lengths_inputs, min=0, max=self.max_length)
lengths_targets = torch.clamp(lengths_targets, min=0, max=self.max_length)
inputs_padded: torch.Tensor = pad_sequence(
inputs,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
targets_padded: torch.Tensor = pad_sequence(
targets,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
return inputs_padded, targets_padded, lengths_inputs, lengths_targets
__init__(self, pad_indx=0, max_length=-1, device='cpu')
special
Collate function for seq2seq tasks
- Perform padding
- Calculate sequence lengths
Parameters:

Name | Type | Description | Default
---|---|---|---
pad_indx | int | Pad token index. Defaults to 0. | 0
max_length | int | Pad sequences to a fixed maximum length | -1
device | str | Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion. | 'cpu'
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=Seq2SeqCollator())
Source code in slp/data/collators.py
def __init__(self, pad_indx=0, max_length=-1, device="cpu"):
"""Collate function for seq2seq tasks
* Perform padding
* Calculate sequence lengths
Args:
pad_indx (int): Pad token index. Defaults to 0.
max_length (int): Pad sequences to a fixed maximum length
device (str): device of returned tensors. Leave this as "cpu".
The LightningModule will handle the Conversion.
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=Seq2SeqCollator())
"""
self.pad_indx = pad_indx
self.max_length = max_length
self.device = device
SequenceClassificationCollator
__call__(self, batch)
special
Call collate function
Parameters:

Name | Type | Description | Default
---|---|---|---
batch | List[Tuple[torch.Tensor, Union[numpy.ndarray, torch.Tensor, List[~T], int]]] | Batch of samples. It expects a list of tuples (inputs, label). | required

Returns:

Type | Description
---|---
Tuple[torch.Tensor, torch.Tensor, torch.Tensor] | Tuple of batched tensors (inputs, labels, lengths)
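Examples:

A minimal sketch under default settings (pad_indx=0):

>>> import torch
>>> collate_fn = SequenceClassificationCollator()
>>> batch = [(torch.tensor([1, 2, 3]), 0), (torch.tensor([4, 5]), 1)]
>>> inputs, labels, lengths = collate_fn(batch)
>>> inputs
tensor([[1, 2, 3],
        [4, 5, 0]])
>>> labels
tensor([0, 1])
>>> lengths
tensor([3, 2])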
Source code in slp/data/collators.py
def __call__(
self, batch: List[Tuple[torch.Tensor, Label]]
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
"""Call collate function
Args:
batch (List[Tuple[torch.Tensor, slp.util.types.Label]]): Batch of samples.
It expects a list of tuples (inputs, label).
Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Returns tuple of batched tensors (inputs, labels, lengths)
"""
inputs: List[torch.Tensor] = [b[0] for b in batch]
targets: List[Label] = [b[1] for b in batch]
# targets: List[torch.tensor] = map(list, zip(*batch))
lengths = torch.tensor([s.size(0) for s in inputs], device=self.device)
if self.max_length > 0:
lengths = torch.clamp(lengths, min=0, max=self.max_length)
# Pad and convert to tensor
inputs_padded: torch.Tensor = pad_sequence(
inputs,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
ttargets: torch.Tensor = mktensor(targets, device=self.device, dtype=torch.long)
return inputs_padded, ttargets.to(self.device), lengths
__init__(self, pad_indx=0, max_length=-1, device='cpu')
special
Collate function for sequence classification tasks
- Perform padding
- Calculate sequence lengths
Parameters:

Name | Type | Description | Default
---|---|---|---
pad_indx | int | Pad token index. Defaults to 0. | 0
max_length | int | Pad sequences to a fixed maximum length | -1
device | str | Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion. | 'cpu'
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=SequenceClassificationCollator())
Source code in slp/data/collators.py
def __init__(self, pad_indx=0, max_length=-1, device="cpu"):
"""Collate function for sequence classification tasks
* Perform padding
* Calculate sequence lengths
Args:
pad_indx (int): Pad token index. Defaults to 0.
max_length (int): Pad sequences to a fixed maximum length
device (str): device of returned tensors. Leave this as "cpu".
The LightningModule will handle the Conversion.
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=SequenceClassificationCollator())
"""
self.pad_indx = pad_indx
self.device = device
self.max_length = max_length
EmbeddingsLoader
__init__(self, embeddings_file, dim, vocab=None, extra_tokens=None)
special
Load word embeddings in text format
Parameters:

Name | Type | Description | Default
---|---|---|---
embeddings_file | str | File where embeddings are stored (e.g. glove.6B.50d.txt) | required
dim | int | Dimensionality of embeddings | required
vocab | Optional[Dict[str, int]] | Load only embeddings in vocab. Defaults to None. | None
extra_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Create random embeddings for these special tokens. Defaults to None. | None
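Examples:

A minimal sketch; the embeddings file path is hypothetical:

>>> loader = EmbeddingsLoader("./cache/glove.6B.50d.txt", 50)
>>> word2idx, idx2word, embeddings = loader.load()
>>> embeddings.shape[1]
50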
Source code in slp/data/corpus.py
def __init__(
self,
embeddings_file: str,
dim: int,
vocab: Optional[Dict[str, int]] = None,
extra_tokens: Optional[SPECIAL_TOKENS] = None,
) -> None:
"""Load word embeddings in text format
Args:
embeddings_file (str): File where embeddings are stored (e.g. glove.6B.50d.txt)
dim (int): Dimensionality of embeddings
vocab (Optional[Dict[str, int]]): Load only embeddings in vocab. Defaults to None.
extra_tokens (Optional[slp.config.nlp.SPECIAL_TOKENS]): Create random embeddings for these special tokens.
Defaults to None.
"""
self.embeddings_file = embeddings_file
self.vocab = vocab
self.cache_ = self._get_cache_name()
self.dim_ = dim
self.extra_tokens = extra_tokens
__repr__(self)
special
String representation of class
Source code in slp/data/corpus.py
def __repr__(self):
"""String representation of class"""
return f"{self.__class__.__name__}({self.embeddings_file}, {self.dim_})"
augment_embeddings(self, word2idx, idx2word, embeddings, token, emb=None)
Create a random embedding for a special token and append it to the embeddings array
Parameters:

Name | Type | Description | Default
---|---|---|---
word2idx | Dict[str, int] | Current word2idx map | required
idx2word | Dict[int, str] | Current idx2word map | required
embeddings | List[numpy.ndarray] | Embeddings array as list of embeddings | required
token | str | The special token (e.g. [PAD]) | required
emb | Optional[numpy.ndarray] | Optional value for the embedding to be appended. Defaults to None, where a random embedding is created. | None

Returns:

Type | Description
---|---
Tuple[Dict[str, int], Dict[int, str], List[numpy.ndarray]] | (word2idx, idx2word, embeddings) tuple
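Examples:

A minimal sketch, assuming loader is an EmbeddingsLoader with dim 50 (as in the sketch above):

>>> import numpy as np
>>> word2idx, idx2word, embeddings = loader.augment_embeddings({}, {}, [], "[PAD]", emb=np.zeros(50))
>>> word2idx
{'[PAD]': 0}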
Source code in slp/data/corpus.py
def augment_embeddings(
self,
word2idx: Dict[str, int],
idx2word: Dict[int, str],
embeddings: List[np.ndarray],
token: str,
emb: Optional[np.ndarray] = None,
) -> Tuple[Dict[str, int], Dict[int, str], List[np.ndarray]]:
"""Create a random embedding for a special token and append it to the embeddings array
Args:
word2idx (Dict[str, int]): Current word2idx map
idx2word (Dict[int, str]): Current idx2word map
embeddings (List[np.ndarray]): Embeddings array as list of embeddings
token (str): The special token (e.g. [PAD])
emb (Optional[np.ndarray]): Optional value for the embedding to be appended.
Defaults to None, where a random embedding is created.
Returns:
Tuple[Dict[str, int], Dict[int, str], List[np.ndarray]]: (word2idx, idx2word, embeddings) tuple
"""
word2idx[token] = len(embeddings)
idx2word[len(embeddings)] = token
if emb is None:
emb = np.random.uniform(low=-0.05, high=0.05, size=self.dim_)
embeddings.append(emb)
return word2idx, idx2word, embeddings
in_accepted_vocab(self, word)
Check if word exists in given vocabulary
Parameters:

Name | Type | Description | Default
---|---|---|---
word | str | Word from the embeddings file | required

Returns:

Type | Description
---|---
bool | bool: Whether the word exists in the vocabulary
Source code in slp/data/corpus.py
def in_accepted_vocab(self, word: str) -> bool:
"""Check if word exists in given vocabulary
Args:
word (str): word from embeddings file
Returns:
bool: Word exists
"""
return True if self.vocab is None else word in self.vocab
load(self)
Read the word vectors from a text file
- Read embeddings
- Filter with given vocabulary
- Augment with special tokens
Returns:

Type | Description
---|---
Tuple[Dict[str, int], Dict[int, str], numpy.ndarray] | types.Embeddings: (word2idx, idx2word, embeddings) tuple
Source code in slp/data/corpus.py
@system.timethis(method=True)
def load(self) -> types.Embeddings:
"""Read the word vectors from a text file
* Read embeddings
* Filter with given vocabulary
* Augment with special tokens
Returns:
types.Embeddings: (word2idx, idx2word, embeddings) tuple
"""
# in order to avoid this time consuming operation, cache the results
try:
cache = self._load_cache()
logger.info("Loaded word embeddings from cache.")
return cache
except OSError:
logger.warning(f"Didn't find embeddings cache file {self.embeddings_file}")
logger.warning("Loading embeddings from file.")
# create the necessary dictionaries and the word embeddings matrix
if not os.path.exists(self.embeddings_file):
logger.critical(f"{self.embeddings_file} not found!")
raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), self.embeddings_file)
logger.info(f"Indexing file {self.embeddings_file} ...")
# create the 2D array, which will be used for initializing
# the Embedding layer of a NN.
# We reserve the first row (idx=0), as the word embedding,
# which will be used for zero padding (word with id = 0).
if self.extra_tokens is not None:
word2idx, idx2word, embeddings = self.augment_embeddings(
{},
{},
[],
self.extra_tokens.PAD.value, # type: ignore
emb=np.zeros(self.dim_),
)
for token in self.extra_tokens: # type: ignore
logger.debug(f"Adding token {token.value} to embeddings matrix")
if token == self.extra_tokens.PAD:
continue
word2idx, idx2word, embeddings = self.augment_embeddings(
word2idx, idx2word, embeddings, token.value
)
else:
word2idx, idx2word, embeddings = self.augment_embeddings(
{}, {}, [], "[PAD]", emb=np.zeros(self.dim_)
)
# read file, line by line
with open(self.embeddings_file, "r") as f:
num_lines = sum(1 for line in f)
with open(self.embeddings_file, "r") as f:
index = len(embeddings)
for line in tqdm(
f, total=num_lines, desc="Loading word embeddings...", leave=False
):
# skip the first row if it is a header
if len(line.split()) < self.dim_:
continue
values = line.rstrip().split(" ")
word = values[0]
if word in word2idx:
continue
if not self.in_accepted_vocab(word):
continue
vector = np.asarray(values[1:], dtype=np.float32)
idx2word[index] = word
word2idx[word] = index
embeddings.append(vector)
index += 1
logger.info(f"Loaded {len(embeddings)} word vectors.")
embeddings_out = np.array(embeddings, dtype="float32")
# write the data to a cache file
self._dump_cache((word2idx, idx2word, embeddings_out))
return word2idx, idx2word, embeddings_out
HfCorpus
embeddings: None
property
readonly
Unused. Defined for compatibility
frequencies: Dict[str, int]
property
readonly
Retrieve wordpiece occurrence counts

Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: wordpiece occurrence counts
idx2word: None
property
readonly
Unused. Defined for compatibility
indices: List[List[int]]
property
readonly
Retrieve corpus as token indices
Returns:

Type | Description
---|---
List[List[int]] | List[List[int]]: Token indices for corpus
raw: List[str]
property
readonly
Retrieve raw corpus
Returns:

Type | Description
---|---
List[str] | List[str]: Raw corpus
tokenized: List[List[str]]
property
readonly
Retrieve tokenized corpus
Returns:

Type | Description
---|---
List[List[str]] | List[List[str]]: Tokenized corpus
vocab: Set[str]
property
readonly
Retrieve set of words in vocabulary
Returns:

Type | Description
---|---
Set[str] | Set[str]: Set of words in vocabulary
vocab_size: int
property
readonly
Retrieve vocabulary size
Returns:

Type | Description
---|---
int | int: Vocabulary size
word2idx: None
property
readonly
Unused. Defined for compatibility
__getitem__(self, idx)
special
Get ith element in corpus as token indices
Parameters:

Name | Type | Description | Default
---|---|---|---
idx | int | Index in corpus | required

Returns:

Type | Description
---|---
List[int] | List[int]: List of token indices for sentence
Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
"""Get ith element in corpus as token indices
Args:
idx (List[int]): index in corpus
Returns:
List[int]: List of token indices for sentence
"""
out: List[int] = (
self.corpus_indices_[idx]
if self.max_length <= 0
else self.corpus_indices_[idx][: self.max_length]
)
return out
__init__(self, corpus, lower=True, tokenizer_model='bert-base-uncased', add_special_tokens=True, special_tokens=<enum 'SPECIAL_TOKENS'>, max_length=-1, **kwargs)
special
Process a corpus using Hugging Face tokenizers
Select one of the Hugging Face tokenizers to tokenize the corpus and convert it to token indices.
Parameters:

Name | Type | Description | Default
---|---|---|---
corpus | List[str] | List of sentences | required
lower | bool | Convert strings to lower case. Defaults to True. | True
tokenizer_model | str | Hugging Face model to use. Defaults to "bert-base-uncased". | 'bert-base-uncased'
add_special_tokens | bool | Add special tokens to sentences during tokenization. Defaults to True. | True
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens to include in the vocabulary. Defaults to slp.config.nlp.SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'>
max_length | int | Crop sequences above this length. Defaults to -1, where sequences are left unaltered. | -1
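Examples:

A minimal sketch; the exact ids depend on the chosen tokenizer model:

>>> corpus = HfCorpus(["Hello world", "The big brown fox"])
>>> len(corpus)
2
>>> corpus.indices[0]  # wordpiece ids for "Hello world", e.g. [101, 7592, 2088, 102] for bert-base-uncased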
Source code in slp/data/corpus.py
def __init__(
self,
corpus: List[str],
lower: bool = True,
tokenizer_model: str = "bert-base-uncased",
add_special_tokens: bool = True,
special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
max_length: int = -1,
**kwargs,
):
"""Process a corpus using hugging face tokenizers
Select one of hugging face tokenizers and process corpus
Args:
corpus (List[str]): List of sentences
lower (bool): Convert strings to lower case. Defaults to True.
tokenizer_model (str): Hugging face model to use. Defaults to "bert-base-uncased".
add_special_tokens (bool): Add special tokens in sentence during tokenization. Defaults to True.
special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary.
Defaults to slp.config.nlp.SPECIAL_TOKENS.
max_length (int): Crop sequences above this length. Defaults to -1 where sequences are left unaltered.
"""
self.corpus_ = corpus
self.max_length = max_length
logger.info(
f"Tokenizing corpus using hugging face tokenizer from {tokenizer_model}"
)
self.tokenizer = HuggingFaceTokenizer(
lower=lower, model=tokenizer_model, add_special_tokens=add_special_tokens
)
self.corpus_indices_ = [
self.tokenizer(s)
for s in tqdm(
self.corpus_, desc="Converting tokens to indices...", leave=False
)
]
self.tokenized_corpus_ = [
self.tokenizer.detokenize(s)
for s in tqdm(
self.corpus_indices_,
desc="Mapping indices to tokens...",
leave=False,
)
]
self.vocab_ = create_vocab(
self.tokenized_corpus_,
vocab_size=-1,
special_tokens=special_tokens,
)
__len__(self)
special
Number of samples in corpus
Returns:

Type | Description
---|---
int | int: Corpus length
Source code in slp/data/corpus.py
def __len__(self) -> int:
"""Number of samples in corpus
Returns:
int: Corpus length
"""
return len(self.corpus_indices_)
TokenizedCorpus
embeddings: None
property
readonly
Unused. Kept for compatibility
frequencies: Dict[str, int]
property
readonly
Retrieve wordpiece occurrence counts

Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: wordpiece occurrence counts
idx2word: Dict[int, str]
property
readonly
Retrieve idx2word mapping
Returns:

Type | Description
---|---
Dict[int, str] | Dict[int, str]: idx2word mapping
indices: Union[List[int], List[List[int]]]
property
readonly
Retrieve corpus as token indices
Returns:

Type | Description
---|---
Union[List[int], List[List[int]]] | List[List[int]]: Token indices for corpus
raw: Union[List[str], List[List[str]]]
property
readonly
Retrieve raw corpus
Returns:

Type | Description
---|---
Union[List[str], List[List[str]]] | List[str]: Raw corpus
tokenized: Union[List[str], List[List[str]]]
property
readonly
Retrieve tokenized corpus
Returns:

Type | Description
---|---
Union[List[str], List[List[str]]] | List[List[str]]: Tokenized corpus
vocab: Set[str]
property
readonly
Retrieve set of words in vocabulary
Returns:

Type | Description
---|---
Set[str] | Set[str]: Set of words in vocabulary
vocab_size: int
property
readonly
Retrieve vocabulary size
Returns:

Type | Description
---|---
int | int: Vocabulary size
word2idx: Dict[str, int]
property
readonly
Retrieve word2idx mapping
Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: word2idx mapping
__getitem__(self, idx)
special
Get ith element in corpus as token indices
Parameters:

Name | Type | Description | Default
---|---|---|---
idx | int | Index in corpus | required

Returns:

Type | Description
---|---
List[int] | List[int]: List of token indices for sentence
Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
"""Get ith element in corpus as token indices
Args:
idx (List[int]): index in corpus
Returns:
List[int]: List of token indices for sentence
"""
out: List[int] = (
self.corpus_indices_[idx]
if self.max_length <= 0
else self.corpus_indices_[idx][: self.max_length]
)
return out
__init__(self, corpus, word2idx=None, special_tokens=<enum 'SPECIAL_TOKENS'>, max_length=-1, **kwargs)
special
Wrap a corpus that's already tokenized
Parameters:

Name | Type | Description | Default
---|---|---|---
corpus | Union[List[str], List[List[str]]] | List of tokens or list of lists of tokens | required
word2idx | Dict[str, int] | Token to index mapping. Defaults to None. | None
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'>
max_length | int | Crop sequences above this length. Defaults to -1, where sequences are left unaltered. | -1
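Examples:

A minimal sketch; the exact ids depend on the vocabulary order produced by the iterative counter:

>>> corpus = TokenizedCorpus([["far", "far", "away"], ["in", "a", "galaxy"]])
>>> len(corpus)
2
>>> corpus[0]  # token ids for ["far", "far", "away"]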
Source code in slp/data/corpus.py
def __init__(
self,
corpus: Union[List[str], List[List[str]]],
word2idx: Dict[str, int] = None,
special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
max_length: int = -1,
**kwargs,
):
"""Wrap a corpus that's already tokenized
Args:
corpus (Union[List[str], List[List[str]]]): List of tokens or List of lists of tokens
word2idx (Dict[str, int], optional): Token to index mapping. Defaults to None.
special_tokens (Optional[SPECIAL_TOKENS], optional): Special Tokens. Defaults to SPECIAL_TOKENS.
"""
self.corpus_ = corpus
self.tokenized_corpus_ = corpus
self.max_length = max_length
self.vocab_ = create_vocab(
self.tokenized_corpus_,
vocab_size=-1,
special_tokens=special_tokens,
)
if word2idx is not None:
logger.info("Converting tokens to ids using word2idx.")
self.word2idx_ = word2idx
else:
logger.info(
"No word2idx provided. Will convert tokens to ids using an iterative counter."
)
self.word2idx_ = dict(zip(self.vocab_.keys(), itertools.count()))
self.idx2word_ = {v: k for k, v in self.word2idx_.items()}
self.to_token_ids = ToTokenIds(
self.word2idx_,
specials=SPECIAL_TOKENS, # type: ignore
)
if isinstance(self.tokenized_corpus_[0], list):
self.corpus_indices_ = [
self.to_token_ids(s)
for s in tqdm(
self.tokenized_corpus_,
desc="Converting tokens to token ids...",
leave=False,
)
]
else:
self.corpus_indices_ = self.to_token_ids(self.tokenized_corpus_) # type: ignore
__len__(self)
special
Number of samples in corpus
Returns:

Type | Description
---|---
int | int: Corpus length
Source code in slp/data/corpus.py
def __len__(self) -> int:
"""Number of samples in corpus
Returns:
int: Corpus length
"""
return len(self.corpus_indices_)
WordCorpus
embeddings: ndarray
property
readonly
Retrieve embeddings array
Returns:

Type | Description
---|---
ndarray | np.ndarray: Array of pretrained word embeddings
frequencies: Dict[str, int]
property
readonly
Retrieve word occurrence counts

Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: word occurrence counts
idx2word: Dict[int, str]
property
readonly
Retrieve idx2word mapping
Returns:

Type | Description
---|---
Dict[int, str] | Dict[int, str]: idx2word mapping
indices: List[List[int]]
property
readonly
Retrieve corpus as token indices
Returns:

Type | Description
---|---
List[List[int]] | List[List[int]]: Token indices for corpus
raw: List[str]
property
readonly
Retrieve raw corpus
Returns:

Type | Description
---|---
List[str] | List[str]: Raw corpus
tokenized: List[List[str]]
property
readonly
Retrieve tokenized corpus
Returns:

Type | Description
---|---
List[List[str]] | List[List[str]]: Tokenized corpus
vocab: Set[str]
property
readonly
Retrieve set of words in vocabulary
Returns:

Type | Description
---|---
Set[str] | Set[str]: Set of words in vocabulary
vocab_size: int
property
readonly
Retrieve vocabulary size for corpus
Returns:

Type | Description
---|---
int | int: Vocabulary size
word2idx: Dict[str, int]
property
readonly
Retrieve word2idx mapping
Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: word2idx mapping
__getitem__(self, idx)
special
Get ith element in corpus as token indices
Parameters:

Name | Type | Description | Default
---|---|---|---
idx | int | Index in corpus | required

Returns:

Type | Description
---|---
List[int] | List[int]: List of token indices for sentence
Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
"""Get ith element in corpus as token indices
Args:
idx (List[int]): index in corpus
Returns:
List[int]: List of token indices for sentence
"""
out: List[int] = (
self.corpus_indices_[idx]
if self.max_length <= 0
else self.corpus_indices_[idx][: self.max_length]
)
return out
__init__(self, corpus, limit_vocab_size=30000, word2idx=None, idx2word=None, embeddings=None, embeddings_file=None, embeddings_dim=300, lower=True, special_tokens=<enum 'SPECIAL_TOKENS'>, prepend_bos=False, append_eos=False, lang='en_core_web_md', max_length=-1, **kwargs)
special
Load corpus embeddings, tokenize into words using spacy, and convert to ids
This class handles processing of a raw corpus. It performs:
- Tokenization into words (spacy)
- Loading of pretrained word embeddings
- Calculation of word frequencies / corpus statistics
- Conversion to token ids
You can either:
- pass an embeddings file, to load pretrained embeddings and create the word2idx mapping, or
- pass an already loaded embeddings array and word2idx. This is useful for the dev / test splits, where we want to reuse the train split embeddings / word2idx.
Parameters:

Name | Type | Description | Default
---|---|---|---
corpus | List[str] | Corpus as a list of sentences | required
limit_vocab_size | int | Upper bound for number of most frequent tokens to keep. Defaults to 30000. | 30000
word2idx | Optional[Dict[str, int]] | Mapping of words to indices. Defaults to None. | None
idx2word | Optional[Dict[int, str]] | Mapping of indices to words. Defaults to None. | None
embeddings | Optional[numpy.ndarray] | Embeddings array. Defaults to None. | None
embeddings_file | Optional[str] | Embeddings file to read. Defaults to None. | None
embeddings_dim | int | Dimension of embeddings. Defaults to 300. | 300
lower | bool | Convert strings to lower case. Defaults to True. | True
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens to include in the vocabulary. Defaults to slp.config.nlp.SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'>
prepend_bos | bool | Prepend Beginning of Sequence token for seq2seq tasks. Defaults to False. | False
append_eos | bool | Append End of Sequence token for seq2seq tasks. Defaults to False. | False
lang | str | Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_md". | 'en_core_web_md'
max_length | int | Crop sequences above this length. Defaults to -1, where sequences are left unaltered. | -1
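Examples:

A minimal sketch; the embeddings file path is hypothetical:

>>> corpus = WordCorpus(
...     ["The big brown fox", "Jumps over the lazy dog"],
...     embeddings_file="./cache/glove.6B.50d.txt",
...     embeddings_dim=50,
... )
>>> corpus.embeddings.shape[1]
50
>>> corpus[0]  # token ids for the first sentence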
Source code in slp/data/corpus.py
def __init__(
self,
corpus: List[str],
limit_vocab_size: int = 30000,
word2idx: Optional[Dict[str, int]] = None,
idx2word: Optional[Dict[int, str]] = None,
embeddings: Optional[np.ndarray] = None,
embeddings_file: Optional[str] = None,
embeddings_dim: int = 300,
lower: bool = True,
special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
prepend_bos: bool = False,
append_eos: bool = False,
lang: str = "en_core_web_md",
max_length: int = -1,
**kwargs,
):
"""Load corpus embeddings, tokenize in words using spacy and convert to ids
This class handles the handling of a raw corpus. It handles:
* Tokenization into words (spacy)
* Loading of pretrained word embedding
* Calculation of word frequencies / corpus statistics
* Conversion to token ids
You can pass either:
* Pass an embeddings file to load pretrained embeddings and create the word2idx mapping
* Pass already loaded embeddings array and word2idx. This is useful for the dev / test splits
where we want to pass the train split embeddings / word2idx.
Args:
corpus (List[List[str]]): Corpus as a list of sentences
limit_vocab_size (int): Upper bound for number of most frequent tokens to keep. Defaults to 30000.
word2idx (Optional[Dict[str, int]]): Mapping of word to indices. Defaults to None.
idx2word (Optional[Dict[int, str]]): Mapping of indices to words. Defaults to None.
embeddings (Optional[np.ndarray]): Embeddings array. Defaults to None.
embeddings_file (Optional[str]): Embeddings file to read. Defaults to None.
embeddings_dim (int): Dimension of embeddings. Defaults to 300.
lower (bool): Convert strings to lower case. Defaults to True.
special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary.
Defaults to slp.config.nlp.SPECIAL_TOKENS.
prepend_bos (bool): Prepend Beginning of Sequence token for seq2seq tasks. Defaults to False.
append_eos (bool): Append End of Sequence token for seq2seq tasks. Defaults to False.
lang (str): Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_md".
max_length (int): Crop sequences above this length. Defaults to -1 where sequences are left unaltered.
"""
# FIXME: Extract super class to avoid repetition
self.corpus_ = corpus
self.max_length = max_length
self.tokenizer = SpacyTokenizer(
lower=lower,
prepend_bos=prepend_bos,
append_eos=append_eos,
specials=special_tokens,
lang=lang,
)
logger.info(f"Tokenizing corpus using spacy {lang}")
self.tokenized_corpus_ = [
self.tokenizer(s)
for s in tqdm(self.corpus_, desc="Tokenizing corpus...", leave=False)
]
self.vocab_ = create_vocab(
self.tokenized_corpus_,
vocab_size=limit_vocab_size if word2idx is None else -1,
special_tokens=special_tokens,
)
self.word2idx_, self.idx2word_, self.embeddings_ = None, None, None
# self.corpus_indices_ = self.tokenized_corpus_
if word2idx is not None:
logger.info("Word2idx was already provided. Going to used it.")
if embeddings_file is not None and word2idx is None:
logger.info(
f"Going to load {len(self.vocab_)} embeddings from {embeddings_file}"
)
loader = EmbeddingsLoader(
embeddings_file,
embeddings_dim,
vocab=self.vocab_,
extra_tokens=special_tokens,
)
word2idx, idx2word, embeddings = loader.load()
if embeddings is not None:
self.embeddings_ = embeddings
if idx2word is not None:
self.idx2word_ = idx2word
if word2idx is not None:
self.word2idx_ = word2idx
logger.info("Converting tokens to ids using word2idx.")
self.to_token_ids = ToTokenIds(
self.word2idx_,
specials=SPECIAL_TOKENS, # type: ignore
)
self.corpus_indices_ = [
self.to_token_ids(s)
for s in tqdm(
self.tokenized_corpus_,
desc="Converting tokens to token ids...",
leave=False,
)
]
logger.info("Filtering corpus vocabulary.")
updated_vocab = {}
for k, v in self.vocab_.items():
if k in self.word2idx_:
updated_vocab[k] = v
logger.info(
f"Out of {len(self.vocab_)} tokens {len(self.vocab_) - len(updated_vocab)} were not found in the pretrained embeddings."
)
self.vocab_ = updated_vocab
__len__(self)
special
Number of samples in corpus
Returns:
Type | Description |
---|---|
int | Corpus length |
Source code in slp/data/corpus.py
def __len__(self) -> int:
"""Number of samples in corpus
Returns:
int: Corpus length
"""
return len(self.corpus_indices_)
create_vocab(corpus, vocab_size=-1, special_tokens=None)
Create the vocabulary based on tokenized input corpus
- Injects special tokens in the vocabulary
- Calculates the occurrence count for each token
- Limits vocabulary to vocab_size most common tokens
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Union[List[str], List[List[str]]] | The tokenized corpus as a flat list of tokens or a list of tokenized sentences | required |
vocab_size | int | Limit vocabulary to the vocab_size most common tokens. Defaults to -1, which keeps all tokens. | -1 |
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens to include in the vocabulary. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Dict[str, int] | Dictionary of all accepted tokens and their corresponding occurrence counts |
Examples:
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"])
{'far': 2, 'away': 1, 'galaxy': 1, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3)
{'far': 2, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3, special_tokens=slp.config.nlp.SPECIAL_TOKENS)
{'[PAD]': 0, '[MASK]': 0, '[UNK]': 0, '[BOS]': 0, '[EOS]': 0, '[CLS]': 0, '[SEP]': 0, 'far': 2, 'a': 1, 'in': 1}
Source code in slp/data/corpus.py
def create_vocab(
corpus: Union[List[str], List[List[str]]],
vocab_size: int = -1,
special_tokens: Optional[SPECIAL_TOKENS] = None,
) -> Dict[str, int]:
"""Create the vocabulary based on tokenized input corpus
* Injects special tokens in the vocabulary
* Calculates the occurrence count for each token
* Limits vocabulary to vocab_size most common tokens
Args:
corpus (Union[List[str], List[List[str]]]): The tokenized corpus as a flat list of tokens or a list of tokenized sentences
vocab_size (int): Limit vocabulary to the vocab_size most common tokens.
Defaults to -1 which keeps all tokens.
special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary. Defaults to None.
Returns:
Dict[str, int]: Dictionary of all accepted tokens and their corresponding occurrence counts
Examples:
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"])
{'far': 2, 'away': 1, 'galaxy': 1, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3)
{'far': 2, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3, special_tokens=slp.config.nlp.SPECIAL_TOKENS)
{'[PAD]': 0, '[MASK]': 0, '[UNK]': 0, '[BOS]': 0, '[EOS]': 0, '[CLS]': 0, '[SEP]': 0, 'far': 2, 'a': 1, 'in': 1}
"""
if isinstance(corpus[0], list):
corpus = list(itertools.chain.from_iterable(corpus))
freq = Counter(corpus)
if special_tokens is None:
extra_tokens = []
else:
extra_tokens = special_tokens.to_list()
if vocab_size < 0:
vocab_size = len(freq)
take = min(vocab_size, len(freq))
logger.info(f"Keeping {vocab_size} most common tokens out of {len(freq)}")
def take0(x: Tuple[Any, Any]) -> Any:
"""Take first tuple element"""
return x[0]
common_words = list(map(take0, freq.most_common(take)))
common_words = list(set(common_words) - set(extra_tokens))
words = extra_tokens + common_words
if len(words) > vocab_size:
words = words[: vocab_size + len(extra_tokens)]
def token_freq(t):
"""Token frequeny"""
return 0 if t in extra_tokens else freq[t]
vocab = dict(zip(words, map(token_freq, words)))
logger.info(f"Vocabulary created with {len(vocab)} tokens.")
logger.info(f"The 10 most common tokens are:\n{freq.most_common(10)}")
return vocab
CorpusDataset
__getitem__(self, idx)
special
Get a processed sentence and its label from the corpus
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | Sample index | required |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (processed sentence, label) |
Source code in slp/data/datasets.py
def __getitem__(self, idx):
"""Get a source and target token from the corpus
Args:
idx (int): Token position
Returns:
Tuple[torch.Tensor, torch.Tensor]: (processed sentence, label)
"""
text, target = self.corpus[idx], self.labels[idx]
if self.label_encoder is not None:
target = self.label_encoder.transform([target])[0]
for t in self.transforms:
text = t(text)
return text, target
__init__(self, corpus, labels)
special
Labeled corpus dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | WordCorpus, HfCorpus etc. | Input corpus | required |
labels | List[Any] | Labels for examples | required |
Source code in slp/data/datasets.py
def __init__(self, corpus, labels):
"""Labeled corpus dataset
Args:
corpus (WordCorpus, HfCorpus etc..): Input corpus
labels (List[Any]): Labels for examples
"""
self.corpus = corpus
self.labels = labels
assert len(self.labels) == len(self.corpus), "Incompatible labels and corpus"
self.transforms = []
self.label_encoder = None
if isinstance(self.labels[0], str):
self.label_encoder = LabelEncoder().fit(self.labels)
__len__(self)
special
Length of corpus
Returns:
Type | Description |
---|---|
int | Corpus length |
Source code in slp/data/datasets.py
def __len__(self):
"""Length of corpus
Returns:
int: Corpus Length
"""
return len(self.corpus)
map(self, t)
Append a transform to self.transforms, in order to be applied to the data
Parameters:
Name | Type | Description | Default |
---|---|---|---|
t | Callable[[str], Any] | Transform applied to each sample | required |
Returns:
Type | Description |
---|---|
CorpusDataset | self |
Source code in slp/data/datasets.py
def map(self, t):
"""Append a transform to self.transforms, in order to be applied to the data
Args:
t (Callable[[str], Any]): Transform of input token
Returns:
CorpusDataset: self
"""
self.transforms.append(t)
return self
CorpusLMDataset
__getitem__(self, idx)
special
Get a source and target token from the corpus
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | Token position | required |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | source=corpus[idx], target=corpus[idx+1] |
Source code in slp/data/datasets.py
def __getitem__(self, idx):
"""Get a source and target token from the corpus
Args:
idx (int): Token position
Returns:
Tuple[torch.Tensor, torch.Tensor]: source=corpus[idx], target=corpus[idx+1]
"""
src, tgt = self.source[idx], self.target[idx]
for t in self.transforms:
src = t(src)
tgt = t(tgt)
return src, tgt
__init__(self, corpus)
special
Wraps a tokenized dataset which is provided as a list of tokens
Targets = source shifted one token to the left (next token prediction)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | List[str] or WordCorpus | List of tokens | required |
Source code in slp/data/datasets.py
def __init__(self, corpus):
"""Wraps a tokenized dataset which is provided as a list of tokens
Targets = source shifted one token to the left (next token prediction)
Args:
corpus (List[str] or WordCorpus): List of tokens
"""
self.source = corpus[:-1]
self.target = corpus[1:]
self.transforms = []
__len__(self)
special
Length of corpus
Returns:
Type | Description |
---|---|
int | Corpus length |
Source code in slp/data/datasets.py
def __len__(self):
"""Length of corpus
Returns:
int: Corpus Length
"""
return int(len(self.source))
map(self, t)
Append a transform to self.transforms, in order to be applied to the data
Parameters:
Name | Type | Description | Default |
---|---|---|---|
t | Callable[[str], Any] | Transform of input token | required |
Returns:
Type | Description |
---|---|
CorpusLMDataset | self |
Source code in slp/data/datasets.py
def map(self, t):
"""Append a transform to self.transforms, in order to be applied to the data
Args:
t (Callable[[str], Any]): Transform of input token
Returns:
CorpusLMDataset: self
"""
self.transforms.append(t)
return self
HuggingFaceTokenizer
__call__(self, x)
special
Call to tokenize function
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | Input string | required |
Returns:
Type | Description |
---|---|
List[int] | List of token ids |
Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[int]:
"""Call to tokenize function
Args:
x (str): Input string
Returns:
List[int]: List of token ids
"""
out: List[int] = self.tokenizer.encode(
x, add_special_tokens=self.add_special_tokens, max_length=65536
)
return out
__init__(self, lower=True, model='bert-base-uncased', add_special_tokens=True)
special
Apply one of huggingface tokenizers to a string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lower | bool | Lowercase string. Defaults to True. | True |
model | str | Select transformer model. Defaults to "bert-base-uncased". | 'bert-base-uncased' |
add_special_tokens | bool | Insert special tokens to tokenized string. Defaults to True. | True |
Source code in slp/data/transforms.py
def __init__(
self,
lower: bool = True,
model: str = "bert-base-uncased",
add_special_tokens: bool = True,
):
"""Apply one of huggingface tokenizers to a string
Args:
lower (bool): Lowercase string. Defaults to True.
model (str): Select transformer model. Defaults to "bert-base-uncased".
add_special_tokens (bool): Insert special tokens to tokenized string. Defaults to True.
"""
self.tokenizer = AutoTokenizer.from_pretrained(model, do_lower_case=lower)
self.vocab_size = len(self.tokenizer.vocab)
self.add_special_tokens = add_special_tokens
detokenize(self, x)
Convert list of token ids to list of tokens
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[int] | List of token ids | required |
Returns:
Type | Description |
---|---|
List[str] | List of tokens |
Source code in slp/data/transforms.py
def detokenize(self, x: List[int]) -> List[str]:
"""Convert list of token ids to list of tokens
Args:
x (List[int]): List of token ids
Returns:
List[str]: List of tokens
"""
out: List[str] = self.tokenizer.convert_ids_to_tokens(x)
return out
ReplaceUnknownToken
__call__(self, x)
special
Convert <unk> in a list of tokens to [UNK]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[str] | List of tokens | required |
Returns:
Type | Description |
---|---|
List[str] | List of tokens |
Source code in slp/data/transforms.py
def __call__(self, x: List[str]) -> List[str]:
"""Convert <unk> in list of tokens to [UNK]
Args:
x (List[str]): List of tokens
Returns:
List[str]: List of tokens
"""
return [w if w != self.old_unk else self.new_unk for w in x]
__init__(self, old_unk='<unk>', new_unk='[UNK]')
special
Replace existing unknown tokens in the vocab to [UNK]. Useful for wikitext
Parameters:
Name | Type | Description | Default |
---|---|---|---|
old_unk | str | Unk token in corpus. Defaults to "<unk>". | '<unk>' |
new_unk | str | Desired unk value. Defaults to SPECIAL_TOKENS.UNK.value. | '[UNK]' |
Source code in slp/data/transforms.py
def __init__(
self,
old_unk: str = "<unk>",
new_unk: str = SPECIAL_TOKENS.UNK.value, # type: ignore
):
"""Replace existing unknown tokens in the vocab to [UNK]. Useful for wikitext
Args:
old_unk (str): Unk token in corpus. Defaults to "<unk>".
new_unk (str): Desired unk value. Defaults to SPECIAL_TOKENS.UNK.value.
"""
self.old_unk = old_unk
self.new_unk = new_unk
SentencepieceTokenizer
__call__(self, x)
special
Call to tokenize function
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | Input string | required |
Returns:
Type | Description |
---|---|
List[int] | List of token ids |
Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[int]:
"""Call to tokenize function
Args:
x (str): Input string
Returns:
List[int]: List of token ids
"""
if self.lower:
x = x.lower()
ids: List[int] = self.pre_id + self.tokenizer.encode_as_ids(x) + self.post_id
return ids
__init__(self, lower=True, model=None, prepend_bos=False, append_eos=False, specials=<enum 'SPECIAL_TOKENS'>)
special
Tokenize sentence using pretrained sentencepiece model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lower | bool | Lowercase string. Defaults to True. | True |
model | Optional[Any] | Path to a pretrained sentencepiece model. Defaults to None. | None |
prepend_bos | bool | Prepend BOS for seq2seq. Defaults to False. | False |
append_eos | bool | Append EOS for seq2seq. Defaults to False. | False |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
Source code in slp/data/transforms.py
def __init__(
self,
lower: bool = True,
model: Optional[Any] = None,
prepend_bos: bool = False,
append_eos: bool = False,
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
):
"""Tokenize sentence using pretrained sentencepiece model
Args:
lower (bool): Lowercase string. Defaults to True.
model (Optional[Any]): Path to a pretrained sentencepiece model. Defaults to None.
prepend_bos (bool): Prepend BOS for seq2seq. Defaults to False.
append_eos (bool): Append EOS for seq2seq. Defaults to False.
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
"""
self.tokenizer = spm.SentencePieceProcessor()
self.tokenizer.Load(model)
self.specials = specials
self.lower = lower
self.vocab_size = self.tokenizer.get_piece_size()
self.pre_id = []
self.post_id = []
if prepend_bos:
self.pre_id.append(self.tokenizer.piece_to_id(self.specials.BOS.value)) # type: ignore
if append_eos:
self.post_id.append(self.tokenizer.piece_to_id(self.specials.EOS.value)) # type: ignore
SpacyTokenizer
__call__(self, x)
special
Call to tokenize function
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | Input string | required |
Returns:
Type | Description |
---|---|
List[str] | List of tokens |
Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[str]:
"""Call to tokenize function
Args:
x (str): Input string
Returns:
List[str]: List of tokens
"""
if self.lower:
x = x.lower()
out: List[str] = (
self.pre_id + [y.text for y in self.nlp.tokenizer(x)] + self.post_id
)
return out
__init__(self, lower=True, prepend_bos=False, append_eos=False, specials=<enum 'SPECIAL_TOKENS'>, lang='en_core_web_sm')
special
Apply spacy tokenizer to str
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lower | bool | Lowercase string. Defaults to True. | True |
prepend_bos | bool | Prepend BOS for seq2seq. Defaults to False. | False |
append_eos | bool | Append EOS for seq2seq. Defaults to False. | False |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
lang | str | Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm". | 'en_core_web_sm' |
Source code in slp/data/transforms.py
def __init__(
self,
lower: bool = True,
prepend_bos: bool = False,
append_eos: bool = False,
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
lang: str = "en_core_web_sm",
):
"""Apply spacy tokenizer to str
Args:
lower (bool): Lowercase string. Defaults to True.
prepend_bos (bool): Prepend BOS for seq2seq. Defaults to False.
append_eos (bool): Append EOS for seq2seq. Defaults to False.
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
lang (str): Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".
"""
self.lower = lower
self.specials = SPECIAL_TOKENS
self.lang = lang
self.pre_id = []
self.post_id = []
if prepend_bos:
self.pre_id.append(self.specials.BOS.value)
if append_eos:
self.post_id.append(self.specials.EOS.value)
self.nlp = self.get_nlp(name=lang, specials=specials)
get_nlp(self, name='en_core_web_sm', specials=<enum 'SPECIAL_TOKENS'>)
Get spacy nlp object for given lang and add SPECIAL_TOKENS
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm". | 'en_core_web_sm' |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
Returns:
Type | Description |
---|---|
spacy.Language | Spacy text-processing pipeline |
Source code in slp/data/transforms.py
def get_nlp(
self,
name: str = "en_core_web_sm",
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
) -> spacy.Language:
"""Get spacy nlp object for given lang and add SPECIAL_TOKENS
Args:
name (str): Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
Returns:
spacy.Language: spacy text-processing pipeline
"""
nlp = spacy.load(name)
if specials is not None:
for token in specials.to_list():
nlp.tokenizer.add_special_case(token, [{ORTH: token}])
return nlp
ToTensor
__call__(self, x)
special
Convert list of tokens or list of features to tensor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[Any] | List of tokens or features | required |
Returns:
Type | Description |
---|---|
torch.Tensor | Resulting tensor |
Source code in slp/data/transforms.py
def __call__(self, x: List[Any]) -> torch.Tensor:
"""Convert list of tokens or list of features to tensor
Args:
x (List[Any]): List of tokens or features
Returns:
torch.Tensor: Resulting tensor
"""
return mktensor(x, device=self.device, dtype=self.dtype)
__init__(self, device='cpu', dtype=torch.int64)
special
To tensor convertor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | str | Device to map the tensor. Defaults to "cpu". | 'cpu' |
dtype | torch.dtype | Type of resulting tensor. Defaults to torch.long. | torch.int64 |
Source code in slp/data/transforms.py
def __init__(self, device: str = "cpu", dtype: torch.dtype = torch.long):
"""To tensor convertor
Args:
device (str): Device to map the tensor. Defaults to "cpu".
dtype (torch.dtype): Type of resulting tensor. Defaults to torch.long.
"""
self.device = device
self.dtype = dtype
ToTokenIds
__call__(self, x)
special
Convert list of tokens to list of token ids
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[str] | List of tokens | required |
Returns:
Type | Description |
---|---|
List[int] | List of token ids |
Source code in slp/data/transforms.py
def __call__(self, x: List[str]) -> List[int]:
"""Convert list of tokens to list of token ids
Args:
x (List[str]): List of tokens
Returns:
List[int]: List of token ids
"""
return [
self.word2idx[w] if w in self.word2idx else self.word2idx[self.unk_value]
for w in x
]
__init__(self, word2idx, specials=<enum 'SPECIAL_TOKENS'>)
special
Convert List of tokens to list of token ids
Parameters:
Name | Type | Description | Default |
---|---|---|---|
word2idx | Dict[str, int] | Word to index mapping | required |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
Source code in slp/data/transforms.py
def __init__(
self,
word2idx: Dict[str, int],
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
):
"""Convert List of tokens to list of token ids
Args:
word2idx (Dict[str, int]): Word to index mapping
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
"""
self.word2idx = word2idx
self.unk_value = specials.UNK.value if specials is not None else "[UNK]" # type: ignore
Attention
__init__(self, attention_size=512, input_size=None, dropout=0.1)
special
Single-Headed Dot-product attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
input_size: Optional[int] = None,
dropout: float = 0.1,
):
"""Single-Headed Dot-product attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
"""
super(Attention, self).__init__()
if input_size is None:
input_size = attention_size
self.dk = input_size
self.k = nn.Linear(input_size, attention_size, bias=False)
self.q = nn.Linear(input_size, attention_size, bias=False)
self.v = nn.Linear(input_size, attention_size, bias=False)
self.dropout = dropout
reset_parameters(self.named_parameters())
forward(self, keys, queries=None, attention_mask=None)
Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keys | torch.Tensor | [B, L, D] Keys tensor | required |
queries | Optional[torch.Tensor] | Optional [B, M, D] Queries tensor. If None, queries = keys. Defaults to None. | None |
attention_mask | Optional[torch.Tensor] | Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, M, L]) |
Source code in slp/modules/attention.py
def forward(
self,
keys: torch.Tensor,
queries: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
r"""Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
$$a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
keys (torch.Tensor): [B, L, D] Keys tensor
queries (Optional[torch.Tensor]): Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.
attention_mask (Optional[torch.Tensor]): Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])
"""
if attention_mask is not None:
if len(list(attention_mask.size())) == 2:
attention_mask = attention_mask.unsqueeze(1)
if queries is None:
queries = keys
values = keys
k = self.k(keys) # (B, L, A)
q = self.q(queries)
v = self.v(values)
# weights => (B, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
return out, scores
MultiheadAttention
__init__(self, attention_size=512, num_heads=8, input_size=None, dropout=0.1, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None)
special
Multi-Headed Dot-product attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
num_heads | int | Number of attention heads. Defaults to 8. | 8 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
nystrom | bool | Use nystrom method for attention calculation. Defaults to False. | False |
num_landmarks | int | Number of landmark points for nystrom attention. Defaults to 64. | 64 |
inverse_iterations | int | Number of iterations to calculate the inverse in nystrom attention. Defaults to 6. | 6 |
kernel_size | Optional[int] | Use residual convolution in the output. Defaults to None. | None |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
num_heads: int = 8,
input_size: Optional[int] = None,
dropout: float = 0.1,
nystrom: bool = False,
num_landmarks: int = 64,
inverse_iterations: int = 6,
kernel_size: Optional[int] = None,
):
"""Multi-Headed Dot-product attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
num_heads (int): Number of attention heads
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
nystrom (bool, optional): Use nystrom method for attention calculation. Defaults to False.
num_landmarks (int, optional): Number of landmark points for nystrom attention. Defaults to 64.
inverse_iterations (int, optional): Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.
kernel_size (Optional[int], optional): Use residual convolution in the output. Defaults to None.
"""
super(MultiheadAttention, self).__init__()
if input_size is None:
input_size = attention_size
self.inverse_iterations = inverse_iterations
self.num_landmarks = num_landmarks
self.nystrom = nystrom
self.num_heads = num_heads
self.head_size = int(attention_size / num_heads)
self.dk = self.head_size
self.attention_size = attention_size
self.k = nn.Linear(input_size, attention_size, bias=False)
self.q = nn.Linear(input_size, attention_size, bias=False)
self.v = nn.Linear(input_size, attention_size, bias=False)
self.output = nn.Linear(attention_size, attention_size)
self.dropout = dropout
self.conv = None
if kernel_size is not None:
self.conv = nn.Conv2d(
in_channels=self.num_heads,
out_channels=self.num_heads,
kernel_size=(kernel_size, 1),
padding=(kernel_size // 2, 0),
bias=False,
groups=self.num_heads,
)
reset_parameters(self.named_parameters())
forward(self, keys, queries=None, attention_mask=None)
Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
The outputs of multiple heads are concatenated and passed through a feedforward layer.
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keys | torch.Tensor | [B, L, D] Keys tensor | required |
queries | Optional[torch.Tensor] | Optional [B, M, D] Queries tensor. If None, queries = keys. Defaults to None. | None |
attention_mask | Optional[torch.Tensor] | Optional [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, H, M, L]) |
Source code in slp/modules/attention.py
def forward(self, keys, queries=None, attention_mask=None):
r"""Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
$$a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H$$
The outputs of multiple heads are concatenated and passed through a feedforward layer.
$$a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
keys (torch.Tensor): [B, L, D] Keys tensor
queries (Optional[torch.Tensor]): Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.
attention_mask (Optional[torch.Tensor]): Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, H, M, L])
"""
_, seq_length, _ = keys.size()
if attention_mask is not None:
if attention_mask.ndim == 2:
attention_mask = attention_mask.unsqueeze(1)
attention_mask = attention_mask.unsqueeze(1)
if self.nystrom:
keys, attention_mask = pad_for_nystrom(
keys, self.num_landmarks, attention_mask=attention_mask
)
if queries is None:
queries = keys
values = keys
k = self.k(keys)
q = self.q(queries)
v = self.v(values)
k = split_heads(k, self.num_heads)
q = split_heads(q, self.num_heads)
v = split_heads(v, self.num_heads)
if self.nystrom:
# out = (B, H, L, A/H)
# scores = Tuple
out, scores = nystrom_attention(
k,
q,
v,
self.dk,
self.num_landmarks,
attention_mask=attention_mask,
inverse_iterations=self.inverse_iterations,
dropout=self.dropout,
training=self.training,
)
else:
# out => (B, H, L, A/H)
# scores => (B, H, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
if self.conv is not None:
if attention_mask is None or attention_mask.ndim > 2:
out += self.conv(v)
else:
attention_mask = attention_mask.squeeze()
out += self.conv(v * attention_mask[:, None, :, None])
# out => (B, H, L, A/H)
out = merge_heads(out)
if out.size(1) != seq_length:
out = out[:, :seq_length, :]
out = self.output(out)
return out, scores
MultiheadSelfAttention
__init__(self, attention_size=512, num_heads=8, input_size=None, dropout=0.1, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None)
special
Multi-Headed Dot-product attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
num_heads | int | Number of attention heads. Defaults to 8. | 8 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
num_heads: int = 8,
input_size: Optional[int] = None,
dropout: float = 0.1,
nystrom: bool = False,
num_landmarks: int = 64,
inverse_iterations: int = 6,
kernel_size: Optional[int] = None,
):
"""Multi-Headed Dot-product attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
num_heads (int): Number of attention heads
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
"""
super(MultiheadSelfAttention, self).__init__()
if input_size is None:
input_size = attention_size
self.inverse_iterations = inverse_iterations
self.num_landmarks = num_landmarks
self.nystrom = nystrom
self.num_heads = num_heads
self.head_size = int(attention_size / num_heads)
self.dk = self.head_size
self.attention_size = attention_size
self.kqv = nn.Linear(input_size, 3 * attention_size, bias=False)
self.output = nn.Linear(attention_size, attention_size)
self.dropout = dropout
self.conv = None
if kernel_size is not None:
self.conv = nn.Conv2d(
in_channels=self.num_heads,
out_channels=self.num_heads,
kernel_size=(kernel_size, 1),
padding=(kernel_size // 2, 0),
bias=False,
groups=self.num_heads,
)
reset_parameters(self.named_parameters())
forward(self, x, attention_mask=None)
Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
The outputs of multiple heads are concatenated and passed through a feedforward layer.
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, D] Input tensor | required |
attention_mask | Optional[torch.Tensor] | Optional [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, H, M, L]) |
Source code in slp/modules/attention.py
def forward(self, x, attention_mask=None):
r"""Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
$$a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H$$
The outputs of multiple heads are concatenated and passed through a feedforward layer.
$$a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
x (torch.Tensor): [B, L, D] Keys tensor
attention_mask (Optional[torch.Tensor]): Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, H, M, L])
"""
_, seq_length, _ = x.size()
if attention_mask is not None:
if attention_mask.ndim == 2:
attention_mask = attention_mask.unsqueeze(1)
attention_mask = attention_mask.unsqueeze(1)
if self.nystrom:
x, attention_mask = pad_for_nystrom(
x, self.num_landmarks, attention_mask=attention_mask
)
k, q, v = self.kqv(x).chunk(3, dim=-1)
k = split_heads(k, self.num_heads)
q = split_heads(q, self.num_heads)
v = split_heads(v, self.num_heads)
if self.nystrom:
# out = (B, H, L, A/H)
# scores = Tuple
out, scores = nystrom_attention(
k,
q,
v,
self.dk,
self.num_landmarks,
attention_mask=attention_mask,
inverse_iterations=self.inverse_iterations,
dropout=self.dropout,
training=self.training,
)
else:
# out => (B, H, L, A/H)
# scores => (B, H, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
if self.conv is not None:
if attention_mask is None or attention_mask.ndim > 2:
out = out + self.conv(v)
else:
attention_mask = attention_mask.squeeze()
out = out + self.conv(v * attention_mask[:, None, :, None])
# out => (B, H, L, A/H)
out = merge_heads(out)
if out.size(1) != seq_length:
out = out[:, -seq_length:, :]
out = self.output(out)
return out, scores
MultiheadTwowayAttention
__init__(self, attention_size=512, input_size=None, dropout=0.1, num_heads=8, residual=True, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None)
special
Multihead twoway attention for multimodal fusion
This module performs two-way attention for two input modality feature sequences. If att is the MultiheadAttention operation and x, y the input modality sequences, the operation is summarized as
$$out = (att(x \rightarrow y), att(y \rightarrow x))$$
If residual is True then a Vilbert-like residual connection is applied
$$out = (att(x \rightarrow y) + x, att(y \rightarrow x) + y)$$
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
num_heads | int | Number of attention heads. Defaults to 8. | 8 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
nystrom | bool | Use nystrom method for attention calculation. Defaults to False. | False |
num_landmarks | int | Number of landmark points for nystrom attention. Defaults to 64. | 64 |
inverse_iterations | int | Number of iterations to calculate the inverse in nystrom attention. Defaults to 6. | 6 |
kernel_size | Optional[int] | Use residual convolution in the output. Defaults to None. | None |
residual | bool | Use vilbert-like residual connections for fusion. Defaults to True. | True |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
input_size: Optional[int] = None,
dropout: float = 0.1,
num_heads: int = 8,
residual: bool = True,
nystrom: bool = False,
num_landmarks: int = 64,
inverse_iterations: int = 6,
kernel_size: Optional[int] = None,
):
r"""Multihead twoway attention for multimodal fusion
This module performs two way attention for two input modality feature sequences.
If att is the MultiheadAttention operation and x, y the input modality sequences,
the operation is summarized as
$$out = (att(x \rightarrow y), att(y \rightarrow x))$$
If residual is True then a Vilbert-like residual connection is applied
$$out = (att(x \rightarrow y) + x, att(y \rightarrow x) + y)$$
Args:
attention_size (int): Number of hidden features. Defaults to 512.
num_heads (int): Number of attention heads
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
nystrom (bool, optional): Use nystrom method for attention calculation. Defaults to False.
num_landmarks (int, optional): Number of landmark points for nystrom attention. Defaults to 64.
inverse_iterations (int, optional): Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.
kernel_size (Optional[int], optional): Use residual convolution in the output. Defaults to None.
residual (bool, optional): Use vilbert-like residual connections for fusion. Defaults to True.
"""
super(MultiheadTwowayAttention, self).__init__()
self.xy = MultiheadAttention(
attention_size=attention_size,
input_size=input_size,
dropout=dropout,
num_heads=num_heads,
nystrom=nystrom,
num_landmarks=num_landmarks,
inverse_iterations=inverse_iterations,
kernel_size=kernel_size,
)
self.yx = MultiheadAttention(
attention_size=attention_size,
input_size=input_size,
dropout=dropout,
num_heads=num_heads,
nystrom=nystrom,
num_landmarks=num_landmarks,
inverse_iterations=inverse_iterations,
kernel_size=kernel_size,
)
self.residual = residual
forward(self, mod1, mod2, attention_mask=None)
mod1 : (B, L, D) first modality feature sequence
mod2 : (B, L, D) second modality feature sequence
attention_mask : optional zero-one mask for sequence elements
Source code in slp/modules/attention.py
def forward(self, mod1, mod2, attention_mask=None):
"""
mod1 : (B, L, D) first modality feature sequence
mod2 : (B, L, D) second modality feature sequence
attention_mask : optional zero-one mask for sequence elements
"""
out_mod1, _ = self.xy(mod1, queries=mod2, attention_mask=attention_mask)
out_mod2, _ = self.yx(mod2, queries=mod1, attention_mask=attention_mask)
if not self.residual:
return out_mod1, out_mod2
else:
# vilbert cross residual
# v + attention(v->a)
# a + attention(a->v)
out_mod1 += mod2
out_mod2 += mod1
return out_mod1, out_mod2
SelfAttention
__init__(self, attention_size=512, input_size=None, dropout=0.1)
special
Single-Headed Dot-product self attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
input_size: Optional[int] = None,
dropout: float = 0.1,
):
"""Single-Headed Dot-product self attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
"""
super(SelfAttention, self).__init__()
if input_size is None:
input_size = attention_size
self.dk = input_size
self.kqv = nn.Linear(input_size, 3 * attention_size, bias=False)
self.dropout = dropout
reset_parameters(self.named_parameters())
forward(self, x, attention_mask=None)
Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, D] Input tensor | required |
attention_mask | Optional[torch.Tensor] | Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, M, L]) |
Source code in slp/modules/attention.py
def forward(
self,
x: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
r"""Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
$$a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
x (torch.Tensor): [B, L, D] Input tensor
attention_mask (Optional[torch.Tensor]): Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])
"""
if attention_mask is not None:
if len(list(attention_mask.size())) == 2:
attention_mask = attention_mask.unsqueeze(1)
k, q, v = self.kqv(x).chunk(3, dim=-1) # (B, L, A)
# weights => (B, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
return out, scores
attention(k, q, v, dk, attention_mask=None, dropout=0.2, training=True)
Reweight values using scaled dot product attention
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | torch.Tensor | Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor | required |
q | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor | required |
v | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor | required |
dk | int | Model dimension | required |
attention_mask | Optional[torch.Tensor] | Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
training | bool | Is module in training phase? Defaults to True. | True |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values, [B, M, L] or [B, H, M, L] attention scores) |
Source code in slp/modules/attention.py
def attention(
k: torch.Tensor,
q: torch.Tensor,
v: torch.Tensor,
dk: int,
attention_mask: Optional[torch.Tensor] = None,
dropout: float = 0.2,
training: bool = True,
):
r"""Reweight values using scaled dot product attention
$$s = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) V$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
v (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor
dk (int): Model dimension
attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
preserved. Defaults to None.
dropout (float): Drop probability. Defaults to 0.2.
training (bool): Is module in training phase? Defaults to True.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values, [B, M, L] or [B, H, M, L] attention scores)
"""
scores = attention_scores(
k, q, dk, attention_mask=attention_mask, dropout=dropout, training=training
)
out = torch.matmul(scores, v)
return out, scores
attention_scores(k, q, dk, attention_mask=None, dropout=0.2, training=True)
Calculate attention scores for scaled dot product attention
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | torch.Tensor | Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor | required |
q | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor | required |
dk | int | Model dimension | required |
attention_mask | Optional[torch.Tensor] | Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
training | bool | Is module in training phase? Defaults to True. | True |
Returns:
Type | Description |
---|---|
torch.Tensor | [B, M, L] or [B, H, M, L] attention scores |
Source code in slp/modules/attention.py
def attention_scores(
k: torch.Tensor,
q: torch.Tensor,
dk: int,
attention_mask: Optional[torch.Tensor] = None,
dropout: float = 0.2,
training: bool = True,
) -> torch.Tensor:
r"""Calculate attention scores for scaled dot product attention
$$s = softmax(\frac{Q \cdot K^T}{\sqrt{d}})$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
dk (int): Model dimension
attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
preserved. Defaults to None.
dropout (float): Drop probability. Defaults to 0.2.
training (bool): Is module in training phase? Defaults to True.
Returns:
torch.Tensor: [B, M, L] or [B, H, M, L] attention scores
"""
scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(dk)
if attention_mask is not None:
scores = scores + ((1 - attention_mask) * -1e5)
scores = F.softmax(scores, dim=-1)
scores = F.dropout(scores, p=dropout, training=training)
return scores
merge_heads(x)
Merge multiple attention heads into output tensor
(Batch size, Heads, Lengths, Attention size / Heads) => (Batch size, Length, Attention size)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, H, L, A/H] multi-head tensor | required |
Returns:
Type | Description |
---|---|
torch.Tensor | [B, L, A] merged / reshaped tensor |
Source code in slp/modules/attention.py
def merge_heads(x: torch.Tensor) -> torch.Tensor:
"""Merge multiple attention heads into output tensor
(Batch size, Heads, Lengths, Attention size / Heads) => (Batch size, Length, Attention size)
Args:
x (torch.Tensor): [B, H, L, A/H] multi-head tensor
Returns:
torch.Tensor: [B, L, A] merged / reshaped tensor
"""
batch_size, _, max_length, _ = x.size()
# x => (B, L, H, A/H)
x = x.permute(0, 2, 1, 3).contiguous()
return x.view(batch_size, max_length, -1)
nystrom_attention(k, q, v, dk, num_landmarks, attention_mask=None, inverse_iterations=6, dropout=0.2, training=True)
Calculate attention using nystrom approximation
Implementation heavily based on: https://github.com/lucidrains/nystrom-attention
Reference: https://arxiv.org/abs/2102.03902
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | torch.Tensor | Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor | required |
q | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor | required |
v | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor | required |
dk | int | Model dimension | required |
num_landmarks | int | Number of landmark points | required |
attention_mask | Optional[torch.Tensor] | Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None. | None |
inverse_iterations | int | Number of iterations for Moore-Penrose iterative inverse approximation. Defaults to 6. | 6 |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
training | bool | Is module in training phase? Defaults to True. | True |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]] | (Reweighted values, intermediate nystrom attention scores) |
Source code in slp/modules/attention.py
def nystrom_attention(
k: torch.Tensor,
q: torch.Tensor,
v: torch.Tensor,
dk: int,
num_landmarks: int,
attention_mask: Optional[torch.Tensor] = None,
inverse_iterations: int = 6,
dropout: float = 0.2,
training: bool = True,
):
"""Calculate attention using nystrom approximation
Implementation heavily based on: https://github.com/lucidrains/nystrom-attention
Reference: https://arxiv.org/abs/2102.03902
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
v (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor
dk (int): Model dimension
num_landmarks (int): Number of landmark points
attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
preserved. Defaults to None.
inverse_iterations (int): Number of iterations for Moore Penrose iterative inverse
approximation
dropout (float): Drop probability. Defaults to 0.2.
training (bool): Is module in training phase? Defaults to True.
Returns:
Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]: (Reweighted values, intermediate nystrom attention scores)
"""
_, num_heads, seq_length, head_size = k.size()
masked_mean_denom = seq_length // num_landmarks
if attention_mask is not None:
attention_mask = attention_mask.unsqueeze(1)
masked_mean_denom = (
attention_mask.reshape(-1, 1, num_landmarks, seq_length // num_landmarks).sum(-1) + 1e-8 # type: ignore
) # (B, 1, Landmarks)
mask_landmarks = (masked_mean_denom > 0).type(torch.float) # type: ignore
masked_mean_denom = masked_mean_denom[..., None] # type: ignore
attention_mask = attention_mask.unsqueeze(-1)
q = q * attention_mask # (B, H, L, A/H)
k = k * attention_mask # (B, H, L, A/H)
v = v * attention_mask # (B, H, L, A/H)
scores_1_mask = attention_mask * mask_landmarks[..., None, :]
scores_2_mask = mask_landmarks[..., None] * mask_landmarks[..., None, :]
scores_3_mask = scores_1_mask.transpose(-1, -2)
q = q / math.sqrt(dk)
q_landmarks = q.reshape(
q.size(0), # batch_size
q.size(1), # num_heads
num_landmarks, # landmarks
seq_length // num_landmarks, # reduced length
q.size(-1), # head_size
).sum(
dim=-2
) # (B, H, Landmarks, A/H)
k_landmarks = k.reshape(
k.size(0), # batch_size
k.size(1), # num_heads
num_landmarks, # landmarks
seq_length // num_landmarks, # reduced length
k.size(-1), # head size
).sum(
dim=-2
) # (B, H, Landmarks, A/H)
k_landmarks = k_landmarks / masked_mean_denom
q_landmarks = q_landmarks / masked_mean_denom
scores_1 = attention_scores(
k_landmarks,
q,
1, # We have already accounted for dk
attention_mask=scores_1_mask,
dropout=dropout,
training=training,
)
scores_2 = attention_scores(
k_landmarks,
q_landmarks,
1, # We have already accounted for dk
attention_mask=scores_2_mask,
dropout=dropout,
training=training,
)
scores_3 = attention_scores(
k,
q_landmarks,
1, # We have already accounted for dk
attention_mask=scores_3_mask,
dropout=dropout,
training=training,
)
z_star = moore_penrose_pinv(scores_2, num_iter=inverse_iterations)
out = (scores_1 @ z_star) @ (scores_3 @ v)
return out, (scores_1, scores_2, scores_3)
pad_for_nystrom(x, num_landmarks, attention_mask=None)
Pad inputs and attention_mask to perform Nystrom Attention
Pad to nearest multiple of num_landmarks
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, A] Input tensor | required |
num_landmarks | int | Number of landmark points | required |
attention_mask | Optional[torch.Tensor] | [B, L] Padding mask. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, Optional[torch.Tensor]] | Padded inputs and attention_mask |
Source code in slp/modules/attention.py
def pad_for_nystrom(
x: torch.Tensor, num_landmarks: int, attention_mask: Optional[torch.Tensor] = None
) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
"""Pad inputs and attention_mask to perform Nystrom Attention
Pad to nearest multiple of num_landmarks
Args:
x (torch.Tensor): [B, L, A] Input tensor
num_landmarks (int): Number of landmark points
attention_mask (Optional[torch.Tensor]): [B, L] Padding mask
Returns:
Tuple[torch.Tensor, Optional[torch.Tensor]]: Padded inputs and attention_mask
"""
if attention_mask is not None:
attention_mask = attention_mask.squeeze()
_, seq_length, _ = x.size()
_, remainder = (
math.ceil(seq_length / num_landmarks),
seq_length % num_landmarks,
)
if remainder > 0:
padding = num_landmarks - remainder
x = F.pad(x, (0, 0, padding, 0), value=0)
if attention_mask is not None:
attention_mask = F.pad(attention_mask, (padding, 0))
return x, attention_mask
reset_parameters(named_parameters)
Initialize parameters in the transformer model.
Source code in slp/modules/attention.py
def reset_parameters(named_parameters):
"""Initialize parameters in the transformer model."""
for name, p in named_parameters:
if "weight" in name:
nn.init.xavier_normal_(p)
if "bias" in name:
nn.init.constant_(p, 0.0)
split_heads(x, num_heads)
Split input tensor into multiple attention heads
(Batch size, Length, Attention size) => (Batch size, Heads, Lengths, Attention size / Heads)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, A] input tensor | required |
num_heads | int | Number of heads | required |
Returns:
Type | Description |
---|---|
torch.Tensor | [B, H, L, A/H] split / reshaped tensor |
Source code in slp/modules/attention.py
def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
"""Split input tensor into multiple attention heads
(Batch size, Length, Attention size) => (Batch size, Heads, Lengths, Attention size / Heads)
Args:
x (torch.Tensor): [B, L, A] input tensor
num_heads (int): number of heads
Returns:
torch.Tensor: [B, H, L, A/H] Split / reshaped tensor
"""
batch_size, max_length, attention_size = x.size()
head_size = int(attention_size / num_heads)
return x.view(batch_size, max_length, num_heads, head_size).permute(0, 2, 1, 3)
Classifier
__init__(self, encoder, encoded_features, num_classes, dropout=0.2)
special
Classifier wrapper module
Stores a Neural Network encoder and adds a classification layer on top.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
encoder | Module | The encoder network | required |
encoded_features | int | Number of output features of the encoder | required |
num_classes | int | Number of target classes | required |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
Source code in slp/modules/classifier.py
def __init__(
self,
encoder: nn.Module,
encoded_features: int,
num_classes: int,
dropout: float = 0.2,
):
"""Classifier wrapper module
Stores a Neural Network encoder and adds a classification layer on top.
Args:
encoder (nn.Module): The encoder network
encoded_features (int): Number of output features of the encoder
num_classes (int): Number of target classes
dropout (float): Drop probability
"""
super(Classifier, self).__init__()
self.encoder = encoder
self.drop = nn.Dropout(dropout)
self.clf = nn.Linear(encoded_features, num_classes)
forward(self, *args, **kwargs)
Encode inputs using the encoder network and perform classification
Returns:
Type | Description |
---|---|
torch.Tensor | [B, *, num_classes] Logits tensor |
Source code in slp/modules/classifier.py
def forward(self, *args, **kwargs) -> torch.Tensor:
"""Encode inputs using the encoder network and perform classification
Returns:
torch.Tensor: [B, *, num_classes] Logits tensor
"""
encoded: torch.Tensor = self.encoder(*args, **kwargs) # type: ignore
out: torch.Tensor = self.drop(encoded)
out = self.clf(out)
return out
MOSEITextClassifier
forward(self, x, lengths)
Encode inputs using the encoder network and perform classification
Returns:
Type | Description |
---|---|
torch.Tensor | [B, *, num_classes] Logits tensor |
Source code in slp/modules/classifier.py
def forward(self, x, lengths):
x = x["text"]
lengths = lengths["text"]
return super().forward(x, lengths)
RNNLateFusionClassifier
forward(self, inputs, lengths)
Encode each modality with its modality encoder, apply multimodal dropout if configured, concatenate the encoded representations and classify.
Source code in slp/modules/classifier.py
def forward(self, inputs, lengths):
encoded = [
self.modality_encoders[m](inputs[m], lengths[m]) for m in self.modalities
]
if self.mmdrop is not None:
encoded = self.mmdrop(*encoded)
fused = torch.cat(encoded, dim=-1)
fused = self.drop(fused)
out = self.clf(fused)
return out
TransformerLateFusionClassifier
forward(self, inputs, attention_masks=None)
Late-fusion forward pass. Each modality is encoded with its own transformer encoder (using the per-modality attention masks, if provided), multimodal dropout (mmdrop) is optionally applied, and the encoded modalities are concatenated, optionally passed through modality dropout and classified.
Source code in slp/modules/classifier.py
def forward(self, inputs, attention_masks=None):
if attention_masks is None:
attention_masks = dict(
zip(self.modalities, [None for _ in self.modalities])
)
encoded = [
self.modality_encoders[m](inputs[m], attention_mask=attention_masks[m])
for m in self.modalities
]
if self.mmdrop is not None:
encoded = self.mmdrop(*encoded)
fused = torch.cat(encoded, dim=-1)
if self.modality_drop is not None:
fused = self.modality_drop(fused)
out = self.clf(fused)
return out
Embed
__init__(self, num_embeddings, embedding_dim, embeddings=None, noise=0.0, dropout=0.0, scale=1.0, trainable=False)
special
Define the embedding layer and perform the necessary initializations
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_embeddings | int | Total number of embeddings. | required |
embedding_dim | int | Embedding dimension. | required |
embeddings | Optional[numpy.ndarray] | The 2D ndarray with the word vectors. | None |
noise | float | Optional additive noise. Defaults to 0.0. | 0.0 |
dropout | float | Embedding dropout probability. Defaults to 0.0. | 0.0 |
scale | float | Scale word embeddings by a constant. Defaults to 1.0. | 1.0 |
trainable | bool | Finetune embeddings. Defaults to False. | False |
Source code in slp/modules/embed.py
def __init__(
self,
num_embeddings: int,
embedding_dim: int,
embeddings: Optional[np.ndarray] = None,
noise: float = 0.0,
dropout: float = 0.0,
scale: float = 1.0,
trainable: bool = False,
):
"""
Define the layer of the model and perform the initializations
of the layers (wherever it is necessary)
Args:
num_embeddings (int): Total number of embeddings.
embedding_dim (int): Embedding dimension.
embeddings (numpy.ndarray): the 2D ndarray with the word vectors.
noise (float): Optional additive noise. Defaults to 0.0.
dropout (float): Embedding dropout probability. Defaults to 0.0.
scale (float): Scale word embeddings by a constant. Defaults to 1.0.
trainable (bool): Finetune embeddings. Defaults to False
"""
super(Embed, self).__init__()
self.scale = scale # scale embeddings by value. Needed for transformer
# define the embedding layer, with the corresponding dimensions
self.embedding = nn.Embedding(
num_embeddings=num_embeddings, embedding_dim=embedding_dim
)
if embeddings is not None:
logger.info("Initializing Embedding layer with pre-trained weights.")
if trainable:
logger.info("Embeddings are going to be finetuned")
else:
logger.info("Embeddings are frozen")
self.init_embeddings(embeddings, trainable)
# the dropout "layer" for the word embeddings
self.dropout = nn.Dropout(dropout)
# the gaussian noise "layer" for the word embeddings
self.noise = GaussianNoise(noise)
forward(self, x)
Embed input tokens
Assign embedding that corresponds to each token. Optionally add Gaussian noise and embedding dropout and scale embeddings by a constant.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L] Input token ids. | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: [B, L, E] Embedded tokens. |
Source code in slp/modules/embed.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Embed input tokens
Assign embedding that corresponds to each token.
Optionally add Gaussian noise and embedding dropout and scale embeddings by a constant.
Args:
x (torch.Tensor): [B, L] Input token ids.
Returns:
torch.Tensor: [B, L, E] Embedded tokens.
"""
embeddings = self.embedding(x)
if self.noise.stddev > 0:
embeddings = self.noise(embeddings)
if self.dropout.p > 0:
embeddings = self.dropout(embeddings)
return embeddings * self.scale # type: ignore
init_embeddings(self, weights, trainable)
Initialize embeddings matrix with pretrained embeddings
Parameters:
Name | Type | Description | Default |
---|---|---|---|
weights | ndarray | pretrained embeddings | required |
trainable | bool | Finetune embeddings? | required |
Source code in slp/modules/embed.py
def init_embeddings(self, weights: np.ndarray, trainable: bool):
"""Initialize embeddings matrix with pretrained embeddings
Args:
weights (np.ndarray): pretrained embeddings
trainable (bool): Finetune embeddings?
"""
self.embedding.weight = nn.Parameter(
torch.from_numpy(weights), requires_grad=trainable
)
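A sketch with a toy random matrix standing in for pretrained vectors:
import numpy as np
import torch
from slp.modules.embed import Embed

weights = np.random.rand(1000, 50).astype("float32")  # toy "pretrained" matrix
embed = Embed(
    num_embeddings=1000,
    embedding_dim=50,
    embeddings=weights,
    dropout=0.1,
    trainable=False,  # embeddings stay frozen
)
tokens = torch.randint(0, 1000, (4, 12))  # [B, L] token ids
vectors = embed(tokens)                   # [B, L, E] == [4, 12, 50]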
PositionalEncoding
__init__(self, embedding_dim=512, max_len=5000)
special
Inject some information about the relative or absolute position of the tokens in the sequence.
The positional encodings have the same dimension as the embeddings, so that the two can be summed. Here, we use sine and cosine functions of different frequencies.
PE for even positions:
$$\text{PosEncoder}(pos, 2i) = \sin(\frac{pos}{10000^{\frac{2i}{d}}})$$
PE for odd positions:
$$\text{PosEncoder}(pos, 2i+1) = \cos(\frac{pos}{10000^{\frac{2i}{d}}})$$
where \(pos\) is the word position and \(i\) is the embedding idx
Implementation modified from pytorch/examples/word_language_model.py
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding_dim | int | Embedding / model dimension. Defaults to 512. | 512 |
max_len | int | Maximum sequence length that can be encoded. Defaults to 5000. | 5000 |
Source code in slp/modules/embed.py
def __init__(self, embedding_dim: int = 512, max_len: int = 5000):
r"""Inject some information about the relative or absolute position of the tokens in the sequence.
The positional encodings have the same dimension as
the embeddings, so that the two can be summed. Here, we use sine and cosine
functions of different frequencies.
PE for even positions:
$$\text{PosEncoder}(pos, 2i) = \sin(\frac{pos}{10000^{\frac{2i}{d}}})$$
PE for odd positions:
$$\text{PosEncoder}(pos, 2i+1) = \cos(\frac{pos}{10000^{\frac{2i}{d}}})$$
where $pos$ is the word position and $i$ is the embedding idx
Implementation modified from pytorch/examples/word_language_model.py
Args:
embedding_dim (int): Embedding / model dimension. Defaults to 512.
max_len (int): Maximum sequence length that can be encoded. Defaults to 5000.
"""
super(PositionalEncoding, self).__init__()
pe = torch.zeros(max_len, embedding_dim)
position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
div_term = torch.exp(
torch.arange(0, embedding_dim, 2).float()
* (-math.log(10000.0) / embedding_dim)
)
pe[:, 0::2] = torch.sin(position * div_term)
pe[:, 1::2] = torch.cos(position * div_term)
pe = pe.unsqueeze(0)
self.register_buffer("pe", pe)
forward(self, x)
Calculate positional embeddings for input and add them to the input tensor:
$$out = x + PosEmbed(x)$$
x is assumed to be batch first
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L, D] input embeddings | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: Embeddings + positional embeddings |
Source code in slp/modules/embed.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Calculate positional embeddings for input and add them to input tensor
$$out = x + PosEmbed(x)$$
x is assumed to be batch first
Args:
x (torch.Tensor): [B, L, D] input embeddings
Returns:
torch.Tensor: Embeddings + positional embeddings
"""
x = x + self.pe[:, : x.size(1), :] # type: ignore
return x
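A minimal usage sketch:
import torch
from slp.modules.embed import PositionalEncoding

pe = PositionalEncoding(embedding_dim=512, max_len=5000)
x = torch.randn(2, 100, 512)  # [B, L, D], batch first
out = pe(x)                   # [2, 100, 512], positions added to the embeddings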
PositionwiseFF
__init__(self, d_model, d_ff, dropout=0.1, gelu=False)
special
Transformer Position-wise feed-forward layer
Linear -> ReLU (or GELU) -> Dropout -> Linear
Parameters:
Name | Type | Description | Default |
---|---|---|---|
d_model | int | Model dimension | required |
d_ff | int | Hidden dimension | required |
dropout | float | Dropout probability. Defaults to 0.1. | 0.1 |
gelu | bool | Use GELU activation instead of ReLU. Defaults to False. | False |
Source code in slp/modules/feedforward.py
def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1, gelu=False):
"""Transformer Position-wise feed-forward layer
Linear -> ReLU (or GELU) -> Dropout -> Linear
Args:
d_model (int): Model dimension
d_ff (int): Hidden dimension
dropout (float): Dropout probability. Defaults to 0.1.
"""
super(PositionwiseFF, self).__init__()
self.ff1 = nn.Linear(d_model, d_ff)
self.ff2 = nn.Linear(d_ff, d_model)
self.drop = nn.Dropout(dropout)
self.activation = nn.ReLU() if not gelu else nn.GELU()
forward(self, x)
Position-wise FF forward pass
$$out = W_2 \cdot \max(0, W_1 \cdot x + b_1) + b_2$$
[B, *, D] -> [B, *, H] -> [B, *, D]
- B: Batch size
- D: Model dim
- H: Hidden size > Model dim (Usually \(H = 2D\))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, *, D] Input features | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: [B, *, D] Output features |
Source code in slp/modules/feedforward.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
r"""Position-wise FF forward pass
$$out = W_2 \cdot \max(0, W_1 \cdot x + b_1) + b_2$$
[B, *, D] -> [B, *, H] -> [B, *, D]
* B: Batch size
* D: Model dim
* H: Hidden size > Model dim (Usually $H = 2D$)
Args:
x (torch.Tensor): [B, *, D] Input features
Returns:
torch.Tensor: [B, *, D] Output features
"""
out: torch.Tensor = self.ff2(self.drop(self.activation(self.ff1(x))))
return out
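A minimal usage sketch:
import torch
from slp.modules.feedforward import PositionwiseFF

ff = PositionwiseFF(d_model=256, d_ff=512, dropout=0.1, gelu=True)
x = torch.randn(8, 50, 256)  # [B, L, D]
out = ff(x)                  # [8, 50, 256]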
TwoLayer
forward(self, x)
Two-layer feed-forward pass: Linear -> Dropout -> Activation -> Linear -> Dropout, with an optional residual connection (out = x + FF(x)) when self.residual is True.
Source code in slp/modules/feedforward.py
def forward(self, x):
out = self.l1(x)
out = self.drop(out)
out = self.act(out)
out = self.l2(out)
out = self.drop(out)
if self.residual:
out = x + out
return out
LayerNormTf
__init__(self, hidden_size, eps=1e-12)
special
Construct a layernorm module in the TF style (epsilon inside the square root). Link: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L234
Source code in slp/modules/norm.py
def __init__(self, hidden_size: int, eps: float = 1e-12):
"""Construct a layernorm module in the TF style (epsilon inside the square root).
Link: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L234
"""
super(LayerNormTf, self).__init__()
self.weight = nn.Parameter(torch.ones(hidden_size))
self.bias = nn.Parameter(torch.zeros(hidden_size))
self.variance_epsilon = eps
forward(self, x)
Calculate Layernorm the tf way
Source code in slp/modules/norm.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Calculate Layernorm the tf way"""
u = x.mean(-1, keepdim=True)
s = (x - u).pow(2).mean(-1, keepdim=True)
x = (x - u) / torch.sqrt(s + self.variance_epsilon)
return self.weight * x + self.bias
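A minimal usage sketch; the output should match torch.nn.LayerNorm(hidden_size, eps=1e-12) up to floating point error:
import torch
from slp.modules.norm import LayerNormTf

ln = LayerNormTf(hidden_size=128)
x = torch.randn(4, 10, 128)
y = ln(x)  # normalized over the last dimension, then scaled and shifted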
ScaleNorm
forward(self, x)
ScaleNorm forward pass. The input is scaled by the learnable gain g divided by its L2 norm along the last dimension, clamped to eps for numerical stability.
Source code in slp/modules/norm.py
def forward(self, x: torch.Tensor):
scaled_norm = self.g / safe_norm(x, dim=-1, keepdim=True).clamp(min=self.eps)
return scaled_norm * x
GaussianNoise
__init__(self, stddev, mean=0.0)
special
Additive Gaussian Noise layer
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stddev | float | the standard deviation of the distribution | required |
mean | float | the mean of the distribution | 0.0 |
Source code in slp/modules/regularization.py
def __init__(self, stddev: float, mean: float = 0.0):
"""Additive Gaussian Noise layer
Args:
stddev (float): the standard deviation of the distribution
mean (float): the mean of the distribution
"""
super().__init__()
self.stddev = stddev
self.mean = mean
__repr__(self)
special
String representation of class
Source code in slp/modules/regularization.py
def __repr__(self):
"""String representation of class"""
return "{} (mean={}, stddev={})".format(
self.__class__.__name__, str(self.mean), str(self.stddev)
)
forward(self, x)
Gaussian noise forward pass
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input features. | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: Input with additive Gaussian noise in training mode; unchanged input in eval mode |
Source code in slp/modules/regularization.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Gaussian noise forward pass
Args:
x (torch.Tensor): Input features.
Returns:
torch.Tensor: Input + Gaussian noise in training mode; unchanged input in eval mode
"""
if self.training:
noise = Variable(x.data.new(x.size()).normal_(self.mean, self.stddev))
return x + noise
return x
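A sketch showing that the layer is only active in training mode:
import torch
from slp.modules.regularization import GaussianNoise

noise = GaussianNoise(stddev=0.1)
x = torch.zeros(32, 10)

noise.train()
print(noise(x).std())            # roughly 0.1: noise is added in training mode

noise.eval()
print(torch.equal(noise(x), x))  # True: identity in eval mode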
AttentiveRNN
__init__(self, input_size, hidden_size=256, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.1, rnn_type='lstm', packed_sequence=True, attention=False, max_length=-1, num_heads=1, nystrom=True, num_landmarks=32, kernel_size=33, inverse_iterations=6, return_hidden=False)
special
RNN encoder with optional attention mechanism
Scaled dot-product attention (single- or multi-headed) can be applied over the RNN hidden states
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | int | Input features dimension | required |
hidden_size | int | Hidden features | 256 |
batch_first | bool | Use batch first representation type. Defaults to True. | True |
layers | int | Number of RNN layers. Defaults to 1. | 1 |
bidirectional | bool | Use bidirectional RNNs. Defaults to False. | False |
merge_bi | str | How bidirectional states are merged. Defaults to "cat". | 'cat' |
dropout | float | Dropout probability. Defaults to 0.1. | 0.1 |
rnn_type | str | lstm or gru. Defaults to "lstm". | 'lstm' |
packed_sequence | bool | Use packed sequences. Defaults to True. | True |
max_length | int | Maximum sequence length for fixed-length padding. If -1, use the largest sequence length in this batch. | -1 |
attention | bool | Use attention mechanism. Defaults to False. | False |
num_heads | int | Number of attention heads. If 1, uses single-headed attention. | 1 |
nystrom | bool | Use nystrom approximation for multihead attention. | True |
num_landmarks | int | Number of landmark sequence elements for nystrom attention. | 32 |
kernel_size | Optional[int] | Kernel size for multihead attention output residual convolution. | 33 |
inverse_iterations | int | Number of iterations for Moore-Penrose inverse approximation in nystrom attention. 6 is a good value. | 6 |
return_hidden | bool | Return all hidden states. Defaults to False. | False |
Source code in slp/modules/rnn.py
def __init__(
self,
input_size: int,
hidden_size: int = 256,
batch_first: bool = True,
layers: int = 1,
bidirectional: bool = False,
merge_bi: str = "cat",
dropout: float = 0.1,
rnn_type: str = "lstm",
packed_sequence: bool = True,
attention: bool = False,
max_length: int = -1,
num_heads: int = 1,
nystrom: bool = True,
num_landmarks: int = 32,
kernel_size: Optional[int] = 33,
inverse_iterations: int = 6,
return_hidden: bool = False,
):
"""RNN with embedding layer and optional attention mechanism
Single-headed scaled dot-product attention is used as an attention mechanism
Args:
input_size (int): Input features dimension
hidden_size (int): Hidden features
batch_first (bool): Use batch first representation type. Defaults to True.
layers (int): Number of RNN layers. Defaults to 1.
bidirectional (bool): Use bidirectional RNNs. Defaults to False.
merge_bi (str): How bidirectional states are merged. Defaults to "cat".
dropout (float): Dropout probability. Defaults to 0.0.
rnn_type (str): lstm or gru. Defaults to "lstm".
packed_sequence (bool): Use packed sequences. Defaults to True.
max_length (int): Maximum sequence length for fixed length padding. If -1 takes the
largest sequence length in this batch
attention (bool): Use attention mechanism. Defaults to False
num_heads (int): Number of attention heads. If 1 uses single headed attention
nystrom (bool): Use nystrom approximation for multihead attention
num_landmarks (int): Number of landmark sequence elements for nystrom attention
kernel_size (int): Kernel size for multihead attention output residual convolution
inverse_iterations (int): Number of iterations for moore-penrose inverse approximation
in nystrom attention. 6 is a good value
return_hidden (bool): Return all hidden states. Defaults to False.
"""
super(AttentiveRNN, self).__init__()
self.rnn = RNN(
input_size, # type: ignore
hidden_size,
batch_first=batch_first,
layers=layers,
merge_bi=merge_bi,
bidirectional=bidirectional,
dropout=dropout,
rnn_type=rnn_type,
packed_sequence=packed_sequence,
max_length=max_length,
)
self.out_size = (
hidden_size
if not (bidirectional and merge_bi == "cat")
else 2 * hidden_size
)
self.batch_first = batch_first
self.return_hidden = return_hidden
self.attention = None
if attention:
if num_heads == 1:
self.attention = Attention(
attention_size=self.out_size, dropout=dropout
)
else:
self.attention = MultiheadAttention( # type: ignore
attention_size=self.out_size,
num_heads=num_heads,
kernel_size=kernel_size,
nystrom=nystrom,
num_landmarks=num_landmarks,
inverse_iterations=inverse_iterations,
dropout=dropout,
)
forward(self, x, lengths)
Attentive RNN forward pass
If self.attention=True, the outputs are the weighted sum of the RNN hidden states with the attention score weights; otherwise the output is the last hidden state of the RNN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L, D] Input features | required |
lengths | Tensor | [B] Original sequence lengths | required |

Returns:
Type | Description |
---|---|
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] | If return_hidden == False: a [B, H] or [B, 2*H] tensor of output features to be used for classification. If return_hidden == True: the same output features plus a tensor of all the hidden states |
Source code in slp/modules/rnn.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
"""Attentive RNN forward pass
If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights
Else the output is the last hidden state of the RNN.
Args:
x (torch.Tensor): [B, L, D] Input features
lengths (torch.Tensor): [B] Original sequence lengths
Returns:
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
if return_hidden == False: Returns a tensor [B, H] or [B, 2*H] of output features to be used for classification
if return_hidden == True: Returns a tensor [B, H] or [B, 2*H] of output features to
be used for classification, and a tensor of all the hidden states
"""
states, last_hidden, _ = self.rnn(x, lengths)
out: torch.Tensor = last_hidden
if self.attention is not None:
states, _ = self.attention(
states,
attention_mask=pad_mask(
lengths,
max_length=states.size(1) if self.batch_first else states.size(0),
),
)
out = states.mean(dim=1)
if self.return_hidden:
return out, states
else:
return out
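A sketch with single-headed attention and return_hidden=True (shapes are illustrative):
import torch
from slp.modules.rnn import AttentiveRNN

encoder = AttentiveRNN(
    input_size=35, hidden_size=64, attention=True, return_hidden=True
)
x = torch.randn(8, 20, 35)         # [B, L, D]
lengths = torch.tensor([20] * 8)   # [B]
out, states = encoder(x, lengths)  # out: [8, 64], states: [8, 20, 64]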
RNN
out_size: int
property
readonly
RNN output features size
Returns:
Type | Description |
---|---|
int | RNN output features size |
__init__(self, input_size, hidden_size, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.0, rnn_type='lstm', packed_sequence=True, max_length=-1)
special
LSTM - GRU wrapper with packed sequence support and handling for bidirectional / last output states
It is recommended to run with batch_first=True because the rest of the code is built with this assumption
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | int | Input features. | required |
hidden_size | int | Hidden features. | required |
batch_first | bool | Use batch first representation type. Defaults to True. | True |
layers | int | Number of RNN layers. Defaults to 1. | 1 |
bidirectional | bool | Use bidirectional RNNs. Defaults to False. | False |
merge_bi | str | How bidirectional states are merged. Defaults to "cat". | 'cat' |
dropout | float | Dropout probability. Defaults to 0.0. | 0.0 |
rnn_type | str | lstm or gru. Defaults to "lstm". | 'lstm' |
packed_sequence | bool | Use packed sequences. Defaults to True. | True |
max_length | int | Maximum sequence length for fixed-length padding. If -1, use the largest sequence length in this batch. | -1 |
Source code in slp/modules/rnn.py
def __init__(
self,
input_size: int,
hidden_size: int,
batch_first: bool = True,
layers: int = 1,
bidirectional: bool = False,
merge_bi: str = "cat",
dropout: float = 0.0,
rnn_type: str = "lstm",
packed_sequence: bool = True,
max_length: int = -1,
):
"""LSTM - GRU wrapper with packed sequence support and handling for bidirectional / last output states
It is recommended to run with batch_first=True because the rest of the code is built with this assumption
Args:
input_size (int): Input features.
hidden_size (int): Hidden features.
batch_first (bool): Use batch first representation type. Defaults to True.
layers (int): Number of RNN layers. Defaults to 1.
bidirectional (bool): Use bidirectional RNNs. Defaults to False.
merge_bi (str): How bidirectional states are merged. Defaults to "cat".
dropout (float): Dropout probability. Defaults to 0.0.
rnn_type (str): lstm or gru. Defaults to "lstm".
packed_sequence (bool): Use packed sequences. Defaults to True.
"""
super(RNN, self).__init__()
self.bidirectional = bidirectional
self.hidden_size = hidden_size
self.batch_first = batch_first
self.merge_bi = merge_bi
self.rnn_type = rnn_type.lower()
if not batch_first:
logger.warning(
"You are running RNN with batch_first=False. Make sure this is really what you want"
)
if not packed_sequence:
logger.warning(
"You have set packed_sequence=False. Running with packed_sequence=True will be much faster"
)
rnn_cls = nn.LSTM if self.rnn_type == "lstm" else nn.GRU
self.rnn = rnn_cls(
input_size,
hidden_size,
batch_first=batch_first,
num_layers=layers,
bidirectional=bidirectional,
)
self.drop = nn.Dropout(dropout)
self.packed_sequence = packed_sequence
if packed_sequence:
self.pack = PackSequence(batch_first=batch_first)
self.unpack = PadPackedSequence(
batch_first=batch_first, max_length=max_length
)
forward(self, x, lengths)
RNN forward pass
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L, D] Input features | required |
lengths | Tensor | [B] Original sequence lengths | required |

Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor, Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]] | Tuple of (merged forward and backward states [B, L, H] or [B, L, 2*H], merged last forward and backward state [B, H] or [B, 2*H], hidden states: a tuple of [num_layers * num_directions, B, H] tensors for LSTM or a single such tensor for GRU) |
Source code in slp/modules/rnn.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Tuple[
torch.Tensor,
torch.Tensor,
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
]:
"""RNN forward pass
Args:
x (torch.Tensor): [B, L, D] Input features
lengths (torch.Tensor): [B] Original sequence lengths
Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: (
merged forward and backward states [B, L, H] or [B, L, 2*H],
merged last forward and backward state [B, H] or [B, 2*H],
hidden states tuple of [num_layers * num_directions, B, H] for LSTM or tensor [num_layers * num_directions, B, H] for GRU
)
"""
self.rnn.flatten_parameters()
if self.packed_sequence:
# Latest pytorch allows only cpu tensors for packed sequence
lengths = lengths.to("cpu")
x, lengths = self.pack(x, lengths)
out, hidden = self.rnn(x)
if self.packed_sequence:
out = self.unpack(out, lengths)
out = self.drop(out)
lengths = lengths.to(out.device)
out, last_timestep = self._final_output(out, lengths)
return out, last_timestep, hidden
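A minimal usage sketch (shapes are illustrative):
import torch
from slp.modules.rnn import RNN

rnn = RNN(input_size=35, hidden_size=64, bidirectional=True, merge_bi="cat")
x = torch.randn(8, 20, 35)        # [B, L, D]
lengths = torch.tensor([20] * 8)  # [B]
states, last, hidden = rnn(x, lengths)
# states: [8, 20, 128], last: [8, 128] (2 * hidden_size because merge_bi="cat")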
TokenRNN
__init__(self, hidden_size=256, vocab_size=None, embeddings_dim=None, embeddings=None, embeddings_dropout=0.0, finetune_embeddings=False, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.1, rnn_type='lstm', packed_sequence=True, attention=False, max_length=-1, num_heads=1, nystrom=True, num_landmarks=32, kernel_size=33, inverse_iterations=6, return_hidden=False)
special
RNN with embedding layer and optional attention mechanism
Single-headed scaled dot-product attention is used as an attention mechanism
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hidden_size | int | Hidden features | 256 |
vocab_size | Optional[int] | Vocabulary size. Defaults to None. | None |
embeddings_dim | Optional[int] | Embedding dimension. Defaults to None. | None |
embeddings | Optional[numpy.ndarray] | Embedding matrix. Defaults to None. | None |
embeddings_dropout | float | Embedding dropout probability. Defaults to 0.0. | 0.0 |
finetune_embeddings | bool | Finetune embeddings? Defaults to False. | False |
batch_first | bool | Use batch first representation type. Defaults to True. | True |
layers | int | Number of RNN layers. Defaults to 1. | 1 |
bidirectional | bool | Use bidirectional RNNs. Defaults to False. | False |
merge_bi | str | How bidirectional states are merged. Defaults to "cat". | 'cat' |
dropout | float | Dropout probability. Defaults to 0.1. | 0.1 |
rnn_type | str | lstm or gru. Defaults to "lstm". | 'lstm' |
packed_sequence | bool | Use packed sequences. Defaults to True. | True |
max_length | int | Maximum sequence length for fixed-length padding. If -1, use the largest sequence length in this batch. | -1 |
attention | bool | Use attention mechanism. Defaults to False. | False |
num_heads | int | Number of attention heads. If 1, uses single-headed attention. | 1 |
nystrom | bool | Use nystrom approximation for multihead attention. | True |
num_landmarks | int | Number of landmark sequence elements for nystrom attention. | 32 |
kernel_size | Optional[int] | Kernel size for multihead attention output residual convolution. | 33 |
inverse_iterations | int | Number of iterations for Moore-Penrose inverse approximation in nystrom attention. 6 is a good value. | 6 |
return_hidden | bool | Return all hidden states. Defaults to False. | False |
Source code in slp/modules/rnn.py
def __init__(
self,
hidden_size: int = 256,
vocab_size: Optional[int] = None,
embeddings_dim: Optional[int] = None,
embeddings: Optional[np.ndarray] = None,
embeddings_dropout: float = 0.0,
finetune_embeddings: bool = False,
batch_first: bool = True,
layers: int = 1,
bidirectional: bool = False,
merge_bi: str = "cat",
dropout: float = 0.1,
rnn_type: str = "lstm",
packed_sequence: bool = True,
attention: bool = False,
max_length: int = -1,
num_heads: int = 1,
nystrom: bool = True,
num_landmarks: int = 32,
kernel_size: Optional[int] = 33,
inverse_iterations: int = 6,
return_hidden=False,
):
"""RNN with embedding layer and optional attention mechanism
Single-headed scaled dot-product attention is used as an attention mechanism
Args:
hidden_size (int): Hidden features
vocab_size (Optional[int]): Vocabulary size. Defaults to None.
embeddings_dim (Optional[int]): Embedding dimension. Defaults to None.
embeddings (Optional[np.ndarray]): Embedding matrix. Defaults to None.
embeddings_dropout (float): Embedding dropout probability. Defaults to 0.0.
finetune_embeddings (bool): Finetune embeddings? Defaults to False.
batch_first (bool): Use batch first representation type. Defaults to True.
layers (int): Number of RNN layers. Defaults to 1.
bidirectional (bool): Use bidirectional RNNs. Defaults to False.
merge_bi (str): How bidirectional states are merged. Defaults to "cat".
dropout (float): Dropout probability. Defaults to 0.0.
rnn_type (str): lstm or gru. Defaults to "lstm".
packed_sequence (bool): Use packed sequences. Defaults to True.
max_length (int): Maximum sequence length for fixed length padding. If -1 takes the
largest sequence length in this batch
attention (bool): Use attention mechanism. Defaults to False
num_heads (int): Number of attention heads. If 1 uses single headed attention
nystrom (bool): Use nystrom approximation for multihead attention
num_landmarks (int): Number of landmark sequence elements for nystrom attention
kernel_size (int): Kernel size for multihead attention output residual convolution
inverse_iterations (int): Number of iterations for moore-penrose inverse approximation
in nystrom attention. 6 is a good value
"""
super(TokenRNN, self).__init__()
if embeddings is None:
finetune_embeddings = True
assert (
vocab_size is not None
), "You should either pass an embeddings matrix or vocab size"
assert (
embeddings_dim is not None
), "You should either pass an embeddings matrix or embeddings_dim"
else:
vocab_size = embeddings.shape[0]
embeddings_dim = embeddings.shape[1]
self.embed = Embed(
vocab_size, # type: ignore
embeddings_dim, # type: ignore
embeddings=embeddings,
dropout=embeddings_dropout,
scale=hidden_size ** 0.5,
trainable=finetune_embeddings,
)
self.encoder = AttentiveRNN(
embeddings_dim, # type: ignore
hidden_size,
batch_first=batch_first,
layers=layers,
bidirectional=bidirectional,
merge_bi=merge_bi,
dropout=dropout,
rnn_type=rnn_type,
packed_sequence=packed_sequence,
attention=attention,
max_length=max_length,
num_heads=num_heads,
nystrom=nystrom,
num_landmarks=num_landmarks,
kernel_size=kernel_size,
inverse_iterations=inverse_iterations,
return_hidden=return_hidden,
)
self.out_size = self.encoder.out_size
forward(self, x, lengths)
Token RNN forward pass
If self.attention=True, the outputs are the weighted sum of the RNN hidden states with the attention score weights; otherwise the output is the last hidden state of the RNN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L] Input token ids | required |
lengths | Tensor | [B] Original sequence lengths | required |

Returns:
Type | Description |
---|---|
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] | torch.Tensor: [B, H] or [B, 2*H] Output features to be used for classification |
Source code in slp/modules/rnn.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
"""Token RNN forward pass
If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights
Else the output is the last hidden state of the RNN.
Args:
x (torch.Tensor): [B, L] Input token ids
lengths (torch.Tensor): [B] Original sequence lengths
Returns:
torch.Tensor: [B, H] or [B, 2*H] Output features to be used for classification
"""
x = self.embed(x)
out = self.encoder(x, lengths)
return out # type: ignore
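A sketch with a randomly initialized embedding table (vocab size and shapes are illustrative):
import torch
from slp.modules.rnn import TokenRNN

model = TokenRNN(hidden_size=64, vocab_size=1000, embeddings_dim=50, attention=True)
tokens = torch.randint(0, 1000, (8, 20))  # [B, L] token ids
lengths = torch.tensor([20] * 8)          # [B]
out = model(tokens, lengths)              # [8, 64]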
Decoder
forward(self, target, encoded, source_mask=None, target_mask=None)
Pass the target sequence through each decoder layer in turn, attending over the encoder outputs.
Source code in slp/modules/transformer.py
def forward(self, target, encoded, source_mask=None, target_mask=None):
for l in self.decoder:
target = l(
target, encoded, source_mask=source_mask, target_mask=target_mask
)
return target
DecoderLayer
forward(self, targets, encoded, source_mask=None, target_mask=None)
Apply self-attention to the targets, fuse them with the encoder outputs through cross-attention, and pass the result through the output feed-forward sublayer.
Source code in slp/modules/transformer.py
def forward(self, targets, encoded, source_mask=None, target_mask=None):
targets = self.in_layer(targets, attention_mask=target_mask)
out = self.fuse_layer(encoded, targets, attention_mask=source_mask)
out = self.out_layer(out)
return out
Encoder
forward(self, x, attention_mask=None)
Pass the input through each encoder layer in sequence, propagating the attention mask.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
for layer in self.encoder:
x = layer(x, attention_mask=attention_mask)
return x
EncoderDecoder
forward(self, source, target, source_mask=None, target_mask=None)
Encode the source sequence, then decode the target sequence conditioned on the encoder outputs.
Source code in slp/modules/transformer.py
def forward(self, source, target, source_mask=None, target_mask=None):
encoded = self.encoder(source, attention_mask=source_mask)
decoded = self.decoder(
target, encoded, source_mask=source_mask, target_mask=target_mask
)
return decoded
EncoderLayer
forward(self, x, attention_mask=None)
Apply the self-attention sublayer followed by the position-wise feed-forward sublayer.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
out = self.l1(x, attention_mask=attention_mask)
out = self.l2(out)
return out
Sublayer1
forward(self, x, attention_mask=None)
Apply the wrapped self-attention layer, using pre-norm or post-norm depending on self.prenorm.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
return (
self._prenorm(x, attention_mask=attention_mask)
if self.prenorm
else self._postnorm(x, attention_mask=attention_mask)
)
Sublayer2
forward(self, x)
Apply the wrapped feed-forward layer, using pre-norm or post-norm depending on self.prenorm.
Source code in slp/modules/transformer.py
def forward(self, x):
return self._prenorm(x) if self.prenorm else self._postnorm(x)
Sublayer3
forward(self, x, y, attention_mask=None)
Apply the wrapped cross-attention layer on inputs x and y, using pre-norm or post-norm depending on self.prenorm.
Source code in slp/modules/transformer.py
def forward(self, x, y, attention_mask=None):
return (
self._prenorm(x, y, attention_mask=attention_mask)
if self.prenorm
else self._postnorm(x, y, attention_mask=attention_mask)
)
Transformer
forward(self, source, target, source_mask=None, target_mask=None)
Embed and positionally encode the source and target sequences, run them through the encoder-decoder block, and project the decoded states to output predictions.
Source code in slp/modules/transformer.py
def forward(self, source, target, source_mask=None, target_mask=None):
source = self.embed(source)
target = self.embed(target)
# Adding embeddings + pos embeddings
# is done in PositionalEncoding class
source = self.pe(source)
target = self.pe(target)
out = self.transformer_block(
source, target, source_mask=source_mask, target_mask=target_mask
)
out = self.drop(out)
out = self.predict(out)
return out
TransformerSequenceEncoder
forward(self, x, attention_mask=None)
Optionally feature-normalize the input, embed and positionally encode it, run the transformer encoder and mean-pool the outputs over the time dimension.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
if self.feature_norm:
x = self.feature_norm(x)
x = self.embed(x)
x = self.pe(x)
out = self.transformer_block(x, attention_mask=attention_mask).mean(dim=1)
return out
TransformerTokenSequenceEncoder
forward(self, x, attention_mask=None)
Embed and positionally encode the input token ids, run the transformer encoder and mean-pool the outputs over the time dimension.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
x = self.embed(x)
x = self.pe(x)
out = self.transformer_block(x, attention_mask=attention_mask).mean(dim=1)
return out
reset_parameters(named_parameters, gain=1.0)
Initialize parameters in the transformer model.
Source code in slp/modules/transformer.py
def reset_parameters(named_parameters, gain=1.0):
"""Initialize parameters in the transformer model."""
for name, p in named_parameters:
if p.dim() > 1:
if "weight" in name:
nn.init.xavier_normal_(p, gain=gain)
if "bias" in name:
nn.init.constant_(p, 0.0)
PLDataModuleFromCorpus
embeddings: Optional[numpy.ndarray]
property
readonly
Embeddings matrix
Returns:
Type | Description |
---|---|
Optional[numpy.ndarray] | Embeddings matrix |
vocab_size: int
property
readonly
Number of tokens in the vocabulary
Returns:
Type | Description |
---|---|
int | Number of tokens in the vocabulary |
__init__(self, train, train_labels=None, val=None, val_labels=None, test=None, test_labels=None, val_percent=0.2, test_percent=0.2, batch_size=64, batch_size_eval=None, seed=None, num_workers=1, pin_memory=True, drop_last=False, shuffle_eval=False, sampler_train=None, sampler_val=None, sampler_test=None, batch_sampler_train=None, batch_sampler_val=None, batch_sampler_test=None, collate_fn=None, language_model=False, tokenizer='spacy', no_test_set=False, **corpus_args)
special
Wrap raw corpus in a LightningDataModule
- This handles the selection of the appropriate corpus class based on the tokenizer argument.
- If language_model=True it uses the appropriate dataset from slp.data.datasets.
- Uses the PLDataModuleFromDatasets to split the val and test sets if not provided
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train | List | Raw train corpus | required |
train_labels | Optional[List] | Train labels. Defaults to None. | None |
val | Optional[List] | Raw validation corpus. Defaults to None. | None |
val_labels | Optional[List] | Validation labels. Defaults to None. | None |
test | Optional[List] | Raw test corpus. Defaults to None. | None |
test_labels | Optional[List] | Test labels. Defaults to None. | None |
val_percent | float | Percent of train to be used for validation if no validation set is given. Defaults to 0.2. | 0.2 |
test_percent | float | Percent of train to be used for test set if no test set is given. Defaults to 0.2. | 0.2 |
batch_size | int | Training batch size. Defaults to 64. | 64 |
batch_size_eval | Optional[int] | Validation and test batch size. Defaults to None. | None |
seed | Optional[int] | Seed for deterministic run. Defaults to None. | None |
num_workers | int | Number of workers in the DataLoader. Defaults to 1. | 1 |
pin_memory | bool | Pin tensors to GPU memory. Defaults to True. | True |
drop_last | bool | Drop last incomplete batch. Defaults to False. | False |
sampler_train | Sampler | Sampler for train loader. Defaults to None. | None |
sampler_val | Sampler | Sampler for validation loader. Defaults to None. | None |
sampler_test | Sampler | Sampler for test loader. Defaults to None. | None |
batch_sampler_train | BatchSampler | Batch sampler for train loader. Defaults to None. | None |
batch_sampler_val | BatchSampler | Batch sampler for validation loader. Defaults to None. | None |
batch_sampler_test | BatchSampler | Batch sampler for test loader. Defaults to None. | None |
shuffle_eval | bool | Shuffle validation and test dataloaders. Defaults to False. | False |
collate_fn | Optional[Callable[..., Any]] | Collator function. Defaults to None. | None |
language_model | bool | Use corpus for Language Modeling. Defaults to False. | False |
tokenizer | str | Select one of the cls.accepted_tokenizers. Defaults to "spacy". | 'spacy' |
no_test_set | bool | Do not create test set. Useful for tuning. | False |
**corpus_args | kwargs | Extra arguments to be passed to the corpus. See slp/data/corpus.py | {} |
Exceptions:
Type | Description |
---|---|
ValueError | [description] |
ValueError | [description] |
Source code in slp/plbind/dm.py
def __init__(
self,
train: List,
train_labels: Optional[List] = None,
val: Optional[List] = None,
val_labels: Optional[List] = None,
test: Optional[List] = None,
test_labels: Optional[List] = None,
val_percent: float = 0.2,
test_percent: float = 0.2,
batch_size: int = 64,
batch_size_eval: int = None,
seed: int = None,
num_workers: int = 1,
pin_memory: bool = True,
drop_last: bool = False,
shuffle_eval: bool = False,
sampler_train: Sampler = None,
sampler_val: Sampler = None,
sampler_test: Sampler = None,
batch_sampler_train: BatchSampler = None,
batch_sampler_val: BatchSampler = None,
batch_sampler_test: BatchSampler = None,
collate_fn: Optional[Callable[..., Any]] = None,
language_model: bool = False,
tokenizer: str = "spacy",
no_test_set: bool = False,
**corpus_args,
):
"""Wrap raw corpus in a LightningDataModule
* This handles the selection of the appropriate corpus class based on the tokenizer argument.
* If language_model=True it uses the appropriate dataset from slp.data.datasets.
* Uses the PLDataModuleFromDatasets to split the val and test sets if not provided
Args:
train (List): Raw train corpus
train_labels (Optional[List]): Train labels. Defaults to None.
val (Optional[List]): Raw validation corpus. Defaults to None.
val_labels (Optional[List]): Validation labels. Defaults to None.
test (Optional[List]): Raw test corpus. Defaults to None.
test_labels (Optional[List]): Test labels. Defaults to None.
val_percent (float): Percent of train to be used for validation if no validation set is given. Defaults to 0.2.
test_percent (float): Percent of train to be used for test set if no test set is given. Defaults to 0.2.
batch_size (int): Training batch size. Defaults to 64.
batch_size_eval (Optional[int]): Validation and test batch size. Defaults to None.
seed (Optional[int]): Seed for deterministic run. Defaults to None.
num_workers (int): Number of workers in the DataLoader. Defaults to 1.
pin_memory (bool): Pin tensors to GPU memory. Defaults to True.
drop_last (bool): Drop last incomplete batch. Defaults to False.
sampler_train (Sampler): Sampler for train loader. Defaults to None.
sampler_val (Sampler): Sampler for validation loader. Defaults to None.
sampler_test (Sampler): Sampler for test loader. Defaults to None.
batch_sampler_train (BatchSampler): Batch sampler for train loader. Defaults to None.
batch_sampler_val (BatchSampler): Batch sampler for validation loader. Defaults to None.
batch_sampler_test (BatchSampler): Batch sampler for test loader. Defaults to None.
shuffle_eval (bool): Shuffle validation and test dataloaders. Defaults to False.
collate_fn (Callable[..., Any]): Collator function. Defaults to None.
language_model (bool): Use corpus for Language Modeling. Defaults to False.
tokenizer (str): Select one of the cls.accepted_tokenizers. Defaults to "spacy".
no_test_set (bool): Do not create test set. Useful for tuning
**corpus_args (kwargs): Extra arguments to be passed to the corpus. See
slp/data/corpus.py
Raises:
ValueError: [description]
ValueError: [description]
"""
self.language_model = language_model
self.tokenizer = tokenizer
self.corpus_args = corpus_args
train_data, val_data, test_data = self._zip_corpus_and_labels(
train, val, test, train_labels, val_labels, test_labels
)
self.no_test_set = no_test_set
super(PLDataModuleFromCorpus, self).__init__(
train_data, # type: ignore
val=val_data, # type: ignore
test=test_data, # type: ignore
val_percent=val_percent,
test_percent=test_percent,
batch_size=batch_size,
batch_size_eval=batch_size_eval,
seed=seed,
num_workers=num_workers,
pin_memory=pin_memory,
drop_last=drop_last,
shuffle_eval=shuffle_eval,
sampler_train=sampler_train,
sampler_val=sampler_val,
sampler_test=sampler_test,
batch_sampler_train=batch_sampler_train,
batch_sampler_val=batch_sampler_val,
batch_sampler_test=batch_sampler_test,
collate_fn=collate_fn,
no_test_set=no_test_set,
)
add_argparse_args(parent_parser)
classmethod
Augment input parser with arguments for data loading and corpus processing
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | argparse.ArgumentParser | Parser created by the user | required |

Returns:
Type | Description |
---|---|
argparse.ArgumentParser | Augmented parser |
Source code in slp/plbind/dm.py
@classmethod
def add_argparse_args(cls, parent_parser):
"""Augment input parser with arguments for data loading and corpus processing
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = super(PLDataModuleFromCorpus, cls).add_argparse_args(parent_parser)
parser.add_argument(
"--tokenizer",
dest="data.tokenizer",
type=str.lower,
# Corpus can already be tokenized, you can use spacy for word tokenization or any tokenizer from hugging face
choices=cls.accepted_tokenizers,
default="spacy",
help="Token type. The tokenization will happen at this level.",
)
# Only when tokenizer == spacy
parser.add_argument(
"--limit-vocab",
dest="data.limit_vocab_size",
type=int,
default=-1,
help="Limit vocab size. -1 means use the whole vocab. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--embeddings-file",
dest="data.embeddings_file",
type=dir_path,
default=None,
help="Path to file with pretrained embeddings. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--embeddings-dim",
dest="data.embeddings_dim",
type=int,
default=50,
help="Embedding dim of pretrained embeddings. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--lang",
dest="data.lang",
type=str,
default="en_core_web_md",
help="Language for spacy tokenizer, e.g. en_core_web_md. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--no-add-specials",
dest="data.add_special_tokens",
action="store_false",
help="Do not add special tokens for hugging face tokenizers",
)
# Generic args
parser.add_argument(
"--lower",
dest="data.lower",
action="store_true",
help="Convert to lowercase.",
)
parser.add_argument(
"--prepend-bos",
dest="data.prepend_bos",
action="store_true",
help="Prepend [BOS] token",
)
parser.add_argument(
"--append-eos",
dest="data.append_eos",
action="store_true",
help="Append [EOS] token",
)
parser.add_argument(
"--max-sentence-length",
dest="data.max_len",
type=int,
default=-1,
help="Maximum allowed sentence length. -1 means use the whole sentence",
)
return parser
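A hypothetical sketch (the toy corpus and labels are illustrative; the spacy tokenizer additionally requires the configured spacy model, e.g. en_core_web_md, to be installed):
from slp.plbind.dm import PLDataModuleFromCorpus

train = ["the cat sat on the mat", "a quick brown fox", "hello world", "good morning"]
labels = [0, 1, 0, 1]

dm = PLDataModuleFromCorpus(
    train,
    train_labels=labels,
    batch_size=2,
    tokenizer="spacy",  # word-level tokenization through spacy
)
# Validation and test splits are carved out of the train corpus using the
# default val_percent / test_percent, since no val or test data is given.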
PLDataModuleFromDatasets
__init__(self, train, val=None, test=None, val_percent=0.2, test_percent=0.2, batch_size=1, batch_size_eval=None, seed=None, num_workers=1, pin_memory=True, drop_last=False, sampler_train=None, sampler_val=None, sampler_test=None, batch_sampler_train=None, batch_sampler_val=None, batch_sampler_test=None, shuffle_eval=False, collate_fn=None, no_test_set=False)
special
LightningDataModule wrapper for generic torch.utils.data.Dataset
If val or test Datasets are not provided, this class will split val_percent and test_percent of the train set respectively to create them
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train | Dataset | Train set | required |
val | Dataset | Validation set. Defaults to None. | None |
test | Dataset | Test set. Defaults to None. | None |
val_percent | float | Percent of train to be used for validation if no validation set is given. Defaults to 0.2. | 0.2 |
test_percent | float | Percent of train to be used for test set if no test set is given. Defaults to 0.2. | 0.2 |
batch_size | int | Training batch size. Defaults to 1. | 1 |
batch_size_eval | Optional[int] | Validation and test batch size. Defaults to None. | None |
seed | Optional[int] | Seed for deterministic run. Defaults to None. | None |
num_workers | int | Number of workers in the DataLoader. Defaults to 1. | 1 |
pin_memory | bool | Pin tensors to GPU memory. Defaults to True. | True |
drop_last | bool | Drop last incomplete batch. Defaults to False. | False |
sampler_train | Sampler | Sampler for train loader. Defaults to None. | None |
sampler_val | Sampler | Sampler for validation loader. Defaults to None. | None |
sampler_test | Sampler | Sampler for test loader. Defaults to None. | None |
batch_sampler_train | BatchSampler | Batch sampler for train loader. Defaults to None. | None |
batch_sampler_val | BatchSampler | Batch sampler for validation loader. Defaults to None. | None |
batch_sampler_test | BatchSampler | Batch sampler for test loader. Defaults to None. | None |
shuffle_eval | bool | Shuffle validation and test dataloaders. Defaults to False. | False |
collate_fn | Optional[Callable[..., Any]] | Collator function. Defaults to None. | None |
no_test_set | bool | Do not create test set. Useful for tuning. | False |
Exceptions:
Type | Description |
---|---|
ValueError | If both mutually exclusive sampler_train and batch_sampler_train are provided |
ValueError | If both mutually exclusive sampler_val and batch_sampler_val are provided |
ValueError | If both mutually exclusive sampler_test and batch_sampler_test are provided |
Source code in slp/plbind/dm.py
def __init__(
self,
train: Dataset,
val: Dataset = None,
test: Dataset = None,
val_percent: float = 0.2,
test_percent: float = 0.2,
batch_size: int = 1,
batch_size_eval: Optional[int] = None,
seed: Optional[int] = None,
num_workers: int = 1,
pin_memory: bool = True,
drop_last: bool = False,
sampler_train: Sampler = None,
sampler_val: Sampler = None,
sampler_test: Sampler = None,
batch_sampler_train: BatchSampler = None,
batch_sampler_val: BatchSampler = None,
batch_sampler_test: BatchSampler = None,
shuffle_eval: bool = False,
collate_fn: Optional[Callable[..., Any]] = None,
no_test_set: bool = False,
):
"""LightningDataModule wrapper for generic torch.utils.data.Dataset
If val or test Datasets are not provided, this class will split
val_percent and test_percent of the train set respectively to create them
Args:
train (Dataset): Train set
val (Dataset): Validation set. Defaults to None.
test (Dataset): Test set. Defaults to None.
val_percent (float): Percent of train to be used for validation if no validation set is given. Defaults to 0.2.
test_percent (float): Percent of train to be used for test set if no test set is given. Defaults to 0.2.
batch_size (int): Training batch size. Defaults to 1.
batch_size_eval (Optional[int]): Validation and test batch size. Defaults to None.
seed (Optional[int]): Seed for deterministic run. Defaults to None.
num_workers (int): Number of workers in the DataLoader. Defaults to 1.
pin_memory (bool): Pin tensors to GPU memory. Defaults to True.
drop_last (bool): Drop last incomplete batch. Defaults to False.
sampler_train (Sampler): Sampler for train loader. Defaults to None.
sampler_val (Sampler): Sampler for validation loader. Defaults to None.
sampler_test (Sampler): Sampler for test loader. Defaults to None.
batch_sampler_train (BatchSampler): Batch sampler for train loader. Defaults to None.
batch_sampler_val (BatchSampler): Batch sampler for validation loader. Defaults to None.
batch_sampler_test (BatchSampler): Batch sampler for test loader. Defaults to None.
shuffle_eval (bool): Shuffle validation and test dataloaders. Defaults to False.
collate_fn (Callable[..., Any]): Collator function. Defaults to None.
no_test_set (bool): Do not create test set. Useful for tuning
Raises:
ValueError: If both mutually exclusive sampler_train and batch_sampler_train are provided
ValueError: If both mutually exclusive sampler_val and batch_sampler_val are provided
ValueError: If both mutually exclusive sampler_test and batch_sampler_test are provided
"""
super(PLDataModuleFromDatasets, self).__init__()
self.setup_has_run = False
if batch_sampler_train is not None and sampler_train is not None:
raise ValueError(
"You provided both a sampler and a batch sampler for the train set. These are mutually exclusive"
)
if batch_sampler_val is not None and sampler_val is not None:
raise ValueError(
"You provided both a sampler and a batch sampler for the validation set. These are mutually exclusive"
)
if batch_sampler_test is not None and sampler_test is not None:
raise ValueError(
"You provided both a sampler and a batch sampler for the test set. These are mutually exclusive"
)
self.val_percent = val_percent
self.test_percent = test_percent
self.sampler_train = sampler_train
self.sampler_val = sampler_val
self.sampler_test = sampler_test
self.batch_sampler_train = batch_sampler_train
self.batch_sampler_val = batch_sampler_val
self.batch_sampler_test = batch_sampler_test
self.num_workers = num_workers
self.pin_memory = pin_memory
self.drop_last = drop_last
self.shuffle_eval = shuffle_eval
self.collate_fn = collate_fn
self.batch_size = batch_size
self.seed = seed
if batch_size_eval is None:
batch_size_eval = self.batch_size
self.no_test_set = no_test_set
self.batch_size_eval = batch_size_eval
self.train = train
self.val = val
self.test = test
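A sketch with a toy TensorDataset; val and test sets are split off the train set since none are provided (the dataloaders become available after setup() runs, e.g. inside the Lightning Trainer; this is an assumption based on the setup_has_run flag above):
import torch
from torch.utils.data import TensorDataset
from slp.plbind.dm import PLDataModuleFromDatasets

x, y = torch.randn(100, 10), torch.randint(0, 2, (100,))
dm = PLDataModuleFromDatasets(
    TensorDataset(x, y),
    val_percent=0.2,
    test_percent=0.2,
    batch_size=16,
    seed=42,  # deterministic train/val/test split
)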
add_argparse_args(parent_parser)
classmethod
Augment input parser with arguments for data loading
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | ArgumentParser | Parser created by the user | required |

Returns:
Type | Description |
---|---|
ArgumentParser | argparse.ArgumentParser: Augmented parser |
Source code in slp/plbind/dm.py
@classmethod
def add_argparse_args(
cls, parent_parser: argparse.ArgumentParser
) -> argparse.ArgumentParser:
"""Augment input parser with arguments for data loading
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
parser.add_argument(
"--val-percent",
dest="data.val_percent",
type=float,
default=0.2,
help="Percent of validation data to be randomly split from the training set, if no validation set is provided",
)
parser.add_argument(
"--test-percent",
dest="data.test_percent",
type=float,
default=0.2,
help="Percent of test data to be randomly split from the training set, if no test set is provided",
)
parser.add_argument(
"--bsz",
dest="data.batch_size",
type=int,
default=32,
help="Training batch size",
)
parser.add_argument(
"--bsz-eval",
dest="data.batch_size_eval",
type=int,
default=32,
help="Evaluation batch size",
)
parser.add_argument(
"--num-workers",
dest="data.num_workers",
type=int,
default=1,
help="Number of workers to be used in the DataLoader",
)
parser.add_argument(
"--no-pin-memory",
dest="data.pin_memory",
action="store_false",
help="Don't pin data to GPU memory when transferring",
)
parser.add_argument(
"--drop-last",
dest="data.drop_last",
action="store_true",
help="Drop last incomplete batch",
)
parser.add_argument(
"--no-shuffle-eval",
dest="data.shuffle_eval",
action="store_false",
help="Don't shuffle val & test sets",
)
return parser
prepare_data(self)
Use this to download and prepare data.
.. warning:: DO NOT set state to the model (use setup instead) since this is NOT called on every GPU in DDP/TPU
Example::

    def prepare_data(self):
        # good
        download_data()
        tokenize()
        etc()

        # bad
        self.split = data_split
        self.some_state = some_other_state()

In DDP prepare_data can be called in two ways (using Trainer(prepare_data_per_node)):
- Once per node. This is the default and is only called on LOCAL_RANK=0.
- Once in total. Only called on GLOBAL_RANK=0.
Example::

    # DEFAULT
    # called once per node on LOCAL_RANK=0 of that node
    Trainer(prepare_data_per_node=True)

    # call on GLOBAL_RANK=0 (great for shared file systems)
    Trainer(prepare_data_per_node=False)

This is called before requesting the dataloaders:
.. code-block:: python

    model.prepare_data()
    if ddp/tpu: init()
    model.setup(stage)
    model.train_dataloader()
    model.val_dataloader()
    model.test_dataloader()
Source code in slp/plbind/dm.py
def prepare_data(self):
return None
test_dataloader(self)
Configure test DataLoader
Returns:
Type | Description |
---|---|
DataLoader | Pytorch DataLoader for test set |
Source code in slp/plbind/dm.py
def test_dataloader(self):
"""Configure test DataLoader
Returns:
DataLoader: Pytorch DataLoader for test set
"""
return DataLoader(
self.test,
batch_size=self.batch_size_eval if self.batch_sampler_test is None else 1,
num_workers=self.num_workers,
pin_memory=self.pin_memory,
drop_last=self.drop_last and (self.batch_sampler_test is None),
sampler=self.sampler_test,
batch_sampler=self.batch_sampler_test,
shuffle=(
self.shuffle_eval
and (self.batch_sampler_test is None)
and (self.sampler_test is None)
),
collate_fn=self.collate_fn,
)
train_dataloader(self)
Configure train DataLoader
Returns:
Type | Description |
---|---|
DataLoader | Pytorch DataLoader for train set |
Source code in slp/plbind/dm.py
def train_dataloader(self) -> DataLoader:
"""Configure train DataLoader
Returns:
DataLoader: Pytorch DataLoader for train set
"""
return DataLoader(
self.train,
batch_size=self.batch_size if self.batch_sampler_train is None else 1,
num_workers=self.num_workers,
pin_memory=self.pin_memory,
drop_last=self.drop_last and (self.batch_sampler_train is None),
sampler=self.sampler_train,
batch_sampler=self.batch_sampler_train,
shuffle=(self.batch_sampler_train is None) and (self.sampler_train is None),
collate_fn=self.collate_fn,
)
val_dataloader(self)
Configure validation DataLoader
Returns:
Type | Description |
---|---|
DataLoader | Pytorch DataLoader for validation set |
Source code in slp/plbind/dm.py
def val_dataloader(self):
"""Configure validation DataLoader
Returns:
DataLoader: Pytorch DataLoader for validation set
"""
val = DataLoader(
self.val,
batch_size=self.batch_size_eval if self.batch_sampler_val is None else 1,
num_workers=self.num_workers,
pin_memory=self.pin_memory,
drop_last=self.drop_last and (self.batch_sampler_val is None),
sampler=self.sampler_val,
batch_sampler=self.batch_sampler_val,
shuffle=(
self.shuffle_eval
and (self.batch_sampler_val is None)
and (self.sampler_val is None)
),
collate_fn=self.collate_fn,
)
return val
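A minimal sketch of wiring the datamodule into a training run. The toy tensors are illustrative, and we assume the constructor signature implied by the attributes above (train as the first positional argument) and the standard LightningDataModule setup() call performing the splits:

```python
import torch
from torch.utils.data import TensorDataset

from slp.plbind.dm import PLDataModuleFromDatasets

train_set = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
dm = PLDataModuleFromDatasets(
    train_set, batch_size=16, val_percent=0.2, no_test_set=True, seed=42
)
dm.setup()

train_loader = dm.train_dataloader()
val_loader = dm.val_dataloader()
```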
split_data(dataset, test_size, seed)
Train-test split of dataset.
Dataset can be either a torch.utils.data.Dataset or a list
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Union[Dataset, List] | Input dataset | required |
test_size | float | Size of the test set. Defaults to 0.2. | required |
seed | int | Optional seed for deterministic run. Defaults to None. | required |
Returns:
Type | Description |
---|---|
Tuple[Union[Dataset, List], Union[Dataset, List]] | (train set, test set) |
Source code in slp/plbind/dm.py
def split_data(dataset, test_size, seed):
"""Train-test split of dataset.
Dataset can be either a torch.utils.data.Dataset or a list
Args:
dataset (Union[Dataset, List]): Input dataset
test_size (float): Size of the test set. Defaults to 0.2.
seed (int): Optional seed for deterministic run. Defaults to None.
Returns:
Tuple[Union[Dataset, List], Union[Dataset, List]]: (train set, test set)
"""
train, test = None, None
if isinstance(dataset, torch.utils.data.Dataset):
test_len = int(test_size * len(dataset))
train_len = len(dataset) - test_len
seed_generator = None
if seed is not None:
seed_generator = torch.Generator().manual_seed(seed)
train, test = random_split(
dataset, [train_len, test_len], generator=seed_generator
)
else:
train, test = train_test_split(dataset, test_size=test_size, random_state=seed)
return train, test
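A quick sketch of the list branch, which falls back to sklearn's train_test_split (torch Datasets go through random_split instead):

```python
from slp.plbind.dm import split_data

data = list(range(100))
train, test = split_data(data, test_size=0.2, seed=42)
print(len(train), len(test))  # 80 20
```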
FixedWandbLogger
__init__(self, name=None, save_dir=None, offline=False, id=None, anonymous=False, version=None, project=None, log_model=False, experiment=None, prefix='', sync_step=True, checkpoint_dir=None, **kwargs)
special
Wandb logger fix to save checkpoints in wandb
Accepts an additional checkpoint_dir argument, pointing to the real checkpoint directory
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | Optional[str] | Display name for the run. Defaults to None. | None |
save_dir | Optional[str] | Path where data is saved. Defaults to None. | None |
offline | Optional[bool] | Run offline (data can be streamed later to wandb servers). Defaults to False. | False |
id | Optional[str] | Sets the version, mainly used to resume a previous run. Defaults to None. | None |
anonymous | Optional[bool] | Enables or explicitly disables anonymous logging. Defaults to False. | False |
version | Optional[str] | Sets the version, mainly used to resume a previous run. Defaults to None. | None |
project | Optional[str] | The name of the project to which this run will belong. Defaults to None. | None |
log_model | Optional[bool] | Save checkpoints in wandb dir to upload on W&B servers. Defaults to False. | False |
experiment | Run | WandB experiment object. Defaults to None. | None |
prefix | Optional[str] | A string to put at the beginning of metric keys. Defaults to "". | '' |
sync_step | Optional[bool] | Sync Trainer step with wandb step. Defaults to True. | True |
checkpoint_dir | Optional[str] | Real checkpoint dir. Defaults to None. | None |
Source code in slp/plbind/helpers.py
def __init__(
self,
name: Optional[str] = None,
save_dir: Optional[str] = None,
offline: Optional[bool] = False,
id: Optional[str] = None,
anonymous: Optional[bool] = False,
version: Optional[str] = None,
project: Optional[str] = None,
log_model: Optional[bool] = False,
experiment: wandb.sdk.wandb_run.Run = None,
prefix: Optional[str] = "",
sync_step: Optional[bool] = True,
checkpoint_dir: Optional[str] = None,
**kwargs,
):
"""Wandb logger fix to save checkpoints in wandb
Accepts an additional checkpoint_dir argument, pointing to the real checkpoint directory
Args:
name (Optional[str]): Display name for the run. Defaults to None.
save_dir (Optional[str]): Path where data is saved. Defaults to None.
offline (Optional[bool]): Run offline (data can be streamed later to wandb servers). Defaults to False.
id (Optional[str]): Sets the version, mainly used to resume a previous run. Defaults to None.
anonymous (Optional[bool]): Enables or explicitly disables anonymous logging. Defaults to False.
version (Optional[str]): Sets the version, mainly used to resume a previous run. Defaults to None.
project (Optional[str]): The name of the project to which this run will belong. Defaults to None.
log_model (Optional[bool]): Save checkpoints in wandb dir to upload on W&B servers. Defaults to False.
experiment ([type]): WandB experiment object. Defaults to None.
prefix (Optional[str]): A string to put at the beginning of metric keys. Defaults to "".
sync_step (Optional[bool]): Sync Trainer step with wandb step. Defaults to True.
checkpoint_dir (Optional[str]): Real checkpoint dir. Defaults to None.
"""
self._checkpoint_dir = checkpoint_dir
super(FixedWandbLogger, self).__init__(
name=name,
save_dir=save_dir,
offline=offline,
id=id,
anonymous=anonymous,
version=version,
project=project,
log_model=log_model,
experiment=experiment,
prefix=prefix,
sync_step=sync_step,
**kwargs,
)
finalize(self, status)
Determine where checkpoints are saved and upload to wandb servers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
status | str | Experiment status | required |
Source code in slp/plbind/helpers.py
@rank_zero_only
def finalize(self, status: str) -> None:
"""Determine where checkpoints are saved and upload to wandb servers
Args:
status (str): Experiment status
"""
# offset future training logged on same W&B run
if self._experiment is not None:
self._step_offset = self._experiment.step
checkpoint_dir = (
self._checkpoint_dir if self._checkpoint_dir is not None else self.save_dir
)
if checkpoint_dir is None:
logger.warning(
"Invalid checkpoint dir. Checkpoints will not be uploaded to Wandb."
)
logger.info(
"You can manually upload your checkpoints through the CLI interface."
)
else:
# upload all checkpoints from saving dir
if self._log_model:
wandb.save(os.path.join(checkpoint_dir, "*.ckpt"))
FromLogits
__init__(self, metric)
special
Wrap a pytorch lightning metric to accept logits input
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metric | Metric | The metric to wrap, e.g. pl.metrics.Accuracy | required |
Source code in slp/plbind/helpers.py
def __init__(self, metric: pl.metrics.Metric):
"""Wrap pytorch lighting metric to accept logits input
Args:
metric (pl.metrics.Metric): The metric to wrap, e.g. pl.metrics.Accuracy
"""
super(FromLogits, self).__init__(
compute_on_step=metric.compute_on_step,
dist_sync_on_step=metric.dist_sync_on_step,
process_group=metric.process_group,
dist_sync_fn=metric.dist_sync_fn,
)
self.metric = metric
compute(self)
Compute metric
Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: metric value |
Source code in slp/plbind/helpers.py
def compute(self) -> torch.Tensor:
"""Compute metric
Returns:
torch.Tensor: metric value
"""
return self.metric.compute() # type: ignore
update(self, preds, target)
Update underlying metric
Calculate softmax under the hood and pass probs to the underlying metric
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preds | Tensor | [B, *, num_classes] Logits | required |
target | Tensor | [B, *] Ground truths | required |
Source code in slp/plbind/helpers.py
def update(self, preds: torch.Tensor, target: torch.Tensor) -> None: # type: ignore
"""Update underlying metric
Calculate softmax under the hood and pass probs to the underlying metric
Args:
preds (torch.Tensor): [B, *, num_classes] Logits
target (torch.Tensor): [B, *] Ground truths
"""
preds = F.softmax(preds, dim=-1)
self.metric.update(preds, target) # type: ignore
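A minimal sketch of the wrapper in action, written against the pl.metrics API this module targets:

```python
import torch
import pytorch_lightning as pl

from slp.plbind.helpers import FromLogits

acc = FromLogits(pl.metrics.Accuracy())
logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])
targets = torch.tensor([0, 1])

acc.update(logits, targets)  # softmax is applied internally
print(acc.compute())         # tensor(1.)
```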
AutoEncoderPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(AutoEncoderPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_AutoEncoder,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
BertPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(BertPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_BertSequenceClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
MultimodalTransformerClassificationPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(MultimodalTransformerClassificationPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_MultimodalTransformerClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
PLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(PLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_Classification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
RnnPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(RnnPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_RnnClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
SimplePLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, predictor_cls=<class 'slp.plbind.module._Classification'>, calculate_perplexity=False)
special
Wraps a (model, optimizer, criterion, lr_scheduler) tuple in a LightningModule
Handles the boilerplate for metrics calculation and logging and defines the train_step / val_step / test_step with use of the predictor helper classes (e.g. _Classification, _RnnClassification)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | Module to use for prediction | required |
optimizer | Union[torch.optim.optimizer.Optimizer, List[torch.optim.optimizer.Optimizer]] | Optimizers to use for training | required |
criterion | Union[torch.nn.modules.module.Module, Callable] | Task loss | required |
lr_scheduler | Union[torch.optim.lr_scheduler._LRScheduler, List[torch.optim.lr_scheduler._LRScheduler]] | Learning rate scheduler. Defaults to None. | None |
hparams | Union[omegaconf.dictconfig.DictConfig, Dict[str, Any], argparse.Namespace] | Hyperparameter values. This ensures they are logged with trainer.loggers. Defaults to None. | None |
metrics | Optional[Dict[str, pytorch_lightning.metrics.metric.Metric]] | Metrics to track. Defaults to None. | None |
predictor_cls | [type] | Class that defines a parse_batch and a get_predictions_and_targets method. Defaults to _Classification. | <class 'slp.plbind.module._Classification'> |
calculate_perplexity | bool | Whether to calculate perplexity. Would be cleaner as a metric, but this is more efficient. Defaults to False. | False |
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
predictor_cls=_Classification,
calculate_perplexity: bool = False, # for LM. Dirty but much more efficient
):
"""Wraps a (model, optimizer, criterion, lr_scheduler) tuple in a LightningModule
Handles the boilerplate for metrics calculation and logging and defines the train_step / val_step / test_step
with use of the predictor helper classes (e.g. _Classification, _RnnClassification)
Args:
model (nn.Module): Module to use for prediction
optimizer (Union[Optimizer, List[Optimizer]]): Optimizers to use for training
criterion (LossType): Task loss
lr_scheduler (Union[_LRScheduler, List[_LRScheduler]], optional): Learning rate scheduler. Defaults to None.
hparams (Configuration, optional): Hyperparameter values. This ensures they are logged with trainer.loggers. Defaults to None.
metrics (Optional[Dict[str, pl.metrics.Metric]], optional): Metrics to track. Defaults to None.
predictor_cls ([type], optional): Class that defines a parse_batch and a
get_predictions_and_targets method. Defaults to _Classification.
calculate_perplexity (bool, optional): Whether to calculate perplexity.
Would be cleaner as a metric, but this is more efficient. Defaults to False.
"""
super(SimplePLModule, self).__init__()
self.calculate_perplexity = calculate_perplexity
self.model = model
self.optimizer = optimizer
self.lr_scheduler = lr_scheduler
self.criterion = criterion
if metrics is not None:
self.train_metrics = nn.ModuleDict(metrics)
self.val_metrics = nn.ModuleDict({k: v.clone() for k, v in metrics.items()})
self.test_metrics = nn.ModuleDict(
{k: v.clone() for k, v in metrics.items()}
)
else:
self.train_metrics = nn.ModuleDict(modules=None)
self.val_metrics = nn.ModuleDict(modules=None)
self.test_metrics = nn.ModuleDict(modules=None)
self.predictor = predictor_cls()
if hparams is not None:
if isinstance(hparams, Namespace):
dict_params = vars(hparams)
elif isinstance(hparams, DictConfig):
dict_params = cast(Dict[str, Any], OmegaConf.to_container(hparams))
else:
dict_params = hparams
# self.hparams = dict_params
self.save_hyperparameters(dict_params)
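A minimal sketch of wrapping a (model, optimizer, criterion) tuple with one of the concrete subclasses (PLModule, documented above); the toy network is illustrative:

```python
import pytorch_lightning as pl
import torch.nn as nn
from torch.optim import Adam

from slp.plbind.helpers import FromLogits
from slp.plbind.module import PLModule

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

lm = PLModule(
    model,
    optimizer,
    criterion,
    metrics={"acc": FromLogits(pl.metrics.Accuracy())},
)
# lm can now be passed to trainer.fit(lm, datamodule=dm)
```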
aggregate_epoch_metrics(self, outputs, mode='Training')
Aggregate metrics over a whole epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from train_step, validation_step or test_step | required |
mode | str | "Training", "Validation" or "Testing". Defaults to "Training". | 'Training' |
Source code in slp/plbind/module.py
def aggregate_epoch_metrics(self, outputs, mode="Training"):
"""Aggregate metrics over a whole epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from train_step, validation_step or test_step
mode (str, optional): "Training", "Validation" or "Testing". Defaults to "Training".
"""
def fmt(name):
"""Format metric name"""
return f"{name}" if name != "loss" else "train_loss"
keys = list(outputs[0].keys())
aggregated = {fmt(k): torch.stack([x[k] for x in outputs]).mean() for k in keys}
aggregated["epoch"] = self.current_epoch + 1
self.log_dict(aggregated, logger=True, prog_bar=False, on_epoch=True)
return aggregated
configure_optimizers(self)
Return optimizers and learning rate schedulers
Returns:
Type | Description |
---|---|
Tuple[List[Optimizer], List[_LRScheduler]] | (optimizers, lr_schedulers) |
Source code in slp/plbind/module.py
def configure_optimizers(self):
"""Return optimizers and learning rate schedulers
Returns:
Tuple[List[Optimizer], List[_LRScheduler]]: (optimizers, lr_schedulers)
"""
if self.lr_scheduler is not None:
scheduler = {
"scheduler": self.lr_scheduler,
"interval": "epoch",
"monitor": "val_loss",
}
return [self.optimizer], [scheduler]
return self.optimizer
forward(self, *args, **kwargs)
Call wrapped module forward
Source code in slp/plbind/module.py
def forward(self, *args, **kwargs):
"""Call wrapped module forward"""
return self.model(*args, **kwargs)
log_to_console(self, metrics, mode='Training')
Log metrics to console
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metrics | Dict[str, torch.Tensor] | Computed metrics | required |
mode | str | "Training", "Validation" or "Testing". Defaults to "Training". | 'Training' |
Source code in slp/plbind/module.py
def log_to_console(self, metrics, mode="Training"):
"""Log metrics to console
Args:
metrics (Dict[str, torch.Tensor]): Computed metrics
mode (str, optional): "Training", "Validation" or "Testing". Defaults to "Training".
"""
logger.info("Epoch {} {} results".format(self.current_epoch + 1, mode))
print_separator(symbol="-", n=50, print_fn=logger.info)
for name, value in metrics.items():
if name == "epoch":
continue
logger.info("{:<15} {:<15}".format(name, value))
print_separator(symbol="%", n=50, print_fn=logger.info)
test_epoch_end(self, outputs)
Aggregate metrics of a test epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from test_step | required |
Source code in slp/plbind/module.py
def test_epoch_end(self, outputs):
"""Aggregate metrics of a test epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from test_step
"""
outputs = self.aggregate_epoch_metrics(outputs, mode="Test")
self.log_to_console(outputs, mode="Test")
test_step(self, batch, batch_idx)
Compute loss for a single test step and log metrics to loggers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Tuple[torch.Tensor, ...] | Input batch | required |
batch_idx | int | Index of batch | required |
Returns:
Type | Description |
---|---|
Dict[str, torch.Tensor] | computed metrics |
Source code in slp/plbind/module.py
def test_step(self, batch, batch_idx):
"""Compute loss for a single test step and log metrics to loggers
Args:
batch (Tuple[torch.Tensor, ...]): Input batch
batch_idx (int): Index of batch
Returns:
Dict[str, torch.Tensor]: computed metrics
"""
y_hat, targets = self.predictor.get_predictions_and_targets(self, batch)
loss = self.criterion(y_hat, targets)
metrics = self._compute_metrics(
self.test_metrics, loss, y_hat, targets, mode="test"
)
return metrics
training_epoch_end(self, outputs)
Aggregate metrics of a training epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from train_step | required |
Source code in slp/plbind/module.py
def training_epoch_end(self, outputs):
"""Aggregate metrics of a training epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from train_step
"""
outputs = self.aggregate_epoch_metrics(outputs, mode="Training")
self.log_to_console(outputs, mode="Training")
training_step(self, batch, batch_idx)
Compute loss for a single training step and log metrics to loggers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Tuple[torch.Tensor, ...] | Input batch | required |
batch_idx | int | Index of batch | required |
Returns:
Type | Description |
---|---|
Dict[str, torch.Tensor] | computed metrics |
Source code in slp/plbind/module.py
def training_step(self, batch, batch_idx):
"""Compute loss for a single training step and log metrics to loggers
Args:
batch (Tuple[torch.Tensor, ...]): Input batch
batch_idx (int): Index of batch
Returns:
Dict[str, torch.Tensor]: computed metrics
"""
y_hat, targets = self.predictor.get_predictions_and_targets(self.model, batch)
loss = self.criterion(y_hat, targets)
metrics = self._compute_metrics(
self.train_metrics, loss, y_hat, targets, mode="train"
)
self.log_dict(
metrics,
on_step=True,
on_epoch=False,
logger=True,
prog_bar=False,
)
metrics["loss"] = loss
return metrics
validation_epoch_end(self, outputs)
Aggregate metrics of a validation epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from validation_step | required |
Source code in slp/plbind/module.py
def validation_epoch_end(self, outputs):
"""Aggregate metrics of a validation epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from validation_step
"""
outputs = self.aggregate_epoch_metrics(outputs, mode="Validation")
if torch.isnan(outputs["val_loss"]) or torch.isinf(outputs["val_loss"]):
outputs["val_loss"] = 1000000
outputs["best_score"] = min(
outputs[self.trainer.early_stopping_callback.monitor].detach().cpu(),
self.trainer.early_stopping_callback.best_score.detach().cpu(),
)
self.log_to_console(outputs, mode="Validation")
validation_step(self, batch, batch_idx)
Compute loss for a single validation step and log metrics to loggers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Tuple[torch.Tensor, ...] | Input batch | required |
batch_idx | int | Index of batch | required |
Returns:
Type | Description |
---|---|
Dict[str, torch.Tensor] | computed metrics |
Source code in slp/plbind/module.py
def validation_step(self, batch, batch_idx):
"""Compute loss for a single validation step and log metrics to loggers
Args:
batch (Tuple[torch.Tensor, ...]): Input batch
batch_idx (int): Index of batch
Returns:
Dict[str, torch.Tensor]: computed metrics
"""
y_hat, targets = self.predictor.get_predictions_and_targets(self, batch)
loss = self.criterion(y_hat, targets)
metrics = self._compute_metrics(
self.val_metrics, loss, y_hat, targets, mode="val"
)
metrics[
"best_score"
] = self.trainer.early_stopping_callback.best_score.detach().cpu()
return metrics
TransformerClassificationPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(TransformerClassificationPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_TransformerClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
TransformerPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(TransformerPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_Transformer,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
add_optimizer_args(parent_parser)
Augment parser with optimizer arguments
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | ArgumentParser | Parser created by the user | required |
Returns:
Type | Description |
---|---|
ArgumentParser | argparse.ArgumentParser: Augmented parser |
Source code in slp/plbind/trainer.py
def add_optimizer_args(
parent_parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
"""Augment parser with optimizer arguments
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
parser.add_argument(
"--optimizer",
dest="optimizer",
type=str,
choices=[
"Adam",
"AdamW",
"SGD",
"Adadelta",
"Adagrad",
"Adamax",
"ASGD",
"RMSprop",
],
default="Adam",
help="Which optimizer to use",
)
parser.add_argument(
"--lr",
dest="optim.lr",
type=float,
default=1e-3,
help="Learning rate",
)
parser.add_argument(
"--weight-decay",
dest="optim.weight_decay",
type=float,
default=0,
help="Learning rate",
)
parser.add_argument(
"--lr-scheduler",
dest="lr_scheduler",
action="store_true",
# type=str,
# choices=["ReduceLROnPlateau"],
help="Use learning rate scheduling. Currently only ReduceLROnPlateau is supported out of the box",
)
parser.add_argument(
"--lr-factor",
dest="lr_schedule.factor",
type=float,
default=0.1,
help="Multiplicative factor by which LR is reduced. Used if --lr-scheduler is provided.",
)
parser.add_argument(
"--lr-patience",
dest="lr_schedule.patience",
type=int,
default=10,
help="Number of epochs with no improvement after which learning rate will be reduced. Used if --lr-scheduler is provided.",
)
parser.add_argument(
"--lr-cooldown",
dest="lr_schedule.cooldown",
type=int,
default=0,
help="Number of epochs to wait before resuming normal operation after lr has been reduced. Used if --lr-scheduler is provided.",
)
parser.add_argument(
"--min-lr",
dest="lr_schedule.min_lr",
type=float,
default=0,
help="Minimum lr for LR scheduling. Used if --lr-scheduler is provided.",
)
return parser
add_trainer_args(parent_parser)
Augment parser with trainer arguments
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | ArgumentParser | Parser created by the user | required |
Returns:
Type | Description |
---|---|
ArgumentParser | argparse.ArgumentParser: Augmented parser |
Source code in slp/plbind/trainer.py
def add_trainer_args(parent_parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
"""Augment parser with trainer arguments
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
parser.add_argument(
"--seed",
dest="seed",
type=int,
default=None,
help="Seed for reproducibility",
)
parser.add_argument(
"--config",
dest="config",
type=str, # dir_path,
default=None,
help="Path to YAML configuration file",
)
parser.add_argument(
"--experiment-name",
dest="trainer.experiment_name",
type=str,
default="experiment",
help="Name of the running experiment",
)
parser.add_argument(
"--run-id",
dest="trainer.run_id",
type=str,
default=None,
help="Unique identifier for the current run. If not provided it is inferred from datetime.now()",
)
parser.add_argument(
"--experiment-group",
dest="trainer.experiment_group",
type=str,
default=None,
help="Group of current experiment. Useful when evaluating for different seeds / cross-validation etc.",
)
parser.add_argument(
"--experiments-folder",
dest="trainer.experiments_folder",
type=str,
default="experiments",
help="Top-level folder where experiment results & checkpoints are saved",
)
parser.add_argument(
"--save-top-k",
dest="trainer.save_top_k",
type=int,
default=3,
help="Save checkpoints for top k models",
)
parser.add_argument(
"--patience",
dest="trainer.patience",
type=int,
default=3,
help="Number of epochs to wait before early stopping",
)
parser.add_argument(
"--wandb-project",
dest="trainer.wandb_project",
type=str,
default=None,
help="Wandb project under which results are saved",
)
parser.add_argument(
"--tags",
dest="trainer.tags",
type=str,
nargs="*",
default=[],
help="Tags for current run to make results searchable.",
)
parser.add_argument(
"--stochastic_weight_avg",
dest="trainer.stochastic_weight_avg",
action="store_true",
help="Use Stochastic weight averaging.",
)
parser.add_argument(
"--gpus", dest="trainer.gpus", type=int, default=0, help="Number of GPUs to use"
)
parser.add_argument(
"--val-interval",
dest="trainer.check_val_every_n_epoch",
type=int,
default=1,
help="Run validation every n epochs",
)
parser.add_argument(
"--clip-grad-norm",
dest="trainer.gradient_clip_val",
type=float,
default=0,
help="Clip gradients with ||grad(w)|| >= args.clip_grad_norm",
)
parser.add_argument(
"--epochs",
dest="trainer.max_epochs",
type=int,
default=100,
help="Maximum number of training epochs",
)
parser.add_argument(
"--num-nodes",
dest="trainer.num_nodes",
type=int,
default=1,
help="Number of nodes to run",
)
parser.add_argument(
"--steps",
dest="trainer.max_steps",
type=int,
default=None,
help="Maximum number of training steps",
)
parser.add_argument(
"--tbtt_steps",
dest="trainer.truncated_bptt_steps",
type=int,
default=None,
help="Truncated Back-propagation-through-time steps.",
)
parser.add_argument(
"--debug",
dest="debug",
action="store_true",
help="If true, we run a full run on a small subset of the input data and overfit 10 training batches",
)
parser.add_argument(
"--offline",
dest="trainer.force_wandb_offline",
action="store_true",
help="If true, forces offline execution of wandb logger",
)
parser.add_argument(
"--early-stop-on",
dest="trainer.early_stop_on",
type=str,
default="val_loss",
help="Metric for early stopping",
)
parser.add_argument(
"--early-stop-mode",
dest="trainer.early_stop_mode",
type=str,
choices=["min", "max"],
default="min",
help="Minimize or maximize early stopping metric",
)
return parser
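A short sketch of composing both helpers on a single parser; the flag values are arbitrary, and the dotted dest values follow the definitions above:

```python
import argparse

from slp.plbind.trainer import add_optimizer_args, add_trainer_args

parser = argparse.ArgumentParser("experiment")
parser = add_optimizer_args(parser)
parser = add_trainer_args(parser)

args = parser.parse_args(["--optimizer", "AdamW", "--lr", "3e-4", "--gpus", "1"])
print(args.optimizer, getattr(args, "optim.lr"), getattr(args, "trainer.gpus"))
```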
make_trainer(experiment_name='experiment', experiment_description=None, run_id=None, experiment_group=None, experiments_folder='experiments', save_top_k=3, patience=3, wandb_project=None, wandb_user=None, force_wandb_offline=False, tags=None, stochastic_weight_avg=False, auto_scale_batch_size=False, gpus=0, check_val_every_n_epoch=1, gradient_clip_val=0, precision=32, num_nodes=1, max_epochs=100, max_steps=None, truncated_bptt_steps=None, fast_dev_run=None, overfit_batches=None, terminate_on_nan=False, profiler='simple', early_stop_on='val_loss', early_stop_mode='min')
Configure trainer with preferred defaults
- Experiment folder and run_id configured (based on datetime.now())
- Wandb and CSV loggers run by default
- Wandb configured to save code and checkpoints
- Wandb configured in online mode except if no internet connection is available
- Early stopping on best validation loss is configured by default
- Checkpointing on best validation loss is configured by default
Parameters:
Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | Experiment name. Defaults to "experiment". | 'experiment' |
experiment_description | Optional[str] | Detailed description of the experiment. Defaults to None. | None |
run_id | Optional[str] | Unique run_id. Defaults to datetime.now(). Defaults to None. | None |
experiment_group | Optional[str] | Group experiments over multiple runs. Defaults to None. | None |
experiments_folder | str | Folder to save outputs. Defaults to "experiments". | 'experiments' |
save_top_k | int | Save top k checkpoints. Defaults to 3. | 3 |
patience | int | Patience for early stopping. Defaults to 3. | 3 |
wandb_project | Optional[str] | Wandb project to save the experiment. Defaults to None. | None |
wandb_user | Optional[str] | Wandb username. Defaults to None. | None |
force_wandb_offline | bool | Force offline execution of wandb | False |
tags | Optional[Sequence] | Additional tags to attach to the experiment. Defaults to None. | None |
stochastic_weight_avg | bool | Use stochastic weight averaging. Defaults to False. | False |
auto_scale_batch_size | bool | Find optimal batch size for the available resources when running trainer.tune(). Defaults to False. | False |
gpus | int | Number of GPUs to use. Defaults to 0. | 0 |
check_val_every_n_epoch | int | Run validation every n epochs. Defaults to 1. | 1 |
gradient_clip_val | float | Clip gradient norm value. Defaults to 0 (no clipping). | 0 |
precision | int | Floating point precision. Defaults to 32. | 32 |
num_nodes | int | Number of nodes to run on | 1 |
max_epochs | Optional[int] | Maximum number of epochs for training. Defaults to 100. | 100 |
max_steps | Optional[int] | Maximum number of steps for training. Defaults to None. | None |
truncated_bptt_steps | Optional[int] | Truncated backpropagation through time: backprop is performed every k steps of a much longer sequence. Defaults to None. | None |
fast_dev_run | Optional[int] | Run training on a small number of batches for debugging. Defaults to None. | None |
overfit_batches | Optional[int] | Try to overfit a small number of batches for debugging. Defaults to None. | None |
terminate_on_nan | bool | Terminate on NaN gradients. Warning: this makes training slow. Defaults to False. | False |
profiler | Union[pytorch_lightning.profiler.profilers.BaseProfiler, bool, str] | Use profiler to track execution times of each function | 'simple' |
early_stop_on | str | Metric for early stopping | 'val_loss' |
early_stop_mode | str | "min" or "max" | 'min' |
Returns:
Type | Description |
---|---|
Trainer | pl.Trainer: Configured trainer |
Source code in slp/plbind/trainer.py
def make_trainer(
experiment_name: str = "experiment",
experiment_description: Optional[str] = None,
run_id: Optional[str] = None,
experiment_group: Optional[str] = None,
experiments_folder: str = "experiments",
save_top_k: int = 3,
patience: int = 3,
wandb_project: Optional[str] = None,
wandb_user: Optional[str] = None,
force_wandb_offline: bool = False,
tags: Optional[Sequence] = None,
stochastic_weight_avg: bool = False,
auto_scale_batch_size: bool = False,
gpus: int = 0,
check_val_every_n_epoch: int = 1,
gradient_clip_val: float = 0,
precision: int = 32,
num_nodes: int = 1,
max_epochs: Optional[int] = 100,
max_steps: Optional[int] = None,
truncated_bptt_steps: Optional[int] = None,
fast_dev_run: Optional[int] = None,
overfit_batches: Optional[int] = None,
terminate_on_nan: bool = False, # Be careful this makes training very slow for large models
profiler: Optional[Union[pl.profiler.BaseProfiler, bool, str]] = "simple",
early_stop_on: str = "val_loss",
early_stop_mode: str = "min",
) -> pl.Trainer:
"""Configure trainer with preferred defaults
* Experiment folder and run_id configured (based on datetime.now())
* Wandb and CSV loggers run by default
* Wandb configured to save code and checkpoints
* Wandb configured in online mode except if no internet connection is available
* Early stopping on best validation loss is configured by default
* Checkpointing on best validation loss is configured by default
Args:
experiment_name (str, optional): Experiment name. Defaults to "experiment".
experiment_description (Optional[str], optional): Detailed description of the experiment. Defaults to None.
run_id (Optional[str], optional): Unique run_id. Defaults to datetime.now(). Defaults to None.
experiment_group (Optional[str], optional): Group experiments over multiple runs. Defaults to None.
experiments_folder (str, optional): Folder to save outputs. Defaults to "experiments".
save_top_k (int, optional): Save top k checkpoints. Defaults to 3.
patience (int, optional): Patience for early stopping. Defaults to 3.
wandb_project (Optional[str], optional): Wandb project to save the experiment. Defaults to None.
wandb_user (Optional[str], optional): Wandb username. Defaults to None.
force_wandb_offline (bool): Force offline execution of wandb
tags (Optional[Sequence], optional): Additional tags to attach to the experiment. Defaults to None.
stochastic_weight_avg (bool, optional): Use stochastic weight averaging. Defaults to False.
auto_scale_batch_size (bool, optional): Find optimal batch size for the available resources when running
trainer.tune(). Defaults to False.
gpus (int, optional): number of GPUs to use. Defaults to 0.
check_val_every_n_epoch (int, optional): Run validation every n epochs. Defaults to 1.
gradient_clip_val (float, optional): Clip gradient norm value. Defaults to 0 (no clipping).
precision (int, optional): Floating point precision. Defaults to 32.
num_nodes (int): Number of nodes to run on
max_epochs (Optional[int], optional): Maximum number of epochs for training. Defaults to 100.
max_steps (Optional[int], optional): Maximum number of steps for training. Defaults to None.
truncated_bptt_steps (Optional[int], optional): Truncated backpropagation through time performs backprop every k steps of a much longer sequence. Defaults to None.
fast_dev_run (Optional[int], optional): Run training on a small number of batches for debugging. Defaults to None.
overfit_batches (Optional[int], optional): Try to overfit a small number of batches for debugging. Defaults to None.
terminate_on_nan (bool, optional): Terminate on NaN gradients. Warning this makes training slow. Defaults to False.
profiler (Optional[Union[pl.profiler.BaseProfiler, bool, str]]): Use profiler to track execution times of each function
early_stop_on (str): metric for early stopping
early_stop_mode (str): "min" or "max"
Returns:
pl.Trainer: Configured trainer
"""
if overfit_batches is not None:
trainer = pl.Trainer(overfit_batches=overfit_batches, gpus=gpus)
return trainer
if fast_dev_run is not None:
trainer = pl.Trainer(fast_dev_run=fast_dev_run, gpus=gpus)
return trainer
logging_dir = os.path.join(experiments_folder, experiment_name)
safe_mkdirs(logging_dir)
run_id = run_id if run_id is not None else date_fname()
if run_id in os.listdir(logging_dir):
logger.warning(
f"The run id you provided {run_id} already exists in {logging_dir}"
)
run_id = date_fname()
logger.info(f"Setting run_id={run_id}")
checkpoint_dir = os.path.join(logging_dir, run_id, "checkpoints")
logger.info(f"Logs will be saved in {logging_dir}")
logger.info(f"Logs will be saved in {checkpoint_dir}")
if wandb_project is None:
wandb_project = experiment_name
connected = has_internet_connection()
offline_run = force_wandb_offline or not connected
loggers = [
pl.loggers.CSVLogger(logging_dir, name="csv_logs", version=run_id),
FixedWandbLogger( # type: ignore
name=experiment_name,
project=wandb_project,
anonymous=False,
save_dir=logging_dir,
version=run_id,
save_code=True,
checkpoint_dir=checkpoint_dir,
offline=offline_run,
log_model=not offline_run,
entity=wandb_user,
group=experiment_group,
notes=experiment_description,
tags=tags,
),
]
if gpus > 1:
del loggers[
1
] # https://github.com/PyTorchLightning/pytorch-lightning/issues/6106
logger.info("Configured wandb and CSV loggers.")
logger.info(
f"Wandb configured to run {experiment_name}/{run_id} in project {wandb_project}"
)
if connected:
logger.info("Results will be stored online.")
else:
logger.info("Results will be stored offline due to bad internet connection.")
logger.info(
f"If you want to upload your results later run\n\t wandb sync {logging_dir}/wandb/run-{run_id}"
)
if experiment_description is not None:
logger.info(
f"Experiment verbose description:\n{experiment_description}\n\nTags:{'n/a' if tags is None else tags}"
)
callbacks = [
EarlyStoppingWithLogs(
monitor=early_stop_on,
mode=early_stop_mode,
patience=patience,
verbose=True,
),
pl.callbacks.ModelCheckpoint(
dirpath=checkpoint_dir,
filename="{epoch}-{val_loss:.2f}",
monitor=early_stop_on,
save_top_k=save_top_k,
mode=early_stop_mode,
),
pl.callbacks.LearningRateMonitor(logging_interval="step"),
]
logger.info("Configured Early stopping and Model checkpointing to track val_loss")
trainer = pl.Trainer(
default_root_dir=logging_dir,
gpus=gpus,
max_epochs=max_epochs,
max_steps=max_steps,
callbacks=callbacks,
logger=loggers,
check_val_every_n_epoch=check_val_every_n_epoch,
gradient_clip_val=gradient_clip_val,
auto_scale_batch_size=auto_scale_batch_size,
stochastic_weight_avg=stochastic_weight_avg,
precision=precision,
truncated_bptt_steps=truncated_bptt_steps,
terminate_on_nan=terminate_on_nan,
progress_bar_refresh_rate=10,
profiler=profiler,
num_nodes=num_nodes,
)
return trainer
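A minimal sketch of a typical call; all values are illustrative, and lm / dm refer to the module and datamodule constructed in the sketches above:

```python
from slp.plbind.trainer import make_trainer, watch_model

trainer = make_trainer(
    experiment_name="smoke-test",
    experiments_folder="experiments",
    gpus=0,
    max_epochs=5,
    patience=2,
)
watch_model(trainer, model)     # optional wandb gradient/weight tracking
trainer.fit(lm, datamodule=dm)
```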
make_trainer_for_ray_tune(patience=3, stochastic_weight_avg=False, gpus=0, gradient_clip_val=0, precision=32, max_epochs=100, max_steps=None, truncated_bptt_steps=None, terminate_on_nan=False, early_stop_on='val_loss', early_stop_mode='min', metrics_map=None, **extra_kwargs)
Configure trainer with preferred defaults
- Early stopping on best validation loss is configured by default
- Ray tune callback configured
Parameters:
Name | Type | Description | Default |
---|---|---|---|
patience | int | Patience for early stopping. Defaults to 3. | 3 |
stochastic_weight_avg | bool | Use stochastic weight averaging. Defaults to False. | False |
gpus | int | Number of GPUs to use. Defaults to 0. | 0 |
gradient_clip_val | float | Clip gradient norm value. Defaults to 0 (no clipping). | 0 |
precision | int | Floating point precision. Defaults to 32. | 32 |
max_epochs | Optional[int] | Maximum number of epochs for training. Defaults to 100. | 100 |
max_steps | Optional[int] | Maximum number of steps for training. Defaults to None. | None |
truncated_bptt_steps | Optional[int] | Truncated backpropagation through time: backprop is performed every k steps of a much longer sequence. Defaults to None. | None |
terminate_on_nan | bool | Terminate on NaN gradients. Warning: this makes training slow. Defaults to False. | False |
early_stop_on | str | Metric for early stopping | 'val_loss' |
early_stop_mode | str | "min" or "max" | 'min' |
metrics_map | Optional[Dict[str, str]] | The mapping from pytorch lightning logged metrics to ray tune metrics. The --tune-metric argument should be one of the keys of this mapping | None |
extra_kwargs | kwargs | Ignored. We use it so that we are able to pass the same config object as in make_trainer | {} |
Returns:
Type | Description |
---|---|
Trainer | pl.Trainer: Configured trainer |
Source code in slp/plbind/trainer.py
def make_trainer_for_ray_tune(
patience: int = 3,
stochastic_weight_avg: bool = False,
gpus: int = 0,
gradient_clip_val: float = 0,
precision: int = 32,
max_epochs: Optional[int] = 100,
max_steps: Optional[int] = None,
truncated_bptt_steps: Optional[int] = None,
terminate_on_nan: bool = False, # Be careful this makes training very slow for large models
early_stop_on: str = "val_loss",
early_stop_mode: str = "min",
metrics_map: Optional[Dict[str, str]] = None,
**extra_kwargs,
) -> pl.Trainer:
"""Configure trainer with preferred defaults
* Early stopping on best validation loss is configured by default
* Ray tune callback configured
Args:
patience (int, optional): Patience for early stopping. Defaults to 3.
stochastic_weight_avg (bool, optional): Use stochastic weight averaging. Defaults to False.
gpus (int, optional): number of GPUs to use. Defaults to 0.
gradient_clip_val (float, optional): Clip gradient norm value. Defaults to 0 (no clipping).
precision (int, optional): Floating point precision. Defaults to 32.
max_epochs (Optional[int], optional): Maximum number of epochs for training. Defaults to 100.
max_steps (Optional[int], optional): Maximum number of steps for training. Defaults to None.
truncated_bptt_steps (Optional[int], optional): Truncated backpropagation through time performs backprop every k steps of a much longer sequence. Defaults to None.
terminate_on_nan (bool, optional): Terminate on NaN gradients. Warning this makes training slow. Defaults to False.
early_stop_on (str): metric for early stopping
early_stop_mode (str): "min" or "max"
metrics_map (Optional[Dict[str, str]]): The mapping from pytorch lightning logged metrics
to ray tune metrics. The --tune-metric argument should be one of the keys of this
mapping
extra_kwargs (kwargs): Ignored. We use it so that we are able to pass the same config
object as in make_trainer
Returns:
pl.Trainer: Configured trainer
"""
if metrics_map is None:
raise ValueError("Need to pass metrics for TuneReportCallback")
callbacks = [
EarlyStoppingWithLogs(
monitor=early_stop_on,
mode=early_stop_mode,
patience=patience,
verbose=True,
),
TuneReportCallback(metrics_map, on="validation_end"),
pl.callbacks.LearningRateMonitor(logging_interval="step"),
]
logger.info("Configured Early stopping to track val_loss")
trainer = pl.Trainer(
gpus=gpus,
max_epochs=max_epochs,
max_steps=max_steps,
callbacks=callbacks,
logger=[],
check_val_every_n_epoch=1,
gradient_clip_val=gradient_clip_val,
stochastic_weight_avg=stochastic_weight_avg,
precision=precision,
truncated_bptt_steps=truncated_bptt_steps,
terminate_on_nan=terminate_on_nan,
progress_bar_refresh_rate=0,
num_sanity_val_steps=0,
auto_scale_batch_size=False,
)
return trainer
watch_model(trainer, model)
If wandb logger is configured track gradient and weight norms
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trainer | Trainer | Trainer | required |
model | Module | Module to watch | required |
Source code in slp/plbind/trainer.py
def watch_model(trainer: pl.Trainer, model: nn.Module) -> None:
"""If wandb logger is configured track gradient and weight norms
Args:
trainer (pl.Trainer): Trainer
model (nn.Module): Module to watch
"""
if trainer.num_gpus > 1:
return
if isinstance(trainer.logger.experiment, list):
for log in trainer.logger.experiment:
try:
log.watch(model, log="all")
logger.info("Tracking model weights & gradients in wandb.")
break
except Exception:
pass
else:
try:
trainer.logger.experiment.watch(model, log="all")
logger.info("Tracking model weights & gradients in wandb.")
except Exception:
pass
configure_logging(logfile_prefix=None)
configure_logging Configure loguru to intercept logging module logs, tqdm.writes and write to a logfile
We use loguru for stdout/stderr logging in this project. This function configures loguru to intercept logs from other modules that use the default python logging module, and to play well with writes from tqdm progress bars. If a logfile_prefix is provided, loguru will also write all logs into a logfile with a unique name constructed using logfile_prefix and datetime.now().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logfile_prefix | Optional[str] | Optional prefix to file where logs will be written. | None |
Returns:
Type | Description |
---|---|
Optional[str] | str: The logfile where logs are written |
Examples:
>>> configure_logging("logs/my-cool-experiment)
logs/my-cool-experiment.20210228-211832.log
Source code in slp/util/log.py
def configure_logging(logfile_prefix: Optional[str] = None) -> Optional[str]:
"""configure_logging Configure loguru to intercept logging module logs, tqdm.writes and write to a logfile
We use loguru for stdout/stderr logging in this project.
This function configures loguru to intercept logs from other modules that use the default python logging module.
It also configures loguru so that it plays well with writes in the tqdm progress bars.
If a logfile_prefix is provided, loguru will also write all logs into a logfile with a unique name constructed using
logfile_prefix and datetime.now().
Args:
logfile_prefix (Optional[str]): Optional prefix to file where logs will be written.
Returns:
str: The logfile where logs are written
Examples:
>>> configure_logging("logs/my-cool-experiment)
logs/my-cool-experiment.20210228-211832.log
"""
class InterceptHandler(logging.Handler):
def emit(self, record):
"""Intercept standard logging logs in loguru. Should test this for distributed pytorch lightning"""
# Get corresponding Loguru level if it exists
try:
level = logger.level(record.levelname).name
except ValueError:
level = record.levelno
# Find caller from where originated the logged message
frame, depth = logging.currentframe(), 2
while frame.f_code.co_filename == logging.__file__:
frame = frame.f_back
depth += 1
logger.opt(depth=depth, exception=record.exc_info).log(
level, record.getMessage()
)
logger.info("Intercepting standard logging logs in loguru")
# Make loguru play well with tqdm
logger.remove()
def tqdm_write(msg: str) -> Any:
"""Loguru wrapper for tqdm.write"""
return tqdm.write(msg, end="")
logger.add(tqdm_write, colorize=True)
logging.basicConfig(handlers=[InterceptHandler()], level=logging.INFO)
logfile = None
if logfile_prefix is not None:
logfile = log_to_file(logfile_prefix)
logger.info(f"Log file will be saved in {logfile}")
return logfile
log_to_file(fname_prefix)
log_to_file Configure loguru to log to a logfile
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname_prefix | Optional[str] | Optional prefix to file where logs will be written. | required |
Returns:
Type | Description |
---|---|
str | str: The logfile where logs are written |
Source code in slp/util/log.py
def log_to_file(fname_prefix: Optional[str]) -> str:
"""log_to_file Configure loguru to log to a logfile
Args:
fname_prefix (Optional[str]): Optional prefix to file where logs will be written.
Returns:
str: The logfile where logs are written
"""
logfile = f"{fname_prefix}.{date_fname()}.log"
logger.add(
logfile,
colorize=False,
level="DEBUG",
enqueue=True,
)
return logfile
NoOp
forward(self, x)
Defines the computation performed at every call. Should be overridden by all subclasses.

.. note:: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
Source code in slp/util/pytorch.py
def forward(self, x):
return x
PackSequence
__init__(self, batch_first=True)
special
Wrap sequence packing in nn.Module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_first | bool | Use batch first representation. Defaults to True. | True |
Source code in slp/util/pytorch.py
def __init__(self, batch_first: bool = True):
"""Wrap sequence packing in nn.Module
Args:
batch_first (bool, optional): Use batch first representation. Defaults to True.
"""
super(PackSequence, self).__init__()
self.batch_first = batch_first
forward(self, x, lengths)
Pack a padded sequence and sort lengths
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Padded tensor | required |
lengths | Tensor | Original lengths before padding | required |
Returns:
Type | Description |
---|---|
Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor] | (packed sequence, sorted lengths) |
Source code in slp/util/pytorch.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]:
"""Pack a padded sequence and sort lengths
Args:
x (torch.Tensor): Padded tensor
lengths (torch.Tensor): Original lengths before padding
Returns:
Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]: (packed sequence, sorted lengths)
"""
out: torch.nn.utils.rnn.PackedSequence = pack_padded_sequence(
x, lengths, batch_first=self.batch_first, enforce_sorted=False
)
lengths = lengths[out.sorted_indices]
return out, lengths
PadPackedSequence
__init__(self, batch_first=True, max_length=-1)
special
Wrap sequence padding in nn.Module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_first | bool | Use batch first representation. Defaults to True. | True |
max_length | int | Pad sequences to this total length. If -1, pad to the longest sequence in the batch. Defaults to -1. | -1 |
Source code in slp/util/pytorch.py
def __init__(self, batch_first: bool = True, max_length: int = -1):
"""Wrap sequence padding in nn.Module
Args:
batch_first (bool, optional): Use batch first representation. Defaults to True.
max_length (int, optional): Pad sequences to this total length. If -1, pad to the longest sequence in the batch. Defaults to -1.
"""
super(PadPackedSequence, self).__init__()
self.batch_first = batch_first
self.max_length = max_length if max_length > 0 else None
forward(self, x, lengths)
Convert packed sequence to padded sequence
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | PackedSequence | Packed sequence | required |
lengths | Tensor | Sorted original sequence lengths | required |
Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: Padded sequence |
Source code in slp/util/pytorch.py
def forward(
self, x: torch.nn.utils.rnn.PackedSequence, lengths: torch.Tensor
) -> torch.Tensor:
"""Convert packed sequence to padded sequence
Args:
x (torch.nn.utils.rnn.PackedSequence): Packed sequence
lengths (torch.Tensor): Sorted original sequence lengths
Returns:
torch.Tensor: Padded sequence
"""
out, _ = pad_packed_sequence(
x, batch_first=self.batch_first, total_length=self.max_length # type: ignore
)
return out # type: ignore
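A minimal round-trip sketch; the tensor shapes are illustrative, and an RNN over the packed sequence would normally sit between the two calls:

```python
import torch

from slp.util.pytorch import PackSequence, PadPackedSequence

pack = PackSequence(batch_first=True)
unpack = PadPackedSequence(batch_first=True)

x = torch.randn(3, 5, 8)            # (batch, max_len, features)
lengths = torch.tensor([5, 3, 2])

packed, sorted_lengths = pack(x, lengths)
# e.g. out, _ = rnn(packed)
padded = unpack(packed, sorted_lengths)  # back to (batch, max_len, features)
```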
from_checkpoint(checkpoint_file, obj, map_location='cpu', dataparallel=False)
Load model or optimizer from saved state_dict
Parameters:
Name | Type | Description | Default |
---|---|---|---|
checkpoint_file | Optional[str] | File containing the state dict | required |
obj | Union[torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer] | Module or optimizer instance to load the checkpoint | required |
map_location | Union[torch.device, str] | Where to load. Defaults to "cpu". | 'cpu' |
dataparallel | bool | If data parallel remove leading "module." from statedict keys. Defaults to False. | False |
Returns:
Type | Description |
---|---|
Union[torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer] | types.ModuleOrOptimizer: Loaded module or optimizer |
Source code in slp/util/pytorch.py
def from_checkpoint(
checkpoint_file: Optional[str],
obj: types.ModuleOrOptimizer,
map_location: Optional[types.Device] = "cpu",
dataparallel: bool = False,
) -> types.ModuleOrOptimizer:
"""Load model or optimizer from saved state_dict
Args:
checkpoint_file (Optional[str]): File containing the state dict
obj (types.ModuleOrOptimizer): Module or optimizer instance to load the checkpoint
map_location (Optional[types.Device], optional): Where to load. Defaults to "cpu".
dataparallel (bool, optional): If True, remove leading "module." from state_dict keys. Defaults to False.
Returns:
types.ModuleOrOptimizer: Loaded module or optimizer
"""
if checkpoint_file is None:
return obj
if not system.is_file(checkpoint_file):
logger.warning(
f"The checkpoint {checkpoint_file} you are trying to load "
"does not exist. Continuing without loading..."
)
return obj
state_dict = torch.load(checkpoint_file, map_location=map_location)
if dataparallel:
state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
obj.load_state_dict(state_dict)
return obj
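For example, to resume from a saved state dict if one exists (the checkpoint path below is hypothetical; if the file is missing, the model is returned unchanged with a warning):
>>> import torch.nn as nn
>>> model = nn.Linear(10, 2)
>>> model = from_checkpoint("experiments/best_model.pt", model, map_location="cpu")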
mktensor(data, dtype=torch.float32, device='cpu', requires_grad=False, copy_tensor=True)
Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set. This can copy data or make the operation in place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[numpy.ndarray, torch.Tensor, List[~T]] | Data to be converted to torch tensor. | required |
dtype | torch.dtype | The type of the tensor elements. | torch.float32 |
device | Union[torch.device, str] | Device where the tensor should be placed. | 'cpu' |
requires_grad | bool | Trainable tensor or not? | False |
copy_tensor | bool | If False, create the tensor in place; otherwise make a copy. | True |
Returns:
Type | Description |
---|---|
Tensor | A tensor of the given dtype, device and requires_grad containing data |
Source code in slp/util/pytorch.py
def mktensor(
data: types.NdTensor,
dtype: torch.dtype = torch.float,
device: types.Device = "cpu",
requires_grad: bool = False,
copy_tensor: bool = True,
) -> torch.Tensor:
"""Convert a list or numpy array to torch tensor. If a torch tensor
is passed it is cast to dtype, device and the requires_grad flag is
set. This can copy data or make the operation in place.
Args:
data: (list, np.ndarray, torch.Tensor): Data to be converted to
torch tensor.
dtype: (torch.dtype): The type of the tensor elements
(Default value = torch.float)
device: (torch.device, str): Device where the tensor should be
(Default value = 'cpu')
requires_grad: (bool): Trainable tensor or not? (Default value = False)
copy_tensor: (bool): If false creates the tensor inplace else makes a copy
(Default value = True)
Returns:
(torch.Tensor): A tensor of appropriate dtype, device and
requires_grad containing data
"""
tensor_factory = t if copy_tensor else t_
return tensor_factory(data, dtype=dtype, device=device, requires_grad=requires_grad)
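A quick sketch of typical usage, converting a numpy array to an integer tensor:
>>> import numpy as np
>>> mktensor(np.array([1, 2, 3]), dtype=torch.long)
tensor([1, 2, 3])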
moore_penrose_pinv(x, num_iter=6)
Calculate an approximate Moore-Penrose pseudoinverse via an iterative method
- The method is described in (Razavi et al., 2014) https://www.hindawi.com/journals/aaa/2014/563787/
- Implementation modified from lucidrains https://github.com/lucidrains/nystrom-attention/blob/main/nystrom_attention/nystrom_attention.py#L13
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | (*, M, M) The square tensors to invert. * can be any number of leading dimensions, e.g. (batch_size, num_heads, M, M) | required |
num_iter | int | Number of iterations to run for the approximation (6 is usually enough) | 6 |
Returns:
Type | Description |
---|---|
torch.Tensor | (*, M, M) The approximate Moore-Penrose pseudoinverse of x |
Source code in slp/util/pytorch.py
def moore_penrose_pinv(x, num_iter=6):
"""Calculate approximate Moore-Penrose pseudoinverse, via iterative method
* Method is described in (Razavi et al 2014) https://www.hindawi.com/journals/aaa/2014/563787/
* Implementation modified from lucidrains https://github.com/lucidrains/nystrom-attention/blob/main/nystrom_attention/nystrom_attention.py#L13
Args:
x (torch.Tensor): (*, M, M) The square tensors to inverse.
Dimension * can be any number of additional dimensions, e.g. (batch_size, num_heads, M, M)
num_iter (int): Number of iterations to run for approximation (6 is good enough usually)
Returns:
(torch.Tensor): (B, H, N, N) The approximate Moore-Penrose pseudoinverse of mat
"""
abs_x = torch.abs(x)
col = abs_x.sum(dim=-1)
row = abs_x.sum(dim=-2)
z = x.transpose(-1, -2).contiguous()
z = z / (torch.max(col) * torch.max(row))
I = torch.eye(x.shape[-1], device=x.device).unsqueeze(0)
for _ in range(num_iter):
xz = x @ z
z = 0.25 * z @ (13 * I - (xz @ (15 * I - (xz @ (7 * I - xz)))))
return z
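A sanity check on a well-conditioned matrix, where the iteration converges to the exact inverse (a minimal sketch):
>>> x = (2.0 * torch.eye(4)).unsqueeze(0)  # batch of one 4x4 matrix
>>> pinv = moore_penrose_pinv(x, num_iter=6)
>>> torch.allclose(x @ pinv, torch.eye(4).unsqueeze(0), atol=1e-4)
True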
pad_mask(lengths, max_length=None)
Generate mask for padded tokens
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lengths | Tensor | Original sequence lengths before padding | required |
max_length | Union[torch.Tensor, int] | Maximum sequence length. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tensor | Padding mask |
Source code in slp/util/pytorch.py
def pad_mask(
lengths: torch.Tensor, max_length: Optional[Union[torch.Tensor, int]] = None
) -> torch.Tensor:
"""Generate mask for padded tokens
Args:
lengths (torch.Tensor): Original sequence lengths before padding
max_length (Optional[Union[torch.Tensor, int]], optional): Maximum sequence length. Defaults to None.
Returns:
torch.Tensor: padding mask
"""
if max_length is None or max_length < 0:
max_length = cast(int, torch.max(lengths).item())
max_length = cast(int, max_length)
idx = torch.arange(0, max_length, device=lengths.device).unsqueeze(0)
mask: torch.Tensor = (idx < lengths.unsqueeze(1)).float()
return mask
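For example, three sequences of lengths 3, 1 and 2 yield:
>>> pad_mask(torch.tensor([3, 1, 2]))
tensor([[1., 1., 1.],
        [1., 0., 0.],
        [1., 1., 0.]])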
pad_sequence(sequences, batch_first=False, padding_value=0.0, max_length=-1)
Pad a list of variable length Tensors with padding_value
``pad_sequence`` stacks a list of Tensors along a new dimension and pads them to equal length. For example, if the input is a list of sequences with size ``L x *``, the output is of size ``T x B x *`` if batch_first is False, and ``B x T x *`` otherwise.
`B` is the batch size; it is equal to the number of elements in ``sequences``. `T` is the length of the longest sequence. `L` is the length of each sequence. `*` is any number of trailing dimensions, including none.
Examples:
>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])
!!! note
This function returns a Tensor of size ``T x B x *`` or ``B x T x *``, where `T` is the length of the longest sequence. It assumes the trailing dimensions and type of all the Tensors in sequences are the same.
Note:
This implementation is modified from torch.nn.utils.rnn.pad_sequence, to accept a max_length argument for fixed length padding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequences | List[torch.Tensor] | list of variable length sequences. | required |
batch_first | bool | output will be in ``B x T x *`` if True, or ``T x B x *`` otherwise | False |
padding_value | Union[float, int] | value for padded elements. Default: 0. | 0.0 |
max_length | int | If max_length is > 0, this function pads to a fixed maximum length. Any sequence longer than max_length is trimmed. | -1 |
Returns:
Type | Description |
---|---|
Tensor | Tensor of size ``T x B x *`` if batch_first is False, ``B x T x *`` otherwise |
Source code in slp/util/pytorch.py
def pad_sequence(
sequences: List[torch.Tensor],
batch_first: bool = False,
padding_value: Union[float, int] = 0.0,
max_length: int = -1,
):
r"""Pad a list of variable length Tensors with ``padding_value``
``pad_sequence`` stacks a list of Tensors along a new dimension,
and pads them to equal length. For example, if the input is a list of
sequences with size ``L x *``, the output is of size ``T x B x *`` if
batch_first is False, and ``B x T x *`` otherwise.
`B` is batch size. It is equal to the number of elements in ``sequences``.
`T` is length of the longest sequence.
`L` is length of the sequence.
`*` is any number of trailing dimensions, including none.
Example:
>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])
Note:
This function returns a Tensor of size ``T x B x *`` or ``B x T x *``
where `T` is the length of the longest sequence. This function assumes
trailing dimensions and type of all the Tensors in sequences are same.
Note:
This implementation is modified from torch.nn.utils.rnn.pad_sequence, to accept a
max_length argument for fixed length padding
Args:
sequences (list[Tensor]): list of variable length sequences.
batch_first (bool, optional): output will be in ``B x T x *`` if True, or in
``T x B x *`` otherwise
padding_value (float, optional): value for padded elements. Default: 0.
max_length (int): If max length is > 0 then this function will pad to a fixed maximum
length. If any sequence is longer than max_length, it will be trimmed.
Returns:
Tensor of size ``T x B x *`` if :attr:`batch_first` is ``False``.
Tensor of size ``B x T x *`` otherwise
"""
# assuming trailing dimensions and type of all the Tensors
# in sequences are same and fetching those from sequences[0]
max_size = sequences[0].size()
trailing_dims = max_size[1:]
if max_length < 0:
max_len = max([s.size(0) for s in sequences])
else:
max_len = max_length
if batch_first:
out_dims = (len(sequences), max_len) + trailing_dims
else:
out_dims = (max_len, len(sequences)) + trailing_dims
out_tensor = sequences[0].new_full(out_dims, padding_value)
for i, tensor in enumerate(sequences):
length = tensor.size(0)
# use index notation to prevent duplicate references to the tensor
if batch_first:
out_tensor[i, : min(length, max_len), ...] = tensor[
: min(length, max_len), ...
]
else:
out_tensor[: min(length, max_len), i, ...] = tensor[
: min(length, max_len), ...
]
return out_tensor
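The max_length argument is the main difference from the upstream implementation; it pads to a fixed length and trims longer sequences:
>>> a, b = torch.ones(5), torch.ones(2)
>>> pad_sequence([a, b], batch_first=True, max_length=3)
tensor([[1., 1., 1.],
        [1., 1., 0.]])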
repeat_layer(l, times)
Clone a layer multiple times
Parameters:
Name | Type | Description | Default |
---|---|---|---|
l | Module | nn.Module to clone | required |
times | int | Number of copies | required |
Returns:
Type | Description |
---|---|
List[torch.nn.modules.module.Module] | List of copies of the input layer (the first element is the original, the rest are deep copies) |
Source code in slp/util/pytorch.py
def repeat_layer(l: nn.Module, times: int) -> List[nn.Module]:
"""Clone a layer multiple times
Args:
l (nn.Module): nn.Module to stack
times (int): Times to clone
Returns:
List[nn.Module]: List of copies of the input layer (the first element is the original, the rest are deep copies)
"""
return [l] + [copy.deepcopy(l) for _ in range(times - 1)]
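For example, to stack identical encoder layers into an nn.ModuleList (a minimal sketch):
>>> import torch.nn as nn
>>> layers = nn.ModuleList(repeat_layer(nn.Linear(64, 64), 4))
>>> len(layers)
4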
rotate_tensor(l, n=1)
Rotate a tensor n positions to the left (the first n elements wrap around to the end)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
l | Tensor | input tensor | required |
n | int | positions to rotate. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
Tensor | Rotated tensor |
Source code in slp/util/pytorch.py
def rotate_tensor(l: torch.Tensor, n: int = 1) -> torch.Tensor:
"""Roate tensor by n positions to the right
Args:
l (torch.Tensor): input tensor
n (int, optional): positions to rotate. Defaults to 1.
Returns:
torch.Tensor: rotated tensor
"""
return torch.cat((l[n:], l[:n]))
shift_tensor(l, n=1)
Shift a tensor n positions to the left, zero-filling the vacated positions at the end
Parameters:
Name | Type | Description | Default |
---|---|---|---|
l | Tensor | input tensor | required |
n | int | positions to shift. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
Tensor | Shifted tensor |
Source code in slp/util/pytorch.py
def shift_tensor(l: torch.Tensor, n: int = 1) -> torch.Tensor:
"""Shift tensor by n positions
Args:
l (torch.Tensor): input tensor
n (int, optional): positions to shift. Defaults to 1.
Returns:
torch.Tensor: shifted tensor
"""
out = rotate_tensor(l, n=n)
out[-n:] = 0
return out
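A quick illustration of both helpers on the same tensor:
>>> l = torch.tensor([1, 2, 3, 4])
>>> rotate_tensor(l, n=1)
tensor([2, 3, 4, 1])
>>> shift_tensor(l, n=1)
tensor([2, 3, 4, 0])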
sort_sequences(inputs, lengths)
Sort sequences according to lengths (descending)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs | Tensor | input sequences, size [B, T, D] | required |
lengths | Tensor | length of each sequence, size [B] | required |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.Tensor]] | (sorted inputs, sorted lengths, function to revert inputs and lengths to their unsorted state) |
Source code in slp/util/pytorch.py
def sort_sequences(
inputs: torch.Tensor, lengths: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.Tensor]]:
"""Sort sequences according to lengths (descending)
Args:
inputs (torch.Tensor): input sequences, size [B, T, D]
lengths (torch.Tensor): length of each sequence, size [B]
Returns:
Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.tensor]]:
(sorted inputs, sorted lengths, function to revert inputs and lengths to unsorted state)
"""
lengths_sorted, sorted_idx = lengths.sort(descending=True)
_, unsorted_idx = sorted_idx.sort()
def unsort(tt: torch.Tensor) -> torch.Tensor:
"""Restore original unsorted sequence"""
return tt[unsorted_idx]
return inputs[sorted_idx], lengths_sorted, unsort
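The returned closure restores the original batch order, which is useful after running a packed RNN (a minimal sketch):
>>> inputs = torch.randn(3, 5, 8)  # [B, T, D]
>>> lengths = torch.tensor([2, 5, 3])
>>> sorted_inputs, sorted_lengths, unsort = sort_sequences(inputs, lengths)
>>> sorted_lengths
tensor([5, 3, 2])
>>> torch.equal(unsort(sorted_inputs), inputs)
True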
subsequent_mask(max_length)
Generate subsequent (lower triangular) mask for transformer autoregressive tasks
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_length | int | Maximum sequence length | required |
Returns:
Type | Description |
---|---|
Tensor | The subsequent mask |
Source code in slp/util/pytorch.py
def subsequent_mask(max_length: int) -> torch.Tensor:
"""Generate subsequent (lower triangular) mask for transformer autoregressive tasks
Args:
max_length (int): Maximum sequence length
Returns:
torch.Tensor: The subsequent mask
"""
mask = torch.ones(max_length, max_length)
# Ignore typecheck because pytorch types are incomplete
return mask.triu().t().unsqueeze(0).contiguous() # type: ignore
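For example, for a maximum length of 3, each position may only attend to itself and earlier positions:
>>> subsequent_mask(3)
tensor([[[1., 0., 0.],
         [1., 1., 0.],
         [1., 1., 1.]]])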
t(data, dtype=torch.float32, device='cpu', requires_grad=False)
Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set. This always copies data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[numpy.ndarray, torch.Tensor, List[~T]] | Data to be converted to torch tensor. | required |
dtype | torch.dtype | The type of the tensor elements. | torch.float32 |
device | Union[torch.device, str] | Device where the tensor should be placed. | 'cpu' |
requires_grad | bool | Trainable tensor or not? | False |
Returns:
Type | Description |
---|---|
Tensor | A tensor of the given dtype, device and requires_grad containing data |
Source code in slp/util/pytorch.py
def t(
data: types.NdTensor,
dtype: torch.dtype = torch.float,
device: types.Device = "cpu",
requires_grad: bool = False,
) -> torch.Tensor:
"""Convert a list or numpy array to torch tensor. If a torch tensor
is passed it is cast to dtype, device and the requires_grad flag is
set. This always copies data.
Args:
data: (list, np.ndarray, torch.Tensor): Data to be converted to
torch tensor.
dtype: (torch.dtype): The type of the tensor elements
(Default value = torch.float)
device: (torch.device, str): Device where the tensor should be
(Default value = 'cpu')
requires_grad: (bool): Trainable tensor or not? (Default value = False)
Returns:
(torch.Tensor): A tensor of appropriate dtype, device and
requires_grad containing data
"""
tt = torch.tensor(data, dtype=dtype, device=device, requires_grad=requires_grad)
return tt
t_(data, dtype=torch.float32, device='cpu', requires_grad=False)
Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set IN PLACE.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[numpy.ndarray, torch.Tensor, List[~T]] | Data to be converted to torch tensor. | required |
dtype | torch.dtype | The type of the tensor elements. | torch.float32 |
device | Union[torch.device, str] | Device where the tensor should be placed. | 'cpu' |
requires_grad | bool | Trainable tensor or not? | False |
Returns:
Type | Description |
---|---|
Tensor | A tensor of the given dtype, device and requires_grad containing data |
Source code in slp/util/pytorch.py
def t_(
data: types.NdTensor,
dtype: torch.dtype = torch.float,
device: Optional[types.Device] = "cpu",
requires_grad: bool = False,
) -> torch.Tensor:
"""Convert a list or numpy array to torch tensor. If a torch tensor
is passed it is cast to dtype, device and the requires_grad flag is
set IN PLACE.
Args:
data: (list, np.ndarray, torch.Tensor): Data to be converted to
torch tensor.
dtype: (torch.dtype): The type of the tensor elements
(Default value = torch.float)
device: (torch.device, str): Device where the tensor should be
(Default value = 'cpu')
requires_grad: (bool): Trainable tensor or not? (Default value = False)
Returns:
(torch.Tensor): A tensor of appropriate dtype, device and
requires_grad containing data
"""
if isinstance(device, str):
device = torch.device(device)
tt = torch.as_tensor(data, dtype=dtype, device=device).requires_grad_(requires_grad)
return tt
to_device(tt, device='cpu', non_blocking=False)
Send a tensor to a device
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tt | Tensor | input tensor | required |
device | Union[torch.device, str] | Output device. Defaults to "cpu". | 'cpu' |
non_blocking | bool | Use non-blocking memory transfer if True. Defaults to False. | False |
Returns:
Type | Description |
---|---|
Tensor | Tensor on the desired device |
Source code in slp/util/pytorch.py
def to_device(
tt: torch.Tensor, device: Optional[types.Device] = "cpu", non_blocking: bool = False
) -> torch.Tensor:
"""Send a tensor to a device
Args:
tt (torch.Tensor): input tensor
device (Optional[types.Device], optional): Output device. Defaults to "cpu".
non_blocking (bool, optional): Use non-blocking memory transfer if True. Defaults to False.
Returns:
torch.Tensor: Tensor in the desired device
"""
return tt.to(device, non_blocking=non_blocking)
date_fname()
date_fname Generate a filename based on datetime.now().
If multiple calls are made within the same second, the filename will not be unique. We could add milliseconds etc. to the filename, but that would hinder readability. For practical purposes, e.g. unique logs between different experiments, this should be enough. If we ever need a truly unique descriptor, there is the uuid module.
Returns:
Type | Description |
---|---|
str | A filename, e.g. 20210228-211832 |
Source code in slp/util/system.py
def date_fname() -> str:
"""date_fname Generate a filename based on datetime.now().
If multiple calls are made within the same second, the filename will not be unique.
We could add milliseconds etc. to the fname but that would hinder readability.
For practical purposes, e.g. unique logs between different experiments, this should be enough.
Either way, if we need a truly unique descriptor, there is the uuid module.
Returns:
str: A filename, e.g. 20210228-211832
"""
return datetime.now().strftime("%Y%m%d-%H%M%S")
download_url(url, dest_path)
download_url Download a file to a destination path given a URL
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | A url pointing to the file we want to download | required |
dest_path | str | The destination directory where the file will be written | required |
Returns:
Type | Description |
---|---|
str | The filename where the downloaded file is written |
Source code in slp/util/system.py
def download_url(url: str, dest_path: str) -> str:
"""download_url Download a file to a destination path given a URL
Args:
url (str): A url pointing to the file we want to download
dest_path (str): The destination directory where the file will be written
Returns:
(str): The filename where the downloaded file is written
"""
name = url.rsplit("/")[-1]
dest = os.path.join(dest_path, name)
safe_mkdirs(dest_path)
response = urllib.request.urlopen(url)
with open(dest, "wb") as fd:
shutil.copyfileobj(response, fd)
return dest
has_internet_connection(timeout=3)
has_internet_connection Check if you are connected to the internet
Check if internet connection exists by pinging Google DNS server
Host: 8.8.8.8 (google-public-dns-a.google.com) OpenPort: 53/tcp Service: domain (DNS/TCP)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | int | Seconds to wait before giving up | 3 |
Returns:
Type | Description |
---|---|
bool | True if connection is established, False if we are not connected to the internet |
Source code in slp/util/system.py
def has_internet_connection(timeout: int = 3) -> bool:
"""has_internet_connection Check if you are connected to the internet
Check if internet connection exists by pinging Google DNS server
Host: 8.8.8.8 (google-public-dns-a.google.com)
OpenPort: 53/tcp
Service: domain (DNS/TCP)
Args:
timeout (int): Seconds to wait before giving up
Returns:
bool: True if connection is established, False if we are not connected to the internet
"""
host, port = "8.8.8.8", 53
try:
socket.setdefaulttimeout(timeout)
socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port))
return True
except socket.error as ex:
print(ex)
return False
is_file(inp)
is_file Check if the provided string is valid file in the system path
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inp | Optional[str] | A potential file or None | required |
Returns:
Type | Description |
---|---|
Union[validators.utils.ValidationFailure, bool] | True if a valid file is provided, False if the file does not exist |
Examples:
>>> is_file("/bin/bash")
True
>>> is_file("/supercalifragilisticexpialidocious")  # This does not exist. I hope...
False
Source code in slp/util/system.py
def is_file(inp: Optional[str]) -> types.ValidationResult:
"""is_file Check if the provided string is valid file in the system path
Args:
inp (Optional[str]): A potential file or None
Returns:
types.ValidationResult: True if a valid file is provided, False if the file does not exist
Examples:
>>> is_file("/bin/bash")
True
>>> is_file("/supercalifragilisticexpialidocious") # This does not exist. I hope...
False
"""
if not inp:
return False
return os.path.isfile(inp)
is_subpath(child, parent)
is_subpath Check if child path is a subpath of parent
Parameters:
Name | Type | Description | Default |
---|---|---|---|
child | str | Child path | required |
parent | str | Parent path | required |
Returns:
Type | Description |
---|---|
bool | True if child is a subpath of parent, False otherwise |
Examples:
>>> is_subpath("/usr/bin/Xorg", "/usr")
True
Source code in slp/util/system.py
def is_subpath(child: str, parent: str) -> bool:
"""is_subpath Check if child path is a subpath of parent
Args:
child (str): Child path
parent (str): parent path
Returns:
bool: True if child is a subpath of parent, false if not
Examples:
>>> is_subpath("/usr/bin/Xorg", "/usr")
True
"""
parent = os.path.abspath(parent)
child = os.path.abspath(child)
return cast(
bool, os.path.commonpath([parent]) == os.path.commonpath([parent, child])
)
is_url(inp)
is_url Check if the provided string is a URL
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inp | Optional[str] | A potential link or None | required |
Returns:
Type | Description |
---|---|
Union[validators.utils.ValidationFailure, bool] | True if a valid url is provided, False if the string is not a url |
Examples:
>>> is_url("Hello World")
ValidationFailure(func=url, args={'value': 'Hello World', 'public': False})
>>> is_url("http://google.com")
True
Source code in slp/util/system.py
def is_url(inp: Optional[str]) -> types.ValidationResult:
"""is_url Check if the provided string is a URL
Args:
inp (Optional[str]): A potential link or None
Returns:
types.ValidationResult: True if a valid url is provided, False if the string is not a url
Examples:
>>> is_url("Hello World")
ValidationFailure(func=url, args={'value': 'Hello World', 'public': False})
>>> is_url("http://google.com")
True
"""
if not inp:
return False
return validators.url(inp)
json_dump(data, fname)
json_dump Save dict to a json file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[~K, ~V] | Dict to save | required |
fname | str | Output json file | required |
Source code in slp/util/system.py
def json_dump(data: types.GenericDict, fname: str) -> None:
"""json_dump Save dict to a json file
Args:
data (types.GenericDict): Dict to save
fname (str): Output json file
"""
with open(fname, "w") as fd:
json.dump(data, fd)
json_load(fname)
json_load Load dict from a json file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname | str | Json file to load | required |
Returns:
Type | Description |
---|---|
Dict[~K, ~V] | Dict of loaded data |
Source code in slp/util/system.py
def json_load(fname: str) -> types.GenericDict:
"""json_load Load dict from a json file
Args:
fname (str): Json file to load
Returns:
types.GenericDict: Dict of loaded data
"""
with open(fname, "r") as fd:
data = json.load(fd)
return cast(types.GenericDict, data)
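A round trip through json_dump and json_load (the /tmp path is illustrative):
>>> json_dump({"lr": 0.001, "bsz": 64}, "/tmp/config.json")
>>> json_load("/tmp/config.json")
{'lr': 0.001, 'bsz': 64}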
pickle_dump(data, fname)
pickle_dump Save data to pickle file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Any | Data to save | required |
fname | str | Output pickle file | required |
Source code in slp/util/system.py
def pickle_dump(data: Any, fname: str) -> None:
"""pickle_dump Save data to pickle file
Args:
data (Any): Data to save
fname (str): Output pickle file
"""
with open(fname, "wb") as fd:
pickle.dump(data, fd)
pickle_load(fname)
pickle_load Load data from pickle file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname | str | file name of pickle file | required |
Returns:
Type | Description |
---|---|
Any | Loaded data |
Source code in slp/util/system.py
def pickle_load(fname: str) -> Any:
"""pickle_load Load data from pickle file
Args:
fname (str): file name of pickle file
Returns:
Any: Loaded data
"""
with open(fname, "rb") as fd:
data = pickle.load(fd)
return data
print_separator(symbol='*', n=10, print_fn=<built-in function print>)
print_separator Print a repeated symbol as a separator
Parameters:
Name | Type | Description | Default |
---|---|---|---|
symbol | str | Symbol to print | '*' |
n | int | Number of times to print the symbol | 10 |
print_fn | Callable[[str], None] | Print function to use, e.g. print or logger.info | print |
Examples:
>>> print_separator(symbol="-", n=2)
--
Source code in slp/util/system.py
def print_separator(
symbol: str = "*", n: int = 10, print_fn: Callable[[str], None] = print
):
"""print_separator Print a repeated symbol as a separator
*********************************************************
Args:
symbol (str): Symbol to print
n (int): Number of times to print the symbol
print_fn (Callable[[str], None]): Print function to use, e.g. print or logger.info
Examples:
>>> print_separator(symbol="-", n=2)
--
"""
print_fn(symbol * n)
read_wav(wav_sample)
read_wav Reads a wav clip into a string and returns the hex string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wav_sample | str | Path to wav file | required |
Returns:
Type | Description |
---|---|
str | A hex string with the audio information. |
Source code in slp/util/system.py
def read_wav(wav_sample: str) -> str:
"""read_wav Reads a wav clip into a string and returns the hex string.
Args:
wav_sample (str): Path to wav file
Returns:
A hex string with the audio information.
"""
with open(wav_sample, "r") as wav_fd:
clip = wav_fd.read()
return clip
run_cmd(command)
run_cmd Run given shell command
Parameters:
Name | Type | Description | Default |
---|---|---|---|
command | str | Shell command to run | required |
Returns:
Type | Description |
---|---|
Tuple[int, str] | Status code, stdout of shell command |
Examples:
>>> run_cmd("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
Source code in slp/util/system.py
def run_cmd(command: str) -> Tuple[int, str]:
"""run_cmd Run given shell command
Args:
command (str): Shell command to run
Returns:
(int, str): Status code, stdout of shell command
Examples:
>>> run_cmd("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
"""
command = f'{os.getenv("SHELL")} -c "{command}"'
pipe = subprocess.Popen(
command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
)
stdout = ""
if pipe.stdout is not None:
stdout = "".join(
[line.decode("utf-8") for line in iter(pipe.stdout.readline, b"")]
)
pipe.stdout.close()
returncode = pipe.wait()
return returncode, stdout
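A smaller example than the directory listing above (assumes a POSIX shell is set in the SHELL environment variable):
>>> returncode, stdout = run_cmd("echo hello")
>>> returncode, stdout
(0, 'hello\n')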
run_cmd_silent(command)
run_cmd_silent Run command without printing to console
Parameters:
Name | Type | Description | Default |
---|---|---|---|
command | str | Shell command to run | required |
Returns:
Type | Description |
---|---|
Tuple[int, str] | Status code, stdout of shell command |
Examples:
>>> run_cmd_silent("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
Source code in slp/util/system.py
def run_cmd_silent(command: str) -> Tuple[int, str]:
"""run_cmd_silent Run command without printing to console
Args:
command (str): Shell command to run
Returns:
(int, str): Status code, stdout of shell command
Examples:
>>> run_cmd_silent("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
"""
return cast(Tuple[int, str], suppress_print(run_cmd)(command))
safe_mkdirs(path)
Makes recursively all the directories in input path
Utility function similar to mkdir -p. Makes directories recursively, if given path does not exist
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to mkdir -p | required |
Examples:
>>> safe_mkdirs("super/cali/fragi/listic/expi/ali/docious")
Source code in slp/util/system.py
def safe_mkdirs(path: str) -> None:
"""Makes recursively all the directories in input path
Utility function similar to mkdir -p. Makes directories recursively, if given path does not exist
Args:
path (str): Path to mkdir -p
Examples:
>>> safe_mkdirs("super/cali/fragi/listic/expi/ali/docious")
"""
if not os.path.exists(path):
try:
os.makedirs(path)
except Exception as e:
logger.warning(e)
raise IOError(f"Failed to create recursive directories: {path}")
suppress_print(func)
suppress_print Decorator to suppress stdout of decorated function
Examples:
>>> @slp.util.system.suppress_print
>>> def very_verbose_function(...): ...
Source code in slp/util/system.py
def suppress_print(func: Callable) -> Callable:
"""suppress_print Decorator to supress stdout of decorated function
Examples:
>>> @slp.util.system.timethis
>>> def very_verbose_function(...): ...
"""
def func_wrapper(*args: types.T, **kwargs: types.T):
"""Inner function for decorator closure"""
with open("/dev/null", "w") as sys.stdout:
ret = func(*args, **kwargs)
sys.stdout = sys.__stdout__
return ret
return cast(Callable, func_wrapper)
timethis(method=False)
Decorator to measure the time it takes for a function to complete
Examples:
>>> @slp.util.system.timethis()
>>> def time_consuming_function(...): ...
Source code in slp/util/system.py
def timethis(method=False) -> Callable:
"""Decorator to measure the time it takes for a function to complete
Examples:
>>> @slp.util.system.timethis()
>>> def time_consuming_function(...): ...
"""
def timethis_inner(func: Callable) -> Callable:
"""Inner function for decorator closure"""
@functools.wraps(func)
def timed(*args: types.T, **kwargs: types.T):
"""Inner function for decorator closure"""
ts = time.time()
result = func(*args, **kwargs)
te = time.time()
elapsed = f"{te - ts}"
if method:
logger.info(
"BENCHMARK: {cls}.{f}(*{a}, **{kw}) took: {t} sec".format(
f=func.__name__, cls=args[0], a=args[1:], kw=kwargs, t=elapsed
)
)
else:
logger.info(
"BENCHMARK: {f}(*{a}, **{kw}) took: {t} sec".format(
f=func.__name__, a=args, kw=kwargs, t=elapsed
)
)
return result
return cast(Callable, timed)
return timethis_inner
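Note that timethis is a decorator factory, so it should be called when applied. A minimal sketch (the logged timing will vary; assumes timethis is imported from slp.util.system):
>>> import time
>>> @timethis()
... def slow_add(a, b):
...     time.sleep(0.1)
...     return a + b
>>> slow_add(1, 2)  # logs: BENCHMARK: slow_add(*(1, 2), **{}) took: ... sec
3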
write_wav(byte_str, wav_file)
write_wav Write a hex string into a wav file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
byte_str | str | The hex string containing the audio data | required |
wav_file | str | The output wav file | required |
Source code in slp/util/system.py
def write_wav(byte_str: str, wav_file: str) -> None:
"""write_wav Write a hex string into a wav file
Args:
byte_str (str): The hex string containing the audio data
wav_file (str): The output wav file
"""
with open(wav_file, "w") as fd:
fd.write(byte_str)
yaml_dump(data, fname)
yaml_dump Save dict to a yaml file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[~K, ~V] | Dict to save | required |
fname | str | Output yaml file | required |
Source code in slp/util/system.py
def yaml_dump(data: types.GenericDict, fname: str) -> None:
"""yaml_dump Save dict to a yaml file
Args:
data (types.GenericDict): Dict to save
fname (str): Output yaml file
"""
with open(fname, "w") as fd:
yaml.dump(data, fd)
yaml_load(fname)
yaml_load Load dict from a yaml file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname | str | Yaml file to load | required |
Returns:
Type | Description |
---|---|
Dict[~K, ~V] | Dict of loaded data |
Source code in slp/util/system.py
def yaml_load(fname: str) -> types.GenericDict:
"""yaml_load Load dict from a yaml file
Args:
fname (str): Yaml file to load
Returns:
types.GenericDict: Dict of loaded data
"""
with open(fname, "r") as fd:
data = yaml.load(fd, Loader=yaml.SafeLoader)
return cast(types.GenericDict, data)
dir_path(path)
dir_path Type to use when parsing a path in argparse arguments
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | User provided path | required |
Exceptions:
Type | Description |
---|---|
argparse.ArgumentTypeError | Path does not exist, so argparse fails |
Returns:
Type | Description |
---|---|
str | User provided path |
Examples:
>>> from slp.util.types import dir_path
>>> import argparse
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--config", type=dir_path)
>>> parser.parse_args(args=["--config", "my_random_config_that_does_not_exist.yaml"])
Traceback (most recent call last):
argparse.ArgumentTypeError: User provided path 'my_random_config_that_does_not_exist.yaml' does not exist
Source code in slp/util/types.py
def dir_path(path):
"""dir_path Type to use when parsing a path in argparse arguments
Args:
path (str): User provided path
Raises:
argparse.ArgumentTypeError: Path does not exist, so argparse fails
Returns:
str: User provided path
Examples:
>>> from slp.util.types import dir_path
>>> import argparse
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--config", type=dir_path)
>>> parser.parse_args(args=["--config", "my_random_config_that_does_not_exist.yaml"])
Traceback (most recent call last):
argparse.ArgumentTypeError: User provided path 'my_random_config_that_does_not_exist.yaml' does not exist
"""
if os.path.isdir(path):
return path
raise argparse.ArgumentTypeError(f"User provided path '{path}' does not exist")