API reference
generate_example_config(parser, output_file, args=None)
generate_example_config Generate an example YAML configuration file from the parser defaults
During experimentation it is useful to start from a configuration file that contains every configurable value. This function parses the provided argument parser with include_none=True, so that all configurable values (including unset ones) appear in the output, and saves the resulting configuration to output_file. The generated file can then be edited and passed back to your script through the --config argument.
Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | The argument parser you want to use | required
output_file | str | Configuration file name or file descriptor to save example configuration | required
args | Optional[List[str]] | Optional sys.argv-style args, used instead of sys.argv[1:]. Intended for testing only. | None
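Examples:

A minimal usage sketch; the output file name and the model argument are illustrative:

>>> import argparse
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> generate_example_config(parser, "example_config.yaml")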
Source code in slp/config/config_parser.py
def generate_example_config(
parser: argparse.ArgumentParser,
output_file: str,
args: Optional[List[str]] = None,
) -> None:
"""parse_config Parse a provided YAML config file and command line args and merge them
During experimentation we want ideally to have a configuration file with the model and training configuration,
but also be able to run quick experiments using command line args.
This function allows you to double dip, by overriding values in a YAML config file through user provided command line arguments.
The precedence for merging is as follows
* default cli args values < config file values < user provided cli args
E.g.:
* if you don't include a value in your configuration it will take the default value from the argparse arguments
* if you provide a cli arg (e.g. run the script with --bsz 64) it will override the value in the config file
Note we use an extended OmegaConf istance to achieve this (see slp.config.omegaconf.OmegaConf)
Args:
parser (argparse.ArgumentParser): The argument parser you want to use
output_file (Union[str, IO]): Configuration file name or file descriptor to save example configuration
args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
Use this only for testing. By default it uses sys.argv[1:]
"""
config = parse_config(parser, None, include_none=True)
OmegaConf.save(config, output_file)
make_cli_parser(parser, datamodule_cls)
make_cli_parser Augment an argument parser for slp with the default arguments
Default arguments for training, logging, optimization etc. are added to the input parser. If you use make_cli_parser, the following command line arguments will be included:
usage: my_script.py [-h] [--hidden MODEL.INTERMEDIATE_HIDDEN]
[--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}]
[--lr OPTIM.LR] [--weight-decay OPTIM.WEIGHT_DECAY]
[--lr-scheduler] [--lr-factor LR_SCHEDULE.FACTOR]
[--lr-patience LR_SCHEDULE.PATIENCE]
[--lr-cooldown LR_SCHEDULE.COOLDOWN]
[--min-lr LR_SCHEDULE.MIN_LR] [--seed SEED] [--config CONFIG]
[--experiment-name TRAINER.EXPERIMENT_NAME]
[--run-id TRAINER.RUN_ID]
[--experiment-group TRAINER.EXPERIMENT_GROUP]
[--experiments-folder TRAINER.EXPERIMENTS_FOLDER]
[--save-top-k TRAINER.SAVE_TOP_K]
[--patience TRAINER.PATIENCE]
[--wandb-project TRAINER.WANDB_PROJECT]
[--tags [TRAINER.TAGS [TRAINER.TAGS ...]]]
[--stochastic_weight_avg] [--gpus TRAINER.GPUS]
[--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH]
[--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL]
[--epochs TRAINER.MAX_EPOCHS] [--steps TRAINER.MAX_STEPS]
[--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS] [--debug]
[--offline] [--early-stop-on TRAINER.EARLY_STOP_ON]
[--early-stop-mode {min,max}] [--num-trials TUNE.NUM_TRIALS]
[--gpus-per-trial TUNE.GPUS_PER_TRIAL]
[--cpus-per-trial TUNE.CPUS_PER_TRIAL]
[--tune-metric TUNE.METRIC] [--tune-mode {max,min}]
[--val-percent DATA.VAL_PERCENT]
[--test-percent DATA.TEST_PERCENT] [--bsz DATA.BATCH_SIZE]
[--bsz-eval DATA.BATCH_SIZE_EVAL]
[--num-workers DATA.NUM_WORKERS] [--no-pin-memory]
[--drop-last] [--no-shuffle-eval]
optional arguments:
-h, --help show this help message and exit
--hidden MODEL.INTERMEDIATE_HIDDEN
Intermediate hidden layers for linear module
--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}
Which optimizer to use
--lr OPTIM.LR Learning rate
--weight-decay OPTIM.WEIGHT_DECAY
Weight decay
--lr-scheduler Use learning rate scheduling. Currently only
ReduceLROnPlateau is supported out of the box
--lr-factor LR_SCHEDULE.FACTOR
Multiplicative factor by which LR is reduced. Used if
--lr-scheduler is provided.
--lr-patience LR_SCHEDULE.PATIENCE
Number of epochs with no improvement after which
learning rate will be reduced. Used if --lr-scheduler
is provided.
--lr-cooldown LR_SCHEDULE.COOLDOWN
Number of epochs to wait before resuming normal
operation after lr has been reduced. Used if --lr-
scheduler is provided.
--min-lr LR_SCHEDULE.MIN_LR
Minimum lr for LR scheduling. Used if --lr-scheduler
is provided.
--seed SEED Seed for reproducibility
--config CONFIG Path to YAML configuration file
--experiment-name TRAINER.EXPERIMENT_NAME
Name of the running experiment
--run-id TRAINER.RUN_ID
Unique identifier for the current run. If not provided
it is inferred from datetime.now()
--experiment-group TRAINER.EXPERIMENT_GROUP
Group of current experiment. Useful when evaluating
for different seeds / cross-validation etc.
--experiments-folder TRAINER.EXPERIMENTS_FOLDER
Top-level folder where experiment results &
checkpoints are saved
--save-top-k TRAINER.SAVE_TOP_K
Save checkpoints for top k models
--patience TRAINER.PATIENCE
Number of epochs to wait before early stopping
--wandb-project TRAINER.WANDB_PROJECT
Wandb project under which results are saved
--tags [TRAINER.TAGS [TRAINER.TAGS ...]]
Tags for current run to make results searchable.
--stochastic_weight_avg
Use Stochastic weight averaging.
--gpus TRAINER.GPUS Number of GPUs to use
--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH
Run validation every n epochs
--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL
Clip gradients with ||grad(w)|| >= args.clip_grad_norm
--epochs TRAINER.MAX_EPOCHS
Maximum number of training epochs
--steps TRAINER.MAX_STEPS
Maximum number of training steps
--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS
Truncated Back-propagation-through-time steps.
--debug If true, we run a full run on a small subset of the
input data and overfit 10 training batches
--offline If true, forces offline execution of wandb logger
--early-stop-on TRAINER.EARLY_STOP_ON
Metric for early stopping
--early-stop-mode {min,max}
Minimize or maximize early stopping metric
--num-trials TUNE.NUM_TRIALS
Number of trials to run for hyperparameter tuning
--gpus-per-trial TUNE.GPUS_PER_TRIAL
How many gpus to use for each trial. If gpus_per_trial
< 1 multiple trials are packed in the same gpu
--cpus-per-trial TUNE.CPUS_PER_TRIAL
How many cpus to use for each trial.
--tune-metric TUNE.METRIC
Tune this metric. Need to be one of the keys of
metrics_map passed into make_trainer_for_ray_tune.
--tune-mode {max,min}
Maximize or minimize metric
--val-percent DATA.VAL_PERCENT
Percent of validation data to be randomly split from
the training set, if no validation set is provided
--test-percent DATA.TEST_PERCENT
Percent of test data to be randomly split from the
training set, if no test set is provided
--bsz DATA.BATCH_SIZE
Training batch size
--bsz-eval DATA.BATCH_SIZE_EVAL
Evaluation batch size
--num-workers DATA.NUM_WORKERS
Number of workers to be used in the DataLoader
--no-pin-memory Don't pin data to GPU memory when transferring
--drop-last Drop last incomplete batch
--no-shuffle-eval Don't shuffle val & test sets
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | A parent argument parser to be augmented | required
datamodule_cls | LightningDataModule | A data module class that injects arguments through the add_argparse_args method | required

Returns:

Type | Description
---|---
ArgumentParser | argparse.ArgumentParser: The augmented command line parser
Examples:
>>> import argparse
>>> from slp.plbind.dm import PLDataModuleFromDatasets
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int) # Create parser with model arguments and anything else you need
>>> parser = make_cli_parser(parser, PLDataModuleFromDatasets)
>>> args = parser.parse_args(args=["--bsz", "64", "--lr", "0.01"])
>>> args.data.batch_size
64
>>> args.optim.lr
0.01
Source code in slp/config/config_parser.py
def make_cli_parser(
parser: argparse.ArgumentParser, datamodule_cls: pl.LightningDataModule
) -> argparse.ArgumentParser:
"""make_cli_parser Augment an argument parser for slp with the default arguments
Default arguments for training, logging, optimization etc. are added to the input {parser}.
If you use make_cli_parser, the following command line arguments will be included
usage: my_script.py [-h] [--hidden MODEL.INTERMEDIATE_HIDDEN]
[--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}]
[--lr OPTIM.LR] [--weight-decay OPTIM.WEIGHT_DECAY]
[--lr-scheduler] [--lr-factor LR_SCHEDULE.FACTOR]
[--lr-patience LR_SCHEDULE.PATIENCE]
[--lr-cooldown LR_SCHEDULE.COOLDOWN]
[--min-lr LR_SCHEDULE.MIN_LR] [--seed SEED] [--config CONFIG]
[--experiment-name TRAINER.EXPERIMENT_NAME]
[--run-id TRAINER.RUN_ID]
[--experiment-group TRAINER.EXPERIMENT_GROUP]
[--experiments-folder TRAINER.EXPERIMENTS_FOLDER]
[--save-top-k TRAINER.SAVE_TOP_K]
[--patience TRAINER.PATIENCE]
[--wandb-project TRAINER.WANDB_PROJECT]
[--tags [TRAINER.TAGS [TRAINER.TAGS ...]]]
[--stochastic_weight_avg] [--gpus TRAINER.GPUS]
[--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH]
[--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL]
[--epochs TRAINER.MAX_EPOCHS] [--steps TRAINER.MAX_STEPS]
[--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS] [--debug]
[--offline] [--early-stop-on TRAINER.EARLY_STOP_ON]
[--early-stop-mode {min,max}] [--num-trials TUNE.NUM_TRIALS]
[--gpus-per-trial TUNE.GPUS_PER_TRIAL]
[--cpus-per-trial TUNE.CPUS_PER_TRIAL]
[--tune-metric TUNE.METRIC] [--tune-mode {max,min}]
[--val-percent DATA.VAL_PERCENT]
[--test-percent DATA.TEST_PERCENT] [--bsz DATA.BATCH_SIZE]
[--bsz-eval DATA.BATCH_SIZE_EVAL]
[--num-workers DATA.NUM_WORKERS] [--no-pin-memory]
[--drop-last] [--no-shuffle-eval]
optional arguments:
-h, --help show this help message and exit
--hidden MODEL.INTERMEDIATE_HIDDEN
Intermediate hidden layers for linear module
--optimizer {Adam,AdamW,SGD,Adadelta,Adagrad,Adamax,ASGD,RMSprop}
Which optimizer to use
--lr OPTIM.LR Learning rate
--weight-decay OPTIM.WEIGHT_DECAY
Weight decay
--lr-scheduler Use learning rate scheduling. Currently only
ReduceLROnPlateau is supported out of the box
--lr-factor LR_SCHEDULE.FACTOR
Multiplicative factor by which LR is reduced. Used if
--lr-scheduler is provided.
--lr-patience LR_SCHEDULE.PATIENCE
Number of epochs with no improvement after which
learning rate will be reduced. Used if --lr-scheduler
is provided.
--lr-cooldown LR_SCHEDULE.COOLDOWN
Number of epochs to wait before resuming normal
operation after lr has been reduced. Used if --lr-
scheduler is provided.
--min-lr LR_SCHEDULE.MIN_LR
Minimum lr for LR scheduling. Used if --lr-scheduler
is provided.
--seed SEED Seed for reproducibility
--config CONFIG Path to YAML configuration file
--experiment-name TRAINER.EXPERIMENT_NAME
Name of the running experiment
--run-id TRAINER.RUN_ID
Unique identifier for the current run. If not provided
it is inferred from datetime.now()
--experiment-group TRAINER.EXPERIMENT_GROUP
Group of current experiment. Useful when evaluating
for different seeds / cross-validation etc.
--experiments-folder TRAINER.EXPERIMENTS_FOLDER
Top-level folder where experiment results &
checkpoints are saved
--save-top-k TRAINER.SAVE_TOP_K
Save checkpoints for top k models
--patience TRAINER.PATIENCE
Number of epochs to wait before early stopping
--wandb-project TRAINER.WANDB_PROJECT
Wandb project under which results are saved
--tags [TRAINER.TAGS [TRAINER.TAGS ...]]
Tags for current run to make results searchable.
--stochastic_weight_avg
Use Stochastic weight averaging.
--gpus TRAINER.GPUS Number of GPUs to use
--val-interval TRAINER.CHECK_VAL_EVERY_N_EPOCH
Run validation every n epochs
--clip-grad-norm TRAINER.GRADIENT_CLIP_VAL
Clip gradients with ||grad(w)|| >= args.clip_grad_norm
--epochs TRAINER.MAX_EPOCHS
Maximum number of training epochs
--steps TRAINER.MAX_STEPS
Maximum number of training steps
--tbtt_steps TRAINER.TRUNCATED_BPTT_STEPS
Truncated Back-propagation-through-time steps.
--debug If true, we run a full run on a small subset of the
input data and overfit 10 training batches
--offline If true, forces offline execution of wandb logger
--early-stop-on TRAINER.EARLY_STOP_ON
Metric for early stopping
--early-stop-mode {min,max}
Minimize or maximize early stopping metric
--num-trials TUNE.NUM_TRIALS
Number of trials to run for hyperparameter tuning
--gpus-per-trial TUNE.GPUS_PER_TRIAL
How many gpus to use for each trial. If gpus_per_trial
< 1 multiple trials are packed in the same gpu
--cpus-per-trial TUNE.CPUS_PER_TRIAL
How many cpus to use for each trial.
--tune-metric TUNE.METRIC
Tune this metric. Need to be one of the keys of
metrics_map passed into make_trainer_for_ray_tune.
--tune-mode {max,min}
Maximize or minimize metric
--val-percent DATA.VAL_PERCENT
Percent of validation data to be randomly split from
the training set, if no validation set is provided
--test-percent DATA.TEST_PERCENT
Percent of test data to be randomly split from the
training set, if no test set is provided
--bsz DATA.BATCH_SIZE
Training batch size
--bsz-eval DATA.BATCH_SIZE_EVAL
Evaluation batch size
--num-workers DATA.NUM_WORKERS
Number of workers to be used in the DataLoader
--no-pin-memory Don't pin data to GPU memory when transferring
--drop-last Drop last incomplete batch
--no-shuffle-eval Don't shuffle val & test sets
Args:
parser (argparse.ArgumentParser): A parent argument to be augmented
datamodule_cls (pytorch_lightning.LightningDataModule): A data module class that injects arguments through the add_argparse_args method
Returns:
argparse.ArgumentParser: The augmented command line parser
Examples:
>>> import argparse
>>> from slp.plbind.dm import PLDataModuleFromDatasets
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int) # Create parser with model arguments and anything else you need
>>> parser = make_cli_parser(parser, PLDataModuleFromDatasets)
>>> args = parser.parse_args(args=["--bsz", "64", "--lr", "0.01"])
>>> args.data.batch_size
64
>>> args.optim.lr
0.01
"""
parser = add_optimizer_args(parser)
parser = add_trainer_args(parser)
parser = add_tune_args(parser)
parser = datamodule_cls.add_argparse_args(parser)
return parser
parse_config(parser, config_file, args=None, include_none=False)
parse_config Parse a provided YAML config file and command line args and merge them
During experimentation we ideally want a configuration file with the model and training configuration, but we also want to be able to run quick experiments using command line args. This function lets you have both, by overriding values in a YAML config file with user-provided command line arguments.
The precedence for merging is as follows:
- default CLI arg values < config file values < user-provided CLI args
E.g.:
- if you don't include a value in your configuration, it takes the default value from the argparse arguments
- if you provide a CLI arg (e.g. run the script with --bsz 64), it overrides the value in the config file
Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | The argument parser you want to use | required
config_file | Union[str, IO] | Configuration file name or file descriptor | required
args | Optional[List[str]] | Optional sys.argv-style args, used instead of sys.argv[1:]. Intended for testing only. | None
include_none | bool | If True, keep entries whose value is None in the resulting configuration (used by generate_example_config) | False

Returns:

Type | Description
---|---
Union[omegaconf.listconfig.ListConfig, omegaconf.dictconfig.DictConfig] | OmegaConf.DictConfig: The parsed configuration as an OmegaConf DictConfig object
Examples:
>>> import io
>>> import argparse
>>> from slp.config.config_parser import parse_config
>>> mock_config_file = io.StringIO('''
model:
hidden: 100
''')
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 100}}
>>> type(cfg)
<class 'omegaconf.dictconfig.DictConfig'>
>>> cfg = parse_config(parser, mock_config_file, args=["--hidden", "200"])
{'model': {'hidden': 200}}
>>> mock_config_file = io.StringIO('''
random_value: hello
''')
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 20}, 'random_value': 'hello'}
Source code in slp/config/config_parser.py
def parse_config(
parser: argparse.ArgumentParser,
config_file: Optional[Union[str, IO]],
args: Optional[List[str]] = None,
include_none: bool = False,
) -> Union[ListConfig, DictConfig]:
"""parse_config Parse a provided YAML config file and command line args and merge them
During experimentation we want ideally to have a configuration file with the model and training configuration,
but also be able to run quick experiments using command line args.
This function allows you to double dip, by overriding values in a YAML config file through user provided command line arguments.
The precedence for merging is as follows
* default cli args values < config file values < user provided cli args
E.g.:
* if you don't include a value in your configuration it will take the default value from the argparse arguments
* if you provide a cli arg (e.g. run the script with --bsz 64) it will override the value in the config file
Note we use an extended OmegaConf instance to achieve this (see slp.config.omegaconf.OmegaConf)
Args:
parser (argparse.ArgumentParser): The argument parser you want to use
config_file (Union[str, IO]): Configuration file name or file descriptor
args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
Use this only for testing. By default it uses sys.argv[1:]
Returns:
OmegaConf.DictConfig: The parsed configuration as an OmegaConf DictConfig object
Examples:
>>> import io
>>> from slp.config.config_parser import parse_config
>>> mock_config_file = io.StringIO('''
model:
hidden: 100
''')
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 100}}
>>> type(cfg)
<class 'omegaconf.dictconfig.DictConfig'>
>>> cfg = parse_config(parser, mock_config_file, args=["--hidden", "200"])
{'model': {'hidden': 200}}
>>> mock_config_file = io.StringIO('''
random_value: hello
''')
>>> cfg = parse_config(parser, mock_config_file)
{'model': {'hidden': 20}, 'random_value': 'hello'}
"""
# Merge Configurations Precedence: default kwarg values < default argparse values < config file values < user provided CLI args values
if config_file is not None:
dict_config = OmegaConf.from_yaml(config_file) # type: ignore
else:
dict_config = OmegaConf.create({})
user_cli, default_cli = OmegaConf.from_argparse(parser, include_none=include_none)
config = OmegaConf.merge(default_cli, dict_config, user_cli)
logger.info("Running with the following configuration")
logger.info(f"\n{OmegaConf.to_yaml(config)}")
return config
SPECIAL_TOKENS
SPECIAL_TOKENS Special tokens for NLP applications
Default special token values and indices (compatible with BERT):
* [PAD]: 0
* [MASK]: 1
* [UNK]: 2
* [BOS]: 3
* [EOS]: 4
* [CLS]: 5
* [SEP]: 6
* [PAUSE]: 7
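A sketch of typical access patterns, assuming SPECIAL_TOKENS is an Enum whose member values are the token strings (as usages like SPECIAL_TOKENS.PAD.value elsewhere in this reference suggest):

>>> from slp.config.nlp import SPECIAL_TOKENS
>>> SPECIAL_TOKENS.PAD.value
'[PAD]'
>>> [t.value for t in SPECIAL_TOKENS]
['[PAD]', '[MASK]', '[UNK]', '[BOS]', '[EOS]', '[CLS]', '[SEP]', '[PAUSE]']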
OmegaConfExtended
OmegaConfExtended Extended OmegaConf class, to include argparse-style CLI arguments
Unfortunately, the original authors are not interested in providing integration with argparse (https://github.com/omry/omegaconf/issues/569), so we get by with this extension.
from_argparse(parser, args=None, include_none=False)
staticmethod
from_argparse Static method to convert argparse arguments into OmegaConf DictConfig objects
We parse the command line arguments and separate the user provided values and the default values. This is useful for merging with a config file.
Parameters:

Name | Type | Description | Default
---|---|---|---
parser | ArgumentParser | Parser for argparse arguments | required
args | Optional[List[str]] | Optional sys.argv-style args, used instead of sys.argv[1:]. Intended for testing only. | None
include_none | bool | If True, keep entries whose value is None in the resulting configs | False

Returns:

Type | Description
---|---
Tuple[omegaconf.dictconfig.DictConfig, omegaconf.dictconfig.DictConfig] | Tuple[omegaconf.DictConfig, omegaconf.DictConfig]: (user provided cli args, default cli args) as a tuple of omegaconf.DictConfigs
Examples:
>>> import argparse
>>> from slp.config.omegaconf import OmegaConfExtended
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser, args=["--hidden", "100"])
>>> user_provided_args
{'model': {'hidden': 100}}
>>> default_args
{}
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser)
>>> user_provided_args
{}
>>> default_args
{'model': {'hidden': 20}}
Source code in slp/config/omegaconf.py
@staticmethod
def from_argparse(
parser: argparse.ArgumentParser,
args: Optional[List[str]] = None,
include_none: bool = False,
) -> Tuple[DictConfig, DictConfig]:
"""from_argparse Static method to convert argparse arguments into OmegaConf DictConfig objects
We parse the command line arguments and separate the user provided values and the default values.
This is useful for merging with a config file.
Args:
parser (argparse.ArgumentParser): Parser for argparse arguments
args (Optional[List[str]]): Optional input sys.argv style args. Useful for testing.
Use this only for testing. By default it uses sys.argv[1:]
Returns:
Tuple[omegaconf.DictConfig, omegaconf.DictConfig]: (user provided cli args, default cli args) as a tuple of omegaconf.DictConfigs
Examples:
>>> import argparse
>>> from slp.config.omegaconf import OmegaConfExtended
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--hidden", dest="model.hidden", type=int, default=20)
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser, args=["--hidden", "100"])
>>> user_provided_args
{'model': {'hidden': 100}}
>>> default_args
{}
>>> user_provided_args, default_args = OmegaConfExtended.from_argparse(parser)
>>> user_provided_args
{}
>>> default_args
{'model': {'hidden': 20}}
"""
dest_to_arg = {v.dest: k for k, v in parser._option_string_actions.items()}
all_args = vars(parser.parse_args(args=args))
provided_args = {}
default_args = {}
for k, v in all_args.items():
if dest_to_arg[k] in sys.argv:
provided_args[k] = v
else:
default_args[k] = v
provided = OmegaConf.create(_nest(provided_args, include_none=include_none))
defaults = OmegaConf.create(_nest(default_args, include_none=include_none))
return provided, defaults
from_yaml(file_)
staticmethod
Alias for OmegaConf.load. OmegaConf.from_yaml was removed from omegaconf at some point; this brings it back.
Parameters:

Name | Type | Description | Default
---|---|---|---
file_ | Union[str, pathlib.Path, IO[Any]] | File to load or file descriptor | required

Returns:

Type | Description
---|---
Union[omegaconf.dictconfig.DictConfig, omegaconf.listconfig.ListConfig] | Union[DictConfig, ListConfig]: The loaded configuration
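Examples:

A minimal sketch; the file name is hypothetical:

>>> from slp.config.omegaconf import OmegaConfExtended
>>> cfg = OmegaConfExtended.from_yaml("my_config.yaml")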
Source code in slp/config/omegaconf.py
@staticmethod
def from_yaml(
file_: Union[str, pathlib.Path, IO[Any]]
) -> Union[DictConfig, ListConfig]:
"""Alias for OmegaConf.load
OmegaConf.from_yaml got removed at some point. Bring it back
Args:
file_ (Union[str, pathlib.Path, IO[Any]]): file to load or file descriptor
Returns:
Union[DictConfig, ListConfig]: The loaded configuration
"""
return OmegaConfExtended.load(file_)
MultimodalSequenceClassificationCollator
__call__(self, batch)
special
Call collate function
Parameters:

Name | Type | Description | Default
---|---|---|---
batch | List[Dict[str, torch.Tensor]] | Batch of samples. It expects a list of dictionaries from modalities to torch tensors | required

Returns:

Type | Description
---|---
Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]] | Tuple of (dict of batched modality tensors, labels, dict of modality sequence lengths)
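Examples:

A minimal sketch of the expected input and output under default settings; the modality feature dimensions below are made up:

>>> import torch
>>> collate_fn = MultimodalSequenceClassificationCollator()
>>> batch = [
...     {"text": torch.rand(5, 300), "audio": torch.rand(5, 74), "visual": torch.rand(5, 35), "label": 1},
...     {"text": torch.rand(3, 300), "audio": torch.rand(3, 74), "visual": torch.rand(3, 35), "label": 0},
... ]
>>> inputs, labels, lengths = collate_fn(batch)
>>> inputs["text"].shape  # (batch_size, max_seq_len, feature_dim)
torch.Size([2, 5, 300])
>>> lengths["text"]
tensor([5, 3])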
Source code in slp/data/collators.py
def __call__(
self, batch: List[Dict[str, torch.Tensor]]
) -> Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]]:
"""Call collate function
Args:
batch (List[Dict[str, torch.Tensor]]): Batch of samples.
It expects a list of dictionaries from modalities to torch tensors
Returns:
Tuple[Dict[str, torch.Tensor], torch.Tensor, Dict[str, torch.Tensor]]: tuple of
(dict batched modality tensors, labels, dict of modality sequence lengths)
"""
inputs = {}
lengths = {}
for m in self.modalities:
seq = self.extract_sequence(batch, m)
lengths[m] = torch.tensor([s.size(0) for s in seq], device=self.device)
if self.max_length > 0:
lengths[m] = torch.clamp(lengths[m], min=0, max=self.max_length)
inputs[m] = pad_sequence(
seq,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
targets: List[Label] = [b[self.label_key] for b in batch]
# Pad and convert to tensor
ttargets: torch.Tensor = mktensor(
targets, device=self.device, dtype=self.label_dtype
)
return inputs, ttargets.to(self.device), lengths
__init__(self, pad_indx=0, modalities={'audio', 'visual', 'text'}, label_key='label', max_length=-1, label_dtype=torch.float32, device='cpu')
special
Collate function for sequence classification tasks
- Perform padding
- Calculate sequence lengths
Parameters:

Name | Type | Description | Default
---|---|---|---
pad_indx | int | Pad token index. Defaults to 0. | 0
modalities | Set | Which modalities are included in the batch dict | {'audio', 'visual', 'text'}
max_length | int | Pad sequences to a fixed maximum length | -1
label_key | str | String to access the label in the batch dict | 'label'
label_dtype | torch.dtype | dtype of the returned label tensor | torch.float32
device | str | Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion. | 'cpu'
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=MultimodalSequenceClassificationCollator())
Source code in slp/data/collators.py
def __init__(
self,
pad_indx=0,
modalities={"visual", "text", "audio"},
label_key="label",
max_length=-1,
label_dtype=torch.float,
device="cpu",
):
"""Collate function for sequence classification tasks
* Perform padding
* Calculate sequence lengths
Args:
pad_indx (int): Pad token index. Defaults to 0.
modalities (Set): Which modalities are included in the batch dict
max_length (int): Pad sequences to a fixed maximum length
label_key (str): String to access the label in the batch dict
device (str): device of returned tensors. Leave this as "cpu".
The LightningModule will handle the Conversion.
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=MultimodalSequenceClassificationCollator())
"""
self.pad_indx = pad_indx
self.device = device
self.max_length = max_length
self.label_key = label_key
self.modalities = modalities
self.label_dtype = label_dtype
Seq2SeqCollator
__call__(self, batch)
special
Call collate function
Parameters:

Name | Type | Description | Default
---|---|---|---
batch | List[Tuple[torch.Tensor, torch.Tensor]] | Batch of samples. It expects a list of tuples (source, target), where each source and target is a sequence of features or ids. | required

Returns:

Type | Description
---|---
Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor] | Tuple of batched tensors (inputs, labels, lengths_inputs, lengths_targets)
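Examples:

A minimal sketch under default settings (pad_indx=0):

>>> import torch
>>> collate_fn = Seq2SeqCollator()
>>> batch = [
...     (torch.tensor([1, 2, 3]), torch.tensor([4, 5])),
...     (torch.tensor([6]), torch.tensor([7, 8, 9])),
... ]
>>> inputs, targets, lengths_inputs, lengths_targets = collate_fn(batch)
>>> inputs
tensor([[1, 2, 3],
        [6, 0, 0]])
>>> lengths_targets
tensor([2, 3])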
Source code in slp/data/collators.py
def __call__(
self, batch: List[Tuple[torch.Tensor, torch.Tensor]]
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
"""Call collate function
Args:
batch (List[Tuple[torch.Tensor, torch.Tensor]]): Batch of samples.
It expects a list of tuples (source, target)
Each source and target are a sequences of features or ids.
Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]: Returns tuple of batched tensors
(inputs, labels, lengths_inputs, lengths_targets)
"""
inputs: List[torch.Tensor] = [b[0] for b in batch]
targets: List[torch.Tensor] = [b[1] for b in batch]
lengths_inputs = torch.tensor([s.size(0) for s in inputs], device=self.device)
lengths_targets = torch.tensor([s.size(0) for s in targets], device=self.device)
if self.max_length > 0:
lengths_inputs = torch.clamp(lengths_inputs, min=0, max=self.max_length)
lengths_targets = torch.clamp(lengths_targets, min=0, max=self.max_length)
inputs_padded: torch.Tensor = pad_sequence(
inputs,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
targets_padded: torch.Tensor = pad_sequence(
targets,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
return inputs_padded, targets_padded, lengths_inputs, lengths_targets
__init__(self, pad_indx=0, max_length=-1, device='cpu')
special
Collate function for seq2seq tasks
- Perform padding
- Calculate sequence lengths
Parameters:

Name | Type | Description | Default
---|---|---|---
pad_indx | int | Pad token index. Defaults to 0. | 0
max_length | int | Pad sequences to a fixed maximum length | -1
device | str | Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion. | 'cpu'
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=Seq2SeqCollator())
Source code in slp/data/collators.py
def __init__(self, pad_indx=0, max_length=-1, device="cpu"):
"""Collate function for seq2seq tasks
* Perform padding
* Calculate sequence lengths
Args:
pad_indx (int): Pad token index. Defaults to 0.
max_length (int): Pad sequences to a fixed maximum length
device (str): device of returned tensors. Leave this as "cpu".
The LightningModule will handle the Conversion.
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=Seq2SeqCollator())
"""
self.pad_indx = pad_indx
self.max_length = max_length
self.device = device
SequenceClassificationCollator
__call__(self, batch)
special
Call collate function
Parameters:

Name | Type | Description | Default
---|---|---|---
batch | List[Tuple[torch.Tensor, Union[numpy.ndarray, torch.Tensor, List[~T], int]]] | Batch of samples. It expects a list of tuples (inputs, label). | required

Returns:

Type | Description
---|---
Tuple[torch.Tensor, torch.Tensor, torch.Tensor] | Tuple of batched tensors (inputs, labels, lengths)
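Examples:

A minimal sketch under default settings (pad_indx=0):

>>> import torch
>>> collate_fn = SequenceClassificationCollator()
>>> batch = [(torch.tensor([1, 2, 3]), 0), (torch.tensor([4, 5]), 1)]
>>> inputs, labels, lengths = collate_fn(batch)
>>> inputs
tensor([[1, 2, 3],
        [4, 5, 0]])
>>> labels
tensor([0, 1])
>>> lengths
tensor([3, 2])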
Source code in slp/data/collators.py
def __call__(
self, batch: List[Tuple[torch.Tensor, Label]]
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
"""Call collate function
Args:
batch (List[Tuple[torch.Tensor, slp.util.types.Label]]): Batch of samples.
It expects a list of tuples (inputs, label).
Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Returns tuple of batched tensors (inputs, labels, lengths)
"""
inputs: List[torch.Tensor] = [b[0] for b in batch]
targets: List[Label] = [b[1] for b in batch]
# targets: List[torch.tensor] = map(list, zip(*batch))
lengths = torch.tensor([s.size(0) for s in inputs], device=self.device)
if self.max_length > 0:
lengths = torch.clamp(lengths, min=0, max=self.max_length)
# Pad and convert to tensor
inputs_padded: torch.Tensor = pad_sequence(
inputs,
batch_first=True,
padding_value=self.pad_indx,
max_length=self.max_length,
).to(self.device)
ttargets: torch.Tensor = mktensor(targets, device=self.device, dtype=torch.long)
return inputs_padded, ttargets.to(self.device), lengths
__init__(self, pad_indx=0, max_length=-1, device='cpu')
special
Collate function for sequence classification tasks
- Perform padding
- Calculate sequence lengths
Parameters:

Name | Type | Description | Default
---|---|---|---
pad_indx | int | Pad token index. Defaults to 0. | 0
max_length | int | Pad sequences to a fixed maximum length | -1
device | str | Device of returned tensors. Leave this as "cpu"; the LightningModule will handle the conversion. | 'cpu'
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=SequenceClassificationCollator())
Source code in slp/data/collators.py
def __init__(self, pad_indx=0, max_length=-1, device="cpu"):
"""Collate function for sequence classification tasks
* Perform padding
* Calculate sequence lengths
Args:
pad_indx (int): Pad token index. Defaults to 0.
max_length (int): Pad sequences to a fixed maximum length
device (str): device of returned tensors. Leave this as "cpu".
The LightningModule will handle the Conversion.
Examples:
>>> dataloader = torch.utils.data.DataLoader(my_dataset, collate_fn=SequenceClassificationCollator())
"""
self.pad_indx = pad_indx
self.device = device
self.max_length = max_length
EmbeddingsLoader
__init__(self, embeddings_file, dim, vocab=None, extra_tokens=None)
special
Load word embeddings in text format
Parameters:

Name | Type | Description | Default
---|---|---|---
embeddings_file | str | File where embeddings are stored (e.g. glove.6B.50d.txt) | required
dim | int | Dimensionality of embeddings | required
vocab | Optional[Dict[str, int]] | Load only embeddings in vocab. Defaults to None. | None
extra_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Create random embeddings for these special tokens. Defaults to None. | None
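Examples:

A minimal sketch; the embeddings file path is hypothetical:

>>> loader = EmbeddingsLoader("./cache/glove.6B.50d.txt", 50)
>>> word2idx, idx2word, embeddings = loader.load()
>>> embeddings.shape[1]
50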
Source code in slp/data/corpus.py
def __init__(
self,
embeddings_file: str,
dim: int,
vocab: Optional[Dict[str, int]] = None,
extra_tokens: Optional[SPECIAL_TOKENS] = None,
) -> None:
"""Load word embeddings in text format
Args:
embeddings_file (str): File where embeddings are stored (e.g. glove.6B.50d.txt)
dim (int): Dimensionality of embeddings
vocab (Optional[Dict[str, int]]): Load only embeddings in vocab. Defaults to None.
extra_tokens (Optional[slp.config.nlp.SPECIAL_TOKENS]): Create random embeddings for these special tokens.
Defaults to None.
"""
self.embeddings_file = embeddings_file
self.vocab = vocab
self.cache_ = self._get_cache_name()
self.dim_ = dim
self.extra_tokens = extra_tokens
__repr__(self)
special
String representation of class
Source code in slp/data/corpus.py
def __repr__(self):
"""String representation of class"""
return f"{self.__class__.__name__}({self.embeddings_file}, {self.dim_})"
augment_embeddings(self, word2idx, idx2word, embeddings, token, emb=None)
Create a random embedding for a special token and append it to the embeddings array
Parameters:

Name | Type | Description | Default
---|---|---|---
word2idx | Dict[str, int] | Current word2idx map | required
idx2word | Dict[int, str] | Current idx2word map | required
embeddings | List[numpy.ndarray] | Embeddings array as list of embeddings | required
token | str | The special token (e.g. [PAD]) | required
emb | Optional[numpy.ndarray] | Optional value for the embedding to be appended. Defaults to None, where a random embedding is created. | None

Returns:

Type | Description
---|---
Tuple[Dict[str, int], Dict[int, str], List[numpy.ndarray]] | (word2idx, idx2word, embeddings) tuple
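Examples:

A minimal sketch, assuming loader is an EmbeddingsLoader with dim 50 (as in the sketch above):

>>> import numpy as np
>>> word2idx, idx2word, embeddings = loader.augment_embeddings({}, {}, [], "[PAD]", emb=np.zeros(50))
>>> word2idx
{'[PAD]': 0}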
Source code in slp/data/corpus.py
def augment_embeddings(
self,
word2idx: Dict[str, int],
idx2word: Dict[int, str],
embeddings: List[np.ndarray],
token: str,
emb: Optional[np.ndarray] = None,
) -> Tuple[Dict[str, int], Dict[int, str], List[np.ndarray]]:
"""Create a random embedding for a special token and append it to the embeddings array
Args:
word2idx (Dict[str, int]): Current word2idx map
idx2word (Dict[int, str]): Current idx2word map
embeddings (List[np.ndarray]): Embeddings array as list of embeddings
token (str): The special token (e.g. [PAD])
emb (Optional[np.ndarray]): Optional value for the embedding to be appended.
Defaults to None, where a random embedding is created.
Returns:
Tuple[Dict[str, int], Dict[int, str], List[np.ndarray]]: (word2idx, idx2word, embeddings) tuple
"""
word2idx[token] = len(embeddings)
idx2word[len(embeddings)] = token
if emb is None:
emb = np.random.uniform(low=-0.05, high=0.05, size=self.dim_)
embeddings.append(emb)
return word2idx, idx2word, embeddings
in_accepted_vocab(self, word)
Check if word exists in given vocabulary
Parameters:

Name | Type | Description | Default
---|---|---|---
word | str | Word from the embeddings file | required

Returns:

Type | Description
---|---
bool | bool: Whether the word exists in the vocabulary
Source code in slp/data/corpus.py
def in_accepted_vocab(self, word: str) -> bool:
"""Check if word exists in given vocabulary
Args:
word (str): word from embeddings file
Returns:
bool: Word exists
"""
return True if self.vocab is None else word in self.vocab
load(self)
Read the word vectors from a text file
- Read embeddings
- Filter with given vocabulary
- Augment with special tokens
Returns:

Type | Description
---|---
Tuple[Dict[str, int], Dict[int, str], numpy.ndarray] | types.Embeddings: (word2idx, idx2word, embeddings) tuple
Source code in slp/data/corpus.py
@system.timethis(method=True)
def load(self) -> types.Embeddings:
"""Read the word vectors from a text file
* Read embeddings
* Filter with given vocabulary
* Augment with special tokens
Returns:
types.Embeddings: (word2idx, idx2word, embeddings) tuple
"""
# in order to avoid this time consuming operation, cache the results
try:
cache = self._load_cache()
logger.info("Loaded word embeddings from cache.")
return cache
except OSError:
logger.warning(f"Didn't find embeddings cache file {self.embeddings_file}")
logger.warning("Loading embeddings from file.")
# create the necessary dictionaries and the word embeddings matrix
if not os.path.exists(self.embeddings_file):
logger.critical(f"{self.embeddings_file} not found!")
raise OSError(errno.ENOENT, os.strerror(errno.ENOENT), self.embeddings_file)
logger.info(f"Indexing file {self.embeddings_file} ...")
# create the 2D array, which will be used for initializing
# the Embedding layer of a NN.
# We reserve the first row (idx=0), as the word embedding,
# which will be used for zero padding (word with id = 0).
if self.extra_tokens is not None:
word2idx, idx2word, embeddings = self.augment_embeddings(
{},
{},
[],
self.extra_tokens.PAD.value, # type: ignore
emb=np.zeros(self.dim_),
)
for token in self.extra_tokens: # type: ignore
logger.debug(f"Adding token {token.value} to embeddings matrix")
if token == self.extra_tokens.PAD:
continue
word2idx, idx2word, embeddings = self.augment_embeddings(
word2idx, idx2word, embeddings, token.value
)
else:
word2idx, idx2word, embeddings = self.augment_embeddings(
{}, {}, [], "[PAD]", emb=np.zeros(self.dim_)
)
# read file, line by line
with open(self.embeddings_file, "r") as f:
num_lines = sum(1 for line in f)
with open(self.embeddings_file, "r") as f:
index = len(embeddings)
for line in tqdm(
f, total=num_lines, desc="Loading word embeddings...", leave=False
):
# skip the first row if it is a header
if len(line.split()) < self.dim_:
continue
values = line.rstrip().split(" ")
word = values[0]
if word in word2idx:
continue
if not self.in_accepted_vocab(word):
continue
vector = np.asarray(values[1:], dtype=np.float32)
idx2word[index] = word
word2idx[word] = index
embeddings.append(vector)
index += 1
logger.info(f"Loaded {len(embeddings)} word vectors.")
embeddings_out = np.array(embeddings, dtype="float32")
# write the data to a cache file
self._dump_cache((word2idx, idx2word, embeddings_out))
return word2idx, idx2word, embeddings_out
HfCorpus
embeddings: None
property
readonly
Unused. Defined for compatibility
frequencies: Dict[str, int]
property
readonly
Retrieve wordpiece occurrence counts

Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: wordpiece occurrence counts
idx2word: None
property
readonly
Unused. Defined for compatibility
indices: List[List[int]]
property
readonly
Retrieve corpus as token indices
Returns:

Type | Description
---|---
List[List[int]] | List[List[int]]: Token indices for corpus
raw: List[str]
property
readonly
Retrieve raw corpus
Returns:

Type | Description
---|---
List[str] | List[str]: Raw corpus
tokenized: List[List[str]]
property
readonly
Retrieve tokenized corpus
Returns:

Type | Description
---|---
List[List[str]] | List[List[str]]: Tokenized corpus
vocab: Set[str]
property
readonly
Retrieve set of words in vocabulary
Returns:

Type | Description
---|---
Set[str] | Set[str]: Set of words in vocabulary
vocab_size: int
property
readonly
Retrieve vocabulary size
Returns:

Type | Description
---|---
int | int: Vocabulary size
word2idx: None
property
readonly
Unused. Defined for compatibility
__getitem__(self, idx)
special
Get ith element in corpus as token indices
Parameters:

Name | Type | Description | Default
---|---|---|---
idx | int | Index in corpus | required

Returns:

Type | Description
---|---
List[int] | List[int]: List of token indices for sentence
Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
"""Get ith element in corpus as token indices
Args:
idx (List[int]): index in corpus
Returns:
List[int]: List of token indices for sentence
"""
out: List[int] = (
self.corpus_indices_[idx]
if self.max_length <= 0
else self.corpus_indices_[idx][: self.max_length]
)
return out
__init__(self, corpus, lower=True, tokenizer_model='bert-base-uncased', add_special_tokens=True, special_tokens=<enum 'SPECIAL_TOKENS'>, max_length=-1, **kwargs)
special
Process a corpus using Hugging Face tokenizers
Select one of the Hugging Face tokenizers to tokenize the corpus and convert it to token indices.
Parameters:

Name | Type | Description | Default
---|---|---|---
corpus | List[str] | List of sentences | required
lower | bool | Convert strings to lower case. Defaults to True. | True
tokenizer_model | str | Hugging Face model to use. Defaults to "bert-base-uncased". | 'bert-base-uncased'
add_special_tokens | bool | Add special tokens to sentences during tokenization. Defaults to True. | True
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens to include in the vocabulary. Defaults to slp.config.nlp.SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'>
max_length | int | Crop sequences above this length. Defaults to -1, where sequences are left unaltered. | -1
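Examples:

A minimal sketch; the exact ids depend on the chosen tokenizer model:

>>> corpus = HfCorpus(["Hello world", "The big brown fox"])
>>> len(corpus)
2
>>> corpus.indices[0]  # wordpiece ids for "Hello world", e.g. [101, 7592, 2088, 102] for bert-base-uncased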
Source code in slp/data/corpus.py
def __init__(
self,
corpus: List[str],
lower: bool = True,
tokenizer_model: str = "bert-base-uncased",
add_special_tokens: bool = True,
special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
max_length: int = -1,
**kwargs,
):
"""Process a corpus using hugging face tokenizers
Select one of hugging face tokenizers and process corpus
Args:
corpus (List[str]): List of sentences
lower (bool): Convert strings to lower case. Defaults to True.
tokenizer_model (str): Hugging face model to use. Defaults to "bert-base-uncased".
add_special_tokens (bool): Add special tokens in sentence during tokenization. Defaults to True.
special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary.
Defaults to slp.config.nlp.SPECIAL_TOKENS.
max_length (int): Crop sequences above this length. Defaults to -1 where sequences are left unaltered.
"""
self.corpus_ = corpus
self.max_length = max_length
logger.info(
f"Tokenizing corpus using hugging face tokenizer from {tokenizer_model}"
)
self.tokenizer = HuggingFaceTokenizer(
lower=lower, model=tokenizer_model, add_special_tokens=add_special_tokens
)
self.corpus_indices_ = [
self.tokenizer(s)
for s in tqdm(
self.corpus_, desc="Converting tokens to indices...", leave=False
)
]
self.tokenized_corpus_ = [
self.tokenizer.detokenize(s)
for s in tqdm(
self.corpus_indices_,
desc="Mapping indices to tokens...",
leave=False,
)
]
self.vocab_ = create_vocab(
self.tokenized_corpus_,
vocab_size=-1,
special_tokens=special_tokens,
)
__len__(self)
special
Number of samples in corpus
Returns:

Type | Description
---|---
int | int: Corpus length
Source code in slp/data/corpus.py
def __len__(self) -> int:
"""Number of samples in corpus
Returns:
int: Corpus length
"""
return len(self.corpus_indices_)
TokenizedCorpus
embeddings: None
property
readonly
Unused. Kept for compatibility
frequencies: Dict[str, int]
property
readonly
Retrieve wordpiece occurrence counts

Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: wordpiece occurrence counts
idx2word: Dict[int, str]
property
readonly
Retrieve idx2word mapping
Returns:

Type | Description
---|---
Dict[int, str] | Dict[int, str]: idx2word mapping
indices: Union[List[int], List[List[int]]]
property
readonly
Retrieve corpus as token indices
Returns:

Type | Description
---|---
Union[List[int], List[List[int]]] | List[List[int]]: Token indices for corpus
raw: Union[List[str], List[List[str]]]
property
readonly
Retrieve raw corpus
Returns:

Type | Description
---|---
Union[List[str], List[List[str]]] | List[str]: Raw corpus
tokenized: Union[List[str], List[List[str]]]
property
readonly
Retrieve tokenized corpus
Returns:

Type | Description
---|---
Union[List[str], List[List[str]]] | List[List[str]]: Tokenized corpus
vocab: Set[str]
property
readonly
Retrieve set of words in vocabulary
Returns:

Type | Description
---|---
Set[str] | Set[str]: Set of words in vocabulary
vocab_size: int
property
readonly
Retrieve vocabulary size
Returns:

Type | Description
---|---
int | int: Vocabulary size
word2idx: Dict[str, int]
property
readonly
Retrieve word2idx mapping
Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: word2idx mapping
__getitem__(self, idx)
special
Get ith element in corpus as token indices
Parameters:

Name | Type | Description | Default
---|---|---|---
idx | int | Index in corpus | required

Returns:

Type | Description
---|---
List[int] | List[int]: List of token indices for sentence
Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
"""Get ith element in corpus as token indices
Args:
idx (List[int]): index in corpus
Returns:
List[int]: List of token indices for sentence
"""
out: List[int] = (
self.corpus_indices_[idx]
if self.max_length <= 0
else self.corpus_indices_[idx][: self.max_length]
)
return out
__init__(self, corpus, word2idx=None, special_tokens=<enum 'SPECIAL_TOKENS'>, max_length=-1, **kwargs)
special
Wrap a corpus that's already tokenized
Parameters:

Name | Type | Description | Default
---|---|---|---
corpus | Union[List[str], List[List[str]]] | List of tokens or list of lists of tokens | required
word2idx | Dict[str, int] | Token to index mapping. Defaults to None. | None
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'>
max_length | int | Crop sequences above this length. Defaults to -1, where sequences are left unaltered. | -1
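Examples:

A minimal sketch; the exact ids depend on the vocabulary order produced by the iterative counter:

>>> corpus = TokenizedCorpus([["far", "far", "away"], ["in", "a", "galaxy"]])
>>> len(corpus)
2
>>> corpus[0]  # token ids for ["far", "far", "away"]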
Source code in slp/data/corpus.py
def __init__(
self,
corpus: Union[List[str], List[List[str]]],
word2idx: Dict[str, int] = None,
special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
max_length: int = -1,
**kwargs,
):
"""Wrap a corpus that's already tokenized
Args:
corpus (Union[List[str], List[List[str]]]): List of tokens or List of lists of tokens
word2idx (Dict[str, int], optional): Token to index mapping. Defaults to None.
special_tokens (Optional[SPECIAL_TOKENS], optional): Special Tokens. Defaults to SPECIAL_TOKENS.
"""
self.corpus_ = corpus
self.tokenized_corpus_ = corpus
self.max_length = max_length
self.vocab_ = create_vocab(
self.tokenized_corpus_,
vocab_size=-1,
special_tokens=special_tokens,
)
if word2idx is not None:
logger.info("Converting tokens to ids using word2idx.")
self.word2idx_ = word2idx
else:
logger.info(
"No word2idx provided. Will convert tokens to ids using an iterative counter."
)
self.word2idx_ = dict(zip(self.vocab_.keys(), itertools.count()))
self.idx2word_ = {v: k for k, v in self.word2idx_.items()}
self.to_token_ids = ToTokenIds(
self.word2idx_,
specials=SPECIAL_TOKENS, # type: ignore
)
if isinstance(self.tokenized_corpus_[0], list):
self.corpus_indices_ = [
self.to_token_ids(s)
for s in tqdm(
self.tokenized_corpus_,
desc="Converting tokens to token ids...",
leave=False,
)
]
else:
self.corpus_indices_ = self.to_token_ids(self.tokenized_corpus_) # type: ignore
__len__(self)
special
Number of samples in corpus
Returns:

Type | Description
---|---
int | int: Corpus length
Source code in slp/data/corpus.py
def __len__(self) -> int:
"""Number of samples in corpus
Returns:
int: Corpus length
"""
return len(self.corpus_indices_)
WordCorpus
embeddings: ndarray
property
readonly
Retrieve embeddings array
Returns:

Type | Description
---|---
ndarray | np.ndarray: Array of pretrained word embeddings
frequencies: Dict[str, int]
property
readonly
Retrieve word occurrence counts

Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: word occurrence counts
idx2word: Dict[int, str]
property
readonly
Retrieve idx2word mapping
Returns:

Type | Description
---|---
Dict[int, str] | Dict[int, str]: idx2word mapping
indices: List[List[int]]
property
readonly
Retrieve corpus as token indices
Returns:

Type | Description
---|---
List[List[int]] | List[List[int]]: Token indices for corpus
raw: List[str]
property
readonly
Retrieve raw corpus
Returns:

Type | Description
---|---
List[str] | List[str]: Raw corpus
tokenized: List[List[str]]
property
readonly
Retrieve tokenized corpus
Returns:

Type | Description
---|---
List[List[str]] | List[List[str]]: Tokenized corpus
vocab: Set[str]
property
readonly
Retrieve set of words in vocabulary
Returns:

Type | Description
---|---
Set[str] | Set[str]: Set of words in vocabulary
vocab_size: int
property
readonly
Retrieve vocabulary size for corpus
Returns:

Type | Description
---|---
int | int: Vocabulary size
word2idx: Dict[str, int]
property
readonly
Retrieve word2idx mapping
Returns:

Type | Description
---|---
Dict[str, int] | Dict[str, int]: word2idx mapping
__getitem__(self, idx)
special
Get ith element in corpus as token indices
Parameters:

Name | Type | Description | Default
---|---|---|---
idx | int | Index in corpus | required

Returns:

Type | Description
---|---
List[int] | List[int]: List of token indices for sentence
Source code in slp/data/corpus.py
def __getitem__(self, idx) -> List[int]:
"""Get ith element in corpus as token indices
Args:
idx (List[int]): index in corpus
Returns:
List[int]: List of token indices for sentence
"""
out: List[int] = (
self.corpus_indices_[idx]
if self.max_length <= 0
else self.corpus_indices_[idx][: self.max_length]
)
return out
__init__(self, corpus, limit_vocab_size=30000, word2idx=None, idx2word=None, embeddings=None, embeddings_file=None, embeddings_dim=300, lower=True, special_tokens=<enum 'SPECIAL_TOKENS'>, prepend_bos=False, append_eos=False, lang='en_core_web_md', max_length=-1, **kwargs)
special
Load corpus embeddings, tokenize into words using spacy, and convert to ids
This class handles processing of a raw corpus. It performs:
- Tokenization into words (spacy)
- Loading of pretrained word embeddings
- Calculation of word frequencies / corpus statistics
- Conversion to token ids
You can either:
- pass an embeddings file, to load pretrained embeddings and create the word2idx mapping, or
- pass an already loaded embeddings array and word2idx. This is useful for the dev / test splits, where we want to reuse the train split embeddings / word2idx.
Parameters:

Name | Type | Description | Default
---|---|---|---
corpus | List[str] | Corpus as a list of sentences | required
limit_vocab_size | int | Upper bound for number of most frequent tokens to keep. Defaults to 30000. | 30000
word2idx | Optional[Dict[str, int]] | Mapping of words to indices. Defaults to None. | None
idx2word | Optional[Dict[int, str]] | Mapping of indices to words. Defaults to None. | None
embeddings | Optional[numpy.ndarray] | Embeddings array. Defaults to None. | None
embeddings_file | Optional[str] | Embeddings file to read. Defaults to None. | None
embeddings_dim | int | Dimension of embeddings. Defaults to 300. | 300
lower | bool | Convert strings to lower case. Defaults to True. | True
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens to include in the vocabulary. Defaults to slp.config.nlp.SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'>
prepend_bos | bool | Prepend Beginning of Sequence token for seq2seq tasks. Defaults to False. | False
append_eos | bool | Append End of Sequence token for seq2seq tasks. Defaults to False. | False
lang | str | Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_md". | 'en_core_web_md'
max_length | int | Crop sequences above this length. Defaults to -1, where sequences are left unaltered. | -1
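Examples:

A minimal sketch; the embeddings file path is hypothetical:

>>> corpus = WordCorpus(
...     ["The big brown fox", "Jumps over the lazy dog"],
...     embeddings_file="./cache/glove.6B.50d.txt",
...     embeddings_dim=50,
... )
>>> corpus.embeddings.shape[1]
50
>>> corpus[0]  # token ids for the first sentence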
Source code in slp/data/corpus.py
def __init__(
self,
corpus: List[str],
limit_vocab_size: int = 30000,
word2idx: Optional[Dict[str, int]] = None,
idx2word: Optional[Dict[int, str]] = None,
embeddings: Optional[np.ndarray] = None,
embeddings_file: Optional[str] = None,
embeddings_dim: int = 300,
lower: bool = True,
special_tokens: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
prepend_bos: bool = False,
append_eos: bool = False,
lang: str = "en_core_web_md",
max_length: int = -1,
**kwargs,
):
"""Load corpus embeddings, tokenize in words using spacy and convert to ids
This class handles the handling of a raw corpus. It handles:
* Tokenization into words (spacy)
* Loading of pretrained word embedding
* Calculation of word frequencies / corpus statistics
* Conversion to token ids
You can pass either:
* Pass an embeddings file to load pretrained embeddings and create the word2idx mapping
* Pass already loaded embeddings array and word2idx. This is useful for the dev / test splits
where we want to pass the train split embeddings / word2idx.
Args:
corpus (List[List[str]]): Corpus as a list of sentences
limit_vocab_size (int): Upper bound for number of most frequent tokens to keep. Defaults to 30000.
word2idx (Optional[Dict[str, int]]): Mapping of word to indices. Defaults to None.
idx2word (Optional[Dict[int, str]]): Mapping of indices to words. Defaults to None.
embeddings (Optional[np.ndarray]): Embeddings array. Defaults to None.
embeddings_file (Optional[str]): Embeddings file to read. Defaults to None.
embeddings_dim (int): Dimension of embeddings. Defaults to 300.
lower (bool): Convert strings to lower case. Defaults to True.
special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary.
Defaults to slp.config.nlp.SPECIAL_TOKENS.
prepend_bos (bool): Prepend Beginning of Sequence token for seq2seq tasks. Defaults to False.
append_eos (bool): Append End of Sequence token for seq2seq tasks. Defaults to False.
lang (str): Spacy language, e.g. el_core_web_sm, en_core_web_sm etc. Defaults to "en_core_web_md".
max_length (int): Crop sequences above this length. Defaults to -1 where sequences are left unaltered.
"""
# FIXME: Extract super class to avoid repetition
self.corpus_ = corpus
self.max_length = max_length
self.tokenizer = SpacyTokenizer(
lower=lower,
prepend_bos=prepend_bos,
append_eos=append_eos,
specials=special_tokens,
lang=lang,
)
logger.info(f"Tokenizing corpus using spacy {lang}")
self.tokenized_corpus_ = [
self.tokenizer(s)
for s in tqdm(self.corpus_, desc="Tokenizing corpus...", leave=False)
]
self.vocab_ = create_vocab(
self.tokenized_corpus_,
vocab_size=limit_vocab_size if word2idx is None else -1,
special_tokens=special_tokens,
)
self.word2idx_, self.idx2word_, self.embeddings_ = None, None, None
# self.corpus_indices_ = self.tokenized_corpus_
if word2idx is not None:
logger.info("Word2idx was already provided. Going to used it.")
if embeddings_file is not None and word2idx is None:
logger.info(
f"Going to load {len(self.vocab_)} embeddings from {embeddings_file}"
)
loader = EmbeddingsLoader(
embeddings_file,
embeddings_dim,
vocab=self.vocab_,
extra_tokens=special_tokens,
)
word2idx, idx2word, embeddings = loader.load()
if embeddings is not None:
self.embeddings_ = embeddings
if idx2word is not None:
self.idx2word_ = idx2word
if word2idx is not None:
self.word2idx_ = word2idx
logger.info("Converting tokens to ids using word2idx.")
self.to_token_ids = ToTokenIds(
self.word2idx_,
specials=SPECIAL_TOKENS, # type: ignore
)
self.corpus_indices_ = [
self.to_token_ids(s)
for s in tqdm(
self.tokenized_corpus_,
desc="Converting tokens to token ids...",
leave=False,
)
]
logger.info("Filtering corpus vocabulary.")
updated_vocab = {}
for k, v in self.vocab_.items():
if k in self.word2idx_:
updated_vocab[k] = v
logger.info(
f"Out of {len(self.vocab_)} tokens {len(self.vocab_) - len(updated_vocab)} were not found in the pretrained embeddings."
)
self.vocab_ = updated_vocab
__len__(self)
special
Number of samples in corpus
Returns:
Type | Description |
---|---|
int | Corpus length |
Source code in slp/data/corpus.py
def __len__(self) -> int:
"""Number of samples in corpus
Returns:
int: Corpus length
"""
return len(self.corpus_indices_)
create_vocab(corpus, vocab_size=-1, special_tokens=None)
Create the vocabulary based on tokenized input corpus
- Injects special tokens in the vocabulary
- Calculates the occurrence count for each token
- Limits vocabulary to vocab_size most common tokens
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | Union[List[str], List[List[str]]] | The tokenized corpus as a flat list of tokens or a list of tokenized sentences | required |
vocab_size | int | Limit vocabulary to the vocab_size most common tokens. Defaults to -1, which keeps all tokens. | -1 |
special_tokens | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens to include in the vocabulary. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Dict[str, int] | Dictionary of all accepted tokens and their corresponding occurrence counts |
Examples:
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"])
{'far': 2, 'away': 1, 'galaxy': 1, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3)
{'far': 2, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3, special_tokens=slp.config.nlp.SPECIAL_TOKENS)
{'[PAD]': 0, '[MASK]': 0, '[UNK]': 0, '[BOS]': 0, '[EOS]': 0, '[CLS]': 0, '[SEP]': 0, 'far': 2, 'a': 1, 'in': 1}
Source code in slp/data/corpus.py
def create_vocab(
corpus: Union[List[str], List[List[str]]],
vocab_size: int = -1,
special_tokens: Optional[SPECIAL_TOKENS] = None,
) -> Dict[str, int]:
"""Create the vocabulary based on tokenized input corpus
* Injects special tokens in the vocabulary
* Calculates the occurrence count for each token
* Limits vocabulary to vocab_size most common tokens
Args:
corpus (Union[List[str], List[List[str]]]): The tokenized corpus as a flat list of tokens or a list of tokenized sentences
vocab_size (int): Limit vocabulary to the vocab_size most common tokens.
Defaults to -1 which keeps all tokens.
special_tokens (Optional[SPECIAL_TOKENS]): Special tokens to include in the vocabulary. Defaults to None.
Returns:
Dict[str, int]: Dictionary of all accepted tokens and their corresponding occurrence counts
Examples:
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"])
{'far': 2, 'away': 1, 'galaxy': 1, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3)
{'far': 2, 'a': 1, 'in': 1}
>>> create_vocab(["in", "a", "galaxy", "far", "far", "away"], vocab_size=3, special_tokens=slp.config.nlp.SPECIAL_TOKENS)
{'[PAD]': 0, '[MASK]': 0, '[UNK]': 0, '[BOS]': 0, '[EOS]': 0, '[CLS]': 0, '[SEP]': 0, 'far': 2, 'a': 1, 'in': 1}
"""
if isinstance(corpus[0], list):
corpus = list(itertools.chain.from_iterable(corpus))
freq = Counter(corpus)
if special_tokens is None:
extra_tokens = []
else:
extra_tokens = special_tokens.to_list()
if vocab_size < 0:
vocab_size = len(freq)
take = min(vocab_size, len(freq))
logger.info(f"Keeping {vocab_size} most common tokens out of {len(freq)}")
def take0(x: Tuple[Any, Any]) -> Any:
"""Take first tuple element"""
return x[0]
common_words = list(map(take0, freq.most_common(take)))
common_words = list(set(common_words) - set(extra_tokens))
words = extra_tokens + common_words
if len(words) > vocab_size:
words = words[: vocab_size + len(extra_tokens)]
def token_freq(t):
"""Token frequeny"""
return 0 if t in extra_tokens else freq[t]
vocab = dict(zip(words, map(token_freq, words)))
logger.info(f"Vocabulary created with {len(vocab)} tokens.")
logger.info(f"The 10 most common tokens are:\n{freq.most_common(10)}")
return vocab
CorpusDataset
__getitem__(self, idx)
special
Get a processed sentence and its label from the corpus
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | Sample index | required |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (processed sentence, label) |
Source code in slp/data/datasets.py
def __getitem__(self, idx):
"""Get a source and target token from the corpus
Args:
idx (int): Token position
Returns:
Tuple[torch.Tensor, torch.Tensor]: (processed sentence, label)
"""
text, target = self.corpus[idx], self.labels[idx]
if self.label_encoder is not None:
target = self.label_encoder.transform([target])[0]
for t in self.transforms:
text = t(text)
return text, target
__init__(self, corpus, labels)
special
Labeled corpus dataset
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | WordCorpus, HfCorpus etc. | Input corpus | required |
labels | List[Any] | Labels for examples | required |
Source code in slp/data/datasets.py
def __init__(self, corpus, labels):
"""Labeled corpus dataset
Args:
corpus (WordCorpus, HfCorpus etc..): Input corpus
labels (List[Any]): Labels for examples
"""
self.corpus = corpus
self.labels = labels
assert len(self.labels) == len(self.corpus), "Incompatible labels and corpus"
self.transforms = []
self.label_encoder = None
if isinstance(self.labels[0], str):
self.label_encoder = LabelEncoder().fit(self.labels)
__len__(self)
special
Length of corpus
Returns:
Type | Description |
---|---|
int | Corpus length |
Source code in slp/data/datasets.py
def __len__(self):
"""Length of corpus
Returns:
int: Corpus Length
"""
return len(self.corpus)
map(self, t)
Append a transform to self.transforms, in order to be applied to the data
Parameters:
Name | Type | Description | Default |
---|---|---|---|
t | Callable[[str], Any] | Transform applied to each sample | required |
Returns:
Type | Description |
---|---|
CorpusDataset | self |
Source code in slp/data/datasets.py
def map(self, t):
"""Append a transform to self.transforms, in order to be applied to the data
Args:
t (Callable[[str], Any]): Transform of input token
Returns:
CorpusDataset: self
"""
self.transforms.append(t)
return self
CorpusLMDataset
__getitem__(self, idx)
special
Get a source and target token from the corpus
Parameters:
Name | Type | Description | Default |
---|---|---|---|
idx | int | Token position | required |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | source=corpus[idx], target=corpus[idx+1] |
Source code in slp/data/datasets.py
def __getitem__(self, idx):
"""Get a source and target token from the corpus
Args:
idx (int): Token position
Returns:
Tuple[torch.Tensor, torch.Tensor]: source=corpus[idx], target=corpus[idx+1]
"""
src, tgt = self.source[idx], self.target[idx]
for t in self.transforms:
src = t(src)
tgt = t(tgt)
return src, tgt
__init__(self, corpus)
special
Wraps a tokenized dataset which is provided as a list of tokens
Targets = source shifted one token to the left (next token prediction)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corpus | List[str] or WordCorpus | List of tokens | required |
Source code in slp/data/datasets.py
def __init__(self, corpus):
"""Wraps a tokenized dataset which is provided as a list of tokens
Targets = source shifted one token to the left (next token prediction)
Args:
corpus (List[str] or WordCorpus): List of tokens
"""
self.source = corpus[:-1]
self.target = corpus[1:]
self.transforms = []
__len__(self)
special
Length of corpus
Returns:
Type | Description |
---|---|
int | Corpus length |
Source code in slp/data/datasets.py
def __len__(self):
"""Length of corpus
Returns:
int: Corpus Length
"""
return int(len(self.source))
map(self, t)
Append a transform to self.transforms, in order to be applied to the data
Parameters:
Name | Type | Description | Default |
---|---|---|---|
t | Callable[[str], Any] | Transform of input token | required |
Returns:
Type | Description |
---|---|
CorpusLMDataset | self |
Source code in slp/data/datasets.py
def map(self, t):
"""Append a transform to self.transforms, in order to be applied to the data
Args:
t (Callable[[str], Any]): Transform of input token
Returns:
CorpusLMDataset: self
"""
self.transforms.append(t)
return self
HuggingFaceTokenizer
__call__(self, x)
special
Call to tokenize function
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | Input string | required |
Returns:
Type | Description |
---|---|
List[int] | List of token ids |
Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[int]:
"""Call to tokenize function
Args:
x (str): Input string
Returns:
List[int]: List of token ids
"""
out: List[int] = self.tokenizer.encode(
x, add_special_tokens=self.add_special_tokens, max_length=65536
)
return out
__init__(self, lower=True, model='bert-base-uncased', add_special_tokens=True)
special
Apply one of huggingface tokenizers to a string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lower | bool | Lowercase string. Defaults to True. | True |
model | str | Select transformer model. Defaults to "bert-base-uncased". | 'bert-base-uncased' |
add_special_tokens | bool | Insert special tokens to tokenized string. Defaults to True. | True |
Source code in slp/data/transforms.py
def __init__(
self,
lower: bool = True,
model: str = "bert-base-uncased",
add_special_tokens: bool = True,
):
"""Apply one of huggingface tokenizers to a string
Args:
lower (bool): Lowercase string. Defaults to True.
model (str): Select transformer model. Defaults to "bert-base-uncased".
add_special_tokens (bool): Insert special tokens to tokenized string. Defaults to True.
"""
self.tokenizer = AutoTokenizer.from_pretrained(model, do_lower_case=lower)
self.vocab_size = len(self.tokenizer.vocab)
self.add_special_tokens = add_special_tokens
detokenize(self, x)
Convert list of token ids to list of tokens
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[int] | List of token ids | required |
Returns:
Type | Description |
---|---|
List[str] | List of tokens |
Source code in slp/data/transforms.py
def detokenize(self, x: List[int]) -> List[str]:
"""Convert list of token ids to list of tokens
Args:
x (List[int]): List of token ids
Returns:
List[str]: List of tokens
"""
out: List[str] = self.tokenizer.convert_ids_to_tokens(x)
return out
ReplaceUnknownToken
__call__(self, x)
special
Convert <unk> in a list of tokens to [UNK]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[str] | List of tokens | required |
Returns:
Type | Description |
---|---|
List[str] | List of tokens |
Source code in slp/data/transforms.py
def __call__(self, x: List[str]) -> List[str]:
"""Convert <unk> in list of tokens to [UNK]
Args:
x (List[str]): List of tokens
Returns:
List[str]: List of tokens
"""
return [w if w != self.old_unk else self.new_unk for w in x]
__init__(self, old_unk='<unk>', new_unk='[UNK]')
special
Replace existing unknown tokens in the vocab to [UNK]. Useful for wikitext
Parameters:
Name | Type | Description | Default |
---|---|---|---|
old_unk | str | Unk token in corpus. Defaults to "<unk>". | '<unk>' |
new_unk | str | Desired unk value. Defaults to SPECIAL_TOKENS.UNK.value. | '[UNK]' |
Source code in slp/data/transforms.py
def __init__(
self,
old_unk: str = "<unk>",
new_unk: str = SPECIAL_TOKENS.UNK.value, # type: ignore
):
"""Replace existing unknown tokens in the vocab to [UNK]. Useful for wikitext
Args:
old_unk (str): Unk token in corpus. Defaults to "<unk>".
new_unk (str): Desired unk value. Defaults to SPECIAL_TOKENS.UNK.value.
"""
self.old_unk = old_unk
self.new_unk = new_unk
SentencepieceTokenizer
__call__(self, x)
special
Call to tokenize function
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | Input string | required |
Returns:
Type | Description |
---|---|
List[int] | List of token ids |
Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[int]:
"""Call to tokenize function
Args:
x (str): Input string
Returns:
List[int]: List of token ids
"""
if self.lower:
x = x.lower()
ids: List[int] = self.pre_id + self.tokenizer.encode_as_ids(x) + self.post_id
return ids
__init__(self, lower=True, model=None, prepend_bos=False, append_eos=False, specials=<enum 'SPECIAL_TOKENS'>)
special
Tokenize sentence using pretrained sentencepiece model
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lower | bool | Lowercase string. Defaults to True. | True |
model | Optional[Any] | Path to a pretrained sentencepiece model. Defaults to None. | None |
prepend_bos | bool | Prepend BOS for seq2seq. Defaults to False. | False |
append_eos | bool | Append EOS for seq2seq. Defaults to False. | False |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
Source code in slp/data/transforms.py
def __init__(
self,
lower: bool = True,
model: Optional[Any] = None,
prepend_bos: bool = False,
append_eos: bool = False,
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
):
"""Tokenize sentence using pretrained sentencepiece model
Args:
lower (bool): Lowercase string. Defaults to True.
model (Optional[Any]): Path to a pretrained sentencepiece model. Defaults to None.
prepend_bos (bool): Prepend BOS for seq2seq. Defaults to False.
append_eos (bool): Append EOS for seq2seq. Defaults to False.
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
"""
self.tokenizer = spm.SentencePieceProcessor()
self.tokenizer.Load(model)
self.specials = specials
self.lower = lower
self.vocab_size = self.tokenizer.get_piece_size()
self.pre_id = []
self.post_id = []
if prepend_bos:
self.pre_id.append(self.tokenizer.piece_to_id(self.specials.BOS.value)) # type: ignore
if append_eos:
self.post_id.append(self.tokenizer.piece_to_id(self.specials.EOS.value)) # type: ignore
SpacyTokenizer
__call__(self, x)
special
Call to tokenize function
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | str | Input string | required |
Returns:
Type | Description |
---|---|
List[str] | List of tokens |
Source code in slp/data/transforms.py
def __call__(self, x: str) -> List[str]:
"""Call to tokenize function
Args:
x (str): Input string
Returns:
List[str]: List of tokens
"""
if self.lower:
x = x.lower()
out: List[str] = (
self.pre_id + [y.text for y in self.nlp.tokenizer(x)] + self.post_id
)
return out
__init__(self, lower=True, prepend_bos=False, append_eos=False, specials=<enum 'SPECIAL_TOKENS'>, lang='en_core_web_sm')
special
Apply spacy tokenizer to str
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lower | bool | Lowercase string. Defaults to True. | True |
prepend_bos | bool | Prepend BOS for seq2seq. Defaults to False. | False |
append_eos | bool | Append EOS for seq2seq. Defaults to False. | False |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
lang | str | Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm". | 'en_core_web_sm' |
Source code in slp/data/transforms.py
def __init__(
self,
lower: bool = True,
prepend_bos: bool = False,
append_eos: bool = False,
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
lang: str = "en_core_web_sm",
):
"""Apply spacy tokenizer to str
Args:
lower (bool): Lowercase string. Defaults to True.
prepend_bos (bool): Prepend BOS for seq2seq. Defaults to False.
append_eos (bool): Append EOS for seq2seq. Defaults to False.
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
lang (str): Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".
"""
self.lower = lower
self.specials = SPECIAL_TOKENS
self.lang = lang
self.pre_id = []
self.post_id = []
if prepend_bos:
self.pre_id.append(self.specials.BOS.value)
if append_eos:
self.post_id.append(self.specials.EOS.value)
self.nlp = self.get_nlp(name=lang, specials=specials)
get_nlp(self, name='en_core_web_sm', specials=<enum 'SPECIAL_TOKENS'>)
Get spacy nlp object for given lang and add SPECIAL_TOKENS
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm". | 'en_core_web_sm' |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
Returns:
Type | Description |
---|---|
spacy.Language | Spacy text-processing pipeline |
Source code in slp/data/transforms.py
def get_nlp(
self,
name: str = "en_core_web_sm",
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
) -> spacy.Language:
"""Get spacy nlp object for given lang and add SPECIAL_TOKENS
Args:
name (str): Spacy language, e.g. el_core_news_sm, en_core_web_sm etc. Defaults to "en_core_web_sm".
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
Returns:
spacy.Language: spacy text-processing pipeline
"""
nlp = spacy.load(name)
if specials is not None:
for token in specials.to_list():
nlp.tokenizer.add_special_case(token, [{ORTH: token}])
return nlp
ToTensor
__call__(self, x)
special
Convert list of tokens or list of features to tensor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[Any] | List of tokens or features | required |
Returns:
Type | Description |
---|---|
torch.Tensor | Resulting tensor |
Source code in slp/data/transforms.py
def __call__(self, x: List[Any]) -> torch.Tensor:
"""Convert list of tokens or list of features to tensor
Args:
x (List[Any]): List of tokens or features
Returns:
torch.Tensor: Resulting tensor
"""
return mktensor(x, device=self.device, dtype=self.dtype)
__init__(self, device='cpu', dtype=torch.int64)
special
To tensor convertor
Parameters:
Name | Type | Description | Default |
---|---|---|---|
device | str | Device to map the tensor. Defaults to "cpu". | 'cpu' |
dtype | torch.dtype | Type of resulting tensor. Defaults to torch.long. | torch.int64 |
Source code in slp/data/transforms.py
def __init__(self, device: str = "cpu", dtype: torch.dtype = torch.long):
"""To tensor convertor
Args:
device (str): Device to map the tensor. Defaults to "cpu".
dtype (torch.dtype): Type of resulting tensor. Defaults to torch.long.
"""
self.device = device
self.dtype = dtype
ToTokenIds
__call__(self, x)
special
Convert list of tokens to list of token ids
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | List[str] | List of tokens | required |
Returns:
Type | Description |
---|---|
List[int] | List of token ids |
Source code in slp/data/transforms.py
def __call__(self, x: List[str]) -> List[int]:
"""Convert list of tokens to list of token ids
Args:
x (List[str]): List of tokens
Returns:
List[int]: List of token ids
"""
return [
self.word2idx[w] if w in self.word2idx else self.word2idx[self.unk_value]
for w in x
]
__init__(self, word2idx, specials=<enum 'SPECIAL_TOKENS'>)
special
Convert List of tokens to list of token ids
Parameters:
Name | Type | Description | Default |
---|---|---|---|
word2idx | Dict[str, int] | Word to index mapping | required |
specials | Optional[slp.config.nlp.SPECIAL_TOKENS] | Special tokens. Defaults to SPECIAL_TOKENS. | <enum 'SPECIAL_TOKENS'> |
Source code in slp/data/transforms.py
def __init__(
self,
word2idx: Dict[str, int],
specials: Optional[SPECIAL_TOKENS] = SPECIAL_TOKENS, # type: ignore
):
"""Convert List of tokens to list of token ids
Args:
word2idx (Dict[str, int]): Word to index mapping
specials (Optional[SPECIAL_TOKENS]): Special tokens. Defaults to SPECIAL_TOKENS.
"""
self.word2idx = word2idx
self.unk_value = specials.UNK.value if specials is not None else "[UNK]" # type: ignore
Attention
__init__(self, attention_size=512, input_size=None, dropout=0.1)
special
Single-Headed Dot-product attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
input_size: Optional[int] = None,
dropout: float = 0.1,
):
"""Single-Headed Dot-product attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
"""
super(Attention, self).__init__()
if input_size is None:
input_size = attention_size
self.dk = input_size
self.k = nn.Linear(input_size, attention_size, bias=False)
self.q = nn.Linear(input_size, attention_size, bias=False)
self.v = nn.Linear(input_size, attention_size, bias=False)
self.dropout = dropout
reset_parameters(self.named_parameters())
forward(self, keys, queries=None, attention_mask=None)
Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keys | torch.Tensor | [B, L, D] Keys tensor | required |
queries | Optional[torch.Tensor] | Optional [B, M, D] Queries tensor. If None, queries = keys. Defaults to None. | None |
attention_mask | Optional[torch.Tensor] | Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, M, L]) |
Source code in slp/modules/attention.py
def forward(
self,
keys: torch.Tensor,
queries: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
r"""Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
$$a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
keys (torch.Tensor): [B, L, D] Keys tensor
queries (Optional[torch.Tensor]): Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.
attention_mask (Optional[torch.Tensor]): Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])
"""
if attention_mask is not None:
if len(list(attention_mask.size())) == 2:
attention_mask = attention_mask.unsqueeze(1)
if queries is None:
queries = keys
values = keys
k = self.k(keys) # (B, L, A)
q = self.q(queries)
v = self.v(values)
# weights => (B, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
return out, scores
MultiheadAttention
__init__(self, attention_size=512, num_heads=8, input_size=None, dropout=0.1, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None)
special
Multi-Headed Dot-product attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
num_heads | int | Number of attention heads. Defaults to 8. | 8 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
nystrom | bool | Use nystrom method for attention calculation. Defaults to False. | False |
num_landmarks | int | Number of landmark points for nystrom attention. Defaults to 64. | 64 |
inverse_iterations | int | Number of iterations to calculate the inverse in nystrom attention. Defaults to 6. | 6 |
kernel_size | Optional[int] | Use residual convolution in the output. Defaults to None. | None |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
num_heads: int = 8,
input_size: Optional[int] = None,
dropout: float = 0.1,
nystrom: bool = False,
num_landmarks: int = 64,
inverse_iterations: int = 6,
kernel_size: Optional[int] = None,
):
"""Multi-Headed Dot-product attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
num_heads (int): Number of attention heads
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
nystrom (bool, optional): Use nystrom method for attention calculation. Defaults to False.
num_landmarks (int, optional): Number of landmark points for nystrom attention. Defaults to 64.
inverse_iterations (int, optional): Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.
kernel_size (Optional[int], optional): Use residual convolution in the output. Defaults to None.
"""
super(MultiheadAttention, self).__init__()
if input_size is None:
input_size = attention_size
self.inverse_iterations = inverse_iterations
self.num_landmarks = num_landmarks
self.nystrom = nystrom
self.num_heads = num_heads
self.head_size = int(attention_size / num_heads)
self.dk = self.head_size
self.attention_size = attention_size
self.k = nn.Linear(input_size, attention_size, bias=False)
self.q = nn.Linear(input_size, attention_size, bias=False)
self.v = nn.Linear(input_size, attention_size, bias=False)
self.output = nn.Linear(attention_size, attention_size)
self.dropout = dropout
self.conv = None
if kernel_size is not None:
self.conv = nn.Conv2d(
in_channels=self.num_heads,
out_channels=self.num_heads,
kernel_size=(kernel_size, 1),
padding=(kernel_size // 2, 0),
bias=False,
groups=self.num_heads,
)
reset_parameters(self.named_parameters())
forward(self, keys, queries=None, attention_mask=None)
Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
The outputs of multiple heads are concatenated and passed through a feedforward layer.
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keys | torch.Tensor | [B, L, D] Keys tensor | required |
queries | Optional[torch.Tensor] | Optional [B, M, D] Queries tensor. If None, queries = keys. Defaults to None. | None |
attention_mask | Optional[torch.Tensor] | Optional [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, H, M, L]) |
Source code in slp/modules/attention.py
def forward(self, keys, queries=None, attention_mask=None):
r"""Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
$$a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H$$
The outputs of multiple heads are concatenated and passed through a feedforward layer.
$$a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
keys (torch.Tensor): [B, L, D] Keys tensor
queries (Optional[torch.Tensor]): Optional [B, M, D] Queries tensor. If None queries = keys. Defaults to None.
attention_mask (Optional[torch.Tensor]): Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, H, M, L])
"""
_, seq_length, _ = keys.size()
if attention_mask is not None:
if attention_mask.ndim == 2:
attention_mask = attention_mask.unsqueeze(1)
attention_mask = attention_mask.unsqueeze(1)
if self.nystrom:
keys, attention_mask = pad_for_nystrom(
keys, self.num_landmarks, attention_mask=attention_mask
)
if queries is None:
queries = keys
values = keys
k = self.k(keys)
q = self.q(queries)
v = self.v(values)
k = split_heads(k, self.num_heads)
q = split_heads(q, self.num_heads)
v = split_heads(v, self.num_heads)
if self.nystrom:
# out = (B, H, L, A/H)
# scores = Tuple
out, scores = nystrom_attention(
k,
q,
v,
self.dk,
self.num_landmarks,
attention_mask=attention_mask,
inverse_iterations=self.inverse_iterations,
dropout=self.dropout,
training=self.training,
)
else:
# out => (B, H, L, A/H)
# scores => (B, H, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
if self.conv is not None:
if attention_mask is None or attention_mask.ndim > 2:
out += self.conv(v)
else:
attention_mask = attention_mask.squeeze()
out += self.conv(v * attention_mask[:, None, :, None])
# out => (B, H, L, A/H)
out = merge_heads(out)
if out.size(1) != seq_length:
out = out[:, :seq_length, :]
out = self.output(out)
return out, scores
MultiheadSelfAttention
__init__(self, attention_size=512, num_heads=8, input_size=None, dropout=0.1, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None)
special
Multi-Headed Dot-product attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
num_heads | int | Number of attention heads. Defaults to 8. | 8 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
num_heads: int = 8,
input_size: Optional[int] = None,
dropout: float = 0.1,
nystrom: bool = False,
num_landmarks: int = 64,
inverse_iterations: int = 6,
kernel_size: Optional[int] = None,
):
"""Multi-Headed Dot-product attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
num_heads (int): Number of attention heads
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
"""
super(MultiheadSelfAttention, self).__init__()
if input_size is None:
input_size = attention_size
self.inverse_iterations = inverse_iterations
self.num_landmarks = num_landmarks
self.nystrom = nystrom
self.num_heads = num_heads
self.head_size = int(attention_size / num_heads)
self.dk = self.head_size
self.attention_size = attention_size
self.kqv = nn.Linear(input_size, 3 * attention_size, bias=False)
self.output = nn.Linear(attention_size, attention_size)
self.dropout = dropout
self.conv = None
if kernel_size is not None:
self.conv = nn.Conv2d(
in_channels=self.num_heads,
out_channels=self.num_heads,
kernel_size=(kernel_size, 1),
padding=(kernel_size // 2, 0),
bias=False,
groups=self.num_heads,
)
reset_parameters(self.named_parameters())
forward(self, x, attention_mask=None)
Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
The outputs of multiple heads are concatenated and passed through a feedforward layer.
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, D] Input tensor | required |
attention_mask | Optional[torch.Tensor] | Optional [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, H, M, L]) |
Source code in slp/modules/attention.py
def forward(self, x, attention_mask=None):
r"""Multi-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
Each head performs dot-product attention
$$a_H = softmax(\frac{Q_H \cdot K_H^T}{\sqrt{d}}) \cdot V_H$$
The outputs of multiple heads are concatenated and passed through a feedforward layer.
$$a = W (a^{(1)}_{H} \mathbin\Vert a^{(2)}_{H} \dots) + b$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
x (torch.Tensor): [B, L, D] Keys tensor
attention_mask (Optional[torch.Tensor]): Optional [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, H, M, L])
"""
_, seq_length, _ = x.size()
if attention_mask is not None:
if attention_mask.ndim == 2:
attention_mask = attention_mask.unsqueeze(1)
attention_mask = attention_mask.unsqueeze(1)
if self.nystrom:
x, attention_mask = pad_for_nystrom(
x, self.num_landmarks, attention_mask=attention_mask
)
k, q, v = self.kqv(x).chunk(3, dim=-1)
k = split_heads(k, self.num_heads)
q = split_heads(q, self.num_heads)
v = split_heads(v, self.num_heads)
if self.nystrom:
# out = (B, H, L, A/H)
# scores = Tuple
out, scores = nystrom_attention(
k,
q,
v,
self.dk,
self.num_landmarks,
attention_mask=attention_mask,
inverse_iterations=self.inverse_iterations,
dropout=self.dropout,
training=self.training,
)
else:
# out => (B, H, L, A/H)
# scores => (B, H, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
if self.conv is not None:
if attention_mask is None or attention_mask.ndim > 2:
out = out + self.conv(v)
else:
attention_mask = attention_mask.squeeze()
out = out + self.conv(v * attention_mask[:, None, :, None])
# out => (B, H, L, A/H)
out = merge_heads(out)
if out.size(1) != seq_length:
out = out[:, -seq_length:, :]
out = self.output(out)
return out, scores
MultiheadTwowayAttention
__init__(self, attention_size=512, input_size=None, dropout=0.1, num_heads=8, residual=True, nystrom=False, num_landmarks=64, inverse_iterations=6, kernel_size=None)
special
Multihead twoway attention for multimodal fusion
This module performs two-way attention for two input modality feature sequences. If att is the MultiheadAttention operation and x, y the input modality sequences, the operation is summarized as
$$out = (att(x \rightarrow y), att(y \rightarrow x))$$
If residual is True then a Vilbert-like residual connection is applied
$$out = (att(x \rightarrow y) + x, att(y \rightarrow x) + y)$$
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
num_heads | int | Number of attention heads. Defaults to 8. | 8 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
nystrom | bool | Use nystrom method for attention calculation. Defaults to False. | False |
num_landmarks | int | Number of landmark points for nystrom attention. Defaults to 64. | 64 |
inverse_iterations | int | Number of iterations to calculate the inverse in nystrom attention. Defaults to 6. | 6 |
kernel_size | Optional[int] | Use residual convolution in the output. Defaults to None. | None |
residual | bool | Use vilbert-like residual connections for fusion. Defaults to True. | True |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
input_size: Optional[int] = None,
dropout: float = 0.1,
num_heads: int = 8,
residual: bool = True,
nystrom: bool = False,
num_landmarks: int = 64,
inverse_iterations: int = 6,
kernel_size: Optional[int] = None,
):
r"""Multihead twoway attention for multimodal fusion
This module performs two way attention for two input modality feature sequences.
If att is the MultiheadAttention operation and x, y the input modality sequences,
the operation is summarized as
$$out = (att(x \rightarrow y), att(y \rightarrow x))$$
If residual is True then a Vilbert-like residual connection is applied
$$out = (att(x \rightarrow y) + x, att(y \rightarrow x) + y)$$
Args:
attention_size (int): Number of hidden features. Defaults to 512.
num_heads (int): Number of attention heads
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
nystrom (bool, optional): Use nystrom method for attention calculation. Defaults to False.
num_landmarks (int, optional): Number of landmark points for nystrom attention. Defaults to 64.
inverse_iterations (int, optional): Number of iterations to calculate the inverse in nystrom attention. Defaults to 6.
kernel_size (Optional[int], optional): Use residual convolution in the output. Defaults to None.
residual (bool, optional): Use vilbert-like residual connections for fusion. Defaults to True.
"""
super(MultiheadTwowayAttention, self).__init__()
self.xy = MultiheadAttention(
attention_size=attention_size,
input_size=input_size,
dropout=dropout,
num_heads=num_heads,
nystrom=nystrom,
num_landmarks=num_landmarks,
inverse_iterations=inverse_iterations,
kernel_size=kernel_size,
)
self.yx = MultiheadAttention(
attention_size=attention_size,
input_size=input_size,
dropout=dropout,
num_heads=num_heads,
nystrom=nystrom,
num_landmarks=num_landmarks,
inverse_iterations=inverse_iterations,
kernel_size=kernel_size,
)
self.residual = residual
forward(self, mod1, mod2, attention_mask=None)
mod1 : (B, L, D) first modality feature sequence
mod2 : (B, L, D) second modality feature sequence
attention_mask : optional zero-one mask for sequence elements
Source code in slp/modules/attention.py
def forward(self, mod1, mod2, attention_mask=None):
"""
mod1 : (B, L, D) first modality feature sequence
mod2 : (B, L, D) second modality feature sequence
attention_mask : optional zero-one mask for sequence elements
"""
out_mod1, _ = self.xy(mod1, queries=mod2, attention_mask=attention_mask)
out_mod2, _ = self.yx(mod2, queries=mod1, attention_mask=attention_mask)
if not self.residual:
return out_mod1, out_mod2
else:
# vilbert cross residual
# v + attention(v->a)
# a + attention(a->v)
out_mod1 += mod2
out_mod2 += mod1
return out_mod1, out_mod2
SelfAttention
__init__(self, attention_size=512, input_size=None, dropout=0.1)
special
Single-Headed Dot-product self attention module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attention_size | int | Number of hidden features. Defaults to 512. | 512 |
input_size | Optional[int] | Input features. If None, input_size is set to attention_size. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.1. | 0.1 |
Source code in slp/modules/attention.py
def __init__(
self,
attention_size: int = 512,
input_size: Optional[int] = None,
dropout: float = 0.1,
):
"""Single-Headed Dot-product self attention module
Args:
attention_size (int): Number of hidden features. Defaults to 512.
input_size (Optional[int]): Input features. Defaults to None.
If None input_size is set to attention_size.
dropout (float): Drop probability. Defaults to 0.1.
"""
super(SelfAttention, self).__init__()
if input_size is None:
input_size = attention_size
self.dk = input_size
self.kqv = nn.Linear(input_size, 3 * attention_size, bias=False)
self.dropout = dropout
reset_parameters(self.named_parameters())
forward(self, x, attention_mask=None)
Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, D] Input tensor | required |
attention_mask | Optional[torch.Tensor] | Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values [B, L, D], attention scores [B, M, L]) |
Source code in slp/modules/attention.py
def forward(
self,
x: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
) -> Tuple[torch.Tensor, torch.Tensor]:
r"""Single-head scaled dot-product attention forward pass
Outputs the values, where features for each sequence element are weighted by their respective attention scores
$$a = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) \cdot V$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
x (torch.Tensor): [B, L, D] Input tensor
attention_mask (Optional[torch.Tensor]): Optional [B, L] or [B, M, L] zero-one mask for sequence elements. Defaults to None.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values [B, L, D], attention scores [B, M, L])
"""
if attention_mask is not None:
if len(list(attention_mask.size())) == 2:
attention_mask = attention_mask.unsqueeze(1)
k, q, v = self.kqv(x).chunk(3, dim=-1) # (B, L, A)
# weights => (B, L, L)
out, scores = attention(
k,
q,
v,
self.dk,
attention_mask=attention_mask,
dropout=self.dropout,
training=self.training,
)
return out, scores
attention(k, q, v, dk, attention_mask=None, dropout=0.2, training=True)
Reweight values using scaled dot product attention
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | torch.Tensor | Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor | required |
q | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor | required |
v | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor | required |
dk | int | Model dimension | required |
attention_mask | Optional[torch.Tensor] | Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
training | bool | Is module in training phase? Defaults to True. | True |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor] | (Reweighted values, [B, M, L] or [B, H, M, L] attention scores) |
Source code in slp/modules/attention.py
def attention(
k: torch.Tensor,
q: torch.Tensor,
v: torch.Tensor,
dk: int,
attention_mask: Optional[torch.Tensor] = None,
dropout: float = 0.2,
training: bool = True,
):
r"""Reweight values using scaled dot product attention
$$s = softmax(\frac{Q \cdot K^T}{\sqrt{d}}) V$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
v (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor
dk (int): Model dimension
attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
preserved. Defaults to None.
dropout (float): Drop probability. Defaults to 0.2.
training (bool): Is module in training phase? Defaults to True.
Returns:
Tuple[torch.Tensor, torch.Tensor]: (Reweighted values, [B, M, L] or [B, H, M, L] attention scores)
"""
scores = attention_scores(
k, q, dk, attention_mask=attention_mask, dropout=dropout, training=training
)
out = torch.matmul(scores, v)
return out, scores
attention_scores(k, q, dk, attention_mask=None, dropout=0.2, training=True)
Calculate attention scores for scaled dot product attention
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | torch.Tensor | Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor | required |
q | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor | required |
dk | int | Model dimension | required |
attention_mask | Optional[torch.Tensor] | Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None. | None |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
training | bool | Is module in training phase? Defaults to True. | True |
Returns:
Type | Description |
---|---|
torch.Tensor | [B, M, L] or [B, H, M, L] attention scores |
Source code in slp/modules/attention.py
def attention_scores(
k: torch.Tensor,
q: torch.Tensor,
dk: int,
attention_mask: Optional[torch.Tensor] = None,
dropout: float = 0.2,
training: bool = True,
) -> torch.Tensor:
r"""Calculate attention scores for scaled dot product attention
$$s = softmax(\frac{Q \cdot K^T}{\sqrt{d}})$$
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
dk (int): Model dimension
attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
preserved. Defaults to None.
dropout (float): Drop probability. Defaults to 0.2.
training (bool): Is module in training phase? Defaults to True.
Returns:
torch.Tensor: [B, M, L] or [B, H, M, L] attention scores
"""
scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(dk)
if attention_mask is not None:
scores = scores + ((1 - attention_mask) * -1e5)
scores = F.softmax(scores, dim=-1)
scores = F.dropout(scores, p=dropout, training=training)
return scores
merge_heads(x)
Merge multiple attention heads into output tensor
(Batch size, Heads, Lengths, Attention size / Heads) => (Batch size, Length, Attention size)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, H, L, A/H] multi-head tensor | required |
Returns:
Type | Description |
---|---|
torch.Tensor | [B, L, A] merged / reshaped tensor |
Source code in slp/modules/attention.py
def merge_heads(x: torch.Tensor) -> torch.Tensor:
"""Merge multiple attention heads into output tensor
(Batch size, Heads, Lengths, Attention size / Heads) => (Batch size, Length, Attention size)
Args:
x (torch.Tensor): [B, H, L, A/H] multi-head tensor
Returns:
torch.Tensor: [B, L, A] merged / reshaped tensor
"""
batch_size, _, max_length, _ = x.size()
# x => (B, L, H, A/H)
x = x.permute(0, 2, 1, 3).contiguous()
return x.view(batch_size, max_length, -1)
nystrom_attention(k, q, v, dk, num_landmarks, attention_mask=None, inverse_iterations=6, dropout=0.2, training=True)
Calculate attention using nystrom approximation
Implementation heavily based on: https://github.com/lucidrains/nystrom-attention
Reference: https://arxiv.org/abs/2102.03902
- B: Batch size
- L: Keys Sequence length
- M: Queries Sequence length
- H: Number of heads
- A: Feature dimension
Parameters:
Name | Type | Description | Default |
---|---|---|---|
k | torch.Tensor | Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor | required |
q | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor | required |
v | torch.Tensor | Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor | required |
dk | int | Model dimension | required |
num_landmarks | int | Number of landmark points | required |
attention_mask | Optional[torch.Tensor] | Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be preserved. Defaults to None. | None |
inverse_iterations | int | Number of iterations for Moore-Penrose iterative inverse approximation. Defaults to 6. | 6 |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
training | bool | Is module in training phase? Defaults to True. | True |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]] | (Reweighted values, intermediate nystrom attention scores) |
Source code in slp/modules/attention.py
def nystrom_attention(
k: torch.Tensor,
q: torch.Tensor,
v: torch.Tensor,
dk: int,
num_landmarks: int,
attention_mask: Optional[torch.Tensor] = None,
inverse_iterations: int = 6,
dropout: float = 0.2,
training: bool = True,
):
"""Calculate attention using nystrom approximation
Implementation heavily based on: https://github.com/lucidrains/nystrom-attention
Reference: https://arxiv.org/abs/2102.03902
* B: Batch size
* L: Keys Sequence length
* M: Queries Sequence length
* H: Number of heads
* A: Feature dimension
Args:
k (torch.Tensor): Single head [B, L, A] or multi-head [B, H, L, A/H] Keys tensor
q (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Queries tensor
v (torch.Tensor): Single head [B, M, A] or multi-head [B, H, M, A/H] Values tensor
dk (int): Model dimension
num_landmarks (int): Number of landmark points
attention_mask (Optional[torch.Tensor]): Optional [B, [H], 1, L] pad mask or [B, [H], M, L] pad mask + subsequent mask
tensor with zeros in sequence indices that should be masked and ones in sequence indices that should be
preserved. Defaults to None.
inverse_iterations (int): Number of iterations for Moore Penrose iterative inverse
approximation
dropout (float): Drop probability. Defaults to 0.2.
training (bool): Is module in training phase? Defaults to True.
Returns:
Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]: (Reweighted values, intermediate nystrom attention scores)
"""
_, num_heads, seq_length, head_size = k.size()
masked_mean_denom = seq_length // num_landmarks
if attention_mask is not None:
attention_mask = attention_mask.unsqueeze(1)
masked_mean_denom = (
attention_mask.reshape(-1, 1, num_landmarks, seq_length // num_landmarks).sum(-1) + 1e-8 # type: ignore
) # (B, 1, Landmarks)
mask_landmarks = (masked_mean_denom > 0).type(torch.float) # type: ignore
masked_mean_denom = masked_mean_denom[..., None] # type: ignore
attention_mask = attention_mask.unsqueeze(-1)
q = q * attention_mask # (B, H, L, A/H)
k = k * attention_mask # (B, H, L, A/H)
v = v * attention_mask # (B, H, L, A/H)
scores_1_mask = attention_mask * mask_landmarks[..., None, :]
scores_2_mask = mask_landmarks[..., None] * mask_landmarks[..., None, :]
scores_3_mask = scores_1_mask.transpose(-1, -2)
q = q / math.sqrt(dk)
q_landmarks = q.reshape(
q.size(0), # batch_size
q.size(1), # num_heads
num_landmarks, # landmarks
seq_length // num_landmarks, # reduced length
q.size(-1), # head_size
).sum(
dim=-2
) # (B, H, Landmarks, A/H)
k_landmarks = k.reshape(
k.size(0), # batch_size
k.size(1), # num_heads
num_landmarks, # landmarks
seq_length // num_landmarks, # reduced length
k.size(-1), # head size
).sum(
dim=-2
) # (B, H, Landmarks, A/H)
k_landmarks = k_landmarks / masked_mean_denom
q_landmarks = q_landmarks / masked_mean_denom
scores_1 = attention_scores(
k_landmarks,
q,
1, # We have already accounted for dk
attention_mask=scores_1_mask,
dropout=dropout,
training=training,
)
scores_2 = attention_scores(
k_landmarks,
q_landmarks,
1, # We have already accounted for dk
attention_mask=scores_2_mask,
dropout=dropout,
training=training,
)
scores_3 = attention_scores(
k,
q_landmarks,
1, # We have already accounted for dk
attention_mask=scores_3_mask,
dropout=dropout,
training=training,
)
z_star = moore_penrose_pinv(scores_2, num_iter=inverse_iterations)
out = (scores_1 @ z_star) @ (scores_3 @ v)
return out, (scores_1, scores_2, scores_3)
pad_for_nystrom(x, num_landmarks, attention_mask=None)
Pad inputs and attention_mask to perform Nystrom Attention
Pad to nearest multiple of num_landmarks
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, A] Input tensor | required |
num_landmarks | int | Number of landmark points | required |
attention_mask | Optional[torch.Tensor] | [B, L] Padding mask. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, Optional[torch.Tensor]] | Padded inputs and attention_mask |
Source code in slp/modules/attention.py
def pad_for_nystrom(
x: torch.Tensor, num_landmarks: int, attention_mask: Optional[torch.Tensor] = None
) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
"""Pad inputs and attention_mask to perform Nystrom Attention
Pad to nearest multiple of num_landmarks
Args:
x (torch.Tensor): [B, L, A] Input tensor
num_landmarks (int): Number of landmark points
attention_mask (Optional[torch.Tensor]): [B, L] Padding mask
Returns:
Tuple[torch.Tensor, Optional[torch.Tensor]]: Padded inputs and attention_mask
"""
if attention_mask is not None:
attention_mask = attention_mask.squeeze()
_, seq_length, _ = x.size()
_, remainder = (
math.ceil(seq_length / num_landmarks),
seq_length % num_landmarks,
)
if remainder > 0:
padding = num_landmarks - remainder
x = F.pad(x, (0, 0, padding, 0), value=0)
if attention_mask is not None:
attention_mask = F.pad(attention_mask, (padding, 0))
return x, attention_mask
reset_parameters(named_parameters)
Initialize parameters in the transformer model.
Source code in slp/modules/attention.py
def reset_parameters(named_parameters):
"""Initialize parameters in the transformer model."""
for name, p in named_parameters:
if "weight" in name:
nn.init.xavier_normal_(p)
if "bias" in name:
nn.init.constant_(p, 0.0)
split_heads(x, num_heads)
Split input tensor into multiple attention heads
(Batch size, Length, Attention size) => (Batch size, Heads, Lengths, Attention size / Heads)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | [B, L, A] input tensor | required |
num_heads | int | Number of heads | required |
Returns:
Type | Description |
---|---|
torch.Tensor | [B, H, L, A/H] split / reshaped tensor |
Source code in slp/modules/attention.py
def split_heads(x: torch.Tensor, num_heads: int) -> torch.Tensor:
"""Split input tensor into multiple attention heads
(Batch size, Length, Attention size) => (Batch size, Heads, Lengths, Attention size / Heads)
Args:
x (torch.Tensor): [B, L, A] input tensor
num_heads (int): number of heads
Returns:
torch.Tensor: [B, H, L, A/H] Split / reshaped tensor
"""
batch_size, max_length, attention_size = x.size()
head_size = int(attention_size / num_heads)
return x.view(batch_size, max_length, num_heads, head_size).permute(0, 2, 1, 3)
Classifier
__init__(self, encoder, encoded_features, num_classes, dropout=0.2)
special
Classifier wrapper module
Stores a Neural Network encoder and adds a classification layer on top.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
encoder | Module | The encoder network | required |
encoded_features | int | Number of output features of the encoder | required |
num_classes | int | Number of target classes | required |
dropout | float | Drop probability. Defaults to 0.2. | 0.2 |
Source code in slp/modules/classifier.py
def __init__(
self,
encoder: nn.Module,
encoded_features: int,
num_classes: int,
dropout: float = 0.2,
):
"""Classifier wrapper module
Stores a Neural Network encoder and adds a classification layer on top.
Args:
encoder (nn.Module): The encoder network
encoded_features (int): Number of output features of the encoder
num_classes (int): Number of target classes
dropout (float): Drop probability
"""
super(Classifier, self).__init__()
self.encoder = encoder
self.drop = nn.Dropout(dropout)
self.clf = nn.Linear(encoded_features, num_classes)
forward(self, *args, **kwargs)
Encode inputs using the encoder network and perform classification
Returns:
Type | Description |
---|---|
torch.Tensor | [B, *, num_classes] Logits tensor |
Source code in slp/modules/classifier.py
def forward(self, *args, **kwargs) -> torch.Tensor:
"""Encode inputs using the encoder network and perform classification
Returns:
torch.Tensor: [B, *, num_classes] Logits tensor
"""
encoded: torch.Tensor = self.encoder(*args, **kwargs) # type: ignore
out: torch.Tensor = self.drop(encoded)
out = self.clf(out)
return out
MOSEITextClassifier
forward(self, x, lengths)
Encode inputs using the encoder network and perform classification
Returns:
Type | Description |
---|---|
torch.Tensor | [B, *, num_classes] Logits tensor |
Source code in slp/modules/classifier.py
def forward(self, x, lengths):
x = x["text"]
lengths = lengths["text"]
return super().forward(x, lengths)
RNNLateFusionClassifier
forward(self, inputs, lengths)
Encode each modality with its modality encoder, apply multimodal dropout if configured, concatenate the encoded representations and classify.
Source code in slp/modules/classifier.py
def forward(self, inputs, lengths):
encoded = [
self.modality_encoders[m](inputs[m], lengths[m]) for m in self.modalities
]
if self.mmdrop is not None:
encoded = self.mmdrop(*encoded)
fused = torch.cat(encoded, dim=-1)
fused = self.drop(fused)
out = self.clf(fused)
return out
TransformerLateFusionClassifier
forward(self, inputs, attention_masks=None)
Late-fusion forward pass. Each modality is encoded with its own transformer encoder (using the per-modality attention masks, if provided), multimodal dropout (mmdrop) is optionally applied, and the encoded modalities are concatenated, optionally passed through modality dropout and classified.
Source code in slp/modules/classifier.py
def forward(self, inputs, attention_masks=None):
if attention_masks is None:
attention_masks = dict(
zip(self.modalities, [None for _ in self.modalities])
)
encoded = [
self.modality_encoders[m](inputs[m], attention_mask=attention_masks[m])
for m in self.modalities
]
if self.mmdrop is not None:
encoded = self.mmdrop(*encoded)
fused = torch.cat(encoded, dim=-1)
if self.modality_drop is not None:
fused = self.modality_drop(fused)
out = self.clf(fused)
return out
Embed
__init__(self, num_embeddings, embedding_dim, embeddings=None, noise=0.0, dropout=0.0, scale=1.0, trainable=False)
special
Define the embedding layer and perform the necessary initializations
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_embeddings | int | Total number of embeddings. | required |
embedding_dim | int | Embedding dimension. | required |
embeddings | Optional[numpy.ndarray] | The 2D ndarray with the word vectors. | None |
noise | float | Optional additive noise. Defaults to 0.0. | 0.0 |
dropout | float | Embedding dropout probability. Defaults to 0.0. | 0.0 |
scale | float | Scale word embeddings by a constant. Defaults to 1.0. | 1.0 |
trainable | bool | Finetune embeddings. Defaults to False. | False |
Source code in slp/modules/embed.py
def __init__(
self,
num_embeddings: int,
embedding_dim: int,
embeddings: Optional[np.ndarray] = None,
noise: float = 0.0,
dropout: float = 0.0,
scale: float = 1.0,
trainable: bool = False,
):
"""
Define the layer of the model and perform the initializations
of the layers (wherever it is necessary)
Args:
num_embeddings (int): Total number of embeddings.
embedding_dim (int): Embedding dimension.
embeddings (numpy.ndarray): the 2D ndarray with the word vectors.
noise (float): Optional additive noise. Defaults to 0.0.
dropout (float): Embedding dropout probability. Defaults to 0.0.
scale (float): Scale word embeddings by a constant. Defaults to 1.0.
trainable (bool): Finetune embeddings. Defaults to False
"""
super(Embed, self).__init__()
self.scale = scale # scale embeddings by value. Needed for transformer
# define the embedding layer, with the corresponding dimensions
self.embedding = nn.Embedding(
num_embeddings=num_embeddings, embedding_dim=embedding_dim
)
if embeddings is not None:
logger.info("Initializing Embedding layer with pre-trained weights.")
if trainable:
logger.info("Embeddings are going to be finetuned")
else:
logger.info("Embeddings are frozen")
self.init_embeddings(embeddings, trainable)
# the dropout "layer" for the word embeddings
self.dropout = nn.Dropout(dropout)
# the gaussian noise "layer" for the word embeddings
self.noise = GaussianNoise(noise)
forward(self, x)
Embed input tokens
Assign embedding that corresponds to each token. Optionally add Gaussian noise and embedding dropout and scale embeddings by a constant.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L] Input token ids. | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: [B, L, E] Embedded tokens. |
Source code in slp/modules/embed.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Embed input tokens
Assign embedding that corresponds to each token.
Optionally add Gaussian noise and embedding dropout and scale embeddings by a constant.
Args:
x (torch.Tensor): [B, L] Input token ids.
Returns:
torch.Tensor: [B, L, E] Embedded tokens.
"""
embeddings = self.embedding(x)
if self.noise.stddev > 0:
embeddings = self.noise(embeddings)
if self.dropout.p > 0:
embeddings = self.dropout(embeddings)
return embeddings * self.scale # type: ignore
init_embeddings(self, weights, trainable)
Initialize embeddings matrix with pretrained embeddings
Parameters:
Name | Type | Description | Default |
---|---|---|---|
weights | ndarray | pretrained embeddings | required |
trainable | bool | Finetune embeddings? | required |
Source code in slp/modules/embed.py
def init_embeddings(self, weights: np.ndarray, trainable: bool):
"""Initialize embeddings matrix with pretrained embeddings
Args:
weights (np.ndarray): pretrained embeddings
trainable (bool): Finetune embeddings?
"""
self.embedding.weight = nn.Parameter(
torch.from_numpy(weights), requires_grad=trainable
)
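A sketch with a toy random matrix standing in for pretrained vectors:
import numpy as np
import torch
from slp.modules.embed import Embed

weights = np.random.rand(1000, 50).astype("float32")  # toy "pretrained" matrix
embed = Embed(
    num_embeddings=1000,
    embedding_dim=50,
    embeddings=weights,
    dropout=0.1,
    trainable=False,  # embeddings stay frozen
)
tokens = torch.randint(0, 1000, (4, 12))  # [B, L] token ids
vectors = embed(tokens)                   # [B, L, E] == [4, 12, 50]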
PositionalEncoding
__init__(self, embedding_dim=512, max_len=5000)
special
Inject some information about the relative or absolute position of the tokens in the sequence.
The positional encodings have the same dimension as the embeddings, so that the two can be summed. Here, we use sine and cosine functions of different frequencies.
PE for even positions:
$$\text{PosEncoder}(pos, 2i) = \sin(\frac{pos}{10000^{\frac{2i}{d}}})$$
PE for odd positions:
$$\text{PosEncoder}(pos, 2i+1) = \cos(\frac{pos}{10000^{\frac{2i}{d}}})$$
where \(pos\) is the word position and \(i\) is the embedding idx
Implementation modified from pytorch/examples/word_language_model.py
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding_dim | int | Embedding / model dimension. Defaults to 512. | 512 |
max_len | int | Maximum sequence length that can be encoded. Defaults to 5000. | 5000 |
Source code in slp/modules/embed.py
def __init__(self, embedding_dim: int = 512, max_len: int = 5000):
r"""Inject some information about the relative or absolute position of the tokens in the sequence.
The positional encodings have the same dimension as
the embeddings, so that the two can be summed. Here, we use sine and cosine
functions of different frequencies.
PE for even positions:
$$\text{PosEncoder}(pos, 2i) = \sin(\frac{pos}{10000^{\frac{2i}{d}}})$$
PE for odd positions:
$$\text{PosEncoder}(pos, 2i+1) = \cos(\frac{pos}{10000^{\frac{2i}{d}}})$$
where $pos$ is the word position and $i$ is the embedding idx
Implementation modified from pytorch/examples/word_language_model.py
Args:
embedding_dim (int): Embedding / model dimension. Defaults to 512.
max_len (int): Maximum sequence length that can be encoded. Defaults to 5000.
"""
super(PositionalEncoding, self).__init__()
pe = torch.zeros(max_len, embedding_dim)
position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
div_term = torch.exp(
torch.arange(0, embedding_dim, 2).float()
* (-math.log(10000.0) / embedding_dim)
)
pe[:, 0::2] = torch.sin(position * div_term)
pe[:, 1::2] = torch.cos(position * div_term)
pe = pe.unsqueeze(0)
self.register_buffer("pe", pe)
forward(self, x)
Calculate positional embeddings for input and add them to the input tensor:
$$out = x + PosEmbed(x)$$
x is assumed to be batch first
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L, D] input embeddings | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: Embeddings + positional embeddings |
Source code in slp/modules/embed.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Calculate positional embeddings for input and add them to input tensor
$$out = x + PosEmbed(x)$$
x is assumed to be batch first
Args:
x (torch.Tensor): [B, L, D] input embeddings
Returns:
torch.Tensor: Embeddings + positional embeddings
"""
x = x + self.pe[:, : x.size(1), :] # type: ignore
return x
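A minimal usage sketch:
import torch
from slp.modules.embed import PositionalEncoding

pe = PositionalEncoding(embedding_dim=512, max_len=5000)
x = torch.randn(2, 100, 512)  # [B, L, D], batch first
out = pe(x)                   # [2, 100, 512], positions added to the embeddings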
PositionwiseFF
__init__(self, d_model, d_ff, dropout=0.1, gelu=False)
special
Transformer Position-wise feed-forward layer
Linear -> ReLU (or GELU) -> Dropout -> Linear
Parameters:
Name | Type | Description | Default |
---|---|---|---|
d_model | int | Model dimension | required |
d_ff | int | Hidden dimension | required |
dropout | float | Dropout probability. Defaults to 0.1. | 0.1 |
gelu | bool | Use GELU activation instead of ReLU. Defaults to False. | False |
Source code in slp/modules/feedforward.py
def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1, gelu=False):
"""Transformer Position-wise feed-forward layer
Linear -> ReLU (or GELU) -> Dropout -> Linear
Args:
d_model (int): Model dimension
d_ff (int): Hidden dimension
dropout (float): Dropout probability. Defaults to 0.1.
"""
super(PositionwiseFF, self).__init__()
self.ff1 = nn.Linear(d_model, d_ff)
self.ff2 = nn.Linear(d_ff, d_model)
self.drop = nn.Dropout(dropout)
self.activation = nn.ReLU() if not gelu else nn.GELU()
forward(self, x)
Position-wise FF forward pass
$$out = W_2 \cdot \max(0, W_1 \cdot x + b_1) + b_2$$
[B, *, D] -> [B, *, H] -> [B, *, D]
- B: Batch size
- D: Model dim
- H: Hidden size > Model dim (Usually \(H = 2D\))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, *, D] Input features | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: [B, *, D] Output features |
Source code in slp/modules/feedforward.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
r"""Position-wise FF forward pass
$$out = W_2 \cdot \max(0, W_1 \cdot x + b_1) + b_2$$
[B, *, D] -> [B, *, H] -> [B, *, D]
* B: Batch size
* D: Model dim
* H: Hidden size > Model dim (Usually $H = 2D$)
Args:
x (torch.Tensor): [B, *, D] Input features
Returns:
torch.Tensor: [B, *, D] Output features
"""
out: torch.Tensor = self.ff2(self.drop(self.activation(self.ff1(x))))
return out
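A minimal usage sketch:
import torch
from slp.modules.feedforward import PositionwiseFF

ff = PositionwiseFF(d_model=256, d_ff=512, dropout=0.1, gelu=True)
x = torch.randn(8, 50, 256)  # [B, L, D]
out = ff(x)                  # [8, 50, 256]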
TwoLayer
forward(self, x)
Two-layer feed-forward pass: Linear -> Dropout -> Activation -> Linear -> Dropout, with an optional residual connection (out = x + FF(x)) when self.residual is True.
Source code in slp/modules/feedforward.py
def forward(self, x):
out = self.l1(x)
out = self.drop(out)
out = self.act(out)
out = self.l2(out)
out = self.drop(out)
if self.residual:
out = x + out
return out
LayerNormTf
__init__(self, hidden_size, eps=1e-12)
special
Construct a layernorm module in the TF style (epsilon inside the square root). Link: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L234
Source code in slp/modules/norm.py
def __init__(self, hidden_size: int, eps: float = 1e-12):
"""Construct a layernorm module in the TF style (epsilon inside the square root).
Link: https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L234
"""
super(LayerNormTf, self).__init__()
self.weight = nn.Parameter(torch.ones(hidden_size))
self.bias = nn.Parameter(torch.zeros(hidden_size))
self.variance_epsilon = eps
forward(self, x)
Calculate Layernorm the tf way
Source code in slp/modules/norm.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Calculate Layernorm the tf way"""
u = x.mean(-1, keepdim=True)
s = (x - u).pow(2).mean(-1, keepdim=True)
x = (x - u) / torch.sqrt(s + self.variance_epsilon)
return self.weight * x + self.bias
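A minimal usage sketch; the output should match torch.nn.LayerNorm(hidden_size, eps=1e-12) up to floating point error:
import torch
from slp.modules.norm import LayerNormTf

ln = LayerNormTf(hidden_size=128)
x = torch.randn(4, 10, 128)
y = ln(x)  # normalized over the last dimension, then scaled and shifted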
ScaleNorm
forward(self, x)
ScaleNorm forward pass. The input is scaled by the learnable gain g divided by its L2 norm along the last dimension, clamped to eps for numerical stability.
Source code in slp/modules/norm.py
def forward(self, x: torch.Tensor):
scaled_norm = self.g / safe_norm(x, dim=-1, keepdim=True).clamp(min=self.eps)
return scaled_norm * x
GaussianNoise
__init__(self, stddev, mean=0.0)
special
Additive Gaussian Noise layer
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stddev | float | the standard deviation of the distribution | required |
mean | float | the mean of the distribution | 0.0 |
Source code in slp/modules/regularization.py
def __init__(self, stddev: float, mean: float = 0.0):
"""Additive Gaussian Noise layer
Args:
stddev (float): the standard deviation of the distribution
mean (float): the mean of the distribution
"""
super().__init__()
self.stddev = stddev
self.mean = mean
__repr__(self)
special
String representation of class
Source code in slp/modules/regularization.py
def __repr__(self):
"""String representation of class"""
return "{} (mean={}, stddev={})".format(
self.__class__.__name__, str(self.mean), str(self.stddev)
)
forward(self, x)
Gaussian noise forward pass
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Input features. | required |

Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: Input with additive Gaussian noise in training mode; unchanged input in eval mode |
Source code in slp/modules/regularization.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
"""Gaussian noise forward pass
Args:
x (torch.Tensor): Input features.
Returns:
torch.Tensor: Input + Gaussian noise in training mode; unchanged input in eval mode
"""
if self.training:
noise = Variable(x.data.new(x.size()).normal_(self.mean, self.stddev))
return x + noise
return x
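A sketch showing that the layer is only active in training mode:
import torch
from slp.modules.regularization import GaussianNoise

noise = GaussianNoise(stddev=0.1)
x = torch.zeros(32, 10)

noise.train()
print(noise(x).std())            # roughly 0.1: noise is added in training mode

noise.eval()
print(torch.equal(noise(x), x))  # True: identity in eval mode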
AttentiveRNN
__init__(self, input_size, hidden_size=256, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.1, rnn_type='lstm', packed_sequence=True, attention=False, max_length=-1, num_heads=1, nystrom=True, num_landmarks=32, kernel_size=33, inverse_iterations=6, return_hidden=False)
special
RNN encoder with optional attention mechanism
Scaled dot-product attention (single- or multi-headed) can be applied over the RNN hidden states
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | int | Input features dimension | required |
hidden_size | int | Hidden features | 256 |
batch_first | bool | Use batch first representation type. Defaults to True. | True |
layers | int | Number of RNN layers. Defaults to 1. | 1 |
bidirectional | bool | Use bidirectional RNNs. Defaults to False. | False |
merge_bi | str | How bidirectional states are merged. Defaults to "cat". | 'cat' |
dropout | float | Dropout probability. Defaults to 0.1. | 0.1 |
rnn_type | str | lstm or gru. Defaults to "lstm". | 'lstm' |
packed_sequence | bool | Use packed sequences. Defaults to True. | True |
max_length | int | Maximum sequence length for fixed-length padding. If -1, use the largest sequence length in this batch. | -1 |
attention | bool | Use attention mechanism. Defaults to False. | False |
num_heads | int | Number of attention heads. If 1, uses single-headed attention. | 1 |
nystrom | bool | Use nystrom approximation for multihead attention. | True |
num_landmarks | int | Number of landmark sequence elements for nystrom attention. | 32 |
kernel_size | Optional[int] | Kernel size for multihead attention output residual convolution. | 33 |
inverse_iterations | int | Number of iterations for Moore-Penrose inverse approximation in nystrom attention. 6 is a good value. | 6 |
return_hidden | bool | Return all hidden states. Defaults to False. | False |
Source code in slp/modules/rnn.py
def __init__(
self,
input_size: int,
hidden_size: int = 256,
batch_first: bool = True,
layers: int = 1,
bidirectional: bool = False,
merge_bi: str = "cat",
dropout: float = 0.1,
rnn_type: str = "lstm",
packed_sequence: bool = True,
attention: bool = False,
max_length: int = -1,
num_heads: int = 1,
nystrom: bool = True,
num_landmarks: int = 32,
kernel_size: Optional[int] = 33,
inverse_iterations: int = 6,
return_hidden: bool = False,
):
"""RNN with embedding layer and optional attention mechanism
Single-headed scaled dot-product attention is used as an attention mechanism
Args:
input_size (int): Input features dimension
hidden_size (int): Hidden features
batch_first (bool): Use batch first representation type. Defaults to True.
layers (int): Number of RNN layers. Defaults to 1.
bidirectional (bool): Use bidirectional RNNs. Defaults to False.
merge_bi (str): How bidirectional states are merged. Defaults to "cat".
dropout (float): Dropout probability. Defaults to 0.0.
rnn_type (str): lstm or gru. Defaults to "lstm".
packed_sequence (bool): Use packed sequences. Defaults to True.
max_length (int): Maximum sequence length for fixed length padding. If -1 takes the
largest sequence length in this batch
attention (bool): Use attention mechanism. Defaults to False
num_heads (int): Number of attention heads. If 1 uses single headed attention
nystrom (bool): Use nystrom approximation for multihead attention
num_landmarks (int): Number of landmark sequence elements for nystrom attention
kernel_size (int): Kernel size for multihead attention output residual convolution
inverse_iterations (int): Number of iterations for moore-penrose inverse approximation
in nystrom attention. 6 is a good value
return_hidden (bool): Return all hidden states. Defaults to False.
"""
super(AttentiveRNN, self).__init__()
self.rnn = RNN(
input_size, # type: ignore
hidden_size,
batch_first=batch_first,
layers=layers,
merge_bi=merge_bi,
bidirectional=bidirectional,
dropout=dropout,
rnn_type=rnn_type,
packed_sequence=packed_sequence,
max_length=max_length,
)
self.out_size = (
hidden_size
if not (bidirectional and merge_bi == "cat")
else 2 * hidden_size
)
self.batch_first = batch_first
self.return_hidden = return_hidden
self.attention = None
if attention:
if num_heads == 1:
self.attention = Attention(
attention_size=self.out_size, dropout=dropout
)
else:
self.attention = MultiheadAttention( # type: ignore
attention_size=self.out_size,
num_heads=num_heads,
kernel_size=kernel_size,
nystrom=nystrom,
num_landmarks=num_landmarks,
inverse_iterations=inverse_iterations,
dropout=dropout,
)
forward(self, x, lengths)
Attentive RNN forward pass
If self.attention=True, the outputs are the weighted sum of the RNN hidden states with the attention score weights; otherwise the output is the last hidden state of the RNN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L, D] Input features | required |
lengths | Tensor | [B] Original sequence lengths | required |

Returns:
Type | Description |
---|---|
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] | If return_hidden == False: a [B, H] or [B, 2*H] tensor of output features to be used for classification. If return_hidden == True: the same output features plus a tensor of all the hidden states |
Source code in slp/modules/rnn.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
"""Attentive RNN forward pass
If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights
Else the output is the last hidden state of the RNN.
Args:
x (torch.Tensor): [B, L, D] Input features
lengths (torch.Tensor): [B] Original sequence lengths
Returns:
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
if return_hidden == False: Returns a tensor [B, H] or [B, 2*H] of output features to be used for classification
if return_hidden == True: Returns a tensor [B, H] or [B, 2*H] of output features to
be used for classification, and a tensor of all the hidden states
"""
states, last_hidden, _ = self.rnn(x, lengths)
out: torch.Tensor = last_hidden
if self.attention is not None:
states, _ = self.attention(
states,
attention_mask=pad_mask(
lengths,
max_length=states.size(1) if self.batch_first else states.size(0),
),
)
out = states.mean(dim=1)
if self.return_hidden:
return out, states
else:
return out
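A sketch with single-headed attention and return_hidden=True (shapes are illustrative):
import torch
from slp.modules.rnn import AttentiveRNN

encoder = AttentiveRNN(
    input_size=35, hidden_size=64, attention=True, return_hidden=True
)
x = torch.randn(8, 20, 35)         # [B, L, D]
lengths = torch.tensor([20] * 8)   # [B]
out, states = encoder(x, lengths)  # out: [8, 64], states: [8, 20, 64]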
RNN
out_size: int
property
readonly
RNN output features size
Returns:
Type | Description |
---|---|
int | RNN output features size |
__init__(self, input_size, hidden_size, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.0, rnn_type='lstm', packed_sequence=True, max_length=-1)
special
LSTM - GRU wrapper with packed sequence support and handling for bidirectional / last output states
It is recommended to run with batch_first=True because the rest of the code is built with this assumption
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_size | int | Input features. | required |
hidden_size | int | Hidden features. | required |
batch_first | bool | Use batch first representation type. Defaults to True. | True |
layers | int | Number of RNN layers. Defaults to 1. | 1 |
bidirectional | bool | Use bidirectional RNNs. Defaults to False. | False |
merge_bi | str | How bidirectional states are merged. Defaults to "cat". | 'cat' |
dropout | float | Dropout probability. Defaults to 0.0. | 0.0 |
rnn_type | str | lstm or gru. Defaults to "lstm". | 'lstm' |
packed_sequence | bool | Use packed sequences. Defaults to True. | True |
max_length | int | Maximum sequence length for fixed-length padding. If -1, use the largest sequence length in this batch. | -1 |
Source code in slp/modules/rnn.py
def __init__(
self,
input_size: int,
hidden_size: int,
batch_first: bool = True,
layers: int = 1,
bidirectional: bool = False,
merge_bi: str = "cat",
dropout: float = 0.0,
rnn_type: str = "lstm",
packed_sequence: bool = True,
max_length: int = -1,
):
"""LSTM - GRU wrapper with packed sequence support and handling for bidirectional / last output states
It is recommended to run with batch_first=True because the rest of the code is built with this assumption
Args:
input_size (int): Input features.
hidden_size (int): Hidden features.
batch_first (bool): Use batch first representation type. Defaults to True.
layers (int): Number of RNN layers. Defaults to 1.
bidirectional (bool): Use bidirectional RNNs. Defaults to False.
merge_bi (str): How bidirectional states are merged. Defaults to "cat".
dropout (float): Dropout probability. Defaults to 0.0.
rnn_type (str): lstm or gru. Defaults to "lstm".
packed_sequence (bool): Use packed sequences. Defaults to True.
"""
super(RNN, self).__init__()
self.bidirectional = bidirectional
self.hidden_size = hidden_size
self.batch_first = batch_first
self.merge_bi = merge_bi
self.rnn_type = rnn_type.lower()
if not batch_first:
logger.warning(
"You are running RNN with batch_first=False. Make sure this is really what you want"
)
if not packed_sequence:
logger.warning(
"You have set packed_sequence=False. Running with packed_sequence=True will be much faster"
)
rnn_cls = nn.LSTM if self.rnn_type == "lstm" else nn.GRU
self.rnn = rnn_cls(
input_size,
hidden_size,
batch_first=batch_first,
num_layers=layers,
bidirectional=bidirectional,
)
self.drop = nn.Dropout(dropout)
self.packed_sequence = packed_sequence
if packed_sequence:
self.pack = PackSequence(batch_first=batch_first)
self.unpack = PadPackedSequence(
batch_first=batch_first, max_length=max_length
)
forward(self, x, lengths)
RNN forward pass
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L, D] Input features | required |
lengths | Tensor | [B] Original sequence lengths | required |

Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor, Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]] | Tuple of (merged forward and backward states [B, L, H] or [B, L, 2*H], merged last forward and backward state [B, H] or [B, 2*H], hidden states: a tuple of [num_layers * num_directions, B, H] tensors for LSTM or a single such tensor for GRU) |
Source code in slp/modules/rnn.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Tuple[
torch.Tensor,
torch.Tensor,
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]],
]:
"""RNN forward pass
Args:
x (torch.Tensor): [B, L, D] Input features
lengths (torch.Tensor): [B] Original sequence lengths
Returns:
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: (
merged forward and backward states [B, L, H] or [B, L, 2*H],
merged last forward and backward state [B, H] or [B, 2*H],
hidden states tuple of [num_layers * num_directions, B, H] for LSTM or tensor [num_layers * num_directions, B, H] for GRU
)
"""
self.rnn.flatten_parameters()
if self.packed_sequence:
# Latest pytorch allows only cpu tensors for packed sequence
lengths = lengths.to("cpu")
x, lengths = self.pack(x, lengths)
out, hidden = self.rnn(x)
if self.packed_sequence:
out = self.unpack(out, lengths)
out = self.drop(out)
lengths = lengths.to(out.device)
out, last_timestep = self._final_output(out, lengths)
return out, last_timestep, hidden
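A minimal usage sketch (shapes are illustrative):
import torch
from slp.modules.rnn import RNN

rnn = RNN(input_size=35, hidden_size=64, bidirectional=True, merge_bi="cat")
x = torch.randn(8, 20, 35)        # [B, L, D]
lengths = torch.tensor([20] * 8)  # [B]
states, last, hidden = rnn(x, lengths)
# states: [8, 20, 128], last: [8, 128] (2 * hidden_size because merge_bi="cat")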
TokenRNN
__init__(self, hidden_size=256, vocab_size=None, embeddings_dim=None, embeddings=None, embeddings_dropout=0.0, finetune_embeddings=False, batch_first=True, layers=1, bidirectional=False, merge_bi='cat', dropout=0.1, rnn_type='lstm', packed_sequence=True, attention=False, max_length=-1, num_heads=1, nystrom=True, num_landmarks=32, kernel_size=33, inverse_iterations=6, return_hidden=False)
special
RNN with embedding layer and optional attention mechanism
Single-headed scaled dot-product attention is used as an attention mechanism
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hidden_size | int | Hidden features | 256 |
vocab_size | Optional[int] | Vocabulary size. Defaults to None. | None |
embeddings_dim | Optional[int] | Embedding dimension. Defaults to None. | None |
embeddings | Optional[numpy.ndarray] | Embedding matrix. Defaults to None. | None |
embeddings_dropout | float | Embedding dropout probability. Defaults to 0.0. | 0.0 |
finetune_embeddings | bool | Finetune embeddings? Defaults to False. | False |
batch_first | bool | Use batch first representation type. Defaults to True. | True |
layers | int | Number of RNN layers. Defaults to 1. | 1 |
bidirectional | bool | Use bidirectional RNNs. Defaults to False. | False |
merge_bi | str | How bidirectional states are merged. Defaults to "cat". | 'cat' |
dropout | float | Dropout probability. Defaults to 0.1. | 0.1 |
rnn_type | str | lstm or gru. Defaults to "lstm". | 'lstm' |
packed_sequence | bool | Use packed sequences. Defaults to True. | True |
max_length | int | Maximum sequence length for fixed-length padding. If -1, use the largest sequence length in this batch. | -1 |
attention | bool | Use attention mechanism. Defaults to False. | False |
num_heads | int | Number of attention heads. If 1, uses single-headed attention. | 1 |
nystrom | bool | Use nystrom approximation for multihead attention. | True |
num_landmarks | int | Number of landmark sequence elements for nystrom attention. | 32 |
kernel_size | Optional[int] | Kernel size for multihead attention output residual convolution. | 33 |
inverse_iterations | int | Number of iterations for Moore-Penrose inverse approximation in nystrom attention. 6 is a good value. | 6 |
return_hidden | bool | Return all hidden states. Defaults to False. | False |
Source code in slp/modules/rnn.py
def __init__(
self,
hidden_size: int = 256,
vocab_size: Optional[int] = None,
embeddings_dim: Optional[int] = None,
embeddings: Optional[np.ndarray] = None,
embeddings_dropout: float = 0.0,
finetune_embeddings: bool = False,
batch_first: bool = True,
layers: int = 1,
bidirectional: bool = False,
merge_bi: str = "cat",
dropout: float = 0.1,
rnn_type: str = "lstm",
packed_sequence: bool = True,
attention: bool = False,
max_length: int = -1,
num_heads: int = 1,
nystrom: bool = True,
num_landmarks: int = 32,
kernel_size: Optional[int] = 33,
inverse_iterations: int = 6,
return_hidden=False,
):
"""RNN with embedding layer and optional attention mechanism
Single-headed scaled dot-product attention is used as an attention mechanism
Args:
hidden_size (int): Hidden features
vocab_size (Optional[int]): Vocabulary size. Defaults to None.
embeddings_dim (Optional[int]): Embedding dimension. Defaults to None.
embeddings (Optional[np.ndarray]): Embedding matrix. Defaults to None.
embeddings_dropout (float): Embedding dropout probability. Defaults to 0.0.
finetune_embeddings (bool): Finetune embeddings? Defaults to False.
batch_first (bool): Use batch first representation type. Defaults to True.
layers (int): Number of RNN layers. Defaults to 1.
bidirectional (bool): Use bidirectional RNNs. Defaults to False.
merge_bi (str): How bidirectional states are merged. Defaults to "cat".
dropout (float): Dropout probability. Defaults to 0.0.
rnn_type (str): lstm or gru. Defaults to "lstm".
packed_sequence (bool): Use packed sequences. Defaults to True.
max_length (int): Maximum sequence length for fixed length padding. If -1 takes the
largest sequence length in this batch
attention (bool): Use attention mechanism. Defaults to False
num_heads (int): Number of attention heads. If 1 uses single headed attention
nystrom (bool): Use nystrom approximation for multihead attention
num_landmarks (int): Number of landmark sequence elements for nystrom attention
kernel_size (int): Kernel size for multihead attention output residual convolution
inverse_iterations (int): Number of iterations for moore-penrose inverse approximation
in nystrom attention. 6 is a good value
"""
super(TokenRNN, self).__init__()
if embeddings is None:
finetune_embeddings = True
assert (
vocab_size is not None
), "You should either pass an embeddings matrix or vocab size"
assert (
embeddings_dim is not None
), "You should either pass an embeddings matrix or embeddings_dim"
else:
vocab_size = embeddings.shape[0]
embeddings_dim = embeddings.shape[1]
self.embed = Embed(
vocab_size, # type: ignore
embeddings_dim, # type: ignore
embeddings=embeddings,
dropout=embeddings_dropout,
scale=hidden_size ** 0.5,
trainable=finetune_embeddings,
)
self.encoder = AttentiveRNN(
embeddings_dim, # type: ignore
hidden_size,
batch_first=batch_first,
layers=layers,
bidirectional=bidirectional,
merge_bi=merge_bi,
dropout=dropout,
rnn_type=rnn_type,
packed_sequence=packed_sequence,
attention=attention,
max_length=max_length,
num_heads=num_heads,
nystrom=nystrom,
num_landmarks=num_landmarks,
kernel_size=kernel_size,
inverse_iterations=inverse_iterations,
return_hidden=return_hidden,
)
self.out_size = self.encoder.out_size
forward(self, x, lengths)
Token RNN forward pass
If self.attention=True, the outputs are the weighted sum of the RNN hidden states with the attention score weights; otherwise the output is the last hidden state of the RNN.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | [B, L] Input token ids | required |
lengths | Tensor | [B] Original sequence lengths | required |

Returns:
Type | Description |
---|---|
Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] | torch.Tensor: [B, H] or [B, 2*H] Output features to be used for classification |
Source code in slp/modules/rnn.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
"""Token RNN forward pass
If self.attention=True then the outputs are the weighted sum of the RNN hidden states with the attention score weights
Else the output is the last hidden state of the RNN.
Args:
x (torch.Tensor): [B, L] Input token ids
lengths (torch.Tensor): [B] Original sequence lengths
Returns:
torch.Tensor: [B, H] or [B, 2*H] Output features to be used for classification
"""
x = self.embed(x)
out = self.encoder(x, lengths)
return out # type: ignore
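A sketch with a randomly initialized embedding table (vocab size and shapes are illustrative):
import torch
from slp.modules.rnn import TokenRNN

model = TokenRNN(hidden_size=64, vocab_size=1000, embeddings_dim=50, attention=True)
tokens = torch.randint(0, 1000, (8, 20))  # [B, L] token ids
lengths = torch.tensor([20] * 8)          # [B]
out = model(tokens, lengths)              # [8, 64]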
Decoder
forward(self, target, encoded, source_mask=None, target_mask=None)
Pass the target sequence through each decoder layer in turn, attending over the encoder outputs.
Source code in slp/modules/transformer.py
def forward(self, target, encoded, source_mask=None, target_mask=None):
for l in self.decoder:
target = l(
target, encoded, source_mask=source_mask, target_mask=target_mask
)
return target
DecoderLayer
forward(self, targets, encoded, source_mask=None, target_mask=None)
Apply self-attention to the targets, fuse them with the encoder outputs through cross-attention, and pass the result through the output feed-forward sublayer.
Source code in slp/modules/transformer.py
def forward(self, targets, encoded, source_mask=None, target_mask=None):
targets = self.in_layer(targets, attention_mask=target_mask)
out = self.fuse_layer(encoded, targets, attention_mask=source_mask)
out = self.out_layer(out)
return out
Encoder
forward(self, x, attention_mask=None)
Pass the input through each encoder layer in sequence, propagating the attention mask.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
for layer in self.encoder:
x = layer(x, attention_mask=attention_mask)
return x
EncoderDecoder
forward(self, source, target, source_mask=None, target_mask=None)
Encode the source sequence, then decode the target sequence conditioned on the encoder outputs.
Source code in slp/modules/transformer.py
def forward(self, source, target, source_mask=None, target_mask=None):
encoded = self.encoder(source, attention_mask=source_mask)
decoded = self.decoder(
target, encoded, source_mask=source_mask, target_mask=target_mask
)
return decoded
EncoderLayer
forward(self, x, attention_mask=None)
Apply the self-attention sublayer followed by the position-wise feed-forward sublayer.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
out = self.l1(x, attention_mask=attention_mask)
out = self.l2(out)
return out
Sublayer1
forward(self, x, attention_mask=None)
Apply the wrapped self-attention layer, using pre-norm or post-norm depending on self.prenorm.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
return (
self._prenorm(x, attention_mask=attention_mask)
if self.prenorm
else self._postnorm(x, attention_mask=attention_mask)
)
Sublayer2
forward(self, x)
Apply the wrapped feed-forward layer, using pre-norm or post-norm depending on self.prenorm.
Source code in slp/modules/transformer.py
def forward(self, x):
return self._prenorm(x) if self.prenorm else self._postnorm(x)
Sublayer3
forward(self, x, y, attention_mask=None)
Apply the wrapped cross-attention layer on inputs x and y, using pre-norm or post-norm depending on self.prenorm.
Source code in slp/modules/transformer.py
def forward(self, x, y, attention_mask=None):
return (
self._prenorm(x, y, attention_mask=attention_mask)
if self.prenorm
else self._postnorm(x, y, attention_mask=attention_mask)
)
Transformer
forward(self, source, target, source_mask=None, target_mask=None)
Embed and positionally encode the source and target sequences, run them through the encoder-decoder block, and project the decoded states to output predictions.
Source code in slp/modules/transformer.py
def forward(self, source, target, source_mask=None, target_mask=None):
source = self.embed(source)
target = self.embed(target)
# Adding embeddings + pos embeddings
# is done in PositionalEncoding class
source = self.pe(source)
target = self.pe(target)
out = self.transformer_block(
source, target, source_mask=source_mask, target_mask=target_mask
)
out = self.drop(out)
out = self.predict(out)
return out
TransformerSequenceEncoder
forward(self, x, attention_mask=None)
Optionally feature-normalize the input, embed and positionally encode it, run the transformer encoder and mean-pool the outputs over the time dimension.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
if self.feature_norm:
x = self.feature_norm(x)
x = self.embed(x)
x = self.pe(x)
out = self.transformer_block(x, attention_mask=attention_mask).mean(dim=1)
return out
TransformerTokenSequenceEncoder
forward(self, x, attention_mask=None)
Embed and positionally encode the input token ids, run the transformer encoder and mean-pool the outputs over the time dimension.
Source code in slp/modules/transformer.py
def forward(self, x, attention_mask=None):
x = self.embed(x)
x = self.pe(x)
out = self.transformer_block(x, attention_mask=attention_mask).mean(dim=1)
return out
reset_parameters(named_parameters, gain=1.0)
Initialize parameters in the transformer model.
Source code in slp/modules/transformer.py
def reset_parameters(named_parameters, gain=1.0):
"""Initialize parameters in the transformer model."""
for name, p in named_parameters:
if p.dim() > 1:
if "weight" in name:
nn.init.xavier_normal_(p, gain=gain)
if "bias" in name:
nn.init.constant_(p, 0.0)
PLDataModuleFromCorpus
embeddings: Optional[numpy.ndarray]
property
readonly
Embeddings matrix
Returns:
Type | Description |
---|---|
Optional[numpy.ndarray] | Embeddings matrix |
vocab_size: int
property
readonly
Number of tokens in the vocabulary
Returns:
Type | Description |
---|---|
int | Number of tokens in the vocabulary |
__init__(self, train, train_labels=None, val=None, val_labels=None, test=None, test_labels=None, val_percent=0.2, test_percent=0.2, batch_size=64, batch_size_eval=None, seed=None, num_workers=1, pin_memory=True, drop_last=False, shuffle_eval=False, sampler_train=None, sampler_val=None, sampler_test=None, batch_sampler_train=None, batch_sampler_val=None, batch_sampler_test=None, collate_fn=None, language_model=False, tokenizer='spacy', no_test_set=False, **corpus_args)
special
Wrap raw corpus in a LightningDataModule
- This handles the selection of the appropriate corpus class based on the tokenizer argument.
- If language_model=True it uses the appropriate dataset from slp.data.datasets.
- Uses the PLDataModuleFromDatasets to split the val and test sets if not provided
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train | List | Raw train corpus | required |
train_labels | Optional[List] | Train labels. Defaults to None. | None |
val | Optional[List] | Raw validation corpus. Defaults to None. | None |
val_labels | Optional[List] | Validation labels. Defaults to None. | None |
test | Optional[List] | Raw test corpus. Defaults to None. | None |
test_labels | Optional[List] | Test labels. Defaults to None. | None |
val_percent | float | Percent of train to be used for validation if no validation set is given. Defaults to 0.2. | 0.2 |
test_percent | float | Percent of train to be used for test set if no test set is given. Defaults to 0.2. | 0.2 |
batch_size | int | Training batch size. Defaults to 64. | 64 |
batch_size_eval | Optional[int] | Validation and test batch size. Defaults to None. | None |
seed | Optional[int] | Seed for deterministic run. Defaults to None. | None |
num_workers | int | Number of workers in the DataLoader. Defaults to 1. | 1 |
pin_memory | bool | Pin tensors to GPU memory. Defaults to True. | True |
drop_last | bool | Drop last incomplete batch. Defaults to False. | False |
sampler_train | Sampler | Sampler for train loader. Defaults to None. | None |
sampler_val | Sampler | Sampler for validation loader. Defaults to None. | None |
sampler_test | Sampler | Sampler for test loader. Defaults to None. | None |
batch_sampler_train | BatchSampler | Batch sampler for train loader. Defaults to None. | None |
batch_sampler_val | BatchSampler | Batch sampler for validation loader. Defaults to None. | None |
batch_sampler_test | BatchSampler | Batch sampler for test loader. Defaults to None. | None |
shuffle_eval | bool | Shuffle validation and test dataloaders. Defaults to False. | False |
collate_fn | Optional[Callable[..., Any]] | Collator function. Defaults to None. | None |
language_model | bool | Use corpus for Language Modeling. Defaults to False. | False |
tokenizer | str | Select one of the cls.accepted_tokenizers. Defaults to "spacy". | 'spacy' |
no_test_set | bool | Do not create test set. Useful for tuning. | False |
**corpus_args | kwargs | Extra arguments to be passed to the corpus. See slp/data/corpus.py | {} |
Exceptions:
Type | Description |
---|---|
ValueError | [description] |
ValueError | [description] |
Source code in slp/plbind/dm.py
def __init__(
self,
train: List,
train_labels: Optional[List] = None,
val: Optional[List] = None,
val_labels: Optional[List] = None,
test: Optional[List] = None,
test_labels: Optional[List] = None,
val_percent: float = 0.2,
test_percent: float = 0.2,
batch_size: int = 64,
batch_size_eval: int = None,
seed: int = None,
num_workers: int = 1,
pin_memory: bool = True,
drop_last: bool = False,
shuffle_eval: bool = False,
sampler_train: Sampler = None,
sampler_val: Sampler = None,
sampler_test: Sampler = None,
batch_sampler_train: BatchSampler = None,
batch_sampler_val: BatchSampler = None,
batch_sampler_test: BatchSampler = None,
collate_fn: Optional[Callable[..., Any]] = None,
language_model: bool = False,
tokenizer: str = "spacy",
no_test_set: bool = False,
**corpus_args,
):
"""Wrap raw corpus in a LightningDataModule
* This handles the selection of the appropriate corpus class based on the tokenizer argument.
* If language_model=True it uses the appropriate dataset from slp.data.datasets.
* Uses the PLDataModuleFromDatasets to split the val and test sets if not provided
Args:
train (List): Raw train corpus
train_labels (Optional[List]): Train labels. Defaults to None.
val (Optional[List]): Raw validation corpus. Defaults to None.
val_labels (Optional[List]): Validation labels. Defaults to None.
test (Optional[List]): Raw test corpus. Defaults to None.
test_labels (Optional[List]): Test labels. Defaults to None.
val_percent (float): Percent of train to be used for validation if no validation set is given. Defaults to 0.2.
test_percent (float): Percent of train to be used for test set if no test set is given. Defaults to 0.2.
batch_size (int): Training batch size. Defaults to 64.
batch_size_eval (Optional[int]): Validation and test batch size. Defaults to None.
seed (Optional[int]): Seed for deterministic run. Defaults to None.
num_workers (int): Number of workers in the DataLoader. Defaults to 1.
pin_memory (bool): Pin tensors to GPU memory. Defaults to True.
drop_last (bool): Drop last incomplete batch. Defaults to False.
sampler_train (Sampler): Sampler for train loader. Defaults to None.
sampler_val (Sampler): Sampler for validation loader. Defaults to None.
sampler_test (Sampler): Sampler for test loader. Defaults to None.
batch_sampler_train (BatchSampler): Batch sampler for train loader. Defaults to None.
batch_sampler_val (BatchSampler): Batch sampler for validation loader. Defaults to None.
batch_sampler_test (BatchSampler): Batch sampler for test loader. Defaults to None.
shuffle_eval (bool): Shuffle validation and test dataloaders. Defaults to False.
collate_fn (Callable[..., Any]): Collator function. Defaults to None.
language_model (bool): Use corpus for Language Modeling. Defaults to False.
tokenizer (str): Select one of the cls.accepted_tokenizers. Defaults to "spacy".
no_test_set (bool): Do not create test set. Useful for tuning
**corpus_args (kwargs): Extra arguments to be passed to the corpus. See
slp/data/corpus.py
Raises:
ValueError: [description]
ValueError: [description]
"""
self.language_model = language_model
self.tokenizer = tokenizer
self.corpus_args = corpus_args
train_data, val_data, test_data = self._zip_corpus_and_labels(
train, val, test, train_labels, val_labels, test_labels
)
self.no_test_set = no_test_set
super(PLDataModuleFromCorpus, self).__init__(
train_data, # type: ignore
val=val_data, # type: ignore
test=test_data, # type: ignore
val_percent=val_percent,
test_percent=test_percent,
batch_size=batch_size,
batch_size_eval=batch_size_eval,
seed=seed,
num_workers=num_workers,
pin_memory=pin_memory,
drop_last=drop_last,
shuffle_eval=shuffle_eval,
sampler_train=sampler_train,
sampler_val=sampler_val,
sampler_test=sampler_test,
batch_sampler_train=batch_sampler_train,
batch_sampler_val=batch_sampler_val,
batch_sampler_test=batch_sampler_test,
collate_fn=collate_fn,
no_test_set=no_test_set,
)
add_argparse_args(parent_parser)
classmethod
Augment input parser with arguments for data loading and corpus processing
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | argparse.ArgumentParser | Parser created by the user | required |

Returns:
Type | Description |
---|---|
argparse.ArgumentParser | Augmented parser |
Source code in slp/plbind/dm.py
@classmethod
def add_argparse_args(cls, parent_parser):
"""Augment input parser with arguments for data loading and corpus processing
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = super(PLDataModuleFromCorpus, cls).add_argparse_args(parent_parser)
parser.add_argument(
"--tokenizer",
dest="data.tokenizer",
type=str.lower,
# Corpus can already be tokenized, you can use spacy for word tokenization or any tokenizer from hugging face
choices=cls.accepted_tokenizers,
default="spacy",
help="Token type. The tokenization will happen at this level.",
)
# Only when tokenizer == spacy
parser.add_argument(
"--limit-vocab",
dest="data.limit_vocab_size",
type=int,
default=-1,
help="Limit vocab size. -1 means use the whole vocab. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--embeddings-file",
dest="data.embeddings_file",
type=dir_path,
default=None,
help="Path to file with pretrained embeddings. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--embeddings-dim",
dest="data.embeddings_dim",
type=int,
default=50,
help="Embedding dim of pretrained embeddings. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--lang",
dest="data.lang",
type=str,
default="en_core_web_md",
help="Language for spacy tokenizer, e.g. en_core_web_md. Applicable only when --tokenizer=spacy",
)
parser.add_argument(
"--no-add-specials",
dest="data.add_special_tokens",
action="store_false",
help="Do not add special tokens for hugging face tokenizers",
)
# Generic args
parser.add_argument(
"--lower",
dest="data.lower",
action="store_true",
help="Convert to lowercase.",
)
parser.add_argument(
"--prepend-bos",
dest="data.prepend_bos",
action="store_true",
help="Prepend [BOS] token",
)
parser.add_argument(
"--append-eos",
dest="data.append_eos",
action="store_true",
help="Append [EOS] token",
)
parser.add_argument(
"--max-sentence-length",
dest="data.max_len",
type=int,
default=-1,
help="Maximum allowed sentence length. -1 means use the whole sentence",
)
return parser
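A hypothetical sketch (the toy corpus and labels are illustrative; the spacy tokenizer additionally requires the configured spacy model, e.g. en_core_web_md, to be installed):
from slp.plbind.dm import PLDataModuleFromCorpus

train = ["the cat sat on the mat", "a quick brown fox", "hello world", "good morning"]
labels = [0, 1, 0, 1]

dm = PLDataModuleFromCorpus(
    train,
    train_labels=labels,
    batch_size=2,
    tokenizer="spacy",  # word-level tokenization through spacy
)
# Validation and test splits are carved out of the train corpus using the
# default val_percent / test_percent, since no val or test data is given.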
PLDataModuleFromDatasets
__init__(self, train, val=None, test=None, val_percent=0.2, test_percent=0.2, batch_size=1, batch_size_eval=None, seed=None, num_workers=1, pin_memory=True, drop_last=False, sampler_train=None, sampler_val=None, sampler_test=None, batch_sampler_train=None, batch_sampler_val=None, batch_sampler_test=None, shuffle_eval=False, collate_fn=None, no_test_set=False)
special
LightningDataModule wrapper for generic torch.utils.data.Dataset
If val or test Datasets are not provided, this class will split val_percent and test_percent of the train set respectively to create them
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train | Dataset | Train set | required |
val | Dataset | Validation set. Defaults to None. | None |
test | Dataset | Test set. Defaults to None. | None |
val_percent | float | Percent of train to be used for validation if no validation set is given. Defaults to 0.2. | 0.2 |
test_percent | float | Percent of train to be used for test set if no test set is given. Defaults to 0.2. | 0.2 |
batch_size | int | Training batch size. Defaults to 1. | 1 |
batch_size_eval | Optional[int] | Validation and test batch size. Defaults to None. | None |
seed | Optional[int] | Seed for deterministic run. Defaults to None. | None |
num_workers | int | Number of workers in the DataLoader. Defaults to 1. | 1 |
pin_memory | bool | Pin tensors to GPU memory. Defaults to True. | True |
drop_last | bool | Drop last incomplete batch. Defaults to False. | False |
sampler_train | Sampler | Sampler for train loader. Defaults to None. | None |
sampler_val | Sampler | Sampler for validation loader. Defaults to None. | None |
sampler_test | Sampler | Sampler for test loader. Defaults to None. | None |
batch_sampler_train | BatchSampler | Batch sampler for train loader. Defaults to None. | None |
batch_sampler_val | BatchSampler | Batch sampler for validation loader. Defaults to None. | None |
batch_sampler_test | BatchSampler | Batch sampler for test loader. Defaults to None. | None |
shuffle_eval | bool | Shuffle validation and test dataloaders. Defaults to False. | False |
collate_fn | Optional[Callable[..., Any]] | Collator function. Defaults to None. | None |
no_test_set | bool | Do not create test set. Useful for tuning. | False |
Exceptions:
Type | Description |
---|---|
ValueError | If both mutually exclusive sampler_train and batch_sampler_train are provided |
ValueError | If both mutually exclusive sampler_val and batch_sampler_val are provided |
ValueError | If both mutually exclusive sampler_test and batch_sampler_test are provided |
Source code in slp/plbind/dm.py
def __init__(
self,
train: Dataset,
val: Dataset = None,
test: Dataset = None,
val_percent: float = 0.2,
test_percent: float = 0.2,
batch_size: int = 1,
batch_size_eval: Optional[int] = None,
seed: Optional[int] = None,
num_workers: int = 1,
pin_memory: bool = True,
drop_last: bool = False,
sampler_train: Sampler = None,
sampler_val: Sampler = None,
sampler_test: Sampler = None,
batch_sampler_train: BatchSampler = None,
batch_sampler_val: BatchSampler = None,
batch_sampler_test: BatchSampler = None,
shuffle_eval: bool = False,
collate_fn: Optional[Callable[..., Any]] = None,
no_test_set: bool = False,
):
"""LightningDataModule wrapper for generic torch.utils.data.Dataset
If val or test Datasets are not provided, this class will split
val_percent and test_percent of the train set respectively to create them
Args:
train (Dataset): Train set
val (Dataset): Validation set. Defaults to None.
test (Dataset): Test set. Defaults to None.
val_percent (float): Percent of train to be used for validation if no validation set is given. Defaults to 0.2.
test_percent (float): Percent of train to be used for test set if no test set is given. Defaults to 0.2.
batch_size (int): Training batch size. Defaults to 1.
batch_size_eval (Optional[int]): Validation and test batch size. Defaults to None.
seed (Optional[int]): Seed for deterministic run. Defaults to None.
num_workers (int): Number of workers in the DataLoader. Defaults to 1.
pin_memory (bool): Pin tensors to GPU memory. Defaults to True.
drop_last (bool): Drop last incomplete batch. Defaults to False.
sampler_train (Sampler): Sampler for train loader. Defaults to None.
sampler_val (Sampler): Sampler for validation loader. Defaults to None.
sampler_test (Sampler): Sampler for test loader. Defaults to None.
batch_sampler_train (BatchSampler): Batch sampler for train loader. Defaults to None.
batch_sampler_val (BatchSampler): Batch sampler for validation loader. Defaults to None.
batch_sampler_test (BatchSampler): Batch sampler for test loader. Defaults to None.
shuffle_eval (bool): Shuffle validation and test dataloaders. Defaults to False.
collate_fn (Callable[..., Any]): Collator function. Defaults to None.
no_test_set (bool): Do not create test set. Useful for tuning
Raises:
ValueError: If both mutually exclusive sampler_train and batch_sampler_train are provided
ValueError: If both mutually exclusive sampler_val and batch_sampler_val are provided
ValueError: If both mutually exclusive sampler_test and batch_sampler_test are provided
"""
super(PLDataModuleFromDatasets, self).__init__()
self.setup_has_run = False
if batch_sampler_train is not None and sampler_train is not None:
raise ValueError(
"You provided both a sampler and a batch sampler for the train set. These are mutually exclusive"
)
if batch_sampler_val is not None and sampler_val is not None:
raise ValueError(
"You provided both a sampler and a batch sampler for the validation set. These are mutually exclusive"
)
if batch_sampler_test is not None and sampler_test is not None:
raise ValueError(
"You provided both a sampler and a batch sampler for the test set. These are mutually exclusive"
)
self.val_percent = val_percent
self.test_percent = test_percent
self.sampler_train = sampler_train
self.sampler_val = sampler_val
self.sampler_test = sampler_test
self.batch_sampler_train = batch_sampler_train
self.batch_sampler_val = batch_sampler_val
self.batch_sampler_test = batch_sampler_test
self.num_workers = num_workers
self.pin_memory = pin_memory
self.drop_last = drop_last
self.shuffle_eval = shuffle_eval
self.collate_fn = collate_fn
self.batch_size = batch_size
self.seed = seed
if batch_size_eval is None:
batch_size_eval = self.batch_size
self.no_test_set = no_test_set
self.batch_size_eval = batch_size_eval
self.train = train
self.val = val
self.test = test
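A sketch with a toy TensorDataset; val and test sets are split off the train set since none are provided (the dataloaders become available after setup() runs, e.g. inside the Lightning Trainer; this is an assumption based on the setup_has_run flag above):
import torch
from torch.utils.data import TensorDataset
from slp.plbind.dm import PLDataModuleFromDatasets

x, y = torch.randn(100, 10), torch.randint(0, 2, (100,))
dm = PLDataModuleFromDatasets(
    TensorDataset(x, y),
    val_percent=0.2,
    test_percent=0.2,
    batch_size=16,
    seed=42,  # deterministic train/val/test split
)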
add_argparse_args(parent_parser)
classmethod
Augment input parser with arguments for data loading
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | ArgumentParser | Parser created by the user | required |

Returns:
Type | Description |
---|---|
ArgumentParser | argparse.ArgumentParser: Augmented parser |
Source code in slp/plbind/dm.py
@classmethod
def add_argparse_args(
cls, parent_parser: argparse.ArgumentParser
) -> argparse.ArgumentParser:
"""Augment input parser with arguments for data loading
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
parser.add_argument(
"--val-percent",
dest="data.val_percent",
type=float,
default=0.2,
help="Percent of validation data to be randomly split from the training set, if no validation set is provided",
)
parser.add_argument(
"--test-percent",
dest="data.test_percent",
type=float,
default=0.2,
help="Percent of test data to be randomly split from the training set, if no test set is provided",
)
parser.add_argument(
"--bsz",
dest="data.batch_size",
type=int,
default=32,
help="Training batch size",
)
parser.add_argument(
"--bsz-eval",
dest="data.batch_size_eval",
type=int,
default=32,
help="Evaluation batch size",
)
parser.add_argument(
"--num-workers",
dest="data.num_workers",
type=int,
default=1,
help="Number of workers to be used in the DataLoader",
)
parser.add_argument(
"--no-pin-memory",
dest="data.pin_memory",
action="store_false",
help="Don't pin data to GPU memory when transferring",
)
parser.add_argument(
"--drop-last",
dest="data.drop_last",
action="store_true",
help="Drop last incomplete batch",
)
parser.add_argument(
"--no-shuffle-eval",
dest="data.shuffle_eval",
action="store_false",
help="Don't shuffle val & test sets",
)
return parser
prepare_data(self)
Use this to download and prepare data.
.. warning:: DO NOT set state to the model (use setup instead) since this is NOT called on every GPU in DDP/TPU
Example::

    def prepare_data(self):
        # good
        download_data()
        tokenize()
        etc()

        # bad
        self.split = data_split
        self.some_state = some_other_state()

In DDP prepare_data can be called in two ways (using Trainer(prepare_data_per_node)):
- Once per node. This is the default and is only called on LOCAL_RANK=0.
- Once in total. Only called on GLOBAL_RANK=0.
Example::

    # DEFAULT
    # called once per node on LOCAL_RANK=0 of that node
    Trainer(prepare_data_per_node=True)

    # call on GLOBAL_RANK=0 (great for shared file systems)
    Trainer(prepare_data_per_node=False)

This is called before requesting the dataloaders:
.. code-block:: python

    model.prepare_data()
    if ddp/tpu: init()
    model.setup(stage)
    model.train_dataloader()
    model.val_dataloader()
    model.test_dataloader()
Source code in slp/plbind/dm.py
def prepare_data(self):
return None
test_dataloader(self)
Configure test DataLoader
Returns:
Type | Description |
---|---|
DataLoader | Pytorch DataLoader for test set |
Source code in slp/plbind/dm.py
def test_dataloader(self):
"""Configure test DataLoader
Returns:
DataLoader: Pytorch DataLoader for test set
"""
return DataLoader(
self.test,
batch_size=self.batch_size_eval if self.batch_sampler_test is None else 1,
num_workers=self.num_workers,
pin_memory=self.pin_memory,
drop_last=self.drop_last and (self.batch_sampler_test is None),
sampler=self.sampler_test,
batch_sampler=self.batch_sampler_test,
shuffle=(
self.shuffle_eval
and (self.batch_sampler_test is None)
and (self.sampler_test is None)
),
collate_fn=self.collate_fn,
)
train_dataloader(self)
Configure train DataLoader
Returns:
Type | Description |
---|---|
DataLoader | Pytorch DataLoader for train set |
Source code in slp/plbind/dm.py
def train_dataloader(self) -> DataLoader:
"""Configure train DataLoader
Returns:
DataLoader: Pytorch DataLoader for train set
"""
return DataLoader(
self.train,
batch_size=self.batch_size if self.batch_sampler_train is None else 1,
num_workers=self.num_workers,
pin_memory=self.pin_memory,
drop_last=self.drop_last and (self.batch_sampler_train is None),
sampler=self.sampler_train,
batch_sampler=self.batch_sampler_train,
shuffle=(self.batch_sampler_train is None) and (self.sampler_train is None),
collate_fn=self.collate_fn,
)
val_dataloader(self)
Configure validation DataLoader
Returns:
Type | Description |
---|---|
DataLoader | Pytorch DataLoader for validation set |
Source code in slp/plbind/dm.py
def val_dataloader(self):
"""Configure validation DataLoader
Returns:
DataLoader: Pytorch DataLoader for validation set
"""
val = DataLoader(
self.val,
batch_size=self.batch_size_eval if self.batch_sampler_val is None else 1,
num_workers=self.num_workers,
pin_memory=self.pin_memory,
drop_last=self.drop_last and (self.batch_sampler_val is None),
sampler=self.sampler_val,
batch_sampler=self.batch_sampler_val,
shuffle=(
self.shuffle_eval
and (self.batch_sampler_val is None)
and (self.sampler_val is None)
),
collate_fn=self.collate_fn,
)
return val
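A minimal sketch of wiring the datamodule into a training run. The toy tensors are illustrative, and we assume the constructor signature implied by the attributes above (train as the first positional argument) and the standard LightningDataModule setup() call performing the splits:

```python
import torch
from torch.utils.data import TensorDataset

from slp.plbind.dm import PLDataModuleFromDatasets

train_set = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
dm = PLDataModuleFromDatasets(
    train_set, batch_size=16, val_percent=0.2, no_test_set=True, seed=42
)
dm.setup()

train_loader = dm.train_dataloader()
val_loader = dm.val_dataloader()
```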
split_data(dataset, test_size, seed)
Train-test split of dataset.
Dataset can be either a torch.utils.data.Dataset or a list
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | Union[Dataset, List] | Input dataset | required |
test_size | float | Size of the test set. Defaults to 0.2. | required |
seed | int | Optional seed for deterministic run. Defaults to None. | required |
Returns:
Type | Description |
---|---|
Tuple[Union[Dataset, List], Union[Dataset, List]] | (train set, test set) |
Source code in slp/plbind/dm.py
def split_data(dataset, test_size, seed):
"""Train-test split of dataset.
Dataset can be either a torch.utils.data.Dataset or a list
Args:
dataset (Union[Dataset, List]): Input dataset
test_size (float): Size of the test set. Defaults to 0.2.
seed (int): Optional seed for deterministic run. Defaults to None.
Returns:
Tuple[Union[Dataset, List], Union[Dataset, List]]: (train set, test set)
"""
train, test = None, None
if isinstance(dataset, torch.utils.data.Dataset):
test_len = int(test_size * len(dataset))
train_len = len(dataset) - test_len
seed_generator = None
if seed is not None:
seed_generator = torch.Generator().manual_seed(seed)
train, test = random_split(
dataset, [train_len, test_len], generator=seed_generator
)
else:
train, test = train_test_split(dataset, test_size=test_size, random_state=seed)
return train, test
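A quick sketch of the list branch, which falls back to sklearn's train_test_split (torch Datasets go through random_split instead):

```python
from slp.plbind.dm import split_data

data = list(range(100))
train, test = split_data(data, test_size=0.2, seed=42)
print(len(train), len(test))  # 80 20
```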
FixedWandbLogger
__init__(self, name=None, save_dir=None, offline=False, id=None, anonymous=False, version=None, project=None, log_model=False, experiment=None, prefix='', sync_step=True, checkpoint_dir=None, **kwargs)
special
Wandb logger fix to save checkpoints in wandb
Accepts an additional checkpoint_dir argument, pointing to the real checkpoint directory
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | Optional[str] | Display name for the run. Defaults to None. | None |
save_dir | Optional[str] | Path where data is saved. Defaults to None. | None |
offline | Optional[bool] | Run offline (data can be streamed later to wandb servers). Defaults to False. | False |
id | Optional[str] | Sets the version, mainly used to resume a previous run. Defaults to None. | None |
anonymous | Optional[bool] | Enables or explicitly disables anonymous logging. Defaults to False. | False |
version | Optional[str] | Sets the version, mainly used to resume a previous run. Defaults to None. | None |
project | Optional[str] | The name of the project to which this run will belong. Defaults to None. | None |
log_model | Optional[bool] | Save checkpoints in wandb dir to upload on W&B servers. Defaults to False. | False |
experiment | Run | WandB experiment object. Defaults to None. | None |
prefix | Optional[str] | A string to put at the beginning of metric keys. Defaults to "". | '' |
sync_step | Optional[bool] | Sync Trainer step with wandb step. Defaults to True. | True |
checkpoint_dir | Optional[str] | Real checkpoint dir. Defaults to None. | None |
Source code in slp/plbind/helpers.py
def __init__(
self,
name: Optional[str] = None,
save_dir: Optional[str] = None,
offline: Optional[bool] = False,
id: Optional[str] = None,
anonymous: Optional[bool] = False,
version: Optional[str] = None,
project: Optional[str] = None,
log_model: Optional[bool] = False,
experiment: wandb.sdk.wandb_run.Run = None,
prefix: Optional[str] = "",
sync_step: Optional[bool] = True,
checkpoint_dir: Optional[str] = None,
**kwargs,
):
"""Wandb logger fix to save checkpoints in wandb
Accepts an additional checkpoint_dir argument, pointing to the real checkpoint directory
Args:
name (Optional[str]): Display name for the run. Defaults to None.
save_dir (Optional[str]): Path where data is saved. Defaults to None.
offline (Optional[bool]): Run offline (data can be streamed later to wandb servers). Defaults to False.
id (Optional[str]): Sets the version, mainly used to resume a previous run. Defaults to None.
anonymous (Optional[bool]): Enables or explicitly disables anonymous logging. Defaults to False.
version (Optional[str]): Sets the version, mainly used to resume a previous run. Defaults to None.
project (Optional[str]): The name of the project to which this run will belong. Defaults to None.
log_model (Optional[bool]): Save checkpoints in wandb dir to upload on W&B servers. Defaults to False.
experiment ([type]): WandB experiment object. Defaults to None.
prefix (Optional[str]): A string to put at the beginning of metric keys. Defaults to "".
sync_step (Optional[bool]): Sync Trainer step with wandb step. Defaults to True.
checkpoint_dir (Optional[str]): Real checkpoint dir. Defaults to None.
"""
self._checkpoint_dir = checkpoint_dir
super(FixedWandbLogger, self).__init__(
name=name,
save_dir=save_dir,
offline=offline,
id=id,
anonymous=anonymous,
version=version,
project=project,
log_model=log_model,
experiment=experiment,
prefix=prefix,
sync_step=sync_step,
**kwargs,
)
finalize(self, status)
Determine where checkpoints are saved and upload to wandb servers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
status | str | Experiment status | required |
Source code in slp/plbind/helpers.py
@rank_zero_only
def finalize(self, status: str) -> None:
"""Determine where checkpoints are saved and upload to wandb servers
Args:
status (str): Experiment status
"""
# offset future training logged on same W&B run
if self._experiment is not None:
self._step_offset = self._experiment.step
checkpoint_dir = (
self._checkpoint_dir if self._checkpoint_dir is not None else self.save_dir
)
if checkpoint_dir is None:
logger.warning(
"Invalid checkpoint dir. Checkpoints will not be uploaded to Wandb."
)
logger.info(
"You can manually upload your checkpoints through the CLI interface."
)
else:
# upload all checkpoints from saving dir
if self._log_model:
wandb.save(os.path.join(checkpoint_dir, "*.ckpt"))
FromLogits
__init__(self, metric)
special
Wrap a pytorch lightning metric to accept logits input
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metric | Metric | The metric to wrap, e.g. pl.metrics.Accuracy | required |
Source code in slp/plbind/helpers.py
def __init__(self, metric: pl.metrics.Metric):
"""Wrap pytorch lighting metric to accept logits input
Args:
metric (pl.metrics.Metric): The metric to wrap, e.g. pl.metrics.Accuracy
"""
super(FromLogits, self).__init__(
compute_on_step=metric.compute_on_step,
dist_sync_on_step=metric.dist_sync_on_step,
process_group=metric.process_group,
dist_sync_fn=metric.dist_sync_fn,
)
self.metric = metric
compute(self)
Compute metric
Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: metric value |
Source code in slp/plbind/helpers.py
def compute(self) -> torch.Tensor:
"""Compute metric
Returns:
torch.Tensor: metric value
"""
return self.metric.compute() # type: ignore
update(self, preds, target)
Update underlying metric
Calculate softmax under the hood and pass probs to the underlying metric
Parameters:
Name | Type | Description | Default |
---|---|---|---|
preds | Tensor | [B, *, num_classes] Logits | required |
target | Tensor | [B, *] Ground truths | required |
Source code in slp/plbind/helpers.py
def update(self, preds: torch.Tensor, target: torch.Tensor) -> None: # type: ignore
"""Update underlying metric
Calculate softmax under the hood and pass probs to the underlying metric
Args:
preds (torch.Tensor): [B, *, num_classes] Logits
target (torch.Tensor): [B, *] Ground truths
"""
preds = F.softmax(preds, dim=-1)
self.metric.update(preds, target) # type: ignore
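A minimal sketch of the wrapper in action, written against the pl.metrics API this module targets:

```python
import torch
import pytorch_lightning as pl

from slp.plbind.helpers import FromLogits

acc = FromLogits(pl.metrics.Accuracy())
logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])
targets = torch.tensor([0, 1])

acc.update(logits, targets)  # softmax is applied internally
print(acc.compute())         # tensor(1.)
```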
AutoEncoderPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(AutoEncoderPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_AutoEncoder,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
BertPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(BertPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_BertSequenceClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
MultimodalTransformerClassificationPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(MultimodalTransformerClassificationPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_MultimodalTransformerClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
PLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(PLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_Classification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
RnnPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(RnnPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_RnnClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
SimplePLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, predictor_cls=<class 'slp.plbind.module._Classification'>, calculate_perplexity=False)
special
Wraps a (model, optimizer, criterion, lr_scheduler) tuple in a LightningModule
Handles the boilerplate for metrics calculation and logging and defines the train_step / val_step / test_step with use of the predictor helper classes (e.g. _Classification, _RnnClassification)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Module | Module to use for prediction | required |
optimizer | Union[torch.optim.optimizer.Optimizer, List[torch.optim.optimizer.Optimizer]] | Optimizers to use for training | required |
criterion | Union[torch.nn.modules.module.Module, Callable] | Task loss | required |
lr_scheduler | Union[torch.optim.lr_scheduler._LRScheduler, List[torch.optim.lr_scheduler._LRScheduler]] | Learning rate scheduler. Defaults to None. | None |
hparams | Union[omegaconf.dictconfig.DictConfig, Dict[str, Any], argparse.Namespace] | Hyperparameter values. This ensures they are logged with trainer.loggers. Defaults to None. | None |
metrics | Optional[Dict[str, pytorch_lightning.metrics.metric.Metric]] | Metrics to track. Defaults to None. | None |
predictor_cls | [type] | Class that defines a parse_batch and a get_predictions_and_targets method. Defaults to _Classification. | <class 'slp.plbind.module._Classification'> |
calculate_perplexity | bool | Whether to calculate perplexity. Would be cleaner as a metric, but this is more efficient. Defaults to False. | False |
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
predictor_cls=_Classification,
calculate_perplexity: bool = False, # for LM. Dirty but much more efficient
):
"""Wraps a (model, optimizer, criterion, lr_scheduler) tuple in a LightningModule
Handles the boilerplate for metrics calculation and logging and defines the train_step / val_step / test_step
with use of the predictor helper classes (e.g. _Classification, _RnnClassification)
Args:
model (nn.Module): Module to use for prediction
optimizer (Union[Optimizer, List[Optimizer]]): Optimizers to use for training
criterion (LossType): Task loss
lr_scheduler (Union[_LRScheduler, List[_LRScheduler]], optional): Learning rate scheduler. Defaults to None.
hparams (Configuration, optional): Hyperparameter values. This ensures they are logged with trainer.loggers. Defaults to None.
metrics (Optional[Dict[str, pl.metrics.Metric]], optional): Metrics to track. Defaults to None.
predictor_cls ([type], optional): Class that defines a parse_batch and a
get_predictions_and_targets method. Defaults to _Classification.
calculate_perplexity (bool, optional): Whether to calculate perplexity.
Would be cleaner as a metric, but this is more efficient. Defaults to False.
"""
super(SimplePLModule, self).__init__()
self.calculate_perplexity = calculate_perplexity
self.model = model
self.optimizer = optimizer
self.lr_scheduler = lr_scheduler
self.criterion = criterion
if metrics is not None:
self.train_metrics = nn.ModuleDict(metrics)
self.val_metrics = nn.ModuleDict({k: v.clone() for k, v in metrics.items()})
self.test_metrics = nn.ModuleDict(
{k: v.clone() for k, v in metrics.items()}
)
else:
self.train_metrics = nn.ModuleDict(modules=None)
self.val_metrics = nn.ModuleDict(modules=None)
self.test_metrics = nn.ModuleDict(modules=None)
self.predictor = predictor_cls()
if hparams is not None:
if isinstance(hparams, Namespace):
dict_params = vars(hparams)
elif isinstance(hparams, DictConfig):
dict_params = cast(Dict[str, Any], OmegaConf.to_container(hparams))
else:
dict_params = hparams
# self.hparams = dict_params
self.save_hyperparameters(dict_params)
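A minimal sketch of wrapping a (model, optimizer, criterion) tuple with one of the concrete subclasses (PLModule, documented above); the toy network is illustrative:

```python
import pytorch_lightning as pl
import torch.nn as nn
from torch.optim import Adam

from slp.plbind.helpers import FromLogits
from slp.plbind.module import PLModule

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

lm = PLModule(
    model,
    optimizer,
    criterion,
    metrics={"acc": FromLogits(pl.metrics.Accuracy())},
)
# lm can now be passed to trainer.fit(lm, datamodule=dm)
```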
aggregate_epoch_metrics(self, outputs, mode='Training')
Aggregate metrics over a whole epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from train_step, validation_step or test_step | required |
mode | str | "Training", "Validation" or "Testing". Defaults to "Training". | 'Training' |
Source code in slp/plbind/module.py
def aggregate_epoch_metrics(self, outputs, mode="Training"):
"""Aggregate metrics over a whole epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from train_step, validation_step or test_step
mode (str, optional): "Training", "Validation" or "Testing". Defaults to "Training".
"""
def fmt(name):
"""Format metric name"""
return f"{name}" if name != "loss" else "train_loss"
keys = list(outputs[0].keys())
aggregated = {fmt(k): torch.stack([x[k] for x in outputs]).mean() for k in keys}
aggregated["epoch"] = self.current_epoch + 1
self.log_dict(aggregated, logger=True, prog_bar=False, on_epoch=True)
return aggregated
configure_optimizers(self)
Return optimizers and learning rate schedulers
Returns:
Type | Description |
---|---|
Tuple[List[Optimizer], List[_LRScheduler]] | (optimizers, lr_schedulers) |
Source code in slp/plbind/module.py
def configure_optimizers(self):
"""Return optimizers and learning rate schedulers
Returns:
Tuple[List[Optimizer], List[_LRScheduler]]: (optimizers, lr_schedulers)
"""
if self.lr_scheduler is not None:
scheduler = {
"scheduler": self.lr_scheduler,
"interval": "epoch",
"monitor": "val_loss",
}
return [self.optimizer], [scheduler]
return self.optimizer
forward(self, *args, **kwargs)
Call wrapped module forward
Source code in slp/plbind/module.py
def forward(self, *args, **kwargs):
"""Call wrapped module forward"""
return self.model(*args, **kwargs)
log_to_console(self, metrics, mode='Training')
Log metrics to console
Parameters:
Name | Type | Description | Default |
---|---|---|---|
metrics | Dict[str, torch.Tensor] | Computed metrics | required |
mode | str | "Training", "Validation" or "Testing". Defaults to "Training". | 'Training' |
Source code in slp/plbind/module.py
def log_to_console(self, metrics, mode="Training"):
"""Log metrics to console
Args:
metrics (Dict[str, torch.Tensor]): Computed metrics
mode (str, optional): "Training", "Validation" or "Testing". Defaults to "Training".
"""
logger.info("Epoch {} {} results".format(self.current_epoch + 1, mode))
print_separator(symbol="-", n=50, print_fn=logger.info)
for name, value in metrics.items():
if name == "epoch":
continue
logger.info("{:<15} {:<15}".format(name, value))
print_separator(symbol="%", n=50, print_fn=logger.info)
test_epoch_end(self, outputs)
Aggregate metrics of a test epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from test_step | required |
Source code in slp/plbind/module.py
def test_epoch_end(self, outputs):
"""Aggregate metrics of a test epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from test_step
"""
outputs = self.aggregate_epoch_metrics(outputs, mode="Test")
self.log_to_console(outputs, mode="Test")
test_step(self, batch, batch_idx)
Compute loss for a single test step and log metrics to loggers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Tuple[torch.Tensor, ...] | Input batch | required |
batch_idx | int | Index of batch | required |
Returns:
Type | Description |
---|---|
Dict[str, torch.Tensor] | computed metrics |
Source code in slp/plbind/module.py
def test_step(self, batch, batch_idx):
"""Compute loss for a single test step and log metrics to loggers
Args:
batch (Tuple[torch.Tensor, ...]): Input batch
batch_idx (int): Index of batch
Returns:
Dict[str, torch.Tensor]: computed metrics
"""
y_hat, targets = self.predictor.get_predictions_and_targets(self, batch)
loss = self.criterion(y_hat, targets)
metrics = self._compute_metrics(
self.test_metrics, loss, y_hat, targets, mode="test"
)
return metrics
training_epoch_end(self, outputs)
Aggregate metrics of a training epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from train_step | required |
Source code in slp/plbind/module.py
def training_epoch_end(self, outputs):
"""Aggregate metrics of a training epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from train_step
"""
outputs = self.aggregate_epoch_metrics(outputs, mode="Training")
self.log_to_console(outputs, mode="Training")
training_step(self, batch, batch_idx)
Compute loss for a single training step and log metrics to loggers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Tuple[torch.Tensor, ...] | Input batch | required |
batch_idx | int | Index of batch | required |
Returns:
Type | Description |
---|---|
Dict[str, torch.Tensor] | computed metrics |
Source code in slp/plbind/module.py
def training_step(self, batch, batch_idx):
"""Compute loss for a single training step and log metrics to loggers
Args:
batch (Tuple[torch.Tensor, ...]): Input batch
batch_idx (int): Index of batch
Returns:
Dict[str, torch.Tensor]: computed metrics
"""
y_hat, targets = self.predictor.get_predictions_and_targets(self.model, batch)
loss = self.criterion(y_hat, targets)
metrics = self._compute_metrics(
self.train_metrics, loss, y_hat, targets, mode="train"
)
self.log_dict(
metrics,
on_step=True,
on_epoch=False,
logger=True,
prog_bar=False,
)
metrics["loss"] = loss
return metrics
validation_epoch_end(self, outputs)
Aggregate metrics of a validation epoch
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | List[Dict[str, torch.Tensor]] | Aggregated outputs from validation_step | required |
Source code in slp/plbind/module.py
def validation_epoch_end(self, outputs):
"""Aggregate metrics of a validation epoch
Args:
outputs (List[Dict[str, torch.Tensor]]): Aggregated outputs from validation_step
"""
outputs = self.aggregate_epoch_metrics(outputs, mode="Validation")
if torch.isnan(outputs["val_loss"]) or torch.isinf(outputs["val_loss"]):
outputs["val_loss"] = 1000000
outputs["best_score"] = min(
outputs[self.trainer.early_stopping_callback.monitor].detach().cpu(),
self.trainer.early_stopping_callback.best_score.detach().cpu(),
)
self.log_to_console(outputs, mode="Validation")
validation_step(self, batch, batch_idx)
Compute loss for a single validation step and log metrics to loggers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch | Tuple[torch.Tensor, ...] | Input batch | required |
batch_idx | int | Index of batch | required |
Returns:
Type | Description |
---|---|
Dict[str, torch.Tensor] | computed metrics |
Source code in slp/plbind/module.py
def validation_step(self, batch, batch_idx):
"""Compute loss for a single validation step and log metrics to loggers
Args:
batch (Tuple[torch.Tensor, ...]): Input batch
batch_idx (int): Index of batch
Returns:
Dict[str, torch.Tensor]: computed metrics
"""
y_hat, targets = self.predictor.get_predictions_and_targets(self, batch)
loss = self.criterion(y_hat, targets)
metrics = self._compute_metrics(
self.val_metrics, loss, y_hat, targets, mode="val"
)
metrics[
"best_score"
] = self.trainer.early_stopping_callback.best_score.detach().cpu()
return metrics
TransformerClassificationPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(TransformerClassificationPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_TransformerClassification,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
TransformerPLModule
__init__(self, model, optimizer, criterion, lr_scheduler=None, hparams=None, metrics=None, calculate_perplexity=False)
special
Pass arguments through to base class
Source code in slp/plbind/module.py
def __init__(
self,
model: nn.Module,
optimizer: Union[Optimizer, List[Optimizer]],
criterion: LossType,
lr_scheduler: Union[_LRScheduler, List[_LRScheduler]] = None,
hparams: Configuration = None,
metrics: Optional[Dict[str, pl.metrics.Metric]] = None,
calculate_perplexity=False,
):
"""Pass arguments through to base class"""
super(TransformerPLModule, self).__init__(
model,
optimizer,
criterion,
predictor_cls=_Transformer,
lr_scheduler=lr_scheduler,
hparams=hparams,
metrics=metrics,
calculate_perplexity=calculate_perplexity,
)
add_optimizer_args(parent_parser)
Augment parser with optimizer arguments
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | ArgumentParser | Parser created by the user | required |
Returns:
Type | Description |
---|---|
ArgumentParser | argparse.ArgumentParser: Augmented parser |
Source code in slp/plbind/trainer.py
def add_optimizer_args(
parent_parser: argparse.ArgumentParser,
) -> argparse.ArgumentParser:
"""Augment parser with optimizer arguments
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
parser.add_argument(
"--optimizer",
dest="optimizer",
type=str,
choices=[
"Adam",
"AdamW",
"SGD",
"Adadelta",
"Adagrad",
"Adamax",
"ASGD",
"RMSprop",
],
default="Adam",
help="Which optimizer to use",
)
parser.add_argument(
"--lr",
dest="optim.lr",
type=float,
default=1e-3,
help="Learning rate",
)
parser.add_argument(
"--weight-decay",
dest="optim.weight_decay",
type=float,
default=0,
help="Learning rate",
)
parser.add_argument(
"--lr-scheduler",
dest="lr_scheduler",
action="store_true",
# type=str,
# choices=["ReduceLROnPlateau"],
help="Use learning rate scheduling. Currently only ReduceLROnPlateau is supported out of the box",
)
parser.add_argument(
"--lr-factor",
dest="lr_schedule.factor",
type=float,
default=0.1,
help="Multiplicative factor by which LR is reduced. Used if --lr-scheduler is provided.",
)
parser.add_argument(
"--lr-patience",
dest="lr_schedule.patience",
type=int,
default=10,
help="Number of epochs with no improvement after which learning rate will be reduced. Used if --lr-scheduler is provided.",
)
parser.add_argument(
"--lr-cooldown",
dest="lr_schedule.cooldown",
type=int,
default=0,
help="Number of epochs to wait before resuming normal operation after lr has been reduced. Used if --lr-scheduler is provided.",
)
parser.add_argument(
"--min-lr",
dest="lr_schedule.min_lr",
type=float,
default=0,
help="Minimum lr for LR scheduling. Used if --lr-scheduler is provided.",
)
return parser
add_trainer_args(parent_parser)
Augment parser with trainer arguments
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parent_parser | ArgumentParser | Parser created by the user | required |
Returns:
Type | Description |
---|---|
ArgumentParser | argparse.ArgumentParser: Augmented parser |
Source code in slp/plbind/trainer.py
def add_trainer_args(parent_parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
"""Augment parser with trainer arguments
Args:
parent_parser (argparse.ArgumentParser): Parser created by the user
Returns:
argparse.ArgumentParser: Augmented parser
"""
parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False)
parser.add_argument(
"--seed",
dest="seed",
type=int,
default=None,
help="Seed for reproducibility",
)
parser.add_argument(
"--config",
dest="config",
type=str, # dir_path,
default=None,
help="Path to YAML configuration file",
)
parser.add_argument(
"--experiment-name",
dest="trainer.experiment_name",
type=str,
default="experiment",
help="Name of the running experiment",
)
parser.add_argument(
"--run-id",
dest="trainer.run_id",
type=str,
default=None,
help="Unique identifier for the current run. If not provided it is inferred from datetime.now()",
)
parser.add_argument(
"--experiment-group",
dest="trainer.experiment_group",
type=str,
default=None,
help="Group of current experiment. Useful when evaluating for different seeds / cross-validation etc.",
)
parser.add_argument(
"--experiments-folder",
dest="trainer.experiments_folder",
type=str,
default="experiments",
help="Top-level folder where experiment results & checkpoints are saved",
)
parser.add_argument(
"--save-top-k",
dest="trainer.save_top_k",
type=int,
default=3,
help="Save checkpoints for top k models",
)
parser.add_argument(
"--patience",
dest="trainer.patience",
type=int,
default=3,
help="Number of epochs to wait before early stopping",
)
parser.add_argument(
"--wandb-project",
dest="trainer.wandb_project",
type=str,
default=None,
help="Wandb project under which results are saved",
)
parser.add_argument(
"--tags",
dest="trainer.tags",
type=str,
nargs="*",
default=[],
help="Tags for current run to make results searchable.",
)
parser.add_argument(
"--stochastic_weight_avg",
dest="trainer.stochastic_weight_avg",
action="store_true",
help="Use Stochastic weight averaging.",
)
parser.add_argument(
"--gpus", dest="trainer.gpus", type=int, default=0, help="Number of GPUs to use"
)
parser.add_argument(
"--val-interval",
dest="trainer.check_val_every_n_epoch",
type=int,
default=1,
help="Run validation every n epochs",
)
parser.add_argument(
"--clip-grad-norm",
dest="trainer.gradient_clip_val",
type=float,
default=0,
help="Clip gradients with ||grad(w)|| >= args.clip_grad_norm",
)
parser.add_argument(
"--epochs",
dest="trainer.max_epochs",
type=int,
default=100,
help="Maximum number of training epochs",
)
parser.add_argument(
"--num-nodes",
dest="trainer.num_nodes",
type=int,
default=1,
help="Number of nodes to run",
)
parser.add_argument(
"--steps",
dest="trainer.max_steps",
type=int,
default=None,
help="Maximum number of training steps",
)
parser.add_argument(
"--tbtt_steps",
dest="trainer.truncated_bptt_steps",
type=int,
default=None,
help="Truncated Back-propagation-through-time steps.",
)
parser.add_argument(
"--debug",
dest="debug",
action="store_true",
help="If true, we run a full run on a small subset of the input data and overfit 10 training batches",
)
parser.add_argument(
"--offline",
dest="trainer.force_wandb_offline",
action="store_true",
help="If true, forces offline execution of wandb logger",
)
parser.add_argument(
"--early-stop-on",
dest="trainer.early_stop_on",
type=str,
default="val_loss",
help="Metric for early stopping",
)
parser.add_argument(
"--early-stop-mode",
dest="trainer.early_stop_mode",
type=str,
choices=["min", "max"],
default="min",
help="Minimize or maximize early stopping metric",
)
return parser
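A short sketch of composing both helpers on a single parser; the flag values are arbitrary, and the dotted dest values follow the definitions above:

```python
import argparse

from slp.plbind.trainer import add_optimizer_args, add_trainer_args

parser = argparse.ArgumentParser("experiment")
parser = add_optimizer_args(parser)
parser = add_trainer_args(parser)

args = parser.parse_args(["--optimizer", "AdamW", "--lr", "3e-4", "--gpus", "1"])
print(args.optimizer, getattr(args, "optim.lr"), getattr(args, "trainer.gpus"))
```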
make_trainer(experiment_name='experiment', experiment_description=None, run_id=None, experiment_group=None, experiments_folder='experiments', save_top_k=3, patience=3, wandb_project=None, wandb_user=None, force_wandb_offline=False, tags=None, stochastic_weight_avg=False, auto_scale_batch_size=False, gpus=0, check_val_every_n_epoch=1, gradient_clip_val=0, precision=32, num_nodes=1, max_epochs=100, max_steps=None, truncated_bptt_steps=None, fast_dev_run=None, overfit_batches=None, terminate_on_nan=False, profiler='simple', early_stop_on='val_loss', early_stop_mode='min')
Configure trainer with preferred defaults
- Experiment folder and run_id configured (based on datetime.now())
- Wandb and CSV loggers run by default
- Wandb configured to save code and checkpoints
- Wandb configured in online mode except if no internet connection is available
- Early stopping on best validation loss is configured by default
- Checkpointing on best validation loss is configured by default
Parameters:
Name | Type | Description | Default |
---|---|---|---|
experiment_name | str | Experiment name. Defaults to "experiment". | 'experiment' |
experiment_description | Optional[str] | Detailed description of the experiment. Defaults to None. | None |
run_id | Optional[str] | Unique run_id. Defaults to datetime.now(). Defaults to None. | None |
experiment_group | Optional[str] | Group experiments over multiple runs. Defaults to None. | None |
experiments_folder | str | Folder to save outputs. Defaults to "experiments". | 'experiments' |
save_top_k | int | Save top k checkpoints. Defaults to 3. | 3 |
patience | int | Patience for early stopping. Defaults to 3. | 3 |
wandb_project | Optional[str] | Wandb project to save the experiment. Defaults to None. | None |
wandb_user | Optional[str] | Wandb username. Defaults to None. | None |
force_wandb_offline | bool | Force offline execution of wandb | False |
tags | Optional[Sequence] | Additional tags to attach to the experiment. Defaults to None. | None |
stochastic_weight_avg | bool | Use stochastic weight averaging. Defaults to False. | False |
auto_scale_batch_size | bool | Find optimal batch size for the available resources when running trainer.tune(). Defaults to False. | False |
gpus | int | Number of GPUs to use. Defaults to 0. | 0 |
check_val_every_n_epoch | int | Run validation every n epochs. Defaults to 1. | 1 |
gradient_clip_val | float | Clip gradient norm value. Defaults to 0 (no clipping). | 0 |
precision | int | Floating point precision. Defaults to 32. | 32 |
num_nodes | int | Number of nodes to run on | 1 |
max_epochs | Optional[int] | Maximum number of epochs for training. Defaults to 100. | 100 |
max_steps | Optional[int] | Maximum number of steps for training. Defaults to None. | None |
truncated_bptt_steps | Optional[int] | Truncated backpropagation through time: backprop is performed every k steps of a much longer sequence. Defaults to None. | None |
fast_dev_run | Optional[int] | Run training on a small number of batches for debugging. Defaults to None. | None |
overfit_batches | Optional[int] | Try to overfit a small number of batches for debugging. Defaults to None. | None |
terminate_on_nan | bool | Terminate on NaN gradients. Warning: this makes training slow. Defaults to False. | False |
profiler | Union[pytorch_lightning.profiler.profilers.BaseProfiler, bool, str] | Use profiler to track execution times of each function | 'simple' |
early_stop_on | str | Metric for early stopping | 'val_loss' |
early_stop_mode | str | "min" or "max" | 'min' |
Returns:
Type | Description |
---|---|
Trainer | pl.Trainer: Configured trainer |
Source code in slp/plbind/trainer.py
def make_trainer(
experiment_name: str = "experiment",
experiment_description: Optional[str] = None,
run_id: Optional[str] = None,
experiment_group: Optional[str] = None,
experiments_folder: str = "experiments",
save_top_k: int = 3,
patience: int = 3,
wandb_project: Optional[str] = None,
wandb_user: Optional[str] = None,
force_wandb_offline: bool = False,
tags: Optional[Sequence] = None,
stochastic_weight_avg: bool = False,
auto_scale_batch_size: bool = False,
gpus: int = 0,
check_val_every_n_epoch: int = 1,
gradient_clip_val: float = 0,
precision: int = 32,
num_nodes: int = 1,
max_epochs: Optional[int] = 100,
max_steps: Optional[int] = None,
truncated_bptt_steps: Optional[int] = None,
fast_dev_run: Optional[int] = None,
overfit_batches: Optional[int] = None,
terminate_on_nan: bool = False, # Be careful this makes training very slow for large models
profiler: Optional[Union[pl.profiler.BaseProfiler, bool, str]] = "simple",
early_stop_on: str = "val_loss",
early_stop_mode: str = "min",
) -> pl.Trainer:
"""Configure trainer with preferred defaults
* Experiment folder and run_id configured (based on datetime.now())
* Wandb and CSV loggers run by default
* Wandb configured to save code and checkpoints
* Wandb configured in online mode except if no internet connection is available
* Early stopping on best validation loss is configured by default
* Checkpointing on best validation loss is configured by default
Args:
experiment_name (str, optional): Experiment name. Defaults to "experiment".
experiment_description (Optional[str], optional): Detailed description of the experiment. Defaults to None.
run_id (Optional[str], optional): Unique run_id. Defaults to datetime.now(). Defaults to None.
experiment_group (Optional[str], optional): Group experiments over multiple runs. Defaults to None.
experiments_folder (str, optional): Folder to save outputs. Defaults to "experiments".
save_top_k (int, optional): Save top k checkpoints. Defaults to 3.
patience (int, optional): Patience for early stopping. Defaults to 3.
wandb_project (Optional[str], optional): Wandb project to save the experiment. Defaults to None.
wandb_user (Optional[str], optional): Wandb username. Defaults to None.
force_wandb_offline (bool): Force offline execution of wandb
tags (Optional[Sequence], optional): Additional tags to attach to the experiment. Defaults to None.
stochastic_weight_avg (bool, optional): Use stochastic weight averaging. Defaults to False.
auto_scale_batch_size (bool, optional): Find optimal batch size for the available resources when running
trainer.tune(). Defaults to False.
gpus (int, optional): number of GPUs to use. Defaults to 0.
check_val_every_n_epoch (int, optional): Run validation every n epochs. Defaults to 1.
gradient_clip_val (float, optional): Clip gradient norm value. Defaults to 0 (no clipping).
precision (int, optional): Floating point precision. Defaults to 32.
num_nodes (int): Number of nodes to run on
max_epochs (Optional[int], optional): Maximum number of epochs for training. Defaults to 100.
max_steps (Optional[int], optional): Maximum number of steps for training. Defaults to None.
truncated_bptt_steps (Optional[int], optional): Truncated backpropagation through time performs backprop every k steps of a much longer sequence. Defaults to None.
fast_dev_run (Optional[int], optional): Run training on a small number of batches for debugging. Defaults to None.
overfit_batches (Optional[int], optional): Try to overfit a small number of batches for debugging. Defaults to None.
terminate_on_nan (bool, optional): Terminate on NaN gradients. Warning this makes training slow. Defaults to False.
profiler (Optional[Union[pl.profiler.BaseProfiler, bool, str]]): Use profiler to track execution times of each function
early_stop_on (str): metric for early stopping
early_stop_mode (str): "min" or "max"
Returns:
pl.Trainer: Configured trainer
"""
if overfit_batches is not None:
trainer = pl.Trainer(overfit_batches=overfit_batches, gpus=gpus)
return trainer
if fast_dev_run is not None:
trainer = pl.Trainer(fast_dev_run=fast_dev_run, gpus=gpus)
return trainer
logging_dir = os.path.join(experiments_folder, experiment_name)
safe_mkdirs(logging_dir)
run_id = run_id if run_id is not None else date_fname()
if run_id in os.listdir(logging_dir):
logger.warning(
f"The run id you provided {run_id} already exists in {logging_dir}"
)
run_id = date_fname()
logger.info(f"Setting run_id={run_id}")
checkpoint_dir = os.path.join(logging_dir, run_id, "checkpoints")
logger.info(f"Logs will be saved in {logging_dir}")
logger.info(f"Logs will be saved in {checkpoint_dir}")
if wandb_project is None:
wandb_project = experiment_name
connected = has_internet_connection()
offline_run = force_wandb_offline or not connected
loggers = [
pl.loggers.CSVLogger(logging_dir, name="csv_logs", version=run_id),
FixedWandbLogger( # type: ignore
name=experiment_name,
project=wandb_project,
anonymous=False,
save_dir=logging_dir,
version=run_id,
save_code=True,
checkpoint_dir=checkpoint_dir,
offline=offline_run,
log_model=not offline_run,
entity=wandb_user,
group=experiment_group,
notes=experiment_description,
tags=tags,
),
]
if gpus > 1:
del loggers[
1
] # https://github.com/PyTorchLightning/pytorch-lightning/issues/6106
logger.info("Configured wandb and CSV loggers.")
logger.info(
f"Wandb configured to run {experiment_name}/{run_id} in project {wandb_project}"
)
if connected:
logger.info("Results will be stored online.")
else:
logger.info("Results will be stored offline due to bad internet connection.")
logger.info(
f"If you want to upload your results later run\n\t wandb sync {logging_dir}/wandb/run-{run_id}"
)
if experiment_description is not None:
logger.info(
f"Experiment verbose description:\n{experiment_description}\n\nTags:{'n/a' if tags is None else tags}"
)
callbacks = [
EarlyStoppingWithLogs(
monitor=early_stop_on,
mode=early_stop_mode,
patience=patience,
verbose=True,
),
pl.callbacks.ModelCheckpoint(
dirpath=checkpoint_dir,
filename="{epoch}-{val_loss:.2f}",
monitor=early_stop_on,
save_top_k=save_top_k,
mode=early_stop_mode,
),
pl.callbacks.LearningRateMonitor(logging_interval="step"),
]
logger.info("Configured Early stopping and Model checkpointing to track val_loss")
trainer = pl.Trainer(
default_root_dir=logging_dir,
gpus=gpus,
max_epochs=max_epochs,
max_steps=max_steps,
callbacks=callbacks,
logger=loggers,
check_val_every_n_epoch=check_val_every_n_epoch,
gradient_clip_val=gradient_clip_val,
auto_scale_batch_size=auto_scale_batch_size,
stochastic_weight_avg=stochastic_weight_avg,
precision=precision,
truncated_bptt_steps=truncated_bptt_steps,
terminate_on_nan=terminate_on_nan,
progress_bar_refresh_rate=10,
profiler=profiler,
num_nodes=num_nodes,
)
return trainer
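A minimal sketch of a typical call; all values are illustrative, and lm / dm refer to the module and datamodule constructed in the sketches above:

```python
from slp.plbind.trainer import make_trainer, watch_model

trainer = make_trainer(
    experiment_name="smoke-test",
    experiments_folder="experiments",
    gpus=0,
    max_epochs=5,
    patience=2,
)
watch_model(trainer, model)     # optional wandb gradient/weight tracking
trainer.fit(lm, datamodule=dm)
```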
make_trainer_for_ray_tune(patience=3, stochastic_weight_avg=False, gpus=0, gradient_clip_val=0, precision=32, max_epochs=100, max_steps=None, truncated_bptt_steps=None, terminate_on_nan=False, early_stop_on='val_loss', early_stop_mode='min', metrics_map=None, **extra_kwargs)
Configure trainer with preferred defaults
- Early stopping on best validation loss is configured by default
- Ray tune callback configured
Parameters:
Name | Type | Description | Default |
---|---|---|---|
patience | int | Patience for early stopping. Defaults to 3. | 3 |
stochastic_weight_avg | bool | Use stochastic weight averaging. Defaults to False. | False |
gpus | int | Number of GPUs to use. Defaults to 0. | 0 |
gradient_clip_val | float | Clip gradient norm value. Defaults to 0 (no clipping). | 0 |
precision | int | Floating point precision. Defaults to 32. | 32 |
max_epochs | Optional[int] | Maximum number of epochs for training. Defaults to 100. | 100 |
max_steps | Optional[int] | Maximum number of steps for training. Defaults to None. | None |
truncated_bptt_steps | Optional[int] | Truncated backpropagation through time: backprop is performed every k steps of a much longer sequence. Defaults to None. | None |
terminate_on_nan | bool | Terminate on NaN gradients. Warning: this makes training slow. Defaults to False. | False |
early_stop_on | str | Metric for early stopping | 'val_loss' |
early_stop_mode | str | "min" or "max" | 'min' |
metrics_map | Optional[Dict[str, str]] | The mapping from pytorch lightning logged metrics to ray tune metrics. The --tune-metric argument should be one of the keys of this mapping | None |
extra_kwargs | kwargs | Ignored. We use it so that we are able to pass the same config object as in make_trainer | {} |
Returns:
Type | Description |
---|---|
Trainer | pl.Trainer: Configured trainer |
Source code in slp/plbind/trainer.py
def make_trainer_for_ray_tune(
patience: int = 3,
stochastic_weight_avg: bool = False,
gpus: int = 0,
gradient_clip_val: float = 0,
precision: int = 32,
max_epochs: Optional[int] = 100,
max_steps: Optional[int] = None,
truncated_bptt_steps: Optional[int] = None,
terminate_on_nan: bool = False, # Be careful this makes training very slow for large models
early_stop_on: str = "val_loss",
early_stop_mode: str = "min",
metrics_map: Optional[Dict[str, str]] = None,
**extra_kwargs,
) -> pl.Trainer:
"""Configure trainer with preferred defaults
* Early stopping on best validation loss is configured by default
* Ray tune callback configured
Args:
patience (int, optional): Patience for early stopping. Defaults to 3.
stochastic_weight_avg (bool, optional): Use stochastic weight averaging. Defaults to False.
gpus (int, optional): number of GPUs to use. Defaults to 0.
gradient_clip_val (float, optional): Clip gradient norm value. Defaults to 0 (no clipping).
precision (int, optional): Floating point precision. Defaults to 32.
max_epochs (Optional[int], optional): Maximum number of epochs for training. Defaults to 100.
max_steps (Optional[int], optional): Maximum number of steps for training. Defaults to None.
truncated_bptt_steps (Optional[int], optional): Truncated backpropagation through time performs backprop every k steps of a much longer sequence. Defaults to None.
terminate_on_nan (bool, optional): Terminate on NaN gradients. Warning this makes training slow. Defaults to False.
early_stop_on (str): metric for early stopping
early_stop_mode (str): "min" or "max"
metrics_map (Optional[Dict[str, str]]): The mapping from pytorch lightning logged metrics
to ray tune metrics. The --tune-metric argument should be one of the keys of this
mapping
extra_kwargs (kwargs): Ignored. We use it so that we are able to pass the same config
object as in make_trainer
Returns:
pl.Trainer: Configured trainer
"""
if metrics_map is None:
raise ValueError("Need to pass metrics for TuneReportCallback")
callbacks = [
EarlyStoppingWithLogs(
monitor=early_stop_on,
mode=early_stop_mode,
patience=patience,
verbose=True,
),
TuneReportCallback(metrics_map, on="validation_end"),
pl.callbacks.LearningRateMonitor(logging_interval="step"),
]
logger.info("Configured Early stopping to track val_loss")
trainer = pl.Trainer(
gpus=gpus,
max_epochs=max_epochs,
max_steps=max_steps,
callbacks=callbacks,
logger=[],
check_val_every_n_epoch=1,
gradient_clip_val=gradient_clip_val,
stochastic_weight_avg=stochastic_weight_avg,
precision=precision,
truncated_bptt_steps=truncated_bptt_steps,
terminate_on_nan=terminate_on_nan,
progress_bar_refresh_rate=0,
num_sanity_val_steps=0,
auto_scale_batch_size=False,
)
return trainer
watch_model(trainer, model)
If wandb logger is configured track gradient and weight norms
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trainer | Trainer | Trainer | required |
model | Module | Module to watch | required |
Source code in slp/plbind/trainer.py
def watch_model(trainer: pl.Trainer, model: nn.Module) -> None:
"""If wandb logger is configured track gradient and weight norms
Args:
trainer (pl.Trainer): Trainer
model (nn.Module): Module to watch
"""
if trainer.num_gpus > 1:
return
if isinstance(trainer.logger.experiment, list):
for log in trainer.logger.experiment:
try:
log.watch(model, log="all")
logger.info("Tracking model weights & gradients in wandb.")
break
except Exception:
pass
else:
try:
trainer.logger.experiment.watch(model, log="all")
logger.info("Tracking model weights & gradients in wandb.")
except Exception:
pass
configure_logging(logfile_prefix=None)
configure_logging Configure loguru to intercept logging module logs, tqdm.writes and write to a logfile
We use loguru for stdout/stderr logging in this project. This function configures loguru to intercept logs from other modules that use the default python logging module, and to play well with writes from tqdm progress bars. If a logfile_prefix is provided, loguru will also write all logs into a logfile with a unique name constructed using logfile_prefix and datetime.now().
Parameters:
Name | Type | Description | Default |
---|---|---|---|
logfile_prefix | Optional[str] | Optional prefix to file where logs will be written. | None |
Returns:
Type | Description |
---|---|
Optional[str] | str: The logfile where logs are written |
Examples:
>>> configure_logging("logs/my-cool-experiment)
logs/my-cool-experiment.20210228-211832.log
Source code in slp/util/log.py
def configure_logging(logfile_prefix: Optional[str] = None) -> Optional[str]:
"""configure_logging Configure loguru to intercept logging module logs, tqdm.writes and write to a logfile
We use loguru for stdout/stderr logging in this project.
This function configures loguru to intercept logs from other modules that use the default python logging module.
It also configures loguru so that it plays well with writes in the tqdm progress bars.
If a logfile_prefix is provided, loguru will also write all logs into a logfile with a unique name constructed using
logfile_prefix and datetime.now().
Args:
logfile_prefix (Optional[str]): Optional prefix to file where logs will be written.
Returns:
str: The logfile where logs are written
Examples:
>>> configure_logging("logs/my-cool-experiment)
logs/my-cool-experiment.20210228-211832.log
"""
class InterceptHandler(logging.Handler):
def emit(self, record):
"""Intercept standard logging logs in loguru. Should test this for distributed pytorch lightning"""
# Get corresponding Loguru level if it exists
try:
level = logger.level(record.levelname).name
except ValueError:
level = record.levelno
# Find caller from where originated the logged message
frame, depth = logging.currentframe(), 2
while frame.f_code.co_filename == logging.__file__:
frame = frame.f_back
depth += 1
logger.opt(depth=depth, exception=record.exc_info).log(
level, record.getMessage()
)
logger.info("Intercepting standard logging logs in loguru")
# Make loguru play well with tqdm
logger.remove()
def tqdm_write(msg: str) -> Any:
"""Loguru wrapper for tqdm.write"""
return tqdm.write(msg, end="")
logger.add(tqdm_write, colorize=True)
logging.basicConfig(handlers=[InterceptHandler()], level=logging.INFO)
logfile = None
if logfile_prefix is not None:
logfile = log_to_file(logfile_prefix)
logger.info(f"Log file will be saved in {logfile}")
return logfile
log_to_file(fname_prefix)
log_to_file Configure loguru to log to a logfile
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname_prefix | Optional[str] | Optional prefix to file where logs will be written. | required |
Returns:
Type | Description |
---|---|
str | str: The logfile where logs are written |
Source code in slp/util/log.py
def log_to_file(fname_prefix: Optional[str]) -> str:
"""log_to_file Configure loguru to log to a logfile
Args:
fname_prefix (Optional[str]): Optional prefix to file where logs will be written.
Returns:
str: The logfile where logs are written
"""
logfile = f"{fname_prefix}.{date_fname()}.log"
logger.add(
logfile,
colorize=False,
level="DEBUG",
enqueue=True,
)
return logfile
NoOp
forward(self, x)
Defines the computation performed at every call. Should be overridden by all subclasses.

.. note:: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
Source code in slp/util/pytorch.py
def forward(self, x):
return x
PackSequence
__init__(self, batch_first=True)
special
Wrap sequence packing in nn.Module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_first | bool | Use batch first representation. Defaults to True. | True |
Source code in slp/util/pytorch.py
def __init__(self, batch_first: bool = True):
"""Wrap sequence packing in nn.Module
Args:
batch_first (bool, optional): Use batch first representation. Defaults to True.
"""
super(PackSequence, self).__init__()
self.batch_first = batch_first
forward(self, x, lengths)
Pack a padded sequence and sort lengths
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | Tensor | Padded tensor | required |
lengths | Tensor | Original lengths before padding | required |
Returns:
Type | Description |
---|---|
Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor] | (packed sequence, sorted lengths) |
Source code in slp/util/pytorch.py
def forward(
self, x: torch.Tensor, lengths: torch.Tensor
) -> Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]:
"""Pack a padded sequence and sort lengths
Args:
x (torch.Tensor): Padded tensor
lengths (torch.Tensor): Original lengths before padding
Returns:
Tuple[torch.nn.utils.rnn.PackedSequence, torch.Tensor]: (packed sequence, sorted lengths)
"""
out: torch.nn.utils.rnn.PackedSequence = pack_padded_sequence(
x, lengths, batch_first=self.batch_first, enforce_sorted=False
)
lengths = lengths[out.sorted_indices]
return out, lengths
PadPackedSequence
__init__(self, batch_first=True, max_length=-1)
special
Wrap sequence padding in nn.Module
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_first | bool | Use batch first representation. Defaults to True. | True |
max_length | int | Pad sequences to this total length. If -1, pad to the longest sequence in the batch. Defaults to -1. | -1 |
Source code in slp/util/pytorch.py
def __init__(self, batch_first: bool = True, max_length: int = -1):
"""Wrap sequence padding in nn.Module
Args:
batch_first (bool, optional): Use batch first representation. Defaults to True.
max_length (int, optional): Pad sequences to this total length. If -1, pad to the longest sequence in the batch. Defaults to -1.
"""
super(PadPackedSequence, self).__init__()
self.batch_first = batch_first
self.max_length = max_length if max_length > 0 else None
forward(self, x, lengths)
Convert packed sequence to padded sequence
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | PackedSequence | Packed sequence | required |
lengths | Tensor | Sorted original sequence lengths | required |
Returns:
Type | Description |
---|---|
Tensor | torch.Tensor: Padded sequence |
Source code in slp/util/pytorch.py
def forward(
self, x: torch.nn.utils.rnn.PackedSequence, lengths: torch.Tensor
) -> torch.Tensor:
"""Convert packed sequence to padded sequence
Args:
x (torch.nn.utils.rnn.PackedSequence): Packed sequence
lengths (torch.Tensor): Sorted original sequence lengths
Returns:
torch.Tensor: Padded sequence
"""
out, _ = pad_packed_sequence(
x, batch_first=self.batch_first, total_length=self.max_length # type: ignore
)
return out # type: ignore
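A minimal round-trip sketch; the tensor shapes are illustrative, and an RNN over the packed sequence would normally sit between the two calls:

```python
import torch

from slp.util.pytorch import PackSequence, PadPackedSequence

pack = PackSequence(batch_first=True)
unpack = PadPackedSequence(batch_first=True)

x = torch.randn(3, 5, 8)            # (batch, max_len, features)
lengths = torch.tensor([5, 3, 2])

packed, sorted_lengths = pack(x, lengths)
# e.g. out, _ = rnn(packed)
padded = unpack(packed, sorted_lengths)  # back to (batch, max_len, features)
```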
from_checkpoint(checkpoint_file, obj, map_location='cpu', dataparallel=False)
Load model or optimizer from saved state_dict
Parameters:
Name | Type | Description | Default |
---|---|---|---|
checkpoint_file | Optional[str] | File containing the state dict | required |
obj | Union[torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer] | Module or optimizer instance to load the checkpoint | required |
map_location | Union[torch.device, str] | Where to load. Defaults to "cpu". | 'cpu' |
dataparallel | bool | If data parallel remove leading "module." from statedict keys. Defaults to False. | False |
Returns:
Type | Description |
---|---|
Union[torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer] | types.ModuleOrOptimizer: Loaded module or optimizer |
Source code in slp/util/pytorch.py
def from_checkpoint(
checkpoint_file: Optional[str],
obj: types.ModuleOrOptimizer,
map_location: Optional[types.Device] = "cpu",
dataparallel: bool = False,
) -> types.ModuleOrOptimizer:
"""Load model or optimizer from saved state_dict
Args:
checkpoint_file (Optional[str]): File containing the state dict
obj (types.ModuleOrOptimizer): Module or optimizer instance to load the checkpoint
map_location (Optional[types.Device], optional): Where to load. Defaults to "cpu".
dataparallel (bool, optional): If True, remove leading "module." from state_dict keys. Defaults to False.
Returns:
types.ModuleOrOptimizer: Loaded module or optimizer
"""
if checkpoint_file is None:
return obj
if not system.is_file(checkpoint_file):
logger.warning(
f"The checkpoint {checkpoint_file} you are trying to load "
"does not exist. Continuing without loading..."
)
return obj
state_dict = torch.load(checkpoint_file, map_location=map_location)
if dataparallel:
state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
obj.load_state_dict(state_dict)
return obj
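For example, to resume from a saved state dict if one exists (the checkpoint path below is hypothetical; if the file is missing, the model is returned unchanged with a warning):
>>> import torch.nn as nn
>>> model = nn.Linear(10, 2)
>>> model = from_checkpoint("experiments/best_model.pt", model, map_location="cpu")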
mktensor(data, dtype=torch.float32, device='cpu', requires_grad=False, copy_tensor=True)
Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set. This can copy data or make the operation in place.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[numpy.ndarray, torch.Tensor, List[~T]] | Data to be converted to torch tensor. | required |
dtype | torch.dtype | The type of the tensor elements. | torch.float32 |
device | Union[torch.device, str] | Device where the tensor should be placed. | 'cpu' |
requires_grad | bool | Trainable tensor or not? | False |
copy_tensor | bool | If False, create the tensor in place; otherwise make a copy. | True |
Returns:
Type | Description |
---|---|
Tensor | A tensor of the given dtype, device and requires_grad containing data |
Source code in slp/util/pytorch.py
def mktensor(
data: types.NdTensor,
dtype: torch.dtype = torch.float,
device: types.Device = "cpu",
requires_grad: bool = False,
copy_tensor: bool = True,
) -> torch.Tensor:
"""Convert a list or numpy array to torch tensor. If a torch tensor
is passed it is cast to dtype, device and the requires_grad flag is
set. This can copy data or make the operation in place.
Args:
data: (list, np.ndarray, torch.Tensor): Data to be converted to
torch tensor.
dtype: (torch.dtype): The type of the tensor elements
(Default value = torch.float)
device: (torch.device, str): Device where the tensor should be
(Default value = 'cpu')
requires_grad: (bool): Trainable tensor or not? (Default value = False)
copy_tensor: (bool): If false creates the tensor inplace else makes a copy
(Default value = True)
Returns:
(torch.Tensor): A tensor of appropriate dtype, device and
requires_grad containing data
"""
tensor_factory = t if copy_tensor else t_
return tensor_factory(data, dtype=dtype, device=device, requires_grad=requires_grad)
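A quick sketch of typical usage, converting a numpy array to an integer tensor:
>>> import numpy as np
>>> mktensor(np.array([1, 2, 3]), dtype=torch.long)
tensor([1, 2, 3])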
moore_penrose_pinv(x, num_iter=6)
Calculate an approximate Moore-Penrose pseudoinverse via an iterative method
- The method is described in (Razavi et al., 2014) https://www.hindawi.com/journals/aaa/2014/563787/
- Implementation modified from lucidrains https://github.com/lucidrains/nystrom-attention/blob/main/nystrom_attention/nystrom_attention.py#L13
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | (*, M, M) The square tensors to invert. * can be any number of leading dimensions, e.g. (batch_size, num_heads, M, M) | required |
num_iter | int | Number of iterations to run for the approximation (6 is usually enough) | 6 |
Returns:
Type | Description |
---|---|
torch.Tensor | (*, M, M) The approximate Moore-Penrose pseudoinverse of x |
Source code in slp/util/pytorch.py
def moore_penrose_pinv(x, num_iter=6):
"""Calculate approximate Moore-Penrose pseudoinverse, via iterative method
* Method is described in (Razavi et al 2014) https://www.hindawi.com/journals/aaa/2014/563787/
* Implementation modified from lucidrains https://github.com/lucidrains/nystrom-attention/blob/main/nystrom_attention/nystrom_attention.py#L13
Args:
x (torch.Tensor): (*, M, M) The square tensors to inverse.
Dimension * can be any number of additional dimensions, e.g. (batch_size, num_heads, M, M)
num_iter (int): Number of iterations to run for approximation (6 is good enough usually)
Returns:
(torch.Tensor): (B, H, N, N) The approximate Moore-Penrose pseudoinverse of mat
"""
abs_x = torch.abs(x)
col = abs_x.sum(dim=-1)
row = abs_x.sum(dim=-2)
z = x.transpose(-1, -2).contiguous()
z = z / (torch.max(col) * torch.max(row))
I = torch.eye(x.shape[-1], device=x.device).unsqueeze(0)
for _ in range(num_iter):
xz = x @ z
z = 0.25 * z @ (13 * I - (xz @ (15 * I - (xz @ (7 * I - xz)))))
return z
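A sanity check on a well-conditioned matrix, where the iteration converges to the exact inverse (a minimal sketch):
>>> x = (2.0 * torch.eye(4)).unsqueeze(0)  # batch of one 4x4 matrix
>>> pinv = moore_penrose_pinv(x, num_iter=6)
>>> torch.allclose(x @ pinv, torch.eye(4).unsqueeze(0), atol=1e-4)
True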
pad_mask(lengths, max_length=None)
Generate mask for padded tokens
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lengths | Tensor | Original sequence lengths before padding | required |
max_length | Union[torch.Tensor, int] | Maximum sequence length. Defaults to None. | None |
Returns:
Type | Description |
---|---|
Tensor | Padding mask |
Source code in slp/util/pytorch.py
def pad_mask(
lengths: torch.Tensor, max_length: Optional[Union[torch.Tensor, int]] = None
) -> torch.Tensor:
"""Generate mask for padded tokens
Args:
lengths (torch.Tensor): Original sequence lengths before padding
max_length (Optional[Union[torch.Tensor, int]], optional): Maximum sequence length. Defaults to None.
Returns:
torch.Tensor: padding mask
"""
if max_length is None or max_length < 0:
max_length = cast(int, torch.max(lengths).item())
max_length = cast(int, max_length)
idx = torch.arange(0, max_length, device=lengths.device).unsqueeze(0)
mask: torch.Tensor = (idx < lengths.unsqueeze(1)).float()
return mask
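For example, three sequences of lengths 3, 1 and 2 yield:
>>> pad_mask(torch.tensor([3, 1, 2]))
tensor([[1., 1., 1.],
        [1., 0., 0.],
        [1., 1., 0.]])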
pad_sequence(sequences, batch_first=False, padding_value=0.0, max_length=-1)
Pad a list of variable length Tensors with padding_value
``pad_sequence`` stacks a list of Tensors along a new dimension and pads them to equal length. For example, if the input is a list of sequences with size ``L x *``, the output is of size ``T x B x *`` if batch_first is False, and ``B x T x *`` otherwise.
`B` is the batch size; it is equal to the number of elements in ``sequences``. `T` is the length of the longest sequence. `L` is the length of each sequence. `*` is any number of trailing dimensions, including none.
Examples:
>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])
!!! note
This function returns a Tensor of size ``T x B x *`` or ``B x T x *``, where `T` is the length of the longest sequence. It assumes the trailing dimensions and type of all the Tensors in sequences are the same.
Note:
This implementation is modified from torch.nn.utils.rnn.pad_sequence, to accept a max_length argument for fixed length padding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequences | List[torch.Tensor] | list of variable length sequences. | required |
batch_first | bool | output will be in ``B x T x *`` if True, or ``T x B x *`` otherwise | False |
padding_value | Union[float, int] | value for padded elements. Default: 0. | 0.0 |
max_length | int | If max_length is > 0, this function pads to a fixed maximum length. Any sequence longer than max_length is trimmed. | -1 |
Returns:
Type | Description |
---|---|
Tensor | Tensor of size ``T x B x *`` if batch_first is False, ``B x T x *`` otherwise |
Source code in slp/util/pytorch.py
def pad_sequence(
sequences: List[torch.Tensor],
batch_first: bool = False,
padding_value: Union[float, int] = 0.0,
max_length: int = -1,
):
r"""Pad a list of variable length Tensors with ``padding_value``
``pad_sequence`` stacks a list of Tensors along a new dimension,
and pads them to equal length. For example, if the input is a list of
sequences with size ``L x *``, the output is of size ``T x B x *`` if
batch_first is False, and ``B x T x *`` otherwise.
`B` is batch size. It is equal to the number of elements in ``sequences``.
`T` is length of the longest sequence.
`L` is length of the sequence.
`*` is any number of trailing dimensions, including none.
Example:
>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])
Note:
This function returns a Tensor of size ``T x B x *`` or ``B x T x *``
where `T` is the length of the longest sequence. This function assumes
trailing dimensions and type of all the Tensors in sequences are same.
Note:
This implementation is modified from torch.nn.utils.rnn.pad_sequence, to accept a
max_length argument for fixed length padding
Args:
sequences (list[Tensor]): list of variable length sequences.
batch_first (bool, optional): output will be in ``B x T x *`` if True, or in
``T x B x *`` otherwise
padding_value (float, optional): value for padded elements. Default: 0.
max_length (int): If max length is > 0 then this function will pad to a fixed maximum
length. If any sequence is longer than max_length, it will be trimmed.
Returns:
Tensor of size ``T x B x *`` if :attr:`batch_first` is ``False``.
Tensor of size ``B x T x *`` otherwise
"""
# assuming trailing dimensions and type of all the Tensors
# in sequences are same and fetching those from sequences[0]
max_size = sequences[0].size()
trailing_dims = max_size[1:]
if max_length < 0:
max_len = max([s.size(0) for s in sequences])
else:
max_len = max_length
if batch_first:
out_dims = (len(sequences), max_len) + trailing_dims
else:
out_dims = (max_len, len(sequences)) + trailing_dims
out_tensor = sequences[0].new_full(out_dims, padding_value)
for i, tensor in enumerate(sequences):
length = tensor.size(0)
# use index notation to prevent duplicate references to the tensor
if batch_first:
out_tensor[i, : min(length, max_len), ...] = tensor[
: min(length, max_len), ...
]
else:
out_tensor[: min(length, max_len), i, ...] = tensor[
: min(length, max_len), ...
]
return out_tensor
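The max_length argument is the main difference from the upstream implementation; it pads to a fixed length and trims longer sequences:
>>> a, b = torch.ones(5), torch.ones(2)
>>> pad_sequence([a, b], batch_first=True, max_length=3)
tensor([[1., 1., 1.],
        [1., 1., 0.]])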
repeat_layer(l, times)
Clone a layer multiple times
Parameters:
Name | Type | Description | Default |
---|---|---|---|
l | Module | nn.Module to clone | required |
times | int | Number of copies | required |
Returns:
Type | Description |
---|---|
List[torch.nn.modules.module.Module] | List of copies of the input layer (the first element is the original, the rest are deep copies) |
Source code in slp/util/pytorch.py
def repeat_layer(l: nn.Module, times: int) -> List[nn.Module]:
"""Clone a layer multiple times
Args:
l (nn.Module): nn.Module to stack
times (int): Times to clone
Returns:
List[nn.Module]: List of copies of the input layer (the first element is the original, the rest are deep copies)
"""
return [l] + [copy.deepcopy(l) for _ in range(times - 1)]
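For example, to stack identical encoder layers into an nn.ModuleList (a minimal sketch):
>>> import torch.nn as nn
>>> layers = nn.ModuleList(repeat_layer(nn.Linear(64, 64), 4))
>>> len(layers)
4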
rotate_tensor(l, n=1)
Rotate a tensor n positions to the left (the first n elements wrap around to the end)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
l | Tensor | input tensor | required |
n | int | positions to rotate. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
Tensor | Rotated tensor |
Source code in slp/util/pytorch.py
def rotate_tensor(l: torch.Tensor, n: int = 1) -> torch.Tensor:
"""Roate tensor by n positions to the right
Args:
l (torch.Tensor): input tensor
n (int, optional): positions to rotate. Defaults to 1.
Returns:
torch.Tensor: rotated tensor
"""
return torch.cat((l[n:], l[:n]))
shift_tensor(l, n=1)
Shift a tensor n positions to the left, zero-filling the vacated positions at the end
Parameters:
Name | Type | Description | Default |
---|---|---|---|
l | Tensor | input tensor | required |
n | int | positions to shift. Defaults to 1. | 1 |
Returns:
Type | Description |
---|---|
Tensor | Shifted tensor |
Source code in slp/util/pytorch.py
def shift_tensor(l: torch.Tensor, n: int = 1) -> torch.Tensor:
"""Shift tensor by n positions
Args:
l (torch.Tensor): input tensor
n (int, optional): positions to shift. Defaults to 1.
Returns:
torch.Tensor: shifted tensor
"""
out = rotate_tensor(l, n=n)
out[-n:] = 0
return out
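A quick illustration of both helpers on the same tensor:
>>> l = torch.tensor([1, 2, 3, 4])
>>> rotate_tensor(l, n=1)
tensor([2, 3, 4, 1])
>>> shift_tensor(l, n=1)
tensor([2, 3, 4, 0])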
sort_sequences(inputs, lengths)
Sort sequences according to lengths (descending)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inputs | Tensor | input sequences, size [B, T, D] | required |
lengths | Tensor | length of each sequence, size [B] | required |
Returns:
Type | Description |
---|---|
Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.Tensor]] | (sorted inputs, sorted lengths, function to revert inputs and lengths to their unsorted state) |
Source code in slp/util/pytorch.py
def sort_sequences(
inputs: torch.Tensor, lengths: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.Tensor]]:
"""Sort sequences according to lengths (descending)
Args:
inputs (torch.Tensor): input sequences, size [B, T, D]
lengths (torch.Tensor): length of each sequence, size [B]
Returns:
Tuple[torch.Tensor, torch.Tensor, Callable[[torch.Tensor], torch.tensor]]:
(sorted inputs, sorted lengths, function to revert inputs and lengths to unsorted state)
"""
lengths_sorted, sorted_idx = lengths.sort(descending=True)
_, unsorted_idx = sorted_idx.sort()
def unsort(tt: torch.Tensor) -> torch.Tensor:
"""Restore original unsorted sequence"""
return tt[unsorted_idx]
return inputs[sorted_idx], lengths_sorted, unsort
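The returned closure restores the original batch order, which is useful after running a packed RNN (a minimal sketch):
>>> inputs = torch.randn(3, 5, 8)  # [B, T, D]
>>> lengths = torch.tensor([2, 5, 3])
>>> sorted_inputs, sorted_lengths, unsort = sort_sequences(inputs, lengths)
>>> sorted_lengths
tensor([5, 3, 2])
>>> torch.equal(unsort(sorted_inputs), inputs)
True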
subsequent_mask(max_length)
Generate subsequent (lower triangular) mask for transformer autoregressive tasks
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_length | int | Maximum sequence length | required |
Returns:
Type | Description |
---|---|
Tensor | The subsequent mask |
Source code in slp/util/pytorch.py
def subsequent_mask(max_length: int) -> torch.Tensor:
"""Generate subsequent (lower triangular) mask for transformer autoregressive tasks
Args:
max_length (int): Maximum sequence length
Returns:
torch.Tensor: The subsequent mask
"""
mask = torch.ones(max_length, max_length)
# Ignore typecheck because pytorch types are incomplete
return mask.triu().t().unsqueeze(0).contiguous() # type: ignore
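For example, for a maximum length of 3, each position may only attend to itself and earlier positions:
>>> subsequent_mask(3)
tensor([[[1., 0., 0.],
         [1., 1., 0.],
         [1., 1., 1.]]])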
t(data, dtype=torch.float32, device='cpu', requires_grad=False)
Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set. This always copies data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[numpy.ndarray, torch.Tensor, List[~T]] | Data to be converted to torch tensor. | required |
dtype | torch.dtype | The type of the tensor elements. | torch.float32 |
device | Union[torch.device, str] | Device where the tensor should be placed. | 'cpu' |
requires_grad | bool | Trainable tensor or not? | False |
Returns:
Type | Description |
---|---|
Tensor | A tensor of the given dtype, device and requires_grad containing data |
Source code in slp/util/pytorch.py
def t(
data: types.NdTensor,
dtype: torch.dtype = torch.float,
device: types.Device = "cpu",
requires_grad: bool = False,
) -> torch.Tensor:
"""Convert a list or numpy array to torch tensor. If a torch tensor
is passed it is cast to dtype, device and the requires_grad flag is
set. This always copies data.
Args:
data: (list, np.ndarray, torch.Tensor): Data to be converted to
torch tensor.
dtype: (torch.dtype): The type of the tensor elements
(Default value = torch.float)
device: (torch.device, str): Device where the tensor should be
(Default value = 'cpu')
requires_grad: (bool): Trainable tensor or not? (Default value = False)
Returns:
(torch.Tensor): A tensor of appropriate dtype, device and
requires_grad containing data
"""
tt = torch.tensor(data, dtype=dtype, device=device, requires_grad=requires_grad)
return tt
t_(data, dtype=torch.float32, device='cpu', requires_grad=False)
Convert a list or numpy array to torch tensor. If a torch tensor is passed it is cast to dtype, device and the requires_grad flag is set IN PLACE.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[numpy.ndarray, torch.Tensor, List[~T]] | Data to be converted to torch tensor. | required |
dtype | torch.dtype | The type of the tensor elements. | torch.float32 |
device | Union[torch.device, str] | Device where the tensor should be placed. | 'cpu' |
requires_grad | bool | Trainable tensor or not? | False |
Returns:
Type | Description |
---|---|
Tensor | A tensor of the given dtype, device and requires_grad containing data |
Source code in slp/util/pytorch.py
def t_(
data: types.NdTensor,
dtype: torch.dtype = torch.float,
device: Optional[types.Device] = "cpu",
requires_grad: bool = False,
) -> torch.Tensor:
"""Convert a list or numpy array to torch tensor. If a torch tensor
is passed it is cast to dtype, device and the requires_grad flag is
set IN PLACE.
Args:
data: (list, np.ndarray, torch.Tensor): Data to be converted to
torch tensor.
dtype: (torch.dtype): The type of the tensor elements
(Default value = torch.float)
device: (torch.device, str): Device where the tensor should be
(Default value = 'cpu')
requires_grad: (bool): Trainable tensor or not? (Default value = False)
Returns:
(torch.Tensor): A tensor of appropriate dtype, device and
requires_grad containing data
"""
if isinstance(device, str):
device = torch.device(device)
tt = torch.as_tensor(data, dtype=dtype, device=device).requires_grad_(requires_grad)
return tt
to_device(tt, device='cpu', non_blocking=False)
Send a tensor to a device
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tt | Tensor | input tensor | required |
device | Union[torch.device, str] | Output device. Defaults to "cpu". | 'cpu' |
non_blocking | bool | Use non-blocking memory transfer if True. Defaults to False. | False |
Returns:
Type | Description |
---|---|
Tensor | Tensor on the desired device |
Source code in slp/util/pytorch.py
def to_device(
tt: torch.Tensor, device: Optional[types.Device] = "cpu", non_blocking: bool = False
) -> torch.Tensor:
"""Send a tensor to a device
Args:
tt (torch.Tensor): input tensor
device (Optional[types.Device], optional): Output device. Defaults to "cpu".
non_blocking (bool, optional): Use non-blocking memory transfer if True. Defaults to False.
Returns:
torch.Tensor: Tensor in the desired device
"""
return tt.to(device, non_blocking=non_blocking)
date_fname()
date_fname Generate a filename based on datetime.now().
If multiple calls are made within the same second, the filename will not be unique. We could add milliseconds etc. to the filename, but that would hinder readability. For practical purposes, e.g. unique logs between different experiments, this should be enough. If we ever need a truly unique descriptor, there is the uuid module.
Returns:
Type | Description |
---|---|
str | A filename, e.g. 20210228-211832 |
Source code in slp/util/system.py
def date_fname() -> str:
"""date_fname Generate a filename based on datetime.now().
If multiple calls are made within the same second, the filename will not be unique.
We could add milliseconds etc. to the fname but that would hinder readability.
For practical purposes, e.g. unique logs between different experiments, this should be enough.
Either way, if we need a truly unique descriptor, there is the uuid module.
Returns:
str: A filename, e.g. 20210228-211832
"""
return datetime.now().strftime("%Y%m%d-%H%M%S")
download_url(url, dest_path)
download_url Download a file to a destination path given a URL
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | A url pointing to the file we want to download | required |
dest_path | str | The destination directory where the file will be written | required |
Returns:
Type | Description |
---|---|
str | The filename where the downloaded file is written |
Source code in slp/util/system.py
def download_url(url: str, dest_path: str) -> str:
"""download_url Download a file to a destination path given a URL
Args:
url (str): A url pointing to the file we want to download
dest_path (str): The destination directory where the file will be written
Returns:
(str): The filename where the downloaded file is written
"""
name = url.rsplit("/")[-1]
dest = os.path.join(dest_path, name)
safe_mkdirs(dest_path)
response = urllib.request.urlopen(url)
with open(dest, "wb") as fd:
shutil.copyfileobj(response, fd)
return dest
has_internet_connection(timeout=3)
has_internet_connection Check if you are connected to the internet
Check if internet connection exists by pinging Google DNS server
Host: 8.8.8.8 (google-public-dns-a.google.com) OpenPort: 53/tcp Service: domain (DNS/TCP)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
timeout | int | Seconds to wait before giving up | 3 |
Returns:
Type | Description |
---|---|
bool | True if connection is established, False if we are not connected to the internet |
Source code in slp/util/system.py
def has_internet_connection(timeout: int = 3) -> bool:
"""has_internet_connection Check if you are connected to the internet
Check if internet connection exists by pinging Google DNS server
Host: 8.8.8.8 (google-public-dns-a.google.com)
OpenPort: 53/tcp
Service: domain (DNS/TCP)
Args:
timeout (int): Seconds to wait before giving up
Returns:
bool: True if connection is established, False if we are not connected to the internet
"""
host, port = "8.8.8.8", 53
try:
socket.setdefaulttimeout(timeout)
socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port))
return True
except socket.error as ex:
print(ex)
return False
is_file(inp)
is_file Check if the provided string is valid file in the system path
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inp | Optional[str] | A potential file or None | required |
Returns:
Type | Description |
---|---|
Union[validators.utils.ValidationFailure, bool] | True if a valid file is provided, False if the file does not exist |
Examples:
>>> is_file("/bin/bash")
True
>>> is_file("/supercalifragilisticexpialidocious")  # This does not exist. I hope...
False
Source code in slp/util/system.py
def is_file(inp: Optional[str]) -> types.ValidationResult:
"""is_file Check if the provided string is valid file in the system path
Args:
inp (Optional[str]): A potential file or None
Returns:
types.ValidationResult: True if a valid file is provided, False if the file does not exist
Examples:
>>> is_file("/bin/bash")
True
>>> is_file("/supercalifragilisticexpialidocious") # This does not exist. I hope...
False
"""
if not inp:
return False
return os.path.isfile(inp)
is_subpath(child, parent)
is_subpath Check if child path is a subpath of parent
Parameters:
Name | Type | Description | Default |
---|---|---|---|
child | str | Child path | required |
parent | str | Parent path | required |
Returns:
Type | Description |
---|---|
bool | True if child is a subpath of parent, False otherwise |
Examples:
>>> is_subpath("/usr/bin/Xorg", "/usr")
True
Source code in slp/util/system.py
def is_subpath(child: str, parent: str) -> bool:
"""is_subpath Check if child path is a subpath of parent
Args:
child (str): Child path
parent (str): parent path
Returns:
bool: True if child is a subpath of parent, false if not
Examples:
>>> is_subpath("/usr/bin/Xorg", "/usr")
True
"""
parent = os.path.abspath(parent)
child = os.path.abspath(child)
return cast(
bool, os.path.commonpath([parent]) == os.path.commonpath([parent, child])
)
is_url(inp)
is_url Check if the provided string is a URL
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inp | Optional[str] | A potential link or None | required |
Returns:
Type | Description |
---|---|
Union[validators.utils.ValidationFailure, bool] | True if a valid url is provided, False if the string is not a url |
Examples:
>>> is_url("Hello World")
ValidationFailure(func=url, args={'value': 'Hello World', 'public': False})
>>> is_url("http://google.com")
True
Source code in slp/util/system.py
def is_url(inp: Optional[str]) -> types.ValidationResult:
"""is_url Check if the provided string is a URL
Args:
inp (Optional[str]): A potential link or None
Returns:
types.ValidationResult: True if a valid url is provided, False if the string is not a url
Examples:
>>> is_url("Hello World")
ValidationFailure(func=url, args={'value': 'Hello World', 'public': False})
>>> is_url("http://google.com")
True
"""
if not inp:
return False
return validators.url(inp)
json_dump(data, fname)
json_dump Save dict to a json file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[~K, ~V] | Dict to save | required |
fname | str | Output json file | required |
Source code in slp/util/system.py
def json_dump(data: types.GenericDict, fname: str) -> None:
"""json_dump Save dict to a json file
Args:
data (types.GenericDict): Dict to save
fname (str): Output json file
"""
with open(fname, "w") as fd:
json.dump(data, fd)
json_load(fname)
json_load Load dict from a json file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname | str | Json file to load | required |
Returns:
Type | Description |
---|---|
Dict[~K, ~V] | Dict of loaded data |
Source code in slp/util/system.py
def json_load(fname: str) -> types.GenericDict:
"""json_load Load dict from a json file
Args:
fname (str): Json file to load
Returns:
types.GenericDict: Dict of loaded data
"""
with open(fname, "r") as fd:
data = json.load(fd)
return cast(types.GenericDict, data)
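A round trip through json_dump and json_load (the /tmp path is illustrative):
>>> json_dump({"lr": 0.001, "bsz": 64}, "/tmp/config.json")
>>> json_load("/tmp/config.json")
{'lr': 0.001, 'bsz': 64}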
pickle_dump(data, fname)
pickle_dump Save data to pickle file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Any | Data to save | required |
fname | str | Output pickle file | required |
Source code in slp/util/system.py
def pickle_dump(data: Any, fname: str) -> None:
"""pickle_dump Save data to pickle file
Args:
data (Any): Data to save
fname (str): Output pickle file
"""
with open(fname, "wb") as fd:
pickle.dump(data, fd)
pickle_load(fname)
pickle_load Load data from pickle file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname | str | file name of pickle file | required |
Returns:
Type | Description |
---|---|
Any | Loaded data |
Source code in slp/util/system.py
def pickle_load(fname: str) -> Any:
"""pickle_load Load data from pickle file
Args:
fname (str): file name of pickle file
Returns:
Any: Loaded data
"""
with open(fname, "rb") as fd:
data = pickle.load(fd)
return data
print_separator(symbol='*', n=10, print_fn=<built-in function print>)
print_separator Print a repeated symbol as a separator
Parameters:
Name | Type | Description | Default |
---|---|---|---|
symbol | str | Symbol to print | '*' |
n | int | Number of times to print the symbol | 10 |
print_fn | Callable[[str], None] | Print function to use, e.g. print or logger.info | print |
Examples:
>>> print_separator(symbol="-", n=2)
--
Source code in slp/util/system.py
def print_separator(
symbol: str = "*", n: int = 10, print_fn: Callable[[str], None] = print
):
"""print_separator Print a repeated symbol as a separator
*********************************************************
Args:
symbol (str): Symbol to print
n (int): Number of times to print the symbol
print_fn (Callable[[str], None]): Print function to use, e.g. print or logger.info
Examples:
>>> print_separator(symbol="-", n=2)
--
"""
print_fn(symbol * n)
read_wav(wav_sample)
read_wav Reads a wav clip into a string and returns the hex string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wav_sample | str | Path to wav file | required |
Returns:
Type | Description |
---|---|
str | A hex string with the audio information. |
Source code in slp/util/system.py
def read_wav(wav_sample: str) -> str:
"""read_wav Reads a wav clip into a string and returns the hex string.
Args:
wav_sample (str): Path to wav file
Returns:
A hex string with the audio information.
"""
with open(wav_sample, "r") as wav_fd:
clip = wav_fd.read()
return clip
run_cmd(command)
run_cmd Run given shell command
Parameters:
Name | Type | Description | Default |
---|---|---|---|
command | str | Shell command to run | required |
Returns:
Type | Description |
---|---|
Tuple[int, str] | Status code, stdout of shell command |
Examples:
>>> run_cmd("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
Source code in slp/util/system.py
def run_cmd(command: str) -> Tuple[int, str]:
"""run_cmd Run given shell command
Args:
command (str): Shell command to run
Returns:
(int, str): Status code, stdout of shell command
Examples:
>>> run_cmd("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
"""
command = f'{os.getenv("SHELL")} -c "{command}"'
pipe = subprocess.Popen(
command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
)
stdout = ""
if pipe.stdout is not None:
stdout = "".join(
[line.decode("utf-8") for line in iter(pipe.stdout.readline, b"")]
)
pipe.stdout.close()
returncode = pipe.wait()
return returncode, stdout
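A smaller example than the directory listing above (assumes a POSIX shell is set in the SHELL environment variable):
>>> returncode, stdout = run_cmd("echo hello")
>>> returncode, stdout
(0, 'hello\n')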
run_cmd_silent(command)
run_cmd_silent Run command without printing to console
Parameters:
Name | Type | Description | Default |
---|---|---|---|
command | str | Shell command to run | required |
Returns:
Type | Description |
---|---|
Tuple[int, str] | Status code, stdout of shell command |
Examples:
>>> run_cmd_silent("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
Source code in slp/util/system.py
def run_cmd_silent(command: str) -> Tuple[int, str]:
"""run_cmd_silent Run command without printing to console
Args:
command (str): Shell command to run
Returns:
(int, str): Status code, stdout of shell command
Examples:
>>> run_cmd_silent("ls /")
(0, 'bin\nboot\ndev\netc\nhome\ninit\nlib\nlib32\nlib64\nlibx32\nlost+found\nmedia\nmnt\nopt\nproc\nroot\nrun\nsbin\nsnap\nsrv\nsys\ntmp\nusr\nvar\n')
"""
return cast(Tuple[int, str], suppress_print(run_cmd)(command))
safe_mkdirs(path)
Makes recursively all the directories in input path
Utility function similar to mkdir -p. Makes directories recursively, if given path does not exist
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to mkdir -p | required |
Examples:
>>> safe_mkdirs("super/cali/fragi/listic/expi/ali/docious")
Source code in slp/util/system.py
def safe_mkdirs(path: str) -> None:
"""Makes recursively all the directories in input path
Utility function similar to mkdir -p. Makes directories recursively, if given path does not exist
Args:
path (str): Path to mkdir -p
Examples:
>>> safe_mkdirs("super/cali/fragi/listic/expi/ali/docious")
"""
if not os.path.exists(path):
try:
os.makedirs(path)
except Exception as e:
logger.warning(e)
raise IOError(f"Failed to create recursive directories: {path}")
suppress_print(func)
suppress_print Decorator to suppress stdout of decorated function
Examples:
>>> @slp.util.system.suppress_print
>>> def very_verbose_function(...): ...
Source code in slp/util/system.py
def suppress_print(func: Callable) -> Callable:
"""suppress_print Decorator to supress stdout of decorated function
Examples:
>>> @slp.util.system.timethis
>>> def very_verbose_function(...): ...
"""
def func_wrapper(*args: types.T, **kwargs: types.T):
"""Inner function for decorator closure"""
with open("/dev/null", "w") as sys.stdout:
ret = func(*args, **kwargs)
sys.stdout = sys.__stdout__
return ret
return cast(Callable, func_wrapper)
timethis(method=False)
Decorator to measure the time it takes for a function to complete
Examples:
>>> @slp.util.system.timethis()
>>> def time_consuming_function(...): ...
Source code in slp/util/system.py
def timethis(method=False) -> Callable:
"""Decorator to measure the time it takes for a function to complete
Examples:
>>> @slp.util.system.timethis()
>>> def time_consuming_function(...): ...
"""
def timethis_inner(func: Callable) -> Callable:
"""Inner function for decorator closure"""
@functools.wraps(func)
def timed(*args: types.T, **kwargs: types.T):
"""Inner function for decorator closure"""
ts = time.time()
result = func(*args, **kwargs)
te = time.time()
elapsed = f"{te - ts}"
if method:
logger.info(
"BENCHMARK: {cls}.{f}(*{a}, **{kw}) took: {t} sec".format(
f=func.__name__, cls=args[0], a=args[1:], kw=kwargs, t=elapsed
)
)
else:
logger.info(
"BENCHMARK: {f}(*{a}, **{kw}) took: {t} sec".format(
f=func.__name__, a=args, kw=kwargs, t=elapsed
)
)
return result
return cast(Callable, timed)
return timethis_inner
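Note that timethis is a decorator factory, so it should be called when applied. A minimal sketch (the logged timing will vary; assumes timethis is imported from slp.util.system):
>>> import time
>>> @timethis()
... def slow_add(a, b):
...     time.sleep(0.1)
...     return a + b
>>> slow_add(1, 2)  # logs: BENCHMARK: slow_add(*(1, 2), **{}) took: ... sec
3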
write_wav(byte_str, wav_file)
write_wav Write a hex string into a wav file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
byte_str | str | The hex string containing the audio data | required |
wav_file | str | The output wav file | required |
Source code in slp/util/system.py
def write_wav(byte_str: str, wav_file: str) -> None:
"""write_wav Write a hex string into a wav file
Args:
byte_str (str): The hex string containing the audio data
wav_file (str): The output wav file
"""
with open(wav_file, "w") as fd:
fd.write(byte_str)
yaml_dump(data, fname)
yaml_dump Save dict to a yaml file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[~K, ~V] | Dict to save | required |
fname | str | Output yaml file | required |
Source code in slp/util/system.py
def yaml_dump(data: types.GenericDict, fname: str) -> None:
"""yaml_dump Save dict to a yaml file
Args:
data (types.GenericDict): Dict to save
fname (str): Output yaml file
"""
with open(fname, "w") as fd:
yaml.dump(data, fd)
yaml_load(fname)
yaml_load Load dict from a yaml file
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fname | str | Yaml file to load | required |
Returns:
Type | Description |
---|---|
Dict[~K, ~V] | Dict of loaded data |
Source code in slp/util/system.py
def yaml_load(fname: str) -> types.GenericDict:
"""yaml_load Load dict from a yaml file
Args:
fname (str): Yaml file to load
Returns:
types.GenericDict: Dict of loaded data
"""
with open(fname, "r") as fd:
data = yaml.load(fd, Loader=yaml.SafeLoader)
return cast(types.GenericDict, data)
dir_path(path)
dir_path Type to use when parsing a path in argparse arguments
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | User provided path | required |
Exceptions:
Type | Description |
---|---|
argparse.ArgumentTypeError | Path does not exist, so argparse fails |
Returns:
Type | Description |
---|---|
str | User provided path |
Examples:
>>> from slp.util.types import dir_path
>>> import argparse
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--config", type=dir_path)
>>> parser.parse_args(args=["--config", "my_random_config_that_does_not_exist.yaml"])
Traceback (most recent call last):
argparse.ArgumentTypeError: User provided path 'my_random_config_that_does_not_exist.yaml' does not exist
Source code in slp/util/types.py
def dir_path(path):
"""dir_path Type to use when parsing a path in argparse arguments
Args:
path (str): User provided path
Raises:
argparse.ArgumentTypeError: Path does not exist, so argparse fails
Returns:
str: User provided path
Examples:
>>> from slp.util.types import dir_path
>>> import argparse
>>> parser = argparse.ArgumentParser("My cool model")
>>> parser.add_argument("--config", type=dir_path)
>>> parser.parse_args(args=["--config", "my_random_config_that_does_not_exist.yaml"])
Traceback (most recent call last):
argparse.ArgumentTypeError: User provided path 'my_random_config_that_does_not_exist.yaml' does not exist
"""
if os.path.isdir(path):
return path
raise argparse.ArgumentTypeError(f"User provided path '{path}' does not exist")