Experimental Emulator API

class summit.benchmarks.experimental_emulator.ExperimentalEmulator(model_name, domain, **kwargs)[source]

Experimental Emulator

Train a machine learning model based on experimental data. The model acts as a benchmark for testing optimisation strategies.

Parameters
  • model_name (str) – Name of the model, ideally with no spaces

  • domain (Domain) – The domain of the emulator

  • dataset (Dataset, optional) – Dataset used for training/validation

  • regressor (torch.nn.Module, optional) – PyTorch LightningModule class. Defaults to ANNRegressor.

  • output_variable_names (str or list, optional) – The names of the variables that should be trained by the predictor. Defaults to all objectives in the domain.

  • descriptors_features (list, optional) – A list of input categorical variable names that should be transformed into their descriptors instead of using one-hot encoding.

  • clip (bool or list, optional) – Whether to clip predictions to the limits of the objectives in the domain. True (default) means clipping is activated for all outputs and False means it is not activated at all. A list of specific outputs to clip can also be passed.
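The three accepted forms of clip (True, False, or a list of output names) can be pictured with a small sketch. This is not Summit's implementation: the helper clip_predictions, the bounds dictionary, and the output names are hypothetical and only illustrate the behaviour described above.

```python
def clip_predictions(predictions, bounds, clip=True):
    """Clip predicted values to per-output (lower, upper) bounds.

    predictions: dict mapping output name -> list of floats
    bounds: dict mapping output name -> (lower, upper)
    clip: True (all outputs), False (none), or a list of output names
    """
    if clip is True:
        to_clip = set(predictions)
    elif clip is False:
        to_clip = set()
    else:
        to_clip = set(clip)

    clipped = {}
    for name, values in predictions.items():
        if name in to_clip:
            lower, upper = bounds[name]
            clipped[name] = [min(max(v, lower), upper) for v in values]
        else:
            clipped[name] = list(values)
    return clipped


# Hypothetical predictions for two objectives and their domain limits.
preds = {"yld": [-3.0, 50.0, 104.0], "ton": [-1.0, 20.0, 250.0]}
limits = {"yld": (0.0, 100.0), "ton": (0.0, 200.0)}

print(clip_predictions(preds, limits, clip=True))     # both outputs clipped
print(clip_predictions(preds, limits, clip=["yld"]))  # only "yld" clipped
```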

Notes

By default, categorical features are pre-processed using one-hot encoding. If descriptors are available, they can be used on a feature-by-feature basis by specifying the names of the categorical variables in the descriptors_features keyword argument.
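To make the difference between the two encodings concrete, here is a minimal sketch in plain Python. It does not reproduce Summit's pre-processing; the encode helper, the variable names, and the descriptor values are all made up for illustration.

```python
def one_hot(value, categories):
    """Return a one-hot vector for value over the ordered categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode(row, categories, descriptors, descriptors_features=()):
    """Encode categorical inputs, using descriptors only where requested."""
    encoded = []
    for name, value in row.items():
        if name in descriptors_features:
            # Replace the category by its numerical descriptor values.
            encoded.extend(descriptors[name][value])
        else:
            encoded.extend(one_hot(value, categories[name]))
    return encoded


# Hypothetical categorical variables and invented descriptor values.
categories = {"catalyst": ["P1-L1", "P1-L2", "P2-L1"], "base": ["DBU", "TEA"]}
descriptors = {
    "catalyst": {"P1-L1": [0.2, 1.5], "P1-L2": [0.4, 1.1], "P2-L1": [0.9, 0.7]}
}
row = {"catalyst": "P1-L2", "base": "TEA"}

print(encode(row, categories, descriptors))
print(encode(row, categories, descriptors, descriptors_features=["catalyst"]))
```

In the second call only catalyst is replaced by its descriptors, while base still falls back to one-hot encoding.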

Examples

>>> from summit.benchmarks import ExperimentalEmulator, ReizmanSuzukiEmulator
>>> from summit.utils.dataset import DataSet
>>> import matplotlib.pyplot as plt
>>> import pathlib
>>> import pkg_resources
>>> # Steal domain and data from Reizman example
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = "reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> # Create emulator and train (bump max_epochs to 1000 to get better training)
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds)
>>> res = exp.train(max_epochs=10, cv_folds=2, random_state=100, test_size=0.2)
>>> # Plot to show the quality of the fit
>>> fig, ax = exp.parity_plot(include_test=True)
>>> plt.show()
classmethod from_dict(d, **kwargs)[source]

Create ExperimentalEmulator from a dictionary

Notes

This does not load the regressor weights and biases. After calling from_dict, call load_regressor to load the weights and biases.
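The two-step pattern described in this note might look like the sketch below. It only uses methods documented on this page (save_regressor, to_dict, from_dict, load_regressor); the helper name round_trip_via_dict is hypothetical, and whether a given emulator subclass needs extra arguments to from_dict is not covered here.

```python
import pathlib


def round_trip_via_dict(emulator, save_dir):
    """Rebuild an emulator from its dictionary, then restore its weights."""
    save_dir = pathlib.Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)

    # to_dict() captures parameters only, so persist the weights separately.
    emulator.save_regressor(save_dir)
    params = emulator.to_dict()

    # from_dict() returns an emulator without trained weights ...
    restored = type(emulator).from_dict(params)
    # ... so they have to be loaded explicitly afterwards.
    restored.load_regressor(save_dir)
    return restored
```

With a trained emulator exp, this would be called as round_trip_via_dict(exp, "emulator_files"), where the directory name is an arbitrary choice.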

classmethod load(model_name, save_dir, **kwargs)[source]

Load all the essential parameters of the ExperimentalEmulator from disk

Parameters

save_dir (str or pathlib.Path) – The directory from which to load emulator files.

load_regressor(save_dir)[source]

Load the weights and biases of the regressor from disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

parity_plot(**kwargs)[source]

Produce a parity plot for the trained model using matplotlib

Parameters
  • output_variable_names (str or list, optional) – The output variables to plot. Defaults to all.

  • include_test (bool, optional) – Include the performance of the model on the test set. Defaults to False.

  • train_color (str, optional) – Hex string for the train points. Defaults to “#6f3666”

  • test_color (str, optional) – Hex string for the test points. Defaults to “#3c328c”

save(save_dir)[source]

Save all the essential parameters of the ExperimentalEmulator to disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.
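A minimal sketch of pairing save with the load classmethod, assuming the two are counterparts as their descriptions suggest. The helper save_and_reload is hypothetical; the model name passed to load should match the one the emulator was created with.

```python
import pathlib


def save_and_reload(emulator, model_name, save_dir):
    """Write an emulator to save_dir and read it back with load()."""
    save_dir = pathlib.Path(save_dir)
    # Creating the directory up front is harmless if save() also does so.
    save_dir.mkdir(parents=True, exist_ok=True)

    emulator.save(save_dir)
    return type(emulator).load(model_name, save_dir)
```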

save_regressor(save_dir)[source]

Save the weights and biases of the regressor to disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

test(**kwargs)[source]

Get test results

This requires that train has already been called or the ExperimentalEmulator was initialized from a pretrained model.

Parameters

scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for the available options.
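For reference, the two default metrics can be computed by hand as in the sketch below. These are the standard definitions written in plain Python, not the scikit-learn scorers that the scoring argument refers to, and the measured and predicted values are invented.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented measured and predicted values for one objective.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(mean_squared_error(measured, predicted))
print(r2_score(measured, predicted))
```

Note that scikit-learn's named scorers follow a higher-is-better convention, so error metrics appear there in negated form (for example "neg_mean_squared_error").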

to_dict(**experiment_params)[source]

Convert emulator parameters to dictionary

Notes

This does not save the weights and biases of the regressor. Use the save_regressor method to save them.

train(**kwargs)[source]

Train the model on the dataset

This will automatically do a train-test split and then train via cross-validation on the train set.

Parameters
  • test_size (float, optional) – The size of the test set as a fraction of the total dataset. Defaults to 0.1.

  • cv_folds (int, optional) – The number of cross validation folds. Defaults to 5.

  • max_epochs (int, optional) – The max number of epochs for each CV fold. Defaults to 100.

  • scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for the available options.

  • regressor_kwargs (dict, optional) – You can pass extra arguments to the regressor here.

  • callbacks (None, "disable" or list of Callbacks) – Skorch callbacks passed to skorch.net. See: https://skorch.readthedocs.io/en/latest/net.html

  • verbose (int) – 0 for no logging, 1 for logging

Notes

If predictor was set in the initialization, it will not be overwritten.

Returns

A dictionary containing the results of the training.
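The bookkeeping behind "a train-test split followed by cross-validation on the train set" can be sketched as follows. This is an illustration with the standard library only; the split_indices helper is hypothetical, and Summit's actual splitting and shuffling may differ.

```python
import random


def split_indices(n_samples, test_size=0.1, cv_folds=5, random_state=None):
    """Hold out a test set, then partition the remainder into CV folds."""
    rng = random.Random(random_state)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    n_test = max(1, round(n_samples * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Fold k validates on every cv_folds-th training sample, offset by k,
    # and fits on the remaining training samples.
    folds = []
    for k in range(cv_folds):
        val = train_idx[k::cv_folds]
        val_set = set(val)
        fit = [i for i in train_idx if i not in val_set]
        folds.append((fit, val))
    return train_idx, test_idx, folds


# Same settings as the class-level example: 20% test set, 2 folds.
train_idx, test_idx, folds = split_indices(
    50, test_size=0.2, cv_folds=2, random_state=100
)
print(len(train_idx), len(test_idx))
print([(len(fit), len(val)) for fit, val in folds])
```

The test indices never appear in any fold, which is why the held-out scores from test are an independent check on the cross-validated model.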

class summit.benchmarks.ANNRegressor(input_dim, output_dim, hidden_units=512, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
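The distinction between calling the module instance and calling forward directly can be illustrated without PyTorch. The TinyModule class below is a hypothetical stand-in, not torch.nn.Module; it mimics only the relevant behaviour, namely that __call__ runs the registered hooks while forward alone does not.

```python
class TinyModule:
    """Hypothetical stand-in for a module with forward hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return 2 * x

    def __call__(self, x):
        # Calling the instance wraps forward() with the registered hooks.
        output = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, output)
        return output


calls = []
module = TinyModule()
module.register_forward_hook(lambda mod, inp, out: calls.append((inp, out)))

module(3)          # hooks run
module.forward(4)  # hooks are skipped
print(calls)
```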

class summit.benchmarks.RegressorRegistry[source]

Registry for Regressors

The registry stores regressors that can be used with the summit.benchmarks.ExperimentalEmulator. A regressor can be any torch.nn.Module that takes the parameters input_dim and output_dim for the input and output dimensions respectively.

Registering a regressor means that it can be serialized and deserialized using the save/load functionality of the emulator.

register(regressor)[source]

Register a new regressor

Parameters

regressor (torch.nn.Module) – A torch neural network module
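One plausible reading of why registration enables serialization is that a saved emulator only needs to store the regressor's name, which the registry maps back to a class on load. The sketch below shows that general pattern in plain Python. SimpleRegistry and LinearRegressor are hypothetical and do not reflect how RegressorRegistry is implemented.

```python
class SimpleRegistry:
    """Hypothetical name-to-class registry (not Summit's RegressorRegistry)."""

    def __init__(self):
        self._classes = {}

    def register(self, cls):
        self._classes[cls.__name__] = cls
        return cls

    def __getitem__(self, name):
        return self._classes[name]


class LinearRegressor:
    """Placeholder regressor taking input_dim and output_dim."""

    def __init__(self, input_dim, output_dim):
        self.input_dim = input_dim
        self.output_dim = output_dim


registry = SimpleRegistry()
registry.register(LinearRegressor)

# Serialize: store only the class name and the constructor arguments.
saved = {"regressor_name": LinearRegressor.__name__, "input_dim": 4, "output_dim": 2}

# Deserialize: look the class up by name and rebuild the regressor.
restored = registry[saved["regressor_name"]](saved["input_dim"], saved["output_dim"])
print(type(restored).__name__, restored.input_dim, restored.output_dim)
```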