Already Implemented Benchmarks

SnAr Benchmark

class summit.benchmarks.SnarBenchmark(noise_level=0, **kwargs)[source]

Benchmark representing a nucleophilic aromatic substitution (SnAr) reaction

The SnAr reaction occurs in a plug flow reactor where residence time, stoichiometry and temperature can be adjusted. The objectives are to maximise space-time yield (STY) and minimise E-factor.
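As a rough illustration of the two objectives, the sketch below uses the generic textbook definitions: STY as product mass per reactor volume per unit time, and E-factor as waste mass per product mass. The function names, units and numbers are assumptions for illustration and are not taken from the Summit source, whose exact expressions may differ.

```python
def space_time_yield(product_mass_kg, reactor_volume_m3, residence_time_h):
    """Generic STY: product mass per reactor volume per hour (kg m^-3 h^-1)."""
    return product_mass_kg / (reactor_volume_m3 * residence_time_h)


def e_factor(total_input_mass_kg, product_mass_kg):
    """Generic E-factor: mass of everything that is not product, per mass of product."""
    waste_mass_kg = total_input_mass_kg - product_mass_kg
    return waste_mass_kg / product_mass_kg


# Made-up numbers: 0.5 kg of product from 2.0 kg of total input,
# in a 1 L (0.001 m^3) reactor with a 15 min (0.25 h) residence time.
sty_value = space_time_yield(0.5, 0.001, 0.25)
e_value = e_factor(2.0, 0.5)
print(sty_value, e_value)
```

A higher STY and a lower E-factor are both desirable, which is why the benchmark treats them as competing objectives.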

Parameters

noise_level (float, optional) – The mean of the random noise added to the concentration measurements in terms of percent of the signal. Default is 0.

Examples

>>> import numpy as np
>>> from summit.benchmarks import SnarBenchmark
>>> from summit.utils.dataset import DataSet
>>> b = SnarBenchmark()
>>> columns = [v.name for v in b.domain.variables]
>>> values = [v.bounds[0]+0.1*(v.bounds[1]-v.bounds[0]) for v in b.domain.variables]
>>> values = np.array(values)
>>> values = np.atleast_2d(values)
>>> conditions = DataSet(values, columns=columns)
>>> results = b.run_experiments(conditions)

Notes

This benchmark relies on the kinetics observed by [Hone] et al. The mechanistic model is integrated using scipy to find the outlet concentrations of all species. These concentrations are then used to calculate STY and E-factor.
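The integration step can be pictured with a toy model. The sketch below integrates a single second-order step A + B -> P over the residence time with a hand-written fixed-step Runge-Kutta scheme, so it runs without scipy. The rate constant, inlet concentrations and residence time are made-up values, and the real benchmark uses the multi-step kinetics reported by Hone et al. rather than this simplified rate law.

```python
def rk4_integrate(rhs, y0, t_end, n_steps):
    """Integrate dy/dt = rhs(t, y) from t=0 to t_end with classical RK4."""
    h = t_end / n_steps
    t, y = 0.0, list(y0)
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [
            yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
        ]
        t += h
    return y


def second_order_rhs(rate_constant):
    """Rate equations for A + B -> P with rate = k * [A] * [B]."""
    def rhs(t, y):
        c_a, c_b, c_p = y
        rate = rate_constant * c_a * c_b
        return [-rate, -rate, rate]
    return rhs


# Hypothetical inlet concentrations (mol/L) and residence time (min)
inlet = [1.0, 1.0, 0.0]
outlet = rk4_integrate(second_order_rhs(0.5), inlet, t_end=2.0, n_steps=200)
print(outlet)
```

For equal inlet concentrations this toy system has the closed-form solution [A] = [A]0 / (1 + k [A]0 t), which gives a quick check on the integrator.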

References

Hone

C. A. Hone et al., React. Chem. Eng., 2017, 2, 103–108. DOI: 10.1039/C6RE00109B

property data

Dataset of all experiments run

property domain

The domain for the experiment

pareto_plot(objectives=None, colorbar=False, ax=None)

Make a 2D pareto plot of the experiments thus far

Parameters
  • objectives (array-like, optional) – List of names of objectives to plot. By default picks the first two objectives

  • ax (matplotlib.pyplot.axes, optional) – An existing axis to apply the plot to

Returns

  • if ax is None, returns a tuple with the new figure as the first component and the axis as the second

  • if ax is a matplotlib axis, returns only the axis

Raises

ValueError – If the number of objectives is not equal to two
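To make the idea behind the plot concrete, here is a minimal sketch of how the non-dominated (Pareto) points of a two-objective data set can be identified, assuming the first objective is maximised and the second minimised, as with STY and E-factor. The helper `pareto_front` and the sample points are hypothetical and are not part of Summit.

```python
def pareto_front(points):
    """Return the non-dominated (sty, e_factor) pairs.

    A point is dominated if another point is at least as good in both
    objectives (higher or equal STY, lower or equal E-factor) and
    strictly better in at least one.
    """
    front = []
    for i, (sty_i, e_i) in enumerate(points):
        dominated = any(
            sty_j >= sty_i and e_j <= e_i and (sty_j > sty_i or e_j < e_i)
            for j, (sty_j, e_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((sty_i, e_i))
    return front


# Made-up experiment results as (STY, E-factor) pairs
experiments = [(100, 5), (200, 8), (150, 4), (50, 10)]
front = pareto_front(experiments)
print(front)
```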

reset()

Reset the experiment

This will clear all data.

run_experiments(conditions, computation_time=None, **kwargs)

Run the experiment(s) at the given conditions

Parameters
  • conditions (summit.utils.dataset.DataSet) – A dataset with columns matching the variables in the domain of the experiment(s) to run.

  • computation_time (float, optional) – The time used by the strategy in calculating the next experiments. By default, the time since the last call to run_experiments is used.
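The default behaviour of computation_time can be pictured with a small stand-in class. This is only a sketch of the described fallback (use the elapsed time since the previous call when no value is passed); `ToyExperiment`, its injectable clock and the `computation_time` result key are hypothetical and do not mirror Summit's internals.

```python
import time


class ToyExperiment:
    """Hypothetical stand-in that records how long the caller spent between runs."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_call = clock()

    def run_experiments(self, conditions, computation_time=None):
        now = self._clock()
        if computation_time is None:
            # Fall back to the time elapsed since the previous call
            computation_time = now - self._last_call
        self._last_call = now
        # Attach the timing to every row of results
        return [dict(row, computation_time=computation_time) for row in conditions]


# A fake clock makes the example deterministic: it returns 0.0, 5.0, then 7.5
ticks = iter([0.0, 5.0, 7.5])
exp = ToyExperiment(clock=lambda: next(ticks))
first = exp.run_experiments([{"temperature": 30}])
second = exp.run_experiments([{"temperature": 40}], computation_time=1.25)
print(first, second)
```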

to_dict(**kwargs)[source]

Serialize the class to a dictionary

Subclasses can add an experiment_params dictionary key with custom parameters for the experiment

Cross-Coupling Emulator Benchmarks

summit.benchmarks.get_pretrained_reizman_suzuki_emulator(case=1)[source]

Get the pretrained Reizman Suzuki Emulator

Parameters

case (int, optional, default=1) – Reizman et al. (2016) reported experimental data for 4 different cases. Each case has a different set of substrates but the same possible catalysts. Please see their paper for more information on the cases.

Examples

>>> import matplotlib.pyplot as plt
>>> from summit.benchmarks import get_pretrained_reizman_suzuki_emulator
>>> from summit.utils.dataset import DataSet
>>> import pandas as pd
>>> b = get_pretrained_reizman_suzuki_emulator(case=1)
>>> fig, ax = b.parity_plot(include_test=True)
>>> plt.show()
>>> columns = [v.name for v in b.domain.variables]
>>> values = {"catalyst": ["P1-L3"], "t_res": [600], "temperature": [30], "catalyst_loading": [0.498]}
>>> conditions = pd.DataFrame(values)
>>> conditions = DataSet.from_df(conditions)
>>> results = b.run_experiments(conditions, return_std=True)
class summit.benchmarks.ReizmanSuzukiEmulator(case=1, **kwargs)[source]

Reizman Suzuki Emulator

Virtual experiments representing the Suzuki-Miyaura Cross-Coupling reaction similar to Reizman et al. (2016). Experimental outcomes are based on an emulator that is trained on the experimental data published by Reizman et al.

You should use get_pretrained_reizman_suzuki_emulator to get a pretrained version.

Parameters

case (int, optional, default=1) – Reizman et al. (2016) reported experimental data for 4 different cases. Each case has a different set of substrates but the same possible catalysts. Please see their paper for more information on the cases.

Examples

>>> reizman_emulator = ReizmanSuzukiEmulator(case=1)

Notes

This benchmark is based on data from [Reizman] et al.

References

Reizman

B. J. Reizman et al., React. Chem. Eng., 2016, 1, 658–666. DOI: 10.1039/C6RE00153J.

property data

Dataset of all experiments run

property domain

The domain for the experiment

classmethod from_dict(d, **kwargs)

Create ExperimentalEmulator from a dictionary

Notes

This does not load the regressor weights and biases. After calling from_dict, call load_regressor to load the weights and biases.
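The two-step restore described in the note can be sketched with a toy class. `ToyEmulator`, its JSON weights file and the file naming are hypothetical stand-ins chosen for illustration; only the method names from_dict, to_dict, save_regressor and load_regressor come from this page, and Summit's actual storage format is not shown here.

```python
import json
import pathlib
import tempfile


class ToyEmulator:
    """Hypothetical stand-in, not Summit's ExperimentalEmulator."""

    def __init__(self, model_name, weights=None):
        self.model_name = model_name
        self.weights = weights  # stays None until trained or loaded

    def to_dict(self):
        # Lightweight metadata only; the weights are deliberately left out
        return {"model_name": self.model_name}

    @classmethod
    def from_dict(cls, d):
        return cls(d["model_name"])

    def save_regressor(self, save_dir):
        path = pathlib.Path(save_dir) / f"{self.model_name}_weights.json"
        path.write_text(json.dumps(self.weights))

    def load_regressor(self, save_dir):
        path = pathlib.Path(save_dir) / f"{self.model_name}_weights.json"
        self.weights = json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    trained = ToyEmulator("toy_model", weights=[0.1, -0.4, 2.0])
    trained.save_regressor(tmp)

    # Step 1: rebuild the object from its dictionary (no weights yet)
    restored = ToyEmulator.from_dict(trained.to_dict())
    weights_before = restored.weights

    # Step 2: load the weights from disk
    restored.load_regressor(tmp)

print(weights_before, restored.weights)
```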

classmethod load(save_dir, case=1, **kwargs)[source]

Load all the essential parameters of the ExperimentalEmulator from disk

Parameters

save_dir (str or pathlib.Path) – The directory from which to load emulator files.

Notes

This loads the parameters needed to reproduce results but not the associated data. You can separately load X_test, y_test, X_train, and y_train attributes if you want to be able to reproduce splits, test results and parity plots.

Examples

>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = f"reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> res = exp.train(max_epochs=10)
>>> exp.save("reizman_test")
>>> #Load data for new experimental emulator
>>> exp_new = ExperimentalEmulator.load(model_name, "reizman_test")
>>> exp_new.X_train, exp_new.y_train, exp_new.X_test, exp_new.y_test = exp.X_train, exp.y_train, exp.X_test, exp.y_test
>>> res = exp_new.test()
>>> fig, ax = exp_new.parity_plot(include_test=True)
load_regressor(save_dir)

Load the weights and biases of the regressor from disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

pareto_plot(objectives=None, colorbar=False, ax=None)

Make a 2D pareto plot of the experiments thus far

Parameters
  • objectives (array-like, optional) – List of names of objectives to plot. By default picks the first two objectives

  • ax (matplotlib.pyplot.axes, optional) – An existing axis to apply the plot to

Returns

  • if ax is None, returns a tuple with the new figure as the first component and the axis as the second

  • if ax is a matplotlib axis, returns only the axis

Raises

ValueError – If the number of objectives is not equal to two

parity_plot(**kwargs)

Produce a parity plot for the trained model using matplotlib

Parameters
  • output_variable_names (str or list, optional) – The output variables to plot. Defaults to all.

  • include_test (bool, optional) – Include the performance of the model on the test set. Defaults to False.

  • train_color (str, optional) – Hex string for the train points. Defaults to “#6f3666”

  • test_color (str, optional) – Hex string for the test points. Defaults to “#3c328c”
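A parity plot places measured values on one axis and model predictions on the other, so a perfect model falls on the diagonal. The sketch below computes the coefficient of determination (R²) that commonly accompanies such a plot. The helper `r2_score` is written out by hand here and the data are made up; how Summit annotates its own plots is not specified on this page.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up measured and predicted values for one output variable
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]

# Each (measured, predicted) pair is one point on the parity plot
parity_points = list(zip(measured, predicted))
score = r2_score(measured, predicted)
print(parity_points, score)
```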

reset()

Reset the experiment

This will clear all data.

run_experiments(conditions, computation_time=None, **kwargs)

Run the experiment(s) at the given conditions

Parameters
  • conditions (summit.utils.dataset.DataSet) – A dataset with columns matching the variables in the domain of the experiment(s) to run.

  • computation_time (float, optional) – The time used by the strategy in calculating the next experiments. By default, the time since the last call to run_experiments is used.

save(save_dir)

Save all the essential parameters of the ExperimentalEmulator to disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

Notes

This saves the parameters needed to reproduce results but not the associated data. You can separately save X_test, y_test, X_train, and y_train attributes if you want to be able to reproduce splits, test results and parity plots.

Examples

>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = f"reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> res = exp.train(max_epochs=10)
>>> exp.save("reizman_test/")
>>> #Load data for new experimental emulator
>>> exp_new = ExperimentalEmulator.load(model_name, "reizman_test/")
>>> exp_new.X_train, exp_new.y_train, exp_new.X_test, exp_new.y_test = exp.X_train, exp.y_train, exp.X_test, exp.y_test
>>> res = exp_new.test()
>>> fig, ax = exp_new.parity_plot(include_test=True)
save_regressor(save_dir)

Save the weights and biases of the regressor to disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

test(**kwargs)

Get test results

This requires that train has already been called or the ExperimentalEmulator was initialized from a pretrained model.

Notes

The method loops over the predictors, so the resulting scores are averaged over all objectives for each of the predictors. In contrast, the parity_plot code gives the scores for each objective averaged over the predictors.
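The difference between the two averaging directions is easy to see on a small table of scores. In this sketch the nested list `scores[predictor][objective]` and its values are made up, and the layout is an assumption for illustration rather than Summit's internal data structure.

```python
from statistics import mean

# Hypothetical R2 scores: one row per predictor in the ensemble,
# one column per objective.
scores = [
    [0.90, 0.70],  # predictor 0
    [0.80, 0.60],  # predictor 1
    [0.85, 0.65],  # predictor 2
]

# test(): one score per predictor, averaged over the objectives
per_predictor = [mean(row) for row in scores]

# parity_plot: one score per objective, averaged over the predictors
per_objective = [mean(column) for column in zip(*scores)]

print(per_predictor, per_objective)
```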

Returns

scores_dict – A dictionary of scores with test_SCORE as the key and values as an array of scores for each of the models in the ensemble.

Return type

dict

to_dict()[source]

Serialize the class to a dictionary

train(**kwargs)

Train the model on the dataset

This will automatically do a train-test split and then train via cross-validation on the train set.

Parameters
  • test_size (float, optional) – The size of the test set as a fraction of the total dataset. Defaults to 0.1.

  • cv_folds (int, optional) – The number of cross validation folds. Defaults to 5.

  • max_epochs (int, optional) – The max number of epochs for each CV fold. Defaults to 100.

  • scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for more.

  • search_params (dict, optional) – A dictionary with parameter values to change in a gridsearch.

  • regressor_kwargs (dict, optional) – You can pass extra arguments to the regressor here.

  • callbacks (None, "disable" or list of Callbacks) – Skorch callbacks passed to skorch.net. See: https://skorch.readthedocs.io/en/latest/net.html

  • verbose (int) – 0 for no logging, 1 for logging
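The split-then-cross-validate procedure can be sketched with plain index lists. The helpers below are hypothetical and only mirror the test_size and cv_folds parameters described above; Summit's own splitting (for example how it shuffles or seeds) may differ.

```python
import random


def train_test_split_indices(n_samples, test_size=0.1, seed=0):
    """Shuffle sample indices and hold out a fraction as the test set."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, cv_folds=5):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [indices[i::cv_folds] for i in range(cv_folds)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 50 hypothetical samples: 10 % held out, 5-fold CV on the remainder
train_idx, test_idx = train_test_split_indices(50, test_size=0.1)
splits = list(k_fold(train_idx, cv_folds=5))
print(len(train_idx), len(test_idx), len(splits))
```

The held-out test indices never appear in any cross-validation fold, which is what keeps the later test scores independent of training.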

Notes

If predictor was set in the initialization, it will not be overwritten.

Returns

A dictionary containing the results of the training.

Return type

dict

Examples

>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = f"reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> # Test grid search cross validation and training
>>> params = { "regressor__net__max_epochs": [1, 1000]}
>>> exp.train(cv_folds=5, random_state=100, search_params=params, verbose=0) 
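
A search_params dictionary like the one above maps each parameter name to a list of candidate values, and a grid search tries every combination. The sketch below shows that expansion with itertools.product. The helper `expand_grid` and the second parameter name `regressor__net__lr` are hypothetical additions for illustration; only `regressor__net__max_epochs` appears in the example above.

```python
from itertools import product


def expand_grid(search_params):
    """Return one dict per combination of candidate parameter values."""
    names = list(search_params)
    value_lists = [search_params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


params = {
    "regressor__net__max_epochs": [1, 1000],
    "regressor__net__lr": [0.001, 0.01],  # hypothetical second parameter
}
grid = expand_grid(params)
for candidate in grid:
    print(candidate)
```

With two values for each of two parameters, the grid contains four candidates, each of which would be evaluated by cross-validation.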
summit.benchmarks.get_pretrained_baumgartner_cc_emulator(include_cost=False, use_descriptors=False)[source]

Get a pretrained BaumgartnerCrossCouplingEmulator

Parameters
  • include_cost (bool, optional) – Include minimization of cost as an extra objective. Cost is calculated as a deterministic function of the inputs (i.e., no model is trained). Defaults to False.

  • use_descriptors (bool, optional) – Use descriptors for the catalyst and base instead of one-hot encoding (defaults to False). The descriptors have been pre-calculated using COSMO-RS. To only use descriptors with a single feature, pass descriptors_features a list where the only item is the name of the desired categorical variable.
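The two encodings that use_descriptors switches between can be compared on a single categorical value. In this sketch the category names other than tBuXPhos and all descriptor numbers are placeholders invented for illustration; they are not the COSMO-RS values shipped with Summit.

```python
def one_hot(value, categories):
    """Encode a category as a vector with a single 1.0 at its position."""
    return [1.0 if value == category else 0.0 for category in categories]


def descriptor_encode(value, descriptor_table):
    """Encode a category by looking up its pre-calculated descriptors."""
    return list(descriptor_table[value])


# Placeholder categories and made-up two-component descriptors
catalysts = ["tBuXPhos", "catalyst_B", "catalyst_C"]
catalyst_descriptors = {
    "tBuXPhos": (0.10, 0.20),
    "catalyst_B": (0.30, 0.15),
    "catalyst_C": (0.25, 0.40),
}

encoded_one_hot = one_hot("tBuXPhos", catalysts)
encoded_descriptors = descriptor_encode("tBuXPhos", catalyst_descriptors)
print(encoded_one_hot, encoded_descriptors)
```

One-hot vectors grow with the number of categories and treat all categories as equally dissimilar, whereas descriptors have a fixed length and place chemically similar categories close together.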

Examples

>>> import matplotlib.pyplot as plt
>>> from summit.benchmarks import get_pretrained_baumgartner_cc_emulator
>>> from summit.utils.dataset import DataSet
>>> import pandas as pd
>>> b = get_pretrained_baumgartner_cc_emulator(include_cost=True, use_descriptors=False)
>>> fig, ax = b.parity_plot(include_test=True)
>>> plt.show()
>>> columns = [v.name for v in b.domain.variables]
>>> values = {"catalyst": ["tBuXPhos"], "base": ["DBU"], "t_res": [328.717801570892], "temperature": [30], "base_equivalents": [2.18301549894049]}
>>> conditions = pd.DataFrame(values)
>>> conditions = DataSet.from_df(conditions)
>>> results = b.run_experiments(conditions, return_std=True)
class summit.benchmarks.BaumgartnerCrossCouplingEmulator(include_cost=False, use_descriptors=False, **kwargs)[source]

Baumgartner Cross Coupling Emulator

Virtual experiments representing the Aniline Cross-Coupling reaction similar to Baumgartner et al. (2019). Experimental outcomes are based on an emulator that is trained on the experimental data published by Baumgartner et al.

This is a five-dimensional optimisation of temperature, residence time, base equivalents, catalyst and base.

The categorical variables (catalyst and base) contain descriptors calculated using COSMO-RS. Specifically, the descriptors are the first two sigma moments.

To use the pretrained version, call get_pretrained_baumgartner_cc_emulator.

Parameters
  • include_cost (bool, optional) – Include minimization of cost as an extra objective. Cost is calculated as a deterministic function of the inputs (i.e., no model is trained). Defaults to False.

  • use_descriptors (bool, optional) – Use descriptors for the catalyst and base instead of one-hot encoding (defaults to False). The descriptors have been pre-calculated using COSMO-RS. To only use descriptors with a single feature, pass descriptors_features a list where the only item is the name of the desired categorical variable.

Examples

>>> bemul = BaumgartnerCrossCouplingEmulator()

Notes

This benchmark is based on data from [Baumgartner] et al.

References

Baumgartner

L. M. Baumgartner et al., Org. Process Res. Dev., 2019, 23, 1594–1601. DOI: 10.1021/acs.oprd.9b00236

property data

Dataset of all experiments run

property domain

The domain for the experiment

classmethod from_dict(d, **kwargs)

Create ExperimentalEmulator from a dictionary

Notes

This does not load the regressor weights and biases. After calling from_dict, call load_regressor to load the weights and biases.

classmethod load(save_dir, include_cost=False, use_descriptors=False, **kwargs)[source]

Load all the essential parameters of the BaumgartnerCrossCouplingEmulator from disk

Parameters
  • save_dir (str or pathlib.Path) – The directory from which to load emulator files.

  • include_cost (bool, optional) – Include minimization of cost as an extra objective. Cost is calculated as a deterministic function of the inputs (i.e., no model is trained). Defaults to False.

  • use_descriptors (bool, optional) – Use descriptors for the catalyst and base instead of one-hot encoding (defaults to False). The descriptors have been pre-calculated using COSMO-RS. To only use descriptors with a single feature, pass descriptors_features a list where the only item is the name of the desired categorical variable.

load_regressor(save_dir)

Load the weights and biases of the regressor from disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

pareto_plot(objectives=None, colorbar=False, ax=None)

Make a 2D pareto plot of the experiments thus far

Parameters
  • objectives (array-like, optional) – List of names of objectives to plot. By default picks the first two objectives

  • ax (matplotlib.pyplot.axes, optional) – An existing axis to apply the plot to

Returns

  • if ax is None, returns a tuple with the new figure as the first component and the axis as the second

  • if ax is a matplotlib axis, returns only the axis

Raises

ValueError – If the number of objectives is not equal to two

parity_plot(**kwargs)

Produce a parity plot for the trained model using matplotlib

Parameters
  • output_variable_names (str or list, optional) – The output variables to plot. Defaults to all.

  • include_test (bool, optional) – Include the performance of the model on the test set. Defaults to False.

  • train_color (str, optional) – Hex string for the train points. Defaults to “#6f3666”

  • test_color (str, optional) – Hex string for the test points. Defaults to “#3c328c”

reset()

Reset the experiment

This will clear all data.

run_experiments(conditions, computation_time=None, **kwargs)

Run the experiment(s) at the given conditions

Parameters
  • conditions (summit.utils.dataset.DataSet) – A dataset with columns matching the variables in the domain of the experiment(s) to run.

  • computation_time (float, optional) – The time used by the strategy in calculating the next experiments. By default, the time since the last call to run_experiments is used.

save(save_dir)

Save all the essential parameters of the ExperimentalEmulator to disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

Notes

This saves the parameters needed to reproduce results but not the associated data. You can separately save X_test, y_test, X_train, and y_train attributes if you want to be able to reproduce splits, test results and parity plots.

Examples

>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = f"reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> res = exp.train(max_epochs=10)
>>> exp.save("reizman_test/")
>>> #Load data for new experimental emulator
>>> exp_new = ExperimentalEmulator.load(model_name, "reizman_test/")
>>> exp_new.X_train, exp_new.y_train, exp_new.X_test, exp_new.y_test = exp.X_train, exp.y_train, exp.X_test, exp.y_test
>>> res = exp_new.test()
>>> fig, ax = exp_new.parity_plot(include_test=True)
save_regressor(save_dir)

Save the weights and biases of the regressor to disk

Parameters

save_dir (str or pathlib.Path) – The directory used for saving emulator files.

test(**kwargs)

Get test results

This requires that train has already been called or the ExperimentalEmulator was initialized from a pretrained model.

Notes

The method loops over the predictors, so the resulting scores are averaged over all objectives for each of the predictors. In contrast, the parity_plot code gives the scores for each objective averaged over the predictors.

Returns

scores_dict – A dictionary of scores with test_SCORE as the key and values as an array of scores for each of the models in the ensemble.

Return type

dict

to_dict(**experiment_params)

Convert emulator parameters to dictionary

Notes

This does not save the weights and biases of the regressor. Use the save_regressor method for that.

train(**kwargs)

Train the model on the dataset

This will automatically do a train-test split and then train via cross-validation on the train set.

Parameters
  • test_size (float, optional) – The size of the test set as a fraction of the total dataset. Defaults to 0.1.

  • cv_folds (int, optional) – The number of cross validation folds. Defaults to 5.

  • max_epochs (int, optional) – The max number of epochs for each CV fold. Defaults to 100.

  • scoring (str or list, optional) – A list of scoring functions or names of them. Defaults to R2 and MSE. See https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter for more.

  • search_params (dict, optional) – A dictionary with parameter values to change in a gridsearch.

  • regressor_kwargs (dict, optional) – You can pass extra arguments to the regressor here.

  • callbacks (None, "disable" or list of Callbacks) – Skorch callbacks passed to skorch.net. See: https://skorch.readthedocs.io/en/latest/net.html

  • verbose (int) – 0 for no logging, 1 for logging

Notes

If predictor was set in the initialization, it will not be overwritten.

Returns

A dictionary containing the results of the training.

Return type

dict

Examples

>>> from summit import *
>>> import pkg_resources, pathlib
>>> DATA_PATH = pathlib.Path(pkg_resources.resource_filename("summit", "benchmarks/data"))
>>> model_name = f"reizman_suzuki_case_1"
>>> domain = ReizmanSuzukiEmulator.setup_domain()
>>> ds = DataSet.read_csv(DATA_PATH / f"{model_name}.csv")
>>> exp = ExperimentalEmulator(model_name, domain, dataset=ds, regressor=ANNRegressor)
>>> # Test grid search cross validation and training
>>> params = { "regressor__net__max_epochs": [1, 1000]}
>>> exp.train(cv_folds=5, random_state=100, search_params=params, verbose=0)