Creating New Benchmarks

Here we demonstrate how to train a new benchmark from experimental data. We call this type of benchmark an ExperimentalEmulator. As an example, we will create a benchmark for the Suzuki-Miyaura cross-coupling reaction in Reizman et al. (2016).

Google Colab

If you would like to follow along with this tutorial, you can open it in Google Colab using the button below.

You will need to run the following cell to make sure Summit and all its dependencies are installed. If prompted, restart the runtime.

[ ]:

!pip install summit

Create the domain

Let’s first import the needed parts of Summit.

[1]:

from summit.benchmarks import ExperimentalEmulator
from summit.domain import CategoricalVariable, ContinuousVariable, Domain
from summit.utils.dataset import DataSet

We first need to create a domain. A domain contains all the decision variables, constraints and objectives for a benchmark.

[2]:

domain = Domain()

Above, we instantiate a new domain without any variables. Here, we are going to manipulate the catalyst, residence time, temperature and catalyst loading. Our objectives are to maximise both yield and turnover number (TON). We can use the += operator to add variables to the domain.

[3]:

# Decision variables
des_1 = "Catalyst type - different ligands"
domain += CategoricalVariable(
    name="catalyst",
    description=des_1,
    levels=[
        "P1-L1",
        "P2-L1",
        "P1-L2",
        "P1-L3",
        "P1-L4",
        "P1-L5",
        "P1-L6",
        "P1-L7",
    ],
)

des_2 = "Residence time in seconds (s)"
domain += ContinuousVariable(name="t_res", description=des_2, bounds=[60, 600])

des_3 = "Reactor temperature in degrees Celsius (ºC)"
domain += ContinuousVariable(
    name="temperature", description=des_3, bounds=[30, 110]
)

des_4 = "Catalyst loading in mol%"
domain += ContinuousVariable(
    name="catalyst_loading", description=des_4, bounds=[0.5, 2.5]
)

# Objectives
des_5 = (
    "Turnover number - moles product generated divided by moles catalyst used"
)
domain += ContinuousVariable(
    name="ton",
    description=des_5,
    bounds=[0, 200],  # TODO: not sure about bounds, maybe redefine
    is_objective=True,
    maximize=True,
)

des_6 = "Yield"
domain += ContinuousVariable(
    name="yield",
    description=des_6,
    bounds=[0, 100],
    is_objective=True,
    maximize=True,
)

domain

[3]:

Name	Type	Description	Values
catalyst	categorical, input	Catalyst type - different ligands	8 levels
t_res	continuous, input	Residence time in seconds (s)	[60,600]
temperature	continuous, input	Reactor temperature in degrees Celsius (ºC)	[30,110]
catalyst_loading	continuous, input	Catalyst loading in mol%	[0.5,2.5]
ton	continuous, maximize objective	Turnover number - moles product generated divided by moles catalyst used	[0,200]
yield	continuous, maximize objective	Yield	[0,100]
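The bounds and levels in the table above determine which conditions make sense to send to the benchmark later on. As a minimal sketch, the check below mirrors the four input variables in plain Python and reports any value that falls outside them. The names `CATALYSTS`, `CONTINUOUS_BOUNDS` and `check_conditions` are hypothetical helpers written for this illustration; they are not part of Summit, and Summit may perform its own validation.

```python
# Plain-Python mirror of the input variables defined above (not Summit objects).
CATALYSTS = ["P1-L1", "P2-L1", "P1-L2", "P1-L3", "P1-L4", "P1-L5", "P1-L6", "P1-L7"]
CONTINUOUS_BOUNDS = {
    "t_res": (60, 600),
    "temperature": (30, 110),
    "catalyst_loading": (0.5, 2.5),
}


def check_conditions(conditions):
    """Return a list of problems found in one dict of reaction conditions."""
    problems = []
    if conditions.get("catalyst") not in CATALYSTS:
        problems.append(f"catalyst: unknown level {conditions.get('catalyst')!r}")
    for name, (lower, upper) in CONTINUOUS_BOUNDS.items():
        value = conditions.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not lower <= value <= upper:
            problems.append(f"{name}: {value} is outside [{lower}, {upper}]")
    return problems


ok = {"catalyst": "P1-L1", "t_res": 60, "temperature": 100, "catalyst_loading": 1.0}
bad = {"catalyst": "P3-L1", "t_res": 30, "temperature": 100, "catalyst_loading": 1.0}
print(check_conditions(ok))   # no problems expected
print(check_conditions(bad))  # unknown catalyst level and t_res below its lower bound
```

Keeping the check separate from the domain object is only a convenience for this sketch; in practice the bounds would be read from the domain itself.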

Create the Experimental Emulator

Now we just need two lines of code to train the experimental emulator! We first instantiate ExperimentalEmulator, passing in the domain and a name for the model. Next, we train it on the dataset with two-fold cross-validation and a test set size of 25%. Make sure to replace the csv_dataset keyword argument with the path to your CSV file. When you run this code, you will see the outputs from the training loop.

If you are running this yourself, uncomment the second FOLDER line in the next cell so that the CSV file is read from the current directory.

[4]:

import pathlib
FOLDER = pathlib.Path("../_static/")  # When using this in the context of docs
# FOLDER = pathlib.Path(".")

[ ]:

emul = ExperimentalEmulator(domain=domain, model_name='my_reizman')
emul.train(csv_dataset=FOLDER / "reizman_suzuki_case1_train_test.csv", cv_fold=2, test_size=0.25)
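To make the meaning of cv_fold=2 and test_size=0.25 in the call above concrete, here is a rough sketch of how a dataset can be divided into a held-out test set and two cross-validation folds. It is an illustration of the general idea only, not Summit's internal splitting code; the function `holdout_and_folds` and the row count of 80 are hypothetical.

```python
import random


def holdout_and_folds(n_rows, test_size=0.25, cv_fold=2, seed=0):
    """Split row indices into a test set and cv_fold validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    # Hold out a fraction of the rows; they are never used for fitting.
    n_test = round(n_rows * test_size)
    test = indices[:n_test]
    remaining = indices[n_test:]

    # Deal the remaining rows round-robin into cv_fold validation folds.
    # Each fold takes a turn as validation data while the others are used for fitting.
    folds = [remaining[i::cv_fold] for i in range(cv_fold)]
    return test, folds


# Hypothetical dataset with 80 rows.
test, folds = holdout_and_folds(n_rows=80)
print(len(test), [len(fold) for fold in folds])  # prints: 20 [30, 30]
```

With these settings, a quarter of the rows is reserved for testing and each model in the cross-validation loop is fitted on half of what remains.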

Now that the internal model is trained, we can use the experimental emulator. We print out the domain again as a reminder of the variables.

[7]:

domain

[7]:

Name	Type	Description	Values
catalyst	categorical, input	Catalyst type - different ligands	8 levels
t_res	continuous, input	Residence time in seconds (s)	[60,600]
temperature	continuous, input	Reactor temperature in degrees Celsius (ºC)	[30,110]
catalyst_loading	continuous, input	Catalyst loading in mol%	[0.5,2.5]
ton	continuous, maximize objective	Turnover number - moles product generated divided by moles catalyst used	[0,200]
yield	continuous, maximize objective	Yield	[0,100]

[8]:

conditions = [["P1-L1", 60, 100, 1.0]]
conditions = DataSet(conditions, columns=[v.name for v in domain.input_variables])
emul.run_experiments(conditions)

[8]:

	catalyst	t_res	temperature	catalyst_loading	ton	yield	computation_t	experiment_t	strategy
0	P1-L1	60	100	1.0	29.972519	43.924999	0.0	0.063283	NaN

Now we have a benchmark that can accept conditions and predict the yield and TON!

Experimental Emulator API

class summit.benchmarks.experimental_emulator.ExperimentalEmulator(domain, dataset=None, csv_dataset=None, model_name='dataset_name_emulator_bnn', regressor_type='BNN', cat_to_descr=False, **kwargs)

Experimental Emulator

Parameters

domain (summit.domain.Domain) – The domain of the experiment
dataset (summit.utils.dataset.DataSet, optional) – A DataSet with data for training where the data columns correspond to the domain and the data rows correspond to the training points. By default: None
csv_dataset (string, optional) – Path to a CSV file with data for training where the columns correspond to the domain and the rows correspond to the training points. Note that the first row should exactly match the variable names of the domain and the second row should only have “DATA” as its entries. By default: None
model_name (string, optional) – Name of the model that is used for saving model parameters. Should be unique. By default: “dataset_name_emulator_bnn”
regressor_type (string, optional) – Type of the regressor that is used within the emulator (available: “BNN”). By default: “BNN”
cat_to_descr (Boolean, optional) – If True, transform categorical variable to one or more continuous variable(s) corresponding to the descriptors of the categorical variable (else do nothing). By default: False
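The CSV layout described for csv_dataset can be sketched with the standard library as follows. This follows the wording of the parameter description literally (variable names in the first row, “DATA” in every column of the second row); a file written by Summit itself may carry additional index columns, so it is worth comparing against a known-good file such as reizman_suzuki_case1_train_test.csv. The helper `write_training_csv` is hypothetical and not part of Summit; the two data rows are taken from the example below.

```python
import csv
import io


def write_training_csv(handle, variable_names, rows):
    """Write rows in the layout described for csv_dataset."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(variable_names)                  # row 1: domain variable names
    writer.writerow(["DATA"] * len(variable_names))  # row 2: "DATA" in every column
    writer.writerows(rows)                           # remaining rows: training points


names = ["catalyst", "t_res", "temperature", "catalyst_loading", "ton", "yield"]
rows = [
    ["P1-L2", 60, 110, 0.508, 33, 20],
    ["P1-L7", 120, 30, 0.6, 34, 40],
]

# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_training_csv(buffer, names, rows)
print(buffer.getvalue())
```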

Examples

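>>> import numpy as np
>>> from summit.benchmarks import ExperimentalEmulator, ReizmanSuzukiEmulator
>>> from summit.utils.dataset import DataSet
>>> test_domain = ReizmanSuzukiEmulator().domain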
>>> e = ExperimentalEmulator(domain=test_domain, model_name="Pytest")
No trained model for Pytest. Train this model with ExperimentalEmulator.train() in order to use this Emulator as an virtual Experiment.
>>> columns = [v.name for v in e.domain.variables]
>>> train_values = {("catalyst", "DATA"): ["P1-L2", "P1-L7", "P1-L3", "P1-L3"], ("t_res", "DATA"): [60, 120, 110, 250], ("temperature", "DATA"): [110, 30, 70, 80], ("catalyst_loading", "DATA"): [0.508, 0.6, 1.4, 1.3], ("yield", "DATA"): [20, 40, 60, 34], ("ton", "DATA"): [33, 34, 21, 22]}
>>> train_dataset = DataSet(train_values, columns=columns)
>>> e.train(train_dataset, verbose=False, cv_fold=2, test_size=0.25)
>>> columns = [v.name for v in e.domain.variables]
>>> values = [float(v.bounds[0] + 0.6 * (v.bounds[1] - v.bounds[0])) if v.variable_type == 'continuous' else v.levels[-1] for v in e.domain.variables]
>>> values = np.array(values)
>>> values = np.atleast_2d(values)
>>> conditions = DataSet(values, columns=columns)
>>> results = e.run_experiments(conditions)