Creating New Benchmarks

Here we demonstrate how to train a new benchmark based on experimental data. We call this type of benchmark an ExperimentalEmulator. As an example, we are going to create a benchmark for the Suzuki-Miyaura cross-coupling reaction in Reizman et al. (2016).

Google Colab

If you would like to follow along with this tutorial, you can open it in Google Colab.

You will need to run the following cell to make sure Summit and all its dependencies are installed. If prompted, restart the runtime.

[ ]:
!pip install summit

Create the domain

Let’s first import the needed parts of Summit.

[1]:
from summit.benchmarks import ExperimentalEmulator
from summit.domain import *
from summit.utils.dataset import DataSet
import pkg_resources
import pathlib
import pprint

We first need to create a domain. A domain contains all the decision variables, constraints and objectives for a benchmark.

[2]:
domain = Domain()

Above, we instantiate a new domain without any variables. Here, we are going to manipulate the catalyst, residence time, temperature and catalyst loading. Our objectives are to maximise both yield and turnover number (TON). We can use the in-place addition operator += to add variables to the domain.

[3]:
# Decision variables
des_1 = "Catalyst type - different ligands"
domain += CategoricalVariable(
    name="catalyst",
    description=des_1,
    levels=[
        "P1-L1",
        "P2-L1",
        "P1-L2",
        "P1-L3",
        "P1-L4",
        "P1-L5",
        "P1-L6",
        "P1-L7",
    ],
)

des_2 = "Residence time in seconds (s)"
domain += ContinuousVariable(name="t_res", description=des_2, bounds=[60, 600])

des_3 = "Reactor temperature in degrees Celsius (ºC)"
domain += ContinuousVariable(
    name="temperature", description=des_3, bounds=[30, 110]
)

des_4 = "Catalyst loading in mol%"
domain += ContinuousVariable(
    name="catalyst_loading", description=des_4, bounds=[0.5, 2.5]
)

# Objectives
des_5 = (
    "Turnover number - moles product generated divided by moles catalyst used"
)
domain += ContinuousVariable(
    name="ton",
    description=des_5,
    bounds=[0, 200],  # TODO: not sure about bounds, maybe redefine
    is_objective=True,
    maximize=True,
)

des_6 = "Yield"
domain += ContinuousVariable(
    name="yield",
    description=des_6,
    bounds=[0, 100],
    is_objective=True,
    maximize=True,
)

domain
[3]:
Name              Type                          Description                                                               Values
catalyst          categorical, input            Catalyst type - different ligands                                         8 levels
t_res             continuous, input             Residence time in seconds (s)                                             [60,600]
temperature       continuous, input             Reactor temperature in degrees Celsius (ºC)                               [30,110]
catalyst_loading  continuous, input             Catalyst loading in mol%                                                  [0.5,2.5]
ton               continuous, maximize objective  Turnover number - moles product generated divided by moles catalyst used  [0,200]
yield             continuous, maximize objective  Yield                                                                     [0,100]
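The TODO comment on the TON bounds can be sanity-checked with a little arithmetic. The sketch below is not part of Summit; `turnover_number` is a hypothetical helper. It assumes that yield and catalyst loading are both expressed as percentages relative to the same limiting reagent, so that TON reduces to yield divided by loading. Under that assumption, the upper bound of 200 corresponds to 100% yield at the minimum loading of 0.5 mol%.

```python
def turnover_number(yield_percent: float, loading_mol_percent: float) -> float:
    """Moles of product per mole of catalyst.

    Assumption (not from Summit): yield and loading are both percentages
    relative to the same limiting reagent, so the substrate amount cancels.
    """
    if loading_mol_percent <= 0:
        raise ValueError("catalyst loading must be positive")
    return yield_percent / loading_mol_percent


# Best case allowed by the domain: 100% yield at the lowest loading (0.5 mol%)
max_ton = turnover_number(100.0, 0.5)
print(max_ton)  # 200.0
```

If the experimental data define loading relative to a different reagent, the appropriate bounds would differ.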

Create the Experimental Emulator

Now we just need two lines of code to train the experimental emulator! We first instantiate ExperimentalEmulator, passing in the domain, a name for the model and the training dataset. Next we train it with two-fold cross-validation and a test set size of 25%.

Here, we load the data that already ships with the Summit package, but you could use your own: replace the path passed to DataSet.read_csv with the path to your CSV file. Change verbose to 1 if you want streaming updates from the training loop.

[4]:
DATA_PATH = pathlib.Path(
    pkg_resources.resource_filename("summit", "benchmarks/data")
)
ds = DataSet.read_csv(DATA_PATH / "reizman_suzuki_case_1.csv")
emul = ExperimentalEmulator(model_name="my_reizman", domain=domain, dataset=ds)
res = emul.train(max_epochs=100, cv_fold=2, test_size=0.25, verbose=0)
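To picture what a 25% test set combined with two-fold cross-validation means for the data, here is a minimal standard-library sketch. It is an illustration only: `split_indices` is a hypothetical helper, the row count of 100 is made up, and Summit's own splitting may shuffle or partition the rows differently.

```python
import random


def split_indices(n_rows, test_size=0.25, cv_fold=2, seed=0):
    """Hold out a test set, then divide the remaining rows into CV folds."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)

    # Rows in the test set are never seen during training or validation
    n_test = round(n_rows * test_size)
    test = indices[:n_test]
    train_val = indices[n_test:]

    # Each fold validates on a different slice and trains on the rest
    folds = []
    for k in range(cv_fold):
        val = train_val[k::cv_fold]
        val_set = set(val)
        train = [i for i in train_val if i not in val_set]
        folds.append((train, val))
    return test, folds


test_rows, folds = split_indices(n_rows=100)  # hypothetical dataset size
print(len(test_rows))
for train, val in folds:
    print(len(train), len(val))
```

Every row outside the test set is used for validation in exactly one fold, so each fold's model is checked on data it was not trained on.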

Now that the internal model is trained, we can use the experimental emulator. We print out the domain again as a reminder of the variables.

[5]:
domain
[5]:
Name              Type                          Description                                                               Values
catalyst          categorical, input            Catalyst type - different ligands                                         8 levels
t_res             continuous, input             Residence time in seconds (s)                                             [60,600]
temperature       continuous, input             Reactor temperature in degrees Celsius (ºC)                               [30,110]
catalyst_loading  continuous, input             Catalyst loading in mol%                                                  [0.5,2.5]
ton               continuous, maximize objective  Turnover number - moles product generated divided by moles catalyst used  [0,200]
yield             continuous, maximize objective  Yield                                                                     [0,100]
[6]:
conditions = [["P1-L1", 60, 100, 1.0]]
conditions = DataSet(conditions, columns=[v.name for v in domain.input_variables])
emul.run_experiments(conditions)
[6]:
   catalyst  t_res  temperature  catalyst_loading        ton     yield  computation_t  experiment_t  strategy
0  P1-L1        60          100               1.0  23.364954  33.13002            0.0      0.058378       NaN

Now we have a benchmark that can accept conditions and predict the yield and TON!
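To query more than one set of conditions, each row simply needs its values in the same order as the domain's input variables (catalyst, t_res, temperature, catalyst_loading). As a rough sketch, the snippet below builds a small full-factorial grid with the standard library; the particular levels chosen are arbitrary illustrations taken from within the domain bounds.

```python
from itertools import product

# Illustrative levels only, all within the bounds defined in the domain
catalysts = ["P1-L1", "P2-L1"]
residence_times = [60, 600]   # seconds
temperatures = [30, 110]      # degrees Celsius
loadings = [0.5, 2.5]         # mol%

# Column order must match the order of the domain's input variables
columns = ["catalyst", "t_res", "temperature", "catalyst_loading"]

# One row per combination: 2 * 2 * 2 * 2 = 16 experiments
conditions = [
    list(row)
    for row in product(catalysts, residence_times, temperatures, loadings)
]

print(len(conditions))
print(conditions[0])
```

The resulting list of rows can then be wrapped in a DataSet and passed to emul.run_experiments in the same way as the single-row example in cell [6].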

Experimental Emulator API

class summit.benchmarks.experimental_emulator.ExperimentalEmulator(model_name, domain, **kwargs)

Experimental Emulator

Train a machine learning model based on experimental data. The model acts as a benchmark for testing optimisation strategies.

Parameters
  • model_name (str) – Name of the model, ideally with no spaces

  • domain (Domain) – The domain of the emulator

  • dataset (DataSet, optional) – Dataset used for training/validation

  • regressor (torch.nn.Module, optional) – PyTorch LightningModule class. Defaults to the ANNRegressor

  • output_variable_names (str or list, optional) – The names of the variables that should be trained by the predictor. Defaults to all objectives in the domain.

  • clip (bool or list) – Whether to clip predictions to the limits of the objectives in the domain. True (default) means clipping is activated for all outputs and False means it is not activated at all. A list of specific outputs to clip can also be passed.
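The three forms of the clip argument described above (True, False, or a list of output names) can be illustrated with a small stand-alone sketch. This is not Summit's implementation: `clip_predictions` is a hypothetical helper, and the raw prediction values are made up for the example.

```python
def clip_predictions(predictions, bounds, clip=True):
    """Clip predicted outputs to their bounds.

    predictions: dict mapping output name -> list of predicted values
    bounds:      dict mapping output name -> (lower, upper)
    clip:        True (clip all), False (clip none) or a list of names
    """
    if clip is True:
        to_clip = set(predictions)
    elif clip is False:
        to_clip = set()
    else:
        to_clip = set(clip)

    clipped = {}
    for name, values in predictions.items():
        if name in to_clip:
            lower, upper = bounds[name]
            clipped[name] = [min(max(v, lower), upper) for v in values]
        else:
            clipped[name] = list(values)
    return clipped


# Objective bounds from the tutorial domain; raw values are invented
bounds = {"ton": (0, 200), "yield": (0, 100)}
raw = {"ton": [-5.0, 250.0], "yield": [104.2, 33.1]}

print(clip_predictions(raw, bounds))                  # clip every output
print(clip_predictions(raw, bounds, clip=["yield"]))  # clip only yield
```

Clipping keeps a regression model from reporting physically impossible values, such as a yield above 100%.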