Strategies

Summit has several machine learning strategies available for optimisation, as well as some more naive ones.

All strategies have a similar API. They are instantiated by passing in a Domain. New reaction conditions are requested using the suggest_experiments method, which optionally takes results from previous reactions.
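
As a minimal sketch of this pattern (using the Random strategy documented below), a strategy is built from a Domain and then queried for conditions; where a strategy supports it, results from running those conditions are passed back via prev_res on the next call:

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import Random
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> strategy = Random(domain)
>>> next_experiments = strategy.suggest_experiments(5)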

Bayesian Optimisation

Bayesian optimisation (BO) is an efficient way to optimise a wide variety of functions, including chemical reactions. In BO, you begin by specifying prior beliefs about your function; in many cases, we start by assuming we know very little. Then, we create a probabilistic model, called a posterior, that incorporates this prior belief and some data (i.e., reactions at different conditions). In reaction optimisation, this model predicts the value of an objective (e.g., yield) at particular reaction conditions. One key point is that these models are probabilistic: they do not give precise predictions but instead a distribution that can be sampled.

With the updated model, we use one of two classes of techniques to select our next experiments. Some BO strategies optimise an acquisition function, which takes in the model parameters and a suggested next experiment and predicts the quality of that experiment. Alternatively, a deterministic function can be sampled from the model, which is then optimised.
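
To make this concrete, here is a minimal sketch of one common acquisition function, expected improvement, for a single candidate experiment. This is illustrative only and not tied to a particular Summit strategy: mu and sigma stand for the posterior mean and standard deviation of the surrogate model at the candidate conditions, and f_best for the best objective value observed so far.

>>> from scipy.stats import norm
>>> def expected_improvement(mu, sigma, f_best):
...     # EI for maximisation under a Gaussian posterior N(mu, sigma**2)
...     z = (mu - f_best) / sigma
...     return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)
>>> ei = expected_improvement(mu=55.0, sigma=5.0, f_best=52.0)  # larger EI marks a more promising experiment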

_images/acquistion_function.png

Illustration of how acquisition functions enable BO strategies to reduce uncertainty and maximise the objective simultaneously. The dotted line is the actual objective and the solid line is the posterior of the surrogate model. The acquisition function is high where the objective is predicted to be optimal (exploitation) and where there is high uncertainty (exploration). Adapted from Shahriari et al.

To learn more about BO, we suggest reading the review by Shahriari et al.

The BO strategies available in Summit are:

TSEMO

class summit.strategies.tsemo.TSEMO(domain, transform=None, **kwargs)[source]

Thompson-Sampling for Efficient Multiobjective Optimization (TSEMO)

TSEMO is a multiobjective Bayesian optimisation strategy. It is designed to find optimal values in as few iterations as possible. This comes at the price of higher computational time.

Parameters
  • domain (Domain) – The domain of the optimization

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • use_descriptors (bool, optional) – Whether to use descriptors of categorical variables. Defaults to False.

  • kernel (Kern, optional) – A GPy kernel class (not instantiated). Must be Exponential, Matern32, Matern52 or RBF. Default is Exponential.

  • n_spectral_points (int, optional) – Number of spectral points used in spectral sampling. Default is 1500. Note that the Matlab TSEMO version uses 4000 which will improve accuracy but significantly slow down optimisation speed.

  • n_retries (int, optional) – Number of retries to use for spectral sampling if the singular value decomposition fails. Retrying chooses a new Monte Carlo sampling, which usually fixes the problem. Default is 10.

  • generations (int, optional) – Number of generations used in the internal optimisation with NSGA-II. Default is 100.

  • pop_size (int, optional) – Population size used in the internal optimisation with NSGA-II. Default is 100.

Examples

>>> from summit.domain import *
>>> from summit.strategies import TSEMO
>>> from summit.utils.dataset import DataSet
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='yield_', description='yield of reaction', bounds=[0, 100], is_objective=True, maximize=True)
>>> domain += ContinuousVariable(name='de', description='diastereomeric excess', bounds=[0, 100], is_objective=True, maximize=True)
>>> columns = [v.name for v in domain.variables]
>>> values = {("temperature", "DATA"): 60,("flowrate_a", "DATA"): 0.5,("flowrate_b", "DATA"): 0.5,("yield_", "DATA"): 50,("de", "DATA"): 90}
>>> previous_results = DataSet([values], columns=columns)
>>> strategy = TSEMO(domain)
>>> result = strategy.suggest_experiments(5, prev_res=previous_results)

Notes

TSEMO trains a Gaussian process (GP) to model each objective. Internally, we use GPy for GPs, and we accept any kernel in the Matérn family, including the exponential and squared exponential kernels. See [Rasmussen] for more information about GPs.

A deterministic function is sampled from each of the trained GPs. We use spectral sampling available in pyrff. These sampled functions are optimised using NSGA-II (via pymoo) to find a selection of potential conditions. Each of these conditions is evaluated using the hypervolume improvement (HVI) criterion, and the one(s) that offer the best HVI are suggested as the next experiments. More details about TSEMO can be found in the original paper [Bradford].

The number of spectral points is the parameter that most affects TSEMO performance. By default, it’s set at 1500, but increase it to around 4000 to get the best performance at the cost of longer computational times. You can change it using the n_spectral_points keyword argument.
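
For example, reusing the domain from the example above:

>>> strategy = TSEMO(domain, n_spectral_points=4000)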

The other two parameters are the number of generations and population size used in NSGA-II. Increasing their values can improve performance in some cases.

References

Rasmussen

C. E. Rasmussen et al. Gaussian Processes for Machine Learning, MIT Press, 2006.

Bradford

E. Bradford et al. “Efficient multiobjective optimization employing Gaussian processes, spectral sampling and a genetic algorithm.” J. Glob. Optim., 2018, 71, 407–438.

classmethod from_dict(d)[source]

Create a strategy from a dictionary

reset()[source]

Reset TSEMO state

suggest_experiments(num_experiments, prev_res: Optional[summit.utils.dataset.DataSet] = None, **kwargs)[source]

Suggest experiments using TSEMO

Parameters
  • num_experiments (int) – The number of experiments (i.e., samples) to generate

  • prev_res (DataSet, optional) – Dataset with data from previous experiments. If no data is passed, then latin hypercube sampling will be used to suggest an initial design.

Returns

next_experiments – A Dataset object with the suggested experiments. The lengthscales column indicates the significance of each variable, assuming automatic relevance determination is turned on (which it is by default): smaller values mean that significant changes in the output happen over a smaller change in that input, suggesting a more important input. The variance column scales the output of the posterior of the kernel to the correct scale for your objective, and the noise column is the constant noise in the outputs (e.g., assumed uniform experimental error).

Return type

DataSet

to_dict()[source]

Convert strategy to a dictionary

SOBO

class summit.strategies.sobo.SOBO(domain, transform=None, gp_model_type=None, acquisition_type=None, optimizer_type=None, evaluator_type=None, **kwargs)[source]

Single-objective Bayesian Optimization (SOBO)

This is a general BO method since it is a wrapper around GPyOpt.

Parameters
  • domain (Domain) – The Summit domain describing the optimization problem.

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • gp_model_type (string, optional) – The GPy Gaussian process model type. See notes for options. By default, a Gaussian process with the Matérn 5/2 kernel will be used.

  • use_descriptors (bool, optional) – Whether to use descriptors of categorical variables. Defaults to False.

  • acquisition_type (string, optional) – The acquisition function type from GPyOpt. See notes for options. By default, Expected Improvement (EI).

  • optimizer_type (string, optional) – The internal optimizer used in GPyOpt for maximization of the acquisition function. By default, lbfgs (L-BFGS) will be used.

  • evaluator_type (string, optional) – The evaluator type used for batch mode (how multiple points are chosen in one iteration). By default, Thompson sampling will be used.

  • kernel (kern, optional) – The kernel used in the GP. By default a Matérn 5/2 kernel (GPy object) will be used.

  • exact_feval (boolean, optional) – Whether the function evaluations are exact (True) or noisy (False). By default: False.

  • ARD (boolean, optional) – Whether automatic relevance determination should be applied (True). By default: True.

  • standardize_outputs (boolean, optional) – Whether the outputs should be standardized (True). By default: True.

Examples

>>> from summit.domain import *
>>> from summit.strategies import SOBO
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += CategoricalVariable(name='flowrate_a', description='flow of reactant a in mL/min', levels=[1,2,3,4,5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name="yld", description='yield of reaction', bounds=[0,100], is_objective=True)
>>> strategy = SOBO(domain)
>>> next_experiments = strategy.suggest_experiments(5)

Notes

Gaussian Process (GP) model

GP: standard Gaussian Process

GP_MCMC: Gaussian Process with prior in hyperparameters

sparseGP: sparse Gaussian Process

warpedGP: warped Gaussian Process

InputWarpedGP: input warped Gaussian Process

RF: random forest (scikit-learn)

Acquisition function type

EI: expected improvement

EI_MCMC: integrated expected improvement (requires GP_MCMC model) (https://dash.harvard.edu/bitstream/handle/1/11708816/snoek-bayesopt-nips-2012.pdf?sequence%3D1)

LCB: lower confidence bound

LCB_MCMC: integrated GP-Lower confidence bound (requires GP_MCMC model)

MPI: maximum probability of improvement

MPI_MCMC: maximum probability of improvement (requires GP_MCMC model)

LP: local penalization

ES: entropy search

This implementation uses the Python package GPyOpt, provided by the Machine Learning Group of the University of Sheffield.

Github repository: https://github.com/SheffieldML/GPyOpt
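
For example, the integrated expected improvement acquisition together with the GP_MCMC model it requires could be selected like this (a sketch reusing the domain from the example above):

>>> strategy = SOBO(domain, gp_model_type='GP_MCMC', acquisition_type='EI_MCMC')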

classmethod from_dict(d)[source]

Create a strategy from a dictionary

reset()[source]

Reset the internal parameters

suggest_experiments(num_experiments=1, prev_res: Optional[summit.utils.dataset.DataSet] = None, **kwargs)[source]

Suggest experiments using GPyOpt single-objective Bayesian Optimization

Parameters
  • num_experiments (int, optional) – The number of experiments (i.e., samples) to generate. Default is 1.

  • prev_res (DataSet, optional) – Dataset with data from previous experiments of previous iteration. If no data is passed, then random sampling will be used to suggest an initial design.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

to_dict()[source]

Convert strategy to a dictionary

MTBO

class summit.strategies.MTBO(domain: summit.domain.Domain, pretraining_data: Optional[summit.utils.dataset.DataSet] = None, transform: Optional[summit.strategies.base.Transform] = None, task: int = 1, categorical_method: str = 'one-hot', **kwargs)[source]

Multitask Bayesian Optimisation

This strategy enables pre-training a model with past reaction data in order to enable faster optimisation.

Parameters
  • domain (Domain) – The domain of the optimization

  • pretraining_data (DataSet, optional) – A DataSet with pretraining data. Must contain a metadata column named “task” that specifies the task for all data.

  • transform (Transform, optional) – A transform object. By default, no transformation will be done on the input variables or objectives.

  • task (int, optional) – The index of the task being optimized. Defaults to 1.

  • categorical_method (str, optional) – The method for transforming categorical variables. Either “one-hot” or “descriptors”. Descriptors must be included in the categorical variables for the latter.

Notes

This strategy is based on a paper from the NeurIPS 2020 ML4Molecules workshop by [Felton]. See [Swersky] for more information on multitask Bayesian optimization.

References

Felton

K. Felton et al., in ML4Molecules 2020 workshop.

Swersky

K. Swersky et al., in NIPS Proceedings, 2013, pp. 2004–2012.

Examples

>>> from summit.benchmarks import MIT_case1, MIT_case2
>>> from summit.strategies import LHS, MTBO
>>> from summit import Runner
>>> # Get pretraining data
>>> exp_pt = MIT_case1(noise_level=1)
>>> lhs = LHS(exp_pt.domain)
>>> conditions = lhs.suggest_experiments(10)
>>> pt_data = exp_pt.run_experiments(conditions)
>>> pt_data[("task", "METADATA")] = 0
>>> # Use MTBO on a new mechanism
>>> exp = MIT_case2(noise_level=1)
>>> strategy = MTBO(exp.domain, pretraining_data=pt_data, categorical_method="one-hot", task=1)
>>> r = Runner(strategy=strategy, experiment=exp, max_iterations=2)
>>> r.run(progress_bar=False)

classmethod from_dict(d)[source]

Create a strategy from a dictionary

classmethod load(filename)

Load a strategy from a JSON file

reset()[source]

Reset MTBO state

save(filename)

Save a strategy to a JSON file

suggest_experiments(num_experiments, prev_res: Optional[summit.utils.dataset.DataSet] = None, **kwargs)[source]

Suggest experiments using MTBO

Parameters
  • num_experiments (int) – The number of experiments (i.e., samples) to generate

  • prev_res (DataSet, optional) – Dataset with data from previous experiments. If no data is passed, then latin hypercube sampling will be used to suggest an initial design.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

Examples

>>> from summit.benchmarks import MIT_case1, MIT_case2
>>> from summit.strategies import LHS, MTBO
>>> from summit import Runner
>>> # Get pretraining data
>>> exp_pt = MIT_case1(noise_level=1)
>>> lhs = LHS(exp_pt.domain)
>>> conditions = lhs.suggest_experiments(10)
>>> pt_data = exp_pt.run_experiments(conditions)
>>> pt_data["task", "METADATA"] = 0
>>> # Use MTBO on a new mechanism
>>> exp = MIT_case2(noise_level=1)
>>> new_conditions = lhs.suggest_experiments(10)
>>> data = exp.run_experiments(new_conditions)
>>> data[("task", "METADATA")] = 1
>>> strategy = MTBO(exp.domain, pretraining_data=pt_data, categorical_method="one-hot", task=1)
>>> res = strategy.suggest_experiments(1, prev_res=data)

to_dict()[source]

Convert strategy to a dictionary

ENTMOOT

class summit.strategies.ENTMOOT(domain, transform=None, estimator_type=None, std_estimator_type=None, acquisition_type=None, optimizer_type=None, generator_type=None, initial_points=50, min_child_samples=5, **kwargs)[source]

Single-objective Bayesian optimization, using gradient-boosted trees instead of Gaussian processes, via ENTMOOT (ENsemble Tree MOdel Optimization Tool)

This is currently an experimental feature and requires Gurobipy to be installed.

Parameters
  • domain (Domain) – The Summit domain describing the optimization problem.

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • estimator_type (string, optional) – The ENTMOOT base_estimator type. By default, gradient-boosted regression trees (GBRT).

  • std_estimator_type (string, optional) – The ENTMOOT std_estimator type. By default, bounded data distance.

  • acquisition_type (string, optional) – The acquisition function type from ENTMOOT. See notes for options. By default, Lower Confidence Bound.

  • optimizer_type (string, optional) – The optimizer used in ENTMOOT for maximization of the acquisition function. By default, sampling will be used.

  • generator_type (string, optional) – The method for generating initial points before a model can be trained. By default, uniform random points will be used.

  • initial_points (int, optional) – How many points to require before training models

  • min_child_samples (int, optional) – Minimum size of a leaf in tree models

Examples

>>> from summit.domain import *
>>> from summit.strategies.entmoot import ENTMOOT
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += CategoricalVariable(name='flowrate_a', description='flow of reactant a in mL/min', levels=[1,2,3,4,5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name="yld", description='yield of reaction', bounds=[0,100], is_objective=True)
>>> # strategy = ENTMOOT(domain)
>>> # next_experiments = strategy.suggest_experiments(5)

Notes

Estimator type can be either GBRT (gradient-boosted regression trees) or RF (random forest from scikit-learn).

Acquisition function type can only be LCB (lower confidence bound).
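
For instance, using the option names above, a random forest surrogate could be requested as follows (left commented out, like the example above, since ENTMOOT requires Gurobi to run):

>>> # strategy = ENTMOOT(domain, estimator_type="RF")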

Based on the paper from [Thebelt] et al.

References

Thebelt

A. Thebelt et al. “ENTMOOT: A Framework for Optimization over Ensemble Tree Models”, arXiv.

classmethod from_dict(d)[source]

Create a strategy from a dictionary

gurobi_missing

Sets an initial points generator. Can be either:
  • “random” for uniform random numbers
  • “sobol” for a Sobol sequence
  • “halton” for a Halton sequence
  • “hammersly” for a Hammersly sequence
  • “lhs” for a latin hypercube sequence
  • “grid” for a uniform grid sequence

reset()[source]

Reset the internal parameters

suggest_experiments(num_experiments=1, prev_res: Optional[summit.utils.dataset.DataSet] = None, **kwargs)[source]

Suggest experiments using ENTMOOT tree-based Bayesian Optimization

Parameters
  • num_experiments (int, optional) – The number of experiments (i.e., samples) to generate. Default is 1.

  • prev_res (DataSet, optional) – Dataset with data from previous experiments of previous iteration. If no data is passed, then random sampling will be used to suggest an initial design.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

to_dict()[source]

Convert strategy to a dictionary

Reinforcement Learning

Reinforcement learning (RL) is distinct because it focuses on creating a custom policy for a particular problem instead of a model of the problem. In the case of reaction optimisation, the policy directly predicts what the next experiment(s) should be given a history of past experiments. Policies are trained to maximise some sort of reward, such as achieving the maximum yield in as few experiments as possible.

For more information about RL, see the book by Sutton and Barto or David Silver’s course.

class summit.strategies.deep_reaction_optimizer.DRO(domain: summit.domain.Domain, transform: Optional[summit.strategies.base.Transform] = None, pretrained_model_config_path=None, model_size='standard', **kwargs)[source]

Deep Reaction Optimizer (DRO)

The DRO relies on a pretrained RL policy that can predict a next set of experiments given a set of past experiments. We suggest reading the notes below before using the DRO.

Parameters
  • domain (Domain) – A summit domain object

  • transform (Transform, optional) – A transform class (i.e., not the object itself). By default no transformation will be done on the input variables or objectives.

  • pretrained_model_config_path (string, optional) – Path to the config file of a pretrained DRO model (note that the number of input parameters should match the domain inputs). By default, a pretrained model will be used.

  • model_size (string, optional) – Whether the model (policy) has the same size as originally published by the developers of DRO (“standard”), or whether the model is bigger with respect to the number of pretraining epochs, LSTM hidden size and unroll length (“bigger”). Note that the pretraining time can increase exponentially when changing these hyperparameters and the number of input variables; the number of epochs each bigger model was trained for can be found in the “checkpoint” file in the respective save directory. By default: “standard” (these models were all pretrained for 50 epochs).

xbest, internal state

Best point from all iterations.

fbest, internal state

Objective value at best point from all iterations.

param, internal state

A dict containing: state of LSTM of DRO, last requested point, xbest, fbest, number of iterations (corresponding to the unroll length of the LSTM)

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import DRO
>>> from summit.utils.dataset import DataSet
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='yld', description='yield of reaction', bounds=[0, 1], is_objective=True, maximize=True)
>>> strategy = DRO(domain)

Notes

The DRO requires TensorFlow version 1, while all other parts of Summit use TensorFlow version 2. Therefore, we have created a Docker container for running the DRO which has TFv1 installed. We also have an option in the pip package to install TFv1.

However, if you simply want to analyse results from a DRO run (i.e., use from_dict), then you will not get a TensorFlow import error.

We have pretrained policies for domains with up to six continuous decision variables.

When applying the DRO, it is necessary to define reasonable bounds for the objective variable, e.g., yield in [0, 1], since the DRO normalizes the objective function values to be between 0 and 1.

The DRO is based on the paper in ACS Central Science by [Zhou].

References

Zhou

Z. Zhou et al., ACS Cent. Sci., 2017, 3, 1337–1344. DOI: 10.1021/acscentsci.7b00492

classmethod from_dict(d)[source]

Create a strategy from a dictionary

reset()[source]

Reset internal parameters

suggest_experiments(prev_res: Optional[summit.utils.dataset.DataSet] = None, **kwargs)[source]

Suggest experiments using the Deep Reaction Optimizer

Parameters
  • num_experiments (int, optional) – The number of experiments (i.e., samples) to generate. Default is 1.

  • prev_res (DataSet, optional) – Dataset with data from previous experiments. If no data is passed, the DRO optimization algorithm will be initialized and suggest initial experiments.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

to_dict()[source]

Convert hyperparameters and internal state to a dictionary

Simplex

class summit.strategies.neldermead.NelderMead(domain: summit.domain.Domain, transform: Optional[summit.strategies.base.Transform] = None, **kwargs)[source]

Nelder-Mead Simplex

A reimplementation of the Nelder-Mead Simplex method adapted for sequential calls. This includes adaptations in terms of reflecting points, dimension reduction and dimension recovery proposed by Cortés-Borda et al.

Parameters
  • domain (Domain) – The domain of the optimization

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • random_start (bool, optional) – Whether to start at a random point or the value specified by x_start

  • adaptive (bool, optional) – Adapt algorithm parameters to dimensionality of problem. Useful for high-dimensional minimization. Default is False.

  • x_start (array_like of shape (1, N), optional) – Initial center point of the simplex. Default: an empty list, which initialises the generation of x_start as the geometrical center point of the bounds. Note that x_start is ignored when the initial call of suggest_experiments contains prev_res and/or prev_param (see the sketch after this list).

  • dx (float, optional) – Parameter for stopping criterion: two points are considered to be different if they differ by at least dx(i) in at least one coordinate i. Default is 1E-5.

  • df (float, optional) – Parameter for stopping criterion: two function values are considered to be different if they differ by at least df. Default is 1E-5.
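
For example, a sketch that sets the initial simplex center explicitly, assuming the two-variable domain from the Examples section below:

>>> strategy = NelderMead(domain, x_start=[[0.5, 0.5]])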

Notes

This is inspired by the work of [Cortés-Borda]. The implementation partly follows the Nelder-Mead Simplex implementation in scipy.optimize.

After the initialisation, the number of suggested experiments depends on the internal state of Nelder Mead. Usually the algorithm requests 1 point per iteration, e.g., a reflection. In some cases it requests more than 1 point, e.g., for shrinking the simplex.

References

Cortés-Borda

Cortés-Borda, D.; Kutonova, K. V.; Jamet, C.; Trusova, M. E.; Zammattio, F.; Truchet, C.; Rodriguez-Zubiri, M.; Felpin, F.-X. Optimizing the Heck–Matsuda Reaction in Flow with a Constraint-Adapted Direct Search Algorithm. Organic Process Research & Development, 2016, 20, 1979–1987.

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import NelderMead
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[0, 1])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0, 1])
>>> domain += ContinuousVariable(name="yld", description='relative conversion to xyz', bounds=[0,100], is_objective=True, maximize=True)
>>> strategy = NelderMead(domain)
>>> next_experiments  = strategy.suggest_experiments()
>>> print(next_experiments)
NAME temperature flowrate_a             strategy
TYPE        DATA       DATA             METADATA
0          0.500      0.500  Nelder-Mead Simplex
1          0.625      0.500  Nelder-Mead Simplex
2          0.500      0.625  Nelder-Mead Simplex

classmethod from_dict(d)[source]

Create a strategy from a dictionary

reset()[source]

Reset internal parameters

round(x, bounds, dx)[source]

Project a point x into the interior of the bounds and round each coordinate x[i] to the nearest integer multiple of dx[i].

Input: x, a vector of length n; bounds, a matrix of shape (n, 2) such that bounds[:,0] < bounds[:,1]; dx, a float.

Output: x, the projected and rounded version of x.

suggest_experiments(prev_res: Optional[summit.utils.dataset.DataSet] = None, **kwargs)[source]

Suggest experiments using Nelder-Mead Simplex method

Parameters

prev_res (summit.utils.dataset.DataSet, optional) – Dataset with data from previous experiments. If no data is passed, the Nelder-Mead optimization algorithm will be initialized and suggest initial experiments.

Returns

next_experiments – A Dataset object with the suggested experiments by Nelder-Mead Simplex algorithm

Return type

DataSet

Notes

After the initialisation, the number of suggested experiments depends on the internal state of Nelder Mead. Usually the algorithm requests 1 point per iteration, e.g., a reflection. In some cases it requests more than 1 point, e.g., for shrinking the simplex. Thus, there is no num_experiments keyword argument.
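
A sketch of the resulting loop, where run_my_experiments is a hypothetical stand-in for running the suggested conditions and returning a DataSet with objective values:

>>> # res = run_my_experiments(next_experiments)  # hypothetical helper
>>> # next_experiments = strategy.suggest_experiments(prev_res=res)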

to_dict()[source]

Convert strategy to a dictionary

Random

Random

class summit.strategies.random.Random(domain: summit.domain.Domain, transform: Optional[summit.strategies.base.Transform] = None, random_state: Optional[numpy.random.mtrand.RandomState] = None, **kwargs)[source]

Random strategy for experiment suggestion

Parameters
  • domain (summit.domain.Domain) – A summit domain object

  • random_state (np.random.RandomState) – A random state object to seed the random generator

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import Random
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> strategy = Random(domain, random_state=np.random.RandomState(3))
>>> strategy.suggest_experiments(5)
NAME temperature flowrate_a flowrate_b strategy
TYPE        DATA       DATA       DATA METADATA
0      77.539895   0.458517   0.111950   Random
1      85.407391   0.150234   0.282733   Random
2      64.545237   0.182897   0.359658   Random
3      75.541380   0.120587   0.211395   Random
4      94.647348   0.276324   0.370502   Random

Notes

Descriptor variables are selected randomly as if they were discrete variables instead of sampling evenly in the continuous space.

suggest_experiments(num_experiments: int, **kwargs) → summit.utils.dataset.DataSet[source]

Suggest experiments for a random experimental design

Parameters

num_experiments (int) – The number of experiments (i.e., samples) to generate

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

Latin Hypercube Sampling

class summit.strategies.random.LHS(domain: summit.domain.Domain, transform: Optional[summit.strategies.base.Transform] = None, random_state: Optional[numpy.random.mtrand.RandomState] = None, categorical_method: Optional[str] = None)[source]

Latin hypercube sampling (LHS) strategy for experiment suggestion

LHS samples evenly throughout the continuous part of the domain, which can result in better data for model training.

Parameters
  • domain (summit.domain.Domain) – A summit domain object

  • random_state (np.random.RandomState) – A random state object to seed the random generator

  • categorical_method (str, optional) – The method for transforming categorical variables. Either “one-hot” or “descriptors”. Descriptors must be included in the categorical variables for the latter.

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import LHS
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> strategy = LHS(domain, random_state=np.random.RandomState(3))
>>> strategy.suggest_experiments(5)
NAME temperature flowrate_a flowrate_b strategy
TYPE        DATA       DATA       DATA METADATA
0           95.0       0.46       0.38      LHS
1           65.0       0.14       0.14      LHS
2           55.0       0.22       0.30      LHS
3           85.0       0.30       0.46      LHS
4           75.0       0.38       0.22      LHS

Notes

LHS was first introduced by [McKay] and coworkers in 1979. We rely on the implementation from pyDoE2.

Our version randomly selects a categorical variable if no descriptors are available. If descriptors are available it samples in the continuous space and then chooses the closest point by Euclidean distance.

References

McKay

M. D. McKay, R. J. Beckman and W. J. Conover, Technometrics, 1979, 21, 239–245.

suggest_experiments(num_experiments, criterion='center', exclude=[], **kwargs) → summit.utils.dataset.DataSet[source]

Generate a Latin hypercube initial design

Parameters
  • num_experiments (int) – The number of experiments (i.e., samples) to generate

  • criterion (str, optional) – The criterion used for the LHS. Allowable values are “center” or “c”, “maximin” or “m”, “centermaximin” or “cm”, and “correlation” or “corr”. Default is center.

  • exclude (array like, optional) – List of variable names that should be excluded from the design. Default is an empty list, i.e., no variables excluded (see the sketch below).

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet
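
For example, a maximin design that leaves one variable out (a sketch reusing the strategy from the example above):

>>> design = strategy.suggest_experiments(5, criterion='maximin', exclude=['flowrate_b'])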

Other

SNOBFIT

class summit.strategies.snobfit.SNOBFIT(domain: summit.domain.Domain, **kwargs)[source]

Stable Noisy Optimization by Branch and Fit (SNOBFIT)

SNOBFIT is designed to quickly optimise noisy functions.

Parameters
  • domain (Domain) – The domain of the optimization

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • probability_p (float, optional) – The probability p that a point of class 4 is generated, i.e., higher p leads to more exploration (see the sketch after this list). Default is 0.5.

  • dx_dim (float, optional) – Only used for the definition of a new problem: two points are considered to be different if they differ by at least dx(i) in at least one coordinate i. Default is 1E-5.
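
For example, exploration can be increased through probability_p (a sketch, assuming a domain as in the Examples below):

>>> strategy = SNOBFIT(domain, probability_p=0.7)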

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import SNOBFIT
>>> from summit.utils.dataset import DataSet
>>> import pandas as pd
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[0, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0, 1])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.9])
>>> domain += ContinuousVariable(name="yld", description='relative conversion to xyz', bounds=[0,100], is_objective=True, maximize=True)
>>> d = {'temperature': [50,40,70,30], 'flowrate_a': [0.6,0.3,0.2,0.1], 'flowrate_b': [0.1,0.3,0.2,0.1], 'yld': [0.7,0.6,0.3,0.1]}
>>> df = pd.DataFrame(data=d)
>>> initial = DataSet.from_df(df)
>>> strategy = SNOBFIT(domain)
>>> next_experiments = strategy.suggest_experiments(5, initial)

Notes

SNOBFIT was created by [Huyer] et al. This implementation is based on the python reimplementation [SQSnobFit] of the original MATLAB code by [Neumaier].

Note that SNOBFIT sometimes returns more experiments than requested when the number of experiments requested is small (i.e., 1 or 2). This seems to be a general issue with the algorithm rather than the specific implementation used here.

References

Huyer

W. Huyer et al., ACM Trans. Math. Softw., 2008, 35, 1–25. DOI: 10.1145/1377612.1377613.

SQSnobFit

Lavrijsen, W. SQSnobFit https://pypi.org/project/SQSnobFit/

Neumaier

https://www.mat.univie.ac.at/~neum/software/snobfit/

classmethod from_dict(d)[source]

Create a strategy from a dictionary

reset()[source]

Reset internal parameters

snobfit(x, f, config, dx=None, prev_param=None)[source]

The following snobfit code was copied and modified from the SQSnobFit package, originally published by Wim Lavrijsen. The SQSnobFit package includes a Python version of SNOBFIT, which was originally published by A. Neumaier.

Copyright of SNOBFIT (v2.1):

Neumaier, University of Vienna

Website: https://www.mat.univie.ac.at/~neum/software/snobfit/

Copyright of SQSnobFit (v0.4.2):

UC Regents, Berkeley

Website: https://pypi.org/project/SQSnobFit/

suggest_experiments(num_experiments=1, prev_res: Optional[summit.utils.dataset.DataSet] = None, **kwargs)[source]

Suggest experiments using the SNOBFIT method

Parameters
  • num_experiments (int, optional) – The number of experiments (i.e., samples) to generate. Default is 1.

  • prev_res (summit.utils.dataset.DataSet, optional) – Dataset with data from previous experiments. If no data is passed, the SNOBFIT optimization algorithm will be initialized and suggest initial experiments.

Returns

next_experiments – A Dataset object with the suggested experiments by SNOBFIT algorithm

Return type

DataSet

to_dict()[source]

Convert hyperparameters and internal state to a dictionary

Full Factorial

class summit.strategies.factorial_doe.FullFactorial(domain: summit.domain.Domain, transform: Optional[summit.strategies.base.Transform] = None, **kwargs)[source]

Strategy for a full factorial design of experiments over all decision variables.

Parameters

domain (Domain) – The Summit domain describing the optimization problem.

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import FullFactorial
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> levels = dict(temperature=[50,100], flowrate_a=[0.1,0.5], flowrate_b=[0.1,0.5])
>>> strategy = FullFactorial(domain)
>>> strategy.suggest_experiments(levels)
NAME temperature flowrate_a flowrate_b       strategy
TYPE        DATA       DATA       DATA       METADATA
0           50.0        0.1        0.1  FullFactorial
1          100.0        0.1        0.1  FullFactorial
2           50.0        0.5        0.1  FullFactorial
3          100.0        0.5        0.1  FullFactorial
4           50.0        0.1        0.5  FullFactorial
5          100.0        0.1        0.5  FullFactorial
6           50.0        0.5        0.5  FullFactorial
7          100.0        0.5        0.5  FullFactorial

Notes

We rely on the implementation from pyDoE2.

suggest_experiments(levels_dict, **kwargs) → summit.utils.dataset.DataSet[source]

Suggest experiments for a full factorial experimental design

Parameters

levels_dict (dict) – A dictionary specifying the levels of each variable. Keys are the variable names and values are arrays with the values of each level.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet