Strategies

Summit has several machine learning strategies available for optimisation, as well as some more naive ones.

All strategies share a similar API. They are instantiated by passing in a Domain, and new reaction conditions are requested with the suggest_experiments method, which can optionally take results from previous reactions.

Bayesian Optimisation

Bayesian optimisation (BO) is an efficient way to optimise a wide variety of functions, including chemical reactions. In BO, you begin by specifying some prior beliefs about your function. In many cases, we start with the assumption that we know very little. Then, we create a probabilistic model, called the posterior, that combines this prior belief with data (i.e., reactions at different conditions). In reaction optimisation, this model predicts the value of an objective (e.g., yield) at particular reaction conditions. A key feature is that these models are probabilistic: instead of a single precise prediction, they give a distribution (e.g., a mean and a variance) from which samples can be drawn.
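To make the prior-to-posterior step concrete, here is a minimal sketch of a zero-mean Gaussian process posterior written with NumPy only. It assumes a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters and made-up data; the function names are hypothetical, and this is not how Summit fits its models (Summit relies on GPy/GPyOpt, which also fit the hyperparameters).

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, lengthscale=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP evaluated at x_test."""
    K = rbf_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, lengthscale)
    # A Cholesky factorisation is more stable than inverting K directly
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    prior_var = np.diag(rbf_kernel(x_test, x_test, lengthscale))
    var = prior_var - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off


# Hypothetical data: scaled temperature vs. yield fraction
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.array([0.2, 0.7, 0.3])
mean, var = gp_posterior(x_train, y_train, np.array([0.1, 0.6, 5.0]), lengthscale=0.3)
```

At a training point the variance collapses to almost zero, while far from the data (x = 5.0) it returns to the prior value of 1, which is the "we know very little" starting assumption.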

With the updated model, we use one of two classes of techniques to select the next experiments. Some BO strategies optimise an acquisition function, which takes in the model parameters and a suggested next experiment and predicts the quality of that experiment. Alternatively, a deterministic function can be sampled from the model and then optimised (Thompson sampling).
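The two ways of choosing the next experiment can be contrasted in a short NumPy sketch on a 1D grid. Option 1 maximises an upper confidence bound (UCB) acquisition function; option 2 draws one function from the GP posterior at the grid points and maximises that sample. The kernel, grid, data and function names are hypothetical, and sampling on a grid is a simplification (TSEMO, described below, uses spectral sampling instead).

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def posterior(x_train, y_train, x_grid, noise=1e-4):
    """Posterior mean vector and covariance matrix of a zero-mean GP on x_grid."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_grid)
    K_inv_Ks = np.linalg.solve(K, K_s)
    mean = K_inv_Ks.T @ y_train
    cov = rbf_kernel(x_grid, x_grid) - K_s.T @ K_inv_Ks
    return mean, cov


def next_by_ucb(mean, cov, x_grid, kappa=2.0):
    """Option 1: maximise an acquisition function (upper confidence bound)."""
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return x_grid[np.argmax(mean + kappa * std)]


def next_by_thompson(mean, cov, x_grid, rng):
    """Option 2: sample one function from the posterior and maximise it."""
    jitter = 1e-6 * np.eye(len(x_grid))  # keeps the covariance numerically PSD
    sample = rng.multivariate_normal(mean, cov + jitter)
    return x_grid[np.argmax(sample)]


x_train = np.array([0.1, 0.5, 0.9])
y_train = np.array([0.2, 0.8, 0.4])
x_grid = np.linspace(0.0, 1.0, 101)
mean, cov = posterior(x_train, y_train, x_grid)
x_ucb = next_by_ucb(mean, cov, x_grid)
x_ts = next_by_thompson(mean, cov, x_grid, np.random.default_rng(0))
```

The UCB choice is fixed once the model is fixed, whereas the Thompson choice changes from one posterior sample to the next; that randomness is what lets sampling-based strategies explore.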


Illustration of how acquisition functions enable BO strategies to reduce uncertainty and maximise the objective simultaneously. The dotted line is the actual objective and the solid line is the posterior of the surrogate model. The acquisition function is high where the objective is predicted to be optimal (exploitation) and where there is high uncertainty (exploration). Adapted from Shahriari et al.

To learn more about BO, we suggest reading the review by Shahriari et al.

The BO strategies available in Summit are:

TSEMO

class summit.strategies.tsemo.TSEMO(domain, transform=None, **kwargs)

Thompson-Sampling for Efficient Multiobjective Optimization (TSEMO)

TSEMO is a multiobjective Bayesian optimisation strategy. It is designed to find optimal values in as few iterations as possible. This comes at the price of higher computational time.

Parameters
  • domain (Domain) – The domain of the optimization

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • kernel (Kern, optional) – A GPy kernel class (not instantiated). Must be Exponential, Matern32, Matern52 or RBF. Default Exponential.

  • n_spectral_points (int, optional) – Number of spectral points used in spectral sampling. Default is 1500. Note that the MATLAB TSEMO version uses 4000, which improves accuracy but significantly slows down optimisation.

  • n_retries (int, optional) – Number of retries to use for spectral sampling if the singular value decomposition fails. Retrying chooses a new Monte Carlo sample, which usually fixes the problem. Default is 10.

  • generations (int, optional) – Number of generations used in the internal optimisation with NSGAII. Default is 100.

  • pop_size (int, optional) – Population size used in the internal optimisation with NSGAII. Default is 100.
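The role of n_spectral_points can be illustrated with a random Fourier feature sketch of spectral sampling: the RBF kernel is approximated by n_spectral_points cosine features, a weight vector is drawn from the Bayesian linear-regression posterior over those features, and the result is a deterministic function that an optimiser such as NSGAII can evaluate. This is a NumPy-only illustration under assumed fixed hyperparameters (RBF kernel, unit signal variance, made-up data); it is not pyrff's API or Summit's code, and the function names are hypothetical.

```python
import numpy as np


def rff_features(x, omega, phase):
    """Random Fourier features whose inner products approximate an RBF kernel."""
    m = omega.shape[0]
    return np.sqrt(2.0 / m) * np.cos(x @ omega.T + phase)


def sample_gp_function(x_train, y_train, lengthscale=0.3, noise=1e-2,
                       n_spectral_points=1500, seed=0):
    """Draw one deterministic function approximately from the GP posterior."""
    rng = np.random.default_rng(seed)
    d = x_train.shape[1]
    # The spectral density of the RBF kernel is Gaussian with std 1 / lengthscale
    omega = rng.normal(scale=1.0 / lengthscale, size=(n_spectral_points, d))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_spectral_points)
    Phi = rff_features(x_train, omega, phase)
    # Bayesian linear regression on the features, prior weights ~ N(0, I)
    A = Phi.T @ Phi / noise**2 + np.eye(n_spectral_points)
    weight_mean = np.linalg.solve(A, Phi.T @ y_train / noise**2)
    # If A = L L^T, then L^-T z with z ~ N(0, I) has covariance A^-1
    L = np.linalg.cholesky(A)
    weights = weight_mean + np.linalg.solve(L.T, rng.standard_normal(n_spectral_points))
    return lambda x: rff_features(np.atleast_2d(x), omega, phase) @ weights


x_train = np.array([[0.1], [0.5], [0.9]])
y_train = np.array([0.2, 0.8, 0.4])
f = sample_gp_function(x_train, y_train)
```

More features give a better kernel approximation but a larger linear system to solve, which is the accuracy versus speed trade-off described for n_spectral_points.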

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import TSEMO
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> # TSEMO is multiobjective, so the domain needs at least two objectives
>>> domain += ContinuousVariable(name='yield_', description='yield of reaction', bounds=[0, 100], is_objective=True, maximize=True)
>>> domain += ContinuousVariable(name='de', description='second objective', bounds=[0, 100], is_objective=True, maximize=True)
>>> strategy = TSEMO(domain)
>>> # Without prev_res, an initial Latin hypercube design is suggested
>>> result = strategy.suggest_experiments(5)

Notes

TSEMO trains a Gaussian process (GP) to model each objective. Internally, we use GPy for GPs and accept the kernels listed above: the exponential (Matérn 1/2), Matérn 3/2 and Matérn 5/2 kernels, as well as the squared exponential (RBF) kernel, which is the limiting case of the Matérn family. See [Rasmussen] for more information about GPs.
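For reference, the four allowed kernels can be written as functions of the scaled distance r = |x - x'| / lengthscale. The sketch below uses the standard textbook forms (see [Rasmussen]) with a variance parameter; it does not call GPy, and the function names are hypothetical.

```python
import numpy as np


def exponential(r, variance=1.0):  # Matern nu = 1/2
    return variance * np.exp(-r)


def matern32(r, variance=1.0):  # Matern nu = 3/2
    s = np.sqrt(3.0) * r
    return variance * (1.0 + s) * np.exp(-s)


def matern52(r, variance=1.0):  # Matern nu = 5/2
    s = np.sqrt(5.0) * r
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def rbf(r, variance=1.0):  # limit nu -> infinity (squared exponential)
    return variance * np.exp(-0.5 * r**2)


r = np.array([0.0, 1.0])
values = {k.__name__: k(r) for k in (exponential, matern32, matern52, rbf)}
```

At the same distance, smoother kernels (higher ν) retain more correlation, so the kernel choice controls how smooth the modelled objective is assumed to be.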

A deterministic function is sampled from each of the trained GPs using the spectral sampling available in pyrff. These sampled functions are optimised using NSGAII (via pymoo) to find a selection of potential conditions. Each of these conditions is evaluated using the hypervolume improvement (HVI) criterion, and the one(s) offering the best HVI are suggested as the next experiments. More details about TSEMO can be found in the original paper [Bradford].
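The hypervolume improvement criterion can be illustrated for two objectives. The sketch below assumes both objectives are minimised (a maximised objective can be negated) and uses a user-chosen reference point; it is a simple quadratic-time illustration, not the hypervolume code used by Summit or pymoo, and the function names are hypothetical.

```python
import numpy as np


def pareto_front(points):
    """Non-dominated subset of points (both objectives minimised)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return pts[keep]


def hypervolume_2d(points, ref):
    """Area dominated by the points and bounded by the reference point."""
    front = pareto_front(points)
    front = front[np.all(front < ref, axis=1)]
    front = front[np.argsort(front[:, 0])]  # f1 ascending, so f2 descending
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        area += (ref[0] - f1) * (prev_f2 - f2)  # horizontal slab for this point
        prev_f2 = f2
    return area


def hypervolume_improvement(candidate, current, ref):
    """Extra hypervolume gained by adding candidate to the current points."""
    return hypervolume_2d(np.vstack([current, candidate]), ref) - hypervolume_2d(current, ref)


current = np.array([[0.0, 1.0], [1.0, 0.0]])
ref = np.array([2.0, 2.0])
hvi = hypervolume_improvement(np.array([0.5, 0.5]), current, ref)
```

A candidate dominated by an existing point has an HVI of zero, so ranking candidates by HVI favours conditions that extend the current Pareto front.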

References

Rasmussen

C. E. Rasmussen et al. Gaussian Processes for Machine Learning, MIT Press, 2006.

Bradford

E. Bradford et al. “Efficient multiobjective optimization employing Gaussian processes, spectral sampling and a genetic algorithm.” J. Glob. Optim., 2018, 71, 407–438.

classmethod from_dict(d)

Create a strategy from a dictionary

reset()

Reset TSEMO state

suggest_experiments(num_experiments, prev_res: summit.utils.dataset.DataSet = None, **kwargs)

Suggest experiments using TSEMO

Parameters
  • num_experiments (int) – The number of experiments (i.e., samples) to generate

  • prev_res (DataSet, optional) – Dataset with data from previous experiments. If no data is passed, then latin hypercube sampling will be used to suggest an initial design.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

to_dict()

Convert strategy to a dictionary

GRYFFIN

class summit.strategies.gryffin.GRYFFIN(domain, transform=None, use_descriptors=True, auto_desc_gen=False, sampling_strategies=4, batches=1, logging=-1, parallel=True, boosted=True, sampler='uniform', softness=0.001, continuous_optimizer='adam', categorical_optimizer='naive', discrete_optimizer='naive', **kwargs)

Gryffin is a single objective Bayesian optimisation strategy.

It is designed to work well with mixed domains (i.e., categorical and continuous variables).

Parameters
  • domain (Domain) – The Summit domain describing the optimization problem.

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • use_descriptors (bool, optional) – Whether descriptors of categorical variables are used. If not, auto_desc_gen must be True when categorical variables are used. Default is True.

  • auto_desc_gen (bool, optional) – Whether Dynamic Gryffin is used if descriptors are provided. Dynamic Gryffin applies automatic descriptor generation, i.e., it transforms the given descriptors with a non-linear transformation into new descriptors (more “meaningful” or more highly correlated ones). Defaults to False (i.e., Static Gryffin with the originally given descriptors is used).

  • sampling_strategies (int, optional) – Number of sampling strategies (similar to sampling of GPs). Together with batches, it determines the number of new points suggested in one optimization step: sampling_strategies x batches. Defaults to 4.

  • batches (int, optional) – Number of suggested points within one sampling strategy. Together with sampling_strategies, it determines the number of new points suggested in one optimization step: sampling_strategies x batches. Defaults to 1.

  • logging (int, optional) – The verbosity level of Gryffin's logging. See the Notes for the available levels. Defaults to -1.

  • parallel (Boolean, optional) – Run optimisation in parallel. Default True.

  • boosted (Boolean, optional) – Whether “pseudo-boosting” is applied. See the original paper in the references below for more details. Default True.

  • sampler (string, optional) – A priori distribution of categorical variables. By default: ‘uniform’

  • softness (float, optional) – Softness of Chimera. By default: 0.001

  • continuous_optimizer (string, optional) – Optimizer type for continuous variables (available: “adam”). By default: ‘adam’

  • categorical_optimizer (string, optional) – Optimizer type for categorical variables (available: “naive”). By default: naive

  • discrete_optimizer (string, optional) – Optimizer type for discrete variables (available: “naive”). By default: naive

xbest

Best point from all iterations.

Type

internal state

fbest

Objective value at best point from all iterations.

Type

internal state

param

A list containing all evaluated X and corresponding Y values.

Type

internal state

Examples

>>> from summit.domain import *
>>> from summit.strategies import GRYFFIN
>>> from summit.utils.dataset import DataSet
>>> domain = Domain()
>>> domain += ContinuousVariable(name="temperature", description="reaction temperature in celsius", bounds=[50, 100])
>>> domain += CategoricalVariable(name="flowrate_a", description="flow of reactant a in mL/min", levels=[1,2,3,4,5])
>>> base_df = DataSet([[1,2,3],[2,3,4],[8,8,8]], index = ["solv1","solv2","solv3"], columns=["MP","mol_weight","area"])
>>> domain += CategoricalVariable(name="solvent", description="solvent type - categorical", descriptors=base_df)
>>> domain += ContinuousVariable(name="yield", description="yield of reaction", bounds=[0,100], is_objective=True)
>>> strategy = GRYFFIN(domain, auto_desc_gen=True)
>>> next_experiments = strategy.suggest_experiments()

Notes

Verbosity levels:

  • -1: none

  • 0: ‘INFO’, ‘FATAL’

  • 1: ‘INFO’, ‘ERROR’, ‘FATAL’

  • 2: ‘INFO’, ‘WARNING’, ‘ERROR’, ‘FATAL’

  • 3: ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘FATAL’

Gryffin was created by the Aspuru-Guzik group. See the paper by [Hase] or the GitHub repository.

References

Hase

Häse, F., Roch, L.M. and Aspuru-Guzik, A., 2020. Gryffin: An algorithm for Bayesian optimization for categorical variables informed by physical intuition with applications to chemistry. arXiv preprint arXiv:2003.12127.

classmethod from_dict(d)

Create a strategy from a dictionary

reset()

Reset the internal parameters

suggest_experiments(prev_res: summit.utils.dataset.DataSet = None, **kwargs)

Suggest experiments using Gryffin optimization strategy

Parameters

prev_res (DataSet, optional) – Dataset with data from previous experiments of previous iteration. If no data is passed, then random sampling will be used to suggest an initial design.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

to_dict()

Convert strategy to a dictionary

SOBO

class summit.strategies.sobo.SOBO(domain, transform=None, gp_model_type=None, acquisition_type=None, optimizer_type=None, evaluator_type=None, **kwargs)

Single-objective Bayesian Optimization (SOBO)

This is a general-purpose BO strategy implemented as a wrapper around GPyOpt.

Parameters
  • domain (Domain) – The Summit domain describing the optimization problem.

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • gp_model_type (string, optional) – The GPyOpt model type. See notes for options. By default, Gaussian processes with a Matérn 5/2 kernel will be used.

  • use_descriptors (bool, optional) – Whether to use descriptors of categorical variables. Defaults to False.

  • acquisition_type (string, optional) – The acquisition function type from GPyOpt. See notes for options. By default, Expected Improvement (EI).

  • optimizer_type (string, optional) – The internal optimizer used in GPyOpt for maximization of the acquisition function. By default, L-BFGS will be used.

  • evaluator_type (string, optional) – The evaluator type used for batch mode (how multiple points are chosen in one iteration). By default, Thompson sampling will be used.

  • kernel (kern, optional) – The kernel used in the GP. By default a Matérn 5/2 kernel (GPy object) will be used.

  • exact_feval (boolean, optional) – Whether the function evaluations are exact (True) or noisy (False). By default: False.

  • ARD (boolean, optional) – Whether automatic relevance determination should be applied (True). By default: True.

  • standardize_outputs (boolean, optional) – Whether the outputs should be standardized (True). By default: True.

Examples

>>> from summit.domain import *
>>> from summit.strategies import SOBO
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += CategoricalVariable(name='flowrate_a', description='flow of reactant a in mL/min', levels=[1,2,3,4,5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='yield', description='yield of reaction', bounds=[0,100], is_objective=True)
>>> strategy = SOBO(domain)
>>> next_experiments = strategy.suggest_experiments(5)

Notes

Gaussian Process (GP) model

GP: standard Gaussian Process

GP_MCMC: Gaussian Process with priors on the hyperparameters

sparseGP: sparse Gaussian Process

warpedGP: warped Gaussian Process

InputWarpedGP: input warped Gaussian Process

RF: random forest (scikit-learn)

Acquisition function type

EI: expected improvement

EI_MCMC: integrated expected improvement (requires GP_MCMC model) (https://dash.harvard.edu/bitstream/handle/1/11708816/snoek-bayesopt-nips-2012.pdf?sequence%3D1)

LCB: lower confidence bound

LCB_MCMC: integrated GP-Lower confidence bound (requires GP_MCMC model)

MPI: maximum probability of improvement

MPI_MCMC: integrated maximum probability of improvement (requires GP_MCMC model)

LP: local penalization

ES: entropy search
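As a rough guide to what the main acquisition types compute, the sketch below evaluates expected improvement, probability of improvement and the lower confidence bound from a model's predicted mean and standard deviation, using the minimisation convention. The jitter and kappa values are illustrative choices and the function names are hypothetical; GPyOpt's own implementations may differ in details.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def _norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def _z_score(mean, std, y_best, jitter):
    improvement = y_best - np.asarray(mean, float) - jitter
    std = np.asarray(std, float)
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    return improvement, std, improvement / safe_std


def expected_improvement(mean, std, y_best, jitter=0.01):
    """EI: expected amount by which a point improves on y_best (minimisation)."""
    improvement, std, z = _z_score(mean, std, y_best, jitter)
    ei = improvement * _norm_cdf(z) + std * _norm_pdf(z)
    # With no predictive uncertainty, EI is simply the non-negative improvement
    return np.where(std > 0, ei, np.maximum(improvement, 0.0))


def probability_of_improvement(mean, std, y_best, jitter=0.01):
    """PI: probability that a point improves on y_best (minimisation)."""
    improvement, std, z = _z_score(mean, std, y_best, jitter)
    return np.where(std > 0, _norm_cdf(z), (improvement > 0).astype(float))


def lower_confidence_bound(mean, std, kappa=2.0):
    """LCB: optimistic estimate for minimisation; choose the point minimising it."""
    return np.asarray(mean, float) - kappa * np.asarray(std, float)
```

For a fixed predicted mean below y_best, EI grows with the predicted standard deviation, which is how it rewards uncertain regions as well as promising ones.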

This implementation uses the Python package GPyOpt provided by the Machine Learning Group of the University of Sheffield.

GitHub repository: https://github.com/SheffieldML/GPyOpt

classmethod from_dict(d)

Create a strategy from a dictionary

reset()

Reset the internal parameters

suggest_experiments(num_experiments=1, prev_res: summit.utils.dataset.DataSet = None, **kwargs)

Suggest experiments using GPyOpt single-objective Bayesian Optimization

Parameters
  • num_experiments (int, optional) – The number of experiments (i.e., samples) to generate. Default is 1.

  • prev_res (DataSet, optional) – Dataset with data from previous experiments of previous iteration. If no data is passed, then random sampling will be used to suggest an initial design.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

to_dict()

Convert strategy to a dictionary

Reinforcement Learning

Reinforcement learning (RL) is distinct because it focuses on creating a custom policy for a particular problem instead of a model of the problem. In the case of reaction optimisation, the policy directly predicts what the next experiment(s) should be, given a history of past experiments. Policies are trained to maximise some sort of reward, such as achieving the maximum yield in as few experiments as possible.

For more information about RL, see the book by Sutton and Barto or David Silver’s course.

class summit.strategies.deep_reaction_optimizer.DRO(domain: summit.domain.Domain, transform: summit.strategies.base.Transform = None, pretrained_model_config_path=None, model_size='standard', **kwargs)

Deep Reaction Optimizer (DRO)

The DRO relies on a pretrained RL policy that can predict a next set of experiments given a set of past experiments. We suggest reading the notes below before using the DRO.

Parameters
  • domain (Domain) – A summit domain object

  • transform (Transform, optional) – A transform class (i.e., not the object itself). By default no transformation will be done on the input variables or objectives.

  • pretrained_model_config_path (string, optional) – Path to the config file of a pretrained DRO model (note that the number of input parameters should match the domain inputs). By default: a pretrained model will be used.

  • model_size (string, optional) – Whether the model (policy) has the same size as originally published by the developers of DRO (“standard”), or whether the model is bigger w.r.t. the number of pretraining epochs, LSTM hidden size and unroll_length (“bigger”). Note that the pretraining time can increase exponentially when changing these hyperparameters and the number of input variables; the number of epochs for which each bigger model was trained can be found in the “checkpoint” file in the respective save directory. By default: “standard” (these models were all pretrained for 50 epochs).

xbest, internal state

Best point from all iterations.

fbest, internal state

Objective value at best point from all iterations.

param, internal state

A dict containing: state of LSTM of DRO, last requested point, xbest, fbest, number of iterations (corresponding to the unroll length of the LSTM)

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import DRO
>>> from summit.utils.dataset import DataSet
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> strategy = DRO(domain)

Notes

The DRO requires TensorFlow version 1, while all other parts of Summit use TensorFlow version 2. Therefore, we have created a Docker container for running the DRO, which has TFv1 installed. We also have an option in the pip package to install TFv1.

However, if you simply want to analyse results from a DRO run (i.e., use from_dict), TFv1 is not needed and you will not get a TensorFlow import error.

We have pretrained policies for domains with up to six continuous decision variables.

When applying the DRO, it is necessary to define reasonable bounds for the objective variable, e.g., yield in [0, 1], since the DRO normalizes the objective function values to lie between 0 and 1.
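Why the objective bounds matter can be seen from a min-max normalisation sketch. This assumes the normalisation is a simple linear rescaling with the declared bounds; how the DRO normalises internally is not shown here, and the function name is hypothetical.

```python
import numpy as np


def normalise_objective(values, lower, upper):
    """Linearly rescale objective values so that [lower, upper] maps to [0, 1]."""
    if upper <= lower:
        raise ValueError("upper bound must be greater than lower bound")
    return (np.asarray(values, dtype=float) - lower) / (upper - lower)


# Yields in percent with bounds declared as [0, 100]
scaled = normalise_objective([0.0, 42.0, 100.0], lower=0.0, upper=100.0)
# The same yields with bounds mistakenly declared as [0, 1]
mis_scaled = normalise_objective([0.0, 42.0, 100.0], lower=0.0, upper=1.0)
```

With mismatched bounds the normalised values fall far outside [0, 1], the range a policy trained on normalised objectives would expect.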

The DRO is based on the paper in ACS Central Science by [Zhou].

References

Zhou

Z. Zhou et al., ACS Cent. Sci., 2017, 3, 1337–1344. DOI: 10.1021/acscentsci.7b00492

classmethod from_dict(d)

Create a strategy from a dictionary

reset()

Reset internal parameters

suggest_experiments(prev_res: summit.utils.dataset.DataSet = None, **kwargs)

Suggest experiments using the Deep Reaction Optimizer

Parameters
  • num_experiments (int, optional) – The number of experiments (i.e., samples) to generate. Default is 1.

  • prev_res (DataSet, optional) – Dataset with data from previous experiments. If no data is passed, the DRO optimization algorithm will be initialized and suggest initial experiments.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet


to_dict()

Convert hyperparameters and internal state to a dictionary

Simplex

class summit.strategies.neldermead.NelderMead(domain: summit.domain.Domain, transform: summit.strategies.base.Transform = None, **kwargs)

Nelder-Mead Simplex

A reimplementation of the Nelder-Mead Simplex method adapted for sequential calls. This includes adaptations in terms of reflecting points, dimension reduction and dimension recovery proposed by Cortés-Borda et al.

Parameters
  • domain (Domain) – The domain of the optimization

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • random_start (bool, optional) – Whether to start at a random point or at the value specified by x_start.

  • adaptive (bool, optional) – Adapt algorithm parameters to dimensionality of problem. Useful for high-dimensional minimization. Default is False.

  • x_start (array_like of shape (1, N), optional) – Initial center point of the simplex. Default: an empty list, in which case x_start is generated as the geometrical center point of the bounds. Note that x_start is ignored when the initial call of suggest_experiments contains prev_res and/or prev_param.

  • dx (float, optional) – Parameter for stopping criterion: two points are considered to be different if they differ by at least dx(i) in at least one coordinate i. Default is 1E-5.

  • df (float, optional) – Parameter for stopping criterion: two function values are considered to be different if they differ by at least df. Default is 1E-5.

Notes

This is inspired by the work by [Cortés-Borda]. The implementation partly follows the Nelder-Mead Simplex implementation in scipy.optimize.

After the initialisation, the number of suggested experiments depends on the internal state of Nelder-Mead. Usually the algorithm requests 1 point per iteration, e.g., a reflection. In some cases it requests more than 1 point, e.g., for shrinking the simplex.
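The points behind "a reflection" and "shrinking the simplex" can be sketched with the textbook Nelder-Mead formulas (minimisation). The coefficient names and default values below are the common textbook ones and an assumption here, not necessarily those used in Summit's implementation; the function name is hypothetical. A shrink moves every vertex except the best one, so it produces n new points in n dimensions, which is why that step requests more than one experiment.

```python
import numpy as np


def nelder_mead_candidates(simplex, fvals, alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Candidate points for one Nelder-Mead iteration (minimisation)."""
    simplex = np.asarray(simplex, dtype=float)[np.argsort(fvals)]
    best, worst = simplex[0], simplex[-1]
    centroid = simplex[:-1].mean(axis=0)  # centroid of all vertices except the worst
    reflection = centroid + alpha * (centroid - worst)
    return {
        "reflection": reflection,  # 1 point
        "expansion": centroid + gamma * (reflection - centroid),  # 1 point
        "outside_contraction": centroid + rho * (reflection - centroid),  # 1 point
        "inside_contraction": centroid + rho * (worst - centroid),  # 1 point
        "shrink": best + sigma * (simplex[1:] - best),  # n points
    }


cands = nelder_mead_candidates([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.0, 1.0, 2.0])
```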

References

Cortés-Borda

Cortés-Borda, D.; Kutonova, K. V.; Jamet, C.; Trusova, M. E.; Zammattio, F.; Truchet, C.; Rodriguez-Zubiri, M.; Felpin, F.-X. Optimizing the Heck–Matsuda Reaction in Flow with a Constraint-Adapted Direct Search Algorithm. Organic Process Research & Development 2016, 20, 1979–1987.

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import NelderMead
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[0, 1])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0, 1])
>>> domain += ContinuousVariable(name='yield', description='relative conversion to xyz', bounds=[0,100], is_objective=True, maximize=True)
>>> strategy = NelderMead(domain)
>>> next_experiments  = strategy.suggest_experiments()
>>> print(next_experiments)
NAME temperature flowrate_a             strategy
TYPE        DATA       DATA             METADATA
0          0.500      0.500  Nelder-Mead Simplex
1          0.625      0.500  Nelder-Mead Simplex
2          0.500      0.625  Nelder-Mead Simplex

classmethod from_dict(d)

Create a strategy from a dictionary

reset()

Reset internal parameters

round(x, bounds, dx)

A point x is projected into the interior of the box [u, v] given by bounds (u = bounds[:,0], v = bounds[:,1]), and each x[i] is rounded to the nearest integer multiple of dx[i].

Input: x – vector of length n; bounds – matrix of shape n x 2 such that bounds[:,0] < bounds[:,1]; dx – float

Output: x – projected and rounded version of x
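The projection and rounding can be sketched in a few lines of NumPy. This is a hypothetical stand-alone version, named round_to_grid to avoid shadowing Python's built-in round. It assumes that a value pushed outside the bounds by rounding is moved back inside by one step dx; that edge-case handling is an assumption, not a description of Summit's code.

```python
import numpy as np


def round_to_grid(x, bounds, dx):
    """Project x into the box given by bounds, then snap it to multiples of dx."""
    x = np.asarray(x, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    lower, upper = bounds[:, 0], bounds[:, 1]
    dx = np.broadcast_to(np.asarray(dx, dtype=float), x.shape)
    x = np.clip(x, lower, upper)  # project into [lower, upper]
    x = np.round(x / dx) * dx  # nearest integer multiple of dx
    # Rounding can step just outside the box; move such values back by one dx
    x = np.where(x < lower, x + dx, x)
    x = np.where(x > upper, x - dx, x)
    return x


snapped = round_to_grid([0.26, 1.3], [[0.0, 1.0], [0.0, 1.0]], 0.1)
```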

suggest_experiments(prev_res: summit.utils.dataset.DataSet = None, **kwargs)

Suggest experiments using Nelder-Mead Simplex method

Parameters

prev_res (summit.utils.dataset.DataSet, optional) – Dataset with data from previous experiments. If no data is passed, the Nelder-Mead optimization algorithm will be initialized and suggest initial experiments.

Returns

next_experiments – A Dataset object with the suggested experiments by Nelder-Mead Simplex algorithm

Return type

DataSet

Notes

After the initialisation, the number of suggested experiments depends on the internal state of Nelder-Mead. Usually the algorithm requests 1 point per iteration, e.g., a reflection. In some cases it requests more than 1 point, e.g., for shrinking the simplex. Thus, there is no num_experiments keyword argument.

to_dict()

Convert strategy to a dictionary

Random

Random

class summit.strategies.random.Random(domain: summit.domain.Domain, transform: summit.strategies.base.Transform = None, random_state: numpy.random.mtrand.RandomState = None, **kwargs)

Random strategy for experiment suggestion

Parameters
  • domain (summit.domain.Domain) – A summit domain object

  • random_state (np.random.RandomState) – A random state object to seed the random generator

domain

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import Random
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> strategy = Random(domain, random_state=np.random.RandomState(3))
>>> strategy.suggest_experiments(5)
NAME temperature flowrate_a flowrate_b strategy
TYPE        DATA       DATA       DATA METADATA
0      77.539895   0.458517   0.111950   Random
1      85.407391   0.150234   0.282733   Random
2      64.545237   0.182897   0.359658   Random
3      75.541380   0.120587   0.211395   Random
4      94.647348   0.276324   0.370502   Random

Notes

Descriptor variables are selected randomly as if they were discrete variables, instead of being sampled evenly in the continuous space.

suggest_experiments(num_experiments: int, **kwargs) → summit.utils.dataset.DataSet

Suggest experiments for a random experimental design

Parameters

num_experiments (int) – The number of experiments (i.e., samples) to generate

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

Latin Hypercube Sampling

class summit.strategies.random.LHS(domain: summit.domain.Domain, transform: summit.strategies.base.Transform = None, random_state: numpy.random.mtrand.RandomState = None)

Latin hypercube sampling (LHS) strategy for experiment suggestion

LHS samples evenly throughout the continuous part of the domain, which can result in better data for model training.

Parameters
  • domain (summit.domain.Domain) – A summit domain object

  • random_state (np.random.RandomState) – A random state object to seed the random generator

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import LHS
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> strategy = LHS(domain, random_state=np.random.RandomState(3))
>>> strategy.suggest_experiments(5)
NAME temperature flowrate_a flowrate_b strategy
TYPE        DATA       DATA       DATA METADATA
0           95.0       0.46       0.38      LHS
1           65.0       0.14       0.14      LHS
2           55.0       0.22       0.30      LHS
3           85.0       0.30       0.46      LHS
4           75.0       0.38       0.22      LHS

Notes

LHS was first introduced by [McKay] and coworkers in 1979. We rely on the implementation from pyDoE2.

Our version randomly selects a level of a categorical variable if no descriptors are available. If descriptors are available, it samples in the continuous descriptor space and then chooses the closest level by Euclidean distance.
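Both parts of this description can be sketched with NumPy: a "center" Latin hypercube, where each dimension is split into num_experiments equal strata and each stratum's midpoint is used exactly once, and a nearest-neighbour lookup that maps a continuous descriptor sample to the closest categorical level. The function names are hypothetical and this illustrates the technique rather than reproducing pyDoE2's or Summit's code (so the shuffled order will not match the example output above).

```python
import numpy as np


def lhs_center(num_experiments, bounds, rng):
    """Latin hypercube design using the centre of each stratum."""
    bounds = np.asarray(bounds, dtype=float)
    centres = (np.arange(num_experiments) + 0.5) / num_experiments  # in [0, 1]
    # Independently shuffle the stratum centres for every dimension
    unit = np.column_stack([rng.permutation(centres) for _ in range(len(bounds))])
    return bounds[:, 0] + unit * (bounds[:, 1] - bounds[:, 0])


def nearest_level(point, descriptors):
    """Row index of the descriptor vector closest to point (Euclidean distance)."""
    diffs = np.asarray(descriptors, float) - np.asarray(point, float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))


design = lhs_center(5, [[50, 100], [0.1, 0.5], [0.1, 0.5]], np.random.default_rng(3))
level = nearest_level([2.1, 3.0, 4.2], [[1, 2, 3], [2, 3, 4], [8, 8, 8]])
```

Each column contains every stratum centre exactly once, which is the "even" coverage of the continuous space that distinguishes LHS from purely random sampling.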

References

McKay

M. D. McKay et al., Technometrics, 1979, 21, 239–245.

suggest_experiments(num_experiments, criterion='center', exclude=[], **kwargs) → summit.utils.dataset.DataSet

Generate a Latin hypercube initial design

Parameters
  • num_experiments (int) – The number of experiments (i.e., samples) to generate

  • criterion (str, optional) – The criterion used for the LHS. Allowable values are “center” or “c”, “maximin” or “m”, “centermaximin” or “cm”, and “correlation” or “corr”. Default is center.

  • exclude (array like, optional) – List of variable names that should be excluded from the design. Default is an empty list.

Returns

next_experiments – A Dataset object with the suggested experiments

Return type

DataSet

Other

SNOBFIT

class summit.strategies.snobfit.SNOBFIT(domain: summit.domain.Domain, **kwargs)

Stable Noisy Optimization by Branch and Fit (SNOBFIT)

SNOBFIT is designed to quickly optimise noisy functions.

Parameters
  • domain (Domain) – The domain of the optimization

  • transform (Transform, optional) – A transform object. By default no transformation will be done on the input variables or objectives.

  • probability_p (float, optional) – The probability p that a point of class 4 is generated, i.e., higher p leads to more exploration. Default is 0.5.

  • dx_dim (float, optional) – Only used for the definition of a new problem: two points are considered to be different if they differ by at least dx(i) in at least one coordinate i. Default is 1E-5.

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import SNOBFIT
>>> from summit.utils.dataset import DataSet
>>> import pandas as pd
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[0, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0, 1])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.9])
>>> domain += ContinuousVariable(name='yield', description='relative conversion to xyz', bounds=[0,100], is_objective=True, maximize=True)
>>> d = {'temperature': [50,40,70,30], 'flowrate_a': [0.6,0.3,0.2,0.1], 'flowrate_b': [0.1,0.3,0.2,0.1], 'yield': [0.7,0.6,0.3,0.1]}
>>> df = pd.DataFrame(data=d)
>>> initial = DataSet.from_df(df)
>>> strategy = SNOBFIT(domain)
>>> next_experiments = strategy.suggest_experiments(5, initial)

Notes

SNOBFIT was created by [Huyer] et al. This implementation is based on the python reimplementation [SQSnobFit] of the original MATLAB code by [Neumaier].

Note that SNOBFIT sometimes returns more experiments than requested when the number of requested experiments is small (i.e., 1 or 2). This seems to be a general issue with the algorithm rather than with the specific implementation used here.

References

Huyer

W. Huyer et al., ACM Trans. Math. Softw., 2008, 35, 1–25. DOI: 10.1145/1377612.1377613.

SQSnobFit

Lavrijsen, W. SQSnobFit https://pypi.org/project/SQSnobFit/

Neumaier

https://www.mat.univie.ac.at/~neum/software/snobfit/

classmethod from_dict(d)

Create a strategy from a dictionary

reset()

Reset internal parameters

suggest_experiments(num_experiments=1, prev_res: summit.utils.dataset.DataSet = None, **kwargs)

Suggest experiments using the SNOBFIT method

Parameters
  • num_experiments (int, optional) – The number of experiments (i.e., samples) to generate. Default is 1.

  • prev_res (summit.utils.dataset.DataSet, optional) – Dataset with data from previous experiments. If no data is passed, the SNOBFIT optimization algorithm will be initialized and suggest initial experiments.

Returns

next_experiments – A Dataset object with the suggested experiments by SNOBFIT algorithm

Return type

DataSet

to_dict()

Convert hyperparameters and internal state to a dictionary

Full Factorial

class summit.strategies.factorial_doe.FullFactorial(domain: summit.domain.Domain, transform: summit.strategies.base.Transform = None, **kwargs)

Full factorial design of experiments (DoE) strategy covering all decision variables.

Parameters

domain (Domain) – The Summit domain describing the optimization problem.

Examples

>>> from summit.domain import Domain, ContinuousVariable
>>> from summit.strategies import FullFactorial
>>> import numpy as np
>>> domain = Domain()
>>> domain += ContinuousVariable(name='temperature', description='reaction temperature in celsius', bounds=[50, 100])
>>> domain += ContinuousVariable(name='flowrate_a', description='flow of reactant a in mL/min', bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name='flowrate_b', description='flow of reactant b in mL/min', bounds=[0.1, 0.5])
>>> levels = dict(temperature=[50,100], flowrate_a=[0.1,0.5], flowrate_b=[0.1,0.5])
>>> strategy = FullFactorial(domain)
>>> strategy.suggest_experiments(levels)
NAME temperature flowrate_a flowrate_b       strategy
TYPE        DATA       DATA       DATA       METADATA
0           50.0        0.1        0.1  FullFactorial
1          100.0        0.1        0.1  FullFactorial
2           50.0        0.5        0.1  FullFactorial
3          100.0        0.5        0.1  FullFactorial
4           50.0        0.1        0.5  FullFactorial
5          100.0        0.1        0.5  FullFactorial
6           50.0        0.5        0.5  FullFactorial
7          100.0        0.5        0.5  FullFactorial

Notes

We rely on the implementation from pyDoE2.

suggest_experiments(levels_dict, **kwargs) → summit.utils.dataset.DataSet

Suggest experiments for a full factorial experimental design

Parameters

levels_dict (dict) – A dictionary with the levels for each variable. Keys are the variable names and values are arrays with the value of each level.
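The design built from levels_dict is the Cartesian product of the level lists. The sketch below uses only itertools and orders rows so that the first variable changes fastest, matching the row order in the example output; it is a hypothetical illustration rather than the pyDoE2 call Summit makes, and it returns plain dicts instead of a DataSet.

```python
import itertools


def full_factorial(levels_dict):
    """All combinations of levels; the first variable varies fastest."""
    names = list(levels_dict)
    rows = []
    # itertools.product varies its last iterable fastest, so feed the
    # variables in reverse order and flip each combination back
    for combo in itertools.product(*(levels_dict[n] for n in reversed(names))):
        rows.append(dict(zip(names, reversed(combo))))
    return rows


levels = dict(temperature=[50, 100], flowrate_a=[0.1, 0.5], flowrate_b=[0.1, 0.5])
design = full_factorial(levels)
```

The number of rows is the product of the numbers of levels (2 × 2 × 2 = 8 here), so full factorial designs grow quickly as variables or levels are added.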

Returns

next_experiments – A Dataset object with the full factorial design

Return type

DataSet