Below I will describe the essential components of the Gaussian Process (GP) model, designed by Dr. Pritam Ranjan at Acadia University. You will quickly see why efficient and robust optimization of the model’s fitting process (particularly through the maximum likelihood approach) is so crucial to the performance and accuracy of the GP model.

The GP model requires as input a set of design points, $x_i = (x_{i1}, \ldots, x_{id})$, and the corresponding simulator outputs, $y_i = y(x_i)$, where $i = 1, \ldots, n$, $n$ is the number of user-supplied design points, and $d$ is the dimension of the problem. We denote $X = \{x_1, \ldots, x_n\}$ as the set of $n$ design points, and $Y = (y_1, \ldots, y_n)'$ as the vector that holds the corresponding simulator output. The simulator output is thus modelled as

$$y(x_i) = \mu + z(x_i), \qquad i = 1, \ldots, n,$$

where $\mu$ is the overall mean, and $z(x_1), \ldots, z(x_n)$ is a collection of random variables which have a joint normal distribution, and is thus a Gaussian process. For example, in the case where the overall mean is $\mu = 0$, the random variable $z(x_i)$ will represent the true function value at the location $x_i$. The GP is therefore a generalization of the Gaussian probability distribution, and is fully defined by its expected value, $E[z(x_i)] = 0$, variance, $\sigma_z^2$, and covariance function, $\mathrm{Cov}\left(z(x_i), z(x_j)\right) = \sigma_z^2 R_{ij}$.

Here, $R$ is the spatial correlation matrix, which defines the degree of dependency between the simulator outputs at any two design points, $x_i$ and $x_j$. There are many choices for the correlation matrix $R$; however, we will use the squared exponential correlation function provided by MacDonald et al.

$$R_{ij} = \prod_{k=1}^{d} \exp\left\{-\theta_k \, |x_{ik} - x_{jk}|^2\right\}, \qquad i, j = 1, \ldots, n,$$

where $\theta = (\theta_1, \ldots, \theta_d)$ is a $d \times 1$ vector of scaling parameters. Determining the optimal scaling parameter $\theta$ is crucial for ensuring the integrity of the GP model fit. Originally, Dr. Ranjan proposed the use of a bounded re-scaling of $\theta$ in its place, which confines the parameter space to $[0,1]^d$. However, they noticed that optimal values would often be positioned at the boundary of this region, making the optimization process challenging. We therefore proposed a re-parameterization of the $\theta$-space, by letting $\theta_k = 10^{\beta_k}$, such that $\beta_k = \log_{10}(\theta_k)$ \cite{P2}. Such a re-parameterization is preferred as it smooths the parameter space, such that it is centered about $\beta_k = 0$ (i.e., $\theta_k = 1$), and ensures that potential solutions do not lie on a boundary.
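To make the correlation concrete, here is a minimal NumPy sketch of the squared exponential correlation matrix under the $\theta_k = 10^{\beta_k}$ re-parameterization. The function name `corr_matrix` and the array layout are my own, not taken from the GP model’s implementation:

```python
import numpy as np

def corr_matrix(X, beta):
    """Squared-exponential correlation matrix for an (n, d) design X.

    R[i, j] = prod_k exp(-theta_k * |x_ik - x_jk|**2), using the
    re-parameterization theta_k = 10**beta_k described in the text.
    (Illustrative sketch, not the original implementation.)
    """
    X = np.asarray(X, dtype=float)
    theta = 10.0 ** np.asarray(beta, dtype=float)   # theta_k = 10**beta_k
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2    # (n, n, d) squared gaps
    return np.exp(-(diff2 * theta).sum(axis=2))
```

For $\beta_k = 0$ (i.e., $\theta_k = 1$) the matrix is symmetric with unit diagonal, and correlations decay smoothly as design points move apart.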

The GP model’s ability to predict the simulator output for an unknown point $x^*$ is directly related to the underlying properties of the correlation matrix $R$. Prediction error at any point $x^*$ is relative to the distance between that point and nearby design points. Therefore, the GP model will predict accurately when the distance between $x^*$ and nearby design points is small, whereas the prediction error will be larger for predictor points that are far away from the design points. The hyper-parameter $\theta$ can therefore be interpreted as a measure of the sensitivity of $R_{ij}$ to the spatial distribution of the design points, for all $i, j = 1, \ldots, n$.

The model uses the closed form estimators of the mean, $\hat{\mu}$, and variance, $\hat{\sigma}_z^2$, given by

$$\hat{\mu}(\theta) = \frac{\mathbf{1}' R^{-1} Y}{\mathbf{1}' R^{-1} \mathbf{1}} \qquad \text{and} \qquad \hat{\sigma}_z^2(\theta) = \frac{\left(Y - \mathbf{1}\hat{\mu}(\theta)\right)' R^{-1} \left(Y - \mathbf{1}\hat{\mu}(\theta)\right)}{n}.$$

Here $\mathbf{1}$ is an $n \times 1$ vector of all ones. Calculation of the mean and variance estimators, along with the repeated update of the inverse and determinant of the correlation matrix, $R^{-1}$ and $|R|$ respectively, allow us to derive the profile log-likelihood

$$-2 \log(L_\theta) \;\propto\; \log(|R|) + n \log\left[\left(Y - \mathbf{1}\hat{\mu}(\theta)\right)' R^{-1} \left(Y - \mathbf{1}\hat{\mu}(\theta)\right)\right].$$
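The closed-form estimators and the profile log-likelihood described above can be sketched in NumPy as follows. This is illustrative code assuming the squared exponential correlation; `neg_profile_loglik` is a hypothetical name:

```python
import numpy as np

def neg_profile_loglik(beta, X, y):
    """Return -2 log-likelihood (up to a constant), mu-hat, and sigma2-hat.

    mu-hat  = (1'R^{-1}Y) / (1'R^{-1}1)
    sigma2  = (Y - 1 mu)' R^{-1} (Y - 1 mu) / n
    -2logL  is proportional to log|R| + n log[(Y - 1 mu)'R^{-1}(Y - 1 mu)]
    (Illustrative sketch, assuming the squared-exponential correlation.)
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    theta = 10.0 ** np.asarray(beta, dtype=float)
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    R = np.exp(-(diff2 * theta).sum(axis=2))
    ones = np.ones(n)
    mu_hat = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    resid = y - mu_hat
    quad = resid @ np.linalg.solve(R, resid)
    sigma2_hat = quad / n
    _, logdet = np.linalg.slogdet(R)     # numerically stable log|R|
    return logdet + n * np.log(quad), mu_hat, sigma2_hat
```

In practice, the first return value would be minimized over $\beta$ with a numerical optimizer to fit the model.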
The GP model will provide as output the simulator estimate, $\hat{y}(x^*)$, for a desired predictor point $x^*$. Through the maximum likelihood approach, the best function estimate for a given point $x^*$ is

$$\hat{y}(x^*) = \hat{\mu} + r' R^{-1} \left(Y - \mathbf{1}\hat{\mu}\right),$$

where $r = (r_1, \ldots, r_n)'$, and $r_i = \mathrm{corr}\left(z(x^*), z(x_i)\right)$ is the correlation between $z(x^*)$ and $z(x_i)$. The GP model will also output the uncertainty in this estimate, $s^2(x^*)$, as measured by the mean squared error (MSE),

$$s^2(x^*) = \sigma_z^2 \left(1 - r' R^{-1} r + \frac{\left(1 - \mathbf{1}' R^{-1} r\right)^2}{\mathbf{1}' R^{-1} \mathbf{1}}\right).$$
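A sketch of the predictor and its MSE, under the squared exponential correlation and the closed-form estimators described in the text (`gp_predict` is a hypothetical name, and matrix inversion is used only for brevity on small $n$):

```python
import numpy as np

def gp_predict(xstar, X, y, beta):
    """Predictor y-hat(x*) = mu + r'R^{-1}(Y - 1 mu) and its MSE

        s^2(x*) = sigma^2 (1 - r'R^{-1}r + (1 - 1'R^{-1}r)^2 / (1'R^{-1}1)).

    Illustrative sketch assuming the squared-exponential correlation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xstar = np.asarray(xstar, dtype=float)
    n = len(y)
    theta = 10.0 ** np.asarray(beta, dtype=float)
    R = np.exp(-(((X[:, None, :] - X[None, :, :]) ** 2) * theta).sum(axis=2))
    r = np.exp(-(((X - xstar) ** 2) * theta).sum(axis=1))  # corr(z(x*), z(x_i))
    ones = np.ones(n)
    Rinv = np.linalg.inv(R)
    mu = (ones @ Rinv @ y) / (ones @ Rinv @ ones)
    sigma2 = ((y - mu) @ Rinv @ (y - mu)) / n
    yhat = mu + r @ Rinv @ (y - mu)
    mse = sigma2 * (1.0 - r @ Rinv @ r
                    + (1.0 - ones @ Rinv @ r) ** 2 / (ones @ Rinv @ ones))
    return yhat, mse
```

At a design point the predictor reproduces the observed output and the MSE collapses to (numerically) zero, which is the interpolation property of the GP fit.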
INTRODUCING A NUGGET PARAMETER

Typically, a GP model attempts to find the best-fit interpolant of the user-supplied design points and their corresponding simulator outputs. The maximum likelihood approach enables the user to then estimate the corresponding simulator values for any number of predictor points. Algorithmic searching for the optimal parameter $\theta$ requires a repeated update of the correlation matrix $R$ for each potentially optimal value of $\theta$. As shown above, the profile likelihood requires accurate computation of both the inverse and determinant of $R$. However, if any pair of design points are excessively close together in the input space, then the matrix $R$ can become near-singular, resulting in unstable computation of $R^{-1}$ and $|R|$, which can lead to an unreliable model fit.
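The near-singularity is easy to reproduce: two nearly coincident design points make two rows of $R$ almost identical, and the condition number explodes. A small illustration (the helper name and example designs are mine):

```python
import numpy as np

def sqexp_corr(X, theta=10.0):
    """Squared-exponential correlation for a 1-d design (illustrative helper)."""
    X = np.asarray(X, dtype=float).reshape(-1, 1)
    return np.exp(-theta * (X - X.T) ** 2)

# One well-spread design, one with two nearly coincident points.
spread = sqexp_corr([0.0, 0.25, 0.5, 0.75, 1.0])
close = sqexp_corr([0.0, 0.5, 0.5 + 1e-9, 1.0])

# The near-duplicate pair makes R vastly worse conditioned.
print(np.linalg.cond(spread), np.linalg.cond(close))
```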

To overcome this instability, we follow a popular technique which adds a small “nugget” parameter, $\delta$, to the model fitting procedure, which in turn replaces $R$ with $R_\delta = R + \delta I$. The addition of the nugget parameter smooths the model predictor, and therefore the GP does not produce an exact interpolant of our design points. To avoid unnecessary over-smoothing of the model predictions, a lower bound on the nugget parameter is introduced, given by

$$\delta_{lb} = \max\left\{\frac{\lambda_n \left(\kappa(R) - e^a\right)}{\kappa(R)\left(e^a - 1\right)}, \; 0\right\}.$$

Here, the condition number is given by $\kappa(R) = \lambda_n / \lambda_1$, where $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ are the eigenvalues of $R$, and $e^a$ is the desired threshold for $\kappa(R)$. We define an $n \times n$ matrix $R$ to be near-singular if its condition number $\kappa(R) = \|R\|_2 \|R^{-1}\|_2$ exceeds the threshold value $e^a$, where $\|\cdot\|_2$ denotes the matrix $L_2$-norm. A value of $a = 25$ is recommended for the space-filling Latin hypercube designs (LHDs) that are used in determining the design points. In terms of optimization, it is important to note that $\delta_{lb}$ is a function of both the design points and the optimization parameter $\theta$.
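Given the eigenvalues of $R$, the bound is a few lines of NumPy. This is an illustrative sketch; `nugget_lower_bound` is my name for it, with $a = 25$ as the recommended default:

```python
import numpy as np

def nugget_lower_bound(R, a=25.0):
    """Lower bound on the nugget:

        delta_lb = max{ lambda_n (kappa(R) - e^a) / (kappa(R) (e^a - 1)), 0 },

    where lambda_1 <= ... <= lambda_n are the eigenvalues of the symmetric
    correlation matrix R and kappa(R) = lambda_n / lambda_1 is its condition
    number. A well-conditioned R (kappa <= e^a) gives delta_lb = 0.
    """
    lam = np.linalg.eigvalsh(R)          # eigenvalues in ascending order
    kappa = lam[-1] / lam[0]
    ea = np.exp(a)
    return max(lam[-1] * (kappa - ea) / (kappa * (ea - 1.0)), 0.0)
```

By construction, adding $\delta_{lb}$ to the diagonal brings the condition number of $R + \delta I$ down to the threshold $e^a$ whenever $R$ was near-singular.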
