# A Quick Analysis of Heuristic Optimization by Stochastic Genetic Algorithms and Particle Swarm

I have been inspired by Varadi’s latest series of posts at http://cssanalytics.wordpress.com on heuristic optimization techniques (particularly Genetic Algorithms, GAs, and Particle Swarm Optimization, PSO) to perform a quick analysis comparing these techniques. I highly recommend reading the http://cssanalytics.wordpress.com posts from Sept. 3-6, 2013, for a detailed explanation of GAs and PSO before viewing my analysis.

We investigate the optimization performance of both Genetic Algorithms (GAs) and Particle Swarm Optimization (PSO). Particularly we are interested in comparing the efficiency and accuracy of these techniques in optimizing a variety of well known test functions. The test functions used are outlined below.
We consider three classes of GAs:

1. Matlab’s built-in traditional GA (MatlabGA).
2. My own implementation of a traditional GA.
3. A simplistic Micro-GA (mGA) as outlined by Varadi in http://cssanalytics.wordpress.com/.

A Brief Review of Traditional GAs
Recall that, like most evolutionary algorithms, a GA is a stochastic sampling method that repeatedly refines a population, or set, of candidate solutions. The method is based on the Darwinian notion of “Survival of the Fittest”: only the elite solutions (those that return the most optimal objective function values) are kept, while sub-optimal solutions are discarded.

A typical GA begins by randomly generating an initial population of candidate solutions, called the parents. The initial population of each generation is then enlarged through the production of cross-over children and/or mutation children. Cross-over children are derived by mixing and matching individual components of the parents, in what Varadi describes as mating. Formally, at each component $i$ of the child vector, the default crossover function randomly selects a value, $x_i$, from one of the two parents and assigns it to the $i^{th}$ component of the cross-over child. Mutation children are created by randomly perturbing the parent solutions, often via a Gaussian distribution.
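As a concrete (and deliberately minimal) sketch, the two operators might be coded as follows; the function names and the mutation scale $\sigma$ are my own illustrative choices, not Matlab's defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(p1, p2):
    """Uniform crossover: each component of the child is taken at random
    from one of the two parents ("mixing and matching")."""
    mask = rng.random(p1.shape) < 0.5
    return np.where(mask, p1, p2)

def mutate(parent, sigma=0.1):
    """Gaussian mutation: perturb every component of the parent."""
    return parent + rng.normal(0.0, sigma, parent.shape)

p1, p2 = np.array([1.0, 2.0, 3.0]), np.array([10.0, 20.0, 30.0])
child = crossover(p1, p2)   # each component equals the matching one of p1 or p2
mutant = mutate(p1)         # p1 plus Gaussian noise in each component
```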

Next, we evaluate the objective function at each potential solution in the current population. Only the fraction of individuals that provide the most optimal objective function values continues onward to form the parent population of the next generation.

The mGA, on the other hand, is a simplified version of the traditional GA and involves a Tournament phase for determining which parent solutions are worthy to mate. Those parents that are worthy to mate produce cross-over children, as described above and in http://cssanalytics.wordpress.com/.

For consistency amongst the varying GAs we have kept each population size and maximum number of generations fixed at 100 and 100d, respectively, where d is the number of input dimensions to the test function. Convergence is established either by achieving the maximum number of generations or if the average of the relative difference between a generation’s solution and best solution found over the most recent 50 generations is less than a tolerance, $tol=10^{-6}$.

A Brief Review of PSO

PSO is a gradient-free, stochastic optimization algorithm. Like GAs and mGAs, PSO begins by generating an initial set of candidate solutions, or particles, collectively called the swarm. At each iteration, the individual particles of the swarm navigate the search space based on each particle’s personal best position and the overall best position found by the swarm. Previous locations that minimize the objective function, $f$, therefore act as attractors towards which the swarm migrates. The position of each particle is defined in terms of its previous position, $x$, and its velocity, $v$. The velocity (or momentum factor) describes a particle’s tendency to move in a certain direction. The position and velocity are updated as:
$x_{n+1}=x_n+v_n$ and
$v_{n+1}=v_n+r_1(x_p-x_n) +r_2(x_s-x_n)$

where $r_1$ and $r_2$ are random numbers on the interval $(0,1)$, $x_p$ denotes the best previous position of that particle, and $x_s$ denotes the best previous position of the swarm.
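A minimal Python sketch of these update rules is below. One caveat: the bare update $v_{n+1}=v_n+r_1(\cdot)+r_2(\cdot)$ tends to oscillate, so I add an inertia weight $w$ as a damping factor; $w$ and all parameter values here are my own illustrative assumptions, not part of the equations above.

```python
import numpy as np

rng = np.random.default_rng(1)

def pso(f, dim, n_particles=20, iters=200, lo=-5.0, hi=5.0, w=0.7):
    """Minimal PSO sketch: each particle is pulled towards its personal
    best position and the swarm's best position, with inertia weight w."""
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # initial swarm positions
    v = np.zeros_like(x)                               # initial velocities
    p_best = x.copy()                                  # personal best positions
    p_val = np.array([f(xi) for xi in x])
    for _ in range(iters):
        s_best = p_best[np.argmin(p_val)]              # swarm best position
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + r1 * (p_best - x) + r2 * (s_best - x)  # velocity update
        x = x + v                                          # position update
        val = np.array([f(xi) for xi in x])
        better = val < p_val                           # refresh personal bests
        p_best[better], p_val[better] = x[better], val[better]
    return p_best[np.argmin(p_val)], float(p_val.min())

# Minimize the 2-D sphere function; the known optimum is 0 at the origin.
best_x, best_f = pso(lambda z: float(np.sum(z**2)), dim=2)
```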

For optimization of each test function I have chosen to use a swarm size of 20. For a fair comparison with the GAs, the stopping criteria for PSO is that the average relative difference of the best solution found over the most recent 50 iterations is less than a tolerance, $tol=10^{-6}$.

Lastly, note that since both the GA and PSO are stochastic optimization routines, the results presented below have each been averaged over 100 simulations.

Numerical Results
We analyze three quantities:

1. Accuracy: Average Percent Error = $\frac{|f^*-f_{opt}|}{|f_{opt}|}\times 100\%$, where $f_{opt}$ is the true, known, global optimum value.
2. Efficiency: Average number of Function Evaluations (FEs) required for convergence.
3. General Performance Assessment: GPA = FEs $\times (\%Err)^2$, in which accuracy holds twice the weight of efficiency; smaller GPA values indicate better performance.
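These quantities are trivial to compute; the run below (5000 evaluations, a found optimum of $-0.998$ against a true optimum of $-1$) is purely hypothetical and only illustrates the formulas:

```python
def percent_error(f_star, f_opt):
    """Accuracy: percent error of the found optimum against the known one."""
    return abs(f_star - f_opt) / abs(f_opt) * 100.0

def gpa(fes, pct_err):
    """General Performance Assessment: squaring the error gives accuracy
    twice the weight of efficiency; smaller values are better."""
    return fes * pct_err**2

err = percent_error(-0.998, -1.0)    # 0.2% error
score = gpa(5000, err)               # 5000 * 0.2^2 = 200
```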

Simulation results for each test function can be found in Table 1. From Table 1 we can draw the following conclusions:

• In general, MatlabGA outperforms both my own traditional GA and the mGA, with the exception of optimizing the 2-D Hump function and the 6-D Hartmann function. This is not surprising given the level of sophistication of Matlab’s built-in optimization algorithms as compared to the 1-day coding blitz of my GAs.
• The mGA is indeed the most efficient of the GAs, as measured by the number of FEs required to establish convergence. Of course improved efficiency comes at the cost of decreased accuracy – ultimately the user’s optimization goals will play a huge factor in algorithm selection.
• As expected, PSO outperforms all GAs for objective functions that are strictly convex, with minimal noise and a single global optimum.
• As Varadi suggests, PSO (or a hybrid PSO method) may be the preferred technique for several portfolio optimization problems.
• Objective surfaces such as the Schwefel Function and the Rastrigin Function are noisy and multi-modal, containing several local optima. For global optimization of both these functions, MatlabGA proves to outperform PSO.

For full test function details see:
http://www-optima.amp.i.kyoto-u.ac.jp/member/student/hedar/Hedar_files/TestGO.htm

Analysis of GAs and PSO

Posted in Optimization Algorithms

# Hybridization of DIRECT with Local Optimization Techniques

The hybridization of DIRECT with IF or BFGS operates by first allowing DIRECT to globally explore the search space. As described in my previous post, a large number of function evaluations would be required in order for DIRECT to find the global optimum to our desired level of accuracy. However, DIRECT converges rapidly to locations that are in the neighbourhood of the global optimum, providing a good starting position to execute either implicit filter or BFGS.

Based on experimentation, a budget of function evaluations equal to 200 times the dimension of the problem enables DIRECT to efficiently determine a reasonable starting position for IF or BFGS. Starting from the termination position of DIRECT, we then allow either IF or BFGS to fine-tune DIRECT’s global search, in what is called the precision phase. Assuming that the global optimum is contained within the boundaries of the feasible region, IF should rapidly converge to a highly accurate global optimum.

It is, however, possible for the position of the global optimum to be located outside of the domain provided to IF. In this case, IF is strictly confined to this region and will not be able to converge to this position. In theory, this could have a negative effect on the model fitting process. We therefore provide a second hybridization option by replacing IF with the unbounded optimization algorithm, BFGS, which is able to explore outside of the bound constraints provided to IF.

Posted in Optimization Algorithms

# DIviding RECTangle = DIRECT; A Partitioning Algorithm

DIRECT, which is shorthand for DIviding RECTangles, is a derivative-free, sampling optimization algorithm. More specifically, DIRECT is a partitioning algorithm that samples points in a given domain and then refines the search domain at each iteration based on the function values at the sampled points. Formally, DIRECT is a global optimization algorithm; that is, given enough function evaluations and a highly sensitive stopping criterion, DIRECT will always find the global optimum of a function within a given domain. Of course, this exhaustive search can be computationally expensive and thus impractical for optimization of $\mathcal{L_\beta}$. Therefore, we will use DIRECT to find a sub-region within the feasible region that contains the global optimum, while simultaneously reducing the chance of searching in regions that contain local optima.

The first step of DIRECT is to transform the user-supplied domain of the problem into a d-dimensional unit hypercube. Therefore, the feasible region as described by Finkel is
$\Omega = \{x \in \mathbb{R}^d: 0 \leq x_i \leq 1\}.$
The objective function is first evaluated at the center of this hypercube, $c_1$. DIRECT then divides the hypercube into smaller hyper-rectangles by evaluating the objective function at one-third of the distance, $\delta$, from the center in all coordinate directions,
$c_1 \pm \delta e_i, \ i=1,...,d.$
The first division is performed so that the region with the most optimal function value is given the largest space.

Once the first division is complete, the next step is to identify potentially optimal hyper-rectangles. Finkel provides the following definition for selecting potentially optimal hyper-rectangles.

DEFINITION: Let $\epsilon > 0$ be a positive constant and let $f_{min}$ be the current best function value. A hyper-rectangle $j$ is said to be potentially optimal if there exists $\hat{K} >0$ such that
$f(c_j)-\hat{K}d_j \leq f(c_i) -\hat{K}d_i, \ \forall \ i, \ \text{and}$
$f(c_j)-\hat{K}d_j \leq f_{min}- \epsilon |f_{min}|.$

Here, $c_j$ is the center of hyper-rectangle $j$ and $d_j$ is the distance from the center to the corresponding vertices. In general, a value of $\epsilon = 10^{-4}$ is used to ensure that the potential improvement over $f_{min}$ is non-trivial.

From the definition, we observe that a hyper-rectangle $i$ is potentially optimal if:
1. $f(c_i) \leq f(c_j)$ for all hyper-rectangles of equal size, $d_i=d_j$;
2. $d_i \geq d_k$ for all other hyper-rectangles $k$, and $f(c_i) \leq f(c_j)$ for all hyper-rectangles of equal size, $d_i=d_j$;
3. $d_i \leq d_k$ for all other hyper-rectangles $k$, and $f(c_i) = f_{min}$.
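A naive translation of the definition into code might look as follows: the first inequality yields an admissible range for $\hat{K}$, and the sufficient-decrease inequality is then tested at the most permissive admissible value. The function and its inputs are illustrative, not Finkel's implementation:

```python
import math

def potentially_optimal(d, fc, j, f_min, eps=1e-4):
    """Check the definition for hyper-rectangle j, where d[i] is the
    center-to-vertex distance and fc[i] the center value of rectangle i."""
    n = len(d)
    # Among rectangles of equal size, j must have the best center value.
    if any(fc[i] < fc[j] for i in range(n) if d[i] == d[j]):
        return False
    # First inequality: smaller rectangles bound K_hat from below,
    # larger rectangles bound it from above.
    k_lo = max(((fc[j] - fc[i]) / (d[j] - d[i]) for i in range(n) if d[i] < d[j]),
               default=0.0)
    k_hi = min(((fc[i] - fc[j]) / (d[i] - d[j]) for i in range(n) if d[i] > d[j]),
               default=math.inf)
    if k_hi <= 0 or k_lo > k_hi:
        return False               # no admissible K_hat > 0 exists
    if not math.isfinite(k_hi):
        return True                # K_hat can be taken arbitrarily large
    # Sufficient-decrease condition, tested at the largest admissible K_hat.
    return fc[j] - k_hi * d[j] <= f_min - eps * abs(f_min)
```

For example, with two rectangles of sizes $d=(0.5, 0.25)$ and center values $f=(1, 2)$, only the larger, better rectangle is potentially optimal.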

Once a hyper-rectangle is selected as being potentially optimal, DIRECT begins dividing the hyper-rectangle into smaller hyper-rectangles. First, the objective function is evaluated at points $c \pm \delta e_i$ for all longest dimensions i. Of course, if the hyper-rectangle is a hypercube, then the objective function is evaluated in all coordinate directions $e_i$, which is identical to the initial sampling step. We then divide the hyper-rectangle, with order determined by the objective function values returned during the sampling phase.

The three figures below provide a 2-dimensional visualization of how DIRECT samples and divides its search space. The bold regions are the potentially optimal rectangles, which are to be divided. The first iteration, (a), shows that the right-most rectangle is potentially optimal, and it is therefore sampled and divided along its vertical dimension. The second iteration, (b), shows that both the left-most rectangle and the right-most central rectangle are potentially optimal. Here we apply the second condition for selecting potentially optimal rectangles: the longest side of the left-most rectangle is greater than the longest side of all other rectangles, and therefore, by default, the function value at the center of the left-most rectangle is less than the function value of all other rectangles of equal size. Again, we sample and divide the rectangles according to their dimensions. This process of selecting potentially optimal rectangles, sampling, and dividing is repeated in the third iteration, (c), and continues for subsequent iterations until the stopping criterion is met.

After the initial sampling phase, the rectangle on the far right is potentially optimal, and is therefore divided into thirds along its longest side.

The central rectangle on the far right is potentially optimal. This rectangle is a cube, therefore division is identical to that of the initial sampling phase. The rectangle on the far left is also potentially optimal as it has a side which is longer than the sides of all other rectangles.

The cube in the center of the far-right rectangle is potentially optimal and is divided. In this case, the central cube on the bottom row, which has a side length greater than or equal to the side length of all other rectangles, is also potentially optimal and is therefore divided.

Posted in Optimization Algorithms

# Particle Swarm Optimization and General Pattern Search

My post-doc supervisor is using a hybridization of particle swarm with a general pattern search for his oil well placement and parameterization optimization problem. PSO and GPS are described below:

PSO
Particle Swarm Optimization is a gradient-free optimization method that uses a stochastic, meta-heuristic approach to finding the optimum value. The algorithm begins by generating an initial array of candidate solutions, called the swarm. The individual particles of the swarm then move through the search space based on each particle’s knowledge of where a better solution is located, as well as the swarm’s knowledge, as a whole, of where the optimum value is located. Previous successful solutions therefore act as attractors towards which the swarm migrates. The position of each particle is defined in terms of its previous position, $x$, and the velocity, $v$, of the particle, which describes the particle’s tendency to move in a certain direction. We therefore define the position and velocity as:
$x_{n+1} = x_{n} + v_{n}$ and
$v_{n+1} = v_n + c_1 r_1(x^{*}_p - x_n) + c_2 r_2(x^{*}_s - x_n),$
where $c_{1,2}$ are weighting parameters specified by the user, $r_{1,2}$ are random numbers on the interval $(0,1)$ which enforce exploration of the search space, $x^{*}_p$ denotes the best previous position of that particle, and $x^{*}_s$ denotes the best previous position of the swarm. Qualitatively, the velocity component of PSO is specified by three parameters:
1. Inertia: tendency of the particle to continue moving in its current direction.
2. Cognitive: tendency of the particle to move towards that particle’s best found solution.
3. Social: tendency of a particle to move towards the swarm’s best found solution.
In general, PSO handles bound constraint optimization problems, as well as linear and non-linear general constraints.

General Pattern Search
Though we will not be implementing a General Pattern Search (GPS) algorithm, the method is worth mentioning as Humphries has incorporated GPS along with PSO as a hybridized optimization technique. GPS begins at a user-provided initial point $x_0$. Unless otherwise specified, the first iteration searches with a mesh size of 1 in all $n$ directions, where $n$ is the dimension of the objective function. The objective function is computed at each of the mesh points until a value is found that is less than the value at the current point. Once this occurs, the algorithm moves to that point and begins polling again. After a successful poll, the mesh size is multiplied by a factor of 2. If a poll is unsuccessful, the algorithm remains at its current position and polls once again, this time with a mesh size that is 0.5 times the former mesh size. GPS can handle both bound- and general-constraint optimization problems. Humphries’ hybridization technique involves implementing PSO until we encounter an iteration that does not improve our solution. From there, we perform one step of GPS to poll for a better solution. If no better solution is found, we reduce the stencil size and poll again.
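The poll-expand-contract loop just described can be sketched in a few lines; I assume a compass (coordinate-direction) poll set, and the test function and starting point are illustrative:

```python
import numpy as np

def gps_minimize(f, x0, mesh=1.0, tol=1e-6, max_iter=10000):
    """Minimal GPS sketch: poll the 2n compass directions, double the mesh
    after a successful poll, and halve it after an unsuccessful one."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        if mesh < tol:
            break
        improved = False
        for i in range(x.size):
            for sign in (1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * mesh
                ft = f(trial)
                if ft < fx:        # successful poll: move to the better point
                    x, fx, improved = trial, ft, True
                    break
            if improved:
                break
        mesh = mesh * 2.0 if improved else mesh * 0.5
    return x, fx

# Example: a smooth quadratic with its minimum at (1.2, 1.2).
x_star, f_star = gps_minimize(lambda z: float(np.sum((z - 1.2)**2)), [0.0, 0.0])
```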

Posted in Optimization Algorithms

# BFGS – Gradient Approximation Methods

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is the most commonly used update strategy for implementing a quasi-Newton optimization technique. Newton’s method was first derived as a numerical technique for solving for the roots of a nonlinear equation. It does so by providing a linear approximation to the nonlinear equation at a given guess $x_k$. The linear approximation, as measured by the tangent to the curve at the point $x_k$, is then used to obtain a new guess to the solution, and the process repeats. Formally, the iterative method is expressed as
$x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)}.$
In terms of optimization, the ultimate goal is to find a stationary point $x^*$ such that $f'(x^*)=0$. Therefore, assuming the objective function is twice differentiable, Newton’s Method uses $f''$ in order to calculate the roots of $f'$, and the iterative process becomes
$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}.$

This iterative scheme can be generalized to multiple dimensions by replacing $f'(x)$ and $f''(x)$ with the gradient, $\nabla f(x)$, and the Hessian, $\nabla^2 f(x)$, respectively. The main advantages of Newton’s method are its robustness and rapid local quadratic rate of convergence. Unfortunately, this method requires computation and storage of the Hessian matrix $\nabla^2 f(x)$ and solving the resulting system of equations, which can be costly.
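As a quick sanity check of the scalar iteration above, here is a minimal sketch (the test function and starting point are my own illustrative choices):

```python
def newton_opt(fp, fpp, x0, tol=1e-10, max_iter=100):
    """1-D Newton iteration for a stationary point:
    x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = fp(x) / fpp(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = x^4 - 3x^2 + 2, so f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6.
# Starting from x0 = 2, the iteration converges to the minimizer sqrt(3/2).
x_star = newton_opt(lambda x: 4 * x**3 - 6 * x, lambda x: 12 * x**2 - 6, x0=2.0)
```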

The most rudimentary quasi-Newton method is the one-dimensional secant method, which approximates $f''(x_k)$ with the finite differencing scheme
$f''(x_k) \approx \frac{f'(x_k)-f'(x_{k-1})}{x_k-x_{k-1}}.$
The BFGS quasi-Newton approach can therefore be thought of as a generalization of the secant method.

The general form of a quasi-Newton optimization is as follows: given a point $x_k$, we approximate the Hessian matrix at $x_k$ by $\beta_k$ and obtain a search direction $p_k$ by solving
$\beta_k p_k = -\nabla f(x_k) \ \text{for} \ k=0,1,\ldots.$
Like Newton’s Method, we follow along this search direction to obtain a new guess
$x_{k+1}=x_k+\alpha_k p_k,$
where $\alpha_k$ is the step-size, often determined through a scalar optimization or line search procedure.

The simplest and most cost-efficient generalized quasi-Newton method is the method of steepest descent. This method takes $\beta_k$ equal to the identity matrix $I$ for all $k$, and therefore does not require costly computation and storage of $\beta_k$. As a result, the descent direction simplifies to $p_k = -\nabla f(x_k)$, the direction of steepest descent. This simplification, however, decreases the rate of convergence: the method of steepest descent converges only linearly.

For optimization of $\mathcal{L_\beta}$, we use Matlab’s built-in unconstrained optimization algorithm fminunc. fminunc uses a quasi-Newton optimization technique and includes multiple options for tackling a wide variety of optimization problems. For convenience, we will describe fminunc in terms of the options that we have selected for our specific optimization problem.

fminunc uses the BFGS method to update the Hessian matrix $\beta_k$ at each point $x_k$. The BFGS quasi-Newton algorithm can be summarized by the following steps:

1. Specify an initial $\beta_0$ and $x_0$.
2. For $k=0,1,2,\ldots$:
a) Stop if $x_k$ is optimal.
b) Solve $\beta_k p_k = -\nabla f(x_k)$ for the search direction $p_k$.
c) Use a line search to determine the step-size $\alpha_k$.
d) Update $x_{k+1} = x_k +\alpha_k p_k$.
e) Compute $\beta_{k+1}$ using the BFGS update.
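Putting these steps together, a minimal sketch might look as follows; I substitute a simple backtracking (Armijo) line search for fminunc's mixed quadratic/cubic procedure, and the test problem is illustrative:

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=500):
    """BFGS sketch following steps 1-2 above, with beta_0 = I (so the
    first step is steepest descent) and a backtracking line search."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                       # initial Hessian approximation
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # step (a): optimality test
            break
        p = np.linalg.solve(B, -g)           # step (b): search direction
        alpha = 1.0                          # step (c): backtracking line search
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        s = alpha * p                        # step (d): update the iterate
        x_new = x + s
        y = grad(x_new) - g                  # step (e): BFGS Hessian update
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
        x = x_new
    return x

# Example: minimize the convex quadratic f(x) = x0^2 + 10*x1^2.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
x_star = bfgs(f, grad, [3.0, -2.0])
```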

We have chosen to use the medium-scale implementation of fminunc, in which the user must supply an initial value for $x_0$. The user, however, does not need to supply the initial Hessian matrix $\beta_0$, as by default $\beta_0 = I$ under the medium-scale implementation. Therefore, the first iteration of fminunc follows the method of steepest descent. We define $x_k$ as being optimal if $|\nabla f(x_k)| < 10^{-6}$, where $|\nabla f(x_k)|$ is the magnitude of the gradient of the likelihood function, $|\nabla \mathcal{L_\beta}|$. The gradient function, $\nabla f(x_k)$, is derived using a forward finite differencing scheme. If $x_k$ is not optimal, then we proceed to solve for the search direction $p_k$. A mixed quadratic/cubic line search procedure (described below) is used to determine an appropriate step-size $\alpha_k$, which allows us to update to our new position $x_{k+1}$.

The BFGS update formula falls into a broader class of rank two update formulas that maintains both symmetry and positive definiteness (SPD) of the Hessian approximation. Preservation of SPD ensures that the search direction is always a descent direction. The general rank two update formula is given by
$\beta_{k+1} = \beta_k - \frac{(\beta_k s_k)(\beta_k s_k)^T}{s_k^T \beta_k s_k} + \frac{y_k y_k^T}{y_k^Ts_k} + \phi(s_k^T\beta_k s_k)v_k v_k^T,$
where $\phi$ is a scalar and
$s_k = x_{k+1}-x_{k}, \quad y_k = \nabla f(x_{k+1})- \nabla f(x_k), \quad v_k =\frac{ y_k}{y_k^T s_k}-\frac{\beta_k s_k}{s_k^T \beta_k s_k}.$
The BFGS Hessian update is obtained by setting $\phi = 0$, and therefore simplifies to
$\beta_{k+1} = \beta_k - \frac{(\beta_k s_k)(\beta_k s_k)^T}{s_k^T \beta_k s_k} + \frac{y_k y_k^T}{y_k^Ts_k}.$
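One can check numerically that this update preserves symmetry and satisfies the secant condition $\beta_{k+1} s_k = y_k$, a standard property of the BFGS update; the vectors below are random test data:

```python
import numpy as np

rng = np.random.default_rng(2)

def bfgs_update(B, s, y):
    """Rank-two BFGS update of the Hessian approximation (the phi = 0 case)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

B = np.eye(3)
s = rng.standard_normal(3)
y = s + 0.1 * rng.standard_normal(3)   # keep y^T s > 0 so SPD is preserved
B1 = bfgs_update(B, s, y)              # symmetric, and B1 @ s equals y
```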

LINE SEARCH PROCEDURE

A mixed quadratic/cubic line search procedure is used in order to determine a suitable step-size $\alpha_k$. The line search procedure operates by first perturbing the initial point $\lambda_1$, along the line of steepest descent in order to obtain a second point $\lambda_2$. The two points along with the initial gradient approximation are used to obtain a quadratic interpolation. A third point, $\lambda_3$, is obtained through extrapolation of the quadratic interpolant, and brackets the minimum such that $\lambda_1 < \lambda_2 < \lambda_3$. Using each of $\lambda _{i}$, for $i=\{1,2,3\},$ and the first gradient approximation, allows us to obtain a cubic interpolant, whose minimum, $\hat{\lambda}$, estimates the true minimum. We then proceed to bisect the bracket interval using two of $\lambda_{i}$ along with the minimum estimate $\hat{\lambda}$ to obtain a new cubic interpolant, and the process repeats.

Posted in Optimization Algorithms

# Implicit Filtering – Thank You C.T. Kelley

Implicit Filtering (IF) (designed by C.T. Kelley) is a sophisticated, deterministic pattern search method for bound constrained optimization. Like most pattern search algorithms, the evaluation of the objective function at iteration $i$ is used to determine the next cluster of points for iteration $i+1$. IF, however, combines sampling, which explores the search space by evaluating the objective function at multiple feasible points, with interpolation, which uses a first-order interpolant constructed via linear least squares and a quasi-Newton Hessian approximation to infer descent directions.

The sampling method is arranged on a stencil which, given a current iterate $x_c$, samples the objective function in all coordinate directions via $x_c \pm hv_i$. Here $v_i = (L_i-U_i)e_i$, where $L_i$ and $U_i$ represent the components of the lower and upper bounds of the feasible region, respectively, and $e_i$ is the unit coordinate vector. The scale, $h$, varies as the optimization progresses, according to $\{2^{-n}\}_{n=1}^7$. Therefore, if a sampling phase is unsuccessful in finding a more optimal position than the current incumbent point, the scale, and thus the stencil, is reduced by a factor of 0.5, and the sampling phase is repeated. By default, the optimization terminates after the scale has been reduced to $2^{-7}$. It is, however, the quasi-Newton iteration that provides IF with a clear advantage over a general pattern search algorithm, and it has been shown to improve overall efficiency and the ability to converge globally.
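The sampling phase and scale-reduction loop can be sketched as below. This omits the quasi-Newton inner loop entirely, and the objective and bounds are illustrative; Kelley's implementation is far more sophisticated:

```python
import numpy as np

def stencil_sample(f, x_c, h, lower, upper):
    """One sampling phase: evaluate f at x_c +/- h*v_i in every coordinate
    direction, where v_i is scaled by the width of the feasible region.
    Returns the best feasible point found and whether it improves on x_c."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x_c = np.asarray(x_c, float)
    best_x, best_f = x_c, f(x_c)
    for i in range(x_c.size):
        v = (upper[i] - lower[i]) * np.eye(x_c.size)[i]
        for trial in (x_c + h * v, x_c - h * v):
            if np.all((lower <= trial) & (trial <= upper)) and f(trial) < best_f:
                best_x, best_f = trial, f(trial)
    return best_x, best_f, not np.array_equal(best_x, x_c)

# Outer loop: repeat the phase until it fails, then halve the scale,
# with h running through {2^-1, ..., 2^-7} as described above.
f = lambda x: float(np.sum((x - 0.3)**2))
x = np.array([0.9, 0.9])
for n in range(1, 8):
    h = 2.0**-n
    improved = True
    while improved:
        x, fx, improved = stencil_sample(f, x, h, [0.0, 0.0], [1.0, 1.0])
```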

In terms of algorithm structure, IF can be broken into two main loops: the outer loop, which controls the pattern search, and the inner loop, which implements the quasi-Newton iteration at the current best position, $x_{min}$, determined at the end of each sampling step. Therefore, IF must be given two main stopping criteria, one for each loop.

The quasi-Newton iteration will terminate when the norm of the difference gradient $||\nabla f(x,v,h)|| \leq \tau h$. Here, $\tau = \xi\epsilon$, where $\xi$ is a scaling factor that attempts to eliminate discrepancies in the step size of the quasi-Newton iteration, which arise when the values of the objective function are either very small or very large. The default scale value is $\xi= 1.2|f(x_0)|$, where $x_0$ is the user-supplied initial starting point. $\epsilon$ is the tolerance level for terminating the quasi-Newton iteration. Highly sensitive stopping tolerances are required for optimization of $\mathcal{L_\beta}$ as the likelihood function contains many regions that are relatively flat, which can result in a premature termination of the optimization process if the stopping tolerance is too large.

Kelley states that IF works efficiently for functions which are noisy, non-smooth or random. Therefore in theory, IF should be well suited for optimization of our particular objective function, $\mathcal{L_\beta}$. The general thought is that the sampling phase, with a suitable step size, should step over the local minima, whereas the quasi-Newton iteration will establish efficient convergence in regions near the global optimum. For more information on IF’s options and specifications, please refer to Implicit Filtering by C.T. Kelley.

Posted in Optimization Algorithms

# Genetic Algorithm – A class of Evolutionary Algorithms

Originally, a Genetic Algorithm (GA) was proposed for optimization of the likelihood function $\mathcal{L_\beta}$. Like most evolutionary algorithms, a GA is a stochastic sampling method that repeatedly refines a population, or set, of candidate solutions. The method is based on the Darwinian notion of “Survival of the Fittest”: only the elite solutions (those that return the most optimal objective function values) are kept, while sub-optimal solutions are discarded.

The GA begins by randomly generating an initial population of candidate solutions, called the parents, that are contained in the feasible region of the objective function. In order to further explore this domain, the initial population of each generation is quadrupled through generation of cross-over children and mutation children. First, additional cross-over children are derived through combining individual components of the vectors of independent variables, which doubles the current population. At each coordinate of the child vector, the default crossover function randomly selects a value, $x_i$, from one of the two parents and assigns it to the cross-over child, providing a mix-and-match solution. Mutation children are then created by randomly perturbing both the parent solutions and cross-over children, via a Gaussian distribution. The final result is a population that is four times the size of the parent population.

Next we evaluate the corresponding $\mathcal{L_\beta}$ value of each of the candidate solutions in the current population in order to create the offspring for the next generation. The 25% of individuals in the current population that provide the most optimal objective function value will continue onward to form the parent population of the next generation.
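One generation of this quadruple-then-select scheme can be sketched as follows; the population size, mutation scale, and test function are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def ga_generation(parents, f, sigma=0.05):
    """One generation: crossover doubles the population, Gaussian mutation
    of every member doubles it again, and the fittest 25% survive."""
    n, dim = parents.shape
    children = np.empty_like(parents)
    for k in range(n):
        i, j = rng.choice(n, size=2, replace=False)
        mask = rng.random(dim) < 0.5                  # mix-and-match components
        children[k] = np.where(mask, parents[i], parents[j])
    doubled = np.vstack([parents, children])          # 2n individuals
    mutants = doubled + rng.normal(0.0, sigma, doubled.shape)
    quadrupled = np.vstack([doubled, mutants])        # 4n individuals
    fitness = np.array([f(x) for x in quadrupled])
    return quadrupled[np.argsort(fitness)[:n]]        # top 25% continue onward

# Example: 30 generations on the 2-D sphere function.
sphere = lambda x: float(np.sum(x**2))
pop = rng.uniform(-1, 1, size=(20, 2))
for _ in range(30):
    pop = ga_generation(pop, sphere)
best = min(sphere(x) for x in pop)
```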

A GA was originally proposed for optimization of $\mathcal{L_\beta}$ in their original GP model fitting procedure for several reasons. First, non-deterministic evolutionary algorithms such as GA are generally robust and do not require the use of gradient information. Moreover, using a large population size and maximum number of generations, which are scaled to the dimension of the problem, will help guarantee accurate convergence to the global optimum. In order to achieve both robustness and a high level of accuracy using a GA, however, requires excessively evaluating $\mathcal{L_\beta}$, which is generally inefficient.

For example, under this scheme, optimizing $\mathcal{L_\beta}$ for a 10-D function requires 4000 function evaluations (FEs) per generation for a maximum of 50 generations, potentially costing 200,000 FEs in total. Assuming we are operating on a single core, this single optimization process would translate to days of computation time. From this realistic example, the demand for a more efficient optimization technique is evident.

Posted in Optimization Algorithms