Originally a Genetic Algorithm (GA) was proposed for optimization of the likelihood function . Like most evolutionary algorithms, a GA is a stochastic sampling method that repeatedly refines a population or set of candidate solutions. The method is based on the Darwinian notion of “Survival of the Fittest” in which only the elite solutions, (those that return the most optimal objective function values) are kept, while sub-optimal solutions are discarded.
The GA begins by randomly generating an initial population of candidate solutions, called the parents, that are contained to the feasible region of the objective function. In order to further explore this domain, the initial population of each generation is quadrupled through generation of cross-over children and mutation children. First, additional cross-over children are derived through combining individual components of the vectors of independent variables, which doubles the current population. At each coordinate of the child vector, the default crossover function randomly selects a value, , from one of the two parents and assigns it to the cross-over child, providing a mix-and-match solution. Mutation children are then created by randomly perturbing both the parent solutions and cross-over children, via a Gaussian distribution. The final result is a population that is four times the size of the parent population.
Next we evaluate the corresponding value of each of the candidate solutions in the current population in order to create the offspring for the next generation. The 25% of individuals in the current population that provide the most optimal objective function value will continue onward to form the parent population of the next generation.
A GA was originally proposed for optimization of $ in their original GP model fitting procedure for several reasons. First, non-deterministic evolutionary algorithms such as GA are generally robust and do not require the use of gradient information. Moreover, using a large population size and maximum number of generations, which are scaled to the dimension of the problem, will help guarantee accurate convergence to the global optimum. In order to achieve both robustness and a high level of accuracy using a GA, however, requires excessively evaluating , which is generally inefficient.
For example, optimizing of a 10-D function requires under this scheme 4000 function evaluations (FE) per generation, for a maximum of 50 generations, which could potentially cost 200,000 FE in total. Assuming we are operating on a single core, then this single optimization process would translate to days of computation time. From this realistic example, the demand for a more efficient optimization technique is evident.