Origin Developer's Guide
1.0.2

The Origin Distributed Grid Models

The Origin distributed grid models use the Frontier® Grid Platform to distribute the work of an evolutionary run across the machines of a Frontier Grid. An evolutionary run is broken into units of work called tasks, which are sent to a Frontier Grid Server for scheduling. The server forwards tasks to host computers running the Frontier Grid Engine for execution. When a task completes, its results are returned to the Origin application on the local machine. A properly written Origin application can run without modification either conventionally on a local machine or as a distributed application on a Frontier Grid.

The grid is best suited to evolutionary problems that have some combination of:

Because of the latency inherent in distributing tasks over the grid, an evolutionary run with a small population may run faster locally. Switching between the conventional and grid models usually requires changing only a few parameter settings, so you can test your evolutionary model by running Origin locally and then scale up to a much larger population on the Frontier grid.

Origin supports two grid models: master/slave, where individual fitness evaluations are distributed over the grid, and Opportunistic Evolution (OE), which combines remote generational evolution with an asynchronous steady-state local model. The master/slave model can be used with almost any generational evolutionary model and is the simplest grid model to work with: a properly written generational model produces the same results in master/slave mode as it does when run locally. The OE model makes optimal use of the grid and is well suited to very large populations or long evolutionary runs.

The Master/Slave Model

In the Origin Master/Slave model, individual fitness evaluations are distributed over the grid, while selection and breeding are performed locally. To increase performance, multiple individuals are sent to each Frontier task, but each individual's fitness value is computed independently. The computed fitness values are returned to the Origin application, and selection and breeding proceed as usual. Origin then launches another set of Frontier tasks to compute the next generation's fitness.
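The flow described above can be sketched in a few lines of Python. This is an illustration only, not Origin's implementation: `fitness`, `evaluate_batch`, and `master_slave_generation` are hypothetical names, the bit-counting fitness is a toy, and a local thread pool stands in for Frontier tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(individual):
    # Toy fitness (hypothetical): number of 1 bits in the genome.
    return sum(individual)


def evaluate_batch(batch):
    # Stand-in for one remote task: each individual in the batch
    # is scored independently of the others.
    return [fitness(ind) for ind in batch]


def master_slave_generation(population, batch_size, max_workers=4):
    # Split the population into batches, one batch per "task".
    batches = [population[i:i + batch_size]
               for i in range(0, len(population), batch_size)]
    # Every batch must finish before selection and breeding can proceed.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate_batch, batches))
    # Flatten the per-batch results back into population order.
    return [f for batch_result in results for f in batch_result]


population = [[0, 1, 1], [1, 1, 1], [0, 0, 0], [1, 0, 0], [1, 1, 0]]
fitnesses = master_slave_generation(population, batch_size=2)
print(fitnesses)
```

Because each fitness value depends only on its own individual, the batch size changes how the work is divided but not the values returned, which is why a generational model can give the same results locally and in master/slave mode.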

You can change most Origin generational evolutionary programs to a master/slave model with the following parameter file:

    parent.0 = original-parameter-file
    parent.1 = ${origin.parameter.dir}/com/parabon/ec/eval/master.origin.params

This loads the original program parameters and then the Origin master/slave model parameters. Any parameter setting specific to the master/slave model should also go in this file. You can then switch between master/slave and local mode by specifying the new parameter file instead of original-parameter-file on the Origin command line.

Because of the fixed overhead of distributing tasks over the Frontier grid, and contention for compute resources with other grid users, each task should be sent enough individuals that evaluating them takes at least several minutes. On the other hand, sending fewer individuals to each task increases the number of tasks used to evaluate each generation, which spreads evaluations over a greater number of grid machines. The best balance between the amount of work per task and the number of tasks is usually determined empirically from earlier evolutionary runs of the same problem.
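As a starting point before any empirical tuning, the trade-off can be estimated with simple arithmetic. The helper below is a hypothetical sketch: `suggest_job_size` is not part of Origin, and the 300-second target is only one assumed reading of "several minutes".

```python
import math


def suggest_job_size(population_size, seconds_per_individual,
                     target_task_seconds=300.0):
    # Individuals per task needed to reach the target task duration.
    job_size = max(1, math.ceil(target_task_seconds / seconds_per_individual))
    # Never ask for more individuals than the population holds.
    job_size = min(job_size, population_size)
    # Number of tasks needed to cover one generation at that job size.
    tasks_per_generation = math.ceil(population_size / job_size)
    return job_size, tasks_per_generation


job_size, tasks = suggest_job_size(population_size=50_000,
                                   seconds_per_individual=0.5)
print(job_size, tasks)  # 600 individuals per task, 84 tasks per generation
```

Lowering the target duration yields more, smaller tasks and wider distribution; raising it amortizes the fixed per-task overhead over more work.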

Origin provides two parameters to control the number of individuals sent to each task: job-size (or eval.masterproblem.max-jobs-per-slave) and eval.masterproblem.max-data-per-slave. job-size sets the maximum number of individuals ("jobs" in ECJ terminology) per task. Base it on the average time required to compute an individual's fitness, and reduce it as fitness evaluation times increase. eval.masterproblem.max-data-per-slave is best suited to GP problems where individuals grow in size over generations: as individuals get larger and take longer to evaluate, Origin reduces the number of individuals sent to each task. If grid tasks are failing due to "out of memory" exceptions, decrease this value.
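How the two limits interact can be illustrated with a greedy packing sketch. This text does not describe Origin's actual packing algorithm, so the function below is an assumption made for illustration: `pack_tasks` is a hypothetical name, and the sizes are made-up serialized sizes in bytes.

```python
def pack_tasks(individual_sizes, max_jobs_per_task, max_data_per_task):
    # individual_sizes: serialized size in bytes of each individual, in order.
    # Returns a list of tasks, each a list of individual indices.
    tasks, current, current_bytes = [], [], 0
    for index, size in enumerate(individual_sizes):
        over_count = len(current) >= max_jobs_per_task
        over_data = bool(current) and current_bytes + size > max_data_per_task
        if over_count or over_data:
            # Close the current task and start a new one.
            tasks.append(current)
            current, current_bytes = [], 0
        # An individual larger than the data limit still gets its own task.
        current.append(index)
        current_bytes += size
    if current:
        tasks.append(current)
    return tasks


sizes = [400, 400, 400, 900, 100, 100, 100]
print(pack_tasks(sizes, max_jobs_per_task=3, max_data_per_task=1000))
# [[0, 1], [2], [3, 4], [5, 6]]
```

In this sketch the data limit closes tasks early once individuals grow, so the number of individuals per task falls without job-size being changed.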

The Opportunistic Evolution Model

Opportunistic Evolution (OE) combines remote generational evolution with a local steady-state evolutionary model. OE distributes groups of individuals to remote tasks, where these subpopulations are evolved for a specified number of generations. The remote tasks return their final individuals to the local Origin application, which merges them into its population, then selects and breeds a new subset of individuals to be evolved by a remote task.

Unlike the Master/Slave model, where all remote tasks must complete before the next generation is evaluated, OE is asynchronous: as each task returns, the OE application merges that task's individuals into the population, breeds new individuals, and launches another task.
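The asynchronous loop can be sketched as follows. This is a simplified illustration under stated assumptions, not Origin's implementation: individuals are plain numbers whose value is their fitness, random sampling stands in for selection and breeding, returned individuals simply replace the current worst ones, a local thread pool stands in for the grid, and all function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def evolve_remotely(subpopulation, generations, seed):
    # Stand-in for a remote task: perturb each individual for a few
    # generations, keeping a change only when it does not lower the value.
    rng = random.Random(seed)
    result = list(subpopulation)
    for _ in range(generations):
        result = [max(x, x + rng.uniform(-1.0, 1.0)) for x in result]
    return result


def opportunistic_evolution(population, job_size, total_tasks,
                            remote_generations=10, max_active=3, seed=0):
    rng = random.Random(seed)
    launched = 0
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        active = set()
        while launched < total_tasks or active:
            # Keep the pool full: launch a task whenever a slot is free.
            while launched < total_tasks and len(active) < max_active:
                subset = rng.sample(population, job_size)
                active.add(pool.submit(evolve_remotely, subset,
                                       remote_generations, launched))
                launched += 1
            # Handle whichever task finishes first; do not wait for the rest.
            done, active = wait(active, return_when=FIRST_COMPLETED)
            for future in done:
                # Merge: returned individuals replace the current worst ones.
                population.sort()
                population[:job_size] = future.result()
    return population


rng = random.Random(42)
start = [rng.uniform(0, 10) for _ in range(20)]
final = opportunistic_evolution(list(start), job_size=5, total_tasks=8)
print(min(start), min(final))
```

The key difference from the master/slave sketch of waiting for every task is the `FIRST_COMPLETED` wait: no task ever blocks on a slower sibling.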

To change an existing steady-state evolutionary model to an Opportunistic Evolution model, use a parameter file like this:

    parent.0 = original-parameter-file
    parent.1 = ${origin.parameter.dir}/ec/steadystate/steadystate.origin.params
    generations = local-evaluations
    slave.generations = remote-generations

slave.generations is the number of generations each remote task should evolve, while generations is the number of times a full population's worth of individuals is returned from remote tasks. For example, if generations is 10 and the population size is 50,000, the Origin application completes after 500,000 individuals have been returned from remote evolution.
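The arithmetic in that example can be checked directly. The task count below additionally assumes that every task returns exactly job-size individuals, which is a simplification; `oe_run_totals` is a hypothetical helper, and 256 is the OE default for job-size.

```python
import math


def oe_run_totals(generations, population_size, job_size):
    # Individuals that must come back from remote tasks before the run ends.
    individuals_returned = generations * population_size
    # Approximate task count if every task carries job_size individuals.
    tasks = math.ceil(individuals_returned / job_size)
    return individuals_returned, tasks


print(oe_run_totals(generations=10, population_size=50_000, job_size=256))
# (500000, 1954)
```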

OE also supports the standard steady-state evolutionary model parameters, as well as the same parameters as Master/Slave for controlling the number of individuals sent to each task.

Origin Grid Parameters

These are the Origin parameters that control the master/slave and opportunistic evolution grid models:

Grid Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| eval.masterproblem | Enables Master/Slave or OE models; must be set to com.parabon.ec.eval.FrontierMasterProblem. | None |
| job-size | Maximum number of individuals sent to a task. | 1024 for Master/Slave, 256 for OE |
| eval.masterproblem.max-jobs-per-slave | Maximum number of individuals sent to a task. | ${job-size} |
| eval.masterproblem.max-data-per-slave | Maximum size of serialized individual data sent to a task. | 1000000 bytes |
| exch.max-active-slaves | Maximum number of concurrently executing slave tasks. | 1000 |
| slave.generations | For OE, the number of generations to evolve each remote population. | 10 |
| select.tournament.size | For OE using the default tournament selection, the number of individuals to evaluate in the tournament. Increasing this parameter tends to make remote evolution more elitist, which can cause premature convergence. | 3 |
| steady.deselector.0.size | For OE using the default tournament deselector, the number of individuals to evaluate in the tournament. Usually you want this to be greater than select.tournament.size. | 7 |
| exch.job-name | Frontier grid job name. | ${user.name}_origin |
| exch.keep-job | Don't delete Frontier job at end of evolutionary run. | False |
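To make the two tournament sizes concrete, here is a minimal sketch of tournament selection and deselection. It assumes contestants are drawn at random with replacement and is not Origin's or ECJ's code; the function names are hypothetical, and the sizes 3 and 7 are the defaults listed above.

```python
import random


def tournament_select(fitnesses, size, rng):
    # Draw `size` random contestants and return the index of the fittest.
    contestants = [rng.randrange(len(fitnesses)) for _ in range(size)]
    return max(contestants, key=lambda i: fitnesses[i])


def tournament_deselect(fitnesses, size, rng):
    # Same tournament, but the least fit contestant is chosen for replacement.
    contestants = [rng.randrange(len(fitnesses)) for _ in range(size)]
    return min(contestants, key=lambda i: fitnesses[i])


rng = random.Random(0)
fitnesses = [rng.random() for _ in range(1000)]
parent = tournament_select(fitnesses, size=3, rng=rng)
victim = tournament_deselect(fitnesses, size=7, rng=rng)
print(fitnesses[parent], fitnesses[victim])
```

A larger tournament makes the extreme contestant more likely to be near the top (for selection) or bottom (for deselection) of the population, which is the selection pressure the descriptions of these two parameters refer to.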