The Frontier API > The Task Runtime API

  Documentation Home 

The Task Runtime API

The Task Runtime API is the portion of the Frontier API used to create tasks to run on a provider node. While running on a provider node, tasks are restricted to a Java sandbox with a security policy that denies access to such items as the network, the disk, and the graphical display and other I/O. The only communication tasks are permitted to make with the outside world is via the Task Runtime API. Through this API, task execution is initiated, task parameters are set, data elements are accessed, status and results are reported, and checkpoints are logged.

This section describes the role the Task Runtime API serves in the execution of a task, outlines its various components and interfaces, and explains how to write tasks that use the API correctly and efficiently to take full advantage of the Frontier platform. This includes what the format and contents of task parameters, results, and checkpoints are, as well as how to use the Task and TaskContext interfaces to access and manipulate them.

 

The Task Lifecycle

Tasks undergo a relatively complex lifecycle that can be described from two perspectives: that of the Frontier user and that of the provider. This section discusses these viewpoints.

Frontier User Perspective

A Frontier user initiates tasks via the Client API (which is described in more detail in The Client API section). Tasks are submitted to the Frontier server for execution, after which client applications can be used to observe status and obtain the results of tasks. In the meantime, however, several activities occur behind the scenes.

After a task specification and all elements required to run that task are submitted to the Frontier server, the task enters the queue for scheduling on one or more provider boxes. Once scheduled, the task specification and required elements are sent to the target provider. Subsequently—generally when the provider machine becomes idle—actual task execution occurs.

During the time that the task resides on the provider node, the Frontier compute engine periodically sends status reports, which the server routes back to the client sessions until the task is complete. However, the Frontier server may schedule the task redundantly on several nodes or 'migrate' it from one node to another. This may be done, for example, when a faster node is needed to meet deadlines or when one that has more RAM must be used to overcome resource limitations reported by the original node. To accomplish this, the server obtains a checkpoint from the node currently executing the task and sends that checkpoint as a task specification to another node. All of this is invisible to the client application, which merely observes status. When a task is completed, its final results persist until the user explicitly removes the task.

Provider Perspective

Task execution on a provider node consists of several phases. At any given time, the state of a non-running task is given entirely by its specification. Initially, this is exactly equal to the original specification that the Frontier server provided. When a task is to be executed, its primary class (as given in the specification), which must implement the com.parabon.runtime.Task interface, will be instantiated and given a 'context' object (implements the com.parabon.runtime.TaskContext interface) through which to communicate. Next, parameters are set on the task object via mutator methods named according to conventions described in The Task Interface. Finally, the task's run() method is called.

The run() method is executed either until complete, until an exception is thrown, or until the engine requests that the task stop. While running, a task may request data elements named in the original parameter list from the task's context. When convenient, a task may also report status and log checkpoints, which are dealt with or ignored at the engine's discretion except for the last status reported, which is always saved and reported back to the server. Compute engines save some checkpoints by replacing the task's specification with the contents of the checkpoint.

If the run() method exits normally, the last reported status is tagged as the final result of the task and sent back to the server. If the engine must cease task execution, it may simply terminate execution or first call the stop() method on the task instance. The latter method gives the task a short amount of time during which it may optionally report a last-minute status and/or log one last checkpoint, and then exit the run() method by throwing a com.parabon.runtime.TaskStoppedException.

After stopping a task, an engine may later attempt to restart it from something close to its state before termination by repeating the entire process of task execution, using the last reported checkpoint as the task specification rather than that originally sent from the server.

 

The Task Interface

A task's primary class, which is instantiated when the task is to be run, must implement the com.parabon.runtime.Task interface. The explicit portions of this interface include the run() and stop() methods. Other important, implicit portions of this interface, however, come into play even before the run() and stop() methods. These include the constructor and parameter mutator methods.

Instantiation

The constructor requirements are straightforward. A public constructor that takes an instance of com.parabon.runtime.TaskContext as its single argument must exist. This context should be saved, as it is the primary means of communication with the outside world and is used to access data elements, report status, and log checkpoints.

Parameters

After a task is instantiated, its parameters are set via carefully named mutator methods, a mechanism similar to that used by, for example, JavaBeans™. Client applications can specify arbitrarily named parameters, each of which is associated with a value representing one of a set of predefined types of 'parameter values' (detailed below). Each parameter value type has a set of mappings to more primitive types.

For each parameter given for a task with a name xxx and type Y, a public accessor should exist in the primary task class with a name setXxx that takes a single argument of type Z, where Z is one of the types to which Y maps. For each parameter, such a mutator will be searched for, the corresponding value mapped to the type expected by the mutator, and the mutator invoked with the resulting mapped value. If several mutators fitting the template exist, the mutator having an argument type with the best mapping is picked. Mappings from type Y to type Z include, in order from best to worst, Y, Y's type-specific primitive type mappings (given below), and finally ParameterValue. Note that ownership of the argument to a mutator is not transferred to the task, and no guarantee is made as to its integrity after the mutator completes. That is, mutators must make copies of parameter arguments and not try to keep references to the actual values passed in. The exceptions, of course, are immutable primitive types (e.g., int or String).

Let's consider an example in which a parameter is "numIterations" with a value of type 'integer' and a corresponding mutator method in the primary task class with the signature "public void setNumIterations(short x)". The engine would map the value of the "numIterations" to a short primitive and invoke this method with the resulting value as an argument. If a method with the same name that took an argument of type int or java.lang.Long existed, the engine would try to call that method instead. On the other hand, a similar method that took an argument of type byte or com.parabon.common.ParameterValue would not be used, since a mapping to short takes precedence over these types for integer parameter values.

The parameter value types, the primary Java types used to represent them, and their primitive type mappings include the following:

Running and Stopping

Once the primary task class has been instantiated and all of its parameter values set via mutator methods, the task will be started via the run() method. This method is guaranteed to be called at most once for any given instance of the task class. Subsequent task invocations via checkpoints occur using new instances of the task class.

While run() is executing, the stop() method may be called to request that a task stop 'gracefully'. This gives the task a chance to quickly post one last status and/or log one last checkpoint, after which the run() method should throw a com.parabon.runtime.TaskStoppedException. the Frontier compute engine may also decide to stop the task more abruptly, either without calling stop() or, if the task does not exit gracefully, within some arbitrary timeout after stop() is called.

The run() method can exit one of four ways:

 

The Task Context

When a task object is instantiated, its constructor is passed an instance of com.parabon.runtime.TaskContext. This object is used for communication with the outside world. Specifically, it can be used to obtain data elements, post results, report progress, and log checkpoints.

Accessing Data Elements

Data elements required during the running of a task can be obtained via a request to the context's getDataElement() method, giving the ID of the requested data element. As DataElementID instances can only be obtained as task parameters, required data elements must be specified somewhere in a task's parameters. Note that although it is possible to obtain IDs through other means (i.e., directly implementing com.parabon.common.DataElementID within a task), a task should never attempt to do so. Frontier does not guarantee that data elements identified through such means will be available to a running task.

A requested data element is returned in the form of a com.parabon.runtime.DataElement object. This object can then be queried to obtain a 'black-box' InputStream as well as other properties of the data element, which may or may not be specified (for instance, total data length). The actual data can then be read from the obtained stream. Each time the getStream() method is called on a data element, a new stream is returned, which will return data starting from the beginning of the data element. Both TaskContext.getDataElement() and DataElement.getStream() can be called as many times as desired over the course of a task's execution, at the possible price of additional resource usage.

Posting Results and Reporting Status

Whenever convenient, a task can post results via its context object. Each new result posted replaces any and all previously reported results. Thus, at any given time, the 'current results' of a task are those last reported by the task. Earlier reported results may be thrown away and possibly never reported back to the server or the client application. The last results posted are considered to be the final results of a task.

Results themselves are the primary component of status and are represented by a set of name <-> value pairs. This is similar to the format of task parameters, encapsulated within a single com.parabon.common.NamedParameterMap instance owned by the task. When results are posted, this structure is copied and possibly persisted and/or reported back to the server, and from there, possibly to the client application. To put it simply, some reported intermediate results may make it back to the client application, but in general only the transmission of the final result should be relied upon.

The frequency at which results should be posted depends strongly on both the nature of the task being run and the size of the results. An upper limit on reporting frequency can generally be set naturally, for instance, at the beginning of a central outer loop of a task when results are well defined and can easily be gathered. However, actual frequency should generally be much less than the upper bound. Posting results too often will slow down the running of a task, as some overhead related to the size of the results is incurred on every report. On the other hand, posting results too infrequently generally reduces the amount of status information available to a client application session during the course of task execution. In general, reasonably large results should be reported only once, at the end of a task, as the bandwidth cost incurred in transferring large intermediate results is generally greater than any benefit gained. Instead, intermediate results should generally be only a brief summary.

Logging Checkpoints

Checkpoints consist of modifications to a task specification in the form of a new set of parameters and optionally a new 'primary class'. Each checkpoint, when logged, replaces the relevant portions of either the previous checkpoint or the original task specification. In particular, checkpoint parameters do not augment previous parameter sets, but replace them in their entirety. Similar to results, checkpoint parameter sets are represented by a single com.parabon.common.NamedParameterMap instance owned by the task.

Similar to status, checkpoints should be reported whenever a task is in a sufficiently coherent state that its state can be summarized into a checkpoint and reported. The frequency tradeoff for checkpoints is somewhat more subtle, but even more important than that of status frequency. Although checkpoints tend not to be reported back to the server on a regular basis and so can be somewhat larger than results, they do require disk space on a provider machine and the size-related overhead incurred in logging. More frequent checkpoints result in more overhead—computational time lost with no benefit if a checkpoint is never needed for a restart. However, less frequent checkpoints result in more computation lost whenever a task is terminated and restarted.

In an extreme case, imagine a provider whose machine is only idle for thirty-minute spans and a task that only checkpoints every hour: this task will never make any progress. Such pathological cases may be somewhat rare, but nonetheless, a task that only reports checkpoints every hour will tend to lose a half-hour's worth of computation on average at every termination and restart. This problem can be overcome for many cases by a task that checkpoints when a stop is requested, but such functionality tends to be arbitrarily difficult to implement for many tasks. More significantly, it is not guaranteed that such an opportunity will always—or ever—be made available to a task, and so should not be relied upon.

Reporting Progress

Progress is a single piece of data that represents the amount of computation a given task has completed thus far and can be reported arbitrarily often by a task. The only restrictions are that it be a positive, nondecreasing (ideally strictly increasing) scalar over the lifetime of a task. That is, each time progress is reported, the value given should be no less than—and ideally greater than—the value given at the previous progress report, if any. Note that the lifetime of a task can span several checkpoint/restart instances of a task. That is, any progress reported after restarting from a checkpoint must be no less than that associated with that checkpoint when it was created by a previous instance of the same task. A task does not have to report progress, but it can aid in efficient scheduling and other task management issues and can provide a useful hint to the client application as well.

The current progress is defined as the value of the most recently reported progress and exists for a task at any point after it is first reported. The current progress is associated with each task status and checkpoint. Progress can be reported by itself or in conjunction with a result or checkpoint, which is equivalent to reporting progress by itself followed directly by posting a result or logging a checkpoint without an associated progress.

As progress is much cheaper to report to the engine than results or checkpoints, it makes sense to report it as often as is convenient, without the efficiency considerations involved in reporting results of logging checkpoints. Most progress reports will not be propagated back to the server, however, but merely saved internally by the engine runtime.