The Frontier API > The Task Runtime API |
![]()
The Task Runtime API is the portion of the Frontier API used to create tasks to run on a provider node. While running on a provider node, tasks are restricted to a Java sandbox with a security policy that denies access to such items as the network, the disk, and the graphical display and other I/O. The only communication tasks are permitted to make with the outside world is via the Task Runtime API. Through this API, task execution is initiated, task parameters are set, data elements are accessed, status and results are reported, and checkpoints are logged.
This section describes the role the Task Runtime API serves in
the execution of a task, outlines its various components and
interfaces, and explains how to write tasks that use the API
correctly and efficiently to take full advantage of the Frontier
platform. This includes what the format and contents of task
parameters, results, and checkpoints are, as well as how to use the
Task and
TaskContext interfaces to access and
manipulate them.
Tasks undergo a relatively complex lifecycle that can be described from two perspectives: that of the Frontier user and that of the provider. This section discusses these viewpoints.
A Frontier user initiates tasks via the Client API (which is described in more detail in The Client API section). Tasks are submitted to the Frontier server for execution, after which client applications can be used to observe status and obtain the results of tasks. In the meantime, however, several activities occur behind the scenes.
After a task specification and all elements required to run that task are submitted to the Frontier server, the task enters the queue for scheduling on one or more provider boxes. Once scheduled, the task specification and required elements are sent to the target provider. Subsequently—generally when the provider machine becomes idle—actual task execution occurs.
During the time that the task resides on the provider node, the Frontier compute engine periodically sends status reports, which the server routes back to the client sessions until the task is complete. However, the Frontier server may schedule the task redundantly on several nodes or 'migrate' it from one node to another. This may be done, for example, when a faster node is needed to meet deadlines or when one that has more RAM must be used to overcome resource limitations reported by the original node. To accomplish this, the server obtains a checkpoint from the node currently executing the task and sends that checkpoint as a task specification to another node. All of this is invisible to the client application, which merely observes status. When a task is completed, its final results persist until the user explicitly removes the task.
Task execution on a provider node consists of several phases.
At any given time, the state of a non-running task is
given entirely by its specification. Initially, this
is exactly equal to the original specification that the Frontier
server provided. When a task is to be executed, its primary class
(as given in the specification), which must implement the
com.parabon.runtime.Task
interface, will be instantiated and given a 'context'
object (implements the
com.parabon.runtime.TaskContext
interface) through which to communicate. Next,
parameters are set on the task object via mutator
methods named according to conventions described in
The Task Interface. Finally, the task's
run() method is called.
The run() method is executed
either until complete, until an exception is thrown,
or until the engine requests that the task stop.
While running, a task may request data elements named
in the original parameter list from the task's
context. When convenient, a task may also report
status and log checkpoints, which are dealt with or
ignored at the engine's discretion except for the last
status reported, which is always saved and reported
back to the server. Compute engines save some
checkpoints by replacing the task's specification with
the contents of the checkpoint.
If the run() method exits
normally, the last reported status is tagged as the
final result of the task and sent back to the server.
If the engine must cease task execution, it may simply
terminate execution or first call the
stop() method on the task
instance. The latter method gives the task a short
amount of time during which it may optionally report a
last-minute status and/or log one last checkpoint, and
then exit the run() method by
throwing a
com.parabon.runtime.TaskStoppedException.
After stopping a task, an engine may later attempt to restart it from something close to its state before termination by repeating the entire process of task execution, using the last reported checkpoint as the task specification rather than that originally sent from the server.
A task's primary class, which is instantiated when the task is
to be run, must implement the
com.parabon.runtime.Task interface.
The explicit portions of this interface include the
run() and stop()
methods. Other important, implicit portions of this
interface, however, come into play even before the
run() and stop()
methods. These include the constructor and parameter mutator
methods.
The constructor requirements are straightforward. A public
constructor that takes an instance of
com.parabon.runtime.TaskContext as
its single argument must exist. This context should be
saved, as it is the primary means of communication with the
outside world and is used to access data elements, report
status, and log checkpoints.
After a task is instantiated, its parameters are set via carefully named mutator methods, a mechanism similar to that used by, for example, JavaBeans. Client applications can specify arbitrarily named parameters, each of which is associated with a value representing one of a set of predefined types of 'parameter values' (detailed below). Each parameter value type has a set of mappings to more primitive types.
For each parameter given for a task with a name
xxx and type Y, a
public accessor should exist in the primary task class with
a name setXxx that takes a single
argument of type Z, where
Z is one of the types to which
Y maps. For each parameter, such a
mutator will be searched for, the corresponding value mapped
to the type expected by the mutator, and the mutator invoked
with the resulting mapped value. If several mutators
fitting the template exist, the mutator having an argument type
with the best mapping is picked. Mappings from type
Y to type Z include,
in order from best to worst, Y,
Y's type-specific primitive type mappings
(given below), and finally
ParameterValue. Note that ownership of
the argument to a mutator is not transferred to the task,
and no guarantee is made as to its integrity after the
mutator completes. That is, mutators must make copies of
parameter arguments and not try to keep references to the
actual values passed in. The exceptions, of course, are
immutable primitive types (e.g., int
or String).
Let's consider an example in which a parameter is
"numIterations" with a value of type
'integer' and a corresponding mutator method in the primary
task class with the signature "public void
setNumIterations(short x)". The engine would map
the value of the "numIterations" to a
short primitive and invoke this
method with the resulting value as an argument. If a method
with the same name that took an argument of type
int or
java.lang.Long existed, the engine
would try to call that method instead. On the other hand, a
similar method that took an argument of type
byte or
com.parabon.common.ParameterValue
would not be used, since a mapping to
short takes precedence over these
types for integer parameter values.
The parameter value types, the primary Java types used to represent them, and their primitive type mappings include the following:
boolean, represented
by
com.parabon.common.BooleanParameterValue,
which maps to boolean and
java.lang.Boolean
integer, represented
by
com.parabon.common.IntegerParameterValue,
which maps to long,
java.lang.Long,
int,
java.lang.Integer,
short,
java.lang.Short,
byte, and
java.lang.Byte
scalar (floating
point), represented by
com.parabon.common.ScalarParameterValue,
which maps to double,
java.lang.Double,
float, and
java.lang.Float
string, represented by
com.parabon.common.StringParameterValue,
which maps to java.lang.String
date, represented by
com.parabon.common.DateParameterValue,
which maps to java.util.Date
data element
reference, represented by
com.parabon.common.DataReferenceParameterValue,
which maps to
com.parabon.common.DataElementID
binary data,
represented by
com.parabon.common.BinaryParameterValue,
which maps to byte[]
array of parameter
values, represented by
com.parabon.common.ArrayParameterValue
structure (set of
name <-> parameter value pairs), represented by
com.parabon.common.NamedParameterMap
Once the primary task class has been instantiated and all of
its parameter values set via mutator methods, the task will
be started via the run() method. This
method is guaranteed to be called at most once for
any given instance of the task class. Subsequent task
invocations via checkpoints occur using new instances of the
task class.
While run() is executing,
the stop() method may be called to
request that a task stop 'gracefully'. This gives the task
a chance to quickly post one last status and/or log one last
checkpoint, after which the run()
method should throw a
com.parabon.runtime.TaskStoppedException.
the Frontier compute engine may also decide to stop the task more abruptly,
either without calling stop() or, if
the task does not exit gracefully, within some arbitrary
timeout after stop() is called.
The run() method can exit one of four
ways:
run() can terminate
normally, after which the task is considered complete
and the last status reported is marked as its final
result and sent back to the server.
A
com.parabon.runtime.TaskStoppedException
can be thrown, indicating a graceful exit from an
incomplete task after the stop()
method has been invoked. Throwing this exception
without stop() having been
invoked is an error and will result in a task being
marked complete with an exceptional result. On the
other hand, gracefully terminating an incomplete task
after stop() has been invoked but
without throwing a
com.parabon.runtime.TaskStoppedException
will be considered successful completion despite the
stop request, and no attempt will be made to restart
the task from a checkpoint at a later time.
A set of specific runtime exceptions represents such
conditions as resource limitations and spurious engine
errors. These include any number of exceptions
that are defined by the Frontier runtime environment (not the
task itself) and that extend the
com.parabon.runtime.TaskRuntimeException
class, as well as other throwables including
java.lang.OutOfMemoryError. If
such an exception is thrown, the task is considered
incomplete and its mode marked as aborted. The
Frontier server may attempt to restart the task on
another compute engine from the original specification
or the last checkpoint, or may propagate this
condition back to the client and cease further
attempts to run the task.
When any other exception is thrown by the task from
the run() method, it is
considered the 'natural' result of the task. Hence,
the task is marked as complete, and the exception is
propagated back to the server and eventually to the
client application.
When a task object is instantiated, its constructor is passed
an instance of
com.parabon.runtime.TaskContext. This
object is used for communication with the outside world.
Specifically, it can be used to obtain data elements, post
results, report progress, and log checkpoints.
Data elements required during the running of a task can be
obtained via a request to the context's
getDataElement() method, giving the ID
of the requested data element. As
DataElementID instances can only be
obtained as task parameters, required data elements must be
specified somewhere in a task's parameters. Note that
although it is possible to obtain IDs through other means
(i.e., directly implementing
com.parabon.common.DataElementID
within a task), a task should never attempt to do so.
Frontier does not guarantee that data elements identified
through such means will be available to a running task.
A requested data element is returned in the form of a
com.parabon.runtime.DataElement
object. This object can then be queried to obtain a
'black-box' InputStream as well as
other properties of the data element, which may or may not
be specified (for instance, total data length). The actual
data can then be read from the obtained stream. Each time
the getStream() method is called on a
data element, a new stream is returned, which will return
data starting from the beginning of the data element. Both
TaskContext.getDataElement() and
DataElement.getStream() can be called
as many times as desired over the course of a task's
execution, at the possible price of additional resource
usage.
Whenever convenient, a task can post results via its context object. Each new result posted replaces any and all previously reported results. Thus, at any given time, the 'current results' of a task are those last reported by the task. Earlier reported results may be thrown away and possibly never reported back to the server or the client application. The last results posted are considered to be the final results of a task.
Results themselves are the primary component of status and
are represented by a set of name <-> value pairs.
This is similar to the format of task parameters, encapsulated
within a single
com.parabon.common.NamedParameterMap
instance owned by the task. When results are posted, this
structure is copied and possibly persisted and/or reported
back to the server, and from there, possibly to the client
application. To put it simply, some reported intermediate
results may make it back to the client application, but in
general only the transmission of the final result should be
relied upon.
The frequency at which results should be posted depends strongly on both the nature of the task being run and the size of the results. An upper limit on reporting frequency can generally be set naturally, for instance, at the beginning of a central outer loop of a task when results are well defined and can easily be gathered. However, actual frequency should generally be much less than the upper bound. Posting results too often will slow down the running of a task, as some overhead related to the size of the results is incurred on every report. On the other hand, posting results too infrequently generally reduces the amount of status information available to a client application session during the course of task execution. In general, reasonably large results should be reported only once, at the end of a task, as the bandwidth cost incurred in transferring large intermediate results is generally greater than any benefit gained. Instead, intermediate results should generally be only a brief summary.
Checkpoints consist of modifications to a task specification
in the form of a new set of parameters and optionally a new
'primary class'. Each checkpoint, when logged, replaces the
relevant portions of either the previous checkpoint or the
original task specification. In particular, checkpoint
parameters do not augment previous parameter sets, but
replace them in their entirety. Similar to results,
checkpoint parameter sets are represented by a single
com.parabon.common.NamedParameterMap
instance owned by the task.
Similar to status, checkpoints should be reported whenever a task is in a sufficiently coherent state that its state can be summarized into a checkpoint and reported. The frequency tradeoff for checkpoints is somewhat more subtle, but even more important than that of status frequency. Although checkpoints tend not to be reported back to the server on a regular basis and so can be somewhat larger than results, they do require disk space on a provider machine and the size-related overhead incurred in logging. More frequent checkpoints result in more overhead—computational time lost with no benefit if a checkpoint is never needed for a restart. However, less frequent checkpoints result in more computation lost whenever a task is terminated and restarted.
In an extreme case, imagine a provider whose machine is only idle for thirty-minute spans and a task that only checkpoints every hour: this task will never make any progress. Such pathological cases may be somewhat rare, but nonetheless, a task that only reports checkpoints every hour will tend to lose a half-hour's worth of computation on average at every termination and restart. This problem can be overcome for many cases by a task that checkpoints when a stop is requested, but such functionality tends to be arbitrarily difficult to implement for many tasks. More significantly, it is not guaranteed that such an opportunity will always—or ever—be made available to a task, and so should not be relied upon.
Progress is a single piece of data that represents the amount of computation a given task has completed thus far and can be reported arbitrarily often by a task. The only restrictions are that it be a positive, nondecreasing (ideally strictly increasing) scalar over the lifetime of a task. That is, each time progress is reported, the value given should be no less than—and ideally greater than—the value given at the previous progress report, if any. Note that the lifetime of a task can span several checkpoint/restart instances of a task. That is, any progress reported after restarting from a checkpoint must be no less than that associated with that checkpoint when it was created by a previous instance of the same task. A task does not have to report progress, but it can aid in efficient scheduling and other task management issues and can provide a useful hint to the client application as well.
The current progress is defined as the value of the most recently reported progress and exists for a task at any point after it is first reported. The current progress is associated with each task status and checkpoint. Progress can be reported by itself or in conjunction with a result or checkpoint, which is equivalent to reporting progress by itself followed directly by posting a result or logging a checkpoint without an associated progress.
As progress is much cheaper to report to the engine than results or checkpoints, it makes sense to report it as often as is convenient, without the efficiency considerations involved in reporting results of logging checkpoints. Most progress reports will not be propagated back to the server, however, but merely saved internally by the engine runtime.
![]()