WRENCH 101 {#wrench-101}

@WRENCHUserDoc

User Documentation
@endWRENCHDoc @WRENCHDeveloperDoc
Developer Documentation
@endWRENCHDoc @WRENCHInternalDoc
Internal Documentation
@endWRENCHDoc

WRENCH 101 is a page and a set of documents that provide detailed information for each of WRENCH's classes of users, with higher-level content than the API Reference. For instructions on how to install WRENCH, run a first example, or create a basic WRENCH-based simulator, please refer to the respective sections of the documentation.

@WRENCHUserDoc

This User 101 guide describes all the WRENCH simulation components (building blocks) necessary to build a custom simulator and run simulation scenarios.


10,000-ft view of a WRENCH-based simulator # {#wrench-101-simulator-10000ft}

A WRENCH-based simulator can be as simple as a single main() function that first creates a platform to be simulated (the hardware) and a set of services that run on the platform (the software). These services correspond to software that knows how to store data, perform computation, and many other useful things that real-world cyberinfrastructure services can do.

The simulator then needs to create a workflow (or a set of workflows) to be executed, which consists of a set of compute tasks each with input and output files, and thus data-dependencies. A special service is then created, called a Workflow Management System (WMS), that will be in charge of executing the workflow on the platform. (This service must have been implemented by a WRENCH "developer", i.e., a user that has used the Developer API). The set of input files to the workflow, if any, are staged on the platform at particular storage locations.

The simulation is then launched via a single call. When this call returns, the WMS has terminated (typically after completing the execution of the workflow, or failing to execute it) and the simulation output can be analyzed.

Blueprint for a WRENCH-based simulator # {#wrench-101-simulator-blueprint}

Here are the steps that a WRENCH-based simulator typically follows:

-# Create and initialize a simulation -- In WRENCH, a user simulation is defined via the wrench::Simulation class. An instance of this class must be created, and the wrench::Simulation::init() method must be called to initialize the simulation (and parse WRENCH-specific and SimGrid-specific command-line arguments). Two useful such arguments are --help-wrench, which displays help messages about optional WRENCH-specific command-line arguments, and --help-simgrid, which displays help messages about optional SimGrid-specific command-line arguments.

-# Instantiate a simulated platform -- This is done with the wrench::Simulation::instantiatePlatform() method which takes as argument a SimGrid virtual platform description file. Any SimGrid simulation must be provided with the description of the platform on which an application/system execution is to be simulated (compute hosts, clusters of hosts, storage resources, network links, routers, routes between hosts, etc.)

-# Instantiate services on the platform -- The wrench::Simulation::add() method is used to add services to the simulation. Each class of service is created with a particular constructor, which also specifies host(s) on which the service is to be started. Typical kinds of services include compute services, storage services, network proximity services, and file registry services.

-# Create at least one workflow -- This is done by creating an instance of the wrench::Workflow class, which has methods to manually add tasks and files to the workflow application, but also methods to import workflows from standard workflow description files (DAX and JSON). If there are input files to the workflow's entry tasks, these must be staged on instantiated storage services.

-# Instantiate at least one WMS per workflow -- At least one of the services instantiated must be a wrench::WMS instance, i.e., a service that is in charge of executing the workflow, as implemented by a WRENCH "developer" using the Developer API. Associating a workflow to a WMS is done via the wrench::WMS::addWorkflow() method.

-# Launch the simulation -- This is done via the wrench::Simulation::launch() call which first sanity checks the simulation setup and then launches all simulated services, until all WMS services have exited (after they have completed or failed to complete workflows).

-# Process simulation output -- The wrench::Simulation::getOutput() method returns an object that is a collection of time-stamped traces of simulation events. These traces can be processed/analyzed at will.

Available services # {#wrench-101-simulator-services}

To date, these are the (simulated) services that can be instantiated on the simulated platform:

Customizing Services # {#wrench-101-customizing-services}

Each service is customizable by passing to its constructor a property list, i.e., a key-value map where each key is a property and each value is a string. Each service defines a property class. For instance, the wrench::Service class has an associated wrench::ServiceProperty class, the wrench::ComputeService class has an associated wrench::ComputeServiceProperty class, and so on at all levels of the service class hierarchy.

The API documentation for these property classes explains what each property means, what possible values are, and what default values are. Some properties have more to do with what the service can or should do when in operation. For instance, the wrench::BatchComputeServiceProperty class defines a wrench::BatchComputeServiceProperty::BATCH_SCHEDULING_ALGORITHM property, which specifies what scheduling algorithm a batch service should use for prioritizing jobs. All property classes inherit from the wrench::ServiceProperty class, and one can explore that hierarchy to discover all possible (and there are many) service customization opportunities.

Finally, each service exchanges messages on the network with other services (e.g., a WMS service sends a "do some work" message to a compute service). The size in bytes, or payload, of all messages can be customized similarly to the properties, i.e., by passing a key-value map to the service's constructor. For instance, the wrench::ServiceMessagePayload class defines a wrench::ServiceMessagePayload::STOP_DAEMON_MESSAGE_PAYLOAD property which can be used to customize the size, in bytes, of the control message sent to the service daemon (that is the entry point to the service) to tell it to terminate. Each service class has a corresponding message payload class, and the API documentation for these message payload classes details all messages whose payload can be customized.
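The idea of a property list (or a message payload list) layered over defaults can be sketched in a few lines of standalone C++. This is only an illustration of the concept, not WRENCH's implementation: PropertyList, applyOverrides(), and propertyAsDouble() are hypothetical names, and rejecting unknown keys is an assumption made for this sketch.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a property list: property name -> string value.
using PropertyList = std::map<std::string, std::string>;

// Returns the defaults with every user-supplied entry applied on top.
// An unknown key is rejected so that typos do not go unnoticed
// (an assumption of this sketch).
PropertyList applyOverrides(const PropertyList &defaults,
                            const PropertyList &overrides) {
    PropertyList result = defaults;
    for (const auto &entry : overrides) {
        auto it = result.find(entry.first);
        if (it == result.end()) {
            throw std::invalid_argument("Unknown property: " + entry.first);
        }
        it->second = entry.second;
    }
    return result;
}

// Values are strings, so numeric properties are converted when they are read.
double propertyAsDouble(const PropertyList &properties, const std::string &key) {
    auto it = properties.find(key);
    assert(it != properties.end() && "property must exist");
    return std::stod(it->second);
}
```

A service-like object would call applyOverrides() once in its constructor and then read individual values, e.g., a message payload in bytes, through propertyAsDouble().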

Customizing logging # {#wrench-101-logging}

When running a WRENCH simulator you will notice that there is quite a bit of logging output. While logging output can be useful to inspect visually the way in which the simulation proceeds, it often becomes necessary to disable it. WRENCH's logging system is a thin layer on top of SimGrid's logging system, and as such is controlled via command-line arguments. The simple example in examples/simple-example is executed as follows, assuming the working directory is examples/simple-example:

./wrench-simple-example-cloud  platform_files/cloud_hosts.xml workflow_files/genome.dax

A first way to modify logging is to disable colors, which is useful when redirecting output to a file. This is done with the --wrench-no-color command-line option, which can appear anywhere in the argument list, for instance:

./wrench-simple-example-cloud  --wrench-no-color platform_files/cloud_hosts.xml workflow_files/genome.dax

Disabling all logging is done with the --wrench-no-log option:

./wrench-simple-example-cloud  --wrench-no-log platform_files/cloud_hosts.xml workflow_files/genome.dax

The above --wrench-no-log option is a simple wrapper around the sophisticated SimGrid logging capabilities (it is equivalent to the SimGrid argument --log=root.threshold:critical). Details on these capabilities are displayed when passing the --help-logs command-line argument to your simulator. In a nutshell, particular "log categories" can be toggled on and off. Log category names are attached to *.cpp files in the WRENCH and SimGrid code. Using the --help-log-categories command-line argument shows the entire log category hierarchy. For instance, there is a log category called wms for the WMS, i.e., for logging messages in the wrench::WMS class, and a log category called simple_wms for logging messages in the wrench::SimpleWMS class, which inherits from wrench::WMS. These messages are thus logging output produced by the WMS in the simple example. They can be enabled while other messages are disabled as follows:

./wrench-simple-example-cloud   platform_files/cloud_hosts.xml workflow_files/genome.dax --log=root.threshold:critical --log=simple_wms.threshold=debug --log=wms.threshold=debug

The --help-logs option displays information on the way SimGrid logging works. See the full SimGrid logging documentation for all details.
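To make the notion of per-category thresholds concrete, here is a toy model in standalone C++. It is a sketch under stated assumptions and not SimGrid's implementation: the LogThresholds class and the Level enumeration are hypothetical, the fallback from an unconfigured category straight to root is a simplification of a real category hierarchy, and the default threshold of "info" is assumed. The sketch accepts both `:` and `=` before the level only because both spellings appear in the commands above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Severity levels, ordered from most to least verbose.
enum class Level { Debug = 0, Info = 1, Warning = 2, Critical = 3 };

// Toy model of per-category thresholds; NOT SimGrid's implementation.
class LogThresholds {
public:
    // Accepts arguments of the form --log=<category>.threshold:<level>
    // (or with '=' instead of ':') and ignores everything else.
    // Later arguments override earlier ones.
    explicit LogThresholds(const std::vector<std::string> &args) {
        const std::string prefix = "--log=";
        const std::string marker = ".threshold";
        for (const auto &arg : args) {
            if (arg.compare(0, prefix.size(), prefix) != 0) continue;
            const std::string::size_type pos = arg.find(marker, prefix.size());
            if (pos == std::string::npos) continue;
            const std::string::size_type sep = pos + marker.size();
            if (sep >= arg.size() || (arg[sep] != ':' && arg[sep] != '=')) continue;
            const std::string category = arg.substr(prefix.size(), pos - prefix.size());
            thresholds_[category] = parseLevel(arg.substr(sep + 1));
        }
    }

    // A message is shown when its level reaches the category's threshold.
    // Categories without their own threshold fall back to "root".
    bool isShown(const std::string &category, Level level) const {
        auto it = thresholds_.find(category);
        if (it == thresholds_.end()) it = thresholds_.find("root");
        const Level threshold = (it == thresholds_.end()) ? Level::Info : it->second;
        return static_cast<int>(level) >= static_cast<int>(threshold);
    }

private:
    static Level parseLevel(const std::string &name) {
        if (name == "debug") return Level::Debug;
        if (name == "info") return Level::Info;
        if (name == "warning") return Level::Warning;
        assert(name == "critical" && "unsupported level in this sketch");
        return Level::Critical;
    }

    std::map<std::string, Level> thresholds_;
};
```

With the arguments of the last command above, this model shows debug messages for the wms and simple_wms categories and hides everything below "critical" for all other categories.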

Analyzing Simulation Output # {#wrench-101-simulation-output}

Once the wrench::Simulation::launch() method has returned, it is possible to process time-stamped traces to analyze simulation output. The wrench::Simulation::getOutput() method returns an instance of wrench::SimulationOutput. This object has a templated wrench::SimulationOutput::getTrace() method to retrieve traces for various information types. For instance, the call

simulation.getOutput().getTrace<wrench::SimulationTimestampTaskCompletion>()

returns a vector of time-stamped task completion events. The classes that implement time-stamped events are all classes named wrench::SimulationTimestampSomething, where Something is pretty self-explanatory (e.g., TaskCompletion).
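The pattern of a templated accessor that returns one trace per event type can be illustrated with a small standalone container. This is a sketch of the pattern only, not the actual wrench::SimulationOutput class: ToyOutput, Timestamped, TaskCompletion, and FileCopyCompletion are hypothetical names, and the real time-stamped event classes carry more information.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical event payloads; the real classes carry more information.
struct TaskCompletion { std::string task_id; };
struct FileCopyCompletion { std::string file_id; };

// An event of type T together with the simulated date at which it occurred.
template <typename T>
struct Timestamped {
    double date;
    T content;
};

// Toy container with one trace per event type, retrieved through a
// templated accessor. NOT the actual wrench::SimulationOutput class.
class ToyOutput {
public:
    template <typename T>
    void record(double date, T content) {
        assert(date >= 0.0);
        traceFor<T>().push_back(Timestamped<T>{date, std::move(content)});
    }

    // Returns the (possibly empty) trace of events of type T.
    template <typename T>
    const std::vector<Timestamped<T>> &getTrace() {
        return traceFor<T>();
    }

private:
    struct TraceBase { virtual ~TraceBase() = default; };
    template <typename T>
    struct Trace : TraceBase { std::vector<Timestamped<T>> events; };

    // Finds the trace for T, creating it on first use.
    template <typename T>
    std::vector<Timestamped<T>> &traceFor() {
        std::unique_ptr<TraceBase> &slot = traces_[std::type_index(typeid(T))];
        if (!slot) slot.reset(new Trace<T>());
        return static_cast<Trace<T> &>(*slot).events;
    }

    std::unordered_map<std::type_index, std::unique_ptr<TraceBase>> traces_;
};
```

Because the event type is the lookup key, asking for one kind of event never requires filtering through the others.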

Measuring Energy Consumption # {#wrench-101-energy}

WRENCH leverages SimGrid's energy plugin, which provides accounting for computing time and dissipated energy in the simulated platform. SimGrid's energy plugin requires host pstate definitions (levels of performance, CPU frequency) in the XML platform description file. The following is a list of the information currently provided by the plugin:

Note: The energy plugin is NOT enabled by default in WRENCH simulations. To enable the plugin, the --activate-energy command-line option should be provided when running a simulator.

@endWRENCHDoc

@WRENCHDeveloperDoc

This Developer 101 guide describes WRENCH's architectural components necessary to build your own WMS (Workflow Management System).


10,000-ft view of a simulated WMS # {#wrench-101-WMS-10000ft}

A Workflow Management System (WMS), i.e., the software that makes all decisions and takes all actions for executing a workflow, is implemented in WRENCH as a simulated process. This process has a main() function that goes through a simple loop as follows:

  while ( workflow_execution_hasnt_completed_or_failed ) {
    // interact with services
    // wait for an event
  }

Blueprint for a WMS in WRENCH # {#wrench-101-WMS-blueprint}

A WMS implementation in WRENCH must derive from the wrench::WMS class, and typically goes through the following steps:

-# Get references to running services: The wrench::WMS base class implements a set of methods named wrench::WMS::getAvailableComputeServices(), wrench::WMS::getAvailableStorageServices(), etc. These methods return sets of services that can be used by the WMS to execute its workflow.

-# Acquire information about the services: Some service classes provide methods to get information about the capabilities of the services. For instance, a wrench::ComputeService has a wrench::ComputeService::getNumHosts() method that makes it possible to find out how many compute hosts the service has access to in total. A wrench::StorageService has a wrench::StorageService::getFreeSpace() method to find out how many bytes of free space are available to it. Note that these methods actually involve communication with the service, and thus incur (simulated) overhead.

-# Go through a main loop: The heart of the WMS's execution consists in going through a loop until the workflow is executed or has failed to execute. This loop consists of two main steps:

Interacting with services # {#wrench-101-WMS-services}

Each service type provides its own API. For instance, a network proximity service provides methods to query the service's host distance databases. The API Reference provides all necessary documentation, which also explains which methods are synchronous and which are asynchronous (in which case some event will likely occur in the future). However, the WRENCH developer will find that many methods that one would expect are nowhere to be found. For instance, the compute services do not have methods for compute job submissions!

The rationale for the above is that many methods need to be asynchronous so that the WMS can use services concurrently. For instance, a WMS could submit a compute job to two distinct compute services asynchronously, and then wait for the service which completes its job first and cancel the job on the other service. Exposing this asynchronicity to the WMS would require that the WRENCH developer use data structures to perform the necessary bookkeeping of ongoing service interactions, and process incoming control messages from the services on the (simulated) network or register many callbacks. Instead, WRENCH provides managers. One can think of managers as separate threads that handle all asynchronous interactions with services, and which have been implemented for your convenience to make service interactions easy.

For now there are two possible managers: a job manager (class wrench::JobManager) and a data movement manager (class wrench::DataMovementManager). The base wrench::WMS class provides two methods for instantiating and starting these managers: wrench::WMS::createJobManager() and wrench::WMS::createDataMovementManager(). Creating these managers is typically the first thing a WMS does. Each manager has its own documented API, and is discussed further in sections below.
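The scenario mentioned above (submit to two services, keep whichever finishes first, cancel the other) shows what a manager spares the WMS from doing by hand. The following standalone sketch models it with a queue of completion events ordered by simulated date. It is not wrench::JobManager: ToyJobManager, Completion, raceTwoServices(), the service names, and the job durations are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// A pending completion notification: which service finished, and when.
struct Completion {
    double date;
    std::string service;
    bool operator>(const Completion &other) const { return date > other.date; }
};

// Toy job manager; NOT wrench::JobManager. It records submissions and hands
// back completion events in simulated-time order.
class ToyJobManager {
public:
    // Asynchronous: returns immediately, the completion is delivered later.
    void submit(const std::string &service, double now, double duration) {
        assert(duration >= 0.0);
        pending_.push(Completion{now + duration, service});
    }

    bool hasPending() const { return !pending_.empty(); }

    // Conceptually blocks until the earliest pending job completes.
    Completion waitForNext() {
        assert(!pending_.empty());
        Completion next = pending_.top();
        pending_.pop();
        return next;
    }

    // Drops every job that has not completed yet; returns how many.
    std::size_t cancelAll() {
        const std::size_t count = pending_.size();
        while (!pending_.empty()) pending_.pop();
        return count;
    }

private:
    // Min-heap on the completion date.
    std::priority_queue<Completion, std::vector<Completion>,
                        std::greater<Completion>> pending_;
};

// Submits the same work to two services and keeps the first result.
std::string raceTwoServices(ToyJobManager &manager) {
    manager.submit("service_A", 0.0, 120.0);  // hypothetical durations
    manager.submit("service_B", 0.0, 90.0);
    const Completion winner = manager.waitForNext();
    manager.cancelAll();  // the slower submission is no longer needed
    return winner.service;
}
```

All bookkeeping of in-flight work lives in the manager, so the WMS logic in raceTwoServices() stays three steps long.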

Copying workflow data # {#wrench-101-WMS-data}

The WMS may need to explicitly copy files from one storage service to another storage service, e.g., to improve data locality when executing workflow tasks. File copies are accomplished through the data movement manager, which provides two methods:

Both methods take an optional wrench::FileRegistryService argument, in which case they will also update this file registry service with a new entry once the file copy has been completed.

Running workflow tasks # {#wrench-101-WMS-tasks}

A workflow comprises tasks, and a WMS must pack tasks into jobs to execute them. There are two kinds of jobs in WRENCH: wrench::PilotJob and wrench::StandardJob. A pilot job (sometimes called a "placeholder job") is a concept that is mostly relevant for batch scheduling. In a nutshell, it is a job that allows late binding of tasks to resources. It is submitted to a compute service (provided that service supports pilot jobs), and when a pilot job starts it just looks to the WMS like a short-lived compute service to which standard jobs can be submitted.

The most common kind of jobs is the standard job. A standard job is a unit of execution by which a WMS tells a compute service to do things. More specifically, in its most complete form, a standard job specifies:

Any of the above can actually be empty, and in the extreme a standard job does nothing.

Standard jobs and pilot jobs are created via the job manager (see multiple versions of the wrench::JobManager::createStandardJob() and wrench::JobManager::createPilotJob() methods). The job manager thus acts as a job factory, and provides job management methods:

Workflow execution events # {#wrench-101-WMS-events}

Because the WMS, in part via the managers, performs many asynchronous operations, it needs to act as an event handler. This is done by calling the wrench::WMS::waitForAndProcessNextEvent() method implemented by the base wrench::WMS class. A call to this method blocks until some event occurs. The possible event classes all derive from the wrench::WorkflowExecutionEvent class. A WMS can override a method to react to each possible event (the default method does nothing but print some log message). At the time this documentation is being written, these overridable methods are:

Each method above takes an event object as a parameter, and each event class offers several methods to inspect the meaning of the event. In the case of failure, the event includes a wrench::FailureCause object, which can be accessed to understand the root cause of the failure.
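The dispatch mechanism described in this section, a base class that waits for an event and calls an overridable handler, can be sketched in standalone C++ as follows. This is not the wrench::WMS class: ToyWMSBase, ToyEvent, CountingWMS, and the handler names onJobCompleted() and onJobFailed() are hypothetical, and only the name waitForAndProcessNextEvent() is borrowed from the text above. The sketch's default handlers do nothing at all.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical event type; the real event classes are richer.
struct ToyEvent {
    enum class Kind { JobCompleted, JobFailed } kind;
    std::string job_name;
    std::string failure_cause;  // only meaningful for JobFailed
};

// Toy base class (NOT wrench::WMS): dispatches each event to a virtual
// handler whose default implementation does nothing.
class ToyWMSBase {
public:
    virtual ~ToyWMSBase() = default;

    void post(ToyEvent event) { events_.push_back(std::move(event)); }
    bool hasEvents() const { return !events_.empty(); }

    // Takes the oldest event and calls the matching handler.
    void waitForAndProcessNextEvent() {
        assert(!events_.empty());
        const ToyEvent event = events_.front();
        events_.pop_front();
        switch (event.kind) {
            case ToyEvent::Kind::JobCompleted: onJobCompleted(event); break;
            case ToyEvent::Kind::JobFailed: onJobFailed(event); break;
        }
    }

protected:
    virtual void onJobCompleted(const ToyEvent &) {}
    virtual void onJobFailed(const ToyEvent &) {}

private:
    std::deque<ToyEvent> events_;
};

// A concrete WMS overrides only the handlers it cares about.
class CountingWMS : public ToyWMSBase {
public:
    int completed = 0;
    std::string last_failure;

protected:
    void onJobCompleted(const ToyEvent &) override { ++completed; }
    void onJobFailed(const ToyEvent &event) override {
        last_failure = event.failure_cause;
    }
};
```

The main loop of such a WMS reduces to interacting with services and then calling waitForAndProcessNextEvent() until the workflow is done.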

Exceptions # {#wrench-101-WMS-exceptions}

Most methods in the WRENCH Developer API throw wrench::WorkflowExecutionException instances when exceptions occur. These are exceptions that correspond to failures during the simulated workflow execution (i.e., errors that would occur in a real-world execution). Each such exception contains a wrench::FailureCause object, which can be accessed to understand the root cause of the execution failure.
Other exceptions (e.g., std::invalid_argument, std::runtime_error) are thrown as well, and are used for detecting misuses of the WRENCH API or internal WRENCH errors.
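The distinction between the two families of exceptions can be illustrated with a standalone sketch. It does not use the WRENCH classes: ToyExecutionException, ToyFailureCause, storeFile(), and tryStore() are hypothetical, and the "out of space" scenario is just an assumed example of a failure that would also occur in a real-world execution.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical failure cause; the real class hierarchy is richer.
struct ToyFailureCause {
    std::string description;
};

// Simulated execution failure (NOT wrench::WorkflowExecutionException).
class ToyExecutionException : public std::runtime_error {
public:
    explicit ToyExecutionException(std::shared_ptr<ToyFailureCause> cause)
        : std::runtime_error(describe(cause)), cause_(std::move(cause)) {}

    std::shared_ptr<ToyFailureCause> getCause() const { return cause_; }

private:
    static std::string describe(const std::shared_ptr<ToyFailureCause> &cause) {
        assert(cause && "a failure cause is required");
        return cause->description;
    }

    std::shared_ptr<ToyFailureCause> cause_;
};

// Pretends to store a file on a storage resource with limited free space.
void storeFile(double file_size, double free_space) {
    if (file_size < 0.0) {
        // API misuse: reported with a standard exception.
        throw std::invalid_argument("storeFile(): negative file size");
    }
    if (file_size > free_space) {
        // Simulated failure: carries a cause that the caller can inspect.
        throw ToyExecutionException(std::make_shared<ToyFailureCause>(
            ToyFailureCause{"not enough free space"}));
    }
}

// Returns a short label describing how the call ended.
std::string tryStore(double file_size, double free_space) {
    try {
        storeFile(file_size, free_space);
        return "ok";
    } catch (const ToyExecutionException &e) {
        return "simulated failure: " + e.getCause()->description;
    } catch (const std::invalid_argument &e) {
        return std::string("API misuse: ") + e.what();
    }
}
```

A WMS would typically recover from the first kind of exception (e.g., by picking another storage service), whereas the second kind points to a bug in the calling code.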

Schedulers for decision-making # {#wrench-101-WMS-schedulers}

A large part of what a WMS does is make decisions. It is often a good idea for decision-making algorithms (often simply called "scheduling algorithms") to be re-usable across multiple WMS implementations, or plug-and-play-able for a single WMS implementation. For this reason, the wrench::WMS constructor takes as parameters two objects (or null pointers if not needed):

Although not required, it is possible to implement most (or even all) decision-making in these two objects so as to have a clean separation of concerns between the decision-making part of the WMS and the rest of its functionality. This kind of design is used in the simple example provided in the examples/simple-example directory.
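The separation between a WMS and its pluggable decision-making objects follows the classic strategy pattern, sketched here in standalone C++. The sketch uses a single scheduler object rather than two, and none of its names come from WRENCH: ToyScheduler, RoundRobinScheduler, ToyWMS, pickService(), and placeTask() are hypothetical, as is the built-in fallback used when a null pointer is passed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decision-making interface: picks a service for a task.
class ToyScheduler {
public:
    virtual ~ToyScheduler() = default;
    virtual std::string pickService(const std::string &task,
                                    const std::vector<std::string> &services) = 0;
};

// One interchangeable policy: cycle through the services in order.
class RoundRobinScheduler : public ToyScheduler {
public:
    std::string pickService(const std::string &,
                            const std::vector<std::string> &services) override {
        assert(!services.empty());
        return services[next_++ % services.size()];
    }

private:
    std::size_t next_ = 0;
};

// Toy WMS (NOT wrench::WMS): decisions are delegated to the scheduler if
// one was provided; a null pointer means "use built-in behavior".
class ToyWMS {
public:
    explicit ToyWMS(std::unique_ptr<ToyScheduler> scheduler)
        : scheduler_(std::move(scheduler)) {}

    std::string placeTask(const std::string &task,
                          const std::vector<std::string> &services) {
        assert(!services.empty());
        if (scheduler_) return scheduler_->pickService(task, services);
        return services.front();  // built-in fallback: always the first service
    }

private:
    std::unique_ptr<ToyScheduler> scheduler_;
};
```

Swapping RoundRobinScheduler for another ToyScheduler subclass changes the placement policy without touching ToyWMS, which is the reuse benefit described above.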

Logging # {#wrench-101-WMS-logging}

It is often desirable for the WMS to print log output to the terminal. This is easily accomplished using the wrench::WRENCH_INFO, wrench::WRENCH_DEBUG, and wrench::WRENCH_WARN macros, which are used just like printf. Each of these macros corresponds to a different logging level in SimGrid. See the SimGrid logging documentation for all details.

Furthermore, one can change the color of the log messages with the wrench::TerminalOutput::setThisProcessLoggingColor() method, which takes as parameter a color specification:

wrench::TerminalOutput::COLOR_BLACK
wrench::TerminalOutput::COLOR_RED
wrench::TerminalOutput::COLOR_GREEN
wrench::TerminalOutput::COLOR_YELLOW
wrench::TerminalOutput::COLOR_BLUE
wrench::TerminalOutput::COLOR_MAGENTA
wrench::TerminalOutput::COLOR_CYAN
wrench::TerminalOutput::COLOR_WHITE

@endWRENCHDoc

@WRENCHInternalDoc

This Internal 101 guide is intended for users who want to contribute code to WRENCH to extend its capabilities.


Make sure to read the User 101 Guide and the Developer 101 Guide to understand how WRENCH works from those perspectives. The largest portion of the WRENCH code base is the "Internal" code base, and for now, the way to go is to look at the API Reference. Do not hesitate to contact the WRENCH team with questions about the internals of WRENCH if you want to contribute.
Of course, forking the WRENCH repository and creating pull requests is the preferred way to contribute to WRENCH as an Internal developer.

Here is an in-progress list of miscellaneous items of interest:

@endWRENCHDoc