wfGenes¶
wfGenes (workflow generator) is a tool that generates workflow models for various workflow management systems (WMSs) by parsing a single workflow configuration file called WConfig. Within the wfGenes framework, workflows are defined in a human-readable format, JSON or YAML, with a concise structure; wfGenes then performs dependency analysis and automatic code generation for the chosen WMS. This approach lets users try different WMSs depending on the application requirements and the available computing environment. Here we demonstrate this capability by constructing workflows for four different WMSs from a WConfig (configuration file) using wfGenes. The following lines briefly introduce these systems, and the table below summarizes their main features. For more information about these systems and their potential, refer to the hyperlinks.
FireWorks is an open-source WMS that cleanly separates data storage from computation. It uses MongoDB to provide a powerful workflow management system for workers distributed across one or more clusters, and its database back-end offers a strong query mechanism.
SimStack is a commercial tool with a graphical user interface (GUI) and a set of customizable blocks (IPs), deployed in the nanomaterial simulation domain.
Dask is an open-source library for parallel computing in Python. With various parallel array data types built on top of NumPy and Pandas arrays, Dask is suitable for memory-intensive computation and can scale to many nodes. Dask's built-in task scheduler coordinates the execution sequence between tasks and exploits parallelism in a lazy manner. This enables users to generate their task graph prior to simulation with a minimal amount of code modification.
Parsl is a parallel Python library for scaling Python scripts across many cores. With various kinds of executors, Parsl enables users to accelerate their applications and achieve extreme scalability using Parsl-specific syntax and decorators.
All of the tools discussed can scale applications from personal laptops to supercomputers.
| WMS | Input Language | GUI | Post-Processing | Fault tolerance | License |
|---|---|---|---|---|---|
| FireWorks | YAML/JSON/Python | No | Monitoring/Database Query | Relaunching fizzled subworkflow (fireworks) | BSD |
| SimStack | XML | Yes | Monitoring | Relaunching fizzled subworkflow (WaNos) | Commercial |
| Dask | Python | No | Monitoring | None | BSD |
| Parsl | Python | No | Monitoring | Lazy failure and check-pointing | Apache |
How it works¶
To get started with wfGenes, a WConfig file should be prepared based on the workflow graph. WConfig is an abstract description of the inputs, outputs and function names that wfGenes parses. Apart from the configuration file, additional arguments provide control over the automation process and output generation.
- workflowconfig
Path to the workflow configuration file that describes the workflow in YAML/JSON format, i.e. input/output files, modules and argument names. The default is workflow.yaml.
- inputpath
Input path of the workflow, i.e. the directory from which input data is fetched. The default is the current working directory.
- wms
The target workflow management system. Possible values are FireWorks, SimStack, Dask and Parsl.
In the following snippets, two simple workflows are described in YAML format to present the WConfig structure. During the modeling phase, wfGenes validates the user's input against the WConfig schema to ensure a successful generation phase.
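As a rough illustration, the three arguments above could be exposed on the command line as in the sketch below. It uses Python's standard `argparse`; the flag spellings (`--workflowconfig`, `--inputpath`, `--wms`), the `build_parser` helper and the choice to make `--wms` mandatory are assumptions made for this example and may differ from the actual wfGenes command-line interface.

```python
import argparse
import os

# Hypothetical parser that mirrors the documented arguments; not the wfGenes source.
WMS_CHOICES = ["FireWorks", "SimStack", "Dask", "Parsl"]


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a wfGenes-style CLI")
    parser.add_argument("--workflowconfig", default="workflow.yaml",
                        help="Path to the WConfig file (YAML or JSON)")
    parser.add_argument("--inputpath", default=os.getcwd(),
                        help="Directory from which input data is fetched")
    # Assumed to be mandatory here because no default is documented.
    parser.add_argument("--wms", choices=WMS_CHOICES, required=True,
                        help="Target workflow management system")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--wms", "Dask"])
print(args.workflowconfig, args.inputpath, args.wms)
```

Restricting `--wms` with `choices` rejects unsupported targets before any parsing of the configuration file starts.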
workflow_name: First workflow
nodes:
- name: node_1
  id: 1
  tasks:
  - func: [source_1, module_1]
    inputs: [input1]
    outputs: [output1_id1]
    kwargs: {}
  - func: [source_1, module_2]
    inputs: [input1, output1_id1]
    outputs: [output2_id1]
    kwargs: {}
- name: node_2
  id: 2
  tasks:
  - func: [source_1, module_3]
    inputs: [input2, input1]
    outputs: [output1_id2]
    kwargs: {}
- name: node_3
  id: 3
  tasks:
  - func: [source_2, module_1]
    inputs: [output2_id1, output1_id2]
    outputs: [output1_id3]
    kwargs: {}

workflow_name: Second workflow
nodes:
- name: node_1
  id: 1
  tasks:
  - func: [source_1, module_1]
    inputs: [input1]
    outputs: [output1_id1]
    kwargs: {}
  - func: [source_1, module_2]
    inputs: [input2, output1_id1]
    outputs: [output2_id1, output3_id1]
    kwargs: {}
- name: node_2
  id: 2
  tasks:
  - func: [source_1, module_3]
    inputs: [input1, output2_id1]
    outputs: [output1_id2]
    kwargs: {}
- name: node_3
  id: 3
  tasks:
  - func: [source_2, module_1]
    inputs: [input1, output3_id1]
    outputs: [output1_id3]
    kwargs: {}
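
The validation step mentioned before the snippets can be pictured as a structural check on the parsed configuration. The sketch below is a hand-written check in plain Python, not the wfGenes validator: the required task keys are inferred from the snippets, the real WConfig schema is not shown in this document and may be stricter, and `validate_wconfig` is a hypothetical helper. The sample configuration is given in JSON, one of the two supported formats, and deliberately misspells `outputs` as `output`.

```python
import json

# Required task keys inferred from the snippets; an assumption, not the real schema.
TASK_KEYS = {"func", "inputs", "outputs", "kwargs"}


def validate_wconfig(config):
    """Return a list of readable problems; an empty list means the check passed."""
    problems = []
    if not isinstance(config.get("workflow_name"), str):
        problems.append("workflow_name must be a string")
    seen_ids = set()
    for node in config.get("nodes", []):
        label = node.get("name", "<unnamed>")
        if node.get("id") in seen_ids:
            problems.append(f"{label}: duplicate id {node.get('id')}")
        seen_ids.add(node.get("id"))
        for index, task in enumerate(node.get("tasks", [])):
            missing = TASK_KEYS - set(task)
            if missing:
                problems.append(f"{label}, task {index}: missing {sorted(missing)}")
    return problems


# One-node example with a deliberate typo ("output" instead of "outputs").
wconfig = json.loads("""
{"workflow_name": "First workflow",
 "nodes": [{"name": "node_1", "id": 1,
            "tasks": [{"func": ["source_1", "module_1"],
                       "inputs": ["input1"],
                       "output": ["output1_id1"],
                       "kwargs": {}}]}]}
""")
print(validate_wconfig(wconfig))
```

Collecting all problems in a list, rather than stopping at the first one, lets a user fix a configuration file in a single pass.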
Although wfGenes constructs and adapts the workflow for the user's preferred WMS, several common considerations are taken into account to ensure that the quality of the result is not affected by the automation process.
In a unified fashion, the configuration file contains the source and module names, which give the tool the information it needs for automatic wrapper generation.
The WGenerator generates an executable Python wrapper from the custom configuration while taking care of three main criteria that boost performance while preserving functionality:
- One-time load of extra inputs.
- One-time import of duplicate modules.
- Resolving dependencies in the data flow and optimizing code generation.
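
To make the second criterion concrete, the sketch below collects the `func` pairs of the first workflow and emits one import statement per distinct source. It assumes that each `func` entry is a `[source, callable]` pair and that the wrapper imports the source as a Python module; both the mapping and the `import_lines` helper are illustrative guesses, not the actual WGenerator output.

```python
# `func` entries of the first workflow, in order of appearance.
funcs = [
    ("source_1", "module_1"),
    ("source_1", "module_2"),
    ("source_1", "module_3"),
    ("source_2", "module_1"),
]


def import_lines(func_pairs):
    """Emit one import per distinct source, keeping first-seen order."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    sources = dict.fromkeys(source for source, _ in func_pairs)
    return [f"import {source}" for source in sources]


for line in import_lines(funcs):
    print(line)
```

Importing whole sources and calling `source_1.module_1` and `source_2.module_1` by their qualified names avoids a clash between the two callables that share the name `module_1`.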
wfGenes constructs the task graph by matching names. For example, in the first workflow, output1_id1 is passed to the next function within node_1 (local dependency), while in the second workflow, output2_id1 is passed to node_2 and output3_id1 to node_3 (global dependencies).
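
Name matching of this kind can be sketched in a few lines of Python. The example below takes the tasks of the first workflow, links every input to the task that produces a value of the same name, and labels each link as local (same node) or global (different nodes). The data layout and variable names are assumptions made for illustration; the actual wfGenes implementation may differ. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# First workflow, reduced to (node, task label, inputs, outputs).
tasks = [
    ("node_1", "source_1.module_1", ["input1"], ["output1_id1"]),
    ("node_1", "source_1.module_2", ["input1", "output1_id1"], ["output2_id1"]),
    ("node_2", "source_1.module_3", ["input2", "input1"], ["output1_id2"]),
    ("node_3", "source_2.module_1", ["output2_id1", "output1_id2"], ["output1_id3"]),
]

# Map every output name to the index of the task that produces it.
producer = {name: i for i, (_, _, _, outputs) in enumerate(tasks) for name in outputs}

edges = []  # (producer index, consumer index, data name, "local" or "global")
for consumer, (node, _, inputs, _) in enumerate(tasks):
    for name in inputs:
        if name in producer:  # names without a producer are external inputs
            kind = "local" if tasks[producer[name]][0] == node else "global"
            edges.append((producer[name], consumer, name, kind))

# Build a predecessor map and derive one valid execution order from it.
graph = {i: set() for i in range(len(tasks))}
for src, dst, _, _ in edges:
    graph[dst].add(src)
order = list(TopologicalSorter(graph).static_order())

for src, dst, name, kind in edges:
    print(f"{tasks[src][1]} -> {tasks[dst][1]} via {name} ({kind})")
```

Tasks that share no edge, such as the single task of node_2 and the two tasks of node_1, are independent of each other and can therefore be scheduled in parallel by the target WMS.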
Regardless of the target WMS, the outputs generated by wfGenes are validated against a schema to catch errors at an early stage and to ease further improvement of the tool.
wfGenesLab¶
wfGenesLab is a widget-based user interface for wfGenes that runs on top of JupyterLab. It provides a lightweight and intuitive interface to generate, visualize and execute workflow graphs. wfGenesLab offers a dashboard of Jupyter widgets (various types of buttons and clickable links) to couple the modeling phase to execution in a customizable manner, using wfGenes under the hood.
wfGenesEngine¶
The wfGenes engine is mainly designed to execute workflows generated by wfGenes via a graphical environment built on top of JupyterLab. In the case of FireWorks, thanks to the available Python APIs, the engine is also equipped with monitoring instruments that capture the state of each workflow and let users deal with failed and fizzled workflows.
The wfGenesLab executor runs the generated Python models for Dask and Parsl on two different types of resources: 1. Local, for use on personal workstations, or 2. Slurm, to run the workflow on supercomputers.
Hands-on¶
Click Here to go to the hands-on page and start exploring. Let your curiosity guide your journey!
Contacts¶
Legal Notice¶
The documentation is licensed under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons License http://creativecommons.org/licenses/by-nc-nd/4.0/
Copyright © 2021 Karlsruhe Institute of Technology (KIT)