Learning WDL

An approach I like to use when learning a new tool is to start by running an example and then gradually work out the details. In this post, I’m learning the basics of the Workflow Description Language (WDL) so that I can adapt GATK workflows for my own use. WDL, pronounced “widdle”, is yet another workflow language for building computational pipelines; it was originally developed by the Broad Institute for genome analysis pipelines. To execute WDL scripts we need Cromwell, and to make working with them easier we need WOMtool, so download both if you want to follow along. For this post I am using Java 8 on my MacBook Pro (15-inch, 2017) running macOS Mojave (until someone tells me it is safe to upgrade to Catalina).

# save both JARs in ~/bin, which is where the commands below look for them
wget -P ~/bin https://github.com/broadinstitute/cromwell/releases/download/47/cromwell-47.jar
wget -P ~/bin https://github.com/broadinstitute/cromwell/releases/download/47/womtool-47.jar

# for reference
java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

I found a Hello World example and thought I would get started with that, so I saved it as hello.wdl.

task hello {
  input {
    String pattern
    File in
  }

  command {
    egrep '${pattern}' '${in}'
  }

  runtime {
    docker: "broadinstitute/my_image"
  }

  output {
    Array[String] matches = read_lines(stdout())
  }
}

workflow wf {
  call hello
}

WOMtool has a validate subcommand that performs full validation of the WDL file including syntax and semantic checking.

java -jar ~/bin/womtool-47.jar validate hello.wdl
ERROR: Unexpected symbol (line 2, col 3) when parsing '_gen4'.

Expected rbrace, got input.

  input {
  ^

$task = :task :identifier :lbrace $_gen3 $_gen4 :rbrace -> Task( name=$1, declarations=$3, sections=$4 )

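After some digging around I found out that, without a version statement, WOMtool parses a script as WDL draft-2, which has no explicitly named “input” block; input variables are simply declared inside the task block. (The input block is WDL 1.0 syntax and requires version 1.0 on the first line of the file.) I saved the draft-2 form as hello2.wdl.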

task hello {
  String pattern
  File in

  command {
    egrep '${pattern}' '${in}'
  }

  runtime {
    docker: "broadinstitute/my_image"
  }

  output {
    Array[String] matches = read_lines(stdout())
  }
}

workflow wf {
  call hello
}

Validate again.

java -jar ~/bin/womtool-47.jar validate hello2.wdl
Success!
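
The parse error above comes down to the version statement: as the comment at the end of this post notes, a file without one is parsed as draft-2, while the input block belongs to WDL 1.0. As a rough sketch, the shell snippet below writes the original script with version 1.0 prepended (the file name hello_v1.wdl is my own choice, and the runtime block is left out) and defines a small hypothetical helper, wdl_version, that reports which version a file declares. I have not run this variant through WOMtool 47, so treat it as an illustration rather than verified output.

```shell
# Write the Hello World task in WDL 1.0 form: the same script as the original,
# plus a version statement on the first line (hello_v1.wdl is a made-up name).
cat > hello_v1.wdl <<'EOF'
version 1.0

task hello {
  input {
    String pattern
    File in
  }

  command {
    egrep '${pattern}' '${in}'
  }

  output {
    Array[String] matches = read_lines(stdout())
  }
}

workflow wf {
  call hello
}
EOF

# Hypothetical helper: print the version a WDL file declares, or "draft-2"
# when the first non-blank, non-comment line is not a version statement.
wdl_version() {
  awk '
    /^[ \t]*(#.*)?$/ { next }                 # skip blank lines and comments
    $1 == "version" { print $2; found = 1 }   # version statement found
    { exit }                                  # only the first real line counts
    END { if (!found) print "draft-2" }
  ' "$1"
}

wdl_version hello_v1.wdl   # prints 1.0
```

With the version line in place, the validate command used above should accept the input block; without it, the draft-2 form in hello2.wdl is the one that validates.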

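The syntax of our WDL script is fine, but the runtime block refers to the Docker image broadinstitute/my_image, which looks like a placeholder, so we would have to customise it before running. The runtime block is an optional part of a task, so let’s get rid of it for now and save the result as hello_no_docker.wdl.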

task hello {
  String pattern
  File in

  command {
    egrep '${pattern}' '${in}'
  }

  output {
    Array[String] matches = read_lines(stdout())
  }
}

workflow wf {
  call hello
}

To specify inputs for our WDL script, we use a JSON file. WOMtool’s inputs subcommand generates a template JSON file that lists every input of a WDL script along with its type.

java -jar ~/bin/womtool-47.jar inputs hello_no_docker.wdl > hello_no_docker.json

cat hello_no_docker.json 
{
  "wf.hello.pattern": "String",
  "wf.hello.in": "File"
}

We need to specify two inputs: a string and a file. Let’s search for “forest” in the play Macbeth.

# download the play from my server
wget https://davetang.org/file/shakespeare-macbeth-46.txt

Modify hello_no_docker.json to include our search pattern and the name of the file.

{
  "wf.hello.pattern": "forest",
  "wf.hello.in": "shakespeare-macbeth-46.txt"
}
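
Editing the template by hand is fine for two inputs. As a small sketch of doing the same from the shell, the hypothetical helper fill_inputs below swaps the String and File placeholders of the WOMtool template for real values with sed. It assumes exactly one input of each type, and values that contain no |, &, backslash or double quote characters.

```shell
# Hypothetical helper: replace the "String" and "File" type placeholders
# in a WOMtool-generated template with actual values.
# $1: template JSON, $2: search pattern, $3: input file
fill_inputs() {
  sed -e "s|: \"String\"|: \"$2\"|" \
      -e "s|: \"File\"|: \"$3\"|" "$1"
}

# Recreate the template from the post in a scratch file, then fill it in.
template=$(mktemp)
cat > "$template" <<'EOF'
{
  "wf.hello.pattern": "String",
  "wf.hello.in": "File"
}
EOF

fill_inputs "$template" forest shakespeare-macbeth-46.txt
```

Redirecting the output to hello_no_docker.json gives the same content as the hand-edited file above.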

Now to execute our WDL script.

java -jar ~/bin/cromwell-47.jar run hello_no_docker.wdl -i hello_no_docker.json

Two new folders will be created on successful completion: cromwell-executions and cromwell-workflow-logs. Results are in the executions folder, whose directory tree follows the structure cromwell-executions/<workflow name>/<workflow ID>/call-<task name>/execution/.

The standard output of our command was captured in a file named stdout in that execution directory.

cat cromwell-executions/wf/4f514820-bf2e-44ba-9309-b05487f58f78/call-hello/execution/stdout
	Who can impress the forest, bid the tree
	Till Birnam forest come to Dunsinane.
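
Because every run gets its own workflow ID, the path to stdout changes each time. Here is a minimal sketch for finding the most recent run without copying the ID by hand, assuming the directory layout seen in the path above. The helper latest_stdout is hypothetical, and picking the newest directory by modification time is my assumption about what counts as the latest run.

```shell
# Hypothetical helper: print the path of the stdout file of the most
# recently modified run.
# $1: workflow name, $2: task name, $3: executions root (optional)
latest_stdout() {
  root=${3:-cromwell-executions}
  # ls -t sorts newest first; the trailing slash restricts the glob to directories
  run_dir=$(ls -td "$root/$1"/*/ | head -n 1)
  printf '%s\n' "${run_dir}call-$2/execution/stdout"
}
```

After a run, cat "$(latest_stdout wf hello)" would then print the matches.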

I have Docker installed on my computer, so I’d like to try running the WDL script again with a runtime block (saved as hello_docker.wdl), so that the command runs inside an Ubuntu container instead of directly on my laptop’s operating system. The runtime section:

defines key/value pairs for runtime information needed for this task. Individual backends will define which keys they will inspect so a key/value pair may or may not actually be honoured depending on how the task is run.

task hello {
  String pattern
  File in

  command {
    egrep '${pattern}' '${in}'
  }

  runtime {
    docker: "ubuntu:16.04"
  }

  output {
    Array[String] matches = read_lines(stdout())
  }
}

workflow wf {
  call hello
}

I can reuse the same JSON file since the two WDL scripts have the same inputs.

java -jar ~/bin/cromwell-47.jar run hello_docker.wdl -i hello_no_docker.json

# results are saved in a new directory named after the new workflow ID
# but the results are the same as before
cat cromwell-executions/wf/63a4ec68-5597-4d2d-b275-01aef941038a/call-hello/execution/stdout
	Who can impress the forest, bid the tree
	Till Birnam forest come to Dunsinane.

Summary

A WDL script has five basic components:

  • Task
  • Command
  • Output
  • Workflow
  • Call

Inputs for a task are declared directly within the task block (in draft-2 syntax); the task block also contains a command block, an output block, and optional blocks such as a runtime block. Inputs can be used in the command block by referring to them with the syntax ${input_name}. The optional runtime block holds key/value pairs, such as the Docker image to use, which the backend may or may not honour.

The output section:

defines which values should be exposed as outputs after a successful run of the task. Outputs are declared just like task inputs or declarations in the workflow. The difference being that they are evaluated only after the command line executes and files generated by the command can be used to determine values. Note that outputs require a value expression (unlike inputs, for which an expression is optional)

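The function read_lines, whose signature is Array[String] read_lines(String|File), reads a file line by line and returns an array of strings. In the expression read_lines(stdout()), stdout() refers to the file holding the standard output of the egrep command, so each matching line is stored as one element of the array called matches.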

  output {
    Array[String] matches = read_lines(stdout())
  }
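
As a loose shell analogy for what read_lines(stdout()) does (this is not how Cromwell implements it), the sketch below reads a file line by line and prints each line as a numbered element, the way each line of stdout becomes one element of matches. The helper name show_matches is made up for illustration.

```shell
# Hypothetical helper: print each line of a file as one numbered element,
# mimicking how read_lines() turns a file into an Array[String].
# $1: file to read (for example, the stdout file of a run)
show_matches() {
  i=0
  # IFS= and -r keep leading whitespace and backslashes intact
  while IFS= read -r line; do
    printf 'matches[%d] = %s\n' "$i" "$line"
    i=$((i + 1))
  done < "$1"
}

# Example with two made-up lines standing in for egrep output.
sample=$(mktemp)
printf 'first match\nsecond match\n' > "$sample"
show_matches "$sample"
# prints:
# matches[0] = first match
# matches[1] = second match
```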

The workflow block defines the workflow and is where tasks are called by using call.

This work is licensed under a Creative Commons Attribution 4.0 International License.
One comment
  1. Your initial hello world WDL is in 1.0 (aka draft-3) syntax format of the WDL spec. It is missing the required ‘version’ on the first line; if it is not defined it will be parsed as draft-2. This indeed does not support the input stanza. Happy coding!
