Sydney-Informatics-Hub · mgeaghan · Apr 29, 2025 · May 5, 2025 · May 7, 2025 · May 12, 2025
diff --git a/docs/index.md b/docs/index.md
@@ -12,10 +12,12 @@ This workshop will provide you with the foundational knowledge required to build
 
 ## Prerequisites
 
-This is an intermediate-advanced workshop for people developing reproducible bioinformatics workflows.
+This is an intermediate-advanced workshop for people developing reproducible bioinformatics workflows. It assumes the following:
 
 * Experience working on the command line/Linux environment.
-* Experience developing reproducible workflows (e.g., bash, CWL, WDL, or Snakemake). 
+* Experience with basic scripting (e.g. Bash).
+
+In addition, experience with other reproducible workflow tools (e.g. CWL, WDL, or Snakemake) is helpful but not required for this workshop.
 
 ## Set up requirements
 
@@ -31,7 +33,7 @@ In order to foster a positive and professional learning environment we encourage
 * Show courtesy and respect towards other community members
 * Our full code of conduct, with incident reporting guidelines, is available here.
 
-## Workshop schedule  
+## Workshop schedule
 
 ### Day 1
 
@@ -65,6 +67,6 @@ at the end of the workshop. Help us help you! 😁
 
 ## Credits and acknowledgements
 
-This workshop event and accompanying materials were developed by the Sydney Informatics Hub, University of Sydney in partnership with Seqera. The workshop was enabled through the Australian BioCommons - [BioCLI Platforms Project](https://www.biocommons.org.au/biocli) (NCRIS via Bioplatforms Australia). 
+This workshop event and accompanying materials were developed by the Sydney Informatics Hub, University of Sydney in partnership with Seqera. The workshop was enabled through the Australian BioCommons - [BioCLI Platforms Project](https://www.biocommons.org.au/biocli) (NCRIS via Bioplatforms Australia).
 
 ![](./img/logos.png)
diff --git a/docs/part1/00_intro.md b/docs/part1/00_intro.md
@@ -11,7 +11,7 @@ During **Part 2**, the skills and concepts you have learned in Part 1 will be ap
 It is good practice to organize projects into their own folders to make it easier to track and replicate experiments over time.
 We have created separate directories for each part (`~/part1/` and `~/part2/`).
 
-!!!question "Exercise"
+!!! question "Exercise"
 
     In the VSCode terminal, move into the directory for all Part 1 activities:
 

diff --git a/docs/part1/01_hellonextflow.md b/docs/part1/01_hellonextflow.md
@@ -22,21 +22,21 @@ Nextflow’s **core features** are:
 
 ## Processes, tasks, and channels
 
-A Nextflow workflow is made by joining together **processes**. Each process can be written in any scripting language that can be executed by the Linux platform Processes can be written in any language that can be executed from the command line, such as Bash, Python, or R.
+A Nextflow workflow is made by joining together **processes**. Each process can be written in any scripting language that can be executed from the command line, such as Bash, Python, or R.
 
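-Processes in are executed independently (i.e., they do not share a common writable state) as **tasks** and can run in parallel, allowing for efficient utilization of computing resources. Nextflow automatically manages the data dependencies between processes, ensuring that each process is executed only when its input data is available and all of its dependencies have been satisfied.
+Processes are executed independently (i.e., they do not share a common writable state) as **tasks** and can run in parallel, allowing for efficient utilization of computing resources. Nextflow automatically manages the data dependencies between processes, ensuring that each process is executed only when its input data is available and all of its dependencies have been satisfied.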
 
 The only way they can communicate is via asynchronous first-in, first-out (FIFO) queues, called **channels**. Simply, every input and output of a process is represented as a channel. The interaction between these processes, and ultimately the workflow execution flow itself, is implicitly defined by these input and output declarations.
 
-![Image title](img/myworkflow.excalidraw.png)
+![An example Nextflow schematic](img/myworkflow.excalidraw.png)
 
 ## Execution abstraction
 
 While a process defines what command or script is executed, the **executor** determines how and where the script is executed.
 
 Nextflow provides an **abstraction** between the workflow’s functional logic and the underlying execution system. This abstraction allows users to define a workflow once and execute it on different computing platforms without having to modify the workflow definition. Nextflow provides a variety of built-in execution options, such as local execution, HPC cluster execution, and cloud-based execution, and allows users to easily switch between these options using command-line arguments.
 
-![Image title](img/abstraction.excalidraw.png)
+![Execution abstraction of a Nextflow workflow](img/abstraction.excalidraw.png)
 
 ## More information
 

diff --git a/docs/part1/02_helloworld.md b/docs/part1/02_helloworld.md
@@ -12,7 +12,7 @@ Let's demonstrate this with simple commands that you can run directly in the ter
 
 The **`echo`** command in Linux is a built-in command that allows users to display lines of text or strings that are passed as arguments. It is commonly used in shell scripts and batch files to output status text to the screen or a file.
 
-The most straightforward usage of the `echo` command is to display a text or string on the terminal. To do this, you simply provide the desired text or string as an argument to the `echo` command:
+The most straightforward usage of the `echo` command is to display text or a string on the terminal. To do this, you simply provide the desired text or string as an argument to the `echo` command:
 
 ```bash
 echo <string>
@@ -28,6 +28,10 @@ echo <string>
         echo 'Hello World!'
         ```
 
+        ``` title="Output"
+        Hello World!
+        ```
+
 ## Redirect outputs
 
 The output of the `echo` can be redirected to a file instead of displaying it on the terminal. You can achieve this by using the **`>`** operator for output redirection. For example:
@@ -36,7 +40,13 @@ The output of the `echo` can be redirected to a file instead of displaying it on
 echo 'Welcome!' > output.txt
 ```
 
-This will write the output of the echo command to the file name `output.txt`.
+Notice that nothing is printed in the terminal.
+
+``` title="Output"
+
+```
+
+Instead, this will write the output of the `echo` command to a file named `output.txt`.
 
 !!!question "Exercise"
 
@@ -48,6 +58,10 @@ This will write the output of the echo command to the file name `output.txt`.
         echo 'Hello World!' > output.txt
         ```
 
+        ``` title="Output"
+
+        ```
+
 ## List files
 
 The Linux shell command **`ls`** lists directory contents of files and directories. It provides valuable information about files, directories, and their attributes.
@@ -70,6 +84,10 @@ ls
 
         A file named `output.txt` should now be listed in your current directory.
 
+        ``` title="Output"
+        output.txt
+        ```
+
 ## View file contents
 
 The **`cat`** command in Linux is a versatile companion for various file-related operations, allowing users to view, concatenate, create, copy, merge, and manipulate file contents.
@@ -92,6 +110,10 @@ cat <file name>
 
         You should see `Hello World!` printed to your terminal.
 
+        ``` title="Output"
+        Hello World!
+        ```
+
 !!! abstract "Summary"
 
     In this step you have learned:

diff --git a/docs/part1/03_hellonf.md b/docs/part1/03_hellonf.md
@@ -7,15 +7,15 @@
 
 Workflow languages are better than Bash scripts because they handle errors and run tasks in parallel more easily, which is important for complex jobs. They also have clearer structure, making it easier to maintain and work on with others.
 
-Here, you're going learn more about the Nextflow language and take your first steps making a **your first pipeline** with Nextflow.
+Here, you're going to learn more about the Nextflow language and take your first steps towards building **your first pipeline** with Nextflow.
 
-## `hello-world.nf`
+## Writing your first pipeline: `hello-world.nf`
 
-Nextflow pipelines need to be saved as `.nf` files.
+Nextflow pipelines are written in `.nf` files. They consist of two main components: **processes** and the **workflow** itself. Each process describes a single step of the pipeline, including its inputs, its expected outputs, and the code to run it. The workflow then defines the logic that connects the processes together.
 
-The process definition starts with the keyword `process`, followed by process name, and finally the process body delimited by curly braces. The process body must contain a `script` block which represents the command or, more generally, a script that is executed by it.
+A process definition starts with the keyword `process`, followed by the process name, and finally the process body delimited by curly braces. The process body must contain a `script` block which represents the command or, more generally, a script that is executed by it.
 
-A process may contain any of the following definition blocks: `directives`, `inputs`, `outputs`, `when` clauses, and of course, `script`.
+A process may contain any of the following definition blocks: `directives`, `input`, `output`, `when` clauses, and of course, `script`.
 
 ```groovy
 process < name > {
@@ -41,7 +41,7 @@ A workflow is a composition of processes and dataflow logic.
 
 The workflow definition starts with the keyword `workflow`, followed by an optional name, and finally the workflow body delimited by curly braces.
 
-Let's review the structure of `hello-world.nf`, a toy example you will be executing and developing:
+Let's review the structure of `hello-world.nf`, a toy example you will be developing and executing:
 
 ```groovy title="hello-world.nf" linenums="1"
 process SAYHELLO {
@@ -101,7 +101,7 @@ As a developer you can to choose how and where to comment your code.
 
         The solution may look something like this:
 
-        ```groovy title="hello-world.nf"
+        ```groovy title="hello-world.nf" hl_lines="1-3"
         /*
          * Use echo to print 'Hello World!' to standard out
          */
@@ -111,7 +111,7 @@ As a developer you can to choose how and where to comment your code.
 
         Or this:
 
-        ```groovy title="hello-world.nf"
+        ```groovy title="hello-world.nf" hl_lines="1"
         // Use echo to print 'Hello World!' to standard out
         process SAYHELLO {
         <truncated>
@@ -165,7 +165,7 @@ Hello World!
 4. The first process is executed once, which means there is one task. The line starts with a unique hexadecimal value, and ends with the task completion information
 5. The result string from stdout is printed
 
-## Task directories
+## Understanding the task directories
 
 When a Nextflow pipeline is executed, a `work` directory is created. Processes are executed in isolated **task** directories. Each task uses a unique directory based on its [hash](https://www.nextflow.io/docs/latest/cache-and-resume.html#task-hash) (e.g., `4e/6ba912`) within the work directory.
 

diff --git a/docs/part1/04_output.md b/docs/part1/04_output.md
@@ -5,7 +5,7 @@
-    1. Utlizie Nextflow process output blocks
+    1. Utilize Nextflow process output blocks
     2. Publish results from your pipeline with directives
 
-Instead of printing 'Hello World!' to the standard output it can be saved to a file. In a "real-world" pipeline, this is like having a command that specifies an output file as part of its normal syntax.
+Currently, our pipeline prints 'Hello World!' to the terminal via standard output (`stdout`). This isn't particularly useful if we want to do anything further with the outputs of our processes. Instead, we can save the output of our process to a file that can later be passed on to other processes. In a "real-world" pipeline, this is like having a command that specifies an output file as part of its normal syntax.
 
 Here you're going to update the `script` and the `output` definition blocks to save the 'Hello World!' as an output.
 
@@ -36,11 +36,14 @@ The `>` operator can be used for output redirection.
         }
         ```
 
-## Outputs blocks
+## Capturing outputs
 
-Outputs in the output definition block typically require an **output qualifier** and a **output name**:
+We have now updated our script to write 'Hello World!' to `output.txt`, but we also need to tell Nextflow to expect this file; otherwise, Nextflow will ignore it! Nextflow requires us to **declare** which outputs should be captured from each process. This is useful for several reasons. First, many tools generate intermediate files that we don't need, and capturing all of them would be messy and unnecessary. Second, Nextflow uses the declared outputs to pass data between processes via channels, which determines when each downstream process can run. Finally, declaring outputs gives Nextflow a way to check whether a process succeeded: if a declared output is missing when the task finishes, Nextflow treats the task as failed.
+
+We declare our outputs using the `output` definition block. Typically this will require both an **output qualifier** and an **output name**:
 
 ```groovy
+output:
 <output qualifier> <output name>
 ```
 
@@ -66,15 +69,23 @@ output:
 path 'output.txt'
 ```
 
-The output name and the file generated by the script must match (or be picked up by a glob pattern).
+The output name must exactly match the file generated by the script (or match it via a glob pattern); otherwise, Nextflow won't find the file and will throw an error.
+
+!!! note
+
+    It is important to understand that the `output` block does not *determine* the output of the process. Instead, it simply *declares* what output should be expected. It is up to the logic inside the `script` block to ensure that the file is actually created.
+
+So far, we have been using the `stdout` output qualifier, which tells Nextflow to capture everything the process writes to standard output. It is a special output qualifier because it doesn't require an output name.
+
+Now that we are redirecting our 'Hello World!' message to a file, we want to tell Nextflow to expect an output file called `output.txt`.
 
 !!!question "Exercise"
 
     Add `path 'output.txt'` in the `SAYHELLO` output block.
 
     ???Solution
 
-        ```groovy title="hello-world.nf" hl_lines="4-6"
+        ```groovy title="hello-world.nf" hl_lines="6"
         // Use echo to print 'Hello World!' and redirect to output.txt
         process SAYHELLO {
             debug true
@@ -93,19 +104,17 @@ The output name and the file generated by the script must match (or be picked up
 
     This example is brittle because the output filename is hardcoded in two separate places (the `script` and the `output` definition blocks). If you change one but not the other, the script will break.
 
-## Publishing directory
-
-Without a **publishing** strategy any files that are created by a process will only exist in the `work` directory.
+## Publish outputs
 
-Realistically, you may want to capture a set of outputs and save them in a specific directory.
+By default, all files created by processes exist only inside the `work` directory. To make our outputs more accessible and neatly organised, we define a **publishing strategy**, which determines which outputs are published to a final **publishing directory** (by default, as symbolic links to the files in the `work` directory).
 
 The [`publishDir` directive](https://www.nextflow.io/docs/latest/process.html#publishdir) can be used to specify where and how output files should be saved. For example:
 
 ```groovy
 publishDir 'results'
 ```
 
-By adding the above to a process, all output files would be saved in a new folder called `results` in the current working directory. The process directive is process specific.
+By adding the above to a process, the files declared in that process's `output` block would be published to a new folder called `results` in the current working directory. The `publishDir` directive is process specific: it only applies to the process in which it is defined.
 
 !!!question "Exercise"