2021-04-26-umich-rnaseqDemystified

More QC - Cutadapt and MultiQC

In this module we will learn:

about the cutadapt software and its uses
how to use the cutadapt tool for trimming adapters
how to trim all of our samples in a for-loop
about the MultiQC tool and its capabilities
how to run MultiQC on a remote system, then transfer the report and view it locally

Differential Expression Workflow

As a reminder, our overall differential expression workflow is shown below. In this lesson, we will cover the step shown in bold (step 4, assessing read quality).

| Step | Task |
| :--: | ---- |
| 1 | Experimental Design |
| 2 | Biological Samples / Library Preparation |
| 3 | Sequence Reads |
| **4** | **Assess Quality of Reads** |
| 5 | Splice-aware Mapping to Genome |
| 6 | Count Reads Associated with Genes |
| 7 | Organize project files locally |
| 8 | Initialize DESeq2 and fit DESeq2 model |
| 9 | Assess expression variance within treatment groups |
| 10 | Specify pairwise comparisons and test for differential expression |
| 11 | Generate summary figures for comparisons |
| 12 | Annotate differential expression result tables |

Cutadapt

Cutadapt is a very widely used read trimming and FASTQ processing tool, cited thousands of times. It's written in Python, and is user-friendly and reasonably fast.

It is used to remove adapter sequences, primers, and poly-A tails, to trim reads based on quality thresholds, to filter reads based on characteristics such as length, and more.

It can operate on both FASTA and FASTQ file formats, and it supports compressed (e.g., gzip) or uncompressed inputs and outputs.

Notably, cutadapt's error-tolerant adapter trimming likely contributed greatly to its early popularity. We will use it to trim the adapters from our reads. As usual, we'll view the help page to get a sense of how to structure our command.

Cutadapt Exercise:

View the help page of the cutadapt tool
Construct a cutadapt command to trim the adapters from paired-end reads
View the output of cutadapt, and verify that it's correct
Construct commands to trim the reads for all of our samples

Cutadapt solution

Log back in to the AWS instance

 ssh <username>@50.17.210.255

View cutadapt help page

 cutadapt --help | less
 # Will need to type `q` to exit from `less`

Trim the adapters from sample_01 with cutadapt

 # Need to create directory for trimmed outputs (-p: no error if it already exists)
 mkdir -p ~/analysis/trimmed

 # -a / -A: 3' adapter to trim from R1 / R2; -o / -p: trimmed R1 / R2 outputs
 cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG \
   -o ~/analysis/trimmed/sample_01_R1.trimmed.fastq.gz \
   -p ~/analysis/trimmed/sample_01_R2.trimmed.fastq.gz \
   ~/data/reads/sample_01_R1.fastq.gz ~/data/reads/sample_01_R2.fastq.gz

View cutadapt output
```
 ls -l ~/analysis/trimmed
```
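To go one step beyond listing the files, we could check that the trimmed R1 and R2 files for a sample contain the same number of reads, since cutadapt's paired-end mode is meant to keep the two files in sync. This is a minimal sketch, assuming gzip-compressed FASTQ with exactly four lines per record; `count_reads` and `check_pair` are hypothetical helper names, not part of cutadapt.

```bash
# count_reads (hypothetical helper): number of records in a gzipped FASTQ,
# assuming the standard 4 lines per record
count_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

# check_pair (hypothetical helper): compare read counts of an R1/R2 pair;
# prints "OK: N read pairs" or reports a mismatch and returns 1
check_pair() {
  local n1 n2
  n1=$(count_reads "$1")
  n2=$(count_reads "$2")
  if [ "$n1" -eq "$n2" ]; then
    echo "OK: $n1 read pairs"
  else
    echo "MISMATCH: R1=$n1 R2=$n2" >&2
    return 1
  fi
}

# Example, using the paths from this lesson:
# check_pair ~/analysis/trimmed/sample_01_R1.trimmed.fastq.gz ~/analysis/trimmed/sample_01_R2.trimmed.fastq.gz
```

A mismatch would suggest a truncated or interrupted output file, in which case re-running cutadapt for that sample is the simplest fix.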
Construct commands to trim the reads for all of our samples

Note: we're re-using the same command for each sample. After defining SAMPLE for the next sample, we can press the up arrow to recall the previous cutadapt command and re-run it with the new value.

    SAMPLE=sample_02
    cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o ~/analysis/trimmed/${SAMPLE}_R1.trimmed.fastq.gz -p ~/analysis/trimmed/${SAMPLE}_R2.trimmed.fastq.gz ~/data/reads/${SAMPLE}_R1.fastq.gz ~/data/reads/${SAMPLE}_R2.fastq.gz

    SAMPLE=sample_03
    cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o ~/analysis/trimmed/${SAMPLE}_R1.trimmed.fastq.gz -p ~/analysis/trimmed/${SAMPLE}_R2.trimmed.fastq.gz ~/data/reads/${SAMPLE}_R1.fastq.gz ~/data/reads/${SAMPLE}_R2.fastq.gz

    SAMPLE=sample_04
    cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG -o ~/analysis/trimmed/${SAMPLE}_R1.trimmed.fastq.gz -p ~/analysis/trimmed/${SAMPLE}_R2.trimmed.fastq.gz ~/data/reads/${SAMPLE}_R1.fastq.gz ~/data/reads/${SAMPLE}_R2.fastq.gz
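Instead of re-running the command by hand for each sample, the same cutadapt options can be wrapped in a for-loop. This is a minimal sketch, assuming the reads follow this lesson's `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming; `trim_all_samples` is a hypothetical helper name, and the sample names are derived from the R1 filenames rather than typed out.

```bash
# trim_all_samples (hypothetical helper): run cutadapt on every paired-end
# sample in a reads directory, using the same options as the commands above.
# Assumes files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
trim_all_samples() {
  local reads_dir="$1" out_dir="$2"
  local r1 sample
  mkdir -p "$out_dir"
  for r1 in "$reads_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue               # glob matched nothing: skip
    sample=$(basename "$r1" _R1.fastq.gz)  # e.g. sample_01
    echo "Trimming ${sample}..."
    cutadapt -a AGATCGGAAGAG -A AGATCGGAAGAG \
      -o "$out_dir/${sample}_R1.trimmed.fastq.gz" \
      -p "$out_dir/${sample}_R2.trimmed.fastq.gz" \
      "$r1" "$reads_dir/${sample}_R2.fastq.gz"
  done
}

# Usage on the AWS instance:
# trim_all_samples ~/data/reads ~/analysis/trimmed
```

Deriving sample names from the files means the loop does not need editing when samples are added, at the cost of trimming anything in the directory that matches the pattern.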

Re-running FastQC

Now that we've run cutadapt and trimmed the adapters from our reads, we will quickly re-run FastQC on the trimmed FASTQ files. This will confirm that we've successfully trimmed the adapters and that our FASTQ files are ready for mapping.
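As a rough command-line spot check alongside FastQC, we could count how many of the first reads in a file still contain an exact copy of the adapter. This is a minimal sketch, assuming the adapter sequence used above; `count_adapter_hits` is a hypothetical helper name, and because it only counts exact full-length matches, it will miss partial adapters at read ends, so it complements FastQC rather than replacing it.

```bash
# count_adapter_hits (hypothetical helper): how many of the first N reads
# (default 10000) in a gzipped FASTQ contain an exact copy of the adapter.
count_adapter_hits() {
  local fastq="$1" adapter="$2" n_reads="${3:-10000}"
  gzip -cd "$fastq" \
    | head -n $(( n_reads * 4 )) \
    | awk 'NR % 4 == 2' \
    | grep -c -F -- "$adapter"   # sequence lines only; prints 0 if none
}

# Example: compare raw and trimmed reads for one sample
# count_adapter_hits ~/data/reads/sample_01_R1.fastq.gz AGATCGGAAGAG
# count_adapter_hits ~/analysis/trimmed/sample_01_R1.trimmed.fastq.gz AGATCGGAAGAG
```

We would expect the count for the trimmed file to be at or near zero, and noticeably lower than for the raw file if adapters were present.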

Re-running FastQC Exercise:

Create a directory for the new FastQC results
Construct and execute a FastQC command to evaluate the trimmed FASTQ files
View the output (filenames)

Re-running FastQC solution

Create a directory for the new FastQC results
```
 mkdir ~/analysis/fastqc_trimmed
```

FastQC command to evaluate trimmed FASTQ files

 fastqc -o ~/analysis/fastqc_trimmed ~/analysis/trimmed/*.fastq.gz
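To check the output filenames, we could list any trimmed FASTQ that has no matching FastQC HTML report. This is a minimal sketch, assuming FastQC's usual naming, where `<name>.fastq.gz` produces `<name>_fastqc.html` (alongside a `.zip`); `missing_fastqc_reports` is a hypothetical helper name.

```bash
# missing_fastqc_reports (hypothetical helper): print each *.fastq.gz in a
# directory that lacks a <name>_fastqc.html report in the report directory.
missing_fastqc_reports() {
  local fastq_dir="$1" report_dir="$2" fq base
  for fq in "$fastq_dir"/*.fastq.gz; do
    [ -e "$fq" ] || continue               # glob matched nothing: skip
    base=$(basename "$fq" .fastq.gz)       # e.g. sample_01_R1.trimmed
    [ -e "$report_dir/${base}_fastqc.html" ] || echo "$fq"
  done
}

# Usage: prints nothing if every trimmed file has a report
# missing_fastqc_reports ~/analysis/trimmed ~/analysis/fastqc_trimmed
```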

MultiQC

FastQC is an excellent tool, but it can be tedious to look at each sample's report separately while keeping track of the trends that emerge. It would be much easier to look at all the FastQC reports compiled into a single report. MultiQC is a tool that does exactly this.

MultiQC is designed to interpret and aggregate reports from various tools and output a single report as an HTML document.

MultiQC Exercise:

View the MultiQC help page
Construct a MultiQC command to aggregate our QC results into a single report
Transfer the MultiQC report to your personal computer using scp
View the MultiQC report

MultiQC solution

View MultiQC help page

 multiqc --help | less
 # Will need to type `q` to exit from `less`

MultiQC command to process our trimmed read results

 multiqc --outdir ~/analysis/multiqc ~/analysis/fastqc_trimmed/

Log out of the AWS instance and use scp to transfer the MultiQC report to your local computer

 exit # log out from remote

 # Now on local
 scp <username>@50.17.210.255:~/analysis/multiqc/multiqc_report.html ~/workshop_rsd/

View MultiQC report

Use your GUI file manager to find your ~/workshop_rsd folder, then double-click multiqc_report.html to open it in a web browser.

Opening the HTML report, we see it is organized by the same modules as the individual FastQC reports, and each plot includes all samples for which FastQC was run. The report confirms that the adapters have been trimmed from our reads.

These materials have been adapted and extended from materials created by the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.