DNA methylation workflow
DNA methylation workflow

Methods

QC

We use FastQC (FastQC) (v0.11.8) to assess the overall quality of each sequenced sample.

We use TrimGalore (@ article?{noauthor_trimgalore_nodate,) (v0.4.5) with the following parameters: --adapter AGATCGGAAGAGC -e 0.1 --stringency 6 --length 20 --nextseq 20, and with the additional --rrbs flag in the ERRBS case.

Reads are aligned to the reference genome with Bismark (Krueger and Andrews, 2011) (v0.22.1) using Bowtie2 (Langmead and Salzberg, 2012) (v2.3.4) with default settings (multi-seed length of 20bp with 0 mismatches).

For WGBS, duplicate reads are marked and removed with Picard (Picard) (v2.20.2). This step is not performed for ERRBS.

Alignments below a MAPQ threshold are removed with samtools (Li et al., 2009) (v1.2) and the parameters: -q 10.

MethylDackel (MethylDackel) (v0.4.0) then calls methylation rates with parameters -d 5 -D 2000 --mergeContext.

In the event of a BS/oxBS or BS/TAB library preparation, methylation mark deconvolution is then performed using the MLML2R R package (Qu et al., 2013). Briefly, methylated and unmethylated counts from bisulfite-only treated samples and oxidative-bisulfte treated samples are extracted and passed to MLML2R::MLML() to determine the levels of methylcytosine (mC), hydroxymethylcytosine (hmC), and cytosine (C) using the exact method provided in the package.

Differential Testing

Without Mark Deconvolution

Differential methylation testing is performed with the MethylSig R Bioconductor package which performs group-versus-group tests using a beta-binomial approach to calculate differential methylation statistics, accounting for variation among replicates within each group (Park et al., 2014). Alternately, differential methylation testing can be performed with the DSS R bioconductor package which is a beta-binomial approach with arcsine link function to test under general experimental design (Wu and Park, 2016).

With Mark Deconvolution

We use the gamlss R package to identify differentially methylated probes (DMPs) by fitting a beta-regression model on the beta-values (Stasinopoulos and Rigby, 2007). The DMPs are then annotated to CpG island and genic annotations using the annotatr R Bioconductor package (Cavalcante and Sartor, 2017).

References

Cavalcante,R.G. and Sartor,M.A. (2017) Annotatr: Genomic regions in context. Bioinformatics, 33, 2381–2383.
FastQC https://github.com/s-andrews/FastQC.
Krueger,F. and Andrews,S.R. (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics, 27, 1571–1572.
Langmead,B. and Salzberg,S.L. (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods, 9, 357–359.
Li,H. et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–2079.
MethylDackel https://github.com/dpryan79/MethylDackel.
Park,Y. et al. (2014) MethylSig: a whole genome DNA methylation analysis pipeline. Bioinformatics, 30, 2414–2422.
Picard https://github.com/broadinstitute/picard.
Qu,J. et al. (2013) MLML: consistent simultaneous estimates of DNA methylation and hydroxymethylation. Bioinformatics, 29, 2645–2646.
Stasinopoulos,D.M. and Rigby,R.A. (2007) Generalized Additive Models for Location Scale and Shape (GAMLSS) in r. Journal of Statistical Software, 23.
Wu,H. and Park,Y. (2016) Differential methylation analysis for BS-seq data under general experimental design. Bioinformatics, 32, 1446–1453.