EPIC methylation array workflow

Methods

QC

Samples are processed with the SeSAMe R Bioconductor package (Zhou et al., 2018). Briefly, red/green IDAT files are read and processed according to the preparation code. Experiment-independent masking is applied with the quality mask code (“Q”), which accounts for probes containing SNPs, probes with known cross-hybridization issues (Pidsley et al., 2013), and probes with other issues. Experiment-dependent masking is done with the p-value with out-of-band array hybridization (pOOBAH) algorithm (“P”), which is an improvement on detection p-value filtering (Zhou et al., 2018). Non-linear dye bias correction (“D”) is then performed, followed by background correction with the NOOB method (“B”) (Fortin et al., 2014). This processing pipeline constitutes a within-array normalization procedure. We note that recent studies have indicated that within-array normalization with dye-bias correction and NOOB performs as well as or better than between-array normalization procedures (Welsh et al., 2023).

If a probe is pOOBAH-masked (detection p-value > 0.05) in more than 5% of samples, it is removed. Similarly, if more than 10% of a sample's probes are pOOBAH-masked, that sample is removed.
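The two removal thresholds can be illustrated with a short, language-neutral sketch. It is written in plain Python, not with the SeSAMe API, and the function name `filter_by_mask` and the boolean input matrix are hypothetical. It also assumes that both fractions are computed on the full, unfiltered mask matrix, which the text above does not specify.

```python
def filter_by_mask(masked, max_probe_frac=0.05, max_sample_frac=0.10):
    """Return indices of probes and samples that pass the masking thresholds.

    masked[i][j] is True when probe i is pOOBAH-masked in sample j.
    Assumption: both fractions are computed on the unfiltered matrix.
    """
    if not masked or not masked[0]:
        return [], []
    n_probes = len(masked)
    n_samples = len(masked[0])

    # Fraction of samples in which each probe is masked.
    probe_frac = [sum(row) / n_samples for row in masked]
    # Fraction of probes that are masked within each sample.
    sample_frac = [
        sum(masked[i][j] for i in range(n_probes)) / n_probes
        for j in range(n_samples)
    ]

    # "More than" the threshold triggers removal, so equality is kept.
    keep_probes = [i for i, f in enumerate(probe_frac) if f <= max_probe_frac]
    keep_samples = [j for j, f in enumerate(sample_frac) if f <= max_sample_frac]
    return keep_probes, keep_samples
```

With 20 samples, for example, a probe masked in exactly one sample (5%) is retained, while a probe masked in two samples (10%) is removed.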

Cell type deconvolution of whole blood or cord blood may be performed with the FlowSorted.Blood.EPIC or FlowSorted.CordBlood.450k R Bioconductor package, respectively, using a modified version of the Houseman method (Houseman et al., 2012).

In the event of a BS/oxBS or BS/TAB library preparation, methylation mark deconvolution may be performed using the MLML2R R package (Qu et al., 2013). Briefly, methylated and unmethylated channel matrices from bisulfite-only treated samples and oxidative-bisulfite treated samples are extracted and passed to MLML2R::MLML() to determine the levels of methylcytosine (mC), hydroxymethylcytosine (hmC), and cytosine (C) using the exact method provided in the package.
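To make the idea behind the BS/oxBS deconvolution concrete, the sketch below estimates the three proportions at a single CpG. It is not the MLML2R code. It assumes a simple binomial model in which the bisulfite signal reports mC plus hmC as methylated and the oxidative-bisulfite signal reports only mC as methylated, and it gives the closed-form maximum-likelihood estimate under that assumption. The function name `mark_proportions` and its argument names are hypothetical.

```python
def mark_proportions(T, U, M, L):
    """Estimate (mC, hmC, C) proportions at one CpG from BS and oxBS signal.

    T, U: methylated / unmethylated signal from the bisulfite (BS) sample.
    M, L: methylated / unmethylated signal from the oxidative-bisulfite
          (oxBS) sample.
    Assumption: signals behave like binomial counts.
    """
    bs_meth = T / (T + U)  # BS reads mC + hmC as methylated
    ox_meth = M / (M + L)  # oxBS reads only mC as methylated

    if bs_meth >= ox_meth:
        # Interior solution: hmC is the difference between the two fractions.
        mc = ox_meth
        hmc = bs_meth - ox_meth
    else:
        # Naive subtraction would be negative. The constrained estimate
        # sets hmC to zero and pools both samples to estimate mC.
        mc = (T + M) / (T + U + M + L)
        hmc = 0.0

    return mc, hmc, 1.0 - mc - hmc
```

The boundary branch is the reason a constrained estimator is preferable to plain subtraction of beta-values: the three proportions always stay within [0, 1] and sum to one.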

Differential Testing

Without Mark Deconvolution

For each comparison we use the limma R Bioconductor package to identify differentially methylated probes (DMPs) by fitting a linear model to the M-values and moderating the resulting standard errors with an empirical Bayes model (Ritchie et al., 2015). The DMPs are then annotated to CpG island and genic annotations using the annotatr R Bioconductor package (Cavalcante and Sartor, 2017).
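For readers unfamiliar with M-values, the sketch below shows the standard logit transformation from beta-values, M = log2(beta / (1 - beta)), and its inverse. The clipping constant `eps` is an assumption made here to keep the logarithm finite at beta-values of 0 or 1; it is not a parameter taken from the workflow.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a beta-value in [0, 1] to an M-value (base-2 logit)."""
    # Clip away from 0 and 1 so the ratio and its logarithm stay finite.
    beta = min(max(beta, eps), 1.0 - eps)
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Convert an M-value back to a beta-value."""
    return 1.0 / (1.0 + 2.0 ** (-m))
```

A beta-value of 0.5 maps to an M-value of 0, and the scale is symmetric around that point, which makes M-values better suited to linear modeling than the bounded beta-values.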

With Mark Deconvolution

For each comparison we use the gamlss R package to identify differentially methylated probes (DMPs) by fitting a beta-regression model on the beta-values (Stasinopoulos and Rigby, 2007). The DMPs are then annotated to CpG island and genic annotations using the annotatr R Bioconductor package (Cavalcante and Sartor, 2017).

References

Cavalcante,R.G. and Sartor,M.A. (2017) Annotatr: Genomic regions in context. Bioinformatics, 33, 2381–2383.
Fortin,J.-P. et al. (2014) Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biology, 15, 503.
Houseman,E.A. et al. (2012) DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics, 13, 86.
Pidsley,R. et al. (2013) A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics, 14.
Qu,J. et al. (2013) MLML: consistent simultaneous estimates of DNA methylation and hydroxymethylation. Bioinformatics, 29, 2645–2646.
Ritchie,M.E. et al. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43, e47–e47.
Stasinopoulos,D.M. and Rigby,R.A. (2007) Generalized Additive Models for Location Scale and Shape (GAMLSS) in R. Journal of Statistical Software, 23.
Welsh,H. et al. (2023) A systematic evaluation of normalization methods and probe replicability using Infinium EPIC methylation data. Clinical Epigenetics, 15, 41.
Zhou,W. et al. (2018) SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Research, 46, e123–e123.