ChIP-Seq workflow

Methods

QC

We use FastQC (FastQC) (v0.11.8) to assess the overall quality of each sequenced sample.

We trim adapters and low-quality bases with TrimGalore (TrimGalore) (v0.4.5), which wraps cutadapt (Martin) (v1.15), using the following parameters: --nextera -e 0.1 --stringency 6 --length 20 --nextseq 20.
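As a rough illustration, the trimming parameters can be assembled into a command line as sketched below. The flags are the ones listed in the text. The --paired flag, the output directory, and the file names are assumptions made for the example (paired-end input is suggested by the later use of -X 2000 and BAMPE, but is not stated for this step).

```python
import shlex


def trim_galore_command(read1, read2, output_dir="trimmed"):
    """Build a TrimGalore argument list with the parameters from the text.

    read1, read2, and output_dir are hypothetical placeholders.
    """
    return [
        "trim_galore",
        "--paired",              # assumption: paired-end reads
        "--nextera",             # trim the Nextera transposase adapter
        "-e", "0.1",             # maximum allowed error rate in adapter matches
        "--stringency", "6",     # minimum overlap with the adapter to trim
        "--length", "20",        # discard reads shorter than 20 bp after trimming
        "--nextseq", "20",       # quality trimming that ignores trailing G calls
        "--output_dir", output_dir,
        read1,
        read2,
    ]


cmd = trim_galore_command("sample_R1.fastq.gz", "sample_R2.fastq.gz")
# Print a shell-safe version of the command without running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The function only builds the argument list; passing it to subprocess.run would execute it on a system where TrimGalore and cutadapt are installed.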

We align trimmed reads to the reference genome with Bowtie2 (Langmead and Salzberg, 2012) (v2.3.4.1) using default parameters, except for -X 2000, which raises the maximum fragment length for valid paired-end alignments to 2,000 bp.

Duplicate reads are marked with Picard (Picard) (v2.20.2).

Alignments are filtered with samtools (Li et al., 2009) (v1.2) using the flags -q 10 -F 1024, which discard alignments with a mapping quality below 10 and reads marked as duplicates.
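The effect of the two samtools flags can be sketched in plain Python. This is only an illustration of the filtering logic, not how samtools is implemented, and the read names, FLAG values, and MAPQ values are invented for the example.

```python
DUPLICATE_FLAG = 0x400  # 1024: read is a PCR or optical duplicate


def passes_filter(flag, mapq, min_mapq=10, exclude_flags=DUPLICATE_FLAG):
    """Mimic `samtools view -q 10 -F 1024`.

    -q keeps alignments with MAPQ >= min_mapq;
    -F drops alignments with any of the excluded FLAG bits set.
    """
    return mapq >= min_mapq and (flag & exclude_flags) == 0


# (name, FLAG, MAPQ) tuples; hypothetical records.
reads = [
    ("read_a", 99, 42),     # properly paired, high MAPQ: kept
    ("read_b", 1123, 42),   # 99 + 1024, marked as duplicate: dropped
    ("read_c", 99, 5),      # mapping quality below 10: dropped
]

kept = [name for name, flag, mapq in reads if passes_filter(flag, mapq)]
print(kept)  # ['read_a']
```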

Alignments completely overlapping blacklisted regions (ENCODE Blacklist Regions) are removed with bedtools (Quinlan and Hall, 2010) (v2.28.0).
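One reading of "completely overlapping" is that an alignment is removed only when it lies entirely inside a blacklisted region. The sketch below illustrates that reading in plain Python with half-open, BED-style coordinates. The exact bedtools invocation is not given here, so the function names and the example intervals are assumptions.

```python
def fully_within(alignment, region):
    """True if the alignment lies entirely inside the region (same chromosome)."""
    chrom_a, start_a, end_a = alignment
    chrom_r, start_r, end_r = region
    return chrom_a == chrom_r and start_r <= start_a and end_a <= end_r


def remove_blacklisted(alignments, blacklist):
    """Keep alignments that are not fully contained in any blacklisted region."""
    return [
        aln for aln in alignments
        if not any(fully_within(aln, region) for region in blacklist)
    ]


# Hypothetical (chrom, start, end) intervals.
blacklist = [("chr1", 100, 200)]
alignments = [
    ("chr1", 120, 170),  # entirely inside the blacklisted region: removed
    ("chr1", 180, 230),  # only partially overlapping: kept
    ("chr2", 120, 170),  # different chromosome: kept
]

filtered = remove_blacklisted(alignments, blacklist)
print(filtered)  # [('chr1', 180, 230), ('chr2', 120, 170)]
```

The linear scan over the blacklist is fine for illustration; bedtools uses far more efficient interval lookups on real data.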

Sample-wise peaks are called with macs2 (Zhang et al., 2008) (v2.1.2) with flags: --format BAMPE --gsize mm --qvalue 0.05.
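For illustration, a peak-calling command with these flags could be assembled as follows. The three flags come from the text. The file names, the optional control sample, the sample name, and the output directory are placeholders assumed for the example.

```python
import shlex


def macs2_callpeak_command(treatment_bam, name, control_bam=None, outdir="peaks"):
    """Build a `macs2 callpeak` argument list with the flags from the text.

    treatment_bam, name, control_bam, and outdir are hypothetical placeholders.
    """
    cmd = ["macs2", "callpeak", "--treatment", treatment_bam]
    if control_bam is not None:
        cmd += ["--control", control_bam]  # input/control sample, if available
    cmd += [
        "--format", "BAMPE",   # use paired-end fragments directly from the BAM
        "--gsize", "mm",       # shortcut for the mouse effective genome size
        "--qvalue", "0.05",    # FDR cutoff for reporting peaks
        "--name", name,
        "--outdir", outdir,
    ]
    return cmd


cmd = macs2_callpeak_command("ip.filtered.bam", "sample1", control_bam="input.filtered.bam")
# Print a shell-safe version of the command without running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```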

Optionally, consensus peaks for replicates are determined by mutual intersection of sample-wise peaks with bedops --intersect. Consensus peaks are then annotated with annotatr (Cavalcante and Sartor, 2017).
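The idea of a mutual intersection can be sketched in plain Python: a consensus region is the stretch of genome covered by a peak in every replicate. This is an illustrative re-implementation, not the bedops code, and it assumes each replicate's peaks are sorted and non-overlapping. The replicate coordinates are invented for the example.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping (chrom, start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        (chrom_a, start_a, end_a), (chrom_b, start_b, end_b) = a[i], b[j]
        if chrom_a == chrom_b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                out.append((chrom_a, start, end))
            # Advance whichever interval ends first.
            if end_a <= end_b:
                i += 1
            else:
                j += 1
        elif chrom_a < chrom_b:
            i += 1
        else:
            j += 1
    return out


def consensus_peaks(replicates):
    """Regions covered by a peak in every replicate."""
    return reduce(intersect_two, replicates)


# Hypothetical peak sets for three replicates.
rep1 = [("chr1", 100, 200), ("chr1", 300, 400)]
rep2 = [("chr1", 150, 250), ("chr1", 350, 450)]
rep3 = [("chr1", 120, 180), ("chr1", 390, 500)]

consensus = consensus_peaks([rep1, rep2, rep3])
print(consensus)  # [('chr1', 150, 180), ('chr1', 390, 400)]
```

Because only the shared portion of each peak survives, consensus regions shrink as replicates are added.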

Optionally, motifs for consensus peaks are found with HOMER (Heinz et al., 2010) (v4.11.1).

Peaks over all samples are merged with bedops (Neph et al., 2012) (v2.4.36) for principal component analysis and unsupervised clustering, which assess the similarity of samples. Read cross-correlations within samples are plotted using phantompeakqualtools (Landt et al., 2012). MultiQC (Ewels et al., 2016) (v1.7) generates a report combining the FastQC, trimming, alignment, and duplicate-marking results over all samples. deepTools (Ramírez et al., 2016) (v3.3.0) is used to calculate coverage and to produce IP efficiency plots.
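The peak-merging step amounts to taking the union of all sample-wise peaks and collapsing overlapping or adjoining intervals into single regions. A minimal Python sketch of that operation follows. It illustrates the concept rather than reproducing bedops, and the sample peak sets are invented.

```python
def merge_intervals(intervals):
    """Collapse overlapping or adjoining (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peaks from two samples.
sample_a = [("chr1", 100, 200), ("chr1", 500, 600)]
sample_b = [("chr1", 150, 250), ("chr2", 10, 50)]

merged_peaks = merge_intervals(sample_a + sample_b)
print(merged_peaks)  # [('chr1', 100, 250), ('chr1', 500, 600), ('chr2', 10, 50)]
```

Counting reads in these merged regions gives every sample the same set of features, which is what makes the samples comparable in the clustering and differential testing steps.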

Differential Testing

We use the edgeR R Bioconductor package (McCarthy et al., 2012) to identify regions of differential binding. For each sample, the number of reads in the merged peaks is counted, and a library size normalization factor is determined. With no replicates, we manually tune the BCV (biological coefficient of variation) parameter. We then fit each model using the glmFit function, and test each contrast with a likelihood ratio test. With replicates, the common, trended, and tagwise negative binomial dispersions are calculated. We then fit each model using the glmQLFit function, and test each contrast with an empirical Bayes quasi-likelihood F-test. The differential peaks are then annotated to genic and CpG island annotations using the annotatr R Bioconductor package (Cavalcante and Sartor, 2017).
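To make the role of the BCV concrete: in the negative binomial model used by edgeR, the dispersion is the square of the BCV, and the variance of a count with mean mu is mu + dispersion * mu^2. The Python sketch below evaluates that relationship. It is not edgeR code, and the mean count and BCV value are illustrative assumptions, not values used in this workflow.

```python
def nb_variance(mean, bcv):
    """Negative binomial variance for a given mean count and BCV.

    dispersion = bcv ** 2
    variance   = mean + dispersion * mean ** 2
    """
    dispersion = bcv ** 2
    return mean + dispersion * mean ** 2


mean_count = 100.0     # hypothetical mean read count in a peak
assumed_bcv = 0.4      # hypothetical BCV chosen when no replicates exist

variance = nb_variance(mean_count, assumed_bcv)
poisson_variance = nb_variance(mean_count, 0.0)  # BCV of 0 reduces to Poisson

print(f"NB variance: {variance:.1f}, Poisson variance: {poisson_variance:.1f}")
```

A larger assumed BCV inflates the expected variance, so fewer regions are called differential. This is why the choice matters when there are no replicates from which to estimate dispersion.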

References

Cavalcante,R.G. and Sartor,M.A. (2017) Annotatr: Genomic regions in context. Bioinformatics, 33, 2381–2383.
ENCODE Blacklist Regions https://sites.google.com/site/anshulkundaje/projects/blacklists.
Ewels,P. et al. (2016) MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32, 3047–3048.
FastQC https://github.com/s-andrews/FastQC.
Heinz,S. et al. (2010) Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Molecular Cell, 38, 576–589.
Landt,S.G. et al. (2012) ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research, 22, 1813–1831.
Langmead,B. and Salzberg,S.L. (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods, 9, 357–359.
Li,H. et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–2079.
Martin,M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, 3.
McCarthy,D.J. et al. (2012) Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40, 4288–4297.
Neph,S. et al. (2012) BEDOPS: High-performance genomic feature operations. Bioinformatics, 28, 1919–1920.
Picard https://github.com/broadinstitute/picard.
Quinlan,A.R. and Hall,I.M. (2010) BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.
Ramírez,F. et al. (2016) deepTools2: A next generation web server for deep-sequencing data analysis. Nucleic Acids Research, 44, W160–W165.
TrimGalore https://github.com/FelixKrueger/TrimGalore.
Zhang,Y. et al. (2008) Model-based Analysis of ChIP-Seq (MACS). Genome Biology, 9, R137.