ATAC-Seq workflow
ATAC-Seq workflow

Methods

QC

We use FastQC (FastQC) (v0.11.8) to assess the overall quality of each sequenced sample.

We use TrimGalore (TrimGalore) (v0.4.5) and cutadapt (Martin) (v1.15) with the following parameters: --nextera -e 0.1 --stringency 6 --length 20 --nextseq 20.

We align trimmed reads to the reference genome with Bowtie2 (Langmead and Salzberg, 2012) (v2.3.4.1) with the following parameters: -X 2000 --no-mixed --no-discordant, and defaults multi-seed length of 20bp with 0 mismatches.

Duplicate reads are marked with Picard (Picard) (v2.20.2).

Alignments to autosomes and sex chromosomes are kept (i.e. mitochondrial reads are removed), duplicates marked by Picard are removed, and alignments below a MAPQ threshold are removed. These filtering steps are performed with samtools (Li et al., 2009) (v1.2) and the parameters: -q 10 -F 1024.

Reads completely overlapping blacklisted regions (ENCODE Blacklist Regions) are removed with bedtools (Quinlan and Hall, 2010) (v2.28.0).

Sample-wise peaks are called with macs2 (Zhang et al., 2008) (v2.1.2) with flags: -f BAM --nomodel --shift -100 --extsize 200.

Peaks over all samples are merged with bedops (Neph et al., 2012) (v2.4.36) for the purpose of principal component analysis and unsupervised clustering to assess the similarity of samples.

Finally, MultiQC (Ewels et al., 2016) (v1.7) generates a report combining FastQC, trimming, alignment, and duplicate calling over all the samples. For ATAC specific QC metrics we use ataqv (Ataqv) (v1.0.0).

Differential Testing

We use the edgeR R Bioconductor package (McCarthy et al., 2012) to identify regions of differentially open chromatin (DOC). For each sample, the number of reads in the merged peaks is counted for each sample, and a library size normalization factor is determined. With no replicates, we manually tune the BCV (biological coefficient of variance) parameter. We then fit each model using the glmFit function, and test each contrast with a likelihood ratio test. With replicates, the common, trended, and tagwise negative binomial dispersions are calculated. We then fit each model using the glmQLFit function, and test each contrast with an empirical Bayes quasi-likelihood F-test. The DOC are then annotated to genic and CpG island annotations using the annotatr R Bioconductor package (Cavalcante and Sartor, 2017).

Footprinting

We use the HINT tool in the Regulatory Genomics Toolbox (rgt-hint) to do footprinting and motif binding analysis (Li et al., 2019). Briefly, for each condition, the reads among the replicates are pooled and tag counts are determined within the merged peaks. Next motifs from the JASPAR database (Khan et al., 2017) are queried against the footprints found within each condition. Finally, a differential score representing the transcription factor binding activity and the openness of the surrounding chromatin is calculated and visualized between pairs of conditions.

References

Ataqv https://github.com/ParkerLab/ataqv.
Cavalcante,R.G. and Sartor,M.A. (2017) Annotatr: Genomic regions in context. Bioinformatics, 33, 2381–2383.
ENCODE Blacklist Regions https://sites.google.com/site/anshulkundaje/projects/blacklists.
Ewels,P. et al. (2016) MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32, 3047–3048.
FastQC https://github.com/s-andrews/FastQC.
Khan,A. et al. (2017) JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Research, 46, D260–D266.
Langmead,B. and Salzberg,S.L. (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods, 9, 357–359.
Li,H. et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–2079.
Li,Z. et al. (2019) Identification of transcription factor binding sites using ATAC-seq. Genome Biology, 20.
Martin,M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17, 3.
McCarthy,D.J. et al. (2012) Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40, 4288–4297.
Neph,S. et al. (2012) BEDOPS: High-performance genomic feature operations. Bioinformatics, 28, 1919–1920.
Picard https://github.com/broadinstitute/picard.
Quinlan,A.R. and Hall,I.M. (2010) BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.
TrimGalore https://github.com/FelixKrueger/TrimGalore.
Zhang,Y. et al. (2008) Model-based Analysis of ChIP-Seq (MACS). Genome Biology, 9, R137.