
Revision 1 (Katie Lennard, 05/28/2018 03:13 PM) → Revision 2/3 (Katie Lennard, 05/28/2018 03:33 PM)

# Wiki

# DADA2 (Divisive Amplicon Denoising Algorithm) pipeline overview
1. The main difference from OTU-clustering-based methods is that DADA2 detects exact amplicon sequence variants (ASVs). An OTU is a cluster of closely related sequences (e.g. 97% identical), whereas an ASV consists of a single unique sequence.
2. DADA2 uses sequence quality information to learn an error model from the data, alternating between estimating error rates and inferring sample composition until the two converge. DADA2 therefore performs error correction, assigning each read to the error-corrected sequence from which it is inferred to have originated.
3. Each ASV has an associated quality estimate, which informs inference (denoising). Samples can be pooled to improve inference, especially for low-abundance ASVs.
4. The main steps of DADA2 are listed below.
Starting from demultiplexed FASTQ files from which all non-biological nucleotides (e.g. primers) have been stripped, the DADA2 pipeline proceeds as follows:
* Filter and trim: filterAndTrim() (filters the forward and reverse reads jointly, outputting only those pairs of reads that both pass the filter)
* Dereplicate: derepFastq()
* Learn error rates: learnErrors()
* Infer sample composition: dada()
* Merge paired reads: mergePairs()
* Make sequence table: makeSequenceTable()
* Remove chimeras: removeBimeraDenovo()
5. An important consideration: with paired-end sequencing data, the forward and reverse reads must still overlap sufficiently (more than 20 nt) after trimming, otherwise they cannot be merged.
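
The ASV/OTU distinction in point 1 can be illustrated with a small Python sketch. This is not the dada2 R API. Counting distinct reads here stands in for the *output* of denoising (real ASVs are only inferred after error correction, point 2). The greedy clustering at 97% identity, with identity computed position by position on equal-length sequences, is a simplifying assumption; real OTU pipelines use alignment-based identity.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions (assumes equal length, no indels)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_variants(reads: list[str]) -> Counter:
    """ASV-style view: each distinct sequence is its own variant."""
    return Counter(reads)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, int]:
    """OTU-style view: greedy clustering in order of decreasing abundance.

    A sequence joins the first centroid it matches at >= threshold identity;
    otherwise it founds a new cluster. Returns centroid -> total reads.
    """
    centroids: dict[str, int] = {}
    for seq, count in Counter(reads).most_common():
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                centroids[centroid] += count
                break
        else:
            centroids[seq] = count
    return centroids


a = "A" * 100
b = "A" * 99 + "C"  # differs from a at one position: 99% identity
reads = [a] * 10 + [b] * 3
print(len(exact_variants(reads)))  # 2
print(len(greedy_otus(reads)))     # 1
```

Two sequences that differ at one position out of 100 remain two distinct variants, but collapse into a single OTU at a 97% threshold.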
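
Points 2 and 3 come down to deciding whether a sequence is too abundant to be explained as error-derived copies of a more abundant sequence. The sketch below is a simplified abundance test in that spirit, not DADA2's implementation. It assumes the number of error-derived copies of a variant is Poisson with mean λ·n, where n is the abundance of the candidate source sequence and λ (a hypothetical value here) is the per-read probability of misreading the source as the variant. DADA2 derives such probabilities from its error model and quality scores. The test conditions on at least one copy having been observed; a small p-value means the variant is unlikely to be an error. Pooling sums counts across samples, which is why it helps low-abundance variants.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), computed in log space (mu > 0)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_sf(a: int, mu: float) -> float:
    """P(X >= a) for X ~ Poisson(mu), avoiding cancellation for tiny tails."""
    if a <= 0:
        return 1.0
    if a <= mu:
        # Large tail: subtracting the lower sum from 1 is accurate enough.
        return max(0.0, 1.0 - sum(poisson_pmf(k, mu) for k in range(a)))
    # Small tail: add terms upward; they shrink because k > mu.
    total, k = 0.0, a
    while True:
        term = poisson_pmf(k, mu)
        total += term
        if term == 0.0 or term < total * 1e-17:
            return total
        k += 1


def abundance_pvalue(variant_reads: int, source_reads: int, lam: float) -> float:
    """P(at least variant_reads error copies | at least one copy observed)."""
    mu = lam * source_reads
    return poisson_sf(variant_reads, mu) / -math.expm1(-mu)


LAM = 1e-5  # hypothetical per-read probability of misreading source as variant
per_sample = abundance_pvalue(2, 1000, LAM)  # one sample: 2 variant reads
pooled = abundance_pvalue(10, 5000, LAM)     # five such samples pooled
print(f"per sample: {per_sample:.2e}, pooled: {pooled:.2e}")
```

With two variant reads per sample the evidence is weak in any single sample, while the pooled counts give a p-value many orders of magnitude smaller.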
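
The first two steps of the pipeline (joint filtering and dereplication) can be sketched in plain Python. The filter mimics the idea behind filterAndTrim()'s truncLen, maxN and maxEE arguments: each read is truncated to a fixed length (shorter reads are dropped) and discarded if it contains too many Ns or if its expected number of errors, the sum of 10^(-Q/10) over its Phred scores Q, exceeds a threshold. A pair is kept only if both mates pass. The function names and parameter values below are hypothetical. derepFastq() similarly collapses identical reads while keeping their abundance and average quality profile.

```python
from collections import Counter

Read = tuple[str, list[int]]  # (sequence, Phred quality scores)


def expected_errors(quals: list[int]) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its bases."""
    return sum(10 ** (-q / 10) for q in quals)


def trim_and_check(read: Read, trunc_len: int, max_ee: float, max_n: int = 0):
    """Truncate a read; return it, or None if it fails length, N or EE checks."""
    seq, quals = read
    if len(seq) < trunc_len:
        return None  # too short to truncate: discard
    seq, quals = seq[:trunc_len], quals[:trunc_len]
    if seq.count("N") > max_n or expected_errors(quals) > max_ee:
        return None
    return seq, quals


def filter_pairs(fwd, rev, trunc_len, max_ee):
    """Keep a read pair only if BOTH mates pass their filters."""
    kept = []
    for f_read, r_read in zip(fwd, rev):
        f = trim_and_check(f_read, trunc_len[0], max_ee[0])
        r = trim_and_check(r_read, trunc_len[1], max_ee[1])
        if f is not None and r is not None:
            kept.append((f, r))
    return kept


def dereplicate(reads):
    """Collapse identical sequences: abundance and mean quality per position."""
    counts: Counter = Counter()
    qual_sums: dict[str, list[float]] = {}
    for seq, quals in reads:
        counts[seq] += 1
        sums = qual_sums.setdefault(seq, [0.0] * len(seq))
        for i, q in enumerate(quals):
            sums[i] += q
    return {s: (n, [t / n for t in qual_sums[s]]) for s, n in counts.items()}


fwd = [("ACGTAC", [30] * 6)] * 3 + [("ACGNAC", [30] * 6)]
rev = [("TTGCA", [35] * 5)] * 2 + [("TTGCA", [5] * 5), ("TTGCA", [35] * 5)]
pairs = filter_pairs(fwd, rev, trunc_len=(5, 4), max_ee=(1.0, 1.0))
print(len(pairs))  # 2: pair 3 fails on reverse quality, pair 4 has an N
print(dereplicate([f for f, _ in pairs]))
```

Note that pair 3 is dropped even though its forward read is fine, because the filter is applied to both mates jointly.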
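
Point 5 follows from how merging works: the reverse read is reverse-complemented and must overlap the end of the forward read. The sketch below merges a pair only when the overlapping bases match exactly and the overlap is at least 20 nt (the threshold suggested above). It is a simplification of mergePairs(), which aligns the reads, and it does not handle amplicons shorter than the reads. The function names are hypothetical.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 20):
    """Merge fwd with the reverse complement of rev.

    Looks for the longest suffix of fwd that exactly equals a prefix of
    revcomp(rev), requiring at least min_overlap bases; returns the merged
    sequence, or None if no such overlap exists.
    """
    rc = revcomp(rev)
    for olen in range(min(len(fwd), len(rc)), max(min_overlap, 1) - 1, -1):
        if fwd[-olen:] == rc[:olen]:
            return fwd + rc[olen:]
    return None


rng = random.Random(0)
amplicon = "".join(rng.choice("ACGT") for _ in range(60))
fwd = amplicon[:40]                 # forward read covers bases 0-39
rev_ok = revcomp(amplicon[15:])     # covers bases 15-59: 25 nt overlap
rev_short = revcomp(amplicon[30:])  # covers bases 30-59: only 10 nt overlap
print(merge_pair(fwd, rev_ok) == amplicon)  # True
print(merge_pair(fwd, rev_short))           # None
```

Trimming the reverse read too aggressively shrinks the overlap below the minimum, so the pair can no longer be merged and is lost.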
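
The final step, removeBimeraDenovo(), removes chimeras: artifactual sequences formed when PCR joins parts of two real templates. A minimal sketch of the idea follows, assuming equal-length sequences and exact matches. A sequence is flagged if its left part matches one more abundant sequence and its right part matches a different more abundant one. This is not removeBimeraDenovo()'s actual method, which works from alignments and offers several modes; the "strictly more abundant parent" rule and the function names here are assumptions.

```python
def is_bimera(seq: str, parents: list[str]) -> bool:
    """Exact two-parent chimera test for equal-length sequences (no indels).

    seq is flagged if, at some breakpoint i, its left part seq[:i] matches one
    parent and its right part seq[i:] matches a different parent.
    """
    n = len(seq)
    candidates = [p for p in parents if len(p) == n and p != seq]
    for i in range(1, n):
        left_ok = any(p[:i] == seq[:i] for p in candidates)
        right_ok = any(p[i:] == seq[i:] for p in candidates)
        # A parent matching both sides would equal seq, which is excluded,
        # so the two matches necessarily come from different parents.
        if left_ok and right_ok:
            return True
    return False


def remove_bimeras(table: dict[str, int]) -> dict[str, int]:
    """Drop sequences explainable as chimeras of strictly more abundant ones."""
    return {
        seq: count
        for seq, count in table.items()
        if not is_bimera(seq, [p for p, c in table.items() if c > count])
    }


parent_a = "AAAAACCCCC"
parent_b = "GGGGGTTTTT"
table = {
    parent_a: 100,
    parent_b: 80,
    "AAAAATTTTT": 5,  # left half of parent_a + right half of parent_b
    "AAAAACCCCG": 3,  # one mismatch from parent_a, not a chimera
}
print(remove_bimeras(table))
```

The chimera is removed, while the low-abundance single-mismatch variant is kept because no combination of more abundant sequences reproduces its final base.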