Katie Lennard, 05/28/2018 03:35 PM
# Wiki

# DADA2 (Divisive Amplicon Denoising Algorithm) pipeline overview

a) The main difference from OTU-clustering-based methods is that DADA2 infers exact amplicon sequence variants (ASVs). Unlike an OTU, which is a cluster of closely related (typically 97% identical) sequences, an ASV consists of a single unique sequence.

b) DADA2 uses sequence quality information to build an error model using machine learning methods, alternating between estimating sample composition and estimating error rates until convergence. DADA2 therefore performs error correction, assigning all relevant reads to an error-corrected sequence.

c) Each ASV has an associated quality estimate, which informs inference/denoising. Samples can be pooled to improve inference, especially for low-abundance ASVs.

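
As a hedged illustration of points b) and c), the sketch below learns an error model and then denoises with or without sample pooling. The function name `denoise_with_pooling` and the argument `filt_files` (a character vector of already filtered fastq paths) are hypothetical; only the `dada2::` calls come from the package. It is a minimal sketch, not a tuned workflow.

```r
# Sketch only: `denoise_with_pooling` and `filt_files` are hypothetical names.
# Requires the dada2 package when the function is actually called.
denoise_with_pooling <- function(filt_files, pool = TRUE) {
  # learnErrors() alternates between sample inference and error-rate
  # estimation until the error model converges.
  err <- dada2::learnErrors(filt_files, multithread = TRUE)

  # Collapse identical reads into unique sequences with abundances.
  derep <- dada2::derepFastq(filt_files)

  # pool = TRUE denoises all samples together, which can help detect
  # low-abundance ASVs; pool = FALSE processes each sample independently.
  dada2::dada(derep, err = err, pool = pool, multithread = TRUE)
}
```

Calling `denoise_with_pooling(filt_files, pool = FALSE)` would run the per-sample variant for comparison. Pooling generally costs more memory and time, so whether it is worthwhile depends on the dataset.
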
d) The main steps of DADA2: once demultiplexed fastq files without non-biological nucleotides (i.e. with primers stripped) are in hand, the DADA2 pipeline proceeds as follows:

* Filter and trim: filterAndTrim() (filters the forward and reverse reads jointly, outputting only those pairs of reads that both pass the filter)
* Dereplicate: derepFastq()
* Learn error rates: learnErrors()
* Infer sample composition: dada()
* Merge paired reads: mergePairs()
* Make sequence table: makeSequenceTable()
* Remove chimeras: removeBimeraDenovo() (chimeric sequences are removed after ASV identification, without reference to a database: a chimeric sequence is identified as one that can be exactly reconstructed from a left segment and a right segment of two more abundant parent sequences)

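
The steps above can be sketched as a single R function for paired-end data. The function name `run_dada2_sketch`, its arguments, and the filtering values (`truncLen`, `maxEE`, `truncQ`) are illustrative assumptions rather than recommendations from this page; they should be chosen from the quality profiles of the actual run.

```r
# Sketch only: the function name, arguments and parameter values are
# hypothetical placeholders. Requires the dada2 package when called.
run_dada2_sketch <- function(fwd, rev, filt_fwd, filt_rev,
                             trunc_len = c(240, 160)) {
  # Filter and trim forward and reverse reads jointly.
  dada2::filterAndTrim(fwd, filt_fwd, rev, filt_rev,
                       truncLen = trunc_len, maxN = 0, maxEE = c(2, 2),
                       truncQ = 2, compress = TRUE, multithread = TRUE)

  # Dereplicate the filtered reads.
  derep_f <- dada2::derepFastq(filt_fwd)
  derep_r <- dada2::derepFastq(filt_rev)

  # Learn error rates separately for forward and reverse reads.
  err_f <- dada2::learnErrors(filt_fwd, multithread = TRUE)
  err_r <- dada2::learnErrors(filt_rev, multithread = TRUE)

  # Infer sample composition (denoise).
  dada_f <- dada2::dada(derep_f, err = err_f, multithread = TRUE)
  dada_r <- dada2::dada(derep_r, err = err_r, multithread = TRUE)

  # Merge the denoised forward and reverse reads.
  merged <- dada2::mergePairs(dada_f, derep_f, dada_r, derep_r)

  # Build the sample-by-ASV table, then remove chimeras de novo.
  seqtab <- dada2::makeSequenceTable(merged)
  dada2::removeBimeraDenovo(seqtab, method = "consensus", multithread = TRUE)
}
```

The function returns the chimera-filtered sequence table. It does not handle samples that lose all their reads during filtering, which a production script would need to check for.
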
e) An important consideration: if using paired-end sequencing data, you must maintain a suitable overlap (>20 nt) between the forward and reverse reads after trimming!
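
A rough way to check this before choosing truncation lengths: if the forward and reverse reads together span the whole amplicon, the expected overlap is the sum of the two truncated read lengths minus the amplicon length (primers removed). The helper `expected_overlap` and the numbers below are hypothetical and only illustrate the arithmetic.

```r
# Sketch only: hypothetical helper and example values.
# Expected overlap (nt) = forward length + reverse length - amplicon length.
expected_overlap <- function(amplicon_len, trunc_fwd, trunc_rev) {
  trunc_fwd + trunc_rev - amplicon_len
}

# Example: a 253 nt amplicon with reads truncated to 150 nt and 130 nt.
overlap <- expected_overlap(amplicon_len = 253, trunc_fwd = 150, trunc_rev = 130)
overlap       # 27
overlap > 20  # TRUE, so this truncation leaves enough overlap
```

Because amplicon length varies between taxa, it is safer to run this check with the longest amplicon expected in the data.
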