Revision 7 (Katie Lennard, 10/16/2019 10:12 AM) → Revision 8/10 (Katie Lennard, 01/13/2020 12:06 PM)
# Wiki 
# Sample processing details from Pieter de Waal
'The Illumina® 16S metagenomics workflow will be used to analyze the three differently sourced milk types. Each sample will be analyzed in duplicate, producing a total number of six extracted libraries. A positive control will also be included in the analyses. The hypervariable V3 and V4 regions of the 16S ribosomal RNA gene (16S rRNA) will be amplified, using an Illumina® pre-designed primer pair. Barcoding for multiplexing of the samples will entail the use of the Nextera XT®-Index Kit.'
From Jeanne Korsman (Jeanne.Korsman@cpgr.org.za):
"16S Amplicon PCR Forward Primer = 5' TCGTCGGCAGCGTC _AGATGTGTATAAGAGACAG_ CCTACGGGNGGCWGCAG
(Equivalent to the 16S "337F" primer)
16S Amplicon PCR Reverse Primer = 5' GTCTCGTGGGCTCGG _AGATGTGTATAAGAGACAG_ GACTACHVGGGTATCTAATCC
(Equivalent to the 16S "800R" primer)
The underlined bases are the sequencing primer binding site, so these should not be present in any of the sequences. The sequences on the 5'end are where the barcodes are attached. They are also available in the 16S Metagenomic Sequencing Library Preparation Manual (attached).
The Klindworth et al 2013 paper which Illumina based their primer sequences on is also attached."
# Data expected
'Data collection will take place from 01/03/2019 to 03/03/2019. The samples will be transported to CPGR on Monday, 04/03/2019. Results of the data analysis will be available within two weeks from then.'
# Bioinformatic analyses to be done
* Preprocessing of raw fastq files using the dada2 pipeline
* 'Alpha and beta diversity (Shannon diversity (H′) and Bray-Curtis dissimilarity index). Calculation of the relative abundance of each OTU per specimen. Fisher’s exact test for two-way tables. Construction of Log ratio biplots, dendograms and other illustrative (graphical) representation of data analysis.'
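The planned diversity measures can be illustrated with a minimal Python sketch. It is only an illustration of the formulas for Shannon diversity (H′), relative abundance and Bray-Curtis dissimilarity; the count vectors are made up, and the actual analyses for this project were done in R.

```python
import math


def relative_abundance(counts):
    """Convert raw per-taxon counts for one sample into proportions."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (same taxon order)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for two samples across four taxa (not project data).
sample_a = [10, 10, 0, 0]
sample_b = [5, 5, 5, 5]

print(shannon(sample_a), shannon(sample_b))
print(bray_curtis(sample_a, sample_b))
```

Bray-Curtis is computed here on raw counts; in practice counts are usually normalised or converted to relative abundances first so that sequencing depth does not dominate the result.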
# Raw data
Raw data (.fastq files) were copied from the CPGR flash disk to Ilifu on 26/6/2019:
/ceph/cbio/users/katie/Levin/raw/
# Processed data
* The dada2 pipeline [[https://github.com/kviljoen/16S-rDNA-dada2-pipeline]] was used to process raw reads supplied by Pieter. The processed results are here: /ceph/cbio/users/katie/Levin/trunc245test
* Taxonomic DB used: RefSeq_RDP ( [[https://zenodo.org/record/3266798#.XRyfaNMzZTY]] )
 
## Parameters considered
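For the trimming/filtering stage, note that CPGR had removed only the adapters/barcodes, not the primers. To determine the correct --trimFor and --trimRev settings for dada2, the start of the forward primer was located in R1 reads with egrep --colour CCTACGGG and the start of the reverse primer in R2 reads with egrep --colour GACTAC. Based on this, --trimFor was set to 17 and --trimRev to 21, which are the lengths of the locus-specific forward (CCTACGGGNGGCWGCAG) and reverse (GACTACHVGGGTATCTAATCC) primers.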
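The trim values can be cross-checked against the primer sequences quoted earlier. The sketch below is an illustration only: it computes the primer lengths and tests whether a read starts with a primer, expanding the IUPAC ambiguity codes (N, W, H, V) that a plain egrep on the first few bases does not cover. The example reads are made up, and the check assumes the primer begins at the first base of each read.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Locus-specific primer sequences as quoted on this page.
FWD_PRIMER = "CCTACGGGNGGCWGCAG"
REV_PRIMER = "GACTACHVGGGTATCTAATCC"


def primer_regex(primer):
    """Build a regex that matches the primer at the start of a read."""
    return re.compile("^" + "".join(IUPAC[base] for base in primer))


def starts_with_primer(read, primer):
    """Return True if the read begins with the (degenerate) primer."""
    return primer_regex(primer).match(read) is not None


# Primer lengths correspond to the --trimFor / --trimRev values.
print(len(FWD_PRIMER), len(REV_PRIMER))

# Hypothetical read starts (not taken from the project data).
r1 = "CCTACGGGAGGCAGCAGTGGGGAATATTG"
r2 = "GACTACAGGGGTATCTAATCCTGTTTGCT"
print(starts_with_primer(r1, FWD_PRIMER), starts_with_primer(r2, REV_PRIMER))
```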
After this filtering, some samples (particularly Ruralcow2 and HomemadeAmazi) showed quality drops in the last 3 bases of the R2 reads. To investigate whether these bases should be trimmed, --truncRev was set to 245 (trimming the last 6 bp off the 251 bp reads). Note that there is limited scope for truncation, as it reduces the R1/R2 overlap; see the example sequences that were aligned (R1/R2) with EMBOSS Water. The downstream result of setting --truncRev 245 was that the two most affected samples (Ruralcow2 and HomemadeAmazi) each had a couple of unique species. The final run was, however, done without truncation, as follows:
Final run on Ilifu: trim settings: nextflow run kviljoen/16S-rDNA-dada2-pipeline --reads '/ceph/cbio/users/katie/Levin/raw/*_R{1,2}_001.fastq.gz' --trimFor 17 --trimRev 21 --truncFor 250 --truncRev 245 --reference /ceph/cbio/users/katie/dada2-test/gg_13_8_train_set_97.fa.gz --outdir /ceph/cbio/users/katie/Levin/refseq_RDP --reference /ceph/cbio/users/katie/RefSeq-RDP16S_v3_May2018.fa.gz /ceph/cbio/users/katie/Levin/trunc245test -profile ilifu
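The trade-off between truncation and R1/R2 overlap mentioned above can be estimated with simple arithmetic. The sketch below rests on several assumptions that should be checked against the pipeline: that --truncFor/--truncRev and --trimFor/--trimRev are passed to dada2's truncLen and trimLeft (in which case the retained read length is truncLen minus trimLeft), that merging requires at least 12 overlapping bases (the dada2 mergePairs default), and that the primer-free amplicon lengths listed are purely hypothetical values chosen for illustration.

```python
MIN_OVERLAP = 12  # dada2 mergePairs default; assumed unchanged by the pipeline


def retained_length(trunc_len, trim_left):
    """Read length left after truncating to trunc_len and removing trim_left
    bases from the 5' end (assumed dada2 filterAndTrim behaviour)."""
    return trunc_len - trim_left


def expected_overlap(fwd_len, rev_len, amplicon_len):
    """Number of bases shared by R1 and R2 for a primer-free amplicon."""
    return fwd_len + rev_len - amplicon_len


def overlaps(trunc_for, trunc_rev, trim_for=17, trim_rev=21,
             amplicon_lengths=(400, 420, 440)):
    """Map each hypothetical amplicon length to its expected overlap."""
    fwd = retained_length(trunc_for, trim_for)
    rev = retained_length(trunc_rev, trim_rev)
    return {n: expected_overlap(fwd, rev, n) for n in amplicon_lengths}


# Compare R2 truncated to 245 with R2 left at its full 251 bp.
for trunc_rev in (245, 251):
    for amplicon_len, overlap in overlaps(250, trunc_rev).items():
        status = "ok" if overlap >= MIN_OVERLAP else "too short to merge"
        print(trunc_rev, amplicon_len, overlap, status)
```

Under these assumptions, each base truncated from R2 removes one base of overlap, which is why longer amplicons leave little room for quality-based truncation.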
  
# Downstream analyses in R
Exploratory analyses and differential abundance testing were conducted in R (alpha and beta diversity, per-sample barplots, heatmap, and metagenomeSeq differential abundance testing; files attached).
 
        
        