Katie Lennard, 06/10/2020 05:35 PM

# Wiki
# Library prep summary
Sample concentration and quality were assessed by Eukaryote Total RNA Pico on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). Samples were treated with DNase prior to library preparation. Library preparation was performed with SMARTer Stranded Total RNA (Clontech Inc, Mountain View, CA) following the manufacturer's instructions. Average final library size is between 300 and 400 bp. Illumina 8-nt dual indices were used for multiplexing. Samples were pooled and sequenced on an Illumina HiSeq X sequencer at 150 bp read length in paired-end mode, with an output of 80 million reads per sample.
# Library prep QC
Sample QC reports are attached. RIN scores are mostly VERY low.
# Data location
Data is available as compressed fastq files, approximately 600 GB after unzipping. Files are to be uploaded to the UCT G-drive.
# Bioinformatic analyses requested
Standard RNA sequencing analysis including quality assessment, data normalization, alignment, gene mapping, pairwise comparisons, functional enrichment and visualization.
# Papers envisaged
Data from this analysis will be incorporated in a manuscript phenotyping the changes in immune cells (T regulatory and Th17 cells) during infancy, or published as a stand-alone manuscript. The authors will include the team in the Clive Gray and Heather Jaspan group involved in this work, together with the bioinformatician from CBIO who is willing to collaborate on this analysis.
# RNAseq QC
Preliminary QC indicates substantial rRNA content, high levels of duplication, a very high proportion of reads too short to map, and Illumina adapter contamination. The Illumina adapters are usually removed by this pipeline, but in this case they seem to have been missed (maybe because they are not right at the end of the read and occur at relatively variable positions across reads). I will therefore use bbduk (as implemented in the YAMP pipeline and now in https://github.com/kviljoen/fastq_QC).

The default phred score for bbduk trimming in the fastq_QC pipeline is 10 (regions with average quality BELOW this will be trimmed). I did, however, notice severe levels of TTTTTTTT repeats (of varying lengths, in some cases the whole read) after trimming with the default phred score of 10, so I raised this to 15, as most of these T repeats had quality scores of 12 (ASCII '-').
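
To make the threshold reasoning concrete, here is a minimal Python sketch of how a FASTQ quality character maps to a phred score, assuming the standard Phred+33 encoding. The function names and the example quality string are hypothetical, and the "mean quality below threshold" check is a simplification for illustration only, not bbduk's actual trimming algorithm.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality(quality_string):
    """Average phred score across a quality string."""
    scores = phred33_scores(quality_string)
    return sum(scores) / len(scores)


# Hypothetical quality string for a poly-T stretch: every base encoded as '-'
poly_t_quality = "-" * 20

for threshold in (10, 15):
    # Simplified rule: a region is trimmed if its mean quality is BELOW the threshold
    would_trim = mean_quality(poly_t_quality) < threshold
    print(f"threshold={threshold}: mean Q={mean_quality(poly_t_quality):.0f}, trimmed={would_trim}")
```

Under these assumptions, a stretch of '-' characters decodes to Q12, which survives a threshold of 10 but falls below a threshold of 15.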
# Stranded library
The library was prepared with the SMARTer Stranded RNA kit. For the pipeline's strandedness options see https://github.com/kviljoen/RNAseq/blob/master/docs/usage.md#library-strandedness and for a summary of library types by prep kit see https://chipster.csc.fi/manual/library-type-summary.html

So for this library prep we should use the flag --forwardStranded