Wiki
AIM: To develop a predictive algorithm that determines whether an infectious or a non-infectious cause is more likely.
The aim will be achieved based on:
- Human RNA-seq and downstream analysis, specifically for immune system genes
- Assessment of the human immune system genes (DNA), in particular but not limited to interferons, cytokines and chemokines
Sample data for all the participants is on ilifu in
/cbio/projects/017/definitive/
Detailed information regarding participants is provided in a txt file
/cbio/projects/017/patients_clinical_details.txt
Of the planned 47 participants, COVC04, COVC07, COVC23 and COVC30 were excluded based on the clinical notes shared by Ruan Marais on Slack on 18 July 2022: https://cbio.slack.com/files/U02LWC4GQTE/F03PZ1H8J0J/table_1_-_clinical_details.xlsx
As of 10 August 2022, one participant, COVC26, is still outstanding in /cbio/projects/017/definitive/, so the metadata file excludes this participant:
/cbio/projects/017/metadata.txt
metadata.txt is a file that consists of three columns from
/cbio/projects/017/patients_clinical_details.txt
It was created by reading the .xlsx file in R and writing out the "samplename", "COVID-19 status" and "Neurological symptoms due to COVID-19" columns.
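As a quick sanity check on the exclusions described above, the following shell sketch lists any excluded participant ID that still appears in a file. The function name find_excluded is hypothetical, and IDs are matched as plain substrings, which assumes sample names contain the participant ID unchanged.

```shell
# Participants excluded on clinical grounds, plus COVC26 (data outstanding).
EXCLUDED="COVC04 COVC07 COVC23 COVC30 COVC26"

# Hypothetical helper: print every excluded ID found in the given file.
# Returns non-zero if at least one excluded ID is present.
find_excluded() {
    local file="$1"
    local status=0
    local id
    for id in $EXCLUDED; do
        # -F: fixed-string match, -q: quiet (exit status only)
        if grep -qF -- "$id" "$file"; then
            echo "$id"
            status=1
        fi
    done
    return "$status"
}
```

For example, find_excluded /cbio/projects/017/metadata.txt should print nothing and return 0 if the metadata file is consistent with the exclusions.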
Important things to note:
We perform the RNA-seq gene counting using the nf-core/rnaseq pipeline. nf-core/rnaseq does read quality checks using FastQC, read trimming with Trim Galore, read mapping with STAR and quantification with Salmon.
To run the pipeline, we created a samplesheet.csv for the analysis using fastq_dir_to_samplesheet.py, downloaded from the nf-core/rnaseq repository:
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
We then made the script executable:
chmod 755 fastq_dir_to_samplesheet.py
Run the script
./fastq_dir_to_samplesheet.py /cbio/projects/017/definitive/ /cbio/projects/017/analysis/samplesheet.csv --strandedness reverse
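Before submitting the pipeline it can help to confirm that the generated samplesheet looks as expected. The sketch below assumes the four-column header sample,fastq_1,fastq_2,strandedness used by nf-core/rnaseq 3.x samplesheets. The function name check_samplesheet is hypothetical.

```shell
# Hypothetical helper: verify the samplesheet header, then count rows
# per strandedness value (column 4).
check_samplesheet() {
    local sheet="$1"
    local expected="sample,fastq_1,fastq_2,strandedness"
    local header
    # Read the first line and drop any Windows carriage return.
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "$expected" ]; then
        echo "unexpected header: $header" >&2
        return 1
    fi
    # Skip the header, ignore blank lines, tally the strandedness column.
    tail -n +2 "$sheet" | awk -F',' 'NF { n[$4]++ } END { for (s in n) print s, n[s] }'
}
```

With the --strandedness reverse option used above, check_samplesheet /cbio/projects/017/analysis/samplesheet.csv should report a single "reverse" line with the number of samples.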
Run the nf-core/rnaseq pipeline:
sbatch /cbio/projects/017/rnaseq/rnaseq-pipeline.sh
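The contents of rnaseq-pipeline.sh are not reproduced on this page. As a rough sketch only, the core command inside such a script might look like the line printed by the hypothetical helper below. The -profile singularity setting is an assumption, and reference genome options (for example --genome, or --fasta with --gtf) would also be needed but are not stated here.

```shell
# Hypothetical helper: print (not run) a nextflow command for nf-core/rnaseq.
# --input, --outdir and --aligner are standard nf-core/rnaseq parameters;
# the singularity profile is an assumption about the cluster setup.
rnaseq_cmd() {
    local input="$1"
    local outdir="$2"
    printf 'nextflow run nf-core/rnaseq -profile singularity --input %s --outdir %s --aligner star_salmon\n' "$input" "$outdir"
}
```

Calling rnaseq_cmd with the samplesheet path and an output directory prints a line that could be placed in a SLURM batch script after the #SBATCH resource directives.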
Once the quantification results (star_salmon) are available, downstream analysis is done in R on a local machine. The working directory is
/home/ephie/UCT-DATA_ANALYST/BioinformaticsSupportTeam/ruan/definitive/
There are several versions of the analysis:
v0
Details of this analysis and the results are given in https://bst.cbio.uct.ac.za/redmine/attachments/198. We grouped the samples based on encephalitic status (yes or no), COVID-19 status (possible or unlikely) and immunosuppression (yes or no).
v1
Details of the analysis and the design are provided in https://bst.cbio.uct.ac.za/redmine/attachments/196.
v2
Details of the analysis and the design are provided in https://bst.cbio.uct.ac.za/redmine/attachments/197
Generally, the downstream analysis was done with DESeq2 and other R packages, including ggplot2. In short, we do the following:
- Count normalization, i.e. creation of the DESeqDataSet object
- Exploratory data analysis (PCA & hierarchical clustering) to identify outliers & sources of variation in the data
- Run DESeq2 using the "DESeq" function
- Check the fit of the dispersion estimates using "plotDispEsts"
- Create contrasts to perform Wald testing on the shrunken log2 fold changes between specific conditions
- Output significant results
- Visualize results: volcano plots, heat-maps, normalized counts plots of top genes, etc.
- Take note of the versions of all tools used in the DE analysis
Updated by Ephie Geza about 2 years ago · 6 revisions