AIM: To develop a predictive algorithm that determines whether an infectious or a non-infectious cause is likely.

The aim will be achieved through:

  1. Human RNA-seq and downstream analysis, specifically of immune system genes
  2. Assessment of human immune system genes (DNA), in particular but not limited to interferons, cytokines and chemokines

Sample data for all the participants is on ilifu in

    /cbio/projects/017/definitive/

Detailed information regarding participants is provided in a txt file

    /cbio/projects/017/patients_clinical_details.txt

Of the planned 47 participants, COVC04, COVC07, COVC23 and COVC30 were excluded based on the clinical notes shared by Ruan Marais on Slack on 18 July 2022 (https://cbio.slack.com/files/U02LWC4GQTE/F03PZ1H8J0J/table_1_-_clinical_details.xlsx).
As of 10 August 2022, the data for one participant, COVC26, were still outstanding in /cbio/projects/017/definitive/, so the metadata file excludes this participant:

/cbio/projects/017/metadata.txt

metadata.txt consists of three columns taken from

/cbio/projects/017/patients_clinical_details.txt

It was created by reading the .xlsx file in R and writing out the "samplename", "COVID-19 status" and "Neurological symptoms due to COVID-19" columns.
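The page states that metadata.txt was written from R. A rough shell equivalent of that column selection is sketched here, not the method actually used. It assumes (this is not confirmed anywhere on the page) that patients_clinical_details.txt is tab-separated and that its header row contains exactly the three column names quoted above. `select_columns` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: copy named columns out of a tab-separated file with a header row.
# select_columns is a hypothetical helper, not part of the project scripts.
# Usage: select_columns INPUT.tsv "column name" ["column name" ...]
select_columns() {
    input=$1
    shift
    # Join the requested names with "|" (assumed not to occur in column names).
    wanted=$(printf '%s|' "$@")
    wanted=${wanted%|}
    awk -F'\t' -v OFS='\t' -v wanted="$wanted" '
        NR == 1 {
            # Map each header name to its column position.
            n = split(wanted, names, "|")
            for (i = 1; i <= NF; i++) pos[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    printf "missing column: %s\n", names[j] > "/dev/stderr"
                    exit 1
                }
                idx[j] = pos[names[j]]
            }
        }
        {
            # Emit the selected columns, header row included, in the requested order.
            line = $(idx[1])
            for (j = 2; j <= n; j++) line = line OFS $(idx[j])
            print line
        }
    ' "$input"
}

# Example call with the paths from this page (assumes the header matches exactly):
# select_columns /cbio/projects/017/patients_clinical_details.txt \
#     "samplename" "COVID-19 status" "Neurological symptoms due to COVID-19" \
#     > /cbio/projects/017/metadata.txt
```

Selecting columns by header name rather than by position keeps the sketch working if the clinical details file gains or reorders columns. It exits with an error if a requested name is absent.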

Important things to note:

We obtain the RNA-seq gene counts with the nf-core/rnaseq pipeline.

nf-core/rnaseq performs read quality checks with FastQC, read trimming with Trim Galore, read mapping with STAR and quantification with Salmon.

To run the pipeline, we create a samplesheet.csv for the analysis with fastq_dir_to_samplesheet.py, which we downloaded from the nf-core/rnaseq repository:

    wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py

We then made the script executable:

    chmod 755 fastq_dir_to_samplesheet.py

Run the script:

    ./fastq_dir_to_samplesheet.py /cbio/projects/017/definitive/ /cbio/projects/017/analysis/samplesheet.csv --strandedness reverse
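Before submitting the pipeline it can help to sanity-check the generated sheet. The following is a minimal sketch, not part of the documented workflow. It assumes the sheet carries the four-column header `sample,fastq_1,fastq_2,strandedness` written by recent versions of the script, and that the `sample` column holds the bare participant ID (for example COVC04). `check_samplesheet` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: basic checks on a samplesheet before submitting the pipeline.
# check_samplesheet is a hypothetical helper, not part of nf-core/rnaseq.
# Usage: check_samplesheet SHEET.csv [EXCLUDED_ID ...]
check_samplesheet() {
    sheet=$1
    shift
    header=$(head -n 1 "$sheet")
    if [ "$header" != "sample,fastq_1,fastq_2,strandedness" ]; then
        echo "unexpected header: $header" >&2
        return 1
    fi
    rc=0
    for id in "$@"; do
        # Assumes the sample column holds the bare participant ID.
        if awk -F, -v id="$id" 'NR > 1 && $1 == id { found = 1 } END { exit !found }' "$sheet"; then
            echo "excluded participant present: $id" >&2
            rc=1
        fi
    done
    # Print the number of distinct sample names in the sheet.
    tail -n +2 "$sheet" | cut -d, -f1 | sort -u | wc -l | tr -d ' '
    return "$rc"
}

# Example call with the excluded participants listed on this page:
# check_samplesheet /cbio/projects/017/analysis/samplesheet.csv COVC04 COVC07 COVC23 COVC30
```

The function prints the number of distinct samples and returns a non-zero status if the header is unexpected or an excluded ID is still present, so it can gate the `sbatch` submission in a wrapper script.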

Run the nf-core/rnaseq pipeline:

    sbatch /cbio/projects/017/rnaseq/rnaseq-pipeline.sh

Once the quantification results (star_salmon) are available, downstream analysis is done in R on a local machine. The working directory is

/home/ephie/UCT-DATA_ANALYST/BioinformaticsSupportTeam/ruan/definitive/

The analysis exists in several versions:

v0

Details of this analysis and the results are given in https://bst.cbio.uct.ac.za/redmine/attachments/198. We grouped the samples by encephalitic status (yes or no), COVID-19 status (possible or unlikely) and immunosuppression (yes or no).

v1

Details of the analysis and the design are provided in https://bst.cbio.uct.ac.za/redmine/attachments/196.

v2

Details of the analysis and the design are provided in https://bst.cbio.uct.ac.za/redmine/attachments/197

In general, the downstream analysis was done with DESeq2 and other R packages, including ggplot2. In short, the steps are:

  1. Count normalization, i.e. creation of the DESeqDataSet object
  2. Exploratory data analysis (PCA and hierarchical clustering) to identify outliers and sources of variation in the data
  3. Run DESeq2 using the "DESeq" function
  4. Check the fit of the dispersion estimates using "plotDispEsts"
  5. Create contrasts to perform Wald testing on the shrunken log2 fold changes between specific conditions
  6. Output significant results
  7. Visualize results: volcano plots, heat maps, normalized count plots of top genes, etc.
  8. Record the versions of all tools used in the DE analysis

Updated by Ephie Geza about 2 years ago · 6 revisions