Wiki » History » Version 4
Ephie Geza, 01/24/2023 05:21 PM
| 1 | 1 | Ephie Geza | # Wiki |
|---|---|---|---|
| 2 | |||
| 3 | ## AIM: To develop a predictive algorithm to determine whether an infectious cause or a non-infectious cause is likely. |
||
| 4 | The aim will be achieved based on |
||
| 5 | 1. Human RNA-seq and downstream analysis, specifically of immune system genes |
||
| 6 | 1. Assess the DNA of human immune system genes (in particular, but not limited to, interferons, cytokines and chemokines) |
||
| 7 | |||
| 8 | ## Sample data for all the participants is on ilifu in |
||
| 9 | /cbio/projects/017/definitive/ |
||
| 10 | |||
| 11 | ## Detailed information regarding participants is provided in a txt file |
||
| 12 | /cbio/projects/017/patients_clinical_details.txt |
||
| 13 | Of the planned 47 participants, COVC04, COVC07, COVC23 and COVC30 were excluded based on the clinical notes shared by Ruan Marais on Slack on 18 July 2022: https://cbio.slack.com/files/U02LWC4GQTE/F03PZ1H8J0J/table_1_-_clinical_details.xlsx. |
||
| 14 | 2 | Ephie Geza | As of **10 August 2022**, one participant (COVC26) is outstanding in **/cbio/projects/017/definitive/**; as such, the metadata file excludes this participant. |
| 15 | 1 | Ephie Geza | > /cbio/projects/017/metadata.txt |
| 16 | |||
| 17 | `metadata.txt` is a file that consists of the three columns of |
||
| 18 | > /cbio/projects/017/patients_clinical_details.txt |
||
| 19 | |||
| 20 | It was created by reading the .xlsx file in R and writing out the "samplename", "COVID-19 status" and "Neurological symptoms due to COVID-19" columns. |
||
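
As a cross-check of the three-column extraction described above, the same columns can be pulled from the text version of the clinical details by header name. This is a minimal sketch and not the R route used for the actual `metadata.txt`: it assumes the file is tab-delimited with a header row, and the function name `select_columns` and the output file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy named columns from a tab-delimited table, header row included.
# Assumption: the input is tab-delimited (the wiki does not state the delimiter).
select_columns() {
  local input="$1"; shift
  local tab wanted
  tab=$(printf '\t')
  # Join the requested header names with tabs so awk can split them again.
  local IFS="$tab"
  wanted="$*"
  awk -F'\t' -v OFS='\t' -v wanted="$wanted" '
    BEGIN { n = split(wanted, names, "\t") }
    NR == 1 {
      # Map every header name to its column index.
      for (i = 1; i <= NF; i++) pos[$i] = i
      for (j = 1; j <= n; j++) {
        if (!(names[j] in pos)) {
          print "missing column: " names[j] > "/dev/stderr"
          exit 1
        }
        idx[j] = pos[names[j]]
      }
    }
    {
      line = $(idx[1])
      for (j = 2; j <= n; j++) line = line OFS $(idx[j])
      print line
    }
  ' "$input"
}

# Hypothetical usage (the output name is made up to avoid overwriting metadata.txt):
# select_columns /cbio/projects/017/patients_clinical_details.txt \
#   "samplename" "COVID-19 status" "Neurological symptoms due to COVID-19" > metadata_check.txt
```

Selecting by header name rather than by column position keeps the sketch valid if the column order in the clinical details file changes.
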
| 21 | |||
| 22 | ## Important things to note: |
||
| 23 | We perform RNA-seq gene counting using the |
||
| 24 | |||
| 25 | nf-core/rnaseq pipeline. |
||
| 26 | `nf-core/rnaseq` performs read quality checks with **FastQC**, read trimming with **Trim Galore**, read mapping with **STAR** and quantification with **Salmon**. |
||
| 27 | |||
| 28 | To run the pipeline, we create a **samplesheet.csv** for the analysis with **fastq_dir_to_samplesheet.py**, downloaded from **nf-core** using **wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py**. We then make the script executable: |
||
| 29 | ``` shell |
||
| 30 | chmod 755 fastq_dir_to_samplesheet.py |
||
| 31 | ``` |
||
| 32 | Run the script |
||
| 33 | ``` shell |
||
| 34 | ./fastq_dir_to_samplesheet.py /cbio/projects/017/definitive/ /cbio/projects/017/analysis/samplesheet.csv --strandedness reverse |
||
| 35 | ``` |
||
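
Before submitting the pipeline it can help to sanity-check the generated samplesheet. The sketch below assumes the header `sample,fastq_1,fastq_2,strandedness` and fields without embedded commas or quotes; the function name `check_samplesheet` is hypothetical and is not part of nf-core.

```shell
#!/usr/bin/env bash
# Sketch: check that every FASTQ listed in a samplesheet exists and that the
# strandedness column holds the expected value (default: reverse).
# Assumption: columns are sample,fastq_1,fastq_2,strandedness with no quoting.
check_samplesheet() {
  local sheet="$1" expected="${2:-reverse}"
  local header sample fq1 fq2 strand rows=0 bad=0
  {
    read -r header   # skip the header row
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS=, read -r sample fq1 fq2 strand || [ -n "$sample" ]; do
      [ -n "$sample" ] || continue
      rows=$((rows + 1))
      if [ ! -f "$fq1" ]; then
        echo "missing fastq_1 for $sample: $fq1" >&2
        bad=$((bad + 1))
      fi
      # fastq_2 is empty for single-end data, so only check it when present.
      if [ -n "$fq2" ] && [ ! -f "$fq2" ]; then
        echo "missing fastq_2 for $sample: $fq2" >&2
        bad=$((bad + 1))
      fi
      if [ "$strand" != "$expected" ]; then
        echo "unexpected strandedness for $sample: $strand" >&2
        bad=$((bad + 1))
      fi
    done
  } < "$sheet"
  echo "$rows rows checked, $bad problems"
  [ "$bad" -eq 0 ]
}

# Usage with the path from this page:
# check_samplesheet /cbio/projects/017/analysis/samplesheet.csv reverse
```

The function returns a non-zero status when any problem is found, so it can gate the `sbatch` submission in a wrapper script.
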
| 36 | ## Run the `nf-core/rnaseq` pipeline |
||
| 37 | ``` shell |
||
| 38 | sbatch /cbio/projects/017/rnaseq/rnaseq-pipeline.sh |
||
| 39 | |||
| 40 | ``` |
||
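
The contents of `rnaseq-pipeline.sh` are not shown on this page. For orientation only, the sketch below writes what such a SLURM wrapper might look like. Every value in it is an assumption rather than the project's actual configuration: the `#SBATCH` resources, the `singularity` profile, the `GRCh38` reference and the function name `write_rnaseq_job`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real rnaseq-pipeline.sh is not reproduced in this wiki.
# Writes a SLURM job script that launches nf-core/rnaseq on a samplesheet.
write_rnaseq_job() {
  local samplesheet="$1" outdir="$2" job_script="$3"
  cat > "$job_script" <<EOF
#!/usr/bin/env bash
#SBATCH --job-name=rnaseq
#SBATCH --time=48:00:00
#SBATCH --mem=8G

# Assumed options: Singularity profile, GRCh38 reference, STAR + Salmon route.
nextflow run nf-core/rnaseq \\
  -profile singularity \\
  --input "$samplesheet" \\
  --outdir "$outdir" \\
  --genome GRCh38 \\
  --aligner star_salmon
EOF
}

# Hypothetical usage (the output directory and script name are made up):
# write_rnaseq_job /cbio/projects/017/analysis/samplesheet.csv \
#   /cbio/projects/017/analysis/results my-rnaseq-job.sh
# sbatch my-rnaseq-job.sh
```

The `star_salmon` aligner setting matches the results directory name mentioned on this page; the remaining options should be checked against the actual script before reuse.
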
| 41 | Upon getting the quantification results (**star_salmon**), downstream analysis is done in **R** on a local machine. The **working directory** is |
||
| 42 | 4 | Ephie Geza | > /home/ephie/UCT-DATA_ANALYST/BioinformaticsSupportTeam/ruan/definitive/ |
| 43 | 1 | Ephie Geza | |
| 44 | 4 | Ephie Geza | There are several versions of the analysis: |
| 45 | |||
| 46 | ## v0 |
||
| 47 | Details of this analysis and the results are given in <https://bst.cbio.uct.ac.za/redmine/attachments/198>. We grouped the samples based on encephalitic (yes or no), COVID-19 status (possible or unlikely) and immunosuppression (yes or no). |
||
| 48 | |||
| 49 | |||
| 50 | ## v1 |
||
| 51 | Details of the analysis and the design are provided in <https://bst.cbio.uct.ac.za/redmine/attachments/196>. |
||
| 52 | |||
| 53 | ## v2 |
||
| 54 | |||
| 55 | Details of the analysis and the design are provided in <https://bst.cbio.uct.ac.za/redmine/attachments/197>. |
||
| 56 | |||
| 57 | Generally, the downstream analysis was done with **DESeq2** and other **R packages**, including **ggplot2**. In short, we do the following: |
||
| 58 | 1 | Ephie Geza | 1. Count normalization, i.e. creation of the DESeqDataSet object. |
| 59 | 1. Exploratory data analysis (PCA & hierarchical clustering) - identifying outliers & sources of variation in the data: |
||
| 60 | 3 | Ephie Geza | 1. Running DESeq2 using the "DESeq" function |
| 61 | 1 | Ephie Geza | 1. Check the fit of the dispersion estimates using "plotDispEsts" |
| 62 | 3 | Ephie Geza | 1. Create contrasts to perform Wald testing on the shrunken log2 fold changes between specific conditions: |
| 63 | 1 | Ephie Geza | 1. Output significant results |
| 64 | 3 | Ephie Geza | 1. Visualize results: volcano plots, heat-maps, normalized counts plots of top genes, etc. |
| 65 | 1. Take note of all the versions of all tools used in the DE analysis: |