Version 1 - History - Wiki - Metagenomic sequencing of CSF samples - Redmine

Ephie Geza

# Wiki

## AIM: To develop a predictive algorithm that determines whether an infectious cause or a non-infectious cause is likely.

The aim will be achieved based on:

1. Human RNA-seq and downstream analysis, specifically as it relates to immune system genes

1. Assess the DNA of human immune system genes, in particular (but not limited to) interferons, cytokines and chemokines

## Sample data for all the participants is on ilifu in

        /cbio/projects/017/definitive/

## Detailed information regarding participants is provided in a txt file

        /cbio/projects/017/patients_clinical_details.txt

Of the planned 47 participants, COVC04, COVC07, COVC23 and COVC30 were excluded based on the clinical notes shared by Ruan Marais on Slack on 18 July 2022: https://cbio.slack.com/files/U02LWC4GQTE/F03PZ1H8J0J/table_1_-_clinical_details.xlsx.

As at **10 August 2022**, one participant (COVC26) is still outstanding in **/cbio/projects/017/definitive/**; as such, the metadata file excludes this participant.

> /cbio/projects/017/metadata.txt

`metadata.txt` consists of three columns taken from

> /cbio/projects/017/patients_clinical_details.txt

It was created by reading the .xlsx file in R and writing out the "samplename", "COVID-19 status" and "Neurological symptoms due to COVID-19" columns.
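
As a rough shell alternative to the R step described above, the same three columns could be pulled out of a tab-delimited export by header name. This is only a sketch, not the method actually used: it assumes the file is tab-separated with a header row, the helper `select_columns` is hypothetical, and the demo rows and the `Age` column are made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only the named columns of a tab-delimited file.
# Assumption: the first line is a header and fields are tab-separated.
select_columns() {
    # $1 = input file; $2 = comma-separated header names to keep, in output order
    awk -F'\t' -v OFS='\t' -v wanted="$2" '
        NR == 1 {
            n = split(wanted, names, ",")
            # Map each header name to its column position
            for (i = 1; i <= NF; i++) pos[$i] = i
            for (j = 1; j <= n; j++) {
                if (!(names[j] in pos)) {
                    printf("missing column: %s\n", names[j]) > "/dev/stderr"
                    exit 1
                }
                idx[j] = pos[names[j]]
            }
        }
        {
            # Emit the selected fields for the header and every data row
            line = $(idx[1])
            for (j = 2; j <= n; j++) line = line OFS $(idx[j])
            print line
        }
    ' "$1"
}

# Toy input (made-up values, not real participant data)
demo=$(mktemp)
printf 'samplename\tAge\tCOVID-19 status\tNeurological symptoms due to COVID-19\n' > "$demo"
printf 'S01\t40\tpositive\tyes\n' >> "$demo"
printf 'S02\t35\tnegative\tno\n' >> "$demo"

select_columns "$demo" 'samplename,COVID-19 status,Neurological symptoms due to COVID-19'
```

If the clinical details file really is tab-delimited and its header names match exactly, the function could be pointed at `/cbio/projects/017/patients_clinical_details.txt` and its output redirected to `metadata.txt`.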

## Important things to note:

We perform the RNA-seq gene counting using the `nf-core/rnaseq` pipeline.

`nf-core/rnaseq` performs read quality checks with **FastQC**, read trimming with **Trim Galore**, read mapping with **STAR** and quantification with **Salmon**.

To run the pipeline, we create a **samplesheet.csv** for the analysis using **fastq_dir_to_samplesheet.py**, obtained from **nf-core** with **wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py**. We then make the file executable:

``` shell
chmod 755 fastq_dir_to_samplesheet.py
```

Run the script:

``` shell
./fastq_dir_to_samplesheet.py /cbio/projects/017/definitive/ /cbio/projects/017/analysis/samplesheet.csv --strandedness reverse
```
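
Before launching the pipeline it can help to sanity-check the generated samplesheet. The sketch below assumes the four-column header `sample,fastq_1,fastq_2,strandedness` written by `fastq_dir_to_samplesheet.py`; the helper name `check_samplesheet`, the sample names and the FASTQ paths in the demo are hypothetical.

```shell
#!/bin/sh
# Sketch: verify the header and field count of a samplesheet CSV.
check_samplesheet() {
    # $1 = samplesheet CSV; prints the number of data rows, or fails with status 1
    awk -F',' '
        NR == 1 {
            # Assumed header, as written by fastq_dir_to_samplesheet.py
            if ($0 != "sample,fastq_1,fastq_2,strandedness") {
                printf("unexpected header: %s\n", $0) > "/dev/stderr"
                bad = 1
                exit
            }
            next
        }
        NF != 4 {
            printf("line %d: expected 4 fields, found %d\n", NR, NF) > "/dev/stderr"
            bad = 1
        }
        END {
            # Fail on any malformed row or when there are no data rows at all
            if (bad || NR < 2) exit 1
            print NR - 1
        }
    ' "$1"
}

# Toy samplesheet (made-up sample names and paths)
sheet=$(mktemp)
printf 'sample,fastq_1,fastq_2,strandedness\n' > "$sheet"
printf 'S01,/data/S01_R1.fastq.gz,/data/S01_R2.fastq.gz,reverse\n' >> "$sheet"
printf 'S02,/data/S02_R1.fastq.gz,/data/S02_R2.fastq.gz,reverse\n' >> "$sheet"

check_samplesheet "$sheet"
```

The printed row count can be compared with the number of participants expected in `/cbio/projects/017/definitive/`.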

## Run the `nf-core/rnaseq` pipeline

``` shell
sbatch /cbio/projects/017/rnaseq/rnaseq-pipeline.sh
```
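
The contents of `rnaseq-pipeline.sh` are not reproduced on this page. Purely as an illustration, a SLURM wrapper for `nf-core/rnaseq` might look like the script written out below. The resource requests, the `singularity` profile, the `GRCh38` genome key and the output directory are all assumptions, not the project's actual settings.

```shell
#!/bin/sh
# Sketch: write a hypothetical SLURM wrapper to a temporary file and syntax-check it.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=rnaseq
#SBATCH --time=48:00:00
#SBATCH --mem=8G

# Assumed values: profile, genome key and output directory are placeholders.
nextflow run nf-core/rnaseq \
    -profile singularity \
    --input /cbio/projects/017/analysis/samplesheet.csv \
    --outdir /cbio/projects/017/analysis/results \
    --genome GRCh38 \
    --aligner star_salmon
EOF

# Parse the generated script without executing it
sh -n "$script" && echo "syntax OK: $script"
```

`--aligner star_salmon` matches the **star_salmon** results directory used in the downstream analysis; everything else should be checked against the real script before reuse.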

Once the quantification results (**star_salmon**) are available, downstream analysis is done in **R** on a local machine. The **working directory** is

> /home/ephie/UCT-DATA_ANALYST/BioinformaticsSupportTeam/ruan/definitive/results/

The analysis is run with the **R script**

``` shell
/home/ephie/UCT-DATA_ANALYST/BioinformaticsSupportTeam/ruan/definitive/dge_downstream.R
```
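
A quick look at the merged count matrix can confirm that every sample made it through quantification before the R analysis starts. The sketch assumes a tab-delimited matrix with `gene_id` and `gene_name` columns followed by one column per sample, which is the layout `nf-core/rnaseq` is expected to write under `star_salmon/` (for example `salmon.merged.gene_counts.tsv`); the helper `summarise_counts` and the toy values are hypothetical.

```shell
#!/bin/sh
# Sketch: report how many genes and samples a merged count matrix contains.
summarise_counts() {
    # $1 = tab-delimited matrix: gene_id, gene_name, then one column per sample
    awk -F'\t' '
        NR == 1 { samples = NF - 2; next }   # header: subtract the two annotation columns
        { genes++ }                          # every remaining line is one gene
        END { printf("genes=%d samples=%d\n", genes, samples) }
    ' "$1"
}

# Toy matrix (made-up gene identifiers and counts)
counts=$(mktemp)
printf 'gene_id\tgene_name\tS01\tS02\tS03\n' > "$counts"
printf 'g1\tGENE_A\t10\t0\t7\n' >> "$counts"
printf 'g2\tGENE_B\t3\t12\t5\n' >> "$counts"

summarise_counts "$counts"
```

The sample count should match the number of rows in `samplesheet.csv` and in `metadata.txt`.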

We use **DESeq2** for differential gene expression analysis, together with other **R packages** such as **ggplot2**. In short, the **R script** does the following:

1. Count normalization, i.e. creation of the `DESeqDataSet` object.

1. Exploratory data analysis (PCA & hierarchical clustering) to identify outliers and sources of variation in the data.

1. Running DESeq2 using the `DESeq()` function.

1. Checking the fit of the dispersion estimates using `plotDispEsts()`.

1. Creating contrasts to perform Wald testing on the shrunken log2 fold changes between specific conditions.

1. Outputting significant results.

1. Visualizing results: volcano plots, heatmaps, normalized count plots of top genes, etc.

1. Taking note of the versions of all tools used in the DE analysis.
