Revision 2 (Ephie Geza, 11/24/2023 05:53 PM) → Revision 3/5 (Ephie Geza, 11/24/2023 05:59 PM)

# Wiki

We use the nf-core/rnaseq pipeline (https://nf-co.re/rnaseq/3.13.2) to analyze the RNA-seq data (FASTQ files). The pipeline removes ribosomal RNA, checks the quality of the reads, removes adapters and quality-trims the reads, removes genome contaminants, aligns the reads to the reference genome, sorts and indexes the alignments, marks duplicates, and performs quantification.

## Data
The samples were downloaded from AWS to ilifu, into the project folder:
``` shell
/cbio/projects/028/
```
The raw data are in:
``` shell
/cbio/projects/028/rawdata/CleanData/
```
After cloning the nf-core/rnaseq Nextflow pipeline with git, we used rnaseq/bin/fastq_dir_to_samplesheet.py to create a samplesheet for the pipeline:
``` shell
./rnaseq/bin/fastq_dir_to_samplesheet.py /cbio/projects/028/rawdata/CleanData/ samplesheet.csv --strandedness auto --read1_extension "_1.fq.gz" --read2_extension "_2.fq.gz"
```
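Before launching the pipeline it can help to sanity-check the generated samplesheet. The following is a minimal sketch, assuming the four-column header `sample,fastq_1,fastq_2,strandedness` and unquoted, comma-free paths. `check_samplesheet` is a hypothetical helper written for this page, not part of nf-core/rnaseq.

```shell
# Hypothetical helper: verify the header and that every listed FASTQ file exists.
check_samplesheet() {
    sheet="$1"
    missing=0
    {
        # The first line must be the expected header (assumed layout).
        IFS= read -r header
        if [ "$header" != "sample,fastq_1,fastq_2,strandedness" ]; then
            echo "unexpected header: $header" >&2
            return 1
        fi
        # Remaining lines: one sample per row; fastq_2 may be empty for single-end data.
        while IFS=, read -r sample fq1 fq2 strand || [ -n "$sample" ]; do
            for fq in "$fq1" "$fq2"; do
                if [ -n "$fq" ] && [ ! -f "$fq" ]; then
                    echo "missing file for $sample: $fq" >&2
                    missing=$((missing + 1))
                fi
            done
        done
    } < "$sheet"
    # Succeed only if no listed file was missing.
    [ "$missing" -eq 0 ]
}
```

Calling `check_samplesheet samplesheet.csv` returns a non-zero status and prints the offending paths to stderr if the header is unexpected or a file is missing.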

NB: We could not remove the ribosomal RNA (rRNA) using the Ensembl GTF file, so we tried the GENCODE one (see /cbio/projects/028/scripts/rnaseq24112023.sh).
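The exact command we ran is in the script referenced above. As a rough sketch only, a run with a GENCODE annotation and rRNA removal might look like the following. The `--gencode` and `--remove_ribo_rna` parameters are documented nf-core/rnaseq options. The reference file names, the output folder and the `singularity` profile are assumptions (placeholders), not the values from our script. `rnaseq_cmd` is a hypothetical helper that only prints the command.

```shell
# Hypothetical placeholders: adjust to the actual reference files and output folder.
FASTA="${FASTA:-genome.fa}"
GTF="${GTF:-gencode.annotation.gtf}"
OUTDIR="${OUTDIR:-results}"

# Print the command instead of running it, so it can be reviewed first.
rnaseq_cmd() {
    printf '%s ' nextflow run nf-core/rnaseq -r 3.13.2 -profile singularity \
        --input samplesheet.csv --outdir "$OUTDIR" \
        --fasta "$FASTA" --gtf "$GTF" --gencode --remove_ribo_rna
    printf '\n'
}

rnaseq_cmd
```

Once the printed command looks right, it can be pasted into a job script and submitted.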