Ephie Geza, 02/01/2024 05:43 PM

# Wiki

## Data location:
The data in FASTQ format was sent via WeTransfer, downloaded, and uploaded to ilifu:
``` shell
/cbio/projects/032/rawdata/fastq
```

## Reference data:
Currently in project 033 on ilifu: Helicobacter pylori strain MT5135 (RefSeq accession no. CP071982.1) (n=2)
``` shell
/cbio/projects/033/refs/Helicobacter_pylori_MT5135_refseq.fa
```

## Workflow

### 1. QC:
Raw reads were quality checked and trimmed using the kviljoen/fastq_QC pipeline, which also trims adapters and filters on quality using bbduk.
``` shell
# For nextflow DSL1 pipeline
module load nextflow/22.10.7

nextflow run kviljoen/fastq_QC --reads '/cbio/projects/032/rawdata/fastq/*_R{1,2}_001.fastq.gz' -profile ilifu -resume --email "ephie.geza@uct.ac.za"
```
QC reports for raw reads and after trimming can be found in the 
### 2. AMR profiling
We used three databases for AMR profiling: ARGannot_r3, CARD_v3.0.8, and ResFinder:

ARGannot

## MLST
To type certain sample pairs, we first downloaded the MLST scheme for **Pseudomonas aeruginosa** and renamed the files:
``` shell
img="/cbio/users/katie/singularity_containers/6c884bc3ab5c-2017-12-15-c6ae6fedbccd.img"

singularity exec "${img}" getmlst.py --species "Pseudomonas aeruginosa"
mv Pseudomonas_aeruginosa.fasta Pseudomonas.fasta
mv profiles_csv Pseudomonas_profiles_csv
mv alleles_fasta Pseudomonas_alleles_fasta
```
This was also done for **Klebsiella pneumoniae**, **Enterobacter cloacae**, **Escherichia coli#1**, and **Escherichia coli#2**. Note that **Serratia had no MLST scheme available as of February 2024**.
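Since the same download-and-rename steps repeat for every species, they could be wrapped in a small helper. This is a minimal sketch, not the commands actually run: `fetch_scheme` and its prefix argument are hypothetical names, and it assumes `getmlst.py` names the FASTA after the species with spaces replaced by underscores, as in the Pseudomonas example above. It would be called as `fetch_scheme "Klebsiella pneumoniae" Klebsiella`.

```shell
img="/cbio/users/katie/singularity_containers/6c884bc3ab5c-2017-12-15-c6ae6fedbccd.img"

# fetch_scheme SPECIES PREFIX  (hypothetical helper)
# Downloads one MLST scheme and renames the outputs with PREFIX.
fetch_scheme() {
    species="$1"
    prefix="$2"
    # Assumed output name: species with spaces replaced by underscores
    fasta="$(printf '%s' "${species}" | tr ' ' '_').fasta"

    # Stop early if the download fails, so stale files are never renamed
    singularity exec "${img}" getmlst.py --species "${species}" || return 1
    mv "${fasta}" "${prefix}.fasta" &&
    mv profiles_csv "${prefix}_profiles_csv" &&
    mv alleles_fasta "${prefix}_alleles_fasta"
}
```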

We then ran MLST for each species, for example for Pseudomonas:
``` shell
nextflow run /cbio/projects/033/uct-srst2/main.nf \
        --reads '/cbio/projects/033/analysis/2024-01-11-fastq_QC/bbduk/Pseudomonas/*_{1,2}.fq' \
        -profile ilifu \
        --mlst_definitions /cbio/projects/033/analysis/02_MLST/profiles/Pseudomonas_profiles_csv \
        --mlst_db /cbio/projects/033/analysis/02_MLST/profiles/Pseudomonas.fasta \
        --mlst_delimiter "_" --outdir /cbio/projects/033/analysis/02_MLST \
        -resume -dsl1
```
### MLST results
Most of the scheme alleles for the selected group could not be matched with sufficient depth in our short-read sequences. Some FASTQ pairs had **mismatches**, reported as the allele number followed by an "*".
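To see which samples carry flagged calls, the MLST results table can be scanned on its ST column. This is a minimal sketch, assuming a tab-separated table with a header row, the sample name in column 1 and the ST in column 2. Besides the "*" described above, it also checks for "?" (uncertain call) and "NF" (ST not found), which is how we understand srst2 to mark those cases. `flag_mlst` is a hypothetical helper name; call it as `flag_mlst <results table>`.

```shell
# flag_mlst RESULTS_TABLE  (hypothetical helper)
# Prints: sample <TAB> ST <TAB> comma-separated flags, or "clean".
flag_mlst() {
    awk -F'\t' '
        NR == 1 { next }                                  # skip the header row
        {
            flags = ""
            if ($2 ~ /\*/) flags = flags "mismatch,"      # "*": allele call with mismatches
            if ($2 ~ /\?/) flags = flags "uncertain,"     # "?": assumed to mean an uncertain call
            if ($2 ~ /NF/) flags = flags "ST_not_found,"  # "NF": assumed to mean no matching ST
            sub(/,$/, "", flags)                          # drop the trailing comma
            if (flags == "") flags = "clean"
            print $1 "\t" $2 "\t" flags
        }
    ' "$1"
}
```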

## Reasons why MLST may fail
 - No match found: the sequence data for the specified loci has no match in the MLST database (variations, mutations, or target genes not present in the MLST DB)
 - Low-quality sequences: low-quality or ambiguous base calls in the sequenced loci may cause MLST assignment to fail
 - Incomplete sequencing: the sequencing coverage should be sufficient and cover all required loci
 - Database mismatch: the DB used for typing should be appropriate for the organism or strain
 - Novel sequence type: the isolate carries a novel or uncharacterized sequence type not present in the MLST database; this is most common when studying less common or newly emerging strains

## Antimicrobial Gene Detection
Reads were mapped to each reference sequence in the FASTA database supplied through *--gene_db*, reporting all genes with more than 80% coverage (the default is 90%).
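For illustration, when srst2 is called directly the coverage cutoff corresponds to its `--min_coverage` option. This is a sketch under assumptions, not the command used here: the analysis above goes through the uct-srst2 Nextflow wrapper, whose parameter for this cutoff is not shown on this page, and the read names, output prefix, and database path below are placeholders. The helper only prints the command so it can be reviewed before running.

```shell
# Placeholder inputs: adjust to the actual project layout.
gene_db="ARGannot_r3.fasta"   # one of the AMR databases named above (path assumed)
min_cov=80                    # report genes covered at this percentage (srst2 default: 90)

# build_srst2_cmd FORWARD_READS REVERSE_READS OUTPUT_PREFIX  (hypothetical helper)
# Prints a direct srst2 gene-detection command without running it.
build_srst2_cmd() {
    printf 'srst2 --input_pe %s %s --output %s --log --gene_db %s --min_coverage %s\n' \
        "$1" "$2" "$3" "${gene_db}" "${min_cov}"
}

build_srst2_cmd sample_1.fq sample_2.fq sample_amr
```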