Ephie Geza, 02/01/2024 05:43 PM


Wiki

Data location:

The data, in FASTQ format, were sent via WeTransfer, downloaded, and then uploaded to ilifu:

/cbio/projects/032/rawdata/fastq

Reference data:

Currently in project 033 on ilifu: Helicobacter pylori strain MT5135 (RefSeq accession no. CP071982.1) (n=2)

/cbio/projects/033/refs/Helicobacter_pylori_MT5135_refseq.fa 

Workflow

1. QC:

Raw reads were quality-checked and trimmed using the kviljoen/fastq_QC pipeline, which also trims adapters and filters reads by quality using bbduk.

# For nextflow DSL1 pipeline
module load nextflow/22.10.7

nextflow run kviljoen/fastq_QC --reads '/cbio/projects/032/rawdata/fastq/*_R{1,2}_001.fastq.gz' -profile ilifu -resume --email "ephie.geza@uct.ac.za"
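Before launching the pipeline it can help to confirm that every forward read file has its mate, since the --reads glob above expects pairs. The following is a minimal sketch, not part of the pipeline: missing_mates is a hypothetical helper, and it assumes the *_R1_001.fastq.gz / *_R2_001.fastq.gz naming used in the glob.

```shell
#!/usr/bin/env bash
# Sketch: print every R1 file that has no matching R2 file in the same directory.
# Assumes the *_R{1,2}_001.fastq.gz naming from the --reads glob.

missing_mates() {
    local dir="$1" r1 r2
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                      # glob matched nothing
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"   # expected mate name
        [ -e "$r2" ] || printf '%s\n' "$r1"           # report unpaired R1
    done
}
```

Calling missing_mates /cbio/projects/032/rawdata/fastq prints nothing when all pairs are complete.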

QC reports for raw reads and after trimming can be found in the

2. AMR profiling

We used three databases for AMR profiling: ARGannot_r3, CARD_v3.0.8 and ResFinder.

ARGannot

MLST

To type certain sample pairs, we first downloaded the MLST scheme for Pseudomonas aeruginosa and renamed the files:

img="/cbio/users/katie/singularity_containers/6c884bc3ab5c-2017-12-15-c6ae6fedbccd.img"

singularity exec ${img} getmlst.py --species "Pseudomonas aeruginosa"
mv Pseudomonas_aeruginosa.fasta Pseudomonas.fasta
mv profiles_csv Pseudomonas_profiles_csv
mv alleles_fasta Pseudomonas_alleles_fasta

This was also done for Klebsiella pneumoniae, Enterobacter cloacae, Escherichia coli#1 and Escherichia coli#2. Note that, as of February 2024, Serratia does not have an MLST scheme.
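The per-species download could be scripted as a loop. This is only a sketch under stated assumptions: scheme_prefix, fetch_scheme and the FETCH_SCHEMES switch are hypothetical names, getmlst.py is assumed to write the generic profiles_csv and alleles_fasta files seen in the commands above, and the full species string is used as the prefix (rather than the genus alone) so that the two Escherichia coli schemes do not overwrite each other.

```shell
#!/usr/bin/env bash
# Sketch: download one MLST scheme per species, each into its own directory.

img="/cbio/users/katie/singularity_containers/6c884bc3ab5c-2017-12-15-c6ae6fedbccd.img"

# Species strings as listed on this page.
species_list=(
    "Pseudomonas aeruginosa"
    "Klebsiella pneumoniae"
    "Enterobacter cloacae"
    "Escherichia coli#1"
    "Escherichia coli#2"
)

# Hypothetical helper: file-name prefix for a species (spaces -> underscores).
scheme_prefix() {
    printf '%s\n' "${1// /_}"
}

# Hypothetical helper: fetch one scheme and prefix the generic output names.
fetch_scheme() {
    local species="$1" prefix
    prefix="$(scheme_prefix "$species")"
    mkdir -p "$prefix" || return 1
    (
        cd "$prefix" || exit 1
        singularity exec "$img" getmlst.py --species "$species" || exit 1
        mv profiles_csv "${prefix}_profiles_csv"
        mv alleles_fasta "${prefix}_alleles_fasta"
    )
}

# Download only when explicitly requested, so sourcing the file has no side effects.
if [ "${FETCH_SCHEMES:-0}" = "1" ]; then
    for species in "${species_list[@]}"; do
        fetch_scheme "$species"
    done
fi
```

Running the script with FETCH_SCHEMES=1 set in the environment performs the downloads; the species-named FASTA file written by getmlst.py is left under its original name in each directory.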

We then ran MLST for each species using:

nextflow run /cbio/projects/033/uct-srst2/main.nf \
        --reads '/cbio/projects/033/analysis/2024-01-11-fastq_QC/bbduk/Pseudomonas/*_{1,2}.fq'  \
        -profile ilifu \
        --mlst_definitions /cbio/projects/033/analysis/02_MLST/profiles/Pseudomonas_profiles_csv \
        --mlst_db /cbio/projects/033/analysis/02_MLST/profiles/Pseudomonas.fasta \
        --mlst_delimiter "_" --outdir /cbio/projects/033/analysis/02_MLST \
        -resume -dsl1

MLST results

For most alleles in the selected schemes, no match with sufficient depth was found in our short reads. Some FASTQ pairs had mismatches, reported as the allele number followed by an "*".
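When scanning result tables, the allele calls can be sorted into categories by their suffix. The sketch below uses a hypothetical helper, classify_call. Only the "*" flag is described on this page; treating "?" as low-depth uncertainty and "-" as no call is an assumption based on the flag conventions in the SRST2 documentation and should be checked against the version in use.

```shell
#!/usr/bin/env bash
# Sketch: classify a single allele call from an MLST results table.
# "*" (mismatches) is described on this page; "?" and "-" are assumed
# to follow the SRST2 documentation (low-depth uncertainty, no call).

classify_call() {
    case "$1" in
        -)            echo "no_call" ;;
        *\*\?|*\?\*)  echo "mismatch_low_depth" ;;
        *\*)          echo "mismatch" ;;
        *\?)          echo "low_depth" ;;
        *)            echo "exact" ;;
    esac
}
```

For example, classify_call "12*" prints mismatch, while classify_call "12" prints exact.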

Reasons why MLST may fail

  • No match found, i.e. the sequence data for the specified loci have no match in the MLST database (variations, mutations, or target genes not present in the MLST DB)
  • Low-quality sequences or ambiguous base calls in the sequenced loci may cause MLST assignment to fail
  • Incomplete sequencing: the sequencing coverage should be sufficient and cover all required loci
  • Database mismatch: the DB used for typing should be appropriate for the organism or strain
  • Novel sequence type: the isolate carries a novel or uncharacterized sequence type not present in the MLST database; this is most common when studying less common or newly emerging strains.

Antimicrobial Gene Detection

Reads were mapped to each reference sequence in the FASTA file supplied through --gene_db, and all genes with coverage above 80% were reported (the default threshold is 90%).
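The effect of the coverage threshold can be illustrated as a filter over a gene report. This is a sketch only, not the pipeline's own implementation: filter_by_coverage is a hypothetical helper, and it assumes a tab-separated report whose header row contains a column named "coverage" holding percentages. Rows at or above the threshold are kept.

```shell
#!/usr/bin/env bash
# Sketch: keep the header and every row whose coverage is >= a threshold.
# Assumes a tab-separated file with a header column named "coverage".

filter_by_coverage() {
    local min_cov="$1" report="$2"
    awk -F'\t' -v min="$min_cov" '
        NR == 1 {
            # Locate the coverage column by name instead of by position.
            for (i = 1; i <= NF; i++) if ($i == "coverage") col = i
            if (!col) { print "no coverage column found" > "/dev/stderr"; exit 1 }
            print; next
        }
        ($col + 0) >= min
    ' "$report"
}
```

For example, filter_by_coverage 80 report.tsv prints the header followed by the rows with at least 80% coverage, where report.tsv is a placeholder file name.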
