Ephie Geza, 02/01/2024 05:43 PM
# Wiki

## Data location:

The data in FASTQ format was sent via WeTransfer, downloaded and uploaded to ilifu

``` shell
/cbio/projects/032/rawdata/fastq
```

## Reference data:

Currently in project 033 on ilifu: Helicobacter pylori strain: MT5135 (RefSeq accession no. CP071982.1) (n=2)

``` shell
/cbio/projects/033/refs/Helicobacter_pylori_MT5135_refseq.fa
```

## Workflow

### 1. QC:

Raw reads were quality checked and trimmed using the kviljoen/fastq_QC pipeline, which also trims adapters and filters reads by quality using bbduk.

``` shell
# For nextflow DSL1 pipeline
module load nextflow/22.10.7

nextflow run kviljoen/fastq_QC --reads '/cbio/projects/032/rawdata/fastq/*_R{1,2}_001.fastq.gz' -profile ilifu -resume --email "ephie.geza@uct.ac.za"
```
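
Because the `--reads` glob only pairs files named `*_R{1,2}_001.fastq.gz`, it can be worth confirming beforehand that every forward file has a reverse mate. A minimal sketch, assuming that naming scheme; `check_pairs` is a hypothetical helper and not part of fastq_QC:

```shell
# Hypothetical helper: list R1 files in a directory that lack an R2 mate.
# Assumes the *_R1_001.fastq.gz / *_R2_001.fastq.gz naming used in --reads.
check_pairs() {
    dir="$1"
    missing=0
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        # If the glob matched nothing, the literal pattern comes back; skip it.
        [ -e "$r1" ] || continue
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $(basename "$r1")"
            missing=$((missing + 1))
        fi
    done
    echo "unpaired R1 files: $missing"
}
```

It would be called as `check_pairs /cbio/projects/032/rawdata/fastq`; a count of 0 suggests the glob will pick up complete pairs only.
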

QC reports for raw reads and after trimming can be found in the

### 2. AMR profiling

We used three databases for AMR profiling (ARGannot_r3, CARD_v3.0.8 and ResFinder):

ARGannot

## MLST

To type certain sample pairs, we first downloaded the MLST scheme for **Pseudomonas aeruginosa** and renamed the files:

``` shell
img="/cbio/users/katie/singularity_containers/6c884bc3ab5c-2017-12-15-c6ae6fedbccd.img"

singularity exec ${img} getmlst.py --species "Pseudomonas aeruginosa"
mv Pseudomonas_aeruginosa.fasta Pseudomonas.fasta
mv profiles_csv Pseudomonas_profiles_csv
mv alleles_fasta Pseudomonas_alleles_fasta
```

This was also done for **Klebsiella pneumoniae**, **Enterobacter cloacae**, **Escherichia coli#1** and **Escherichia coli#2**. Note that **Serratia did not have an MLST scheme as of February 2024**.

We then ran MLST for each species using:

``` shell
nextflow run /cbio/projects/033/uct-srst2/main.nf \
--reads '/cbio/projects/033/analysis/2024-01-11-fastq_QC/bbduk/Pseudomonas/*_{1,2}.fq' \
-profile ilifu \
--mlst_definitions /cbio/projects/033/analysis/02_MLST/profiles/Pseudomonas_profiles_csv \
--mlst_db /cbio/projects/033/analysis/02_MLST/profiles/Pseudomonas.fasta \
--mlst_delimiter "_" --outdir /cbio/projects/033/analysis/02_MLST \
-resume -dsl1
```
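
Since the same command is repeated per species, it can be wrapped in a loop. The sketch below only prints the commands (a dry run). It assumes that each genus has a `bbduk/<Genus>` read directory and `<Genus>_profiles_csv` / `<Genus>.fasta` files laid out like the Pseudomonas ones; the genus list and the `mlst_cmd` helper are hypothetical, and the two Escherichia coli schemes are left out because their file names are not given here:

```shell
# Dry run: print the per-genus nextflow command instead of executing it.
# Assumes every genus follows the Pseudomonas directory and file layout.
base=/cbio/projects/033/analysis

mlst_cmd() {
    genus="$1"
    echo nextflow run /cbio/projects/033/uct-srst2/main.nf \
        --reads "'${base}/2024-01-11-fastq_QC/bbduk/${genus}/*_{1,2}.fq'" \
        -profile ilifu \
        --mlst_definitions "${base}/02_MLST/profiles/${genus}_profiles_csv" \
        --mlst_db "${base}/02_MLST/profiles/${genus}.fasta" \
        --mlst_delimiter '"_"' --outdir "${base}/02_MLST" \
        -resume -dsl1
}

# Hypothetical genus list; adjust to the directories that actually exist.
for genus in Pseudomonas Klebsiella Enterobacter; do
    mlst_cmd "$genus"
done
```

Once the printed commands look right, removing the `echo` would launch the runs.
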

### MLST results

Most alleles in the selected schemes could not be matched at sufficient depth in our short reads. Some FASTQ pairs had **mismatches**, reported as the allele number followed by an "*".

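
Flagged calls can be pulled out of a results table for a quick overview. A minimal sketch, assuming the MLST results are a tab-separated table with a header row and the sample name in the first column; `flag_alleles` is a hypothetical helper, and the "?" flag (uncertain call, e.g. low depth) is included alongside "*" on the assumption that the table follows SRST2 conventions:

```shell
# Hypothetical helper: report calls flagged with "*" (mismatch) or "?"
# (uncertain) in a tab-separated MLST results table.
# Assumes a header row and the sample name in column 1.
flag_alleles() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        {
            # Print sample, column name and call for every flagged field.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+[*?]+$/)
                    print $1 "\t" name[i] "\t" $i
        }
    ' "$1"
}
```

Each output line gives the sample, the column (locus or ST) and the flagged call, which makes it easier to see whether the problem is confined to a few loci or spread across the scheme.
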
## Reasons why MLST may fail

- No match found, i.e. the sequence data for the specified loci has no match in the MLST database (variations, mutations, or target genes not present in the MLST DB)
- Low-quality sequences or ambiguous base calls in the sequenced loci may cause MLST assignment to fail
- Incomplete sequencing - the sequencing coverage should be sufficient and cover all required loci
- Database mismatch - the DB used for typing should be appropriate for the organism or strain
- Novel sequence type - the isolate carries a novel or uncharacterized sequence type not present in the MLST database; this is most common when studying less common or newly emerging strains

## Antimicrobial Gene Detection

Reads were mapped to each reference sequence (in FASTA format) supplied through *--gene_db*, reporting all genes covered beyond 80% (the default is 90%).
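
When SRST2 is called directly, the coverage cutoff is set with its `--min_coverage` option. A hedged sketch, assuming a direct `srst2` call rather than the uct-srst2 pipeline (whose parameter name for this cutoff is not shown here); the `gene_cmd` helper, the read file names, the output prefix and the database file name are hypothetical placeholders:

```shell
# Dry run: print an srst2 gene-detection command with the coverage cutoff
# lowered from the default 90% to 80%.
# Sample and database names are placeholders, not paths from this project.
gene_cmd() {
    sample="$1"
    db="$2"
    echo srst2 --input_pe "${sample}_1.fastq.gz" "${sample}_2.fastq.gz" \
        --output "${sample}_amr" --log \
        --gene_db "$db" \
        --min_coverage 80
}

gene_cmd sampleA ARGannot_r3.fasta
```

Running the same command once per database (ARGannot_r3, CARD_v3.0.8, ResFinder) would give one gene report per database and sample.
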