Version 16 - History - Wiki - Clinical and molecular epidemiology of carbapenemase-producing Enterobacterales in hospitalized patients - Redmine

Katie Lennard, 10/03/2022 02:42 PM

# Wiki
 # Data location:
The data was transferred from Athena (medmicro):
 ```
 /MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1A_results_17022022
 /MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1B_results_21022022
 ```
 to Ilifu:
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/
 ```
# Reference data:
 Klebsiella pneumoniae – strain HS11286 (GenBank accession no. CP003200.1) (n=18);
 Serratia marcescens – strain KS10 (GenBank accession no. CP027798.1) (n=3);
Escherichia coli – strain ATCC 25922 (GenBank accession no. CP009072.1) (n=1); and
Enterobacter cloacae – strain ATCC 13047 (GenBank accession no. NC_014121.1) (n=1).
```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_genomes
 ```
# Objectives workflow:
![workflow.png]()
# QC:
Some samples failed QC (Phred scores) before trimming and filtering and had to be rerun. Filtering and trimming were executed as follows:
 ```
nextflow run kviljoen/fastq_QC --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined/*_R{1,2}_001.fastq.gz' -profile ilifu
 ```
 QC reports can be found in the 'files' tab
The rerun data is under:
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/'CRE study_1A_results_repeat_22092022'
 ```
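Because the QC pipeline picks up reads through the `*_R{1,2}_001.fastq.gz` pattern, it can help to confirm beforehand that every sample has both mates. The following is a minimal sketch only, assuming Illumina-style file names; the function name and the example file names are hypothetical and not part of the pipeline.

```python
import re
from collections import defaultdict

# Assumed naming: <sample>_R1_001.fastq.gz and <sample>_R2_001.fastq.gz
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])_001\.fastq\.gz$")


def find_unpaired(filenames):
    """Return the sorted sample prefixes that lack either the R1 or the R2 file."""
    mates = defaultdict(set)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:  # files that do not follow the naming pattern are ignored
            mates[match.group("sample")].add(match.group("mate"))
    return sorted(s for s, found in mates.items() if found != {"1", "2"})


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    example = [
        "sampleA_R1_001.fastq.gz",
        "sampleA_R2_001.fastq.gz",
        "sampleB_R1_001.fastq.gz",
    ]
    print(find_unpaired(example))
```

In practice the list of names could come from `os.listdir()` on the raw data folder.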
# AMR profiling
Clinton's preference is to do AMR profiling with the ResFinder DB. That currently gives errors that I think relate to the header formatting, so in the interim I have run with the ARGannot DB that we used for previous projects:
## ARGannot
```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/ARGannot_r3.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_ARGannot/coverage_80_run --min_gene_cov 80
```
Individual results files compiled as:
```
 srst2 --prev_output *results.txt --output ARGannot_AMRs
 ```
## CARD DB:
This is the database recommended by the srst2 developers and has already been formatted by them. The DB was downloaded with:
```
wget 'https://github.com/katholt/srst2/blob/master/data/CARD_v3.0.8_SRST2.fasta?raw=true' -O CARD_v3.0.8_SRST2.fasta
```
 Pipeline execution as:
 ```
 nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/CARD_v3.0.8_SRST2.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_CARD/coverage_80_run --min_gene_cov 80
```
 Individual results files compiled as:
 ```
 srst2 --prev_output *results.txt --output CARD_AMRs
```
 # Virulence factors
Building the relevant VFDB for Klebsiella requires a Python script that needs the Biopython module (use the /cbio/users/katie/singularity_containers/srst2_v2.simg Singularity container for this).
NB: to use the correct Python version (2.7.5) for srst2, I first had to comment out the lines at the end of my .bashrc file relating to conda initialize.
 Build genus-specific DB:
 ```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDBgenus.py --infile /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/VFDB_setB_nt.fas --genus Klebsiella
```
This created the VF DB Klebsiella.fsa.
The same procedure (as last year ;) was executed for Escherichia, Serratia and Enterobacter.
cd-hit is needed to build the VFDB, as outlined here: https://github.com/katholt/srst2#using-the-vfdb-virulence-factor-database-with-srst2. The cd-hit Docker image was pulled from https://quay.io/repository/biocontainers/cd-hit?tab=tags and converted to a Singularity image on the BST server. Open a shell in the container with:
 ```
singularity exec /cbio/users/katie/singularity_containers/cd-hit.simg /bin/bash
 ```
Then run CD-HIT to cluster the sequences for this genus at 90% nucleotide identity:
 ```
  cd-hit -i Klebsiella.fsa -o Klebsiella_cdhit90 -c 0.9 > Klebsiella_cdhit90.stdout
 ```
Repeat for the other .fsa DBs.
Next, parse the cluster output and tabulate the results using the script that is specific to the virulence gene DB (use srst2_v2.simg again):
```
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDB_cdhit_to_csv_KLedit.py --cluster_file Klebsiella_cdhit90.clstr --infile Klebsiella.fsa --outfile Klebsiella_cdhit90.csv
```
Next, convert the resulting CSV table to an SRST2-compatible sequence database using:
 ```
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/csv_to_gene_db.py -t Klebsiella_cdhit90.csv -o Klebsiella_VF_clustered.fasta -s 5
 ```
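Since the clustering, tabulating and converting steps are repeated for each genus, the commands can be generated rather than typed by hand. This is a rough sketch that only prints the command lines; it does not run them. The output names for genera other than Klebsiella (for example `Serratia_VF_clustered.fasta`) are assumed to follow the Klebsiella pattern, and the cd-hit step still has to be run inside the cd-hit container while the two Python steps use srst2_v2.simg.

```python
# Folder holding the srst2 database_clustering scripts (path taken from the commands above)
SCRIPT_DIR = "/cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering"

# Genera for which a VF DB was built
GENERA = ["Klebsiella", "Escherichia", "Serratia", "Enterobacter"]


def vfdb_commands(genus, identity=0.9):
    """Return the three command lines (cluster, tabulate, convert) for one genus."""
    tag = "{}_cdhit{}".format(genus, int(round(identity * 100)))
    return [
        # 1. cluster at the given nucleotide identity (run inside the cd-hit container)
        "cd-hit -i {g}.fsa -o {t} -c {c} > {t}.stdout".format(g=genus, t=tag, c=identity),
        # 2. tabulate the clusters (run inside srst2_v2.simg)
        "python {d}/VFDB_cdhit_to_csv_KLedit.py --cluster_file {t}.clstr "
        "--infile {g}.fsa --outfile {t}.csv".format(d=SCRIPT_DIR, g=genus, t=tag),
        # 3. convert the table to an srst2-compatible FASTA (output name is an assumption)
        "python {d}/csv_to_gene_db.py -t {t}.csv -o {g}_VF_clustered.fasta -s 5".format(
            d=SCRIPT_DIR, g=genus, t=tag
        ),
    ]


if __name__ == "__main__":
    for genus in GENERA:
        for command in vfdb_commands(genus):
            print(command)
```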
 The actual VF typing can now be done using this clustered DB:
 ```
 nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_VF_clustered.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_VFs/coverage_80_run --min_gene_cov 80
```
 Again combine individual sample results files with e.g.
 ```
 srst2 --prev_output *genes* --output Klebsiella_VFs
 ```
 # MLST
MLST profiles were downloaded for E. coli and K. pneumoniae as:
 ```
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#1'
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#2'
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Klebsiella pneumoniae'
 ```
Note: MLST profiles are not available for Serratia marcescens or Enterobacter cloacae.
MLST profiling execution:
 ```
 nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_pneumoniae.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/Klebsiella_MLSTs
 ```
 ```
 nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_1_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#1.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli1_MLSTs
 ```
 ```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_2_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#2.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli2_MLSTs
```
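The three MLST runs differ only in the definitions file, the allele FASTA and the output folder, so they can be written as a small table. The sketch below rebuilds the same command lines from such a table; the dictionary keys and the function name are hypothetical labels chosen for illustration, and the script only returns strings, it does not launch Nextflow.

```python
BASE = "/scratch3/users/katiel/Clinton/CRE_study_August_2022"
READS = BASE + "/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq"

# (definitions file, allele FASTA, output folder) per scheme, copied from the commands above
SCHEMES = {
    "Klebsiella": ("Klebsiella_definitions", "Klebsiella_pneumoniae.fasta", "Klebsiella_MLSTs"),
    "E_coli1": ("E_coli_1_definitions", "Escherichia_coli#1.fasta", "E_coli1_MLSTs"),
    "E_coli2": ("E_coli_2_definitions", "Escherichia_coli#2.fasta", "E_coli2_MLSTs"),
}


def mlst_command(scheme):
    """Return the nextflow command line for one MLST scheme."""
    definitions, alleles, outdir = SCHEMES[scheme]
    return (
        "nextflow run kviljoen/uct-srst2 --reads '{reads}' -profile ilifu "
        "--mlst_definitions {base}/ref_files/{d} --mlst_db {base}/ref_files/{a} "
        "--mlst_delimiter _ --outdir {base}/srst2_MLSTs/{o}"
    ).format(reads=READS, base=BASE, d=definitions, a=alleles, o=outdir)


if __name__ == "__main__":
    for scheme in sorted(SCHEMES):
        print(mlst_command(scheme))
```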
 ## Rerun
Samples that failed QC had to be rerun. Here I combine them (post-filtering and -trimming) with the rest of the samples that passed QC the first time, via symlinks.
Note: to agree with srst2 file naming specifications I renamed the trimmed files from e.g. *_R1.fq to *_1.fq (remove the R). The combined reads are under:
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/1A_B_plus_reruns_filtered_trimmed
```
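The renaming and symlinking could be done in one pass along these lines. This is a sketch under assumptions, not the commands that were actually run: it assumes Python 3, that the trimmed reads end in `_R1.fq` / `_R2.fq`, and that the rename can be applied to the symlink name rather than to the original file. The function names are hypothetical.

```python
import re
from pathlib import Path


def srst2_name(filename):
    """Turn e.g. sample_R1.fq into sample_1.fq; other names are returned unchanged."""
    return re.sub(r"_R([12])\.fq$", r"_\1.fq", filename)


def link_reads(source_dirs, dest_dir):
    """Symlink every .fq file from source_dirs into dest_dir under its srst2-style name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    created = []
    for source in source_dirs:
        for fq in sorted(Path(source).glob("*.fq")):
            link = dest / srst2_name(fq.name)
            # skip names that are already present so earlier links are never overwritten
            if not link.exists() and not link.is_symlink():
                link.symlink_to(fq.resolve())
                created.append(link.name)
    return created
```

`link_reads` would be called with the folders holding the first-pass and rerun trimmed reads as `source_dirs` and the combined folder as `dest_dir`.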
All reruns were executed as before, but using the above file path for the input reads and adding the extension '*_v2' to the output folders.
