Version 23 - History - Wiki - Clinical and molecular epidemiology of carbapenemase-producing Enterobacterales in hospitalized patients - Redmine

Wiki » History » Version 23

Katie Lennard, 11/14/2022 09:45 AM

-Katie Lennard
+# Wiki
 # Data location:
-Katie Lennard
+The data was transferred from Athena medmicro) by e.g.:
 Katie Lennard
 ```
-Katie Lennard
+rsync -avvP -e "ssh -i /home/katie/.ssh/id_rsa" /mnt/athena/medmicro/Clinton/CRE\ Pfizer\ Feb\ 2022/CRE\ study_4_results_11112022 katiel@transfer.ilifu.ac.za:/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/
 ```
 Katie Lennard
 to Ilifu:
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/
 ```
-Katie Lennard
+# Reference data:
 Katie Lennard
 Klebsiella pneumoniae – strain HS11286 (GenBank accession no. CP003200.1) (n=18);
 Serratia marcescens – strain KS10 (GenBank accession no. CP027798.1) (n=3);
-Katie Lennard
+Escherichia coli – strain ATCC 25922 (GenBank accession no. CP009072.1) (n=1); and
-Katie Lennard
+Enterobacter cloacae – strain ATCC 13047 (GenBank accession no. NC_014121.1) (n=1).
-Katie Lennard
+```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_genomes
 ```
-Katie Lennard
+# Objectives workflow:
-Katie Lennard
+![workflow.png]()
 Katie Lennard
-Katie Lennard
+# QC:
-Katie Lennard
+sample from Run 1 failed QC and had to be rerun. Note that they accidentally reran these 11 (study1A) twice – once on 28 Feb and once on 22 September. These runs were merged by combining samples e.g.:
 Katie Lennard
-Katie Lennard
+```
-Katie Lennard
+cat KLEB-CRE-GSH-0016_S11_L001_R2_001.fastq.gz >> merged_reads/G-16_S11_L001_R2_001.fastq.gz
 ```
 file location:
-Katie Lennard
+```
-Katie Lennard
+/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/11_double_rerun_merged/merged_reads
 ```
 Katie Lennard
-Katie Lennard
+Next these 11 merged-run samples were joined in one folder via symlinks with run B (passed QC):
 Katie Lennard
 ```
-Katie Lennard
+/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined
-Katie Lennard
+```
-Katie Lennard
+Filtering and trimming were executed as follows:
 ```
-Katie Lennard
+nextflow run kviljoen/fastq_QC --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined/*_R{1,2}_001.fastq.gz' -profile ilifu
-Katie Lennard
+```
 QC reports can be found in the 'files' tab
 Katie Lennard
-Katie Lennard
+Runs 2 and 3 were combined with symlinks under:
 Katie Lennard
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_2_3_combined
 ```
 FastQC was done (all samples passed):
 ```
 nextflow run kviljoen/fastq_QC --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_2_3_combined/*_R{1,2}_001.fastq.gz' -profile ilifu
 ```
 Katie Lennard
 and can be found here:
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-11-10-fastq_QC
 ```
 Katie Lennard
 Run 4 was added next and QCed:
 Katie Lennard
 Katie Lennard
 Note: to agree with srst2 file naming specifications I renamed the trimmed files from e.g. *_R1.fq to *_1.fq (remove R) using e.g.
-Katie Lennard
+```
 for f in *.fq; do mv -v "$f" "${f/_R/_}";done
 ```
 Katie Lennard
-Katie Lennard
+# AMR profiling
 The preference from Clinton is to do AMR profiling with the ResFinder DB. I'm getting errors there that I think relate to the header formatting though so in the interim have run with the ARG_annot DB that we used for previous projects as:
 Katie Lennard
 ## ARGannot
-Katie Lennard
+```
-Katie Lennard
+nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/ARGannot_r3.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_ARGannot/coverage_80_run --min_gene_cov 80
-Katie Lennard
+```
-Katie Lennard
+Individual results files compiled as:
 Katie Lennard
-Katie Lennard
+```
 srst2 --prev_output *results.txt --output ARGannot_AMRs
 ```
-Katie Lennard
+## CARD DB:
 Katie Lennard
-Katie Lennard
+This database is the recommended by srst2 and has been formatted by them already. The DB was downloaded with:
-Katie Lennard
+```
 wget https://github.com/katholt/srst2/blob/master/data/CARD_v3.0.8_SRST2.fasta?raw=true -O CARD_v3.0.8_SRST2.fasta
-Katie Lennard
+```
 Pipeline execution as:
 ```
 nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/CARD_v3.0.8_SRST2.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_CARD/coverage_80_run --min_gene_cov 80
-Katie Lennard
+```
 Individual results files compiled as:
 ```
 srst2 --prev_output *results.txt --output CARD_AMRs
-Katie Lennard
+```
 Katie Lennard
 # Virulence factors
-Katie Lennard
+Building the relevant VFDB for Klebsiella requires a python script that needs the biopython module (use the /cbio/users/katie/singularity_containers/srst2_v2.simg singularity container for this)
 NB: in order to use the correct python version (2.7.5) for srst2 I first had to comment out the lines at the end of my .bashrc file relating to conda initialize
 Katie Lennard
 Build genus-specific DB:
 ```
-Katie Lennard
+python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDBgenus.py --infile /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/VFDB_setB_nt.fas --genus Klebsiella
-Katie Lennard
+```
 was used to create the VF DB Klebsiella.fsa
-Katie Lennard
+The same procedure (as last year ;) was executed for Escherichia, Serratia and Enterobacter
 Katie Lennard
 cd-hit (needed to build vfdb as outlined here https://github.com/katholt/srst2#using-the-vfdb-virulence-factor-database-with-srst2) docker images was pulled from here https://quay.io/repository/biocontainers/cd-hit?tab=tags and converted to singularity image on BST server:
 ```
-Katie Lennard
+singularity exec /cbio/users/katie/singularity_containers/cd-hit.simg /bin/bash
 ```
 then run CD-HIT to cluster the sequences for this genus, at 90% nucleotide identity:
 ```
  cd-hit -i Klebsiella.fsa -o Klebsiella_cdhit90 -c 0.9 > Klebsiella_cdhit90.stdout
 ```
 Repeat for other .fsa DBs
 Katie Lennard
-Katie Lennard
+NExt parse the cluster output and tabulate the results using the specific Virulence gene DB compatible script (use srst2_v2.simg again)
 Katie Lennard
-Katie Lennard
+```
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDB_cdhit_to_csv_KLedit.py --cluster_file Klebsiella_cdhit90.clstr --infile Klebsiella.fsa --outfile Klebsiella_cdhit90.csv
-Katie Lennard
+```
 Next convert the resulting csv table to a SRST2-compatible sequence database using:
 ```
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/csv_to_gene_db.py -t Klebsiella_cdhit90.csv -o Klebsiella_VF_clustered.fasta -s 5
 ```
 The actual VF typing can now be done using this clustered DB:
 ```
 nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_VF_clustered.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_VFs/coverage_80_run --min_gene_cov 80
-Katie Lennard
+```
 Katie Lennard
-Katie Lennard
+Same for other genera using:
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_VF_clustered.fasta
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Serratia_VF_clustered.fasta
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Enterobacter_VF_clustered.fasta
 ```
-Katie Lennard
+Again combine individual sample results files with e.g.
 ```
 srst2 --prev_output *genes* --output Klebsiella_VFs
 ```
 # MLST
-Katie Lennard
+MLST profiles were downloaded for E. coli and K. pneumoniae as:
 ```
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#1'
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#2'
 python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Klebsiella pneumoniae'
 Katie Lennard
 ```
 Note: MLST profiles not available for Serratia marecescens or Enterobacter cloacae
-Katie Lennard
+MLST profiling execution:
 Katie Lennard
 ```
 nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_pneumoniae.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/Klebsiella_MLSTs
 ```
 Katie Lennard
 ```
-Katie Lennard
+nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_1_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#1.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli1_MLSTs
 ```
-Katie Lennard
+```
-Katie Lennard
+nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_2_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#2.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli2_MLSTs
 ```
 Katie Lennard
-Katie Lennard
+# Combining runs
 For prelim analysis run1 was combined with the output from runs 2-4 with e.g. (from directory /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_combined_output_run1-4/CARD)
 ```
 ln -s ../../srst2_CARD_run2to4/coverage_80_run/srst2/*genes* ./
 ln -s ../../srst2_CARD_v3/coverage_80_run/srst2/*genes* ./
 ```
-Katie Lennard
+# Tychus alignment module
 Note the ilifu branch of the Tychus repo should be used. Because we don't have a main.nf file we can't use the standard `nextflow pull` syntax so did a git clone instead
 git clone --branch ilifu https://github.com/kviljoen/Tychus/
 A list of fasata files for reference genomes was created here
 NB: Error when trying to use E_coli_ATCC_25922_CP009072_1.fasta as --genome (but not e.g. Serratia)
 NB: error in makephylogenies process:
 ``` .command.sh: 7: [: missing ]
   mv: cannot stat 'kSNP3_results/*.tre': No such file or directory
 ```
 If you go into the working directory you'll find a NameErrors.txt file which states that some of the genome file names are illegal. The one that applies here is:  A file name may contian only one dot ('.') character, that which separates the file ID from the extension.
         EcoSME175.fasta is legal, EcoSME17.5.fasta is not
 So all files were renamed from e.g. E_coli_ATCC_25922_CP009072.1.fasta to E_coli_ATCC_25922_CP009072_1.fasta
 ```
 /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/full_fasta_list
 ```
 Alignment run example against Serratia:
 ```
 nextflow alignment.nf --alignment_out_dir /scratch3/users/katiel/Clinton/CRE_study_August_2022/Tychus_alignment_Serratia --genome /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Serratia_marescens_KS10_CP027798_1.fasta --read_pairs '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-10-10-fastq_QC/bbduk/*_{1,2}.fq' --user_genome_paths /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/full_fasta_list -profile alignment
 ```

Project

General

Profile

Clinical and molecular epidemiology of carbapenemase-producing Enterobacterales in hospitalized patients

Wiki » History » Version 23