Katie Lennard, 09/21/2022 01:43 PM

Wiki¶

Data location:¶

The data was transferred from Athena (medmicro):

/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1A_results_17022022 
/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1B_results_21022022

to Ilifu:

/scratch3/users/katiel/Clinton/CRE_study_August_2022/

Reference data:¶

Klebsiella pneumoniae – strain HS11286 (GenBank accession no. CP003200.1) (n=18);
Serratia marcescens – strain KS10 (GenBank accession no. CP027798.1) (n=3);
Escherichia coli – strain ATCC 25922 (GenBank accession no. CP009072.1) (n=1); and
Enterobacter cloacae – strain ATCC 13047 (GenBank accession no. NC_014121.1) (n=1).

/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_genomes

Objectives workflow:¶

QC:¶

11 samples failed the Phred quality-score QC checks before trimming and filtering; none failed afterwards. Filtering and trimming were run as follows:

nextflow run kviljoen/fastq_QC --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined/*_R{1,2}_001.fastq.gz' -profile ilifu

QC reports can be found in the 'Files' tab.
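As a reminder of what the Phred scores in those reports mean, here is a minimal sketch that decodes a FASTQ quality string. It assumes the standard Phred+33 encoding and is only an illustration; it is not what the fastq_QC pipeline runs, and the function names are made up for this example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33); pass a different
    offset for older encodings.
    """
    return [ord(char) - offset for char in quality_string]


def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores) if scores else 0.0


# '!' is Phred 0, '5' is Phred 20 and 'I' is Phred 40 under Phred+33.
print(phred_scores("!5I"))
print(mean_phred("IIII"))
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q20 is a 1% error rate and Q30 is 0.1%.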

AMR profiling¶

Clinton's preference is to do AMR profiling with the ResFinder DB. That run currently fails with errors that I think relate to the FASTA header formatting, so in the interim I have run the ARGannot DB that we used for previous projects, as follows:

ARGannot¶

nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/ARGannot_r3.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_ARGannot/coverage_80_run --min_gene_cov 80

Individual results files were compiled with:

srst2 --prev_output *results.txt --output ARGannot_AMRs
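On the ResFinder header errors mentioned under AMR profiling: the srst2 README documents gene DB headers of the form >[clusterUniqueIdentifier]__[clusterSymbol]__[alleleSymbol]__[alleleUniqueIdentifier], optionally followed by a space and an annotation. The sketch below lists headers that do not have exactly four non-empty "__"-separated fields. Whether this is the actual cause of the errors is unconfirmed, the strict four-field check is my assumption, and the example headers are invented.

```python
def bad_srst2_headers(fasta_lines):
    """Return (line_number, header) pairs whose ID lacks four '__'-separated fields."""
    problems = []
    for number, line in enumerate(fasta_lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        fields = seq_id.split("__")
        if len(fields) != 4 or not all(fields):
            problems.append((number, line.rstrip("\n")))
    return problems


# Invented headers: the first follows the documented layout, the second does not.
example = [
    ">0__blaEXAMPLE__blaEXAMPLE-1__1 made-up annotation\n",
    "ATGC\n",
    ">blaEXAMPLE-1_1_XX000000\n",
    "ATGC\n",
]
print(bad_srst2_headers(example))
```

An open file handle can be passed in place of the list, e.g. bad_srst2_headers(open(path)) for whichever ResFinder FASTA is being tested.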

CARD DB:¶

This database is the one recommended by the srst2 authors, who have already formatted it for use with srst2. The DB was downloaded with (URL quoted so the shell does not treat the ? as a wildcard):

wget 'https://github.com/katholt/srst2/blob/master/data/CARD_v3.0.8_SRST2.fasta?raw=true' -O CARD_v3.0.8_SRST2.fasta
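Because the download goes through a GitHub "blob" URL, it is worth confirming that the saved file is the FASTA itself and not an HTML page. A minimal sketch of such a check follows; the function name is hypothetical and the check only looks at the first non-blank line.

```python
def first_line_is_fasta_header(lines):
    """True if the first non-blank line starts with '>' (a FASTA header)."""
    for line in lines:
        if line.strip():
            return line.startswith(">")
    return False  # empty input is not a usable FASTA


# Invented snippets standing in for a good and a bad download.
good_download = [">0__geneX__geneX-1__1\n", "ATGC\n"]
bad_download = ["<!DOCTYPE html>\n", "<html>\n"]
print(first_line_is_fasta_header(good_download))
print(first_line_is_fasta_header(bad_download))
```

With a real file, pass the open handle: first_line_is_fasta_header(open("CARD_v3.0.8_SRST2.fasta")).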

The pipeline was run as:

nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/CARD_v3.0.8_SRST2.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_CARD/coverage_80_run --min_gene_cov 80

Individual results files were compiled with:

srst2 --prev_output *results.txt --output CARD_AMRs
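The compiled tables can then be summarised per gene. The sketch below assumes the compiled output is tab-delimited with a sample column first, one column per gene, and "-" where a gene was not detected; that layout is my understanding of the srst2 compiled results and should be checked against the real file. The demo rows are invented.

```python
import csv


def gene_presence_counts(table_lines):
    """Count, for each gene column, how many samples have a call other than '-'."""
    reader = csv.reader(table_lines, delimiter="\t")
    header = next(reader)
    genes = header[1:]  # first column is assumed to hold the sample name
    counts = {gene: 0 for gene in genes}
    for row in reader:
        for gene, call in zip(genes, row[1:]):
            if call.strip() not in ("", "-"):
                counts[gene] += 1
    return counts


# Invented rows in the assumed layout.
demo = [
    "Sample\tgeneA\tgeneB\n",
    "sample1\tgeneA-1\t-\n",
    "sample2\tgeneA-1*\tgeneB-2\n",
]
print(gene_presence_counts(demo))
```

Calls flagged as uncertain or imperfect matches are still counted as present here; filter them first if a stricter definition is needed.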

Virulence factors¶

Building the relevant VFDB for Klebsiella requires a Python script that needs the biopython module. Biopython was installed in a virtual environment on Ilifu as follows (from an interactive node):

module add python/3.9.0
virtualenv .srst2_biopython_venv
. .srst2_biopython_venv/bin/activate
pip install biopython==1.68

*Note: biopython is pinned to 1.68 because later versions fail with "ImportError: Bio.Alphabet has been removed from Biopython".
The environment can now be activated at any time with:

. .srst2_biopython_venv/bin/activate

Build genus-specific DB:

python /cbio/users/katiel/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDBgenus.py --infile /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/VFDB_setB_nt.fas --genus Klebsiella

This created the VF DB Klebsiella.fsa.

The same procedure (as last year) was run for Escherichia, Serratia and Enterobacter.

cd-hit is needed to build the VFDB, as outlined here: https://github.com/katholt/srst2#using-the-vfdb-virulence-factor-database-with-srst2. The cd-hit Docker image was pulled from https://quay.io/repository/biocontainers/cd-hit?tab=tags and converted to a Singularity image on the BST server:

singularity exec /cbio/users/katie/singularity_containers/cd-hit.simg /bin/bash

Then run CD-HIT to cluster the sequences for this genus at 90% nucleotide identity:

cd-hit -i Klebsiella.fsa -o Klebsiella_cdhit90 -c 0.9 > Klebsiella_cdhit90.stdout

Repeat for the other .fsa DBs.
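The repetition over genera can be scripted. This sketch only builds and prints the cd-hit command for each genus, using the same options as the Klebsiella run above; it does not execute anything. The helper name is hypothetical, and it assumes each genus DB is named <Genus>.fsa in the working directory.

```python
import shlex

GENERA = ["Klebsiella", "Escherichia", "Serratia", "Enterobacter"]


def cdhit_command(genus, identity=0.9):
    """Return (argv, log_file) for clustering one genus DB with cd-hit."""
    prefix = f"{genus}_cdhit{int(round(identity * 100))}"
    argv = ["cd-hit", "-i", f"{genus}.fsa", "-o", prefix, "-c", str(identity)]
    return argv, f"{prefix}.stdout"


for genus in GENERA:
    argv, log_file = cdhit_command(genus)
    # Print the command line rather than run it, so it can be checked first.
    print(shlex.join(argv), ">", log_file)
```

To actually run a command from inside the Singularity shell, subprocess.run(argv, stdout=open(log_file, "w"), check=True) would be one option.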

Next, parse the cluster output and tabulate the results using the script specific to the virulence gene DB (use the biopython environment again).
Note: here I had issues with the Python version. Ilifu doesn't have the srst2-recommended Python 2.7.5 (it's too old), so the env was built with Python 3.
Because the syntax has changed, executing VFDB_cdhit_to_csv.py gave "NameError: name 'file' is not defined". This was fixed by editing the script to replace file() with open(); the edited copy used below is VFDB_cdhit_to_csv_KLedit.py.
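The kind of change involved looks like the sketch below. Python 2 had a built-in file() for opening files; Python 3 removed it, and open() works in both. The function and file names are made up for illustration and are not taken from the srst2 script.

```python
import builtins
import os
import tempfile

# Python 2 only; raises NameError on Python 3:
#   outfile = file("example.csv", "w")

# False on Python 3, which is why the original call fails there.
HAS_FILE_BUILTIN = hasattr(builtins, "file")


def write_rows(path, rows):
    """Write rows as comma-separated lines using open() instead of file()."""
    with open(path, "w") as outfile:
        for row in rows:
            outfile.write(",".join(row) + "\n")


# Small demonstration in a temporary directory, with invented rows.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_rows(demo_path, [["cluster", "gene"], ["1", "exampleA"]])
with open(demo_path) as handle:
    demo_text = handle.read()
print(HAS_FILE_BUILTIN, repr(demo_text))
```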

. /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/.srst2_biopython_venv/bin/activate

python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDB_cdhit_to_csv_KLedit.py --cluster_file Klebsiella_cdhit90.clstr --infile Klebsiella.fsa --outfile Klebsiella_cdhit90.csv


Clinical and molecular epidemiology of carbapenemase-producing Enterobacterales in hospitalized patients
