Wiki » History » Version 19

Katie Lennard, 10/11/2022 04:54 PM

# Wiki

# Data location:

The data was transferred from Athena (medmicro):

```
/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1A_results_17022022
/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1B_results_21022022
```

to Ilifu:

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/
```

# Reference data:

Klebsiella pneumoniae – strain HS11286 (GenBank accession no. CP003200.1) (n=18);
Serratia marcescens – strain KS10 (GenBank accession no. CP027798.1) (n=3);
Escherichia coli – strain ATCC 25922 (GenBank accession no. CP009072.1) (n=1); and
Enterobacter cloacae – strain ATCC 13047 (GenBank accession no. NC_014121.1) (n=1).

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_genomes
```

# Objectives workflow:
![workflow.png]()

# QC:
Eleven samples failed QC and had to be rerun. Note that these 11 samples (study 1A) were accidentally rerun twice, once on 28 Feb and once on 22 September. The two reruns were merged by concatenating the reads for each sample, e.g.:

```
cat KLEB-CRE-GSH-0016_S11_L001_R2_001.fastq.gz >> merged_reads/G-16_S11_L001_R2_001.fastq.gz
```
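The merge above relies on the fact that concatenated gzip files form a valid multi-member gzip stream. Because `>>` appends, running the same command twice would duplicate reads, so it can be worth comparing read counts before and after. A minimal sketch on toy data (the file names and contents are hypothetical, not the study files):

```shell
#!/bin/sh
# Toy demonstration only: names and contents are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir merged_reads

# Two single-read FASTQ files standing in for the two reruns of one sample.
printf '@read1\nACGT\n+\nIIII\n' | gzip > rerun_feb_R2.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip > rerun_sep_R2.fastq.gz

# '>' for the first file so a repeated run starts from an empty target,
# then '>>' to append the second rerun.
cat rerun_feb_R2.fastq.gz >  merged_reads/sample_R2.fastq.gz
cat rerun_sep_R2.fastq.gz >> merged_reads/sample_R2.fastq.gz

# Reads = lines / 4; the merged count should equal the sum of the inputs.
lines_feb=$(gzip -dc rerun_feb_R2.fastq.gz | wc -l)
lines_sep=$(gzip -dc rerun_sep_R2.fastq.gz | wc -l)
lines_merged=$(gzip -dc merged_reads/sample_R2.fastq.gz | wc -l)
reads_in=$(( (lines_feb + lines_sep) / 4 ))
reads_merged=$(( lines_merged / 4 ))
echo "input reads: $reads_in, merged reads: $reads_merged"
```

If the two numbers differ for a real sample, the merged file was probably appended to more than once or one input was skipped.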

File location:
```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/11_double_rerun_merged/merged_reads
```

Next, these 11 merged-run samples were combined in one folder, via symlinks, with the run B samples (which passed QC):

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined
```
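The exact symlink commands were not recorded here. One way to build such a combined folder is sketched below on toy directories (all directory and file names are hypothetical stand-ins). Absolute link targets are used so the links still resolve when the combined folder is read from another working directory:

```shell
#!/bin/sh
# Toy demonstration only: directory and file names are hypothetical.
set -eu
workdir=$(mktemp -d)          # mktemp returns an absolute path
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/merged_reads" "$workdir/run_B" "$workdir/combined"
: > "$workdir/merged_reads/G-16_R1.fastq.gz"
: > "$workdir/run_B/G-20_R1.fastq.gz"

# Link every read file from both sources into the combined folder.
for f in "$workdir"/merged_reads/*.fastq.gz "$workdir"/run_B/*.fastq.gz; do
    ln -s "$f" "$workdir/combined/$(basename "$f")"
done

n_links=$(( $(find "$workdir/combined" -type l | wc -l) ))
echo "symlinks created: $n_links"
```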

Filtering and trimming were executed as follows:

```
nextflow run kviljoen/fastq_QC --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined/*_R{1,2}_001.fastq.gz' -profile ilifu
```
QC reports can be found in the 'files' tab.

Note: to comply with the srst2 file naming requirements I renamed the trimmed files from e.g. `*_R1.fq` to `*_1.fq` (removing the R) using e.g.
```
for f in *.fq; do mv -v "$f" "${f/_R/_}"; done
```
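One caveat with `${f/_R/_}` is that bash replaces the first `_R` in the name, so a sample name that itself contains `_R` (for example a hypothetical `KLEB_RUN_R1.fq`) would become `KLEB_UN_R1.fq`. A more defensive variant, sketched here on toy files, only touches the trailing `_R1.fq` / `_R2.fq` suffix and uses POSIX parameter expansion:

```shell
#!/bin/sh
# Toy demonstration only: the file names are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch KLEB_RUN_R1.fq KLEB_RUN_R2.fq

for f in *_R[12].fq; do
    [ -e "$f" ] || continue      # skip the literal pattern if nothing matches
    base=${f%_R[12].fq}          # strip the trailing _R1.fq or _R2.fq
    read_no=${f##*_R}            # keep "1.fq" or "2.fq"
    mv -v "$f" "${base}_${read_no}"
done
ls
```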

# AMR profiling
Clinton's preference is to do AMR profiling with the ResFinder DB. I'm getting errors with it that I think relate to the header formatting, so in the interim I have run the ARGannot DB that we used for previous projects, as follows:

## ARGannot
```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/ARGannot_r3.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_ARGannot/coverage_80_run --min_gene_cov 80
```
Individual results files were compiled as:

```
srst2 --prev_output *results.txt --output ARGannot_AMRs
```

## CARD DB:

This database is the one recommended by srst2 and has already been formatted by the srst2 developers. The DB was downloaded with:

```
wget 'https://github.com/katholt/srst2/blob/master/data/CARD_v3.0.8_SRST2.fasta?raw=true' -O CARD_v3.0.8_SRST2.fasta
```
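Since the ResFinder errors seemed to relate to header formatting, a quick header check can help when comparing a pre-formatted database with a problematic one. The sketch below assumes the srst2 header convention of four fields separated by double underscores (`>clusterID__gene__allele__seqID`), as I understand it from the srst2 README. The function name `count_bad_headers` and the toy sequences are hypothetical:

```shell
#!/bin/sh
# Toy demonstration only: the database contents are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Count FASTA headers whose ID does not split into exactly four
# "__"-separated fields (assumed srst2 gene DB convention).
count_bad_headers() {
    awk '/^>/ { n = split(substr($1, 2), parts, "__"); if (n != 4) bad++ }
         END  { print bad + 0 }' "$1"
}

cat > "$workdir/toy_db.fasta" <<'EOF'
>1__geneA__geneA_1__1 hypothetical annotation
ACGTACGT
>blaX-1_1_AB000001
ACGTTTGA
EOF

n_bad=$(count_bad_headers "$workdir/toy_db.fasta")
echo "headers not in the assumed srst2 format: $n_bad"
```

A count of zero does not guarantee that srst2 accepts the file, but a non-zero count points at the headers to inspect first.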

Pipeline execution as:

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/CARD_v3.0.8_SRST2.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_CARD/coverage_80_run --min_gene_cov 80
```

Individual results files compiled as:

```
srst2 --prev_output *results.txt --output CARD_AMRs
```

# Virulence factors

Building the relevant VFDB for Klebsiella requires a python script that needs the biopython module (use the /cbio/users/katie/singularity_containers/srst2_v2.simg singularity container for this).

NB: in order to use the correct python version (2.7.5) for srst2, I first had to comment out the lines at the end of my .bashrc file relating to conda initialize.

Build genus-specific DB:

```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDBgenus.py --infile /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/VFDB_setB_nt.fas --genus Klebsiella
```

This created the Klebsiella VF DB, Klebsiella.fsa.

The same procedure (as last year) was executed for Escherichia, Serratia and Enterobacter.

The cd-hit docker image (cd-hit is needed to build the VFDB, as outlined at https://github.com/katholt/srst2#using-the-vfdb-virulence-factor-database-with-srst2) was pulled from https://quay.io/repository/biocontainers/cd-hit?tab=tags and converted to a singularity image on the BST server:

```
singularity exec /cbio/users/katie/singularity_containers/cd-hit.simg /bin/bash
```

Then run CD-HIT to cluster the sequences for this genus at 90% nucleotide identity:

```
cd-hit -i Klebsiella.fsa -o Klebsiella_cdhit90 -c 0.9 > Klebsiella_cdhit90.stdout
```

Repeat for the other .fsa DBs.

Next, parse the cluster output and tabulate the results using the script that is compatible with this virulence gene DB (use srst2_v2.simg again):

```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDB_cdhit_to_csv_KLedit.py --cluster_file Klebsiella_cdhit90.clstr --infile Klebsiella.fsa --outfile Klebsiella_cdhit90.csv
```

Next, convert the resulting csv table to an SRST2-compatible sequence database using:

```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/csv_to_gene_db.py -t Klebsiella_cdhit90.csv -o Klebsiella_VF_clustered.fasta -s 5
```
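Because the same four build steps are repeated for each genus, it can help to print the full set of commands before running them. The sketch below is a dry run only: it echoes the commands shown above with the genus substituted and executes nothing. The function name `print_build_steps` is hypothetical, and the python steps and the cd-hit step would still need to be run inside their respective containers:

```shell
#!/bin/sh
set -eu
# Paths copied from the commands above.
scripts=/cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering
vfdb=/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/VFDB_setB_nt.fas

# Print (do not run) the four build steps for one genus.
print_build_steps() {
    g=$1
    echo "python $scripts/VFDBgenus.py --infile $vfdb --genus $g"
    echo "cd-hit -i $g.fsa -o ${g}_cdhit90 -c 0.9 > ${g}_cdhit90.stdout"
    echo "python $scripts/VFDB_cdhit_to_csv_KLedit.py --cluster_file ${g}_cdhit90.clstr --infile $g.fsa --outfile ${g}_cdhit90.csv"
    echo "python $scripts/csv_to_gene_db.py -t ${g}_cdhit90.csv -o ${g}_VF_clustered.fasta -s 5"
}

build_plan=$(for genus in Klebsiella Escherichia Serratia Enterobacter; do
    print_build_steps "$genus"
done)
printf '%s\n' "$build_plan"
```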

The actual VF typing can now be done using this clustered DB:

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_VF_clustered.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_VFs/coverage_80_run --min_gene_cov 80
```

Same for other genera using:
```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_VF_clustered.fasta
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Serratia_VF_clustered.fasta
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Enterobacter_VF_clustered.fasta
```
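Before launching the typing runs, it may be worth confirming that each clustered DB exists and is non-empty, since an empty or missing FASTA only fails once the pipeline has started. A minimal sketch, using a temporary directory as a stand-in for `ref_files` and toy sequences (both hypothetical):

```shell
#!/bin/sh
# Toy demonstration only: ref_dir stands in for the real ref_files directory.
set -eu
ref_dir=$(mktemp -d)
trap 'rm -rf "$ref_dir"' EXIT

# Toy databases: two sequences for Klebsiella, one for Serratia, none for the rest.
printf '>a\nACGT\n>b\nACGT\n' > "$ref_dir/Klebsiella_VF_clustered.fasta"
printf '>c\nACGT\n'           > "$ref_dir/Serratia_VF_clustered.fasta"

report=$(for genus in Klebsiella Escherichia Serratia Enterobacter; do
    db="$ref_dir/${genus}_VF_clustered.fasta"
    if [ -s "$db" ]; then
        # grep -c counts header lines, i.e. sequences in the database
        echo "$genus $(grep -c '^>' "$db")"
    else
        echo "$genus MISSING"
    fi
done)
printf '%s\n' "$report"
```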

Again, combine the individual sample results files with e.g.
```
srst2 --prev_output *genes* --output Klebsiella_VFs
```

# MLST
MLST profiles were downloaded for E. coli and K. pneumoniae as:

```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#1'
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#2'
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Klebsiella pneumoniae'
```

Note: MLST profiles are not available for Serratia marcescens or Enterobacter cloacae.

MLST profiling execution:

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_pneumoniae.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/Klebsiella_MLSTs
```

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_1_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#1.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli1_MLSTs
```

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_2_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#2.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli2_MLSTs
```
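The `--mlst_delimiter _` setting tells srst2 which character separates the locus name from the allele number in the MLST FASTA headers (e.g. `gapA_1`). One way to confirm that a downloaded allele file matches this assumption is to list the locus names recovered by splitting on the last underscore. The sketch below uses a toy allele file with hypothetical contents; a real scheme file should yield the expected set of loci for that scheme:

```shell
#!/bin/sh
# Toy demonstration only: the allele file contents are hypothetical.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/toy_mlst.fasta" <<'EOF'
>gapA_1
ACGT
>gapA_2
ACGA
>infB_1
TTGA
EOF

# Strip '>' and the trailing "_<allele number>", leaving one line per locus.
loci=$(grep '^>' "$workdir/toy_mlst.fasta" \
    | sed -e 's/^>//' -e 's/_[0-9][0-9]*$//' \
    | sort -u)
printf '%s\n' "$loci"
n_loci=$(( $(printf '%s\n' "$loci" | wc -l) ))
echo "loci found: $n_loci"
```

If the locus names come out with allele numbers still attached, the headers probably use a different delimiter than the one passed to the pipeline.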