Wiki » History » Version 19
Katie Lennard, 10/11/2022 04:54 PM
# Wiki

# Data location:

The data was transferred from Athena (medmicro):

```
/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1A_results_17022022
/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1B_results_21022022
```

to Ilifu:

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/
```

# Reference data:

Klebsiella pneumoniae – strain HS11286 (GenBank accession no. CP003200.1) (n=18);
Serratia marcescens – strain KS10 (GenBank accession no. CP027798.1) (n=3);
Escherichia coli – strain ATCC 25922 (GenBank accession no. CP009072.1) (n=1); and
Enterobacter cloacae – strain ATCC 13047 (GenBank accession no. NC_014121.1) (n=1).

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_genomes
```

# Objectives workflow:
![workflow.png]()

# QC:
Eleven samples failed QC and had to be rerun. Note that these 11 samples (study 1A) were accidentally rerun twice: once on 28 Feb and once on 22 September. The two reruns were merged by concatenating the read files of each sample, e.g.:

```
cat KLEB-CRE-GSH-0016_S11_L001_R2_001.fastq.gz >> merged_reads/G-16_S11_L001_R2_001.fastq.gz
```

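The concatenation above can be looped over all rerun samples. The following is a minimal sketch, not the commands actually used: the function name `merge_runs` and its directory arguments are hypothetical, and it assumes both reruns use identical file names (the command above writes to a differently named output file, which this sketch does not reproduce). It relies on the fact that gzip files concatenated with `cat` still form a valid gzip stream.

```shell
# Hypothetical helper: merge_runs RUN1_DIR RUN2_DIR OUT_DIR
# Concatenates same-named *.fastq.gz files from two reruns into OUT_DIR.
merge_runs() {
    run1=$1; run2=$2; out=$3
    mkdir -p "$out"
    for f in "$run1"/*.fastq.gz; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        name=$(basename "$f")
        if [ -f "$run2/$name" ]; then
            # gzip members can be concatenated directly, no decompression needed
            cat "$f" "$run2/$name" > "$out/$name"
        else
            echo "warning: $name missing from $run2, skipped" >&2
        fi
    done
}
```

Writing with `>` from both inputs in one `cat` call, rather than appending with `>>`, means rerunning the helper does not duplicate reads in the output.
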
File location:
```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/11_double_rerun_merged/merged_reads
```

Next, these 11 merged-run samples were joined in one folder, via symlinks, with run B (which passed QC):

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined
```

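One way to build such a combined folder is sketched below. The helper name `link_reads` and its arguments are illustrative assumptions, not the commands that were actually run, and the location of the run B reads is not recorded on this page, so it appears only as a placeholder variable.

```shell
# Hypothetical helper: link_reads SRC_DIR DEST_DIR
# Symlinks every *.fastq.gz in SRC_DIR into DEST_DIR using absolute paths.
link_reads() {
    src=$(cd "$1" && pwd) || return 1      # resolve SRC_DIR to an absolute path
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*.fastq.gz; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        ln -s "$f" "$dest/$(basename "$f")"
    done
}

# Example usage (RUN_B_DIR is a placeholder for the run B read directory):
# COMBINED=/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined
# link_reads /scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/11_double_rerun_merged/merged_reads "$COMBINED"
# link_reads "$RUN_B_DIR" "$COMBINED"
```

Absolute link targets keep the symlinks valid regardless of the directory the pipeline is launched from.
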
Filtering and trimming were executed as follows:

```
nextflow run kviljoen/fastq_QC --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined/*_R{1,2}_001.fastq.gz' -profile ilifu
```
QC reports can be found in the 'files' tab.

Note: to comply with the srst2 file-naming requirements, I renamed the trimmed files from e.g. *_R1.fq to *_1.fq (removing the R) using e.g.
```
for f in *.fq; do mv -v "$f" "${f/_R/_}"; done
```

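The `${f/_R/_}` substitution replaces the first `_R` anywhere in the file name, so a sample ID that itself contains `_R` would be renamed incorrectly. A slightly more defensive sketch that only touches the read suffix is given below; the function name `strip_read_prefix` is hypothetical, and it assumes the trimmed files end in `_R1.fq` or `_R2.fq`.

```shell
# Hypothetical helper: strip_read_prefix [DIR]
# Renames *_R1.fq / *_R2.fq to *_1.fq / *_2.fq in DIR (default: current dir).
strip_read_prefix() {
    dir=${1:-.}
    for f in "$dir"/*_R[12].fq; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        base=${f%_R[12].fq}                # everything before the read suffix
        read_no=${f#"$base"_R}             # leaves "1.fq" or "2.fq"
        mv -v "$f" "${base}_${read_no}"
    done
}
```

Only the trailing `_R1.fq` or `_R2.fq` is rewritten, so any `_R` inside the sample name is left untouched.
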
# AMR profiling
Clinton's preference is to do AMR profiling with the ResFinder DB. I am getting errors there that I think relate to the header formatting, so in the interim I have run the pipeline with the ARGannot DB that we used for previous projects:

## ARGannot

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/ARGannot_r3.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_ARGannot/coverage_80_run --min_gene_cov 80
```
Individual results files were compiled as:

```
srst2 --prev_output *results.txt --output ARGannot_AMRs
```

## CARD DB:

This database is the one recommended by srst2 and has already been formatted by its authors. The DB was downloaded with:

```
wget 'https://github.com/katholt/srst2/blob/master/data/CARD_v3.0.8_SRST2.fasta?raw=true' -O CARD_v3.0.8_SRST2.fasta
```

Pipeline execution as:

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/CARD_v3.0.8_SRST2.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_CARD/coverage_80_run --min_gene_cov 80
```

Individual results files were compiled as:

```
srst2 --prev_output *results.txt --output CARD_AMRs
```

# Virulence factors

Building the relevant VFDB for Klebsiella requires a python script that needs the biopython module (use the /cbio/users/katie/singularity_containers/srst2_v2.simg singularity container for this).
NB: in order to use the correct python version (2.7.5) for srst2, I first had to comment out the lines at the end of my .bashrc file relating to conda initialize.

Build genus-specific DB:
```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDBgenus.py --infile /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/VFDB_setB_nt.fas --genus Klebsiella
```
This was used to create the VF DB Klebsiella.fsa.

The same procedure (as last year ;) was executed for Escherichia, Serratia and Enterobacter.

cd-hit is needed to build the VFDB, as outlined here: https://github.com/katholt/srst2#using-the-vfdb-virulence-factor-database-with-srst2. The cd-hit docker image was pulled from https://quay.io/repository/biocontainers/cd-hit?tab=tags and converted to a singularity image on the BST server. Open a shell in the container with:
```
singularity exec /cbio/users/katie/singularity_containers/cd-hit.simg /bin/bash
```

Then run CD-HIT to cluster the sequences for this genus at 90% nucleotide identity:

```
cd-hit -i Klebsiella.fsa -o Klebsiella_cdhit90 -c 0.9 > Klebsiella_cdhit90.stdout
```

Repeat for the other .fsa DBs.

Next, parse the cluster output and tabulate the results using the script that is compatible with the specific virulence gene DB (use srst2_v2.simg again):

```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDB_cdhit_to_csv_KLedit.py --cluster_file Klebsiella_cdhit90.clstr --infile Klebsiella.fsa --outfile Klebsiella_cdhit90.csv
```

Next, convert the resulting csv table to an SRST2-compatible sequence database using:

```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/csv_to_gene_db.py -t Klebsiella_cdhit90.csv -o Klebsiella_VF_clustered.fasta -s 5
```

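The clustering, tabulation and conversion steps can be scripted for the remaining genera. The sketch below only prints the commands (a dry run) so they can be checked before execution. The function name `vf_db_commands` is hypothetical, the script paths and options are the ones used for Klebsiella, and it assumes each genus already has a `<Genus>.fsa` file in the working directory. Note that the cd-hit and python steps run in different containers, so the printed lines are not meant to be piped straight into one shell.

```shell
# Directory holding the srst2 database_clustering scripts (path as used on this page)
SCRIPTS=/cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering

# Hypothetical helper: print the three DB-building commands for each genus given
vf_db_commands() {
    for genus in "$@"; do
        # 1. cluster at 90% identity (run inside cd-hit.simg)
        echo "cd-hit -i ${genus}.fsa -o ${genus}_cdhit90 -c 0.9 > ${genus}_cdhit90.stdout"
        # 2. tabulate the clusters (run inside srst2_v2.simg)
        echo "python $SCRIPTS/VFDB_cdhit_to_csv_KLedit.py --cluster_file ${genus}_cdhit90.clstr --infile ${genus}.fsa --outfile ${genus}_cdhit90.csv"
        # 3. convert the table to an SRST2-compatible fasta DB
        echo "python $SCRIPTS/csv_to_gene_db.py -t ${genus}_cdhit90.csv -o ${genus}_VF_clustered.fasta -s 5"
    done
}

vf_db_commands Escherichia Serratia Enterobacter
```
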
The actual VF typing can now be done using this clustered DB:

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_VF_clustered.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_VFs/coverage_80_run --min_gene_cov 80
```

The same was done for the other genera using:
```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_VF_clustered.fasta
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Serratia_VF_clustered.fasta
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Enterobacter_VF_clustered.fasta
```

Again, combine the individual sample results files with e.g.
```
srst2 --prev_output *genes* --output Klebsiella_VFs
```

# MLST

MLST profiles were downloaded for E. coli and K. pneumoniae as:

```
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#1'
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Escherichia coli#2'
python /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/scripts/getmlst.py --species 'Klebsiella pneumoniae'
```

Note: MLST profiles are not available for Serratia marcescens or Enterobacter cloacae.

MLST profiling execution:

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Klebsiella_pneumoniae.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/Klebsiella_MLSTs
```

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_1_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#1.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli1_MLSTs
```

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --mlst_definitions /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/E_coli_2_definitions --mlst_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/Escherichia_coli#2.fasta --mlst_delimiter _ --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_MLSTs/E_coli2_MLSTs
```