Katie Lennard, 09/21/2022 09:11 AM
# Wiki

# Data location:

The data was transferred from Athena (medmicro):

```
/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1A_results_17022022
/MedMicro/Clinton/CRE Pfizer Feb 2022/CRE study_1B_results_21022022
```

to Ilifu:

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/
```

# Reference data:

Klebsiella pneumoniae – strain HS11286 (GenBank accession no. CP003200.1) (n=18);
Serratia marcescens – strain KS10 (GenBank accession no. CP027798.1) (n=3);
Escherichia coli – strain ATCC 25922 (GenBank accession no. CP009072.1) (n=1); and
Enterobacter cloacae – strain ATCC 13047 (GenBank accession no. NC_014121.1) (n=1).

```
/scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_genomes
```

# Objectives workflow:
![workflow.png]()

# QC:
11 samples failed the QC Phred score checks before trimming and filtering; none failed after filtering and trimming. Filtering and trimming were executed as follows:

```
nextflow run kviljoen/fastq_QC --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined/*_R{1,2}_001.fastq.gz' -profile ilifu
```
QC reports can be found in the 'files' tab.
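
Before launching the QC pipeline it can help to confirm that every sample has both mates, since the `--reads` glob above expects `_R1_001`/`_R2_001` pairs. The following is only a sketch: the function name `find_unpaired_reads` is made up for illustration and is not part of the pipeline.

```shell
# List FASTQ files in a directory whose mate (R1 <-> R2) is missing.
# Assumes the *_R1_001.fastq.gz / *_R2_001.fastq.gz naming used in the --reads glob.
find_unpaired_reads() {
    local dir="$1" f mate
    for f in "$dir"/*_R[12]_001.fastq.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        case "$f" in
            *_R1_001.fastq.gz) mate="${f%_R1_001.fastq.gz}_R2_001.fastq.gz" ;;
            *_R2_001.fastq.gz) mate="${f%_R2_001.fastq.gz}_R1_001.fastq.gz" ;;
        esac
        [ -e "$mate" ] || echo "$f"   # report files without a mate
    done
}

# Usage (prints nothing when every sample has both mates):
# find_unpaired_reads /scratch3/users/katiel/Clinton/CRE_study_August_2022/raw/study_1A_B_combined
```
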

# AMR profiling
Clinton's preference is to do AMR profiling with the ResFinder DB. That run currently fails with errors that I think relate to the FASTA header formatting, so in the interim I have run srst2 with the ARG_annot DB that we used for previous projects (see ARGannot below).
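
On the suspected ResFinder header problem: the srst2 README describes gene DB headers of the form `>[clusterUniqueIdentifier]__[clusterSymbol]__[alleleSymbol]__[alleleUniqueIdentifier]`, optionally followed by a space and annotation. A quick way to test the hypothesis is to list headers that do not fit that layout. This is a sketch only; `check_srst2_headers` is a hypothetical helper, the FASTA name in the usage line is a placeholder, and it has not been confirmed that header layout is the actual cause of the errors.

```shell
# Print FASTA headers whose first word does not have four "__"-separated fields.
check_srst2_headers() {
    awk '/^>/ {
        id = substr($1, 2)   # first word of the header, without the leading ">"
        if (split(id, parts, "__") != 4) print FILENAME ":" FNR ": " $0
    }' "$1"
}

# Usage (placeholder file name; no output means every header has four fields):
# check_srst2_headers resfinder.fasta
```
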

## ARGannot
```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/ARGannot_r3.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_ARGannot/coverage_80_run --min_gene_cov 80
```
Individual results files were compiled with:

```
srst2 --prev_output *results.txt --output ARGannot_AMRs
```

## CARD DB:

This database is the one recommended by srst2 and has already been formatted by the srst2 developers. The DB was downloaded with:

```
wget "https://github.com/katholt/srst2/blob/master/data/CARD_v3.0.8_SRST2.fasta?raw=true" -O CARD_v3.0.8_SRST2.fasta
```

The pipeline was run with:

```
nextflow run kviljoen/uct-srst2 --reads '/scratch3/users/katiel/Clinton/CRE_study_August_2022/2022-09-19-fastq_QC/bbduk/*_{1,2}.fq' -profile ilifu --gene_db /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/CARD_v3.0.8_SRST2.fasta --outdir /scratch3/users/katiel/Clinton/CRE_study_August_2022/srst2_CARD/coverage_80_run --min_gene_cov 80
```

Individual results files were compiled with:

```
srst2 --prev_output *results.txt --output CARD_AMRs
```
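
For a first look at a compiled table, the number of samples carrying each gene can be tallied with awk. This sketch assumes the compiled file is tab-delimited, has the sample name in the first column and one column per gene, and uses `-` for "not detected"; it also assumes srst2 names the file `<output prefix>__compiledResults.txt`. Both assumptions should be checked against the actual output. `count_gene_calls` is a hypothetical helper name.

```shell
# For each gene column, count the samples with a call other than "-".
count_gene_calls() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        { for (i = 2; i <= ncol; i++) if ($i != "" && $i != "-") n[i]++ }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], n[i] }
    ' "$1"
}

# Usage (assumed file name):
# count_gene_calls CARD_AMRs__compiledResults.txt
```
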

# Virulence factors

Building the relevant VFDB for Klebsiella requires a Python script that needs the Biopython module. This was installed in a virtual environment on Ilifu as follows (from an interactive node):

```
module add python/3.9.0
virtualenv .srst2_biopython_venv
. .srst2_biopython_venv/bin/activate
pip install biopython==1.68
```
*Note: Biopython 1.68 is pinned to avoid the following error with later versions: "ImportError: Bio.Alphabet has been removed from Biopython"

The environment can now be activated at any time with:

```
. .srst2_biopython_venv/bin/activate
```

The genus-specific VF DB Klebsiella.fsa was built with:
```
python /cbio/users/katiel/Nicol/Ps_aerug_srst2_MLST/VFDB/srst2/database_clustering/VFDBgenus.py --infile /scratch3/users/katiel/Clinton/CRE_study_August_2022/ref_files/VFDB_setB_nt.fas --genus Klebsiella
```

The same procedure (as last year ;) was executed for Escherichia, Serratia and Enterobacter

cd-hit is needed to build the VFDB, as outlined here https://github.com/katholt/srst2#using-the-vfdb-virulence-factor-database-with-srst2 and its docker image was pulled from here https://quay.io/repository/biocontainers/cd-hit?tab=tags and converted to a singularity image on the BST server. A shell inside the container is started with:
```
singularity exec /cbio/users/katie/singularity_containers/cd-hit.simg /bin/bash
```

then:
```
cd-hit -i Klebsiella.fsa -o Klebsiella_cdhit90 -c 0.9 > Klebsiella_cdhit90.stdout
```

Repeat for the other .fsa DBs (Escherichia, Serratia and Enterobacter).
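
The repeat step can be scripted as a loop over the remaining genera. A minimal sketch, to be run inside the cd-hit container shell from the directory holding the .fsa files: `cluster_genus_dbs` and the `DRY_RUN` switch are my own additions for illustration. By default it only prints the commands so they can be checked first; set `DRY_RUN=0` to actually run cd-hit.

```shell
# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) cd-hit at 90% identity
# for each genus-specific DB, mirroring the Klebsiella command above.
cluster_genus_dbs() {
    local genus
    for genus in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "cd-hit -i ${genus}.fsa -o ${genus}_cdhit90 -c 0.9 > ${genus}_cdhit90.stdout"
        else
            cd-hit -i "${genus}.fsa" -o "${genus}_cdhit90" -c 0.9 > "${genus}_cdhit90.stdout"
        fi
    done
}

cluster_genus_dbs Escherichia Serratia Enterobacter
```
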