Wiki » History » Version 10

Katie Lennard, 04/15/2019 11:23 AM

# Wiki

## Study background

This study was prompted by an unusual outbreak of wild-type Pseudomonas that coincided with the Cape Town drought. Preliminary molecular analysis suggests clonality, so the aim is to establish how the outbreak came about and whether the drought is in some way responsible. Pseudomonas are waterborne opportunistic pathogens that can form biofilms in plumbing pipes. One hypothesis is therefore that decreased water pressure during the drought allowed increased biofilm formation and, subsequently, increased bacterial concentrations in drinking water. The data will include WGS of blood culture isolates and water samples from before, during, and after the outbreak (96 samples).

## Pipeline tool options considered

* Tychus: a Nextflow-based pipeline for pathogen WGS assembly and annotation (Repo: https://github.com/Abdo-Lab/Tychus, Paper: https://www.biorxiv.org/content/biorxiv/early/2018/03/16/283101.full.pdf)
* Advice from Arash: use Velvet for assembly and Prokka for annotation. If the species genome is very diverse across strains, build a pan-genome and use it as the reference genome. Use Roary (https://academic.oup.com/bioinformatics/article/31/22/3691/240757), Pyseer or BPGA for pan-genome construction, then run a statistical gene presence/absence analysis across populations with Scoary. Roary is installed on CHPC and its output files are compatible with Scoary and R.
* Options from Nicky (https://www.pathogensurveillance.net/software):

A) Microreact (interactive visualisation of trees, geographic data, and temporal data): not immediately useful for Pseudomonas (only 1 entry in their database, vs. e.g. ~4000 for Staph)

B) Pathogenwatch (Processing and Visualisation of Microbial Genome Sequences in Phylogenetic and Geographical Contexts): assemblies can be uploaded for MLST and AMR profiling. This can be tested once we have assemblies, although it also seems to be limited to a handful of pathogens currently (https://pathogen.watch/)

## Implementation of Tychus (Nextflow pipeline)

* Singularity images (one for the Tychus alignment module and one for the Tychus assembly module) were built on the BST server from the Dockerfiles in the Tychus repo. The pipeline was first tested on hex, so the relevant bind points were added to the Dockerfiles. After cloning the repo, the Docker image was built with `docker build -t Tychus_alignment .` from the folder on BST that contains the Dockerfile. The Singularity image was then built from the Docker image using docker2singularity.

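The build steps described above can be sketched as a small dry-run script. This is an illustration under assumptions, not the commands that were actually run: the lowercase tag `tychus_alignment` (Docker requires lowercase repository names), the output folder, and the `quay.io/singularity/docker2singularity` image name and mount points are assumptions that should be checked against the docker2singularity README. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch of the image build steps. DRY_RUN=1 (default) only prints the commands;
# set DRY_RUN=0 to actually execute them (requires Docker).
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command, and execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

build_images() {
    # $1: lowercase image tag (assumed name)
    # $2: folder containing the Dockerfile
    # $3: absolute path of the output folder for the Singularity image (assumed)
    tag="$1"; context="$2"; outdir="$3"
    # Step 1: build the Docker image from the Dockerfile.
    run docker build -t "$tag" "$context"
    # Step 2: convert the Docker image with the docker2singularity container.
    run docker run --rm --privileged \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v "$outdir":/output \
        quay.io/singularity/docker2singularity "$tag"
}

build_images tychus_alignment . /tmp/singularity_images
```

The same function would be called a second time with the assembly module's Dockerfile folder and a matching tag.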
## Tychus pipeline parameters to consider

Trimmomatic's configurable variables include trim length, quality (Phred) scores, sliding window and the adapters file (https://github.com/kviljoen/Tychus/blob/ilifu/nextflow.config). Parameters not specified in nextflow.config would have to be changed in the main scripts (alignment.nf and assembly.nf).

## Data location

### Testing data raw reads

* E. coli test reads: Med Micro server (smb://athena.medmicro.uct.ac.za) in File Station under /MedMicro/Clinton/E. coli/original_data

### Raw data on Ilifu

* The raw data for Pseudomonas and E. coli were copied to Ilifu by Suresh with:

~~~ text
rsync -rv --progress -e 'ssh -vvv' suresh@137.158.204.181:/home/suresh/katie/E.coli /ceph/cbio/tmp/katie/
~~~

* Moved to:

~~~ text
/ceph/cbio/users/katie/Nicol/E.coli
~~~

~~~ text
/ceph/cbio/users/katie/Nicol/Ps_aerug
~~~

### Pseudomonas reference databases were sourced and downloaded to Ilifu (see DBs and README)

~~~ text
/ceph/cbio/users/katie/Nicol/Tychus_DBs
~~~

* Virulence DB: downloaded from VFDB (http://www.mgc.ac.cn/VFs/download.htm) on 12/3/2019 and converted to single-line FASTA (SL_VFDB_setA_nt.fas)
* AMR DB: downloaded from ResFinder with `git clone https://git@bitbucket.org/genomicepidemiology/resfinder_db.git`. The individual .fsa files were merged into a single FASTA file with `cat *.fsa >> KL_all_resfinder.fa` and converted to single-line FASTA (SL_KL_all_resfinder.fa)
* Adapters: the file 'adapters.fa' is from the bbmap installation (/opt/conda/opt/bbmap-37.10/resources/adapters.fa) as used in the YAMP pipeline and represents a more comprehensive list of possible adapters than the default Tychus file, TruSeq3-PE.fa
* Plasmid DB: PLSDB (https://ccb-microbe.cs.uni-saarland.de/plsdb/plasmids/download/) provided a BLAST-formatted DB (.nin etc. files) but no FASTA, so the BLAST DB had to be converted back to a FASTA file that can be indexed with bowtie2 in the alignment.nf script
* Pseudomonas reference genome: ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/Pseudomonas_aeruginosa/reference/GCA_000006765.1_ASM676v1/GCA_000006765.1_ASM676v1_genomic.fna.gz

The conversion of the PLSDB BLAST DB back to FASTA was done with:

`/opt/exp_soft/ncbi-blast-2.2.28+/bin/blastdbcmd -entry all -db /home/kviljoen/Tychus_DBs/KL_plsdb2019/2019_03_05.fna -out /home/kviljoen/Tychus_DBs/KL_plsdb2019/2019_03_05_KL.fasta`

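The merge and single-line conversion described for the AMR DB can be sketched as one step. This is a minimal illustration, not the exact commands that were run: the helper name `linearize_fasta`, the temporary folder and the two tiny input files are made up for the demo. Passing the files straight to awk, rather than through `cat`, avoids joining a final sequence line to the next header when a file lacks a trailing newline.

```shell
#!/bin/sh
# linearize_fasta [FILE...]: join wrapped sequence lines so that each record is
# one header line followed by one sequence line. Reads stdin if no file is given.
linearize_fasta() {
    awk '
        /^>/ { if (seen) printf "\n"; print; seen = 1; next }  # header: close previous record
             { printf "%s", $0 }                               # sequence: append, no newline
        END  { if (seen) printf "\n" }                         # terminate the last record
    ' "$@"
}

# Demo on two made-up files standing in for the ResFinder .fsa files.
workdir=$(mktemp -d)
printf '>geneA\nACGT\nTTGA\n' > "$workdir/a.fsa"
printf '>geneB\nGGCC\nAATT\nCC\n' > "$workdir/b.fsa"

# Merge and linearise in one pass. ">" is used instead of ">>" so that
# rerunning the step does not append duplicate records.
linearize_fasta "$workdir"/*.fsa > "$workdir/SL_merged.fa"
cat "$workdir/SL_merged.fa"
```

The output file name is written outside the `*.fsa` glob on purpose, so the merged file can never be picked up as one of its own inputs.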
### Troubleshooting - common errors during pipeline setup/customization

* Segmentation fault when running csa (coverage sampler): convert the input DBs to single-line FASTAs (they are probably multi-line if you see this error). For the conversion, use `awk '/^>/ {if (NR > 1) printf "\n"; print; next} {printf "%s", $0} END {printf "\n"}' multiline.fasta > singleline.fasta`
* Missing output file(s) `Trees/*.tre` expected by process `BuildPhylogenies (ConfigurationFiles)`: the reference DB name is probably not being extracted correctly from the reference file's base name. Check whether there are '.' characters in your reference filename and change them to '_', e.g. GCA_000006765.1_ASM676v1_genomic.fna must be renamed to GCA_000006765_1_ASM676v1_genomic.fna
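
Before running csa, it may help to check whether a DB is already single-line. The sketch below is one possible check, not part of Tychus: the function name `is_singleline_fasta` and the demo files are made up for illustration.

```shell
#!/bin/sh
# is_singleline_fasta FILE: succeed (exit 0) only if no record spans more
# than one sequence line.
is_singleline_fasta() {
    awk '
        /^>/ { n = 0; next }                      # new record: reset the line count
        NF   { if (++n > 1) { bad = 1; exit } }   # second sequence line: multi-line
        END  { exit (bad ? 1 : 0) }
    ' "$1"
}

# Demo on two made-up files.
tmp=$(mktemp -d)
printf '>r1\nACGT\nACGT\n' > "$tmp/multi.fa"
printf '>r1\nACGTACGT\n'   > "$tmp/single.fa"

for f in "$tmp/multi.fa" "$tmp/single.fa"; do
    if is_singleline_fasta "$f"; then
        echo "$f: single-line"
    else
        echo "$f: multi-line, convert before running csa"
    fi
done
```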
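
The filename fix for the `BuildPhylogenies` error can be scripted rather than done by hand. The helper below is a hypothetical sketch (`sanitize_ref_name` is not a Tychus function). It replaces every '.' with '_' except the one that starts the final extension, and it assumes the reference has already been decompressed (a `.fna.gz` name would keep only `.gz` as the extension).

```shell
#!/bin/sh
# sanitize_ref_name NAME: replace every '.' in NAME with '_', keeping the
# final extension (e.g. ".fna") intact. Names without a '.' are returned as-is.
sanitize_ref_name() {
    name="$1"
    case "$name" in
        *.*) ext="${name##*.}"; stem="${name%.*}" ;;
        *)   printf '%s\n' "$name"; return ;;
    esac
    printf '%s.%s\n' "$(printf '%s' "$stem" | tr '.' '_')" "$ext"
}

sanitize_ref_name GCA_000006765.1_ASM676v1_genomic.fna
# prints GCA_000006765_1_ASM676v1_genomic.fna
```

To rename a file in place, the result can be used with `mv`, e.g. `mv "$f" "$(sanitize_ref_name "$f")"` from inside the folder that contains the reference.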
### Processed data

**Raw reads FastQC/multiQC results on Ilifu:**

~~~ text
/ceph/cbio/users/katie/Nicol/E_coli_raw_fastqc
~~~

~~~ text
/ceph/cbio/users/katie/Nicol/Ps_aerug_raw_fastqc
~~~

**Raw reads FastQC/multiQC results on medmicro:**

~~~ text
http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/E. coli/Katie_results/E_coli_raw_fastqc
~~~

~~~ text
http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/Ps_aerug/Katie_results/Ps_aerug_raw_fastqc
~~~

**Trimmomatic-trimmed/filtered reads FastQC/multiQC results on Ilifu:**

~~~ text
/ceph/cbio/users/katie/Nicol/E_coli_trimmomatic_fastqc
~~~

~~~ text
/ceph/cbio/users/katie/Nicol/Ps_aerug_trimmomatic_fastqc
~~~

**Trimmomatic-trimmed/filtered reads FastQC/multiQC results on medmicro:**

~~~ text
http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/E. coli/Katie_results/E_coli_trimmomatic_fastqc
~~~

~~~ text
http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/Ps_aerug/Katie_results/Ps_aerug_trimmomatic_fastqc
~~~

**Tychus alignment module results on Ilifu:**

~~~ text
/ceph/cbio/users/katie/Nicol/E_coli_alignment
~~~

~~~ text
/ceph/cbio/users/katie/Nicol/Ps_aeruginosa_alignment
~~~

**Tychus assembly module results on Ilifu:**

~~~ text
/ceph/cbio/users/katie/Nicol/E_coli_assembly
~~~

~~~ text
/ceph/cbio/users/katie/Nicol/Ps_assembly
~~~

### Final results for publishing