Revision 7 (Katie Lennard, 04/09/2019 06:11 PM) → Revision 8/17 (Katie Lennard, 04/10/2019 11:09 AM)

# Wiki

## Study background
This study was prompted by an unusual outbreak of wild-type Pseudomonas that coincided with the Cape Town drought. Preliminary molecular analysis suggests clonality, so the aim is to establish how this outbreak came about and whether the drought is in some way responsible. Pseudomonas are waterborne opportunistic pathogens that can form biofilms in plumbing pipes. One hypothesis is therefore that the decreased water pressure during the drought allowed increased biofilm formation and, subsequently, increased Pseudomonas concentrations in drinking water. The data will include WGS of blood culture isolates and water samples from before, during, and after the outbreak (96 samples).

## Pipeline tool options considered
* Tychus: A Nextflow-based pipeline for pathogen WGS assembly and annotation (Repo: https://github.com/Abdo-Lab/Tychus Paper: https://www.biorxiv.org/content/biorxiv/early/2018/03/16/283101.full.pdf)
* Advice from Arash: Use Velvet for assembly and Prokka for annotation. If the species genome is very diverse across strains, build a pan-genome and use it as the reference genome. Use Roary (https://academic.oup.com/bioinformatics/article/31/22/3691/240757), Pyseer or BPGA for pan-genome construction, and then perform a statistical analysis of gene presence/absence across populations with Scoary. Roary is installed on CHPC and its output files are compatible with Scoary and R.
* Options from Nicky:
https://www.pathogensurveillance.net/software:
A) Microreact (Interactive visualisation of trees, geographic data, and temporal data) - not immediately useful for Pseudomonas (only 1 entry in their database vs. e.g. ~4000 for Staph)
B) Pathogenwatch (Processing and Visualisation of Microbial Genome Sequences in Phylogenetic and Geographical Contexts). Here you can upload your assemblies to do MLST and AMR profiling. We can test this once we have assemblies, although it also seems to be limited to a handful of pathogens currently (https://pathogen.watch/)
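
To make the gene presence/absence analysis suggested above more concrete, here is a minimal Python sketch of a per-gene association test. It assumes a two-sided Fisher's exact test on a 2x2 table of gene presence versus group membership, which is one of the statistics Scoary reports. The function names `gene_table` and `fisher_exact_two_sided` and the sample labels are hypothetical, and this is an illustration of the idea rather than a replacement for running Scoary on Roary output.

```python
from math import comb


def gene_table(presence, outbreak_samples):
    """Count the 2x2 table (a, b, c, d) for one gene.

    presence: dict mapping sample name -> True if the gene is present.
    outbreak_samples: set of sample names in the group of interest.
    a = present/outbreak, b = present/other,
    c = absent/outbreak,  d = absent/other.
    """
    a = b = c = d = 0
    for sample, has_gene in presence.items():
        in_outbreak = sample in outbreak_samples
        if has_gene and in_outbreak:
            a += 1
        elif has_gene:
            b += 1
        elif in_outbreak:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Toy data (hypothetical): the gene is present only in the three outbreak isolates.
presence = {"s1": True, "s2": True, "s3": True,
            "s4": False, "s5": False, "s6": False}
table = gene_table(presence, {"s1", "s2", "s3"})
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 3))
```

With 96 samples and thousands of genes, the resulting p-values would also need multiple-testing correction, which Scoary handles itself.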

## Implementation of Tychus (nextflow pipeline)
* Singularity images (one for the Tychus alignment module and one for the Tychus assembly module) were built on the BST server from the Dockerfiles in the Tychus repo. The pipeline was first tested on hex, so the relevant bind points were added to the Dockerfiles. After cloning the repo, each Docker image was built from the folder on BST containing the Dockerfile with `docker build -t Tychus_alignment .`, and the Singularity image was then built from the Docker image using docker2singularity.

## Data location

### Testing data raw reads
* E. coli test reads: Med Micro server (smb://athena.medmicro.uct.ac.za), in File Station under /MedMicro/Clinton/E. coli/original_data

### Raw data on Ilifu

* The raw data for Pseudomonas and E. coli were copied over to Ilifu by Suresh with:

~~~ text
rsync -rv --progress -e 'ssh -vvv' suresh@137.158.204.181:/home/suresh/katie/E.coli /ceph/cbio/tmp/katie/
~~~

* The data were then moved to:
~~~ text
/ceph/cbio/users/katie/Nicol/E.coli
~~~

~~~ text
/ceph/cbio/users/katie/Nicol/Ps_aerug
~~~

### Pseudomonas reference databases were sourced and downloaded to Ilifu (see DBs and README)

~~~ text
/ceph/cbio/users/katie/Nicol/Tychus_DBs
~~~

* Virulence DB: downloaded from VFDB (http://www.mgc.ac.cn/VFs/download.htm) on 12/3/2019 and converted to single-line fasta (SL_VFDB_setA_nt.fas)
* AMR DB: downloaded from ResFinder with `git clone https://git@bitbucket.org/genomicepidemiology/resfinder_db.git`. The individual .fsa files were merged into a single fasta file with `cat *.fsa >> KL_all_resfinder.fa`, which was then converted to single-line fasta (SL_KL_all_resfinder.fa)
* Adapters: the file 'adapters.fa' is from the bbmap installation (/opt/conda/opt/bbmap-37.10/resources/adapters.fa) as used in the YAMP pipeline, and represents a more comprehensive list of possible adapters than the default Tychus file, TruSeq3-PE.fa
* Plasmid DB: PLSDB (https://ccb-microbe.cs.uni-saarland.de/plsdb/plasmids/download/) provided a BLAST-formatted DB (.nin etc. files) but no fasta, so the BLAST DB had to be converted back to a fasta file that can be indexed with bowtie2 in the alignment.nf script
* Pseudomonas reference genome: ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/Pseudomonas_aeruginosa/reference/GCA_000006765.1_ASM676v1/GCA_000006765.1_ASM676v1_genomic.fna.gz

The PLSDB BLAST DB was converted back to fasta with:
/opt/exp_soft/ncbi-blast-2.2.28+/bin/blastdbcmd -entry all -db /home/kviljoen/Tychus_DBs/KL_plsdb2019/2019_03_05.fna -out /home/kviljoen/Tychus_DBs/KL_plsdb2019/2019_03_05_KL.fasta
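
The merge and single-line conversion steps described for the AMR and virulence DBs can also be sketched in Python, which may be easier to reuse than shell one-liners. This is a minimal sketch under stated assumptions: the function names `to_single_line` and `merge_fasta` are hypothetical, the inputs are assumed to be plain (uncompressed) nucleotide FASTA files, and any sequence text appearing before the first header is silently dropped.

```python
def to_single_line(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one.
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)


def merge_fasta(paths, out_path):
    """Concatenate several FASTA files into one single-line FASTA file."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in to_single_line(handle):
                    out.write(line + "\n")


# Small in-memory example of the conversion itself.
example = [">gene1\n", "ACGT\n", "TTGA\n", ">gene2\n", "GG\n", "CC\n"]
converted = list(to_single_line(example))
print(converted)
```

For the ResFinder case, `merge_fasta` would be called with the list of `.fsa` files and an output name such as SL_KL_all_resfinder.fa. Unlike `cat *.fsa >>`, it overwrites the output file rather than appending to it.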

### Troubleshooting - common errors during pipeline setup/customization
* Segmentation fault when running csa (coverage sampler): the input DBs are probably multi-line fastas if you are seeing this error, so convert them to single-line fastas. For the conversion, use `awk '{if(NR==1) {print $0} else {if($0 ~ /^>/) {print "\n"$0} else {printf "%s", $0}}} END {printf "\n"}' interleaved.fasta > singleline.fasta`
* Missing output file(s) `Trees/*.tre` expected by process `BuildPhylogenies (ConfigurationFiles)`: the reference DB name is probably not being extracted correctly from the file base name. Check whether there are '.' characters in your reference filename and change them to '_', e.g. rename GCA_000006765.1_ASM676v1_genomic.fna to GCA_000006765_1_ASM676v1_genomic.fna
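
The filename fix in the second item can be scripted so that it is applied consistently to every reference. The sketch below is one hypothetical way to do it: the helper name `sanitize_reference_name` is not part of Tychus, and it assumes the file has already been decompressed, so that only the final extension (e.g. `.fna`) should keep its dot.

```python
from pathlib import Path


def sanitize_reference_name(filename):
    """Replace '.' with '_' in the file name, keeping the final extension."""
    path = Path(filename)
    # path.stem is the name without the last suffix, path.suffix is e.g. ".fna".
    safe_stem = path.stem.replace(".", "_")
    return str(path.with_name(safe_stem + path.suffix))


fixed = sanitize_reference_name("GCA_000006765.1_ASM676v1_genomic.fna")
print(fixed)  # GCA_000006765_1_ASM676v1_genomic.fna
```

The function only computes the new name. The actual rename could then be done with `Path(old).rename(new)` once the result has been checked.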

### Processed data

**Raw reads FastQC/MultiQC results on Ilifu:**

~~~ text
/ceph/cbio/users/katie/Nicol/E_coli_raw_fastqc
~~~
~~~ text
/ceph/cbio/users/katie/Nicol/Ps_aerug_raw_fastqc
~~~

**Raw reads FastQC/MultiQC results on medmicro:**

~~~ text
http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/E. coli/Katie_results/E_coli_raw_fastqc
~~~
~~~ text
http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/E. coli/Ps_aerug/Ps_aerug_raw_fastqc
~~~

**Tychus alignment module results on Ilifu:**

~~~ text
/ceph/cbio/users/katie/Nicol/E_coli_alignment
~~~
~~~ text
/ceph/cbio/users/katie/Nicol/Ps_aeruginosa_alignment
~~~

**Tychus assembly module results on Ilifu:**

~~~ text
/ceph/cbio/users/katie/Nicol/E_coli_assembly
~~~
~~~ text
/ceph/cbio/users/katie/Nicol/Ps_assembly
~~~

### Final results for publishing