Katie Lennard, 04/09/2019 06:11 PM


Wiki

Study background

This study was prompted by an unusual outbreak of wild-type Pseudomonas that coincided with the Cape Town drought. Preliminary molecular analysis suggests clonality, so the aim is to establish how the outbreak came about and whether the drought was in some way responsible. Pseudomonas are waterborne opportunistic pathogens that can form biofilms in plumbing pipes. One hypothesis is therefore that decreased water pressure during the drought allowed increased biofilm formation and, subsequently, increased Pseudomonas concentrations in drinking water. The data will include WGS of blood culture isolates and water samples from before, during, and after the outbreak (96 samples).

Pipeline tool options considered

  • Tychus: A Nextflow-based pipeline for pathogen WGS assembly and annotation (Repo: https://github.com/Abdo-Lab/Tychus; Paper: https://www.biorxiv.org/content/biorxiv/early/2018/03/16/283101.full.pdf)
  • Advice from Arash: Use Velvet for assembly and Prokka for annotation. If the species genome is very diverse across strains, build a pan-genome and use it as the reference genome. Use Roary (https://academic.oup.com/bioinformatics/article/31/22/3691/240757) or Pyseer or BPGA for pan-genome construction, and then perform a statistical analysis of gene presence/absence across different populations with Scoary. Roary is installed on CHPC and its output files are compatible with Scoary and R.
  • Options from Nicky (https://www.pathogensurveillance.net/software): A) Microreact (interactive visualisation of trees, geographic data, and temporal data): not immediately useful for Pseudomonas (only 1 entry in their database vs. e.g. ~4000 for Staph). B) Pathogenwatch (Processing and Visualisation of Microbial Genome Sequences in Phylogenetic and Geographical Contexts): here you can upload your assemblies to do MLST and AMR profiling. We can test this once we have assemblies, although it also seems to be limited to a handful of pathogens currently (https://pathogen.watch/)
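
To make the gene presence/absence analysis mentioned in Arash's advice more concrete, here is a minimal Python sketch of a per-gene Fisher's exact test comparing two groups of isolates. It is not Scoary itself (Scoary also does pairwise comparisons and multiple-testing correction), and it does not read Roary output; the isolate names, gene names, and group labels are hypothetical toy data, and the function names are made up for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


def gene_association(presence, groups, case_label):
    """Return {gene: p-value} for presence/absence versus group membership."""
    p_values = {}
    for gene, isolates_with_gene in presence.items():
        a = b = c = d = 0
        for isolate, label in groups.items():
            has_gene = isolate in isolates_with_gene
            if label == case_label:
                a += has_gene      # case isolate, gene present
                b += not has_gene  # case isolate, gene absent
            else:
                c += has_gene      # other isolate, gene present
                d += not has_gene  # other isolate, gene absent
        p_values[gene] = fisher_exact_two_sided(a, b, c, d)
    return p_values


# Hypothetical toy data: three outbreak isolates and three background isolates.
groups = {
    "iso1": "outbreak", "iso2": "outbreak", "iso3": "outbreak",
    "iso4": "background", "iso5": "background", "iso6": "background",
}
presence = {
    "geneA": {"iso1", "iso2", "iso3"},           # only in outbreak isolates
    "geneB": {"iso1", "iso2", "iso3", "iso4", "iso5", "iso6"},  # core gene
}
p_values = gene_association(presence, groups, case_label="outbreak")
```

With only six isolates even a perfectly group-specific gene cannot reach a small p-value, which is one reason to run this kind of test on the full sample set rather than a subset.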

Implementation of Tychus (nextflow pipeline)

  • Singularity images (one for the Tychus alignment module and one for the Tychus assembly module) were built on the BST server from the Dockerfiles in the Tychus repo. The pipeline was first tested on hex, so the relevant bind points were added to the Dockerfiles. After cloning the repo (git clone), each Docker image was built from the folder on BST containing the Dockerfile with docker build -t Tychus_alignment . and the Singularity image was then built from the Docker image using docker2singularity.

Data location

Testing data raw reads

  • E. coli test reads: Med Micro server (smb://athena.medmicro.uct.ac.za) in the File Station under /MedMicro/Clinton/E. coli/original_data

Raw data on Ilifu

  • The raw data for Pseudomonas and E. coli have been copied over to Ilifu by Suresh with:
rsync -rv --progress -e 'ssh -vvv' suresh@137.158.204.181:/home/suresh/katie/E.coli /ceph/cbio/tmp/katie/
  • Moved to:
/ceph/cbio/users/katie/Nicol/E.coli
/ceph/cbio/users/katie/Nicol/Ps_aerug

Pseudomonas reference databases were sourced and downloaded to Ilifu (see DBs and README):

/ceph/cbio/users/katie/Nicol/Tychus_DBs

The database sequences were extracted from the BLAST database to a FASTA file with:
/opt/exp_soft/ncbi-blast-2.2.28+/bin/blastdbcmd -entry all -db /home/kviljoen/Tychus_DBs/KL_plsdb2019/2019_03_05.fna -out /home/kviljoen/Tychus_DBs/KL_plsdb2019/2019_03_05_KL.fasta

Troubleshooting - common errors during pipeline setup/customization

  • Segmentation fault when running csa (coverage sampler): convert the input DBs to single-line FASTA files (they are probably multi-line if you are seeing this error). For the conversion, use awk '{if(NR==1) {print $0} else {if($0 ~ /^>/) {print "\n"$0} else {printf "%s", $0}}} END {printf "\n"}' interleaved.fasta > singleline.fasta
  • Missing output file(s) Trees/*.tre expected by process BuildPhylogenies (ConfigurationFiles): the reference DB name is probably not being extracted correctly from the base (directory). Check whether the reference filename contains '.' characters and change them to '_'. For example, GCA_000006765.1_ASM676v1_genomic.fna will not work and should be renamed to GCA_000006765_1_ASM676v1_genomic.fna
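
Both fixes above can also be scripted. The following is a rough Python sketch, not part of Tychus: linearize_fasta joins each record's sequence onto one line (the same idea as the awk one-liner), and sanitize_reference_name replaces the dots in a reference filename while keeping the final extension. Both function names are hypothetical, and the sketch assumes well-formed FASTA input in which every sequence follows a header line.

```python
from pathlib import Path


def linearize_fasta(lines):
    """Yield header and sequence lines with each sequence joined onto one line."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Flush the previous record before starting a new one.
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)


def sanitize_reference_name(filename):
    """Replace '.' with '_' in the file name, keeping the final extension."""
    path = Path(filename)
    return path.stem.replace(".", "_") + path.suffix


# Small in-memory example (hypothetical records).
multiline = [">seq1\n", "ACGT\n", "TTAA\n", ">seq2\n", "GG\n"]
single_line = list(linearize_fasta(multiline))
new_name = sanitize_reference_name("GCA_000006765.1_ASM676v1_genomic.fna")
```

For real files, the generator can be fed an open file handle and its output written line by line, so large reference databases do not need to fit in memory.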

Processed data

Raw reads FastQC/multiQC results on Ilifu:

/ceph/cbio/users/katie/Nicol/E_coli_raw_fastqc
/ceph/cbio/users/katie/Nicol/Ps_aerug_raw_fastqc

Raw reads FastQC/multiQC results on medmicro:

http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/E. coli/Katie_results/E_coli_raw_fastqc
http://athena.medmicro.uct.ac.za:5000/MedMicro/Clinton/E. coli/Ps_aerug/Ps_aerug_raw_fastqc

Final results for publishing

Updated by Katie Lennard about 6 years ago · 7 revisions