Differential absence/presence srst2 VF/AMRs

Added by Katie Lennard over 4 years ago

These results were updated at Stefan's request to include 5 samples previously typed as 'not found' (NF) because one of the 7 MLST markers differed from the ST303 profile. We decided that these are most likely ST303, as they do not match any other known sequence type. They were therefore included with the ST303 samples in the differential Fisher's exact testing. This resulted in 51 significant features after multiple testing correction. Results attached.
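Here is a minimal, self-contained Python sketch of the per-feature test. It runs a two-sided Fisher's exact test on each 2x2 presence/absence table (ST303 vs. other isolates) and then applies Benjamini-Hochberg adjustment. It is only an illustration. The analysis itself used metagenomeSeq's fitPA() in R, and BH (FDR) correction is an assumption about the adjustment method. The feature names and counts below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. ST303 vs. other), columns are feature present/absent.
    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so floating-point ties are counted as "as extreme"
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg (FDR) adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (present in ST303, absent in ST303, present in others, absent in others)
features = {
    "geneA": (20, 0, 2, 18),
    "geneB": (10, 10, 9, 11),
}
names = list(features)
raw = [fisher_exact_two_sided(*features[f]) for f in names]
adj = benjamini_hochberg(raw)
significant = [f for f, q in zip(names, adj) if q < 0.05]
```

Counting a feature as significant after correction at q < 0.05 is also an assumption. The threshold used for the 51 reported features is not stated here.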

Differential absence/presence Ps assembled gene set

Added by Katie Lennard over 5 years ago

Annotations from the Ps_assembly_annotated_contigs folder (one .tsv file per sample) were imported into R. Hypothetical proteins and proteins with no annotation were removed. Samples from during the Ps outbreak were compared with samples from before/after the outbreak in terms of gene content: 213 significant hits after multiple testing correction.
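The import and filtering step can be sketched in Python as follows. The actual import was done in R. The sketch assumes Prokka-style .tsv files with a header row that includes a 'product' column. It drops 'hypothetical protein' and blank products before building a product-by-sample presence/absence matrix. The sample names and rows in the example are hypothetical.

```python
import csv
import io


def load_products(handle):
    """Return the set of annotated products from one Prokka-style .tsv handle.

    Assumes a tab-separated file with a header row containing a 'product'
    column; 'hypothetical protein' and blank annotations are skipped.
    """
    products = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        product = (row.get("product") or "").strip()
        if not product or product.lower() == "hypothetical protein":
            continue
        products.add(product)
    return products


def presence_absence(sample_handles):
    """Build {product: {sample: 0 or 1}} from {sample: open .tsv handle}."""
    per_sample = {s: load_products(h) for s, h in sample_handles.items()}
    all_products = sorted(set().union(*per_sample.values()))
    return {p: {s: int(p in prods) for s, prods in per_sample.items()}
            for p in all_products}


# Hypothetical two-sample example (real use: open each sample's .tsv file)
tsv_a = "locus_tag\tftype\tproduct\nA_0001\tCDS\texotoxin A\nA_0002\tCDS\thypothetical protein\n"
tsv_b = "locus_tag\tftype\tproduct\nB_0001\tCDS\tUDP-glucose 4-epimerase\nB_0002\tCDS\t\n"
matrix = presence_absence({"A": io.StringIO(tsv_a), "B": io.StringIO(tsv_b)})
```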

Results finalized and plots for publication in R

Added by Katie Lennard over 5 years ago

VF results used:
from Ilifu /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/VFDB_cov_80_diverge_10/srst2/Ps_VF_cov_80_ID_90_report_compiledResults.txt
AMR results used:
from Ilifu /cbio/users/katie/Nicol/Ps_aerug_srst2_MLST/AMR_cov_80_diverge_10/srst2/Ps_AMR_cov_80_ID_90_report_compiledResults.txt
MLST results used:
Manually curated MLST results as previously described
SNP-based phylogenetic tree used:
The 'core SNPs' phylogenetic tree generated in the Tychus alignment module was used to generate publication-quality plots in R. This core SNPs tree was generated with kSNP3 and is based on SNPs that are present in all isolates considered (from Ilifu /cbio/users/katie/Nicol/SNPsAndPhylogenies_Ps_ref_genome/Trees/tree.core.tre).
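The "core SNPs" definition can be illustrated with a short Python sketch. A SNP locus is kept only if every isolate has a call at that locus and the calls are not all identical. This is not kSNP3's k-mer-based algorithm. It only shows the core filtering idea on a hypothetical {isolate: {locus: base}} table.

```python
def core_snp_loci(calls):
    """Return sorted loci that are called in every isolate and are variable.

    calls: {isolate: {locus_id: base}}; a locus missing from an isolate's
    dict is treated as absent in that isolate (so it is not a core locus).
    """
    isolates = list(calls)
    shared = set.intersection(*(set(calls[i]) for i in isolates))
    return sorted(locus for locus in shared
                  if len({calls[i][locus] for i in isolates}) > 1)


# Hypothetical calls: s3 is missing in iso3, s2 is invariant
calls = {
    "iso1": {"s1": "A", "s2": "C", "s3": "G"},
    "iso2": {"s1": "T", "s2": "C", "s3": "G"},
    "iso3": {"s1": "A", "s2": "C"},
}
core = core_snp_loci(calls)
```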

R plots (phylogenetic SNP tree with a heatmap of VFs and AMRs) were generated using the R packages ggtree, ggimage and GeneMates (heatMapPAM() function). Multiple Fisher's exact tests with multiple testing correction (MTC) were conducted with metagenomeSeq's fitPA() function.

The attached plot was up to date on 3/9/19

Manually curate uncertain MLST results

Added by Katie Lennard over 5 years ago

MLST results generated with srst2 that were classified as 'uncertain' (designated '?') were manually checked. Most uncertain hits were classified as such because they had 1 or 2 low-coverage bases in the first or last 2 bp of the allele. I did multiple sequence alignments of all alleles for each of the 7 markers (acsA, aroE, guaA, mutL, nuoD, ppsA, trpE). This established whether the first 2 bp and last 2 bp were in fact necessary to distinguish each allele from all the other alleles. In most cases these bases were not discriminatory, and the 'uncertain' call could be accepted. Alignment was done with MAFFT and viewed in Jalview (example attached for acsA). In cases of SNPs (designated '*'), srst2 short-read alignment results were compared with the P. aeruginosa assembled contigs (from the Tychus assembly module).

Results of manual curation:

nuoD: no difference in the first or last two bases across all alleles in the MLST file used.

acsA: only type 130 (acsA_130) differs from all other alleles in the last base. Types 16 and 11 differ from type 130 at several other positions, so 16 can be confidently distinguished from 130 without the last base.

ppsA: several alleles differ in the first two and last two bases of the sequence, but a manual check with MAFFT/Jalview showed that ppsA_4, ppsA_33 and ppsA_6 can be distinguished from all other sequences independently of the first 2 and last 2 bases.

aroE: a handful of types have a one-bp change in the 2nd bp of the sequence, but a manual check in Jalview shows that all can still be distinguished without using the first two bases.

guaA: a handful of types have a one-bp change in the 2nd bp of the sequence, but a manual check in Jalview shows that all can still be distinguished without using the first two bases.

mutL: types 11 and 29 cannot be distinguished from type 216 if the first two bases are ignored.

trpE: a manual check in Jalview shows that all types can still be distinguished without using the first two bases.
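The end-trimming check above can be reproduced with a small Python sketch. It groups alleles whose sequences become identical once the first and last 2 bases are ignored. Any group with more than one member flags a case like the mutL result. The sketch assumes the alleles have already been aligned to equal length, for example exported from the MAFFT alignment. The toy sequences are hypothetical and are not real mutL alleles.

```python
from collections import defaultdict


def indistinguishable_after_trim(alleles, trim5=2, trim3=2):
    """Group alleles whose sequences become identical once the ends are ignored.

    alleles: {allele_name: aligned_sequence}; sequences are assumed to be
    aligned to the same length (gaps as '-').
    Returns sorted groups of allele names that collapse into one sequence.
    """
    groups = defaultdict(list)
    for name, seq in alleles.items():
        core = seq[trim5:len(seq) - trim3]  # drop first trim5 and last trim3 bases
        groups[core].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy alleles: the first three differ only within the first 2 bases
toy = {
    "mutL_11": "ACGTTGCA",
    "mutL_29": "TCGTTGCA",
    "mutL_216": "AGGTTGCA",
    "mutL_5": "ACGATGCA",
}
collisions = indistinguishable_after_trim(toy)
```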

Note: by default, srst2 flags a call as uncertain if the average depth across the entire allele is below --min_depth (default 5). We will lower this to 4.
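The effect of lowering the threshold can be shown with a minimal Python sketch of the mean-depth rule as described in the note. It is a simplified model, not srst2's code, and it ignores srst2's other flags. The per-base depths are hypothetical.

```python
def flag_allele_call(depths, min_depth=5):
    """Return '?' (uncertain) if the mean per-base depth is below min_depth, else ''.

    depths: per-base read depths across the whole allele (hypothetical input).
    """
    mean_depth = sum(depths) / len(depths)
    return "?" if mean_depth < min_depth else ""


depths = [4, 5, 4, 5]  # mean depth 4.5
flag_default = flag_allele_call(depths)               # flagged at the default of 5
flag_lowered = flag_allele_call(depths, min_depth=4)  # passes once lowered to 4
```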

Check coverage cutoffs for SNP analysis

Added by Katie Lennard almost 6 years ago

SNP analysis is done with freebayes (on quality-trimmed and filtered reads) using default settings (-C/--min-alternate-count 2, --min-coverage 0). Consensus FASTAs are then built against the P. aeruginosa reference genome. These consensus FASTAs are fed to kSNP3, together with the reference genome, for SNP calling and phylogenetic tree building. I tested different coverage cutoffs, including -C 10, on a small subset of samples (/ceph/cbio/users/katie/Nicol/Ps_small_test on Ilifu), and the structure of the resulting kSNP3 phylogenetic tree did not change. Freebayes uses information across all samples (a Bayesian approach) to call variants, which makes it robust to low-coverage bases.
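The idea of building a consensus FASTA against the reference can be sketched in Python as substituting called SNP alleles into the reference sequence. We do not know which tool the pipeline uses internally for this step, and the sketch handles single-base substitutions only (no indels). The positions and bases are hypothetical.

```python
def apply_snps(reference, snps):
    """Build a consensus sequence by substituting SNP alleles into a reference.

    reference: reference sequence string.
    snps: iterable of (pos, ref_base, alt_base) with 1-based positions, as in VCF.
    Only single-base substitutions are supported; anything else raises ValueError.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"not a SNP at position {pos}: {ref}>{alt}")
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


# Hypothetical SNP calls against a toy reference
consensus = apply_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "A")])
```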

Updated results available for kSNP3 phylogenetic tree

Added by Katie Lennard almost 6 years ago

The alignment.nf module was edited so that additional reference genomes can be added to the pipeline with the new --user_genome_paths flag and used to generate kSNP3 phylogenetic trees. Users can now optionally also add draft contigs (as opposed to running kSNP3 on freebayes-generated consensus sequences).
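A minimal Python sketch of how consensus FASTAs and extra genomes could be combined into a kSNP3 input list. To our understanding, the kSNP3 user guide describes that list as one "<path><TAB><genome name>" line per genome. How alignment.nf actually handles --user_genome_paths internally is not shown here. The function name and file paths are hypothetical.

```python
import os
from pathlib import Path


def ksnp3_infile_lines(consensus_fastas, user_genomes):
    """Return kSNP3 input-file lines, one '<absolute path>\t<genome name>' per genome.

    consensus_fastas: FASTA paths for freebayes-based consensus sequences.
    user_genomes: extra reference genomes or draft contigs (hypothetical list,
    standing in for whatever is passed via --user_genome_paths).
    Genome names are taken from the file stem and must be unique.
    """
    lines, seen = [], set()
    for path in [*consensus_fastas, *user_genomes]:
        name = Path(path).stem
        if name in seen:
            raise ValueError(f"duplicate genome name: {name}")
        seen.add(name)
        lines.append(f"{os.path.abspath(path)}\t{name}")
    return lines


lines = ksnp3_infile_lines(["/data/cons/S1.fasta"], ["/refs/PA14.fna"])
```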

srst2 AMR and VFDB results transferred

Added by Katie Lennard almost 6 years ago

NB: there were some differences in results between Tychus and srst2, which seem to be due to different bowtie2 parameter settings resulting in different alignments. Tychus uses default parameters, while srst2 has been optimized for sensitive local alignments with the --very-sensitive-local and -a flags. I therefore think we should use srst2, which has been carefully optimized for MLST and gene detection. Furthermore, there appear to be several redundant/duplicate entries in the ResFinder DB, whereas the ARGannot DB supplied with srst2 has been curated. Attached is an example of differences in alignment using the same DB and reads (but different bowtie2 settings in the two pipelines).
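One way to check the redundancy observation is a short Python sketch that reports FASTA entries sharing an identical sequence under different headers. It catches exact duplicates only (case-insensitive). Near-identical entries would need clustering, which this does not do. The headers and sequences in the example are hypothetical.

```python
import io
from collections import defaultdict


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def duplicate_sequences(handle):
    """Return groups of headers whose sequences are identical."""
    by_seq = defaultdict(list)
    for header, seq in read_fasta(handle):
        by_seq[seq].append(header)
    return [headers for headers in by_seq.values() if len(headers) > 1]


# Hypothetical DB fragment: blaA_1 and blaA_2 are the same sequence
db = io.StringIO(">blaA_1\nATGAAA\nCCC\n>blaA_2\natgaaaccc\n>tetB\nATGTTT\n")
dupes = duplicate_sequences(db)
```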

VF and AMR detection were run against VFDB and ARGannot with default settings, as described in the srst2 GitHub repository: https://github.com/katholt/srst2#all-usage-options

  • VF results and relevant DB file and README copied to medmicro/Clinton/Ps_aerug/Katie_results/Ps_aerug_srst2_VFDB/
  • MLST and AMR results + relevant DB files and README copied to medmicro/Clinton/Ps_aerug/Katie_results/Ps_aerug_srst2_MLST_argAnnot/

Feedback on pipeline results from Nicol group

Added by Katie Lennard almost 6 years ago

We have been analysing the data you sent us and it's looking really good. We are trying to do some additional analyses and hope you can assist.

  • We would like to extract the in silico MLST profiles from these genomes.
  • Reconstruct the phylogenetic tree to include certain outgroups (Burkholderia cepacia, Pseudomonas fluorescens, Pseudomonas putida). This will allow us to root the tree and get a better context for evolution.
  • Are you able to assist with constructing a phylogenetic heatmap (see image below), or even a 2-dimensional one? This would include the phylogenetic data on one side and additional data, such as the presence of certain genes, on the other.
  • For the plasmid resistome results, we have found a hit that is present in all the outbreak isolates and only a few of the non-outbreak isolates. The gene fractions for these results only go up to approximately 60%. Does this mean that only 60% of the reference plasmid is covered? If so, is the rest of the plasmid unique, or perhaps absent? We would like to compare this plasmid across all the isolates to see how similar they are to the reference (CP002153.1) as well as to each other. Can you assist with plasmid assembly and constructing a plasmid map (see below)?
  • For the virulence factors, we have identified 3 factors (NP_253217, NP_251844, NP_251850) present in all the outbreak isolates and only a few of the non-outbreak isolates. Could you extract these sequences from the relevant contigs, BLAST them, and do a multiple alignment to compare each one? These factors confer different levels of virulence depending on the mutations present.
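To make the "gene fraction" question concrete, here is a minimal Python sketch of breadth of coverage: the fraction of reference positions covered by at least one alignment. That the reported gene fraction is computed this way is an assumption. Uncovered regions could be absent from the isolate or too divergent to align, and this calculation alone cannot tell the two apart. The intervals and plasmid length are hypothetical.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of reference positions covered by at least one alignment.

    intervals: (start, end) pairs, 0-based and half-open, e.g. read or contig
    alignments against the reference plasmid; overlaps are counted once.
    """
    covered, last_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, last_end, 0)   # skip bases already counted
        end = min(end, ref_length)        # clip to the reference length
        if end > start:
            covered += end - start
        last_end = max(last_end, end)
    return covered / ref_length


# Hypothetical alignments against a 100 bp reference
frac = covered_fraction([(0, 10), (5, 20), (30, 40)], 100)
```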

Tychus pipeline ran successfully for E. coli and Pseudomonas

Added by Katie Lennard about 6 years ago

The results can be found on Ilifu (/ceph/cbio/users/katie/Nicol/) and have been transferred to medmicro's Athena server. In addition, FastQC and MultiQC were run again on the reads after adapter removal and quality trimming/filtering, and the results were transferred to Athena:

  • /Volumes/medmicro/Clinton/E.\ coli/Katie_results/
  • /Volumes/medmicro/Clinton/Ps_aerug/Katie_results/