Manually curate uncertain MLST results
MLST results designated by srst2 as 'uncertain' were manually curated
MLST results generated with srst2 that were classified as 'uncertain' (designated '?') were checked manually. Most uncertain hits were flagged because they had 1 or 2 low-coverage bases within the first or last 2 bp of the allele sequence. By running multiple sequence alignments of all alleles for each of the 7 markers (acsA, aroE, guaA, mutL, nuoD, ppsA, trpE), I could establish whether the first 2 bp and last 2 bp were in fact needed to distinguish the called allele from all other alleles. In most cases these bases were not discriminatory and the 'uncertain' assignment could be accepted. Alignments were done with MAFFT and viewed in Jalview. An example from acsA is attached. For calls with SNPs (designated '*'), the srst2 short-read alignment results were compared to the assembled P. aeruginosa contigs (from the Tychus assembly module).
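The edge-base check described above can also be sketched programmatically. The following is a minimal illustration, not the procedure actually used (which was a visual check in Jalview). It assumes the aligned alleles are all the same length with no terminal gaps, and the function name and toy sequences are made up for the example.

```python
from collections import defaultdict

def ambiguous_without_edges(aligned, trim=2):
    """Return groups of alleles that become identical once the first and
    last `trim` columns of the alignment are ignored.

    `aligned` maps allele name -> aligned sequence (all the same length).
    An empty result means the edge bases are not needed to tell alleles apart.
    """
    groups = defaultdict(list)
    for name, seq in aligned.items():
        core = seq[trim:len(seq) - trim]  # drop `trim` bases from each end
        groups[core].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Toy sequences, invented for illustration (not real MLST alleles)
toy = {
    "geneX_1": "ACGGTTCAGT",
    "geneX_2": "ATGGTTCAGT",  # differs from geneX_1 only at the 2nd base
    "geneX_3": "ACGGTACAGT",  # differs at an internal position
}

print(ambiguous_without_edges(toy))
```

Any group returned here would correspond to a case where the 'uncertain' call cannot be accepted on the basis of the internal bases alone.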
Results of manual curation:
nuoD: no difference in first or last two bases across all alleles in MLST file used
acsA: only type 130 (acsA_130) differs in the last base from all other alleles. Types 16 and 11 differ from type 130 at several other positions, so that type 16 can be confidently distinguished from type 130 without the last base.
ppsA: several alleles differ in the first two and last two bases of the sequence, but a manual check with MAFFT/Jalview showed that ppsA_4, ppsA_33 and ppsA_6 can be distinguished from all other sequences independent of the first 2 and last 2 bases.
aroE: a handful of types have a single-base change at the 2nd bp of the sequence, but a manual check with Jalview shows all can still be distinguished without the first two bases.
guaA: a handful of types have a single-base change at the 2nd bp of the sequence, but a manual check with Jalview shows all can still be distinguished without the first two bases.
mutL: types 11 and 29 cannot be distinguished from type 216 if the first two bases are ignored.
trpE: manual check with jalview shows all can still be distinguished without use of first two bases.
Note: by default, srst2 flags a call as uncertain if the average depth across the entire allele is below --min_depth (default 5). We will lower this to 4.
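To illustrate the effect of lowering the cutoff, here is a small sketch of the depth rule as described in the note. It is not srst2's own code; the function name, sample names and depth values are hypothetical.

```python
def is_low_depth(mean_depth, min_depth=5.0):
    """True if the mean depth across the allele falls below the cutoff,
    i.e. the call would be flagged as uncertain under the rule above."""
    return mean_depth < min_depth

# Hypothetical mean depths per sample (invented for illustration)
example_depths = {"sample_A": 6.2, "sample_B": 4.3, "sample_C": 3.1}

for cutoff in (5, 4):
    flagged = sorted(s for s, d in example_depths.items() if is_low_depth(d, cutoff))
    print(f"min_depth={cutoff}: flagged {flagged}")
```

In this toy example, a call with a mean depth of 4.3 is flagged at the default cutoff of 5 but not at the lowered cutoff of 4.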