Project title Members Short description of project Objectives Created on Updated on Hours spent
*People involved:* Principal Investigator: Mpiko Ntsekhe, Robert Wilkinson Bioinformatics Support Team: Kuhle Mcinga, Ephie Geza, Mohammed Farahat Support Requested: Rob Rousseau *Short description of the project:* This clinical trial is a unique opportunity to comprehensively measure and compare Mycobacterium tuberculosis (Mtb)-responsive immune responses (including conventional and unconventional T cell responses as well as innate responses) in blood and at the site of the pericardial fluid (PCF). *Objectives:* > 1. Compare RNA-Seq data from blood taken at baseline with blood taken at the end of treatment, to see how anti-tuberculosis therapy (ATT) affects transcriptional signatures in blood. > 2. Compare the RNA-Seq data from PCF to blood. > 3. Compare PCF collected at baseline to PCF collected at day 1-3. This will give us insight into how ATT affects transcriptional signatures in patients with pericardial TB. 2024/07/11 2024/08/06 0.0
Assessing the efficacy and safety of new antibiotic regimens against serious bacterial infections caused by carbapenem-resistant Enterobacterales and/or Pseudomonas aeruginosa (CREP) Support requested by: Fadheela Patel Principal investigator: Prof Adrian Brink Bioinformatics support team: Ephifania Geza The overall goals of this observational study are to collect prospective, longitudinal cohort and feasibility data to inform study design and comparator selection for future interventional clinical trial(s). The future trials will assess the efficacy and safety of new antibiotic regimens against serious bacterial infections caused by carbapenem-resistant Enterobacterales and/or Pseudomonas aeruginosa (CREP). 1. To archive the sequence data (raw sequencing data in fastq format) in GenBank 1. To predict the species of each isolate 1. To determine the MLST 1. To identify resistance-conferring genes 1. To identify the surface polysaccharide synthesis loci from bacterial whole genome sequences 1. To generate a ML phylogenetic tree 1. To generate a phylogenetic tree associated with the presence of AMR genes. 2024/02/05 2024/03/12 0.0
Genetic factors associated with resistance to Mycobacterium tuberculosis (Mtb) infection in healthcare workers Principal investigators: Muki Shey Co-investigators: Prof. Graeme Meintjes, Dr Charlotte Schutz Bioinformatics support team: Gerrit Botha and Ephifania Geza To determine the genetic factors that are associated with resistance to Mycobacterium tuberculosis (Mtb) infection in healthcare workers with sustained occupational exposure using whole exome sequencing and SNP genotyping data. 1. To determine the genetic factors that are associated with resistance to Mycobacterium tuberculosis 2023/09/05 2024/01/25 0.0
Effects of HIV and ageing on adipose tissue biology Principal investigator : Dr. Amy Mendham Bioinformatics support team : Katie Lennard and Ephifania Geza We propose three time points across a woman’s life course where alterations in fat distribution are evident and may be further exacerbated in people living with HIV. These alterations relate to the influence of hormonal regulation on adipose tissue redistribution from the gynoid to android region, which can have profound effects on subcutaneous adipose tissue biology and subsequent metabolic health. The data will include samples collected at three time points before, during and after menopause (500 human samples). The bioinformatics analysis involves RNA preprocessing, analysis and downstream analysis. To investigate the effects of HIV and ageing on adipose biology, and how they relate to hormonal regulation, fat distribution and subsequent health in African women compared to women without HIV. 2022/02/10 2024/01/24 0.0
Investigating the prevalence of Helicobacter pylori in South Africa Support requested by: Dr Francis Innocent Ekparolaguaziba Principal investigator: Prof Mashiko Setshedi Coinvestigators: Prof Adrian Brian Bioinformatics support team: Ephifania Geza and Katie Lennard To analyze and characterize H. pylori using WGS data and strains obtained from public databases. To understand the complete set of genes linked to AMR and virulence of H. pylori: a case of Cape Town, South Africa. 2023/12/11 2023/12/11 0.0
Whole genome sequencing of blowflies (Diptera: Calliphoridae) UCT Principal investigator: Sabelo Hadebe Co-investigators: Nontobeko Mthembu Support requested by: Nontobeko Mthembu Bioinformatics support team: Ephifania Geza and Gerrit Botha The research aims to determine the effects of the early development of severe asthma using 31 mouse models that belong to one of the eight categories. 1. To perform RNA seq analysis for the 31 mouse samples 2. To identify genes that are differentially expressed given the early development of severe asthma 3. To perform functional analysis for the lists of genes 2023/09/20 2023/09/20 17.0
Whole genome sequencing of blowflies (Diptera: Calliphoridae) UCT Principal investigator: A/Prof Laura Heathfield Co-investigators: Dr Marise Heyns, Kyle Kulenkampff Support requested by: Kyle Kulenkampff Bioinformatics support team: Ephifania Geza and Gerrit Botha The research aims to generate whole genome sequences of forensically relevant blowflies (Diptera: Calliphoridae) to assist in molecular identification of said species. 1. To generate *de novo* genome assemblies of the relevant blowflies 2. To perform genome alignment for three of the species 2023/06/27 2023/06/28 2.5
Lactobacillus iners MAGs: Genital inflammation in adolescent women Support requested by: Adijat Jimoh Principal investigator: Dr. Anna-Ursula Happel, Prof. Heather Jaspan Bioinformatics support team: Ephie Geza, Gerrit Botha This study emanated from the following questions 1. What is the evolutionary relationship between Lactobacillus iners (L. iners) Metagenome-Assembled Genomes (MAGs) found in the vagina of women with genital inflammation and those without? 1. Are there functional pathways that are associated with L. iners strains in adolescent women with low vs high genital inflammation (or do certain pathways drive transitions from low to high genital inflammation or vice versa)? 1. Which MAGs are shared within participants across time points? Do the same L. iners strains/MAGs remain present when genital inflammatory or BV status changes, or do other strains / MAGs appear? The data include metagenomes from 38 samples, that is, 13 participants (12 with three time points each and one with two time points). 1. To discover the L. iners strains that are associated with genital inflammation (and bacterial vaginosis) 1. To determine functional gene pathways, similarities/differences associated with high and low genital inflammation in adolescent women 1. To identify shared genetic relationship/abundance of Lactobacillus iners strains across different longitudinal time points 2023/06/22 2023/06/22 18.0
An atlas of blood regulators of liver fibrosis during schistosomiasis Principal investigator : Dr Justin Nono Student: Severin Kamdem Bioinformatics support team: Katie Lennard Metagenomics data was acquired for the same sample set as we have RNAseq and metabolomics data available. This is a subproject of Project 'RNAseq blood profiling of liver fibrosis during schistosomiasis' Preprocessing and downstream analysis of Illumina .fastq reads using the nf-core/RNAseq pipeline https://github.com/kviljoen/RNAseq 2022/03/17 2023/05/11 0.0
Testing Redmine project features - sub-project Principal investigator : Prof Nicola Mulder Bioinformatics support team: Gerrit Botha This is just a project where I will be testing all functionalities of a Redmine project, e.g. Gantt charts, file upload, ticketing emails, permissions. To continue testing features in Redmine. 2023/05/02 2023/05/11 0.0
Variant calling: detecting SNPs in WGS data relative to H37Rv Principal investigator: A/Prof Anna Coussens Coinvestigators: Prof Robert Wilkinson and Dr Anastasia Koch Support request by: Mthawelanga Ndengane Bioinformatics support team: Ephifania Geza and Gerrit Botha Data analysis involves mapping and alignment, SNP-calling, and detection of SNPs (relative to H37Rv) in WGS for 6 strains. Mapping and alignment, SNP-calling, and detection of SNPs (relative to H37Rv) in WGS for 6 strains 2022/03/28 2023/05/11 26.5
Analysing the sudden unexpected death of an infant through molecular autopsy and co-segregation analysis Principal investigator: Dr Laura Heathfield Coinvestigators: Dr Shameemah Abrahams Support request by: Sune Mostert Bioinformatics support team: Ephifania Geza and Gerrit Botha To use variant calling on next generation sequencing data to investigate (or better explain) the sudden unexpected death of an infant through molecular autopsy and co-segregation analysis. Investigate the sudden unexpected death of an infant through molecular autopsy and co-segregation analysis. 2022/05/30 2023/05/11 0.0
The impact of aSTI on the foreskin: spatial transcriptome and ex vivo HIV-1 infection Principal investigator: Dr Nyaradzo Chigorimbo-Tsikiwa Coinvestigators: Support request by: Dr. Nyaradzo Chigorimbo-Tsikiwa Bioinformatics support team: Ephifania Geza and Gerrit Botha We assess the impact of aSTI on HIV-1 susceptibility in the male genital tract using gene expression at a fine scale. To assess the impact of aSTI on HIV-1 susceptibility. 2023/05/04 2023/05/11 69.45
Variant calling using whole exome sequencing (WES) data. Principal Investigator: Prof. Gasnat Shaboodien Co-investigator: Polycarp Ndibangwi Bioinformatics support team: Gerrit Botha, Thandeka Mavundla, Ephifania Geza The WES data used for this analysis were sequenced using the TWIST Human Core Exome with RefSeq spike kit. To generate VCFs for the WES data 2022/08/04 2023/05/11 40.0
Clinical and molecular epidemiology of carbapenemase-producing Enterobacterales in hospitalized patients in the Cape Town Metropole, South Africa Principal investigator : Dr Clinton Moodley & Dr. Shantelle Claassen-Weitz Bioinformatics support team: Katie Lennard Integrating clinical and molecular epidemiology and determining transmission patterns of CREs among in-patients in both private and public sectors in the Cape Town Metropole, South Africa. This is to enable a better understanding of the risk factors, clinical outcomes, and transmission patterns of CREs among in-patients in the Cape Town Metropole. The objective is to inform evidence-based regional patient management, antimicrobial stewardship (AMS) and infection prevention and control (IPC) strategies and interventions. All phenotypically confirmed carbapenem non-susceptible bacterial isolates collected during the study period were submitted for confirmatory PCR, to identify the presence of a carbapenemase (CPE) or otherwise non-carbapenemase-producing carbapenem-resistant Enterobacterales (NCPE). Phenotypically confirmed carbapenem non-susceptible bacterial isolates submitted for whole genome sequencing (WGS) were stored at -80 °C until cultivated overnight at 35 °C. Two pooled libraries, representing 11 and 12 isolates respectively, were sequenced using the MiSeq Reagent Kit v2 (300 cycles). Define AMR profiles, VFs, MLSTs, plasmids, mobile genetic elements (MGE), capsular typing (using Kaptive) and Klebsiella profiling (using Kleborate) *Department:* Medical Microbiology 2022/09/19 2022/09/19 0.0
Metagenomic sequencing of CSF samples. Principal investigators: Nicola Mulder Co-investigators: Adrian Brink and Diana Hardie Support request made by: Gert Marais Bioinformatics support team: Ephifania Geza and Gerrit Botha Given 47 patients (each sequencing run includes 10 patients and 2 controls on an Illumina MiSeq platform running 600 cycle v3 kits generating ±(10-15) Gb of short read data per run), this project is aimed at developing a bioinformatics pipeline that facilitates the prediction of the patient's disease phenotype based on polymorphisms in loci relevant to infectious responses. 1. To perform nucleotide metagenomics 1. Protein similarity and 1. Transcriptome analysis 2021/12/06 2021/12/10 0.0
Vaginal 16S full length reconstruction in women who give birth to foetal alcohol spectrum disorder babies Principal investigator : Prof Sian Hemmings (smjh@sun.ac.za) Support request made by: Ms Lauren Martin (MSc student, lcmartin@sun.ac.za) Bioinformatics support team: Katie Lennard Does our optimised Illumina iSeq100 full-length 16S rRNA gene sequencing technique and gene reassembly, when evaluated alongside PacBio 16S sequencing and V1-V2 hypervariable sequencing data, improve the species-level taxonomic resolution of the bacterial species present in the maternal vaginal swab samples obtained from women who have given birth to infants with and without Foetal Alcohol Spectrum Disorders (FASD)? Assist and teach development of a pipeline that makes use of EMIRGE and dada2 to assemble the tagmented amplicon and subsequently assess the bacterial composition and perform statistical analysis of the samples. Advise on appropriate statistics and scripts to evaluate whether Illumina iSeq100 sequencing technique and PacBio 16S sequencing improves bacterial taxonomic resolution compared with general V1-V2 16S hypervariable region amplification 2021/11/09 2021/11/09 0.0
Serratia and Group B Streptococcus characterisation Principal investigator : Dr Clinton Moodley Students: Kimona Rampersadh (PhD) Dr. Amanda Overmeyer (MMed) Bioinformatics support team: Katie Lennard The data analysis will involve WGS de novo and remapping to reference sequences. The genomes generated will need to be annotated, resistance and virulence elements identified, and phylogenetic trees constructed To determine Serratia and GBS resistance genes and virulence factors 2021/11/09 2021/11/09 29.0
S. pneumoniae genomics Principal investigator : Dr. Felix Sizwe Dube Bioinformatics support team: Jon Ambler - Primary bioinformatician Gerrit Botha - Assistance with data transfer / management Katie Lennard - Assistance with running pipelines Partners: Prof. Mark Nicol, UWA, Australia Prof. Stephen Bentley, Wellcome Trust Sanger Institute Prof. Angela Brueggemann, Oxford, UK Prof. Robert Heyderman, UCL, London, UK Prof. Nicola Mulder, UCT, Cape Town, South Africa Research group: Molecular and Cell Biology The overall goal of this study is to describe genome-wide lineages and locus-specific variation of S. pneumoniae colonising the nasopharynx of children enrolled in an intensively sampled, PCV-13 vaccinated birth cohort of 1000 infants with a high incidence of LRTI, by linking whole pneumococcal genome sequences to detailed phenotype metadata. The overall goal of this study is to describe genome-wide lineages and locus-specific variation of S. pneumoniae colonising the nasopharynx of children enrolled in an intensively sampled, PCV-13 vaccinated birth cohort of 1000 infants with a high incidence of LRTI, by linking whole pneumococcal genome sequences to detailed phenotype metadata. Hypothesis 1 The ‘unmeasured’ contribution of pneumococci to non-bacteraemic LRTI will manifest as genetic differences in pneumococcal strains between children who develop LRTI and those who do not. Research Question 1. Does acquisition of one serotype increase/decrease the likelihood of subsequent acquisition of the same or different serotype? (MSM, transition rates between serotypes). 2. Are there pneumococcal lineages that are associated with LRTI disease progression Hypothesis 2 There are evolutionary events (mutations) that occur during carriage that increase the risk of progression to LRTI. Research Question 1. What genetic changes occur during persistent colonisation that increases the risk of progression to LRTI? 
2. Are there any lineages that have a protective effect from disease (population level)? 2020/02/10 2021/09/22 107.1
S. pneumoniae resistance gene analysis Principal investigator: Dr Clinton Moodley Co-investigators: Kimona Rampersadh (PhD); Dr. Amanda Overmeyer (MMed) Bioinformatics support team: Katie Lennard The genetic characterization of pathogenic bacteria causing infection, and associated resistance and virulence factors which may contribute to these infections. Prokaryotic nucleic acid will be sequenced on the Illumina MiSeq platform using bacterial WGS from cultured isolates, to generate reads of bacterial genomes present. The data analysis will involve WGS de novo and remapping to reference sequences. The genomes generated will need to be annotated, resistance and virulence elements identified, and phylogenetic trees constructed. 2021/05/19 2021/05/19 0.0
Longitudinal analysis of antibody titres in a convalescent COVID-19 health care worker cohort. Principal investigator: Prof Jonathan Blackburn Co-investigators: Prof Burgers; Prof Ntusi Support request made by: Michelle Mullins Bioinformatics support team: Katie Lennard All the data has been generated and pre-processed. The data was generated on two different platforms (in-house microarrays and Sengenics microarrays); but the same antigens and controls are present on both platforms. This data needs to be merged, so downstream analysis can be done. We have data on ~ 45 samples at 4 time points, and an additional ~80 samples at one time point. Each sample has ~ 26 data points. Data was generated from 4 different time points and a pre-pandemic control group. On the one platform we have data for 3 time points (V1, V2, V3), and on the other platform we have data for three time points (V1, V3, V5) as well as pre-pandemic data. All the data has been generated and pre-processed. The data was generated on two different platforms (in-house microarrays and Sengenics microarrays); but the same antigens and controls are present on both platforms. This data needs to be merged, so downstream analysis can be done. 2021/05/18 2021/05/18 0.0
Effects of agriculture on microbial diversity and composition Post-doc : Jessica da Silva Bioinformatics support team: Gerrit Botha What are the effects of agriculture on microbial diversity and composition within soils in Limpopo, South Africa? This study aims to investigate the effects of agriculture on microbial diversity and composition by comparing microbiomes from different agricultural crops, as well as proximity to natural vegetation within Limpopo, South Africa. Specifically, does the soil microbial community differ at the edges of a crop compared to the centre? Is there more diversity within smaller fields which have a greater edge to centre ratio?  Moreover, is there variability in microbial composition amongst different crop types and with different agricultural practices (e.g., tilling, herbicide/pesticide use)? 2021/02/02 2021/02/02 0.0
Transcriptomic profiling of HIV exposure in infant Treg cells Investigator: Sonwabile Dzanibe Principal investigator : Prof Clive Gray Bioinformatics support team: Katie Lennard The goal of the work is to determine transcriptomic differences in sorted Treg cells between infants who are HIV-exposed uninfected and HIV-unexposed uninfected. RNAseq analysis (Illumina) 2020/05/25 2020/05/25 58.0
UCT-Neurology Research Group study of neuromuscular diseases in African populations Principal investigator : Prof Nicola Mulder, A/Prof Jeanine Heckmann CBIO post doc: Dr. Melissa Nel Bioinformatics support team: Gerrit Botha For this project we will be performing WGS on individuals with various NMDs including MG and ALS. 1. SA_ALS: WGS from South Africans with amyotrophic lateral sclerosis (ALS), currently 25 samples with funding secured for an additional 20 (will ship DNA soon) and application pending for funding to sequence a further 45. At a later date we will have the opportunity to apply for access to data from 75 samples which will be sequenced by collaborators. 2. SA_MG: WGS from South Africans with myasthenia gravis (MG), 25 samples and no plans to sequence more. 3. SAHGP: WGS from 24 healthy South Africans. We have applied for funding to sequence an additional 16 control genomes. We plan to apply to access other local WGS datasets when these become available to increase the sample size for controls. To get all the genomes aligned and jointly called through a b38 bwa-GATK pipeline. The b37 version of the pipeline has been set up at Wits but testing and setup are needed on Ilifu. Help would also be needed in terms of getting data transferred and moved. Help with downstream annotation and comparisons, e.g. VEP, SnpEff, can also be expected. 2019/05/28 2020/02/20 141.65
MetH deletion detection in MTB strains XXX A large gene deletion in the Rv2124c locus has been identified in the CDC1551 clinical strain of M. tuberculosis. We’re interested to find out how widespread this genotype, and other polymorphisms mapping to this region of the genome, are among clinical isolates of M. tb. XXX 2018/02/12 2020/02/17 4.0
M. smegmatis variant calling pipeline Contact: Mel Chegalroyen Principal investigator: Prof. Digby Warner Submitter: Lucas Raphela To work collaboratively with a bioinformatician to optimize a pipeline for Mycobacterium smegmatis and Mycobacterium tuberculosis SNP calling to be adopted for future projects at the MMRU. SNP calling Set up pipeline for reusability Interpretation of results 2020/01/28 2020/02/17 17.0
Transcriptomic analysis of CNS TB Principal investigator : Prof Muazzam Jacobs Co-investigator: Nai-Jen Hsu Collaborator: Natalie Nieuwenhuizen (Max Planck Institute for Infection Biology) Bioinformatics support team: Katie Lennard Tuberculosis of the central nervous system (CNS-TB) is the most severe form of tuberculosis and is often associated with high mortality. Moreover, Mycobacterium tuberculosis (M. tuberculosis) infection in the CNS causes the most devastating manifestation, leading to severe neurological complications and morbidity. An attenuated vaccine, Bacillus Calmette-Guerin (BCG), is widely used to prevent TB and has demonstrated various degrees of protection against CNS-TB and other neurodegenerative diseases. Little is understood about the cells that regulate infection, or the respective functions and contributions of different cell types to overall protection of the CNS. There is an urgent need to understand basic cellular and immune functionality. The CNS is an immune-privileged site which is highly regulated to control access to the CNS tissue. This is designed to maintain homeostasis and limit potential pathological damage. In response to M. tuberculosis infection, CNS resident cells produce pro-inflammatory cytokines that lead to neuroinflammation and subsequent structural destruction or neurophysiological dysfunction. Although microglia are the principal targets of M. tuberculosis infection in the brain, other CNS cell types such as astrocytes and neurons can internalize M. tuberculosis bacilli, and thereby elicit an immune response. This project aims to investigate the immune response of astrocytes, neurons and microglia during exposure to both virulent (H37Rv) and non-virulent (BCG) mycobacterial strains. Clarifying the nature of the immune responses of the resident CNS cells will be beneficial to improved vaccine development and therapeutic strategies. 
Perform differential gene expression analysis of microarray-based gene expression data, comparing CNS cells (astrocyte, neuron, microglia) infected with H37Rv or BCG. 2019/12/04 2020/02/17 51.5
Testing human WGS alignment, calling and joint calling on DRAGEN hardware Principal investigator : Prof Nicola Mulder Bioinformatics support team: Jon Ambler There has been a request from Nicky to test the performance of the DRAGEN hardware on the pipelines that we used to process whole genome sequencing data for the design of the H3Africa genotype array. The DRAGEN hardware cost R250K. We would consider purchasing the hardware to cater for such large projects in the future if * The performance of the hardware outperforms any of the resources we have locally or at another compute facility (e.g. CHPC, Wits) * The power consumption of the hardware is still in range to fit into the current threshold we have at the IDM data centre. * Getting huge amounts of human WGS data in and out of the hardware does not cause a bottleneck that makes the use of the hardware for these purposes unrealistic. The testing process would consist of the following. 1. We get [this pipeline](http://edicogenome.com/pipelines/gatk-best-practices-workflow-on-dragen) up and running and test using a GiaB sample (complete genome or just chr22). 2. We check how easy it is to add additional steps into the workflow in (1), so that it corresponds to the H3A chipdesign workflow. 3. We test [this pipeline](http://edicogenome.com/pipelines/dragen-joint-genotyping-pipeline/) . We assume the workflow in (1) is part of this pipeline so we just need to replace that part with the workflow in (2). Here we can use a few 1KG samples to test. Other things to consider, but we can figure them out as we go: 1) We need to understand the pipeline for getting data in and out of the hardware. E.g. can we attach external storage and do processing on there or do we need to move data to local storage so that processing is optimal. I'm quite sure we would need to work on a dataflow pipeline if we plan to process several WGS samples. 
2) I'm not sure if the hybrid cloud model will work for WGS pipelines except if the processing in the cloud is being done on smaller files. We also need to take into consideration the security model if working with human data and pushing the data to the cloud. 3) We also need to understand their pricing model. If we do not opt for the cloud we probably do not need to pay per giga base pair and can just purchase the hardware. CPGR has recently purchased DRAGEN hardware and they have agreed that we can do the testing on their equipment. 2018/03/13 2020/02/13 0.0
Functional anti-tubercular screening of *M. tuberculosis* and *M. smegmatis* by CRISPRi-seq Principal investigator : Prof Valerie Mizrahi Co-investigator: Mandy Mason Collaborator: Dr Luiz Carvalho (Francis Crick Institute) Collaborator: Dr Jeremy Rock (Rockefeller University) Bioinformatics support team: Katie Lennard 'In this project, I aim to identify bacillary factors that mitigate cidality of anti-tubercular drugs by revealing Mtb mutants displaying hypersusceptibility to routinely used and experimental antitubercular drugs with differing modes of action. We aim to utilise high-throughput CRISPRi-seq to perform a chemical-genetic screen to estimate mutant fitness in a comprehensive pooled library of Mtb mutants representing both essential and non-essential genes. This will expose mutants with decreased fitness in response to antitubercular drug treatment over time.' Advice regarding data handling and storage, as well as data processing and analysis: Mandy has been using web-based software (MAGeCK-VISPR) but would appreciate advice in this regard. Mandy currently has data sets for *M. smegmatis* and is expecting sequencing from experiments in *M. tuberculosis* in mid-February. She would require assistance to process these. 2020/01/21 2020/01/21 18.25
Prostate cancer WGS microbiome Principal investigator : Prof Luiz Zerbini Co-investigator: Mariet Wium Co-investigator: Stefano Cacciatore Bioinformatics support team: Katie Lennard WGS was performed on clinical prostate samples (tumour and normal). The primary goal was human DNA (RNAseq was also performed) but the interest now is to try to identify the microbiome. Specific caveats: the sampling procedure involves going through the ureter, so contamination is probable. Process WGS reads using the UCT-YAMP pipeline. Gerrit has already subtracted human reads (but there is also a step in the pipeline to do this). 2019/11/15 2020/01/14 99.0
An atlas of blood regulators of liver fibrosis during schistosomiasis Principal investigator : Dr Justin Nono Student: Severin Kamdem Bioinformatics support team: Katie Lennard Schistosomiasis is the most debilitating human helminthiasis. Whereas the acute phase of the disease can afflict individuals at first exposure, chronic schistosomiasis is a more frequent and severe form of the disease in endemic areas which affects individuals that have repeatedly been exposed to schistosomes. In such endemic areas, constant re-infection and the ensuing poorly symptomatic course of the disease at early stages undermine the benefits of the available control strategy, of which chemotherapy with praziquantel is the core component. Moreover, tissue fibroproliferative pathology consequent to chronic schistosomiasis, which is not fully reversed by praziquantel treatment, leaves most exposed individuals with a long-lasting-to-persistent impairment of the affected organ function. The development of alternative therapeutic measures that would integrate the control of fibroproliferative pathology upon infection with schistosomes is therefore desperately needed, as this would provide more efficient means to achieve the elimination of the disease burden. For this, a fine knowledge of the molecular bases of tissue pathology during schistosomiasis is crucial. In the present project funded by the European Union Clinical Trial Partnership, the UK Royal Society and the African Academy of Science, we question whether the onset / progression of tissue fibroproliferative pathology during schistosomiasis is individual-specific, i.e. whether intrinsic parameters of a given host would make him/her more/less likely than others to develop tissue fibrosis following schistosomiasis infection. 
As a working hypothesis, mostly given our preliminary observations, we postulate that the gene expression profile of an individual would determine his/her likelihood to develop fibroproliferative pathology during schistosomiasis. The overall objective of our study is therefore to unequivocally establish a gene expression profile that triggers/promotes susceptibility to liver fibrosis during hepatointestinal schistosomiasis and in so doing generate a comprehensive database of druggable fibrosis-regulating factors. If proven determinant, the ultimate hope will be to be able to target/use identified factors to control schistosomiasis and tissue fibrosis. Illumina RNA sequencing reads have been generated, i.e. up to 45 million reads per sample for a total of 40 samples (to be clustered in 4 groups of 10 for analyses). The aim is to examine gene expression in the blood of children with vs. without liver fibrosis during schistosomiasis. Preprocessing and downstream analysis of Illumina .fastq reads using the nf-core/RNAseq pipeline https://github.com/kviljoen/RNAseq 2019/12/05 2019/12/05 57.5
HIV latency transcriptomics of resting CD4+ T cells Principal investigator : Walter Nevondo Bioinformatics support team: Katie Lennard Despite anti-retroviral therapy, HIV persists in a latent form of replication‐competent genome in anatomical and cellular reservoirs. This study has the following specific questions: 1. What is the permissive cellular environment that facilitates HIV-1 productive and latent infection? 2. What are the cellular regulatory mechanisms of HIV-1 latency maintenance and persistence? * Is latency maintained through global or cell-specific mechanisms? * Are there other novel mechanisms responsible for HIV-latency maintenance? 3. Do HIV-1 infected resting CD4+ T cells show a specific transcriptional signature and is there any surface marker unique to these cells? XXX 2018/07/05 2019/07/08 3.0
The impact of fire and herbivores on soil microbes Principal investigator : Heidi Hawkins Support request made by: Marie-Liesse Vermeire Bioinformatics support team: Katie Lennard We study the impact of fire and herbivores on soil microbes in savanna and grassland ecosystems of South Africa. Illumina 16S rRNA gene amplicon sequencing was used for microbiota profiling. We will use the dada2 pipeline to teach Marie-Liesse how to process the data. 2019/06/24 2019/07/08 33.5
The effect of tadpole-tail blastema extract on rhabdomyosarcoma Principal investigator : Prof Sharon Prince Support request made by: Ms Jenna Bleloch, Professor Vincent Harrison Bioinformatics support team: Katie Lennard Identifying peptides/proteins responsible for mediating anti-cancer activity of tadpole-tail blastema extract by mass spectrometry. XXX 2018/07/04 2019/07/08 0.0
TB Biomarkers Analysis (Proteomics) Principal investigator: Prof Jonathan Blackburn Bioinformatics support team: Katie Lennard TB biomarkers study consisting of 120 urine samples with the following groups based on smear culture: TB+/HIV+; TB+/HIV-; TB-/HIV+; TB-/HIV-. The classification of TB+ patients has high accuracy (identified with chest x-ray and +ve sputum culture); however, after that there appears to be more of a continuum of ‘TB-ness’. E.g. from this cohort the majority of patients labelled as TB- are probably latently infected with TB; further complexity stems from the fact that HIV+ individuals do not display typical symptoms of TB, e.g. fever, and they might have disseminated TB – for this group (HIV+/TB-) we therefore probably have the least amount of confidence in terms of TB classification; secondly, individuals who previously had TB but who are now labelled as TB- may look different from individuals who never had active TB (but who may have LTBI), as research has shown that even after TB treatment these individuals may still have TB lung lesions/granulomas. XXX 2018/03/07 2019/07/08 131.0
RNAseq analysis of Folliculitis keloidalis nuchae (FKN) Principal investigator : Dr Afolake Arowolo Bioinformatics support team: Katie Lennard Folliculitis keloidalis nuchae (FKN) XXX 2018/07/26 2019/07/08 0.0
Proteogenomic analysis of 2 differentially virulent strains of Mycobacterium tuberculosis Principal Investigators: Prof. Nicola Mulder, Prof. Jonathan Blackburn Matthys Potgieter Suereta Fortuin Jon Ambler Integration of genomics and proteomics data to improve genome annotation, detect polymorphisms, interrupted coding sequences, and differentially abundant proteins between 2 differentially virulent strains of Mycobacterium tuberculosis. XXX 2019/04/29 2019/07/08 2.0
Multi-omics analysis of 2 differentially virulent strains of Mycobacterium tuberculosis Beijing. Principal Investigators: Prof. Nicola Mulder, Prof. Jonathan Blackburn Jon Ambler Matthys Potgieter Suereta Fortuin Integration of genomics, transcriptomics, and proteomics data to investigate the determinants of differential virulence in two clinical isolates of Mycobacterium tuberculosis Beijing. XXX 2019/04/29 2019/07/08 2.0
MetaNovo: Metaproteomics database generation Principal investigator: Prof Nicola Mulder PhD student: Matthys Potgieter A pipeline to generate protein sequence databases from MGF mass spectrometry data and a UniProt release, using de novo sequencing and probabilistic ranking. XXX 2019/04/20 2019/07/08 0.0
Characterizing the effects of HIV exposure on the infant stool microbiome. Principal investigator: Jonathan Blackburn Primary author: Suereta Fortuin Bioinformatics support team: Imane Allali, Matthys Potgieter Characterizing the effects of HIV exposure on the infant stool microbiome at birth and 7 days. XXX 2019/04/25 2019/07/08 12.0
Top-level project for metaproteome related work, including pipelines and bioinformatics support. Principal investigator: Prof. Nicola Mulder, Prof. Jonathan Blackburn Imane Allali Matthys Potgieter Pipelines for functional and taxonomic analysis of metaproteome data, including bioinformatics support. Differential abundance analysis, gene set enrichment, pathway analysis, and the identification of clinical polymorphisms, related to relevant sub-projects. XXX 2019/04/25 2019/07/08 12.0
TB Biomarkers Analysis (Lipidomics) Principal investigator: Prof Jonathan Blackburn Bioinformatics support team: Katie Lennard TB biomarkers study consisting of 45 urine samples with the following groups: active TB (N=15), latent TB (N=15), no TB (N=15) The aim of this project is to identify lipid compounds that could be discriminatory between control (latent), active TB and TB-negative cases 2018/11/26 2019/07/08 30.0
TB Biomarkers Analysis (Lipidomics) Principal investigator: Prof Jonathan Blackburn Bioinformatics support team: Katie Lennard TB biomarkers study consisting of 26 urine samples with the following groups: active TB (N=16), latent TB (N=10) The aim of this project is to identify lipid compounds that could be discriminatory between control (latent) and active TB cases 2018/10/01 2019/07/08 54.0
Microbiota profiling of cow's milk with and without traditional fermentation Principal investigator : Prof Mike Levin Support request made by: Pieter de Waal Bioinformatics support team: Katie Lennard Identification and description of the microbiome found in traditionally fermented milk products versus fresh, unpasteurized cow’s milk from rural South Africa. Three (3) milk sample types will be collected from farms in rural Eastern Cape: unpasteurized cow’s milk, traditionally homemade fermented milk and commercially bought fermented milk. The three sample types will be compared. Do the three sample types differ from each other? How can this knowledge be translated into clinical allergology? Perform preprocessing of Illumina MiSeq 16S reads (supplied by CPGR) using CBIO's dada2 pipeline and downstream exploratory analysis in R 2019/02/26 2019/07/08 23.5
Identification of biomarkers in keloids and folliculitis keloidalis nuchae (FKN) Principal investigator: Relebohile Matobole (PhD candidate) Supervisors: Prof Khumalo and Prof Bayat Bioinformatics support team: Gerrit Botha, Jon Ambler (RNA-seq), Katie Lennard (RNA-seq Nextflow pipeline), Matthys Potgieter (Proteomics). The aim of the study is to identify biomarkers (genes/proteins/pathways) that are specific to skin scarring disorders [keloids and folliculitis keloidalis nuchae (FKN)] and to characterize these biomarkers during the healing process. * RNA seq * Move the data onto the Ilifu cluster * Configure the Nextflow RNA seq pipeline * Report on the QC of the data & pipeline outputs * Export feature counts for analysis in R * Conduct gene set enrichment analysis and pathway analysis using Bioconductor GAGE package or similar * Proteomics * TBD * Integration and interpretation * Identify pathways / gene sets that are enriched in both the protein and RNA data sets. 2019/04/24 2019/06/18 130.0
Setting up the BST helpdesk Principal investigator: Prof Nicola Mulder Bioinformatics support team: Jon Ambler This project is for keeping track of the setup done on the BST helpdesk and also to handle issues that need to be resolved or features that need to be added. Setting up a helpdesk. 2018/01/31 2019/02/06 12.0
Pathogen outbreak study - Pseudomonas single isolate WGS Principal investigator : Prof Mark Nicol Bioinformatics support team: Katie Lennard This study was prompted by an unusual outbreak of wild type Pseudomonas that coincided with the Cape Town drought. Preliminary molecular analysis suggests clonality; the interest is therefore to try to establish how this outbreak came about and whether the drought is in some way responsible. Pseudomonas are waterborne opportunistic pathogens that can form biofilms in plumbing pipes. One hypothesis is therefore that the drought, with decreased water pressure, allowed increased biofilm formation and subsequently increased concentrations in drinking water. The data will include WGS of blood culture isolates and water samples from before, during, and after the outbreak (96 samples). Perform whole-genome assembly of single isolates. Trace clonality, SNV and look into drug resistance and virulence elements. 2019/01/30 2019/01/31 419.5
Genetic predictors of hypothalamic-pituitary-adrenal suppression in school age children treated with corticosteroids at the allergy clinics of the Cape Peninsula Principal investigator: Prof. Ekkehard Zöllner, Prof. Nicola Mulder Bioinformatics support team: Gerrit Botha Evidence suggests that some asthmatic children on corticosteroids may be more prone or resistant to develop hypothalamic-pituitary-adrenal suppression (HPAS) than others. A cross-sectional study with trios of asthmatic children previously investigated for HPAS, and their respective parents, will be performed to identify possible mutations or variants that may predict or protect against HPAS. DNA will be extracted from the saliva obtained from 90 patients, who previously underwent metyrapone testing, and from that of their parents. These 90 patients will be divided into 3 groups of 30 patients each according to their post-metyrapone ACTH levels (low-, mid- or high-range). Whole exome sequencing followed by regression analysis and an exact test will be done to determine statistical significance of identified variants, if any. The findings are verified by Sanger sequencing and the potential impact of the variants is determined. An initial pilot study of 10 trios is planned. To identify mutations or variants that predict or protect against HPAS in asthmatic children treated with corticosteroids (CS). This means we are looking for both previously published variants as well as novel variants. I am hoping we are going to find at least one causative variant so that we can design a suitable screening test. Mode of inheritance: Based on my reading of the literature - Mendelian, rather than complex. 2017/12/19 2018/11/27 42.7
Poxvirus genome sequencing Principal investigator: Prof Anna-Lise Williamson Support request made by: Dr Nicola Douglass Bioinformatics support team: Gerrit Botha The group has done some variant calling of lumpy skin viral strains at CAF. They would like to understand the process so that they can do the calling themselves and not be dependent on only CAF to do the analysis. They are also planning to do calling and assembly of pox viruses soon, so they would also need to be able to do viral assemblies. * To understand what sequence processing and analysis have been done on the lumpy skin virus. * To look into tools that can do calling and assembly with IonTorrent data. * To get things set up so that the group can continue analysis by themselves. 2018/03/22 2018/11/27 7.5
Setting up a H3A chip imputation service on Azure Principal investigator: Prof. Nicola Mulder Bioinformatics support team: Gerrit Botha, Ayton Meintjes CBIO staff: Mamana Mbiyavanga We have been assigned a $40K sponsorship from Azure. The initial sponsorship was $20K and was awarded sometime in March 2016. We then renewed it in March 2017 and then in some way the amount of sponsorship increased to $40K. We again renewed it for 2018 but it is only available till the end of this year. In 2016 we [setup an Ubuntu cluster](https://github.com/grbot/azure-cbio) but the CIFS mounts were not sufficient for the large numbers of reads and writes (Mamana at that point tested the H3A chip evaluation and used around $13K of our sponsorship but we were not able to show any results for that). In 2017 we were busy with other research work but looked into Azure batch for breaking down large tasks on Azure as an alternative to setting up a cluster. More recently we have been looking into using CycleCloud to spin up a cluster on Azure. For now we just want an environment so that we can run this code: https://github.com/h3abionet/chipimputation/ and then be able to run an imputation service for H3A requests. To be able to run genome imputations (using data from the H3A genotype array data as well as a reference panel) on Azure. We should be able to upload our data to a datastore on Azure and run a Nextflow workflow to impute the per sample genome data. We should then be able to download the data and share it with the H3A researcher. 2018/04/24 2018/11/26 5.25
Setting up a portable 16S rDNA pipeline for CBIO Principal investigator : Prof Nicola Mulder Bioinformatics support team: Katie Lennard, Gerrit Botha CBIO Postdoc: Samson Kilaza We already have a functional pipeline for this [here](https://github.com/h3abionet/h3abionet16S/tree/master/workflows-nxf). There are however a few additional things that can be looked at to improve the pipeline. 1. Currently it runs successfully on Hex, but we can fine tune the job control to be more efficient in handling memory and cpus. 2. Other groups have requested to get the pipeline up and running. We have however run into some difficulties. E.g. different mounting points on their cluster make it necessary to recompile the docker/singularity containers to access those. They also have other scheduling policies that need to be taken into account and need modifications. We need to find a more generic way to share our containers / configs so that it is easier for them to adapt where necessary on their side. 3. We need to include continuous integration so that whenever we make changes to code / containers the pipeline automatically runs again and outputs are checked against a known / true set. We can consider CircleCI, Travis CI or Jenkins. Travis CI is maybe our best bet to start with. 4. Non OTU picking methods such as DADA2 seem to be the choice of many research groups currently. Katie still has on her list to evaluate it more thoroughly. We have however seen that on a small set DADA2 and the current pipeline performed similarly, but on a larger set that might not be the case. The Illinois H3ABioNet node is currently setting up a DADA2 pipeline in Nextflow which they would share once it is production ready. We may consider adapting that to our needs and offering it as an additional option for processing researchers' data. 1. Fine tune config parameters on Hex to more efficiently handle memory and cpus. 2. Find a more generic way to distribute our config and container files. 3. 
Include continuous integration in our complete development process. 4. Add DADA2 as an additional option to process 16S data. 2018/05/04 2018/05/17 73.0
Testing Redmine project features Principal investigator : Prof Nicola Mulder Bioinformatics support team: Gerrit Botha This is just a project where I will be testing all functionalities of a Redmine project: Gantt charts, file upload, ticketing emails, permissions etc. To continue testing features in Redmine. 2018/01/26 2018/05/14 3.0
Setting up a portable RNA-Seq pipeline for CBIO Principal investigator: Prof Nicola Mulder Bioinformatics support team: Katie Lennard, Gerrit Botha The plan is to set up a pipeline that does differential gene expression analysis on human RNA-Seq data. Once we have something running we can start focusing on setting up pipelines for other types of RNA-Seq projects (if needed). To set up a human RNA-Seq analysis pipeline on UCT Hex, but keeping in mind that we want to have it portable to other platforms e.g. CHPC, CBIO, Wits, VMs/local machine. 2018/03/14 2018/05/14 49.6
Setting up a portable metagenome assembly pipeline for CBIO Principal investigator : Prof Nicola Mulder Bioinformatics support team: Katie Lennard, Gerrit Botha The plan is to set up a metagenome assembly pipeline: taking raw metagenome sequence data, QC, assemble metagenomes, detect environmental composition, diversity and function. To set up a metagenome assembly pipeline on UCT Hex, but keeping in mind that we want to have it portable to other platforms e.g. CHPC, CBIO, Wits, VMs/local machine. We can combine knowledge from courses we attended (Wits Course - Stanford, Ulas - Berkeley, Eric - Tromso). Katie also started looking into setting this up at the end of 2017, so maybe we can start from what she already has. 2018/03/26 2018/05/14 70.0
Microbial community profiling in host protection against helminth infections Principal investigator: Dr. Justin Komguep Nono Support request made by: Thabo Mpotje Bioinformatics support team: Katie Lennard, Gerrit Botha The aim is to determine the influence of host factors and microbial community in host protection against helminth infections. To identify protective biomarkers during helminth infection 2018/04/19 2018/05/14 39.0
BST ad-hoc requests Principal investigator: Prof. Nicola Mulder, A/Prof. Nicki Tiffin Bioinformatics support team: Katie Lennard, Gerrit Botha, Suresh Maslamoney, Ayton Meintjes, Jon Ambler Ad-hoc requests from researchers should go in here. These can include questions that need clarification, assistance with scripts, help with IT support or giving a hand with data transfers. Each ad-hoc request should be considered as an issue, documented and time recorded. These requests should not take more than 3 hours of your time; if one does, we should consider making it a stand-alone project. To record support given to ad-hoc requests. 2018/04/19 2018/05/14 236.50
16S rDNA analysis of mixed microbial cultures particularly in wastewater bioremediation Principal investigator: Dr Robert Huddy Support request made by: Tomas Hessler Bioinformatics support team: Katie Lennard, Gerrit Botha Our research group studies mixed microbial cultures particularly in wastewater bioremediation. We collaborate with a group at California Berkeley who perform whole-genome analysis. However, we have recently started 16S rRNA amplicon sequencing by Illumina MiSeq sequencing and plan on doing this for a number of future projects. Sequencing will be done on the V3-V4 region of the 16S rRNA gene. Illumina MiSeq. Approximately 11 GB (raw reads and processed). The group has the following questions: * What training could you provide for metagenome analysis (QIIME1,2/galaxy/PICRUSt?) * What duration and fees would this likely require? (2-3 people) * We are looking to purchase a desktop for this work, what specs would you recommend? They also mentioned that they need help in * OTU picking and classification via gg and Qiime. PICRUSt metagenome prediction. For now the support team will share info / train Tomas in getting up and running with 16S rDNA analysis and interpretation. There is a possibility that this might lead into a long term collaboration with the group. All information above has been taken from Tomas's original support request [here](http://bst.cbio.uct.ac.za/redmine/attachments/17/Tomas_Hessler_CEBER_20180217_CBIO_bioinformatics_service_questionnair.docx). 2018/03/07 2018/05/14 1.0
S. pneumoniae resistance gene analysis Principal investigator : Prof Mark Nicol Bioinformatics support team: Katie Lennard Exploring the strain-level pneumococcal population structure in the nasopharynx of South African infants during the first year of life. Also trying to determine the composition of the nasopharyngeal resistome in infants in early life. To determine S. pneumoniae gene resistance 2018/04/09 2018/05/14 0.0