Overview
This project is closed and read-only.
Project title
Testing human WGS alignment, calling and joint calling on DRAGEN hardware
People involved
Principal investigator: Prof Nicola Mulder
Bioinformatics support team: Jon Ambler
Short description of project
There has been a request from Nicky to test the performance of the DRAGEN hardware on the pipelines that we used to process whole genome sequencing data for the design of the H3Africa genotype array.
The DRAGEN hardware cost R250K.
We would consider purchasing the hardware to cater for such large projects in the future if:
- The hardware outperforms any of the resources we have locally or at another compute facility (e.g. CHPC, Wits).
- The power consumption of the hardware fits within the current threshold we have at the IDM data centre.
- Getting huge amounts of human WGS data in and out of the hardware does not cause a bottleneck that makes the use of the hardware for these purposes unrealistic.
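The data movement criterion can be checked with rough arithmetic before any testing starts. The sketch below is only an illustration: the function names, the sample size, the link speed, the efficiency factor and the compute time are all hypothetical placeholders, not measurements from the DRAGEN hardware or from our network.

```python
def transfer_hours(sample_gb: float, n_samples: int,
                   link_gbit_per_s: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move n_samples of sample_gb gigabytes each.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved in practice (placeholder value, not measured).
    """
    total_gbit = sample_gb * n_samples * 8          # gigabytes -> gigabits
    effective_rate = link_gbit_per_s * efficiency   # usable gigabits per second
    return total_gbit / effective_rate / 3600       # seconds -> hours


def is_io_bound(transfer_h: float, compute_h: float) -> bool:
    """Return True when moving the data takes longer than processing it."""
    return transfer_h > compute_h


if __name__ == "__main__":
    # All figures below are hypothetical and should be replaced with
    # values measured during the actual tests.
    t = transfer_hours(sample_gb=100, n_samples=50, link_gbit_per_s=10)
    c = 25.0  # assumed total compute hours for the same 50 samples
    print(f"transfer: {t:.2f} h, compute: {c:.2f} h, I/O bound: {is_io_bound(t, c)}")
```

If the estimated transfer time is of the same order as the processing time, or larger, data movement is likely to be the limiting factor and a dedicated dataflow setup would be needed.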
Objectives
The testing process would consist of the following.
1) We get this pipeline up and running and test it using a GiaB sample (complete genome or just chr22).
2) We check how easy it is to add additional steps to the workflow in (1), so that it corresponds to the H3A chipdesign workflow.
3) We test this pipeline. We assume the workflow in (1) is part of this pipeline, so we just need to replace that part with the workflow in (2). Here we can use a few 1KG samples to test.
Other things to consider, though we can figure them out as we go
1) We need to understand the pipeline for getting data in and out of the hardware. For example, can we attach external storage and do the processing there, or do we need to move data to local storage so that processing is optimal? I'm quite sure we would need to work on a dataflow pipeline if we plan to process several WGS samples.
2) I'm not sure if the hybrid cloud model will work for WGS pipelines unless the processing in the cloud is done on smaller files. We also need to take the security model into consideration if we are working with human data and pushing the data to the cloud.
3) We also need to understand their pricing model. If we do not opt for the cloud, we probably do not need to pay per gigabase and can just purchase the hardware.
CPGR has recently purchased DRAGEN hardware and they have agreed that we can do the testing on their equipment.
Members
Reporter: Ayton Meintjes, Gerrit Botha, Nicky Mulder