Overview

This project is closed and read-only.

Project title
Setting up a portable 16S rDNA pipeline for CBIO

People involved
Principal investigator: Prof Nicola Mulder
Bioinformatics support team: Katie Lennard, Gerrit Botha
CBIO Postdoc: Samson Kilaza

Short description of project
We already have a functional 16S rDNA pipeline. There are, however, a few additional things we can look at to improve it:

  1. The pipeline currently runs successfully on Hex, but the job control can be fine-tuned to handle memory and CPUs more efficiently.
  2. Other groups have asked to get the pipeline up and running on their own clusters, but we have run into some difficulties. For example, different mount points on their clusters make it necessary to rebuild the Docker/Singularity containers so that those paths are accessible. Their clusters also have different scheduling policies, which need to be taken into account and require modifications. We need a more generic way to share our containers and configs so that other groups can adapt them on their side where necessary.
  3. We need to include continuous integration so that whenever we change code or containers, the pipeline is automatically run again and its outputs are checked against a known (true) set. We can consider CircleCI, Travis CI or Jenkins; Travis CI is probably the best one to start with.
  4. Non-OTU-picking methods such as DADA2 currently seem to be the choice of many research groups. Katie still has a more thorough evaluation of DADA2 on her list. On a small data set, DADA2 and the current pipeline performed similarly, but that might not be the case on a larger set. The Illinois H3ABioNet node is currently setting up a DADA2 pipeline in Nextflow, which they will share once it is production-ready. We may consider adapting it to our needs and offering it as an additional option for processing researchers' data.
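
The output check described in item 3 could be sketched roughly as follows. This is only an illustration, not part of the existing pipeline: the function names and the idea of a checksum manifest are assumptions, and a byte-for-byte comparison will only work for outputs that are deterministic (no embedded timestamps or random seeds).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(output_dir: Path, expected: Dict[str, str]) -> List[str]:
    """Return the names of expected files that are missing or differ.

    `expected` maps a file name (relative to output_dir) to the SHA-256
    digest recorded from a known-good run. Both are hypothetical inputs.
    """
    mismatches = []
    for name, expected_digest in sorted(expected.items()):
        candidate = output_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected_digest:
            mismatches.append(name)
    return mismatches
```

A CI job (on Travis CI or any of the other services) could run the pipeline on a small test set, call `find_mismatches` on the results, and fail the build if the returned list is not empty.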

Objectives

  1. Fine-tune config parameters on Hex to handle memory and CPUs more efficiently.
  2. Find a more generic way to distribute our config and container files.
  3. Include continuous integration in our complete development process.
  4. Add DADA2 as an additional option to process 16S data.
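
One possible direction for objective 2 is to keep site-specific paths out of the container image and supply them at run time through Singularity's `--bind` option, so that each group only edits a small settings mapping. The sketch below is an assumption about how that could look, not the project's chosen solution; the function name, image name and paths are all hypothetical.

```python
from typing import Dict, List


def build_exec_command(image: str, command: List[str], binds: Dict[str, str]) -> List[str]:
    """Build a `singularity exec` argument list with one --bind per host path.

    `binds` maps a host path on the site's cluster to the path the
    pipeline expects inside the container.
    """
    args = ["singularity", "exec"]
    for host_path, container_path in sorted(binds.items()):
        args += ["--bind", f"{host_path}:{container_path}"]
    return args + [image] + command


# Hypothetical per-site settings; none of these values come from the project.
site_binds = {"/scratch/site_a": "/data"}
example = build_exec_command("pipeline.simg", ["ls", "/data"], site_binds)
```

With this approach the same image could be shared unchanged, and only the per-site mapping (plus any scheduler settings) would differ between clusters.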

Issue tracking

Tracker   Open   Closed   Total
Bug       0      0        0
Feature   3      0        3
Support   0      0        0

Spent time

73.00 hours

Members

Developer: Gerrit Botha, Katie Lennard, Samson Kilaza

Reporter: Nicky Mulder