Support #25

Updated by Katie Lennard over 7 years ago

Pertinent points for setup of NGI-RNAseq pipeline on UCT hex
* Main pipeline source code is at https://github.com/SciLifeLab/NGI-RNAseq
* The pipeline source code currently in use, however, is at https://github.com/ewels/nf-core-RNAseq. This was kindly customized for us by the authors for easy configuration on hex and includes a config file, 'uct_hex.config', so that this 'profile' can be called as a flag on the command line (further customization may be required following testing).
* Additional overview of the NGI-RNAseq pipeline at https://scilifelab.github.io/courses/rnaseq/1711/slides/pipeline.pdf
* Software requirements will be met using Singularity. The image has been downloaded and stored at /scratch/DB/bio/singularity-containers/ngi-rnaseq.img using the command: singularity pull --name ngi-rnaseq.img docker://scilifelab/ngi-rnaseq
Note that the Singularity image path has been specified in the aforementioned uct_hex.config file, so there is no need to specify it on job submission.
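
The image step can be made repeatable with a small check before pulling. This is only a sketch, assuming bash; the helper name image_pull_command is hypothetical, and it prints the pull command from the note above instead of running it, so nothing is downloaded by accident.

```shell
#!/usr/bin/env bash
# Directory and image name as recorded in the notes above.
IMAGE_DIR="${IMAGE_DIR:-/scratch/DB/bio/singularity-containers}"
IMAGE_NAME="ngi-rnaseq.img"

# Hypothetical helper: print the pull command if the image is missing,
# print nothing if the image file is already present.
image_pull_command() {
    local dir="$1" name="$2"
    if [ -f "${dir}/${name}" ]; then
        return 0
    fi
    printf 'singularity pull --name %s docker://scilifelab/ngi-rnaseq\n' "$name"
}

image_pull_command "$IMAGE_DIR" "$IMAGE_NAME"
```

If a command is printed, it can be run from within the image directory to fetch the image.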

* First test: nextflow run SciLifeLab/NGI-RNAseq --help (or, for the customized version, nextflow run ewels/nf-core-RNAseq --help)
* Reference genomes and annotation files should be placed in /scratch/DB/bio/rna-seq (iGenomes GRCh37 has been pulled to /scratch/DB/bio/rna-seq/references/ from https://ewels.github.io/AWS-iGenomes/)
* For reproducibility, please specify the pipeline version when running the pipeline, using the -r flag (e.g. -r 1.3.1)
* If using Illumina's iGenomes for reference genomes, as recommended, the following line should be included in the nextflow.config file: includeConfig '/path/to/NGI-RNAseq/conf/igenomes.config'. The igenomes.config can be used as is if we download an iGenomes reference genome, but we need to specify the path in our custom uct_hex.config file (or under nextflow.config) by creating a $params.igenomes_base parameter: igenomes_base = '/scratch/DB/bio/rna-seq/references'
> The relevant ENSEMBL iGenomes GRCh37 files (as listed in the igenomes.config file) were eventually downloaded to /scratch/DB/bio/rna-seq/references/ from https://ewels.github.io/AWS-iGenomes/ after communicating with the author, who confirmed that the necessary files could not be found at the main Illumina iGenomes website. In order to download these files, Andrew had to install the AWS tools on hex, which should be loaded as follows:
module load python/anaconda-python-2.7

aws configure
> You may then be prompted for a key and a security key (you need to register an aws account to get this, which is free but you still need to specify credit card details – see https://console.aws.amazon.com)
> The location of these files (listed in the igenomes.config file https://github.com/SciLifeLab/NGI-RNAseq/blob/master/conf/igenomes.config) should be specified in the nextflow.config file under $params.igenomes_base (in this case it's igenomes_base = '/scratch/DB/bio/rna-seq/references' under the params {} section; see point 3 below)
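
For reference, the download could look roughly like the sketch below. The bucket name (s3://ngi-igenomes) and region (eu-west-1) are assumptions taken from the AWS-iGenomes page and are not recorded in these notes, and the helper igenomes_sync_command is hypothetical. It only prints the command, so it can be checked before anything is transferred. The destination layout (species/source/build under igenomes_base) is assumed to mirror the paths listed in igenomes.config.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints an 'aws s3 sync' command, does not run it.
# ASSUMPTION: bucket and region come from the AWS-iGenomes page, not these notes.
igenomes_sync_command() {
    local species="$1" provider="$2" build="$3" base="$4"
    local key="${species}/${provider}/${build}/"
    # ${base%/} strips one trailing slash so the path is not doubled up.
    printf 'aws s3 sync --region eu-west-1 s3://ngi-igenomes/igenomes/%s %s/%s\n' \
        "$key" "${base%/}" "$key"
}

igenomes_sync_command Homo_sapiens Ensembl GRCh37 /scratch/DB/bio/rna-seq/references
```

The printed command would be run after module load and aws configure as described above.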


* The nextflow.config file also needs to be configured to our setup on hex. The following was included in the nextflow.config file to configure it for hex:
1. docker.enabled = false
singularity.enabled = true
singularity.cacheDir = "/scratch/DB/bio/singularity-containers"
2. profiles{
standard {
process.executor = 'local'
}

hex {
//The next 3 includeConfig lines are from the NGI-RNAseq nextflow.config; not sure if they need to be changed
includeConfig 'conf/base.config'
//Think the singularity.config line can be excluded see https://github.com/SciLifeLab/NGI-RNAseq/blob/master/conf/singularity.config
includeConfig 'conf/singularity.config'
includeConfig 'conf/igenomes.config'
//The remaining lines are from Gerrit's nextflow.config for the 16S pipeline
process.executor = 'pbs'
process.queue = 'UCTlong'
process.clusterOptions = '-M katie.viljoen@uct.ac.za -m abe -l nodes=1:ppn=1:series600'
}
}
3. Specify where the iGenomes references are to be found (other parameters can also be defined here):
params {

igenomes_base = '/scratch/DB/bio/rna-seq/references'
clusterOptions = false
outDir = "/researchdata/fhgfs/katie/NGI-RNAseq-test/nextflow-output"


}
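
Since the config above hard-codes several paths, a quick check that they exist on the node can save a failed run. A minimal sketch, assuming bash; missing_paths is a hypothetical helper and the two paths are the ones quoted in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every argument that does not exist on disk.
missing_paths() {
    local p
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Paths as given in the notes; no output means both were found.
missing_paths \
    /scratch/DB/bio/rna-seq/references \
    /scratch/DB/bio/singularity-containers/ngi-rnaseq.img
```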

* The basic run will look something like this:
nextflow run ewels/nf-core-RNAseq -with-singularity /scratch/DB/bio/singularity-containers/ngi-rnaseq.img --reads '/researchdata/fhgfs/katie/NGI-RNAseq-test/*_R{1,2}.fastq.gz' --genome GRCh37 --outdir /researchdata/fhgfs/katie/NGI-RNAseq-test/nextflow-output -profile uct_hex --email katie.viljoen@uct.ac.za
* Human RNAseq test data to be used: http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/ (downloaded to /researchdata/fhgfs/katie/NGI-RNAseq-test)


* First test run:


> * qsub -I -q UCTlong -d `pwd`
> * nextflow run ewels/nf-core-RNAseq -with-singularity /scratch/DB/bio/singularity-containers/ngi-rnaseq.img --reads '/researchdata/fhgfs/katie/NGI-RNAseq-test/*_R{1,2}.fastq.gz' --genome GRCh37 --outdir /researchdata/fhgfs/katie/NGI-RNAseq-test/nextflow-output -profile uct_hex --email katie.viljoen@uct.ac.za
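
As an alternative to the interactive session, the same run could be wrapped in a batch job. The sketch below is an assumption rather than something that was tested on hex: write_job_script is a hypothetical helper, the job name given with -N is made up for illustration, and the queue, resource and mail settings are copied from the clusterOptions in the config notes above. It only writes the script and does not submit it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a PBS job script to the given path (no submission).
write_job_script() {
    local out="$1"
    # Quoted 'EOF' keeps $PBS_O_WORKDIR and the glob literal in the written file.
    cat > "$out" <<'EOF'
#!/bin/bash
#PBS -q UCTlong
#PBS -l nodes=1:ppn=1:series600
#PBS -M katie.viljoen@uct.ac.za
#PBS -m abe
#PBS -N ngi-rnaseq-test

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

nextflow run ewels/nf-core-RNAseq \
    -with-singularity /scratch/DB/bio/singularity-containers/ngi-rnaseq.img \
    --reads '/researchdata/fhgfs/katie/NGI-RNAseq-test/*_R{1,2}.fastq.gz' \
    --genome GRCh37 \
    --outdir /researchdata/fhgfs/katie/NGI-RNAseq-test/nextflow-output \
    -profile uct_hex \
    --email katie.viljoen@uct.ac.za
EOF
}

write_job_script ngi-rnaseq-test.pbs
```

The resulting file could then be submitted with qsub ngi-rnaseq-test.pbs.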
