Support #25

Updated by Katie Lennard about 7 years ago

Pertinent points for setup of NGI-RNAseq pipeline on UCT hex
* Pipeline source code is at https://github.com/SciLifeLab/NGI-RNAseq
* Additional overview of the NGI-RNAseq pipeline at https://scilifelab.github.io/courses/rnaseq/1711/slides/pipeline.pdf
* Software requirements will be met using Singularity. The image has been downloaded and stored at /scratch/DB/bio/singularity-containers/ngi-rnaseq.img using the command: singularity pull --name ngi-rnaseq.img docker://scilifelab/ngi-rnaseq
* First test: nextflow run SciLifeLab/NGI-RNAseq --help
* Reference genomes and annotation files should be placed in /scratch/DB/bio/rna-seq (will pull iGenomes GRCh37 to here)
* For reproducibility, specify the pipeline version with the -r flag when running the pipeline (e.g. -r 1.3.1)
* If using Illumina's iGenomes for reference genomes, as recommended, include the following line in the nextflow.config file: includeConfig '/path/to/NGI-RNAseq/conf/igenomes.config'. The igenomes.config file can be used as is if we download an iGenomes reference genome, but the path to the download must be specified in nextflow.config by creating a $params.igenomes_base parameter
> The relevant Ensembl iGenomes GRCh37 files (as listed in the igenomes.config file) were eventually downloaded from https://ewels.github.io/AWS-iGenomes/ after communicating with the author, who confirmed that the necessary files could not be found at the main Illumina iGenomes website. To download these files, Andrew had to install the AWS command-line tools on hex, which should be loaded as follows:
module load python/anaconda-python-2.7
aws configure
> You may then be prompted for an access key ID and a secret access key (you need to register an AWS account to get these; registration is free, but you still need to specify credit card details; see https://console.aws.amazon.com)
> The location of these files (listed in the igenomes.config file https://github.com/SciLifeLab/NGI-RNAseq/blob/master/conf/igenomes.config) should be specified in the nextflow.config file under $params.igenomes_base (in this case it is igenomes_base = '/scratch/DB/bio/rna-seq/references' in the params {} section; see point 3 below)
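
Before pointing igenomes_base at the download, it can help to confirm that the files the pipeline will look for are actually there. The following is a minimal sketch, not part of the pipeline: the function name missing_igenomes_files is hypothetical, and the three relative paths are assumptions about the GRCh37 layout that should be checked against conf/igenomes.config.

```shell
# Sketch: print the expected reference files that are missing under an
# iGenomes base directory. Prints nothing if everything is present.
# ASSUMPTION: the relative paths below mirror the GRCh37 entries in
# conf/igenomes.config; adjust them to match that file.
missing_igenomes_files() (
    base=$1
    for rel in \
        "Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" \
        "Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" \
        "Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"; do
        # Report the relative path only when the file is absent
        [ -e "$base/$rel" ] || printf '%s\n' "$rel"
    done
)

# Example call, using the base directory from these notes:
# missing_igenomes_files /scratch/DB/bio/rna-seq/references
```

Any path printed by the function is one that igenomes.config would fail to resolve under the given base directory.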

* The nextflow.config file also needs to be configured for our setup on hex. The following was added to the nextflow.config file for this purpose:
1. docker.enabled = false
singularity.enabled = true
singularity.cacheDir = "/scratch/DB/bio/singularity-containers"
2. profiles {
    standard {
        process.executor = 'local'
    }

    hex {
        // The next 3 includeConfig lines come from the NGI-RNAseq nextflow.config; not sure whether they need to be changed
        includeConfig 'conf/base.config'
        // The singularity.config line can probably be excluded; see https://github.com/SciLifeLab/NGI-RNAseq/blob/master/conf/singularity.config
        includeConfig 'conf/singularity.config'
        includeConfig 'conf/igenomes.config'
        // The remaining lines are from Gerrit's nextflow.config for the 16S pipeline
        process.executor = 'pbs'
        process.queue = 'UCTlong'
        process.clusterOptions = '-M katie.viljoen@uct.ac.za -m abe -l nodes=1:ppn=1:series600'
    }
}
3. Specify where the iGenomes references are to be found (other parameters can also be defined here):
params {

// The original note also recorded a second value here: '/scratch/DB/bio/rna-seq/'
igenomes_base = '/scratch/DB/bio/rna-seq/references'
clusterOptions = false
// Spelled outdir (lower case) to match the --outdir option used in the first test run
outdir = "/researchdata/fhgfs/katie/NGI-RNAseq-test/nextflow-output"

}

* The basic run will look something like this:
nextflow run SciLifeLab/NGI-RNAseq -with-singularity /scratch/DB/bio/singularity-containers/ngi-rnaseq.img --reads '*_R{1,2}.fastq.gz' --genome GRCh37 -profile hex
* Human RNAseq test data to be used: http://h3data.cbio.uct.ac.za/assessments/RNASeq/practice/ (downloaded to /researchdata/fhgfs/katie/NGI-RNAseq-test)
* First test run:
> * qsub -I -q UCTlong -d `pwd`
> * nextflow run SciLifeLab/NGI-RNAseq -with-singularity /scratch/DB/bio/singularity-containers/ngi-rnaseq.img --reads '/researchdata/fhgfs/katie/NGI-RNAseq-test/*_R{1,2}.fastq.gz' --genome GRCh37 --outdir /researchdata/fhgfs/katie/NGI-RNAseq-test/nextflow-output -profile hex
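
The run commands above do not yet pin the pipeline version with -r, as recommended earlier in these notes. A minimal sketch of how the two could be combined is shown below; the helper name ngi_rnaseq_cmd is hypothetical, 1.3.1 is only the example version mentioned above, and the function prints the command for review instead of running it.

```shell
# Sketch: assemble the first-test run command with the pipeline version
# pinned via -r. The command is only printed, not executed.
# ASSUMPTION: ngi_rnaseq_cmd is an illustrative helper, not part of NGI-RNAseq.
ngi_rnaseq_cmd() (
    version=$1   # pipeline release to pin, e.g. 1.3.1
    reads=$2     # quoted glob for the paired FASTQ files
    outdir=$3    # output directory
    image=/scratch/DB/bio/singularity-containers/ngi-rnaseq.img
    printf "nextflow run SciLifeLab/NGI-RNAseq -r %s -with-singularity %s --reads '%s' --genome GRCh37 --outdir %s -profile hex\n" \
        "$version" "$image" "$reads" "$outdir"
)

# Print the command for the test data set described above
ngi_rnaseq_cmd 1.3.1 \
    '/researchdata/fhgfs/katie/NGI-RNAseq-test/*_R{1,2}.fastq.gz' \
    /researchdata/fhgfs/katie/NGI-RNAseq-test/nextflow-output
```

The printed line can be pasted into the interactive session started with qsub. The single quotes around the reads pattern are kept in the output so that the shell does not expand the glob before Nextflow sees it.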
