Commit 8f04370f authored by Eric CHARPENTIER

Updated readme. Running pipe on test data on one core without cluster mode. Added new environment without R packages compilation. Removed install_dependencies
parent 20ed0f19
#!/usr/bin/env Rscript
packageList <- c(
"circlize",
"clusterProfiler",
"ComplexHeatmap",
"DESeq2",
"DOSE",
"dplyr",
"fdrtool",
"fgsea",
"ggplot2",
"ggrepel",
"GO.db",
"grid",
"GSEABase",
"limma",
"pvclust",
"biomaRt",
"gplots"
)
packagesToInstall <- setdiff(packageList, installed.packages()[,"Package"])
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager", repos = "https://cloud.r-project.org")
BiocManager::install(packagesToInstall, updates = TRUE)
......@@ -8,15 +8,15 @@ dependencies:
- bcftools=1.9
- cutadapt=1.18
- fastqc=0.11.8
- htseq=0.9.1
- htseq=0.12.4
- multiqc=1.6
- numpy=1.14
- pandas=0.23.4
- prinseq=0.20.4
- pysam=0.15.1
- pysam=0.15
- samtools=1.9
- snakemake-minimal>=5.2
- star>=2.6.1b
- star=2.7.3a
- openjdk=8.0
- simplejson
- urllib3
......@@ -24,4 +24,24 @@ dependencies:
- wget
- curl
- r-base=3.6
- r-xml
\ No newline at end of file
- r-xml
- r-curl
- r-rcpp
- r-openssl
- r-ggplot2
- r-optparse
- r-gplots
- r-fdrtool
- r-pvclust
- bioconductor-deseq2
- bioconductor-limma
- bioconductor-go.db
- bioconductor-gseabase
- bioconductor-clusterprofiler
- bioconductor-dose
- bioconductor-fgsea
- bioconductor-complexheatmap
- bioconductor-biomart
- r-rcolorbrewer
- r-tidyverse
- r-ggiraph
......@@ -6,7 +6,7 @@
"cutadapt-reverse":"AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT",
"library-type":"reverse",
"read-length":"100",
"align-cpu":"10",
"align-cpu":"1",
"reference":
{
"name":"Ensembl_GRCh37",
......@@ -14,7 +14,7 @@
"STARindexDir":"CONFIG/genome",
"fasta":"CONFIG/genome/human_g1k_v37.chr22.fasta",
"gtf":"CONFIG/genome/chr22.gff",
"biomart":"feb2014.archive.ensembl.org,ENSEMBL_MART_ENSEMBL,hsapiens_gene_ensembl"
"biomart":"37,hsapiens_gene_ensembl"
},
"samplesCondition": [
{
......@@ -57,4 +57,4 @@
"condition2": "CT"
}
}
}
\ No newline at end of file
}
......@@ -21,10 +21,9 @@ The main steps of the pipeline are:
### System requirements
The only requirement is to have a working install of [conda](https://www.anaconda.com/download/#linux) and [git](https://git-scm.com/downloads).
All tools necessary to run the pipeline are described in two conda environment files.
The species specific resources files have to be downloaded manually if not human.
The species specific resources files (reference genome, gtf) have to be downloaded manually if not human.
### Cloning the repository
### Cloning the repository
```bash
git clone "https://gitlab.univ-nantes.fr/bird_pipeline_registry/RNAseq_quantif_pipeline.git"
......@@ -33,16 +32,16 @@ cd RNAseq_quantif_pipeline
### Install dependencies
In order to generate the two conda environments needed to run the pipeline, execute the installer:
All tools required to run the pipeline are specified in a CONDA recipe. In order to build the environment:
```bash
./install_dependencies.sh
conda env create -n rna -f CONDA/rna.yml
```
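The removed `install_dependencies.sh` checked whether the `rna` environment already existed before creating it. With the plain `conda env create` call, a similar guard could be sketched as follows. The helper name `conda_env_exists` is hypothetical and not part of the repository; it only inspects the first column of `conda env list` output passed on standard input.

```bash
# Hypothetical helper (not in the repository): succeeds when the environment
# name given as $1 appears in the first column of `conda env list` output
# read from standard input. Comment lines start with "#" and never match.
conda_env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

A possible use, assuming conda is on the `PATH`: `conda env list | conda_env_exists rna || conda env create -n rna -f CONDA/rna.yml`.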
Activate the main environment to prepare the input files:
```bash
source activate rna
conda activate rna
```
### Creating the input files
......@@ -125,19 +124,20 @@ If you can visualize all the step that the pipeline will process, everything's f
You can launch the pipeline on a cluster (SGE) with:
```bash
snakemake --config proj="project.json" conf="config.json" --cluster "qsub -e ./logs/ -o ./logs/" -j 50 --jobscript sge.sh --latency-wait 100 -rp
snakemake --config proj="project.json" conf="config.json" --cluster "qsub -e ./logs/ -o ./logs/" -j 30 --jobscript sge.sh --latency-wait 100 -rp --resources parallel_star=3
```
> **Note:**
> - Use the "-j" option (number of parallel jobs to run) appropriately
> - If your cluster is not managed by SGE, you will have to build a wrapper for the jobs and specify it with the option "--jobscript".
> - In order to avoid filling all the available RAM, you can specify `--resources parallel_star=X` where `X` is the number of parallel STAR alignments you want to run. Also note that each STAR job uses 10 threads by default.
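
One way to pick `X` is to take the largest number of STAR jobs that fits both the memory and the core budget of the machine. The sketch below is only an illustration: the helper `max_parallel_star` is hypothetical, and the figures in the example call (100 GB of RAM, 32 GB per STAR job, 40 cores) are assumptions, not values taken from the pipeline. Only the 10 threads per job comes from the note above.

```bash
# Hypothetical helper: largest X for `--resources parallel_star=X` such that
# X concurrent STAR jobs fit in both the memory and the core budget.
# Args: total RAM (GB), RAM per STAR job (GB), total cores, threads per STAR job.
max_parallel_star() {
    local by_mem=$(( $1 / $2 ))
    local by_cpu=$(( $3 / $4 ))
    local x=$(( by_mem < by_cpu ? by_mem : by_cpu ))
    # Always allow at least one alignment so the pipeline can progress.
    echo $(( x > 0 ? x : 1 ))
}

# Assumed figures: 100 GB RAM, 32 GB per STAR job, 40 cores, 10 threads per job.
max_parallel_star 100 32 40 10   # prints 3 (memory is the tighter limit)
```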
# Quick launch guide on provided test data
```bash
git clone "https://gitlab.univ-nantes.fr/bird_pipeline_registry/RNAseq_quantif_pipeline.git"
cd RNAseq_quantif_pipeline
./install_dependencies.sh
source activate rna
snakemake --config proj="CONFIG/project.json" conf="CONFIG/config.json" --cluster "qsub -e ./logs/ -o ./logs/" -j 30 --jobscript sge.sh --latency-wait 100 -rp
conda env create -n rna -f CONDA/rna.yml
conda activate rna
snakemake --config proj="CONFIG/project.json" conf="CONFIG/config.json" --cluster "qsub -e ./logs/ -o ./logs/" -j 1 --jobscript sge.sh --latency-wait 100 -rp
```
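The commit message mentions running the test data on one core without cluster mode, while the command above still submits jobs through `qsub`. A local run could look like the sketch below, which simply drops the cluster-related options. The wrapper name `run_test_local` is hypothetical, and the sketch assumes the `rna` environment is active and the working directory is the repository root.

```bash
# Hypothetical wrapper (not in the repository): run the provided test data
# locally on a single core, without --cluster, --jobscript or --latency-wait.
# Extra arguments (for example -n for a dry run) are passed on to snakemake.
run_test_local() {
    snakemake --config proj="CONFIG/project.json" conf="CONFIG/config.json" -j 1 -rp "$@"
}
```

Calling `run_test_local -n` first would list the planned jobs without executing them.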
#!/bin/bash
# Stop on error
set -e
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )"
MAIN_ENV_NAME=rna
ENVS=$(conda env list | awk '{print $1}' )
FOUND=1
for ENV in ${ENVS}
do
if [ "${ENV}" == "${MAIN_ENV_NAME}" ]; then
FOUND=0
fi
done
# Creation of main conda environment.
if [ ${FOUND} -eq 0 ]; then
echo "${MAIN_ENV_NAME} already created"
else
echo "Creating env ${MAIN_ENV_NAME}"
conda env create -n ${MAIN_ENV_NAME} -f ${DIR}/CONDA/rna.yml
source activate ${MAIN_ENV_NAME}
Rscript ${DIR}/CONDA/installDeEnv.R
fi
\ No newline at end of file