Commit 9fd10b17 authored by Audrey BIHOUEE

from 3 to 2 config files

parent 7dc56131
The species-specific resource files have to be downloaded manually if not human
### Input files
Two JSON configuration files are needed to run the pipeline:
- `config.json`: contains the parameters of the analysis
| Parameter | Description |
| :--- | :--- |
| outdir | output directory |
| reference | name, description, FASTA and GTF files, STAR index directory path, and Biomart reference of the reference genome |
| replicates | description of the sample design |
| prinseq-meanquality | mean read quality threshold for PRINSEQ (Phred score) |
| cutadapt-forward | sequence of the forward adapter (cutadapt) |
| cutadapt-reverse | sequence of the reverse adapter (cutadapt) |
| library-type | strandedness parameter for HTSeq; depends on the library preparation kit used |
| read-length | read length (bp) |
| align-cpu | number of CPUs used by the STAR aligner |
- `project.json`: contains the description of your samples (names, paths to the FASTQ files, etc.). This file is generated by [illuminadir.jar](http://lindenb.github.io/jvarkit/IlluminaDirectory.html):
```
$ find ./shortfastq -type f -name "*.fastq.gz" | java -jar scripts/illuminadir.jar -J > CONFIG/project.json
```
Example versions of these two files are provided in the **CONFIG** directory at the root of the Git repository.
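Before launching, a quick way to catch a misspelled or forgotten parameter is to compare `config.json` with the parameter table above. The Python sketch below does that. It assumes every parameter in the table is a top-level key of the JSON object, spelled exactly as listed, and it does not inspect the nested content of `reference` or `replicates`, whose layout is not documented here. `missing_keys` and `check_config_file` are hypothetical helpers, not part of the pipeline.

```python
import json

# Parameters listed in the config.json table.
# Assumption: each one is a top-level key of the JSON object.
REQUIRED_KEYS = [
    "outdir", "reference", "replicates", "prinseq-meanquality",
    "cutadapt-forward", "cutadapt-reverse", "library-type",
    "read-length", "align-cpu",
]


def missing_keys(config):
    """Return the documented keys absent from an already-parsed config dict."""
    return [key for key in REQUIRED_KEYS if key not in config]


def check_config_file(path="CONFIG/config.json"):
    """Parse config.json and return the documented keys it lacks."""
    with open(path) as handle:
        return missing_keys(json.load(handle))
```

For example, `check_config_file("CONFIG/config.json")` returns an empty list when every documented key is present, and the names of the missing keys otherwise.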
> - Replace \<login-univ\> with your "univ nantes" login to access GitLab.
:two: Create a conda environment to use snakemake and activate it
```
conda create --name <env> snakemake -c bioconda
source activate <env>
```
> **Note:**
> - If you have already created an environment for this pipeline, just `source activate` it.
:three: Run pipeline
Test the launch with a dry run:
```
snakemake -s Snakefile --config projf="CONFIG/project.json" reff="CONFIG/references.json" conff="CONFIG/config.json" --use-conda -rpn
```
If the dry run completes without errors, launch the run:
```
snakemake -s Snakefile --config projf="CONFIG/project.json" reff="CONFIG/references.json" conff="CONFIG/config.json" --use-conda -rp
```
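If you prefer to script the two steps above, the Python sketch below rebuilds the same commands as argument lists and starts the real run only when the dry run exits with status 0. The helper names `snakemake_command` and `run_pipeline` are hypothetical, and the paths are the example ones from the commands above. Passing a list to `subprocess.run` bypasses the shell, so the quotes around the `--config` values are not needed.

```python
import shlex
import subprocess

# --config KEY=VALUE pairs from the commands above (example paths; adapt them).
CONFIG_ARGS = [
    "projf=CONFIG/project.json",
    "reff=CONFIG/references.json",
    "conff=CONFIG/config.json",
]


def snakemake_command(dry_run):
    """Return the argument list of the dry-run or real-run command.

    -r prints why each job is run, -p prints the shell commands,
    and -n (dry run only) lists the jobs without executing them.
    """
    flags = "-rpn" if dry_run else "-rp"
    return ["snakemake", "-s", "Snakefile", "--config", *CONFIG_ARGS,
            "--use-conda", flags]


def run_pipeline():
    """Run the dry run first and launch the real run only if it succeeded."""
    dry = subprocess.run(snakemake_command(dry_run=True))
    if dry.returncode != 0:
        print("Dry run failed, not launching:",
              shlex.join(snakemake_command(dry_run=True)))
        return dry.returncode
    return subprocess.run(snakemake_command(dry_run=False)).returncode
```

Call `run_pipeline()` from the repository root with the conda environment activated.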
:four: Run pipeline in cluster mode
```
snakemake -s Snakefile --config reff="CONFIG/references.json" projf="CONFIG/project.json" conff="CONFIG/config.json" --use-conda --cluster "qsub -e ./log/ -o ./log/" -j <N> --jobscript launch_pipeline.sh --latency-wait 100 -rp
```
> **Note:**
> - Replace \<N\> with the maximum number of jobs to submit to the cluster at the same time (`-j <N>`).
> - The log output directory must **exist** before launching (`$ mkdir ./log`).
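The cluster command can be prepared the same way. In the sketch below, `ensure_log_dir` creates the qsub log directory (the `mkdir ./log` step) and `cluster_command` returns the cluster-mode command above with `-j` set to the number of jobs you choose. Both helper names are hypothetical; `launch_pipeline.sh` and the qsub options are taken unchanged from the command above.

```python
import os

LOG_DIR = "./log"


def ensure_log_dir(log_dir=LOG_DIR):
    """Create the qsub log directory if needed (equivalent to `mkdir ./log`)."""
    os.makedirs(log_dir, exist_ok=True)


def cluster_command(n_jobs, log_dir=LOG_DIR):
    """Return the cluster-mode command as an argument list, with -j set to n_jobs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return [
        "snakemake", "-s", "Snakefile",
        "--config", "reff=CONFIG/references.json",
        "projf=CONFIG/project.json", "conff=CONFIG/config.json",
        "--use-conda",
        # qsub writes each job's stderr (-e) and stdout (-o) into log_dir.
        "--cluster", f"qsub -e {log_dir}/ -o {log_dir}/",
        # At most n_jobs cluster jobs are submitted at the same time.
        "-j", str(n_jobs),
        "--jobscript", "launch_pipeline.sh",
        # Wait up to 100 s for output files to appear on a shared filesystem.
        "--latency-wait", "100",
        "-rp",
    ]
```

For example, call `ensure_log_dir()` and then `subprocess.run(cluster_command(50), check=True)` from the repository root.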
# Running the pipeline on your data
:one: Clone this repository and move into it
```
$ git clone "https://<login-univ>@gitlab.univ-nantes.fr/bird_pipeline_registry/RNAseq_quantif_pipeline.git"
$ cd RNAseq_quantif_pipeline
```
> **Note:**
> - Replace \<login-univ\> with your "univ nantes" login to access GitLab.
:two: Create a conda environment to use snakemake and activate it
```
conda create --name <env> snakemake -c bioconda
source activate <env>
```
> **Note:**
> - Replace \<env\> with the name you want to give the environment (e.g. `rnaseq`).
> - The `bioconda` channel, which provides many bioinformatics tools, is used. To add it permanently to your `.condarc` file, run `conda config --add channels bioconda`.
> - If you have already created an environment for this pipeline, just `source activate` it.
:three: Create the CONFIG/project.json file
```
find <path/to/fastq> -type f -name "*.fastq.gz" | java -jar scripts/illuminadir.jar -J > CONFIG/project.json
```
> **Note:**
> - Replace \<path/to/fastq\> with the path(s) to the directories containing the FASTQ files.
> - You can visualize your project.json with Python's `json.tool` module:
```
cat CONFIG/project.json | python -m json.tool | more
```
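It can also help to confirm that every FASTQ file recorded in `project.json` actually exists. The layout written by illuminadir.jar is not described here, so the Python sketch below assumes no particular schema: it walks the parsed JSON, collects every string ending in `.fastq.gz`, and reports those that are not existing files. It assumes the paths are absolute or relative to the directory you run it from (the repository root if you ran the `find` command there). `fastq_paths` and `missing_fastq_files` are hypothetical helpers.

```python
import json
import os


def fastq_paths(node):
    """Recursively collect every string ending in .fastq.gz from parsed JSON."""
    if isinstance(node, str):
        return [node] if node.endswith(".fastq.gz") else []
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        found = []
        for item in node:
            found.extend(fastq_paths(item))
        return found
    # Numbers, booleans and null cannot be file paths.
    return []


def missing_fastq_files(project_json="CONFIG/project.json"):
    """Return the FASTQ paths listed in project.json that are not existing files."""
    with open(project_json) as handle:
        project = json.load(handle)
    return [path for path in fastq_paths(project) if not os.path.isfile(path)]
```

An empty list from `missing_fastq_files()` means every FASTQ path found in the file points to an existing file.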