[*Back to home*](Home)

# Analyzing multiple plates

To analyze multiple plates, first run the pipeline on **every single plate** (see above). Only the primary analysis is required; the secondary analysis is not needed at this stage. The multiplates pipeline then performs a batch correction to account for the effect of the different experiments on the data.

## The input files

### The SRP analysis folders

To run this pipeline, you need to have analyzed at least 2 plates. The output folders of the SRP pipeline are the input folders of the multiplates pipeline.

The folders to specify are the "project" folders automatically created by the SRP pipeline.

For example, if you ran the SRP pipeline with a samplesheet specifying "ProjectX" in the 4th column:

| | | | | | | |
| :--- | :---- | :---- | :--- | :---- | :---- | :--- |
| A01 | AAAACT | KR_1 | ProjectX | ctl | hg19 | factor1 |
| A02 | AAAGTT | KR_2 | ProjectX | ctl | hg19 | factor2 |
| A03 | AAATTG | KR_3 | ProjectX | case | hg19 | factor1 |
| ... | ... | ... | ... | ... | ... | ... |

and used the option `-w` with `/path/to/SRP-output`, then the project folder is `/path/to/SRP-output/ProjectX`.

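
As a rough illustration of the rule above, the following Python sketch lists the project folders implied by a samplesheet and a `-w` directory. It assumes the samplesheet is tab-separated text without a header row, which this page does not state; `project_folders` and its parameters are hypothetical names and are not part of the SRP pipeline.

```python
import csv
from pathlib import Path


def project_folders(samplesheet_lines, workdir, delimiter="\t"):
    """Return the project folders implied by the samplesheet rows.

    samplesheet_lines: any iterable of text lines (an open file works).
    workdir: the directory that was given to the SRP pipeline with -w.
    delimiter: ASSUMPTION, the real samplesheet separator may differ.
    """
    projects = set()
    for row in csv.reader(samplesheet_lines, delimiter=delimiter):
        # The project name is read from the 4th column (index 3).
        if len(row) >= 4 and row[3].strip():
            projects.add(row[3].strip())
    # One folder per distinct project, in a stable order.
    return [Path(workdir) / name for name in sorted(projects)]


if __name__ == "__main__":
    rows = [
        "A01\tAAAACT\tKR_1\tProjectX\tctl\thg19\tfactor1",
        "A02\tAAAGTT\tKR_2\tProjectX\tctl\thg19\tfactor2",
    ]
    for folder in project_folders(rows, "/path/to/SRP-output"):
        print(folder)
```
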
### The comparisons file

If you want to perform the secondary analysis for the project, you have to create a file listing the comparisons.

:page_facing_up: Create a file listing the comparisons to perform.
Example:

| | |
| :--- | :--- |
| condition1 | condition2 |
| condition3 | condition4 |
| conditionX | conditionY |
| conditionZ | conditionZ |

> **Note:**
>
> - There should be **no empty lines**.
> - There should be **no header**.
> - You can specify the same condition in both columns to perform only the first part of the secondary analysis (everything but the comparisons). If you do so, make sure that there is only one line in your file.
> - The first condition column is the test and the second is the control.

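
The rules in the note above can be checked before running the pipeline. The sketch below is one possible check, not a tool shipped with the pipeline: `check_comparisons` is a hypothetical helper, and the tab separator is an assumption because the file's delimiter is not specified here. The "no header" rule cannot be verified automatically, so it is left out.

```python
def check_comparisons(lines, delimiter="\t"):
    """Return a list of problems found in the comparisons file lines.

    delimiter: ASSUMPTION, adjust it to the separator your file uses.
    """
    problems = []
    same_condition_lines = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            problems.append(f"line {number}: empty line")
            continue
        fields = [field.strip() for field in text.split(delimiter)]
        if len(fields) != 2 or not all(fields):
            problems.append(f"line {number}: expected 2 non-empty columns")
            continue
        test, control = fields  # first column is the test, second the control
        if test == control:
            same_condition_lines.append(number)
    # A row with the same condition twice is only allowed as the sole line.
    if same_condition_lines and len(lines) > 1:
        problems.append(
            "same condition in both columns on line(s) "
            f"{same_condition_lines}, but the file has more than one line"
        )
    return problems


if __name__ == "__main__":
    example = ["condition1\tcondition2\n", "condition3\tcondition4\n"]
    print(check_comparisons(example) or "no problem found")
```
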
## Running the pipeline

:one:
Clone this repository and move into it:

```
git clone "https://gitlab.univ-nantes.fr/bird_pipeline_registry/srp-pipeline.git"
cd srp-pipeline
```

:two:
Create the conda environments (if not already created) with the commands below, then activate the main environment:

```
conda env create -n srp -f CONDA/srp.yml
conda activate srp
```

> **Note:**
>
> - If you have already created the environment for this pipeline, just run `conda activate srp`.

:three:
Create the configuration file required by the pipeline.
The script that creates the configuration file is `make_multiplates_config.py` in the `SCRIPTS` folder.
You can display its help with:

```
python SCRIPTS/make_multiplates_config.py -h
```

The only mandatory option is `-p`, which takes the output folders of the single-plate analyses. You must specify at least 2 folders (2 plates) after the `-p` option.
It is recommended to specify a working directory where the files will be written, using the `-w` option, if you want to keep the srp-pipeline git clone clean.
It is also recommended to specify an analysis name with the `-n` option. The specified name will be the prefix of many output files.
The program writes the config file to stdout. You can first run the command on its own to check that everything is all right, then run it again and redirect the output to a file:

```
python SCRIPTS/make_multiplates_config.py -w <path_to_workdir> -n <project_name> -p <path/to/plate1/analysisFolder> <path/to/plate2/analysisFolder> > config.json
```

If you want the secondary analysis, use the `-c` option to specify the comparisons to perform:

```
python SCRIPTS/make_multiplates_config.py -w <path_to_workdir> -n <project_name> -p <path/to/plate1/analysisFolder> <path/to/plate2/analysisFolder> -c <comparisons_file> > config.json
```

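
If you prefer to drive this step from Python, the sketch below assembles the same command line from the options described above (`-w`, `-n`, `-p`, `-c`) and redirects stdout to a file. `config_command` and `write_config` are hypothetical helper names, not part of the repository, and the sketch assumes it is run from the root of the srp-pipeline clone.

```python
import subprocess


def config_command(plates, workdir=None, name=None, comparisons=None):
    """Build the argument list for make_multiplates_config.py."""
    if len(plates) < 2:
        raise ValueError("at least 2 plate analysis folders are required")
    command = ["python", "SCRIPTS/make_multiplates_config.py"]
    if workdir:
        command += ["-w", str(workdir)]
    if name:
        command += ["-n", str(name)]
    command += ["-p", *[str(plate) for plate in plates]]
    if comparisons:
        command += ["-c", str(comparisons)]
    return command


def write_config(command, output="config.json"):
    """Run the command and redirect its stdout to the output file."""
    with open(output, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)
```
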
In every case, check that the generated configuration file looks correct:

```
more config.json
```

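
Beyond reading the file by eye, a quick way to catch a truncated or malformed file is to parse it. The sketch below only checks that `config.json` is valid JSON and lists its top-level keys; it assumes the top level is a JSON object and makes no assumption about which keys the script writes. `config_keys` is a hypothetical helper name.

```python
import json


def config_keys(path="config.json"):
    """Parse the configuration file and return its top-level keys."""
    with open(path) as handle:
        config = json.load(handle)  # raises json.JSONDecodeError if malformed
    if not isinstance(config, dict):
        # ASSUMPTION: the configuration is a JSON object at the top level.
        raise ValueError("expected a JSON object at the top level")
    return sorted(config)
```
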
:four:
Launch the snakemake pipeline.

Test the launch with a dry run:

```
snakemake -s multiplates.smk -nrp --config conf="config.json"
```

If you see the rules and commands that will be run, everything is fine.
Launch the run:

```
snakemake -s multiplates.smk -rp --config conf="config.json" -j 1
```

> **Note:**
>
> - You can specify the number of jobs with `-j <N>`.
> - :warning: Beware that even if you don't specify multiple jobs, two scripts in the pipeline are still parallelized.
> - :warning: Never launch the pipeline on a computer that has fewer than 16 cores.

:white_check_mark: If you want to launch the pipeline on a cluster, you have to give snakemake a job script that wraps the jobs.
Example for SGE:

```
snakemake -s multiplates.smk -rp --config conf="config.json" --cluster "qsub -e ./logs/ -o ./logs/" -j 10 --jobscript SCRIPTS/sge.sh --latency-wait 100
```

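
Two prerequisites mentioned on this page can be checked before launching: the `./logs/` directory passed to `qsub` has to exist, and the machine should have at least 16 cores. The sketch below is a minimal pre-flight check under those assumptions. `preflight` is a hypothetical helper, and on a cluster it only sees the cores of the machine it runs on, not those of the compute nodes.

```python
import os
from pathlib import Path

MIN_CORES = 16  # threshold taken from the warning on this page


def preflight(log_dir="logs", min_cores=MIN_CORES):
    """Create the log directory and report whether enough cores are visible."""
    # Same effect as `mkdir -p ./logs`: no error if it already exists.
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    cores = os.cpu_count() or 0  # cpu_count() may return None
    return cores >= min_cores


if __name__ == "__main__":
    if not preflight():
        print(f"warning: fewer than {MIN_CORES} cores detected on this machine")
```
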
> **Note:**
>
> - The path to the log output files must **exist** (`$ mkdir ./logs`).

<div align="right">

<i><a href="Home">Back to Home</a></i>

</div>