Eric CHARPENTIER · e80c3e91
--- a/usage/reanalyzing.md
+++ b/usage/reanalyzing.md
+[*Back to home*](Home)
+
+# Re-analyzing data
+
+**Snakemake** can re-analyze data starting from previously generated results.  
+If you have been provided with a zip or tar archive of analyzed data, you can re-analyze it without the original [input files](usage/inputs).
+
+## The configuration file
+
+To create the configuration file needed to run the snakemake pipeline, you need a [samplesheet](usage/inputs#samplesheet) and, optionally, a [comparisons file](usage/inputs#compFile) if you want to perform secondary and tertiary analyses.
+
+### The samplesheet (and comparisons file)
+
+You can do one of the following:
+ - **create the samplesheet from scratch**, making sure that the sample names match those found in the `CUTADAPT` folder of your previously analyzed data and that the project (column 4) is the name of the folder containing this data;
+ - **generate the samplesheet from a previous configuration file** using the script `SCRIPTS/config2inputs.py`;
+ - **use the samplesheet provided in the `INPUT_FILES` folder** of your previously analyzed data.
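
When writing the samplesheet from scratch, the sample names can be checked against the `CUTADAPT` folder before going further. The sketch below is a hypothetical helper, not part of the pipeline. It assumes that each sample has a file named exactly `<sample>.fastq.gz` in that folder (inferred from the `XXX.fastq.gz` files mentioned on this page), and it takes the sample names as arguments because the exact samplesheet layout is not described here.

```sh
# Hypothetical helper: print each sample name that has no matching
# <sample>.fastq.gz file in the given CUTADAPT folder.
# Usage: missing_cutadapt_samples <cutadapt_dir> <sample> [<sample> ...]
missing_cutadapt_samples() {
    cutadapt_dir=$1
    shift
    for sample in "$@"; do
        # Assumption: files are named exactly <sample>.fastq.gz
        if [ ! -e "$cutadapt_dir/$sample.fastq.gz" ]; then
            printf '%s\n' "$sample"
        fi
    done
}
```

For instance, `missing_cutadapt_samples MYPROJECT/NTS-XXX/CUTADAPT sampleA sampleB` prints nothing when both files are present, and prints the name of each sample whose file is missing otherwise.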
+
+### Creating the configuration file
+
+If you do not have the original raw (non-demultiplexed) fastq files, you need to use the `-a` option of the `SCRIPTS/make_srp_config.py` script.  
+You also need to set the output directory, with the `-w` argument, to the folder **containing** your previously analyzed project folder.  
+For example, if the directory structure looks like this:
+
+```sh
+  📦MYPROJECT                     # main output folder
+  ┣  📂NTS-XXX                    # project folder (column 4 in samplesheet)
+  ┃  ┣  📂FASTQ                   # temporary folder for fastq files
+  ┃  ┣  📂FASTQC                  # FastQC results
+  ┃  ┣  📂CUTADAPT                # fastq files after cutadapt
+  ┃  ┣  📂MULTIQC                 # multiQC results
+  ┃  ┣  📂ALIGNMENT               # bam files after bwa alignment
+  ┃  ┣  📂EXPRESSION              # primary analysis results
+  ┃  ┣  📂DE                      # secondary analysis results
+  ┃  ┣  📂INPUT_FILES             # input files used in analysis
+  ┃  ┣  📂REPORT                  # necessary files for report (js, css, etc...)
+  ┃  ┗  📜report.html             #### MAIN REPORT FOR PROJECT 
+```
+then you must specify `MYPROJECT` with the `-w` option and `NTS-XXX` in the 4th column of your samplesheet. You may have multiple project folders.
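
To see which values are valid for column 4, it can help to list the project folders found under the work directory. This is a minimal sketch with a hypothetical helper name. It assumes that a project folder is any direct subfolder of the work directory that contains a `CUTADAPT` folder, as in the layout above.

```sh
# Hypothetical helper: print the name of every folder directly under the
# work directory that contains a CUTADAPT subfolder.
# Usage: list_project_folders <workdir>
list_project_folders() {
    workdir=$1
    for dir in "$workdir"/*/; do
        # Skip the unexpanded pattern when the work directory has no subfolders
        [ -d "$dir" ] || continue
        if [ -d "${dir}CUTADAPT" ]; then
            basename "$dir"
        fi
    done
}
```

With the layout above, `list_project_folders MYPROJECT` would print `NTS-XXX`, one line per project folder if there are several.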
+
+Example:
+
+```sh
+python SCRIPTS/make_srp_config.py -s <my_samplesheet> -r <path_to_reference_folder> -w <path_to_workdir> -c <comparisons_file> -a > config.json
+```
+Test your configuration with a dry run:
+
+```sh
+snakemake -nrp --config conf="config.json"
+```
+If everything is fine, the pipeline **SHOULD NOT** run the `split_fastq` rule, because it should find the existing `XXX.fastq.gz` files in the `CUTADAPT` directory of your previously analyzed data. If it does schedule this rule, check the reasons reported in the dry-run output to see why snakemake wants to create these files again.
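
The dry-run output can be long, so a small filter makes this check easier. The helper below is a hypothetical sketch. It reads the dry-run output on standard input and treats any line mentioning `split_fastq` as a sign that the rule is scheduled. This is a rough heuristic, since the exact output format depends on the snakemake version.

```sh
# Hypothetical helper: read snakemake dry-run output on standard input,
# print every line that mentions split_fastq, and fail if there is any.
# Usage: snakemake -nrp --config conf="config.json" | check_split_fastq
check_split_fastq() {
    # grep -F matches the literal string and prints the matching lines
    if grep -F 'split_fastq'; then
        echo "split_fastq would be re-run: see the lines above" >&2
        return 1
    fi
    echo "split_fastq is not scheduled"
}
```

If the helper reports matches, the full dry-run output still needs to be read to find the reason given for each `split_fastq` job.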
+
+<div align="right">
+
+<i><a href="Home">Back to Home</a></i>
+</div>
\ No newline at end of file