[*Back to home*](Home)
# Re-analyzing data
**Snakemake** can re-analyze data starting from previously generated results.

If you have been given a zip or tar archive of analyzed data, you can re-analyze it without the original [input files](usage/inputs).
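As a minimal sketch of unpacking such an archive, the helper below extracts a zip or tar file into a work directory. The function name `extract_results` and the archive and folder names in the usage line are hypothetical; the only point taken from this page is that the project folder must end up inside the work directory.

```sh
# Hypothetical helper (not part of the pipeline): unpack a results archive
# into a work directory so that the project folder sits inside it.
# Usage: extract_results <archive> <workdir>
extract_results() {
    archive="$1"
    workdir="$2"
    mkdir -p "$workdir"
    case "$archive" in
        *.zip)
            unzip -q "$archive" -d "$workdir" ;;
        *.tar|*.tar.gz|*.tgz|*.tar.bz2)
            # tar detects the compression format on extraction
            tar -xf "$archive" -C "$workdir" ;;
        *)
            echo "Unsupported archive type: $archive" >&2
            return 1 ;;
    esac
}
```

For example, `extract_results NTS-XXX.tar.gz MYPROJECT` would place the archived project folder under `MYPROJECT`, assuming the archive contains the project folder at its top level.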
## The configuration file
To create the configuration file needed to run the Snakemake pipeline, you need a [samplesheet](usage/inputs#samplesheet) and, optionally, a [comparisons file](usage/inputs#compFile) if you want to perform a secondary and tertiary analysis.
### The samplesheet (and comparisons file)
You can do one of the following:

- **create the samplesheet from scratch**, making sure the sample names match the ones found in the `CUTADAPT` folder of your previously analyzed data and that the project (column 4) is the name of the folder holding this data.
- **generate the samplesheet from a previous configuration file** by using the script `SCRIPTS/config2inputs.py`.
- **use the samplesheet provided in the `INPUT_FILES` folder** of your previously analyzed data.
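When writing the samplesheet from scratch, it can help to list the sample names implied by the files in the `CUTADAPT` folder. The sketch below assumes those files are named `<sample>.fastq.gz`, so that stripping the extension gives the sample name; the function name `list_cutadapt_samples` is hypothetical and not part of the pipeline.

```sh
# Hypothetical helper: print one sample name per fastq file found in
# <workdir>/<project>/CUTADAPT, assuming files are named <sample>.fastq.gz.
# Usage: list_cutadapt_samples <workdir> <project>
list_cutadapt_samples() {
    for f in "$1/$2/CUTADAPT"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        basename "$f" .fastq.gz
    done
}
```

Running `list_cutadapt_samples MYPROJECT NTS-XXX` prints the names to compare against the sample column of your samplesheet.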
### Creating the configuration file
If you do not have the original raw fastq files (the files before demultiplexing), you need to use the `-a` option of the `SCRIPTS/make_srp_config.py` script.

You also need to set the output directory, with the `-w` argument, to the folder **containing** your previously analyzed project folder.

For example, if the directory structure looks like this:
```sh
📦MYPROJECT # main output folder
┣ 📂NTS-XXX # project folder (column 4 in samplesheet)
┃ ┣ 📂FASTQ # temporary folder for fastq files
┃ ┣ 📂FASTQC # FastQC results
┃ ┣ 📂CUTADAPT # fastq files after cutadapt
┃ ┣ 📂MULTIQC # MultiQC results
┃ ┣ 📂ALIGNMENT # bam files after bwa alignment
┃ ┣ 📂EXPRESSION # primary analysis results
┃ ┣ 📂DE # secondary analysis results
┃ ┣ 📂INPUT_FILES # input files used in analysis
┃ ┣ 📂REPORT # necessary files for report (js, css, etc...)
┃ ┗ 📜report.html #### MAIN REPORT FOR PROJECT
```
then you must specify `MYPROJECT` with the `-w` option and `NTS-XXX` in the 4th column of your samplesheet. You may have multiple project folders.
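Before generating the configuration, a quick check that each project named in the samplesheet exists under the work directory can catch a wrong `-w` value early. This is only a sketch: `check_projects` is a hypothetical helper, and it takes the project names as arguments rather than parsing the samplesheet, whose exact format is not assumed here.

```sh
# Hypothetical helper: verify that <workdir>/<project>/CUTADAPT exists
# for every project name given on the command line.
# Usage: check_projects <workdir> <project> [<project> ...]
check_projects() {
    workdir="$1"
    shift
    status=0
    for project in "$@"; do
        if [ -d "$workdir/$project/CUTADAPT" ]; then
            echo "OK       $workdir/$project"
        else
            echo "MISSING  $workdir/$project/CUTADAPT" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the layout above, `check_projects MYPROJECT NTS-XXX` should report `OK`; a `MISSING` line suggests that `-w` points at the project folder itself instead of its parent.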
Example:
```sh
python SCRIPTS/make_srp_config.py -s <my_samplesheet> -r <path_to_reference_folder> -w <path_to_workdir> -c <comparisons_file> -a > config.json
```
Test your configuration with a dry run:
```sh
snakemake -nrp --config conf="config.json"
```
If everything is set up correctly, the pipeline **SHOULD NOT** run the `split_fastq` rule, because it should find the existing `XXX.fastq.gz` files in the `CUTADAPT` directory of your previously analyzed data. If the dry run does schedule that rule, check its output for the reasons Snakemake gives for re-creating these files.
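This check can be sketched as a small filter over the dry-run output. The helper name `no_split_fastq` is hypothetical, and the pattern assumes that Snakemake announces each scheduled job with a line containing `rule <name>`, which may differ between Snakemake versions.

```sh
# Hypothetical helper: read dry-run output on stdin and fail if the
# split_fastq rule appears among the scheduled jobs.
# Assumes scheduled jobs are announced with a line containing "rule <name>".
no_split_fastq() {
    if grep -q 'rule split_fastq'; then
        echo "split_fastq is scheduled: inspect the reasons in the dry-run output" >&2
        return 1
    fi
    echo "split_fastq is not scheduled"
}

# Intended use (commented out so the sketch runs without Snakemake installed):
# snakemake -nrp --config conf="config.json" | no_split_fastq
```

A non-zero exit status means the rule was found in the output, in which case the full dry-run output is the place to look for the reason.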
<div align="right">

<i><a href="Home">Back to Home</a></i>
</div>