... | ... | @@ -23,6 +23,11 @@ with \<species\> being the genome specified in the samplesheet. |
|
|
|
|
|
A valid reference must contain a **fasta file** of transcript sequences and an **annotation file** (which we call "_sym2ref.dat") linking the name of each sequence in the fasta file to a gene name. Examples of these files are provided in the "TESTDATA/REFERENCES/hg19" folder.
|
|
|
|
|
|
## The reference directory structure
|
|
|
|
|
|
For every species specified in the samplesheet, a directory with the same name is created under the main reference directory specified in the configuration file, unless it already exists.
|
|
|
For example, if the samplesheet contains two projects, one specifying "hg19" and the other "mm10", and the main reference directory specified in the configuration is "REFERENCES", then `REFERENCES/hg19` and `REFERENCES/mm10` will be created.
|
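The directory creation described above can be sketched in a few lines of Python. This is only an illustration of the behaviour, not the pipeline's actual code: the function name `ensure_species_dirs` and its parameters are hypothetical, and the species names are assumed to have already been read from the samplesheet.

```python
from pathlib import Path


def ensure_species_dirs(reference_root, species_names):
    """Create one sub-directory per species under reference_root.

    Hypothetical helper: directories that already exist are left untouched.
    """
    root = Path(reference_root)
    species_dirs = []
    # Deduplicate, since several projects may share the same species.
    for species in sorted(set(species_names)):
        species_dir = root / species
        # exist_ok=True makes the call a no-op when the directory is present.
        species_dir.mkdir(parents=True, exist_ok=True)
        species_dirs.append(species_dir)
    return species_dirs
```

Calling `ensure_species_dirs("REFERENCES", ["hg19", "mm10"])` would leave `REFERENCES/hg19` and `REFERENCES/mm10` in place, whether or not they existed beforehand.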
|
|
|
|
|
## The fasta file
|
|
|
|
|
|
By default, the fasta file is built in multiple steps by the Snakemake pipeline:
|
... | ... | |