Modifications

Hugo LEFEUVRE · ff467856
--- a/Modules/genes_collection.md
+++ b/Modules/genes_collection.md
 ---
 title: Genes collection
 ---
 Magneto can build a genes collection directly from the assembly. This removes the need for binning and provides preliminary taxonomic and functional information (although this information is less accurate than that obtained with genomes_collection).
+`magneto run genes_collection --config target=single_assembly **snakemake.args `
 This step is only performed with single-assembly data. However, even if you choose to perform co-assembly, single-assembly can still be run on your data to obtain the genes collection. If you do not want a genes collection at all, with either single or co-assembly, set genes_collection to False in the config file.
-## CDS search ##
+### CDS search and clustering ###
+Once the assembly has been carried out, the first step in building the genes collection is to search for CDS (coding DNA sequences) within the contigs of each sample, using [prodigal](https://github.com/hyattpd/Prodigal). The CDS from all samples are then concatenated and clustered at 95% with [mmseqs2](https://github.com/soedinglab/MMseqs2) to remove redundancy.
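As a rough illustration of the CDS search and clustering steps, the sketch below builds the two command lines in Python without running them. It is not Magneto's actual invocation: the helper names and file paths are hypothetical, the metagenomic mode (`-p meta`) is an assumption, and so is reading "95%" as a minimum sequence identity passed through `--min-seq-id`.

```python
def prodigal_cmd(contigs_fasta, cds_fasta):
    """Command predicting CDS on one sample's contigs (hypothetical helper)."""
    return [
        "prodigal",
        "-i", contigs_fasta,  # input contigs (FASTA)
        "-d", cds_fasta,      # nucleotide sequences of the predicted genes
        "-p", "meta",         # metagenomic mode: an assumption, not taken from Magneto
    ]


def cluster_cmd(all_cds_fasta, out_prefix, tmp_dir, min_seq_id=0.95):
    """Command clustering the concatenated CDS (hypothetical helper)."""
    return [
        "mmseqs", "easy-cluster",
        all_cds_fasta,                   # CDS of all samples, concatenated beforehand
        out_prefix,                      # prefix of the result files
        tmp_dir,                         # scratch directory required by mmseqs
        "--min-seq-id", str(min_seq_id)  # assumed meaning of "clustered at 95%"
    ]


# Placeholder file names, for illustration only.
print(prodigal_cmd("sampleA.contigs.fa", "sampleA.cds.fna"))
print(cluster_cmd("all_samples.cds.fna", "genes_clustered", "tmp"))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the input files exist.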
+### Genes abundance ###
+The abundance of the detected and clustered genes is then calculated. The reads from each sample are mapped onto the genes using [bowtie2](https://github.com/BenLangmead/bowtie2) and [samtools](https://github.com/samtools/samtools), and the abundance is computed from this mapping with the [CoverM](https://github.com/wwood/CoverM) tool.
+### Genes functional annotation ###
+The functional annotation of genes detected with prodigal is then carried out with [eggNOG-mapper](https://github.com/eggnogdb/eggnog-mapper). The tool compares the input sequences with the eggNOG database and assigns an orthologous group to each sequence, which allows it to be functionally annotated.
+### Genes taxonomic annotation ###
+After the clustered genes have been translated into proteins using [seqkit](https://github.com/shenwei356/seqkit), taxonomic annotation is performed using mmseqs2.
+### Output ###
+```
+genes_collection/tables/
+├── coverm_genes_abundance
+├── genes_functions.tsv
+├── genes_length.tsv
+├── genes_taxo.UniRef50.classified.tsv
+└── genes_taxo.UniRef50.tsv
+```
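The `.tsv` tables listed above can be loaded with Python's standard library. The sketch below is a minimal reader that assumes only tab-separated values with a header row. The column names and values in the sample are hypothetical and do not describe the real layout of `genes_length.tsv`.

```python
import csv
import io


def read_tsv(handle):
    """Return (header, rows) from a tab-separated file-like object."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])            # first line is assumed to be a header
    rows = [row for row in reader if row]  # skip empty lines
    return header, rows


# Hypothetical content standing in for a real output table.
sample = io.StringIO("gene_id\tlength\ngene_1\t903\ngene_2\t1254\n")
header, rows = read_tsv(sample)
print(header)
print(rows)
```

For a real table, open the file with `open(path, newline="")` and pass the handle to `read_tsv`.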
 [Previous - Assembly (Module)](Modules/assembly)