Update Genes collection, written by Hugo LEFEUVRE
---
title: Genes collection
---
Magneto can build a genes collection directly from the assembly. This removes the need for binning and provides preliminary taxonomic and functional information, although it is less accurate than what `genomes_collection` provides.
`magneto run genes_collection --config target=single_assembly **snakemake.args `
This step is only performed on single-assembly data. Even if you choose co-assembly, single-assembly can still be run on your data to obtain the genes collection. If you do not want a genes collection at all, with either single-assembly or co-assembly, set `genes_collection` to `False` in the config file.
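If the step is launched from a script rather than typed by hand, the command above can be assembled as an argument list. This is a minimal sketch: the helper name `genes_collection_command` is hypothetical, and it assumes that extra Snakemake options (here `--cores 4`) are simply appended at the end, as the `**snakemake.args` placeholder suggests.

```python
import shlex


def genes_collection_command(extra_snakemake_args=()):
    """Build the argument list for the genes collection step.

    The fixed part is copied from the command shown above; anything in
    extra_snakemake_args is appended unchanged (assumed to be forwarded
    to Snakemake).
    """
    return [
        "magneto", "run", "genes_collection",
        "--config", "target=single_assembly",
        *extra_snakemake_args,
    ]


cmd = genes_collection_command(["--cores", "4"])
# Print the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)` on a machine where Magneto is installed.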
### CDS search and clustering ###
Once the assembly has been carried out, the first step in the genes collection is to search for CDS (coding DNA sequences) within the contigs of each sample, using [prodigal](https://github.com/hyattpd/Prodigal). These CDS are then concatenated and clustered at 95% with [mmseqs2](https://github.com/soedinglab/MMseqs2) to avoid redundancy.
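To give an intuition of what clustering to avoid redundancy means, here is a toy sketch of greedy, representative-based clustering. It is not how mmseqs2 works internally: the identity measure below is an ungapped, position-by-position comparison chosen for simplicity, and the function names and sample sequences are made up for illustration.

```python
def identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity over the longer sequence."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: dict, min_identity: float = 0.95) -> dict:
    """Map each representative name to the names of its cluster members."""
    clusters = {}
    # Longest sequences are visited first so that they become representatives.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= min_identity:
                clusters[rep].append(name)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


# Hypothetical CDS from two samples: the first two differ by one base.
base = "ATGGCTAAGCTTGACCGTATCGGAACTTGA" * 2
variant = base[:10] + "A" + base[11:]
cds = {
    "s1_gene1": base,
    "s2_gene1": variant,
    "s2_gene2": "ATG" + "C" * 54 + "TAA",
}
clusters = greedy_cluster(cds)
print(clusters)
```

Here the two near-identical CDS collapse into one cluster, and only its representative would be kept in the non-redundant collection.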
### Genes abundance ###
The abundance information of the detected and clustered genes is then calculated. The reads from each sample are mapped onto the genes using [bowtie2](https://github.com/BenLangmead/bowtie2) and [samtools](https://github.com/samtools/samtools), then the abundance is calculated from this mapping using the [coverM](https://github.com/wwood/CoverM) tool.
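As an illustration of how mapped reads become a length-normalised abundance, the sketch below computes TPM (transcripts per million) from read counts and gene lengths. TPM is one of the metrics CoverM can report, but which metric Magneto requests is not stated here, so this is only an example; the function name and the numbers are hypothetical.

```python
def tpm(read_counts: dict, gene_lengths: dict) -> dict:
    """Convert per-gene read counts into TPM values.

    Each count is first divided by the gene length (reads per base),
    then the rates are rescaled so that they sum to one million.
    """
    rates = {gene: read_counts[gene] / gene_lengths[gene] for gene in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Two genes with the same number of mapped reads but different lengths.
counts = {"geneA": 100, "geneB": 100}
lengths = {"geneA": 1000, "geneB": 500}
abundance = tpm(counts, lengths)
print(abundance)
```

With equal read counts, the shorter gene gets twice the abundance of the longer one, which is why gene lengths are needed alongside the mapping.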
### Genes functional annotation ###
The functional annotation of the genes detected with prodigal is then carried out with [eggNOG-mapper](https://github.com/eggnogdb/eggnog-mapper). The tool compares the input sequences against the eggNOG database and assigns an orthologous group to each sequence, which allows it to be functionally annotated.
### Genes taxonomic annotation ###
After the clustered genes have been translated into proteins using [seqkit](https://github.com/shenwei356/seqkit), taxonomic annotation is performed using mmseqs2.
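The translation step can be pictured with the following sketch, which translates a nucleotide CDS codon by codon using the standard genetic code. It only illustrates the principle: the translation table and options that Magneto passes to seqkit are not specified here, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a CDS in frame 1; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


protein = translate("ATGGCTAAGTAA")
print(protein)  # MAK*
```

The `*` marks the stop codon; codons containing ambiguous bases such as `N` are reported as `X`.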
### Output ###
```
genes_collection/tables/
├── coverm_genes_abundance
├── genes_functions.tsv
├── genes_length.tsv
├── genes_taxo.UniRef50.classified.tsv
└── genes_taxo.UniRef50.tsv
```
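After a run, a quick check that the expected tables are present can be sketched as follows. The file names are copied from the listing above; the helper name `missing_tables` is hypothetical, and the path is assumed to be relative to the directory where the pipeline wrote its results.

```python
from pathlib import Path

# Entries expected in genes_collection/tables/, copied from the listing above.
EXPECTED_TABLES = [
    "coverm_genes_abundance",
    "genes_functions.tsv",
    "genes_length.tsv",
    "genes_taxo.UniRef50.classified.tsv",
    "genes_taxo.UniRef50.tsv",
]


def missing_tables(tables_dir):
    """Return the expected entries that do not exist in tables_dir."""
    tables_dir = Path(tables_dir)
    return [name for name in EXPECTED_TABLES if not (tables_dir / name).exists()]


if __name__ == "__main__":
    # Prints an empty list when every expected table is present.
    print(missing_tables("genes_collection/tables"))
```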
[Previous - Assembly (Module)](Modules/assembly)