# Genomes collection #
At this stage of the workflow, we have a number of bins obtained from the assembly of sample reads. The next step is to check the quality of these bins, then estimate their abundance and annotate them functionally and taxonomically.

-----

[Previous](Modules/genes_collection)
### Genomes quality and dereplication ###

Refinement is the first quality control carried out on bins. It relies on [CheckM2](https://github.com/chklovski/CheckM2), which estimates the completeness and contamination of each bin based on the presence of single-copy marker genes (122 archaeal and 120 bacterial) that are shared universally among prokaryotes.

For example, 70% completeness means that roughly that fraction of the universal markers (about 85 of ~120) was found in a bin. Similarly, 5% contamination means that about 6 markers were found in multiple copies.
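The arithmetic behind these two figures can be sketched as follows. This is a simplified marker-counting illustration only: the function name and the 120-marker set size are assumptions taken from the numbers above, not CheckM2's actual interface or scoring.

```python
def marker_estimates(n_markers: int, n_found: int, n_multicopy: int) -> tuple[float, float]:
    """Return (completeness %, contamination %) from simple marker counts.

    Hypothetical helper for illustration; it only reproduces the
    percentages discussed in the text.
    """
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    completeness = 100.0 * n_found / n_markers       # share of markers present
    contamination = 100.0 * n_multicopy / n_markers  # share of markers seen more than once
    return completeness, contamination


# Assumed example: 84 of 120 markers found, 6 of them in multiple copies.
completeness, contamination = marker_estimates(n_markers=120, n_found=84, n_multicopy=6)
print(f"completeness={completeness:.1f}% contamination={contamination:.1f}%")
```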

The second quality control is dereplication, performed in Magneto with [dRep](https://github.com/MrOlm/drep). This step identifies groups of similar bins and selects the most representative bin of each group. It ultimately reduces the number of bins by removing duplicates and keeping those of better quality.
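As a rough sketch of the idea (not dRep's actual algorithm or API), the snippet below takes bins that have already been assigned to a similarity cluster and keeps the best-scoring bin of each cluster. The `Bin` record, the cluster labels, and the score `completeness - 5 * contamination` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    name: str
    cluster: str          # hypothetical label shared by similar bins
    completeness: float   # percent
    contamination: float  # percent


def quality_score(b: Bin) -> float:
    # Assumed scoring rule: reward completeness, penalise contamination.
    return b.completeness - 5.0 * b.contamination


def pick_representatives(bins: list[Bin]) -> dict[str, Bin]:
    """Keep the highest-scoring bin of each cluster."""
    best: dict[str, Bin] = {}
    for b in bins:
        current = best.get(b.cluster)
        if current is None or quality_score(b) > quality_score(current):
            best[b.cluster] = b
    return best


bins = [
    Bin("bin_1", "A", 95.0, 2.0),  # score 85
    Bin("bin_2", "A", 98.0, 4.0),  # score 78, more complete but more contaminated
    Bin("bin_3", "B", 70.0, 5.0),  # score 45, alone in its cluster
]
representatives = pick_representatives(bins)
print({cluster: b.name for cluster, b in representatives.items()})
```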

After these quality control steps, the bins can be considered **MAGs** (Metagenome-Assembled Genomes). They form our genome collection, which will now be annotated.
### Genomes abundance ###

The procedure is similar to that used for the [genes collection](https://gitlab.univ-nantes.fr/bird_pipeline_registry/magneto/-/wikis/Modules/genes_collection#genes-abundance), but this time the reads are mapped onto the MAGs to obtain the abundance of each MAG in each sample.
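To make the idea concrete, here is a minimal sketch of two quantities that can be derived once reads are mapped onto a MAG: horizontal coverage (the fraction of the genome covered by reads) and a length-normalised relative abundance within one sample. The function names, formulas, and numbers are illustrative assumptions and do not necessarily match how the workflow computes its tables.

```python
def horizontal_coverage(bp_covered: int, genome_length: int) -> float:
    """Fraction of the genome's bases covered by at least one mapped read."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return bp_covered / genome_length


def relative_abundance(read_counts: dict[str, int],
                       genome_lengths: dict[str, int]) -> dict[str, float]:
    """Length-normalised read counts, rescaled to sum to 1 within one sample."""
    density = {mag: read_counts[mag] / genome_lengths[mag] for mag in read_counts}
    total = sum(density.values())
    if total == 0:
        return {mag: 0.0 for mag in density}
    return {mag: d / total for mag, d in density.items()}


# Made-up values for two hypothetical MAGs in a single sample.
lengths = {"MAG_1": 2_000_000, "MAG_2": 4_000_000}
counts = {"MAG_1": 10_000, "MAG_2": 20_000}

print(horizontal_coverage(1_500_000, lengths["MAG_1"]))
print(relative_abundance(counts, lengths))
```

Dividing by genome length keeps a large genome from looking more abundant simply because it attracts more reads.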
### Genomes functional annotation ###

Here too, the tools and the method are the same as for the [genes collection](https://gitlab.univ-nantes.fr/bird_pipeline_registry/magneto/-/wikis/Modules/genes_collection#genes-functional-annotation). For each MAG, the CDSs are detected using [Prodigal](https://github.com/hyattpd/Prodigal), then clustered using [MMseqs2](https://github.com/soedinglab/MMseqs2), and the functional annotation is carried out using [eggNOG-mapper](https://github.com/eggnogdb/eggnog-mapper).

In addition to this functional annotation with eggNOG-mapper, two antimicrobial resistance gene detection tools have been added:

- [ABRicate](https://github.com/tseemann/abricate): enables mass screening of contigs for antimicrobial resistance or virulence genes
- [AMRFinderPlus](https://github.com/ncbi/amr): finds acquired antimicrobial resistance genes and point mutations in protein and/or assembled nucleotide sequences
### Genomes taxonomic annotation ###

Taxonomic annotation of MAGs is performed using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk), which assigns a taxonomic classification to each MAG based on the Genome Taxonomy Database (GTDB).
### Output ###

```
genomes_collection/tables/
├── abricate_summary.tsv
├── bp_covered.0.95.tsv
├── bp_covered.0.99.tsv
├── coverm_genomes_abundance.0.95
├── coverm_genomes_abundance.0.99
├── genomes_abricate_functions.tsv
├── genomes_abundance.0.95.tsv
├── genomes_abundance.0.99.tsv
├── genomes_amrfinder_functions.tsv
├── genomes_functions.tsv
├── genomes_length.tsv
├── gtdbtk.ar53.bac120.summary.tsv
├── horizontal_coverage.0.95.tsv
├── horizontal_coverage.0.99.tsv
├── reads_counts.0.95.tsv
└── reads_counts.0.99.tsv
```
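Several of these tables come in pairs that differ only by a numeric suffix (`0.95` or `0.99`). As a convenience sketch, the helper below splits a file name into its base name and that suffix so that paired tables can be grouped. The function name is hypothetical, and nothing is assumed here about what the suffix means.

```python
import re
from collections import defaultdict

# Matches "<base>.<0.xx>" with an optional ".tsv" extension.
_SUFFIX = re.compile(r"^(?P<base>.+?)\.(?P<suffix>0\.\d+)(?:\.tsv)?$")


def split_table_name(filename):
    """Return (base name, numeric suffix or None) for an output table name."""
    match = _SUFFIX.match(filename)
    if match:
        return match.group("base"), match.group("suffix")
    # No numeric suffix: just drop a trailing ".tsv" if present.
    base = filename[:-4] if filename.endswith(".tsv") else filename
    return base, None


names = [
    "bp_covered.0.95.tsv",
    "bp_covered.0.99.tsv",
    "coverm_genomes_abundance.0.95",
    "genomes_length.tsv",
]

# Group base names by suffix; tables without a suffix end up under None.
by_suffix = defaultdict(list)
for name in names:
    base, suffix = split_table_name(name)
    by_suffix[suffix].append(base)

print(dict(by_suffix))
```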
[Previous - Binning (Module)](Modules/binning)