# Metagenomics assemblies #

Assembly is an almost unavoidable step in metagenomic analysis: it reduces the fragmentation of information and highlights sample characteristics. Reads are assembled into contigs using [megahit](https://github.com/voutcn/megahit), a fast assembler optimized for metagenomes and based on succinct de Bruijn graphs built from k-mers. Megahit is faster than SPAdes and metaSPAdes and performs comparably. SPAdes assemblies might be better in terms of long and ultra-long contigs, but they also show a higher rate of misassemblies, which requires assembly curation [(1)](https://link.springer.com/article/10.1186/s12864-017-3918-9).

Megahit supports paired-end and single-end data.
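
As an illustration of the two input modes, here is a minimal sketch that builds a megahit argument list for either paired-end or single-end reads. The flags `-1`, `-2`, `-r`, `-o` and `-t` are megahit options; the helper name `build_megahit_command`, its parameters and the file names are hypothetical, and the way this module actually invokes megahit is not shown here and may differ. Returning a list of arguments (rather than a shell string) keeps the command safe to pass to `subprocess.run` without quoting issues.

```python
from typing import List, Optional


def build_megahit_command(
    out_dir: str,
    forward: Optional[str] = None,
    reverse: Optional[str] = None,
    single: Optional[str] = None,
    threads: int = 4,
) -> List[str]:
    """Return a megahit argument list (hypothetical helper, not part of the module)."""
    if single is not None and (forward is not None or reverse is not None):
        raise ValueError("give either single-end reads or a forward/reverse pair")
    if single is not None:
        reads = ["-r", single]                 # single-end input
    elif forward is not None and reverse is not None:
        reads = ["-1", forward, "-2", reverse]  # paired-end input
    else:
        raise ValueError("paired-end input needs both forward and reverse files")
    return ["megahit", *reads, "-o", out_dir, "-t", str(threads)]


# Example with made-up file names:
paired_cmd = build_megahit_command("assembly_out", "S1_R1.fastq.gz", "S1_R2.fastq.gz")
single_cmd = build_megahit_command("assembly_out", single="S1.fastq.gz")
print(" ".join(paired_cmd))
print(" ".join(single_cmd))
```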

The assembly module first uses megahit to assemble reads into contigs. Reads are then mapped back against the contigs to compute coverage statistics. Contigs are filtered on their length (default = 2000 bp) and on their coverage; the filtering parameters can be edited in the config file. Misassembly prediction is then performed using [DeepMAsED](https://github.com/leylabmpi/DeepMAsED), and contigs are taxonomically annotated using [CAT](https://github.com/dutilh/CAT).

For now, misassemblies are only detected; if curation is required, assemblies must be curated manually.
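
The length and coverage filter described above can be pictured with the following minimal sketch. It is not the module's implementation: the `Contig` class, the `filter_contigs` function, the inclusive comparisons and the `min_coverage` value of 2.0 are assumptions made for illustration. Only the 2000 bp default length comes from this document.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Contig:
    name: str
    length: int      # contig length in bp
    coverage: float  # mean read depth, e.g. from the back-mapping step


def filter_contigs(
    contigs: Iterable[Contig],
    min_length: int = 2000,     # default stated in this document
    min_coverage: float = 2.0,  # hypothetical threshold, for illustration only
) -> List[Contig]:
    """Keep contigs that reach both the length and the coverage threshold."""
    return [
        c for c in contigs
        if c.length >= min_length and c.coverage >= min_coverage
    ]


# Made-up contigs to show the effect of each threshold.
contigs = [
    Contig("contig_1", 5200, 12.4),
    Contig("contig_2", 1500, 30.0),  # dropped: shorter than min_length
    Contig("contig_3", 2000, 1.1),   # dropped: coverage below min_coverage
]
kept = filter_contigs(contigs)
print([c.name for c in kept])  # ['contig_1']
```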
## Single-assembly ##

Single-assembly is a per-sample approach in which the reads from one sample are assembled into one assembly. This approach is used by the genes_collection module and might be used by the genomes_collection module.

```genosysmics run single-assembly **snakemake.args ```
## Co-assembly ##

Co-assembly requires the simka module. In this case, reads from all the samples of a cluster are assembled together. This approach might be used by the genomes_collection module.

```genosysmics run co-assembly **snakemake.args ```
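
To make the difference with single-assembly concrete, the sketch below pools read files per cluster, so that each cluster yields one assembler input instead of one per sample. The function `pool_reads_by_cluster`, the sample names, the file names and the cluster labels are all hypothetical; in particular, how clusters are derived from the simka output is assumed to happen upstream and is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def pool_reads_by_cluster(
    sample_reads: Dict[str, List[str]],
    sample_cluster: Dict[str, str],
) -> Dict[str, List[str]]:
    """Group the read files of all samples that share a cluster label."""
    pooled: Dict[str, List[str]] = defaultdict(list)
    for sample in sorted(sample_reads):  # sorted for a reproducible order
        pooled[sample_cluster[sample]].extend(sample_reads[sample])
    return dict(pooled)


# Made-up samples; the cluster labels stand in for a simka-based clustering.
sample_reads = {
    "S1": ["S1_R1.fastq.gz", "S1_R2.fastq.gz"],
    "S2": ["S2_R1.fastq.gz", "S2_R2.fastq.gz"],
    "S3": ["S3_R1.fastq.gz", "S3_R2.fastq.gz"],
}
sample_cluster = {"S1": "cluster_A", "S2": "cluster_A", "S3": "cluster_B"}

pooled = pool_reads_by_cluster(sample_reads, sample_cluster)
for cluster, files in pooled.items():
    print(cluster, files)
```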
## Assembly ##

Finally, both single-assembly and co-assembly can be run with the following command line:

```genosysmics run assembly **snakemake.args ```