---
title: Assembly
---
# Metagenomics assemblies #
Metagenomic assembly is almost unavoidable in metagenomic analysis: it reduces the fragmentation of information and highlights sample characteristics. Reads are assembled into contigs using [megahit](https://github.com/voutcn/megahit), a fast assembler optimized for metagenomes and based on succinct de Bruijn graphs built from k-mers. Megahit is faster than SPAdes and metaSPAdes and performs comparably. SPAdes assemblies might be better in terms of long and ultra-long contigs, but they also show a higher rate of misassemblies, which requires assembly curation [(1)](https://link.springer.com/article/10.1186/s12864-017-3918-9).

Megahit supports paired-end and single-end data.

The assembly module first uses megahit to assemble reads into contigs. Reads are then mapped back against the contigs to compute coverage statistics. Contigs are then filtered by length (default = 2000 bp) and by coverage. Filtering parameters can be edited in the config file. Misassembly prediction is then performed using [DeepMAsED](https://github.com/leylabmpi/DeepMAsED), and contigs are taxonomically annotated using [CAT](https://github.com/dutilh/CAT).

For now, misassemblies are only detected; if curation is required, assemblies must be curated manually.
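The length and coverage filter described above can be illustrated with a minimal Python sketch. This is not the module's actual code: the `Contig` class, the `filter_contigs` function, the contig names and the `min_coverage` default of 1.0 are hypothetical, and treating both thresholds as inclusive is an assumption. Only the 2000 bp length default comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Contig:
    name: str
    length: int      # contig length in bp
    coverage: float  # mean read depth from the back-mapping step


def filter_contigs(contigs: Iterable[Contig],
                   min_length: int = 2000,
                   min_coverage: float = 1.0) -> List[Contig]:
    """Keep contigs that pass both thresholds (assumed inclusive)."""
    return [c for c in contigs
            if c.length >= min_length and c.coverage >= min_coverage]


if __name__ == "__main__":
    # Hypothetical contigs, only here to show the behaviour of the filter.
    demo = [
        Contig("contig_1", 5200, 12.4),
        Contig("contig_2", 1500, 30.0),  # shorter than min_length
        Contig("contig_3", 2500, 0.4),   # coverage below min_coverage
    ]
    for contig in filter_contigs(demo):
        print(contig.name)
```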
## Single-assembly ##
Single-assembly is a per-sample approach where reads from one sample are assembled into one assembly. This approach is used by the genes_collection module and might be used by the genomes_collection module.

```magneto run single-assembly **snakemake.args ```
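To make the per-sample approach concrete, the sketch below builds a megahit command for one sample, either paired-end or single-end. It is an illustration under stated assumptions, not the command the module actually runs: the helper name `megahit_command`, the file names and the thread count are hypothetical. The flags `-1`, `-2`, `-r`, `-o` and `-t` are standard megahit options.

```python
import shlex
from typing import List, Optional


def megahit_command(out_dir: str,
                    r1: Optional[str] = None,
                    r2: Optional[str] = None,
                    single: Optional[str] = None,
                    threads: int = 4) -> List[str]:
    """Build a megahit argument list for one sample."""
    cmd = ["megahit", "-o", out_dir, "-t", str(threads)]
    if r1 and r2:
        # Paired-end data: forward and reverse read files.
        cmd += ["-1", r1, "-2", r2]
    elif single:
        # Single-end data: one read file.
        cmd += ["-r", single]
    else:
        raise ValueError("provide either r1 and r2, or single")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; the command is printed, not executed.
    paired = megahit_command("S1_assembly", r1="S1_R1.fq.gz", r2="S1_R2.fq.gz")
    print(shlex.join(paired))
    print(shlex.join(megahit_command("S2_assembly", single="S2.fq.gz")))
```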
## Co-assembly ##
Co-assembly requires the [simka module](Modules/simka). In this case, reads from all samples of a cluster are assembled together. This approach might be used by the genomes_collection module.

```magneto run co-assembly **snakemake.args ```
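The idea of pooling reads per cluster can be sketched as follows. This is a hedged illustration, not the module's implementation: the sample-to-cluster mapping, the file names and the `pool_by_cluster` helper are hypothetical, and the format of the simka module's output is not assumed here. The sketch relies only on the fact that megahit accepts comma-separated file lists for `-1` and `-2`.

```python
from collections import defaultdict
from typing import Dict, Tuple


def pool_by_cluster(sample_cluster: Dict[str, str],
                    sample_reads: Dict[str, Tuple[str, str]]
                    ) -> Dict[str, Tuple[str, str]]:
    """Return, for each cluster, comma-separated forward and reverse read lists."""
    forward = defaultdict(list)
    reverse = defaultdict(list)
    # Sort samples so the file order is reproducible between runs.
    for sample in sorted(sample_cluster):
        cluster = sample_cluster[sample]
        fwd, rev = sample_reads[sample]
        forward[cluster].append(fwd)
        reverse[cluster].append(rev)
    return {c: (",".join(forward[c]), ",".join(reverse[c])) for c in forward}


if __name__ == "__main__":
    # Hypothetical clustering of three samples into two clusters.
    clusters = {"S1": "cluster_1", "S2": "cluster_1", "S3": "cluster_2"}
    reads = {s: (f"{s}_R1.fq.gz", f"{s}_R2.fq.gz") for s in clusters}
    for cluster, (fwd, rev) in pool_by_cluster(clusters, reads).items():
        # One megahit call per cluster; printed here rather than executed.
        print("megahit", "-1", fwd, "-2", rev, "-o", f"{cluster}_coassembly")
```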
## Assembly ##
Finally, both single-assembly and co-assembly can be run with the following command:

```magneto run assembly **snakemake.args ```

-----
[Previous - Simka (Module)](Modules/simka)

[Next - Binning (Module)](Modules/binning)