Input Data
Summary of input files
| File | Description |
|---|---|
| M2M output | Output directory for each sample from Metage2Metabo |
| Metadata | Tabulated file, first column is sample identifier |
| Taxonomy | Tabulated file, first column is genome/metabolic network ID |
| Abundance | Tabulated file, normalized by column sum |
| Metacyc (optional) | Padmet format, for compound ontology |
| Precomputed data | Directory with preprocessed dataframes (for fast restart) |
The mandatory inputs are the Metage2Metabo outputs for each sample (microbial community) and the metadata associated with each sample. Optional inputs are recommended to get the most out of the analysis: the taxonomy of the genomes associated with the metabolic networks, and the abundance of these genomes in the samples. It is also possible to provide the Metacyc ontology of metabolic compounds in order to analyse the predictions at the level of metabolite families. This ontology is only relevant if the metabolic networks were obtained with Pathway Tools, i.e. if they use compound identifiers from the Metacyc database.
Note
Metage2Metabo has a first pipeline step dedicated to the reconstruction of metabolic networks with Pathway Tools.
If you used m2m recon, your metabolic networks are compatible with the Metacyc database and PostAViz can use the Metacyc ontology of compound families.
In practice, other input data can be provided, including precomputed M2M-PostAViz tables which allow for a much faster restart when rerunning the app on previously analysed data.
Input data details
M2M output for each sample: Directory structure example:

    sample_1/
        community_analysis/
            addedvalue.json
            comm_scopes.json
            contributions_of_microbes.json
            mincom.json
            rev_cscope.json
            rev_cscope.tsv
            targets.sbml
        indiv_scopes/
            indiv_scopes.json
            rev_iscope.json
            rev_iscope.tsv
            seeds_in_indiv_scopes.json
        m2m_metacom.log
        producibility_targets.json
    sample_2/
        ...

📄 Metadata associated to samples: Tabulated file, first column is the sample identifier matching the output of M2M.
| smplID | Age | Country |
|---|---|---|
| sample_1 | 2 | France |
| sample_2 | 30 | Canada |
| sample_3 | 68 | Germany |
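Because the first metadata column must match the sample directories produced by M2M, it can help to compare the two before launching the app. The following is a minimal sketch, not part of M2M-PostAViz: the function name `compare_sample_ids` and its parameters are hypothetical, and it assumes that each sample is a direct sub-directory of the M2M output directory and that the metadata file is tab-separated with a header line.

```python
import csv
from pathlib import Path


def compare_sample_ids(m2m_dir, metadata_file):
    """Return (IDs found only in the metadata, IDs found only as directories).

    Hypothetical helper, not part of M2M-PostAViz.
    """
    # Assumption: every sub-directory of m2m_dir is one sample.
    dir_ids = {p.name for p in Path(m2m_dir).iterdir() if p.is_dir()}

    # The first column of the tab-separated metadata file is the sample ID.
    with open(metadata_file, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # skip the header line
        meta_ids = {row[0] for row in rows if row}

    return sorted(meta_ids - dir_ids), sorted(dir_ids - meta_ids)
```

Two empty lists mean that every sample directory has a metadata row and vice versa. The comparison is case-sensitive, so `Sample_1` and `sample_1` would be reported as different identifiers.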
📄 Taxonomy of the MAGs/genomes: Tabulated file, first column matches the IDs of the metabolic networks.
📊 Abundance of the MAGs/genomes in the samples/communities: Tabulated file, normalized by column sum during processing.
| identifier | Sample_1 | Sample_2 | Sample_3 |
|---|---|---|---|
| MAG_1 | 12.5 | 8.3 | 15.2 |
| Genome_1 | 5.8 | 10.1 | 7.6 |
| MAG_2 | 20.3 | 14.7 | 18.9 |
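To illustrate what normalization by column sum does to such a table, here is a minimal sketch using only the Python standard library. It assumes that "normalized by column sum" means dividing each value by the total of its sample column, which is the usual reading. The function `normalise_by_column_sum` is hypothetical and is not the implementation used by M2M-PostAViz.

```python
import csv
import io


def normalise_by_column_sum(tsv_text):
    """Divide every abundance by the total of its sample column.

    Hypothetical helper, not the M2M-PostAViz implementation.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = header[1:]  # the first column holds the genome identifier

    # One total per sample column.
    totals = [sum(float(row[i + 1]) for row in body) for i in range(len(samples))]

    # Columns whose total is zero are left at 0.0 to avoid dividing by zero.
    return {
        row[0]: {
            s: (float(row[i + 1]) / totals[i] if totals[i] else 0.0)
            for i, s in enumerate(samples)
        }
        for row in body
    }


example = (
    "identifier\tSample_1\tSample_2\tSample_3\n"
    "MAG_1\t12.5\t8.3\t15.2\n"
    "Genome_1\t5.8\t10.1\t7.6\n"
    "MAG_2\t20.3\t14.7\t18.9\n"
)
normalised = normalise_by_column_sum(example)
```

After this step, the values within each sample column sum to 1, so abundances become comparable across samples with different totals.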
🚀 Precomputed data for M2M-PostAViz: Can be stored when running the tool with the `-o` flag and loaded for future runs.

    m2m_postaviz -d Metage2metabo/samples/scopes/directory/path \
                 -m metadata/file/path \
                 -a abundance/file/path \
                 -t taxonomy/file/path \
                 -o save/directory/path

    # For future runs:
    m2m_postaviz -l save/directory/path

The preprocessed dataset is stored in a directory in the form of dataframes and files in Parquet format. Example structure:

    saved_data_postaviz/
        abundance_file_normalised.tsv
        abundance_file.tsv
        ...
        sample_cscope_directory/
            Sample1.parquet.gzip
            ...
        sample_iscope_directory/
            Sample1.parquet.gzip
            ...
        ...