Input Data

Summary of input files

File	Description
M2M output	Output directory for each sample from Metage2Metabo
Metadata	Tabulated file, first column is sample identifier
Taxonomy	Tabulated file, first column is genome/metabolic network ID
Abundance	Tabulated file, normalized by column sum
Metacyc (optional)	Padmet format, for compound ontology
Precomputed data	Directory with preprocessed dataframes (for fast restart)

The mandatory inputs are the Metage2Metabo outputs for each sample/microbial community and the metadata associated with each of them. Additional optional inputs are recommended to get the most out of the analysis: the taxonomy of the genomes associated with the metabolic networks, and the abundance of these genomes in the samples/communities. It is also possible to provide the Metacyc ontology of the metabolic compounds to analyse the predictions at the level of metabolite families. The latter is only relevant if the metabolic networks were obtained with Pathway Tools, i.e. if they use compound identifiers that match the Metacyc database.

Note

Metage2Metabo has a first pipeline step dedicated to the reconstruction of metabolic networks with Pathway Tools. If you used m2m recon, your metabolic networks are compatible with the Metacyc database and PostAViz can use the Metacyc ontology of compound families.

In practice, other input data can be provided, including precomputed M2M-PostAViz tables which allow for a much faster restart when rerunning the app on previously analysed data.

Input data details

M2M output for each sample: one output directory per sample, structured as in this example:

sample_1/
  community_analysis/
    addedvalue.json
    comm_scopes.json
    contributions_of_microbes.json
    mincom.json
    rev_cscope.json
    rev_cscope.tsv
    targets.sbml
  indiv_scopes/
    indiv_scopes.json
    rev_iscope.json
    rev_iscope.tsv
    seeds_in_indiv_scopes.json
  m2m_metacom.log
  producibility_targets.json
sample_2/
  ...
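
Before launching the app, it can help to confirm that every sample directory has the layout shown above. The following is a minimal sketch, not part of M2M-PostAViz: the function names `missing_files` and `check_m2m_output` are hypothetical, and the file list is simply copied from the example, since this page does not state which of these files the tool strictly requires.

```python
from pathlib import Path

# Files listed in the directory structure example, relative to one sample
# directory. Assumption: treating all of them as expected.
EXPECTED_FILES = [
    "community_analysis/addedvalue.json",
    "community_analysis/comm_scopes.json",
    "community_analysis/contributions_of_microbes.json",
    "community_analysis/mincom.json",
    "community_analysis/rev_cscope.json",
    "community_analysis/rev_cscope.tsv",
    "community_analysis/targets.sbml",
    "indiv_scopes/indiv_scopes.json",
    "indiv_scopes/rev_iscope.json",
    "indiv_scopes/rev_iscope.tsv",
    "indiv_scopes/seeds_in_indiv_scopes.json",
    "m2m_metacom.log",
    "producibility_targets.json",
]


def missing_files(sample_dir):
    """Return the expected files that are absent from one sample directory."""
    sample_dir = Path(sample_dir)
    return [rel for rel in EXPECTED_FILES if not (sample_dir / rel).is_file()]


def check_m2m_output(root):
    """Map each sample directory under `root` to its list of missing files."""
    report = {}
    for sample_dir in sorted(Path(root).iterdir()):
        if sample_dir.is_dir():
            report[sample_dir.name] = missing_files(sample_dir)
    return report
```

Calling `check_m2m_output("path/to/m2m/output")` returns a dictionary in which an empty list means the sample directory matches the example layout.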

📄 Metadata associated to samples: Tabulated file, first column is the sample identifier matching the output of M2M.

smplID	Age	Country
sample_1	2	France
sample_2	30	Canada
sample_3	68	Germany

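
Because the first metadata column must match the sample identifiers of the M2M output, a quick consistency check can catch mismatches early. This is a minimal sketch under stated assumptions: the metadata file is tab-separated with one header row, sample identifiers are the names of the per-sample output directories, and the helper names `metadata_sample_ids` and `compare_ids` are hypothetical.

```python
import csv
from pathlib import Path


def metadata_sample_ids(metadata_path):
    """Read the first column of a tab-separated metadata file, skipping the header."""
    with open(metadata_path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # header row, e.g. smplID, Age, Country
        return [row[0] for row in rows if row]


def compare_ids(metadata_path, m2m_root):
    """Return (IDs only in the metadata, sample directories absent from the metadata)."""
    in_metadata = set(metadata_sample_ids(metadata_path))
    on_disk = {p.name for p in Path(m2m_root).iterdir() if p.is_dir()}
    return sorted(in_metadata - on_disk), sorted(on_disk - in_metadata)
```

Two empty lists mean that every metadata row has a sample directory and every sample directory has a metadata row.
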
📄 Taxonomy of the MAGs/genomes: Tabulated file, first column matches the IDs of the metabolic networks.
📊 Abundance of the MAGs/genomes in the samples/communities: Tabulated file, normalized by column sum during processing.

identifier	Sample_1	Sample_2	Sample_3
MAG_1	12.5	8.3	15.2
Genome_1	5.8	10.1	7.6
MAG_2	20.3	14.7	18.9
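
To illustrate what normalization by column sum means for the abundance table, here is a minimal sketch using the example values. It assumes that each value is divided by the total of its sample column, so that every column sums to 1; the exact scaling applied by M2M-PostAViz is not specified on this page, and `normalise_by_column_sum` is a hypothetical helper.

```python
def normalise_by_column_sum(table):
    """Divide each value by the total of its column (sample).

    `table` maps a genome identifier to a list of abundances, one per sample.
    """
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    # One total per sample column, summed over all genomes.
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {
        genome: [value / total if total else 0.0 for value, total in zip(row, totals)]
        for genome, row in table.items()
    }


# Values from the example abundance table (columns: Sample_1, Sample_2, Sample_3).
abundance = {
    "MAG_1": [12.5, 8.3, 15.2],
    "Genome_1": [5.8, 10.1, 7.6],
    "MAG_2": [20.3, 14.7, 18.9],
}
normalised = normalise_by_column_sum(abundance)
```

After this step, the values of each sample are relative abundances, which makes samples with different totals comparable.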

🚀 Precomputed data for M2M-PostAViz: Can be saved when running the tool with the -o flag, then loaded in future runs with the -l flag.

m2m_postaviz -d Metage2metabo/samples/scopes/directory/path \
             -m metadata/file/path \
             -a abundance/file/path \
             -t taxonomy/file/path \
             -o save/directory/path

# For future runs:
m2m_postaviz -l save/directory/path

The preprocessed dataset is stored in a directory as dataframes saved to tabulated (TSV) and Parquet files. Example structure:

saved_data_postaviz/
  abundance_file_normalised.tsv
  abundance_file.tsv
  ...
  sample_cscope_directory/
    Sample1.parquet.gzip
    ...
  sample_iscope_directory/
    Sample1.parquet.gzip
    ...
  ...
