Input Data

Summary of input files

File                Description
------------------  ----------------------------------------------------------
M2M output          Output directory for each sample from Metage2Metabo
Metadata            Tabulated file, first column is sample identifier
Taxonomy            Tabulated file, first column is genome/metabolic network ID
Abundance           Tabulated file, normalized by column sum
Metacyc (optional)  Padmet format, for compound ontology
Precomputed data    Directory with preprocessed dataframes (for fast restart)

The mandatory inputs are the Metage2Metabo outputs for each sample (microbial community) and the metadata associated with each sample. Optional inputs are recommended to get the most out of the analysis: the taxonomy of the genomes associated with the metabolic networks, and the abundance of these genomes in the samples/communities. It is also possible to provide the Metacyc ontology of metabolic compounds in order to analyse the predictions at the level of metabolite families. This last option is only relevant if the metabolic networks were obtained with Pathway Tools, i.e. if they use compound identifiers from the Metacyc database.

Note

The first step of the Metage2Metabo pipeline reconstructs metabolic networks with Pathway Tools. If you used m2m recon, your metabolic networks are compatible with the Metacyc database, and M2M-PostAViz can use the Metacyc ontology of compound families.

Other input data can also be provided, including precomputed M2M-PostAViz tables, which allow a much faster restart when rerunning the app on previously analysed data.
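Because the metadata file is keyed on sample identifiers that must match the M2M output, it can be worth checking the two against each other before launching the app. The following is a minimal sketch and not part of M2M-PostAViz: the function names `read_sample_ids` and `compare_samples` are hypothetical, and it assumes that the metadata file is tab-separated with a header row and that each sample has its own sub-directory, named after the sample identifier, directly under the M2M output directory.

```python
import csv
from pathlib import Path


def read_sample_ids(metadata_path):
    """Return the first-column values of a tab-separated metadata file.

    Assumption: the file has one header row, which is skipped.
    """
    with open(metadata_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row (no error on an empty file)
        return [row[0] for row in reader if row]


def compare_samples(m2m_dir, metadata_path):
    """Compare sample directory names with metadata sample identifiers.

    Returns a tuple of two sorted lists:
      - sample directories that have no metadata row
      - metadata identifiers that have no sample directory
    """
    dir_ids = {p.name for p in Path(m2m_dir).iterdir() if p.is_dir()}
    meta_ids = set(read_sample_ids(metadata_path))
    return sorted(dir_ids - meta_ids), sorted(meta_ids - dir_ids)
```

Calling `compare_samples` with the M2M output directory and the metadata file returns two lists; both are empty when every sample directory has a metadata row and vice versa.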

Input data details

  • M2M output for each sample: Directory structure example:

    sample_1/
      community_analysis/
        addedvalue.json
        comm_scopes.json
        contributions_of_microbes.json
        mincom.json
        rev_cscope.json
        rev_cscope.tsv
        targets.sbml
      indiv_scopes/
        indiv_scopes.json
        rev_iscope.json
        rev_iscope.tsv
        seeds_in_indiv_scopes.json
      m2m_metacom.log
      producibility_targets.json
    sample_2/
      ...
    
  • 📄 Metadata associated with the samples: Tabulated file, first column is the sample identifier matching the output of M2M.

    smplID    Age  Country
    sample_1  2    France
    sample_2  30   Canada
    sample_3  68   Germany

  • 📄 Taxonomy of the MAGs/genomes: Tabulated file, first column matches the IDs of the metabolic networks.

  • 📊 Abundance of the MAGs/genomes in the samples/communities: Tabulated file, normalized by column sum during processing.

    identifier  Sample_1  Sample_2  Sample_3
    MAG_1       12.5      8.3       15.2
    Genome_1    5.8       10.1      7.6
    MAG_2       20.3      14.7      18.9

  • 🚀 Precomputed data for M2M-PostAViz: Can be stored when running the tool with the -o flag and loaded for future runs.

    m2m_postaviz -d Metage2metabo/samples/scopes/directory/path \
                 -m metadata/file/path \
                 -a abundance/file/path \
                 -t taxonomy/file/path \
                 -o save/directory/path
    
    # For future runs:
    m2m_postaviz -l save/directory/path
    

    The preprocessed dataset is stored in a directory in the form of dataframes and files in Parquet format. Example structure:

    saved_data_postaviz/
      abundance_file_normalised.tsv
      abundance_file.tsv
      ...
      sample_cscope_directory/
        Sample1.parquet.gzip
        ...
      sample_iscope_directory/
        Sample1.parquet.gzip
        ...
      ...
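To make the "normalized by column sum" step for the abundance file more concrete, here is a minimal sketch using only the Python standard library. It is not the M2M-PostAViz implementation: the function name `normalise_by_column_sum` is hypothetical, and it assumes that normalisation means dividing each value by the total of its sample column, so that every sample column sums to 1. The input is the example abundance table from this section, written as tab-separated text.

```python
import csv
import io


def normalise_by_column_sum(tsv_text):
    """Divide every abundance value by the total of its sample column.

    Assumption: the first row is a header and the first column holds
    the genome/MAG identifiers. Returns {genome: {sample: fraction}}.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], [row for row in rows[1:] if row]
    samples = header[1:]
    values = [[float(x) for x in row[1:]] for row in body]
    # zip(*values) transposes the rows into columns, one per sample.
    totals = [sum(column) for column in zip(*values)]
    normalised = {}
    for row, vals in zip(body, values):
        normalised[row[0]] = {
            sample: (value / total if total else 0.0)  # guard empty samples
            for sample, value, total in zip(samples, vals, totals)
        }
    return normalised


# Example abundance table from this section, as tab-separated text.
EXAMPLE = (
    "identifier\tSample_1\tSample_2\tSample_3\n"
    "MAG_1\t12.5\t8.3\t15.2\n"
    "Genome_1\t5.8\t10.1\t7.6\n"
    "MAG_2\t20.3\t14.7\t18.9\n"
)

table = normalise_by_column_sum(EXAMPLE)
for genome, per_sample in table.items():
    print(genome, {s: round(v, 3) for s, v in per_sample.items()})
```

After this step, the values within each sample column are relative abundances that can be compared across samples regardless of the original totals.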